A story of the great cicada census

From www.norsemathology.org

In summary

Female wasps in Newberry are smaller, and can't carry TB cicadas (as shown by these graphs of right-wing lengths).

This is the subject of a story that Grayson is in the process of telling.

Female wasps carry NHs and DOs until they can carry TGs, and then TBs.

One of the really beautiful things about this smoothing is that the plateaus are exactly what we'd expect: the level of the first plateau is roughly the mean of the small cicadas; the level of the second plateau is roughly the mean of the TG; and the level of the third plateau is roughly the mean of the TB.

Since female wasps in Newberry are smaller, they can't carry TBs.

Suppose this data accurately reflects the distribution of the cicada species in both areas of Florida (this is ultimately something we want to determine -- we don't believe this currently, because our sampling has not been random).

Single cicadas are purportedly transformed into single male wasps.

Female wasps from each population sample the cicada distributions according to their own distinct distributions, and produce males accordingly.

Convert cicada RWL \longrightarrow mass (although we know that this is not a single relationship, but actually species-specific). We might write M = f(W,s), where M is cicada mass, and f is a power model based on W (wing length) and s (species).
According to Frank, the conversion of cicada mass into wasp mass should be carried out via a model of the form
m = \gamma M^{\delta}
where m is wasp mass, and M is cicada mass.
Male wasp masses in Newberry and St. Johns (in particular) should hence accurately reflect the cicada distribution as sampled by their females.
Convert wasp mass \longrightarrow RWL
The male wasp distributions in Newberry and St. Johns should be essentially transformed versions of the cicada populations. Since it's one-cicada-per-wasp, we should see the sample of the cicada population in the sample of the male wasp populations (and we have two populations, because of the different sizes of wasps in the two locations).

The calculations and estimates

There are many parameters to estimate in the process described above (unknown a priori, to varying degrees). Here's a list of (most of) them (and some preliminary estimates):

  • 2 for the transformation of cicada mass into wasp mass (assuming a power model);
  • 8 (as many as) for the conversion of cicada RWL (right-wing length) into mass (assuming power models including species);
    • Tentative model (assuming only two -- treating all cicadas the same): M_c=.00902*RWL_c^{3.21}
  • 2 for the conversion of wasp RWL into mass (assuming a power model -- we actually need this conversion in the opposite direction, according to the story above);
    • Tentative model: M_w=.0152*RWL_w^{3.27}
  • 8 for the means and standard deviations of the cicada RWL (assuming normal distributions and species dependence -- estimates in the table below);
  • 3 for the relative proportions of the cicada species in the population of cicadas (assuming we want a density, since the fourth is determined once three are obtained); and
  • some parameters related to the distributions of male and female wasps in the two distinct locations of Newberry and St. Johns. Both populations are important, as the females are the census takers (but don't sample opportunistically), and the male distributions should be a direct result of the census taken.
Cicada species   Mean RWL   SD
Ne.h.            26.5       1.14
D.o.             27.9       1.10
T.g.             43.3       1.59
T.b.             52.9       1.95

Table containing the estimates for means and standard deviations (parameters) for the (presumed) normal distributions of cicada right-wing lengths, by species.
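
As a rough illustration of how the two tentative power models are used, here is a minimal Python sketch. The coefficients and species means are the ones quoted above; the function names are ours, and the text does not state the units of mass, so none are assumed.

```python
# Tentative power models quoted in the parameter list above.
CICADA_C, CICADA_P = 0.00902, 3.21   # M_c = 0.00902 * RWL_c ** 3.21 (all cicadas pooled)
WASP_C, WASP_P = 0.0152, 3.27        # M_w = 0.0152 * RWL_w ** 3.27


def cicada_mass(rwl_c):
    """Cicada mass predicted from cicada right-wing length."""
    return CICADA_C * rwl_c ** CICADA_P


def wasp_mass(rwl_w):
    """Wasp mass predicted from wasp right-wing length."""
    return WASP_C * rwl_w ** WASP_P


def wasp_rwl(mass_w):
    """Inverse of wasp_mass: the direction the story actually needs."""
    return (mass_w / WASP_C) ** (1.0 / WASP_P)


# Species mean right-wing lengths, from the table of estimates.
SPECIES_MEAN_RWL = {"Ne.h.": 26.5, "D.o.": 27.9, "T.g.": 43.3, "T.b.": 52.9}

for species, rwl in SPECIES_MEAN_RWL.items():
    print(f"{species}: mean RWL {rwl} -> predicted mass {cicada_mass(rwl):.0f}")
```

Inverting the wasp model is just a matter of solving the power law for RWL, so no separate regression is needed for the opposite direction (although a regression of RWL on mass would generally give slightly different coefficients).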

We do not estimate all the parameters in the same way, however. Ideally we might estimate all of these parameters simultaneously, and find the best combination of parameters to make the story fit together best; however, the chance of being able to find a global minimum of a system with 23 or more parameters starting from scratch is pretty small -- so we take a different approach. We estimate parameters along the way, where it's easy, and then leave a few that we can't estimate "locally" to find using the procedure we're about to describe.

What do we mean by estimating "locally?" An example is the parameters linking the wasp mass and the wasp RWL. We have good data on the wasps, including their masses and their RWLs. We plot the scatterplot, we see what appears to be a power relation, and we estimate a model. "Case closed" (or almost): because this model feeds into the rest of our estimation procedures, any errors that we make in estimating these parameters will feed into errors in the estimates of other parameters.

As the saying goes, "All models are wrong; some models are useful." We know that we will make errors: do we want to estimate certain parameters and treat them as perfect and then estimate other parameters based on them, or treat all parameters as error prone and estimate them all at once (supposing the errors to be distributed up and down the line)? We have essentially estimated 18 parameters above (with several of them estimated as zeros). That leaves a few parameters that haven't yet been estimated, but which we can estimate by the methods we describe below.

The parameters we estimate non-locally in the following are the parameters related to the relative abundance of the four (well, three, really) types of cicadas -- small, medium, and large -- and the parameters related to the conversion of cicada mass into wasp mass. The mass-to-mass conversion parameters can be estimated in the laboratory (and have been -- Jon? Is that right?).

The Estimates

There are two ways that I propose by which we make the estimates for the relative sizes of the cicada populations (and that's all we can get: not real numbers, but relative numbers). Common to both will be several assumptions.

Assumptions:

  1. Each male wasp is produced from a single cicada.
  2. If the wasps practice sex allocation by size of cicada (e.g. small ones are turned into males, larger ones are reserved for females), then there needs to be an adjustment: the females wouldn't be sampling randomly from the cicada populations to turn them into males. According to Grant, that is not a problem.
  3. Another question for sex allocation is this: do the wasps use times of prey scarcity to turn to male production, and produce females when prey are abundant?

General Method

Let L be the RWL of an animal and let M be the mass of an animal (a male animal, in the case of the wasps). The subscript "w" will indicate wasps, and "c" subscripts indicate cicadas; "t" subscripts refer to a "transfer" (between wasps and cicadas). In this model, we're going to assume that

  • wasps sample the cicadas according to their wing length, according to the relationship indicated in the kernel-smoothed graph obtained by Katie;
  • there are only three types of cicadas (small, medium, and large -- we combine the NH and DO cicadas);
  • the cicada distribution in the two locations -- Newberry and St. Johns -- is the same;
  • the cicada distribution is stable in structure, as are the wasp populations;
  • there is a single, species-independent equation that transforms cicada RWL into mass.

Several (if not all) of those assumptions are suspect: for example, backwards-stepwise regression showed that there is a species effect in the relationship between mass and RWL for cicadas.

We have a distribution of wing lengths of cicadas in these Florida locations, which we model with the following density (where there are three kinds of cicadas -- small, medium, and large):

d_c(L_c)=\alpha N_s\left(L_c\right)+ \beta N_m(L_c) + \gamma N_l(L_c)
This is the default, empirical guess for the census

where N represents a normal density and \gamma = 1 - \alpha - \beta. The parameters \left(\alpha, \beta, c_T, p_T\right) are to be determined (see below for c_T and p_T). All other parameters have been estimated locally.
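
The mixture density can be sketched in a few lines of Python. The medium and large classes use the T.g. and T.b. estimates from the table; the pooled small class (mean 27, SD 1.3) and the weights in the example call are assumptions made only for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


# (mean, SD) of right-wing length for each size class.
SIZE_CLASSES = {
    "small": (27.0, 1.3),    # ASSUMPTION: pooled Ne.h. + D.o.; the SD is a guess
    "medium": (43.3, 1.59),  # T.g., from the table
    "large": (52.9, 1.95),   # T.b., from the table
}


def cicada_density(l_c, alpha, beta):
    """d_c(L_c) = alpha*N_s + beta*N_m + gamma*N_l, with gamma = 1 - alpha - beta."""
    gamma = 1.0 - alpha - beta
    if min(alpha, beta, gamma) < -1e-12:
        raise ValueError("alpha, beta and 1 - alpha - beta must be non-negative")
    weights = {"small": alpha, "medium": beta, "large": max(gamma, 0.0)}
    return sum(w * normal_pdf(l_c, *SIZE_CLASSES[name]) for name, w in weights.items())


def integrate(f, lo, hi, n=5000):
    """Composite trapezoid rule on [lo, hi] with n sub-intervals."""
    h = (hi - lo) / n
    return h * (0.5 * f(lo) + sum(f(lo + i * h) for i in range(1, n)) + 0.5 * f(hi))


# Hypothetical weights, purely for illustration: the density should integrate to 1.
total = integrate(lambda x: cicada_density(x, 0.5, 0.3), 15.0, 65.0)
print(f"integral of d_c over [15, 65]: {total:.6f}")
```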

Now, for a particular cicada of given right wing length \left(L_c\right), we have that its mass is estimated as

M_c=c_cL_c^{p_c}.
This is the regression result.


Then the cicada's mass is converted to wasp mass (according to Frank) via a power model, so that the wasp formed by eating this cicada would have mass

M_w=c_TM_c^{p_T}

Lab estimates suggest that this relationship is linear (i.e. p_T = 1), with a constant of c_T = 0.25.

Converting from the mass of the wasp to the wasp's length \left(L_w\right),

L_w=c_wM_w^{p_w}
This is the regression result.

then finally we've got that

L_w=c_w(c_T M_c^{p_T})^{p_w}=c_w(c_T (c_cL_c^{p_c})^{p_T})^{p_w}

That is, there is a simple power model relationship between the two lengths:

L_w(L_c)=\kappa\left(L_c\right)^{\rho}

where we have hidden a little of the mess by defining a couple of new parameters: if we can estimate \kappa=c_w c_T^{p_w} c_c^{p_T p_w} and \rho = p_c p_T p_w, then we'll have estimated c_T and p_T as

p_T=\frac{\rho}{p_cp_w}

and

c_T=\left(\frac{\kappa}{c_wc_c^{p_Tp_w}}\right)^{1/p_w}.
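
A short Python check of this algebra may be reassuring: compose the three power laws into (\kappa, \rho), then recover (c_T, p_T) from them. The numbers are the tentative local estimates and the lab values quoted in the text; c_w and p_w are obtained here by inverting the tentative wasp model, which is our assumption about how that regression would be used.

```python
def compose(c_c, p_c, c_t, p_t, c_w, p_w):
    """Collapse L_c -> M_c -> M_w -> L_w into L_w = kappa * L_c ** rho."""
    kappa = c_w * c_t ** p_w * c_c ** (p_t * p_w)
    rho = p_c * p_t * p_w
    return kappa, rho


def recover_transfer(kappa, rho, c_c, p_c, c_w, p_w):
    """Solve for the mass-transfer parameters (c_T, p_T) given kappa and rho."""
    p_t = rho / (p_c * p_w)
    c_t = (kappa / (c_w * c_c ** (p_t * p_w))) ** (1.0 / p_w)
    return c_t, p_t


# Tentative local estimates from the text.
C_C, P_C = 0.00902, 3.21                        # cicada: M_c = c_c * L_c ** p_c
# ASSUMPTION: invert M_w = 0.0152 * L_w ** 3.27 to get L_w = c_w * M_w ** p_w.
C_W, P_W = 0.0152 ** (-1.0 / 3.27), 1.0 / 3.27

# Lab estimates of the transfer quoted in the text: c_T = 0.25, p_T = 1.
kappa, rho = compose(C_C, P_C, 0.25, 1.0, C_W, P_W)
c_t, p_t = recover_transfer(kappa, rho, C_C, P_C, C_W, P_W)
print(f"kappa = {kappa:.4f}, rho = {rho:.4f}")
print(f"recovered c_T = {c_t:.4f}, p_T = {p_t:.4f}")
```

The round trip returns the values that went in, which is all the two displayed formulas claim.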

From this, we'd predict that the distribution of the wasp lengths would satisfy d_w\left(L_w\left(L_c\right)\right) \propto d_c\left(L_c\right), and that the predicted mean wing length of the wasps would be

\overline{L}_w=\int_{-\infty}^{\infty}L_w(L_c)d_c(L_c)dL_c

(and our results indicate that this is almost perfectly so: in St. Johns, 24.86 versus 24.25; in Newberry, 21.85 versus 21.75).
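
The predicted mean can be approximated numerically. The sketch below is only illustrative: \kappa and \rho are built from the tentative local models and the lab transfer values, the pooled small class (mean 27, SD 1.3) is an assumption, and the weights in the example call are hypothetical, so it is not expected to reproduce the figures quoted above.

```python
import math

# Locally estimated pieces, taken from the tentative models in the text.
C_C, P_C = 0.00902, 3.21                        # cicada RWL -> mass
C_W, P_W = 0.0152 ** (-1.0 / 3.27), 1.0 / 3.27  # wasp mass -> RWL (inverted fit, our assumption)
C_T, P_T = 0.25, 1.0                            # lab estimate of the mass transfer
KAPPA = C_W * C_T ** P_W * C_C ** (P_T * P_W)
RHO = P_C * P_T * P_W

# (mean, SD) per size class; the small class is an ASSUMED pooling of Ne.h. and D.o.
CLASSES = [(27.0, 1.3), (43.3, 1.59), (52.9, 1.95)]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cicada_density(l_c, alpha, beta):
    """Three-component mixture d_c with gamma = 1 - alpha - beta."""
    weights = (alpha, beta, 1.0 - alpha - beta)
    return sum(w * normal_pdf(l_c, m, s) for w, (m, s) in zip(weights, CLASSES))


def wasp_rwl_from_cicada(l_c):
    """L_w(L_c) = kappa * L_c ** rho."""
    return KAPPA * l_c ** RHO


def predicted_mean_wasp_rwl(alpha, beta, lo=10.0, hi=70.0, n=6000):
    """Trapezoid approximation of the integral of L_w(L_c) * d_c(L_c) dL_c."""
    h = (hi - lo) / n
    ys = [wasp_rwl_from_cicada(lo + i * h) * cicada_density(lo + i * h, alpha, beta)
          for i in range(n + 1)]
    return h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])


# Hypothetical weights, for illustration only.
print(f"predicted mean wasp RWL: {predicted_mean_wasp_rwl(0.5, 0.3):.2f}")
```

The infinite limits are replaced by a finite window that comfortably covers all three size classes, which is harmless here because the normal tails are negligible outside it.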

Our objective now is to find the best set of parameters \left(\alpha, \beta, \kappa, \rho\right); that is, the set that provides the best fit to the distribution(s) of male wasp wing lengths in the samples obtained in Newberry and St. Johns. Notice that we thus end up with two separate estimation problems. This is important. The wasps in St. Johns are larger, so they census a part of the cicada population that the wasps in Newberry can't touch.

So let's think about one of those populations of wasp wing lengths, given by density d_w\left(L_w\right). Here is one strategy for choosing \left(\alpha, \beta, \kappa, \rho\right), based on minimizing the difference between two functions over a range of wing length values.

We have an empirical cumulative distribution (let's call it C^e\left(x\right)) of male wasp wing lengths, which is a (step) function. It's not critical that this empirical distribution be differentiable.

This is the empirical CDF for the male wasps of Newberry.

Now, we need the cumulative distribution of the modeled male wing lengths. Let's call the modeled cumulative C^*\left(x\right): then

C^*(L_w(L_c))=\frac{1}{\int_{-\infty}^{\infty}L_w(x;\kappa, \rho)d_c(x;\alpha, \beta)dx}\int_{-\infty}^{L_c}L_w(x;\kappa, \rho)d_c(x;\alpha, \beta)dx

We then find the parameters \left(\alpha, \beta, \kappa, \rho\right) that minimize the integral

E\left(\alpha, \beta, \kappa, \rho\right)=\int_{-\infty}^{\infty}\left(C^*(x;\alpha, \beta, \kappa, \rho)-C^e(L_w(x;\kappa, \rho))\right)^2dx
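
One way to evaluate this objective numerically is sketched below in plain Python. It follows the two formulas above as written (including the weighting by L_w inside C^*), but replaces the infinite limits with a finite window of cicada wing lengths. The size-class SD for small cicadas, the values of \kappa and \rho, and the wasp sample are all hypothetical stand-ins, not data from the study.

```python
import bisect
import math

# (mean, SD) per size class; the small class is an ASSUMED pooling of Ne.h. and D.o.
CLASSES = [(27.0, 1.3), (43.3, 1.59), (52.9, 1.95)]


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def d_c(x, alpha, beta):
    """Cicada wing-length density: three-component normal mixture."""
    weights = (alpha, beta, 1.0 - alpha - beta)
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, CLASSES))


def l_w(x, kappa, rho):
    """Wasp wing length produced from a cicada of wing length x."""
    return kappa * x ** rho


def empirical_cdf(sample):
    """Step function C^e: fraction of the sample that is <= x."""
    data = sorted(sample)
    return lambda x: bisect.bisect_right(data, x) / len(data)


def objective(alpha, beta, kappa, rho, wasp_sample, lo=15.0, hi=65.0, n=2000):
    """E(alpha, beta, kappa, rho) on a grid of cicada wing lengths in [lo, hi]."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    integrand = [l_w(x, kappa, rho) * d_c(x, alpha, beta) for x in xs]
    # Running trapezoid integral gives the unnormalised modeled cumulative.
    cum = [0.0]
    for i in range(1, n + 1):
        cum.append(cum[-1] + 0.5 * h * (integrand[i - 1] + integrand[i]))
    c_emp = empirical_cdf(wasp_sample)
    sq = [(cum[i] / cum[-1] - c_emp(l_w(xs[i], kappa, rho))) ** 2 for i in range(n + 1)]
    return h * (0.5 * sq[0] + sum(sq[1:-1]) + 0.5 * sq[-1])


# Hypothetical parameter values and a synthetic sample of male wasp wing lengths.
KAPPA, RHO = 0.56, 0.98
sample = [l_w(m, KAPPA, RHO) for m in [27.0] * 5 + [43.3] * 3 + [52.9] * 2]
print(f"E = {objective(0.5, 0.3, KAPPA, RHO, sample):.4f}")
```

Any general-purpose minimizer (or, crudely, a grid search over \alpha and \beta with \kappa and \rho held fixed) could then be run on `objective`; the truncation to a finite window is reasonable only when the observed wasp wing lengths fall between L_w(lo) and L_w(hi).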


Prior in St. Johns, based on the models above.
Results from St. Johns, applying the procedure above.

Interestingly, the power on the mass transfer function, p_T, is about 1 (1.09), as expected; but the estimate for the constant c_T is about 0.149. Not sure what to make of this. The value of \nu = 5.73 suggests that about 5-6 small cicadas are used to make a single male wasp. This model suggests that the TG cicada population doesn't contribute, which we don't believe. The parameter estimates are:

\alpha   0.8564358097811305
\beta    0.14149665620740512
\gamma   0.0020675340114643392
\nu      5.733564124845539
c_T      0.14885469024588277
p_T      1.0954081551787762
Results from Newberry. Constant: 1.00; power: 0.96 (suggests four small cicadas?). Notice that in both of these "after" plots, there is a suggestion that there are some really long-winged wasps, which don't really exist.

Results suggest that the neocicadas are more prevalent than we would have expected, and that the tib cicadas less so:


Example Application: St. Johns

In the following two applications, we relax the requirement that the wasps sample the cicada population randomly; instead, we assume that they sample according to their wing length, as indicated by the kernel-smoothed graph obtained by Katie. Katie used smoothing techniques to obtain her graph. Because we need a function, which we can consult to determine how many wasps are taking small versus medium versus large cicadas, we modeled this using cumulative distribution functions (cdfs) from a normal distribution. The results are as follows:

Female wasps carry NHs and DOs until they can carry TGs, and then TBs.

One of the really beautiful things about this smoothing is that the plateaus are exactly what we'd expect: the level of the first plateau is roughly the mean of the small cicadas; the level of the second plateau is roughly the mean of the TG; and the level of the third plateau is roughly the mean of the TB.

This was the starting model in a non-linear regression scheme to seek another, better model, based on two cumulative normal distributions. It has a residual sum of squares of 2284.
This model, obtained after starting from the previous one, has a residual sum of squares of 1471. It's basically two step functions, and its levels correspond very closely with the means of the three types of cicadas: 26.59, 42.63, 53.21, compared to means of 27, 43.3, and 52.9.
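
A model built from two cumulative normals might look like the following Python sketch. The three plateau levels are the fitted values quoted above; the step locations and widths are hypothetical (the text does not give them), and we assume the horizontal axis is female wasp wing length and the vertical axis is the right-wing length of the cicada carried.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Fitted plateau levels quoted in the text (cicada right-wing lengths).
LEVELS = (26.59, 42.63, 53.21)

# HYPOTHETICAL step locations and widths (wasp wing length at which females
# switch to the next size class); illustrative values only.
STEP_1 = (22.0, 0.3)
STEP_2 = (26.0, 0.3)


def expected_prey_rwl(female_rwl):
    """Expected cicada wing length taken by a female of the given wing length."""
    low, mid, high = LEVELS
    return (low
            + (mid - low) * normal_cdf(female_rwl, *STEP_1)
            + (high - mid) * normal_cdf(female_rwl, *STEP_2))


def residual_sum_of_squares(observations):
    """RSS over (female wasp RWL, prey cicada RWL) pairs."""
    return sum((prey - expected_prey_rwl(wasp)) ** 2 for wasp, prey in observations)


for rwl in (20.0, 24.0, 28.0):
    print(f"female RWL {rwl}: expected prey RWL {expected_prey_rwl(rwl):.2f}")
```

With narrow step widths the curve is close to two step functions, which matches the description of the fitted model; a non-linear regression would adjust the levels, locations, and widths to reduce the residual sum of squares.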

Example Application: Newberry
