# In summary

Female wasps in Newberry are smaller, and can't carry TB cicadas (as shown by graphs of right-wing lengths). This is the subject of a story that Grayson is in the process of telling:

• Suppose this data accurately reflects the distribution of the cicada species in both areas of Florida (this is ultimately something we want to determine; we don't currently believe it, because our sampling has not been random).
• Single cicadas are purportedly transformed into single male wasps.
• Female wasps from each population sample the cicada distributions according to their own distinct distributions, and produce males accordingly. Female wasps carry NHs and DOs until they can carry TGs, and then TBs; since female wasps in Newberry are smaller, they can't carry TBs.
• Convert cicada RWL $\longrightarrow$ mass (although we know that this is not a single relationship, but actually species-specific). We might write $M=f(W,s)$, where $M$ is cicada mass and $f$ is a power model based on $W$ (wing length) and $s$ (species).
• According to Frank, the conversion of cicada mass into wasp mass should be carried out via a model of the form $m=\gamma M^{\delta}$, where $m$ is wasp mass and $M$ is cicada mass.
• Male wasp masses in Newberry and St. Johns (in particular) should hence accurately reflect the cicada distribution as sampled by their females.
• Convert wasp mass $\longrightarrow$ RWL.
• The male wasp distributions in Newberry and St. Johns should be essentially transformed versions of the cicada populations. Since it's one cicada per wasp, we should see the sample of the cicada population in the sample of the male wasp populations (and we have two populations, because of the different sizes of wasps in the two locations).

# The calculations and estimates

There are many parameters to estimate in the process described above (unknown a priori, to varying degrees). Here's a list of (most of) them (and some preliminary estimates):

• 2 for the transformation of cicada mass into wasp mass (assuming a power model);
• as many as 8 for the conversion of cicada RWL (right wing length) into mass (assuming power models that include species);
  • Tentative model (assuming only two parameters, treating all cicadas the same): $M_{c}=0.00902\,RWL_{c}^{3.21}$
• 2 for the conversion of wasp RWL into mass (assuming a power model; according to the story above, we actually need this conversion in the opposite direction);
  • Tentative model: $M_{w}=0.0152\,RWL_{w}^{3.27}$
• 8 for the means and standard deviations of the cicada RWL (assuming normal distributions and species dependence; estimates in the table below);
• 3 for the relative proportions of the cicada species in the population of cicadas (the proportions form a density and sum to 1, so the fourth is determined once three are obtained); and
• some parameters related to the distributions of male and female wasps in the two distinct locations of Newberry and St. Johns. Both populations are important, as the females are the census takers (but don't sample opportunistically), and the male distributions should be a direct result of the census taken.

| Cicada species | Mean RWL | SD |
|---|---|---|
| Ne.h. | 26.5 | 1.14 |
| D.o. | 27.9 | 1.10 |
| T.g. | 43.3 | 1.59 |
| T.b. | 52.9 | 1.95 |

Table containing the estimates for means and standard deviations (parameters) for the (presumed) normal distributions of cicada right-wing lengths, by species.

We do not estimate all the parameters in the same way, however. Ideally we might estimate all of these parameters simultaneously, and find the best combination of parameters to make the story fit together best; however, the chance of being able to find a global minimum of a system with 23 or more parameters starting from scratch is pretty small -- so we take a different approach. We estimate parameters along the way, where it's easy, and then leave a few that we can't estimate "locally" to find using the procedure we're about to describe.

What do we mean by estimating "locally"? An example is the pair of parameters linking the wasp mass and the wasp RWL. We have good data on the wasps, including their masses and their RWLs. We plot the scatterplot, we see what appears to be a power relation, and we estimate a model. "Case closed" (or almost: because this model feeds into the rest of our estimation procedures, any errors that we make in estimating these parameters will propagate into errors in the estimates of other parameters).
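As a sketch of what such a "local" fit can look like, the following fits a power model $M=cL^{p}$ by ordinary least squares on the log-log scale. This is one common way to fit a power model, assumed here rather than taken from our actual analysis; the function name `fit_power_model` is hypothetical, and the demo data are synthetic, generated from the tentative wasp model $M_{w}=0.0152\,RWL_{w}^{3.27}$ with made-up noise.

```python
import math
import random


def fit_power_model(lengths, masses):
    """Fit M = c * L**p by ordinary least squares on log M = log c + p * log L.

    Returns (c, p). With multiplicative (log-normal) errors, exp(intercept)
    describes the median mass at a given length, not the mean.
    """
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in masses]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    p = sxy / sxx
    c = math.exp(y_bar - p * x_bar)
    return c, p


# Synthetic demo data only (not field measurements): wing lengths drawn
# uniformly, masses from M_w = 0.0152 * RWL_w**3.27 times lognormal noise.
rng = random.Random(1)
rwl = [rng.uniform(18.0, 28.0) for _ in range(200)]
mass = [0.0152 * L ** 3.27 * math.exp(rng.gauss(0.0, 0.05)) for L in rwl]

c_w_hat, p_w_hat = fit_power_model(rwl, mass)
print(f"fitted: M_w = {c_w_hat:.4f} * RWL_w^{p_w_hat:.2f}")
```

Whatever error is left in `c_w_hat` and `p_w_hat` is exactly what gets carried forward if these values are then treated as fixed in later steps.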

As the saying goes, "All models are wrong; some models are useful." We know that we will make errors: do we want to estimate certain parameters, treat them as perfect, and then estimate other parameters based on them, or treat all parameters as error-prone and estimate them all at once (supposing the errors to be distributed up and down the line)? We have essentially estimated 18 parameters above (with several of them estimated as zeros). That leaves a few parameters that haven't yet been estimated, but which we can estimate by the methods we describe below.

The parameters we estimate non-locally in the following are the parameters related to the relative abundance of the four (well, three, really) types of cicadas -- small, medium, and large -- and the parameters related to the conversion of cicada mass into wasp mass. The mass-to-mass conversion parameters can be estimated in the laboratory (and have been -- Jon? Is that right?).

# The Estimates

I propose two ways to estimate the relative sizes of the cicada populations (and that's all we can get: not absolute numbers, but relative ones). Both rest on several common assumptions.

Assumptions:

2. If the wasps practice sex allocation by size of cicada (e.g. small ones are turned into males, larger ones are reserved for females), then there needs to be an adjustment: the females wouldn't be sampling randomly from the cicada populations to turn them into males. According to Grant, that is not a problem.
3. Another question for sex allocation is this: do the wasps use times of prey scarcity to turn to male production, and produce females when prey are abundant?

# General Method

Let $L$ be the $RWL$ of an animal and let $M$ be the mass of an animal (a male animal, in the case of the wasps). The subscript "w" will indicate wasps, and "c" subscripts indicate cicadas; "T" subscripts refer to a "transfer" (between cicadas and wasps). In this model, we're going to assume that

• wasps sample the cicadas according to their wing length, following the relationship indicated in the kernel-smoothed graph obtained by Katie;
• there are only three types of cicadas (small, medium, and large -- we combine the NH and DO cicadas);
• the cicada distribution in the two locations -- Newberry and St. Johns -- is the same;
• the cicada distribution is stable in structure, as are the wasp populations;
• there is a single, species-independent equation that transforms cicada RWL into mass.

Several (if not all) of those assumptions are suspect: for example, backward stepwise regression showed that there is a species effect in the relationship between mass and RWL for cicadas.

We model the distribution of cicada wing lengths in these Florida locations with the following density (where there are three kinds of cicadas: small, medium, and large):

$d_{c}(L_{c})=\alpha N_{s}(L_{c})+\beta N_{m}(L_{c})+\gamma N_{l}(L_{c})$

This is the default, empirical guess for the census.

where $N$ represents a normal density and $\gamma=1-\alpha-\beta$. The parameters $(\alpha,\beta,c_{T},p_{T})$ are to be determined (see below for $c_{T}$ and $p_{T}$); all other parameters have been estimated locally.
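For concreteness, here is a minimal sketch of the mixture density $d_{c}$. The medium and large components use the T.g. and T.b. rows of the table; the "small" component's mean and SD (27.2, 1.32) are an assumption, namely what pooling the Ne.h. and D.o. rows in equal proportions would give. The weights are the St. Johns estimates of $\alpha$ and $\beta$ reported later in this section, and all function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Normal density N(mean, sd^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# (mean, sd) of cicada RWL for each size class. Medium (T.g.) and large (T.b.)
# come from the table. "Small" pools Ne.h. and D.o.; these values are what
# equal-proportion pooling of those two rows gives, which is an ASSUMPTION.
COMPONENTS = {
    "small": (27.2, 1.32),
    "medium": (43.3, 1.59),
    "large": (52.9, 1.95),
}


def cicada_density(L, alpha, beta):
    """d_c(L) = alpha*N_s(L) + beta*N_m(L) + gamma*N_l(L), with gamma = 1 - alpha - beta."""
    gamma = 1.0 - alpha - beta
    if min(alpha, beta, gamma) < 0.0:
        raise ValueError("alpha, beta and 1 - alpha - beta must all be non-negative")
    return (alpha * normal_pdf(L, *COMPONENTS["small"])
            + beta * normal_pdf(L, *COMPONENTS["medium"])
            + gamma * normal_pdf(L, *COMPONENTS["large"]))


# Example weights: the St. Johns estimates of alpha and beta.
alpha, beta = 0.856436, 0.141497

# Check that the mixture integrates to ~1 (midpoint rule on [10, 70]).
h = 0.01
total = sum(cicada_density(10.0 + (i + 0.5) * h, alpha, beta) * h for i in range(6000))
print(f"integral of d_c: {total:.6f}")
```

Rejecting negative weights keeps $d_{c}$ a proper density, which is what lets $\gamma$ be determined by $\alpha$ and $\beta$.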

Now, for a particular cicada of given right wing length $L_{c}$, we have that its mass is estimated as

$M_{c}=c_{c}L_{c}^{p_{c}}$.

Then the cicada's mass is converted to wasp mass (according to Frank) via a power model, so that the wasp formed by eating this cicada would have mass

$M_{w}=c_{T}M_{c}^{p_{T}}$

Lab estimates suggest that this relationship is linear (i.e. $p_{T}=1$), with a constant of $c_{T}=0.25$.

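Converting from the mass of the wasp to the wasp's length $L_{w}$,

$L_{w}=c_{w}M_{w}^{p_{w}}$

(inverting the tentative wasp model $M_{w}=0.0152\,RWL_{w}^{3.27}$ would give $p_{w}=1/3.27$ and $c_{w}=0.0152^{-1/3.27}$), and then finally we've got that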

$L_{w}=c_{w}(c_{T}M_{c}^{p_{T}})^{p_{w}}=c_{w}(c_{T}(c_{c}L_{c}^{p_{c}})^{p_{T}})^{p_{w}}$

That is, there is a simple power-model relationship between the two lengths:

$L_{w}(L_{c})=\kappa L_{c}^{\rho}$,

where we have hidden a little of the mess by defining two new parameters, $\kappa=c_{w}c_{T}^{p_{w}}c_{c}^{p_{T}p_{w}}$ and $\rho=p_{c}p_{T}p_{w}$. If we can estimate $\kappa$ and $\rho$, then we'll have estimated $c_{T}$ and $p_{T}$ as

$p_{T}={\frac {\rho }{p_{c}p_{w}}}$ and

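$c_{T}=\left(\frac{\kappa}{c_{w}c_{c}^{p_{T}p_{w}}}\right)^{1/p_{w}}.$

From this, we'd predict that the distribution of the wasp lengths would satisfy $d_{w}(L_{w}(L_{c}))\,L_{w}'(L_{c})=d_{c}(L_{c})$ (a change of variables, valid because $L_{w}$ is increasing in $L_{c}$; when $\rho\approx 1$, $L_{w}'$ is nearly constant, so approximately $d_{w}(L_{w}(L_{c}))\propto d_{c}(L_{c})$), and that the predicted mean wing length of the wasps would be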

${\overline {L}}_{w}=\int _{-\infty }^{\infty }L_{w}(L_{c})d_{c}(L_{c})dL_{c}$ (and our results indicate that this is almost perfectly so: in St. Johns, 24.86 versus 24.25; in Newberry, 21.85 versus 21.75).
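The chain of power models above can be checked numerically. This sketch composes $(c_{T},p_{T})$ into $(\kappa,\rho)$, inverts the map with the formulas for $p_{T}$ and $c_{T}$, and approximates $\overline{L}_{w}$ with a midpoint rule on a truncated range. It assumes the tentative cicada and wasp models from the parameter list, the lab values $c_{T}=0.25$, $p_{T}=1$, and pooled "small" cicada parameters (27.2, 1.32); the helper names are hypothetical. The printed numbers depend entirely on these assumed inputs and are not meant to reproduce the results quoted above.

```python
import math

# Tentative single-species power models from the parameter list (treated as
# fixed inputs here, which is an assumption of this sketch).
c_c, p_c = 0.00902, 3.21                      # M_c = c_c * L_c**p_c
a_w, b_w = 0.0152, 3.27                       # M_w = a_w * L_w**b_w
c_w, p_w = a_w ** (-1.0 / b_w), 1.0 / b_w     # inverted: L_w = c_w * M_w**p_w


def compose(c_T, p_T):
    """(kappa, rho) such that L_w = kappa * L_c**rho."""
    return c_w * c_T ** p_w * c_c ** (p_T * p_w), p_c * p_T * p_w


def recover(kappa, rho):
    """Invert compose(): (c_T, p_T) from fitted (kappa, rho)."""
    p_T = rho / (p_c * p_w)
    c_T = (kappa / (c_w * c_c ** (p_T * p_w))) ** (1.0 / p_w)
    return c_T, p_T


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Assumed (mean, sd) for small / medium / large cicada RWL; "small" pools
# Ne.h. and D.o. in equal proportions (an assumption).
COMPONENTS = [(27.2, 1.32), (43.3, 1.59), (52.9, 1.95)]


def cicada_density(x, alpha, beta):
    weights = (alpha, beta, 1.0 - alpha - beta)
    return sum(w * normal_pdf(x, m, s) for w, (m, s) in zip(weights, COMPONENTS))


def predicted_mean_wasp_rwl(alpha, beta, kappa, rho, lo=10.0, hi=70.0, n=6000):
    """Midpoint-rule value of int kappa*x**rho * d_c(x) dx, truncated to [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        total += kappa * x ** rho * cicada_density(x, alpha, beta) * h
    return total


# Round trip with the lab values c_T = 0.25, p_T = 1.
kappa, rho = compose(0.25, 1.0)
c_T_back, p_T_back = recover(kappa, rho)

# Step-by-step composition agrees with the single power law.
L_c = 43.3
M_w = 0.25 * (c_c * L_c ** p_c) ** 1.0
assert math.isclose(c_w * M_w ** p_w, kappa * L_c ** rho)

print(kappa, rho, c_T_back, p_T_back)
print(predicted_mean_wasp_rwl(0.856436, 0.141497, kappa, rho))
```

The round trip is the useful part: once $(\kappa,\rho)$ are fitted, `recover` turns them back into the transfer parameters $c_{T}$ and $p_{T}$.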

Our objective now is to find the best set of parameters $\left(\alpha ,\beta ,\kappa ,\rho \right)$ ; that is, the set that provides the best fit to the distribution(s) of male wasp wing lengths in the samples obtained in Newberry and St. Johns. Notice that we thus end up with two separate estimation problems. This is important. The wasps in St. Johns are larger, so they census a part of the cicada population that the wasps in Newberry can't touch.

So let's think about one of those populations of wasp wing lengths, given by density $d_{w}(L_{w})$. Here is one strategy for choosing $(\alpha,\beta,\kappa,\rho)$, based on minimizing the difference between two functions over a range of wing-length values.

We have an empirical cumulative distribution function of the male wasp wing lengths (let's call it $C^{e}(x)$), which is a step function; it's not critical that it be differentiable.

Figure: the empirical CDF for the male wasps of Newberry.
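A step-function ECDF like $C^{e}$ is easy to evaluate from sorted data with a binary search. This is a minimal sketch; `make_ecdf` is a hypothetical helper and the sample values are made up for illustration (they are not the Newberry measurements).

```python
import bisect


def make_ecdf(sample):
    """Return the empirical CDF C_e(x) = #{observations <= x} / n.

    The result is a right-continuous step function; no smoothing is applied.
    """
    data = sorted(sample)
    n = len(data)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def ecdf(x):
        return bisect.bisect_right(data, x) / n

    return ecdf


# Made-up wing lengths, for illustration only (not Newberry data).
C_e = make_ecdf([22.5, 21.0, 24.0, 22.5])
print(C_e(20.0), C_e(22.5), C_e(24.0))  # 0.0 0.75 1.0
```

Because only comparisons are involved, the ECDF never needs to be differentiable, which is why it can be used directly inside an integrated squared-error criterion.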

Now, we need the cumulative distribution of the modeled male wing lengths. Let's call the modeled cumulative $C^{*}\left(x\right)$ : then

$C^{*}(L_{w}(L_{c}))=\int_{-\infty}^{L_{c}}d_{c}(x;\alpha,\beta)\,dx,$

since $L_{w}(x;\kappa,\rho)$ is increasing in $x$ (a wasp's wing length is at most $L_{w}(L_{c})$ exactly when its cicada's wing length is at most $L_{c}$). We then find the parameters $(\alpha,\beta,\kappa,\rho)$ that minimize the integral

$E(\alpha,\beta,\kappa,\rho)=\int_{-\infty}^{\infty}\left(C^{*}(L_{w}(x;\kappa,\rho);\alpha,\beta)-C^{e}(L_{w}(x;\kappa,\rho))\right)^{2}dx$

Figure: results from St. Johns, applying the procedure above.

Interestingly, the power on the mass transfer function, $p_{T}$, is close to 1 (1.09), as expected; but the estimate for the constant $c_{T}$ is about 0.149. Not sure what to make of this. The value of $\nu=5.73$ suggests that about 5-6 small cicadas are used to make a single male wasp. This model suggests that the TB cicada population doesn't contribute ($\gamma\approx 0.002$), which we don't believe. The parameter estimates are:

| Parameter | Estimate |
|---|---|
| $\alpha$ | 0.856436 |
| $\beta$ | 0.141497 |
| $\gamma$ | 0.00206753 |
| $\nu$ | 5.73356 |
| $c_{T}$ | 0.148855 |
| $p_{T}$ | 1.09541 |

Figure: results from Newberry. Constant: 1.00; power: 0.96 (suggests four small cicadas?).

Notice that in both of these "after" plots, there is a suggestion that there are some really long-winged wasps, which don't really exist.

Results suggest that the neocicadas are more prevalent than we would have expected, and that the tib cicadas are less so.
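The minimization of $E$ can be sketched as follows. The sketch assumes the unweighted modeled CDF $C^{*}(L_{w}(x))=\int_{-\infty}^{x}d_{c}$, which holds because $L_{w}$ is increasing. It truncates the integral over $x$ to a finite range and uses the midpoint rule. A simple derivative-free compass (pattern) search stands in for whatever optimizer produced the results above; the text doesn't say which was used. The helper names, the pooled "small" cicada parameters, and the "true" parameters used to simulate wasp wing lengths are all hypothetical, and the data are synthetic.

```python
import bisect
import math
import random

# Assumed (mean, sd) of cicada RWL for small / medium / large; "small" pools
# Ne.h. and D.o. in equal proportions, which is an assumption.
COMPONENTS = [(27.2, 1.32), (43.3, 1.59), (52.9, 1.95)]


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def model_cdf(x, alpha, beta):
    """C*(L_w(x)) = P(cicada RWL <= x); valid because L_w is increasing in x."""
    weights = (alpha, beta, 1.0 - alpha - beta)
    return sum(w * normal_cdf(x, m, s) for w, (m, s) in zip(weights, COMPONENTS))


def objective(params, wasp_rwl_sorted, lo=15.0, hi=65.0, n=400):
    """Midpoint-rule approximation of E over cicada RWL x in [lo, hi]."""
    alpha, beta, kappa, rho = params
    if alpha < 0 or beta < 0 or alpha + beta > 1 or kappa <= 0 or rho <= 0:
        return math.inf  # outside the admissible parameter region
    m = len(wasp_rwl_sorted)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        c_e = bisect.bisect_right(wasp_rwl_sorted, kappa * x ** rho) / m  # C_e(L_w(x))
        total += (model_cdf(x, alpha, beta) - c_e) ** 2 * h
    return total


def compass_search(f, x0, step=0.1, tol=1e-3, max_iter=200):
    """Derivative-free compass (pattern) search that only accepts improving moves."""
    x, fx = list(x0), f(list(x0))
    steps = [step * max(abs(v), 1.0) for v in x0]
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * steps[i]
                ft = f(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
                    break
        if not improved:
            steps = [s / 2.0 for s in steps]  # refine the mesh
            if max(steps) < tol:
                break
    return x, fx


# Synthetic check (NOT field data): simulate male wasp RWL from hypothetical
# "true" parameters, then search from a deliberately wrong starting point.
rng = random.Random(7)
TRUE = (0.7, 0.25, 0.9, 0.97)


def draw_cicada_rwl():
    u = rng.random()
    k = 0 if u < TRUE[0] else (1 if u < TRUE[0] + TRUE[1] else 2)
    return rng.gauss(*COMPONENTS[k])


wasps = sorted(TRUE[2] * draw_cicada_rwl() ** TRUE[3] for _ in range(300))
start = [0.5, 0.3, 1.0, 1.0]
fitted, e_fit = compass_search(lambda p: objective(p, wasps), start)
print("E at start:", objective(start, wasps))
print("fitted (alpha, beta, kappa, rho):", [round(v, 3) for v in fitted], "E:", e_fit)
```

Running the procedure separately on each location's sample gives the two estimation problems (Newberry and St. Johns) described above. A local search like this one can stop at a local minimum, which is one more reason to keep the number of "non-local" parameters small.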

## Example Application: St. Johns

In the following two applications, we relax the requirement that the wasps sample the cicada population randomly; instead, we assume that they sample according to their wing length, as indicated by the kernel-smoothed graph obtained by Katie. Because we need a function that we can consult to determine how many wasps are taking small versus medium versus large cicadas, we modeled this relationship using cumulative distribution functions (CDFs) of normal distributions. The results are as follows.

Female wasps carry NHs and DOs until they can carry TGs, and then TBs. One of the really beautiful things about this smoothing is that the plateaus are exactly what we'd expect: the level of the first plateau is roughly the mean of the small cicadas, the level of the second plateau is roughly the mean of TG, and the level of the third plateau is roughly the mean of TB.

Figure: the starting model in a non-linear regression scheme to seek another, better model, based on two cumulative normal distributions. It has a residual sum of squares of 2284.

Figure: the model obtained after starting with the starting model; it has a residual sum of squares of 1471. It's basically two step functions, and its levels correspond very closely with the means of the three types of cicadas: 26.59, 42.63, and 53.21, compared to means of 27, 43.3, and 52.9.
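The two-cumulative-normal model has a simple closed form: a baseline plus two smoothed steps. This sketch assumes that Katie's graph plots prey (cicada) RWL against female wasp RWL, which is our reading of the description rather than something stated explicitly. The plateau levels come from the fitted model (26.59, 42.63, 53.21). The switch locations `mu1`, `mu2` and widths `sd1`, `sd2` are hypothetical placeholders, not fitted values. Only the model form is shown here; the non-linear least-squares fit that produced the residual sums of squares is not reproduced.

```python
import math


def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def prey_rwl_curve(wasp_rwl, base, jump1, mu1, sd1, jump2, mu2, sd2):
    """Smoothed two-step curve built from two cumulative normals.

    Plateau levels are base, base + jump1 and base + jump1 + jump2; mu1/mu2
    locate the two switches and sd1/sd2 set how gradual each switch is.
    """
    return (base
            + jump1 * normal_cdf(wasp_rwl, mu1, sd1)
            + jump2 * normal_cdf(wasp_rwl, mu2, sd2))


# Plateau levels from the fitted model in the text; the switch locations and
# widths are HYPOTHETICAL placeholders, not fitted values.
levels = (26.59, 42.63, 53.21)
params = dict(base=levels[0],
              jump1=levels[1] - levels[0], mu1=22.0, sd1=0.5,
              jump2=levels[2] - levels[1], mu2=25.0, sd2=0.5)

for rwl in (18.0, 23.5, 30.0):
    print(rwl, round(prey_rwl_curve(rwl, **params), 2))
```

Parameterizing by jumps rather than raw levels makes the plateau heights directly comparable to the cicada class means, which is the comparison made in the text.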