Agenda

Announcements

Suzanne has just recently submitted the request for meal reimbursement -- hold on!

• Figure 3: Katie will produce them (please do one with symbol by location, and one with symbol by size -- small, medium, large -- per Jon's request); Grayson will comment on it (rough version here)
• Wasp RWL to Mass: Katie will finalize, and comment
• Cicada RWL to Mass: Grayson and I will continue to work on the step-wise regression, then Katie will produce the final figure. Grayson will comment.
• Stair-step figure: I will finalize the model, and get the approximate SEs. I will comment.
• Crude opportunistic statistic: I will calculate and comment.

Last time we were looking at some of the results of some propositions in materials Jon sent from the literature. Katie was able to contest the notion that there was a change in the regression of RWL versus Mass for wasps: we don't see the hypothesized decrease in RWL growth with increasing Mass here's an animation of the results... In fact, the relationship seemed to be in the opposite direction.

Grayson was able to determine that the Dow data provided evidence consistent with about a 25% mass conversion relationship.

I presented some information related to the non-linear regression standard errors, which we will need to present along with our models.

These relationships will not be playing a role in the current paper, however. We will focus on the more limited objectives required by Jon Hastings in this manuscript for the Florida Entomologist

For this time:

• Do I recall that Katie agreed to look at the Florida Entomologist requirements for authors? Sorry, seems like a long time ago....
• Here are the directions from the Florida Entomologist requirements for tables and figures:
• Submit all figures and photographs as .jpg or .tiff files. All figures and tables must be referenced in the text with Arabic numerals in the order in which they appear in the text. Table footnotes are written below the table and indicated as superscript Arabic numerals. Table legend in uppercase. All captions for figures are listed together on a separate page. All illustrations must be complete and final. Make a composite figure if numerous line drawings or photographs are needed; do not combine photographs and drawings in the same figure.
• I have been talking to Jon about what we need to put together for the paper.
• First of all, what are the analyses we are going to present:
• Wasp RWL and Mass (use FLORIDA.xls)
• Stair-Step Model (using newprovisioning_florida.xls)
• In discussions with Jon, we will discuss how we arrived at our final model, but only present the final stair-step model
• I was able to obtain approximate standard error estimates, using our non-linear regression approximations discussed last time:
 baseline 26.588 0.342 step-one 16.038 0.671 step-two 10.96 1.299 center-one 28.372 0.009 width-one 0.008 0.01 center-two 33.664 0.008 width-two 0.007 0.006
. These include the width of the normal cdf sigmoidal functions, but I think that we can simply eliminate those widths (since they are not significantly different from zero) and recalculate. And we won't get much change, although it might be worth trying....
• Reproduce Figure 3 from Grant's paper (using newprovisioning_florida.xls) Here's newprovisioning_florida_table_3.txt
• Use the breakpoints of our stair-step model and the census provided by the wasps in St. John's to calculate the probability that the wasps could be choosing their cicadas opportunistically.
• We need to assure that we're using the same data. Let's check numbers against the files that Jon wants to use (above):

R code for Cicada stepwise regression

Data files:

Here's my take, Grayson:

• xlispstat came up with two models: either lnRWL alone, or lnRWL with the middle*lnRWL terms. There is one rule of thumb which states that you don't throw an interaction term in unless you throw in the intercept term.
• Using R, we see, through both add1 and drop1, that the conclusion is that it's certain that we need lnRWL, and then it's a toss-up between medium and medium*lnRWL interaction term. Since it's virtually a toss-up, I'd say that we go with the three term model

Fit <- lm(lnMass ~ lnRWL + medium)

Results:

Call:
lm(formula = lnMass ~ lnRWL + medium)

Residuals:
Min        1Q    Median        3Q       Max
-0.309325 -0.081741  0.001887  0.090330  0.273934

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.08275    0.14327  -28.50  < 2e-16 ***
lnRWL        3.02359    0.04220   71.66  < 2e-16 ***
medium       0.22861    0.03008    7.60 3.87e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1137 on 140 degrees of freedom
Multiple R-squared: 0.9829,     Adjusted R-squared: 0.9827
F-statistic:  4033 on 2 and 140 DF,  p-value: < 2.2e-16


R-code:

ctable<-read.delim("cicada_prey_species.txt")
species<-ctable[,1]
mass<-ctable[,2]
RWL<-ctable[,3]
lnMass <- log(mass)
lnRWL <- log(RWL)
levels(species) <- c("s", "s", "l", "m") # Grayson: T.b. comes before T.g.
Fit <- lm(lnMass ~ lnRWL + factor(species) + factor(species)*lnRWL)
lm1 <- lm(lnMass~1, data=ctable)
Fit <- lm(lnMass ~ lnRWL + factor(species)*lnRWL) # automatically does constant plus cross terms
summary(Fit)

# Let's try some new data, in which I add the indicator variables:
lnMass<-ctable[,1]
lnRWL<-ctable[,2]
small<-ctable[,3]
medium<-ctable[,4]
large<-ctable[,5]

mediumCross <-medium*lnRWL
largeCross <-large*lnRWL

lm1 <- lm(lnMass~1,data=ctable)
add1(lm1, ~ lnRWL + medium + large  + mediumCross + largeCross)

lm2 <- lm(lnMass~ lnRWL)
summary(lm2)
add1(lm2, ~ lnRWL + medium + large  + mediumCross + largeCross)

lm3 <- lm(lnMass~ lnRWL + medium)
summary(lm3)
add1(lm3, ~ lnRWL + medium + large + mediumCross + largeCross)

Fit <- lm(lnMass ~ lnRWL + medium + large  + mediumCross + largeCross)
summary(Fit)
drop1(Fit, test="F")

Fit <- lm(lnMass ~ lnRWL + medium + large  + mediumCross)
summary(Fit)
drop1(Fit, test="F")

Fit <- lm(lnMass ~ lnRWL + medium + mediumCross)
summary(Fit)
drop1(Fit, test="F")

Fit <- lm(lnMass ~ lnRWL + medium)
summary(Fit)
drop1(Fit, test="F")


Output from xlispstat:

• Regression:
Linear Regression:        Estimate            SE              Prob

Constant                 -2.74044       (0.803583)          0.00085
lnRWL                    2.61456        (0.244668)          0.00000
medium-ind               -5.52714       (2.58019)           0.03395
large-ind                0.682505       (2.92283)           0.81572
medium-ind*lnRWL         1.58078        (0.695402)          0.02457
large-ind*lnRWL          -0.100850      (0.750453)          0.89330

R Squared:               0.983722
Sigma hat:               0.112296
Number of cases:               143
Degrees of freedom:            137

• Step-wise Regression:
Linear Regression:        Estimate            SE              Prob

Constant                 -4.63425       (0.146302)          0.00000
lnRWL                    3.19455        (4.227782E-2)       0.00000

R Squared:               0.975899
Sigma hat:               0.134690
Number of cases:               143
Degrees of freedom:            141

• Backward Step-wise Regression:
Linear Regression:        Estimate            SE              Prob

Constant                 -4.08042       (0.143162)          0.00000
lnRWL                    3.02288        (4.216392E-2)       0.00000
medium-ind*lnRWL         6.088277E-2    (7.979796E-3)       0.00000

R Squared:               0.982977
Sigma hat:               0.113600
Number of cases:               143
Degrees of freedom:            140


Output from R:

• Regression:
Call:
lm(formula = lnMass ~ lnRWL + factor(species) * lnRWL)

Residuals:
Min        1Q    Median        3Q       Max
-0.280596 -0.081156  0.001235  0.082626  0.251145

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)             -2.7404     0.8036  -3.410 0.000853 ***
lnRWL                    2.6146     0.2447  10.686  < 2e-16 ***
factor(species)l         0.6825     2.9228   0.234 0.815715
factor(species)m        -5.5271     2.5802  -2.142 0.033950 *
lnRWL:factor(species)l  -0.1008     0.7505  -0.134 0.893295
lnRWL:factor(species)m   1.5808     0.6954   2.273 0.024570 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1123 on 137 degrees of freedom
Multiple R-squared: 0.9837,     Adjusted R-squared: 0.9831
F-statistic:  1656 on 5 and 137 DF,  p-value: < 2.2e-16

• Step-wise Regression and Backward Step-wise Regression give virtually a dead-heat between the following two models::
Call:
lm(formula = lnMass ~ lnRWL + medium)

Residuals:
Min        1Q    Median        3Q       Max
-0.309325 -0.081741  0.001887  0.090330  0.273934

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.08275    0.14327  -28.50  < 2e-16 ***
lnRWL        3.02359    0.04220   71.66  < 2e-16 ***
medium       0.22861    0.03008    7.60 3.87e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1137 on 140 degrees of freedom
Multiple R-squared: 0.9829,     Adjusted R-squared: 0.9827
F-statistic:  4033 on 2 and 140 DF,  p-value: < 2.2e-16

Call:
lm(formula = lnMass ~ lnRWL + mediumCross)

Residuals:
Min        1Q    Median        3Q       Max
-0.309276 -0.081488  0.001927  0.090330  0.273895

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.08042    0.14316  -28.50  < 2e-16 ***
lnRWL        3.02288    0.04216   71.69  < 2e-16 ***
mediumCross  0.06088    0.00798    7.63 3.29e-12 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1136 on 140 degrees of freedom
Multiple R-squared: 0.983,      Adjusted R-squared: 0.9827
F-statistic:  4042 on 2 and 140 DF,  p-value: < 2.2e-16


Non-linear Regression

For our models, we're looking at models that contain several variables (not simple linear)

Last time, we examined how standard errors are handled in the simple linear regression case, and how the Hessian matrix can be used to approximate them. The non-linear regression approximation was verified in this case:

$\underline{SE_0}=\sqrt{2\frac{S(\Theta_0)}{N-p}diagonal(H^{-1})}$

where

• $\left.S(\Theta_0)\right.$ is the sum of squared errors (which is what we minimize) for the estimated parameter values $\left.\Theta_0\right.$;
• N is the length of the data vector x;
• p is the length of the parameter vector Θ0;
• diagonal extracts the diagonal of a matrix as a vector; and
• H is the Hessian matrix.

This code dumped into this site illustrates that the approximation works (or follow the link in this file).

The Hessian matrix approach is important because, whereas we don't have the X - 1X matrix in the non-linear case, we can always approximate the Hessian matrix numerically, so we can get something reasonable.