CURM Nineteenth Meeting, 1/29/2009

From www.norsemathology.org



Agenda

New Business

  • Reminder: we need to decide today on a title and an abstract. We actually have a few more days (it's due by January 31), but why wait?
    • Title:
    • Abstract:
    • Let's talk about the very first slide (and maybe the second or third) that we want to produce for our presentation. Dr. Hastings has also passed along some images that we may want to use in our presentation: http://www.norsemathology.org/longa/images/Hastings/wasps/
  • NKU's Celebration of Student Research and Creativity will be held April 7-9, 2009. As in past years, students have an opportunity to show their excellent work in posters, interactive or oral presentations and performances of various kinds. The website http://celebration.nku.edu/ is now ready for registration, as well as providing further information. The deadline for students to submit abstracts to appear in the program is February 18.
  • In terms of the non-linear regression:
While I managed to carry out the regression on the left in xlispstat, it seems to cause trouble in R. The problem is that the standard deviations of the cumulative normals become very small, and at a certain point changing an sd no longer changes the model fit at all. I think this is what causes the R code to balk.

The same problem happens in spades when one goes to a step-function model. Small changes in the x-location of a step typically won't change the fit at all: if the step sits between two data locations on the x-axis, shifting the Heaviside function a little has no effect. This causes the regression to balk.
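To see why the fit goes flat, here is a minimal Python sketch. Python stands in for the xlispstat and R code, which isn't shown here. The sketch assumes the model is a floor plus two cumulative-normal steps, f(x) = floor + step1·Φ((x - loc1)/sd1) + step2·Φ((x - loc2)/sd2). That form matches the parameter names listed in these notes. The data are made-up numbers, not the cicada data, and the helper names are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def model(x, floor, h1, h2, m1, s1, m2, s2):
    """Assumed form: a floor plus two cumulative-normal steps at m1 and m2."""
    return floor + h1 * phi((x - m1) / s1) + h2 * phi((x - m2) / s2)

def rss(params, xs, ys):
    """Residual sum of squares of the model over the data."""
    return sum((y - model(x, *params)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (NOT the cicada data): integer x values, y in three bands.
xs = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
ys = [26, 28, 25, 27, 43, 42, 44, 41, 43, 53, 52, 54]

# Fitted parameters from the notes:
# floor, step1, step2, loc1, sd1, loc2, sd2
base = [26.59, 16.04, 10.58, 28.38, 1.1502e-2, 33.54, 1.5529e-3]

def rss_with(index, value):
    """RSS after changing one parameter, keeping the others at base."""
    p = list(base)
    p[index] = value
    return rss(p, xs, ys)

r0 = rss(base, xs, ys)
r_shift = rss_with(3, 28.60)  # nudge step 1; it stays between x = 28 and x = 29
r_sd = rss_with(4, 2.0e-2)    # nearly double step 1's sd
r_jump = rss_with(3, 29.50)   # move step 1 past the data point at x = 29

print("RSS at fitted values:     ", r0)
print("RSS, step 1 nudged:       ", r_shift)
print("RSS, sd of step 1 doubled:", r_sd)
print("RSS, step 1 past x = 29:  ", r_jump)
```

The sds are tiny, so every cumulative normal evaluates to exactly 0 or 1 at the data points. The first three RSS values therefore come out identical, and the RSS changes only when a step crosses a data point. With a derivative of zero in those directions, a Gauss-Newton style optimizer (the default in R's `nls`, for example) has nothing to work with.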

This model, obtained after starting with the one on the left, has a residual sum of squares of 1471. It is essentially two step functions, and its three levels (26.59, 42.63, and 53.21) correspond very closely to the means of the three types of cicadas (27, 43.3, and 52.9).

Model parameters:

floor height=26.59; step1 height=16.04; step2 height=10.58; x-location of step1=28.38; sd of step1=1.1502E-2; x-location of step2=33.54; sd of step2=1.5529E-3
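The three levels follow directly from the listed parameters. Far from the steps, each cumulative normal is essentially 0 or 1, so the model takes the values floor, floor + step1 and floor + step1 + step2. This small Python check of that arithmetic compares those levels with the group means quoted above. The variable names are illustrative only.

```python
# Fitted heights from the notes.
floor_height = 26.59
step1_height = 16.04
step2_height = 10.58

# Well away from the steps, each cumulative normal is (numerically) 0 or 1,
# so the model is piecewise constant with three levels.
levels = [
    floor_height,
    floor_height + step1_height,
    floor_height + step1_height + step2_height,
]

# Means of the three types of cicadas, as quoted in the notes.
group_means = [27, 43.3, 52.9]

for level, group_mean in zip(levels, group_means):
    print(f"level {level:6.2f}   group mean {group_mean:5.1f}   "
          f"difference {level - group_mean:+.2f}")
```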
Starting Model
Ending Model: 26.58 16.50 10.13 28.57 0.40 33.3 1.0
Heaviside causes problems for non-linear regression

Problem of the Week

Something interesting

I had the R² material below all laid out to discuss, but then Jon told me about something even more interesting: Honey Bees Can Tell The Difference Between Different Numbers At A Glance! So: can wasps count cicadas?


R² is "unrealistically high" in some cases

I had this discussion with Jon Hastings, which he found interesting, so I'll pass it along to you. Jon and I were talking about a regression from his paper: Hastings, Jon M., Charles W. Holliday, and Joseph R. Coelho. Body size relationship between Sphecius speciosus (Hymenoptera: Crabronidae) and their prey: prey size determines wasp size. Florida Entomologist 91(4): 657-663 (December 2008). They used what I considered an inappropriate linear model (because we're fairly certain that the relationship between RWL and mass is non-linear). While discussing that model, we got to talking about the great R² values we were getting.

One could easily think: if the R² is really good, why question the model? But R² would be high for a wide range of inappropriate models. We get such good R² values here because we have three widely separated populations, and a line running through the three separate clouds fits them pretty well. Let's look at how R² is determined, to help understand this phenomenon.

Actual data: ln(RWL) versus ln(Mass): "Separated regression clouds". Clearly it looks like a linear model makes sense.
Here are the estimates, based on our linear model. Also shown is the line y=x: ideally these points would fall on this line.
Residuals clearly show some pattern -- which shouldn't be the case if our model captured the data trend.

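R² is equal to one minus the square of the correlation between the actual response value, ln(RWL), and the residuals,

ln(RWL) - f(ln(Mass)),

where f is the fitted linear model. (This identity holds for any least-squares fit that includes an intercept.)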

Looking at the residuals here, what do you think will be the correlation? Really small, right?
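Here is why the identity holds. When the least-squares fit includes an intercept, the residuals e are uncorrelated with the fitted values. That gives cov(y, e) = var(e), and so corr(y, e)² = var(e)/var(y) = SSE/SST = 1 - R². The Python sketch below checks this numerically on made-up data. The data and helper names are hypothetical, not taken from the Hastings et al. analysis.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / (suu * svv) ** 0.5

random.seed(1)  # arbitrary seed
# Hypothetical stand-ins for ln(Mass) and ln(RWL); not the study data.
x = [random.gauss(0, 1) for _ in range(50)]
y = [0.8 * xi + random.gauss(0, 0.5) for xi in x]

a, b = ols(x, y)
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]

my = mean(y)
sse = sum(e ** 2 for e in resid)
sst = sum((yi - my) ** 2 for yi in y)
r2_usual = 1 - sse / sst
r2_via_corr = 1 - corr(y, resid) ** 2

print(f"1 - SSE/SST      = {r2_usual:.10f}")
print(f"1 - corr(y, e)^2 = {r2_via_corr:.10f}")
```

The two printed numbers agree up to floating-point rounding. A high R² therefore just means the residuals are only weakly correlated with the response. It says nothing about whether they show a pattern.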


Here's some random data that will illustrate something very similar. I generate 100 random normal values for x and for y. Then I do it again, but add 10 to every value; and then I do it once more, adding 20. Here's the code:
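(setq
 ;; cloud 1: 100 random normal values for x and for y
 x (rand.n 100)
 y (rand.n 100)
 ;; cloud 2: the same, shifted by 10 in both coordinates
 xx (+ 10 (rand.n 100))
 yy (+ 10 (rand.n 100))
 ;; cloud 3: the same, shifted by 20
 xxx (+ 20 (rand.n 100))
 yyy (+ 20 (rand.n 100))
 ;; pool the three clouds (setq assigns in order, so x and y are overwritten)
 x (combine x xx xxx)
 y (combine y yy yyy)
 ;; fit a single linear regression of y on x through all 300 points
 reg (regress (list x) y)
 )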
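For readers without xlispstat, here is a rough Python equivalent of the same experiment, using only the standard library. It assumes that rand.n returns standard normal draws. The seed is arbitrary, so the pooled R² will differ slightly from the 0.970206 reported below, but it should come out close to it. The function name r_squared is hypothetical.

```python
import random

def r_squared(x, y):
    """R^2 of the least-squares line y = a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy ** 2 / (sxx * syy)

random.seed(2009)  # arbitrary seed, for reproducibility only

x, y = [], []
for shift in (0, 10, 20):
    # inside each cloud, x and y are independent standard normals
    x += [shift + random.gauss(0, 1) for _ in range(100)]
    y += [shift + random.gauss(0, 1) for _ in range(100)]

print(f"pooled R^2      = {r_squared(x, y):.4f}")
print(f"first cloud R^2 = {r_squared(x[:100], y[:100]):.4f}")
```

The pooled R² is high, while the R² inside a single cloud is near zero. This is the same contrast as the cicada data.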
Here are the normally distributed data values, separated into "regression clouds". R² = 0.970206! Pretty good for random data... ;)
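The 0.97 is roughly what one should expect. As an idealization of the simulation above, treat each point as X = M + U and Y = M + V. Here M is the cloud shift (0, 10, or 20, each equally often) and U, V are independent standard normal noise. The between-cloud spread then does almost all the work:

```latex
\[
\operatorname{Var}(M) = \frac{0^2 + 10^2 + 20^2}{3} - 10^2 = \frac{200}{3},
\qquad
\operatorname{Cov}(X,Y) = \operatorname{Var}(M) = \frac{200}{3},
\qquad
\operatorname{Var}(X) = \operatorname{Var}(Y) = \frac{200}{3} + 1 = \frac{203}{3},
\]
\[
\rho = \frac{200/3}{203/3} = \frac{200}{203} \approx 0.9852,
\qquad
R^2 = \rho^2 \approx 0.9707 .
\]
```

The within-cloud noise contributes only the "+1" in the variances. As a result, the population R² is about 0.97 even though x and y are unrelated inside each cloud.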
Residuals clearly show some correlation -- the "false tilt" of the gross model induces a correlation in the otherwise uncorrelated local clouds.
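The tilt can be checked directly. This Python sketch uses only the standard library, an arbitrary seed, and hypothetical helper names. It rebuilds the three clouds and fits one line through all 300 points. Then, inside each cloud, it measures the correlation of x with y and the correlation of x with the residuals.

```python
import random

def mean(v):
    return sum(v) / len(v)

def ols(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    mx, my = mean(x), mean(y)
    b = (sum((p - mx) * (q - my) for p, q in zip(x, y))
         / sum((p - mx) ** 2 for p in x))
    return my - b * mx, b

def corr(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / (suu * svv) ** 0.5

random.seed(2009)  # arbitrary seed
clouds = []
for shift in (0, 10, 20):
    cx = [shift + random.gauss(0, 1) for _ in range(100)]
    cy = [shift + random.gauss(0, 1) for _ in range(100)]
    clouds.append((cx, cy))

# One "gross" regression line through all 300 points.
all_x = [v for cx, _ in clouds for v in cx]
all_y = [v for _, cy in clouds for v in cy]
a, b = ols(all_x, all_y)
print(f"pooled slope b = {b:.3f}")

within_xy, within_x_resid = [], []
for k, (cx, cy) in enumerate(clouds):
    resid = [q - (a + b * p) for p, q in zip(cx, cy)]
    within_xy.append(corr(cx, cy))
    within_x_resid.append(corr(cx, resid))
    print(f"cloud {k}: corr(x, y) = {within_xy[-1]:+.2f}, "
          f"corr(x, residual) = {within_x_resid[-1]:+.2f}")
```

Inside each cloud, x and y are nearly uncorrelated. The pooled slope is close to 1, though, so each cloud's residuals y - (a + b·x) decline with x. With unit within-cloud variances, the within-cloud correlation is roughly -b/sqrt(1 + b²), or about -0.7.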

My conclusion: the residuals in the ln(RWL) versus ln(Mass) model also seem to have a slight tilt, because the regression is mostly fitting the centers of the groups.

Old Business

Links
