CURM Twenty-second Meeting, 2/26/2009

From www.norsemathology.org

Jump to: navigation, search

Contents

Agenda

Announcements

New Business

  • We need to promote the talk. What shall we say?
    • Title
    • Picture
    • Speakers
    • 3:05, ST111, 3/6
    • Abstract
  • Meet next week
    • Tuesday: 4:15
    • Thursday: 3:00
    • Friday: 12:00
  • Grayson is working on slide three, the Florida slide, which is supposed to be a cliff-hanger: it will leave us at the half-way mark with a lead-in to the fundamental mystery (and principal stake in the heart of the one-if-by-male dogma): how is it that with three populations of cicadas of radically different sizes, we don't see three radically different sized male wasps? Mass provisioning could explain the maintenance of two distinct sizes of wasps in Florida, if the wasp female population is ever truncated by some event to eliminate the largest females.
  • Katie is at work on slide four, which addresses issues of allometry: as we prepare to create a model of mass-provisioning (rather than the one-if-by-male, two-if-by-female sex-allocation theory), we need to move between RWL and mass for cicadas, and between mass and RWL for wasps. In between those two processes, we need to transform the cicada mass into wasp mass, which we believe occurs as 1/4 of the mass of the cicada-meal (laboratory results).
  • Grayson and I got onto the idea of looking at wasp masses, and looking at the cicada distributions, to see if a 1-if-by strategy was even possible. Then we talked about the relative masses of male and female wasps. I ran a little regression program which produced estimates of the best multiplier (e.g. factor of 2?) which would transform the male sample cdf into the female cdf. The results were as follows:
The multiplicative factor for Newberry was 1.90. Females ranged in size from 238-1119; males from 176-591. The multiplicative factor for St. Johns was 2.13. Clearly the fit for St. Johns is not as good. Females ranged in size from 514-1926; males from 331-966.

Old Business

Previous slides

  • Grayson is at work on slide one, giving an overview of the project, the dogmas, the biological facts, etc. A picture of "what's to come". He's going to add a "title slide" to have up at the beginning of the show, including author names (and mine should appear to, upon consideration, as "Advisor: Andy Long"). Things we discussed today:
    • Keep the "horror" in at the outset; make sure that we know that it's the females who are the killers.
    • Make sure we have a picture of the male and female wasps from Florida.
    • We want to recast that second mystery, as how to get a mass-provisioning model that will allow three different sized cicada populations to produce a single male wasp size.
  • Katie is at work on slide two, which shoots down the dogma of opportunistic hunting.
    • We need to have all the plots on the same platform -- the data won't change over the course of the images, so changing the format would simply distract. I suggest R.
    • Keep the technical vocabulary down, emphasizing the concepts in simple words.

Back to our regression problem, as outlined in the "census" attempt. I'm going to put the code in here, and try to explain the purpose of each part. Could be kind of yucky!;)

This example is for Newberry.

Setup

(setq
 ;; These are two parameters that serve different roles: numerical integration will
 ;; require a level of precision:
 integral-precision 1e-4
 ;; and we've got to make sure that some parameters don't go negative, so I throw in
 ;; some big punishing term to keep the code from "going negative"
 prevent_negative_gamma 10

 ;; These are counts of caught cicadas, a rough census by the wasps of the two sites:
 NH_n 232
 DO_n 27
 TG_n 33
 TB_n 18
 Total_N (+ NH_n DO_n TG_n TB_n)

Okay, now we're going to create some guesses for some of the parameters in our problem which are going to be estimated. The parameters alpha, beta, and gamma were the original "census" variables: they were supposed to be the weights on the small, medium, and large species populations of cicadas. Because non-linear regression requires guesses to get started, I used the census estimates provided by our cast of census takers: the female wasps!

 alpha_guess (/ (+ NH_n DO_n) Total_N)
 beta_guess (/ TG_n Total_N)
 gamma_guess (- 1 alpha_guess beta_guess)

Now, these values are means and standard deviations for the three species. Those will come in handy, as you can well imagine:

 ;; got this from cicada_regress.lsp
 small_mean 26.711683168316828
 small_sd 1.2250976037091335
 medium_mean 43.260000000000005
 medium_sd 1.5856143059624321
 large_mean 52.88285714285714 
 large_sd 1.9474892260842644
 
 ;; estimates for the mass transformation parameters from wasp_regress.lsp, 
 ;; from wasp mass to wasp RWL:
 cw 4.0712235210396965
 pw 0.286412932220204

The "cicada_conversion_coefs" describe our model for turning cicada RWL into cicada mass. Katie found this to be species dependent, so we've got several different constants, ai, and slopes, bi:

 cicada_conversion_coefs 
 '(-5.417981732963593 3.4388543377658705 3.736489595715526 -1.1506759329159233 -0.07939482664135919)
 ;; from (send reg :coef-estimates) of cicada_regress.lsp
 a0 (exp (first cicada_conversion_coefs))
 b0 (second cicada_conversion_coefs)
 as (exp (third cicada_conversion_coefs))
 bs (fourth cicada_conversion_coefs)
 bl (fifth cicada_conversion_coefs)

I turn these ultimately into parameters for small cicadas, medium cicadas, and large cicadas:

 small_const (* a0 as)
 large_const a0
 small_slope (+ b0 bs) 
 medium_slope b0
 large_slope (+ b0 bl)
 medium_large_split 48 ;; this is the artificial divide between medium and large cicadas

Starting guesses for the transformation of cicada mass into wasp mass come from a personal communication from Jon Hastings, who said that his colleague had found that a wasp's mass is about 25% of the cicada meal's mass. This would say that the power on the power model is 1, and the constant is .25. I also include a "mass-provisioning" factor, ν, which I estimate at about 4 (based on personal communication from Jon: 4 if by male, 7 if by female):

 nu_guess 4
 ct_guess .25
 pt_guess 1

This is data, for Newberry:

 NEfemales 
 '( 31.08 29.89 28.14 29.07 29.25 27.95 28.46 27.49 28.13 25.4 27.05 27.31 27.3
	  27.15 27.42 26.02 27.25 27.4 26.04 27.11 25.98 26.42 26.55 27.22 26.75 25.6
	  25.4 26.25 25.57 26.64 25.49 27.97 25.23 26.94 25.89 25.07 26.35 26.3 25.48
	  25.95 24.76 24.8 25.44 25.2 26.21 26.89 25.74 25.75 25.27 27.11 25.22 25.93
	  25.9 26.09 25.52 26.75 24.68 26.62 26.31 26.12 25.7 25.63 25 24.41 27.07 24.13
	  26.13 25.3 25.72 25.02 24.22 24.07 24.51 24.31 25.39 24.64 23.05 25.4 25.13
	  24.99 25.06 25.01 24.36 25.14 25.11 24.79 23.01 24.76 25.77 25.75 23.12 25.07
	  22.97 22.97 24.49 24.37 23.68 22.85 22.78 23.33 24 23.58 24.4 23.82 24.2 23.28
	  22.8 24.01 23.59 22.2 22.56 23.43 23.14 24.06 22.23 22.53 22.18 23.94 21.38
	  22.09 23.28 22.45 22.4 23.03 20.75 22.41 22.3 19.3 20.17 21.57 21.05 23.45
	  24.37 19.88 25.18 23.09 23.42 25.3 24.59 26 25.15 26.19
	  )
 NEmales 
 '(24.98 24.8 24.86 24.21 23.4 23.95 24.17 24.28 23.44 22.61 22.18 23.22 22.55
	 22.68 22.15 22.79 20.02 23.88 22 23.4 22.28 22.43 22.59 22.34 21.95 21.52 22.69
	 21.83 21.38 21.89 21.65 21.44 21.97 21.78 22.71 22.69 22.11 21.75 22.58 22.12
	 21.95 20.61 21.58 21.9 21.17 21.42 20.78 20.94 20.39 21.86 21.85 21.28 20.75
	 20.83 20.84 20.74 20.04 20.74 21.28 20.68 21.22 20.14 20.39 20.13 20.57 20.13
	 19.91 20.53 19.44 19.72 20.22 19.19 17.82 21.93 19.27 21.38 19.96 21.34 21.72
	 21.99 19.26 24.53 24.32 22.2 21.22 21.73 22.65 23.37 21.22 21.35
	 )
 NEmales_n (length NEmales)

Now we compute some useful bounds: these will be important for the integration. I'm saying that, four standard deviations out, we can asssume that each population has zero representation. This is pretty good, based on the empirical rule.

 small_min (- small_mean (* 4 small_sd))
 small_max (+ small_mean (* 4 small_sd))
 medium_min (- medium_mean (* 4 medium_sd))
 medium_max (+ medium_mean (* 4 medium_sd))
 large_min (- large_mean (* 4 large_sd))
 large_max (+ large_mean (* 4 large_sd))

The entire range of cicadas should run between the minimum and the maximum of those maxes and mins:

 cicada_RWL_min small_min
 cicada_RWL_max large_max

Statistics on the wasps: mean, sd, and bounds on RWL:

 NEmaleMuEst (mean NEmales)
 NEmaleSigmaEst (sd NEmales)
 waspRWL_min (- (min NEmales) (* 4 NEmaleSigmaEst))
 waspRWL_max (+ (max NEmales) (* 4 NEmaleSigmaEst))
 )

Functions

This first function simply returns a density for a normal with mean mu and sd sigma (based on the standard normal distribution):

(defun scale-normal-dens(x mu sigma)
  (/ (normal-dens (/ (- x mu) sigma)) sigma)
  )

dc is the density function of our best guess: it is the "census" function, made up of three scaled normal densities for the three populations of cicadas:

(defun dc(Lc &key (alpha alpha_guess) (beta beta_guess))
  (+ 
   (* alpha (scale-normal-dens Lc small_mean small_sd))
   (* beta (scale-normal-dens Lc medium_mean medium_sd))
   (* (- 1 alpha beta) (scale-normal-dens Lc large_mean large_sd))
   )
  )

Here's how that plot works in lisp:

(let ()
  (lines on)
  (title "combined densities")
  (variable-labels "Cicada RWL" "density (empirical from our census)")
  (plot dc cicada_RWL_min cicada_RWL_max)
  ;; check: should integrate to about 1:
  (integrate (lambda(x) (dc x)) cicada_RWL_min cicada_RWL_max)
  (postscript "CicadaDensityDefault.ps")
  )

Male Wasp cumulative is our objective in this whole thing: here's how an empirical cumulative can be defined:

(defun empirical-cdf(x) (/ (sum (heaviside (- x NEmales))) NEmales_n))
;; and the following should look like a cumulative distribution function:
(plot (lambda(x) (empirical-cdf x))  waspRWL_min waspRWL_max

This function is a little hairy: what it does is turn a cicada RWL into a wasp RWL. It depends on species (or at least small, medium, large):

(defun Lw(Lc &key (nu nu_guess) (ct ct_guess) (pt pt_guess) )
  (let* (
	 (small (< Lc small_max))
	 (const (if small (* nu small_const) large_const))
	 (slope (cond
		 (small small_slope)
		 ((> Lc medium_large_split)
		  large_slope)
		 (t medium_slope)
		 )
		)
	 (tmp (* const (^ Lc slope)))
	 )
    (setq
     tmp (* ct (^ tmp pt))
     tmp (* cw (^ tmp pw))
     )
    )
  )

Lc turns a wasp RWL into a cicada RWL:

(defun Lc(Lw &key (nu nu_guess) (ct ct_guess) (pt pt_guess) (small t) (medium nil))
  (let* (
	 (const (if small (* nu small_const) large_const))
	 (slope (cond
		 (small small_slope)
		 (medium large_slope)
		 (t medium_slope)
		 )
		)
	 (tmp (^ (/ Lw cw) (/ pw)))
	 )
    (setq
     tmp (^ (/ tmp ct) (/ pt))
     tmp (^ (/ tmp const) (/ slope))
     )
    )
  )

Function cstar is our estimate for a cumulative. It's our model attempt to hit the objective cumulative. It takes as its argument a cicada RWL, and passes it through to a wasp RWL:

(defun cstar(Lc &key
	       (alpha alpha_guess)
	       (beta beta_guess)
	       (nu nu_guess)
	       (ct ct_guess)
	       (pt pt_guess)
	       (ns 100)
	       )
  (flet (
	 (ftryS(x) (* (/ nu) (lw x :nu nu :ct ct :pt pt) 
		      (* alpha (scale-normal-dens x small_mean small_sd))))
	 (ftryM(x) (* (lw x :nu nu :ct ct :pt pt) 
		      (* beta (scale-normal-dens x medium_mean medium_sd))))
	 (ftryL(x) (* (lw x :nu nu :ct ct :pt pt) 
		      (* (- 1 alpha beta) (scale-normal-dens x large_mean large_sd))))
	 )
	(let* (
	       (Smax (min (Lc Lc :nu nu :ct ct :pt pt) small_max))
	       (Mmax (min (Lc Lc :nu nu :ct ct :pt pt :small nil :medium t) medium_max))
	       (Lmax (Lc Lc :nu nu :ct ct :pt pt :small nil :medium nil))
	       (SmaxInt (trap ftryS small_min small_max ns))
	       (MmaxInt (trap ftryM medium_min medium_max ns))
	       (LmaxInt (trap ftryL large_min large_max ns))
	       )
	  (/ 
	   (+ 
	    (if (= Smax small_max) SmaxInt (trap ftryS small_min Smax ns) )
	    (if (<= Mmax medium_min) 0 
	      (if (= Mmax medium_max) MmaxInt 
		(trap ftryM medium_min Mmax ns)))
	    (if (<= Lmax large_min) 0 (trap ftryL large_min Lmax ns) )
	    )
	   (+ SmaxInt MmaxInt LmaxInt)
	   )
	  )
	)
  )
(let* (
       (cicadaRWL (rseq cicada_RWL_min cicada_RWL_max 1000))
       (xs (mapcar #'lw cicadaRWL))
       (ys (mapcar (lambda(x) (cstar x :ns 20)) xs))
       (order (order xs))
       (xs (select xs order))
       (ys (select ys order))
       )
  (variable-labels "Wasp RWL" "C*")
  (lines on)
  (plotter xs ys)
  (postscript "CstarDistribution.ps")
  )
(plot  (lambda(x) (cstar (lw x))) (lambda(x) (empirical-cdf (lw x))) cicada_RWL_min cicada_RWL_max)

The non-linear regression

This requires an "objective function" which will try to hit the empirical cumulative. It's a function of five parameters:

(defun objective(params)
  (let (
	(alpha (first params))
	(beta (second params))
	(nu (third params))
	(ct (fourth params))
	(pt (fifth params))
	)
    (flet 
     ((ftry(x) 
	   (square (- (cstar x :nu nu :ct ct :pt pt :alpha alpha :beta beta :ns 20) 
		      (empirical-cdf x)))))
     ;; (- (integrate-simpson-recursive #'ftry cicada_RWL_min cicada_RWL_max integral-precision))
     ;; (sum (mapcar #'ftry (rseq cicada_RWL_min cicada_RWL_max 100)))
     (- 
      (+ (trap ftry waspRWL_min waspRWL_max 100)
	 (* prevent_negative_gamma (- 3 (sum (signum (list alpha beta (- 1 alpha beta))))))
	 )
      )
     )
    )
  )
(objective (list alpha_guess beta_guess nu_guess ct_guess pt_guess))

This is the non-linear regression routine: from here we let it go, cross our fingers, and hope for the best!

(setq
 nreg (newtonmax 
       #'objective 
       (list alpha_guess beta_guess nu_guess ct_guess pt_guess) 
       :return-derivs t
       :verbose 2
       :COUNT-LIMIT 10
       )
 ;; (setq
 params (first nreg)
 alpha (first params)
 beta (second params)
 nu (third params)
 ct (fourth params)
 pt (fifth params)
 )

Links

Personal tools