# CURM Twenty-first Meeting, 2/19/2009

# Agenda

## Announcements

The NKU Research Foundation sponsors three student competitive funding opportunities each spring, which encourage and support students to pursue research and creative projects that will contribute new insights in their chosen field. Each of these initiatives must be supported by a faculty mentor.

• The Student Undergraduate Research & Creativity Award (SURCA) provides up to $600 per undergraduate student to complete a research project or creative activity.

• The Graduate Student Research Grant (GSRG) provides up to $750 per graduate student to complete a research project or creative activity.

• The Summer Undergraduate Research/Creativity Fellowship (SURF) provides a $3,500 stipend (plus $600 for supplies) for an undergraduate to complete an intensive research or creative project.

For students to determine their eligibility, visit http://rgc.nku.edu and link to Student Opportunities.

The spring 2009 competition deadline is 4:00 p.m. on March 20, 2009.

## New Business

- Grayson will work on slide one, giving an overview of the project, the dogmas, the biological facts, etc. A picture of "what's to come"
- Katie would work on slide two, which shoots down the dogma of opportunistic hunting.

## Old Business

Back to our regression problem, as outlined in the "census" attempt. I'm going to put the code in here, and try to explain the purpose of each part. Could be kind of yucky!;)

This example is for Newberry.

## Setup

(setq ;; These are two parameters that serve different roles: numerical integration will ;; require a level of precision: integral-precision 1e-4 ;; and we've got to make sure that some parameters don't go negative, so I throw in ;; some big punishing term to keep the code from "going negative" prevent_negative_gamma 10 ;; These are counts of caught cicadas, a rough census by the wasps of the two sites: NH_n 232 DO_n 27 TG_n 33 TB_n 18 Total_N (+ NH_n DO_n TG_n TB_n)

Okay, now we're going to create some guesses for some of the parameters in our problem which are going to be estimated. The parameters alpha, beta, and gamma were the original "census" variables: they were supposed to be the weights on the small, medium, and large species populations of cicadas. Because non-linear regression requires guesses to get started, I used the census estimates provided by our cast of census takers: the female wasps!

alpha_guess (/ (+ NH_n DO_n) Total_N) beta_guess (/ TG_n Total_N) gamma_guess (- 1 alpha_guess beta_guess)

Now, these values are means and standard deviations for the three species. Those will come in handy, as you can well imagine:

;; got this from cicada_regress.lsp small_mean 26.711683168316828 small_sd 1.2250976037091335 medium_mean 43.260000000000005 medium_sd 1.5856143059624321 large_mean 52.88285714285714 large_sd 1.9474892260842644

;; estimates for the mass transformation parameters from wasp_regress.lsp, from wasp mass to wasp RWL: cw 4.0712235210396965 pw 0.286412932220204

The "cicada_conversion_coefs" describe our model for turning cicada RWL into cicada mass. Katie found this to be species dependent, so we've got several different constants, , and slopes, :

cicada_conversion_coefs '(-5.417981732963593 3.4388543377658705 3.736489595715526 -1.1506759329159233 -0.07939482664135919) ;; from (send reg :coef-estimates) of cicada_regress.lsp a0 (exp (first cicada_conversion_coefs)) b0 (second cicada_conversion_coefs) as (exp (third cicada_conversion_coefs)) bs (fourth cicada_conversion_coefs) bl (fifth cicada_conversion_coefs)

I turn these ultimately into parameters for small cicadas, medium cicadas, and large cicadas:

small_const (* a0 as) large_const a0 small_slope (+ b0 bs) medium_slope b0 large_slope (+ b0 bl) medium_large_split 48 ;; this is the artificial divide between medium and large cicadas

Starting guesses for the transformation of cicada mass into wasp mass come from a personal communication from Jon Hastings, who said that his colleague had found that a wasp's mass is about 25% of the cicada meal's mass. This would say that the power on the power model is 1, and the constant is .25. I also include a "mass-provisioning" factor, , which I estimate at about 4 (based on personal communication from Jon: 4 if by male, 7 if by female):

nu_guess 4 ct_guess .25 pt_guess 1

This is data, for Newberry:

NEfemales '( 31.08 29.89 28.14 29.07 29.25 27.95 28.46 27.49 28.13 25.4 27.05 27.31 27.3 27.15 27.42 26.02 27.25 27.4 26.04 27.11 25.98 26.42 26.55 27.22 26.75 25.6 25.4 26.25 25.57 26.64 25.49 27.97 25.23 26.94 25.89 25.07 26.35 26.3 25.48 25.95 24.76 24.8 25.44 25.2 26.21 26.89 25.74 25.75 25.27 27.11 25.22 25.93 25.9 26.09 25.52 26.75 24.68 26.62 26.31 26.12 25.7 25.63 25 24.41 27.07 24.13 26.13 25.3 25.72 25.02 24.22 24.07 24.51 24.31 25.39 24.64 23.05 25.4 25.13 24.99 25.06 25.01 24.36 25.14 25.11 24.79 23.01 24.76 25.77 25.75 23.12 25.07 22.97 22.97 24.49 24.37 23.68 22.85 22.78 23.33 24 23.58 24.4 23.82 24.2 23.28 22.8 24.01 23.59 22.2 22.56 23.43 23.14 24.06 22.23 22.53 22.18 23.94 21.38 22.09 23.28 22.45 22.4 23.03 20.75 22.41 22.3 19.3 20.17 21.57 21.05 23.45 24.37 19.88 25.18 23.09 23.42 25.3 24.59 26 25.15 26.19 ) NEmales '(24.98 24.8 24.86 24.21 23.4 23.95 24.17 24.28 23.44 22.61 22.18 23.22 22.55 22.68 22.15 22.79 20.02 23.88 22 23.4 22.28 22.43 22.59 22.34 21.95 21.52 22.69 21.83 21.38 21.89 21.65 21.44 21.97 21.78 22.71 22.69 22.11 21.75 22.58 22.12 21.95 20.61 21.58 21.9 21.17 21.42 20.78 20.94 20.39 21.86 21.85 21.28 20.75 20.83 20.84 20.74 20.04 20.74 21.28 20.68 21.22 20.14 20.39 20.13 20.57 20.13 19.91 20.53 19.44 19.72 20.22 19.19 17.82 21.93 19.27 21.38 19.96 21.34 21.72 21.99 19.26 24.53 24.32 22.2 21.22 21.73 22.65 23.37 21.22 21.35 ) NEmales_n (length NEmales)

Now we compute some useful bounds: these will be important for the integration. I'm saying that, four standard deviations out, we can asssume that each population has zero representation. This is pretty good, based on the empirical rule.

small_min (- small_mean (* 4 small_sd)) small_max (+ small_mean (* 4 small_sd)) medium_min (- medium_mean (* 4 medium_sd)) medium_max (+ medium_mean (* 4 medium_sd)) large_min (- large_mean (* 4 large_sd)) large_max (+ large_mean (* 4 large_sd))

The entire range of cicadas should run between the minimum and the maximum of those maxes and mins:

cicada_RWL_min small_min cicada_RWL_max large_max

Statistics on the wasps: mean, sd, and bounds on RWL:

NEmaleMuEst (mean NEmales) NEmaleSigmaEst (sd NEmales) waspRWL_min (- (min NEmales) (* 4 NEmaleSigmaEst)) waspRWL_max (+ (max NEmales) (* 4 NEmaleSigmaEst)) )

### Functions

This first function simply returns a density for a normal with mean mu and sd sigma (based on the standard normal distribution):

(defun scale-normal-dens(x mu sigma) (/ (normal-dens (/ (- x mu) sigma)) sigma) )

dc is the density function of our best guess: it is the "census" function, made up of three scaled normal densities for the three populations of cicadas:

(defun dc(Lc &key (alpha alpha_guess) (beta beta_guess)) (+ (* alpha (scale-normal-dens Lc small_mean small_sd)) (* beta (scale-normal-dens Lc medium_mean medium_sd)) (* (- 1 alpha beta) (scale-normal-dens Lc large_mean large_sd)) ) )

Here's how that plot works in lisp:

(let () (lines on) (title "combined densities") (variable-labels "Cicada RWL" "density (empirical from our census)") (plot dc cicada_RWL_min cicada_RWL_max) ;; check: should integrate to about 1: (integrate (lambda(x) (dc x)) cicada_RWL_min cicada_RWL_max) (postscript "CicadaDensityDefault.ps") )

Male Wasp cumulative is our objective in this whole thing: here's how an empirical cumulative can be defined:

(defun empirical-cdf(x) (/ (sum (heaviside (- x NEmales))) NEmales_n)) ;; and the following should look like a cumulative distribution function: (plot (lambda(x) (empirical-cdf x)) waspRWL_min waspRWL_max

This function is a little hairy: what it does is turn a cicada RWL into a wasp RWL. It depends on species (or at least small, medium, large):

(defun Lw(Lc &key (nu nu_guess) (ct ct_guess) (pt pt_guess) ) (let* ( (small (< Lc small_max)) (const (if small (* nu small_const) large_const)) (slope (cond (small small_slope) ((> Lc medium_large_split) large_slope) (t medium_slope) ) ) (tmp (* const (^ Lc slope))) ) (setq tmp (* ct (^ tmp pt)) tmp (* cw (^ tmp pw)) ) ) )

Lc turns a wasp RWL into a cicada RWL:

(defun Lc(Lw &key (nu nu_guess) (ct ct_guess) (pt pt_guess) (small t) (medium nil)) (let* ( (const (if small (* nu small_const) large_const)) (slope (cond (small small_slope) (medium large_slope) (t medium_slope) ) ) (tmp (^ (/ Lw cw) (/ pw))) ) (setq tmp (^ (/ tmp ct) (/ pt)) tmp (^ (/ tmp const) (/ slope)) ) ) )

Function cstar is our estimate for a cumulative. It's our model attempt to hit the objective cumulative. It takes as its argument a cicada RWL, and passes it through to a wasp RWL:

(defun cstar(Lc &key (alpha alpha_guess) (beta beta_guess) (nu nu_guess) (ct ct_guess) (pt pt_guess) (ns 100) ) (flet ( (ftryS(x) (* (/ nu) (lw x :nu nu :ct ct :pt pt) (* alpha (scale-normal-dens x small_mean small_sd)))) (ftryM(x) (* (lw x :nu nu :ct ct :pt pt) (* beta (scale-normal-dens x medium_mean medium_sd)))) (ftryL(x) (* (lw x :nu nu :ct ct :pt pt) (* (- 1 alpha beta) (scale-normal-dens x large_mean large_sd)))) ) (let* ( (Smax (min (Lc Lc :nu nu :ct ct :pt pt) small_max)) (Mmax (min (Lc Lc :nu nu :ct ct :pt pt :small nil :medium t) medium_max)) (Lmax (Lc Lc :nu nu :ct ct :pt pt :small nil :medium nil)) (SmaxInt (trap ftryS small_min small_max ns)) (MmaxInt (trap ftryM medium_min medium_max ns)) (LmaxInt (trap ftryL large_min large_max ns)) ) (/ (+ (if (= Smax small_max) SmaxInt (trap ftryS small_min Smax ns) ) (if (<= Mmax medium_min) 0 (if (= Mmax medium_max) MmaxInt (trap ftryM medium_min Mmax ns))) (if (<= Lmax large_min) 0 (trap ftryL large_min Lmax ns) ) ) (+ SmaxInt MmaxInt LmaxInt) ) ) ) ) (let* ( (cicadaRWL (rseq cicada_RWL_min cicada_RWL_max 1000)) (xs (mapcar #'lw cicadaRWL)) (ys (mapcar (lambda(x) (cstar x :ns 20)) xs)) (order (order xs)) (xs (select xs order)) (ys (select ys order)) ) (variable-labels "Wasp RWL" "C*") (lines on) (plotter xs ys) (postscript "CstarDistribution.ps") ) (plot (lambda(x) (cstar (lw x))) (lambda(x) (empirical-cdf (lw x))) cicada_RWL_min cicada_RWL_max)

## The non-linear regression

This requires an "objective function" which will try to hit the empirical cumulative. It's a function of five parameters:

(defun objective(params) (let ( (alpha (first params)) (beta (second params)) (nu (third params)) (ct (fourth params)) (pt (fifth params)) ) (flet ((ftry(x) (square (- (cstar x :nu nu :ct ct :pt pt :alpha alpha :beta beta :ns 20) (empirical-cdf x))))) ;; (- (integrate-simpson-recursive #'ftry cicada_RWL_min cicada_RWL_max integral-precision)) ;; (sum (mapcar #'ftry (rseq cicada_RWL_min cicada_RWL_max 100))) (- (+ (trap ftry waspRWL_min waspRWL_max 100) (* prevent_negative_gamma (- 3 (sum (signum (list alpha beta (- 1 alpha beta)))))) ) ) ) ) ) (objective (list alpha_guess beta_guess nu_guess ct_guess pt_guess))

This is the non-linear regression routine: from here we let it go, cross our fingers, and hope for the best!

(setq nreg (newtonmax #'objective (list alpha_guess beta_guess nu_guess ct_guess pt_guess) :return-derivs t :verbose 2 :COUNT-LIMIT 10 ) ;; (setq params (first nreg) alpha (first params) beta (second params) nu (third params) ct (fourth params) pt (fifth params) )