EXAMPLE 1 Rivers in North Carolina contain small concentrations of mercury which can accumulate in fish over their lifetimes. Because mercury cannot be excreted from the body, it builds up in the tissues. The concentration of mercury in fish tissue can be obtained at considerable expense by catching fish and sending samples to a lab for analysis. Directly measuring the mercury concentration in the water is impossible since it is almost always below detectable limits. A study was recently conducted in the Wacamaw and Lumber Rivers to investigate mercury levels in tissues of large mouth bass. At several stations along each river, a group of fish were caught, weighed, and measured. In addition a filet from each fish caught was sent to the lab so that the tissue concentration of mercury could be determined for each fish. Data are in fishHG.txt file (available on our website). Each fish caught corresponds to a single row of the file. In order, the recorded information for each fish is river station length in cm weight in grams mercury concentration in parts per million Data analysis Some questions: Is there a relationship between mercury concentration and size (weight and/or length) of a fish? Is this relationship the same for the two rivers? A concentration over 1 part per million (0 on the log scale) is considered unsafe for human consumption. In light of this, what recommendations can you make for fish caught from these rivers? R commands: # variables: river station length_cm weight_g HG_conc_ppm fishHG = read.table("("../week3_4/examples/fishHG.txt", header=TRUE) attach(fishHG) # classical two-sample t-test with equal variances # testing equality of two rivers with respect to mean HG concentration fishHGW = fishHG[river=='wacamaw',] fishHGL = fishHG[river=='lumber',] par(mfrow=c(1,1)) plot(river,HG_conc_ppm) par(mfrow=c(1,2)) hist(HG_conc_ppm,xlim=c(0,4),main = 'Wacamaw',xlab='HG concentration (ppm)') hist(HG_conc_ppm,xlim=c(0,4),main = 'Lumber',xlab='HG concentration (ppm)') t.test(fishHGW$HG_conc_ppm,fishHGL$HG_conc_ppm,var.equal=TRUE) # linear models # linear model equivalent to t-test Hg.river.lm = lm(HG_conc_ppm ~ river, data=fishHG) summary(Hg.river.lm) #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) 1.07808 0.08866 12.160 <2e-16 *** #riverwacamaw 0.19835 0.11712 1.694 0.0922 . #F-statistic: 2.868 on 1 and 169 DF, p-value: 0.09218 # linear model with river and length as predictors Hg.river.length.lm = lm(HG_conc_ppm ~ river+length_cm, data=fishHG) summary(Hg.river.length.lm) #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -1.194229 0.216287 -5.521 1.26e-07 *** #riverwacamaw 0.142027 0.089496 1.587 0.114 #length_cm 0.057657 0.005213 11.061 < 2e-16 *** #F-statistic: 63.63 on 2 and 168 DF, p-value: < 2.2e-16 # linear model with river, length and weight Hg.river.l.w.lm = lm(HG_conc_ppm ~ river+length_cm+weight_g, data=fishHG) summary(Hg.river.l.w.lm) #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -1.4568077 0.3661228 -3.979 0.000103 *** #riverwacamaw 0.1232408 0.0920110 1.339 0.182256 #length_cm 0.0675456 0.0122838 5.499 1.41e-07 *** #weight_g -0.0001062 0.0001194 -0.889 0.375193 #F-statistic: 42.63 on 3 and 167 DF, p-value: < 2.2e-16 # testing for significance of two predictors simultaneously anova(Hg.river.l.w.lm,Hg.river.lm) #Analysis of Variance Table # #Model 1: HG_conc_ppm ~ river + length_cm + weight_g #Model 2: HG_conc_ppm ~ river # Res.Df RSS Df Sum of Sq F Pr(>F) #1 167 55.849 #2 169 96.976 -2 -41.127 61.49 < 2.2e-16 *** # additional simpler models Hg.l.w.lm = lm(HG_conc_ppm ~ length_cm+weight_g, data=fishHG) summary(Hg.l.w.lm) #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -1.4961930 0.3658015 -4.090 6.67e-05 *** #length_cm 0.0713530 0.0119785 5.957 1.47e-08 *** #weight_g -0.0001429 0.0001165 -1.227 0.222 #F-statistic: 62.76 on 2 and 168 DF, p-value: < 2.2e-16 Hg.l.lm = lm(HG_conc_ppm ~ length_cm, data=fishHG) summary(Hg.l.lm) #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -1.131645 0.213615 -5.298 3.62e-07 *** #length_cm 0.058127 0.005228 11.119 < 2e-16 *** #F-statistic: 123.6 on 1 and 169 DF, p-value: < 2.2e-16 anova(Hg.river.l.w.lm, Hg.l.lm) #Model 1: HG_conc_ppm ~ river + length_cm + weight_g #Model 2: HG_conc_ppm ~ length_cm # Res.Df RSS Df Sum of Sq F Pr(>F) #1 167 55.849 #2 169 56.954 -2 -1.1056 1.653 0.1946 plot(fitted(Hg.river.l.w.lm), fishHG$HG_conc_ppm, main='Full model') abline(0,1) plot(fitted(Hg.l.lm), fishHG$HG_conc_ppm, main = 'Reduced model (length only)') abline(0,1)