Introduction to Statistical Modeling

This course provides an introduction to statistical models and an overview of modern methods for statistical modeling. It builds on basic statistics (Statistical Methods), with the goal of providing a solid introduction to the methods, theory, and applications of statistical models. Starting with linear models (simple and multiple linear regression), the first half of this class will cover issues related to design, estimation, residual diagnostics, goodness of fit, transformations, and various strategies for variable selection and model comparison. The second half will continue with linear hierarchical models, followed by generalized linear models and generalized linear mixed models. Time permitting, we will also cover generalized additive models. The techniques discussed will be illustrated with many real examples involving life sciences, engineering, and social sciences data.

Part of the objective of this course is to introduce you to modern data analysis with the help of statistical software. You may use any software or programming language of your choice: Matlab, R, Stata, SPSS, SAS, Python, and so on. If you do not have any computing background, email me for extra references. This course will help you become familiar enough with R to solve the homework problems.

Coursera has a 4-week introductory course on R computing, taught by Prof. Roger Peng from Johns Hopkins University. The sessions are monthly. Please also check our software page for more tips on statistical computing.


Required texts:

1. "Regression Analysis by Example", by S. Chatterjee and A. Hadi. (Note: older editions, co-authored by B. Price, can also be used.)
2. "An Introduction to Generalized Linear Models" (Third Edition), by A. Dobson and A. Barnett.

Grading policy:

Homework (6): 50%
Midterm: 25%
Final report: 25%