Semester: Autumn 2021
Classroom: virtual
Time: MWF 3:00-3:50pm
Main Website
Instructor: Dr. Vanja Dukic
Office Hours: MW 5-6:15pm, virtual
Course Assistant: Ruyu Tan
Office Hours: Tue/Wed/Thu 9am-10am, virtual
Department of Applied Mathematics
University of Colorado-Boulder
Note 2: If you can't solve some of the problems, please come to office
hours or email us a short, very specific question. Email does not work well, if at all, for
involved or unclear questions.
For Homework 1, please complete the following problems:
Problem 1
Say your research deals with social networks. Your first step is to study the properties of the Facebook network of college students on the CU-Boulder campus. The next step is to compare your findings to the national college student Facebook network.
- a) What are the populations you are concerned with?
- b) What is the relationship between these populations?
- c) What are some of the characteristics of the networks you might consider? Pick three as an example.
- d) If you had infinite time and resources, would you be able to measure these characteristics for every member of these populations?
- e) Say you don't have infinite time and resources -- how would you go about estimating those population characteristics?
Problem 2
You're working for a US public health surveillance team, keeping an eye on infectious diseases such as the flu in the US.
- a) If your goal is to estimate the average yearly flu infection rate among those over 65 years of age in the US, what is the population you are working with?
- b) Given that surveillance is done only via doctor's offices, what is the actual population of people whose infection rates you'll be observing?
- c) What kind of estimates will you get? Can they be generalized to the entire population you'd like to be working with? Under what assumptions is the answer yes?
Problem 3
- a) What is the difference between mean, median, and mode? When would you prefer to use one and not the others?
- b) What is the difference between standard deviation and range? When would you report one and not the other to communicate how variable the data are?
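As a rough illustration for Problem 3 (not a model answer), the sketch below computes the five summaries on a small sample. The numbers are hypothetical, invented only to show how a single extreme value moves the mean, the standard deviation, and the range while leaving the median and mode unchanged.

```python
import statistics

# Hypothetical sample, invented for illustration only (not course data).
sample = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(sample)          # arithmetic average; pulled up by the 40
median = statistics.median(sample)      # middle value of the sorted data
mode = statistics.mode(sample)          # most frequent value
stdev = statistics.stdev(sample)        # sample standard deviation (n - 1 denominator)
data_range = max(sample) - min(sample)  # largest value minus smallest value

print("mean:", mean, "median:", median, "mode:", mode)
print("standard deviation:", round(stdev, 2), "range:", data_range)

# Same summaries with the extreme value removed, for comparison.
trimmed = sample[:-1]
print("mean without 40:", statistics.mean(trimmed))
print("median without 40:", statistics.median(trimmed))
print("range without 40:", max(trimmed) - min(trimmed))
```

Comparing the two sets of printed values is one way to start thinking about which summary you would report for skewed data.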
Problem 4
A friend has given you a 2 pound bag of ordinary M&M's for your birthday. Incidentally, you recently had a discussion with the same friend, who is convinced that the blue ones are less frequent than the other 5 colors (red, yellow, green, orange, and brown). You (and the rest of your friends) think that all colors are equally likely. The 2 pound bag has about 1200 M&M's. To put the matter to rest, you counted the M&M's and found 1215 in the bag: 150 blue, 220 red, 230 yellow, 215 orange, 190 green, and 210 brown.
- a) Sketch a barplot of the observed relative frequency of colors in that bag.
- b) If your friend is not correct (and you are), what would the true relative frequency of colors look like? Sketch it.
- c) How many blue ones would you expect to see if all colors are equally likely?
- d) Do you think your friend is right, based on the one bag evidence? Give a heuristic answer here - you don't need to be precise. What are some of the limitations of this one-bag "evidence" approach? In an ideal world, how would you design a study to test this more rigorously?
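If you would like to check your arithmetic for Problem 4, a minimal sketch along the following lines may help. It uses only the counts stated in the problem. The text-based bars are a rough stand-in for a real barplot, and the variable names are illustrative choices, not part of the assignment.

```python
# Observed counts from the problem statement.
counts = {
    "blue": 150,
    "red": 220,
    "yellow": 230,
    "orange": 215,
    "green": 190,
    "brown": 210,
}

total = sum(counts.values())  # should match the 1215 candies counted

# Observed relative frequency of each color.
rel_freq = {color: n / total for color, n in counts.items()}

# Expected count per color if all six colors are equally likely.
expected = total / len(counts)

# Crude text barplot: one '#' per percentage point (rounded).
for color, freq in rel_freq.items():
    print(f"{color:>6} {freq:.3f} {'#' * round(100 * freq)}")

print("total:", total)
print("expected count per color under equal likelihood:", expected)
```

The sketch only reports observed and expected values. Deciding whether the gap for blue is large enough to support your friend's claim is left to part (d).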
Problem 5
A dataset (sample size = 40) is given to you for further analysis in the following text file.
- a) Plot a default histogram in your favorite software package/program. How many bins does it plot by default for this dataset? What is the size of each bin?
- b) Change the number of bins -- first use 10, then 20, and finally 25. What differences (if any) can you see between these histograms and your histogram from part (a)?
- c) Change the starting point to -2, -1.5, and then -1.45 -- and plot a histogram with 25 bins for each. What differences do you see?
You can also do this problem by hand if you choose. Some programs might not let you change all these "inputs", so if all else fails, hand-drawn sketches of histograms will be accepted.
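If your software does not expose the number of bins or the starting point, the bin counts can be tallied directly. The sketch below is one possible approach using only the Python standard library. The helper `histogram_counts`, the bin width of 0.2, and the randomly generated stand-in data are all assumptions made for illustration; replace `data` with the 40 values from the course file.

```python
import random


def histogram_counts(data, start, width, n_bins):
    """Count values in n_bins equal-width bins beginning at start.

    Bin k covers [start + k*width, start + (k+1)*width).
    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical stand-in data (NOT the course dataset): 40 random draws.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(40)]

# 25 bins of an assumed width 0.2, shifting only the starting point.
results = {
    start: histogram_counts(data, start, width=0.2, n_bins=25)
    for start in (-2, -1.5, -1.45)
}

for start, counts in results.items():
    print(f"start={start}: {counts}")
```

Note that any value below the chosen starting point falls outside every bin and is dropped, which is itself one of the differences worth looking for in part (c). Plotting packages typically offer comparable controls; in matplotlib, for example, `plt.hist` accepts `bins` and `range` arguments.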