Zipfian Academy accepts participants from diverse backgrounds, including data analysts, software engineers, and quantitative PhDs. Work through the questions below: if you can solve these problems, you’re ready to apply for the program.

Data Science Immersive

Part 1: Probability and Statistics

From MIT Open Courseware, Introduction to Probability and Statistics, 18.05

(1) Suppose that 10 cards, of which five are red and five are green, are placed at random in 10 envelopes, of which five are red and five are green. Determine the probability that exactly two envelopes will contain a card with a matching color.
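
One way to check an answer to this problem is a short counting sketch in Python. It relies on an observation that is not stated in the problem itself: if k red cards land in red envelopes, then k green cards must land in green envelopes, so the number of matches is always 2k. The function name `prob_red_in_red` is made up for illustration, and this is a sanity check under that counting argument, not the course's official solution.

```python
from fractions import Fraction
from math import comb


def prob_red_in_red(k, n=5):
    """P(exactly k of the n red cards land in the n red envelopes).

    Assumes n red and n green cards are dealt uniformly at random into
    n red and n green envelopes (hypothetical helper, for illustration).
    """
    # Choose which k red cards and which n - k green cards fill the red envelopes,
    # out of all ways to choose n cards for the red envelopes.
    return Fraction(comb(n, k) * comb(n, n - k), comb(2 * n, n))


# Exactly two matching envelopes means k = 1 (one red match, one green match).
p_two_matches = prob_red_in_red(1)
print(p_two_matches, float(p_two_matches))
```

Summing `prob_red_in_red(k)` over k = 0, ..., 5 should give 1, which is a quick way to confirm the counting is consistent.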

(2) Suppose that a box contains one fair coin and one coin with a head on each side. Suppose that a coin is selected at random and that when it is tossed three times, a head is obtained three times. Determine the probability that the coin is the fair coin.
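
The coin problem is a direct application of Bayes' rule, and a few lines of Python can confirm a hand calculation. The dictionary names below (`prior`, `p_head`) are illustrative choices, and the sketch assumes the three tosses are independent given the coin.

```python
from fractions import Fraction

# Prior: each coin is equally likely to be drawn from the box.
prior = {"fair": Fraction(1, 2), "two_headed": Fraction(1, 2)}
# Probability of a head on a single toss for each coin.
p_head = {"fair": Fraction(1, 2), "two_headed": Fraction(1)}

tosses = 3  # three heads observed in three tosses

# Joint probability of picking each coin and then seeing three heads.
joint = {coin: prior[coin] * p_head[coin] ** tosses for coin in prior}

# Bayes' rule: normalise by the total probability of the observed data.
posterior_fair = joint["fair"] / sum(joint.values())
print(posterior_fair)
```

Changing `tosses` shows how quickly the evidence shifts toward the two-headed coin as more heads are observed.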

Solutions in course lecture notes:

http://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2005/lecture-notes/18_05_lec15.pdf

From MIT Open Courseware, Statistical Thinking and Data Analysis, 15.075

(1) Two methods of memorizing words are to be compared. You choose two groups of 5 people, where the first person in the first group has the same characteristics as the first person in the second group (the same educational level, age, etc.). The same holds for the second, third, fourth, and fifth people in each group: each is similar to their counterpart in the other group in terms of education, age, etc. The first group is assigned to the first method of memorization and the second group to the other method. The number of words recalled in a memory test after a week’s training with these two methods is shown below.

            Pair 1   Pair 2   Pair 3   Pair 4   Pair 5
Method 1      25       30       22       27       29
Method 2      21       20       23       18       17

Test the hypothesis that the first method is better than the second method at the 0.05 level. You may assume normality of the data.
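
Because the two groups are matched person by person, one natural reading of this problem is a one-sided paired t-test on the within-pair differences. The sketch below takes that approach using only the standard library. The critical value 2.132 (upper 5% point of the t distribution with 4 degrees of freedom) is hard-coded from a standard t table rather than computed, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

method_1 = [25, 30, 22, 27, 29]
method_2 = [21, 20, 23, 18, 17]

# Paired design: analyse the difference within each matched pair.
diffs = [a - b for a, b in zip(method_1, method_2)]
n = len(diffs)

# H0: mean difference = 0 versus H1: mean difference > 0 (method 1 is better).
# stdev() is the sample standard deviation (n - 1 in the denominator).
t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))

# Assumption: critical value taken from a t table, one-sided, alpha = 0.05, df = 4.
T_CRIT = 2.132
reject_h0 = t_stat > T_CRIT
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

If SciPy is available, `scipy.stats.ttest_rel` computes the same statistic and also reports a p-value.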

(2) A company makes high-definition televisions and does not like to have defective pixels. Historically, the mean number of defective pixels in a TV is 20. An MIT engineer is hired to make better TVs that have fewer defective pixels. After her first week of work she claims that she can significantly improve the current method. To check her claim you try her new method on 100 new televisions. The average number of defective pixels in those 100 TVs is 19.1. Assume that the new method doesn’t change the standard deviation of defective pixels, which has always been 4.

a. Test if the new method is significantly better than the old one at the α = 0.05 level.

b. Using the new method, assume that the mean number of defective pixels is actually 19. What is the chance that your test from part a will conclude that the new method is statistically more effective?

c. How many televisions will you have to check so that the test you did in part a will conclude that the new method is effective, with 95% probability? Assume again that the mean number of defective pixels is actually 19.
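
Parts a through c can be checked with a one-sided z-test, since the standard deviation is treated as known. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). It assumes the sample mean is approximately normal and uses the usual sample-size formula for a one-sided z-test; the variable names are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 20.0, 4.0, 100, 19.1, 0.05
std_normal = NormalDist()
z_alpha = std_normal.inv_cdf(1 - alpha)  # about 1.645

# a. H0: mu = 20 versus H1: mu < 20; reject when z falls below -z_alpha.
z = (xbar - mu0) / (sigma / sqrt(n))
reject_h0 = z < -z_alpha

# b. Power when the true mean is 19: the chance that the sample mean
#    falls below the rejection cutoff from part a.
mu1 = 19.0
cutoff = mu0 - z_alpha * sigma / sqrt(n)
power = NormalDist(mu1, sigma / sqrt(n)).cdf(cutoff)

# c. Smallest n giving 95% power at mu = 19, from
#    n >= ((z_alpha + z_beta) * sigma / (mu0 - mu1)) ** 2.
z_beta = std_normal.inv_cdf(0.95)
n_required = ceil(((z_alpha + z_beta) * sigma / (mu0 - mu1)) ** 2)

print(f"z = {z:.2f}, reject H0: {reject_h0}")
print(f"power = {power:.3f}, n required = {n_required}")
```

The power in part b depends on the sample size through `sigma / sqrt(n)`, which is why part c can be solved by increasing n until the cutoff sits far enough above the true mean.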

Part 2: Programming in Python

Go to HackerRank.com and look at the sample problems. Try completing a few Moderate Difficulty challenges using Python.

https://www.hackerrank.com/domains/algorithms/arrays-and-sorting