1 of 46

Powerball lottery

2 of 46

Plan which data you will organize

  • Counting the frequency of each number
  • Counting the frequency of the Powerball number
  • By position (Number_1 to Number_5)?
  • By draw?
  • Has any game rule changed?

3 of 46

Extract counting from the data

  • Import the data by choosing a CSV file -> select the Powerball data (please download it from the website)

4 of 46

  • Go to “Pivot”, choose Number_1, and create a pivot table (select “show table” and “show plot”)
  • Store the result as “powerball_n1”

5 of 46

  • Repeat this to create five separate datasets, powerball_n1 … powerball_n5, which show the frequency distribution for Number_1 through Number_5.

6 of 46

  • Collect all the data (distributions) into one dataset
  • Combine -> select two datasets, one for ‘datasets’ and the other for ‘combine with’ -> combine type ‘bind columns’
  • Store the combined dataset as “powerball_n1_n2”
  • Do the same with n3, n4, and n5.

7 of 46

  • Transform -> Create -> total = n1 + n2 + n3 + n4 + n5
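
The same counting can be sketched in code. This is a minimal sketch, not the tool's own procedure: the `draws` list is made-up sample data, and the names `per_position` and `total` are only illustrative stand-ins for powerball_n1 … powerball_n5 and the total column.

```python
from collections import Counter

# Hypothetical sample draws (Number_1 ... Number_5); not real Powerball results.
draws = [
    (3, 12, 28, 41, 60),
    (5, 12, 19, 28, 33),
    (3, 8, 28, 45, 69),
]

# One frequency table per position, like powerball_n1 ... powerball_n5.
per_position = [Counter(d[i] for d in draws) for i in range(5)]

# total = n1 + n2 + n3 + n4 + n5: adding Counters sums the counts per number.
total = sum(per_position, Counter())

print(total[28])  # 28 appears in all three sample draws
```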

8 of 46

  • Now you have the counts for n1 through n5 and the total counts!
  • Let’s play with the data

9 of 46

  • Top 10 most frequent numbers
  • Top 10 least frequent numbers

Click here to arrange the data
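
Sorting the totals in both directions gives the two top-10 lists. A minimal sketch, assuming the totals are held in a `Counter`; the counts below are invented for illustration only.

```python
from collections import Counter

# Hypothetical total counts per number (made-up values, not real results).
total = Counter({28: 58, 12: 51, 3: 47, 41: 44, 60: 39, 8: 35, 69: 30})

# Descending by count for the most frequent, ascending for the least frequent.
most_frequent = sorted(total.items(), key=lambda kv: kv[1], reverse=True)[:10]
least_frequent = sorted(total.items(), key=lambda kv: kv[1])[:10]

print(most_frequent[0])
print(least_frequent[0])
```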

10 of 46

  • Is this the frequency we would normally expect from the data?
  • How can we evaluate this result?

Draw: 587

Probability: 5/69
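
One way to get a baseline is the expected count for a single number. This is a rough sketch that assumes all 587 draws used the same 5-of-69 rule (a rule change would alter the probability for the earlier draws) and treats the count of one number as binomial; the two-standard-deviation band is a rule of thumb, not a formal test.

```python
import math

draws = 587   # number of draws, from the slide
p = 5 / 69    # chance that a given number is among the five drawn numbers

expected = draws * p                  # about 42.5 appearances per number
sd = math.sqrt(draws * p * (1 - p))   # binomial standard deviation, about 6.3

# Rough band in which most counts should fall if every number is equally likely.
low, high = expected - 2 * sd, expected + 2 * sd
print(round(expected, 1), round(sd, 1), round(low, 1), round(high, 1))
```

A number whose count sits inside this band is not unusual; only counts well outside it would deserve a closer look.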

11 of 46

Draw: 587

Probability: 5/69

Would you pick number 28 based on the data?

12 of 46

13 of 46

Standard for frequency?

14 of 46

Higher than the normal probability

Lower than the normal probability

15 of 46

Must-have numbers?

Must-not-have numbers?

16 of 46

Distribution of frequency

Normal distribution zone

17 of 46

  • Rule changed

261 draw

18 of 46

Logistic regression

19 of 46

Logit model

Y=exp(1+x)/(1+exp(1+x))

The output settles near two distinct values (0 and 1) across the range of x
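
A minimal sketch of the curve above, evaluated at a few x values chosen only for illustration; the function name `logit_model` is hypothetical.

```python
import math

def logit_model(x):
    """Y = exp(1 + x) / (1 + exp(1 + x)), the curve from the slide."""
    z = 1 + x
    return math.exp(z) / (1 + math.exp(z))

# Far to the left Y is close to 0, far to the right it is close to 1.
for x in (-10, -1, 0, 10):
    print(x, round(logit_model(x), 3))
```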

20 of 46

Titanic

21 of 46

22 of 46

23 of 46

24 of 46

Titanic

Y=exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked)/(1+exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked))

25 of 46

(25 years old, 3rd class, male, no siblings, no spouse, embarked at Queenstown)

Y=exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked)/(1+exp(4.316+pclass+sex-0.038*age-0.332*sibsp+embarked))

Y=exp(4.316-2.069-2.633-0.038*25-1.471)/(1+exp(4.316-2.069-2.633-0.038*25-1.471))=exp(-2.807)/(1+exp(-2.807))=0.057, 5.7%
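
The arithmetic can be checked in a few lines. The coefficient values are read off the worked example above; the variable names are only illustrative, and the sibsp term is zero because this passenger has no siblings or spouse aboard.

```python
import math

# Coefficients as they appear in the worked example on the slide.
intercept = 4.316
pclass_3rd = -2.069
sex_male = -2.633
age_coef = -0.038
sibsp_coef = -0.332
embarked_queenstown = -1.471

age, sibsp = 25, 0

# Linear predictor, then the logistic transform.
z = (intercept + pclass_3rd + sex_male
     + age_coef * age + sibsp_coef * sibsp + embarked_queenstown)
prob = math.exp(z) / (1 + math.exp(z))

print(round(z, 3), round(prob, 3))
```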

26 of 46

Prediction

27 of 46

28 of 46

29 of 46

Evaluation

If pred_logit>0.5 -> Yes

30 of 46

Create val

31 of 46

Pivot->val
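
A sketch of the evaluation steps, under two assumptions that the slides do not spell out: that `val` marks whether the prediction matches the actual outcome, and that the pivot simply counts the values of `val`. The rows are made-up examples.

```python
from collections import Counter

# Hypothetical (actual outcome, predicted probability) pairs.
rows = [("Yes", 0.91), ("No", 0.12), ("Yes", 0.43), ("No", 0.67), ("No", 0.05)]

# Rule from the slide: pred_logit > 0.5 -> "Yes", otherwise "No".
predicted = ["Yes" if p > 0.5 else "No" for _, p in rows]

# Assumed meaning of val: True when the prediction matches the actual outcome.
val = [actual == pred for (actual, _), pred in zip(rows, predicted)]

# Pivot on val: count correct and incorrect predictions.
table = Counter(val)
accuracy = table[True] / len(rows)
print(table, accuracy)
```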

32 of 46

Data processing: group by

33 of 46

- First, figure out your data and its structure.

X1, X2, X3, X4, … Xn

34 of 46

- To understand the data by X1, use the ‘group by’ function

X1, X3, X4, … Xn

group by X1 for X2

X2

35 of 46

No

36 of 46

Group by X1 and X2: the result has at most 69*69 rows (in the case of Powerball)
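
Grouping by one or two variables can be sketched with plain dictionaries. The rows are invented sample values and the names `by_x1` and `by_pair` are illustrative only.

```python
from collections import defaultdict

# Hypothetical rows of (X1, X2) values.
rows = [(3, 12), (3, 28), (5, 12), (3, 12)]

# Group by X1: collect the X2 values that occur with each X1.
by_x1 = defaultdict(list)
for x1, x2 in rows:
    by_x1[x1].append(x2)

# Group by X1 and X2: count each (X1, X2) pair.
# With 69 possible values per variable there are at most 69 * 69 groups.
by_pair = defaultdict(int)
for x1, x2 in rows:
    by_pair[(x1, x2)] += 1

print(dict(by_x1))
print(dict(by_pair))
```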

37 of 46

38 of 46

X1, X2, X3, X4, … Xn

X1

X2

X3

X4

39 of 46

40 of 46

X1, X2, X3, X4, … Xn

X1

X2

X3

X4

Filter -> we would have to search each variable separately

We need a single variable to look at

41 of 46

How can we find the frequency of a set of numbers?

42 of 46

How many draws in the data contain (12, 3)?

43 of 46

Filter for 23 in the gathered data (tidy data) -> save the draw -> ‘innerjoin’ with the original data

Innerjoin keeps only the rows shared between the two datasets (<-> antijoin)

44 of 46

Filter for 12 in the gathered data and save only the rows with ‘12’

45 of 46

Innerjoin the data with ‘12’ and the original gathered data on ‘draw’

46 of 46

The pair (12, 3) appears in 6 draws in the data
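
The gather, filter, and inner-join steps can be sketched as follows. This is a minimal sketch on made-up draws, so the count it prints is not the count from the real data; the inner join on ‘draw’ is modelled as a set intersection of draw ids.

```python
# Hypothetical draws keyed by draw id; not real Powerball results.
draws = {
    1: (3, 12, 28, 41, 60),
    2: (5, 12, 19, 28, 33),
    3: (3, 8, 28, 45, 69),
    4: (3, 12, 15, 22, 50),
}

# Gather: one (draw, number) row per ball, the "tidy" long format.
gathered = [(draw, n) for draw, nums in draws.items() for n in nums]

# Filter for each number and keep only the draw ids.
draws_with_12 = {draw for draw, n in gathered if n == 12}
draws_with_3 = {draw for draw, n in gathered if n == 3}

# Inner join on draw: keep only the draws present in both filtered sets.
both = draws_with_12 & draws_with_3
print(sorted(both), len(both))
```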