East-west beer price comparison using NHST
Assumptions:
Beer prices in the east end and the west end are normally distributed with unknown means mu_E and mu_W
Variance = sigma^2 = 5 (assumed to be known)
Step 1
H_0: same mean, Δ = 0
H_A: means are different, Δ = -2.62
(Δ = mu_E - mu_W; the value -2.62 is an assumption about the effect size we're interested in detecting)
Step 2
alpha = 0.05
z_\alpha = -1.644853625
Normal CDF lookups: Phi(-1.64) = 0.05050258347, Phi(-3.3375) = 0.0004226786094
beta = 0.2
statistical power = 0.8
z_\beta = 0.8416212327
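The two critical values above can be reproduced with the inverse normal CDF. The following is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the original worksheet.

```python
from statistics import NormalDist

alpha = 0.05  # chosen Type I error rate
beta = 0.2    # chosen Type II error rate, so power = 1 - beta = 0.8

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Left-tail quantile for the significance level.
z_alpha = std_normal.inv_cdf(alpha)
# Quantile corresponding to the desired power.
z_beta = std_normal.inv_cdf(1 - beta)

print(f"z_alpha = {z_alpha:.9f}")
print(f"z_beta  = {z_beta:.9f}")
```

The printed values should agree with the worksheet's z_\alpha and z_\beta to about eight decimal places; small differences in the last digits come from the spreadsheet's own numerical precision.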
Step 3
Need to choose sample size n and cutoff c required to guarantee the chosen error rates \alpha and \beta
c = -1.644853625*SE(n)
c = Δ + 0.8416212327*SE(n)
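As a rough sketch, the two requirements on c can be solved numerically. The code below assumes SE(n) = sqrt(sigma^2/n + sigma^2/n), the standard error of the difference between two sample means with n observations each, which matches the SE used in the SymPy snippet linked in this document. The function and variable names are hypothetical.

```python
from math import sqrt

sigma_sq = 5.0          # known variance in each population
delta = -2.62           # assumed effect size under H_A
z_alpha = -1.644853625  # from Step 2
z_beta = 0.8416212327   # from Step 2


def se(n):
    """Standard error of the difference of two sample means, n per group (assumed form)."""
    return sqrt(sigma_sq / n + sigma_sq / n)


def cutoff_gap(n):
    """Absolute gap between the cutoff from the alpha requirement and from the beta requirement."""
    c_alpha = z_alpha * se(n)
    c_beta = delta + z_beta * se(n)
    return abs(c_alpha - c_beta)


# Closed form: set the two expressions for c equal and solve for n.
n_exact = 2 * sigma_sq * (z_alpha - z_beta) ** 2 / delta ** 2

# Integer scan: pick the n for which the two cutoffs agree most closely.
n_best = min(range(2, 101), key=cutoff_gap)
c = z_alpha * se(n_best)

print(f"n_exact = {n_exact:.6f}, n_best = {n_best}, c = {c:.9f}")
```

The scan and the closed form are two views of the same pair of equations: the closed form gives a non-integer n, and the scan picks the nearest workable integer sample size.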
A. Table solution method
Solving these two equations simultaneously for n and c, we find:

n    SE(n)          -1.644853625*SE(n)    Δ + 0.8416212327*SE(n)
8    1.118033989    -1.839002259          -1.679038856
9    1.054092553    -1.733827958          -1.732853326
10   1              -1.644853625          -1.778378767
11   0.9534625892   -1.568306396          -1.81754564
12   0.9128709292   -1.501539057          -1.851708443
13   0.8770580193   -1.442632062          -1.881849349
14   0.8451542547   -1.39015504           -1.908700234
15   0.8164965809   -1.343017361          -1.932819141
16   0.790569415    -1.300370968          -1.954639994
17   0.7669649888   -1.261545142          -1.974505981
18   0.7453559925   -1.226001506          -1.992692571
19   0.7254762501   -1.19330224           -2.009423784
20   0.7071067812   -1.163087152          -2.024883919

The two cutoff columns agree (to within 0.001) at n = 9:
n = 9
SE(9) = 1.054092553
c = -1.733827958 (via alpha req)
c = -1.732853326 (via beta req)

B. Formula solution method
Setting the two expressions for c equal and using SE(n) = sqrt(2*sigma^2/n) gives
n = 2*sigma^2*(z_\alpha - z_\beta)^2/Δ^2 = 9.00669719

C. Solution using SymPy method
https://live.sympy.org/?evaluate=from%20sympy%20import%20*%0Afrom%20sympy.stats%20import%20P%2C%20E%2C%20variance%2C%20Die%2C%20Normal%2C%20cdf%2C%20density%2C%20std%0A%23--%0An%2C%20c%20%3D%20symbols(%27n%20c%27)%0Asigmasq%20%3D%205%20%20%23%20%0ASE%20%3D%20sqrt(sigmasq%2Fn%20%2B%20sigmasq%2Fn)%0ASE%0A%23--%0A%23%20We%20solve%20for%20c%20and%20n%20simultaneously%20for%20fixed%20significance%20and%20power%20level%0A%23%20solve%20%20eqn1%20%20%20%20and%20%20%20%20%20%20%20eqn2%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20for%20unknowns%20n%20and%20c%20%0Asolve(%5B%20%20c%2B1.644853625*SE%2C%20c%2B2.62%20-%200.8416212327*SE%5D%2C%20%20%20%20%20%20%20%20%20%20%20%20%20n%2C%20%20%20%20c%20)%0A%23--%0A%23%20So%20we%20need%20sample%20size%20n%3D9%2C%20and%20cutofff%0Ac%20%3D%20-1.644853625*SE.subs(n%2C9).n()%0Ac%0A%23--%0A

Data samples collected from the two populations:

           x_E            x_W
           7.7            11.8
           5.9            10
           7              11
           4.8            8.6
           6.3            8.3
           6.3            9.4
           5.5            8
           5.4            6.8
           6.5            8.5
sum x      55.4           82.4
\bar{x}    6.155555556    9.155555556

d = \bar{x}_E - \bar{x}_W = -3
z = -2.846049894
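The observed difference d and the test statistic z can be recomputed from the raw samples. This is a minimal sketch that takes z = d / SE(n), with SE(n) = sqrt(sigma^2/n + sigma^2/n) as the assumed standard error for two samples of equal size; the variable names are illustrative.

```python
from math import sqrt

# Beer prices sampled in the east end and the west end (from the data table).
x_E = [7.7, 5.9, 7, 4.8, 6.3, 6.3, 5.5, 5.4, 6.5]
x_W = [11.8, 10, 11, 8.6, 8.3, 9.4, 8, 6.8, 8.5]

sigma_sq = 5.0  # known variance in each population

mean_E = sum(x_E) / len(x_E)
mean_W = sum(x_W) / len(x_W)

# Observed difference between the sample means.
d = mean_E - mean_W

# Standard error of the difference (known-variance case).
se = sqrt(sigma_sq / len(x_E) + sigma_sq / len(x_W))

# Standardised test statistic under H_0 (Δ = 0).
z = d / se

print(f"mean_E = {mean_E:.9f}, mean_W = {mean_W:.9f}")
print(f"d = {d:.6f}, SE = {se:.9f}, z = {z:.9f}")
```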
Step 4
Decision = reject H_0 because d < c
(equivalently because z < z_\alpha)
Step 5 -- computing the p-value
p-value = 0.002213262929
statistically significant since p-value < alpha-level chosen
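The p-value above is consistent with a one-sided (left-tail) probability under the standard normal, matching the left-tail cutoff used in the decision rule. A short sketch under that assumption, with illustrative variable names:

```python
from statistics import NormalDist

z = -2.846049894  # observed test statistic
alpha = 0.05      # chosen significance level

# Left-tail probability of a statistic at least as extreme as z under H_0.
p_value = NormalDist().cdf(z)

significant = p_value < alpha

print(f"p-value = {p_value:.12f}, significant at alpha={alpha}: {significant}")
```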
Step 6 -- Confidence interval of the effect size
\gamma = 0.1 (= 90% confidence interval)
1-\gamma/2 = 0.95
z_{1-\gamma/2} = 1.644853625
variance of CI = 1.111111111
sqrt of variance of CI = 1.054092553
CI centre = -3
CI lower = -4.733827958
CI upper = -1.266172042
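The interval bounds can be checked with a few lines of Python. This sketch assumes the interval is d ± z_{1-\gamma/2} * SE, with SE = sqrt(sigma^2/n + sigma^2/n) and n = 9 per group; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

gamma = 0.1     # 1 - confidence level, i.e. a 90% interval
d = -3.0        # observed difference of sample means (CI centre)
sigma_sq = 5.0  # known variance in each population
n = 9           # sample size per group

# Two-sided critical value for the chosen confidence level.
z_crit = NormalDist().inv_cdf(1 - gamma / 2)

# Variance and standard error of the difference of means.
var_d = sigma_sq / n + sigma_sq / n
se = sqrt(var_d)

ci_lower = d - z_crit * se
ci_upper = d + z_crit * se

print(f"variance = {var_d:.9f}, SE = {se:.9f}")
print(f"90% CI for the effect size: [{ci_lower:.9f}, {ci_upper:.9f}]")
```

The interval excludes 0, which is consistent with the decision to reject H_0.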
