Statistics
From BSCS: Interaction of experiments and ideas, 2nd Edition. Prentice Hall, 1970 and Statistics for the Utterly Confused by Lloyd Jaisingh, McGraw-Hill, 2000
What is statistics?
Statistics deals with numbers
Can you figure out…
Populations and Samples
Sample Populations avoiding Bias
Is there bias?
Statistical Computations (the Math)
Mean: the sum of all the scores divided by the total number of scores.
http://en.wikipedia.org/wiki/Table_of_mathematical_symbols
Looking at profile of data: Distribution
Class (height of plants-cm) | Number of plants in each class |
0.0-0.9 | 3 |
1.0-1.9 | 10 |
2.0-2.9 | 21 |
3.0-3.9 | 30 |
4.0-4.9 | 20 |
5.0-5.9 | 14 |
6.0-6.9 | 2 |
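The frequency table above can be sketched as a quick text histogram. This is only an illustration of how the distribution chart is built from the class counts; the function name text_histogram is made up for this sketch.

```python
# Frequency table from the slide: class of plant height (cm) -> number of plants.
height_classes = [
    ("0.0-0.9", 3),
    ("1.0-1.9", 10),
    ("2.0-2.9", 21),
    ("3.0-3.9", 30),
    ("4.0-4.9", 20),
    ("5.0-5.9", 14),
    ("6.0-6.9", 2),
]


def text_histogram(classes):
    """Return one line per class, with one '#' per plant in that class."""
    return [f"{label} cm | {'#' * count} ({count})" for label, count in classes]


total_plants = sum(count for _, count in height_classes)
# The class with the largest count is the peak of the histogram.
modal_class = max(height_classes, key=lambda item: item[1])[0]

for line in text_histogram(height_classes):
    print(line)
print("Total plants:", total_plants)
print("Class with the most plants:", modal_class)
```

The bars rise to a peak near the middle class and fall away on both sides, which is the bell shape the chart shows.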
Distribution Chart of Heights of 100 Control Plants
Histogram-Frequency Distribution Charts
This is called a “normal” curve or a bell curve
This is an “idealized,” theoretical curve: the shape that the distribution derived from a sample would approach if the number of measurements were infinite
Mode and Median
Variance (s²)
http://www.mnstate.edu/wasson/ed602calcvardevs.htm
Calculating the variance for a whole population
σ² = Σ (X − µ)² / N
Σ = sum of; X = score or value;
µ = mean; N = total number of scores or values
OR use the VARP function in Excel (the VAR function divides by n − 1 and gives the sample variance)
Calculating the variance for a SAMPLE of a population
s² = Σ (xi − x̄)² / (n − 1)
Σ = sum of; xi = score or value;
n − 1 = total number of scores or values minus 1
x̄ (often read as “x bar”) is the mean (average value of xi).
Note the sample variance is larger than the value obtained by dividing by n…why?
Heights in Centimeters of Five Randomly Selected Pea Plants Grown at 8-10 °C
Plant | Height (cm) | Deviation from mean | Square of deviation from mean |
 | (xi) | (xi − x̄) | (xi − x̄)² |
A | 10 | 2 | 4 |
B | 7 | -1 | 1 |
C | 6 | -2 | 4 |
D | 8 | 0 | 0 |
E | 9 | 1 | 1 |
Total | Σ xi = 40 | Σ (xi − x̄) = 0 | Σ (xi − x̄)² = 10 |
xi = score or value; x̄ (x bar) = mean = 40/5 = 8; Σ = sum of
Variance helps to characterize the data concerning a sample by indicating the degree to which individual members within the sample vary from the mean
Finish Calculating the Variance
Σ xi = 40 | Σ (xi − x̄) = 0 | Σ (xi − x̄)² = 10 |
There were five plants: n = 5, therefore n − 1 = 4
So s² = Σ (xi − x̄)² / (n − 1) = 10/4 = 2.5
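The same calculation can be checked in a few lines of Python. This is a minimal sketch using the five pea plant heights from the table; the variable names are illustrative. It also shows the value obtained by dividing by n instead of n − 1, for comparison.

```python
import statistics

heights_cm = [10, 7, 6, 8, 9]  # plants A-E from the table

mean_height = statistics.mean(heights_cm)
squared_deviations = [(x - mean_height) ** 2 for x in heights_cm]
sum_of_squares = sum(squared_deviations)

# Sample variance divides by n - 1; population variance divides by n.
sample_variance = sum_of_squares / (len(heights_cm) - 1)
population_variance = sum_of_squares / len(heights_cm)

print("Mean:", mean_height)
print("Sum of squared deviations:", sum_of_squares)
print("Sample variance (n - 1):", sample_variance)
print("Population variance (n):", population_variance)

# The standard library gives the same sample variance directly.
print("statistics.variance:", statistics.variance(heights_cm))
```

Dividing by the smaller number n − 1 always gives the larger result, and the gap shrinks as the sample grows.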
r² or R²…
is the fraction of the variation in the values of y that is explained by the least-squares regression line of y on x.
Graph: Grades (y-axis) plotted against Class Attendance (x-axis)
Example: If r² = 0.61 in the graph to the left, this means that about 61% of the variation in grades is accounted for by the linear relationship with attendance. The other 39% could be due to a multitude of factors. Note that r² is not a confidence level: it measures how closely the data follow the regression line. The closer r² is to 1, the stronger the correlation.
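A rough sketch of how r² comes out of a least-squares line is given here. The attendance and grade numbers are hypothetical, invented only for illustration (the slide's graph data are not available), and the function name r_squared is made up for this sketch.

```python
def r_squared(x_values, y_values):
    """Fraction of the variation in y explained by the least-squares line of y on x."""
    n = len(x_values)
    mean_x = sum(x_values) / n
    mean_y = sum(y_values) / n

    # Slope and intercept of the least-squares regression line.
    sxx = sum((x - mean_x) ** 2 for x in x_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_values, y_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Total variation in y, and the part the line leaves unexplained.
    ss_total = sum((y - mean_y) ** 2 for y in y_values)
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(x_values, y_values)
    )
    return 1 - ss_residual / ss_total


# Hypothetical data: percent of classes attended and final grade (percent).
attendance_pct = [60, 70, 75, 80, 90, 95]
grade_pct = [65, 62, 80, 74, 85, 92]

r2 = r_squared(attendance_pct, grade_pct)
print(f"r^2 = {r2:.2f}")
```

Points lying exactly on a straight line would give r² = 1; scattered points with no linear trend would give a value near 0.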
Standard Deviation
The square root of the variance: s = √2.5 ≈ 1.58 (about 1.6)
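As a quick check, the standard deviation of the five pea plant heights can be computed in Python. This is a minimal sketch; the variable names are illustrative.

```python
import math
import statistics

heights_cm = [10, 7, 6, 8, 9]  # the five pea plants

sample_variance = statistics.variance(heights_cm)  # divides by n - 1
standard_deviation = math.sqrt(sample_variance)

print("Variance:", sample_variance)
print("Standard deviation:", round(standard_deviation, 2))
```

The standard deviation is in the same units as the data (cm), which is why it is easier to interpret than the variance (cm²).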
What does “S” mean?
Pea Plant Normal Distribution Curve with Std Dev
The Normal Curve and Standard Deviation
http://classes.kumc.edu/sah/resources/sensory_processing/images/bell_curve.gif
A normal curve:
Each vertical line is a unit of standard deviation
About 68% of values fall within +1 or −1 standard deviation of the mean
About 95% of values fall within +2 and −2 standard deviations
Nearly all members (>99%) fall within 3 standard deviations
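These percentages can be reproduced from the normal curve itself. The sketch below uses the standard relationship between the normal distribution and the error function (math.erf); the function name fraction_within is made up for this example.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} standard deviations: {fraction_within(k):.1%}")
```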
Standard Error of the Sample Means, AKA Standard Error
A Simple Method for estimating standard error
Standard error is the calculated standard deviation divided by the square root of the sample size (the number of individuals in the sample)
Standard error of the means is used to test the reliability of the data
Example… If there are 10 corn plants with a standard deviation of 0.2
Standard error (sx̄) = 0.2/√10 = 0.2/3.16 = 0.063
0.063 represents one standard deviation of the sample means for samples of 10 plants
If there were 100 plants the standard error would drop to 0.2/√100 = 0.02. Why?
Because when we take larger samples, our sample means get closer to the true mean value of the population. Thus, the distribution of the sample means would be less spread out and would have a lower standard deviation.
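The effect of sample size can be sketched in Python, assuming the same standard deviation of 0.2 for both sample sizes. The function name standard_error is illustrative.

```python
import math


def standard_error(standard_deviation, sample_size):
    """Standard deviation divided by the square root of the sample size."""
    return standard_deviation / math.sqrt(sample_size)


se_10 = standard_error(0.2, 10)    # 10 corn plants
se_100 = standard_error(0.2, 100)  # 100 corn plants

print(f"n = 10:  standard error = {se_10:.3f}")
print(f"n = 100: standard error = {se_100:.3f}")
```

Making the sample ten times larger shrinks the standard error by a factor of √10, about 3.16.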
Probability Tests
Laws of Probability
Laws of Probability (continued)
The Use of the Null Hypothesis
T-test or Chi Square? Testing the validity of the null hypothesis
T-test
STUDENT’S T TEST
http://www.experiment-resources.com/students-t-test.html
EXAMPLE
The student’s t test can then be used to try and disprove the null hypothesis.
RESTRICTIONS
RESULTS
by Martyn Shuttleworth (2008).
Use the t-test to determine whether samples A and B came from the same population or from different populations
t = (x̄1 − x̄2) / s(x̄1 − x̄2)
x̄1 = mean of A; x̄2 = mean of B
s(x̄1 − x̄2) = standard error of the difference between the means = √(sx̄1² + sx̄2²)
sx̄1 = std error of A; sx̄2 = std error of B
Example: Sample A mean = 8
Sample B mean = 12
Std error of the difference between the means = 1
t = (12 − 8)/1 = 4 standard error units
Comparison of A and B
B’s mean lies 4 standard error units from A’s mean, so there is less than a 1% chance that it belongs to the normal distribution curve of population A
Reject the Null Hypothesis
http://www.physics.csbsju.edu/stats/t-test_bulk_form.html calculates this online for you and also draws a box plot; see also http://www.graphpad.com/quickcalcs/ttest1.cfm
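The t statistic to test whether the means are different can be calculated as follows:
t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
where s1² and s2² are the sample variances and n1 and n2 are the sample sizes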
Amount of O2 Used by Germinating Seeds of Corn and Pea Plants (mL O2/hour at 25 °C)
Reading Number | Corn | Pea |
1 | 0.20 | 0.25 |
2 | 0.24 | 0.23 |
3 | 0.22 | 0.31 |
4 | 0.21 | 0.27 |
5 | 0.25 | 0.23 |
6 | 0.24 | 0.33 |
7 | 0.23 | 0.25 |
8 | 0.20 | 0.28 |
9 | 0.21 | 0.25 |
10 | 0.20 | 0.30 |
Total | 2.20 | 2.70 |
Mean | 0.22 | 0.27 |
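Sum of squared deviations, Σ (xi − x̄)² | 0.0028 | 0.0106 |
Variance (s²) = sum of squared deviations / (n − 1) | 0.000311 | 0.001178 |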
Excel file located in AccBio file folder
How to do this all in EXCEL
Ho = null hypothesis. If the calculated t value is larger than the chart value (it falls in the yellow regions), reject the null hypothesis and accept HA, that there is a difference between the means of the two groups: there is a significant difference between the treatment group and the control group.
T table of values (5% = 0.05)
For example:
For 10 degrees of freedom (2N − 2)
the chart value to compare your t value to is 2.228
If your calculated t value is between +2.228 and −2.228,
then accept (fail to reject) the null hypothesis: the means are similar
If your t value falls outside +2.228 and −2.228 (larger than 2.228 or smaller than −2.228),
reject the null hypothesis (accept the alternative hypothesis): there is a significant difference.
So if the mean of the corn = 0.22 and the mean of the peas = 0.27,
the variance (s²) of the corn is 0.000311 and that of the peas is 0.001178.
Each sample size is ten (n = 10).
Then:
t = (0.22 − 0.27) / √((0.000311 + 0.001178)/10)
= −0.05 / √(0.001489/10)
= −0.05 / √0.0001489
= −0.05 / 0.0122
(ignore negative sign)
t = 4.10
Df = 2N − 2 = 2(10) − 2 = 18
Chart value = 2.101
The t value is higher than the chart value… reject the null hypothesis: there is a difference in the means.
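The arithmetic above can be sketched in Python from the summary values on the slide (means, variances and sample size), not from the raw readings. The function name t_statistic is made up for this sketch, and the critical value is typed in from a standard t table rather than computed.

```python
import math


def t_statistic(mean_1, mean_2, variance_1, variance_2, n):
    """t for two samples of equal size n: difference in means over its standard error."""
    standard_error_of_difference = math.sqrt((variance_1 + variance_2) / n)
    return (mean_1 - mean_2) / standard_error_of_difference


# Summary values from the slide: corn first, then peas.
t = abs(t_statistic(0.22, 0.27, 0.000311, 0.001178, 10))  # ignore the sign
degrees_of_freedom = 2 * 10 - 2

# Assumed chart value: two-tailed, 5% level, 18 degrees of freedom.
critical_value = 2.101
reject_null = t > critical_value

print(f"t = {t:.2f}, df = {degrees_of_freedom}")
print("Reject the null hypothesis:", reject_null)
```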
The “z” test
-used if each of your samples has more than 30 members
-formula: note: “σ” (sigma) is used instead of the letter “s”
z = (mean of pop #1 − mean of pop #2) / √(variance of pop #1/n1 + variance of pop #2/n2)
Also note that if you only have the standard deviation, you can square that value and substitute it for the variance
Z table (sample table with 3 probabilities)
α | Zα (one tail) | Zα/2 (two tails) |
0.1 | 1.28 | 1.64 |
0.05 | 1.645 | 1.96 |
0.01 | 2.33 | 2.576 |
Z table use:
α = alpha, the significance level (the probability of rejecting a null hypothesis that is actually true): 10%, 5% and 1%
Zα (z alpha): the rejection region is on one side of the normal distribution curve only (“one tail”); it can be left of the mean or right of the mean. Use it when your alternative hypothesis predicts a direction (greater than, or less than)
Zα/2 (z alpha over 2): refers to an experiment where your null hypothesis predicts no difference between the means of the control and experimental groups (no difference expected). Your alternative hypothesis is looking for a significant difference in either direction
Use a one-tail test to show that sample mean A is significantly greater than (or less than) sample mean B. Use a two-tail test to show a significant difference (either greater than or less than) between sample mean A and sample mean B.
Example z-test
z = (85 − 83) / √(3²/75 + 2²/60)
= 2/0.432 = 4.629
Example continued
z = 4.629
Ho = null hypothesis: Method 1 is not better than Method 2
HA = alternative hypothesis: Method 1 is better than Method 2
This is a one-tailed z test (since the alternative hypothesis predicts a difference in one direction, not just any difference)
So for a probability of 0.05 (5% significance or 95% confidence), the chart value is Zα = 1.645
4.629 is greater than 1.645. For the null hypothesis (Method 1 is not better) to stand, the value had to be less than 1.645; it is not, therefore reject the null hypothesis: Method 1 is indeed better
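The example can be sketched in Python. The inputs are read off the worked calculation (means 85 and 83, standard deviations 3 and 2, sample sizes 75 and 60); the function name z_statistic is made up for this sketch, and the chart value is typed in from the Z table.

```python
import math


def z_statistic(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Difference in means divided by the standard error of that difference."""
    # Squaring each standard deviation gives the variance used in the formula.
    return (mean_1 - mean_2) / math.sqrt(sd_1**2 / n_1 + sd_2**2 / n_2)


z = z_statistic(85, 83, 3, 2, 75, 60)

critical_one_tail = 1.645  # Z table, alpha = 0.05, one tail
reject_null = z > critical_one_tail

print(f"z = {z:.3f}")
print("Reject the null hypothesis:", reject_null)
```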
Chi square
Interpreting a chi square
How to use a chi square chart
http://faculty.southwest.tn.edu/jiwilliams/probab2.gif
Interpreting your chi square calculation.
Ask yourself this question:
Is your calculated value less than or equal to the chart value for the degrees of freedom?
Example: is 0.931 ≤ 3.84 (p = 0.05, df = 1)?
If the answer is yes, then there is no significant difference between your observed and expected values, and you can accept the null hypothesis (ex. cats show no preference between wet and dry food).
If the answer is no, then there is a significant difference between your observed and expected values, and you can REJECT the null hypothesis and ACCEPT the alternate hypothesis (ex. cats prefer wet food to dry food).
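A minimal sketch of the calculation and the decision follows. The cat counts are hypothetical, invented only for illustration (they are not the data behind the 0.931 above), and the function name chi_square is made up for this sketch.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts: 50 cats choosing between wet and dry food.
observed = [28, 22]  # wet, dry
expected = [25, 25]  # no preference: half choose each

chi2 = chi_square(observed, expected)

critical_value = 3.84  # chart value for p = 0.05, df = 1
significant = chi2 > critical_value

print(f"chi square = {chi2:.2f}")
print("Significant difference:", significant)
```

With these made-up counts the calculated value is below the chart value, so the null hypothesis of no preference would be accepted.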
What to do if you have more than 250 data points, besides panic?