1 of 65

AP Biology Experimental Design and Statistics

Adapted in part from BSCS: Interaction of experiments and ideas, 2nd Edition, Prentice Hall, 1970, and Statistics for the Utterly Confused by Lloyd Jaisingh, McGraw-Hill, 2000

Using BioInteractive Resources to Teach Mathematics and Statistics in Biology by Paul Strode, PhD, and Ann Brokaw, 2015

Error bars in experimental biology, Geoff Cumming, Fiona Fidler, and David L. Vaux, The Journal of Cell Biology, Vol. 177, No. 1, April 9, 2007, 7-11

2 of 65

Basic Format of the Scientific Method that we will use:

1: Hypothesis:

2: Defining the Independent Variable and Dependent Variable.

3: Experimental design and defining controls.

4: Data collection:

5: Data Analysis and Statistical Analysis.

6: Conclusion and Error analysis.

3 of 65

Everything that we have learned in science we have obtained from carefully planned and executed experiments. All experiments begin with a Hypothesis.

An experiment is used to test a hypothesis by providing data that is analyzed using Statistics.

Hypothesis - A proposed explanation based on limited evidence. It must be testable to be a hypothesis. It is based on previous observations that have not been explained. IT IS NOT AN EDUCATED GUESS!

Theory - A well-confirmed explanation of nature, supported by the overwhelming consensus of the scientific community. A Theory is able to predict the outcome of controlled events. Scientific theories are the most reliable, rigorous, and comprehensive form of scientific knowledge.

4 of 65

Examples: Theory of Evolution, Quantum Theory, Plate Tectonics, Big Bang Theory

Law - Describes, from observations, “what happens” under a set of definite conditions. Often described with formulas: E = mc²

Examples: Law of Gravity, Laws of Thermodynamics, Law of Independent Assortment

5 of 65

Hypothesis - A proposed explanation based on limited evidence. It must be testable to be a hypothesis. It is based on previous observations that have not been explained. A hypothesis can become a Theory, but it usually does not! Most hypotheses in experiments lead to more refined and better hypotheses in other experiments.

Theory - A well-confirmed explanation of nature, supported by the overwhelming consensus of the scientific community. A Theory is able to predict the outcome of controlled events.

Law - Describes, from observations, “what happens” under a set of definite conditions. Often described with formulas: E = mc²

Theories never become Laws!!!! Theories explain why, while Laws just describe what happens!

In our investigations we will primarily test the validity of our hypothesis with Statistics (MATH!!).

6 of 65

Basic Format of the Scientific Method that we will use:

1: Hypothesis:

2: Defining the Independent Variable and Dependent Variable.

3: Experimental design and defining controls.

4: Data collection:

5: Data Analysis and Statistical Analysis.

6: Conclusion and Error analysis.

7 of 65

Types of properties that are evaluated as variables in experiments:

Qualitative property - a property that can be observed but not measured numerically. It is described by the quality of something rather than its quantity. The value can be comparative or all-or-none.

Examples:

- Water has lead in it.

- The line graph has an increasing value (positive).

- The solution is warm.

- Mr. Grodski smells.

Quantitative property - a property that can be measured numerically. It can demonstrate the degree of a qualitative property.

Examples:

- Water has 7 ppm of lead in it.

- The line graph has a slope of +0.87 y/x units.

- The solution has a temperature of 27℃.

- Mr. Grodski's odor temporarily paralyzed 30 ducks in a controlled experiment.

8 of 65

Before we get to the quantitative values, we need to review how graphs are constructed. In the graph below, the number of birds with each beak depth was investigated. The researchers wanted to see how the beak depth of the birds changed during the drought. The dependent variable was the number of birds, while the independent variable was the beak depth. We usually place the dependent variable on the y-axis and the independent variable on the x-axis.

The number of birds depended on beak depth (that they had).

The beak depth DID NOT depend on the number of birds.

9 of 65

Independent Variable - Represents the factors that cause the results. It is an input that is changed to test whether the dependent variable changes. In some cases this variable may not produce or cause any significant change.

Dependent Variable - Represents the results or the output of the experiment, which are the quantitative things that you measure. In some cases this variable may not change or be affected.

This is the data from a study that investigated the relationship between the number of seeds produced and a plant's biomass (size).

IV: ? Biomass of the plants

DV: ? Number of seeds

The number of seeds depends on the biomass of the plant!

10 of 65

Basic Format of the Scientific Method that we will use:

1: Hypothesis:

2: Defining the Independent Variable and Dependent Variable.

3: Experimental design and defining controls.

4: Data collection:

5: Data Analysis and Statistical Analysis.

6: Conclusion and Error analysis.

We will work with these principles in Lab

11 of 65

Part 1: Descriptive Statistics Used in Biology

Scientists typically collect data on a sample of a population and use these data to draw conclusions, or make inferences, about the entire population. An example of such a data set is shown in Table 1. It shows beak measurements taken from two groups of medium ground finches that lived on the island of Daphne Major, one of the Galápagos Islands, during a major drought in 1977. One group of finches died during the drought, and one group survived.

12 of 65

13 of 65

Table 1 (continued)

14 of 65

How would you describe the data in Table 1, and what does it tell you about the populations of medium ground finches of Daphne Major?

These are difficult questions to answer by looking at a table of numbers.

15 of 65

One of the first steps in analyzing a small data set like the one shown is to graph the data and examine the distribution.

16 of 65

17 of 65

Notice that the measurements tend to be more or less symmetrically distributed across a range, with most measurements around the center of the distribution. This is a characteristic of a normal distribution. Most statistical methods covered here apply to data that are normally distributed, like the beak measurements above; other types of distributions require either different kinds of statistics or transforming data to make them normally distributed.

18 of 65

How would you describe these two graphs? How are they the same or different? Descriptive statistics allows you to describe and quantify these differences.

Graphing the data will help determine trends and patterns, but we need quantitative values to interpret the data and come up with conclusions that support or oppose the hypothesis tested in an experiment.

19 of 65

What is Statistics?

  • A branch of mathematics that provides techniques to analyze whether or not your data is significant (meaningful)

  • Statistical applications are based on probability statements

  • Nothing is “proved” with statistics

  • Statistics are reported

  • Statistics report the probability that similar results would occur if you repeated the experiment

20 of 65

Statistics deals with numbers (IT IS QUANTITATIVE!)

  • Need to know the nature of the numbers collected

Continuous variables: type of numbers associated with measuring or weighing; any value in a continuous interval of measurement.

Examples: Weight of students, height of plants, time to flowering

Discrete variables: type of numbers that are counted or categorical

Examples: Numbers of boys, girls, insects, plants

21 of 65

Measures of Average: Mean, Median, and Mode

In the two graphs below, the center and spread of each distribution are different. The center of the distribution can be described by the mean, median, or mode. These are referred to as measures of central tendency.

22 of 65

Measures of Average: Mean, Median, and Mode

Mean

You calculate the sample mean (also referred to as the average or arithmetic mean) by summing all the data points in a data set (ΣX) and then dividing this number by the total number of data points (N):

𝑥̅ = ΣX / N

What scientists want to understand is the mean of the entire population, which is represented by μ. They use the sample mean, represented by 𝑥̅, as an estimate of μ.

Samples are selections from a population that are used to make inferences about the larger population. Statistics describe samples and serve as estimators of the corresponding population values.

Many times, finding complete information about a population is costly and time consuming. We can use samples to represent a population.

The sample must be representative of the entire population, which requires random sampling.

23 of 65

Avoiding Bias

Individuals in a sample must be a fair representation of the entire population.

Therefore sample members must be randomly selected (to avoid bias).

Example: if you were looking at strength in students, picking students from the football team would NOT be random.

24 of 65

Avoiding Bias - Which examples would not produce random samples?

1: A cage has 1000 rats; you pick the first 20 you can catch for your experiment.

2: A public opinion poll is conducted using the telephone directory.

3: You are conducting a study of a new diabetes drug; you advertise for participants in the newspaper and on TV.

  • All are biased. Rats: you grab the slower rats. Telephone: you call only people with a phone (wealth?) and people who are listed (responsible?). Newspaper/TV: you reach only people with a newspaper (wealth/education?) and a TV (wealth?).
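As a contrast to the biased rat example, here is a minimal sketch of how a random sample could be drawn instead. It assumes each of the 1000 rats carries a numbered ID tag from 1 to 1000 (the tags are hypothetical, not part of the original example), so that every rat has the same chance of being chosen.

```python
import random

# ASSUMPTION: each rat has a hypothetical ID tag numbered 1..1000.
rat_ids = list(range(1, 1001))

# A fixed seed is used only so this illustration is repeatable;
# a real experiment would not need to fix the seed.
rng = random.Random(42)

# Draw 20 distinct rats; every rat is equally likely to be picked,
# unlike "the first 20 you can catch".
chosen = rng.sample(rat_ids, 20)

print(sorted(chosen))
```

The point of the sketch is that the choice is made by a random number generator rather than by which rats are easiest to catch.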

25 of 65

If you are using a sample of a population:

Arithmetic Mean (average)

The mean shows that half the members of the population fall on either side of an estimated value: the mean.*

*This assumes that we have a normal distribution (curve) of values.

26 of 65

Students in a biology class planted eight bean seeds in separate plastic cups and placed them under a bank of fluorescent lights. Fourteen days later, the students measured the height of the bean plants that grew from those seeds and recorded their results below:

7.5 + 10.1 + 8.3 + 9.8 + 5.7 + 10.3 + 9.2 + 8.7 = 69.6 centimeters

mean = 69.6 cm/8 = 8.7 centimeters

The mean for this sample of eight plants is 8.7 centimeters and serves as an estimate for the true mean of the population of bean plants growing under these conditions. In other words, if the students collected data from hundreds of plants and graphed the data, the center of the distribution should be around 8.7 centimeters.
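The same arithmetic can be checked with a short Python sketch. It uses only the eight bean plant heights given above; the variable names are chosen here for illustration.

```python
# Bean plant heights (cm) from the example above
heights_cm = [7.5, 10.1, 8.3, 9.8, 5.7, 10.3, 9.2, 8.7]

total = sum(heights_cm)      # sum of all data points (ΣX)
n = len(heights_cm)          # number of data points (N)
sample_mean = total / n      # sample mean = ΣX / N

# Rounded to one decimal place, as on the slide
print(f"sum = {total:.1f} cm, N = {n}, mean = {sample_mean:.1f} cm")
```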

27 of 65

Median

When the data are ordered from the smallest to the largest, the median is the midpoint of the data. It is not distorted by extreme values, even when the distribution is not normal. For this reason, it may be more useful for you to use the median as the main descriptive statistic for a sample of data in which some of the measurements are extremely large or extremely small.

To determine the median of a set of values, you first arrange them in numerical order from lowest to highest. The middle value in the list is the median. If there is an even number of values in the list, then the median is the mean of the middle two values.
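The rule for an even number of values can be sketched in Python. As an illustration, this reuses the eight bean plant heights from the earlier mean example (7.5, 10.1, 8.3, 9.8, 5.7, 10.3, 9.2, 8.7 cm); applying the median to that data set is a choice made here, not something done on the original slides.

```python
from statistics import median

# Eight bean plant heights (cm), reused from the mean example
heights_cm = [7.5, 10.1, 8.3, 9.8, 5.7, 10.3, 9.2, 8.7]

ordered = sorted(heights_cm)   # arrange from lowest to highest
n = len(ordered)

if n % 2 == 1:
    # Odd number of values: the middle value is the median
    manual_median = ordered[n // 2]
else:
    # Even number of values: mean of the two middle values
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# The standard library gives the same answer
library_median = median(heights_cm)

print(f"ordered = {ordered}")
print(f"median = {manual_median:.2f} cm")
```

With eight values, the two middle values are 8.7 and 9.2, so the median is their mean.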

28 of 65

Median

A researcher studying mouse behavior recorded in Table 3 the time (in seconds) it took 13 different mice to locate food in a maze.

Length of Time for Mice to Locate Food in a Maze

Median = 32 seconds

29 of 65

mean = 41 seconds

In this case, the median is 32 seconds, but the mean is 41 seconds, which is longer than all but one of the mice took to search for food. In this case, the mean would not be a good measure of central tendency unless the really slow mouse is excluded from the data set.

The mean is not a good measure of central tendency in this case because the values are not evenly distributed, owing to the large value for mouse number 3.

30 of 65

Mode

The mode is another measure of the average. It is the value that appears most often in a sample of data. In the example shown the mode is 33 seconds.

MODE = 33

31 of 65

Mode

The mode is not typically used as a measure of central tendency in biological research, but it can be useful in describing some distributions. The graph at the left shows a distribution of body lengths with two peaks, or modes, called a bimodal distribution.

Clearly there is not one central tendency here but two. Describing these data with a measure of central tendency like the mean or median would obscure this fact.

32 of 65

Measures of Variability: Range, Standard Deviation, and Variance

Variability describes the extent to which numbers in a data set diverge from the central tendency. It is a measure of how “spread out” the data are from the mean.

A mean, median, or mode does not give a sense of how far apart the values are in the sample. The amount of variability (spread) is important to know because it lets you analyze data more effectively by providing insight into the population from which you gathered information.

33 of 65

For Example:

Here are 2 sets of data that have the same mean.

Will a statistic on variability help us view these populations differently? If we just reported the mean would we be missing something?

34 of 65

Range

The simplest measure of variability in a sample of normally distributed data is the range, which is the difference between the largest and smallest values in a set of data.

Width of Maple Tree Leaves

Students in a biology class measured the width in centimeters of eight leaves from eight different maple trees and recorded their results in the table below.

I. Identify the largest and smallest values in the data set:

largest = 10.3 centimeters, smallest = 5.7 centimeters

II. To determine the range, subtract the smallest value from the largest value:

range = 10.3 centimeters - 5.7 centimeters = 4.6 centimeters

35 of 65

Range = 10.3 centimeters – 5.7 centimeters = 4.6 centimeters

A larger range value indicates a greater spread of the data; in other words, the larger the range, the greater the variability. However, an extremely large or small value in the data set will make the variability appear high. For example, if the maple leaf sample had not included the very small leaf number 5, the range would have been only 2.8 centimeters. The variance and standard deviation provide a more reliable measure of the “true” spread of the data.
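The effect of one extreme value on the range can be sketched in Python. The table of leaf widths is not reproduced in this text, so the eight values below are an assumption: they are chosen to agree with the stated largest value (10.3 cm), smallest value (5.7 cm, leaf number 5), and the 2.8 cm range without leaf 5.

```python
# ASSUMPTION: illustrative leaf widths (cm); the original table is not shown.
# They match the stated largest (10.3), smallest (5.7, leaf 5),
# and the range of 2.8 cm when leaf 5 is left out.
leaf_widths_cm = [7.5, 10.1, 8.3, 9.8, 5.7, 10.3, 9.2, 8.7]


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


full_range = data_range(leaf_widths_cm)

# Drop leaf number 5 (index 4) to see how much one small leaf matters
without_leaf_5 = leaf_widths_cm[:4] + leaf_widths_cm[5:]
reduced_range = data_range(without_leaf_5)

print(f"range with all leaves = {full_range:.1f} cm")
print(f"range without leaf 5  = {reduced_range:.1f} cm")
```

A single unusual leaf changes the range substantially, which is why the range alone is a fragile measure of spread.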

Width of Maple Tree Leaves

36 of 65

Variance (s²)

The variance mathematically expresses the degree of variation of scores (data) from the mean.

A large variance means that the individual scores (data) of the sample deviate a lot from the mean.

A small variance indicates that the scores (data) deviate little from the mean.

37 of 65

Variance (s²)

Notice that each deviation from the mean is just a data value minus the mean.

The deviations are then squared so that any negative values (below the mean) become positive values.

The squared deviations are then added together.

38 of 65

Variance (s²)

  • the degree to which individual members within the sample vary from the mean

There were 5 plants in the sample, thus n - 1 = 4 (the degrees of freedom).

10/4 = 2.5 = Variance

39 of 65

Standard Deviation = s

Notice that the standard deviation (s) is just the square root of the variance (s²).

The standard deviation (s) is a more powerful descriptive statistic than the variance (s²) because it reveals the predicted limits of finding a particular value.

In our previous example the variance (s²) = 2.5, thus s = √2.5 ≈ 1.6

A standard deviation of 1.6 cm means that we can predict that about 68% of all the pea plants will have heights within 1.6 cm below or above the mean (+/- 1.6 cm).
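The steps from deviations to variance to standard deviation can be sketched in Python. The individual plant heights are not listed in this text, so the five values below are hypothetical: they are chosen only because they reproduce the figures used above (mean of 8 cm, sum of squared deviations of 10, variance of 2.5).

```python
from math import sqrt
from statistics import stdev, variance

# HYPOTHETICAL heights (cm), picked to give mean = 8 and
# sum of squared deviations = 10, as in the worked example.
heights_cm = [6, 7, 8, 9, 10]

n = len(heights_cm)
mean_height = sum(heights_cm) / n

# Deviation of each value from the mean, then squared
squared_devs = [(x - mean_height) ** 2 for x in heights_cm]
sum_sq = sum(squared_devs)

# Sample variance divides by the degrees of freedom, n - 1
s2 = sum_sq / (n - 1)

# Standard deviation is the square root of the variance
s = sqrt(s2)

print(f"sum of squares = {sum_sq}, variance = {s2}, SD = {s:.2f}")

# Cross-check against the standard library's sample statistics
lib_s2 = variance(heights_cm)
lib_s = stdev(heights_cm)
```

The standard deviation comes out near 1.58 cm, which the slides round to 1.6 cm.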

40 of 65

[Figure: normal curve of pea plant heights with mean = 8 cm and S = 1.6 cm. The x-axis is marked at -3, -2, -1, +1, +2, and +3 standard deviations: 3.2, 4.8, 6.4, 9.6, 11.2, and 12.8 cm.]

68%: We can predict that 68% of the population will have heights between 6.4 and 9.6 cm. This is within +/- 1 Standard Deviation of the mean (8 cm).

95%: We can predict that 95% of the population will have heights between 4.8 and 11.2 cm. This is within +/- 2 Standard Deviations of the mean (8 cm).

99%: The probability of finding a pea plant above 12.8 cm or below 3.2 cm (more than 3 Standard Deviations from the mean) is less than 1%.
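The 68%, 95%, and 99% figures can be checked with a short Python sketch. It assumes the heights are exactly normally distributed with the mean (8 cm) and standard deviation (1.6 cm) from the example, and uses the standard library's `NormalDist` to compute the share of a normal curve within 1, 2, and 3 standard deviations.

```python
from statistics import NormalDist

mean_cm = 8.0   # mean from the pea plant example
s_cm = 1.6      # standard deviation from the pea plant example

std_normal = NormalDist()   # standard normal curve (mean 0, SD 1)

rows = []
for k in (1, 2, 3):
    low = mean_cm - k * s_cm
    high = mean_cm + k * s_cm
    # Share of a normal distribution lying within +/- k SD of the mean
    coverage = std_normal.cdf(k) - std_normal.cdf(-k)
    rows.append((low, high, coverage))
    print(f"+/- {k} SD: {low:.1f} to {high:.1f} cm covers {coverage:.1%}")
```

The computed shares are slightly more precise than the rounded 68%, 95%, and 99% used on the slide.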

41 of 65

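[Figure: normal curve with the x-axis marked at -3, -2, -1, +1, +2, and +3 S; 95% of values lie within +/- 2 S of the mean.]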

If you wished to see if a red blood cell count was normal, you could see whether it was within 2 SD of the mean of the population as a whole. Less than 5% of all red blood cell counts are more than 2 SD from the mean, so if the count in question is more than 2 SD from the mean, you might consider it to be abnormal.

42 of 65

- As you increase the size of your sample, or repeat the experiment more times, the mean of your results (𝑥̅) will tend to get closer and closer to the true mean, or the mean of the whole population, μ.

- We can use 𝑥̅ as our best estimate of the unknown μ. Similarly, as you repeat an experiment more and more times, the SD of your results will tend to more and more closely approximate the true standard deviation (σ) that you would get if the experiment was performed an infinite number of times, or on the whole population.

- However, the SD of the experimental results will approximate σ whether n is large or small. Like 𝑥̅, SD does not change systematically as n changes, and we can use SD as our best estimate of the unknown σ, whatever the value of n.

43 of 65

Measures of Confidence: Standard Error of the Mean

The standard deviation provides a measure of the spread of the data from the mean. A different type of statistic reveals the uncertainty in the calculation of the mean.

Every sample that is collected represents some variation from the true mean of the entire population. In fact, every time you take a sample and calculate a sample mean, you would expect a slightly different value. In other words, the sample means (averages) themselves have variability. This variability can be expressed by calculating the standard error of the mean (abbreviated as SE𝑥̅ or SEM).

The sample means of many different samples would be normally distributed. The standard error of the mean represents the standard deviation of such a distribution and estimates how close the sample mean is to the population mean.

44 of 65

Measures of Confidence: Standard Error of the Mean

The standard error of the mean tells you how accurately the mean of your sample is likely to estimate the true population mean. The smaller the SEM, the more confidence we have in the sample mean and the more reliable the data.

To calculate SE𝑥̅ (SEM), divide the standard deviation (s) by the square root of the sample size (n):

SEM = s / √n

In our earlier example of pea plant heights, we had a sample of 5 plants with s = 1.6 cm:

√5 = 2.24

SEM = 1.6 / 2.24 = 0.714 cm

This means that about 68% of sample means would fall within +/- 0.714 cm of the population mean. This is a large SEM, and our confidence in the sample mean is low, because the sample size is very small (5).

45 of 65

The greater the sample size, the more closely the sample mean will estimate the population mean, and therefore the smaller the standard error of the mean becomes.

This is because when we take larger samples, our sample means get closer to the true mean value of the population. Thus, the distribution of the sample means is less spread out and has a lower standard deviation.

If the sample size were increased from 5 to 100, our SEM would become:

√100 = 10

SEM = 1.6 / 10 = 0.16 cm with a sample of 100

compared with SEM = 0.714 cm with a sample of 5
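The effect of sample size on the SEM can be sketched in Python using the standard deviation from the pea plant example (1.6 cm). The function name `sem` is chosen here for illustration. Note that the slides round √5 to 2.24 before dividing, which gives 0.714; without that intermediate rounding the value is about 0.716.

```python
from math import sqrt


def sem(s, n):
    """Standard error of the mean: standard deviation / sqrt(sample size)."""
    return s / sqrt(n)


s_cm = 1.6   # standard deviation from the pea plant example

sem_5 = sem(s_cm, 5)       # small sample
sem_100 = sem(s_cm, 100)   # larger sample, same standard deviation

print(f"SEM with n = 5:   {sem_5:.3f} cm")
print(f"SEM with n = 100: {sem_100:.3f} cm")
```

Increasing the sample size from 5 to 100 shrinks the SEM by a factor of √20, or roughly 4.5, even though the standard deviation itself is unchanged.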

46 of 65

Error Bars - SEM

We can illustrate the SEM by using error bars when making graphs.

Using our same example, with a sample of 5 and a mean of 8 cm:

[Figure: bar graph of mean height (n = 5) with a mean of 8 cm and a red error bar of +/- 1 SEM (+0.714 and -0.714 cm).]

The red error bar shows that the true population mean can be expected to lie anywhere from 7.286 to 8.714 cm about 68% of the time.

Error bars of +/- 2 SEM are used to show 95% confidence.

47 of 65

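Error Bars - SEM

[Figure: normal curve of sample means centered on 8 cm; the region from -1 SEM (7.286 cm) to +1 SEM (8.714 cm) covers 68%.]

Remember that the SEM is the standard deviation of the possible values of the sample mean around the true mean of the entire population, assuming that the sample means from a population are normally distributed.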

48 of 65

Error Bars - SEM

[Figure: normal curve of sample means centered on 8 cm; +/- 1 SEM spans 7.286 to 8.714 cm, and +/- 2 SEM spans 6.572 to 9.428 cm, covering 95%.]

Error bars often report +/- 2 SEM to illustrate 95% confidence that the mean from an experimental sample approximates the true population mean.
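The error bar limits quoted on these slides can be reproduced with a short Python sketch. It takes the mean (8 cm) and the rounded SEM (0.714 cm) from the pea plant example as given; the helper name `error_bar` is introduced here for illustration.

```python
mean_cm = 8.0    # sample mean from the pea plant example
sem_cm = 0.714   # SEM as rounded on the slides


def error_bar(mean, sem, k):
    """Lower and upper ends of an error bar of +/- k SEM around the mean."""
    return (mean - k * sem, mean + k * sem)


bar_68 = error_bar(mean_cm, sem_cm, 1)   # +/- 1 SEM, about 68%
bar_95 = error_bar(mean_cm, sem_cm, 2)   # +/- 2 SEM, about 95%

print(f"+/- 1 SEM: {bar_68[0]:.3f} to {bar_68[1]:.3f} cm")
print(f"+/- 2 SEM: {bar_95[0]:.3f} to {bar_95[1]:.3f} cm")
```

Doubling the error bar from 1 SEM to 2 SEM widens the interval in exchange for higher confidence.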

49 of 65

Error Bars - SEM

The Error Bars on the left represent +/- 1 Standard error from the measured mean from the two populations in the study. We can say with confidence that 68% of all sample means would fall in this range. The Larger the Error Bars the larger the SEM and the LOWER the confidence we have in our measured means for the 2 populations.

SEM = 1.33

SEM = 1.05

50 of 65

Error Bars - SEM

Are the 2 sets of data significantly different from each other when we consider the error bars?

How would the error bars affect this analysis?

51 of 65

Determining Significance - Null Hypothesis and P Value

In every experiment, there is an effect or difference between groups that the researchers are testing. It could be the effectiveness of a new drug, a new fertilizer, or another variable that has benefits. Unfortunately for the researchers, there is always the possibility that there is no effect, that is, that there is no difference between the groups. This lack of a difference is called the null hypothesis, which is essentially the position a devil’s advocate would take when evaluating the results of an experiment.

52 of 65

Determining Significance - Null Hypothesis and P Value

To see why, let’s imagine an experiment on pea plants that tests a treatment we know is totally ineffective. The null hypothesis is true: there is no difference between the experimental groups at the population level.

Despite the null being true, it’s entirely possible that there will be an effect in the sample data due to random sampling error. In fact, it is extremely unlikely that the sample groups will ever exactly equal the null hypothesis value. Consequently, the devil’s advocate position is that the observed difference in the sample does not reflect a true difference between populations.

53 of 65

Determining Significance - Null Hypothesis and P Value

A P value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the truth of the null hypothesis.

This means that, if the null hypothesis is true, the P value is the probability of obtaining results like ours through chance sampling error alone (that is, from a sample that is not representative of the true population).

54 of 65

Determining Significance - Null Hypothesis and P Value

If the P value is high enough (P > 0.05), then differences as large as the ones you found between 2 sets of data would occur fairly often just because of the particular samples that were studied (for example, you did not get a truly representative sample of the population). In this case we fail to reject the null hypothesis (often loosely described as accepting it).

If the P value is low enough (P < 0.05), then differences as large as the ones you found between 2 sets of data would rarely arise just from the samples that were studied, so the effects are more likely due to other variables.

In this case we would reject the null hypothesis.

55 of 65

Determining Significance - Null Hypothesis and P Value

By convention, if P < 0.05 you say the result is statistically significant, and if P < 0.01 you say the result is highly significant and you can be more confident you have found a true effect.

Why is P < 0.05 the cutoff?

As always with statistical inference, you may be wrong! Perhaps there really is no effect, and you had the bad luck to get one of the 5% (if P < 0.05) or 1% (if P < 0.01) of sets of results that suggests a difference where there is none. Of course, even if results are statistically highly significant, it does not mean they are necessarily biologically important, and vice versa.

56 of 65

The panels on the right show what is needed when n ≥ 10: a gap equal to SE indicates P ≈ 0.05 and a gap of 2 SE indicates P ≈ 0.01. To assess the gap, use the average SE for the two groups, meaning the average of one arm of the group C bars and one arm of the E bars. In this case, P ≈ 0.05 if double the SE bars just touch, meaning a gap of 2 SE.

The smaller the overlap of bars, or the larger the gap between bars, the smaller the P value and the stronger the evidence for a true difference (rejecting the null hypothesis).

57 of 65

So in this set of data, the span of the error bars for the DARK and LIGHT Data are far enough away from each other to suggest that there is a statistical difference between the groups and that amount of light may have had an effect on the growth of the seedlings.

58 of 65

Sample Multiple-Choice Question Using Data Analysis

Testosterone oxido-reductase is a liver enzyme that regulates testosterone levels in alligators. One study compared testosterone oxido-reductase activity between male and female alligators from Lake Woodruff, a relatively pristine environment, and from Lake Apopka, an area that has suffered severe contamination. The following graph depicts the findings of that study.

59 of 65

Lake Woodruff is a relatively pristine environment, while Lake Apopka is an area that has suffered severe contamination.

Why can’t choice D be correct?

60 of 65

Chi-Square Test

Observing error bars on graphs is not the best way to identify the significance of data. The best way is to calculate p values using statistical tests. One such test, which we use for frequencies of discrete data, is the Chi-Square Test. *Remember that discrete data represent categorical values (e.g., pea plants that have white or red flowers).

A t-test is used to compare 2 sets of data that use continuous values (height, weight). We will not be using t-tests in this course, but you should be familiar with the idea that a t-test is a statistical test that produces a p-value.

61 of 65

Chi-Square Test

In an experiment in which a coin is flipped 50 times, the expected results are 25 heads and 25 tails. Is the difference between the observed and expected results purely due to chance? Or could it be due to something else, such as something being wrong with the coin?

𝛘² = ∑ (o − e)² / e, where o is the observed count and e is the expected count in each category

Notice the summation sign ∑, which means that you need to sum over all the categories.

𝛘² = 1.28

62 of 65

Chi-Square Test

𝛘² = 1.28

OK, great, but how do we get a P value?

We need to use a Critical Values Table to obtain a p value.

First we need to determine the degrees of freedom (df).

df = number of categories − 1

df = 2 − 1 = 1 (the two categories are heads and tails)

Now that we have the df, we can use the Chi-Squared Table.

63 of 65

Chi-Square Test

𝛘² = 1.28

The value falls between 0.455 and 2.706, which means that a result like the one in the coin flip experiment is likely to occur between 10 and 50 percent of the time by chance alone. Therefore, you cannot reject the null hypothesis that the results occurred simply by chance, because the p value is greater than 5%, or 0.05.

http://www.bio.miami.edu/dana/250/25008_6.html
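The coin flip test can be sketched in Python. The observed counts are not stated in this text, so 29 heads and 21 tails are an assumption chosen because they reproduce the chi-square value of 1.28 (21 heads and 29 tails would give the same value). The p value here uses a shortcut that is valid only for 1 degree of freedom, and 3.84 is the standard table critical value for df = 1 at p = 0.05.

```python
from math import erfc, sqrt

# ASSUMPTION: observed counts chosen to reproduce chi-square = 1.28
observed = {"heads": 29, "tails": 21}
expected = {"heads": 25, "tails": 25}   # 50 flips of a fair coin

# Chi-square: sum over categories of (observed - expected)^2 / expected
chi_sq = sum(
    (observed[k] - expected[k]) ** 2 / expected[k] for k in observed
)

# Degrees of freedom = number of categories - 1
df = len(observed) - 1

# For df = 1 ONLY, the p value equals erfc(sqrt(chi_sq / 2)).
# Other df values need a critical values table or a statistics library.
p_value = erfc(sqrt(chi_sq / 2))

critical_05 = 3.84   # standard critical value for df = 1, p = 0.05
reject_null = chi_sq > critical_05

print(f"chi-square = {chi_sq:.2f}, df = {df}, p = {p_value:.3f}")
print("reject null hypothesis" if reject_null else "cannot reject null hypothesis")
```

The computed p value falls between 0.10 and 0.50, in agreement with reading the value off the table.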

64 of 65

If we use the Chi-Squared Table in the AP Biology Reference tables, we can interpret the result in the following manner:

The calculated Chi-Squared value (1.28) is less than the critical value of 3.84 for 1 df and a p value of 0.05. Notice that the p value of 0.01 has a larger critical value (6.64).

The larger the 𝛘² value, the lower the p value.

Therefore, you cannot reject the null hypothesis that the results occurred simply by chance, because the p value is greater than 5%, or 0.05.

𝛘² = 1.28

65 of 65

Let's not lose sight of the fact that statistics are statistics.

They are tools to help us make some judgments on our data but they are just tools. We still need our understanding of science and biology to interpret what the statistics might mean.

The quality of your lab investigations will be centered on the possible meaning of your data. The data and statistics will never give definite outcomes. They will just give the chance or probability that the outcomes have meaning.