BVCOE, New Delhi
STATISTICS, STATISTICAL MODELLING AND DATA ANALYTICS Unit-2
Semester-VI DA304T
Contents
Introduction and Descriptive Statistics
Mean, Median, Mode and Standard Deviation, Data Visualization
Introduction to Probability Distribution Hypothesis Testing
Linear Algebra
Population Statistics
Mathematical Methods and Probability Theory
Sampling Distribution
Statistical Inference
Quantitative Analysis
Conformal Mapping by Other Functions
Department of Applied Sciences, BVCOE, New Delhi
Calculating Mean
The mean identifies the average value of the set of numbers. For example, consider the data set containing the values 20, 24, 25, 36, 25, 22, 23.
To find the mean, use the formula: Mean equals the sum of the numbers in the data set divided by the number of values in the data set. In mathematical terms:
$$\text{Mean} = \frac{\text{sum of all terms}}{\text{how many terms or values in the set}}$$
Add the numbers in the example data set:
20 + 24 + 25 + 36 + 25 + 22 + 23 = 175
Divide by the number of data points in the set. This set has seven values so divide by 7.
Insert the values into the formula to calculate the mean. The mean equals the sum of the values (175) divided by the number of data points (7). Since
175 / 7 = 25
the mean of this data set equals 25. Not all mean values will equal a whole number.
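The calculation above can be checked with a few lines of Python. This is a minimal sketch; the variable names are illustrative and not part of the original notes.

```python
from statistics import mean

# Example data set from the text
data = [20, 24, 25, 36, 25, 22, 23]

total = sum(data)            # sum of all terms: 175
count = len(data)            # number of values: 7
manual_mean = total / count  # mean = sum / count

# The standard-library helper should agree with the manual calculation
library_mean = mean(data)
print(manual_mean, library_mean)
```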
Calculating Mode
The mode identifies the most common value or values in the data set. Depending on the data, there might be one or more modes, or no mode at all.
Like finding the median, order the data set from smallest to largest. In the example set, the ordered values become: 20, 22, 23, 24, 25, 25, 36.
A mode occurs when values repeat. In the example set, the value 25 occurs twice. No other numbers repeat. Therefore, the mode is the value 25.
In some data sets, more than one mode occurs. The data set 22, 23, 23, 24, 27, 27, 29 contains two modes, 23 and 27. Other data sets may have more than two modes, may have a mode that occurs more than twice (as in 23, 23, 24, 24, 24, 28, 29, where the mode equals 24), or may have no mode at all (as in 21, 23, 24, 25, 26, 27, 29). The mode may occur anywhere in the data set, not just in the middle.
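The cases described above (one mode, several modes, no mode) can be sketched in Python. The helper `modes` is a hypothetical name introduced here for illustration; it assumes a non-empty data set and follows the convention used in these notes that a data set with no repeated value has no mode.

```python
from collections import Counter

def modes(values):
    """Return the sorted list of modes; empty if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value occurs once, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([20, 24, 25, 36, 25, 22, 23]))   # example set from the text
print(modes([22, 23, 23, 24, 27, 27, 29]))   # two modes
print(modes([21, 23, 24, 25, 26, 27, 29]))   # no mode
```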
Calculating Range
Range shows the mathematical distance between the lowest and highest values in the data set. Range measures the variability of the data set. A wide range indicates greater variability in the data, or perhaps a single outlier far from the rest of the data. Outliers may skew, or shift, the mean value enough to impact data analysis.
In the sample group, the lowest value is 20 and the highest value is 36.
To calculate range, subtract the lowest value from the highest value. Since
36 − 20 = 16
the range equals 16.
In the sample set, the high data value of 36 exceeds the previous value, 25, by 11. This value seems extreme, given the other values in the set. The value of 36 might be an outlier data point.
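A short Python sketch of the range calculation, together with the gap between the largest value and the next largest, is given below. The variable names are assumptions made for this illustration.

```python
# Example data set from the text
data = [20, 24, 25, 36, 25, 22, 23]

lowest, highest = min(data), max(data)
data_range = highest - lowest  # range = highest value - lowest value

# Gap between the largest value and the next largest value
ordered = sorted(data)
gap_to_previous = ordered[-1] - ordered[-2]

print(data_range, gap_to_previous)
```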
Calculating Standard Deviation
Standard deviation measures the variability of the data set. Like range, a smaller standard deviation indicates less variability.
Finding the standard deviation requires squaring the difference between each data point and the mean, summing those squares [∑(x − µ)²], dividing that sum by one less than the number of values (N − 1), and finally taking the square root of the quotient. In one formula, this is:
$$SD = \sqrt{\frac{\sum_{i}(x_i - \mu)^2}{N - 1}}$$
Mathematically, start with calculating the mean.
Calculate the mean by adding all the data point values, then dividing by the number of data points. In the sample data set,
20 + 24 + 25 + 36 + 25 + 22 + 23 = 175
Divide the sum, 175, by the number of data points, 7, or
175 / 7 = 25
The mean equals 25.
Next, subtract the mean from each data point, then square each difference. The formula looks like this:
$$\sum_{i}^{N}(x_i - \mu)^2$$
where ∑ means sum, x_i represents each data set value and µ represents the mean value. Continuing with the example set, the values become:
20 − 25 = −5 and (−5)² = 25
24 − 25 = −1 and (−1)² = 1
25 − 25 = 0 and 0² = 0
36 − 25 = 11 and 11² = 121
25 − 25 = 0 and 0² = 0
22 − 25 = −3 and (−3)² = 9
23 − 25 = −2 and (−2)² = 4
Adding the squared differences yields:
25 + 1 + 0 + 121 + 0 + 9 + 4 = 160
Divide the sum of the squared differences by one less than the number of data points. The example data set has 7 values, so N − 1 equals 7 − 1 = 6. The sum of the squared differences, 160, divided by 6 equals approximately 26.6667.
Calculate the standard deviation by finding the square root of the division by N − 1. In the example, the square root of 26.6667 equals approximately 5.164. Therefore, the standard deviation equals approximately 5.164.
Standard deviation helps evaluate data. Values that fall within one standard deviation of the mean are typical of the data set. Values that fall more than two standard deviations from the mean are commonly treated as extreme values or outliers. In the example set, two standard deviations is about 10.33, and the value 36 lies 11 above the mean, so 36 is an outlier by this rule. Outliers may represent erroneous data or may suggest unforeseen circumstances and should be carefully considered when interpreting data.
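The steps of the standard deviation calculation, and the two-standard-deviation outlier rule, can be reproduced in Python. This is a sketch under the assumptions of the text (division by N − 1); the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Example data set from the text
data = [20, 24, 25, 36, 25, 22, 23]

mu = sum(data) / len(data)                   # mean: 25
squared_diffs = [(x - mu) ** 2 for x in data]
sum_sq = sum(squared_diffs)                  # sum of squared differences: 160
variance = sum_sq / (len(data) - 1)          # divide by N - 1
sd = sqrt(variance)                          # standard deviation

# Values more than two standard deviations from the mean
outliers = [x for x in data if abs(x - mu) > 2 * sd]

# statistics.stdev also divides by N - 1, so it should agree
print(round(sd, 3), round(stdev(data), 3), outliers)
```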
Example 1: Study the bar graph given below and find the mean, median, and mode of the given data set.
Find the mean, median, mode, standard deviation and range for the given data
190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185, 153, 147, 161, 127, 180
Find the Mean, Median, mode and standard deviation of the data 25, 12, 5, 24, 15, 22, 23, 25
Calculating Median
The median identifies the midpoint or middle value of a set of numbers.
Put the numbers in order from smallest to largest. Use the example set of values: 20, 24, 25, 36, 25, 22, 23. Placed in order, the set becomes: 20, 22, 23, 24, 25, 25, 36.
Since this set of numbers has seven values, the median or value in the center is 24.
If the set of numbers has an even number of values, calculate the average of the two center values. For example, suppose the set of numbers contains the values 22, 23, 25, 26. The middle lies between 23 and 25. Adding 23 and 25 yields 48. Dividing 48 by two gives a median value of 24.
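The odd and even cases can be sketched as a small Python function. `manual_median` is a hypothetical helper written for this illustration; the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def manual_median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [20, 24, 25, 36, 25, 22, 23]
even_set = [22, 23, 25, 26]
print(manual_median(odd_set), manual_median(even_set))
```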
INTRODUCTION TO PROBABILITY THEORY
Probability is the chance that an event occurs. More generally, probability analyzes how likely an event is, often on the basis of previous data. For example, if a fair coin is tossed, what is the chance that it lands on heads? Probability theory answers questions of this type. This section covers probability theory, its formulas, and worked examples.
Probability theory uses the concepts of random variables and probability distributions to describe the possible outcomes of a situation. It is a branch of mathematics that deals with the likelihood of events.
When you flip a coin, the result is random: it may be heads or tails. For a fair coin, heads and tails are equally likely, so each has a 50-50 chance. Thus, the probability of either heads or tails is 1/2.
Probability theory studies random events and tells us how likely they are to occur. The two main approaches for studying probability are theoretical probability and experimental probability.
IV. Theoretical and Experimental Probabilities
The image given below shows the Theoretical and Experimental Probabilities and their differences.
V. Theoretical Probability
Theoretical Probability deals with assumptions in order to avoid unfeasible or expensive repetition of experiments. The theoretical Probability for an Event A can be calculated as follows:
P(A) = (Number of outcomes favourable to Event A) / (Number of all possible outcomes)
The image shown below shows the theoretical probability formula.
Note: Here we assume the outcomes of an event as equally likely.
Now that we have the formula, let’s apply it to the coin-tossing case. Tossing a coin has two outcomes: head or tail. Hence, the probability of a head on tossing a coin is
P(H) = 1/2
Similarly, the probability of a tail on tossing a coin is P(T) = 1/2
The following image shows an unbiased coin that has an equal probability of landing both heads and tails
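The theoretical probability formula can be written as a small Python function. This is a minimal sketch that assumes equally likely outcomes, as the note above states; `theoretical_probability` is a hypothetical helper name, and outcomes are represented as Python sets.

```python
from fractions import Fraction

def theoretical_probability(favourable, sample_space):
    """P(A) = favourable outcomes / all possible outcomes (equally likely outcomes assumed)."""
    space = set(sample_space)
    event = set(favourable) & space  # ignore anything outside the sample space
    return Fraction(len(event), len(space))

coin = {"H", "T"}
p_head = theoretical_probability({"H"}, coin)
p_tail = theoretical_probability({"T"}, coin)
print(p_head, p_tail)
```

Using `Fraction` keeps the results exact, so 1/2 is not turned into a rounded decimal.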
VI. Experimental Probability
Experimental probability is found by performing a series of experiments and observing their outcomes. These random experiments are also known as trials. The experimental probability for Event A can be calculated as follows:
P(A) = (Number of times event A happened) / (Total number of trials)
The following image shows the Experimental Probability Formula,
Now that we have the formula, let’s apply it to the coin-tossing case. If we toss a coin 10 times and record heads 4 times and tails 6 times, then the experimental probability of heads is:
P(H) = 4/10
Similarly, the experimental probability of tails is: P(T) = 6/10
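The experimental probability can also be estimated by simulation. The sketch below first applies the formula to the recorded counts from the text, then simulates a fair coin with Python's `random` module; the seed, the number of trials, and the function names are assumptions chosen for this illustration.

```python
import random

def experimental_probability(event_count, trials):
    """Number of times the event happened divided by the total number of trials."""
    return event_count / trials

# Recorded experiment from the text: 4 heads in 10 tosses
p_head_recorded = experimental_probability(4, 10)

def simulate_heads(n_trials, rng):
    """Toss a simulated fair coin n_trials times and return the observed share of heads."""
    heads = sum(1 for _ in range(n_trials) if rng.random() < 0.5)
    return experimental_probability(heads, n_trials)

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = simulate_heads(100_000, rng)
print(p_head_recorded, estimate)
```

With many trials the estimate tends to move close to the theoretical value 1/2, whereas a short run such as 10 tosses can easily give 4/10.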
VII. Probability Theory Example
We can study the concept of probability with the help of the example discussed below.
Example: Two fair dice are rolled. Find the probability of getting a total of 10.
Solution:
The sample space is {(1,1), (1,2),…, (1,6),…, (6,6)}, so the total number of possible outcomes is 36.
The favourable outcomes, those that add up to 10, are {(4,6), (5,5), (6,4)}. So the probability of getting a total of 10 is 3/36 = 1/12.
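The sample space of the two-dice example is small enough to enumerate directly. The following Python sketch lists all 36 outcomes and counts those with a total of 10; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))

# Outcomes whose two faces add up to 10
favourable = [roll for roll in sample_space if sum(roll) == 10]

p_total_10 = Fraction(len(favourable), len(sample_space))
print(favourable, p_total_10)
```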
Various terms used in probability theory are discussed below.
Random Experiment: An experiment that can be repeated multiple times, and whose outcome is not affected by its repetition, is called a random experiment. Tossing a coin, rolling dice, etc. are random experiments.
Sample Space: The set of all possible outcomes of a random experiment is called the sample space. For example, throwing a die results in six possible outcomes, which are 1, 2, 3, 4, 5, and 6. Thus, its sample space is {1, 2, 3, 4, 5, 6}.
Event: An outcome, or a set of outcomes, of an experiment is called an event. Probability theory distinguishes several types of events.
Random Variable: A variable that can take the value of any possible outcome of an experiment is called a random variable. Random variables in probability theory are of two types, which are discussed below.
Discrete Random Variable: Variables that can take countable values such as 0, 1, 2,… are called discrete random variables.
Continuous Random Variable: Variables that can take an infinite number of values in a given range are called continuous random variables.
There are various formulas that are used in probability theory and some of them are discussed below,
XIV. Applications of Probability Theory
Probability theory is widely used in everyday life to answer many types of questions, such as: Will it rain tomorrow? What is the chance of landing on the Moon? What is the chance of the evolution of humans?
XV. Solved Examples on Probability
Example 1: Consider a jar with 7 red marbles, 3 green marbles, and 4 blue marbles. What is the probability of randomly selecting a non-blue marble from the jar?
Solution:
Given,
Number of red marbles = 7
Number of green marbles = 3
Number of blue marbles = 4
So, the total number of possible outcomes in this case is 7 + 3 + 4 = 14.
The number of non-blue marbles is 7 + 3 = 10.
According to the formula for theoretical probability, P(Non-Blue) = 10/14 = 5/7.
Hence, the theoretical probability of selecting a non-blue marble is 5/7.
Example 2: Consider two players, Naveena and Isha, playing a table tennis match. The probability of Naveena winning the match is 0.62. What is the probability of Isha winning the match?
Solution:
Let N and I represent the events that Naveena wins the match and Isha wins the match, respectively.
The probability of Naveena winning: P(N) = 0.62 (given)
The probability of Isha winning: P(I) = ?
The two events are mutually exclusive, since only one of the players can win the match, and one of them must win.
Therefore,
P(N) + P(I) = 1
P(I) = 1 – P(N)
P(I) = 1 – 0.62 = 0.38
Thus, the probability of Isha winning the match is 0.38.
Example 3: If someone draws one card from a 52-card deck, what is the probability that the card is a heart? What is the probability of drawing a card numbered 7?
Solution:
Total number of cards in a deck = 52
Total number of heart cards in a deck = 13
So, the probability of drawing a heart is P(heart) = 13/52 = 1/4.
Total number of cards numbered 7 in a deck = 4
So, the probability of drawing a card numbered 7 is P(7) = 4/52 = 1/13.
Example 4: Find the probability of rolling an even number when you roll a die containing the numbers 1-6. Express the probability as a fraction, decimal, ratio, or percent.
Solution:
Of the numbers 1 to 6, the even numbers are 2, 4, and 6.
So, number of favorable outcomes = 3 and total number of outcomes = 6.
Probability of obtaining an even number: P(Even) = 3/6 = 1/2 = 0.5 = 1 : 2 = 50%
PROBABILITY DISTRIBUTION
Probability distribution formula mainly refers to two types of probability distribution which are normal probability distribution (or Gaussian distribution) and binomial probability distribution. To recall, a table that assigns a probability to each of the possible outcomes of a random experiment is a probability distribution table. In simple words, it gives the probability for each value of the random variable.
A. Formulas for Probability Distribution
The formulas for the two types of probability distribution are:
1) Normal Probability Distribution Formula
It is also known as the Gaussian distribution, and its graph is bell-shaped.
The formula for the normal probability distribution is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Where,
μ = mean
σ = standard deviation
x = normal random variable
Note: If mean (μ) = 0 and standard deviation (σ) = 1, then this distribution is known as the standard normal distribution.
2) Binomial Probability Distribution Formula
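It gives the probability of a given number of successes when an experiment consists of “n” repeated trials and each trial either results in a success or does not.
The formula for binomial probability is as stated below:
$$P(x) = {}^{n}C_{r}\, p^{r}\,(1-p)^{n-r}$$
Where,
∙ n = number of trials
∙ r = number of successes
∙ p = probability of success in a single trial
∙ nCr = n!/[r!(n − r)!]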
3) Solved Probability Distribution Example Questions
Question 1: Calculate the probability of getting exactly 8 tails if a coin is tossed 10 times.
Solution:
Given,
Number of trials (n) = 10
Number of successes (r) = 8 (getting 8 tails)
Probability of success in a single trial (p) = 1/2 = 0.5
To find nCr:
nCr = 10C8 = 10!/[8!(10 − 8)!]
= (10 × 9 × 8!)/(8! 2!)
= 45
To find p^r: p^r = 0.5^8 = 0.00390625
So, the probability of getting 8 tails is:
P(x) = nCr p^r (1 − p)^(n − r)
= 45 × 0.00390625 × (1 − 0.5)^(10 − 8)
= 0.17578125 × 0.5^2
= 0.0439453125
The probability of getting 8 tails = 0.0439
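The binomial calculation in Question 1 can be checked in Python. This is a minimal sketch; `binomial_pmf` is a hypothetical helper name, and `math.comb` supplies nCr.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials with success probability p."""
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

n_c_r = comb(10, 8)                    # number of ways to choose 8 tosses out of 10
p_8_tails = binomial_pmf(8, 10, 0.5)   # probability of exactly 8 tails
print(n_c_r, p_8_tails)
```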
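Question 2: Find the value of the normal probability density at x = 5 for a normal distribution with population mean 2 and standard deviation 3.
Solution:
Given, x = 5
Mean = μ = 2
Standard deviation = σ = 3
Normal probability distribution:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
$$f(5) = \frac{1}{3\sqrt{2\pi}}\, e^{-\frac{(5-2)^2}{2 \cdot 3^2}} = \frac{1}{3\sqrt{2\pi}}\, e^{-0.5} \approx 0.0807$$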
Question 3: The probability that a man hits the target is ¼. If he fires 9 times, find the probability that he hits the target exactly 4 times.
Solution:
Number of shots = n = 9
Number of successful hits = r = 4
Probability of hitting the target = p = ¼
Probability of not hitting the target = q = 1 – p = 1 – (¼) = ¾
Finding nCr:
9C4 = 9!/[4! (9 − 4)!] = 9!/(4! 5!) = (9 × 8 × 7 × 6 × 5!)/(4 × 3 × 2 × 1 × 5!) = 126
Probability that the man hits the target exactly 4 times
= 9C4 (¼)^4 (¾)^(9 − 4)
= 126 × (1/256) × (243/1024)
= 0.1168
BINOMIAL DISTRIBUTION
The binomial distribution formula gives the probability of getting “x” successes in “n” independent trials of a binomial experiment. To recall, the binomial distribution is a probability distribution for experiments in which each trial has two possible outcomes. In probability theory, the binomial distribution has two parameters, n and p.
The probability distribution becomes a binomial probability distribution when it meets the following requirements.
Binomial Distribution Formula in Probability
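The formula for the binomial probability distribution is as stated below:
$$P(r) = {}^{n}C_{r}\, p^{r}\,(1-p)^{n-r}$$
Where,
∙ n = number of trials
∙ r = number of successes
∙ p = probability of success in a single trial
∙ nCr = n!/[r!(n − r)!]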
1) Examples on Binomial Distribution Formula
Example 1:
A coin is tossed 12 times. What is the probability of getting exactly 7 heads?
Solution:
Given that a coin is tossed 12 times, i.e. n = 12.
The probability of getting a head in a single toss is ½, i.e. p = ½.
So, 1 − p = 1 − ½ = ½.
We know that the binomial probability distribution is P(r) = nCr · p^r (1 − p)^(n−r), where nCr = n!/[r!(n − r)!]. Now, we have to find the probability of getting exactly 7 heads, i.e. r = 7.
Substituting the values in the binomial distribution formula, we get
P(7) = 12C7 · (½)^7 (½)^(12−7)
P(7) = 792 · (½)^7 (½)^5
P(7) = 792 · (½)^12
P(7) = 792 (1/4096)
P(7) = 0.193
Therefore, the probability of getting exactly 7 heads is 0.193.
Example 2:
A fair coin is tossed n times. If the probability of getting a head 6 times is the same as the probability of getting a head 8 times, find the value of n.
Solution:
The probability that a head occurs 6 times = nC6 (½)^6 (½)^(n−6)
Similarly, the probability that a head occurs 8 times = nC8 (½)^8 (½)^(n−8)
Given that these two probabilities are equal,
nC6 (½)^6 (½)^(n−6) = nC8 (½)^8 (½)^(n−8)
⇒ nC6 (½)^n = nC8 (½)^n
⇒ nC6 = nC8
⇒ 6 = n − 8 (since nCx = nCy with x ≠ y implies x + y = n)
⇒ n = 14.
Therefore, the value of n is 14.
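The result can be confirmed by a brute-force search in Python. Because the factor (½)^n is the same on both sides, it is enough to compare nC6 with nC8. The upper search limit of 50 is an arbitrary assumption made for this sketch.

```python
from math import comb

# n must be at least 8 for "8 heads" to be possible; 50 is an arbitrary upper limit
solutions = [n for n in range(8, 51) if comb(n, 6) == comb(n, 8)]
print(solutions)
```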
Example 3:
The probability that a person hits a target is 3/4. The person makes 5 attempts. What is the probability that the target is hit at least three times?
Solution:
Given that p = ¾, q = ¼, n = 5.
Using the binomial distribution formula, P(X = x) = nCx · p^x (1 − p)^(n−x)
Thus, the required probability is: P(X = 3) + P(X = 4) + P(X = 5)
= 5C3 · (¾)^3 (¼)^2 + 5C4 · (¾)^4 (¼)^1 + 5C5 · (¾)^5
= 270/1024 + 405/1024 + 243/1024
= 918/1024
= 459/512.
Therefore, the probability that the person hits the target at least three times is 459/512.
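An "at least" probability is a sum of individual binomial probabilities, which can be sketched in Python with exact fractions. `binomial_at_least` is a hypothetical helper introduced for this illustration.

```python
from fractions import Fraction
from math import comb

def binomial_at_least(k, n, p):
    """P(X >= k) for n independent trials with success probability p."""
    q = 1 - p
    return sum(comb(n, x) * p ** x * q ** (n - x) for x in range(k, n + 1))

# p = 3/4, n = 5, at least 3 successes
p_at_least_3 = binomial_at_least(3, 5, Fraction(3, 4))
print(p_at_least_3, float(p_at_least_3))
```

Passing `p` as a `Fraction` keeps the arithmetic exact, so the result can be compared directly with the fraction obtained by hand.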
NORMAL DISTRIBUTION
In probability and statistics, the normal distribution, also called the Gaussian distribution or bell curve, is one of the most important continuous probability distributions. It is described by a probability density function f(x) of a continuous random variable x. The normal distribution is a very important distribution pattern that occurs in many natural phenomena, such as height, blood pressure, and the lengths of objects produced by machines. This section covers the normal distribution formula and a worked example.
C. Normal Distribution Formula
For a random variable x, with mean “μ” and standard deviation “σ”, the probability density function for the normal distribution is given by:
Normal Distribution Formula:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Where
μ = Mean
σ = Standard deviation
x = Normal random variable
1) Solved Example on Normal Distribution Formula
Example:
Find the probability density function for the normal distribution where mean = 4 and standard deviation = 2 and x = 3.
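Solution: Given:
Mean, μ = 4
Standard deviation, σ = 2
Random variable, x = 3.
We know that the normal distribution formula is:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Substituting the values:
$$f(3) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(3-4)^2}{2 \cdot 2^2}} = \frac{1}{2\sqrt{2\pi}}\, e^{-0.125} \approx 0.17603$$
Therefore, the value of the probability density function at x = 3 is 0.17603.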
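The density value can be verified numerically. The sketch below implements the normal density formula directly and cross-checks it against `statistics.NormalDist` from the Python standard library; `normal_pdf` is a hypothetical helper name.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal probability density at x for mean mu and standard deviation sigma."""
    coefficient = 1 / (sigma * sqrt(2 * pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * exp(exponent)

# Worked example: mean 4, standard deviation 2, x = 3
density = normal_pdf(3, 4, 2)
library_density = NormalDist(mu=4, sigma=2).pdf(3)
print(round(density, 5), round(library_density, 5))
```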
POISSON DISTRIBUTION FORMULA
The Poisson distribution is another probability distribution. Unlike the binomial distribution, we are not given the number of trials or the probability of success on a given trial. Instead, the average number of successes in a certain time interval is given. The average number of successes is called “lambda” and denoted by the symbol “λ”.
The formula for the Poisson distribution is given below:
$$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$$
Where,
λ is the average number of successes in the interval.
x is a Poisson random variable.
e is the base of the natural logarithm and e = 2.71828 (approx).
2) Solved Example
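Question: Only 3 students came to attend the class today. Taking 3 as the average attendance, find the probability that exactly 4 students attend the class tomorrow.
Solution:
Given,
Average rate (λ) = 3
Poisson random variable (x) = 4
Using the Poisson distribution formula:
$$P(4) = \frac{e^{-3}\cdot 3^{4}}{4!} = \frac{81\,e^{-3}}{24} \approx 0.168$$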
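The Poisson probability for this example can be evaluated with a short Python sketch. `poisson_pmf` is a hypothetical helper name; it computes e^(−λ) λ^x / x! for the given values λ = 3 and x = 4.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the average number of events is lam."""
    return exp(-lam) * lam ** x / factorial(x)

p_four = poisson_pmf(4, 3)
print(round(p_four, 4))

# The probabilities over all counts should add up to (almost exactly) 1
total = sum(poisson_pmf(k, 3) for k in range(0, 50))
```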
GAUSSIAN DISTRIBUTION
The Gaussian distribution is a very common continuous probability distribution. Gaussian distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables. The Gaussian distribution formula is given below.
D. Formula of Gaussian Distribution
The probability density function formula for the Gaussian distribution is given by:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the mean and σ is the standard deviation.