
BVCOE, New Delhi

STATISTICS, STATISTICAL MODELLING AND DATA ANALYTICS Unit-2

Semester-VI DA304T

Contents

  • Introduction and Descriptive Statistics
  • Mean, Median, Mode and Standard Deviation, Data Visualization
  • Introduction to Probability Distribution Hypothesis Testing
  • Linear Algebra
  • Population Statistics
  • Mathematical Methods and Probability Theory
  • Sampling Distribution
  • Statistical Inference
  • Quantitative Analysis
  • Conformal Mapping by Other Functions

Department of Applied Sciences, BVCOE, New Delhi


Calculating Mean

The mean identifies the average value of the set of numbers. For example, consider the data set containing the values 20, 24, 25, 36, 25, 22, 23.

To find the mean, use the formula: Mean equals the sum of the numbers in the data set divided by the number of values in the data set. In mathematical terms:

$$ \text{Mean} = \frac{\text{sum of all terms}}{\text{how many terms or values in the set}} $$

Add the numbers in the example data set:

20 + 24 + 25 + 36 + 25 + 22 + 23 = 175

Divide by the number of data points in the set. This set has seven values so divide by 7.

Insert the values into the formula to calculate the mean. The mean equals the sum of the values (175) divided by the number of data points (7). Since

$$ \frac{175}{7} = 25 $$

the mean of this data set equals 25. Not all mean values will equal a whole number.
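The same calculation can be checked in Python. This is a minimal sketch using the example data from the text and the standard-library `statistics` module; the variable names are illustrative only.

```python
from statistics import mean

data = [20, 24, 25, 36, 25, 22, 23]  # example data set from the text

total = sum(data)      # sum of all terms: 175
count = len(data)      # number of values: 7
manual_mean = total / count

print(manual_mean)                # 25.0
# The standard-library function agrees with the formula
print(mean(data) == manual_mean)  # True
```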


Calculating Mode

The mode identifies the most common value or values in the data set. Depending on the data, there might be one or more modes, or no mode at all.

Like finding the median, order the data set from smallest to largest. In the example set, the ordered values become: 20, 22, 23, 24, 25, 25, 36.

A mode occurs when values repeat. In the example set, the value 25 occurs twice. No other numbers repeat. Therefore, the mode is the value 25.

In some data sets, more than one mode occurs. The data set 22, 23, 23, 24, 27, 27, 29 contains two modes, one each at 23 and 27. Other data sets may have more than two modes, may have a mode that occurs more than twice (as in 23, 23, 24, 24, 24, 28, 29, where the mode is 24), or may have no mode at all (as in 21, 23, 24, 25, 26, 27, 29, where no value repeats). The mode may occur anywhere in the data set, not just in the middle.
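The mode rules above can be sketched with a frequency count. This is a minimal sketch; the helper name `modes` is hypothetical. It follows the text's convention that a data set with no repeated values has no mode. Python's own `statistics.multimode` differs on that case: it returns every value when nothing repeats.

```python
from collections import Counter

def modes(data):
    """Return every most-frequent value in sorted order.

    Follows the convention used in the text: if no value repeats,
    the data set has no mode and an empty list is returned.
    """
    counts = Counter(data)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:          # every value occurs once -> no mode
        return []
    return sorted(value for value, freq in counts.items() if freq == highest)

print(modes([20, 24, 25, 36, 25, 22, 23]))  # [25]
print(modes([22, 23, 23, 24, 27, 27, 29]))  # [23, 27]
print(modes([23, 23, 24, 24, 24, 28, 29]))  # [24]
print(modes([21, 23, 24, 25, 26, 27, 29]))  # []
```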


Calculating Range

Range shows the mathematical distance between the lowest and highest values in the data set. Range measures the variability of the data set. A wide range indicates greater variability in the data, or perhaps a single outlier far from the rest of the data. Outliers may skew, or shift, the mean value enough to impact data analysis.

In the sample group, the lowest value is 20 and the highest value is 36.

To calculate range, subtract the lowest value from the highest value. Since

36 − 20 = 16

the range equals 16.
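In code, the range is simply the largest value minus the smallest. A minimal sketch on the example data set:

```python
data = [20, 24, 25, 36, 25, 22, 23]  # example data set from the text

lowest, highest = min(data), max(data)
data_range = highest - lowest        # range = highest - lowest

print(lowest, highest, data_range)   # 20 36 16
```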

In the sample set, the highest value, 36, exceeds the next-highest value, 25, by 11. This gap seems extreme compared with the spacing of the other values, so 36 might be an outlier.


Calculating Standard Deviation

Standard deviation measures the variability of the data set. Like range, a smaller standard deviation indicates less variability.

Finding the standard deviation requires squaring the difference between each data point and the mean, adding all the squares [∑(x − µ)²], dividing that sum by one less than the number of values (N − 1), and finally taking the square root of the quotient. In one formula, this is:

$$ SD = \sqrt{\frac{\sum_{i} (x_i - \mu)^2}{N - 1}} $$

Mathematically, start with calculating the mean.

Calculate the mean by adding all the data point values, then dividing by the number of data points. In the sample data set,

20 + 24 + 25 + 36 + 25 + 22 + 23 = 175

Divide the sum, 175, by the number of data points, 7, or

$$ \frac{175}{7} = 25 $$

The mean equals 25.

Next, subtract the mean from each data point, then square each difference. The formula looks like this:

$$ \sum_{i=1}^{N} (x_i - \mu)^2 $$

where ∑ means sum, $x_i$ represents each data set value and µ represents the mean value. Continuing with the example set, the values become:

$$
\begin{aligned}
20 - 25 &= -5 \text{ and } (-5)^2 = 25\\
24 - 25 &= -1 \text{ and } (-1)^2 = 1\\
25 - 25 &= 0 \text{ and } 0^2 = 0\\
36 - 25 &= 11 \text{ and } 11^2 = 121\\
25 - 25 &= 0 \text{ and } 0^2 = 0\\
22 - 25 &= -3 \text{ and } (-3)^2 = 9\\
23 - 25 &= -2 \text{ and } (-2)^2 = 4
\end{aligned}
$$

Adding the squared differences yields:

25 + 1 + 0 + 121 + 0 + 9 + 4 = 160

Divide the sum of the squared differences by one less than the number of data points. The example data set has 7 values, so N − 1 equals 7 − 1 = 6. The sum of the squared differences, 160, divided by 6 equals approximately 26.6667.

Calculate the standard deviation by taking the square root of the result of dividing by N − 1. In the example, the square root of 26.6667 is approximately 5.164. Therefore, the standard deviation is approximately 5.164.
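The steps above can be followed one by one in Python. This is a minimal sketch on the example data; `statistics.stdev` is the standard-library sample standard deviation, which also divides by N − 1, so it should agree with the manual steps.

```python
import math
import statistics

data = [20, 24, 25, 36, 25, 22, 23]  # example data set from the text

mu = sum(data) / len(data)                      # step 1: mean = 25.0
squared_diffs = [(x - mu) ** 2 for x in data]   # step 2: 25, 1, 0, 121, 0, 9, 4
ss = sum(squared_diffs)                         # step 3: 160.0
variance = ss / (len(data) - 1)                 # step 4: 160 / 6, about 26.6667
sd = math.sqrt(variance)                        # step 5: about 5.164

print(round(sd, 3))                             # 5.164
print(math.isclose(sd, statistics.stdev(data))) # True
```

If every value of a whole population is available, `statistics.pstdev` divides by N instead of N − 1.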

Standard deviation helps evaluate data. Values within one standard deviation of the mean are typical of the data set, while values more than two standard deviations from the mean are extreme values and possible outliers. In the example set, two standard deviations equal about 2 × 5.164 = 10.328, and the value 36 lies 11 above the mean, so 36 may be an outlier. Outliers may represent erroneous data or may suggest unforeseen circumstances, and they should be carefully considered when interpreting data.
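The two-standard-deviation rule of thumb described above can be applied mechanically. A minimal sketch, assuming the sample standard deviation (N − 1) and a threshold of exactly 2 as in the text:

```python
import statistics

data = [20, 24, 25, 36, 25, 22, 23]  # example data set from the text
mu = statistics.mean(data)           # 25
sd = statistics.stdev(data)          # about 5.164 (divides by N - 1)

# Flag values lying more than two standard deviations from the mean
flagged = [x for x in data if abs(x - mu) > 2 * sd]
print(flagged)  # [36]
```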


Example 1: Study the bar graph given below and find the mean, median, and mode of the given data set.


Find the mean, median, mode, standard deviation and range for the given data

190, 153, 168, 179, 194, 153, 165, 187, 190, 170, 165, 189, 185, 153, 147, 161, 127, 180

Find the mean, median, mode and standard deviation of the data: 25, 12, 5, 24, 15, 22, 23, 25
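Answers to the two exercises can be checked with the standard-library `statistics` module. This is a sketch; the helper name `summarize` is hypothetical. It uses the sample standard deviation (N − 1), matching the formula in this unit, and `statistics.multimode` for the mode.

```python
import statistics

def summarize(data):
    """Return mean, median, mode(s), sample standard deviation and range."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.multimode(data),
        "sd": statistics.stdev(data),
        "range": max(data) - min(data),
    }

exercise_1 = [190, 153, 168, 179, 194, 153, 165, 187, 190,
              170, 165, 189, 185, 153, 147, 161, 127, 180]
exercise_2 = [25, 12, 5, 24, 15, 22, 23, 25]

for name, data in [("Exercise 1", exercise_1), ("Exercise 2", exercise_2)]:
    s = summarize(data)
    print(f"{name}: mean={s['mean']:.3f}, median={s['median']}, "
          f"mode={s['mode']}, sd={s['sd']:.3f}, range={s['range']}")
```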


Calculating Median

The median identifies the midpoint or middle value of a set of numbers.

Put the numbers in order from smallest to largest. Use the example set of values: 20, 24, 25, 36, 25, 22, 23. Placed in order, the set becomes: 20, 22, 23, 24, 25, 25, 36.

Since this set of numbers has seven values, the median is the fourth value in the ordered list, 24.

If the set of numbers has an even number of values, calculate the average of the two center values. For example, suppose the set of numbers contains the values 22, 23, 25, 26. The middle lies between 23 and 25. Adding 23 and 25 yields 48. Dividing 48 by two gives a median value of 24.
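The median procedure above (sort, then take the middle value or average the two middle values) can be sketched as follows. The function name `median` is illustrative; the standard library's `statistics.median` follows the same rule.

```python
def median(data):
    ordered = sorted(data)          # order from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd count: the single middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([20, 24, 25, 36, 25, 22, 23]))  # 24
print(median([22, 23, 25, 26]))              # 24.0
```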


INTRODUCTION TO PROBABILITY THEORY

Probability is the chance that an event occurs. In general, assessing how likely an event is, for example from previous data, is called probability. For example, if a fair coin is tossed, what is the chance that it lands on heads? Questions of this type are answered by probability. In this section, we will learn about probability theory, its formulas, and related concepts in detail.

I. What is Probability Theory?

Probability theory uses the concepts of random variables and probability distributions to describe how likely the possible outcomes of a situation are. Probability theory is a branch of mathematics that deals with the likelihood of events occurring.

II. How is Flipping a Coin Related to Probability?

The result of a coin flip is random: it may be heads or tails. For a fair coin, heads and tails are equally likely, so each has a 50-50 chance. Thus, the probability of heads (or of tails) is 1/2.

III. Probability Theory Definition

Probability theory studies random events and describes how likely they are to occur. The two main approaches to studying probability theory are:

  • Theoretical Probability
  • Experimental Probability

IV. Theoretical and Experimental Probabilities

The image given below shows the Theoretical and Experimental Probabilities and their differences.

V. Theoretical Probability

Theoretical probability relies on assumptions in order to avoid unfeasible or expensive repetitions of experiments. The theoretical probability of an event A can be calculated as follows:

P(A) = (Number of outcomes favourable to Event A) / (Number of all possible outcomes)

The image shown below shows the theoretical probability formula.

Note: This formula assumes that all outcomes of the experiment are equally likely.

Now let’s apply this formula to the coin-tossing case. Tossing a coin has two outcomes: head or tail. Hence, the probability of getting a head on tossing a coin is

P(H) = 1/2

Similarly, the probability of getting a tail on tossing a coin is P(T) = 1/2.

The following image shows an unbiased coin, which is equally likely to land heads or tails.

VI. Experimental Probability

Experimental probability is found by performing a series of experiments and observing their outcomes. These random experiments are also known as trials. The experimental probability for Event A can be calculated as follows:

P(A) = (Number of times event A happened) / (Total number of trials)

The following image shows the experimental probability formula.

Now let’s apply this formula to the coin-tossing case. If we toss a coin 10 times and record 4 heads and 6 tails, then the experimental probability of getting a head is:

P(H) = 4/10

Similarly, the Probability of Occurrence of Tails on tossing a coin: P(T) = 6/10
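The experimental formula just counts how often the event happened. A minimal sketch: the order of the 10 recorded tosses below is hypothetical (only the totals, 4 heads and 6 tails, come from the text), and the simulation uses an arbitrary seed so that it is repeatable. With many trials of a fair coin, the experimental probability is expected to come close to the theoretical value 1/2.

```python
import random

def experimental_probability(outcomes, event):
    """(Number of times the event happened) / (total number of trials)."""
    return sum(1 for o in outcomes if o == event) / len(outcomes)

# Hypothetical order of the 10 recorded tosses: 4 heads, 6 tails
recorded = ["H", "T", "T", "H", "T", "H", "T", "T", "H", "T"]
print(experimental_probability(recorded, "H"))  # 0.4
print(experimental_probability(recorded, "T"))  # 0.6

# Simulated tosses of a fair coin (seed 42 chosen arbitrarily)
rng = random.Random(42)
tosses = [rng.choice("HT") for _ in range(100_000)]
print(experimental_probability(tosses, "H"))    # close to 0.5
```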

VII. Probability Theory Example

We can study the concept of probability with the help of the example discussed below,

Example: Two fair dice are rolled. Find the probability of getting a total of 10.

Solution:

The sample space of all possible outcomes is {(1,1), (1,2), …, (1,6), …, (6,6)}, which contains 6 × 6 = 36 outcomes.

The favourable outcomes, whose faces add up to 10, are {(4,6), (5,5), (6,4)}. So the probability of getting a total of 10 is 3/36 = 1/12.
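The dice example can be verified by listing the whole sample space. A minimal sketch using `itertools.product` for the 36 ordered pairs and `fractions.Fraction` for an exact answer:

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))
favourable = [pair for pair in sample_space if sum(pair) == 10]

print(len(sample_space))  # 36
print(favourable)         # [(4, 6), (5, 5), (6, 4)]
print(Fraction(len(favourable), len(sample_space)))  # 1/12
```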

VIII. Basics of Probability Theory

Various terms used in probability theory are discussed below.

IX. Random Experiment

In probability theory, an experiment that can be repeated any number of times under the same conditions, and whose outcome is uncertain and not affected by the repetitions, is called a random experiment. Tossing a coin, rolling a die, etc. are random experiments.

X. Sample Space

The set of all possible outcomes of a random experiment is called the sample space. For example, rolling a die results in six outcomes: 1, 2, 3, 4, 5, and 6. Thus, its sample space is {1, 2, 3, 4, 5, 6}.

XI. Event

An outcome, or a set of outcomes, of a random experiment is called an event. Various types of events used in probability theory are:

  • Independent Events: Events whose outcomes are not affected by the outcomes of other past or future events are called independent events. For example, the outcome of a coin toss is not affected by the outcome of the previous toss.
  • Dependent Events: Events whose outcomes are affected by the outcomes of other events are called dependent events. For example, picking oranges one after another, without replacement, from a bag that contains 100 oranges: each pick changes what is left in the bag.
  • Mutually Exclusive Events: Events that cannot occur simultaneously are called mutually exclusive events. For example, obtaining a head and obtaining a tail in a single coin toss, because a head and a tail cannot be obtained together.
  • Equally Likely Events: Events that have an equal chance of happening are called equally likely events. For example, each face of a fair die has the same probability, 1/6, of showing up.

XII. Random Variable

A variable whose value is determined by the outcome of a random experiment, and which can take the value associated with any possible outcome, is called a random variable in probability theory. Random variables are of two types:

Discrete Random Variable: Variables that can take only countable values, such as 0, 1, 2, …, are called discrete random variables.

Continuous Random Variable: Variables that can take any value in a given range (uncountably many values) are called continuous random variables.

XIII. Probability Theory Formulas

There are various formulas used in probability theory; some of them are listed below:

  • Theoretical Probability Formula: (Number of Favourable Outcomes) / (Number of Total Outcomes)
  • Empirical Probability Formula: (Number of times event A happened) / (Total number of trials)
  • Addition Rule of Probability: P(A∪B) = P(A) + P(B) – P(A∩B)
  • Complementary Rule of Probability: P(A’) = 1 – P(A)
  • Independent Events: P(A∩B) = P(A) P(B)
  • Conditional Probability: P(A | B) = P(A∩B) / P(B)
  • Bayes’ Theorem: P(A | B) = P(B | A) P(A) / P(B)
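The formulas above can be checked by counting outcomes on a small sample space. A minimal sketch for one roll of a fair die; the events A (an even number) and B (a number greater than 3) are illustrative choices, not from the text.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die, outcomes equally likely

def P(event):
    """Theoretical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # illustrative event: an even number
B = {4, 5, 6}  # illustrative event: a number greater than 3

assert P(A | B) == P(A) + P(B) - P(A & B)   # addition rule
assert P(S - A) == 1 - P(A)                 # complementary rule
p_a_given_b = P(A & B) / P(B)               # conditional probability
p_b_given_a = P(A & B) / P(A)
assert p_a_given_b == p_b_given_a * P(A) / P(B)  # Bayes' theorem

print(P(A | B), p_a_given_b)        # 2/3 2/3
# Here P(A∩B) = 1/3 but P(A)P(B) = 1/4, so A and B are not independent
print(P(A & B) == P(A) * P(B))      # False
```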

XIV. Applications of Probability Theory

Probability theory is widely used in everyday life to answer questions such as: Will it rain tomorrow? What is the chance of landing on the Moon? What is the chance of the evolution of humans? Some of the important uses of probability theory are:

  • Probability theory is used to predict the performance of stocks and bonds.
  • Probability theory is used in casinos and gambling.
  • Probability theory is used in weather forecasting.
  • Probability theory is used in risk mitigation.
  • Probability theory is used in consumer industries to mitigate the risk of product failure.

XV. Solved Examples on Probability

Example 1: Consider a jar with 7 red marbles, 3 green marbles, and 4 blue marbles. What is the probability of randomly selecting a non-blue marble from the jar?

Solution:

Given,

Number of red marbles = 7, number of green marbles = 3, number of blue marbles = 4.

So, the total number of possible outcomes in this case is 7 + 3 + 4 = 14.

Now, the number of non-blue marbles is 7 + 3 = 10.

According to the formula of theoretical Probability we can find, P(Non-Blue) = 10/14 = 5/7

Hence, theoretical probability of selecting a non-blue marble is 5/7.

Example 2: Consider two players, Naveena and Isha, playing a table tennis match. The probability of Naveena winning the match is 0.62. What is the probability of Isha winning the match?

Solution:

Let N and I represent the events that Naveena wins the match and Isha wins the match, respectively.

The probability of Naveena winning, P(N) = 0.62 (given)

The probability of Isha winning, P(I) = ?

The two events are mutually exclusive, since only one of them can win the match, and together they cover all possible results, since one of them must win.

Therefore,

P(N) + P(I) =1

P(I) = 1 – P(N)

P(I) = 1 – 0.62 = 0.38

Thus, the Probability of Isha winning the match is 0.38.

Example 3: If someone draws one card from a 52-card deck, what is the probability that the card is a heart? What is the probability that it is a 7?

Solution:

Total number of cards in a deck = 52

Total number of hearts in a deck = 13

So, the probability of obtaining a heart, P(heart) = 13/52 = 1/4

Total number of 7s in a deck = 4

So, the probability of obtaining a 7, P(7) = 4/52 = 1/13

Example 4: Find the probability of rolling an even number when you roll a die containing the numbers 1-6. Express the probability as a fraction, decimal, ratio, or percent.

Solution:

Of the numbers 1 to 6, the even numbers are 2, 4, and 6.

So, Number of favorable outcomes = 3. Total number of outcomes = 6.

Probability of obtaining an even number, P(Even) = 3/6 = 1/2 = 0.5 = 1 : 2 = 50%
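Examples 1, 3 and 4 can be verified by building each set of equally likely outcomes explicitly. A minimal sketch; the rank and suit labels for the deck are illustrative, and only the counts (13 ranks, 4 suits) matter.

```python
from fractions import Fraction
from itertools import product

# Example 1: 7 red, 3 green and 4 blue marbles
marbles = ["red"] * 7 + ["green"] * 3 + ["blue"] * 4
p_non_blue = Fraction(sum(m != "blue" for m in marbles), len(marbles))
print(p_non_blue)  # 5/7

# Example 3: a standard 52-card deck (13 ranks x 4 suits)
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
p_heart = Fraction(sum(s == "hearts" for _, s in deck), len(deck))
p_seven = Fraction(sum(r == "7" for r, _ in deck), len(deck))
print(p_heart, p_seven)  # 1/4 1/13

# Example 4: an even number on a fair six-sided die
die = range(1, 7)
p_even = Fraction(sum(n % 2 == 0 for n in die), len(die))
print(p_even, float(p_even), f"{float(p_even):.0%}")  # 1/2 0.5 50%
```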

PROBABILITY DISTRIBUTION

This section covers two main types of probability distribution: the normal probability distribution (or Gaussian distribution) and the binomial probability distribution. Recall that a table that assigns a probability to each of the possible outcomes of a random experiment is a probability distribution table. In simple words, it gives the probability of each value of the random variable.

A. Formulas for Probability Distribution

The formulas for the two types of probability distribution are:

1) Normal Probability Distribution Formula

Also known as the Gaussian distribution, it is described by a bell-shaped equation or graph.

The formula for the normal probability distribution is:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Where,

  • μ = Mean
  • σ = Standard deviation.
  • x = Normal random variable.

Note: If mean (μ) = 0 and standard deviation (σ) = 1, then this distribution is known as the standard normal distribution.

2) Binomial Probability Distribution Formula

It gives the probability of obtaining exactly r successes in n repeated, independent trials, where each trial results in either success or failure.

The formula for binomial probability is:

$$ P(r) = {}^nC_r\, p^r (1-p)^{n-r} $$

Where,

  • n = Total number of trials
  • r = Number of successful trials
  • p = Probability of success on a single trial
  • 1 – p = Probability of failure

3) Solved Probability Distribution Example Questions

Question 1: Calculate the probability of getting 8 tails, if a coin is tossed 10 times.

Solution:

Given,

Number of trials (n) = 10

Number of successes (r) = 8 (getting 8 tails)

Probability of success on a single trial (p) = 1/2 = 0.5

To find nCr:

$$ {}^{10}C_8 = \frac{10!}{8!\,(10-8)!} = \frac{10 \times 9 \times 8!}{8!\,2!} = 45 $$

To find p^r: 0.5^8 = 0.00390625

So, the probability of getting 8 tails is:

$$ P(x) = {}^nC_r\, p^r (1-p)^{n-r} = 45 \times 0.00390625 \times (1 - 0.5)^{10-8} = 0.17578125 \times 0.5^2 = 0.0439453125 $$

The probability of getting 8 tails = 0.0439
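The binomial formula translates directly into code. A minimal sketch using `math.comb` (available from Python 3.8); the helper name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p**r * (1 - p)**(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

print(comb(10, 8))               # 45
print(binomial_pmf(8, 10, 0.5))  # 0.0439453125
```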

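Question 2: Find the value of the normal probability density function at x = 5 for a population with mean 2 and standard deviation 3.

Solution:

Given, x = 5

Mean = μ = 2

Standard deviation = σ = 3

Normal probability distribution:

$$ f(5) = \frac{1}{3\sqrt{2\pi}}\, e^{-\frac{(5-2)^2}{2 \times 3^2}} = \frac{1}{3\sqrt{2\pi}}\, e^{-1/2} \approx 0.0807 $$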
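The normal density can be evaluated directly from its formula. A minimal sketch; the helper name `normal_pdf` is hypothetical. Note that the result is a density value, not a probability: probabilities for a continuous variable come from areas under this curve.

```python
import math

def normal_pdf(x, mu, sigma):
    """f(x) = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

print(round(normal_pdf(5, mu=2, sigma=3), 4))  # 0.0807
```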

Question 3: The probability that a man hits the target is ¼. If he fires 9 times, find the probability that he hits the target exactly 4 times.

Solution:

Number of fires = n = 9

Number of successful hits = r = 4

Probability of hitting the target = p = ¼

Probability of not hitting the target = q = 1 – p = 1 – (¼) = ¾

Finding nCr:

9C4 = 9!/[4! (9-4)!] = 9!/(4! 5!) = (9 × 8 × 7 × 6 × 5!)/(4 × 3 × 2 × 1 × 5!) = 126

Probability that the person hits the target exactly 4 times:

$$ {}^9C_4 \left(\tfrac{1}{4}\right)^4 \left(\tfrac{3}{4}\right)^{9-4} = 126 \times \frac{1}{256} \times \frac{243}{1024} \approx 0.1168 $$
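Using exact fractions avoids rounding until the final step. A minimal sketch of Question 3 with `fractions.Fraction` and `math.comb`:

```python
from fractions import Fraction
from math import comb

n, r = 9, 4
p = Fraction(1, 4)   # probability of a hit
q = 1 - p            # probability of a miss

prob = comb(n, r) * p**r * q ** (n - r)
print(prob)                    # 15309/131072
print(round(float(prob), 4))   # 0.1168
```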


BINOMIAL DISTRIBUTION

The binomial distribution formula gives the probability of getting “x” successes in “n” independent trials of a binomial experiment. Recall that the binomial distribution is a probability distribution for experiments in which each trial has two possible outcomes. In probability theory, the binomial distribution has two parameters, n and p.

The probability distribution becomes a binomial probability distribution when it meets the following requirements.

  1. Each trial can have only two outcomes or the outcomes that can be reduced to two outcomes. These outcomes can be either a success or a failure.
  2. The number of trials must be fixed.
  3. The outcome of each trial must be independent of the others.
  4. The probability of success must remain the same for each trial.

Binomial Distribution Formula in Probability

The formula for the binomial probability distribution is:

$$ P(r) = {}^nC_r\, p^r (1-p)^{n-r} $$

Where,

  • n = Total number of trials
  • r (or) x = Number of successful trials
  • p = Probability of success on a single trial
  • nCr = n! / [r! (n − r)!]
  • 1 – p = Probability of failure.


1) Examples on Binomial Distribution Formula

Example 1:

A coin is tossed 12 times. What is the probability of getting exactly 7 heads?

Solution:

Given that a coin is tossed 12 times, i.e., n = 12.

Binomial distribution formula:

$$ P(x) = {}^nC_x\, p^x (1-p)^{n-x} $$

or, equivalently,

$$ P(r) = \frac{n!}{r!\,(n-r)!}\, p^r (1-p)^{n-r} $$

For a fair coin, the probability of getting a head in a single toss is ½, i.e., p = ½.

So, 1 − p = 1 − ½ = ½.

We know that the binomial probability distribution is $P(r) = {}^nC_r\, p^r (1-p)^{n-r}$. Now, we have to find the probability of getting exactly 7 heads, i.e., r = 7.

Substituting the values in the binomial distribution formula, we get

$$ P(7) = {}^{12}C_7 \left(\tfrac{1}{2}\right)^7 \left(\tfrac{1}{2}\right)^{12-7} = 792 \left(\tfrac{1}{2}\right)^7 \left(\tfrac{1}{2}\right)^5 = 792 \left(\tfrac{1}{2}\right)^{12} = \frac{792}{4096} \approx 0.193 $$

Therefore, the probability of getting exactly 7 heads is 0.193.
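A quick exact check of Example 1, as a minimal sketch with `math.comb` and `fractions.Fraction`:

```python
from fractions import Fraction
from math import comb

n, r, p = 12, 7, Fraction(1, 2)
prob = comb(n, r) * p**r * (1 - p) ** (n - r)

print(comb(n, r), prob, round(float(prob), 3))  # 792 99/512 0.193
```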

Example 2:

A fair coin is tossed n times. The probability of getting a head exactly 6 times is the same as the probability of getting a head exactly 8 times. Find the value of n.

Solution:

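The probability that a head occurs 6 times is ${}^nC_6 (\tfrac12)^6 (\tfrac12)^{n-6}$. Similarly, the probability that a head occurs 8 times is ${}^nC_8 (\tfrac12)^8 (\tfrac12)^{n-8}$.

Given that the probability of getting a head six times is the same as the probability of getting a head 8 times,

$$ {}^nC_6 \left(\tfrac{1}{2}\right)^6 \left(\tfrac{1}{2}\right)^{n-6} = {}^nC_8 \left(\tfrac{1}{2}\right)^8 \left(\tfrac{1}{2}\right)^{n-8} $$

$$ {}^nC_6 \left(\tfrac{1}{2}\right)^n = {}^nC_8 \left(\tfrac{1}{2}\right)^n $$

$$ {}^nC_6 = {}^nC_8 $$

Since ${}^nC_r = {}^nC_{n-r}$ and 6 ≠ 8, this requires 6 = n − 8, so

n = 14.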

Therefore, the value of n is 14.
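The result can be confirmed by brute force. A minimal sketch: because both probabilities share the factor (1/2)^n, it is enough to search for n with nC6 = nC8. The upper search bound of 100 is an arbitrary choice for illustration.

```python
from math import comb

# n must be at least 8 for a head to occur 8 times
solutions = [n for n in range(8, 100) if comb(n, 6) == comb(n, 8)]
print(solutions)  # [14]
```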

Example 3:

The probability that a person hits a target is 3/4, and he makes 5 attempts. What is the probability that he hits the target at least three times?

Solution:

Given that, p = ¾, q = ¼, n = 5.

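Using the binomial distribution formula, we get $P(X = x) = {}^nC_x\, p^x (1-p)^{n-x}$.

Thus, the required probability is P(X = 3) + P(X = 4) + P(X = 5)

$$ = {}^5C_3 \left(\tfrac{3}{4}\right)^3 \left(\tfrac{1}{4}\right)^2 + {}^5C_4 \left(\tfrac{3}{4}\right)^4 \left(\tfrac{1}{4}\right)^1 + {}^5C_5 \left(\tfrac{3}{4}\right)^5 $$

$$ = \frac{270}{1024} + \frac{405}{1024} + \frac{243}{1024} = \frac{918}{1024} = \frac{459}{512} $$

Therefore, the probability that the person will attain the target at least three times is 459/512.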
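"At least three" means summing the binomial probabilities for 3, 4 and 5 successes. A minimal exact check:

```python
from fractions import Fraction
from math import comb

n, p = 5, Fraction(3, 4)
at_least_three = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3, n + 1))

print(at_least_three)  # 459/512
```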

NORMAL DISTRIBUTION

In probability and statistics, the normal distribution, also called the Gaussian distribution or bell curve, is one of the most important continuous probability distributions. It is described by its probability density function f(x) for a continuous random variable x. Many natural phenomena, such as height, blood pressure, and the lengths of objects produced by machines, approximately follow a normal distribution. Here, we discuss the normal distribution formula and examples in detail.

C. Normal Distribution Formula

For a random variable x, with mean “μ” and standard deviation “σ”, the probability density function for the normal distribution is given by:

Normal Distribution Formula:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Where

μ = Mean

σ = Standard deviation

x = Normal random variable

1) Solved Example on Normal Distribution Formula

Example:

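Find the value of the probability density function for the normal distribution where mean = 4, standard deviation = 2, and x = 3.

Solution: Given: Mean, μ = 4

Standard deviation, σ = 2

Random variable, x = 3.

We know that the normal distribution formula is:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

Substituting the given values,

$$ f(3) = \frac{1}{2\sqrt{2\pi}}\, e^{-\frac{(3-4)^2}{2 \times 2^2}} = \frac{1}{2\sqrt{2\pi}}\, e^{-1/8} \approx 0.17603 $$

Therefore, the value of the probability density function at x = 3 is approximately 0.17603.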
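Python's standard library (3.8+) provides `statistics.NormalDist`, whose `pdf` method evaluates the same density formula. A minimal check of this example:

```python
from statistics import NormalDist

density = NormalDist(mu=4, sigma=2).pdf(3)
print(round(density, 5))  # 0.17603
```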

POISSON DISTRIBUTION FORMULA

The Poisson distribution is another probability distribution. Unlike the binomial distribution, it does not need the number of trials or the probability of success on a single trial. Instead, it uses the average number of successes in a given time interval. This average is called “lambda” and is denoted by the symbol “λ”.

The formula for the Poisson distribution is given below:

$$ P(x) = \frac{e^{-\lambda}\lambda^{x}}{x!} $$

where

λ is the average number of successes in the given interval.

x is a Poisson random variable.

e is the base of the natural logarithm, e ≈ 2.71828.

2) Solved Example

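Question: Only 3 students came to attend the class today. Taking 3 as the average number of students attending per day, find the probability that exactly 4 students attend the class tomorrow.

Solution:

Given,

Average rate (λ) = 3

Poisson random variable (x) = 4

$$ P(4) = \frac{e^{-3}\, 3^4}{4!} = \frac{81\, e^{-3}}{24} \approx 0.1680 $$

So, the probability that exactly 4 students attend the class tomorrow is approximately 0.168.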
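The Poisson formula is short enough to code directly. A minimal sketch; the helper name `poisson_pmf` is hypothetical.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

print(round(poisson_pmf(4, lam=3), 4))  # 0.168
```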

GAUSSIAN DISTRIBUTION

The Gaussian distribution is a very common continuous probability distribution. Gaussian distributions are important in statistics and are often used in the natural and social sciences to represent real-valued random variables. The Gaussian distribution formula is given below.

D. Formula of Gaussian Distribution

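The probability density function for the Gaussian distribution is given by:

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} $$

where μ is the mean and σ is the standard deviation.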
