MATHEMATICS IN DATA SCIENCE

BY MANASVI LOGANI

BSc (Hons) Mathematics

Indraprastha College for Women

What is Data Science?

Data Science is concerned with the extraction, preparation, analysis, visualization, and maintenance of information. It is a multidisciplinary field that combines mathematics, statistics, and computer science, using scientific methods and processes to draw insights from data.

Purpose of Data Science

  • Analyze data and draw insights from it.
  • Make predictions from data.
  • Derive conclusions from data and use them to help companies make smarter business decisions.

TOP COMPANIES RELY ON DATA SCIENCE

APPLE

AMAZON

NETFLIX

650%: growth in data scientist roles since 2012.

11.5 million: jobs in data science projected to be created by 2026, according to the US Bureau of Labor Statistics.

Mathematics for Data Science

Linear Algebra

  • Linear algebra is widely used in image recognition, text analysis, and dimensionality reduction.
  • It uses matrices to store images and to design classification algorithms.
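
This is a minimal sketch of the idea above. It assumes a tiny 3x3 grayscale image with pixel intensities in [0, 1]. The image is stored as a matrix (a list of rows), flattened into a feature vector, and scored by a linear classifier as a dot product with a weight vector plus a bias. The weights, the bias, and the function names (`flatten`, `linear_score`, `classify`) are hypothetical and chosen only for illustration. A real classifier would learn its weights from data.

```python
# A 3x3 grayscale "image" stored as a matrix: each inner list is one row of pixels.
# The intensities are hypothetical values in [0, 1].
image = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]

def flatten(matrix):
    """Turn a matrix (list of rows) into one feature vector, row by row."""
    return [value for row in matrix for value in row]

def linear_score(weights, features, bias):
    """Dot product w . x + b, the core operation of a linear classifier."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(matrix, weights, bias):
    """Return 'vertical line' if the score is positive, else 'other'."""
    score = linear_score(weights, flatten(matrix), bias)
    return "vertical line" if score > 0 else "other"

# Hypothetical weights: reward bright pixels in the middle column,
# penalize bright pixels in the outer columns.
weights = [
    -1.0, 1.0, -1.0,
    -1.0, 1.0, -1.0,
    -1.0, 1.0, -1.0,
]
bias = -1.5

print(classify(image, weights, bias))  # vertical line
```

A horizontal line such as `[[0, 0, 0], [1, 1, 1], [0, 0, 0]]` scores negative under these weights and is labeled "other".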

WHAT DO YOU SEE HERE?

XGBOOST

  • XGBoost stores numeric data in a matrix (its DMatrix data structure) to make predictions. This representation enables XGBoost to process data faster and more memory-efficiently.
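
XGBoost is a gradient-boosted decision tree library. The sketch below is not XGBoost and does not use its API. It is a minimal pure-Python illustration of the same boosting idea, and it assumes squared-error loss and one-split "stumps" in place of full trees. The training data is a hypothetical feature matrix `X` (one row per sample, one column per feature) with targets `y`. The names `fit_stump`, `boost`, and `predict` are illustrative only.

```python
def fit_stump(X, residuals):
    """Find the split (feature j, threshold) that best fits the residuals under
    squared error; each side of the split predicts its mean residual."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            left_val = sum(left) / len(left)
            right_val = sum(right) / len(right)
            error = sum((r - left_val) ** 2 for r in left) + sum(
                (r - right_val) ** 2 for r in right
            )
            if best is None or error < best[0]:
                best = (error, j, threshold, left_val, right_val)
    if best is None:  # no valid split: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return 0, float("inf"), mean, mean
    _, j, threshold, left_val, right_val = best
    return j, threshold, left_val, right_val

def stump_predict(stump, row):
    j, threshold, left_val, right_val = stump
    return left_val if row[j] <= threshold else right_val

def boost(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the current residuals
    (the negative gradient of the loss) and is added with a shrinkage factor."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, predictions)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump_predict(stump, row)
                       for p, row in zip(predictions, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, row) for s in stumps)

# Hypothetical data: one feature column, with the target jumping between x=2 and x=3.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 3.0]
model = boost(X, y)
print(round(predict(model, [1.0]), 3), round(predict(model, [4.0]), 3))  # 1.0 3.0
```

The real library grows deeper trees, adds regularization, and works on its optimized DMatrix format rather than Python lists.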

NEURAL NETWORKS

  • A neural network is a series of algorithms that attempts to recognize underlying relationships in a set of data through a process loosely modeled on the way the human brain operates.
  • The weights learned by a neural network are also stored in matrices.
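
This is a minimal sketch of how a network's weights live in a matrix. It shows one fully connected layer and assumes a hypothetical 2x3 weight matrix `W`, a bias vector `b`, and a ReLU activation. The layer computes `relu(W x + b)` using a matrix-vector product. The numbers are made up; a trained network would learn `W` and `b` from data.

```python
def mat_vec(W, x):
    """Matrix-vector product: each output is the dot product of one row of W with x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def relu(v):
    """Element-wise ReLU activation: max(0, value)."""
    return [max(0.0, value) for value in v]

def dense_layer(W, b, x):
    """One fully connected layer: relu(W x + b)."""
    return relu([z + b_i for z, b_i in zip(mat_vec(W, x), b)])

# Hypothetical weights: 2 output neurons and 3 inputs, so W is a 2x3 matrix.
W = [
    [0.5, -1.0, 2.0],
    [1.0, 1.0, -1.0],
]
b = [0.1, -0.5]
x = [1.0, 2.0, 3.0]

print(dense_layer(W, b, x))
```

Stacking several such layers, each with its own weight matrix, gives a deeper network.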

CALCULUS

Calculus is essential to optimization techniques, and a deep knowledge of machine learning is not possible without it.

The two main branches of calculus are:

  • Differential Calculus
  • Integral Calculus

Differential Calculus

Differential calculus studies the rate at which quantities change.

  • Derivatives are used in optimization techniques to find the minimum of the error function.
  • Partial derivatives are used to design backpropagation in neural networks.
  • The chain rule is another important concept used to compute backpropagation.
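
The three ideas above combine in a minimal sketch. It assumes a hypothetical one-weight model `y_hat = w * x + b` and a mean squared error function. The chain rule gives the partial derivatives of the error with respect to `w` and `b`, which is the same step backpropagation repeats layer by layer. Gradient descent then moves the parameters against those derivatives toward the minimum. The data, learning rate, and step count are illustrative choices.

```python
def gradients(w, b, xs, ys):
    """Partial derivatives of E = mean((w*x + b - y)^2) with respect to w and b.

    Chain rule: dE/dw = mean(dE/dy_hat * dy_hat/dw) = mean(2 * (y_hat - y) * x)
                dE/db = mean(dE/dy_hat * dy_hat/db) = mean(2 * (y_hat - y) * 1)
    """
    n = len(xs)
    dw = db = 0.0
    for x, y in zip(xs, ys):
        y_hat = w * x + b                 # forward pass
        d_error_d_yhat = 2 * (y_hat - y)  # derivative of the squared error
        dw += d_error_d_yhat * x          # times d(y_hat)/dw = x
        db += d_error_d_yhat              # times d(y_hat)/db = 1
    return dw / n, db / n

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly step against the gradient to minimize the error function."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        dw, db = gradients(w, b, xs, ys)
        w -= learning_rate * dw
        b -= learning_rate * db
    return w, b

# Hypothetical data generated by y = 2x + 1, so the minimum is at w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

At the minimum both partial derivatives are zero, which is the condition gradient descent drives toward.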

Integral Calculus

  • Integral calculus is the mathematical study of the accumulation of quantities, used for example to find the area under a curve.
  • Integration is widely used with probability density functions, for example to compute probabilities and the mean and variance of a continuous random variable.
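
For a continuous random variable X with probability density function f, the mean and variance are integrals: E[X] = ∫ x f(x) dx and Var(X) = ∫ (x - E[X])^2 f(x) dx. The sketch below approximates both with the composite trapezoidal rule, which sums the areas of thin trapezoids under the curve. It assumes two example densities: a uniform density on [0, 1], whose exact variance is 1/12, and a standard normal density truncated to [-10, 10], whose variance is essentially 1. The helper names and the number of subintervals are illustrative choices.

```python
import math

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule: approximates the area under f on [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def mean_and_variance(pdf, a, b):
    """Mean and variance of a continuous random variable whose density is ~0 outside [a, b]."""
    mean = trapezoid(lambda x: x * pdf(x), a, b)
    variance = trapezoid(lambda x: (x - mean) ** 2 * pdf(x), a, b)
    return mean, variance

def uniform_pdf(x):
    """Density of the uniform distribution on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

print(trapezoid(uniform_pdf, 0.0, 1.0))            # total probability, close to 1
print(mean_and_variance(uniform_pdf, 0.0, 1.0))    # close to (0.5, 1/12)
print(mean_and_variance(normal_pdf, -10.0, 10.0))  # close to (0.0, 1.0)
```

The first line checks that the density integrates to 1, a property every probability density function must have.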

"In the next 10 years, data science and software will do more for medicine than all of the biological sciences together."

- Vinod Khosla, American billionaire businessman and co-founder of Sun Microsystems

THANK YOU!