Introduction to Data Science

By

S.V.V.D.Jagadeesh

Sr. Assistant Professor

Dept of Artificial Intelligence & Data Science

LAKIREDDY BALI REDDY COLLEGE OF ENGINEERING

Wednesday, December 18, 2024

Previous Class Discussions

  • Session Outcomes
  • Facets of Data
  • Structured Data-Example
  • Unstructured Data-Example
  • Natural Language
  • Machine-Generated Data-Example
  • Graph-Based or Network Data-Example
  • Audio, Image and Video Data
  • Streaming Data

LBRCE

IDS

Session Outcomes

At the end of this session, students will be able to:

  • CO1: Understand the data science process. (Understand, L2)

Data Science Process

The data science process typically consists of six steps:

  • Setting the Research Goal
  • Retrieving Data
  • Data Preparation
  • Data Exploration
  • Data Modeling
  • Presentation and Automation

Setting the Research Goal

  • Data science is mostly applied in the context of an organization.
  • When the business asks you to perform a data science project, you’ll first prepare a project charter.
  • This charter contains information such as what you’re going to research, how the company benefits from that, what data and resources you need, a timetable, and deliverables.
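The charter contents listed for the research goal step can be written down as a small record. The sketch below is only an illustration: the class name `ProjectCharter`, its field names, and the sample values are all assumptions made for this example, not a standard template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ProjectCharter:
    """Hypothetical container for the items a project charter should state."""
    research_goal: str        # what you are going to research
    business_benefit: str     # how the company benefits
    data_needed: list         # which data you need
    resources: list           # people, tools, budget
    deadline: date            # simplified stand-in for a timetable
    deliverables: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of charter sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Made-up sample values, for illustration only.
charter = ProjectCharter(
    research_goal="Find the main drivers of customer churn",
    business_benefit="Retain more customers",
    data_needed=["customer records"],
    resources=["one analyst"],
    deadline=date(2025, 1, 31),
)
print(charter.missing_sections())  # the deliverables have not been filled in yet
```

Checking for empty sections before the project starts is one simple way to confirm that the goal has been scoped.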

Retrieving Data

  • The second step is to collect data.
  • You’ve stated in the project charter which data you need and where you can find it.
  • In this step you ensure that you can use the data in your program, which means checking that the data exists, that its quality is adequate, and that you have access to it.
  • Data can also be delivered by third-party companies, and it takes many forms, ranging from Excel spreadsheets to different types of databases.
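The three checks named for the retrieval step (existence, quality, access) can be sketched for the simple case of a CSV file. The function name `check_data_source`, the file layout, and the column names are assumptions chosen for this example; a database or spreadsheet source would need different checks.

```python
import csv
import os


def check_data_source(path, required_columns):
    """Return a list of problems found with a CSV source (empty if none)."""
    # Existence: is the file where the project charter says it is?
    if not os.path.isfile(path):
        return [f"missing file: {path}"]
    # Access: can this program read it?
    if not os.access(path, os.R_OK):
        return [f"no read access: {path}"]

    problems = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        # Quality (structure): are the expected columns present?
        present = [c for c in required_columns if c in header]
        for column in required_columns:
            if column not in header:
                problems.append(f"missing column: {column}")
        # Quality (content): count rows with an empty required field.
        incomplete = sum(
            1 for row in reader
            if any(not (row.get(c) or "").strip() for c in present)
        )
        if incomplete:
            problems.append(f"rows with empty required fields: {incomplete}")
    return problems
```

An empty result means the file passed these basic checks; it does not show that the values themselves are correct.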

Data Preparation

  • Data collection is an error-prone process; in this phase you enhance the quality of the data and prepare it for use in subsequent steps.
  • This phase consists of three sub-phases:
    • Data cleansing removes false values from a data source and inconsistencies across data sources.
    • Data integration enriches data sources by combining information from multiple data sources.
    • Data transformation ensures that the data is in a suitable format for use in your models.
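The three sub-phases of data preparation can be illustrated with one small function each. This is a minimal sketch on made-up records: the field names (`customer_id`, `age`, `city`), the plausible age range, and the feature layout are all assumptions for the example.

```python
def cleanse(rows):
    """Drop rows whose age is missing, non-numeric, or implausible."""
    clean = []
    for row in rows:
        try:
            age = int(row["age"])
        except (KeyError, TypeError, ValueError):
            continue  # false or unreadable value: discard the row
        if 0 <= age <= 120:  # assumed plausible range
            clean.append({**row, "age": age})
    return clean


def integrate(rows, city_by_customer):
    """Enrich each row with a city looked up in a second data source."""
    return [
        {**row, "city": city_by_customer.get(row["customer_id"])}
        for row in rows
    ]


def transform(rows):
    """Convert each row to a numeric vector: [age, 1 if city known else 0]."""
    return [[row["age"], 1 if row["city"] else 0] for row in rows]


# Hypothetical input from two sources.
customers = [
    {"customer_id": 1, "age": "34"},
    {"customer_id": 2, "age": "-5"},   # false value
    {"customer_id": 3, "age": "n/a"},  # unreadable value
    {"customer_id": 4, "age": "51"},
]
cities = {1: "CityA"}

features = transform(integrate(cleanse(customers), cities))
print(features)  # [[34, 1], [51, 0]]
```

Real projects usually do this with a data-frame library, but the order of the steps is the same: cleanse, integrate, transform.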

Data Exploration

  • Data exploration is concerned with building a deeper understanding of your data.
  • You try to understand how variables interact with each other, the distribution of the data, and whether there are outliers.
  • To achieve this you mainly use descriptive statistics, visual techniques, and simple modeling.
  • This step often goes by the abbreviation EDA, for Exploratory Data Analysis.
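As one example of the descriptive statistics used in exploration, the sketch below summarizes a single variable and flags outliers with the common 1.5 × IQR rule. The rule is an assumption chosen for this illustration (the slides do not prescribe an outlier method), and the sample values are made up.

```python
import statistics


def describe(values):
    """Summarize one numeric variable and flag outliers (1.5 * IQR rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical measurements; 95 may point to a data import error.
summary = describe([12, 14, 15, 15, 16, 17, 18, 95])
print(summary["mean"], summary["median"], summary["outliers"])
# 25.25 15.5 [95]
```

The gap between the mean and the median is itself a hint that the distribution is skewed by an extreme value.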

Data Modeling or Model Building

  • In this phase you use models, domain knowledge, and insights about the data you found in the previous steps to answer the research question.
  • You select a technique from the fields of statistics, machine learning, operations research, and so on.
  • Building a model is an iterative process that involves selecting the variables for the model, executing the model, and running model diagnostics.
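The loop of selecting variables, executing the model, and checking diagnostics can be sketched with a one-variable least-squares line and R² as the diagnostic. The technique, the variable names (`ad_spend`, `store_age`, `sales`), and the data are assumptions for this example; any other model family could take their place.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Diagnostic: share of the variance in ys explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data: two candidate variables and one target.
candidates = {"ad_spend": [1, 2, 3, 4], "store_age": [4, 1, 3, 2]}
sales = [3, 5, 7, 9]

# Iterate: fit one model per candidate variable and record its diagnostic.
scores = {}
for name, xs in candidates.items():
    slope, intercept = fit_line(xs, sales)
    scores[name] = r_squared(xs, sales, slope, intercept)

best = max(scores, key=scores.get)
print(best)  # ad_spend
```

A fuller workflow would compute the diagnostic on held-out data rather than on the data used for fitting.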

Presentation and Automation

  • Finally, you present the results to the business.
  • These results can take many forms, ranging from presentations to research reports.
  • Sometimes you’ll need to automate the execution of the process because the business will want to use the insights you gained in another project or enable an operational process to use the outcome from your model.
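One simple way to automate the process is to chain the steps as functions so the whole sequence can be rerun on new data. The sketch below assumes each step takes a list of records and returns a list; the helper name `run_pipeline` and the two toy steps are hypothetical.

```python
def run_pipeline(data, steps):
    """Apply each (name, function) step in order and keep a short log."""
    log = []
    for name, step in steps:
        data = step(data)
        log.append(f"{name}: {len(data)} records")
    return data, log


# Toy steps standing in for real preparation and modeling code.
steps = [
    ("prepare", lambda rows: [r for r in rows if r is not None]),
    ("model", lambda rows: [r * 2 for r in rows]),
]

result, log = run_pipeline([1, None, 3], steps)
print(result)  # [2, 6]
print(log)     # ['prepare: 2 records', 'model: 2 records']
```

The log doubles as a minimal report, which makes the same code usable for both presenting results and rerunning the process.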

An Iterative Process

  • The description so far may give the impression that you walk through the process in a linear way, but in reality you often have to step back and rework certain findings.
  • For instance, you might find outliers in the data exploration phase that point to data import errors.
  • As part of the data science process you gain incremental insights, which may lead to new questions.
  • To limit rework, scope the business question clearly and thoroughly at the start.

Summary

  • Session Outcomes
  • The Data Science Process
  • Setting the Research Goal
  • Retrieving Data
  • Data Preparation
  • Data Exploration
  • Data Modeling or Model Building
  • Presentation and Automation
  • An Iterative Process