DATA WRANGLING

Amity School of Engineering & Technology

What is Data Wrangling?

Data wrangling, or data munging, is a crucial process in the data analytics workflow that involves cleaning, structuring, and enriching raw data to transform it into a more suitable format for analysis. This process includes cleaning the data by removing or correcting inaccuracies, inconsistencies, and duplicates. It also involves structuring the data, often converting it into a tabular form that is easier to work with in analytical applications.

Enriching the data is another critical step, in which new information is added to make the data more useful for analysis. The result is then validated to ensure its accuracy and quality. Data wrangling makes raw data more accessible and meaningful, enabling analysts and data scientists to derive insights more efficiently and accurately.

How Does Data Wrangling Work?

Data wrangling involves several key steps that transform raw data into a format ready for analysis. This transformation is critical for uncovering insights that inform decision-making and strategic planning. Here's a detailed breakdown of how data wrangling works:

1. Collection

The first step in data wrangling is collecting raw data from various sources. These sources can include databases, files, external APIs, web scraping, and many other data streams. The data collected can be structured (e.g., SQL databases), semi-structured (e.g., JSON, XML files), or unstructured (e.g., text documents, images).
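As a rough illustration of collection, the sketch below reads one structured source (CSV) and one semi-structured source (JSON) into a single list of records using only the Python standard library. The sample data, field names, and the function name `collect_records` are assumptions made for this example; in practice the text would come from files, databases, or API responses.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for a CSV file and a JSON API response.
CSV_TEXT = "id,name,city\n1,Asha,Delhi\n2,Ravi,Mumbai\n"
JSON_TEXT = '[{"id": 3, "name": "Meera", "city": "Pune"}]'


def collect_records(csv_text, json_text):
    """Gather records from a CSV source and a JSON source into one list of dicts."""
    # csv.DictReader yields one dict per row, with every value read as a string.
    records = list(csv.DictReader(io.StringIO(csv_text)))
    # json.loads keeps the JSON types, so numbers stay numbers.
    records.extend(json.loads(json_text))
    return records


records = collect_records(CSV_TEXT, JSON_TEXT)
print(len(records), "records collected")
```

Note that the CSV rows carry `id` as text while the JSON row carries it as a number. Mismatches of this kind are typical of freshly collected data and are what the later steps are meant to resolve.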

2. Cleaning

Once data is collected, the cleaning process begins. This step removes errors, inconsistencies, and duplicates that can skew analysis results. Cleaning might involve:

Removing irrelevant data that doesn't contribute to the analysis.
Correcting errors in data, such as misspellings or incorrect values.
Dealing with missing values by removing them, imputing them from other data points, or estimating them through statistical methods.
Identifying and resolving inconsistencies, such as different formats for dates or currency.
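
The cleaning tasks above can be sketched in a few lines of standard-library Python. This is a minimal example under stated assumptions: the sample rows, the two accepted date formats, and the choice to fill a missing score with the mean of the known scores are all illustrative, not a prescribed method.

```python
from datetime import datetime
from statistics import mean

# Hypothetical raw rows: one repeated record, one alternate date format,
# and one missing score.
RAW_ROWS = [
    {"name": "Asha", "joined": "2023-01-15", "score": 82.0},
    {"name": "Asha", "joined": "15/01/2023", "score": 82.0},
    {"name": "Ravi", "joined": "15/02/2023", "score": None},
    {"name": "Meera", "joined": "2023-03-01", "score": 90.0},
]


def normalize_date(text):
    """Convert a date in either assumed format to ISO form (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(rows):
    """Normalize dates, drop duplicate rows, and fill missing scores with the mean."""
    cleaned, seen = [], set()
    for row in rows:
        fixed = {**row, "joined": normalize_date(row["joined"])}
        key = (fixed["name"], fixed["joined"], fixed["score"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(fixed)
    # Assumes at least one score is present; mean() raises on an empty list.
    fill_value = mean(r["score"] for r in cleaned if r["score"] is not None)
    for row in cleaned:
        if row["score"] is None:
            row["score"] = fill_value
    return cleaned


cleaned = clean(RAW_ROWS)
```

Dates are normalized before duplicates are removed, so two copies of a record that differ only in date format are still recognized as the same record.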

3. Structuring

After cleaning, data needs to be structured or restructured into a more analysis-friendly format. This often means converting unstructured or semi-structured data into a structured form, like a table in a database or a CSV file.

This step may involve:

Parsing data into structured fields.
Normalizing data to ensure consistent formats and units.
Transforming data, such as converting text to lowercase, to prepare for analysis.
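
As one possible illustration of structuring, the sketch below parses free-text lines into named fields, normalizes weights to a single unit, and lowercases the item name. The line layout, the regular expression, and the field names `item` and `weight_kg` are assumptions chosen for this example.

```python
import re

# Hypothetical semi-structured input lines.
RAW_LINES = [
    "Item: Rice | Weight: 2 kg",
    "Item: SUGAR | Weight: 500 g",
]

# Named groups capture the item name, the numeric value, and the unit.
LINE_PATTERN = re.compile(
    r"Item:\s*(?P<item>\w+)\s*\|\s*Weight:\s*(?P<value>[\d.]+)\s*(?P<unit>kg|g)"
)


def structure(line):
    """Parse one raw line into a dict with a lowercase item and a weight in kilograms."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"line does not match the expected layout: {line!r}")
    value = float(match["value"])
    if match["unit"] == "g":
        value /= 1000  # normalize grams to kilograms
    return {"item": match["item"].lower(), "weight_kg": value}


table = [structure(line) for line in RAW_LINES]
```

Each resulting dict has the same keys, so `table` can be written out as rows of a CSV file or a database table.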

4. Enriching

Data enrichment involves adding context or new information to the dataset to make it more valuable for analysis.

This can include:

Merging data from multiple sources to develop a more comprehensive dataset.
Creating new variables or features that can provide additional insights when analyzed.
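
Both kinds of enrichment can be sketched together: the example below merges a hypothetical orders dataset with a customers dataset and derives a new `total` feature. The datasets, keys, and the function name `enrich` are assumptions for illustration; the merge behaves like a left join, keeping every order even when no customer matches.

```python
# Hypothetical datasets from two different sources.
ORDERS = [
    {"order_id": 1, "customer_id": 10, "quantity": 3, "unit_price": 20.0},
    {"order_id": 2, "customer_id": 11, "quantity": 1, "unit_price": 50.0},
]
CUSTOMERS = [
    {"customer_id": 10, "city": "Delhi"},
]


def enrich(orders, customers):
    """Attach each customer's city to their orders and add a derived total."""
    city_by_customer = {c["customer_id"]: c["city"] for c in customers}
    enriched = []
    for order in orders:
        row = dict(order)  # copy so the input rows are left unchanged
        # Merged field: None when the customer is missing from the second source.
        row["city"] = city_by_customer.get(order["customer_id"])
        # Derived feature computed from existing columns.
        row["total"] = order["quantity"] * order["unit_price"]
        enriched.append(row)
    return enriched


enriched = enrich(ORDERS, CUSTOMERS)
```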

5. Validating

Validation ensures the data's accuracy and quality after it has been cleaned, structured, and enriched.

This step may involve:

Data integrity checks, such as ensuring every foreign key in a database references an existing record.
Quality assurance testing to ensure the data meets predefined standards and rules.
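
A minimal sketch of both kinds of check is shown below, assuming hypothetical orders and customers datasets. The rule that a quantity must be positive is an invented example of a predefined standard, and `validate` is an illustrative name rather than a library function.

```python
# Hypothetical data: the second order has an unknown customer and a bad quantity.
CUSTOMERS = [{"customer_id": 10}]
ORDERS = [
    {"order_id": 1, "customer_id": 10, "quantity": 3},
    {"order_id": 2, "customer_id": 99, "quantity": 0},
]


def validate(orders, customers):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    known_ids = {c["customer_id"] for c in customers}
    for order in orders:
        # Integrity check: each order must reference an existing customer.
        if order["customer_id"] not in known_ids:
            problems.append(
                f"order {order['order_id']}: unknown customer_id {order['customer_id']}"
            )
        # Quality rule (assumed for this example): quantities must be positive.
        if order["quantity"] <= 0:
            problems.append(f"order {order['order_id']}: quantity must be positive")
    return problems


problems = validate(ORDERS, CUSTOMERS)
for problem in problems:
    print(problem)
```

Collecting every problem in a list, instead of stopping at the first failure, gives a full report of what needs to be fixed before the data moves on.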

6. Storing

The final wrangled data is then stored in a data repository, such as a database or a data warehouse, making it accessible for analysis and reporting. This storage not only secures the data but also organizes it in a way that is efficient for querying and analysis.
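
As a small illustration of storing, the sketch below writes wrangled rows into a SQLite database using Python's built-in `sqlite3` module and then queries them. The table name `items`, its columns, and the in-memory database are assumptions for this example; a real project would typically point at a file path or a data warehouse instead.

```python
import sqlite3

# Hypothetical wrangled rows: (item, weight in kilograms).
WRANGLED_ROWS = [("rice", 2.0), ("sugar", 0.5)]


def store(rows, path=":memory:"):
    """Write rows into an items table and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "item TEXT PRIMARY KEY, weight_kg REAL NOT NULL)"
    )
    # INSERT OR REPLACE keeps the load repeatable if the same item is stored twice.
    conn.executemany(
        "INSERT OR REPLACE INTO items (item, weight_kg) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn


conn = store(WRANGLED_ROWS)
# Once stored, the data can be queried for analysis and reporting.
heavy_items = conn.execute(
    "SELECT item FROM items WHERE weight_kg > 1 ORDER BY item"
).fetchall()
```

Declaring a primary key and a `NOT NULL` constraint lets the database itself reject rows that would break the structure established in the earlier steps.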
