B.Tech CSI-401
Topic: IoT data processing with big data analytics
Amity School of Engineering & Technology
Contents
Processing
Processing is, in general, "the collection and manipulation of items of data to produce meaningful information".
Data processing therefore refers to transforming raw data into meaningful output, i.e. information.
This is done either manually (by humans) or automatically (by computers or a data center).
CS 503
<SELO: 1,9>
<Reference No.: 1>
IoT architecture and big data analytics
<Reference No.: 1>
ETL
ETL: This technology extracts data from source systems, transforms it (restructuring, reconciliation, content cleaning, content aggregation) to satisfy business requirements, and loads the results into the target destination.
An ETL (Extract, Transform, Load) tool moves data from one place to another by performing these three functions.
<Reference No.: 1>
ETL
Extract Transform Load
<Reference No.: 1>
ETL
The goal of ETL is to prepare data for analysis or business intelligence (BI).
ETL pulls data from sources, transforms it into an understandable format, and then transfers (loads) it to another database or data warehouse.
The ETL process cleans, filters, and transforms data, and then applies business rules before the data populates the new destination.
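The three functions described above can be sketched end to end in a few lines of Python. This is a minimal illustration, not a production pipeline: the sample readings, the `max_temp_c` threshold used as a "business rule", and the `readings` table are all assumptions made for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw sensor readings standing in for a source system.
RAW_READINGS = [
    {"device": "sensor-1", "temp_c": "21.5"},
    {"device": "sensor-2", "temp_c": ""},       # missing value
    {"device": "sensor-3", "temp_c": "85.0"},
]

def extract():
    """Extract: pull raw records from the source (here, an in-memory list)."""
    return list(RAW_READINGS)

def transform(records, max_temp_c=60.0):
    """Transform: filter out empty readings, convert types, apply a rule."""
    rows = []
    for rec in records:
        if not rec["temp_c"]:
            continue  # filter out records with no reading
        temp = float(rec["temp_c"])
        # Illustrative business rule (an assumption): flag high temperatures.
        status = "alert" if temp > max_temp_c else "ok"
        rows.append((rec["device"], temp, status))
    return rows

def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings "
        "(device TEXT, temp_c REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract()), conn)
loaded = conn.execute(
    "SELECT device, temp_c, status FROM readings ORDER BY device"
).fetchall()
```

Keeping the three stages as separate functions mirrors the E, T, and L split and lets each stage be tested or replaced independently.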
<Reference No.: 1>
Data extraction
Data extraction is the process of obtaining data from a database or SaaS platform so that it can be replicated to a destination — such as a data warehouse — designed to support online analytical processing (OLAP).
Data extraction is the first step in a data ingestion process called ETL — extract, transform, and load.
The goal of ETL is to prepare data for analysis or business intelligence (BI).
<Reference No.: 1>
How ETL works
Step 1: Extraction
Before data can be moved to a new destination, it must first be extracted from its source. In this first step of the ETL process, structured and unstructured data is imported and consolidated into a single repository. Raw data can be extracted from a wide range of sources, including:
CRM, ERP, SAP, website traffic, mobile devices, and apps.
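As a rough sketch of the extraction step, the snippet below reads records from two differently formatted sources and consolidates them into a single staging list. The CSV and JSON payloads, the source labels `crm` and `web_traffic`, and the helper names are hypothetical and serve only to show the idea of importing heterogeneous data into one repository.

```python
import csv
import io
import json

# Hypothetical exports from two source systems (assumed sample data).
CRM_CSV = "customer_id,name\n1,Asha\n2,Ravi\n"
WEB_TRAFFIC_JSON = (
    '[{"customer_id": 1, "page": "/home"},'
    ' {"customer_id": 3, "page": "/pricing"}]'
)

def extract_csv(text, source):
    """Read CSV rows and tag each record with its source system."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"source": source, "payload": dict(row)} for row in reader]

def extract_json(text, source):
    """Read a JSON array and tag each record with its source system."""
    return [{"source": source, "payload": item} for item in json.loads(text)]

# Consolidate everything into a single staging repository (a plain list here).
staging = extract_csv(CRM_CSV, "crm") + extract_json(WEB_TRAFFIC_JSON, "web_traffic")
```

Records are kept raw at this stage (note that CSV values remain strings); type conversion and cleaning belong to the transformation step.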
<Reference No.: 1>
How ETL works
Step 2: Transformation
The data transformation process comprises several sub-processes:
Cleansing: inconsistencies and missing values in the data are resolved.
Standardization: formatting rules are applied to the data set.
Deduplication: redundant data is excluded or discarded.
Verification: unusable data is removed and anomalies are flagged.
Sorting: data is organized according to type.
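The sub-processes listed above can be sketched as a chain of small functions. The sample records, the plausible temperature range used for verification, and the choice to sort by `id` are assumptions for illustration; in this sketch, records with a missing reading are dropped during cleansing, and verification only flags anomalies.

```python
# Hypothetical raw records (assumed sample data).
RAW = [
    {"id": 3, "city": "Pune", "temp_c": "999"},        # implausible reading
    {"id": 1, "city": " delhi ", "temp_c": "31.0"},
    {"id": 2, "city": "MUMBAI", "temp_c": None},       # missing value
    {"id": 1, "city": " delhi ", "temp_c": "31.0"},    # duplicate
]

def cleanse(rows):
    """Cleansing: drop records with a missing reading, trim whitespace."""
    return [
        {**r, "city": r["city"].strip()}
        for r in rows
        if r["temp_c"] not in (None, "")
    ]

def standardize(rows):
    """Standardization: apply one format for city names and numeric types."""
    return [
        {"id": r["id"], "city": r["city"].title(), "temp_c": float(r["temp_c"])}
        for r in rows
    ]

def deduplicate(rows):
    """Deduplication: keep the first occurrence of each identical record."""
    seen, unique = set(), []
    for r in rows:
        key = (r["id"], r["city"], r["temp_c"])
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def verify(rows, low=-50.0, high=60.0):
    """Verification: flag readings outside an assumed plausible range."""
    return [{**r, "anomaly": not (low <= r["temp_c"] <= high)} for r in rows]

def sort_rows(rows):
    """Sorting: order the records by id."""
    return sorted(rows, key=lambda r: r["id"])

result = sort_rows(verify(deduplicate(standardize(cleanse(RAW)))))
```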
<Reference No.: 1>
How ETL works
Step 3: Loading
The final step in the ETL process is to load the newly transformed data into a new destination. Data can be loaded all at once (full load) or at scheduled intervals (incremental load).
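The difference between a full load and an incremental load can be sketched with SQLite. The `target` table, its `updated_at` column, and the watermark approach (loading only rows newer than the latest timestamp already in the target) are assumptions chosen for this example; real tools may detect changes differently.

```python
import sqlite3

def full_load(conn, rows):
    """Full load: replace the entire target table with the given rows."""
    conn.execute("DELETE FROM target")
    conn.executemany(
        "INSERT INTO target (id, value, updated_at) VALUES (?, ?, ?)", rows
    )
    conn.commit()

def incremental_load(conn, rows):
    """Incremental load: upsert only rows newer than the current watermark."""
    (watermark,) = conn.execute(
        "SELECT COALESCE(MAX(updated_at), 0) FROM target"
    ).fetchone()
    new_rows = [r for r in rows if r[2] > watermark]
    conn.executemany(
        "INSERT OR REPLACE INTO target (id, value, updated_at) VALUES (?, ?, ?)",
        new_rows,
    )
    conn.commit()
    return len(new_rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE target (id INTEGER PRIMARY KEY, value TEXT, updated_at INTEGER)"
)

# First run: load everything at once.
source = [(1, "a", 100), (2, "b", 110)]
full_load(conn, source)

# Later run: the source has one changed row and one new row.
source = [(1, "a", 100), (2, "b2", 120), (3, "c", 130)]
changed = incremental_load(conn, source)
rows = conn.execute(
    "SELECT id, value, updated_at FROM target ORDER BY id"
).fetchall()
```

In practice, `incremental_load` would be invoked by a scheduler at the chosen interval, so each run moves only the rows that changed since the previous one.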
<Reference No.: 1>
Purpose
Extract Transform Load
<Reference No.: 3>
Data Integration
Data integration involves combining data from several disparate sources, which are stored using various technologies, to provide a unified view of the data.
Example: Customer data integration involves extracting information about each individual customer from disparate business systems such as sales, accounts, and marketing, which is then combined into a single view of the customer to be used for customer service, reporting, and analysis.
Data integration areas: data warehousing, data migration, and master data management.
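The customer data integration example above can be sketched as a merge keyed on a shared customer identifier. The three record lists, the `customer_id` key, and the `system.field` naming scheme are assumptions for illustration; real systems often need identity matching before records can be joined this simply.

```python
from collections import defaultdict

# Hypothetical records from three business systems (assumed sample data).
SALES = [{"customer_id": 1, "last_order": "2024-01-10"}]
ACCOUNTS = [
    {"customer_id": 1, "balance": 250.0},
    {"customer_id": 2, "balance": 0.0},
]
MARKETING = [{"customer_id": 2, "segment": "newsletter"}]

def build_customer_view(**systems):
    """Combine per-system records into one unified record per customer."""
    view = defaultdict(dict)
    for system_name, records in systems.items():
        for rec in records:
            cid = rec["customer_id"]
            for field, value in rec.items():
                if field != "customer_id":
                    # Prefix each field with its system to avoid name clashes.
                    view[cid][f"{system_name}.{field}"] = value
    return dict(view)

customer_view = build_customer_view(
    sales=SALES, accounts=ACCOUNTS, marketing=MARKETING
)
```

Each entry in `customer_view` is then a single view of one customer that customer service, reporting, or analysis code can read without querying the three systems separately.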
<Reference No.: 1>
Data integration
Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information.
Disparate data cannot be integrated with one another in their current state. Hadoop brings different data types together in one place.
The following are a few common data integration approaches.
<Reference No.: 2>
Data Integration
Data Consolidation
ETL pulls data from sources, transforms it into an understandable format, and then transfers it to another database or data warehouse. The ETL process cleans, filters, and transforms data, and then applies business rules before the data populates the new destination.
ETL: This technology extracts data from source systems, transforms it (restructuring, reconciliation, content cleaning, content aggregation) to satisfy business requirements, and loads the results into the target destination.
<Reference No.: 2>
Data Propagation
Enterprise application integration (EAI) integrates application systems for the exchange of messages and transactions. It is often used for real-time business transaction processing. Integration platform as a service (iPaaS) is a modern approach to EAI.
Enterprise data replication (EDR) typically transfers large amounts of data between databases, rather than between applications. Database triggers and logs are used to capture and disseminate data changes between the source and remote databases.
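Trigger-based change capture of the kind described above can be sketched with two SQLite databases. The `accounts` and `change_log` tables, the trigger names, and the `replicate` helper are hypothetical; this is a simplified illustration of the idea, not how any particular replication product works.

```python
import sqlite3

source = sqlite3.connect(":memory:")   # stand-in for the source database
replica = sqlite3.connect(":memory:")  # stand-in for the remote database

# Triggers on the source write every insert and update into a change log.
source.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT, id INTEGER, balance REAL
);
CREATE TRIGGER accounts_insert AFTER INSERT ON accounts
BEGIN
    INSERT INTO change_log (id, balance) VALUES (NEW.id, NEW.balance);
END;
CREATE TRIGGER accounts_update AFTER UPDATE ON accounts
BEGIN
    INSERT INTO change_log (id, balance) VALUES (NEW.id, NEW.balance);
END;
""")
replica.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")

def replicate(src, dst, last_seq=0):
    """Apply logged changes newer than last_seq to the replica, in order."""
    changes = src.execute(
        "SELECT seq, id, balance FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, row_id, balance in changes:
        dst.execute(
            "INSERT OR REPLACE INTO accounts (id, balance) VALUES (?, ?)",
            (row_id, balance),
        )
        last_seq = seq
    dst.commit()
    return last_seq

source.execute("INSERT INTO accounts VALUES (1, 100.0)")
source.execute("UPDATE accounts SET balance = 75.0 WHERE id = 1")
source.commit()

last_seq = replicate(source, replica)
replica_rows = replica.execute("SELECT id, balance FROM accounts").fetchall()
```

Passing the returned `last_seq` into the next `replicate` call means only changes logged since the previous run are sent to the replica.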
<Reference No.: 2>
Data Virtualization
<Reference No.: 2>
Data Federation
<Reference No.: 2>
Data Warehousing
Data mining:
It refers to the extraction of useful information from a large volume of data or from a data warehouse.
<Reference No.: 2>
Exercise
Conclusion
References
• Getting Started with the Internet of Things, by Cuno Pfister, Shroff; 1st edition (17 May 2011), ISBN-10: 9350234130
• Big Data and the Internet of Things, by Robert Stackowiak, Art Licht, Springer Nature; 1st edition (12 May 2015), ISBN-10: 1484209877