B.Tech CSI-401

Topic: IoT data processing with big data analytics

Amity School of Engineering & Technology

Contents

  • IoT and big data
  • IoT architecture and big data analytics
  • ETL


Processing

Processing is, in general, "the collection and manipulation of items of data to produce meaningful information".

Data processing therefore refers to transforming raw data into meaningful output, i.e. information.

This is done either manually (by humans) or automatically (by computers or data centers).
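As a small illustration of turning raw data into information, the sketch below summarises a handful of temperature readings. The readings and the `summarise` helper are invented for this example and are not part of the course material.

```python
from statistics import mean

# Hypothetical raw data: temperature readings (deg C) from one IoT sensor.
raw_readings = [21.5, 22.0, 21.8, 35.2, 22.1]


def summarise(readings):
    """Turn raw readings into a small piece of information (a summary)."""
    return {
        "count": len(readings),
        "min": min(readings),
        "max": max(readings),
        "mean": round(mean(readings), 2),
    }


summary = summarise(raw_readings)
print(summary)
```

The individual numbers are the raw data; the summary (for instance, the unusually high maximum) is the information a person or system can act on.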

CS 503

<SELO: 1,9>

<Reference No.: 1>

IoT architecture and big data analytics

<Reference No.: 1>

ETL

ETL: This technology extracts data from source systems, transforms it (restructuring, reconciliation, content cleaning, content aggregation) to satisfy business requirements, and loads the results into the target destination.

ETL (Extract, Transform, Load) tools move data from one place to another by performing three functions:

  1. Extract
  2. Transform
  3. Load
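The three functions can be sketched as three small Python functions chained together. This is a minimal illustration under stated assumptions, not a real ETL tool: the CSV text, the field names, and the in-memory `warehouse` list are all hypothetical stand-ins.

```python
import csv
import io

# Hypothetical source data; a real source would be a file, database, or API.
SOURCE_CSV = "device,temp_c\nsensor-a,21.5\nsensor-b,\nsensor-c,19.0\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no reading and convert text to numbers."""
    return [
        {"device": r["device"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]


def load(rows, target):
    """Load: write the transformed rows into the target store."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stands in for the target destination
loaded = load(transform(extract(SOURCE_CSV)), warehouse)
print(loaded, warehouse)
```

Keeping each stage as its own function mirrors the Extract, Transform, Load order and lets each stage be replaced independently.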

<Reference No.: 1>

ETL

Extract Transform Load

<Reference No.: 1>

ETL

The goal of ETL is to prepare data for analysis or business intelligence (BI).

ETL pulls data from sources, transforms it into an understandable format, and then transfers (loads) it to another database or data warehouse.

The ETL process cleans, filters, and transforms data and then applies business rules before the data populates the new destination.

<Reference No.: 1>

Data extraction

Data extraction is the process of obtaining data from a database or SaaS platform so that it can be replicated to a destination — such as a data warehouse — designed to support online analytical processing (OLAP).

Data extraction is the first step in a data ingestion process called ETL — extract, transform, and load.

The goal of ETL is to prepare data for analysis or business intelligence (BI).
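Extraction from a database can be sketched with Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical; the database is created in memory only so that the example is self-contained.

```python
import sqlite3

# Hypothetical source database, built in memory so the example runs on its own.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
source.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 75.5), ("alice", 30.0)],
)


def extract_orders(conn):
    """Pull every row out of the source table as plain dictionaries."""
    cursor = conn.execute("SELECT id, customer, amount FROM orders ORDER BY id")
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor]


extracted = extract_orders(source)
source.close()
print(extracted)
```

The extracted rows are now independent of the source system and can be handed to the transform step or replicated to a warehouse.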

<Reference No.: 1>

How ETL works

Step 1: Extraction

Before data can be moved to a new destination, it must first be extracted from its source. In this first step of the ETL process, structured and unstructured data is imported and consolidated into a single repository. Raw data can be extracted from a wide range of sources, including:

CRM, ERP, SAP, website traffic, mobile devices, and apps
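A minimal sketch of consolidating structured and unstructured data into a single repository follows. The three sample payloads, the source names, and the `stage` helper are assumptions made for illustration only.

```python
import csv
import io
import json

# Hypothetical exports; the source names and fields are invented for illustration.
crm_csv = "customer,email\nalice,alice@example.com\n"   # structured
app_events = '[{"user": "alice", "event": "login"}]'     # semi-structured JSON
web_log = "GET /pricing 200\nGET /signup 500\n"          # unstructured text

repository = []  # the single repository that all extracted records land in


def stage(source, payload):
    """Store one raw record together with the name of its source system."""
    repository.append({"source": source, "payload": payload})


for row in csv.DictReader(io.StringIO(crm_csv)):
    stage("crm", row)
for event in json.loads(app_events):
    stage("mobile_app", event)
for line in web_log.splitlines():
    stage("website", {"raw": line})

print(len(repository))
```

Tagging each record with its source keeps the raw data traceable once everything sits in one place.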

<Reference No.: 1>

How ETL works

Step 2: Transformation

The process of data transformation comprises several sub-processes:

Cleansing: inconsistencies and missing values in the data are resolved.

Standardization: formatting rules are applied to the data set.

Deduplication: redundant data is excluded or discarded.

Verification: unusable data is removed and anomalies are flagged.

Sorting: data is organized according to type.
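The sub-processes above can be sketched in one small function. This is an illustration under assumptions: the records, the valid temperature range, and the choice to resolve missing values by dropping the record are all hypothetical, and sorting by device name stands in for organizing data by type.

```python
# Hypothetical extracted records; field names are invented for illustration.
raw = [
    {"device": " Sensor-B ", "temp_c": "22.4"},
    {"device": "sensor-a", "temp_c": "21.0"},
    {"device": "SENSOR-A", "temp_c": "21.0"},  # duplicate after standardization
    {"device": "sensor-c", "temp_c": None},     # missing value
    {"device": "sensor-d", "temp_c": "999"},    # implausible reading
]


def transform(records, low=-40.0, high=85.0):
    cleaned, flagged, seen = [], [], set()
    for rec in records:
        # Cleansing: skip records whose reading is missing.
        if rec["temp_c"] in (None, ""):
            continue
        # Standardization: one format for names, one type for readings.
        row = {
            "device": rec["device"].strip().lower(),
            "temp_c": float(rec["temp_c"]),
        }
        # Deduplication: keep only the first copy of identical rows.
        key = (row["device"], row["temp_c"])
        if key in seen:
            continue
        seen.add(key)
        # Verification: set aside readings outside the assumed valid range.
        if not low <= row["temp_c"] <= high:
            flagged.append(row)
            continue
        cleaned.append(row)
    # Sorting: order the usable rows by device name.
    cleaned.sort(key=lambda r: r["device"])
    return cleaned, flagged


clean_rows, anomalies = transform(raw)
print(clean_rows, anomalies)
```

Returning the flagged rows separately, rather than silently discarding them, keeps anomalies available for later inspection.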

<Reference No.: 1>

How ETL works

Step 3: Loading

The final step in the ETL process is to load the newly transformed data into a new destination. Data can be loaded all at once (full load) or at scheduled intervals (incremental load).
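The difference between a full load and an incremental load can be sketched with `sqlite3`. The `readings` table is hypothetical, and using the highest already-loaded `id` to decide which rows are new is one possible approach assumed here, not something the slides prescribe.

```python
import sqlite3

target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, temp_c REAL)")


def full_load(conn, rows):
    """Full load: replace the target table's contents with the whole data set."""
    with conn:
        conn.execute("DELETE FROM readings")
        conn.executemany("INSERT INTO readings (id, temp_c) VALUES (?, ?)", rows)


def incremental_load(conn, rows):
    """Incremental load: add only rows newer than what the target already holds."""
    last_id = conn.execute("SELECT COALESCE(MAX(id), 0) FROM readings").fetchone()[0]
    new_rows = [r for r in rows if r[0] > last_id]
    with conn:
        conn.executemany("INSERT INTO readings (id, temp_c) VALUES (?, ?)", new_rows)
    return len(new_rows)


source_rows = [(1, 21.5), (2, 22.0)]
full_load(target, source_rows)

source_rows += [(3, 22.4), (4, 21.9)]  # new data arrives at the source
added = incremental_load(target, source_rows)
total = target.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(added, total)
```

In practice the incremental step would be run by a scheduler at fixed intervals; here it is called once by hand.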

<Reference No.: 1>

Purpose

Extract Transform Load

  • Better application speed and quality — that's the brilliance of AI-powered integration.

  • Automate your work. Reclaim your time.

<Reference No.: 3>

Data Integration

Data integration involves combining data from several disparate sources, which are stored using various technologies, to provide a unified view of the data.

Example: customer data integration involves extracting information about each individual customer from disparate business systems such as sales, accounts, and marketing, and combining it into a single view of the customer for customer service, reporting, and analysis.

Data integration areas: data warehousing, data migration, master data management.
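The customer example can be sketched as a merge keyed on a customer id. The three dictionaries, their fields, and the `integrate` helper are hypothetical; real systems would rarely share such a clean key.

```python
# Hypothetical records from three business systems, keyed by customer id.
sales = {101: {"name": "Alice", "last_order": "2024-03-02"}}
accounts = {101: {"balance": 250.0}, 102: {"balance": 0.0}}
marketing = {101: {"newsletter": True}, 102: {"newsletter": False}}


def integrate(**systems):
    """Merge per-system records into one combined record per customer."""
    unified = {}
    for system_name, records in systems.items():
        for customer_id, fields in records.items():
            view = unified.setdefault(customer_id, {"customer_id": customer_id})
            for field, value in fields.items():
                # Prefix with the system name so each field's origin stays visible.
                view[f"{system_name}.{field}"] = value
    return unified


customer_view = integrate(sales=sales, accounts=accounts, marketing=marketing)
print(customer_view[101])
```

Customer 102 has no sales record, so its unified view simply lacks the `sales.*` fields; how to treat such gaps is a design decision left open here.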

<Reference No.: 1>

Data integration

Data integration is the combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information.

Disparate data cannot be integrated with one another in their current state. Hadoop brings different data types together in one place.

Some common data integration approaches are:

  • Data Consolidation
  • Data Propagation
  • Data Virtualization
  • Data Federation
  • Data Warehousing

<Reference No.: 2>

Data Consolidation

  • Data Consolidation: Data consolidation physically brings data together from several separate systems, creating a version of the consolidated data in one data store. Often the goal of data consolidation is to reduce the number of data storage locations. Extract, transform, and load (ETL) technology supports data consolidation.

ETL pulls data from sources, transforms it into an understandable format, and then transfers it to another database or data warehouse. The ETL process cleans, filters, and transforms data, and then applies business rules before the data populates the new destination.

ETL: this technology extracts data from source systems, transforms it (restructuring, reconciliation, content cleaning, content aggregation) to satisfy business requirements, and loads the results into the target destination.
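Consolidation can be sketched as copying data out of several separate stores into one. The two regional databases, the `customers` table, and the use of the email address as the deduplicating key are assumptions made for this illustration.

```python
import sqlite3


def make_store(rows):
    """Create a hypothetical regional database holding a `customers` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (email TEXT, city TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    return conn


north = make_store([("alice@example.com", "Delhi"), ("bob@example.com", "Noida")])
south = make_store([("bob@example.com", "Noida"), ("chen@example.com", "Chennai")])

# The single consolidated store; the primary key keeps one copy per customer.
central = sqlite3.connect(":memory:")
central.execute("CREATE TABLE customers (email TEXT PRIMARY KEY, city TEXT)")

for store in (north, south):
    rows = store.execute("SELECT email, city FROM customers").fetchall()
    with central:
        central.executemany("INSERT OR IGNORE INTO customers VALUES (?, ?)", rows)
    store.close()

consolidated = central.execute("SELECT email FROM customers ORDER BY email").fetchall()
print(consolidated)
```

After the loop, two storage locations have been reduced to one, and the customer present in both sources appears only once.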

<Reference No.: 2>

Data Propagation

  • Data Propagation: Data propagation is the use of applications to copy data from one location to another. It is event-driven and can be done synchronously or asynchronously. Most synchronous data propagation supports a two-way data exchange between the source and the target. Enterprise application integration (EAI) and enterprise data replication (EDR) technologies support data propagation.

EAI integrates application systems for the exchange of messages and transactions. It is often used for real-time business transaction processing. Integration platform as a service (iPaaS) is a modern approach to EAI.

EDR typically transfers large amounts of data between databases rather than applications. Database triggers and logs are used to capture and disseminate data changes between the source and remote databases.
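The contrast between synchronous and asynchronous propagation can be sketched without any real EAI or EDR product. The `Source` class, its change log, and the plain dictionaries used as replicas are all hypothetical simplifications.

```python
from collections import deque


class Source:
    """Hypothetical source system that propagates each change as an event."""

    def __init__(self):
        self.data = {}
        self.sync_targets = []     # updated inside the same call as the write
        self.change_log = deque()  # changes waiting for asynchronous delivery

    def write(self, key, value):
        self.data[key] = value
        for target in self.sync_targets:
            target[key] = value                # synchronous propagation
        self.change_log.append((key, value))   # recorded for later delivery

    def flush(self, target):
        """Apply queued changes to a target, e.g. from a background job."""
        applied = 0
        while self.change_log:
            key, value = self.change_log.popleft()
            target[key] = value
            applied += 1
        return applied


source = Source()
replica_sync, replica_async = {}, {}
source.sync_targets.append(replica_sync)

source.write("sensor-a", 21.5)
state_before_flush = dict(replica_async)  # still empty: nothing delivered yet
source.flush(replica_async)
print(replica_sync, state_before_flush, replica_async)
```

The synchronous replica is current as soon as `write` returns, while the asynchronous replica lags until the change log is flushed, which loosely resembles how log-based replication defers delivery.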

<Reference No.: 2>

Data Virtualization

  • Data Virtualization: Virtualization uses an interface to provide a near real-time, unified view of data from disparate sources with different data models. Data can be viewed in one location but is not stored in that single location. Data virtualization retrieves and interprets data but does not require uniform formatting or a single point of access.
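A minimal sketch of the idea: a view object that reads the underlying sources at query time and keeps no copy of the data. The two sources, their shapes, and the `VirtualView` class are hypothetical.

```python
# Hypothetical sources with different data models; nothing is copied out of them.
inventory_rows = [("sensor-a", 12), ("sensor-b", 0)]  # tuples: (sku, stock)
pricing_docs = {"sensor-a": {"price": 9.5}, "sensor-b": {"price": 14.0}}


class VirtualView:
    """Answers queries by reading the sources at request time; stores no data."""

    def __init__(self, inventory, pricing):
        self._inventory = inventory
        self._pricing = pricing

    def product(self, sku):
        stock = next((qty for s, qty in self._inventory if s == sku), None)
        price = self._pricing.get(sku, {}).get("price")
        return {"sku": sku, "stock": stock, "price": price}


view = VirtualView(inventory_rows, pricing_docs)
before = view.product("sensor-b")

inventory_rows[1] = ("sensor-b", 40)  # the source changes...
after = view.product("sensor-b")      # ...and the view reflects it immediately
print(before, after)
```

Because the view holds only references to the sources, a change in a source is visible on the next query without any reload step.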

<Reference No.: 2>

Data Federation

  • Data Federation: Federation is technically a form of data virtualization. It uses a virtual database and creates a common data model for heterogeneous data from different systems. Data is brought together and made viewable from a single point of access.
  • Enterprise information integration (EII) is a technology that supports data federation. It uses data abstraction to provide a unified view of data from different sources. That data can then be presented or analyzed in new ways through applications.
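Federation's common data model can be sketched with small adapter functions that map each source onto one shared record type. The `Customer` model, the two sources, and their field names are hypothetical, and a real federated system would translate queries rather than scan in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """The common data model that every source is mapped onto."""
    customer_id: str
    name: str


# Hypothetical heterogeneous sources with their own field names and shapes.
crm_records = [{"CustID": "c-1", "FullName": "Alice Rao"}]
billing_records = [("c-2", "bob", "singh")]  # (id, first name, last name)


def from_crm(record):
    return Customer(record["CustID"], record["FullName"])


def from_billing(record):
    cid, first, last = record
    return Customer(cid, f"{first} {last}".title())


def all_customers():
    """Single point of access: query every source through the common model."""
    yield from (from_crm(r) for r in crm_records)
    yield from (from_billing(r) for r in billing_records)


federated = sorted(all_customers(), key=lambda c: c.customer_id)
print(federated)
```

Callers see only `Customer` objects from one access point; the differences between the source formats stay inside the adapters.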

<Reference No.: 2>

Data Warehousing

  • Data Warehousing: Warehousing is included in this list because it is a commonly used term. However, its meaning is more generic than that of the other methods mentioned above. Data warehouses are storage repositories for data, but the term “data warehousing” implies the cleansing, reformatting, and storage of data, which is essentially data integration.

Data mining:

It refers to the extraction of useful information from a large volume of data or a data warehouse.

<Reference No.: 2>

Exercise

  1. What is data processing?
  2. What is data propagation?
  3. What is data discovery?
  4. Define data virtualization and federation.
  5. Why is data integration so important?


Conclusion

  • IoT brings a great deal to big data.
  • The more important IoT becomes in our daily lives and in our cities, the more we need to:
  • Store the data
  • Analyse (process) the data
  • Apply analytics to the data
  • Expand the business


References

  • The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies, by Erik Brynjolfsson and Andrew McAfee. ISBN-10: 0393239357
  • Getting Started with the Internet of Things, by Cuno Pfister, Shroff; first edition (17 May 2011), ISBN-10: 9350234130
  • Big Data and The Internet of Things, by Robert Stackowiak, Art Licht, Springer Nature; 1st ed. (12 May 2015), ISBN-10: 1484209877
  • https://www.tutor2u.net/business/reference/types-of-integration
  • https://www.globalscape.com/blog/5-types-data-integration
  • Data Analytics with Python, NPTEL online course by Dr. A. Ramesh, Department of Management, IIT Roorkee.
