1 of 36

BIG DATA ANALYTICS

by

Dr. M. A. Srinuvasu

2 of 36

UNDERSTANDING BIG DATA

Defining Data:

Unorganized and unprocessed facts, raw numbers, figures, images, words, sounds, derived from observations or measurements.

Usually data is static in nature, a set of discrete, objective facts about events.

3 of 36

CHARACTERISTICS OF DATA

Data

Composition

Condition

Context

4 of 36

Composition of data deals with the structure of data, that is, the source, the granularity, the type, and the nature of the data (whether it is static or real-time streaming).

Condition of data deals with the state of data, that is, “Can one use this data as is for analysis?” or “Does it require cleaning for further enhancement and enrichment?”

Context of data deals with questions such as “Where has this data been generated?”, “Why was this data generated?”, “How sensitive is this data?”, “What are the events associated with this data?”, and so on.

5 of 36

TYPES OF DATA

6 of 36

CLASSIFICATION OF DIGITAL DATA

Digital data

Structured Data

Semi-Structured Data

Unstructured Data

7 of 36

STRUCTURED DATA

This is data in an organized form (e.g., rows and columns) that can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Data stored in databases is an example of structured data.

8 of 36

SOURCES OF STRUCTURED DATA

Structured Data

Databases such as Oracle, DB2, Teradata, MySQL, PostgreSQL, etc.

Spreadsheets

OLTP systems

9 of 36

EASE OF WORKING WITH STRUCTURED DATA

Ease with structured data

    • Insert/update/delete
    • Security
    • Indexing/searching
    • Scalability
    • Transaction Processing
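
The operations listed above can be sketched with Python's built-in `sqlite3` module. This is only an illustrative sketch: the `employee` table, its columns, and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
)

# Indexing/searching: an index speeds up lookups on the dept column.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Transaction processing: the changes below are committed together,
# or rolled back together if any statement raises an exception.
with conn:
    conn.execute(
        "INSERT INTO employee (name, dept) VALUES (?, ?)", ("Asha", "Sales")
    )
    conn.execute(
        "INSERT INTO employee (name, dept) VALUES (?, ?)", ("Ravi", "HR")
    )
    conn.execute(
        "UPDATE employee SET dept = ? WHERE name = ?", ("Finance", "Ravi")
    )
    conn.execute("DELETE FROM employee WHERE name = ?", ("Asha",))

# Searching: a fixed schema lets us query by column value.
rows = conn.execute(
    "SELECT name, dept FROM employee WHERE dept = ?", ("Finance",)
).fetchall()
print(rows)
```

Because every row follows the same schema, insert, update, delete, and search can all be expressed in a few declarative statements.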

10 of 36

SEMI STRUCTURED DATA

Semi-Structured data

Inconsistent structure

Self-describing (label/value pairs)

Often schema information is blended with data values

Data objects may have different attributes not known beforehand

11 of 36

SOURCES OF SEMI-STRUCTURED DATA

Semi-structured data

XML(eXtensible Markup Language)

Other markup languages

JSON (JavaScript Object Notation)
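
As a small illustration of semi-structured data, the sketch below parses a JSON document in which each object carries its own labels and the two objects do not share the same attributes. The records and field names are made up for the example.

```python
import json

# Two self-describing records (label/value pairs) with different attributes.
raw = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0100", "skills": ["SQL", "Python"]}
]
"""

records = json.loads(raw)

# The full set of attributes is only known after reading the data.
all_keys = sorted({key for record in records for key in record})

for record in records:
    missing = [key for key in all_keys if key not in record]
    print(record["name"], "has", sorted(record), "and lacks", missing)
```

There is no schema declared up front: the attribute names travel with the values, which is why such data is called self-describing.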

12 of 36

UNSTRUCTURED DATA

Unstructured data does not conform to any pre-defined data model. To explain this further, let us take a closer look at the various kinds of text available and the possible structure associated with each.

13 of 36

SOURCES OF UNSTRUCTURED DATA

Unstructured data

Web pages

Images

Free-form text

Audios

Videos

Body of Email

Text messages

Chats

Social media data

Word documents

14 of 36

DEALING WITH UNSTRUCTURED DATA

Dealing with unstructured data

Data mining

NLP

Text analysis

Noisy Text Analysis
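
A very basic form of text analysis is counting word frequencies in free-form text. The sketch below uses only the Python standard library; the sample sentence is invented, and real NLP pipelines involve far more than this (stemming, stop-word removal, handling noisy text, and so on).

```python
import re
from collections import Counter

# Hypothetical free-form text, e.g. taken from a chat or an email body.
text = "Big data is big. Data about data is called metadata!"

# Tokenize: lowercase the text and keep runs of letters only.
tokens = re.findall(r"[a-z]+", text.lower())

# Count how often each token occurs.
counts = Counter(tokens)
print(counts.most_common(3))
```

Tokenizing and counting imposes a simple structure (word, frequency) on text that had none, which is the first step of many text-mining tasks.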

15 of 36

HOW DATA IS BEING GENERATED

16 of 36

DIFFERENT SOURCES OF DATA GENERATION

17 of 36

RATE AT WHICH DATA IS BEING GENERATED

18 of 36

EVOLUTION OF BIG DATA

19 of 36

DEFINITION OF BIG DATA

20 of 36

21 of 36

CHALLENGES OF BIG DATA

22 of 36

23 of 36

24 of 36

SOURCES OF BIG DATA

25 of 36

DIFFERENT V’S: VOLUME, VARIETY, VELOCITY, VERACITY, VALUE

In recent years, Big Data was defined by the “3Vs”, but now there are “5Vs” of Big Data, which are also termed the characteristics of Big Data:

26 of 36

VOLUME

  • The name ‘Big Data’ itself refers to a size that is enormous.
  • Volume is the sheer amount of data.
  • Size plays a crucial role in determining the value of data. Whether a particular data set can be considered Big Data depends on its volume: only a very large volume qualifies.
  • Hence, ‘Volume’ is a characteristic that must be considered when dealing with Big Data.
  • Example: In 2016, the estimated global mobile traffic was 6.2 exabytes (6.2 billion GB) per month. It was also projected that by 2020 there would be almost 40,000 exabytes of data.
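
The unit arithmetic behind the example can be checked with a few lines of Python. The sketch assumes decimal (SI) units, where 1 GB = 10^9 bytes, 1 EB = 10^18 bytes, and 1 ZB = 10^21 bytes; the function names are illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18
BYTES_PER_ZB = 10**21


def exabytes_to_gigabytes(exabytes):
    """Convert exabytes to gigabytes (1 EB = one billion GB)."""
    return exabytes * BYTES_PER_EB / BYTES_PER_GB


def exabytes_to_zettabytes(exabytes):
    """Convert exabytes to zettabytes (1 ZB = one thousand EB)."""
    return exabytes * BYTES_PER_EB / BYTES_PER_ZB


monthly_mobile_gb = exabytes_to_gigabytes(6.2)
projected_zb = exabytes_to_zettabytes(40_000)

print(f"6.2 EB is about {monthly_mobile_gb / 1e9:.1f} billion GB")
print(f"40,000 EB is about {projected_zb:.0f} ZB")
```

Since one exabyte is a billion gigabytes, 6.2 EB corresponds to 6.2 billion GB, and 40,000 EB corresponds to 40 zettabytes.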

27 of 36

VELOCITY

  • Velocity refers to the high speed at which data accumulates.
  • Data flows in from sources such as machines, networks, social media, mobile phones, etc.
  • There is a massive and continuous flow of data. Velocity determines the potential of the data: how fast it is generated and processed to meet demand.
  • Sampling the data can help in dealing with velocity.
  • Example: More than 3.5 billion searches are made on Google per day. Also, the number of Facebook users is increasing by approximately 22% year over year.
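
The slides do not say which sampling method to use. One common choice for fast, unbounded streams is reservoir sampling, which keeps a fixed-size uniform random sample without storing the whole stream. The sketch below is an illustration under that assumption; the function name and the synthetic `event-N` stream are hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable.

    The stream is read once and only k items are held in memory.
    """
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# A generator stands in for a high-velocity source of events.
events = (f"event-{n}" for n in range(100_000))
sample = reservoir_sample(events, k=5, seed=42)
print(sample)
```

Memory use stays constant no matter how many events arrive, which is what makes sampling attractive when data is generated faster than it can be stored or fully processed.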

28 of 36

VARIETY

  • It refers to the nature of data: structured, semi-structured, and unstructured.
  • It also refers to heterogeneous sources.
  • Variety is essentially the arrival of data from new sources both inside and outside an enterprise. The data can be structured, semi-structured, or unstructured.
    • Structured data: This is organized data. It generally refers to data with a defined length and format.
    • Semi-structured data: This is partly organized data. It generally does not conform to a formal data structure. Log files are an example of this type of data.
    • Unstructured data: This is unorganized data. It generally refers to data that does not fit neatly into the traditional row-and-column structure of a relational database. Text, pictures, videos, etc. are examples of unstructured data that cannot be stored in the form of rows and columns.

29 of 36

VERACITY

  • It refers to inconsistencies and uncertainty in data: the available data can sometimes be messy, and its quality and accuracy are difficult to control.
  • Big Data is also variable because of the multitude of data dimensions resulting from multiple disparate data types and sources.
  • Example: Data in bulk can create confusion, whereas too little data can convey only partial or incomplete information.
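
One simple way to address veracity is to screen records for missing or implausible values before analysis. The sketch below is illustrative only: the sensor readings, field names, and the plausible temperature range are all assumptions made for the example.

```python
# Hypothetical sensor readings; two of them are of doubtful quality.
readings = [
    {"sensor": "A", "temp_c": 21.5},
    {"sensor": "B", "temp_c": None},   # missing value
    {"sensor": "C", "temp_c": 999.0},  # implausible value
    {"sensor": "D", "temp_c": 22.1},
]

# Assumed plausible range for an air-temperature sensor, in degrees Celsius.
VALID_RANGE = (-50.0, 60.0)


def is_clean(reading):
    """Return True if the reading has a value inside the plausible range."""
    value = reading["temp_c"]
    return value is not None and VALID_RANGE[0] <= value <= VALID_RANGE[1]


clean = [r for r in readings if is_clean(r)]
rejected = [r for r in readings if not is_clean(r)]

print("clean:", [r["sensor"] for r in clean])
print("rejected:", [r["sensor"] for r in rejected])
```

Separating rejected records rather than silently dropping them keeps the quality problem visible, so it can be measured and traced back to its source.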

30 of 36

VALUE

  • After taking the four V’s into account, there is one more V, which stands for Value. A bulk of data with no value is of no good to a company unless it is turned into something useful.
  • Data in itself is of no use or importance; it needs to be converted into something valuable by extracting information from it.

31 of 36

HOW A SINGLE PERSON IS CONTRIBUTING TOWARDS BIG DATA

32 of 36

SIGNIFICANCE OF BIG DATA

33 of 36

REASON FOR BIG DATA

34 of 36

UNDERSTANDING RDBMS AND WHY IT IS FAILING TO STORE BIG DATA

35 of 36

FUTURE OF BIG DATA

5 V’s

36 of 36

BIG DATA USE CASES FOR MAJOR IT INDUSTRIES