BIG DATA ANALYTICS
by
Dr. M. A. Srinuvasu
UNDERSTANDING BIG DATA
Defining Data:
Data is unorganized and unprocessed facts: raw numbers, figures, images, words, and sounds derived from observations or measurements.
Data is usually static in nature: a set of discrete, objective facts about events.
CHARACTERISTICS OF DATA
Data
Composition
Condition
Context
Composition of data deals with the structure of data, that is, the source, the granularity, the types, and the nature of data as to whether it is static or real-time streaming.
Condition of data deals with the state of data, that is, “Can one use this data as is for analysis?” or “Does it require cleaning for further enhancement and enrichment?”
Context of data deals with “Where has this data been generated?”, “Why was this data generated?”, “How sensitive is this data?”, “What are the events associated with this data?”, and so on.
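The question asked under condition of data (can the data be used as is, or does it need cleaning?) can be illustrated with a minimal sketch. The records, field names, and cleaning rules below are hypothetical and chosen only for illustration; real cleaning rules depend on the data set.

```python
# Hypothetical raw records; field names and values are made up for illustration.
raw_records = [
    {"id": "1", "city": " Hyderabad ", "temp_c": "31.5"},
    {"id": "2", "city": "chennai", "temp_c": ""},  # missing measurement
    {"id": "3", "city": "Mumbai", "temp_c": "29.0"},
]


def clean(record):
    """Return a cleaned copy of the record, or None if it cannot be used."""
    temp = record["temp_c"].strip()
    if not temp:
        return None  # assumption: records without a measurement are dropped
    return {
        "id": int(record["id"]),                  # text -> integer
        "city": record["city"].strip().title(),   # trim spaces, normalize case
        "temp_c": float(temp),                    # text -> number
    }


cleaned = [c for c in (clean(r) for r in raw_records) if c is not None]
print(cleaned)
```

Here two of the three records survive cleaning; the record with no temperature is dropped rather than guessed.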
TYPES OF DATA
CLASSIFICATION OF DIGITAL DATA
Digital data
Structured Data
Semi-Structured Data
Unstructured Data
STRUCTURED DATA
This is data in an organized form (i.e., rows and columns) that can be easily used by a computer program. Relationships exist between entities of data, such as classes and their objects. Data stored in databases is an example of structured data.
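As a small sketch of why rows and columns are easy for a program to use, the example below stores a few rows in an in-memory SQLite database through Python's standard `sqlite3` module. The `student` table, its columns, and its contents are hypothetical.

```python
import sqlite3

# In-memory database; the table and its rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student (id, name, dept) VALUES (?, ?, ?)",
    [(1, "Asha", "CSE"), (2, "Ravi", "ECE"), (3, "Meena", "CSE")],
)

# Because every row has the same columns, a query can filter on any of them.
rows = conn.execute(
    "SELECT name FROM student WHERE dept = ? ORDER BY id", ("CSE",)
).fetchall()
print(rows)
conn.close()
```

The fixed schema is what makes the query possible: the program knows in advance that every row has a `dept` column.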
SOURCES OF STRUCTURED DATA
Structured Data
Databases such as Oracle, DB2, Teradata, MySQL, PostgreSQL, etc.
Spreadsheets
OLTP systems
EASE OF WORKING WITH STRUCTURED DATA
Ease with structured data
SEMI STRUCTURED DATA
Semi-Structured data
Inconsistent structure
Self-describing (label/value pairs)
Often schema information is blended with data values
Data objects may have different attributes not known beforehand
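The characteristics listed above can be seen in a short JSON example, parsed here with Python's standard `json` module. The records are hypothetical: each one carries its own labels, and the two objects do not share the same attributes.

```python
import json

# Hypothetical records: labels travel with the values, and attributes differ.
text = """
[
  {"name": "Asha", "email": "asha@example.com"},
  {"name": "Ravi", "phone": "555-0100", "skills": ["SQL", "Python"]}
]
"""

records = json.loads(text)

# The set of attributes is only known after reading the data itself.
all_keys = sorted({key for rec in records for key in rec})
print(all_keys)

for rec in records:
    # .get() tolerates attributes that a given object does not have.
    print(rec["name"], rec.get("email"), rec.get("phone"))
```

Unlike a table row, the first object has no `phone` and the second has no `email`, so the program has to handle missing attributes explicitly.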
SOURCES OF SEMI-STRUCTURED DATA
Semi-structured data
XML(eXtensible Markup Language)
Other markup languages
JSON (JavaScript Object Notation)
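As an illustration of XML as a semi-structured source, the sketch below reads a small document with Python's standard `xml.etree.ElementTree` module. The `library` and `book` elements are hypothetical; the point is that the tags describe the data and that one element may be absent from some records.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the second book has no <year> element.
xml_text = """
<library>
  <book id="1"><title>Big Data Basics</title><year>2015</year></book>
  <book id="2"><title>Data Streams</title></book>
</library>
""".strip()

root = ET.fromstring(xml_text)

books = []
for book in root.findall("book"):
    books.append({
        "id": book.get("id"),                # attribute on the tag
        "title": book.findtext("title"),     # text of the child element
        "year": book.findtext("year"),       # None when the element is absent
    })

print(books)
```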
UNSTRUCTURED DATA
Unstructured data does not conform to any pre-defined model. To explain this further, let us take a closer look at the various kinds of text available and the possible structure associated with each.
SOURCES OF UNSTRUCTURED DATA
Unstructured data
Web pages
Images
Free-form text
Audio
Video
Body of Email
Text messages
Chats
Social media data
Word documents
DEALING WITH UNSTRUCTURED DATA
Dealing with unstructured data
Data mining
NLP
Text analysis
Noisy Text Analysis
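A very simple form of text analysis is counting how often each word occurs in a piece of free-form text. The sketch below uses only Python's standard library; the sample message and the tokenization rule (lowercase runs of letters) are assumptions made for illustration, and real NLP pipelines go well beyond this.

```python
import re
from collections import Counter

# Hypothetical free-form text, such as a chat or social media message.
message = "Big data is big. Data arrives fast, and data keeps growing!"

# Assumed tokenization: lowercase the text and keep runs of letters only.
tokens = re.findall(r"[a-z]+", message.lower())

# Count occurrences of each word to impose some structure on the text.
counts = Counter(tokens)
top = counts.most_common(2)
print(top)
```

Even this small step turns unstructured text into structured label/value pairs (word, count) that can then be stored or queried.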
HOW DATA IS BEING GENERATED
DIFFERENT SOURCES OF DATA GENERATION
RATE AT WHICH DATA IS BEING GENERATED
EVOLUTION OF BIG DATA
DEFINITION OF BIG DATA
CHALLENGES OF BIG DATA
SOURCES OF BIG DATA
THE DIFFERENT V’S: VOLUME, VARIETY, VELOCITY, VERACITY, VALUE
In recent years, the definition of Big Data has grown from the “3Vs” to the “5Vs”, which are also termed the characteristics of Big Data:
VOLUME
VELOCITY
VARIETY
VERACITY
VALUE
HOW A SINGLE PERSON CONTRIBUTES TO BIG DATA
SIGNIFICANCE OF BIG DATA
REASON FOR BIG DATA
UNDERSTANDING RDBMS AND WHY IT IS FAILING TO STORE BIG DATA
FUTURE OF BIG DATA
5 V’s
BIG DATA USE CASES FOR MAJOR IT INDUSTRIES