1 of 66

Big Data Analytics (23MB54)

Unit-I-Introduction to Data Mining and Big Data

Name: R Pavitra

Designation: Assistant Professor

Department: IT

College: LBRCE


2 of 66

Course Outcomes:

CO1: Apply data mining algorithms for classification and clustering.

CO2: Understand Big data framework.

CO3: Understand the MapReduce way of solving analytic problems.

CO4: Illustrate the problem and its solutions using Data Analytics.

CO5: Analyze big data applications.



4 of 66

AGENDA:

Introduction to Data mining and Big Data

    • Introduction to Data mining
    • KDD process
    • Data Mining Techniques
    • Introduction to Big Data
    • Explosion in Quantity of Data
    • Big Data Characteristics
    • Types of Data
    • Common Big Data Customer Scenarios
    • BIG DATA vs. HADOOP
    • A Holistic View of a Big Data System
    • Limitations of Existing Data Analytics Architecture


5 of 66

WHAT IS DATA?

  • Data: Data refers to anything that can be recorded and measured.
  • Data can be raw numbers (like stock prices on successive days, or the masses of different planets),
  • sounds (the words someone speaks into their cellphone),
  • pictures, or
  • words (the text of a newspaper article).
  • The goal of analysis is to extract meaningful information from such raw data.


6 of 66

WHAT IS DATA MINING?

  • Data mining is the process of automatically discovering useful information in large data repositories.
  • It aims to find novel and useful patterns that might otherwise remain unknown.
  • It also provides capabilities to predict the outcome of a future observation.
  • Example
          • predicting whether a newly arrived customer will spend more than $100 at a department store.

7 of 66

WHAT IS DATA MINING (Cont’d)

  • Data mining is the process of extracting knowledge or insights from large amounts of data using various statistical and computational techniques.
  • The data can be structured, semi-structured or unstructured, and can be stored in various forms such as databases, data warehouses, and data lakes.
  • The primary goal of data mining is to discover hidden patterns and relationships in the data that can be used to make informed decisions or predictions.

  • Data mining is not the same as information retrieval, such as finding particular Web pages via a query to an Internet search engine.
  • However, data mining techniques can be used to enhance information retrieval systems.

8 of 66

WHAT IS DATA MINING (Cont’d)

Data Mining and Knowledge

  • Data mining is an integral part of Knowledge Discovery in Databases (KDD),
      • the process of converting raw data into useful information.
      • This process consists of a series of transformation steps.

9 of 66

WHAT IS DATA MINING?

  • Data preprocessing is an important step in the data mining process. It refers to the cleaning, transforming, and integrating of data in order to make it ready for analysis.
  • The goal of data preprocessing is to improve the quality of the data and to make it more suitable for the specific data mining task.

Steps of Data Preprocessing

  • Data preprocessing is an important step in the data mining process that involves cleaning and transforming raw data to make it suitable for analysis. Some common steps in data preprocessing include:
  • Data Cleaning: This involves identifying and correcting errors or inconsistencies in the data, such as missing values, outliers, and duplicates. Various techniques can be used for data cleaning, such as imputation, removal, and transformation.

10 of 66

  • Data Integration: This involves combining data from multiple sources to create a unified dataset. Data integration can be challenging as it requires handling data with different formats, structures, and semantics.
  • Data Transformation: This involves converting the data into a suitable format for analysis. Common techniques used in data transformation include normalization, standardization, and discretization.
  • Data Reduction: This involves reducing the size of the dataset while preserving the important information. Data reduction can be achieved through techniques such as feature selection and feature extraction.
  • Data Discretization: This involves dividing continuous data into discrete categories or intervals. Discretization is often used in data mining and machine learning algorithms that require categorical data.
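As a concrete illustration of the data cleaning step listed above, the sketch below performs mean imputation of missing values. It is a minimal plain-Python example with made-up values, not a prescribed implementation:

```python
# Hedged sketch of the data cleaning step: mean imputation for missing
# values, in plain Python (the 'ages' column is illustrative).
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [25, None, 31, 28, None, 40]
cleaned = impute_mean(ages)   # the two missing ages become the mean, 31.0
```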


11 of 66

  • Data Normalization: This involves scaling the data to a common range, such as between 0 and 1 or -1 and 1. Normalization is often used to handle data with different units and scales. Common normalization techniques include min-max normalization, z-score normalization, and decimal scaling.
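The min-max and z-score techniques named above can be sketched in a few lines of plain Python; the function names and sample values are illustrative:

```python
# Hedged sketch of min-max and z-score normalization.
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [new_min + (v - lo) * (new_max - new_min) / (hi - lo) for v in values]

def z_score_normalize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

data = [10, 20, 30, 40, 50]
scaled = min_max_normalize(data)       # values now lie between 0 and 1
standardized = z_score_normalize(data)
```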

Preprocessing in Data Mining

  • Data preprocessing is a data mining technique used to transform raw data into a useful and efficient format.


12 of 66

WHAT IS DATA MINING?

  • Post Processing:
      • only valid and useful results are incorporated into the decision support system.

      • Visualization
          • allows analysts to explore the data and the data mining results from a variety of viewpoints.

      • Statistical measures or hypothesis testing methods can also be applied
          • to eliminate spurious (false or fake) data mining results.

13 of 66

Motivating Challenges:

  • Challenges that motivated the development of data mining.
      • Scalability
      • High Dimensionality
      • Heterogeneous and Complex Data
      • Data Ownership and Distribution
      • Non-traditional Analysis

14 of 66

Motivating Challenges:

  • Scalability
  • Sizes of datasets are now in the order of GBs, TBs, or PBs.
  • Handling them may require special search strategies,
  • implementation of novel data structures (for efficient access),
  • out-of-core algorithms for datasets that do not fit in main memory, and
  • sampling, or the development of parallel and distributed algorithms.

15 of 66

Motivating Challenges:

  • High Dimensionality
  • common today - data sets with hundreds or thousands of attributes
  • Example
      • Bio-Informatics - microarray technology has produced gene expression data involving thousands of features.
      • Data sets with temporal or spatial components also tend to have high dimensionality.
          • a data set that contains measurements of temperature at various locations.

16 of 66

Motivating Challenges:

Heterogeneous and Complex Data

  • Traditional data analysis methods handle data sets whose attributes are all of the same type, either continuous or categorical; many modern applications, however, involve non-traditional data.
  • Examples of such non-traditional types of data include
      • collections of Web pages containing semi-structured text and hyperlinks;
      • DNA data with sequential and three-dimensional structure and
      • climate data with time series measurements
  • DM should maintain relationships in the data, such as
      • temporal and spatial autocorrelation,
      • graph connectivity, and
      • parent-child relationships between the elements in semi-structured text and XML documents.

17 of 66

Motivating Challenges:

  • Data Ownership and Distribution
      • Data is not stored in one location or owned by one organization
      • geographically distributed among resources belonging to multiple entities.
      • This requires the development of distributed data mining techniques.
      • key challenges in distributed data mining algorithms
          • (1) reduction in the amount of communication needed
          • (2) effective consolidation of data mining results obtained from multiple sources, and
          • (3) Data security issues.

18 of 66

Data Mining Techniques:

1. Association

Association analysis is the finding of association rules showing attribute-value conditions that occur frequently together in a given set of data. Association analysis is widely used for a market basket or transaction data analysis. 

2. Classification

Classification is the process of finding a set of models (or functions) that describe and distinguish data classes or concepts, for the purpose of being able to use the model to predict the class of objects whose class label is unknown.

Data Mining has a different type of classifier: 

  • Decision Tree SVM(Support Vector Machine) Generalized Linear Models
  • Bayesian classification, Classification by Back propagation
  • K-NN Classifier , Rule-Based Classification, Frequent-Pattern Based Classification
  • Rough set theory, Fuzzy Logic


19 of 66

3. Prediction

Data prediction is a two-step process, similar to that of data classification. However, for prediction we do not use the phrase "class label attribute," because the attribute whose values are being predicted is continuous-valued (ordered) rather than categorical (discrete-valued and unordered).

4. Clustering:

It is a technique used to group similar data instances together based on their intrinsic characteristics or similarities. It aims to discover natural patterns or structures in the data without any predefined classes or labels.

5. Regression 

It is employed to predict numeric or continuous values based on the relationship between input variables and a target variable. It aims to find a mathematical function or model that best fits the data to make accurate predictions.
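A minimal sketch of regression is fitting a straight line y = a·x + b by ordinary least squares; the toy data below is illustrative (it lies exactly on y = 2x + 1):

```python
# Minimal regression sketch: fit y = a*x + b by ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    b = mean_y - a * mean_x
    return a, b

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]        # lies exactly on y = 2x + 1
a, b = fit_line(xs, ys)
```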


20 of 66

6.Anomaly Detection

  • Anomaly detection, sometimes called outlier analysis, aims to identify rare or unusual data instances that deviate significantly from the expected patterns. It is useful in detecting fraudulent transactions, network intrusions, manufacturing defects, or any other abnormal behavior.
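One simple way to realize outlier analysis is to flag values whose z-score exceeds a threshold. The sketch below uses illustrative readings and this basic statistical approach; real detectors are far more sophisticated:

```python
# Simple outlier analysis: flag values whose z-score exceeds a threshold.
def z_score_outliers(values, threshold=2.0):
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [v for v in values if abs(v - mean) / std > threshold]

readings = [10, 12, 11, 13, 12, 95]
outliers = z_score_outliers(readings)   # 95 deviates strongly from the rest
```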


21 of 66

Data Mining Techniques:

  • Data mining tasks are generally divided into two major categories:
    • Predictive tasks. - Use some variables to predict unknown or future values of other variables
      • Task Objective: predict the value of a particular attribute based on the values of other attributes.
      • Target/Dependent Variable: attribute to be predicted
      • Explanatory or independent variables: attributes used for making the prediction
    • Descriptive tasks. - Find human-interpretable patterns that describe the data.
      • Task objective: derive patterns (correlations, trends, clusters, trajectories, and anomalies) that summarize the underlying relationships in data.
      • Descriptive data mining tasks are often exploratory in nature and frequently require post processing techniques to validate and explain the results.

22 of 66

Data Mining Techniques

  • Trajectory data mining enables prediction of the moving-location details of humans, vehicles, animals, and so on.
  • Anomaly detection is a step in data mining that identifies data points, events, and/or observations that deviate from a dataset's normal behavior.
  • Correlation is a statistical term describing the degree to which two variables move in coordination with one another.
  • Trends: a general direction in which something is developing or changing.
  • Clusters: Clustering is the task of dividing data points into a number of groups such that data points in the same group are more similar to each other than to data points in other groups.


25 of 66

Data Mining Techniques

  • Predictive modeling refers to the task of building a model for the target variable as a function of the explanatory variables.
  • 2 types of predictive modeling tasks:
    • Classification: Used for discrete target variables
    • Regression: used for continuous target variables.
    • Example:
      • Classification Task : predicting whether a Web user will make a purchase at an online bookstore is a classification task because the target variable is binary-valued.
      • Regression Task: forecasting the future price of a stock is a regression task because price is a continuous-valued attribute.
    • Goal of both tasks: learn a model that minimizes the error between the predicted and true values of the target variable.
    • Predictive modeling can be used to:
      • identify customers that will respond to a marketing campaign,
      • predict disturbances in the Earth’s ecosystem, or
      • judge whether a patient has a particular disease based on the results of medical tests.

26 of 66

Data Mining Techniques

  • Example: (Predicting the Type of a Flower): the task of predicting a species of flower based on the characteristics of the flower.
  • Iris species: Setosa, Versicolour, or Virginica.
  • Requirement: need a data set containing the characteristics of various flowers of these three species.
  • The dataset records four attributes for each flower: sepal width, sepal length, petal length, and petal width.
  • Petal width is broken into the categories low, medium, and high, which correspond to the intervals [0, 0.75), [0.75, 1.75), [1.75, ∞), respectively.
  • Also, petal length is broken into categories low, medium, and high, which correspond to the intervals [0, 2.5), [2.5, 5), [5, ∞), respectively.
  • Based on these categories of petal width and length, the following rules can be derived:
    • Petal width low and petal length low implies Setosa.
    • Petal width medium and petal length medium implies Versicolour.
    • Petal width high and petal length high implies Virginica.
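The three rules above translate directly into a tiny rule-based classifier; the sketch below hard-codes the interval boundaries from the slide:

```python
# The three rules above as a rule-based classifier. Interval boundaries
# follow the slide: petal width is low/medium/high on [0, 0.75),
# [0.75, 1.75), [1.75, inf); petal length on [0, 2.5), [2.5, 5), [5, inf).
def bucket(value, cuts):
    """Map a measurement to 'low', 'medium', or 'high' given two cut points."""
    low_cut, high_cut = cuts
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

def classify_iris(petal_width, petal_length):
    width_cat = bucket(petal_width, (0.75, 1.75))
    length_cat = bucket(petal_length, (2.5, 5.0))
    if width_cat == "low" and length_cat == "low":
        return "Setosa"
    if width_cat == "medium" and length_cat == "medium":
        return "Versicolour"
    if width_cat == "high" and length_cat == "high":
        return "Virginica"
    return "unknown"  # the two attributes fall in different buckets
```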


29 of 66

Data Mining Techniques

  • Association analysis
    • used to discover patterns that describe strongly associated features in the data.
    • Discovered patterns are represented in the form of implication rules or feature subsets.
    • Goal of association analysis:
      • To extract the most interesting patterns in an efficient manner.
    • Example
      • finding groups of genes that have related functionality,
      • identifying Web pages that are accessed together, or
      • understanding the relationships between different elements of Earth’s climate system.

30 of 66

Data Mining Techniques

  • Association analysis
  • Example (Market Basket Analysis).
    • AIM: find items that are frequently bought together by customers.
    • Association rule {Diapers} → {Milk},
      • suggests that customers who buy diapers also tend to buy milk.
  • This rule can be used to identify potential cross-selling opportunities among related items.

The transaction data are collected at the checkout counters of a grocery store.
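Rules such as {Diapers} → {Milk} are judged by their support and confidence; the sketch below computes both over a small illustrative set of transactions (not the actual checkout table referenced above):

```python
# Support and confidence for the rule {Diapers} -> {Milk}, computed over
# an illustrative list of market-basket transactions.
transactions = [
    {"Bread", "Diapers", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Milk", "Cola"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    """Of the transactions containing lhs, the fraction also containing rhs."""
    return support(lhs | rhs) / support(lhs)
```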

31 of 66

Data Mining Techniques

  • Cluster analysis
    • Cluster analysis seeks to find groups of closely related observations so that observations that belong to the same cluster are more similar than observations that belong to other clusters.
    • Clustering has been used to
      • group sets of related customers,
      • find areas of the ocean that have a significant impact on the Earth’s climate, and
      • compress data.

32 of 66

Data Mining Techniques

  • Cluster analysis
    • Example 1.3 (Document Clustering)
    • Each article is represented as a set of word-frequency pairs (w, c),
      • where w is a word and
      • c is the number of times the word appears in the article.
    • There are two natural clusters in the data set.
    • First cluster -> first four articles (news about the economy)
    • Second cluster-> last four articles ( news about health care)
    • A good clustering algorithm should be able to identify these two clusters based on the similarity between words that appear in the articles.
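Under the (w, c) representation, each article is a mapping from a word to its count, and the similarity between two articles can be measured with cosine similarity. The articles below are illustrative stand-ins for the economy and health-care clusters:

```python
# Each article as a word -> count mapping, per the (w, c) representation;
# cosine similarity measures how alike two articles are.
import math

def cosine(doc1, doc2):
    """Cosine similarity between two word-frequency dictionaries."""
    dot = sum(count * doc2.get(word, 0) for word, count in doc1.items())
    norm1 = math.sqrt(sum(c * c for c in doc1.values()))
    norm2 = math.sqrt(sum(c * c for c in doc2.values()))
    return dot / (norm1 * norm2)

economy_a = {"dollar": 4, "industry": 2, "loan": 3}
economy_b = {"dollar": 3, "loan": 2, "deal": 1}
health_a = {"patient": 5, "health": 3, "pharmaceutical": 2}
```

Articles sharing vocabulary score close to 1, while articles from different clusters with no shared words score 0, which is the signal a clustering algorithm exploits.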

33 of 66

Data Mining Techniques

  • Anomaly Detection:
    • Task of identifying observations whose characteristics are significantly different from the rest of the data.
    • Such observations are known as anomalies or outliers.
    • A good anomaly detector must have a high detection rate and a low false alarm rate.
    • Applications of anomaly detection include
      • the detection of fraud,
      • network intrusions,
      • unusual patterns of disease, and
      • ecosystem disturbances

34 of 66

Data Mining Techniques

  • Anomaly Detection:
    • Example 1.4 (Credit Card Fraud Detection).
    • A credit card company records the transactions made by every credit card holder, along with personal information such as credit limit, age, annual income, and address.
    • Since the number of fraudulent cases is relatively small compared to the number of legitimate transactions, anomaly detection techniques can be applied to build a profile of legitimate transactions for the users.
    • When a new transaction arrives, it is compared against the profile of the user. If the characteristics of the transaction are very different from the previously created profile, then the transaction is flagged as potentially fraudulent.
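The profile-based scheme described above can be sketched very simply: treat a user's profile as the mean and standard deviation of past transaction amounts, and flag new amounts that deviate too far. Real fraud systems build far richer profiles; the amounts here are made up:

```python
# Hedged sketch of profile-based fraud flagging: the "profile" is just the
# mean and population standard deviation of past transaction amounts.
def build_profile(amounts):
    n = len(amounts)
    mean = sum(amounts) / n
    std = (sum((a - mean) ** 2 for a in amounts) / n) ** 0.5
    return mean, std

def is_suspicious(amount, profile, threshold=3.0):
    """Flag a transaction deviating more than `threshold` std-devs from the mean."""
    mean, std = profile
    return abs(amount - mean) > threshold * std

history = [42.0, 55.0, 38.0, 61.0, 47.0]   # a user's past purchases
profile = build_profile(history)
```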

35 of 66

What is Big Data?�

  • Big Data is also data, but of enormous size. Big Data is a term used to describe a collection of data that is huge in volume and yet growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.
  • Big data is a collection of large datasets that cannot be processed using traditional computing techniques.
  • Examples Of Big Data
  • Following are some of the examples of Big Data:
  • The New York Stock Exchange generates about one terabyte of new trade data per day

*

35

36 of 66

*

36

37 of 66

  • Social Media

Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated through photo and video uploads, message exchanges, posting comments, etc.

  • A single Jet engine can generate 10+terabytes of data in 30 minutes of flight time. With many thousand flights per day, generation of data reaches up to many Petabytes.


38 of 66

Tabular Representation of various Memory Sizes

Name        Equal To           Size (in Bytes)
Bit         1 bit              1/8
Nibble      4 bits             1/2 (rare)
Byte        8 bits             1
Kilobyte    1,024 bytes        1,024
Megabyte    1,024 kilobytes    1,048,576
Gigabyte    1,024 megabytes    1,073,741,824
Terabyte    1,024 gigabytes    1,099,511,627,776
Petabyte    1,024 terabytes    1,125,899,906,842,624
Exabyte     1,024 petabytes    1,152,921,504,606,846,976
Zettabyte   1,024 exabytes     1,180,591,620,717,411,303,424
Yottabyte   1,024 zettabytes   1,208,925,819,614,629,174,706,176
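The byte counts in the table are simply successive powers of 1024, which a couple of lines of Python can regenerate:

```python
# Regenerate the table's byte counts as successive powers of 1024.
units = ["Kilobyte", "Megabyte", "Gigabyte", "Terabyte",
         "Petabyte", "Exabyte", "Zettabyte", "Yottabyte"]
sizes = {name: 1024 ** (i + 1) for i, name in enumerate(units)}
```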

39 of 66

Evolution of Big Data by Technology

40 of 66

Big Data Characteristics

  • Big Data refers to amounts of data too large to be processed by traditional data storage or processing units. Many multinational companies use it to process data and run their business. The data flow would exceed 150 exabytes per day before replication.

There are five V's of Big Data that explain its characteristics.

5 V's of Big Data

  • Volume
  • Veracity
  • Variety
  • Value
  • Velocity



42 of 66

Volume:

  • The name Big Data itself relates to enormous size. Volume refers to the vast amounts of data generated daily from many sources, such as business processes, machines, social media platforms, networks, human interactions, and many more.
  • Facebook alone can generate approximately a billion messages per day; the "Like" button is clicked 4.5 billion times, and more than 350 million new posts are uploaded each day. Big data technologies can handle such large amounts of data.


43 of 66

Variety:

  • Big Data can be structured, unstructured, or semi-structured, collected from different sources. In the past, data was collected only from databases and spreadsheets, but these days data arrives in many forms: PDFs, emails, audio, social media posts, photos, videos, etc.


44 of 66

The data is categorized as below:

  • Structured data: Structured data has a well-defined schema, with all the required columns. It is in tabular form and is stored in a relational database management system.
  • Semi-structured: In semi-structured data, the schema is not strictly defined; examples include JSON, XML, CSV, TSV, and email. Tags or markers separate the elements, but the data does not conform to a fixed relational table structure.
  • Unstructured Data: Unstructured files such as log files, audio files, and image files are included in unstructured data. Some organizations have a wealth of such data available but do not know how to derive value from it, since the data is raw.
  • Quasi-structured Data: This format contains textual data with inconsistent formats that can be structured only with effort, time, and suitable tools.


45 of 66

Veracity:

  • Veracity refers to how reliable the data is. There are many ways to filter or translate data; veracity is about being able to handle and manage data of uncertain quality efficiently. This reliability is also essential in business development.

For example, Facebook posts with hashtags.

Value:

  • Value is an essential characteristic of big data: what matters is not the data we process or store as such, but the valuable and reliable data that we store, process, and analyze.

Velocity:

  • Velocity plays an important role compared to the other characteristics. Velocity refers to the speed at which data is created in real time. It encompasses the speed of incoming data sets, the rate of change, and activity bursts. A primary aspect of Big Data is to provide demanded data rapidly.
  • Big data velocity deals with the speed at which data flows in from sources like application logs, business processes, networks, social media sites, sensors, mobile devices, etc.


46 of 66

47 of 66

TYPES OF BIG DATA

48 of 66

  1. Structured
  2. Unstructured
  3. Semi-structured

  • Structured
    • Any data that can be stored, accessed, and processed in a fixed format is termed 'structured' data.
    • Over time, computer science has achieved great success in developing techniques for working with such data (where the format is well known in advance) and in deriving value out of it.
    • However, nowadays we foresee issues when the size of such data grows to a huge extent, with typical sizes in the range of multiple zettabytes.

      • Do you know? 10^21 bytes equal 1 zettabyte; in other words, one billion terabytes form a zettabyte. Looking at these figures, one can easily understand why the name Big Data is given, and imagine the challenges involved in its storage and processing.

49 of 66

  • Do you know? Data stored in a relational database management system is one example of 'structured' data.

  • Examples Of Structured Data

An 'Employee' table in a database is an example of Structured Data

Employee_ID   Employee_Name     Gender   Department   Salary_In_lacs
2365          Rajesh Kulkarni   Male     Finance      650000
3398          Pratibha Joshi    Female   Admin        650000
7465          Shushil Roy       Male     Admin        500000
7500          Shubhojit Das     Male     Finance      500000
7699          Priya Sane        Female   Finance      550000

50 of 66

  • Unstructured
    • Any data with unknown form or structure is classified as unstructured data.
    • In addition to its huge size, unstructured data poses multiple challenges in terms of processing it to derive value.
    • A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc.
    • Nowadays organizations have a wealth of data available but, unfortunately, don't know how to derive value from it, since this data is in its raw, unstructured form.
  • Examples Of Un-structured Data

The output returned by 'Google Search'

51 of 66

  • Semi-structured
    • Semi-structured data can contain both forms of data.
    • It may appear structured in form, but it is not actually defined by, e.g., a table definition in a relational DBMS.
    • An example of semi-structured data is data represented in an XML file.

  • Examples Of Semi-structured Data
  • Personal data stored in an XML file-
  • <rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
  • <rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
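Because these records are tagged but have no fixed relational schema, they can be parsed generically. The sketch below wraps the two <rec> elements from the slide in a root element (which the parser requires) and extracts them with Python's standard library:

```python
# Parse the slide's two <rec> records, wrapped in a single root element,
# with Python's standard-library XML parser.
import xml.etree.ElementTree as ET

xml_data = """<people>
<rec><name>Prashant Rao</name><sex>Male</sex><age>35</age></rec>
<rec><name>Seema R.</name><sex>Female</sex><age>41</age></rec>
</people>"""

root = ET.fromstring(xml_data)
# Each record becomes a dict; no fixed schema is assumed.
records = [{field.tag: field.text for field in rec} for rec in root.findall("rec")]
```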


52 of 66

Challenges with Big Data

53 of 66

  • The Challenges:
    • For most organizations, Big Data analysis is a challenge. Consider the sheer volume of data, the different formats (both structured and unstructured) collected across the entire organization, and the many ways different types of data can be combined, contrasted, and analyzed to find patterns and other useful business information.
    • The first challenge is breaking down data silos to access all the data an organization stores in different places, often in different systems.
    • A second challenge is creating platforms that can pull in unstructured data as easily as structured data.
    • This massive volume of data is typically so large that it is difficult to process using traditional database and software methods.

54 of 66

  • Application of Big Data

55 of 66

  • Here is the list of top Big Data applications in today’s world:

56 of 66

  • Let’s discuss the applications of Big Data in detail.
  1. Big Data in Retail
    • The retail industry faces the fiercest competition of all. Retailers constantly hunt for ways that will give them a competitive edge over others. The saying "the customer is king" rings especially true for the retail industry.
    • Big Data acts as a weapon for retailers to connect with their customers:
      • Through advanced analysis of their customers' data, retailers are now able to understand them from every possible angle. They gather this data from various sources such as social media, loyalty programs, etc.

57 of 66

2. Big Data in Healthcare

  • Big Data and healthcare are an ideal match. It complements the healthcare industry better than anything ever will. The amount of data the healthcare industry has to deal with is unimaginable.
  • Big Data and analytics have given them the license to build more personalized medications. Data analysts are harnessing this data to develop more and more effective treatments. Identifying unusual patterns of certain medicines to discover ways for developing more economical solutions is a common practice these days.

58 of 66

3. Big Data in Education

  • When you ask people about the use of the data that an educational institute gathers, the majority of the people will have the same answer that the institute or the student might need it for future references.
  • Some of the top universities are using Big Data as a tool to renovate their academic curriculum. Additionally, universities can even track the dropout rates of the students and are taking the required measures to reduce this rate as much as possible.

59 of 66

4. Big Data in E-commerce

  • That some of the biggest E-commerce companies in the world, like Amazon, Flipkart, and Alibaba, are now bound to Big Data and analytics is itself evidence of the popularity Big Data has gained in recent times.
  • Big Data is now central to these organizations. Amazon, the biggest E-commerce firm in the world and one of the pioneers of Big Data and analytics, has Big Data as the backbone of its system. Flipkart, the biggest E-commerce firm in India, has one of the most robust data platforms in the country.
  • The recommendation engine is one of the most impressive applications the Big Data world has witnessed. It furnishes companies with a 360-degree view of their customers.
  • Companies then make suggestions to customers accordingly. Customers now experience more personalized service than ever before. Big Data has completely redefined people's online shopping experiences.

60 of 66

5. Big Data in Media and Entertainment

  • The Media and Entertainment industry is all about art, and employing Big Data in it is itself a piece of art. Art and science are often considered two completely contrasting domains, but when employed together they make a powerful duo, and Big Data's endeavors in the media industry are a perfect example.
  • Viewers these days want content matched to their own preferences, content relatively new compared to what they watched previously. Earlier, companies broadcast ads randomly, without any kind of analysis.
  • But after the advent of Big Data analytics in the industry, companies are now aware of the kinds of ads that attract a customer and the most appropriate time to broadcast them for maximum attention.

61 of 66

6. Big Data in Finance

  • The functioning of any financial organization depends heavily on its data, and safeguarding that data is one of the toughest challenges any financial firm faces. Data has been the second most important commodity for them after money.
  • Digital banking and payments are two of the most trending buzzwords around, and Big Data has been at the heart of both. Big Data drives key areas of financial firms such as fraud detection, risk analysis, algorithmic trading, and customer contentment.
  • This has brought much-needed fluency to their systems. Firms are now empowered to focus more on providing better services to their customers rather than on security issues. Big Data has equipped the financial system with answers to its hardest challenges.

62 of 66

BIG DATA vs. HADOOP

  • Big Data is a group of technologies: a collection of huge data that keeps multiplying continuously. Apache Hadoop is an open-source, Java-based framework that implements some of the big data principles.
  • Big Data is a collection of assets that is quite complex, complicated, and ambiguous. Hadoop achieves a set of goals and objectives for dealing with that collection of assets.
  • Big Data is the complicated problem: a huge amount of raw data. Hadoop is the solution: the machine that processes that data.
  • Big Data is harder to access. Hadoop allows the data to be accessed and processed faster.
  • Big Data is hard to store, as it consists of all forms of data (structured, unstructured, and semi-structured). Hadoop implements the Hadoop Distributed File System (HDFS), which allows the storage of different varieties of data.
  • Big Data has a wide range of applications in fields such as telecommunications, banking, and healthcare. Hadoop is used for cluster resource management, parallel processing, and data storage.



64 of 66

What is Analytics Architecture?

  • Analytics architecture refers to the overall design and structure of an analytical system or environment, which includes the hardware, software, data, and processes used to collect, store, analyze, and visualize data.

Key components of Analytics Architecture-

  • Data collection: This refers to the process of gathering data from various sources, such as sensors, devices, social media, websites, and more.
  • Transformation: Once the data is collected, it should be cleaned and transformed before being stored.
  • Data storage: This refers to the systems and technologies used to store and manage data, such as databases, data lakes, and data warehouses.
  • Analytics: This refers to the tools and techniques used to analyze and interpret data, such as statistical analysis, machine learning, and visualization.


65 of 66

Limitations of Analytics Architecture :

There are several limitations to consider when designing and implementing an analytical architecture:

  • Complexity: Analytical architectures can be complex and require a high level of technical expertise to design and maintain.
  • Data quality: The quality of the data used in the analytical system can significantly impact the accuracy and usefulness of the results.
  • Data security: Ensuring the security and privacy of the data used in the analytical system is critical, especially when working with sensitive or personal information.
  • Scalability: As the volume and complexity of the data increase, the analytical system may need to be scaled to handle the increased load. This can be a challenging and costly task.


66 of 66

  • Integration: Integrating the various components of the analytical system can be a challenge, especially when working with a diverse set of data sources and technologies.
  • Cost: Building and maintaining an analytical system can be expensive, due to the cost of hardware, software, and personnel.
  • Data governance: Ensuring that the data used in the analytical system is properly governed and compliant with relevant laws and regulations can be a complex and time-consuming task.
  • Performance: The performance of the analytical system can be impacted by factors such as the volume and complexity of the data, the quality of the hardware and software used, and the efficiency of the algorithms and processes employed.
