Data Mining_Anoop Chaturvedi

Swayam Prabha

Course Title

Multivariate Data Mining- Methods and Applications

Lecture 02

Data Mining, Machine Learning and Artificial Intelligence

Anoop Chaturvedi

Department of Statistics, University of Allahabad

Prayagraj (India)

Slides can be downloaded from https://sites.google.com/view/anoopchaturvedi/swayam-prabha

Data Mining

Knowledge mining from data
Process of discovering patterns, trends, and insights from large datasets using various techniques from statistics, machine learning, and database systems.

Objective

Extract useful information from large datasets.

Use it to make predictions or to support better decision-making.

Predictive data mining ⇒ Predicting future outcomes based on historical data.

Descriptive data mining ⇒ Summarizing and interpreting data to understand its underlying patterns and relationships.

Data Mining Activities: descriptive data mining and predictive data mining.

Descriptive data mining: Discover the locations of unexpected structures or relationships, patterns, trends, clusters, and outliers in massive data sets.

Predictive data mining: Build models and procedures for regression, classification, pattern recognition, or machine learning tasks, and assess their predictive accuracy when they are applied to fresh data.

In machine-learning terminology, descriptive data mining corresponds to unsupervised learning, whereas predictive data mining corresponds to supervised learning.
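To make the descriptive/predictive distinction concrete, here is a minimal Python sketch on made-up toy data; the function names are hypothetical and chosen only for illustration. The descriptive part groups unlabeled values with a simple one-dimensional k-means (Lloyd's algorithm); the predictive part fits a least-squares line on training pairs and then measures its mean squared error on held-out ("fresh") pairs.

```python
"""Toy contrast between descriptive (unsupervised) and predictive (supervised) mining.

Hypothetical data and helper names; a sketch, not a reference implementation.
"""


def kmeans_1d(values, k=2, iterations=20):
    """Descriptive: group unlabeled values into k >= 2 clusters (Lloyd's algorithm in 1-D)."""
    # Initialise the centres with evenly spaced points from the sorted data.
    data = sorted(values)
    centres = [data[i * (len(data) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda j: abs(v - centres[j]))
            groups[nearest].append(v)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


def fit_line(xs, ys):
    """Predictive: ordinary least-squares fit y = a + b*x on labeled training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def mse(model, xs, ys):
    """Mean squared prediction error of the fitted line on (fresh) data."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # Unlabeled measurements: the task is to discover structure (two groups).
    print(kmeans_1d([1.0, 1.2, 0.8, 9.9, 10.1, 10.0]))
    # Labeled pairs: fit on training data, then assess accuracy on held-out data.
    train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
    test_x, test_y = [5, 6], [10.1, 11.9]
    model = fit_line(train_x, train_y)
    print(model, mse(model, test_x, test_y))
```

The clustering step never sees a "correct answer", while the regression step is judged by how well it predicts data it was not fitted on.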

Data mining methods ⇒ Closely related to methods developed in statistics and machine learning, such as regression, classification, clustering, and visualization.

Enormous data sets ⇒ Data mining tools focus on dimensionality-reduction techniques and variable selection, and on handling situations in which high-dimensional data concentrate near lower-dimensional hyperplanes or near nonlinear surfaces or manifolds.
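Principal component analysis (PCA) is one standard dimensionality-reduction technique. The sketch below assumes only two variables, so the eigenvalues of the 2x2 sample covariance matrix have a closed form, and reports what share of the total variance lies along the first principal direction. A share close to 1 indicates that the two-dimensional data concentrate near a one-dimensional line (a lower-dimensional hyperplane). The function name and the toy points are hypothetical.

```python
import math


def principal_axis_2d(points):
    """Return (variance share of the first principal component, unit direction).

    Assumes 2-D points that are not all identical (otherwise total variance is 0).
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix entries [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix from its characteristic equation.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector for lam1 is (sxy, lam1 - sxx); fall back to an axis if sxy is 0.
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return lam1 / (lam1 + lam2), (vx / norm, vy / norm)


if __name__ == "__main__":
    # Toy points lying close to the line y = 2x.
    share, direction = principal_axis_2d([(0, 0.1), (1, 1.9), (2, 4.1), (3, 5.9), (4, 8.0)])
    print(round(share, 4), direction)
```

Here almost all of the variance falls along a single direction, so one derived variable (the projection onto that direction) could replace the original two with little loss.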

An important issue in data mining is scalability.

Scalability ⇒ An algorithm’s ability to handle and process large data sets efficiently and effectively.

The algorithm should remain efficient and accurate as the number of variables and observations increases.

Scalability can be expressed by how running time and memory grow as a function of the data size (logarithmic, linear, polynomial, or exponential).

Ideally, this growth should be linear or sublinear, i.e., time and memory grow proportionally to, or more slowly than, the data size.
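A rough way to see the difference between growth rates is to count basic operations. In the sketch below, operation counts stand in for running time and the function names are hypothetical: a single pass over n records (e.g., computing a mean) grows linearly, while naively comparing every pair of records (e.g., building a full distance matrix) grows quadratically.

```python
def linear_scan_ops(n):
    """Work of one pass over n records: grows like n."""
    ops = 0
    for _ in range(n):
        ops += 1
    return ops


def all_pairs_ops(n):
    """Work of comparing every pair of n records: n(n - 1)/2, i.e. quadratic growth."""
    ops = 0
    for i in range(n):
        for _ in range(i + 1, n):
            ops += 1
    return ops


def growth_ratio(cost, n):
    """How much the cost grows when the data size doubles from n to 2n."""
    return cost(2 * n) / cost(n)


if __name__ == "__main__":
    for n in (1_000, 2_000):
        print(n, linear_scan_ops(n), all_pairs_ops(n))
    print(growth_ratio(linear_scan_ops, 1_000))  # doubling n doubles the work
    print(growth_ratio(all_pairs_ops, 1_000))    # doubling n roughly quadruples it
```

Doubling the data doubles the linear cost but roughly quadruples the quadratic one, which is why quadratic or worse procedures quickly become impractical on massive data sets.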

Potential Applications of Data Mining ⇒ Fields in which large amounts of data are stored and processed.

Marketing:

Predict new purchasing trends. Identify “loyal” customers.
Find associations among customers’ demographic characteristics.
Predict customers who respond to direct mailings, telemarketing calls, advertising campaigns, promotions, etc.
Market basket analysis ⇒ Which groups of products sell together.
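Market basket analysis usually starts by counting how often items occur together. As a minimal sketch (toy transactions; the function name is hypothetical), the code below computes the support of each item pair, i.e. the fraction of baskets containing both items, and keeps the pairs whose support reaches a chosen threshold. This is only the counting step; full association-rule algorithms such as Apriori also prune candidate itemsets and derive rules with confidence measures.

```python
from collections import Counter
from itertools import combinations


def pair_support(baskets, min_support):
    """Return item pairs appearing together in at least min_support fraction of baskets."""
    counts = Counter()
    for basket in baskets:
        # Count each unordered pair once per basket; sorting gives a canonical order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


if __name__ == "__main__":
    baskets = [  # hypothetical transactions
        ["bread", "butter", "milk"],
        ["bread", "butter"],
        ["milk", "tea"],
        ["bread", "butter", "tea"],
    ]
    print(pair_support(baskets, 0.5))  # {('bread', 'butter'): 0.75}
```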

Banking:

Predict customers likely to change their credit card affiliation.
Determine credit card spending by customer groups.
Evaluate loan policies using customer characteristics.
Predict behavioral use of automated teller machines (ATMs).
Identify hidden correlations between different financial indicators.

Financial Markets:

Identify stock trading rules from historical market data
Identify relationships between financial indicators.
Track changes in an investment portfolio and predict price turning points.
Analyze volatility patterns in high-frequency stock transactions using volume, price, and time of each transaction.

Insurance and Health Care:

Identify characteristics of buyers of new policies and predict which customers will buy new policies
Claims analysis ⇒ Which medical procedures are claimed together
Identify behavior patterns of risky customers for certain illnesses
Identify successful medical treatments and procedures by examining insurance claims and billing data.

Molecular Biology:

Collect, organize, and integrate the enormous quantities of data on bioinformatics, functional genomics, proteomics, gene expression monitoring, and microarrays.
Analyze amino acid sequences and DNA microarrays.
Analyze and use gene expression data to characterize biological function.
Predict protein structure and identify related proteins.

Forensics:

Identify fraudulent behavior in credit card usage by looking for transactions that do not fit a particular cardholder’s buying habits.
Identify fraud in insurance and medical claims.
Identify instances of tax evasion.
Detect illegal activities leading to suspected money laundering operations.
Identify stock market behaviors that indicate possible insider-trading operations.
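The first forensic application, flagging transactions that do not fit a cardholder’s buying habits, can be sketched as simple outlier detection. Purely for illustration, the code below assumes that a cardholder’s "habit" is summarized by the mean and sample standard deviation of past transaction amounts, and that a new amount is suspicious when its absolute z-score exceeds a threshold (3 by default). Real fraud-detection systems use far richer features; the function name and data are hypothetical.

```python
import statistics


def flag_unusual(history, new_amounts, threshold=3.0):
    """Flag amounts far from a cardholder's past spending (|z-score| > threshold).

    `history` needs at least two past amounts (sample standard deviation).
    """
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    if sd == 0:
        # Perfectly regular history: anything different from the usual amount stands out.
        return [amt for amt in new_amounts if amt != mean]
    return [amt for amt in new_amounts if abs(amt - mean) / sd > threshold]


if __name__ == "__main__":
    past = [20, 25, 30, 22, 28, 24, 26]  # hypothetical past purchases
    print(flag_unusual(past, [27, 31, 400]))  # [400]
```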

Transportation:

Determine the distribution schedules among outlets
Analyze loading patterns

Sports:

Identify which players and designed plays are most effective at specific points in the game and in relation to combinations of opposing players.
Discover game patterns hidden behind summary statistics.

Astronomy:

Catalogue hundreds of millions of stars and galaxies in the sky using hundreds of attributes, such as position, size, shape, age, brightness, and color.
Identify patterns and relationships of objects in the sky.

Knowledge discovery in databases (KDD) covers the entire process of discovering useful knowledge from data, with data mining as a key component.

KDD is composed of six primary activities:

1. Selecting the target data set (the data set, variables, and cases used for data mining)

2. Data cleaning (removal of noise, identification of outliers, imputing missing data)

3. Preprocessing the data (data transformations, tracking time-dependent information)

4. Deciding appropriate data-mining tasks (regression, classification, clustering, etc.)

5. Analyzing the cleaned data using data-mining software (algorithms for data reduction, dimensionality reduction, model fitting, prediction, pattern extraction, etc.)

6. Interpreting and assessing the knowledge obtained from data-mining results.
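The six KDD steps can be illustrated end to end on a tiny table. This is a toy sketch under stated assumptions: the records, field names, and helper functions are hypothetical; cleaning is limited to median imputation of a missing value; preprocessing is limited to standardization; and the chosen data-mining task is a least-squares regression slope.

```python
import statistics

# Hypothetical customer records; "age" has a missing value.
RAW = [
    {"id": 1, "age": 25, "income": 30, "spend": 12},
    {"id": 2, "age": None, "income": 45, "spend": 18},
    {"id": 3, "age": 40, "income": 60, "spend": 25},
    {"id": 4, "age": 35, "income": 50, "spend": 20},
]


def select(records, fields):
    """Step 1: keep only the variables chosen as the target data set."""
    return [{f: r[f] for f in fields} for r in records]


def clean(records, field):
    """Step 2: impute missing values of `field` with the median of the observed ones."""
    fill = statistics.median(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


def standardize(records, field):
    """Step 3: rescale `field` to mean 0 and sample standard deviation 1."""
    values = [r[field] for r in records]
    mu, sd = statistics.mean(values), statistics.stdev(values)
    return [{**r, field: (r[field] - mu) / sd} for r in records]


def fit_slope(records, x, y):
    """Steps 4-5: task chosen = regression; least-squares slope of y on x."""
    xs = [r[x] for r in records]
    ys = [r[y] for r in records]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx


if __name__ == "__main__":
    data = select(RAW, ["age", "income", "spend"])
    data = clean(data, "age")
    data = standardize(data, "income")
    slope = fit_slope(data, "income", "spend")
    # Step 6: interpret, e.g. a positive slope means higher income goes with higher spend here.
    print(round(slope, 3))
```

Each function maps to one stage, so any stage can be swapped (for example, a different imputation rule or a clustering task) without touching the others.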

Artificial Intelligence (AI): Different Definitions

The branch of computer science involving the automation of intelligent behavior
The automation of activities associated with human thinking, such as decision-making, problem-solving, and learning
Creating machines that perform functions requiring intelligence when performed by people
Study of mental faculties through the use of computational models

Study of the computations that make it possible to perceive, reason, and act
A field that seeks to explain and emulate intelligent behavior in terms of computational processes

The approaches to AI can be organized into four categories:

Systems that think like humans.
Systems that act like humans.
Systems that think rationally.
Systems that act rationally.

Acting humanly: Turing Test approach for intelligent behavior

Turing Test (proposed by Alan Turing in 1950)

⇒ A measure of a machine’s ability to demonstrate human-like intelligence

Under this approach, an AI system is required to pass the Turing Test.

Thinking humanly requires the cognitive modeling approach (how humans think).

Thinking rationally requires the laws of thought approach (indisputable reasoning processes)

Turing Test: The objective is to evaluate a machine's ability to demonstrate human-like intelligence.

A human evaluator engages in a natural language conversation with a human and a machine through a computer interface, without knowing which is which.

Evaluator’s task ⇒ Distinguish between the human and machine.

If the machine can fool the evaluator into believing that it is a human a significant portion of the time, then it passes the Turing Test.

The Turing Test led to debates about what constitutes intelligence, consciousness, and the nature of human-computer interaction.

Limitations:

The machine may rely on superficial tricks or patterns rather than genuine comprehension.

Focuses primarily on linguistic abilities and may not capture other aspects of intelligence, such as creativity or emotional intelligence.
