Unit-I DATA MANAGEMENT
INTRODUCTION:
In the early days of computers and the Internet, there was far less data than there is today. Data could easily be stored and managed by users and business enterprises on a single computer, because it rarely exceeded the extent of 19 terabytes; now, in this era, roughly 2.5 quintillion bytes of data are generated every day.
What are the tools used in Data Analytics?
With the increasing demand for data analytics in the market, many tools with various functionalities have emerged for this purpose. Ranging from open-source platforms to user-friendly commercial products, the top tools in the data analytics market are as follows:
R programming
Python
Tableau Public
QlikView
SAS
Microsoft Excel
RapidMiner
KNIME
OpenRefine
Apache Spark
Six stages of data processing
Once the data is collected, it enters the data preparation stage. Data preparation, often referred to as "pre-processing," is the stage at which raw data is cleaned up and organized for the following stage of data processing.
The clean data is then entered into its destination (perhaps a CRM like Salesforce or a data warehouse like Redshift) and translated into a language that the destination system can understand. Data input is the first stage at which raw data begins to take the form of usable information.
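The preparation ("pre-processing") step described above can be sketched in plain Python. The field names and cleaning rules here are illustrative assumptions, not part of any particular pipeline:

```python
# A minimal data-preparation sketch: clean up raw records before they
# are loaded into a destination system. The field names ("name", "age")
# and the cleaning rules are illustrative assumptions.

def prepare(raw_records):
    """Strip whitespace, drop incomplete rows, and coerce types."""
    cleaned = []
    for record in raw_records:
        name = (record.get("name") or "").strip()
        age = record.get("age")
        # Drop rows that are missing required fields.
        if not name or age in (None, ""):
            continue
        cleaned.append({"name": name, "age": int(age)})
    return cleaned

raw = [
    {"name": "  Ada ", "age": "36"},
    {"name": "", "age": "50"},        # missing name -> dropped
    {"name": "Grace", "age": None},   # missing age  -> dropped
]
print(prepare(raw))  # [{'name': 'Ada', 'age': 36}]
```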
***END OF UNIT 1***
UNIT-II
such as Access, Excel, Microsoft SQL Server, Teradata, Oracle, Sybase, etc. This tool is mostly used for predictive analytics, such as data mining, text analytics, and machine learning.
***END OF UNIT-II***
UNIT-III
Univariate Analysis:
Precision is the ratio of correctly predicted positive observations (true positives) to the total predicted positive observations: Precision = TP / (TP + FP).
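As a quick sketch, precision can be computed directly from label lists; the example labels below are illustrative assumptions:

```python
# Precision = true positives / all predicted positives.
# Labels and the choice of 1 as the positive class are assumptions.

def precision(y_true, y_pred, positive=1):
    """Fraction of predicted positives that are actually positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    predicted_positive = sum(1 for p in y_pred if p == positive)
    return tp / predicted_positive if predicted_positive else 0.0

# Example: 3 predicted positives, 2 of them correct -> precision 2/3.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
print(round(precision(y_true, y_pred), 3))  # 0.667
```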
*** END OF UNIT-III***
UNIT-IV Object Segmentation & Time Series Methods
Basic algorithm for inducing a decision tree from training tuples:
Algorithm: Generate_decision_tree. Generate a decision tree from the training tuples of data partition, D.
Input:
Data partition, D, which is a set of training tuples and their associated class labels;
attribute list, the set of candidate attributes;
Attribute selection method, a procedure to determine the splitting criterion that “best” partitions the data tuples into individual classes. This criterion consists of a splitting attribute
and, possibly, either a split point or splitting subset.
Output: A decision tree.
Method:
create a node N;
if tuples in D are all of the same class, C, then
    return N as a leaf node labeled with the class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list) to find the "best" splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
    attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
for each outcome j of splitting_criterion
    // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else
        attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
endfor
return N;
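The induction procedure above can be sketched in Python. This is a minimal ID3-style version that assumes discrete-valued attributes and uses information gain as the attribute-selection method; the dataset and all names are illustrative assumptions, not from the notes:

```python
# A minimal sketch of decision-tree induction, assuming discrete-valued
# attributes and information gain as Attribute_selection_method.
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def majority_class(D):
    """Majority voting over the class labels in partition D."""
    return Counter(t["class"] for t in D).most_common(1)[0][0]

def best_attribute(D, attribute_list):
    """Pick the attribute with the highest information gain."""
    base = entropy([t["class"] for t in D])
    def gain(a):
        remainder = sum(
            (len(Dj) / len(D)) * entropy([t["class"] for t in Dj])
            for v in set(t[a] for t in D)
            for Dj in [[t for t in D if t[a] == v]]
        )
        return base - remainder
    return max(attribute_list, key=gain)

def generate_decision_tree(D, attribute_list):
    classes = set(t["class"] for t in D)
    if len(classes) == 1:                 # all tuples in the same class C
        return classes.pop()              # leaf labeled with C
    if not attribute_list:                # attribute list is empty
        return majority_class(D)          # leaf labeled by majority voting
    a = best_attribute(D, attribute_list)
    node = {a: {}}                        # label node N with the splitting criterion
    remaining = [x for x in attribute_list if x != a]  # remove splitting attribute
    for v in set(t[a] for t in D):        # each outcome j of the criterion
        Dj = [t for t in D if t[a] == v]  # partition satisfying outcome j
        node[a][v] = (generate_decision_tree(Dj, remaining)
                      if Dj else majority_class(D))
    return node

# Toy training partition D (illustrative data).
D = [
    {"outlook": "sunny",    "windy": "false", "class": "no"},
    {"outlook": "sunny",    "windy": "true",  "class": "no"},
    {"outlook": "overcast", "windy": "false", "class": "yes"},
    {"outlook": "rain",     "windy": "false", "class": "yes"},
    {"outlook": "rain",     "windy": "true",  "class": "no"},
]
tree = generate_decision_tree(D, ["outlook", "windy"])
print(tree)
```

The tree is returned as nested dictionaries: internal nodes map a splitting attribute to its outcomes, and leaves are class labels.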
***END OF UNIT-IV***
UNIT-V Data Visualization
***END OF UNIT-V***