
WEB INTELLIGENCE AND BIG DATA

VIII SEMESTER

ETCS-418


UNIT-I: WEB INTELLIGENCE AND BIG DATA

  • Web-scale artificial intelligence and applications of big data
  • Types and characteristics of big data
  • The Hadoop ecosystem (MapReduce, HDFS, Hive, Pig) and NoSQL databases
  • Web intelligence: types and applications
  • Look: indexing, index creation, ranking and PageRank
  • Enterprise search, searching data structures, object search and locality-sensitive hashing


Web-scale artificial intelligence

  • Web-scale artificial intelligence refers to the use of machine learning algorithms and other AI techniques to process and analyze vast amounts of data on the web. It involves the use of distributed computing systems, such as cloud computing, to handle the large-scale data processing and storage requirements of AI applications.
  • Web-scale AI is used in many applications, including search engines, recommendation systems, fraud detection, and natural language processing. For example, search engines like Google use web-scale AI to index and rank web pages, and to provide personalized search results based on user behavior and preferences. Recommendation systems like those used by Amazon and Netflix use web-scale AI to analyze user data and suggest products or content that users are likely to be interested in.
  • Web-scale AI also relies on big data technologies such as Hadoop and Spark.


Applications of big data

  • Big data has many applications across industries, and its use is growing as more data becomes available and more businesses seek to leverage the power of big data to gain insights and make informed decisions. Here are some examples of the application of big data:
  • Healthcare: Big data can be used to analyze patient data and medical records to identify patterns and make diagnoses. It can also be used for drug discovery, clinical trial optimization, and personalized medicine.
  • Finance: Big data can be used for fraud detection, risk assessment, and algorithmic trading. It can also be used to provide personalized financial advice and improve customer experience.
  • Marketing: Big data can be used to analyze customer behavior and preferences to provide personalized recommendations and improve customer engagement. It can also be used to optimize pricing and promotions.
  • Manufacturing: Big data can be used to optimize


TYPES OF BIG DATA

Department of Electrical and Electronics Engineering, BVCOE New Delhi Subject: SUBJECT NAME , Instructor: INSTRUCTOR NAME


APACHE HADOOP

  • Apache Hadoop is one of the main supporting elements of big data technologies. It makes processing large amounts of structured or unstructured data inexpensive. Hadoop is an open-source Apache project that has been continuously improved over the years. "Hadoop is basically a set of software libraries and frameworks to manage and process large amounts of data on anything from a single server to thousands of machines. It provides an efficient and powerful mechanism for detecting and handling failures at the application layer rather than relying on hardware."
  • In December 2011, Apache released Hadoop 1.0.0; more information and an installation guide can be found in the Apache Hadoop documentation. Hadoop is not a single project but includes a number of other technologies.


MAPREDUCE

  • MapReduce was introduced by Google to build large web search indexes. It is a framework for writing applications that process large amounts of structured or unstructured data. MapReduce breaks a job into parts that run in parallel on multiple nodes; by distributing the processing this way, it makes large data sets manageable even when they are spread across many machines. Hadoop MapReduce is a software framework for easily writing applications that process large data sets in a highly fault-tolerant manner. More tutorials and a getting-started guide can be found in the Apache documentation.
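
The map, shuffle and reduce steps can be illustrated with the classic word-count example. This is a minimal single-process sketch in plain Python, not the Hadoop MapReduce API; the function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result (here, a sum)."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(splits):
    # On a cluster each split would be mapped on a different node;
    # this sketch simply runs every map task in the same process.
    mapped = chain.from_iterable(map_phase(split) for split in splits)
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["big data needs big clusters", "Hadoop processes big data"])
print(counts["big"], counts["data"], counts["hadoop"])  # 3 2 1
```

Because each map call only sees its own split and each reduce call only sees one key's values, both phases can be spread across machines independently.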


HDFS (Hadoop Distributed File System)

  • HDFS is a Java-based file system used to store structured or unstructured data across large clusters of distributed servers. HDFS imposes no restrictions or rules on the data it stores: the data can be fully unstructured or fully structured, and making sense of it is left entirely to the developer's code. HDFS is highly fault tolerant and is designed to be deployed on low-cost hardware. HDFS is part of the Apache Hadoop project; more information and an installation guide can be found in the Apache HDFS documentation.
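
Part of how HDFS achieves fault tolerance on cheap hardware is by storing each file as fixed-size blocks and keeping several replicas of every block on different machines (three by default, via the dfs.replication setting), so losing one machine does not lose data. The toy simulation below only illustrates that idea; the 128 MB block size, the round-robin placement and all function names are assumptions for illustration, not HDFS's actual placement policy or API.

```python
import math


def split_into_blocks(file_size_mb, block_size_mb=128):
    """Number of fixed-size blocks a file occupies (the last one may be partial).
    128 MB is an assumed block size chosen for illustration."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, nodes, replication=3):
    """Put each block on `replication` distinct nodes using simple round-robin.
    Real HDFS uses a rack-aware placement policy; this is only a toy."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """Every block is still readable if each has a replica on another node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]
placement = place_replicas(split_into_blocks(300), nodes)  # 300 MB -> 3 blocks
for block, replicas in placement.items():
    print(block, replicas)
print(readable_after_failure(placement, "node2"))  # True
```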


HIVE

  • Hive was originally developed at Facebook and was later released as open source. Hive acts as a bridge between SQL and Hadoop: it is used to run SQL-style queries on Hadoop clusters. Apache Hive is a data warehouse system that supports ad-hoc queries, data summarization and analysis of huge data sets stored in Hadoop-compatible file systems.
  • Hive provides a SQL-like query language called HiveQL for querying huge amounts of data stored in Hadoop clusters. In January 2013, Apache released Hive 0.10.0; more information and an installation guide can be found in the Apache Hive documentation.


PIG

  • Pig was developed at Yahoo and was later released as fully open source. Like Hive, it provides a way to query data on Hadoop clusters, but instead of SQL it uses a scripting language (Pig Latin) to make Hadoop data accessible to developers and business users. Apache Pig provides a high-level programming platform for developers to process and analyze big data, extensible with user-defined functions. In January 2013, Apache released Pig 0.10.1. More information, including the Hadoop versions each release supports and an installation guide, can be found in the Apache Pig Getting Started documentation.


TRADITIONAL VS BIG DATA BUSINESS APPROACH

1. Schema-less and Column-oriented Databases (NoSQL)

  • We have used table- and row-based relational databases for years; they work well for online transactions and quick updates. When large amounts of unstructured data come into the picture, we need databases that are not tied to a hard-coded schema. A number of databases fit this category; they can store unstructured, semi-structured or even fully structured data.
  • Among their other benefits, schema-less databases make data migration very easy. MongoDB is a popular and widely used NoSQL database. NoSQL and schema-less databases are used when the primary concern is storing a huge amount of data rather than maintaining relationships between elements. "NoSQL (not only SQL) is a type of database that does not primarily rely on a schema-based structure and does not use SQL for data processing."
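
The schema-less idea can be sketched with plain Python dictionaries standing in for documents in a document store such as MongoDB. This is not MongoDB's API: the `products` collection and the `find` helper are hypothetical, and the point is only that documents in one collection can carry different fields and new fields need no schema migration.

```python
# A toy "collection": no schema is declared, so each document
# may carry different fields (hypothetical example data).
products = [
    {"_id": 1, "name": "Laptop", "price": 900, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "E-book", "price": 12, "format": "PDF"},
    {"_id": 3, "name": "T-shirt", "sizes": ["S", "M", "L"]},
]


def find(collection, **criteria):
    """Return documents whose fields equal every key/value in criteria.
    A document that lacks a field simply does not match; nothing fails."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Adding a brand-new field needs no schema change: just store it.
products.append({"_id": 4, "name": "Headphones", "price": 60, "wireless": True})

print(find(products, name="E-book"))
print(find(products, wireless=True))
```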


  • Big Data Characteristics


Web intelligence

  • Web intelligence refers to the study and application of artificial intelligence, data mining, and machine learning techniques to extract useful insights and knowledge from the vast amount of data generated on the web. It involves analyzing and understanding user behavior, preferences, and trends on the web, and using this information to improve web services, user experience, and business performance.
  • Web intelligence is used in a variety of applications, such as web search engines, social media analysis, e-commerce, online advertising, and recommendation systems. By analyzing user behavior and preferences, web intelligence can help companies and organizations make informed decisions and improve their products and services.


Types of web intelligence

  • Web content mining: This type of web intelligence involves extracting useful information and knowledge from web content such as text, images, videos, and audio. Techniques used for web content mining include natural language processing, text analytics, and image and video analysis.
  • Web structure mining: This type of web intelligence involves analyzing the structure of the web, such as links between web pages, to identify patterns and relationships. Techniques used for web structure mining include network analysis and graph theory.
  • Web usage mining: This type of web intelligence involves analyzing user behavior and interactions on the web, such as clickstream data, to understand user preferences and behavior. Techniques used for web usage mining include data mining, machine learning, and predictive modeling.
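
Two of these mining types can be illustrated on tiny hand-made data: usage mining counts page visits and page-to-page transitions in a clickstream, and structure mining counts inbound links in a link graph. The clickstream, link graph and function names below are hypothetical; real systems apply the same ideas to logs and crawls that are many orders of magnitude larger.

```python
from collections import Counter

# Hypothetical clickstream: (user_id, page) pairs in time order.
clicks = [
    ("u1", "/home"), ("u1", "/laptops"), ("u1", "/laptops/x1"),
    ("u2", "/home"), ("u2", "/laptops"), ("u3", "/home"), ("u3", "/phones"),
]

# Hypothetical link graph: page -> pages it links to.
links = {
    "/home": ["/laptops", "/phones"],
    "/laptops": ["/laptops/x1", "/home"],
    "/phones": ["/home"],
    "/laptops/x1": ["/laptops"],
}


def page_popularity(clickstream):
    """Web usage mining: how often each page was visited."""
    return Counter(page for _, page in clickstream)


def next_page_counts(clickstream):
    """Web usage mining: count page-to-page moves within each user's visits."""
    last_page, transitions = {}, Counter()
    for user, page in clickstream:
        if user in last_page:
            transitions[(last_page[user], page)] += 1
        last_page[user] = page
    return transitions


def in_degree(graph):
    """Web structure mining: number of inbound links per page."""
    counts = Counter({page: 0 for page in graph})
    for targets in graph.values():
        counts.update(targets)
    return counts


print(page_popularity(clicks).most_common(1))            # [('/home', 3)]
print(next_page_counts(clicks)[("/home", "/laptops")])   # 2
print(in_degree(links)["/home"])                         # 2
```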


Applications of web intelligence

  • Search engines: Web intelligence is used to improve the accuracy and relevance of search results by analyzing user search behavior, website content, and other factors.
  • E-commerce: Web intelligence is used to personalize product recommendations, optimize pricing and promotions, and improve the overall customer experience.
  • Social media analysis: Web intelligence is used to analyze social media data to understand customer sentiment, preferences, and behavior. This information can be used to improve customer engagement and inform marketing strategies.


Look: indexing

  • Indexing is the process of organizing and cataloging information, data, or content in a systematic and structured way to make it easier to search, access, and retrieve later. It involves creating an index or a list of words or phrases, along with their corresponding locations or references in a document or database.
  • Indexing is commonly used in various fields, including information science, library science, computer science, and publishing, to facilitate efficient and accurate retrieval of information. Search engines, for example, use indexing to create a database of web pages and their content, which can be quickly searched and sorted based on relevance and other criteria.
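
A common way search engines index text is an inverted index: each word maps to the documents that contain it, so a query only touches the lists for its own words instead of scanning every document. The sketch below is a minimal in-memory version under simple assumptions (lower-casing, alphanumeric tokens, AND semantics for multi-word queries); the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


docs = {
    1: "Hadoop stores big data in HDFS",
    2: "Hive runs SQL queries on Hadoop",
    3: "Search engines index web pages",
}
index = build_inverted_index(docs)
print(sorted(search(index, "hadoop")))      # [1, 2]
print(sorted(search(index, "hadoop sql")))  # [2]
```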


Indexing in artificial intelligence

  • Indexing is also an important concept in artificial intelligence (AI) and machine learning (ML), particularly in the context of information retrieval and natural language processing (NLP). In AI and ML, indexing involves creating an index or a database of information, which can be used to improve the efficiency and accuracy of search algorithms and NLP models.
  • For example, in NLP, indexing can involve creating an index of all the words in a corpus or a dataset, along with their frequency and other relevant metadata. This can be used to create language models that can predict the likelihood of a given word or phrase appearing in a particular context.
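
The word-frequency index described above can be sketched as unigram and bigram counts, from which a simple bigram language model estimates how likely one word is to follow another. This is a minimal sketch assuming whitespace tokenization and plain maximum-likelihood estimates with no smoothing; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_frequency_index(corpus):
    """Index each word with its total frequency, and each word pair
    with how often the second word follows the first."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        unigrams.update(words)
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def next_word_probability(bigrams, prev, nxt):
    """Maximum-likelihood estimate of P(nxt | prev) from the bigram counts."""
    followers = bigrams.get(prev)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


corpus = [
    "big data needs big clusters",
    "big data drives web intelligence",
    "big models need data",
]
unigrams, bigrams = build_frequency_index(corpus)
print(unigrams["big"])                                # 4
print(next_word_probability(bigrams, "big", "data"))  # 0.5
```

Real language models add smoothing so that unseen word pairs do not get probability zero.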


Index Creation

  • Index creation is the process of building an index, which is a data structure used to improve the efficiency and speed of data retrieval operations. An index typically consists of a list of keys, along with pointers or references to the location of the corresponding data. The keys are typically chosen based on the attributes that are frequently used to search or sort the data.
  • In the context of databases, index creation involves analyzing the data and selecting appropriate columns or fields to index. The index is then built using a specific algorithm, such as B-tree or hash indexing, to create a fast-access data structure that can be used to efficiently search or sort the data based on the indexed fields.
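
A minimal sketch of the keys-plus-pointers idea, assuming an in-memory list of rows instead of a real database: the index maps each value of the chosen column to the positions (row ids) of the rows that hold it, so a lookup no longer scans every row. The table and function names are hypothetical; a real DBMS would store such an index in a B-tree or hash structure on disk.

```python
# A hypothetical table: each row is a dict; its list position is its row id.
employees = [
    {"id": 101, "name": "Asha", "dept": "Sales"},
    {"id": 102, "name": "Ravi", "dept": "IT"},
    {"id": 103, "name": "Meera", "dept": "Sales"},
    {"id": 104, "name": "John", "dept": "HR"},
]


def create_index(table, column):
    """Build an index: each key in `column` points to the row ids holding it."""
    index = {}
    for row_id, row in enumerate(table):
        index.setdefault(row[column], []).append(row_id)
    return index


def full_scan(table, column, value):
    """Without an index, every row must be examined for every query."""
    return [row for row in table if row[column] == value]


def indexed_lookup(table, index, value):
    """With an index, jump straight to the matching rows via their row ids."""
    return [table[row_id] for row_id in index.get(value, [])]


dept_index = create_index(employees, "dept")
print(dept_index)  # {'Sales': [0, 2], 'IT': [1], 'HR': [3]}
assert indexed_lookup(employees, dept_index, "Sales") == full_scan(employees, "dept", "Sales")
```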


Types of indexing

  • B-tree indexing: This is a type of indexing commonly used in databases, where the data is organized into a balanced tree structure. B-tree indexing allows fast retrieval of data based on a range of values, such as all records whose value in a specific column lies between two bounds, as well as exact-value lookups.
  • Hash indexing: This is a type of indexing commonly used in databases for exact match searches. Hash indexing involves creating a hash table, which stores pointers to the location of the data, based on a hash function that is applied to a specific column or set of columns.
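
The practical difference can be sketched in Python: a dict plays the role of a hash index (fast exact match, no ordering), while a sorted key list searched with the standard `bisect` module stands in for the ordered leaf level of a B-tree (supports range queries). This is a simplification, not a B-tree implementation, and the salary data is made up.

```python
import bisect

# Hypothetical (salary, row_id) pairs for one column of a table.
salaries = [(52000, 0), (61000, 1), (48000, 2), (75000, 3), (61000, 4)]

# Hash index: exact key -> row ids. Fast lookups, but keys are unordered.
hash_index = {}
for salary, row_id in salaries:
    hash_index.setdefault(salary, []).append(row_id)

# Ordered index: keys kept sorted, like the leaves of a B-tree.
ordered_index = sorted(salaries)
ordered_keys = [salary for salary, _ in ordered_index]


def exact_match(value):
    """Hash-style lookup for one exact key."""
    return hash_index.get(value, [])


def range_query(low, high):
    """Row ids with low <= key <= high, found by binary search on sorted keys."""
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return [row_id for _, row_id in ordered_index[start:end]]


print(exact_match(61000))         # [1, 4]
print(range_query(50000, 65000))  # [0, 1, 4]
```

A range query on the hash index alone would have to test every key, which is why ordered structures such as B-trees are preferred when range queries are common.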


Ranking

  • Ranking refers to the process of ordering or prioritizing a set of items based on certain criteria or a score. In the context of artificial intelligence, ranking is often used in information retrieval, natural language processing, and machine learning.
  • In information retrieval, ranking refers to the process of ordering search results based on their relevance to a user's query. Search engines use ranking algorithms to assign scores to web pages or other documents based on their content, relevance, and other factors. The search results are then ordered based on these scores, with the most relevant results appearing at the top of the list.
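
The score-then-sort process can be illustrated with TF-IDF, one classic relevance score from information retrieval: a query word counts more when it is frequent in a document but rare across the collection. This is an assumed example to show the mechanics, not the ranking function of any particular search engine; the documents and function name are hypothetical.

```python
import math
from collections import Counter


def tf_idf_rank(docs, query):
    """Score each document against the query with a basic TF-IDF sum and
    return (doc_id, score) pairs sorted from most to least relevant."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for words in tokenized.values() for word in set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        score = 0.0
        for term in query.lower().split():
            if tf[term]:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(words)) * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "a": "hadoop stores big data",
    "b": "hive queries big data on hadoop",
    "c": "pagerank ranks web pages",
}
# b ranks first (it matches both words, and "hive" is rare), then a, then c.
for doc_id, score in tf_idf_rank(docs, "hadoop hive"):
    print(doc_id, round(score, 3))
```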


Page Ranking Searching

  • PageRank is a ranking algorithm used by Google Search to rank web pages in its search engine results. It was developed by Google co-founder Larry Page and is named after him.
  • The PageRank algorithm assigns a numerical value, or "rank", to each web page based on the number and quality of other web pages that link to it. Pages with more links from high-ranking pages receive a higher PageRank score, indicating that they are more important.
  • PageRank does not depend on any particular query, so it is not used on its own to order results. When a user enters a search query, Google's search engine combines PageRank with query-dependent factors such as keyword frequency, website quality, and user behavior to provide the most relevant, high-quality results.
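
The link-based scoring above can be sketched with the iterative (power-iteration) form of PageRank: each page repeatedly shares its current rank among the pages it links to, plus a small uniform share controlled by a damping factor. The damping value 0.85 is a commonly used choice, spreading the rank of pages with no outgoing links evenly is one common convention, and the example graph is hypothetical; every link target must also appear as a key.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank for a dict mapping page -> outgoing links.
    Pages with no outgoing links share their rank evenly with all pages."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page C comes out on top because three pages link to it, while D, which nothing links to, keeps only the minimum random-jump share.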


Enterprise search

  • Enterprise search is a software application that enables organizations to search and access information stored in various enterprise systems, such as databases, content management systems, email archives, and file shares. It provides a single point of access to all enterprise information, making it easier for employees to find the information they need to do their jobs more efficiently.
  • Enterprise search technology can be used for a variety of purposes, including:
  • Document and content management: Enterprise search can help organizations manage large volumes of documents and other types of content, making it easier to search for and retrieve specific pieces of information.
  • Knowledge management: Enterprise search can help organizations capture and share knowledge across the organization, making it easier for employees to find answers to common questions and access information they need to do their jobs.
  • E-commerce: Enterprise search can help e-commerce websites provide better search functionality to customers, enabling them to find products and information more quickly and easily.


10 Searching data structures

  • A data structure is a way of organizing and storing data so that it can be accessed and used efficiently. There are many different types of data structures, each with its own strengths and weaknesses depending on the specific use case.
  • Some commonly used data structures include:
  • Arrays: A collection of elements of the same type, stored in contiguous memory locations.
  • Linked Lists: A collection of nodes that contain data and a pointer to the next node in the list.
  • Stacks: A collection of elements with a Last-In-First-Out (LIFO) structure, where elements can only be added to or removed from the top.
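
How searching differs across these structures can be sketched in Python: an array gives direct access by index, a linked list must be walked node by node, and a stack only exposes its top. Python lists are dynamic arrays of object references, so they serve as both the array and the stack here; the `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    """One linked-list node: a value plus a pointer to the next node."""
    value: Any
    next_node: Optional["Node"] = None


def build_linked_list(values):
    """Link the values front to back and return the head node."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def linked_list_search(head, target):
    """Follow next pointers from the head; return target's position or -1."""
    position, node = 0, head
    while node is not None:
        if node.value == target:
            return position
        node, position = node.next_node, position + 1
    return -1


# Array: any index is reachable directly.
array = [7, 3, 9, 4]
print(array[2])  # 9

# Linked list: must be walked from the head.
print(linked_list_search(build_linked_list(array), 4))  # 3

# Stack (LIFO): push and pop happen only at the top (end of the list).
stack = []
for item in ["a", "b", "c"]:
    stack.append(item)  # push
print(stack.pop())  # c
```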


Object search

  • Object search is a type of searching algorithm used to find an object or a group of objects within a data set. The data set can be any type of data structure, such as an array, list, tree, or graph. Object search algorithms are used in a wide range of applications, including computer graphics, computer vision, image processing, and machine learning.
  • Object search algorithms can be classified into two categories: exact object search and approximate object search.
  • Exact object search: Exact object search algorithms are used to find an exact match of an object within a data set. Examples include linear search, binary search, and lookups in hash tables and B-trees.
  • Approximate object search: Approximate object search algorithms are used to find objects that are similar to a given query object. Examples of approximate object search algorithms include k-nearest neighbor (k-NN) search, range search, and locality-sensitive hashing (LSH).
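
The two categories can be contrasted with one example of each: binary search (via the standard `bisect` module) finds an exact value in a sorted list, while a brute-force k-nearest-neighbor search returns the points most similar to a query rather than an exact match. The data and function names are illustrative; the brute-force k-NN compares the query with every point, which is exactly the cost that techniques such as LSH try to avoid at scale.

```python
import bisect
import math


def binary_search(sorted_items, target):
    """Exact search: index of target in a sorted list, or -1 if absent."""
    i = bisect.bisect_left(sorted_items, target)
    return i if i < len(sorted_items) and sorted_items[i] == target else -1


def k_nearest(points, query, k):
    """Similarity search: the k points closest to query by Euclidean
    distance, found by comparing the query with every point."""
    return sorted(points, key=lambda p: math.dist(p, query))[:k]


print(binary_search([2, 5, 8, 13, 21], 13))  # 3
print(binary_search([2, 5, 8, 13, 21], 6))   # -1
points = [(1.0, 1.0), (2.0, 2.0), (8.0, 9.0), (1.5, 0.5)]
print(k_nearest(points, (1.2, 1.0), k=2))    # [(1.0, 1.0), (1.5, 0.5)]
```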


12 Locality-sensitive hashing

  • Locality-sensitive hashing (LSH) is a technique used in computer science to approximate nearest neighbor search in high-dimensional spaces. In high-dimensional spaces, traditional search techniques like linear search and binary search can be computationally expensive, making it difficult to find the nearest neighbors of a given point. LSH addresses this problem by using hash functions that map similar points to the same buckets with high probability, which allows fast approximate nearest neighbor search.
  • The key idea behind LSH is to transform the high-dimensional data points into a lower-dimensional space using hash functions. The hash functions are designed to map similar points to the same buckets with a high probability, while mapping dissimilar points to different buckets with a high probability. By mapping similar points to the same buckets, LSH can quickly identify potential nearest neighbors of a given query point by searching through the buckets that contain points similar to the query point.
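
One well-known LSH family, for cosine similarity, uses random hyperplanes: each hash bit records which side of a random plane a vector falls on, so vectors at a small angle to each other tend to get the same bit pattern and land in the same bucket. The sketch below implements that random-hyperplane variant only (other LSH families exist for other distance measures); the dimension, number of bits, seed and function names are arbitrary illustrative choices. Practical systems typically use several independent hash tables to reduce the chance of missing a true neighbor.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, num_bits, seed=42):
    """Draw num_bits random hyperplane normal vectors in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its positive side."""
    return tuple(
        int(sum(v * h for v, h in zip(vector, plane)) >= 0)
        for plane in hyperplanes
    )


def build_buckets(points, hyperplanes):
    """Group point names by their LSH signature (the bucket key)."""
    buckets = defaultdict(list)
    for name, vector in points.items():
        buckets[lsh_signature(vector, hyperplanes)].append(name)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Only points sharing the query's bucket are checked further."""
    return buckets.get(lsh_signature(query, hyperplanes), [])


planes = make_hyperplanes(dim=3, num_bits=4)
points = {"a": [1.0, 0.9, 0.0], "b": [0.95, 1.0, 0.05], "c": [-1.0, 0.2, 3.0]}
buckets = build_buckets(points, planes)
# Points pointing in almost the same direction as the query are likely
# (not guaranteed) to share its bucket; "c" points somewhere else entirely.
print(candidates([1.0, 1.0, 0.0], buckets, planes))
```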
