Sri Raghavendra Educational Institutions Society (R)
(Approved by AICTE, Accredited by NAAC, Affiliated to VTU, Karnataka)
Sri Krishna Institute of Technology
www.skit.org.in
Prepared by:
Latha
Course: Cloud Computing
Department: Computer Science & Engineering
Module-5: FEATURES OF CLOUD AND GRID PLATFORMS
CO Addressed: CO5
Cloud Capabilities and Platform Features
Traditional Features Common to Grids and Clouds
Data Features and Databases
Programming and Runtime Support
Worker and Web Roles
MapReduce
Cloud Programming Models
SaaS
PARALLEL AND DISTRIBUTED PROGRAMMING PARADIGMS
Parallel Computing and Programming Paradigms
• Computation partitioning
• Data partitioning
• Mapping
• Synchronization
• Communication
• Scheduling
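Several of the steps above can be seen in even a very small parallel program. The following is a minimal sketch, assuming a sum-of-squares task split across worker processes with Python's standard `multiprocessing` module (the task and names are illustrative, not from the course material):

```python
from multiprocessing import Pool

def square(x):
    # Computation partitioning: every worker runs the same small task unit.
    return x * x

if __name__ == "__main__":
    data = list(range(8))
    with Pool(4) as pool:                 # Mapping: tasks mapped onto 4 workers
        # Data partitioning: Pool splits `data` into chunks per worker.
        # Communication, synchronization, and scheduling are handled
        # internally by the library rather than by the programmer.
        squares = pool.map(square, data)
    print(sum(squares))                   # 140
```

Note that the programmer only writes `square`; partitioning, scheduling, and result collection are delegated to the runtime, which is exactly the division of labor the list above describes.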
MapReduce, Twister, and Iterative MapReduce
Formal Definition of MapReduce
MapReduce Logical Data Flow
Formal Notation of MapReduce Data Flow
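The formal data flow — Map takes (key1, val1) pairs to intermediate (key2, val2) pairs, the runtime groups values by key2, and Reduce aggregates each group — can be sketched in a few lines of sequential Python. This is a toy word-count illustration of the model, not Hadoop's actual API:

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # Map: (key1, val1) -> list of intermediate (key2, val2) pairs
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    # Reduce: (key2, list of val2) -> aggregated output value
    return sum(counts)

def mapreduce(docs):
    groups = defaultdict(list)
    for doc_id, text in docs.items():          # Map phase
        for key, val in map_fn(doc_id, text):
            groups[key].append(val)            # shuffle/sort: group by key2
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}  # Reduce phase

print(mapreduce({1: "the cat", 2: "the dog"}))
# {'the': 2, 'cat': 1, 'dog': 1}
```

In a real framework the Map calls, the grouping, and the Reduce calls each run in parallel across many machines; the sequential loop here only shows the logical flow.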
MapReduce Actual Data and Control Flow
Hadoop Library from Apache
Hadoop has two fundamental layers: the MapReduce engine and HDFS (the Hadoop Distributed File System).
Architecture of MapReduce in Hadoop
Running a Job in Hadoop
Three components contribute to running a job in this system: a user node, a JobTracker, and several TaskTrackers.
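The division of labor among these components can be sketched with a toy scheduler. This is an illustrative simplification only: the real JobTracker also accounts for data locality, slot capacity, and failures, none of which are modeled here.

```python
# Toy sketch of the classic Hadoop job flow: the user node submits a job,
# the JobTracker splits it into tasks, and TaskTrackers run the tasks.

def job_tracker_schedule(tasks, task_trackers):
    # Round-robin assignment of tasks to TaskTrackers (a simplification;
    # the real JobTracker prefers TaskTrackers that hold the input split).
    assignment = {tt: [] for tt in task_trackers}
    for i, task in enumerate(tasks):
        assignment[task_trackers[i % len(task_trackers)]].append(task)
    return assignment

map_tasks = [f"map-{i}" for i in range(5)]   # one map task per input split
print(job_tracker_schedule(map_tasks, ["tt1", "tt2"]))
# {'tt1': ['map-0', 'map-2', 'map-4'], 'tt2': ['map-1', 'map-3']}
```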
Dryad and DryadLINQ from Microsoft
LINQ-expression execution in DryadLINQ.
Sawzall and Pig Latin High-Level Languages
Programming the Google App Engine
Google File System (GFS)
The data mutation takes the following steps:
1. The client asks the master which chunk server holds the current lease for the chunk and the locations of the other replicas. If no one has a lease, the master grants one to a replica it chooses (not shown).
2. The master replies with the identity of the primary and the locations of the other (secondary) replicas. The client caches this data for future mutations. It needs to contact the master again only when the primary becomes unreachable or replies that it no longer holds a lease.
3. The client pushes the data to all the replicas. Each chunk server will store the data in an internal LRU buffer cache until the data is used or aged out.
4. Once all the replicas have acknowledged receiving the data, the client sends a write request to the primary. The request identifies the data pushed earlier to all the replicas. The primary assigns consecutive serial numbers to all the mutations it receives.
5. The primary forwards the write request to all secondary replicas. Each secondary replica applies mutations in the same serial number order assigned by the primary.
6. The secondaries all reply to the primary indicating that they have completed the operation.
7. The primary replies to the client. Any errors encountered at any replicas are reported to the client. In case of errors, the write may have succeeded at the primary and an arbitrary subset of the secondary replicas. The client request is considered to have failed, and the modified region is left in an inconsistent state.
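The key ordering property in steps 4 and 5 — the primary assigns serial numbers and every secondary applies mutations in that same order — can be illustrated with a small simulation. The classes and methods below are a toy stand-in for GFS chunk servers, not the real system:

```python
class Replica:
    """Toy stand-in for a GFS chunk server (illustrative only)."""
    def __init__(self):
        self.buffer = {}       # step 3: pushed data, like the LRU buffer cache
        self.log = []          # applied mutations, in serial-number order
        self.next_serial = 0   # used only when this replica acts as primary

    def push(self, data_id, data):
        self.buffer[data_id] = data

    def apply(self, serial, data_id):
        self.log.append((serial, self.buffer.pop(data_id)))

def write(primary, secondaries, data_id, data):
    for replica in [primary] + secondaries:   # step 3: client pushes the data
        replica.push(data_id, data)
    serial = primary.next_serial              # step 4: primary assigns order
    primary.next_serial += 1
    primary.apply(serial, data_id)
    for s in secondaries:                     # steps 5-6: secondaries apply
        s.apply(serial, data_id)              # mutations in the same order

p, s1, s2 = Replica(), Replica(), Replica()
write(p, [s1, s2], "d1", "hello")
write(p, [s1, s2], "d2", "world")
print(p.log == s1.log == s2.log)  # True: all replicas share one mutation order
```

Decoupling the data flow (step 3) from the control flow (steps 4–7) is what lets GFS push bulk data along any network path while still serializing mutations through the primary.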
BigTable, Google’s NOSQL System
BigTable provides a fault-tolerant, persistent database delivered as a storage service. It builds on several pieces of Google infrastructure:
1. GFS: stores persistent state
2. Scheduler: schedules jobs involved in BigTable serving
3. Lock service: master election, location bootstrapping
4. MapReduce: often used to read/write BigTable data
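BigTable's data model is a sparse, persistent, multidimensional sorted map: each cell is addressed by a row key, a column, and a timestamp, and holds an uninterpreted string. A minimal in-memory sketch of that model (class and row names are illustrative):

```python
import time

class TinyTable:
    """Toy version of BigTable's data model: a map from
    (row_key, column, timestamp) to an uninterpreted string value."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, value, ts=None):
        # Each write creates a new timestamped version of the cell.
        ts = ts if ts is not None else time.time_ns()
        self.cells[(row, column, ts)] = value

    def get(self, row, column):
        # Return the most recent version of the cell, if any.
        versions = [(ts, v) for (r, c, ts), v in self.cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

t = TinyTable()
t.put("com.cnn.www", "contents:", "<html>v1", ts=1)
t.put("com.cnn.www", "contents:", "<html>v2", ts=2)
print(t.get("com.cnn.www", "contents:"))  # <html>v2
```

The real system additionally keeps rows sorted, groups columns into column families, and shards the row space into tablets served by many machines; the components listed above supply the persistence, scheduling, and coordination that this toy omits.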
Tablet Location Hierarchy
The first level is a file stored in Chubby that contains the location of the root tablet, which contains the location of all tablets in a special METADATA table.
Each METADATA tablet contains the location of a set of user tablets.
The root tablet is just the first tablet in the METADATA table, which is never split to ensure that the tablet location hierarchy has no more than 3 levels.
The METADATA table stores the location of a tablet under a row key that is an encoding of the tablet’s table identifier and its end row.
BigTable includes many optimizations and fault-tolerance features. Chubby guarantees the availability of the file used to find the root tablet, and the BigTable master can quickly scan the tablet servers to determine the status of all nodes.
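The three-level lookup described above can be mirrored with a small sketch: a Chubby file pointing at the root tablet, a root tablet listing METADATA tablets, and METADATA rows keyed by (table identifier, end row) pointing at user tablet servers. All names, servers, and ranges below are invented for illustration:

```python
chubby_file = "root-tablet@server-a"        # level 1: where the root tablet lives

root_tablet = {                             # level 2: locations of METADATA tablets
    "meta-1": "server-b",
}

metadata_tablets = {                        # level 3: locations of user tablets,
    "meta-1": {                             # keyed by (table id, end row)
        ("users", "m"): "server-c",         # serves rows up to and including "m"
        ("users", "z"): "server-d",         # serves rows after "m" up to "z"
    },
}

def locate(table, row):
    # Walk the hierarchy: each tablet covers rows up to its end row,
    # so the first matching range in sorted order holds the row.
    meta_id = next(iter(root_tablet))
    for (tbl, end_row), server in sorted(metadata_tablets[meta_id].items()):
        if tbl == table and row <= end_row:
            return server
    raise KeyError(row)

print(locate("users", "alice"))  # server-c ("alice" <= "m")
print(locate("users", "oscar"))  # server-d ("m" < "oscar" <= "z")
```

In the real system clients cache tablet locations aggressively, so most reads and writes skip the hierarchy entirely and go straight to the serving tablet server.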
Chubby, Google’s Distributed Lock Service