Hadoop
What is Hadoop?
By now, you should have an idea of why Big Data is a problem and how Hadoop solves it.
The first problem is storing a colossal amount of data:
[Figure: a 512 MB file split into four 128 MB blocks]
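The sizes listed above (one 512 MB file, four 128 MB pieces) match how HDFS splits a file into fixed-size blocks. The following is a minimal sketch of that arithmetic, assuming a 128 MB block size; the function name `split_into_blocks` is hypothetical and is not part of any Hadoop API.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file of the given size would occupy.

    Assumption: the block size is 128 MB unless stated otherwise.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")

    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block holds only the leftover data, not a full block.
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    print(split_into_blocks(512))
    print(split_into_blocks(300))
```

A file that is not an exact multiple of the block size ends with a smaller final block, which is why a 300 MB file would yield two full blocks and one 44 MB block in this sketch.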
The second problem is storing a variety of data.
The third problem is processing the data faster.
Hadoop Architecture
Replica placement via Rack awareness in Hadoop
Advantages of Implementing Rack Awareness
YARN
YARN comprises two major components: ResourceManager and NodeManager.
ResourceManager
NodeManager
Below is a list of a few Java data types along with their equivalent Hadoop variants:
Java Data Types | Hadoop Data Types | Description |
Integer | IntWritable | It is the Hadoop variant of Integer. It is used to pass integer numbers as key or value. |
Float | FloatWritable | Hadoop variant of Float used to pass floating point numbers as key or value. |
Long | LongWritable | Hadoop variant of Long data type to store long values. |
Short | ShortWritable | Hadoop variant of Short data type to store short values. |
Double | DoubleWritable | Hadoop variant of Double to store double values. |
String | Text | Hadoop variant of String to pass string characters as key or value. |
Byte | ByteWritable | Hadoop variant of Byte to store a single byte value. |
null | NullWritable | Hadoop variant of null to pass null as a key or value. NullWritable is usually used as the data type for the reducer's output key when the output key is not important in the final result. |
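The Hadoop types in the table exist so that keys and values can be serialized compactly. The sketch below mimics the byte layout a few of them produce, assuming the standard implementations (IntWritable and LongWritable write big-endian integers, Text writes a length prefix followed by UTF-8 bytes, NullWritable writes nothing). The function names are hypothetical; this is plain Python, not the Hadoop API, and the Text case only covers strings short enough for a single-byte length prefix.

```python
import struct


def int_writable_bytes(value):
    # Assumed layout: 4-byte big-endian signed integer.
    return struct.pack(">i", value)


def long_writable_bytes(value):
    # Assumed layout: 8-byte big-endian signed integer.
    return struct.pack(">q", value)


def text_bytes(value):
    # Assumed layout: variable-length size prefix, then the UTF-8 bytes.
    # This sketch only handles sizes that fit in the one-byte prefix form.
    encoded = value.encode("utf-8")
    if len(encoded) > 127:
        raise ValueError("sketch only covers the single-byte length prefix")
    return bytes([len(encoded)]) + encoded


def null_writable_bytes():
    # NullWritable carries no data, so nothing is written.
    return b""


if __name__ == "__main__":
    print(int_writable_bytes(1))
    print(len(long_writable_bytes(1)))
    print(text_bytes("hi"))
    print(null_writable_bytes())
```

The empty output for the null case illustrates why NullWritable is a convenient placeholder when a key or value is not needed.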
Hadoop
Hadoop Ecosystem
Conclusion
Top Big Data Technologies
Big Data Technologies in Data Storage
Big Data Technologies used in Data Mining.
Big Data Technologies used in Data Analytics.
Data Visualization Big Data technologies
Emerging Big Data Technologies
Parallel copying with "distcp"
$ hadoop distcp
HDFS Balancers
$ hdfs balancer [-threshold <threshold>]
<threshold>: percentage of disk capacity
The threshold parameter is a number between 0 and 100.
Starting from the average cluster utilization, the balancer process tries to bring every DataNode's usage into the range [average – threshold, average + threshold].
For example, with an average utilization of 50% and a threshold of 10%:
– Higher (average + threshold): 60%
– Lower (average – threshold): 40%
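The 40% and 60% bounds follow directly from the [average – threshold, average + threshold] rule. The sketch below computes that range and lists the nodes that fall outside it. It assumes every DataNode has the same capacity, so the cluster average is simply the mean of the per-node percentages; the function names and the node names `dn1`, `dn2`, `dn3` are hypothetical and are not part of the HDFS Balancer itself.

```python
def balancer_range(utilizations, threshold):
    """Return (lower, upper) utilization bounds in percent.

    Assumption: all nodes have equal capacity, so the cluster average
    is the plain mean of the per-node utilization percentages.
    """
    if not utilizations:
        raise ValueError("at least one node utilization is required")
    if not 0 <= threshold <= 100:
        raise ValueError("threshold must be between 0 and 100")

    average = sum(utilizations) / len(utilizations)
    return average - threshold, average + threshold


def nodes_outside_range(utilization_by_node, threshold):
    """Return the nodes whose utilization lies outside the target range."""
    lower, upper = balancer_range(list(utilization_by_node.values()), threshold)
    return {
        node: used
        for node, used in utilization_by_node.items()
        if used < lower or used > upper
    }


if __name__ == "__main__":
    cluster = {"dn1": 80, "dn2": 50, "dn3": 20}  # percent used, made-up values
    print(balancer_range(list(cluster.values()), 10))
    print(nodes_outside_range(cluster, 10))
```

With these made-up values the average is 50%, so a threshold of 10 gives the same 40% to 60% range as in the example, and `dn1` and `dn3` are the nodes a balancing pass would need to bring back into range.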
Cluster balancing algorithm: The HDFS Balancer runs in iterations. Each iteration contains the following four steps:
This class is in the org.apache.hadoop.conf package.
The local implementation is LocalFileSystem and the distributed implementation is DistributedFileSystem.
— Returns the FileSystem for this URI.
It is in the org.apache.hadoop.io package.