HBase
What is HBase?
HBase is a distributed column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable.
HBase's data model is similar to Google's Bigtable and is designed to provide quick random access to huge amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).
It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.
You can store data in HDFS either directly or through HBase. Data consumers use HBase to read and access the data in HDFS randomly. HBase sits on top of HDFS and provides read and write access to it.
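To show what "random real-time read/write access" looks like from a client, here is a minimal sketch using the HBase 1.0+ Java client API (Put, Get, Table, ConnectionFactory). It assumes an `emp` table with a `personal` column family already exists (such a table is created later in these notes); the row key `row1`, the qualifier `name` and the value `raju` are hypothetical. The Put and Get objects are built in separate helpers so they can be inspected without a running cluster; only `main` talks to HBase.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

class RandomAccess {
    // Hypothetical column family and qualifier used for the example.
    static final byte[] FAMILY = Bytes.toBytes("personal");
    static final byte[] NAME = Bytes.toBytes("name");

    // Writes one cell of one row, addressed directly by its row key.
    static Put namePut(String rowKey, String name) {
        return new Put(Bytes.toBytes(rowKey)).addColumn(FAMILY, NAME, Bytes.toBytes(name));
    }

    // Reads back that single cell without scanning the rest of the table.
    static Get nameGet(String rowKey) {
        return new Get(Bytes.toBytes(rowKey)).addColumn(FAMILY, NAME);
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Connection and Table are closeable; try-with-resources releases them.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("emp"))) {
            table.put(namePut("row1", "raju"));
            Result result = table.get(nameGet("row1"));
            System.out.println(Bytes.toString(result.getValue(FAMILY, NAME)));
        }
    }
}
```

Both operations go straight to the row identified by its key, which is the access pattern plain HDFS files do not offer.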
HBase and HDFS
HDFS | HBase |
HDFS is a distributed file system suitable for storing large files. | HBase is a database built on top of the HDFS. |
HDFS does not support fast individual record lookups. | HBase provides fast lookups for larger tables. |
It provides high-latency batch processing; it has no concept of random access to individual records. | It provides low-latency access to single rows from billions of records (random access). |
It provides only sequential access to data. | HBase keeps data sorted by row key in indexed files (HFiles) stored on HDFS, which provides random access and fast lookups. |
Storage Mechanism in HBase
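HBase is a column-oriented database, and the tables in it are sorted by row key.
The table schema defines only column families; each column family stores its data as key-value pairs. A table can have multiple column families, and each column family can have any number of columns.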
Row ID | Column Family | | | Column Family | | | Column Family | | |
| col1 | col2 | col3 | col1 | col2 | col3 | col1 | col2 | col3 |
1 | | | | | | | | | |
2 | | | | | | | | | |
3 | | | | | | | | | |
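The storage mechanism can be made concrete with a small in-memory model. The sketch below is a hypothetical illustration, not HBase code: it nests sorted maps as row key → column family → column qualifier → timestamp → value, which mirrors the logical sort order of an HBase table (rows sorted by key, cell versions identified by timestamp). HBase's real on-disk format is different; the class and method names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of HBase's logical data model:
// row key -> column family -> column qualifier -> timestamp -> value.
// It mirrors the sort order only; it is not HBase's storage format.
class LogicalTable {
    // TreeMap keeps rows sorted by row key, as in an HBase table.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            // Versions of one cell are kept newest first.
            .computeIfAbsent(qualifier, q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
            .put(timestamp, value);
    }

    // Returns the newest version of a cell, or null if the cell does not exist.
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>> families = rows.get(row);
        if (families == null) {
            return null;
        }
        NavigableMap<String, NavigableMap<Long, String>> columns = families.get(family);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, String> versions = columns.get(qualifier);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // Row keys in the order a full-table scan would visit them.
    List<String> rowKeys() {
        return new ArrayList<>(rows.keySet());
    }
}
```

Even if `row2` is written before `row1`, `rowKeys()` returns `row1` first, and writing a cell twice with a newer timestamp makes `getLatest` return the newer value while the older version is still kept.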
Column Oriented and Row Oriented
Column-oriented databases store data tables as sections of columns of data rather than as rows of data. In HBase, columns are grouped into column families.
Row-Oriented Database | Column-Oriented Database |
It is suitable for Online Transaction Processing (OLTP). | It is suitable for Online Analytical Processing (OLAP). |
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for huge tables. |
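The difference between the two layouts is easiest to see by flattening the same small table both ways. The helpers below are hypothetical and only illustrate the general idea: in row-major order each record is contiguous, while in column-major order all values of one column are contiguous, so a query that reads a single column (for example every city) touches one unbroken run. HBase applies this idea per column family rather than per individual column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers that lay out the same table in two storage orders.
class Layouts {
    // Row-oriented: all cells of row 0, then all cells of row 1, and so on.
    static List<String> rowMajor(String[][] table) {
        List<String> out = new ArrayList<>();
        for (String[] row : table) {
            for (String cell : row) {
                out.add(cell);
            }
        }
        return out;
    }

    // Column-oriented: all cells of column 0, then all cells of column 1, and so on.
    static List<String> columnMajor(String[][] table) {
        List<String> out = new ArrayList<>();
        int columns = table.length == 0 ? 0 : table[0].length;
        for (int c = 0; c < columns; c++) {
            for (String[] row : table) {
                out.add(row[c]);
            }
        }
        return out;
    }
}
```

For the rows `{1, raju, hyderabad}` and `{2, ravi, chennai}`, the row-major layout keeps each employee together, while the column-major layout places `hyderabad` and `chennai` next to each other.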
HBase and RDBMS
HBase | RDBMS |
HBase is schema-less: it has no fixed column schema and defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of its tables. |
It is built for wide tables. HBase is horizontally scalable. | It is thin and built for small tables. It is hard to scale out. |
HBase does not provide multi-row ACID transactions; operations are atomic only within a single row. | An RDBMS is transactional. |
It stores denormalized data. | It stores normalized data. |
It is good for semi-structured as well as structured data. | It is good for structured data. |
Features of HBase
Where to Use HBase
Applications of HBase
HBase - Architecture
MasterServer
Assigns regions to the region servers and takes the help of Apache ZooKeeper for this task.
Handles load balancing of the regions across region servers. It unloads the busy servers and shifts the regions to less occupied servers.
Maintains the state of the cluster by negotiating the load balancing.
Is responsible for schema changes and other metadata operations such as creation of tables and column families.
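The load-balancing responsibility ("unloads the busy servers and shifts the regions to less occupied servers") can be illustrated with a deliberately simplified, hypothetical balancer. It only counts regions per server and repeatedly moves one region from the most loaded server to the least loaded one until counts differ by at most one. HBase's actual balancer weighs more factors (such as request load and data locality), so treat this purely as a sketch of the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, count-only balancer: it looks only at how many regions
// each server hosts and ignores request load, locality and table spread.
class NaiveBalancer {
    // assignment maps a region server name to the (mutable) list of regions it hosts.
    // Returns a description of every move performed.
    static List<String> balance(Map<String, List<String>> assignment) {
        List<String> moves = new ArrayList<>();
        while (true) {
            String busiest = null;
            String idlest = null;
            for (Map.Entry<String, List<String>> e : assignment.entrySet()) {
                int size = e.getValue().size();
                if (busiest == null || size > assignment.get(busiest).size()) {
                    busiest = e.getKey();
                }
                if (idlest == null || size < assignment.get(idlest).size()) {
                    idlest = e.getKey();
                }
            }
            // Stop when there are no servers or region counts differ by at most one.
            if (busiest == null
                    || assignment.get(busiest).size() - assignment.get(idlest).size() <= 1) {
                return moves;
            }
            // Shift one region from the busiest server to the least occupied one.
            List<String> from = assignment.get(busiest);
            String region = from.remove(from.size() - 1);
            assignment.get(idlest).add(region);
            moves.add(region + ": " + busiest + " -> " + idlest);
        }
    }
}
```

Starting from four regions on `rs1` and none on `rs2`, the loop performs two moves and ends with two regions on each server. Each move strictly reduces the imbalance, so the loop always terminates.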
Regions
Regions are nothing but tables that are split up and spread across the region servers.
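Because a table is split into regions by contiguous row-key ranges, finding the region for a row amounts to finding the region with the greatest start key that is less than or equal to the row key. The sketch below models that lookup with `TreeMap.floorEntry`; the class name, region start keys and server names are hypothetical. HBase compares row keys as unsigned bytes, whereas this sketch uses Java `String` ordering, which agrees for plain ASCII keys.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a table split into regions by row-key range.
// Each region covers [its start key, the next region's start key);
// the first region starts at the empty key "".
class RegionMap {
    private final TreeMap<String, String> startKeyToServer = new TreeMap<>();

    void addRegion(String startKey, String regionServer) {
        startKeyToServer.put(startKey, regionServer);
    }

    // The region holding a row is the one with the greatest start key <= the row key.
    String serverFor(String rowKey) {
        Map.Entry<String, String> entry = startKeyToServer.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return entry.getValue();
    }
}
```

With regions starting at `""`, `"g"` and `"p"`, the row `kiwi` falls in the region that starts at `"g"`, and `zebra` falls in the last region.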
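Region server:
Region servers host regions and handle the read and write requests for the regions they host.
Looking more closely, a region server contains regions, and each region contains one store per column family. A store holds a MemStore (an in-memory write buffer) and one or more HFiles on HDFS.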
ZooKeeper
Apache ZooKeeper is an open-source project that provides services such as maintaining configuration information, naming, and distributed synchronization.
ZooKeeper has ephemeral nodes representing the different region servers. Master servers use these nodes to discover available servers.
In addition to tracking availability, the nodes are also used to detect server failures and network partitions.
Clients use ZooKeeper to locate the region server that hosts the hbase:meta table, and then communicate with region servers directly.
In pseudo-distributed and standalone modes, HBase itself manages ZooKeeper.
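On the client side, the main thing a Java program needs in order to reach a distributed cluster is the address of the ZooKeeper ensemble. The sketch below sets the standard `hbase.zookeeper.quorum` and `hbase.zookeeper.property.clientPort` properties programmatically (they are normally read from `hbase-site.xml`). The host names are hypothetical placeholders, and `main` assumes a reachable cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

class ClientConfig {
    // Builds a client configuration pointing at a ZooKeeper ensemble.
    static Configuration forQuorum(String quorumHosts, int clientPort) {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", quorumHosts);
        conf.setInt("hbase.zookeeper.property.clientPort", clientPort);
        return conf;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical host names; replace with the real ensemble.
        Configuration conf = forQuorum("zk1.example.com,zk2.example.com,zk3.example.com", 2181);
        // The client consults ZooKeeper to find hbase:meta, then talks to region servers.
        try (Connection connection = ConnectionFactory.createConnection(conf)) {
            System.out.println("Connection open: " + !connection.isClosed());
        }
    }
}
```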
HBase Shell: HBase includes a shell that you can use to communicate with HBase.
General Commands:
Data Definition Language:
These are the commands that operate on the tables in HBase.
create - Creates a table.
list - Lists all the tables in HBase.
disable - Disables a table.
is_disabled - Verifies whether a table is disabled.
enable - Enables a table.
is_enabled - Verifies whether a table is enabled.
Data Manipulation Language:
Table Creation in HBase
Creating a Table using HBase Shell
Creating a Table Using Java API
Creating a Table using HBase Shell
You can create a table using the create command. You must specify the table name and at least one column family name.
create '<table name>', '<column family>'
create 'emp', 'personal data', 'professional data'
Row key | personal data | professional data |
| | |
| | |
Creating a Table Using Java API
You can create a table in HBase using the createTable() method of the HBaseAdmin class. This class belongs to the org.apache.hadoop.hbase.client package.
Follow these steps to create a table in HBase using the Java API:
Step 1: Instantiate HBaseAdmin
Step 2: Create TableDescriptor
Step 3: Execute through Admin
Step 1: Instantiate HBaseAdmin
The HBaseAdmin constructor requires a Configuration object as a parameter, so first create a Configuration instance with HBaseConfiguration.create() and pass it to HBaseAdmin.
Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
Step 2: Create TableDescriptor
HTableDescriptor is a class that belongs to the org.apache.hadoop.hbase package. It acts as a container for a table name and its column families.
// creating the table descriptor (table names may not contain spaces)
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("emp"));
// creating a column family descriptor
HColumnDescriptor family = new HColumnDescriptor("personal");
// adding the column family to the table descriptor
table.addFamily(family);
Step 3: Execute through Admin
Pass the table descriptor to the createTable() method of the HBaseAdmin class to create the table.
admin.createTable(table);
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateTable {
    public static void main(String[] args) throws IOException {
        // Instantiate the configuration (reads hbase-site.xml from the classpath)
        Configuration con = HBaseConfiguration.create();

        // Instantiate HBaseAdmin; try-with-resources closes it when done
        try (HBaseAdmin admin = new HBaseAdmin(con)) {
            // Instantiate the table descriptor for table "emp"
            HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf("emp"));

            // Add column families to the table descriptor
            tableDescriptor.addFamily(new HColumnDescriptor("personal"));
            tableDescriptor.addFamily(new HColumnDescriptor("professional"));

            // Create the table through the admin
            admin.createTable(tableDescriptor);
            System.out.println("Table created");
        }
    }
}
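The program above uses HBaseAdmin, HTableDescriptor and HColumnDescriptor, which are deprecated in HBase 2.x in favour of obtaining an Admin from a Connection and describing tables with builder classes. The following is a sketch of the same "emp" table creation written against the HBase 2.x client API (ConnectionFactory, Admin, TableDescriptorBuilder, ColumnFamilyDescriptorBuilder); it assumes an HBase 2.x client on the classpath and a reachable cluster for `main`. Building the descriptor is split into its own method so it can be checked without a cluster, and `tableExists` avoids an error when the table is already there.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

class CreateTableV2 {
    // Describes table "emp" with two column families; needs no running cluster.
    static TableDescriptor empDescriptor() {
        return TableDescriptorBuilder.newBuilder(TableName.valueOf("emp"))
                .setColumnFamily(ColumnFamilyDescriptorBuilder.of("personal"))
                .setColumnFamily(ColumnFamilyDescriptorBuilder.of("professional"))
                .build();
    }

    public static void main(String[] args) throws IOException {
        Configuration conf = HBaseConfiguration.create();
        // Connection and Admin are closeable; try-with-resources releases them.
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            TableDescriptor descriptor = empDescriptor();
            if (admin.tableExists(descriptor.getTableName())) {
                System.out.println("Table already exists");
            } else {
                admin.createTable(descriptor);
                System.out.println("Table created");
            }
        }
    }
}
```

A single Connection is typically shared across an application, while Admin instances are lightweight and obtained from it as needed.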