1 of 38

What's new in the recent and upcoming HBase releases

Duo Zhang, Chair of Apache HBase PMC

CommunityOverCode

2 of 38

Pervasive...

...distributed, scalable, big data store

3 of 38

In a nutshell

...19,764 commits made by 638 contributors

...representing 973,452 lines of code

...mostly written in Java

...has a well established, mature codebase

...maintained by a very large development team

...with stable Y-O-Y commits

...took an estimated 273 years of effort (COCOMO model)

...starting with its first commit in April, 2007 (>15 years old!)

Source https://www.openhub.net/p/hbase

4 of 38

Our Project

  • Apache HBase is an Open Source Apache project.
  • It's what we want to make of it.
  • No owners!
  • Anyone can help!
  • All welcome!
  • The more, the merrier!

5 of 38

Recent and Upcoming Releases

Release Line | Latest Release | State
2.5.x        | 2.5.5          | Current stable release line
2.6.x        | N/A            | Next minor release line
3.0.x        | 3.0.0-alpha-4  | Next major release line

6 of 38

CONTENTS

  1. Tracing Improvements
  2. TLS Support
  3. Cloud Native Improvements
  4. Other Notable Improvements
  5. Future

7 of 38

Part 01

Tracing improvements

CommunityOverCode

8 of 38

History

  • HTrace
    • Basic support
    • Could trace down into the Hadoop components
  • HTrace is dead
    • Cloudera dropped support
    • The tracing-related code was broken and finally removed

9 of 38

Alternatives

  • OpenTracing?
  • OpenCensus?
  • OpenTelemetry✓
    • OpenTracing + OpenCensus
    • A standard rather than an implementation
    • Can support different backends

10 of 38

  • Different ways to implement tracing
  • Instrumentation?
    • Pros: no HBase-side code changes; theoretically can even support old versions of HBase
    • Cons: the HBase code changes all the time, so it is not easy to implement
  • Annotation?
    • Pros: less code on the HBase side
    • Cons: not flexible, cannot carry detailed information, and can only support newer HBase versions
  • API✓
    • Pros: the most flexible; can carry any detailed information you want
    • Cons: requires writing a lot of code, and can only support newer HBase versions
    • HBASE-22120; a minimal usage sketch follows below
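For reference, the chosen API-based approach boils down to calls against the standard OpenTelemetry Java API. The snippet below is a minimal sketch of that pattern; the span name "example.get" is illustrative only, not one of the actual HBase span names.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TraceSketch {
  // The tracer comes from whatever OpenTelemetry SDK/backend is wired in at runtime.
  private static final Tracer TRACER =
      GlobalOpenTelemetry.getTracer("org.apache.hbase.example");

  static void tracedOperation(Runnable operation) {
    // Wrap the operation in a span and make it current so downstream calls
    // (e.g. the RPC layer) can attach child spans to it.
    Span span = TRACER.spanBuilder("example.get").startSpan();
    try (Scope ignored = span.makeCurrent()) {
      operation.run();
    } catch (RuntimeException e) {
      span.recordException(e);
      span.setStatus(StatusCode.ERROR);
      throw e;
    } finally {
      span.end();
    }
  }
}
```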

11 of 38

OTel Semantic Conventions

  • Semantic Conventions
    • A standard for naming spans and for common attribute names
  • HBASE-26419
    • Correct the span names
    • Fill in the necessary attributes (see the example below)
  • Has already been released in 2.5.0
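As a rough illustration of what the conventions cover: db.system, db.name and db.operation below are attribute keys defined by the OTel semantic conventions, while the span name and the literal values are examples only, not necessarily the exact choices made in HBASE-26419.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

final class SemanticConventionSketch {
  // Attribute keys named after the OTel semantic conventions.
  static final AttributeKey<String> DB_SYSTEM = AttributeKey.stringKey("db.system");
  static final AttributeKey<String> DB_NAME = AttributeKey.stringKey("db.name");
  static final AttributeKey<String> DB_OPERATION = AttributeKey.stringKey("db.operation");

  static Span startConventionalSpan(Tracer tracer) {
    // Span name in the "<operation> <target>" style suggested by the conventions.
    return tracer.spanBuilder("GET default:test_table")
        .setAttribute(DB_SYSTEM, "hbase")
        .setAttribute(DB_NAME, "default")
        .setAttribute(DB_OPERATION, "GET")
        .startSpan();
  }
}
```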

12 of 38

Part 02

TLS Support

CommunityOverCode

13 of 38

TLS vs SASL

  • Security
    • SASL: the handshake is in plaintext, where we may pass some tokens
    • TLS: a modern certificate-based protocol, more secure
  • Performance
    • SASL: pure-Java byte array encryption/decryption
    • TLS: can leverage Netty/OpenSSL to perform off-heap encryption/decryption

14 of 38

Implementation

  • Use Netty's SslHandler
    • Add an SslHandler to the NettyRpcServer/NettyRpcClient pipeline when setting it up
    • BlockingRpcClient/SimpleRpcServer are not supported
  • Use the Hadoop Credentials API
    • For retrieving passwords (see the sketch below)
  • Ported a lot of code from the ZooKeeper TLS implementation
    • Like HBASE-27347; thanks to the ZooKeeper community, especially Andor Molnar
  • Will be released in 2.6.0
    • HBASE-26666
  • Performance impact
    • HBASE-27947: OOM under high load and with badly behaving clients
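A hedged sketch of what turning TLS on could look like. The two hbase.*.netty.tls.enabled property names are quoted from memory of HBASE-26666 and should be verified against the 2.6.0 documentation; conf.getPassword is the standard Hadoop Credentials API entry point used for retrieving store passwords.

```java
import org.apache.hadoop.conf.Configuration;

public class TlsConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Assumed property names (check the HBASE-26666 docs for the exact keys).
    conf.set("hbase.server.netty.tls.enabled", "true");
    conf.set("hbase.client.netty.tls.enabled", "true");

    // The keystore/truststore passwords are resolved through the Hadoop
    // Credentials API, so they can live in a JCEKS credential store created with:
    //   hadoop credential create my.keystore.password \
    //       -provider jceks://file/etc/hbase/conf/hbase.jceks
    char[] password = conf.getPassword("my.keystore.password");
    System.out.println("password resolved: " + (password != null));
  }
}
```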

15 of 38

Part 03

Cloud Native Improvements

CommunityOverCode

16 of 38

StoreFileTracker

  • HBASE-26067
    • An abstraction layer for how we track store files
  • DefaultSFT
    • Tracks store files by listing
    • Listing is slow on object storage services (OSS)
  • FileBasedSFT
    • Inspired by Iceberg: use a file to store the list of valid HFiles
    • Implementation trick: keep two files for the list, swap between them when updating, and on read load both and take the newer one
    • Avoids listing (almost)
  • Will be released in 2.6.0, but there is still room to improve; a config sketch follows below
    • HBASE-27841: layout improvements, so no directory structure is needed on object storage
    • HBASE-27826: commit all the store file changes at once when splitting/merging
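A minimal sketch of opting a new table into the file-based tracker, assuming the hbase.store.file-tracker.impl property and the FILE value introduced by HBASE-26067 (double-check the exact names in the documentation of the release you run).

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class StoreFileTrackerSketch {
  static TableDescriptor fileBasedSftTable() {
    // Setting the tracker per table; it can also be set cluster-wide in hbase-site.xml.
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test_table"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        .setValue("hbase.store.file-tracker.impl", "FILE")
        .build();
  }
}
```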

17 of 38

No Persistent data on ZooKeeper

  • Move the meta location off ZooKeeper (HBASE-26193)
    • design.invariants.zk.data
  • Store it in the master local region (introduced in HBASE-23326)
  • Still need to publish the location to ZooKeeper
    • Clients still need it if they use ZKConnectionRegistry
  • Meta replicas?
    • Different qualifiers
  • Auto migration
    • All the data is in a single row, so it is easy to make the migration transactional
  • Has already been released in 2.5.0

18 of 38

No Persistent data on ZooKeeper

  • Some state flags
    • Split enabled
    • Merge enabled
    • Balancer enabled
    • etc.
  • Move them to the master local region (too)
    • A new ‘state’ family
    • One qualifier for one state

19 of 38

No Persistent data on ZooKeeper

  • Table-based replication queue storage (HBASE-27109)
    • Long history: HBASE-15867, but only half done
    • Cyclic dependency
      • Need to initialize the replication tracking system before setting up the WAL
      • So what if replication depends on an HBase table, and thus on the WAL?
  • Delay the recording of replication progress
    • Can get all the WAL files (in order) by listing
    • Only need an offset, i.e., a <file, offset_in_file> pair
    • Only need to record it after replicating something out!
  • Challenges
    • Claim queue: need to create replication queues before claiming
    • Replication log cleaner: ditto
    • Fencing: disable peer modifications while claiming a queue or cleaning replication logs
    • No free lunch: easier on the normal path, much more difficult in recovery!
  • Will be released in 3.0.0

20 of 38

No Persistent data on ZooKeeper

  • File system based replication peer storage (HBASE-27110)
    • Cyclic dependency again if you want a separate table
    • The data is small, so it is better to find another place to store it
  • Master local region vs FileSystem
    • Region servers need to load the data too
    • Would need to expose new RPC methods if we chose the master local region
    • FileSystem-based storage is simple and sufficient
      • One file per peer (actually two rotating files; see the sketch below)
  • Will be released in 3.0.0
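The "two rotating files" trick used here (and by FileBasedSFT above) can be sketched roughly as follows. This is a toy illustration of the pattern, with a sequence number to pick the newer copy; the real implementation goes through the Hadoop FileSystem API and carries more safeguards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Toy version of the two-rotating-files pattern, not the actual HBase code. */
public class RotatingFileStore {
  private final Path[] files;

  public RotatingFileStore(Path dir) {
    this.files = new Path[] { dir.resolve("data.0"), dir.resolve("data.1") };
  }

  /** Returns {sequence, payload} for one file, or null if missing or unreadable. */
  private String[] load(Path p) {
    try {
      List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
      if (lines.size() < 2) {
        return null;
      }
      Long.parseLong(lines.get(0)); // validate the sequence number
      return new String[] { lines.get(0), lines.get(1) };
    } catch (IOException | NumberFormatException e) {
      return null;
    }
  }

  /** Read both files and return the payload carrying the newer sequence number. */
  public String read() {
    String[] a = load(files[0]);
    String[] b = load(files[1]);
    if (a == null) {
      return b == null ? null : b[1];
    }
    if (b == null) {
      return a[1];
    }
    return Long.parseLong(a[0]) > Long.parseLong(b[0]) ? a[1] : b[1];
  }

  /** Write into the file that does NOT hold the newest copy, with a higher sequence. */
  public void write(String payload) throws IOException {
    String[] a = load(files[0]);
    String[] b = load(files[1]);
    long seqA = a == null ? -1 : Long.parseLong(a[0]);
    long seqB = b == null ? -1 : Long.parseLong(b[0]);
    Path target = seqA > seqB ? files[1] : files[0];
    String content = (Math.max(seqA, seqB) + 1) + "\n" + payload + "\n";
    Files.write(target, content.getBytes(StandardCharsets.UTF_8));
  }
}
```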

21 of 38

Redeploy Cluster on Cloud

  • ‘Redeploy’
    • Flush everything to object storage (OSS), destroy the cluster, and recreate a new cluster pointing to the root directory on OSS
  • No persistent data on ZooKeeper first
    • Mostly done, except for replication (needs the 3.0.0 release)
  • Recovery problem
    • Need to scan the WAL directory to find the region server list
    • Why? To keep the code cleaner
      • We need SCPs to bring regions online, including the meta regions
      • How do we schedule SCPs? By comparing the live and full region server lists
      • How do we find the full region server list? By scanning meta?
  • Store the region server list in the master local region (HBASE-26245)
    • No need to scan the WAL directory any more
    • But make sure you flush the master local region too before destroying the cluster…

22 of 38

Kubernetes Deployment Support

  • Kubernetes is the new standard
  • Engineers at Apple have started to contribute their deployment tooling as open source
    • For deploying ZooKeeper, HDFS and HBase, all on Kubernetes
    • No new operators
  • Still in progress
    • A new repo: hbase-kustomize
    • HBASE-27827

23 of 38

Part 04

Other Notable Improvements

CommunityOverCode

24 of 38

Log4j1 -> Log4j2

  • Log4j1 is dead and has critical CVEs
    • Reload4j only fixes the CVEs; the project itself is still ‘dead’
  • Challenges
    • We depended on log4j1 directly in code
      • Introduce an hbase-logging module that depends on log4j, and ban log4j imports in all other modules (see the sketch below)
    • Lots of log4j properties files across the whole code base
      • Unify the properties files for tests into a single file in the hbase-logging module, and keep only one properties file for production usage under the conf directory
    • log4j2 uses a different way to set the root logger and level
      • LOG4J2-3341: they finally added back support for '-Dlog4j.rootLogger=INFO,Console'
    • Hadoop is still on log4j1
      • Still need log4j1 when running UTs, but we do not ship log4j1 in the tarball
      • HADOOP-16206, still in progress…
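To illustrate the import ban: code outside hbase-logging only uses the slf4j facade, and the concrete backend is decided by what ends up on the classpath. This is a generic sketch of the pattern rather than actual HBase code.

```java
// Allowed everywhere: the slf4j facade only.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Banned outside hbase-logging (enforced by an import-ban rule):
//   import org.apache.log4j.Logger;          // log4j1
//   import org.apache.logging.log4j.Logger;  // log4j2

public class LoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingSketch.class);

  public static void main(String[] args) {
    // log4j2 is the backend shipped in the tarball; log4j1 only shows up on the
    // test classpath because Hadoop still needs it.
    LOG.info("logging through the slf4j facade");
  }
}
```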

25 of 38

New Region Replication Framework

  • Region Replication?
    • Introduced in HBASE-10070: read replicas to increase read availability, i.e., timeline-consistent reads
    • The replicas share the same HFiles, but how do we reduce the latency? Send modifications to the secondary replicas directly. Replication!
    • HBASE-11183: add replication support for read replicas, which is called ‘region replication’ (a client-side usage sketch follows below)
    • HBASE-18070: add replication support for the meta region
  • Problems?
    • Cannot work when SKIP_WAL is used
    • Increases disk load, as we need to tail the WAL files (reading them again and again)
    • Needs special handling for the meta region, as we do not replicate the meta region in normal replication
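As a reminder of how region replication is consumed, here is a standard client-side sketch: a table descriptor with two replicas per region, and a read that accepts timeline consistency (the table and family names are examples).

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class RegionReplicationSketch {
  static TableDescriptor withTwoReplicas() {
    // One primary plus one secondary replica per region.
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test_table"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        .setRegionReplication(2)
        .build();
  }

  static Result timelineGet(Table table, byte[] row) throws IOException {
    // TIMELINE lets a (possibly slightly stale) secondary replica serve the read.
    Get get = new Get(row).setConsistency(Consistency.TIMELINE);
    return table.get(get);
  }
}
```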

26 of 38

New Region Replication Framework

  • Basic idea
    • Any inconsistency can be fixed by a flush, so we do not need the heavyweight accounting of normal replication
    • Just send the edits through RPC; if it fails or lags too much, drop all the pending edits and trigger a flush
    • Does not depend on the WAL, so it also works with SKIP_WAL, and there is no special handling for the meta region
  • Implementation
    • Extend the MVCC write entry to carry an action; once the entry completes, we run the action, which sends the edits in the entry to all the secondary replicas
    • Introduce a new method on the region server side to receive the edits

27 of 38

New Region Replication Framework

  • Rolling upgrade
    • Remove the legacy ‘region_replica_replication’ peer while upgrading the master
    • If the new replication method is not available, fall back to the old ‘replay’ method
    • Everything will be OK after a flush, so it is not a critical problem even if we miss some edits during the rolling upgrade
  • Will be released in 3.0.0
    • 3.0.0-alpha-3
    • HBASE-26233

28 of 38

HBase on Ozone

  • HFiles can already be stored on Ozone
    • Works as long as Ozone provides a Hadoop FileSystem interface
  • Write-Ahead Log (WAL) on Ozone (HBASE-27740)
    • Make Ozone support hflush/hsync: HDDS-7593 (see the sketch below)
    • Also eliminate other unnecessary FileSystem operations
      • truncate: HBASE-27982
      • setStoragePolicy: HBASE-27746
      • etc.
  • Still in progress
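What hflush/hsync support means in practice, sketched with the stock Hadoop FileSystem API; this is generic Hadoop code (run against HDFS today, or Ozone once HDDS-7593 lands), not the HBase WAL writer itself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class WalDurabilitySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // fs.defaultFS decides the target FS
    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-probe"))) {
      // A WAL can only be durable if the output stream advertises these capabilities.
      boolean supported = out.hasCapability(StreamCapabilities.HFLUSH)
          && out.hasCapability(StreamCapabilities.HSYNC);
      out.writeBytes("edit-1\n");
      out.hflush(); // make the edit visible to readers / other replicas
      out.hsync();  // force the edit to durable storage
      System.out.println("hflush/hsync supported: " + supported);
    }
  }
}
```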

29 of 38

Part 05

Future

CommunityOverCode

30 of 38

‘Pure’ Cloud Native

  • Storage and computing separation architecture
    • HBase is born to be cloud native :)
  • Pure?
    • No self-deployed services other than HBase itself
    • Currently we still need ZooKeeper and HDFS

31 of 38

‘Pure’ Cloud Native

  • The most challenging part: the Write-Ahead Log (WAL)
    • Currently
      • Hadoop FileSystem based, must have hflush/hsync support
      • Only HDFS, maybe Ozone in the future
    • New reader/writer implementation
      • Path -> URI?
      • Fencing?
      • A self-hosted log service vs. an external-storage-based log service
    • Need to consider replication too

32 of 38

‘Pure’ Cloud Native

  • What about ZooKeeper?
    • No persistent data there now, so it hurts less
    • Eliminate all the watcher usage and make it pure transient storage; then we can replace it with any cloud storage system, like Redis/RDS
  • Service Discovery?
    • How to find the meta location
    • And how to find the master if you want to perform admin operations
    • Pluggable connection registry (see the sketch below)
      • Kubernetes service?
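A hedged sketch of pointing a client at a non-ZooKeeper registry. The RpcConnectionRegistry class and the hbase.client.registry.impl / hbase.client.bootstrap.servers property names are quoted from memory of recent releases and should be verified; the endpoint shown is a hypothetical Kubernetes Service DNS name.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegistrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed property names; check the docs of the version you run.
    conf.set("hbase.client.registry.impl",
        "org.apache.hadoop.hbase.client.RpcConnectionRegistry");
    // Bootstrap endpoints instead of a ZooKeeper quorum; on Kubernetes this
    // could simply be a Service DNS name (hypothetical value below).
    conf.set("hbase.client.bootstrap.servers", "hbase-bootstrap.hbase.svc:16020");
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      System.out.println("connected without listing a ZooKeeper quorum");
    }
  }
}
```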

33 of 38

Thanks

Duo Zhang, Chair of Apache HBase PMC

zhangduo@apache.org

Thanks to other people who contributed to this presentation

TBD

CommunityOverCode
