1 of 38

What's new in the recent and upcoming HBase releases

Duo Zhang, Chair of Apache HBase PMC

CommunityOverCode

2 of 38

Pervasive...

...distributed, scalable, big data store

3 of 38

In a nutshell

...19,764 commits made by 638 contributors

...representing 973,452 lines of code

...mostly written in Java

...has a well established, mature codebase

...maintained by a very large development team

...with stable Y-O-Y commits

...took an estimated 273 years of effort (COCOMO model)

...starting with its first commit in April, 2007 (>15 years old!)

Source https://www.openhub.net/p/hbase

4 of 38

Our Project

  • Apache HBase is an Open Source Apache project.
  • It's what we want to make of it.
  • No owners!
  • Anyone can help!
  • All welcome!
  • The more, the merrier!

5 of 38

Recent and Upcoming Releases

Release Line | Latest Release | State
2.5.x        | 2.5.5          | Current stable release line
2.6.x        | N/A            | Next minor release line
3.0.x        | 3.0.0-alpha-4  | Next major release line

6 of 38

CONTENTS

  1. Tracing Improvements
  2. TLS Support
  3. Cloud Native Improvements
  4. Other Notable Improvements
  5. Future

7 of 38

Part 01

Tracing improvements

CommunityOverCode

8 of 38

History

  • HTrace
    • Basic support
    • Could trace down into the Hadoop components
  • HTrace is dead
    • Cloudera dropped support
    • The tracing-related code was broken and finally removed

9 of 38

Alternatives

  • OpenTracing?
  • OpenCensus?
  • OpenTelemetry✓
    • OpenTracing + OpenCensus
    • A standard rather than an implementation
    • Can support different backends

10 of 38

  • Different ways to implement tracing
  • Instrumentation?
    • Pros: no HBase-side code changes; theoretically can even support old versions of HBase
    • Cons: the HBase code changes all the time, so it is not easy to implement
  • Annotation?
    • Pros: less code on the HBase side
    • Cons: not flexible, cannot carry detailed information, and can only support newer HBase versions
  • API✓
    • Pros: the most flexible; can carry any detailed information you want
    • Cons: requires writing a lot of code, and can only support newer HBase versions
    • HBASE-22120; a minimal usage sketch follows below
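For reference, the chosen API-based approach boils down to calls against the standard OpenTelemetry Java API. The snippet below is a minimal sketch of that pattern; the span name "example.get" is illustrative only, not one of the actual HBase span names.

```java
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

public class TraceSketch {
  // The tracer comes from whatever OpenTelemetry SDK/backend is wired in at runtime.
  private static final Tracer TRACER =
      GlobalOpenTelemetry.getTracer("org.apache.hbase.example");

  static void tracedOperation(Runnable operation) {
    // Wrap the operation in a span and make it current so downstream calls
    // (e.g. the RPC layer) can attach child spans to it.
    Span span = TRACER.spanBuilder("example.get").startSpan();
    try (Scope ignored = span.makeCurrent()) {
      operation.run();
    } catch (RuntimeException e) {
      span.recordException(e);
      span.setStatus(StatusCode.ERROR);
      throw e;
    } finally {
      span.end();
    }
  }
}
```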

11 of 38

OTel Semantic Conventions

  • Semantic Conventions
    • A standard for naming spans and for common attribute names
  • HBASE-26419
    • Correct the span names
    • Fill in the necessary attributes (see the example below)
  • Has already been released in 2.5.0
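As a rough illustration of what the conventions cover: db.system, db.name and db.operation below are attribute keys defined by the OTel semantic conventions, while the span name and the literal values are examples only, not necessarily the exact choices made in HBASE-26419.

```java
import io.opentelemetry.api.common.AttributeKey;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;

final class SemanticConventionSketch {
  // Attribute keys named after the OTel semantic conventions.
  static final AttributeKey<String> DB_SYSTEM = AttributeKey.stringKey("db.system");
  static final AttributeKey<String> DB_NAME = AttributeKey.stringKey("db.name");
  static final AttributeKey<String> DB_OPERATION = AttributeKey.stringKey("db.operation");

  static Span startConventionalSpan(Tracer tracer) {
    // Span name in the "<operation> <target>" style suggested by the conventions.
    return tracer.spanBuilder("GET default:test_table")
        .setAttribute(DB_SYSTEM, "hbase")
        .setAttribute(DB_NAME, "default")
        .setAttribute(DB_OPERATION, "GET")
        .startSpan();
  }
}
```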

12 of 38

Part 02

TLS Support

CommunityOverCode

13 of 38

TLS vs SASL

  • Security
    • SASL: the handshake is in plaintext, where we may pass some tokens
    • TLS: a modern certificate-based protocol, more secure
  • Performance
    • SASL: pure-Java byte array encryption/decryption
    • TLS: can leverage Netty/OpenSSL to perform off-heap encryption/decryption

14 of 38

Implementation

  • Use Netty's SslHandler
    • Add an SslHandler to the NettyRpcServer/NettyRpcClient pipeline when setting it up
    • BlockingRpcClient/SimpleRpcServer are not supported
  • Use the Hadoop Credentials API
    • For retrieving passwords (see the sketch below)
  • Ported a lot of code from the ZooKeeper TLS implementation
    • Like HBASE-27347; thanks to the ZooKeeper community, especially Andor Molnar
  • Will be released in 2.6.0
    • HBASE-26666
  • Performance impact
    • HBASE-27947: OOM under high load and with badly behaving clients
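A hedged sketch of what turning TLS on could look like. The two hbase.*.netty.tls.enabled property names are quoted from memory of HBASE-26666 and should be verified against the 2.6.0 documentation; conf.getPassword is the standard Hadoop Credentials API entry point used for retrieving store passwords.

```java
import org.apache.hadoop.conf.Configuration;

public class TlsConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Assumed property names (check the HBASE-26666 docs for the exact keys).
    conf.set("hbase.server.netty.tls.enabled", "true");
    conf.set("hbase.client.netty.tls.enabled", "true");

    // The keystore/truststore passwords are resolved through the Hadoop
    // Credentials API, so they can live in a JCEKS credential store created with:
    //   hadoop credential create my.keystore.password \
    //       -provider jceks://file/etc/hbase/conf/hbase.jceks
    char[] password = conf.getPassword("my.keystore.password");
    System.out.println("password resolved: " + (password != null));
  }
}
```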

15 of 38

Part 03

Cloud Native Improvements

CommunityOverCode

16 of 38

StoreFileTracker

  • HBASE-26067
    • An abstraction layer for how we track store files
  • DefaultSFT
    • Tracks store files by listing
    • Listing is slow on object storage services (OSS)
  • FileBasedSFT
    • Inspired by Iceberg: use a file to store the list of valid HFiles
    • Implementation trick: keep two files for the list, swap between them when updating, and on read load both and take the newer one
    • Avoids listing (almost)
  • Will be released in 2.6.0, but there is still room to improve; a config sketch follows below
    • HBASE-27841: layout improvements, so no directory structure is needed on object storage
    • HBASE-27826: commit all the store file changes at once when splitting/merging
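A minimal sketch of opting a new table into the file-based tracker, assuming the hbase.store.file-tracker.impl property and the FILE value introduced by HBASE-26067 (double-check the exact names in the documentation of the release you run).

```java
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class StoreFileTrackerSketch {
  static TableDescriptor fileBasedSftTable() {
    // Setting the tracker per table; it can also be set cluster-wide in hbase-site.xml.
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test_table"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        .setValue("hbase.store.file-tracker.impl", "FILE")
        .build();
  }
}
```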

17 of 38

No Persistent data on ZooKeeper

  • Move the meta location off ZooKeeper (HBASE-26193)
    • design.invariants.zk.data
  • Store it in the master local region (introduced in HBASE-23326)
  • Still need to publish the location to ZooKeeper
    • Clients still need it if they use ZKConnectionRegistry
  • Meta replicas?
    • Different qualifiers
  • Auto migration
    • All the data is in a single row, so it is easy to make the migration transactional
  • Has already been released in 2.5.0

18 of 38

No Persistent data on ZooKeeper

  • Some state flags
    • Split enabled
    • Merge enabled
    • Balancer enabled
    • etc.
  • Move them to the master local region (too)
    • A new ‘state’ family
    • One qualifier for one state

19 of 38

No Persistent data on ZooKeeper

  • Table-based replication queue storage (HBASE-27109)
    • Long history: HBASE-15867, but only half done
    • Cyclic dependency
      • Need to initialize the replication tracking system before setting up the WAL
      • So what if replication depends on an HBase table, and thus on the WAL?
  • Delay the recording of replication progress
    • Can get all the WAL files (in order) by listing
    • Only need an offset, i.e., a <file, offset_in_file> pair
    • Only need to record it after replicating something out!
  • Challenges
    • Claim queue: need to create replication queues before claiming
    • Replication log cleaner: ditto
    • Fencing: disable peer modifications while claiming a queue or cleaning replication logs
    • No free lunch: easier on the normal path, much more difficult in recovery!
  • Will be released in 3.0.0

20 of 38

No Persistent data on ZooKeeper

  • File system based replication peer storage (HBASE-27110)
    • Cyclic dependency again if you want a separate table
    • The data is small, so it is better to find another place to store it
  • Master local region vs FileSystem
    • Region servers need to load the data too
    • Would need to expose new RPC methods if we chose the master local region
    • FileSystem-based storage is simple and sufficient
      • One file per peer (actually two rotating files; see the sketch below)
  • Will be released in 3.0.0
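The "two rotating files" trick used here (and by FileBasedSFT above) can be sketched roughly as follows. This is a toy illustration of the pattern, with a sequence number to pick the newer copy; the real implementation goes through the Hadoop FileSystem API and carries more safeguards.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Toy version of the two-rotating-files pattern, not the actual HBase code. */
public class RotatingFileStore {
  private final Path[] files;

  public RotatingFileStore(Path dir) {
    this.files = new Path[] { dir.resolve("data.0"), dir.resolve("data.1") };
  }

  /** Returns {sequence, payload} for one file, or null if missing or unreadable. */
  private String[] load(Path p) {
    try {
      List<String> lines = Files.readAllLines(p, StandardCharsets.UTF_8);
      if (lines.size() < 2) {
        return null;
      }
      Long.parseLong(lines.get(0)); // validate the sequence number
      return new String[] { lines.get(0), lines.get(1) };
    } catch (IOException | NumberFormatException e) {
      return null;
    }
  }

  /** Read both files and return the payload carrying the newer sequence number. */
  public String read() {
    String[] a = load(files[0]);
    String[] b = load(files[1]);
    if (a == null) {
      return b == null ? null : b[1];
    }
    if (b == null) {
      return a[1];
    }
    return Long.parseLong(a[0]) > Long.parseLong(b[0]) ? a[1] : b[1];
  }

  /** Write into the file that does NOT hold the newest copy, with a higher sequence. */
  public void write(String payload) throws IOException {
    String[] a = load(files[0]);
    String[] b = load(files[1]);
    long seqA = a == null ? -1 : Long.parseLong(a[0]);
    long seqB = b == null ? -1 : Long.parseLong(b[0]);
    Path target = seqA > seqB ? files[1] : files[0];
    String content = (Math.max(seqA, seqB) + 1) + "\n" + payload + "\n";
    Files.write(target, content.getBytes(StandardCharsets.UTF_8));
  }
}
```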

21 of 38

Redeploy Cluster on Cloud

  • ‘Redeploy’
    • Flush everything to object storage (OSS), destroy the cluster, and recreate a new cluster pointing to the root directory on OSS
  • No persistent data on ZooKeeper first
    • Mostly done, except for replication (needs the 3.0.0 release)
  • Recovery problem
    • Need to scan the WAL directory to find the region server list
    • Why? To keep the code cleaner
      • We need SCPs to bring regions online, including the meta regions
      • How do we schedule SCPs? By comparing the live and full region server lists
      • How do we find the full region server list? By scanning meta?
  • Store the region server list in the master local region (HBASE-26245)
    • No need to scan the WAL directory any more
    • But make sure you flush the master local region too before destroying the cluster…

22 of 38

Kubernetes Deployment Support

  • Kubernetes is the new standard
  • Engineers at Apple have started to contribute their deployment tooling as open source
    • For deploying ZooKeeper, HDFS and HBase, all on Kubernetes
    • No new operators
  • Still in progress
    • A new repo: hbase-kustomize
    • HBASE-27827

23 of 38

Part 04

Other Notable Improvements

CommunityOverCode

24 of 38

Log4j1 -> Log4j2

  • Log4j1 is dead and has critical CVEs
    • Reload4j only fixes the CVEs; the project itself is still ‘dead’
  • Challenges
    • We depended on log4j1 directly in code
      • Introduce an hbase-logging module that depends on log4j, and ban log4j imports in all other modules (see the sketch below)
    • Lots of log4j properties files across the whole code base
      • Unify the properties files for tests into a single file in the hbase-logging module, and keep only one properties file for production usage under the conf directory
    • log4j2 uses a different way to set the root logger and level
      • LOG4J2-3341: they finally added back support for '-Dlog4j.rootLogger=INFO,Console'
    • Hadoop is still on log4j1
      • Still need log4j1 when running UTs, but we do not ship log4j1 in the tarball
      • HADOOP-16206, still in progress…
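To illustrate the import ban: code outside hbase-logging only uses the slf4j facade, and the concrete backend is decided by what ends up on the classpath. This is a generic sketch of the pattern rather than actual HBase code.

```java
// Allowed everywhere: the slf4j facade only.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Banned outside hbase-logging (enforced by an import-ban rule):
//   import org.apache.log4j.Logger;          // log4j1
//   import org.apache.logging.log4j.Logger;  // log4j2

public class LoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingSketch.class);

  public static void main(String[] args) {
    // log4j2 is the backend shipped in the tarball; log4j1 only shows up on the
    // test classpath because Hadoop still needs it.
    LOG.info("logging through the slf4j facade");
  }
}
```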

25 of 38

New Region Replication Framework

  • Region Replication?
    • Introduced in HBASE-10070: read replicas to increase read availability, i.e., timeline-consistent reads
    • The replicas share the same HFiles, but how do we reduce the latency? Send modifications to the secondary replicas directly. Replication!
    • HBASE-11183: add replication support for read replicas, which is called ‘region replication’ (a client-side usage sketch follows below)
    • HBASE-18070: add replication support for the meta region
  • Problems?
    • Cannot work when SKIP_WAL is used
    • Increases disk load, as we need to tail the WAL files (reading them again and again)
    • Needs special handling for the meta region, as we do not replicate the meta region in normal replication
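As a reminder of how region replication is consumed, here is a standard client-side sketch: a table descriptor with two replicas per region, and a read that accepts timeline consistency (the table and family names are examples).

```java
import java.io.IOException;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Consistency;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class RegionReplicationSketch {
  static TableDescriptor withTwoReplicas() {
    // One primary plus one secondary replica per region.
    return TableDescriptorBuilder.newBuilder(TableName.valueOf("test_table"))
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        .setRegionReplication(2)
        .build();
  }

  static Result timelineGet(Table table, byte[] row) throws IOException {
    // TIMELINE lets a (possibly slightly stale) secondary replica serve the read.
    Get get = new Get(row).setConsistency(Consistency.TIMELINE);
    return table.get(get);
  }
}
```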

26 of 38

New Region Replication Framework

  • Basic idea
    • Any inconsistency can be fixed by a flush, so we do not need the heavyweight accounting of normal replication
    • Just send the edits through RPC; if it fails or lags too much, drop all the pending edits and trigger a flush
    • Does not depend on the WAL, so it also works with SKIP_WAL, and there is no special handling for the meta region
  • Implementation
    • Extend the MVCC write entry to carry an action; once the entry completes, we run the action, which sends the edits in the entry to all the secondary replicas
    • Introduce a new method on the region server side to receive the edits

27 of 38

New Region Replication Framework

  • Rolling upgrade
    • Remove the legacy ‘region_replica_replication’ peer while upgrading the master
    • If the new replication method is not available, fall back to the old ‘replay’ method
    • Everything will be OK after a flush, so it is not a critical problem even if we miss some edits during the rolling upgrade
  • Will be released in 3.0.0
    • 3.0.0-alpha-3
    • HBASE-26233

28 of 38

HBase on Ozone

  • HFiles can already be stored on Ozone
    • Works as long as Ozone provides a Hadoop FileSystem interface
  • Write-Ahead Log (WAL) on Ozone (HBASE-27740)
    • Make Ozone support hflush/hsync: HDDS-7593 (see the sketch below)
    • Also eliminate other unnecessary FileSystem operations
      • truncate: HBASE-27982
      • setStoragePolicy: HBASE-27746
      • etc.
  • Still in progress
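What hflush/hsync support means in practice, sketched with the stock Hadoop FileSystem API; this is generic Hadoop code (run against HDFS today, or Ozone once HDDS-7593 lands), not the HBase WAL writer itself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.StreamCapabilities;

public class WalDurabilitySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration(); // fs.defaultFS decides the target FS
    FileSystem fs = FileSystem.get(conf);
    try (FSDataOutputStream out = fs.create(new Path("/tmp/wal-probe"))) {
      // A WAL can only be durable if the output stream advertises these capabilities.
      boolean supported = out.hasCapability(StreamCapabilities.HFLUSH)
          && out.hasCapability(StreamCapabilities.HSYNC);
      out.writeBytes("edit-1\n");
      out.hflush(); // make the edit visible to readers / other replicas
      out.hsync();  // force the edit to durable storage
      System.out.println("hflush/hsync supported: " + supported);
    }
  }
}
```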

29 of 38

Part 05

Future

CommunityOverCode

30 of 38

‘Pure’ Cloud Native

  • Storage and computing separation architecture
    • HBase is born to be cloud native :)
  • Pure?
    • No self-deployed services other than HBase itself
    • Currently we still need ZooKeeper and HDFS

31 of 38

‘Pure’ Cloud Native

  • The most challenging part: the Write-Ahead Log (WAL)
    • Currently
      • Hadoop FileSystem based, must have hflush/hsync support
      • Only HDFS, maybe Ozone in the future
    • New reader/writer implementation
      • Path -> URI?
      • Fencing?
      • A self-hosted log service vs. an external-storage-based log service
    • Need to consider replication too

32 of 38

‘Pure’ Cloud Native

  • What about ZooKeeper?
    • No persistent data there now, so it hurts less
    • Eliminate all the watcher usage and make it pure transient storage; then we can replace it with any cloud storage system, like Redis/RDS
  • Service Discovery?
    • How to find the meta location
    • And how to find the master if you want to perform admin operations
    • Pluggable connection registry (see the sketch below)
      • Kubernetes service?
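A hedged sketch of pointing a client at a non-ZooKeeper registry. The RpcConnectionRegistry class and the hbase.client.registry.impl / hbase.client.bootstrap.servers property names are quoted from memory of recent releases and should be verified; the endpoint shown is a hypothetical Kubernetes Service DNS name.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RegistrySketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Assumed property names; check the docs of the version you run.
    conf.set("hbase.client.registry.impl",
        "org.apache.hadoop.hbase.client.RpcConnectionRegistry");
    // Bootstrap endpoints instead of a ZooKeeper quorum; on Kubernetes this
    // could simply be a Service DNS name (hypothetical value below).
    conf.set("hbase.client.bootstrap.servers", "hbase-bootstrap.hbase.svc:16020");
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      System.out.println("connected without listing a ZooKeeper quorum");
    }
  }
}
```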

33 of 38

Thanks

Duo Zhang, Chair of Apache HBase PMC

zhangduo@apache.org

Thanks to other people who contributed to this presentation

TBD

CommunityOverCode
