1 of 24

Things I Wish I'd Known About Elasticsearch When I Started Using It as NoSQL

Jimin Hsieh

2 of 24

Agenda

Summary
What is Elasticsearch?
What I wish I’d known from the beginning

3 of 24

Summary

Think hard about (and write down!) how you will access your data, then carefully model your table to satisfy those access patterns.
Semi-structured data doesn’t mean you can put arbitrary data into it.

4 of 24

What is Elasticsearch?

Distributed data store
NoSQL
JSON Document

Semi-structured and schema free

5 of 24

Architecture

https://github.com/exo-archives/exo-es-search#es-architecture

6 of 24

Inverted Index

https://community.hitachivantara.com/s/article/search-the-inverted-index
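
To make the idea concrete, here is a minimal sketch of an inverted index in plain Python. It is only a toy: the sample documents are made up, terms are split on whitespace, and real Lucene segments store far more (positions, frequencies, compression).

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical sample documents, keyed by document id.
docs = {
    1: "disk full on node",
    2: "node restarted",
    3: "disk replaced",
}
inverted = build_inverted_index(docs)
print(inverted["disk"])  # [1, 3]
print(inverted["node"])  # [1, 2]
```

A search for a term becomes a dictionary lookup instead of a scan over every document.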

7 of 24

Analogy between ES and SQL

Index = database
Type = table
Document = row
Field = column

Mapping = schema

8 of 24

Document

{
  "_index": "hanmvngj",
  "_type": "stresstest",
  "_id": "7y2lLXkB3FI9wq1OkmCn",
  "_score": 1,
  "_source": {
    "f": "amhksvjeocyogkmlzibvmpakuc",
    "i": "xcwclqmpniijqgeebapkspmcfvzqoypitjvzbcqtba",
    "ak": "gtomqdwncpymlvfljsnhufojytjqouvxsuidoqwtttplvrm",
    "qnxxdsf": "gqrktshotoqijetacvbnqbpdhbumhomuutbiqdqjsfqros",
    "qhxcumzqc": "kuiwcgnrcpzggeldypdvnijq",
    "hwx": "cyexgvrrcdismskdcflwgytcijkoibuvsqxijlgwlxjv",
    "eih": "x"
  }
}

9 of 24

Mapping

{
  "hanmvngj": {
    "mappings": {
      "properties": {
        "afh": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

10 of 24

Data Type

JSON data type

String
Number
Boolean
Null
Array
Object

Elasticsearch data type

Number

Long
Integer
Short
Byte

Text
Keyword
…etc.

11 of 24

Mapping

Don’t rely on the default dynamic mappings.

They can infer the wrong data type.
Don’t rely on the default dynamic string mapping (every string becomes a text field with a keyword sub-field).

12 of 24

Mapping

Dynamic field mapping

Elasticsearch picks the data type for you

Explicit mapping

You decide the data types yourself
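
As an illustration of an explicit mapping, the sketch below builds a create-index request body as a Python dict. The field names (service_name, log_level, client_ip, status_code, message) are hypothetical, and the body assumes the typeless mapping format of Elasticsearch 7.x. Setting "dynamic" to "strict" makes Elasticsearch reject documents with unmapped fields instead of guessing a type.

```python
import json

def explicit_log_mapping():
    """Build a create-index body that declares every field type up front."""
    return {
        "mappings": {
            "dynamic": "strict",  # reject unknown fields instead of guessing
            "properties": {
                "service_name": {"type": "keyword"},  # exact-match value
                "log_level": {"type": "keyword"},
                "client_ip": {"type": "ip"},
                "status_code": {"type": "short"},     # small integer
                "message": {"type": "text"},          # analyzed full text
            },
        }
    }

print(json.dumps(explicit_log_mapping(), indent=2))
```

The dict would be sent as the body of the index creation request; how it is sent depends on the client in use.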

13 of 24

Text vs Keyword

Type	Analyzed?	Content	Example
Keyword	No	Structured content	Service name, log level, IP, etc.
Text	Yes	Unstructured content	Log messages

14 of 24

Analyzer

Default Analyzer

Standard analyzer

Standard tokenizer
Lowercase token filter
Stop token filter (disabled by default)
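
The pipeline above can be approximated in a few lines of Python. This is a toy stand-in, not the real analyzer: the regular expression only roughly imitates the standard tokenizer's Unicode word segmentation, and the function name and stopword set are made up for the example.

```python
import re

def toy_standard_analyze(text, stopwords=frozenset()):
    """Tokenize on word characters, lowercase, then drop optional stopwords."""
    tokens = re.findall(r"\w+", text)          # rough stand-in for the tokenizer
    lowered = [t.lower() for t in tokens]      # lowercase token filter
    return [t for t in lowered if t not in stopwords]  # stop token filter

print(toy_standard_analyze("Disk FULL on Node-1"))
# ['disk', 'full', 'on', 'node', '1']
print(toy_standard_analyze("Disk FULL on Node-1", stopwords={"on"}))
# ['disk', 'full', 'node', '1']
```

With the empty default stopword set, "on" survives, which mirrors the stop filter being off unless configured.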

15 of 24

Pagination

Don’t use from + size for deep pagination (capped at 10,000 hits by default by index.max_result_window)
Scroll search

Sliced Scroll

Point in time + search after

Available from 7.10 onward, and only in the Elastic-licensed default distribution, not in the Apache 2.0 OSS build.
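
A minimal sketch of the search_after loop follows. To keep it runnable without a cluster, the search call is replaced by an in-memory stand-in; fake_search_factory, the timestamp and id sort fields, and the sample rows are all assumptions made for the example. The key idea is that each request carries the sort values of the last hit from the previous page.

```python
def iter_hits(search, page_size=2):
    """Yield every hit by passing the last hit's sort values as search_after."""
    body = {
        "size": page_size,
        # The second sort key is a tiebreaker so the ordering is total.
        "sort": [{"timestamp": "asc"}, {"id": "asc"}],
    }
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        body = {**body, "search_after": hits[-1]["sort"]}

def fake_search_factory(rows):
    """In-memory stand-in for a search call (hypothetical, for illustration)."""
    ordered = sorted(rows, key=lambda r: (r["timestamp"], r["id"]))

    def search(body):
        after = tuple(body.get("search_after", ()))
        remaining = [
            r for r in ordered
            if not after or (r["timestamp"], r["id"]) > after
        ]
        page = remaining[: body["size"]]
        return {"hits": {"hits": [
            {"_source": r, "sort": [r["timestamp"], r["id"]]} for r in page
        ]}}

    return search

rows = [{"id": i, "timestamp": 100 + i // 2} for i in range(5)]
ids = [h["_source"]["id"] for h in iter_hits(fake_search_factory(rows))]
print(ids)  # [0, 1, 2, 3, 4]
```

Unlike from + size, the cost of each page does not grow with how deep the page is, because no earlier hits are collected and discarded.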

16 of 24

Must vs Filter

Filter clauses skip scoring; they only decide whether a document matches.
Filter results can be cached.
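
As a sketch of how the two clauses are typically combined, the function below builds a bool query body as a Python dict. The field names (message, service_name, timestamp) and the function name are hypothetical. The full-text part goes in must because its relevance score matters; the exact-match and range conditions go in filter because they are yes/no checks.

```python
def build_log_query(text, service, since):
    """Score on the free-text match, filter on exact values without scoring."""
    return {
        "query": {
            "bool": {
                # Contributes to _score.
                "must": [{"match": {"message": text}}],
                # Does not affect _score and is eligible for caching.
                "filter": [
                    {"term": {"service_name": service}},
                    {"range": {"timestamp": {"gte": since}}},
                ],
            }
        }
    }

query = build_log_query("connection timeout", "checkout", "now-1h")
print(query["query"]["bool"]["filter"][0])
# {'term': {'service_name': 'checkout'}}
```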

17 of 24

Increase write throughput

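Turn off replicas during bulk loading (index.number_of_replicas = 0)
Increase index.refresh_interval (or disable refresh with -1) during bulk loading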
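
A minimal sketch of the two index settings bodies, one applied before a bulk load and one to restore afterwards. The function names are made up; the restore defaults of 1 replica and a 1s refresh interval match Elasticsearch's defaults but should be replaced with whatever the index used before.

```python
def bulk_load_settings():
    """Settings body to apply before a large bulk load."""
    return {
        "index": {
            "number_of_replicas": 0,   # no replica copies to write to
            "refresh_interval": "-1",  # disable periodic refresh
        }
    }

def restore_settings(replicas=1, refresh_interval="1s"):
    """Settings body to apply once the bulk load has finished."""
    return {
        "index": {
            "number_of_replicas": replicas,
            "refresh_interval": refresh_interval,
        }
    }

print(bulk_load_settings())
print(restore_settings())
```

Both bodies target the index update-settings endpoint. While replicas are off there is no redundancy, so this only suits data that can be reloaded if a node is lost.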

18 of 24

Turn Off Replica

https://learning.oreilly.com/library/view/elasticsearch-the-definitive/9781449358532/assets/elas_0401.png

19 of 24

Index Refresh Interval

https://tech.ebayinc.com/assets/Uploads/_resampled/ResizedImageWzgwMCw1MzZd/Picture23.png

20 of 24

Mapping Explosion

NoSQL prefers a flat data model.
A mapping explosion happens when your documents have too many distinct fields.

index.mapping.total_fields.limit = 1000 (default)
The official docs advise: "If your field mappings contain a large, arbitrary set of keys, consider using the flattened data type."

But the flattened type is only in the Elastic-licensed default distribution, not in the Apache 2.0 OSS build.
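
To see how quickly arbitrary keys add up, the sketch below lists the leaf field paths that dynamic mapping would create for a nested document, next to a mapping that stores the same object as a single flattened field. The helper name, the sample document, and the labels field are hypothetical, and the leaf count is only a rough lower bound because object mappings also count toward the limit.

```python
def leaf_paths(doc, prefix=""):
    """Return the dotted paths of all leaf fields in a nested dict."""
    paths = set()
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            paths |= leaf_paths(value, path + ".")
        else:
            paths.add(path)
    return paths

# Hypothetical document whose "labels" object has arbitrary user-defined keys.
doc = {
    "service": "api",
    "labels": {"team": "search", "env": {"region": "eu", "tier": "prod"}},
}
print(sorted(leaf_paths(doc)))
# ['labels.env.region', 'labels.env.tier', 'labels.team', 'service']

# Mapping the whole object as one field keeps the field count constant.
labels_mapping = {"properties": {"labels": {"type": "flattened"}}}
```

Every new label key adds a mapped field under dynamic mapping, while the flattened variant stays at one field no matter how many keys appear.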

21 of 24

Spark Elasticsearch

Cap the number of tasks writing to Elasticsearch at the same time

coalesce

Be aware of imbalanced data

22 of 24

Java API

Scrolled search

Implementation

Clear the scroll context when you are done

Reduce unnecessary returned data

Source filtering
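
The slide is about the Java API, but the same pattern can be sketched in Python. The client here is duck-typed: the method names and keyword arguments (search, scroll, clear_scroll) are an assumption modeled on the 7.x elasticsearch-py client, and the index and field names are hypothetical. The two points illustrated are releasing the scroll context in a finally block and using _source to return only the needed fields.

```python
def scroll_all(client, index, body, keep_alive="1m"):
    """Yield all hits for a query and always release the scroll context."""
    response = client.search(index=index, body=body, scroll=keep_alive)
    scroll_id = response.get("_scroll_id")
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = client.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = response.get("_scroll_id", scroll_id)
    finally:
        # Runs on normal exhaustion, on errors, and if the caller stops early.
        if scroll_id is not None:
            client.clear_scroll(scroll_id=scroll_id)

# Source filtering: request only the fields the caller actually needs.
query_body = {
    "size": 1000,
    "_source": ["service_name", "log_level"],
    "query": {"match_all": {}},
}
```

Open scroll contexts hold resources on the cluster until they expire, so clearing them explicitly matters when many scrolls run.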

23 of 24

Reference

24 of 24

FAQ

Thank you for your attention.