1
On data set of 5TB+ of data
Optimizing data heavy GQL endpoints in production
Dominik Polzer
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
Optimizing data heavy GQL endpoints in production
2
Agenda
3
WHY THE GRAPHQL API
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
4
GQL CLIENT – SCHEMA EXPLORER
NOTE: fields with underline still have more subfields but they were too big to fit in the single screenshot.
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
5
HOW WE GOT HERE
6
CODE STATE BEFORE THE CHANGES
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
7
CODE: INITIAL STATE
QUERY TARGETED BY MULTIPLE PORTALS / APPS
NOTICE MATERIALIZATION
DIFFERENCES BEFORE VS AFTER LINQ QUERY EXPRESSION BLOCK
> EF trick: exclude join from expr. tree
> user source resolution (mobile or portals)
8
SOME OF THE EXPRESSIONS USED IN THE MAIN QUERY
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
9
THIS IS NOT THE WHOLE QUERY…
RESOLVER COMPOSITION (AGGREGATION LAYER)
LIST RESOLVER > RESOLUTION ORDER
POSITION SEARCH RESOLVER
NOTE: SHOW FILTER / SORTER CODE
AGGREGATION LAYER
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
10
Client 1 DB
Client 2 DB
NOTE: THESE ARE SOME OF THE TABLES, IN TOTAL THERE IS ABOUT 20 TABLES USED TO COMPLETE THE FULL QUERY
AMOUNT OF DATA WE ARE WORKING WITH TO RESOLVE THE SINGLE POSITION SEARCH QUERY
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
11
DB QUERY�INITIAL STATE
EF GENERATED QUERY CAUGHT WITH THE EXTENDED EVENTS
EXECUTION SCHEMA
NOTE: SQL PROFILER CUT OF PART OF THE QUERY BECAUSE IT EXCEEDED CONFIG LIMIT SO EXTENDED EVENTS WERE USED TO GET THE FULL QUERY
40MB WHEN EXPORTED TO XML
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
12
FIX = ADD 2 NEW INDEXES
NOTICE THE BOLD ARROWS HERE
MEANING WE READ THROUGH A LOT OF DATA TO FIND MATCHES
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
13
CODE STATE AFTER THE CHANGES
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
14
PULLED OUT COMPUTED FIELDS
FROM CODE TO VIEW WHERE POSSIBLE
NOTE: � NOT THE FULL VIEW, GROUPING IS CUT OFF
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
15
SQL View mapped as a normal Entity with EF6
Related Match Entity mapping
Related Candidate Application Entity mapping
VIEW MAPPED AS AN ENTITY
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
16
= VIEW
MAIN QUERY CODE AFTER USING A VIEW AS ENTITY / BASE QUERY
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
17
Opportunity to clean up Common Query / Resolver components(used by other portals)
To improve their performance too.
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
18
SMALLER IMPROVEMENTS ADD UP
EF Functions : To exclude ‘TIME’ part In SQL when filtering large dataset
Bellow filter gets interpreted as expression on SQL server (CASE WHEN ….)
Full EF interpreted SQL Query after changes (from ~6k lines of SQL)
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
19
MATCHES QUERY ENDPOINT
Conditional exclusion from EF Expression tree
CODE diff (EF interpreted Query)
IQueryable extension method
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
20
TOOLING
Extended Events
Execution Plan
Showplan Analysis (easier pain point detection)
Set Statistics IO ON
SQL Sentry Plan Explorer
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
21
THAT’S IT
KEY POINTS
- SOMETIMES IT’S WORTH PRE JOINING DATA HEAVY LOGIC IN VIEW’S AND USING THEM AS A BASE QUERY (not materialized) TO MAKE IT EASIER FOR SQL TO RESOLVE HUGE QUERIES.�
- IT SEEMS LIKE A GOOD COMPROMISE BEFORE MOVING THE WHOLE LOGIC TO POSSIBLY UNREADABLE STORED PROCEDURE� (
THIS WAY YOU KEEP COMPLICATED JOIN LOGIC INSIDE SQL AND AND LESS COMPLICATED FILTERING LOGIC IN THE BACKEND CODE,
MAKING BOTH PARTS OF THE QUERY READABLE TO BACKEND DEVS & DBA.
IN THE END YOU COMBINE THEM INTO SINGLE PERFORMANT QUERY HITTING THE DB. (+ A Few smaller queries if you use GQL & Dataloader)
)
CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0
22
THANK YOU!
Dominik Polzer �Software Engineer