1 of 22

1

On data set of 5TB+ of data

Optimizing data heavy GQL endpoints in production

Dominik Polzer

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

2 of 22

Optimizing data heavy GQL endpoints in production

2

  1. GraphQL vs Rest key difference
  2. Explanation of product needs / requirements
  3. Explanation of the problem
  4. Code explanations
  5. Steps taken to get to ~512ms response
  6. Key point to take home

Agenda

3 of 22

3

WHY THE GRAPHQL API

  1. PORTALS AND MOBILE APP EACH NEED TO ACCESS SIMILAR DATA�
  2. DIFFERENT FILTERS�
  3. DIFFERENT FIELDS USED ON THE FRONTENDS�
  4. MORE COMPLEX TO SETUP INITIALLY BUT LESS ENDPOINTS TO MAINTAIN AS API GROWS COMPARED TO THE REST

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

4 of 22

4

GQL CLIENT – SCHEMA EXPLORER

NOTE: fields with underline still have more subfields but they were too big to fit in the single screenshot.

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

5 of 22

5

  1. 2.5 years of constant expansion of the business requirements

  • 2.5 years of extending API Query code

  • Multiple clients, portals, roles using the same GQL Query schema

  • To few code cleanup / performance optimization tickets ☹

HOW WE GOT HERE

6 of 22

6

CODE STATE BEFORE THE CHANGES

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

7 of 22

7

CODE: INITIAL STATE

QUERY TARGETED BY MULTIPLE PORTALS / APPS

NOTICE MATERIALIZATION

DIFFERENCES BEFORE VS AFTER LINQ QUERY EXPRESSION BLOCK

> EF trick: exclude join from expr. tree

> user source resolution (mobile or portals)

8 of 22

8

SOME OF THE EXPRESSIONS USED IN THE MAIN QUERY

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

9 of 22

9

THIS IS NOT THE WHOLE QUERY…

RESOLVER COMPOSITION (AGGREGATION LAYER)

LIST RESOLVER > RESOLUTION ORDER

POSITION SEARCH RESOLVER

NOTE: SHOW FILTER / SORTER CODE

AGGREGATION LAYER

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

10 of 22

10

Client 1 DB

Client 2 DB

NOTE: THESE ARE SOME OF THE TABLES, IN TOTAL THERE IS ABOUT 20 TABLES USED TO COMPLETE THE FULL QUERY

AMOUNT OF DATA WE ARE WORKING WITH TO RESOLVE THE SINGLE POSITION SEARCH QUERY

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

11 of 22

11

DB QUERY�INITIAL STATE

EF GENERATED QUERY CAUGHT WITH THE EXTENDED EVENTS

EXECUTION SCHEMA

NOTE: SQL PROFILER CUT OF PART OF THE QUERY BECAUSE IT EXCEEDED CONFIG LIMIT SO EXTENDED EVENTS WERE USED TO GET THE FULL QUERY

40MB WHEN EXPORTED TO XML

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

12 of 22

12

FIX = ADD 2 NEW INDEXES

NOTICE THE BOLD ARROWS HERE

MEANING WE READ THROUGH A LOT OF DATA TO FIND MATCHES

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

13 of 22

13

CODE STATE AFTER THE CHANGES

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

14 of 22

14

PULLED OUT COMPUTED FIELDS

FROM CODE TO VIEW WHERE POSSIBLE

NOTE: � NOT THE FULL VIEW, GROUPING IS CUT OFF

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

15 of 22

15

SQL View mapped as a normal Entity with EF6

Related Match Entity mapping

Related Candidate Application Entity mapping

VIEW MAPPED AS AN ENTITY

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

16 of 22

16

= VIEW

MAIN QUERY CODE AFTER USING A VIEW AS ENTITY / BASE QUERY

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

17 of 22

17

Opportunity to clean up Common Query / Resolver components(used by other portals)

To improve their performance too.

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

18 of 22

18

SMALLER IMPROVEMENTS ADD UP

EF Functions : To exclude ‘TIME’ part In SQL when filtering large dataset

Bellow filter gets interpreted as expression on SQL server (CASE WHEN ….)

Full EF interpreted SQL Query after changes (from ~6k lines of SQL)

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

19 of 22

19

MATCHES QUERY ENDPOINT

Conditional exclusion from EF Expression tree

CODE diff (EF interpreted Query)

IQueryable extension method

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

20 of 22

20

TOOLING

Extended Events

Execution Plan

Showplan Analysis (easier pain point detection)

Set Statistics IO ON

SQL Sentry Plan Explorer

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

21 of 22

21

THAT’S IT

KEY POINTS

- SOMETIMES IT’S WORTH PRE JOINING DATA HEAVY LOGIC IN VIEW’S AND USING THEM AS A BASE QUERY (not materialized) TO MAKE IT EASIER FOR SQL TO RESOLVE HUGE QUERIES.�

- IT SEEMS LIKE A GOOD COMPROMISE BEFORE MOVING THE WHOLE LOGIC TO POSSIBLY UNREADABLE STORED PROCEDURE� (

THIS WAY YOU KEEP COMPLICATED JOIN LOGIC INSIDE SQL AND AND LESS COMPLICATED FILTERING LOGIC IN THE BACKEND CODE,

MAKING BOTH PARTS OF THE QUERY READABLE TO BACKEND DEVS & DBA.

IN THE END YOU COMBINE THEM INTO SINGLE PERFORMANT QUERY HITTING THE DB. (+ A Few smaller queries if you use GQL & Dataloader)

)

CLIENT NAME // PRESENTATION NAME // © COPYRIGHT 2020 ENDAVA // CONFIDENTIAL AND PROPRIETARY // VERSION 1.0

22 of 22

22

THANK YOU!

Dominik Polzer �Software Engineer