1 of 22

1

On data set of 5TB+ of data

Optimizing data heavy GQL endpoints in production

Dominik Polzer

2 of 22

Optimizing data heavy GQL endpoints in production

2

GraphQL vs Rest key difference
Explanation of product needs / requirements
Explanation of the problem
Code explanations
Steps taken to get to ~512ms response
Key point to take home

Agenda

3 of 22

3

WHY THE GRAPHQL API

PORTALS AND MOBILE APP EACH NEED TO ACCESS SIMILAR DATA�
DIFFERENT FILTERS�
DIFFERENT FIELDS USED ON THE FRONTENDS�
MORE COMPLEX TO SETUP INITIALLY BUT LESS ENDPOINTS TO MAINTAIN AS API GROWS COMPARED TO THE REST

4 of 22

4

GQL CLIENT – SCHEMA EXPLORER

NOTE: fields with underline still have more subfields but they were too big to fit in the single screenshot.

5 of 22

5

2.5 years of constant expansion of the business requirements

2.5 years of extending API Query code

Multiple clients, portals, roles using the same GQL Query schema

To few code cleanup / performance optimization tickets ☹

HOW WE GOT HERE

6 of 22

6

CODE STATE BEFORE THE CHANGES

7 of 22

7

CODE: INITIAL STATE

QUERY TARGETED BY MULTIPLE PORTALS / APPS

NOTICE MATERIALIZATION

DIFFERENCES BEFORE VS AFTER LINQ QUERY EXPRESSION BLOCK

> EF trick: exclude join from expr. tree

> user source resolution (mobile or portals)

8 of 22

8

SOME OF THE EXPRESSIONS USED IN THE MAIN QUERY

9 of 22

9

THIS IS NOT THE WHOLE QUERY…

RESOLVER COMPOSITION (AGGREGATION LAYER)

LIST RESOLVER > RESOLUTION ORDER

POSITION SEARCH RESOLVER

NOTE: SHOW FILTER / SORTER CODE

AGGREGATION LAYER

10 of 22

10

Client 1 DB

Client 2 DB

NOTE: THESE ARE SOME OF THE TABLES, IN TOTAL THERE IS ABOUT 20 TABLES USED TO COMPLETE THE FULL QUERY

AMOUNT OF DATA WE ARE WORKING WITH TO RESOLVE THE SINGLE POSITION SEARCH QUERY

11 of 22

11

DB QUERY�INITIAL STATE

EF GENERATED QUERY CAUGHT WITH THE EXTENDED EVENTS

EXECUTION SCHEMA

NOTE: SQL PROFILER CUT OF PART OF THE QUERY BECAUSE IT EXCEEDED CONFIG LIMIT SO EXTENDED EVENTS WERE USED TO GET THE FULL QUERY

40MB WHEN EXPORTED TO XML

12 of 22

12

FIX = ADD 2 NEW INDEXES

NOTICE THE BOLD ARROWS HERE

MEANING WE READ THROUGH A LOT OF DATA TO FIND MATCHES

13 of 22

13

CODE STATE AFTER THE CHANGES

14 of 22

14

PULLED OUT COMPUTED FIELDS

FROM CODE TO VIEW WHERE POSSIBLE

NOTE: � NOT THE FULL VIEW, GROUPING IS CUT OFF

15 of 22

15

SQL View mapped as a normal Entity with EF6

Related Match Entity mapping

Related Candidate Application Entity mapping

VIEW MAPPED AS AN ENTITY

16 of 22

16

= VIEW

MAIN QUERY CODE AFTER USING A VIEW AS ENTITY / BASE QUERY

17 of 22

17

Opportunity to clean up Common Query / Resolver components(used by other portals)

To improve their performance too.

18 of 22

18

SMALLER IMPROVEMENTS ADD UP

EF Functions : To exclude ‘TIME’ part In SQL when filtering large dataset

Bellow filter gets interpreted as expression on SQL server (CASE WHEN ….)

Full EF interpreted SQL Query after changes (from ~6k lines of SQL)

19 of 22

19

MATCHES QUERY ENDPOINT

Conditional exclusion from EF Expression tree

CODE diff (EF interpreted Query)

IQueryable extension method

20 of 22

20

TOOLING

Extended Events

Execution Plan

Showplan Analysis (easier pain point detection)

Set Statistics IO ON

SQL Sentry Plan Explorer

21 of 22

21

THAT’S IT

KEY POINTS

- SOMETIMES IT’S WORTH PRE JOINING DATA HEAVY LOGIC IN VIEW’S AND USING THEM AS A BASE QUERY (not materialized) TO MAKE IT EASIER FOR SQL TO RESOLVE HUGE QUERIES.�

- IT SEEMS LIKE A GOOD COMPROMISE BEFORE MOVING THE WHOLE LOGIC TO POSSIBLY UNREADABLE STORED PROCEDURE� (

THIS WAY YOU KEEP COMPLICATED JOIN LOGIC INSIDE SQL AND AND LESS COMPLICATED FILTERING LOGIC IN THE BACKEND CODE,

MAKING BOTH PARTS OF THE QUERY READABLE TO BACKEND DEVS & DBA.

IN THE END YOU COMBINE THEM INTO SINGLE PERFORMANT QUERY HITTING THE DB. (+ A Few smaller queries if you use GQL & Dataloader)

)

22 of 22

22

THANK YOU!

Dominik Polzer �Software Engineer