What to split out? | What's the size of the split-out graph in number of triples? | What's the size of the split-out graph in number of Items? | % count of all queries | Is it a semantically sensible split? | Are we minimizing the need to query the graphs together? | How big is the affected group? | Does this work together with the plans for splitting the dumps? | Other thoughts | Next action | Who | When | How | Key result: success metrics
scholarly articles | XXL (it's the largest semantic subgraph we have) | XXL (it's the largest semantic subgraph we have) | 2.6 | yes | yes from the main graph to the split-out graph, but not the other way around (meaning people querying for scholarly articles will need to federate for basically everything) | S | yes | | Testing experiment, extract of current queries | GL, David, Desiree, Lydia | Phabricator backlog, until David's capacity becomes available to start tests | New endpoint; users: rewrite queries; query returns are correct | Correct results are returned for user queries; low volume of broken queries; fewer slower queries (holistic graph stability: pre/post data comparisons from this test); impact on triples; growth on federated instance vs. graph to guide future DR planning, etc.
astronomical objects | XL (it's the second-largest semantic subgraph we have) | XL (it's the second-largest semantic subgraph we have) | 1.3 | yes | same as for scholarly articles, except that there are a few planets and stars people will expect in the main graph | XS | probably | We might have to duplicate a few planets and stars
humans | XL (it's the third-largest semantic subgraph we have) | XL (it's the third-largest semantic subgraph we have) | 31.9 | yes | no, humans are involved in many different topics people query for, so they are not very contained | XXL | probably
Wikimedia-internal stuff (templates, categories, ...) | ? probably high | ? maybe low | | yes | likely yes | XS | probably | | check number of Items, triples | Lydia, Manuel, Ayesha
Lexemes | ? | ? | | yes | Yes from Items to Lexemes, but not from Lexemes to Items | S | yes
terms (labels, aliases, descriptions) | 1/7th of all triples apparently for descriptions | - | | maybe | very likely | | probably | Would maybe not be a proper split in our sense of the word | check numbers for labels, aliases to make this compelling. Decision: integrating this into the SPARQL query engine is a big architectural change; not sure whether to do this | Lydia, Nikki