LC-QuAD
Large-scale Complex Question Answering Dataset
Priyansh Trivedi1, Gaurav Maheshwari1, Mohnish Dubey1, Jens Lehmann1,2
1 University of Bonn, Bonn, Germany
2 Fraunhofer IAIS, St. Augustin, Germany
ISWC 2017, Vienna
Outline
Question Answering
Motivation
Dataset(s)
LC-QuAD
Process of Creating LC-QuAD
Takeaways
Future Directions
Question Answering over KG
Understand the intent of a factual question and return the KB resource(s) it implicitly asks for.
Typically treated as a translation problem from natural language to a formal language.
Has seen major advances in the past five years.
Motivation
Challenges, once set, can be overcome.
In other words ...
Datasets Precede Research
In 2013, Berant et al. released WebQuestions.
Current state of the art: 69% (Liang et al., 2016)
The Question Answering over Linked Data (QALD) challenge has been held for the past 8 years.
Over 38 submissions.
Other Incentives
Dearth of large QA datasets over DBpedia.
Traditional dataset generation methods are time-consuming and do not scale.
Dataset(s)
Dataset | Size | Logical Forms | Complex Questions | Target KB |
Free917 (Cai et al., 2013) | 917 | Yes | Yes | Freebase |
WebQuestions (Berant et al., 2013) | 5 810 | No | Yes | Freebase |
SimpleQuestions (Bordes et al., 2015) | 108 442 | No | No | Freebase |
30M Factoid (Serban et al., 2016) | 30 000 000 | No | No | Freebase |
QALD (Unger et al., 2016) | 450 | Yes | Yes | DBpedia |
LC-QuAD | 5 000 | Yes | Yes | DBpedia |
*unless unintentionally made complex
LC-QuAD
✔ has complex questions
✔ has SPARQL queries
✔ has boolean and aggregate-based queries
✔ is supervised (gold standard)
✔ is extensible
✔ is awesome 😄
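Boolean and aggregate questions can be pictured as variants of a SELECT template. The Python sketch below is a minimal illustration under stated assumptions: templates are plain strings using the deck's placeholder names (`e_to_e_out1`, `e_out1`, ...), and the helpers `to_count` and `to_ask` and the candidate `dbr:Stephen_King` are hypothetical, not LC-QuAD's actual generation code.

```python
import re

# A two-triple SELECT template written with the deck's placeholder names.
SELECT_TEMPLATE = (
    "SELECT DISTINCT ?uri WHERE { "
    "?uri e_to_e_out1 e_out1 . ?uri e_to_e_out2 e_out2 }"
)


def to_count(select_query: str) -> str:
    """Aggregate variant (hypothetical helper): count distinct answers instead of listing them."""
    return select_query.replace(
        "SELECT DISTINCT ?uri WHERE",
        "SELECT (COUNT(DISTINCT ?uri) AS ?count) WHERE",
        1,
    )


def to_ask(select_query: str, candidate: str) -> str:
    """Boolean variant (hypothetical helper): bind ?uri to one candidate and ask if the pattern holds."""
    body = select_query.split("WHERE", 1)[1]
    # A callable replacement inserts the candidate literally, even if it contains backslashes.
    return "ASK WHERE" + re.sub(r"\?uri\b", lambda _m: candidate, body)


count_query = to_count(SELECT_TEMPLATE)
ask_query = to_ask(SELECT_TEMPLATE, "dbr:Stephen_King")
```

`COUNT(DISTINCT ?uri)` with an `AS` alias is standard SPARQL 1.1 aggregate syntax; binding `?uri` to a candidate turns "who satisfies both patterns?" into "does this entity satisfy both?".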
Dataset Creation Process
Traditionally
Natural language questions are collected or written.
Their logical forms are then created manually.
Name someone influenced by J. R. R. Tolkien, who won the Hugo Award?
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award
}
Prerequisites
✔ understanding the KG schema
✔ understanding the target formal language (SPARQL)
✔ no room for errors
✔ understanding NL
Inverting the Process
Name someone influenced by J. R. R. Tolkien, who won the Hugo Award?
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award
}
Prerequisites
✔ understanding the KG schema
✔ understanding the target formal language (SPARQL)
✔ no room for errors
✔ understanding NL
Upon Further Simplification
Name someone influenced by J. R. R. Tolkien, who won the Hugo Award?
What is person whose influenced by is J. R. R. Tolkien, and award is the Hugo Award.
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award }
Automatic
Prerequisites
✔ understanding the KG schema
✔ understanding the target formal language (SPARQL)
✔ no room for errors
✔ understanding NL
Upon Further Simplification
Increases the speed of creating questions.
Reduces the domain expertise required.
Tolerates slight errors.
This allows us to scale up!
Remaining Challenges
Automatically create SPARQL queries.
Convert SPARQL queries to intermediary NLQs.
Automatically create SPARQL Queries
Create SPARQL templates.
SELECT DISTINCT ?uri WHERE {
?uri e_to_e_out1 e_out1 .
?uri e_to_e_out2 e_out2
}
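Instantiating a template amounts to substituting each placeholder with a concrete IRI. A minimal Python sketch, assuming placeholders appear as bare tokens; the `instantiate` helper and the binding dictionary are illustrative, chosen to reproduce the Tolkien/Hugo Award query used throughout this talk.

```python
import re

# The two-triple SELECT template; placeholders are bare tokens.
TEMPLATE = (
    "SELECT DISTINCT ?uri WHERE {\n"
    "?uri e_to_e_out1 e_out1 .\n"
    "?uri e_to_e_out2 e_out2\n"
    "}"
)


def instantiate(template: str, bindings: dict[str, str]) -> str:
    """Replace every placeholder token in `template` with its bound value.

    Word boundaries stop 'e_out1' from matching inside 'e_to_e_out1',
    and a single regex pass means substituted values are never re-scanned.
    """
    names = sorted(bindings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: bindings[m.group(1)], template)


# Illustrative bindings (prefixed names) for the running example.
BINDINGS = {
    "e_to_e_out1": "dbo:influencedBy",
    "e_out1": "dbr:J._R._R._Tolkien",
    "e_to_e_out2": "dbo:award",
    "e_out2": "dbr:Hugo_Award",
}

query = instantiate(TEMPLATE, BINDINGS)
```

Plain `str.replace` would be fragile here, since `e_out1` is a substring of `e_to_e_out1`; matching whole tokens avoids that.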
Manually select entities to serve as answers to our queries (e.g., Stephen King).
Collect the 2-hop subgraph around these entities.
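Collecting a 2-hop subgraph is a breadth-first expansion from the seed entity, stopped after two edges. A minimal Python sketch over an in-memory triple list; it assumes only outgoing edges are followed (the slide does not say whether incoming edges are included), and the toy triples are illustrative rather than taken from a DBpedia dump. A real pipeline would fetch these triples from a SPARQL endpoint.

```python
from collections import deque

Triple = tuple[str, str, str]


def two_hop_subgraph(seed: str, triples: list[Triple], hops: int = 2) -> list[Triple]:
    """Collect the triples reachable from `seed` within `hops` outgoing edges (BFS)."""
    # Index outgoing edges by subject for fast neighbour lookup.
    out_edges: dict[str, list[Triple]] = {}
    for triple in triples:
        out_edges.setdefault(triple[0], []).append(triple)

    collected: list[Triple] = []
    seen = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for triple in out_edges.get(node, []):
            collected.append(triple)
            obj = triple[2]
            if obj not in seen:
                seen.add(obj)
                frontier.append((obj, depth + 1))
    return collected


# Toy triples for illustration only.
TRIPLES = [
    ("dbr:Stephen_King", "dbo:influencedBy", "dbr:J._R._R._Tolkien"),
    ("dbr:Stephen_King", "dbo:award", "dbr:Hugo_Award"),
    ("dbr:J._R._R._Tolkien", "dbo:birthPlace", "dbr:Bloemfontein"),
    ("dbr:Bloemfontein", "dbo:country", "dbr:South_Africa"),  # third hop: excluded
]

subgraph = two_hop_subgraph("dbr:Stephen_King", TRIPLES)
```

Each node is expanded at most once, so every reachable triple is collected exactly once even if the graph contains cycles.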
Juxtapose the SPARQL template's triple patterns on this subgraph to find valid instantiations.
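One way to read "juxtaposing" the two-triple template on the subgraph: fix `?uri` to the seed entity and bind the two patterns to every pair of its outgoing edges. The Python sketch below is an assumption-laden illustration for this single template shape only; the function name and toy subgraph are hypothetical, and LC-QuAD covers many more template shapes.

```python
from itertools import combinations

Triple = tuple[str, str, str]


def match_two_out_template(seed: str, subgraph: list[Triple]) -> list[dict[str, str]]:
    """Bind '?uri e_to_e_out1 e_out1 . ?uri e_to_e_out2 e_out2' with ?uri fixed to `seed`.

    Every unordered pair of distinct outgoing edges of the seed gives one
    binding, so the seed is an answer to each query built from it.
    """
    out_edges = [t for t in subgraph if t[0] == seed]
    return [
        {"e_to_e_out1": p1, "e_out1": o1, "e_to_e_out2": p2, "e_out2": o2}
        for (_, p1, o1), (_, p2, o2) in combinations(out_edges, 2)
    ]


# Toy 2-hop subgraph around the seed (illustrative only).
SUBGRAPH = [
    ("dbr:Stephen_King", "dbo:influencedBy", "dbr:J._R._R._Tolkien"),
    ("dbr:Stephen_King", "dbo:award", "dbr:Hugo_Award"),
    ("dbr:Stephen_King", "dbo:genre", "dbr:Horror_fiction"),
    ("dbr:J._R._R._Tolkien", "dbo:birthPlace", "dbr:Bloemfontein"),
]

matches = match_two_out_template("dbr:Stephen_King", SUBGRAPH)
```

Using unordered pairs avoids emitting the same query twice with its patterns swapped. The number of pairs still grows quadratically with the seed's out-degree, which is one reason a single subgraph can yield too many queries.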
SPARQL Queries Generated
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award
}
Remaining Challenges
Automatically create SPARQL queries.
Convert SPARQL queries to intermediary NLQs.
Creating intermediary NLQs
Whose influenced by is J. R. R. Tolkien, and award is the Hugo Award.
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award
}
Question Templates (NNQT: Normalized Natural Question Templates)
Whose e_to_e_out1 is e_out1, and e_to_e_out2 is the e_out2.
SELECT DISTINCT ?uri WHERE {
?uri e_to_e_out1 e_out1 .
?uri e_to_e_out2 e_out2
}
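An NNQT is filled the same way as its SPARQL template, but with human-readable labels instead of IRIs. A minimal Python sketch: the `LABELS` table is a hypothetical stand-in for `rdfs:label` lookups, and the fallback (local name with underscores turned into spaces) is our assumption, not necessarily what LC-QuAD does. With these bindings it reproduces the intermediary question "Whose influenced by is J. R. R. Tolkien, and award is the Hugo Award."

```python
import re

# The NNQT for the two-triple template.
NNQT = "Whose e_to_e_out1 is e_out1, and e_to_e_out2 is the e_out2."

# Hypothetical label table standing in for rdfs:label lookups.
LABELS = {"dbo:influencedBy": "influenced by", "dbo:award": "award"}


def label(iri: str) -> str:
    """Return a known label, else the IRI's local name with underscores as spaces."""
    return LABELS.get(iri, iri.split(":", 1)[-1].replace("_", " "))


def fill_nnqt(nnqt: str, bindings: dict[str, str]) -> str:
    """Substitute each placeholder token with the label of its bound IRI."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, bindings)) + r")\b")
    return pattern.sub(lambda m: label(bindings[m.group(1)]), nnqt)


BINDINGS = {
    "e_to_e_out1": "dbo:influencedBy",
    "e_out1": "dbr:J._R._R._Tolkien",
    "e_to_e_out2": "dbo:award",
    "e_out2": "dbr:Hugo_Award",
}

question = fill_nnqt(NNQT, BINDINGS)
```

The output is deliberately stilted; turning it into a fluent question is the manual step.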
Template Instances
Whose influenced by is J. R. R. Tolkien, and award is the Hugo Award.
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award }
Summary
Name someone influenced by J. R. R. Tolkien, who won the Hugo Award?
Whose influenced by is J. R. R. Tolkien, and award is the Hugo Award.
SELECT DISTINCT ?uri WHERE {
?uri dbo:influencedBy dbr:J._R._R._Tolkien .
?uri dbo:award dbr:Hugo_Award }
Automatic
Manual
Manual Work
Two-step process:
(Note: outdated fact here 😭)
Discussion
Dataset Characteristics
5,000 questions
33 SPARQL templates
18% simple questions
12.29 words: average question length
DBpedia version: 2016-04
150+ downloads*
*as of 16 October 2017
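For scale, and allowing for the 18% figure being rounded, the split between simple and complex questions works out roughly as:

```latex
n_{\text{simple}} \approx 0.18 \times 5000 = 900, \qquad
n_{\text{complex}} \approx 5000 - 900 = 4100
```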
Controlling Size and Variety
Too many queries are generated per subgraph.
Some predicates are linked disproportionately often (e.g., dbp:birthplace).
Subgraphs contain metadata triples.
Remedies:
Filter triples against a predicate whitelist.
Stochastically prune the subgraph.
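Whitelist filtering and stochastic pruning can be sketched together in a few lines of Python. This is a minimal illustration under assumptions: the whitelist contents, the uniform keep probability, and the toy subgraph (including its made-up `dbo:wikiPageID` value) are hypothetical, and the talk does not specify LC-QuAD's actual whitelist or pruning rate.

```python
import random

Triple = tuple[str, str, str]

# Hypothetical whitelist; LC-QuAD's actual predicate list is not reproduced here.
PREDICATE_WHITELIST = {"dbo:influencedBy", "dbo:award", "dbo:birthPlace"}


def filter_and_prune(subgraph: list[Triple], whitelist: set[str],
                     keep_prob: float, rng: random.Random) -> list[Triple]:
    """Keep only whitelisted predicates, then retain each survivor with probability keep_prob."""
    # Whitelisting drops metadata triples such as dbo:wikiPageID.
    allowed = [t for t in subgraph if t[1] in whitelist]
    # Uniform random pruning thins the subgraph so fewer template matches are produced.
    return [t for t in allowed if rng.random() < keep_prob]


# Toy subgraph; the wikiPageID value is made up.
SUBGRAPH = [
    ("dbr:Stephen_King", "dbo:influencedBy", "dbr:J._R._R._Tolkien"),
    ("dbr:Stephen_King", "dbo:award", "dbr:Hugo_Award"),
    ("dbr:Stephen_King", "dbo:wikiPageID", "12345"),
]

pruned = filter_and_prune(SUBGRAPH, PREDICATE_WHITELIST, keep_prob=0.5, rng=random.Random(0))
```

Passing a seeded `random.Random` keeps the pruning reproducible across runs.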
Limitations
No literals are included in the questions.
No UNION or OPTIONAL queries.
No conditional aggregates.
No out-of-scope questions.
Future Directions
Creating baselines.
Automatic grammar correction.
Complex SPARQL templates.
Keeping up with DBpedia versions.
References, Citations
Cai, Qingqing, and Alexander Yates. "Large-scale Semantic Parsing via Schema Matching and Lexicon Extension." ACL (2013).
Berant, Jonathan, et al. "Semantic Parsing on Freebase from Question-Answer Pairs." EMNLP (2013).
Bordes, Antoine, et al. "Large-scale Simple Question Answering with Memory Networks." arXiv preprint arXiv:1506.02075 (2015).
Serban, Iulian Vlad, et al. "Generating Factoid Questions with Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus." arXiv preprint arXiv:1603.06807 (2016).
Unger, Christina, Axel-Cyrille Ngonga Ngomo, and Elena Cabrio. "6th Open Challenge on Question Answering over Linked Data (QALD-6)." Semantic Web Evaluation Challenge. Springer International Publishing (2016).
Liang, Chen, et al. "Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision." arXiv preprint arXiv:1611.00020 (2016).
This presentation uses licensed works by generous individuals/organizations, namely:
Questions?
See what I did there?
See for yourself.
LC-QuAD Website
lc-quad.sda.tech