Incorporating External Knowledge to Enhance Tabular Reasoning
https://knowledge-infotabs.github.io/
J. Neeraja¹*, Vivek Gupta²*, Vivek Srikumar²
¹IIT Guwahati; ²University of Utah
TABULAR INFERENCE PROBLEM
In this example from the InfoTabS dataset (Gupta et al., 2020), the gold labels are:
H1: entailed; H2: contradictory; H3: neutral
Check out InfoTabS (Gupta et al., 2020): https://infotabs.github.io
MOTIVATION
Recent work on tabular reasoning focuses on building sophisticated neural models.
Questions?
CHALLENGES
Can we fix these problems by changing how tabular information is provided to a standard RoBERTa model?
CHALLENGE: POOR TABLE REPRESENTATION
Universal template: “The k of t are v.” (k: row key, t: table title, v: cell value)
e.g., The Founded of New York Stock Exchange are May 17, 1792; 226 years ago.
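A minimal sketch of the universal template, showing how it mechanically produces the ungrammatical sentence above. The function names are hypothetical illustrations.

```python
def universal_template(title: str, key: str, value: str) -> str:
    # Fills "The k of t are v." regardless of what the key means.
    return f"The {key} of {title} are {value}."


def universal_paragraph(title, rows):
    # One templated sentence per (key, value) row, joined into a premise paragraph.
    return " ".join(universal_template(title, key, value) for key, value in rows)


print(universal_paragraph("New York Stock Exchange",
                          [("Founded", "May 17, 1792; 226 years ago")]))
# prints: The Founded of New York Stock Exchange are May 17, 1792; 226 years ago.
```

Because the template ignores the key's meaning and grammatical number, keys such as Founded yield sentences that a pre-trained language model rarely sees.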
SOLUTION: BETTER PARAGRAPH REPRESENTATION
New York Stock Exchange was founded on May 17, 1792; 226 years ago.
New York Stock Exchange is an organization.
More grammatical and meaningful sentences
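A minimal sketch of key-specific templates that produce sentences like the ones above. The KEY_TEMPLATES entry, the fallback template, and the category argument are hypothetical illustrations, not the templates used in this work.

```python
# Hypothetical key-specific templates; keys without an entry use the fallback.
KEY_TEMPLATES = {
    "Founded": "{title} was founded on {value}.",
}
FALLBACK_TEMPLATE = "The {key} of {title} is {value}."
VOWELS = {"a", "e", "i", "o", "u"}


def article(word):
    # Naive a/an choice based on the first letter only.
    return "an" if word[:1].lower() in VOWELS else "a"


def better_paragraph(title, category, rows):
    # Start with a sentence stating what kind of entity the table describes.
    sentences = [f"{title} is {article(category)} {category}."]
    for key, value in rows:
        template = KEY_TEMPLATES.get(key, FALLBACK_TEMPLATE)
        # str.format ignores keyword arguments that a template does not use.
        sentences.append(template.format(title=title, key=key.lower(), value=value))
    return " ".join(sentences)


print(better_paragraph("New York Stock Exchange", "organization",
                       [("Founded", "May 17, 1792; 226 years ago")]))
```

Keys with a dedicated template read naturally; the fallback still uses the singular "is", which is usually more grammatical than the universal "are".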
CHALLENGE: MISSING IMPLICIT LEXICAL KNOWLEDGE
e.g.
H2′: Fewer than 2,500 stocks are listed in the NYSE
Model prediction for H2′: contradictory
SOLUTION: IMPLICIT KNOWLEDGE ADDITION
Can pre-training on a large NLI dataset help?
Exposes the model to diverse lexical constructions
Representation is better tuned for the NLI task
e.g.
H2′: Fewer than 2,500 stocks are listed in the NYSE
Model prediction for H2′ after pre-training: entailed
CHALLENGE: PRESENCE OF DISTRACTING INFORMATION
SOLUTION: DISTRACTING ROW REMOVAL
Select only the rows relevant to the hypothesis
Use an alignment-based retrieval algorithm with fastText vectors (Yadav et al., 2019, 2020)
E.g., for H1 and H2, pruning the table to the row No. of listings is sufficient.
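A minimal sketch of alignment-based row selection, assuming a simplified scoring rule: each hypothesis word is matched to its most similar word in a row by cosine similarity, and a row's score is the sum of these best matches. The toy 3-dimensional vectors in VECTORS are hypothetical stand-ins for fastText embeddings, and the function names are illustrative; the retrieval method of Yadav et al. (2019, 2020) may include refinements not reproduced here.

```python
import math

# Hypothetical toy 3-d vectors standing in for fastText embeddings.
VECTORS = {
    "stocks": [0.9, 0.1, 0.0],
    "listings": [0.8, 0.2, 0.1],
    "listed": [0.85, 0.15, 0.05],
    "founded": [0.0, 0.9, 0.2],
    "1792": [0.1, 0.8, 0.3],
    "volume": [0.2, 0.1, 0.9],
}


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def tokens(text):
    # Lowercase and strip surrounding punctuation; no stemming.
    return [w.strip(".,;:").lower() for w in text.split()]


def alignment_score(hypothesis, row):
    # For each hypothesis word with a vector, take its best cosine match in the row.
    row_vecs = [VECTORS[w] for w in tokens(row) if w in VECTORS]
    if not row_vecs:
        return 0.0
    return sum(max(cosine(VECTORS[w], v) for v in row_vecs)
               for w in tokens(hypothesis) if w in VECTORS)


def select_rows(hypothesis, rows, k=1):
    # Keep the k rows that align best with the hypothesis; drop the rest.
    return sorted(rows, key=lambda r: alignment_score(hypothesis, r), reverse=True)[:k]


rows = ["No. of listings", "Founded: May 17, 1792", "Volume"]
print(select_rows("Fewer than 2,500 stocks are listed in the NYSE", rows))
```

With these toy vectors, "stocks" and "listed" align closely with "listings", so only the No. of listings row survives pruning.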
CHALLENGE: MISSING DOMAIN KNOWLEDGE ABOUT KEYS
For H3, we need to interpret the key Volume in the financial context.
✅ In capital markets, volume is the total number of a security that was traded during a given period of time.
rather than
❌ In thermodynamics, the volume of a system is an extensive parameter for describing its thermodynamic state.
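One way to choose between such senses is sketched below, using a simple word-overlap heuristic swapped in for illustration (not necessarily the method used in this work): each candidate definition is scored by how many content words it shares with a context string, and the highest-scoring definition is kept. The context string and the stopword list are hypothetical.

```python
STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "that", "its", "for", "during"}


def content_words(text):
    # Lowercased words with punctuation stripped and stopwords removed.
    return {w.strip(".,;:()").lower() for w in text.split()} - STOPWORDS


def pick_sense(senses, context):
    # Choose the definition sharing the most content words with the context;
    # ties go to the earliest sense because max() keeps the first maximum.
    ctx = content_words(context)
    return max(senses, key=lambda s: len(content_words(s) & ctx))


senses = [
    "In capital markets, volume is the total number of a security that was "
    "traded during a given period of time.",
    "In thermodynamics, the volume of a system is an extensive parameter for "
    "describing its thermodynamic state.",
]
# Hypothetical context words standing in for a financial table and hypothesis.
context = "New York Stock Exchange, capital markets, shares traded per day"
print(pick_sense(senses, context))
```

Here the capital-markets sense shares "capital", "markets", and "traded" with the context, while the thermodynamics sense shares nothing, so the financial reading is selected.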
SOLUTION: EXPLICIT KNOWLEDGE ADDITION
Add explicit information to enrich keys
This improves the model’s ability to disambiguate the meaning of keys
Approach
For H3, append the following row to the end of the table:
Volume : total number of a security that was traded during a given period of time.
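The appending step can be sketched as follows, assuming the premise is kept as (key, value) rows and the chosen definitions are already available in a dictionary. The names `add_key_definitions` and `linearize`, and the `<volume value>` placeholder, are hypothetical.

```python
def add_key_definitions(rows, definitions):
    # Append a "key : definition" row for every key that has a chosen definition.
    enriched = list(rows)
    for key, _value in rows:
        if key in definitions:
            enriched.append((key, definitions[key]))
    return enriched


def linearize(rows):
    # One "key : value" line per row, in order.
    return "\n".join(f"{key} : {value}" for key, value in rows)


rows = [
    ("Founded", "May 17, 1792; 226 years ago"),
    ("Volume", "<volume value>"),  # placeholder, not the real table value
]
definitions = {
    "Volume": "total number of a security that was traded during a given period of time.",
}
print(linearize(add_key_definitions(rows, definitions)))
```

The original rows are left untouched; definitions are only appended, so the model still sees the full table followed by the extra key knowledge.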
PROPOSED SOLUTION
Pipeline (figure): Original Table → Better Representation → Explicit Knowledge Addition → Distracting Row Removal → RoBERTa Model (with Implicit Knowledge Addition)
RESULTS AND ANALYSIS
InfoTabS dataset splits:
Results table (includes human performance for comparison)
Observations
Overall, pre-processing improves performance.
Ablation: all changes are needed; knowledge addition is the most important component.
CONCLUSION
Check out Knowledge_InfoTabs: https://knowledge-infotabs.github.io/