Title | Authors | Anthology URL | Abstract | COLING session | Q&A date and time
Exploring Controllable Text Generation Techniques
Shrimai Prabhumoye, Alan W Black, Ruslan Salakhutdinov
https://www.aclweb.org/anthology/2020.coling-main.1
Neural controllable text generation is an important area gaining attention due to its plethora of applications. Although there is a large body of prior work in controllable text generation, there is no unifying theme. In this work, we provide a new schema of the pipeline of the generation process by classifying it into five modules. The control of attributes in the generation process requires modification of these modules. We present an overview of different techniques used to perform the modulation of these modules. We also provide an analysis of the advantages and disadvantages of these techniques. We further pave the way for developing new architectures based on combinations of the modules described in this paper.
LONG1: Language Modelling 1
Tuesday, December 8, 2020, 15:30
Infusing Sequential Information into Conditional Masked Translation Model with Self-Review Mechanism
Pan Xie, Zhi Cui, Xiuying Chen, XiaoHui Hu, Jianwei Cui, Bin Wang
https://www.aclweb.org/anthology/2020.coling-main.2
Non-autoregressive models generate target words in parallel, achieving faster decoding at the cost of translation accuracy. To remedy flawed translations from non-autoregressive models, a promising approach is to train a conditional masked translation model (CMTM) and refine the generated results within several iterations. Unfortunately, such an approach hardly considers the sequential dependency among target words, which inevitably results in translation degradation. Hence, instead of solely training a Transformer-based CMTM, we propose a Self-Review Mechanism to infuse sequential information into it. Concretely, we insert a left-to-right mask into the same decoder of the CMTM, and then induce it to autoregressively review whether each generated word from the CMTM should be replaced or kept. The experimental results (WMT14 En ↔ De and WMT16 En ↔ Ro) demonstrate that our model requires dramatically less training computation than the typical CMTM and outperforms several state-of-the-art non-autoregressive models by over 1 BLEU. Through knowledge distillation, our model even surpasses a typical left-to-right Transformer model, while significantly speeding up decoding.
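To picture the left-to-right review pass, here is a minimal causal attention mask in PyTorch. This is our own illustrative sketch, not the authors' implementation; the keep-or-replace decision mentioned in the comment is shorthand for the review step.

    import torch

    def left_to_right_mask(seq_len: int) -> torch.Tensor:
        # Additive attention mask: 0 where attention is allowed (j <= i),
        # -inf for future positions (j > i), enforcing left-to-right order.
        mask = torch.full((seq_len, seq_len), float("-inf"))
        return torch.triu(mask, diagonal=1)

    # In a review pass, the same decoder reruns under this mask so that each
    # position sees only earlier tokens before deciding to keep or replace a word.
    scores = torch.randn(5, 5) + left_to_right_mask(5)
    print(scores.softmax(dim=-1))  # each row attends only to itself and the left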
LONG1: Language Modelling 1
Tuesday, December 8, 2020, 15:30
Building Hierarchically Disentangled Language Models for Text Generation with Named Entities
Yash Agarwal, Devansh Batra, Ganesh Bagler
https://www.aclweb.org/anthology/2020.coling-main.3
Named entities pose a unique challenge to traditional methods of language modeling. While several domains are characterised by a high proportion of named entities, the occurrence of specific entities varies widely. Cooking recipes, for example, contain many named entities — viz. ingredients, cooking techniques (also called processes), and utensils. However, some ingredients occur frequently within the instructions while most occur rarely. In this paper, we build upon previous work on language models for text with named entities by introducing a Hierarchically Disentangled Model. Training is divided into multiple branches, with each branch producing a model with overlapping subsets of vocabulary. We found the existing datasets insufficient to accurately judge the performance of the model. Hence, we have curated 158,473 cooking recipes from several publicly available online sources. To reliably derive the entities within this corpus, we employ a combination of Named Entity Recognition (NER) and an unsupervised method of interpretation using dependency parsing and POS tagging, followed by a further cleaning of the dataset. This unsupervised interpretation models instructions as action graphs and is specific to the corpus of cooking recipes, unlike NER, which is a general method applicable to all corpora. To demonstrate the utility of our language model, we apply it to tasks such as graph-to-text generation and ingredients-to-recipe generation, comparing it to previous state-of-the-art baselines. We make our dataset (including annotations and processed action graphs) available for use, considering their potential use cases for language modeling and text generation research.
LONG1: Language Modelling 1
Tuesday, December 8, 2020, 15:30
CharBERT: Character-aware Pre-trained Language Model
Wentao Ma, Yiming Cui, Chenglei Si, Ting Liu, Shijin Wang, Guoping Hu
https://www.aclweb.org/anthology/2020.coling-main.4
Most pre-trained language models (PLMs) construct word representations at the subword level with Byte-Pair Encoding (BPE) or its variations, by which OOV (out-of-vocabulary) words are largely avoided. However, these methods split a word into subword units, making the representation incomplete and fragile. In this paper, we propose a character-aware pre-trained language model named CharBERT, which improves on previous methods (such as BERT and RoBERTa) to tackle these problems. We first construct the contextual word embedding for each token from the sequential character representations, then fuse the representations of characters and the subword representations via a novel heterogeneous interaction module. We also propose a new pre-training task named NLM (Noisy LM) for unsupervised character representation learning. We evaluate our method on question answering, sequence labeling, and text classification tasks, on both the original datasets and adversarial misspelling test sets. The experimental results show that our method can significantly improve the performance and robustness of PLMs simultaneously.
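As a rough sketch of character-aware word representations (our own simplified assumption, not CharBERT's actual heterogeneous interaction module), one can encode a word's characters with a small bidirectional GRU and fuse the result with its subword vector:

    import torch
    import torch.nn as nn

    class CharWordFusion(nn.Module):
        # Minimal sketch: derive a word vector from its characters with a
        # bi-GRU, then fuse it with the subword vector via a linear layer.
        def __init__(self, char_vocab=128, char_dim=32, hidden=64, sub_dim=128):
            super().__init__()
            self.char_emb = nn.Embedding(char_vocab, char_dim)
            self.char_rnn = nn.GRU(char_dim, hidden, batch_first=True,
                                   bidirectional=True)
            self.fuse = nn.Linear(2 * hidden + sub_dim, sub_dim)

        def forward(self, char_ids, subword_vec):
            # char_ids: (batch, n_chars); subword_vec: (batch, sub_dim)
            chars = self.char_emb(char_ids)
            _, h = self.char_rnn(chars)                # h: (2, batch, hidden)
            char_vec = torch.cat([h[0], h[1]], dim=-1)
            return torch.tanh(self.fuse(torch.cat([char_vec, subword_vec], -1)))

    fusion = CharWordFusion()
    out = fusion(torch.randint(0, 128, (2, 7)), torch.randn(2, 128))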
LONG1: Language Modelling 1
Tuesday, December 8, 2020, 15:30
A Graph Representation of Semi-structured Data for Web Question Answering
Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang
https://www.aclweb.org/anthology/2020.coling-main.5
The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines with a rich information source for question answering (QA). Unlike plain text passages in Web documents, Web tables and lists have inherent structures, which carry semantic correlations among the various elements in tables and lists. Many existing studies treat tables and lists as flat documents with pieces of text and do not make good use of the semantic information hidden in these structures. In this paper, we propose a novel graph representation of Web tables and lists based on a systematic categorization of the components in semi-structured data as well as their relations. We also develop pre-training and reasoning techniques on the graph model for the QA task. Extensive experiments on several real datasets collected from a commercial search engine verify the effectiveness of our approach. Our method improves the F1 score by 3.90 points over the state-of-the-art baselines.
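As a toy illustration of representing a table as a graph (our own simplification; the paper's categorization of components and relations is richer), headers and cells can become typed nodes connected by typed edges:

    # Build nodes and typed edges for a tiny table; headers and cells become
    # nodes, and edges encode cell-to-header and same-row relations.
    header = ["Country", "Capital"]
    rows = [["France", "Paris"], ["Japan", "Tokyo"]]

    nodes = [("header", h) for h in header]
    edges = []
    for row in rows:
        for j, cell in enumerate(row):
            nodes.append(("cell", cell))
            edges.append((("cell", cell), ("header", header[j])))
        edges.append((("cell", row[0]), ("cell", row[1])))  # same-row relation

    print(len(nodes), "nodes,", len(edges), "edges")  # 6 nodes, 6 edges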
LONG1: Language Modelling 1
Tuesday, December 8, 2020, 15:30
Catching Attention with Automatic Pull Quote Selection
Tanner Bohn, Charles Ling
https://www.aclweb.org/anthology/2020.coling-main.6
To advance understanding of how to engage readers, we advocate the novel task of automatic pull quote selection. Pull quotes are a component of articles specifically designed to catch the attention of readers, consisting of spans of text selected from the article and given more salient presentation. This task differs from related tasks such as summarization and clickbait identification in several aspects. We establish a spectrum of baseline approaches to the task, ranging from handcrafted features to a neural mixture-of-experts to cross-task models. By examining the contributions of individual features and embedding dimensions from these models, we uncover unexpected properties of pull quotes that help answer the important question of what engages readers. Human evaluation also supports the uniqueness of this task and the suitability of our selection models. The benefits of exploring this problem further are clear: pull quotes increase enjoyment and readability, shape reader perceptions, and facilitate learning. Code to reproduce this work is available at https://github.com/tannerbohn/AutomaticPullQuoteSelection.
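To give a flavor of the handcrafted-feature end of that baseline spectrum, here is a hypothetical feature extractor of our own choosing, not the paper's exact feature set:

    def pull_quote_features(sentence: str) -> list[float]:
        # Simple surface features one might score candidate sentences with.
        words = sentence.split()
        n = max(len(words), 1)
        return [
            float(len(words)),                                   # length
            float(sentence.count('"')),                          # quotation marks
            sum(w.lower() in {"you", "we", "i"} for w in words) / n,  # directness
            float(sum(c in "!?" for c in sentence)),             # emphatic marks
        ]

    print(pull_quote_features('You would not believe what happened next!'))
    # [7.0, 0.0, 0.142..., 1.0]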
LONG2: Information Extraction 1
Tuesday, December 8, 2020, 15:30
MZET: Memory Augmented Zero-Shot Fine-grained Named Entity Typing
Tao Zhang, Congying Xia, Chun-Ta Lu, Philip Yu
https://www.aclweb.org/anthology/2020.coling-main.7
Named entity typing (NET) is the classification task of assigning semantic types from a given inventory to an entity mention in context. However, as entity types grow in size and granularity, few previous studies have addressed newly emerging entity types. In this paper, we propose MZET, a novel memory-augmented FNET (Fine-grained NET) model, to tackle unseen types in a zero-shot manner. MZET incorporates character-level, word-level, and context-level information to learn the entity mention representation. In addition, MZET incorporates semantic meaning and hierarchical structure into the entity type representation. Finally, through a memory component that models the relationship between the entity mention and the entity type, MZET transfers knowledge from seen entity types to the zero-shot ones. Extensive experiments on three public datasets show the superior performance of MZET, which surpasses state-of-the-art FNET neural network models with up to an 8% gain in Micro-F1 and Macro-F1 score.
LONG2: Information Extraction 1
Tuesday, December 8, 2020, 15:30
Span-based Joint Entity and Relation Extraction with Attention-based Span-specific and Contextual Semantic Representations
Bin Ji, Jie Yu, Shasha Li, Jun Ma, Qingbo Wu, Yusong Tan, Huijun Liu
https://www.aclweb.org/anthology/2020.coling-main.8
Span-based joint extraction models have shown their effectiveness on entity recognition and relation extraction. These models regard text spans as candidate entities and span tuples as candidate relation tuples. Span semantic representations are shared by both entity recognition and relation extraction, yet existing models cannot adequately capture the semantics of these candidate entities and relations. To address these problems, we introduce a span-based joint extraction framework with attention-based semantic representations. Specifically, attention is utilized to calculate semantic representations, including span-specific and contextual ones. We further investigate the effects of four attention variants in generating contextual semantic representations. Experiments show that our model outperforms previous systems and achieves state-of-the-art results on ACE2005, CoNLL2004 and ADE.
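The attention-pooled span representation at the heart of such models can be sketched as follows; this minimal version uses a single learned scoring vector, whereas the paper studies four attention variants:

    import torch

    def attention_span_repr(token_states: torch.Tensor,
                            query: torch.Tensor,
                            start: int, end: int) -> torch.Tensor:
        # token_states: (seq_len, dim); query: (dim,) — a learned scoring vector.
        span = token_states[start:end]        # tokens inside the candidate span
        weights = (span @ query).softmax(0)   # per-token attention weights
        return weights @ span                 # weighted sum = span representation

    h = torch.randn(10, 16)
    v = torch.randn(16)
    rep = attention_span_repr(h, v, 2, 6)     # representation of span [2, 6)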
LONG2: Information Extraction 1
Tuesday, December 8, 2020, 15:30
Hierarchical Chinese Legal event extraction via Pedal Attention Mechanism
Shirong Shen, Guilin Qi, Zhen Li, Sheng Bi, Lusheng Wang
https://www.aclweb.org/anthology/2020.coling-main.9
Event extraction plays an important role in legal applications, including case push and auxiliary judgment. However, traditional event structures cannot express the connections between arguments, which are extremely important in legal events. Therefore, this paper defines a dynamic event structure for Chinese legal events. To distinguish between similar events, we design hierarchical event features for event detection. Moreover, to address the problems of long-distance semantic dependence and anaphora resolution in argument classification, we propose a novel pedal attention mechanism to extract the semantic relation between two words through their dependent adjacent words. We annotate a Chinese legal event dataset and evaluate our model on it. Experimental results demonstrate that our model surpasses other state-of-the-art models.
LONG2: Information Extraction 1
Tuesday, December 8, 2020, 15:30
Is Killed More Significant than Fled? A Contextual Model for Salient Event Detection
Disha Jindal, Daniel Deutsch, Dan Roth
https://www.aclweb.org/anthology/2020.coling-main.10
Identifying the key events in a document is critical to holistically understanding its important information. Although measuring the salience of events is highly contextual, most previous work has used a limited representation of events that omits essential information. In this work, we propose a highly contextual model of event salience that uses a rich representation of events, incorporates document-level information and allows for interactions between latent event encodings. Our experimental results on an event salience dataset demonstrate that our model improves over previous work by an absolute 2-4% on standard metrics, establishing a new state-of-the-art performance for the task. We also propose a new evaluation metric that addresses flaws in previous evaluation methodologies. Finally, we discuss the importance of salient event detection for the downstream task of summarization.
LONG2: Information Extraction 1
Tuesday, December 8, 2020, 15:30
Appraisal Theories for Emotion Classification in Text
Jan Hofmann, Enrica Troiano, Kai Sassenberg, Roman Klinger
https://www.aclweb.org/anthology/2020.coling-main.11
Automatic emotion categorization has been predominantly formulated as text classification in which textual units are assigned to an emotion from a predefined inventory, for instance following the fundamental emotion classes proposed by Paul Ekman (fear, joy, anger, disgust, sadness, surprise) or Robert Plutchik (adding trust, anticipation). This approach ignores existing psychological theories to some degree, which provide explanations regarding the perception of events. For instance, the description that somebody discovers a snake is associated with fear, based on the appraisal as being an unpleasant and non-controllable situation. This emotion reconstruction is even possible without having access to explicit reports of a subjective feeling (for instance expressing this with the words “I am afraid.”). Automatic classification approaches therefore need to learn properties of events as latent variables (for instance that the uncertainty and the mental or physical effort associated with the encounter of a snake leads to fear). With this paper, we propose to make such interpretations of events explicit, following theories of cognitive appraisal of events, and show their potential for emotion classification when being encoded in classification models. Our results show that high quality appraisal dimension assignments in event descriptions lead to an improvement in the classification of discrete emotion categories. We make our corpus of appraisal-annotated emotion-associated event descriptions publicly available.
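To make the idea concrete, the following toy sketch feeds appraisal dimensions to a classifier alongside text features. The dimension names follow common appraisal theories; the values and the encoding are illustrative assumptions, not the paper's setup:

    # Appraisal dimensions for "I discovered a snake" (illustrative values).
    appraisal = {"pleasantness": 0.05, "control": 0.1, "certainty": 0.3,
                 "attention": 0.9, "effort": 0.7, "responsibility": 0.2}

    text_encoding = [0.0] * 16          # stand-in for a learned sentence vector
    features = text_encoding + list(appraisal.values())
    # A downstream classifier over `features` can use low pleasantness and low
    # control as explicit evidence for fear, even without emotion words in text.
    print(len(features))  # 22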
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
A Symmetric Local Search Network for Emotion-Cause Pair Extraction
Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Hua Yu, Qing Gu
https://www.aclweb.org/anthology/2020.coling-main.12
Emotion-cause pair extraction (ECPE) is a new task which aims at extracting the potential clause pairs of emotions and corresponding causes in a document. To tackle this task, a two-step method was proposed by a previous study, which first extracted emotion clauses and cause clauses individually, then paired the emotion and cause clauses, and filtered out the pairs without causality. Different from this method, which separates the detection and the matching of emotion and cause into two steps, we propose a Symmetric Local Search Network (SLSN) model to perform detection and matching simultaneously by local search. SLSN consists of two symmetric subnetworks, namely the emotion subnetwork and the cause subnetwork. Each subnetwork is composed of a clause representation learner and a local pair searcher. The local pair searcher is a specially-designed cross-subnetwork component which can extract the local emotion-cause pairs. Experimental results on the ECPE corpus demonstrate the superiority of our SLSN over existing state-of-the-art methods.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Jointly Learning Aspect-Focused and Inter-Aspect Relations with Graph Convolutional Networks for Aspect Sentiment Analysis
Bin Liang, Rongdi Yin, Lin Gui, Jiachen Du, Ruifeng Xu
https://www.aclweb.org/anthology/2020.coling-main.13
In this paper, we explore a novel solution of constructing a heterogeneous graph for each instance by leveraging aspect-focused and inter-aspect contextual dependencies for the specific aspect and propose an Interactive Graph Convolutional Networks (InterGCN) model for aspect sentiment analysis. Specifically, an ordinary dependency graph is first constructed for each sentence over the dependency tree. Then we refine the graph by considering the syntactical dependencies between contextual words and aspect-specific words to derive the aspect-focused graph. Subsequently, the aspect-focused graph and the corresponding embedding matrix are fed into the aspect-focused GCN to capture the key aspect and contextual words. Besides, to interactively extract the inter-aspect relations for the specific aspect, an inter-aspect GCN is adopted to model the representations learned by aspect-focused GCN based on the inter-aspect graph which is constructed by the relative dependencies between the aspect words and other aspects. Hence, the model can be aware of the significant contextual and aspect words when interactively learning the sentiment features for a specific aspect. Experimental results on four benchmark datasets illustrate that our proposed model outperforms state-of-the-art methods and substantially boosts the performance in comparison with BERT.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
METNet: A Mutual Enhanced Transformation Network for Aspect-based Sentiment Analysis
Bin Jiang, Jing Hou, Wanyue Zhou, Chao Yang, Shihan Wang, Liang Pang
https://www.aclweb.org/anthology/2020.coling-main.14
Aspect-based sentiment analysis (ABSA) aims to determine the sentiment polarity of each specific aspect in a given sentence. Existing studies have recognized the importance of the aspect for the ABSA task and have derived many interactive learning methods that model context based on the specific aspect. However, current interaction mechanisms are ill-equipped to learn complex sentences with multiple aspects, and these methods underestimate the representation learning of the aspect. To solve these two problems, we propose a mutual enhanced transformation network (METNet) for the ABSA task. First, the aspect enhancement module in METNet improves the representation learning of the aspect with contextual semantic features, giving the aspect more abundant information. Second, METNet designs and implements a hierarchical structure, which enhances the representations of aspect and context iteratively. Experimental results on the SemEval 2014 datasets demonstrate the effectiveness of METNet, and we further show that METNet excels in multi-aspect scenarios.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Making the Best Use of Review Summary for Sentiment Analysis
Sen Yang, Leyang Cui, Jun Xie, Yue Zhang
https://www.aclweb.org/anthology/2020.coling-main.15
Sentiment analysis provides a useful overview of customer review contents. Many review websites allow a user to enter a summary in addition to a full review. Intuitively, summary information may provide additional benefit for review sentiment analysis. In this paper, we conduct a study exploring methods for making better use of summary information. We start by finding that the sentiment signal distributions of a review and of its corresponding summary are in fact complementary to each other. We thus explore various architectures to better guide the interactions between the two and propose a hierarchically-refined review-centric attention model. Empirical results show that our review-centric model can make better use of user-written summaries for review sentiment analysis, and is also more effective than existing methods when the user summary is replaced with a summary generated by an automatic summarization system.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
From Sentiment Annotations to Sentiment Prediction through Discourse Augmentation
Patrick Huber, Giuseppe Carenini
https://www.aclweb.org/anthology/2020.coling-main.16
Sentiment analysis, especially for long documents, plausibly requires methods that capture complex linguistic structures. To accommodate this, we propose a novel framework to exploit task-related discourse for the task of sentiment analysis. More specifically, we combine the large-scale, sentiment-dependent MEGA-DT treebank with a novel neural architecture for sentiment prediction, based on a hybrid TreeLSTM hierarchical attention model. Experiments show that our framework, using sentiment-related discourse augmentations for sentiment prediction, enhances the overall performance for long documents, even beyond previous approaches using well-established discourse parsers trained on human-annotated data. We show that a simple ensemble approach can further enhance performance by selectively using discourse, depending on the document length.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
End-to-End Emotion-Cause Pair Extraction with Graph Convolutional Network
Ying Chen, Wenjun Hou, Shoushan Li, Caicong Wu, Xiaoqiang Zhang
https://www.aclweb.org/anthology/2020.coling-main.17
Emotion-cause pair extraction (ECPE), which aims at simultaneously extracting emotion-cause pairs that express emotions and their corresponding causes in a document, plays a vital role in understanding natural language. Considering that most emotions usually have few causes mentioned in their contexts, we present a novel end-to-end Pair Graph Convolutional Network (PairGCN) to model pair-level contexts so as to capture dependency information among local neighborhood candidate pairs. Moreover, in the graphical network, contexts are grouped into three types, and each type is propagated in its own way. Experiments on a benchmark Chinese emotion-cause pair extraction corpus demonstrate the effectiveness of the proposed model.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
A Unified Sequence Labeling Model for Emotion Cause Pair Extraction
Xinhong Chen, Qing Li, Jianping Wang
https://www.aclweb.org/anthology/2020.coling-main.18
Emotion-cause pair extraction (ECPE) aims at extracting emotions and causes as pairs from documents, where each pair contains an emotion clause and a set of cause clauses. Existing approaches address the task by first extracting emotion and cause clauses via two separate binary classifiers, and then training another binary classifier to pair them up. However, the extracted emotion-cause pairs of different emotion types cannot be distinguished from each other through simple binary classifiers, which limits the applicability of the existing approaches. Moreover, such two-step approaches may suffer from cascading errors. In this paper, to address the first problem, we assign emotion type labels to emotion and cause clauses so that emotion-cause pairs of different emotion types can be easily distinguished. As for the second problem, we reformulate the ECPE task as a unified sequence labeling task, which can extract multiple emotion-cause pairs in an end-to-end fashion. We propose an approach composed of a convolutional neural network for encoding neighboring information and two Bidirectional Long Short-Term Memory networks for two auxiliary tasks. Experimental results demonstrate the feasibility and effectiveness of our approaches.
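The reformulation can be illustrated with a hypothetical tag set (not necessarily the paper's exact scheme), where each clause tag carries both a role and an emotion type so that pairs of different types remain distinguishable:

    # Hypothetical unified tagging of a 5-clause document: each clause gets a
    # cause tag (C-*), an emotion tag (E-*) with its type, or O for neither.
    clauses = ["He failed the exam",        # cause of the sadness emotion
               "and he felt very sad",      # emotion clause, type = sadness
               "His friends comforted him", # O
               "which made him happy",      # emotion clause, type = happiness
               "because they cared"]        # cause of the happiness emotion
    tags = ["C-sadness", "E-sadness", "O", "E-happiness", "C-happiness"]

    # Decode pairs by matching cause and emotion tags of the same type.
    pairs = [(i, j) for j, e in enumerate(tags) if e.startswith("E")
             for i, c in enumerate(tags)
             if c.startswith("C") and c[2:] == e[2:]]
    print(pairs)  # [(0, 1), (4, 3)] — (cause clause, emotion clause) per type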
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Regrexit or not Regrexit: Aspect-based Sentiment Analysis in Polarized Contexts
Vorakit Vorakitphan, Marco Guerini, Elena Cabrio, Serena Villata
https://www.aclweb.org/anthology/2020.coling-main.19
Emotion analysis in polarized contexts represents a challenge for Natural Language Processing modeling. As a step in this direction, we present a methodology to extend the task of Aspect-based Sentiment Analysis (ABSA) toward affect and emotion representation in polarized settings. In particular, we adopt the three-dimensional model of affect based on Valence, Arousal, and Dominance (VAD). We then present a Brexit scenario that shows how affect varies toward the same aspect when politically polarized stances are presented. Our approach captures aspect-based polarization at the sentence level from newspaper coverage of the Brexit scenario, spanning 1.2M entities. We demonstrate how basic constituents of emotions can be mapped to the VAD model, along with their interactions with respect to the polarized context in ABSA settings using biased key-concepts (e.g., “stop Brexit” vs. “support Brexit”). Quite intriguingly, the framework manages to produce coherent aspect-level evidence of Brexit stances from key-concepts, showing that VAD influences the support and opposition aspects.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Affective and Contextual Embedding for Sarcasm Detection
Nastaran Babanejad, Heidar Davoudi, Aijun An, Manos Papagelis
https://www.aclweb.org/anthology/2020.coling-main.20
Automatic sarcasm detection from text is an important classification task that can help identify the actual sentiment in user-generated data, such as reviews or tweets. Despite its usefulness, sarcasm detection remains a challenging task, due to the lack of any vocal intonation or facial gestures in textual data. To date, most approaches to the problem have relied on hand-crafted affect features or pre-trained models of non-contextual word embeddings, such as Word2vec. However, these models inherit limitations that render them inadequate for the task of sarcasm detection. In this paper, we propose two novel deep neural network models for sarcasm detection, namely ACE 1 and ACE 2. Given a text passage as input, the models predict whether it is sarcastic (or not). Our models extend the architecture of BERT by incorporating both affective and contextual features. To the best of our knowledge, this is the first attempt to directly alter BERT’s architecture and train it from scratch to build a sarcasm classifier. Extensive experiments on different datasets demonstrate that the proposed models outperform state-of-the-art models for sarcasm detection by significant margins.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Understanding Pre-trained BERT for Aspect-based Sentiment Analysis
Hu Xu, Lei Shu, Philip Yu, Bing Liu
https://www.aclweb.org/anthology/2020.coling-main.21
This paper analyzes the pre-trained hidden representations learned from reviews on BERT for tasks in aspect-based sentiment analysis (ABSA). Our work is motivated by the recent progress in BERT-based language models for ABSA. However, it is not clear how the general proxy task of (masked) language modeling, trained on an unlabeled corpus without annotations of aspects or opinions, can provide important features for downstream tasks in ABSA. By leveraging the annotated datasets in ABSA, we investigate both the attentions and the learned representations of BERT pre-trained on reviews. We found that BERT uses very few self-attention heads to encode context words (such as prepositions or pronouns that indicate an aspect) and opinion words for an aspect. Most features in the representation of an aspect are dedicated to the fine-grained semantics of the domain (or product category) and the aspect itself, instead of carrying summarized opinions from its context. We hope this investigation can help future research in improving self-supervised learning, unsupervised learning and fine-tuning for ABSA. The pre-trained model and code can be found at https://github.com/howardhsu/BERT-for-RRC-ABSA.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Weighed Domain-Invariant Representation Learning for Cross-domain Sentiment Analysis
Minlong Peng, Qi Zhang
https://www.aclweb.org/anthology/2020.coling-main.22
Cross-domain sentiment analysis is currently a hot topic in both research and industry. One of the most popular frameworks for the task is domain-invariant representation learning (DIRL), which aims to learn a distribution-invariant feature representation across domains. However, in this work, we find that applying DIRL may degrade domain adaptation performance when the label distribution P(Y) changes across domains. To address this problem, we propose a modification to DIRL, obtaining a novel weighted domain-invariant representation learning (WDIRL) framework. We show that it is easy to transfer existing models of the DIRL framework to the WDIRL framework. Empirical studies on extensive cross-domain sentiment analysis tasks verified our statements and showed the effectiveness of our proposed solution.
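The label-shift setting and the weighting idea can be sketched numerically; this is a minimal illustration of class re-weighting under a changing P(Y), not WDIRL's full training objective:

    # Weights are the ratio P_target(Y) / P_source(Y) per class.
    p_source = {"pos": 0.7, "neg": 0.3}   # label distribution in source domain
    p_target = {"pos": 0.4, "neg": 0.6}   # (estimated) target label distribution
    weights = {y: p_target[y] / p_source[y] for y in p_source}
    print(weights)  # pos examples down-weighted, neg examples up-weighted (~2x)

    # In training, each source example's loss is scaled by weights[label] so
    # the reweighted source distribution matches the target's P(Y).
    def weighted_loss(per_example_loss: float, label: str) -> float:
        return weights[label] * per_example_loss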
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation
Valentin Barriere, Alexandra Balahur
https://www.aclweb.org/anthology/2020.coling-main.23
Tweets are a specific kind of text data compared to general text. Although sentiment analysis over tweets has become very popular for English in the last decade, it is still difficult to find large annotated corpora for non-English languages. The recent rise of transformer models in Natural Language Processing makes it possible to achieve unparalleled performance in many tasks, but these models need a substantial quantity of text to adapt to the tweet domain. We propose the use of a multilingual transformer model that we pre-train on English tweets and adapt to non-English languages by applying data augmentation with automatic translation. Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
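A minimal version of the translation-based augmentation step, using an off-the-shelf MarianMT model as a stand-in for whichever MT system the authors used:

    from transformers import MarianMTModel, MarianTokenizer

    # Translate annotated English tweets into French to create synthetic
    # in-language training data (the choice of model here is ours).
    name = "Helsinki-NLP/opus-mt-en-fr"
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)

    tweets = ["this movie was great!", "worst service ever..."]
    batch = tokenizer(tweets, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    augmented = tokenizer.batch_decode(generated, skip_special_tokens=True)
    print(augmented)  # French tweets inheriting the English sentiment labels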
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Joint Aspect Extraction and Sentiment Analysis with Directional Graph Convolutional Networks
Guimin Chen, Yuanhe Tian, Yan Song
https://www.aclweb.org/anthology/2020.coling-main.24
End-to-end aspect-based sentiment analysis (EASA) consists of two sub-tasks: the first extracts the aspect terms in a sentence and the second predicts the sentiment polarities for such terms. For EASA, compared to pipeline and multi-task approaches, joint aspect extraction and sentiment analysis provides a one-step solution to predict both aspect terms and their sentiment polarities through a single decoding process, which avoids mismatches between the results of aspect terms and sentiment polarities, as well as error propagation. Previous studies, especially recent ones, for this task focus on using powerful encoders (e.g., Bi-LSTM and BERT) to model contextual information from the input, with limited effort paid to using advanced neural architectures (such as attention and graph convolutional networks) or leveraging extra knowledge (such as syntactic information). To extend such efforts, in this paper, we propose directional graph convolutional networks (D-GCN) to jointly perform aspect extraction and sentiment analysis while encoding syntactic information, where dependencies among words are integrated into our model to enhance its ability to represent input sentences and help EASA accordingly. Experimental results on three benchmark datasets demonstrate the effectiveness of our approach, where D-GCN achieves state-of-the-art performance on all datasets.
POSTER1: Sentiment and Emotion. Posters
Tuesday, December 8, 2020, 15:30
Train Once, and Decode As You Like
Chao Tian, Yifei Wang, Hao Cheng, Yijiang Lian, Zhihua Zhang
https://www.aclweb.org/anthology/2020.coling-main.25
In this paper we propose a unified approach for supporting different generation manners of machine translation, including autoregressive, semi-autoregressive, and refinement-based non-autoregressive models. Our approach works by repeatedly selecting positions and generating tokens at these selected positions. After being trained once, our approach achieves better or competitive translation performance compared with some strong task-specific baseline models in all the settings. This generalization ability benefits mainly from the new training objective that we propose. We validate our approach on the WMT’14 English-German and IWSLT’14 German-English translation tasks. The experimental results are encouraging.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
A Representation Learning Approach to Animal Biodiversity Conservation
Meet Mukadam, Mandhara Jayaram, Yongfeng Zhang
https://www.aclweb.org/anthology/2020.coling-main.26
Generating knowledge from natural language data has aided in solving many artificial intelligence problems. Vector representations of words have been the driving force behind the majority of natural language processing tasks. This paper develops a novel approach for predicting the conservation status of animal species using custom-generated scientific name embeddings. We use two different vector embeddings generated using representation learning on Wikipedia text and animal taxonomy data. We generate name embeddings for all species in the animal kingdom using unsupervised learning and build a model on the IUCN Red List dataset to classify species as endangered or least-concern. To our knowledge, this is the first work that makes use of learnt features instead of handcrafted features for this task, and it achieves competitive results. Based on the high-confidence results of our model, we also predict the conservation status of data-deficient species whose conservation status is still unknown, thus steering more attention towards them for protection. These embeddings have also been made publicly available. We believe this will greatly help in solving various downstream tasks and further advance research in the cross-domain area involving natural language processing, conservation biology, and life sciences.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Integrating External Event Knowledge for Script Learning
Shangwen Lv, Fuqing Zhu, Songlin Hu
https://www.aclweb.org/anthology/2020.coling-main.27
Script learning aims to predict the subsequent event given an existing event chain. Recent studies focus on event co-occurrence to solve this problem; however, few studies integrate external event knowledge. We observe that external event knowledge can provide additional information, such as temporal or causal knowledge, for understanding the event chain better and predicting the right subsequent event. In this work, we integrate event knowledge from the ASER (Activities, States, Events and their Relations) knowledge base to help predict the next event. We propose a new approach consisting of a knowledge retrieval stage and a knowledge integration stage. In the knowledge retrieval stage, we select relevant external event knowledge from ASER. In the knowledge integration stage, we propose three methods to integrate external knowledge into our model and infer final answers. Experiments on the widely-used Multiple Choice Narrative Cloze (MCNC) task show our approach achieves state-of-the-art performance compared to other methods.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Pointing to Subwords for Generating Function Names in Source Code
Shogo Fujita, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura
https://www.aclweb.org/anthology/2020.coling-main.28
We tackle the task of automatically generating a function name from source code. Existing generators face difficulties in generating low-frequency or out-of-vocabulary subwords. In this paper, we propose two strategies for copying low-frequency or out-of-vocabulary subwords in inputs. Our best performing model showed an improvement over the conventional method in terms of our modified F1 and accuracy on the Java-small and Java-large datasets.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Heterogeneous Graph Neural Networks to Predict What Happen Next
Jianming Zheng, Fei Cai, Yanxiang Ling, Honghui Chen
https://www.aclweb.org/anthology/2020.coling-main.29
Given an incomplete event chain, script learning aims to predict the missing event, which can support a series of NLP applications. Existing work cannot adequately represent the heterogeneous relations or capture the discontinuous event segments that are common in event chains. To address these issues, we introduce a heterogeneous-event (HeterEvent) graph network. In particular, we employ each unique word and each individual event as nodes in the graph, and explore three kinds of edges based on realistic relations (e.g., the relations of word-and-word, word-and-event, event-and-event). We also design a message passing process to realize information interactions among homogeneous or heterogeneous nodes. Discontinuous event segments can then be explicitly modeled by finding the specific path between corresponding nodes in the graph. The experimental results on one-step and multi-step inference tasks demonstrate that our ensemble model HeterEvent[W+E] can outperform existing baselines.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
CEREC: A Corpus for Entity Resolution in Email Conversations
Parag Pravin Dakle, Dan Moldovan
https://www.aclweb.org/anthology/2020.coling-main.30
We present the first large-scale corpus for entity resolution in email conversations (CEREC). The corpus consists of 6001 email threads from the Enron Email Corpus containing 36,448 email messages and 38,996 entity coreference chains. The annotation is carried out as a two-step process with minimal manual effort. Experiments evaluate different features and the performance of four baselines on the created corpus. For the task of mention identification and coreference resolution, a best performance of 54.1 F1 is reported, highlighting the room for improvement. An in-depth qualitative and quantitative error analysis is presented to understand the limitations of the baselines considered.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
SQL Generation via Machine Reading Comprehension
Zeyu Yan, Jianqiang Ma, Yang Zhang, Jianping Shen
https://www.aclweb.org/anthology/2020.coling-main.31
Text-to-SQL systems offer natural language interfaces to databases, automatically generating SQL queries given natural language questions. On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot-filling approach by building several specialized models for each type of slot. Despite being effective, such modularized systems are complex and also fall short in jointly learning for different slots. To solve these problems, this paper proposes a novel approach that formulates the task as a question answering problem, where different slots are predicted by a unified machine reading comprehension (MRC) model. For this purpose, we use a BERT-based MRC model, which can also benefit from intermediate training on other MRC datasets. The proposed method achieves competitive results on WikiSQL, suggesting that it is a promising direction for text-to-SQL.
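Hypothetical question templates that turn WikiSQL slot prediction into reading comprehension; the paper's exact prompts are not given in the abstract, so these are illustrative only:

    # One unified MRC model answers a slot-specific question per slot, pointing
    # at a span in the concatenation of the NL question and the table schema.
    question = "What is the population of Canada?"
    slot_questions = {
        "select_column": f"Which column should be returned for: {question}",
        "aggregator":    f"Which aggregation (none/max/min/count/sum/avg) for: {question}",
        "where_column":  f"Which column is constrained in: {question}",
        "where_value":   f"What value is the constraint compared to in: {question}",
    }
    for slot, q in slot_questions.items():
        print(slot, "->", q)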
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Towards Privacy by Design in Learner Corpora Research: A Case of On-the-fly Pseudonymization of Swedish Learner Essays
Elena Volodina, Yousuf Ali Mohammed, Sandra Derbring, Arild Matsson, Beata Megyesi
https://www.aclweb.org/anthology/2020.coling-main.32
This article reports on an ongoing project aiming at the automation of pseudonymization of learner essays. The process includes three steps: identification of personal information in an unstructured text, labeling for a category, and pseudonymization. We experiment with rule-based methods for the detection of 15 categories out of the suggested 19 (Megyesi et al., 2018) that we deem important and/or doable with automatic approaches. For the detection and labeling steps, we use resources covering personal names, geographic names, company and university names, and others. For the pseudonymization step, we replace the item with another item of the same type from the above-mentioned resources. Evaluation of the detection and labeling steps is carried out on a set of manually anonymized essays. The results are promising and show that 89% of the personal information can be successfully identified in learner data and annotated correctly, with an inter-annotator agreement of 86% measured as Fleiss kappa and Krippendorff’s alpha.
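A toy detect-label-replace pass in the spirit of the three-step pipeline above; the project's real resources (name lists covering 15 categories) are far richer than this sketch:

    import random
    import re

    FIRST_NAMES = {"Anna", "Erik", "Maria"}       # stand-in name resource
    REPLACEMENTS = ["Kim", "Alex", "Sam"]         # same-type substitutes

    def pseudonymize(text: str) -> str:
        def repl(match: re.Match) -> str:
            word = match.group(0)
            if word in FIRST_NAMES:               # steps 1-2: detect + label
                return random.choice(REPLACEMENTS)  # step 3: same-type swap
            return word
        # Capitalized-word candidates, including Swedish letters.
        return re.sub(r"\b[A-ZÅÄÖ][a-zåäö]+\b", repl, text)

    print(pseudonymize("Anna lives in Göteborg with Erik."))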
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
PG-GSQL: Pointer-Generator Network with Guide Decoding for Cross-Domain Context-Dependent Text-to-SQL Generation
Huajie Wang, Mei Li, Lei Chen
https://www.aclweb.org/anthology/2020.coling-main.33
Text-to-SQL is the task of translating utterances to SQL queries, and most existing neural approaches to text-to-SQL focus on the cross-domain context-independent generation task. We pay close attention to the cross-domain context-dependent text-to-SQL generation task, which requires a model to depend on both the interaction history and the current utterance to generate a SQL query. In this paper, we present an encoder-decoder model called PG-GSQL, based on an interaction-level encoder and with two effective innovations in the decoder, to solve the cross-domain context-dependent text-to-SQL task. 1) To effectively capture historical information of the SQL query and reuse previous SQL query tokens, we use a hybrid pointer-generator network as the decoder, copying tokens from the previous SQL query via the pointer while the generator part generates new tokens. 2) We propose a guide component that limits the prediction space of the vocabulary to avoid table-column dependency and foreign key dependency errors during the decoding phase. In addition, we design a column-table linking mechanism to improve the prediction accuracy of tables. On the challenging cross-domain context-dependent text-to-SQL benchmark SParC, PG-GSQL achieves 34.0% question matching accuracy and 19.0% interaction matching accuracy on the dev set. With BERT augmentation, PG-GSQL obtains 53.1% question matching accuracy and 34.7% interaction matching accuracy on the dev set, outperforming the previous state-of-the-art model by 5.9% question matching accuracy and 5.2% interaction matching accuracy. Our code is publicly available.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Neural Approaches for Natural Language Interfaces to Databases: A Survey
Radu Cristian Alexandru Iacob, Florin Brad, Elena-Simona Apostol, Ciprian-Octavian Truică, Ionel Alexandru Hosu, Traian Rebedea
https://www.aclweb.org/anthology/2020.coling-main.34
A natural language interface to databases (NLIDB) enables users without technical expertise to easily access information from relational databases. Interest in NLIDBs has resurged in the past years due to the availability of large datasets and improvements to neural sequence-to-sequence models. In this survey we focus on the key design decisions behind current state-of-the-art neural approaches, which we group into encoder and decoder improvements. We highlight the three most important directions, namely linking question tokens to database schema elements (schema linking), better architectures for encoding the textual query taking into account the schema (schema encoding), and improved generation of structured queries using autoregressive neural models (grammar-based decoders). To foster future research, we also present an overview of the most important NLIDB datasets, together with a comparison of the top-performing neural models and a short insight into recent non-deep-learning solutions.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Predicting Stance Change Using Modular Architectures
Aldo Porco, Dan Goldwasser
https://www.aclweb.org/anthology/2020.coling-main.35
The ability to change a person’s mind on a given issue depends both on the arguments they are presented with and on their underlying perspectives and biases on that issue. Predicting stance changes requires characterizing both aspects and the interaction between them, especially in realistic settings in which stance changes are very rare. In this paper, we suggest a modular learning approach, which decomposes the task into multiple modules, focusing on different aspects of the interaction between users, their beliefs, and the arguments they are exposed to. Our experiments show that our modular approach achieves significantly better results compared to the end-to-end approach using BERT over the same inputs.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Leveraging HTML in Free Text Web Named Entity Recognition
Colin Ashby, David Weir
https://www.aclweb.org/anthology/2020.coling-main.36
HTML tags are typically discarded in free text Named Entity Recognition from Web pages. We investigate whether these discarded tags might be used to improve NER performance. We compare Text+Tags sentences with their Text-Only equivalents, over five datasets, two free text segmentation granularities and two NER models. We find an increased F1 performance for Text+Tags of between 0.9% and 13.2% over all datasets, variants and models. This performance increase, over datasets of varying entity types, HTML density and construction quality, indicates our method is flexible and adaptable. These findings imply that a similar technique might be of use in other Web-aware NLP tasks, including the enrichment of deep language models.
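To picture the Text+Tags input format, here is a toy tokenizer of our own (not the paper's pipeline) that keeps HTML tags as tokens alongside the text:

    from html.parser import HTMLParser

    class TagKeepingTokenizer(HTMLParser):
        # Tokenize page text while keeping HTML tags as tokens, so an NER
        # model sees "Text+Tags" sequences rather than stripped text.
        def __init__(self):
            super().__init__()
            self.tokens = []
        def handle_starttag(self, tag, attrs):
            self.tokens.append(f"<{tag}>")
        def handle_endtag(self, tag):
            self.tokens.append(f"</{tag}>")
        def handle_data(self, data):
            self.tokens.extend(data.split())

    t = TagKeepingTokenizer()
    t.feed("<li>Barack Obama visited <b>Paris</b></li>")
    print(t.tokens)
    # ['<li>', 'Barack', 'Obama', 'visited', '<b>', 'Paris', '</b>', '</li>']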
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Multimodal Review Generation with Privacy and Fairness Awareness
Xuan-Son Vu, Thanh-Son Nguyen, Duc-Trong Le, Lili Jiang
https://www.aclweb.org/anthology/2020.coling-main.37
Users express their opinions towards entities (e.g., restaurants) via online reviews, which can be in diverse forms such as text, ratings, and images. Modeling reviews is advantageous for user behavior understanding which, in turn, supports various user-oriented tasks such as recommendation, sentiment analysis, and review generation. In this paper, we propose MG-PriFair, a multimodal neural-based framework, which generates personalized reviews with privacy and fairness awareness. Motivated by the fact that reviews might contain personal information and sentiment bias, we propose a novel differentially private (dp)-embedding model for training privacy-guaranteed embeddings and an evaluation approach for sentiment fairness in the food-review domain. Experiments on our novel review dataset show that MG-PriFair is capable of generating plausibly long reviews while controlling the amount of exploited user data and using the least sentiment-biased word embeddings. To the best of our knowledge, we are the first to bring user privacy and sentiment fairness into the review generation task. The dataset and source code are available at https://github.com/ReML-AI/MG-PriFair.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Generating Equation by Utilizing Operators : GEO model
Kyung Seo Ki, Donggeon Lee, Bugeun Kim, Gahgene Gweon
https://www.aclweb.org/anthology/2020.coling-main.38
Math word problem solving is an emerging research topic in Natural Language Processing. Recently, to address the math word problem-solving task, researchers have applied the encoder-decoder architecture, which is mainly used in machine translation tasks. The state-of-the-art neural models use hand-crafted features and are based on generation methods. In this paper, we propose the GEO (Generation of Equations by utilizing Operators) model, which does not use hand-crafted features and addresses two issues present in existing neural models: 1. missing domain-specific knowledge features and 2. losing encoder-level knowledge. To address the missing domain-specific feature issue, we designed two auxiliary tasks: operation group difference prediction and implicit pair prediction. To address the loss of encoder-level knowledge, we added an Operation Feature Feed Forward (OP3F) layer. Experimental results showed that the GEO model outperformed existing state-of-the-art models on two datasets, with 85.1% on MAWPS and 62.5% on DRAW-1K, and reached a comparable performance of 82.1% on the ALG514 dataset.
POSTER2: Applications. Posters
Tuesday, December 8, 2020, 16:00
Improving Abstractive Dialogue Summarization with Graph Structures and Topic Words
Lulu Zhao, Weiran Xu, Jun Guo
https://www.aclweb.org/anthology/2020.coling-main.39
Recently, the abstractive dialogue summarization task has begun to attract more attention. Since information flows are exchanged between at least two interlocutors and key elements about a certain event are often spanned across multiple utterances, it is necessary for researchers to explore the inherent relations and structures of dialogue contents. However, existing approaches often process the dialogue with sequence-based models, which struggle to capture long-distance inter-sentence relations. In this paper, we propose a Topic-word Guided Dialogue Graph Attention (TGDGA) network to model the dialogue as an interaction graph according to topic word information. A masked graph self-attention mechanism is used to integrate cross-sentence information flows and focus more on the related utterances, leading to better understanding of the dialogue. Moreover, topic word features are introduced to assist the decoding process. We evaluate our model on the SAMSum Corpus and Automobile Master Corpus. The experimental results show that our method outperforms most of the baselines.
LONG3: Dialogue 1
Tuesday, December 8, 2020, 16:30
Speaker-change Aware CRF for Dialogue Act Classification
Guokan Shang, Antoine Tixier, Michalis Vazirgiannis, Jean-Pierre Lorré
https://www.aclweb.org/anthology/2020.coling-main.40
Recent work in Dialogue Act (DA) classification approaches the task as a sequence labeling problem, using neural network models coupled with a Conditional Random Field (CRF) as the last layer. CRF models the conditional probability of the target DA label sequence given the input utterance sequence. However, the task involves another important input sequence, that of speakers, which is ignored by previous work. To address this limitation, this paper proposes a simple modification of the CRF layer that takes speaker-change into account. Experiments on the SwDA corpus show that our modified CRF layer outperforms the original one, with very wide margins for some DA labels. Further, visualizations demonstrate that our CRF layer can learn meaningful, sophisticated transition patterns between DA label pairs conditioned on speaker-change in an end-to-end way. Code is publicly available.
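The speaker-change conditioning can be pictured as selecting between two transition tables; this is a deliberate simplification of the proposed CRF layer, with hypothetical names:

    import torch

    # Two label-transition tables, selected per step on speaker change.
    n_labels = 4
    trans_same = torch.randn(n_labels, n_labels)    # scores, same speaker
    trans_change = torch.randn(n_labels, n_labels)  # scores, speaker changed

    def sequence_transition_score(labels, speaker_changed):
        # labels: DA label ids per utterance; speaker_changed[i] says whether
        # the speaker changed between utterance i-1 and utterance i.
        score = 0.0
        for i in range(1, len(labels)):
            table = trans_change if speaker_changed[i] else trans_same
            score += table[labels[i - 1], labels[i]].item()
        return score

    print(sequence_transition_score([0, 2, 2, 1], [False, True, False, True]))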
LONG3: Dialogue 1
Tuesday, December 8, 2020, 16:30
LAVA: Latent Action Spaces via Variational Auto-encoding for Dialogue Policy Optimization
Nurul Lubis, Christian Geishauser, Michael Heck, Hsien-chin Lin, Marco Moresi, Carel van Niekerk, Milica Gasic
https://www.aclweb.org/anthology/2020.coling-main.41
Reinforcement learning (RL) can enable task-oriented dialogue systems to steer the conversation towards successful task completion. In an end-to-end setting, a response can be constructed in a word-level sequential decision making process with the entire system vocabulary as action space. Policies trained in such a fashion do not require expert-defined action spaces, but they have to deal with large action spaces and long trajectories, making RL impractical. Using the latent space of a variational model as action space alleviates this problem. However, current approaches use an uninformed prior for training and optimize the latent distribution solely on the context. It is therefore unclear whether the latent representation truly encodes the characteristics of different actions. In this paper, we explore three ways of leveraging an auxiliary task to shape the latent variable distribution: via pre-training, to obtain an informed prior, and via multitask learning. We choose response auto-encoding as the auxiliary task, as this captures the generative factors of dialogue responses while requiring low computational cost and neither additional data nor labels. Our approach yields more action-characterized latent representations, which support end-to-end dialogue policy optimization and achieve state-of-the-art success rates. These results warrant a more widespread use of RL in end-to-end dialogue models.
LONG3: Dialogue 1
Tuesday, December 8, 2020, 16:30
Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey
Samuel Louvan, Bernardo Magnini
https://www.aclweb.org/anthology/2020.coling-main.42
In recent years, fostered by deep learning technologies and by the high demand for conversational AI, various approaches have been proposed that address the capacity to elicit and understand users’ needs in task-oriented dialogue systems. We focus on two core tasks, slot filling (SF) and intent classification (IC), and survey how neural-based models have rapidly evolved to address natural language understanding in dialogue systems. We introduce three neural architectures: independent models, which model SF and IC separately; joint models, which exploit the mutual benefit of the two tasks simultaneously; and transfer learning models, which scale the model to new domains. We discuss the current state of research in SF and IC, and highlight challenges that still require attention.
LONG3: Dialogue 1
Tuesday, December 8, 2020, 16:30
Re-framing Incremental Deep Language Models for Dialogue Processing with Multi-task Learning
Morteza Rohanian, Julian Hough
https://www.aclweb.org/anthology/2020.coling-main.43
We present a multi-task learning framework to enable the training of one universal incremental dialogue processing model with four tasks: disfluency detection, language modelling, part-of-speech tagging and utterance segmentation, in a simple deep recurrent setting. We show that these tasks provide positive inductive biases to each other, with the optimal contribution of each one depending on the severity of the noise from the task. Our live multi-task model outperforms similar individual tasks, delivers competitive performance and is beneficial for future use in conversational agents in psychiatric treatment.
LONG3: Dialogue 1
Tuesday, December 8, 2020, 16:30
AprilE: Attention with Pseudo Residual Connection for Knowledge Graph Embedding
Yuzhang Liu, Peng Wang, Yingtai Li, Yizhan Shao, Zhongkai Xu
https://www.aclweb.org/anthology/2020.coling-main.44
Knowledge graph embedding maps entities and relations into a low-dimensional vector space. However, it is still challenging for many existing methods to model diverse relational patterns, especially symmetric and antisymmetric relations. To address this issue, we propose a novel model, AprilE, which employs triple-level self-attention and a pseudo residual connection to model relational patterns. The triple-level self-attention treats head entity, relation, and tail entity as a sequence and captures the dependency within a triple. At the same time, the pseudo residual connection retains primitive semantic features. Furthermore, to deal with symmetric and antisymmetric relations, two schemas of score function are designed via a position-adaptive mechanism. Experimental results on public datasets demonstrate that our model can produce expressive knowledge embeddings and significantly outperforms most of the state-of-the-art works.
LONG4: Information Extraction 2
Tuesday, December 8, 2020, 16:30
Variational Autoencoder with Embedded Student-t Mixture Model for Authorship Attribution
Benedikt Boenninghoff, Steffen Zeiler, Robert Nickel, Dorothea Kolossa
https://www.aclweb.org/anthology/2020.coling-main.45
Traditional computational authorship attribution describes a classification task in a closed-set scenario. Given a finite set of candidate authors and corresponding labeled texts, the objective is to determine which of the authors has written another set of anonymous or disputed texts. In this work, we propose a probabilistic autoencoding framework to deal with this supervised classification task. Variational autoencoders (VAEs) have had tremendous success in learning latent representations. However, existing VAEs are currently still bound by limitations imposed by the assumed Gaussianity of the underlying probability distributions in the latent space. In this work, we are extending a VAE with an embedded Gaussian mixture model to a Student-t mixture model, which allows for an independent control of the “heaviness” of the respective tails of the implied probability densities. Experiments over an Amazon review dataset indicate superior performance of the proposed method.
LONG4: Information Extraction 2
Tuesday, December 8, 2020, 16:30
47
Knowledge Graph Embeddings in Geometric Algebras
Chengjin Xu, Mojtaba Nayyeri, Yung-Yu Chen, Jens Lehmann
https://www.aclweb.org/anthology/2020.coling-main.46
Knowledge graph (KG) embedding aims at embedding entities and relations in a KG into a low-dimensional latent representation space. Existing KG embedding approaches model entities and relations in a KG by utilizing real-valued, complex-valued, or hypercomplex-valued (Quaternion or Octonion) representations, all of which are subsumed into a geometric algebra. In this work, we introduce a novel geometric algebra-based KG embedding framework, GeomE, which utilizes multivector representations and the geometric product to model entities and relations. Our framework subsumes several state-of-the-art KG embedding approaches and is advantageous with its ability of modeling various key relation patterns, including (anti-)symmetry, inversion and composition, rich expressiveness with higher degree of freedom as well as good generalization capacity. Experimental results on multiple benchmark knowledge graphs show that the proposed approach outperforms existing state-of-the-art models for link prediction.
LONG4: Information Extraction 2
Tuesday, December 8, 2020, 16:30
48
Exploiting Node Content for Multiview Graph Convolutional Network and Adversarial Regularization
Qiuhao Lu, Nisansa de Silva, Dejing Dou, Thien Huu Nguyen, Prithviraj Sen, Berthold Reinwald, Yunyao Li
https://www.aclweb.org/anthology/2020.coling-main.47
Network representation learning (NRL) is crucial in the area of graph learning. Recently, graph autoencoders and their variants have gained much attention and popularity among various types of node embedding approaches. Most existing graph autoencoder-based methods aim to minimize the reconstruction errors of the input network while not explicitly considering the semantic relatedness between nodes. In this paper, we propose a novel network embedding method which models the consistency across different views of networks. More specifically, we create a second view from the input network which captures the relation between nodes based on node content, and enforce the latent representations from the two views to be consistent by incorporating a multiview adversarial regularization module. The experimental studies on benchmark datasets prove the effectiveness of this method, and demonstrate that our method compares favorably with the state-of-the-art algorithms on challenging tasks such as link prediction and node clustering. We also evaluate our method on a real-world application, i.e., 30-day unplanned ICU readmission prediction, and achieve promising results compared with several baseline methods.
LONG4: Information Extraction 2
Tuesday, December 8, 2020, 16:30
49
RatE: Relation-Adaptive Translating Embedding for Knowledge Graph Completion
Hao Huang, Guodong Long, Tao Shen, Jing Jiang, Chengqi Zhang
https://www.aclweb.org/anthology/2020.coling-main.48
Many graph embedding approaches have been proposed for knowledge graph completion via link prediction. Among those, translating embedding approaches enjoy the advantages of light-weight structure, high efficiency and great interpretability. Especially when extended to complex vector space, they show the capability of handling various relation patterns including symmetry, antisymmetry, inversion and composition. However, previous translating embedding approaches defined in complex vector space suffer from two main issues: 1) the representing and modeling capacities of the model are limited by the translation function with rigorous multiplication of two complex numbers; and 2) embedding ambiguity caused by one-to-many relations is not explicitly alleviated. In this paper, we propose a relation-adaptive translation function built upon a novel weighted product in complex space, where the weights are learnable, relation-specific and independent of the embedding size. The translation function only requires eight more scalar parameters per relation, but improves expressive power and alleviates the embedding ambiguity problem. Based on the function, we then present our Relation-adaptive translating Embedding (RatE) approach to score each graph triple. Moreover, a novel negative sampling method is proposed to utilize both prior knowledge and self-adversarial learning for effective optimization. Experiments verify that RatE achieves state-of-the-art performance on four link prediction benchmarks.
LONG4: Information Extraction 2
Tuesday, December 8, 2020, 16:30
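The "eight more scalar parameters per relation" above suggests one learnable weight per component product of the two complex factors (four products, each weighted separately in the real and imaginary outputs). A minimal NumPy sketch under that reading follows; the distance-based score is likewise an assumption, not the authors' code:

```python
import numpy as np

def rate_weighted_product(e_re, e_im, r_re, r_im, w):
    """Relation-adaptive product of two complex vectors. w holds the
    eight learnable scalars for this relation; the standard complex
    product is the special case w = [1, 0, 0, -1, 0, 1, 1, 0]."""
    ac, ad = e_re * r_re, e_re * r_im
    bc, bd = e_im * r_re, e_im * r_im
    out_re = w[0]*ac + w[1]*ad + w[2]*bc + w[3]*bd
    out_im = w[4]*ac + w[5]*ad + w[6]*bc + w[7]*bd
    return out_re, out_im

def score(h_re, h_im, r_re, r_im, t_re, t_im, w):
    """Translation-style score: negative distance between the
    relation-transformed head and the tail."""
    p_re, p_im = rate_weighted_product(h_re, h_im, r_re, r_im, w)
    return -np.linalg.norm(np.concatenate([p_re - t_re, p_im - t_im]))

d = 4
rng = np.random.default_rng(1)
h_re, h_im, r_re, r_im, t_re, t_im = rng.normal(size=(6, d))
w = np.array([1.0, 0, 0, -1, 0, 1, 1, 0])  # initialise at the plain complex product
print(score(h_re, h_im, r_re, r_im, t_re, t_im, w))
```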
50
SentiX: A Sentiment-Aware Pre-Trained Model for Cross-Domain Sentiment Analysis
Jie Zhou, Junfeng Tian, Rui Wang, Yuanbin Wu, Wenming Xiao, Liang He
https://www.aclweb.org/anthology/2020.coling-main.49
Pre-trained language models have been widely applied to cross-domain NLP tasks like sentiment analysis, achieving state-of-the-art performance. However, due to the variety of users’ emotional expressions across domains, fine-tuning the pre-trained models on the source domain tends to overfit, leading to inferior results on the target domain. In this paper, we pre-train a sentiment-aware language model (SentiX) via domain-invariant sentiment knowledge from large-scale review datasets, and utilize it for cross-domain sentiment analysis tasks without fine-tuning. We propose several pre-training tasks based on existing lexicons and annotations at both the token and sentence levels, such as emoticons, sentiment words, and ratings, without human interference. A series of experiments is conducted and the results indicate the great advantages of our model. We obtain new state-of-the-art results in all the cross-domain sentiment analysis tasks, and our proposed SentiX can be trained with only 1% of the samples (18 samples), achieving better performance than BERT trained with 90% of the samples.
LONG5: Sentiment Analysis 1
Tuesday, December 8, 2020, 16:30
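A toy sketch of one of the token-level pre-training tasks described above: masking sentiment-bearing tokens so the model must recover them from context. The lexicon and masking rates here are placeholders, not the paper's settings:

```python
import random

random.seed(0)
SENTIMENT_LEXICON = {"great", "terrible", "love", "awful", "amazing"}  # toy lexicon

def sentiment_aware_masking(tokens, mask_token="[MASK]", p_other=0.15):
    """Always mask sentiment-bearing tokens (so the model must
    reconstruct them from context); mask other tokens at a BERT-style
    base rate. Returns the masked sequence and per-token targets."""
    masked, labels = [], []
    for tok in tokens:
        if tok.lower() in SENTIMENT_LEXICON or random.random() < p_other:
            masked.append(mask_token)
            labels.append(tok)       # prediction target
        else:
            masked.append(tok)
            labels.append(None)      # not predicted
    return masked, labels

tokens = "the battery is great but the screen is terrible".split()
print(sentiment_aware_masking(tokens))
```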
51
Bayes-enhanced Lifelong Attention Networks for Sentiment Classification
Hao Wang, Shuai Wang, Sahisnu Mazumder, Bing Liu, Yan Yang, Tianrui Li
https://www.aclweb.org/anthology/2020.coling-main.50
The classic deep learning paradigm learns a model from the training data of a single task and the learned model is also tested on the same task. This paper studies the problem of learning a sequence of tasks (sentiment classification tasks in our case). After each sentiment classification task is learned, its knowledge is retained to help future task learning. Following this setting, we explore attention neural networks and propose a Bayes-enhanced Lifelong Attention Network (BLAN). The key idea is to exploit the generative parameters of naive Bayes to learn attention knowledge. The learned knowledge from each task is stored in a knowledge base and later used to build lifelong attentions. The constructed lifelong attentions are then used to enhance the attention of the network to help new task learning. Experimental results on product reviews from Amazon.com show the effectiveness of the proposed model.
LONG5: Sentiment Analysis 1
Tuesday, December 8, 2020, 16:30
52
Arabizi Language Models for Sentiment Analysis
Gaétan Baert, Souhir Gahbiche, Guillaume Gadek, Alexandre Pauchet
https://www.aclweb.org/anthology/2020.coling-main.51
Arabizi is a written form of spoken Arabic, relying on Latin characters and digits. It is informal and does not follow any conventional rules, raising many NLP challenges. In particular, Arabizi has recently emerged as the written form of Arabic in online social networks, becoming of great interest for opinion mining and sentiment analysis. Unfortunately, only a few Arabizi resources exist and state-of-the-art language models such as BERT do not consider Arabizi. In this work, we construct and release two datasets: (i) LAD, a corpus of 7.7M tweets written in Arabizi and (ii) SALAD, a subset of LAD, manually annotated for sentiment analysis. Then, a BERT architecture is pre-trained on LAD in order to create and distribute an Arabizi language model called BAERT. We show that a language model (BAERT) pre-trained on a large corpus (LAD) in the same language (Arabizi) as that of the fine-tuning dataset (SALAD) outperforms a state-of-the-art multilingual pretrained model (multilingual BERT) on a sentiment analysis task.
LONG5: Sentiment Analysis 1
Tuesday, December 8, 2020, 16:30
53
Author's Sentiment Prediction
Mohaddeseh Bastan, Mahnaz Koupaee, Youngseo Son, Richard Sicoli, Niranjan Balasubramanian
https://www.aclweb.org/anthology/2020.coling-main.52
Even though sentiment analysis has been well-studied on a wide range of domains, there hasn’t been much work on inferring author sentiment in news articles. To address this gap, we introduce PerSenT, a crowd-sourced dataset that captures the sentiment of an author towards the main entity in a news article. Our benchmarks of multiple strong baselines show that this is a difficult classification task. BERT performs the best amongst the baselines. However, it only achieves a modest performance overall, suggesting that fine-tuning document-level representations alone isn’t adequate for this task. Making paragraph-level decisions and aggregating over the entire document is also ineffective. We present empirical and qualitative analyses that illustrate the specific challenges posed by this dataset. We release this dataset with 5.3k documents and 38k paragraphs with 3.2k unique entities as a challenge in entity sentiment analysis.
LONG5: Sentiment Analysis 1
Tuesday, December 8, 2020, 16:30
54
Modeling Local Contexts for Joint Dialogue Act Recognition and Sentiment Classification with Bi-channel Dynamic Convolutions
Jingye Li, Hao Fei, Donghong Ji
https://www.aclweb.org/anthology/2020.coling-main.53
In this paper, we target improving the joint dialogue act recognition (DAR) and sentiment classification (SC) tasks by fully modeling the local contexts of utterances. First, we employ the dynamic convolution network (DCN) as the utterance encoder to capture the dialogue contexts. Further, we propose a novel context-aware dynamic convolution network (CDCN) to better leverage the local contexts when dynamically generating kernels. We extend our frameworks into bi-channel versions (i.e., BDCN and BCDCN) under multi-task learning to achieve joint DAR and SC. The two channels can learn their own feature representations for DAR and SC, respectively, but with latent interaction. Besides, we suggest enhancing the tasks by employing the DiaBERT language model. Our frameworks obtain state-of-the-art performances against all baselines on two benchmark datasets, demonstrating the importance of modeling the local contexts.
LONG5: Sentiment Analysis 1
Tuesday, December 8, 2020, 16:30
55
Named Entity Recognition for Chinese biomedical patents
Yuting Hu, Suzan Verberne
https://www.aclweb.org/anthology/2020.coling-main.54
There is a large body of work on Biomedical Entity Recognition (Bio-NER) for English, but there have only been a few attempts addressing NER for Chinese biomedical texts. Because of the growing amount of Chinese biomedical discoveries being patented, and the lack of NER models for patent data, we train and evaluate NER models for the analysis of Chinese biomedical patent data, based on BERT. By doing so, we show the value and potential of this domain-specific NER task. For the evaluation of our methods we built our own Chinese biomedical patent NER dataset, and our optimized model achieved an F1 score of 0.54±0.15. Further biomedical analysis indicates that our solution can help detect meaningful biomedical entities and novel gene-gene interactions, with limited labeled data, training time and computing power.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
56
Learning Health-Bots from Training Data that was Automatically Created using Paraphrase Detection and Expert Knowledge
Anna Liednikova, Philippe Jolivet, Alexandre Durand-Salmon, Claire Gardent
https://www.aclweb.org/anthology/2020.coling-main.55
A key bottleneck for developing dialog models is the lack of adequate training data. Due to privacy issues, dialog data is even scarcer in the health domain. We propose a novel method for creating dialog corpora which we apply to create doctor-patient interaction data. We use this data to learn both a generation and a hybrid classification/retrieval model and find that the generation model consistently outperforms the hybrid model. We show that our data creation method has several advantages. Not only does it allow for the semi-automatic creation of large quantities of training data, it also provides a natural way of guiding learning and a novel method for assessing the quality of human-machine interactions.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
57
A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Tuan Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran
https://www.aclweb.org/anthology/2020.coling-main.56
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task typically have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such kind of annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task, achieving new state-of-the-art results on two public benchmarks: Inspec and SemEval-2017.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
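A generic self-distillation training step in the spirit of the abstract above, sketched in PyTorch: the model is trained jointly on labeled documents and on its own soft predictions over unlabeled documents. Using a detached pass of the same network as the teacher, and the mixing weight `alpha`, are illustrative choices; the paper's exact recipe may differ:

```python
import torch
import torch.nn.functional as F

def self_distillation_step(model, x_lab, y_lab, x_unlab, optimizer,
                           temperature=2.0, alpha=0.5):
    """One joint step: supervised loss on author-labeled articles plus a
    distillation loss on unlabeled articles, where a detached forward
    pass of the same network provides soft teacher targets."""
    model.train()
    optimizer.zero_grad()
    sup_loss = F.cross_entropy(model(x_lab), y_lab)
    with torch.no_grad():                      # teacher pass, no gradients
        teacher = F.softmax(model(x_unlab) / temperature, dim=-1)
    student = F.log_softmax(model(x_unlab) / temperature, dim=-1)
    distill_loss = F.kl_div(student, teacher, reduction="batchmean") * temperature**2
    loss = alpha * sup_loss + (1 - alpha) * distill_loss
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with a linear classifier standing in for the keyphrase model.
model = torch.nn.Linear(16, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_lab, y_lab = torch.randn(4, 16), torch.randint(0, 2, (4,))
x_unlab = torch.randn(8, 16)
print(self_distillation_step(model, x_lab, y_lab, x_unlab, opt))
```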
58
Enhancing Clinical BERT Embedding using a Biomedical Knowledge Base
Boran Hao, Henghui Zhu, Ioannis Paschalidis
https://www.aclweb.org/anthology/2020.coling-main.57
Domain knowledge is important for building Natural Language Processing (NLP) systems for low-resource settings, such as in the clinical domain. In this paper, a novel joint training method is introduced for adding knowledge base information from the Unified Medical Language System (UMLS) into language model pre-training for a clinical domain corpus. We show that in three different downstream clinical NLP tasks, our pre-trained language model outperforms the corresponding model with no knowledge base information and other state-of-the-art models. Specifically, in a natural language inference task applied to clinical texts, our knowledge base pre-training approach improves accuracy by up to 1.7%, whereas in clinical named entity recognition tasks, the F1-score improves by up to 1.0%. The pre-trained models are available at https://github.com/noc-lab/clinical-kb-bert.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
59
TIMBERT: Toponym Identifier For The Medical Domain Based on BERT
MohammadReza Davari, Leila Kosseim, Tien Bui
https://www.aclweb.org/anthology/2020.coling-main.58
In this paper, we propose an approach to automate the process of place name detection in the medical domain to enable epidemiologists to better study and model the spread of viruses. We created a family of Toponym Identification Models based on BERT (TIMBERT), in order to learn in an end-to-end fashion the mapping from an input sentence to the associated sentence labeled with toponyms. When evaluated with the SemEval 2019 task 12 test set (Weissenbacher et al., 2019), our best TIMBERT model achieves an F1 score of 90.85%, a significant improvement compared to the state-of-the-art of 89.13% (Wang et al., 2019).
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
60
BioMedBERT: A Pre-trained Biomedical Language Model for QA and IR
Souradip Chakraborty, Ekaba Bisong, Shweta Bhatt, Thomas Wagner, Riley Elliott, Francesco Mosconi
https://www.aclweb.org/anthology/2020.coling-main.59
The SARS-CoV-2 (COVID-19) pandemic spotlighted the importance of moving quickly with biomedical research. However, as the number of biomedical research papers continues to increase, the task of finding relevant articles to answer pressing questions has become a significant challenge. In this work, we propose a textual data mining tool that supports literature search to accelerate the work of researchers in the biomedical domain. We achieve this by building a neural-based deep contextual understanding model for Question-Answering (QA) and Information Retrieval (IR) tasks. We also leverage the new BREATHE dataset, which is one of the largest available datasets of biomedical research literature, containing abstracts and full-text articles from ten different biomedical literature sources, on which we pre-train our BioMedBERT model. Our work achieves state-of-the-art results on the QA fine-tuning task on the BioASQ 5b, 6b and 7b datasets. In addition, we observe superior relevant results when BioMedBERT embeddings are used with Elasticsearch for the Information Retrieval task on the intelligently formulated BioASQ dataset. We believe our diverse dataset and our unique model architecture are what led us to achieve the state-of-the-art results for QA and IR tasks.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
61
Extracting Adherence Information from Electronic Health Records
Jordan Sanders, Meghana Gudala, Kathleen Hamilton, Nishtha Prasad, Jordan Stovall, Eduardo Blanco, Jane E Hamilton, Kirk Roberts
https://www.aclweb.org/anthology/2020.coling-main.60
Patient adherence is a critical factor in health outcomes. We present a framework to extract adherence information from electronic health records, including both sentence-level information indicating general adherence information (full, partial, none, etc.) and span-level information providing additional information such as adherence type (medication or nonmedication), reasons and outcomes. We annotate and make publicly available a new corpus of 3,000 de-identified sentences, and discuss the language physicians use to document adherence information. We also explore models based on state-of-the-art transformers to automate both tasks.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
62
Identifying Depressive Symptoms from Tweets: Figurative Language Enabled Multitask Learning Framework
Shweta Yadav, Jainish Chauhan, Joy Prakash Sain, Krishnaprasad Thirunarayan, Amit Sheth, Jeremiah Schumm
https://www.aclweb.org/anthology/2020.coling-main.61
Existing studies on using social media for deriving the mental health status of users focus on the depression detection task. However, for case management and referral to psychiatrists, health-care workers require a practical and scalable depressive disorder screening and triage system. This study aims to design and evaluate a decision support system (DSS) to reliably determine the depressive triage level by capturing fine-grained depressive symptoms expressed in user tweets through the emulation of the Patient Health Questionnaire-9 (PHQ-9) that is routinely used in clinical practice. The reliable detection of depressive symptoms from tweets is challenging because the 280-character limit on tweets incentivizes the use of creative artifacts in the utterances, and figurative usage contributes to effective expression. We propose a novel BERT-based robust multi-task learning framework to accurately identify the depressive symptoms using the auxiliary task of figurative usage detection. Specifically, our proposed novel task sharing mechanism, co-task aware attention, enables automatic selection of optimal information across the BERT layers and tasks by soft-sharing of parameters. Our results show that modeling figurative usage can demonstrably improve the model’s robustness and reliability for distinguishing the depression symptoms.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
63
French Biomedical Text Simplification: When Small and Precise Helps
Rémi Cardon, Natalia Grabar
https://www.aclweb.org/anthology/2020.coling-main.62
We present experiments on biomedical text simplification in French. We use two kinds of corpora – parallel sentences extracted from existing health comparable corpora in French and the WikiLarge corpus translated from English to French – and a lexicon that associates medical terms with paraphrases. Then, we train neural models on these parallel corpora using different ratios of general and specialized sentences. We evaluate the results with BLEU, SARI and Kandel scores. The results indicate that even a small amount of specialized data significantly helps the simplification.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
64
Summarizing Medical Conversations via Identifying Important Utterances
Yan Song, Yuanhe Tian, Nan Wang, Fei Xia
https://www.aclweb.org/anthology/2020.coling-main.63
Summarization is an important natural language processing (NLP) task in identifying key information from text. For conversations, summarization systems need to extract salient contents from spontaneous utterances by multiple speakers. In a special task-oriented scenario, namely medical conversations between patients and doctors, the symptoms, diagnoses, and treatments can be highly important because the nature of such a conversation is to find a medical solution to the problem proposed by the patients. Especially considering that current online medical platforms provide millions of publicly available conversations between real patients and doctors, where the patients propose their medical problems and the registered doctors offer diagnosis and treatment, a conversation in most cases can be too long and its key information hard to locate. Therefore, summaries of the patients’ problems and the doctors’ treatments in the conversations can be highly useful in helping other patients with similar problems find a precise reference for potential medical solutions. In this paper, we focus on medical conversation summarization, using a dataset of medical conversations and corresponding summaries which were crawled from a well-known online healthcare service provider in China. We propose a hierarchical encoder-tagger model (HET) to generate summaries by identifying important utterances (with respect to problem proposing and solving) in the conversations. For the particular dataset used in this study, we show that high-quality summaries can be generated by extracting two types of utterances, namely, problem statements and treatment recommendations. Experimental results demonstrate that HET outperforms strong baselines and models from previous studies, and adding conversation-related features can further improve system performance.
POSTER3: Applications: Biomedical, health records and medical texts. Posters
Tuesday, December 8, 2020, 16:30
65
Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case
Adam Dahlgren Lindström, Johanna Björklund, Suna Bensch, Frank Drewes
https://www.aclweb.org/anthology/2020.coling-main.64
Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic embeddings comes from the distillation and enrichment of information through machine learning, their inner workings are poorly understood and there is a shortage of analysis tools. To address this problem, we generalize the notion of probing tasks to the visual-semantic case. To this end, we (i) discuss the formalization of probing tasks for embeddings of image-caption pairs, (ii) define three concrete probing tasks within our general framework, (iii) train classifiers to probe for those properties, and (iv) compare various state-of-the-art embeddings under the lens of the proposed probing tasks. Our experiments reveal an up to 16% increase in accuracy on visual-semantic embeddings compared to the corresponding unimodal embeddings, which suggests that the text and image dimensions represented in the former do complement each other.
LONG6: Language Modelling 2
Tuesday, December 8, 2020, 17:00
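A minimal sketch of the probing methodology described above: a linear classifier fit on frozen embeddings, with held-out accuracy as the measure of how linearly accessible a property is. The synthetic data stands in for real image-caption embeddings and linguistic labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe(embeddings, labels):
    """Fit a linear probe on frozen (visual-)semantic embeddings and
    report held-out accuracy for the property encoded in `labels`."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, labels, test_size=0.2, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# Toy stand-ins for caption or image-caption embeddings and a binary
# linguistic property (e.g., "caption mentions a number").
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))
labels = (emb[:, 0] > 0).astype(int)   # synthetic, linearly decodable property
print(f"probe accuracy: {probe(emb, labels):.2f}")
```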
66
Linguistic Profiling of a Neural Language Model
Alessio Miaschi, Dominique Brunato, Felice Dell’Orletta, Giulia Venturi
https://www.aclweb.org/anthology/2020.coling-main.65
In this paper we investigate the linguistic knowledge learned by a Neural Language Model (NLM) before and after a fine-tuning process and how this knowledge affects its predictions during several classification problems. We use a wide set of probing tasks, each of which corresponds to a distinct sentence-level feature extracted from different levels of linguistic annotation. We show that BERT is able to encode a wide range of linguistic characteristics, but it tends to lose this information when trained on specific downstream tasks. We also find that BERT’s capacity to encode different kinds of linguistic properties has a positive influence on its predictions: the more readable linguistic information it stores about a sentence, the higher its capacity to predict the expected label assigned to that sentence.
LONG6: Language Modelling 2
Tuesday, December 8, 2020, 17:00
67
IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP
Fajri Koto, Afshin Rahimi, Jey Han Lau, Timothy Baldwin
https://www.aclweb.org/anthology/2020.coling-main.66
Although the Indonesian language is spoken by almost 200 million people and is the 10th most spoken language in the world, it is under-represented in NLP research. Previous work on Indonesian has been hampered by a lack of annotated datasets, a sparsity of language resources, and a lack of resource standardization. In this work, we release the IndoLEM dataset comprising seven tasks for the Indonesian language, spanning morpho-syntax, semantics, and discourse. We additionally release IndoBERT, a new pre-trained language model for Indonesian, and evaluate it over IndoLEM, in addition to benchmarking it against existing resources. Our experiments show that IndoBERT achieves state-of-the-art performance over most of the tasks in IndoLEM.
LONG6: Language Modelling 2
Tuesday, December 8, 2020, 17:00
68
A Closer Look at Linguistic Knowledge in Masked Language Models: The Case of Relative Clauses in American English
Marius Mosbach, Stefania Degaetano-Ortlieb, Marie-Pauline Krielke, Badr M. Abdullah, Dietrich Klakow
https://www.aclweb.org/anthology/2020.coling-main.67
Transformer-based language models achieve high performance on various tasks, but we still lack understanding of the kind of linguistic knowledge they learn and rely on. We evaluate three models (BERT, RoBERTa, and ALBERT), testing their grammatical and semantic knowledge by sentence-level probing, diagnostic cases, and masked prediction tasks. We focus on relative clauses (in American English) as a complex phenomenon needing contextual information and antecedent identification to be resolved. Based on a naturalistic dataset, probing shows that all three models indeed capture linguistic knowledge about grammaticality, achieving high performance. Evaluation on diagnostic cases and masked prediction tasks considering fine-grained linguistic knowledge, however, shows pronounced model-specific weaknesses, especially on semantic knowledge, strongly impacting models’ performance. Our results highlight the importance of (a) model comparison in evaluation tasks and (b) building claims about model performance, and about the linguistic knowledge models capture, on more than purely probing-based evaluations.
LONG6: Language Modelling 2
Tuesday, December 8, 2020, 17:00
69
Modeling language evolution and feature dynamics in a realistic geographic environment
Rhea Kapur, Phillip Rogers
https://www.aclweb.org/anthology/2020.coling-main.68
Recent, innovative efforts to understand the uneven distribution of languages and linguistic feature values in time and space attest to both the challenge these issues pose and the value in solving them. In this paper, we introduce a model for simulating languages and their features over time in a realistic geographic environment. At its core is a model of language phylogeny and migration whose parameters are chosen to reproduce known language family sizes and geographic dispersions. This foundation in turn is used to explore the dynamics of linguistic features. Languages are assigned feature values that can change randomly or under the influence of nearby languages according to predetermined probabilities. We assess the effects of these settings on resulting geographic and genealogical patterns using homogeneity measures defined in the literature. The resulting model is both flexible and realistic, and it can be employed to answer a wide range of related questions.
LONG6: Language Modelling 2
Tuesday, December 8, 2020, 17:00
70
Syntax-Aware Graph Attention Network for Aspect-Level Sentiment Classification
Lianzhe Huang, Xin Sun, Sujian Li, Linhao Zhang, Houfeng Wang
https://www.aclweb.org/anthology/2020.coling-main.69
Aspect-level sentiment classification aims to distinguish the sentiment polarities over aspect terms in a sentence. Existing approaches mostly focus on modeling the relationship between the given aspect words and their contexts with attention, and ignore the use of more elaborate knowledge implicit in the context. In this paper, we add syntactic awareness to the model via a graph attention network over the dependency tree structure, and external pre-training knowledge via the BERT language model, which helps to better model the interaction between the context and aspect words. The subwords of BERT are integrated into the dependency tree graphs, which can obtain more accurate representations of words by graph attention. Experiments demonstrate the effectiveness of our model.
LONG7: Sentiment Analysis 2
Tuesday, December 8, 2020, 17:00
71
Attention Transfer Network for Aspect-level Sentiment Classification
Fei Zhao, Zhen Wu, Xinyu Dai
https://www.aclweb.org/anthology/2020.coling-main.70
Aspect-level sentiment classification (ASC) aims to detect the sentiment polarity of a given opinion target in a sentence. In neural network-based methods for ASC, most works employ the attention mechanism to capture the corresponding sentiment words of the opinion target, then aggregate them as evidence to infer the sentiment of the target. However, aspect-level datasets are all relatively small-scale due to the complexity of annotation. Data scarcity sometimes causes the attention mechanism to fail to focus on the corresponding sentiment words of the target, which finally weakens the performance of neural models. To address the issue, we propose a novel Attention Transfer Network (ATN) in this paper, which can successfully exploit attention knowledge from resource-rich document-level sentiment classification datasets to improve the attention capability of the aspect-level sentiment classification task. In the ATN model, we design two different methods to transfer attention knowledge and conduct experiments on two ASC benchmark datasets. Extensive experimental results show that our methods consistently outperform state-of-the-art works. Further analysis also validates the effectiveness of ATN.
LONG7: Sentiment Analysis 2
Tuesday, December 8, 2020, 17:00
72
Label Correction Model for Aspect-based Sentiment Analysis
Qianlong Wang, Jiangtao Ren
https://www.aclweb.org/anthology/2020.coling-main.71
Aspect-based sentiment analysis includes opinion aspect extraction and aspect sentiment classification. Researchers have attempted to discover the relationship between these two sub-tasks and have proposed joint models for solving aspect-based sentiment analysis. However, they ignore a phenomenon: the aspect boundary label and sentiment label of the same word can correct each other. To exploit this phenomenon, we propose a novel deep learning model named the label correction model. Specifically, given an input sentence, our model first predicts the aspect boundary label sequence and sentiment label sequence, then re-predicts the aspect boundary (sentiment) label sequence using the embeddings of the previously predicted sentiment (aspect boundary) label. The goal of the re-prediction operation (which can be repeated multiple times) is to use the information of the sentiment (aspect boundary) label to correct a wrong aspect boundary (sentiment) label. Moreover, we explore two ways of using label embeddings: an add mechanism and a gate mechanism. We evaluate our model on three benchmark datasets. Experimental results verify that our model achieves state-of-the-art performance compared with several baselines.
LONG7: Sentiment Analysis 2
Tuesday, December 8, 2020, 17:00
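The correction loop above is mechanistic enough to sketch. The hypothetical PyTorch module below illustrates the "add" way of injecting label embeddings, with hard argmax predictions for brevity; real training would use soft predictions and per-round losses, and this is not the authors' code:

```python
import torch
import torch.nn as nn

class LabelCorrection(nn.Module):
    """Sketch: predict aspect-boundary and sentiment label sequences,
    then re-predict each conditioned on embeddings of the other's
    previous predictions (the 'add' variant of using label embeddings)."""
    def __init__(self, d=64, n_bound=3, n_sent=4):
        super().__init__()
        self.bound_emb = nn.Embedding(n_bound, d)
        self.sent_emb = nn.Embedding(n_sent, d)
        self.bound_head = nn.Linear(d, n_bound)
        self.sent_head = nn.Linear(d, n_sent)

    def forward(self, token_states, rounds=2):    # (batch, seq, d)
        bound = self.bound_head(token_states).argmax(-1)
        sent = self.sent_head(token_states).argmax(-1)
        for _ in range(rounds):                   # re-prediction, repeatable
            bound = self.bound_head(token_states + self.sent_emb(sent)).argmax(-1)
            sent = self.sent_head(token_states + self.bound_emb(bound)).argmax(-1)
        return bound, sent

model = LabelCorrection()
x = torch.randn(2, 7, 64)                         # toy encoder outputs
b, s = model(x)
print(b.shape, s.shape)                           # (2, 7) label sequences
```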
73
Aspect-Category based Sentiment Analysis with Hierarchical Graph Convolutional Network
Hongjie Cai, Yaofeng Tu, Xiangsheng Zhou, Jianfei Yu, Rui Xia
https://www.aclweb.org/anthology/2020.coling-main.72
Most aspect-based sentiment analysis research aims at identifying the sentiment polarities toward explicit aspect terms while ignoring implicit aspects in text. To capture both explicit and implicit aspects, we focus on aspect-category based sentiment analysis, which involves joint aspect category detection and category-oriented sentiment classification. However, currently only a few simple studies have focused on this problem. The shortcomings in the way they defined the task make it difficult for their approaches to effectively learn the inner-relations between categories and the inter-relations between categories and sentiments. In this work, we re-formalize the task as a category-sentiment hierarchy prediction problem, which contains a hierarchical output structure to first identify multiple aspect categories in a piece of text, and then predict the sentiment for each of the identified categories. Specifically, we propose a Hierarchical Graph Convolutional Network (Hier-GCN), where a lower-level GCN models the inner-relations among multiple categories, and a higher-level GCN captures the inter-relations between aspect categories and sentiments. Extensive evaluations demonstrate that our hierarchical output structure is superior over existing ones, and the Hier-GCN model can consistently achieve the best results on four benchmarks.
LONG7: Sentiment Analysis 2
Tuesday, December 8, 2020, 17:00
74
Constituency Lattice Encoding for Aspect Term Extraction
Yunyi Yang, Kun Li, Xiaojun Quan, Weizhou Shen, Qinliang Su
https://www.aclweb.org/anthology/2020.coling-main.73
One of the remaining challenges for aspect term extraction in sentiment analysis resides in the extraction of phrase-level aspect terms, for which it is non-trivial to determine the boundaries. In this paper, we aim to address this issue by incorporating the span annotations of constituents of a sentence to leverage the syntactic information in neural network models. To this end, we first construct a constituency lattice structure based on the constituents of a constituency tree. Then, we present two approaches to encoding the constituency lattice using BiLSTM-CRF and BERT as the base models, respectively. We experimented on two benchmark datasets to evaluate the two models, and the results confirm their superiority, with respective gains of 3.17 and 1.35 points in F1-measure over the current state of the art. The improvements justify the effectiveness of the constituency lattice for aspect term extraction.
LONG7: Sentiment Analysis 2
Tuesday, December 8, 2020, 17:00
75
A Corpus for Argumentative Writing Support in German
Thiemo Wambsganss, Christina Niklaus, Matthias Söllner, Siegfried Handschuh, Jan Marco Leimeister
https://www.aclweb.org/anthology/2020.coling-main.74
In this paper, we present a novel annotation approach to capture claims and premises of arguments and their relations in student-written persuasive peer reviews on business models in German. We propose an annotation scheme, based on annotation guidelines, that allows modeling claims and premises as well as support and attack relations for capturing the structure of argumentative discourse in student-written peer reviews. We conduct an annotation study with three annotators on 50 persuasive essays to evaluate our annotation scheme. The obtained inter-rater agreement of α = 0.57 for argument components and α = 0.49 for argumentative relations indicates that the proposed annotation scheme successfully guides annotators to moderate agreement. Finally, we present our freely available corpus of 1,000 persuasive student-written peer reviews on business models and our annotation guidelines to encourage future research on the design and development of argumentative writing support systems for students.
LONG8: Applications 1
Tuesday, December 8, 2020, 17:00
76
Do Word Embeddings Capture Spelling Variation?
Dong Nguyen, Jack Grieve
https://www.aclweb.org/anthology/2020.coling-main.75
Analyses of word embeddings have primarily focused on semantic and syntactic properties. However, word embeddings have the potential to encode other properties as well. In this paper, we propose a new perspective on the analysis of word embeddings by focusing on spelling variation. In social media, spelling variation is abundant and often socially meaningful. Here, we analyze word embeddings trained on Twitter and Reddit data. We present three analyses using pairs of word forms covering seven types of spelling variation in English. Taken together, our results show that word embeddings encode spelling variation patterns of various types to some extent, even embeddings trained using the skipgram model which does not take spelling into account. Our results also suggest a link between the intentionality of the variation and the distance of the non-conventional spellings to their conventional spellings.
LONG8: Applications 1
Tuesday, December 8, 2020, 17:00
77
Don't take “nswvtnvakgxpm” for an answer – The surprising vulnerability of automatic content scoring systems to adversarial input
Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill, Torsten Zesch
https://www.aclweb.org/anthology/2020.coling-main.76
Automatic content scoring systems are widely used on short answer tasks to save human effort. However, the use of these systems can invite cheating strategies, such as students writing irrelevant answers in the hopes of gaining at least partial credit. We generate adversarial answers for benchmark content scoring datasets based on different methods of increasing sophistication and show that even simple methods lead to a surprising decrease in content scoring performance. As an extreme example, up to 60% of adversarial answers generated from random shuffling of words in real answers are accepted by a state-of-the-art scoring system. In addition to analyzing the vulnerabilities of content scoring systems, we examine countermeasures such as adversarial training and show that these measures improve system robustness against adversarial answers considerably but do not suffice to completely solve the problem.
LONG8: Applications 1
Tuesday, December 8, 2020, 17:00
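The simplest of the adversarial strategies described above, random word shuffling, can be reproduced in a few lines; such strings are then submitted to the scorer to test whether it is sensitive to word order at all:

```python
import random

def shuffle_adversarial(answer, seed=None):
    """Build the simplest adversarial answer in the paper's spirit:
    a random permutation of the words of a real (scored) answer."""
    rng = random.Random(seed)
    words = answer.split()
    rng.shuffle(words)
    return " ".join(words)

real = "the plant grows faster because it receives more sunlight"
print(shuffle_adversarial(real, seed=42))
# Feed such strings to the scoring system; accepted shuffles expose
# insensitivity to word order.
```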
78
Automated Prediction of Examinee Proficiency from Short-Answer Questions
Le An Ha, Victoria Yaneva, Polina Harik, Ravi Pandian, Amy Morales, Brian Clauser
https://www.aclweb.org/anthology/2020.coling-main.77
This paper brings together approaches from the fields of NLP and psychometric measurement to address the problem of predicting examinee proficiency from responses to short-answer questions (SAQs). While previous approaches train on manually labeled data to predict the human-ratings assigned to SAQ responses, the approach presented here models examinee proficiency directly and does not require manually labeled data to train on. We use data from a large medical exam where experimental SAQ items are embedded alongside 106 scored multiple-choice questions (MCQs). First, the latent trait of examinee proficiency is measured using the scored MCQs and then a model is trained on the experimental SAQ responses as input, aiming to predict proficiency as its target variable. The predicted value is then used as a “score” for the SAQ response and evaluated in terms of its contribution to the precision of proficiency estimation.
LONG8: Applications 1
Tuesday, December 8, 2020, 17:00
79
Exploring Cross-sentence Contexts for Named Entity Recognition with BERT
Jouni Luoma, Sampo Pyysalo
https://www.aclweb.org/anthology/2020.coling-main.78
Named entity recognition (NER) is frequently addressed as a sequence classification task with each input consisting of one sentence of text. It is nevertheless clear that useful information for NER is often found also elsewhere in text. Recent self-attention models like BERT can both capture long-distance relationships in input and represent inputs consisting of several sentences. This creates opportunities for adding cross-sentence information in natural language processing tasks. This paper presents a systematic study exploring the use of cross-sentence information for NER using BERT models in five languages. We find that adding context as additional sentences to BERT input systematically increases NER performance. Multiple sentences in input samples allow us to study the predictions of the sentences in different contexts. We propose a straightforward method, Contextual Majority Voting (CMV), to combine these different predictions and demonstrate this to further increase NER performance. Evaluation on established datasets, including the CoNLL’02 and CoNLL’03 NER benchmarks, demonstrates that our proposed approach can improve on the state-of-the-art NER results for English, Dutch, and Finnish, achieves the best reported BERT-based results on German, and is on par with other BERT-based approaches in Spanish. We release all methods implemented in this work under open licenses.
LONG8: Applications 1
Tuesday, December 8, 2020, 17:00
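A sketch of Contextual Majority Voting as described above: the same sentence is classified in several context windows (first, middle, last position) and the per-token predictions are combined by majority vote. Tie-breaking and any probability weighting in the paper may differ from this toy version:

```python
from collections import Counter

def contextual_majority_vote(predictions_per_context):
    """Combine per-token NER predictions of the same sentence obtained
    under different surrounding contexts by majority vote.
    predictions_per_context: list of label sequences, one per context,
    all aligned to the same tokens."""
    n_tokens = len(predictions_per_context[0])
    voted = []
    for i in range(n_tokens):
        votes = Counter(seq[i] for seq in predictions_per_context)
        voted.append(votes.most_common(1)[0][0])
    return voted

preds = [
    ["B-PER", "O", "O", "B-LOC"],   # sentence first in the window
    ["B-PER", "O", "O", "B-ORG"],   # sentence in the middle
    ["B-PER", "O", "O", "B-LOC"],   # sentence last
]
print(contextual_majority_vote(preds))  # ['B-PER', 'O', 'O', 'B-LOC']
```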
80
Cross-lingual Annotation Projection in Legal Texts
Andrea Galassi, Kasper Drazewski, Marco Lippi, Paolo Torroni
https://www.aclweb.org/anthology/2020.coling-main.79
We study annotation projection in text classification problems where source documents are published in multiple languages and may not be an exact translation of one another. In particular, we focus on the detection of unfair clauses in privacy policies and terms of service. We present the first English-German parallel asymmetric corpus for the task at hand. We study and compare several language-agnostic sentence-level projection methods. Our results indicate that a combination of word embeddings and dynamic time warping performs best.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
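A minimal dynamic-time-warping alignment over a sentence-similarity matrix (e.g., cosine similarity of averaged word embeddings), in the spirit of the best-performing projection method above. This is a generic DTW variant maximising total similarity, not necessarily the authors' exact formulation:

```python
import numpy as np

def dtw_align(sim):
    """DTW over a sentence-to-sentence similarity matrix; returns the
    monotonic alignment path with maximal total similarity."""
    n, m = sim.shape
    cost = np.full((n + 1, m + 1), -np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = sim[i - 1, j - 1] + max(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrace from the final cell to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        i, j = max([(i - 1, j - 1), (i - 1, j), (i, j - 1)],
                   key=lambda p: cost[p])
    return path[::-1]

rng = np.random.default_rng(0)
sim = rng.random((5, 6))  # toy similarities between 5 and 6 sentences
print(dtw_align(sim))
```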
81
Deep Learning Framework for Measuring the Digital Strategy of Companies from Earnings Calls
Ahmed Ghanim Al-Ali, Robert Phaal, Donald Sull
https://www.aclweb.org/anthology/2020.coling-main.80
Companies today are racing to leverage the latest digital technologies, such as artificial intelligence, blockchain, and cloud computing. However, many companies report that their strategies did not achieve the anticipated business results. This study is the first to apply state-of-the-art NLP models on unstructured data to understand the different clusters of digital strategy patterns that companies are adopting. We achieve this by analyzing earnings calls from Fortune’s Global 500 companies between 2015 and 2019. We use a Transformer-based architecture for text classification, which shows a better understanding of the conversation context. We then investigate digital strategy patterns by applying clustering analysis. Our findings suggest that Fortune 500 companies use four distinct strategies: product-led, customer experience-led, service-led, and efficiency-led. This work provides an empirical baseline for companies and researchers to enhance our understanding of the field.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
82
A Dataset and Evaluation Framework for Complex Geographical Description Parsing
Egoitz Laparra, Steven Bethard
https://www.aclweb.org/anthology/2020.coling-main.81
Much previous work on geoparsing has focused on identifying and resolving individual toponyms in text like Adrano, S.Maria di Licodia or Catania. However, geographical locations occur not only as individual toponyms, but also as compositions of reference geolocations joined and modified by connectives, e.g., “. . . between the towns of Adrano and S.Maria di Licodia, 32 kilometres northwest of Catania”. Ideally, a geoparser should be able to take such text, and the geographical shapes of the toponyms referenced within it, and parse these into a geographical shape, formed by a set of coordinates, that represents the location described. But creating a dataset for this complex geoparsing task is difficult and, if done manually, would require a huge amount of effort to annotate the geographical shapes of not only the geolocation described but also the reference toponyms. We present an approach that automates most of the process by combining Wikipedia and OpenStreetMap. As a result, we have gathered a collection of 360,187 uncurated complex geolocation descriptions, from which we have manually curated 1,000 examples intended to be used as a test set. To accompany the data, we define a new geoparsing evaluation framework along with a scoring methodology and a set of baselines.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
83
DocBank: A Benchmark Dataset for Document Layout Analysis
Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
https://www.aclweb.org/anthology/2020.coling-main.82
Document layout analysis usually relies on computer vision models to understand documents while ignoring textual information that is vital to capture. Meanwhile, high-quality labeled datasets with both visual and textual information are still insufficient. In this paper, we present DocBank, a benchmark dataset that contains 500K document pages with fine-grained token-level annotations for document layout analysis. DocBank is constructed in a simple yet effective way with weak supervision from the LaTeX documents available on arXiv. With DocBank, models from different modalities can be compared fairly, and multi-modal approaches can be further investigated to boost the performance of document layout analysis. We build several strong baselines and manually split train/dev/test sets for evaluation. Experiment results show that models trained on DocBank accurately recognize the layout information for a variety of documents. The DocBank dataset is publicly available at https://github.com/doc-analysis/DocBank.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
84
Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain
Dongmin Hyun, Junsu Cho, Hwanjo Yu
https://www.aclweb.org/anthology/2020.coling-main.83
We release large-scale datasets of users’ comments in two languages, English and Korean, for aspect-level sentiment analysis in the automotive domain. The datasets consist of 58,000+ comment-aspect pairs, the largest compared to existing datasets. In addition, this work covers a new language (i.e., Korean) along with English for aspect-level sentiment analysis. We build the datasets from the automotive domain to enable users (e.g., marketers in automotive companies) to analyze the voice of customers on automobiles. We also provide baseline performances for future work by evaluating recent models on the released datasets.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
85
A High Precision Pipeline for Financial Knowledge Graph Construction
Sarah Elhammadi, Laks V.S. Lakshmanan, Raymond Ng, Michael Simpson, Baoxing Huai, Zhefeng Wang, Lanjun Wang
https://www.aclweb.org/anthology/2020.coling-main.84
Motivated by applications such as question answering, fact checking, and data integration, there is significant interest in constructing knowledge graphs by extracting information from unstructured information sources, particularly text documents. Knowledge graphs have emerged as a standard for structured knowledge representation, whereby entities and their inter-relations are represented and conveniently stored as (subject, predicate, object) triples in a graph that can be used to power various downstream applications. The proliferation of financial news sources reporting on companies, markets, currencies, and stocks presents an opportunity for extracting valuable knowledge about this crucial domain. In this paper, we focus on constructing a knowledge graph automatically by information extraction from a large corpus of financial news articles. For that purpose, we develop a high precision knowledge extraction pipeline tailored for the financial domain. This pipeline combines multiple information extraction techniques with a financial dictionary that we built, all working together to produce over 342,000 compact extractions from over 288,000 financial news articles, with a precision of 78% at the top-100 extractions. The extracted triples are stored in a knowledge graph, making them readily available for use in downstream applications.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
86
Financial Sentiment Analysis: An Investigation into Common Mistakes and Silver Bullets
Frank Xing, Lorenzo Malandri, Yue Zhang, Erik Cambria
https://www.aclweb.org/anthology/2020.coling-main.85
The recent dominance of machine learning-based natural language processing methods has fostered the culture of overemphasizing model accuracies rather than studying the reasons behind their errors. Interpretability, however, is a critical requirement for many downstream AI and NLP applications, e.g., in finance, healthcare, and autonomous driving. This study, instead of proposing any “new model”, investigates the error patterns of some widely acknowledged sentiment analysis methods in the finance domain. We discover that (1) those methods belonging to the same clusters are prone to similar error patterns, and (2) there are six types of linguistic features that are pervasive in the common errors. These findings provide important clues and practical considerations for improving sentiment analysis models for financial applications.
N/A, maybe the title changed between acceptance and camera-ready?
87
Answering Legal Questions by Learning Neural Attentive Text Representation
Phi Manh Kien, Ha-Thanh Nguyen, Ngo Xuan Bach, Vu Tran, Minh Le Nguyen, Tu Minh Phuong
https://www.aclweb.org/anthology/2020.coling-main.86
Text representation plays a vital role in retrieval-based question answering, especially in the legal domain where documents are usually long and complicated. The better the question and the legal documents are represented, the more accurately they are matched. In this paper, we focus on the task of answering legal questions at the article level. Given a legal question, the goal is to retrieve all the correct and valid legal articles that can be used as the basis for answering the question. We present a retrieval-based model for the task by learning neural attentive text representation. Our text representation method first leverages convolutional neural networks to extract important information in a question and legal articles. Attention mechanisms are then used to represent the question and articles and select appropriate information to align them in a matching process. Experimental results on an annotated corpus consisting of 5,922 Vietnamese legal questions show that our model outperforms state-of-the-art retrieval-based methods for question answering by large margins in terms of both recall and NDCG.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
88
Joint Transformer/RNN Architecture for Gesture Typing in Indic Languages
Emil Biju, Anirudh Sriram, Mitesh M. Khapra, Pratyush Kumar
https://www.aclweb.org/anthology/2020.coling-main.87
Gesture typing is a method of typing words on a touch-based keyboard by creating a continuous trace passing through the relevant keys. This work is aimed at developing a keyboard that supports gesture typing in Indic languages. We begin by noting that when dealing with Indic languages, one needs to cater to two different sets of users: (i) users who prefer to type in the native Indic script (Devanagari, Bengali, etc.) and (ii) users who prefer to type in the English script but want the transliterated output in the native script. In both cases, we need a model that takes a trace as input and maps it to the intended word. To enable the development of these models, we create and release two datasets. First, we create a dataset containing keyboard traces for 193,658 words from 7 Indic languages. Second, we curate 104,412 English-Indic transliteration pairs from Wikidata across these languages. Using these datasets we build a model that performs path decoding, transliteration and transliteration correction. Unlike prior approaches, our proposed model does not make co-character independence assumptions during decoding. The overall accuracy of our model across the 7 languages varies from 70-95%.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020, 17:00
89
Automatic Charge Identification from Facts: A Few Sentence-Level Charge Annotations is All You Need
Shounak Paul, Pawan Goyal, Saptarshi Ghosh
https://www.aclweb.org/anthology/2020.coling-main.88
Automatic Charge Identification (ACI) is the task of identifying the relevant charges given the facts of a situation and the statutory laws that define these charges, and is a crucial aspect of the judicial process. Existing works focus on learning charge-side representations by modeling relationships between the charges, but not much effort has been made in improving fact-side representations. We observe that only a small fraction of sentences in the facts actually indicates the charges. We show that by using a very small subset (< 3%) of fact descriptions annotated with sentence-level charges, we can achieve an improvement across a range of different ACI models, as compared to modeling just the main document-level task on a much larger dataset. Additionally, we propose a novel model that utilizes sentence-level charge labels as an auxiliary task, coupled with the main task of document-level charge identification in a multi-task learning framework. The proposed model comprehensively outperforms a large number of recent baselines for ACI. The improvement in performance is particularly noticeable for the rare charges which are known to be especially challenging to identify.
N/A, maybe the title changed between acceptance and camera-ready?
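A compact PyTorch sketch of the multi-task wiring described in the abstract above: a shared encoder feeds a document-level charge head (main task) and a sentence-level charge head (auxiliary task), and the two losses are summed. The GRU encoder, single-label losses, and the auxiliary weight 0.3 are simplifying assumptions; ACI is in general multi-label:

```python
import torch
import torch.nn as nn

class MultiTaskACI(nn.Module):
    """Shared sentence encoder with a document-level charge classifier
    (main task) and a sentence-level charge tagger (auxiliary task)."""
    def __init__(self, d_model=128, n_charges=20):
        super().__init__()
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)
        self.doc_head = nn.Linear(d_model, n_charges)   # document-level
        self.sent_head = nn.Linear(d_model, n_charges)  # sentence-level

    def forward(self, sent_embs):            # (batch, n_sents, d_model)
        states, _ = self.encoder(sent_embs)
        doc_logits = self.doc_head(states.mean(dim=1))
        sent_logits = self.sent_head(states)
        return doc_logits, sent_logits

model = MultiTaskACI()
x = torch.randn(2, 10, 128)                  # toy sentence embeddings
doc_logits, sent_logits = model(x)
doc_y = torch.randint(0, 20, (2,))
sent_y = torch.randint(0, 20, (2, 10))
loss = nn.functional.cross_entropy(doc_logits, doc_y) \
     + 0.3 * nn.functional.cross_entropy(sent_logits.transpose(1, 2), sent_y)
print(loss.item())
```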
90
Context-Aware Text Normalisation for Historical Dialects
Maria Sukhareva
https://www.aclweb.org/anthology/2020.coling-main.89
Context-aware historical text normalisation is a severely under-researched area. To fill the gap we propose a context-aware normalisation approach that relies on the state-of-the-art methods in neural machine translation and transfer learning. We propose a multidialect normaliser with a context-aware reranking of the candidates. The reranker relies on a word-level n-gram language model that is applied to the five best normalisation candidates. The results are evaluated on the historical multidialect datasets of German, Spanish, Portuguese and Slovene. We show that incorporating dialectal information into the training leads to an accuracy improvement on all the datasets. The context-aware reranking gives further improvement over the baseline. For three out of six datasets, we reach a significantly higher accuracy than reported in the previous studies. The other three results are comparable with the current state-of-the-art. The code for the reranker is published as open-source.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020 17:00
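The reranking step in the entry above can be sketched as interpolating the normaliser's score with an n-gram language-model score over each of the five best candidates in context. The interpolation weight, function names, and toy language model below are assumptions for illustration only.

# Illustrative sketch of context-aware n-best reranking: each candidate is
# rescored with a word-level language model over its left context; the
# interpolation weight alpha is an assumption.
import math

def rerank(candidates, lm_logprob, left_context, alpha=0.7):
    # candidates: list of (normalised_word, model_logprob) for the n-best list
    best, best_score = None, -math.inf
    for word, model_lp in candidates:
        score = alpha * model_lp + (1 - alpha) * lm_logprob(left_context + [word])
        if score > best_score:
            best, best_score = word, score
    return best

# toy LM: unigram counts over a tiny vocabulary, only for demonstration
freq = {"the": 5, "olde": 1, "old": 3}
total = sum(freq.values())
toy_lm = lambda ctx: math.log(freq.get(ctx[-1], 1) / total)
print(rerank([("olde", -1.0), ("old", -1.2)], toy_lm, ["the"]))   # -> 'old'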
91
RuSemShift: a dataset of historical lexical semantic change in Russian
Julia Rodina, Andrey Kutuzov
https://www.aclweb.org/anthology/2020.coling-main.90
We present RuSemShift, a large-scale manually annotated test set for the task of semantic change modeling in Russian, covering two long-term time period pairs: from the pre-Soviet through the Soviet times, and from the Soviet through the post-Soviet times. Target words were annotated by multiple crowdsourcing workers. The annotation process was organized following the DURel framework and was based on sentence contexts extracted from the Russian National Corpus. Additionally, we report the performance of several distributional approaches on RuSemShift, achieving promising results which at the same time leave room for improvement by other researchers.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020 17:00
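For context, one common distributional baseline for diachronic change of the kind evaluated on test sets like this (an assumption here, not necessarily among the exact methods the paper reports) aligns period-specific embedding spaces with orthogonal Procrustes and scores change as cosine distance between a word's aligned vectors:

# Generic diachronic-change baseline: align two embedding spaces, then
# measure how far each target word moved. All data below is synthetic.
import numpy as np
from scipy.linalg import orthogonal_procrustes

def change_score(emb_t1, emb_t2, word_index):
    # emb_t1, emb_t2: (V, d) matrices over a shared vocabulary
    R, _ = orthogonal_procrustes(emb_t1, emb_t2)   # rotate period 1 onto period 2
    v1, v2 = emb_t1[word_index] @ R, emb_t2[word_index]
    cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 1.0 - cos                                # higher = more change

rng = np.random.default_rng(0)
A, B = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
print(change_score(A, B, word_index=3))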
92
Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
Seid Muhie Yimam, Hizkiel Mitiku Alemayehu, Abinew Ayele, Chris Biemann
https://www.aclweb.org/anthology/2020.coling-main.91
This paper presents a study of sentiment analysis for Amharic social media texts. As the number of social media users is ever-increasing, social media platforms would like to understand the latent meaning and sentiments of a text to enhance decision-making procedures. However, low-resource languages such as Amharic have received less attention due to several reasons, such as a lack of well-annotated datasets, unavailability of computing resources, and few or no expert researchers in the area. This research addresses three main research questions. We first explore the suitability of existing tools for the sentiment analysis task. Annotation tools that support large-scale annotation tasks in Amharic are scarce, and the existing crowdsourcing platforms do not support Amharic text annotation. Hence, we build a social-network-friendly annotation tool called ‘ASAB’ using a Telegram bot. We collect 9.4k tweets, where each tweet is annotated by three Telegram users. Moreover, we explore the suitability of machine learning approaches for Amharic sentiment analysis. The FLAIR deep learning text classifier, based on network embeddings that are computed from a distributional thesaurus, outperforms other supervised classifiers. We further investigate the challenges in building a sentiment analysis system for Amharic, and we find that the widespread usage of sarcasm and figurative speech is the main issue in dealing with the problem. To advance sentiment analysis research in Amharic and other related low-resource languages, we release the dataset, the annotation tool, source code, and models publicly under a permissive license.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020 17:00
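For readers unfamiliar with FLAIR, here is a minimal usage sketch for a trained FLAIR text classifier like the one the entry above mentions; the model path is hypothetical, and the embedding setup (network embeddings from a distributional thesaurus) is not shown.

# Minimal FLAIR prediction sketch; the model file path is a placeholder.
from flair.data import Sentence
from flair.models import TextClassifier

classifier = TextClassifier.load("resources/amharic-sentiment.pt")  # hypothetical path
sentence = Sentence("an Amharic tweet goes here")
classifier.predict(sentence)
print(sentence.labels)   # predicted sentiment label(s) with confidence scores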
93
Effective Few-Shot Classification with Transfer Learning
Aakriti Gupta, Kapil Thadani, Neil O’Hare
https://www.aclweb.org/anthology/2020.coling-main.92
Few-shot learning addresses the problem of learning from a small amount of training data. Although better studied in the domain of computer vision, recent work has adapted the Amazon Review Sentiment Classification (ARSC) text dataset for use in the few-shot setting. In this work, we use the ARSC dataset to study a simple application of transfer learning approaches to few-shot classification. We train a single binary classifier to learn all few-shot classes jointly by prefixing class identifiers to the input text. Given the text and class, the model then makes a binary prediction for that text/class pair. Our results show that this simple approach can outperform most published results on this dataset. Surprisingly, we also show that including domain information as part of the task definition leads to only a modest improvement in model accuracy, and that zero-shot classification, without further fine-tuning on few-shot domains, performs equivalently to few-shot classification. These results suggest that the classes in the ARSC few-shot task, which are defined by the intersection of domain and rating, are actually very similar to each other, and that a more suitable dataset is needed for the study of few-shot text classification.
POSTER4: Applications: Legal, financial, business, humanities. Posters
Tuesday, December 8, 2020 17:00
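The input construction the entry above describes, one binary example per (class, text) pair with the class identifier prefixed to the text, can be sketched directly; the separator token and class names below are assumptions.

# Sketch of class-prefixed binary examples for the single joint classifier.
def make_binary_examples(text, true_classes, all_classes):
    """Yield (input_string, label) pairs, one per candidate class."""
    for cls in all_classes:
        yield (f"{cls} [SEP] {text}", 1 if cls in true_classes else 0)

examples = list(make_binary_examples(
    "The battery died after two days.",
    true_classes={"electronics_negative"},
    all_classes=["electronics_negative", "electronics_positive", "books_negative"],
))
for x, y in examples:
    print(y, x)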
94
SWAFN: Sentimental Words Aware Fusion Network for Multimodal Sentiment Analysis
Minping Chen, Xia Li
https://www.aclweb.org/anthology/2020.coling-main.93
Multimodal sentiment analysis aims to predict the sentiment of language text with the help of other modalities, such as vision and acoustic features. Previous studies focused on learning the joint representation of multiple modalities, ignoring some useful knowledge contained in the language modality. In this paper, we incorporate sentimental words knowledge into the fusion network to guide the learning of the joint representation of multimodal features. Our method consists of two components: a shallow fusion part and an aggregation part. For the shallow fusion part, we use a crossmodal coattention mechanism to obtain bidirectional context information for each pair of modalities and get the fused shallow representations. For the aggregation part, we design an auxiliary task of sentimental words classification to help and guide the deep fusion of the three modalities and obtain the final sentimental-words-aware fusion representation. We carry out several experiments on the CMU-MOSI, CMU-MOSEI and YouTube datasets. The experimental results show that introducing sentimental words prediction as a multitask can indeed improve the fusion representation of multiple modalities.
LONG9: Multimodal 1
Tuesday, December 8, 2020 17:30
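A hedged sketch of crossmodal co-attention of the kind the entry above uses: each modality attends over the other via a shared affinity matrix, giving bidirectional context for every pair of modalities. Dimensions and inputs are illustrative, not the paper's configuration.

# Co-attention between two modalities (e.g. text and acoustic features).
import torch
import torch.nn.functional as F

def coattention(a, b):
    # a: (B, Ta, d) one modality; b: (B, Tb, d) another modality
    scores = torch.bmm(a, b.transpose(1, 2))          # (B, Ta, Tb) affinity
    a_ctx = torch.bmm(F.softmax(scores, dim=-1), b)   # b-aware view of a
    b_ctx = torch.bmm(F.softmax(scores.transpose(1, 2), dim=-1), a)
    return a_ctx, b_ctx                               # fused shallow representations

text, audio = torch.randn(2, 10, 64), torch.randn(2, 30, 64)
t_ctx, a_ctx = coattention(text, audio)
print(t_ctx.shape, a_ctx.shape)   # torch.Size([2, 10, 64]) torch.Size([2, 30, 64])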
95
Multimodal Topic-Enriched Auxiliary Learning for Depression Detection
Minghui An, Jingjing Wang, Shoushan Li, Guodong Zhou
https://www.aclweb.org/anthology/2020.coling-main.94
From the perspective of health psychology, human beings with long-term and sustained negativity are highly likely to be diagnosed with depression. Inspired by this, we argue that the global topic information derived from user-generated contents (e.g., texts and images) is crucial to boost the performance of the depression detection task, though this information has been neglected by almost all previous studies on depression detection. To this end, we propose a new Multimodal Topic-enriched Auxiliary Learning (MTAL) approach, aiming at capturing the topic information inside different modalities (i.e., texts and images) for depression detection. In particular, we propose a modality-agnostic topic model capable of mining topical clues from either discrete textual signals or continuous visual signals. On this basis, topic modeling w.r.t. the two modalities is cast as two auxiliary tasks for improving the performance of the primary task (i.e., depression detection). Finally, the detailed evaluation demonstrates the great advantage of our MTAL approach to depression detection over the state-of-the-art baselines. This justifies the importance of the multimodal topic information to depression detection and the effectiveness of our approach in capturing such information.
LONG9: Multimodal 1
Tuesday, December 8, 2020 17:30
96
Imagining Grounded Conceptual Representations from Perceptual Information in Situated Guessing Games
Alessandro Suglia, Antonio Vergari, Ioannis Konstas, Yonatan Bisk, Emanuele Bastianelli, Andrea Vanzo, Oliver Lemon
https://www.aclweb.org/anthology/2020.coling-main.95
In visual guessing games, a Guesser has to identify a target object in a scene by asking questions to an Oracle. An effective strategy for the players is to learn conceptual representations of objects that are both discriminative and expressive enough to ask questions and guess correctly. However, as shown by Suglia et al. (2020), existing models fail to learn truly multi-modal representations, relying instead on gold category labels for objects in the scene both at training and inference time. This provides an unnatural performance advantage when categories at inference time match those at training time, and it causes models to fail in more realistic “zero-shot” scenarios where out-of-domain object categories are involved. To overcome this issue, we introduce a novel “imagination” module based on Regularized Auto-Encoders that learns context-aware and category-aware latent embeddings without relying on category labels at inference time. Our imagination module outperforms state-of-the-art competitors by 8.26% gameplay accuracy in the CompGuessWhat?! zero-shot scenario (Suglia et al., 2020), and it improves the Oracle and Guesser accuracy by 2.08% and 12.86% in the GuessWhat?! benchmark when no gold categories are available at inference time. The imagination module also boosts reasoning about object properties and attributes.
LONG9: Multimodal 1
Tuesday, December 8, 2020 17:30
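As background for the entry above, a regularized auto-encoder in its simplest form reconstructs input features while penalizing the latent code; the sketch below shows that generic recipe, with sizes and the penalty weight as assumptions (the paper's module adds context- and category-awareness not shown here).

# Generic regularized auto-encoder: reconstruction loss plus an L2 penalty
# on the latent code. All dimensions are illustrative.
import torch
import torch.nn as nn

class RegularizedAE(nn.Module):
    def __init__(self, feat_dim=512, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))

    def loss(self, x, reg_weight=1e-3):
        z = self.enc(x)
        recon = nn.functional.mse_loss(self.dec(z), x)   # reconstruction term
        return recon + reg_weight * z.pow(2).mean()      # latent regularizer

model = RegularizedAE()
print(model.loss(torch.randn(8, 512)).item())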
97
Situated and Interactive Multimodal Conversations
Seungwhan Moon, Satwik Kottur, Paul Crook, Ankita De, Shivani Poddar, Theodore Levin, David Whitney, Daniel Difranco, Ahmad Beirami, Eunjoon Cho, Rajen Subba, Alborz Geramifard
https://www.aclweb.org/anthology/2020.coling-main.96
Next generation virtual assistants are envisioned to handle multimodal inputs (e.g., vision, memories of previous interactions, and the user’s utterances), and to perform multimodal actions (e.g., displaying a route while generating the system’s utterance). We introduce Situated Interactive MultiModal Conversations (SIMMC) as a new direction aimed at training agents that take multimodal actions grounded in a co-evolving multimodal input context in addition to the dialog history. We provide two SIMMC datasets totalling ~13K human-human dialogs (~169K utterances) collected using a multimodal Wizard-of-Oz (WoZ) setup, on two shopping domains: (a) furniture, grounded in a shared virtual environment; and (b) fashion, grounded in an evolving set of images. The datasets include the multimodal context of the items appearing in each scene, and contextual NLU, NLG and coreference annotations using a novel and unified framework of SIMMC conversational acts for both user and assistant utterances. Finally, we present several tasks within SIMMC as objective evaluation protocols, such as structural API prediction, response generation, and dialog state tracking. We benchmark a collection of existing models on these SIMMC tasks as strong baselines, and demonstrate rich multimodal conversational interactions. Our data, annotations, and models will be made publicly available.
LONG9: Multimodal 1
Tuesday, December 8, 2020 17:30
98
Meet Changes with Constancy: Learning Invariance in Multi-Source Translation
Jianfeng Liu, Ling Luo, Xiang Ao, Yan Song, Haoran Xu, Jian Ye
https://www.aclweb.org/anthology/2020.coling-main.97
Multi-source neural machine translation aims to translate from parallel sources of information (e.g. languages, images, etc.) into a single target language, and has shown better performance than most one-to-one systems. Despite the remarkable success of existing models, they usually neglect the fact that multiple source inputs may contain inconsistencies. Such differences can introduce noise into the task and limit the performance of existing multi-source NMT approaches, due to their indiscriminate usage of input sources for target word predictions. In this paper, we attempt to leverage the potential complementary information among distinct sources and to alleviate occasional conflicts among them. To accomplish this, we propose a source invariance network to learn the invariant information of parallel sources. Such a network can be easily integrated with multi-encoder based multi-source NMT methods (e.g. multi-encoder RNN and Transformer) to enhance the translation results. Extensive experiments on two multi-source translation tasks demonstrate that the proposed approach not only achieves clear gains in translation quality but also captures implicit invariance between different sources.
LONG9: Multimodal 1
Tuesday, December 8, 2020 17:30
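One plausible way to set up a source invariance objective like the one in the entry above (the pooling and distance below are assumptions, not necessarily the paper's exact formulation) is to pull per-source encodings toward their mean, so shared information dominates:

# Auxiliary invariance term for multi-encoder NMT: penalize how far each
# source's pooled encoding deviates from the cross-source mean.
import torch

def invariance_loss(encodings):
    # encodings: list of (B, d) pooled representations, one per source
    stacked = torch.stack(encodings)            # (num_sources, B, d)
    mean = stacked.mean(dim=0, keepdim=True)    # shared representation
    return ((stacked - mean) ** 2).mean()       # penalize per-source deviation

src1, src2 = torch.randn(4, 256), torch.randn(4, 256)
aux = invariance_loss([src1, src2])
# total_loss = translation_loss + lambda_inv * aux   (lambda_inv: hyperparameter)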
99
Enhancing Neural Models with Vulnerability via Adversarial Attack
Rong Zhang, Qifei Zhou, Bo An, Weiping Li, Tong Mo, Bo Wu
https://www.aclweb.org/anthology/2020.coling-main.98
Natural Language Sentence Matching (NLSM) serves as the core of many natural language processing tasks. Three observations motivate this work: (1) most previous work develops a single specific neural model for NLSM tasks; (2) no previous work considers adversarial attack as a means of improving the performance of NLSM tasks; and (3) adversarial attack is usually used to generate adversarial samples that can fool neural models. In this paper, we first identify a phenomenon whereby different categories of samples have different vulnerabilities, where vulnerability is the degree of difficulty in changing the label of a sample. Based on this phenomenon, we propose a general two-stage training framework to enhance neural models with Vulnerability via Adversarial Attack (VAA). We design criteria to measure the vulnerability, which is obtained by adversarial attack. The VAA framework can be adapted to various neural models by incorporating the vulnerability. In addition, we prove a theorem and four corollaries to explain the factors influencing vulnerability effectiveness. Experimental results show that VAA significantly improves the performance of neural models on NLSM datasets, and the results are consistent with the theorem and corollaries. The code is released at https://github.com/rzhangpku/VAA.
LONG10: Machine learning 1
Tuesday, December 8, 2020 17:30
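Vulnerability as defined in the entry above, the difficulty of changing a sample's label, can be sketched as the number of perturbations an attacker needs before the prediction flips. The attack loop, edit budget, and toy model below are illustrative assumptions, not the paper's criteria.

# Toy vulnerability measure: count edits until the predicted label flips.
def vulnerability(model_predict, words, perturb, max_edits=10):
    """Return the number of edits needed to flip the label (lower = more vulnerable)."""
    original = model_predict(words)
    current = list(words)
    for n_edits in range(1, max_edits + 1):
        current = perturb(current)              # e.g. swap one word for a synonym
        if model_predict(current) != original:
            return n_edits
    return max_edits + 1                        # robust within the edit budget

# toy example: the "model" keys on the word 'good'; the "attack" deletes it
toy_model = lambda ws: "pos" if "good" in ws else "neg"
toy_attack = lambda ws: [w for w in ws if w != "good"] or ["empty"]
print(vulnerability(toy_model, ["a", "good", "match"], toy_attack))   # 1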
100
R-VGAE: Relational-variational Graph Autoencoder for Unsupervised Prerequisite Chain Learning
Irene Li, Alexander Fabbri, Swapnil Hingmire, Dragomir Radev
https://www.aclweb.org/anthology/2020.coling-main.99
The task of concept prerequisite chain learning is to automatically determine the existence of prerequisite relationships among concept pairs. In this paper, we frame learning prerequisite relationships among concepts as an unsupervised task with no access to labeled concept pairs during training. We propose a model called the Relational-Variational Graph AutoEncoder (R-VGAE) to predict concept relations within a graph consisting of concept and resource nodes. Results show that our unsupervised approach outperforms graph-based semi-supervised methods and other baseline methods by up to 9.77% and 10.47% in terms of prerequisite relation prediction accuracy and F1 score, respectively. Our method is notably the first graph-based model that attempts to make use of deep learning representations for the task of unsupervised prerequisite learning. We also expand an existing corpus, which now totals 1,717 English Natural Language Processing (NLP)-related lecture slide files and manual concept pair annotations over 322 topics.
LONG10: Machine learning 1
Tuesday, December 8, 2020 17:30
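As background for the entry above, the standard variational graph auto-encoder objective that R-VGAE builds on combines edge reconstruction from latent node embeddings with a KL prior; the sketch below shows that generic objective (relation-specific decoding and the concept/resource node types are omitted, and all dimensions are illustrative).

# Standard VGAE objective: inner-product edge decoder plus KL regularizer.
import torch
import torch.nn.functional as F

def vgae_loss(mu, logvar, adj_true):
    # mu, logvar: (N, d) variational parameters per node; adj_true: (N, N) 0/1
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()    # reparameterization
    adj_logits = z @ z.t()                                  # inner-product decoder
    recon = F.binary_cross_entropy_with_logits(adj_logits, adj_true)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

N, d = 30, 16
print(vgae_loss(torch.randn(N, d), torch.randn(N, d),
                (torch.rand(N, N) > 0.9).float()).item())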