September 23, 2013
For many decades, NLP has suffered from low software engineering standards, which have limited the reusability of code and the interoperability of different modules within larger NLP systems. While this did not really hamper success in limited task areas (such as implementing a parser), it caused serious problems for the emerging field of language technology, where the focus is on building complex integrated software systems, e.g., for information extraction or machine translation. This lack of integration has led to duplicated software development, work-arounds for programs written in different (versions of) programming languages, and ad-hoc tweaking of interfaces between modules developed at different sites.
In recent years, the Unstructured Information Management Architecture (UIMA) framework has been proposed as a middleware platform that offers integration by design through common type systems and standardized communication methods for components analysing streams of unstructured information, such as natural language. The UIMA framework offers a solid processing infrastructure that allows developers to concentrate on the implementation of the actual analytics components. An increasing number of members of the NLP community have therefore adopted UIMA as a platform for creating reusable NLP components that can be assembled to address different NLP tasks depending on their order, combination and configuration.
This workshop aims at bringing together members of the NLP community (users, developers or providers of either UIMA components or UIMA-related tools) in order to explore and discuss the opportunities and challenges in using UIMA as a platform for modern, well-engineered NLP. Comparisons of UIMA with alternative frameworks (such as GATE or LingPipe) are of interest, too.
In the context of an active NLP-oriented UIMA community, the challenge of creating reusable and interoperable components is of particular interest. From a methodological perspective, interoperability relies largely on UIMA type systems. Technically, it includes issues related to the packaging and distribution of UIMA components. Tools are also important, for example to assemble complex processing workflows, to manage the bodies of data that are to be analysed, and to visualize, explore, and further deploy the analysis results. Interoperability is also affected by legal issues, such as potentially incompatible licenses of components and tools. Further challenges arise when embedding UIMA analysis within applications or using it in distributed computing scenarios, such as the deployment of and access to required resources. Finally, the preservation of analysis results, their provenance and their reproducibility are of particular interest to the scientific user community.
Workshop topics include, but are not limited to:
Welcome and Introduction
Storing UIMA CASes in a relational database
Georg Fette, Martin Toepfer, Frank Puppe
Aid to spatial navigation within a UIMA annotation index
A Model-driven approach to NLP programming with UIMA
Alessandro Di Bari, Alessandro Faraotti, Carmela Gambardella, Guido Vetere
Using UIMA to Structure An Open Platform for Textual Entailment
Tae-Gil Noh, Sebastian Padó
Bluima: a UIMA-based NLP Toolkit for Neuroscience
Renaud Richardet, Jean-Cédric Chappelier, Martin Telefont
Keynote: Apache clinical Text Analysis and Knowledge Extraction System (cTAKES)
Pei Chen, Guergana Savova
CSE Framework: A UIMA-based Distributed System for Configuration Space Exploration
Elmer Garduno, Zi Yang, Avner Maiberg, Collin McCormack, Yan Fang, Eric Nyberg
Constraint-driven Evaluation in UIMA Ruta
Andreas Wittek, Martin Toepfer, Georg Fette, Peter Kluegl, Frank Puppe
Sentiment Analysis and Visualization using UIMA and Solr
Carlos Rodríguez-Penagos, David García Narbona, Guillem Massó Sanabre, Jens Grivolla, Joan Codina
Extracting hierarchical data points and tables from scanned contracts
Jan Stadermann, Stephan Symons, Ingo Thon
Apache clinical Text Analysis and Knowledge Extraction System (cTAKES)
The presentation will focus on the methods and software development behind the cTAKES platform. An overview of the modules will set the stage, followed by a more in-depth discussion of some of the methods and evaluations of select modules. The second part of the presentation will shift to software development topics such as optimization and distributed computing, including UIMA integration, UIMA-AS, and our plans for UIMA-DUCC integration. A live demo of cTAKES will wrap up the talk.
About the speaker:
Pei Chen is a Vice President of the Apache Software Foundation, leading the top-level cTAKES project (ctakes.apache.org). He is also a lead application development specialist at the Informatics Program at Boston Children’s Hospital/Harvard Medical School. Mr. Chen’s interests lie in building practical applications using machine learning techniques. He has a passion for the end-user experience and has a background in Computer Science/Economics. Mr. Chen is a firm believer in the open source community, contributing to cTAKES as well as other Apache Software Foundation projects. Details at (not fully up-to-date) http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P0.html
Guergana Savova, Ph.D., is a faculty member at Harvard Medical School and Children’s Hospital Boston. Her research interest is in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative). She focuses on higher-level semantic and discourse processing, which includes topics such as named entity recognition, event recognition, and relation detection and classification, including co-reference and temporal relations. The methods are mostly machine learning, spanning supervised, lightly supervised and completely unsupervised approaches. Her interest is also in the application of NLP methodologies to biomedical use cases. Dr. Savova has been leading the development of cTAKES and is its principal architect. She holds a Master of Science in Computer Science and a PhD in Linguistics with a minor in Cognitive Science from the University of Minnesota. Details at (not fully up-to-date) http://childrenshospital.org/cfapps/research/data_admin/Site3240/mainpageS3240P0.html
This is the material from the UIMA tutorial held in conjunction with the 3rd UIMA@GSCL workshop.
We invite submissions of full papers, limited to 8 pages of text, and position papers or papers describing ongoing work as short papers, limited to 4 pages. System demonstration papers are also welcome (4 pages). Submitted papers must be original, i.e., not published in an earlier workshop, conference or journal. Reviewing will not be anonymous, but authors wishing to remain anonymous may hide their identity on demand. Each submitted paper will be reviewed by three members of the program committee.
The one-day workshop will consist of oral presentations of accepted papers, with ample time set aside for discussion. The workshop will also include a keynote on Apache cTAKES, the Apache clinical Text Analysis and Knowledge Extraction System, which is also based on the UIMA framework.
Note that at least one author of each accepted paper must register and present the contribution. Accepted contributions are planned to appear as CEUR Workshop Proceedings (CEUR-WS.org).
Please address any inquiries regarding the workshop to: email@example.com