Unstructured Information Management Architecture (UIMA)

3rd UIMA@GSCL Workshop

September 23, 2013


On the Workshop

For many decades, NLP has suffered from low software engineering standards, which have limited the reusability of code and the interoperability of different modules within larger NLP systems. While this did not really hamper success in limited task areas (such as implementing a parser), it caused serious problems for the emerging field of language technology, where the focus is on building complex integrated software systems, e.g., for information extraction or machine translation. This lack of integration has led to duplicated software development, work-arounds for programs written in different (versions of) programming languages, and ad-hoc tweaking of interfaces between modules developed at different sites.

In recent years, the Unstructured Information Management Architecture (UIMA) framework has been proposed as a middleware platform that offers integration by design through common type systems and standardized communication methods for components analysing streams of unstructured information, such as natural language. The UIMA framework offers a solid processing infrastructure that allows developers to concentrate on the implementation of the actual analytics components. An increasing number of members of the NLP community have thus adopted UIMA as a platform facilitating the creation of reusable NLP components that can be assembled to address different NLP tasks depending on their order, combination and configuration.

This workshop aims at bringing together members of the NLP community (users, developers or providers of either UIMA components or UIMA-related tools) in order to explore and discuss the opportunities and challenges in using UIMA as a platform for modern, well-engineered NLP. Comparisons of UIMA with alternative frameworks (such as GATE, LingPipe, etc.) are of interest, too.

In the context of an active NLP-oriented UIMA community, the challenge of creating reusable and interoperable components raises particular interest. From a methodological perspective, interoperability relies largely on UIMA type systems. Technically, it includes issues related to the packaging and distribution of UIMA components. Also, tools are important, for example to assemble complex processing workflows, to manage the bodies of data that are to be analysed and to visualize, explore, and further deploy the analysis results. Interoperability is also affected by legal issues, such as potentially incompatible licenses of components and tools. Further challenges are involved in embedding UIMA analysis within applications or using it in distributed computing scenarios, such as deployment of and access to required resources. Finally, the preservation of analysis results, their provenance and reproducibility are of particular interest to the scientific user community.

Workshop topics include, but are not limited to: 


from 08:30 

Registration opens


Welcome and Introduction


Storing UIMA CASes in a relational database

Georg Fette, Martin Toepfer, Frank Puppe


Aid to spatial navigation within a UIMA annotation index

Nicolas Hernandez


Coffee break


A Model-driven approach to NLP programming with UIMA

Alessandro Di Bari, Alessandro Faraotti, Carmela Gambardella, Guido Vetere


Using UIMA to Structure An Open Platform for Textual Entailment

Tae-Gil Noh, Sebastian Padó


Bluima: a UIMA-based NLP Toolkit for Neuroscience

Renaud Richardet, Jean-Cédric Chappelier, Martin Telefont


Lunch break


Keynote: Apache clinical Text Analysis and Knowledge Extraction System (cTAKES)

Pei Chen, Guergana Savova


CSE Framework: A UIMA-based Distributed System for Configuration Space Exploration

Elmer Garduno, Zi Yang, Avner Maiberg, Collin McCormack, Yan Fang, Eric Nyberg


Coffee break


Constraint-driven Evaluation in UIMA Ruta

Andreas Wittek, Martin Toepfer, Georg Fette, Peter Kluegl, Frank Puppe


Sentiment Analysis and Visualization using UIMA and Solr

Carlos Rodríguez-Penagos, David García Narbona, Guillem Massó Sanabre, Jens Grivolla, Joan Codina


Extracting hierarchical data points and tables from scanned contracts

Jan Stadermann, Stephan Symons, Ingo Thon




Apache clinical Text Analysis and Knowledge Extraction System (cTAKES)


The presentation will focus on the methods and software development behind the cTAKES platform. An overview of the modules will set the stage, followed by a more in-depth discussion of some of the methods and evaluations of select modules. The second part of the presentation will shift to software development topics such as optimization and distributed computing, including UIMA integration, UIMA-AS, as well as our plans for UIMA-DUCC integration. A live demo of cTAKES will wrap up the talk.

About the speaker:

Pei Chen is a Vice President of the Apache Software Foundation, leading the top-level cTAKES project. He is also a lead application development specialist at the Informatics Program at Boston Children’s Hospital/Harvard Medical School. Mr. Chen’s interests lie in building practical applications using machine learning techniques. He has a passion for the end-user experience and has a background in Computer Science/Economics. Mr. Chen is a firm believer in the open source community, contributing to cTAKES as well as other Apache Software Foundation projects. Details at (not fully up-to-date)

Guergana Savova, Ph.D., is a faculty member at Harvard Medical School and Children’s Hospital Boston. Her research interest is in natural language processing (NLP), especially as applied to the text generated by physicians (the clinical narrative), focusing on higher-level semantic and discourse processing, which includes topics such as named entity recognition, event recognition, and relation detection and classification, including co-reference and temporal relations. The methods are mostly machine learning, spanning supervised, lightly supervised and completely unsupervised approaches. She is also interested in the application of NLP methodologies to biomedical use cases. Dr. Savova has been leading the development of cTAKES and is its principal architect. She holds a Master of Science in Computer Science and a PhD in Linguistics with a minor in Cognitive Science from the University of Minnesota. Details at (not fully up-to-date)


This is the material from the UIMA tutorial held in conjunction with the 3rd UIMA@GSCL workshop.


We invite submissions of full papers, limited to 8 pages of text, and position papers or papers describing ongoing work as short papers, limited to 4 pages. System demonstration papers are also welcome (4 pages). Submitted papers must be original, i.e., not published in an earlier workshop, conference or journal. Reviewing will not be anonymous, but authors wishing to keep their anonymity may hide their identity on demand. The submitted papers will be reviewed by three members of the program committee.

All submissions must be in English and follow the Springer LNCS style [1] and should be created using LaTeX. All papers must be submitted in PDF and via EasyChair [2].

The one-day workshop will be held with oral presentations of accepted papers. A comfortable time slot for discussions will be given. The workshop will also include a keynote on Apache cTAKES, the Apache clinical Text Analysis and Knowledge Extraction System which is also based on the UIMA framework.

Note that at least one author of each accepted paper must register and present the contribution. Accepted contributions are planned to appear as CEUR Workshop Proceedings.



Important Dates

Organizers and Contact

Please address any inquiries regarding the workshop to:

Program Committee