CommCare Technical Overview

Overview

CommCare is a multi-tier platform spanning mobile, server, and messaging components. The platform enables end users to build and configure content through a user interface, deploy that application to Android or J2ME based devices, and receive data back from the mobile applications in real time (if connected). In addition, content may be defined that leverages bi-directional messaging to end users via APIs to SMS gateways, email systems, or other messaging services. The system leverages multiple persistence mechanisms, analytical frameworks, and open source libraries.

IT System Interfaces

The system contains numerous internal interfaces between services. Most internal system interfaces are based on a services architecture and use HTTPS to communicate. External system interfaces are also available as Application Programming Interfaces (APIs) over HTTPS. All system components are based on open source technology.

Services

The following diagram depicts the overall architecture of CommCare. The system is logically broken up into different services (green boxes) and within those services are different modules which handle individual pieces of functionality. There are four primary architectural layers, which are (going from top to bottom):

  1. Mobile
  2. User/device layer
  3. Application layer
  4. Database layer

In this document, the web and server-based components are sometimes referred to as “CommCare HQ,” which was the original name for the back-end CommCare system. We will first discuss the CommCare HQ based services (layers 2 and 3). Documentation of the CommCare mobile application is out of scope for this document.

 

Application Builder Service

Application Building

The Application Builder provides an interface for non-technical users to create and structure an application’s content and workflow. Questions can be added by type (text, integer, multiple answer, date, etc.) and logic conditions can be applied to determine whether the question should be displayed or if the answer is valid.

The underlying data model for CommCare uses forms and cases to track interactions with objects, often people. A case is created in order to track the ongoing interactions with an object through form submissions. Every time a form is filled out, it can create a new case (to represent a person), update an existing case, or close an existing case. Each case has a type, such as “participant” or “relative”, which distinguishes it from cases of other types. Cases may also be structured in a hierarchy using subcases, such that a case can be directly linked to its parent case to maintain relationships between cases.
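
The create, update, and close behaviour described above can be illustrated with a minimal in-memory sketch. This is not the CommCare HQ implementation; the `Case` class, the `apply_form` function, and the dictionary layout of a form are hypothetical names chosen only to show the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Case:
    case_id: str
    case_type: str                      # e.g. "participant" or "relative"
    properties: Dict[str, str] = field(default_factory=dict)
    parent_id: Optional[str] = None     # link to a parent case (subcase hierarchy)
    closed: bool = False


def apply_form(cases: Dict[str, Case], form: dict) -> Case:
    """Apply one form submission's case action to the in-memory case store."""
    action = form["action"]
    case_id = form["case_id"]
    if action == "create":
        cases[case_id] = Case(
            case_id=case_id,
            case_type=form["case_type"],
            properties=dict(form.get("properties", {})),
            parent_id=form.get("parent_id"),
        )
    elif action == "update":
        cases[case_id].properties.update(form.get("properties", {}))
    elif action == "close":
        cases[case_id].closed = True
    else:
        raise ValueError(f"unknown case action: {action}")
    return cases[case_id]


cases: Dict[str, Case] = {}
apply_form(cases, {"action": "create", "case_id": "c1", "case_type": "participant",
                   "properties": {"name": "Asha"}})
apply_form(cases, {"action": "create", "case_id": "c2", "case_type": "relative",
                   "parent_id": "c1"})
apply_form(cases, {"action": "update", "case_id": "c1",
                   "properties": {"village": "Rampur"}})
apply_form(cases, {"action": "close", "case_id": "c2"})
```

In this sketch the subcase `c2` keeps a reference to its parent `c1`, which is one simple way to represent the hierarchy.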

Tenant Management

The tenant management service is responsible for providing the multitenant layer on top of CommCare (so that users and data are partitioned according to a base unit called a project space).

User Management

Users are divided up into Web Users and CommCare Users (which we refer to publicly as Mobile Workers). The conceptual difference is that Web Users are project administrators who build applications, analyze data, manage frontline workers, etc., and CommCare Users are the frontline workers. Technically, the differences are as follows. Web Users may have various permissions on any number of domains (“project spaces”), whereas CommCare Users are tied to a single domain. CommCare Users can log into CloudCare and the mobile client to submit data, whereas Web Users cannot use the mobile client and are discouraged from submitting non-test data through CloudCare, because reports are not designed to display Web User submissions well.

Web Users and CommCare Users are stored, with separate models, in CouchDB. These models include all permission and domain membership information, as well as some metadata about the user such as their email address, phone number, etc. Additionally, authentication stubs are synced in real time to SQL, where they are saved as Django Users. This allows us to use standard Django authentication, as well as Django Digest, a third-party Django package for supporting HTTP Digest Authentication.

Additionally, CommCare Users may also be grouped into Groups. Groups may be flagged as Case Sharing Groups, Reporting Groups, or both, or neither. Case sharing groups may own cases, in which case all users in the group have access to the case. In practice what this means is that when a CommCare User syncs with the server, the server sends down an XML payload listing all the case sharing groups that the user is in. Groups marked as reporting groups show up in the report filters for selecting groups of users.
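
As a rough illustration of the group flags described above, the following sketch computes which owner ids a user's cases may belong to and which groups would appear in report filters. The `Group` class and both helper functions are hypothetical and exist only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Group:
    group_id: str
    user_ids: Set[str] = field(default_factory=set)
    case_sharing: bool = False
    reporting: bool = False


def owner_ids_for_user(user_id: str, groups: List[Group]) -> Set[str]:
    """Owner ids whose cases the user may access: the user plus their case sharing groups."""
    owners = {user_id}
    for group in groups:
        if group.case_sharing and user_id in group.user_ids:
            owners.add(group.group_id)
    return owners


def reporting_groups(groups: List[Group]) -> List[str]:
    """Group ids that would appear in report filters."""
    return [g.group_id for g in groups if g.reporting]


groups = [
    Group("g-clinic", {"u1", "u2"}, case_sharing=True, reporting=True),
    Group("g-supervisors", {"u2"}, reporting=True),
    Group("g-other", {"u3"}, case_sharing=True),
]
```

Here user `u2` belongs to two groups, but only `g-clinic` contributes to case access because `g-supervisors` is not flagged for case sharing.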

Reporting and Analytics

Reports

The set of standard reports available in CommCare is organized into three categories: Monitor Workers, Inspect Data, and Manage Deployments. Domains that leverage CommCare HQ’s messaging capabilities have an additional reporting section for messaging reports.

Much more information about HQ reporting can be found on our help site[1].

In addition to built-in reporting, CommCare has a built-in analytics engine for defining customized reports based on the project data. The engine is described in the section on Code Architecture. These customized reports are limited to the project space of the requesting project.

Messaging Service

The following section describes functionality specific to CommCare’s messaging service (previously referred to as CommConnect).

Appointment reminders/confirmations

Every time a case is created, updated, or closed, all rules are checked to see if any reminders should be spawned or retired as a result of the update.  
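
The rule check described above can be sketched as follows. This is a simplified illustration under stated assumptions: the `ReminderRule` fields and the matching condition (a case of a given type with a given property set) are hypothetical and much simpler than a real reminder definition.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class ReminderRule:
    rule_id: str
    case_type: str
    property_name: str   # the reminder is active while this property is set


def check_rules(case: dict, rules: List[ReminderRule],
                active: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return the updated set of (rule_id, case_id) reminders after a case change."""
    updated = set(active)
    for rule in rules:
        key = (rule.rule_id, case["case_id"])
        matches = (
            not case.get("closed", False)
            and case["case_type"] == rule.case_type
            and bool(case.get("properties", {}).get(rule.property_name))
        )
        if matches:
            updated.add(key)        # spawn (or keep) the reminder
        else:
            updated.discard(key)    # retire it if it was active
    return updated


rules = [ReminderRule("r1", "participant", "next_visit_date")]
case = {"case_id": "c1", "case_type": "participant",
        "properties": {"next_visit_date": "2015-06-01"}}
active = check_rules(case, rules, set())

case["closed"] = True
after_close = check_rules(case, rules, active)
```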

The appointment reminder use case leverages both the CommCare data model as well as the reminders framework in the following way:

Broadcast Messages

Broadcast messaging uses the reminder framework along with a feature of the CommCare data model called case grouping.  Essentially, a case group can be defined to create a list of cases.  A broadcast message can then be created, which sends some specific content (either an SMS, or an SMS survey created in the Survey Builder) to the entire case group.  Broadcast messages can either be sent immediately, or at a later date and time, and can also be configured to send to groups of users of the system in addition to groups of cases.

Keywording

In addition to scheduling SMS content using reminder definitions, CommCare also allows the creation of keywords which can be used to initiate an SMS workflow.  For example, a keyword can be created such that every time it is texted into the system, the system will respond with an SMS survey from the Application Builder.  CommCare also supports the collection of data over structured SMS, where a single SMS can be used in a structured manner to answer all of the questions in an SMS survey.
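
To make the structured SMS idea concrete, here is a minimal sketch that maps a single message onto the questions of a survey. The function name, the space delimiter, and the `keywords` mapping are assumptions for illustration; the actual parsing rules in CommCare may differ.

```python
from typing import Dict, List, Optional


def parse_structured_sms(text: str, keywords: Dict[str, List[str]],
                         delimiter: str = " ") -> Optional[Dict[str, str]]:
    """Map a message like 'REG Asha 24' onto the question ids of the keyword's survey.

    Returns None when the first token is not a known keyword or the number
    of answers does not match the number of questions.
    """
    tokens = [t for t in text.strip().split(delimiter) if t]
    if not tokens:
        return None
    question_ids = keywords.get(tokens[0].upper())
    if question_ids is None or len(tokens) - 1 != len(question_ids):
        return None
    return dict(zip(question_ids, tokens[1:]))


# Hypothetical keyword "REG" tied to a two-question survey.
keywords = {"REG": ["name", "age"]}
answers = parse_structured_sms("reg Asha 24", keywords)
```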

Gateway Connectivity and Configuration, Logging, and Audit Tracking

In order to communicate with contacts in the system over SMS, CommCare allows the creation of SMS backends, which manage the two-way communication over SMS with a contact. Each backend is of a specific type, which corresponds to the third-party SMS gateway provider used to handle message delivery and receipt, and each backend type implements the low-level protocol for communicating with that third-party service. The user interface is then used to create and manage instances of these backend types, with each instance corresponding to an account or application with the third-party SMS gateway provider. Multiple backends of the same type can be created, so that separate accounts or applications with a third-party SMS gateway provider can be used and tracked accordingly. In addition to the backend types that are supported by CommCare out of the box, CommCare provides a backend type that integrates with Telerivet[2], a service that allows the creation of an SMS gateway from an Android phone.

All SMS traffic (inbound and outbound) is logged in the CommCare Message Log, which is also available as a report.  In addition to tracking the timestamp, content, and contact the message was associated with, the Message Log also tracks the SMS backend that was used and the workflow that the SMS was a part of (broadcast message, reminder, or keyword interaction).
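
The relationship between backend types, backend instances, and the Message Log can be sketched as below. All class and function names are hypothetical; the stand-in backend records messages in memory instead of calling a real gateway.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


class SMSBackend(ABC):
    """One instance corresponds to one account with a gateway provider."""

    def __init__(self, name: str, credentials: dict):
        self.name = name
        self.credentials = credentials

    @abstractmethod
    def send(self, phone_number: str, text: str) -> None:
        """Deliver the message using the provider's protocol."""


class HypotheticalHTTPGatewayBackend(SMSBackend):
    """Stand-in backend type; a real one would call the provider's HTTP API."""

    def __init__(self, name: str, credentials: dict):
        super().__init__(name, credentials)
        self.outbox: List[tuple] = []

    def send(self, phone_number: str, text: str) -> None:
        self.outbox.append((phone_number, text))


@dataclass
class MessageLogEntry:
    timestamp: datetime
    direction: str      # "inbound" or "outbound"
    phone_number: str
    text: str
    backend_name: str
    workflow: str       # "broadcast", "reminder", or "keyword"


def send_and_log(backend: SMSBackend, phone_number: str, text: str,
                 workflow: str, log: List[MessageLogEntry]) -> None:
    """Send through the chosen backend and record which backend and workflow were used."""
    backend.send(phone_number, text)
    log.append(MessageLogEntry(datetime.now(timezone.utc), "outbound",
                               phone_number, text, backend.name, workflow))


log: List[MessageLogEntry] = []
# Two instances of the same backend type, one per gateway account.
account_a = HypotheticalHTTPGatewayBackend("gateway-account-a", {"api_key": "..."})
account_b = HypotheticalHTTPGatewayBackend("gateway-account-b", {"api_key": "..."})
send_and_log(account_a, "+15550100", "Your visit is tomorrow", "reminder", log)
```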

Messaging Dashboards

Charts and other kinds of visualizations are useful for getting a general overview of the data in your system. The dashboards in CommCare display various graphs that depict case, user, and SMS activity over time. These graphs provide visibility into when new cases and users were created, how many SMS messages are being sent daily, and the breakdown of what those messages were used for (reminders, broadcasts, etc.).

Data Processing

All data sent to or from the system will be persisted in a datastore. The conversion of mobile forms or SMS messages into persistent objects is handled by the data processing service. This service saves form submissions and applies any transactional case or ledger updates to the underlying data store.

The data processing service can store any content sent or received via mobile form submissions or SMS services, as long as it adheres to the XForms specification. It also saves all logging and auditing information necessary for HIPAA compliance. The data processing service saves all data at the transactional level so that histories can be audited and reconstructed if necessary.
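
As an illustration of turning a form submission into case updates, the sketch below parses an XML payload and extracts its case blocks. The sample form, its outer namespace, and the helper names are hypothetical, and the case namespace shown is an assumption used for this example; the real processing pipeline does considerably more (validation, ledgers, auditing).

```python
import xml.etree.ElementTree as ET

# Namespace assumed for case blocks in this sample payload.
CASE_NS = "http://commcarehq.org/case/transaction/v2"

SAMPLE_FORM = """\
<data xmlns="http://example.org/hypothetical-form">
  <name>Asha</name>
  <case xmlns="http://commcarehq.org/case/transaction/v2"
        case_id="c1" user_id="u1" date_modified="2015-06-01T10:00:00Z">
    <update><village>Rampur</village></update>
  </case>
</data>
"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_case_updates(form_xml: str) -> list:
    """Return one dict per case block found in a form submission."""
    root = ET.fromstring(form_xml)
    updates = []
    for case in root.iter(f"{{{CASE_NS}}}case"):
        update = case.find(f"{{{CASE_NS}}}update")
        props = {}
        if update is not None:
            props = {local_name(child.tag): (child.text or "") for child in update}
        updates.append({
            "case_id": case.get("case_id"),
            "date_modified": case.get("date_modified"),
            "properties": props,
        })
    return updates


case_updates = extract_case_updates(SAMPLE_FORM)
```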

APIs

APIs[3] provide access to save and retrieve data from CommCare directly. APIs are used both by the mobile application and for integration with other systems.

Code Architecture

HQ

The majority of the code runs inside the HQ server process. This contains all of the data models and services that power the website. These are broken into functional modules, although the module coupling depicted above is meant as a guide and the boundaries between modules are not always fully defined.

Each module is a collection of one or more Django applications that each contain the relevant data models, URL mappings and view controllers, templates, and CouchDB views necessary to provide that module’s functionality.

Analytics Engines

The analytics engines[4][5] are used for offline processing of raw data to generate aggregated values used in reporting and analytics. A suite of components is used, roughly diagrammed below. This offline aggregation and processing is necessary to keep reports fast when they run over huge volumes of data.

Legacy architecture diagram

Change Processors (Pillows)

Change processors (known in the codebase as pillows)[6] are events that trigger when changes are introduced to the database. We have a suite of tools that listen for new database changes and do additional processing based on those changes. These include the analytics engines, as well as secondary search indices and custom report utilities. All change processors run in independent threads in a separate process from the server process, and are powered by CouchDB’s _changes feed[7].
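
The following sketch shows the general shape of a change processor: it walks an ordered stream of changes, handles the documents it cares about, and keeps a checkpoint so that work is not repeated. It is an in-memory illustration, not the pillowtop API; the `ChangeProcessor` class, the feed layout, and the document fields are hypothetical.

```python
from typing import Callable, Dict, Iterable, List


class ChangeProcessor:
    """Consumes an ordered stream of changes and remembers how far it got."""

    def __init__(self, doc_filter: Callable[[dict], bool],
                 handler: Callable[[dict], None]):
        self.doc_filter = doc_filter
        self.handler = handler
        self.checkpoint = 0   # sequence number of the last processed change

    def process(self, changes: Iterable[dict]) -> None:
        for change in changes:
            if change["seq"] <= self.checkpoint:
                continue      # already handled before a restart
            if self.doc_filter(change["doc"]):
                self.handler(change["doc"])
            self.checkpoint = change["seq"]


# Secondary index keyed by case type, built purely from the change stream.
index: Dict[str, List[str]] = {}


def index_case(doc: dict) -> None:
    index.setdefault(doc["type"], []).append(doc["_id"])


feed = [
    {"seq": 1, "doc": {"_id": "c1", "doc_type": "CommCareCase", "type": "participant"}},
    {"seq": 2, "doc": {"_id": "f1", "doc_type": "XFormInstance"}},
    {"seq": 3, "doc": {"_id": "c2", "doc_type": "CommCareCase", "type": "relative"}},
]
processor = ChangeProcessor(lambda d: d.get("doc_type") == "CommCareCase", index_case)
processor.process(feed)
processor.process(feed)   # replaying the feed is a no-op thanks to the checkpoint
```

Keeping the checkpoint separate from the handler is what lets a processor resume after a restart without reprocessing earlier changes.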

Task Queue

The task queue is used for asynchronous work and periodic tasks. Processes that require a long time and significant computational resources to run are put into the task queue for asynchronous processing. These include data exports, bulk edit operations, and email services. In addition, the task queue is used to provide periodic or scheduled functionality, including SMS reminders, scheduled reports, and data forwarding services. The task queue is powered by Celery[8], an open-source, distributed task queueing framework.

Web Apps (Formplayer)

The form player[9] is a standalone process that interacts with the SMS engine and the web-based form player via HTTP APIs. The form player is a Java process that interacts with the JavaRosa[10] libraries to provide the same form engine that is used on the mobile devices.

Code Structure

The following tree shows the top-level structure of the application code.

├── corehq
├── custom
├── docs
├── locale
├── requirements
├── services
└── submodules

These components are summarized in the following table.

Folder          Contents
corehq          Majority of the application code
custom          Modules custom to a subset of projects
docs            Documentation
locale          Translation files
requirements    Code dependencies (external applications and libraries)
services        Deployment configuration files
submodules      Internal code dependencies and standalone modules

Here is a closer look at the corehq folder, with some key folders highlighted:

├── corehq
│   ├── apps
│   │   ├── app_manager
│   │   ├── cloudcare
│   │   ├── domain
│   │   ├── groups
│   │   ├── reminders
│   │   ├── reports
│   │   ├── settings
│   │   ├── sms
│   │   ├── userreports
│   │   ├── users
│   ├── couchapps
│   ├── fluff
│   ├── form_processor
│   ├── pillows
│   └── util

Folder            Contents
apps              Individual applications (with a representative sample highlighted)
couchapps         Applications that are only CouchDB views
fluff             Analytics engine utilities (deprecated - replaced by UCRs)
form_processor    Form and case processing
pillows           Change processing utilities
util              Miscellaneous other utilities

Here are some key submodules:

└── corehq/ex-submodules
    ├── casexml
    ├── couchforms
    ├── dimagi/utils
    └── pillowtop
└── submodules
    ├── commcare-translations
    └── touchforms

Folder                   Contents
casexml                  Case processing and phone integration
couchforms               Form processing
dimagi/utils             Various utilities
pillowtop                Change processing framework
commcare-translations    Translation resources for the mobile app
touchforms               Legacy formplayer

Open Source Components

Code

The CommCare system is built in Python on top of the Django web framework. In addition, a variety of other open-source libraries are used.

Python

Python is a remarkably powerful dynamic programming language that is used in a wide variety of application domains. Python is often compared to Tcl, Perl, Ruby, Scheme or Java.[11]  The language is well known for its readability, and it is widely used in many industries, including the web.  The bulk of the server-side application is written in Python, including many of the other components listed here.  We’re currently using Python 2.7.

Django

Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. It lets you build high-performing, elegant Web applications quickly.[12]  Django is actively developed and widely used in modern web applications.

CommCare HQ uses Django 1.7.  Because CouchDB (our primary database) does not use SQL for queries, we cannot use the standard Django ORM, so we use couchdbkit to communicate with the database.  Components of Django which we do use include the following:

  1. Middleware - used for acting on requests before they hit the rest of the application, and on responses after processing.  This handles CSRF and XSS protection.
  2. URL mapping - routes incoming requests to views.
  3. Views - Django’s “controllers”, these perform most of the logic related to handling an incoming request.  Views accept a request and perform a set of actions that can include reading from or writing to the database, triggering actions, and interacting with external services.  Views return a response, typically a rendered HTML page, which is sent back to the requester.
  4. Templates - Written in a superset of HTML, templates control the appearance of a page.  Views pass information to the templates, which perform little to no logic on their own, and the template is rendered as HTML.  Templates are often nested and reused to avoid duplication, and to allow for a site-wide uniform appearance.
  5. Forms - Django provides a form parent class, which can handle interactions around submission of data to the server.  Django Forms can be rendered to HTML, control server-side validation, and usually dictate actions upon completion of a submission.  They are quite extensible.
  6. Authentication - We use a SQL database to store some information about our users, so that we can use Django’s authentication and sessions system.
  7. Testing - Our test suite relies heavily on Django’s testing framework, which extends Python’s unittest module.  This framework provides tools such as generating fake requests to test out functions and classes intended to operate in conjunction with web requests.
  8. Caching - Django provides bindings to store data for reuse using caching systems such as Redis (see below).
  9. Internationalization - As an international organization, we have users who speak several different languages.  We use Django’s translation module to manage multiple language versions of much of our site.
  10. Development server - commonly used to run a local server to test during development.

Notable Django services we do not use, because of our different database management, include the Django admin interface, model forms, and the generic views which rely on model forms.

Data Storage Layer

CommCare HQ leverages the following databases for its persistence layer.

CouchDB (Cloudant)

CommCare HQ is primarily built on top of CouchDB[13], an open source database designed to be used in web applications. Our cloud environment is built on top of Cloudant[14], a CouchDB-compliant hosted database service designed to scale. CouchDB is the primary data store for all of CommCare HQ. We store almost all of our primary data there, including domains, users, forms, cases, and SMS records.

CouchDB was chosen primarily because it is completely schemaless. All data is stored as JSON documents, and views are written to index into the documents to provide fast map-reduce-style querying. Because much of our data is unstructured, this seemed like a good fit.
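
CouchDB views are normally written as JavaScript map and reduce functions stored in design documents. The sketch below only emulates the idea in Python over a list of JSON-like documents: a map step emits keys for the documents a view cares about, and a reduce step sums the emitted values per key. The function names and document fields are illustrative assumptions.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Doc = Dict[str, Any]
Emitted = Iterator[Tuple[tuple, int]]


def map_forms_by_domain_and_day(doc: Doc) -> Emitted:
    """Map step: emit (key, value) pairs only for documents the view cares about."""
    if doc.get("doc_type") == "XFormInstance":
        yield (doc["domain"], doc["received_on"][:10]), 1


def build_view(docs: Iterable[Doc], map_fn: Callable[[Doc], Emitted]) -> Dict[tuple, int]:
    """Reduce step: sum the emitted values per key."""
    totals: Dict[tuple, int] = defaultdict(int)
    for doc in docs:
        for key, value in map_fn(doc):
            totals[key] += value
    return dict(totals)


docs = [
    {"_id": "f1", "doc_type": "XFormInstance", "domain": "demo",
     "received_on": "2015-06-01T09:00:00Z"},
    {"_id": "f2", "doc_type": "XFormInstance", "domain": "demo",
     "received_on": "2015-06-01T17:30:00Z"},
    {"_id": "c1", "doc_type": "CommCareCase", "domain": "demo"},
]
view = build_view(docs, map_forms_by_domain_and_day)
```

Because documents have no fixed schema, the map function simply ignores any document that lacks the fields it needs.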

In addition, we leverage the CouchDB changes feed heavily for asynchronous processing and post-processing of our data.  This is described further in the “Change Processors (Pillows)” section above.

PostgreSQL

User account information is stored in a standard relational database that is automatically kept in sync with information in CouchDB. This enables seamless integration with the user management of the Django web framework.

Also stored in a relational database, independently from the user data, are tables of indicator aggregations. For a particular reporting need, our reporting framework stores a table where each row contains the relevant indicators aggregated to some minimum interval (such as one day) as well as any values necessary for filtering. Thus any person or system can use standard SQL reporting queries or tools to build quantitative reports capable of drill-down, drill-through, and slice-and-dice operations.
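
A small sketch of such an indicator table, using SQLite from the Python standard library as a stand-in for PostgreSQL. The table name, column names, and sample rows are hypothetical; the point is only that rows aggregated to a daily interval can be rolled up with ordinary SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE visit_indicators (
        domain TEXT, user_id TEXT, day TEXT,
        visits INTEGER, referrals INTEGER
    )
""")
conn.executemany(
    "INSERT INTO visit_indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("demo", "u1", "2015-06-01", 4, 1),
        ("demo", "u2", "2015-06-01", 2, 0),
        ("demo", "u1", "2015-06-02", 3, 2),
    ],
)

# Roll the daily rows up to one row per user for a date range.
rows = conn.execute(
    """
    SELECT user_id, SUM(visits), SUM(referrals)
    FROM visit_indicators
    WHERE domain = ? AND day BETWEEN ? AND ?
    GROUP BY user_id
    ORDER BY user_id
    """,
    ("demo", "2015-06-01", "2015-06-30"),
).fetchall()
```

The extra columns (domain, user, day) are the "values necessary for filtering" mentioned above; drill-down amounts to grouping by a finer set of them.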

PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness.[15]

Elasticsearch

ElasticSearch is a flexible and powerful open source, distributed real-time search and analytics engine for the cloud.[16] We use ElasticSearch for three distinct purposes:

First, we serve portions of our REST API from a read-only copy of our form and case data that is replicated in real time to an ElasticSearch service. Compared with CouchDB, which is known for extremely high data integrity guarantees but does not support ad-hoc querying, ElasticSearch is less focused on integrity and durability (because the master copy of all data is stored in other databases) and more focused on allowing complex queries, such as simultaneously filtering by date ranges, users, domains, and case or form types.
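
A combined filter of this kind might look like the query body built below. This is a sketch, not the query CommCare HQ issues: the field names are assumptions, and the body uses the bool/filter form of the Elasticsearch query DSL, which may differ from the syntax of the Elasticsearch version actually deployed.

```python
from typing import Optional


def build_case_query(domain: str, case_type: str, user_id: Optional[str],
                     start: str, end: str) -> dict:
    """Build a query body that filters by domain, type, date range, and optionally user."""
    filters = [
        {"term": {"domain": domain}},
        {"term": {"type": case_type}},
        {"range": {"modified_on": {"gte": start, "lt": end}}},
    ]
    if user_id is not None:
        filters.append({"term": {"user_id": user_id}})
    return {"query": {"bool": {"filter": filters}}, "size": 20}


body = build_case_query("demo", "participant", "u1", "2015-06-01", "2015-07-01")
```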

Second, we use this same read-only copy of the data to rapidly serve reports that would otherwise consume a large amount of resources from our transactional CouchDB servers. Thus ElasticSearch effectively helps us independently scale reporting and transactional capacity.

Finally, we also aggregate logs into ElasticSearch where they can be examined using open source tools such as Kibana. Thus reports about the health of our system are searchable and may be designed, saved, and shared entirely via ElasticSearch.

Other services

Nginx (proxy)

All traffic to CommCare HQ enters through Nginx, which is installable via the Ubuntu software installer. SSL termination happens at Nginx. Once web traffic reaches Nginx, it is routed to our multiple web worker processes running Gunicorn (see below). The Nginx load balancer determines the routing: it proxies the traffic transparently to the user and balances the load between the web processes.

Redis (cache)

Redis is an open source in-memory key-value store that is used for caching in CommCare HQ. Its primary use is general caching of data that would otherwise require a database query, which speeds up the site. Redis is also used as temporary storage for large binary files, such as cached export files, image dumps, and other large downloads.
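
The general caching pattern described above can be sketched with a small in-memory stand-in for the cache server. The `TTLCache` class and `cached_lookup` helper are hypothetical; in practice the cache would be reached through Django's cache bindings rather than a local dictionary.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Dictionary-backed stand-in for a cache server such as Redis."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            return None     # missing or expired
        return entry[1]

    def set(self, key: str, value: object, timeout: float) -> None:
        self._data[key] = (time.monotonic() + timeout, value)


def cached_lookup(cache: TTLCache, key: str, load: Callable[[], object],
                  timeout: float = 300.0):
    """Return the cached value, falling back to the database loader on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, timeout)
    return value


db_calls = []


def load_user_count():
    db_calls.append(1)      # pretend this is an expensive database query
    return 42


cache = TTLCache()
first = cached_lookup(cache, "demo:user_count", load_user_count)
second = cached_lookup(cache, "demo:user_count", load_user_count)
```

The second lookup is served from the cache, so the loader runs only once.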

RabbitMQ (async task queue)

RabbitMQ (RMQ) is an open source Advanced Message Queuing Protocol (AMQP) compliant server. Our long running, periodic, and computationally expensive backend processes are queued and executed via the AMQP protocol.

A queuing system is vital for running a large data-heavy website in a smooth and predictable manner. Tasks that are known to take a while ought to be queued in a background process and not force a user and their browser to “wait” interminably long for an operation to happen. AMQP and the technologies surrounding it make for a clean, reusable interface to allow developers to create, execute, and retrieve results from these long running tasks.

The Python library that utilizes AMQP and RMQ is Celery, an open source library for asynchronous task queuing. A task can be written in Python code to perform a database operation or generate a report for CommCare HQ. To execute the task, the website transmits a job request to the RabbitMQ queue. Separate worker processes on other dedicated machines receive these task requests by querying the RabbitMQ server for new task requests. Once a worker completes the task, it can notify the frontend of its completion in one of two ways: by sending the requesting user an email with a link to the result, or by using Redis to update the content of a URL the user is viewing to show that the task is completed.
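
The enqueue-and-work flow can be illustrated with the Python standard library alone. This sketch deliberately does not use Celery or RabbitMQ: a `queue.Queue` stands in for the broker, a thread stands in for a worker machine, and a dictionary stands in for the place where results are published. All names are hypothetical.

```python
import queue
import threading

task_queue: "queue.Queue" = queue.Queue()   # stand-in for the message broker
results = {}                                # stand-in for a result store


def worker() -> None:
    """Pull task requests off the queue until a None sentinel arrives."""
    while True:
        job = task_queue.get()
        if job is None:
            task_queue.task_done()
            break
        job_id, func, args = job
        results[job_id] = func(*args)       # run the long task off the request path
        task_queue.task_done()


def build_export(domain: str, rows: int) -> str:
    """Pretend to build a large export and return its file name."""
    return f"{domain}-export-{rows}-rows.csv"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

# The web process only enqueues the request and returns immediately.
task_queue.put(("job-1", build_export, ("demo", 1000)))
task_queue.put(None)
task_queue.join()                           # a real web request would not block here
thread.join()
```

With a real broker, the worker would run on a separate machine and the web process would learn about completion through an email or a cached status, as described above.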

Gunicorn (web processes)

Gunicorn is a pre-fork WSGI HTTP server for Python with good Django integration. It allows us to run a number of worker processes on each worker machine with very little additional setup. We also use a configuration option that allows each worker process to handle multiple requests at a time using the popular event-based concurrency library Gevent. On each worker machine, Gunicorn abstracts the concurrency and exposes our Django application on a single port. After choosing a machine through its load balancer, our proxy can then forward traffic to this machine’s port as if forwarding to a naïve single-threaded implementation such as Django’s built-in “runserver”.
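
Gunicorn reads its settings from a Python file, so the setup described above could be expressed roughly as follows. The values, including the port and worker counts, are illustrative assumptions and not the production configuration.

```python
# gunicorn.conf.py (values are illustrative, not the production settings)
import multiprocessing

bind = "127.0.0.1:9010"                        # single port the proxy forwards to (hypothetical)
workers = multiprocessing.cpu_count() * 2 + 1  # number of worker processes on this machine
worker_class = "gevent"                        # each worker handles many requests concurrently
worker_connections = 100                       # simultaneous connections per gevent worker
timeout = 60                                   # seconds before an unresponsive worker is restarted
```

A file like this is passed to Gunicorn with the `-c` option, for example `gunicorn -c gunicorn.conf.py myproject.wsgi:application`, where the module path is a placeholder.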

Glossary

This is a place for terms that devs use.

deploy/deploying

preindex/preindexing

staging

production


[1] https://confluence.dimagi.com/pages/editpage.action?pageId=5899281 

[2] http://telerivet.com/

[3] https://confluence.dimagi.com/display/commcarepublic/CommCare+HQ+APIs 

[4] https://github.com/dimagi/ctable

[5] https://github.com/dimagi/fluff

[6] https://github.com/dimagi/pillowtop/

[7] http://guide.couchdb.org/draft/notifications.html

[8] http://www.celeryproject.org/

[9] https://github.com/dimagi/formplayer 

[10] https://bitbucket.org/javarosa/javarosa/wiki/Home

[11] http://www.python.org/about/

[12] https://www.djangoproject.com/

[13] http://couchdb.apache.org/

[14] https://cloudant.com/

[15] http://www.postgresql.org/about/

[16] http://www.elasticsearch.org/