Extracting Nutraceutical Monographs from Generic Product Descriptions

Computer Science Honours Project - COMP 4905

Carleton University, Ottawa, ON, Canada

Uri Gorelik

Dr. D. Deugo

December 16th 2016


1 Acknowledgements

I would like to thank Dr. Dwight Deugo for supervising this project. I would also like to thank Fullscript, specifically Chris Wise, for giving me permission to access parts of the platform. Finally, a thank you to contributors of open source, specifically to José Valim, the creator of Elixir; and Chris McCord, the creator of the Phoenix web framework; and a big thank you to all of the maintainers and contributors to these projects. This project would not have been possible without you.


1 Introduction

1.1 Problem

1.2 Motivation

1.3 Goals

1.4 Objectives

1.5 Outline

2 Background

2.1 Fullscript and Natural Medicine

2.2 Elixir

3 Approach

3.1 Fullscript Database Client

3.2 Natural Medicine API

3.2.1 Performing requests

3.2.2 Monograph Lookup

3.2.2.1 Monograph Scanning

3.2.2.2 Creating the Lookup

3.2.3 Text Analysis

3.3 Monograph API

3.3.1 HTTP Server (web/sockets)

3.3.2 HTTP Server (JSON API)

3.4 Mobile Application

3.5 Architecture

4 Results

4.1 HTTP Server

4.2 Mobile Application

4.3 Calculating monographs

4.4 Monograph accuracy

5 Conclusion

5.1 Future Work

6 Appendix

6.1 Monograph Data File

6.2 Text Analysis Resultant Structure

6.3 URL Schemes as Structures

6.3.1 Natural Medicine

6.3.2 Monograph API

6.4 Open source dependencies

6.5 Resources used but not cited in report

2 Introduction

In the modern era, we live in a world where a pharmaceutical exists for any condition. Instead of cold medicine, we have daytime and nighttime cold medicine; instead of headache medicine, we have Advil, Tylenol, and countless others -- all of which come in multiple variants: kids, normal, extra-strength, gel, spray, etc. Drug culture seems to run deep within our society, from some individuals referring to a pharmaceutical by its active ingredient (e.g. ibuprofen - Advil, paracetamol/acetaminophen - Tylenol), to doctors running their own daytime television shows, where they can give advice to the masses who are obsessed with personal health.

Now, health specialists are recommending nutraceuticals. These health specialists (referred to as practitioners from this point) come in many forms: Certified Nutritionists, Chiropractors, Mid-level Practitioners, Naturopathic Doctors, non-licensed Naturopathic Doctors, Nurse Practitioners, Osteopaths, and Registered Dietitians, to name a few. Pharmaceuticals are heavily regulated by the FDA in the United States[1] and by the HPFB in Canada[2]. However, nutraceuticals have a somewhat loose definition, which makes them difficult to regulate. In Canada, nutraceuticals can be marketed either as a food or as a drug[3]. According to the FDA, “...[nutraceuticals] are widely used in the marketplace. Such foods are regulated by FDA … even though they are not specifically defined by law.”[4]

Several companies use these loose regulations as a low barrier to entry to provide platforms for dispensing nutraceuticals. Fullscript, Natural Partners, Emerson Ecologics, and Wellevate are the primary contenders in this space. These companies provide practitioners with services related to the dispensing, bulk purchasing, and selling of nutraceuticals. The practitioners that use these platforms are responsible for creating safe and beneficial treatment plans. There are several tools that can aid a practitioner in creating a beneficial treatment plan, most of which come in the form of online references. One such reference used by many practitioners is the Natural Medicine Comprehensive Database.

This report will detail how integrating the Natural Medicine Comprehensive Database can aid Fullscript customers in writing better, more beneficial treatment plans.

2.1 Problem

Practitioners on the Fullscript platform need to research their treatment plan before allowing a patient to adhere to it. Typically, a practitioner will use a resource like the Natural Medicine Comprehensive Database to perform this research. Practitioners typically do this to prevent creating a deficiency or an unexpected interaction. For instance, alcohol may have a negative interaction with a nutraceutical the practitioner is planning on prescribing. The practitioner needs to be aware of this interaction in order to forward this information to the patient or to look for another, more suitable nutraceutical.

        Currently a Fullscript product/nutraceutical has no equivalent in the Natural Medicine Comprehensive Database. The practitioner is forced to thoroughly read the product description and research the appropriate ingredients in order to avoid an unexpected interaction. In order to programmatically reference the database, monographs must be calculated from the product’s description, which can then be used to reference information from the database.

2.2 Motivation

Being able to calculate the monograph for a Fullscript product will allow Fullscript to fully leverage the Natural Medicine Comprehensive Database. This would save Fullscript’s practitioners countless hours by having the information they need readily available to them. A technique for extracting monographs from a plain text description can also provide functionality similar to that of blog summarizers (i.e. boiling down large amounts of text to the critical information), which can be beneficial for other platforms as well.

        Aside from the immense benefits of providing practitioners with more accurate information so that they can in turn create better treatment plans for their patients, the motivation of this project is also to explore the emerging programming language called Elixir. Elixir has many interesting properties that may make it a good candidate for this kind of problem.

2.3 Goals

This project aims to create a bridge between Fullscript and the Natural Medicine Comprehensive Database. Creating this bridge will aid practitioners in creating better, safer treatment plans. This project also aims to create several interfaces to access this bridge: a standalone Android application, which users can use to query the Fullscript product database and look up interactions between products; and an API interface to allow third parties (and Fullscript itself) to also leverage this bridge. Finally, this project will explore how to parse monographs from a product description.

2.4 Objectives

In order to achieve this bridge, several pieces will have to be created. First, a database adapter written in Elixir must be created to allow the querying of Fullscript products. Second, an API client for the Natural Medicine Comprehensive Database must be created in order to calculate interactions with monographs. Finally, an algorithm must be devised to calculate a monograph.

Once these elements are in place, a singular point of entry needs to be established in order to serve the aggregated data. This will come in the form of an HTTP server which will provide a simple web UI and a JSON API. Finally, an Android application will be built that will allow users to search Fullscript products and view their associated interactions.

2.5 Outline

This report will summarize and explain the system created to solve the mentioned problem. It will analyse the main components of the system, which are the Fullscript Database Client, Natural Medicine API, Monograph Lookup, Monograph Calculation, HTTP Server (web/sockets and JSON), and the Mobile Application. Finally, it will summarize the system, its benefits, what works and what doesn’t, and future improvements.

3 Background

3.1 Fullscript and Natural Medicine

Fullscript has a diligent registration process for its practitioners. Practitioners are required to have some form of official accreditation. As such, a practitioner on the Fullscript platform is typically qualified and well versed in their field. Fullscript will also limit which nutraceuticals are available to a practitioner (for example a Naturopathic Doctor will have access to a different catalog than, say, a Nurse Practitioner). However, it is ultimately the responsibility of the practitioner to make sure they create an effective treatment plan. Some practitioners can spend hours researching the nutraceuticals they recommend along with the patient history to make sure that they are not harming the patient.

As with nutraceuticals, the term practitioner is loosely defined. For the sake of example we will look at a Naturopathic Doctor versus a Naturopath. A Naturopathic Doctor and a Naturopath may seem similar, but are very much different. The former studies in an accredited medical school for a period of three to five years; the latter may receive a certificate from a non-medical affiliation which may or may not be accredited itself. The kinds of treatment plans these two practitioner types will produce can vary greatly; as such, it is in Fullscript’s, and the patient’s, best interest to provide all of its practitioners with tools to prevent them from unknowingly creating an unbeneficial or harmful treatment plan.

A practitioner might, for instance, prescribe products that create a deficiency or an unintended side effect in the patient’s health. For example, if a patient is known to be taking BuSpar (buspirone), there is a known interaction between buspirone and grapefruit[5]. Grapefruit, which is a common ingredient in some nutraceuticals, decreases the body’s ability to metabolise buspirone, which can cause an overdose. Thus nutraceuticals containing any kind of grapefruit compound should be avoided for that specific patient.

Fullscript gives its practitioners access to the manufacturer's full description of the product. These descriptions come in many different formats and are not standardized. For the most part, they contain ingredient lists, directions, and dosages. The typical workflow of a diligent practitioner would be to thoroughly read the nutraceutical’s description, pick out any key terms they might find useful, and then begin researching using a resource like the Natural Medicine Comprehensive Database. Once the practitioner is satisfied with their research, they may choose to add that nutraceutical to the treatment plan or to look for alternatives.

Currently, Fullscript and Natural Medicine are two separate entities that require a “translation” between them. Fullscript deals in products/nutraceuticals and Natural Medicine deals with monographs. A monograph is a common representation of an ingredient. For example, the ingredients vitamin D, Calciferol, Paracalcin, and Ergocalciferol all share the same monograph (vitamin D). As such, if a product contains the term Calciferol, which the practitioner is unfamiliar with, they will have to research it to find out that it is actually vitamin D. To further illustrate the problem of “translation”, the Fullscript product Co-Enzyme B Complex has no equivalent in the Natural Medicine Comprehensive Database. It is up to the practitioner to look at the description of Co-Enzyme B Complex and research the ingredients found in the description using the database.

In order to calculate an interaction, Natural Medicine requires a monograph to be provided. Currently, Fullscript’s database does not contain any monographs associated with a product. If a monograph can be determined for a Fullscript product, it can be used to leverage the Natural Medicine Comprehensive Database. Using this translation, Fullscript would be able to programmatically provide better descriptions and interactions for its products, thus allowing practitioners to create better treatment plans.

3.2 Elixir

Elixir is a functional programming language, created by José Valim[6], that compiles to Erlang bytecode and runs on the Erlang virtual machine. Erlang itself was created in 1986[7] as an internal language for the company Ericsson. It was primarily used for telecom problems that demanded robust infrastructure and performance. The creators of Erlang also created OTP, or the Open Telecom Platform, which is a platform for building concurrent and distributed software.

        As Elixir compiles to Erlang bytecode, it leverages Erlang’s entire standard library as well as OTP (which is part of most Erlang distributions). Elixir is a modern language; its syntax is similar to Ruby’s, but it borrows many great aspects from many different languages. Elixir is typically described as a functional language, but it’s more accurately labeled as a distributed language. Elixir is distributed first, meaning it's also an ideal language for writing concurrent software. Some modern languages like Go provide a clean interface to run code in parallel but lack a mature framework for writing concurrent or distributed software.

        Elixir is also very popular in the web-space as a lot of its contributors and sponsors are former Ruby on Rails developers[8].

        This project will discuss the benefits of using Elixir, specifically the use of concurrency, but it will not explore distributed software.

4 Approach

The final system is composed of several critical pieces: a component that can query for Fullscript products with their description and name; an API client that communicates with the Natural Medicine API over HTTP/1.1; a web UI that can search products and calculate monographs; a JSON API; and an Android application. System components aside, this section will also detail how to look up monographs, how to analyse a product description for monographs, and finally give an architecture overview of how the components and concepts fit together.

4.1 Fullscript Database Client

The Fullscript Database Client is the simplest component in the system, because the system has a local copy of the product table. This allows the system to simply query the local database instead of using an HTTP API. This component was built in Elixir using a library called Ecto. Ecto is an open source project primarily maintained by the creator of Elixir (José Valim). Ecto provides much of the same functionality as a typical object-relational mapper (ORM); that is, it will take a database row and map it to an Elixir data structure -- in this case, a named structure (struct).

        To use Ecto, a schema must be defined for every table. This schema essentially tells Ecto which columns should be present in the struct. A Fullscript product has 40 fields, which provide information for: checking out; displaying the product nicely; default values associated with creating a treatment plan; metadata (e.g. heat sensitivity); warnings; etc. These fields are invaluable in the Fullscript platform, but for this system, most of them are unnecessary. By using Ecto, one can create multiple schemas for accessing a single table. For instance, a schema for administrators can be created that will give access to most columns, or perhaps a schema for reporting can be made to only access the relevant columns for that task. In the case of monograph lookups, the schema needs very little: ID, name, and description.

        Ecto also provides a query DSL (domain specific language) that allows for the creation of database queries. For example, a typical SQL query to retrieve the most recently created products can be found below.

SELECT * FROM products ORDER BY created_at DESC LIMIT 10;

And here is the equivalent in Elixir using Ecto’s query DSL:

import Ecto.Query
from Product, order_by: [desc: :created_at], limit: 10

        The query DSL (which requires a schema to be defined) will return Elixir structs with only the predefined, sanctioned columns. This is in contrast with the SQL statement which, if executed, will return every column of the product table. In short, the DSL provides the convenience of using Elixir to make queries and the peace of mind of knowing that only sanctioned database operations will be performed.

        Finally, using an Elixir package to encapsulate all database access to Fullscript products ensures that data will never be written or modified. It also ensures that sensitive columns will never be accessible and that custom queries cannot be performed, as they must be predefined in the package.

4.2 Natural Medicine API

The Natural Medicine API is defined as another Elixir package and can be used independently of the Fullscript Database Client. The API requires an API key which consists of a 40 character hexadecimal string. This is required for security reasons, as the API is not publicly available. The key itself is stored outside of version control and loaded as an environment variable.

        This package uses two open source libraries: Poison and HTTPoison. The former is a JSON parser and the latter is an HTTP client.

        This package has three main uses: creating requests to the API, looking up monographs, and parsing text from monographs.

4.2.1 Performing requests

There are three modules associated with creating requests: Request, Monograph, and Interaction. The Request module serves as the basis for the latter two and will be the focus of this section. The Request module also performs tasks like setting the request headers (Content-Type, Accept, and the API key) and holds a common function Request.request/2[9] to actually perform the requests.

Another key function in this module is Request.transform/2. This function recursively iterates over the keys and values of a JSON response and can perform transformations. This is useful as the Natural Medicine API returns multi-word keys with a hyphen instead of an underscore (“monograph-id” as opposed to “monograph_id”). This does not break any of JSON’s syntax rules, but by modern conventions, hyphens are typically avoided in order to support dot-syntax access. For example in JavaScript, a JSON object can be used in the following way:

payload = {"first_name": "Bobby", "last_name": "Braves"}
payload.first_name
// "Bobby"
payload.last_name
// "Braves"
payload.address
// undefined

Unlike languages such as Ruby and JavaScript, which return their null value when a missing key is accessed, Elixir will raise a runtime exception:

payload = %{first_name: "Bobby", last_name: "Braves"} # %{} is notation for a map
payload.first_name
# "Bobby"
payload.last_name
# "Braves"
payload.address
# Raise an error: ** (KeyError) key :address not found…

This is a desired behaviour, as it raises a clear error message when a missing field is accessed. In addition to raising a runtime error when a missing field is accessed, Elixir will also raise a compile-time error if a struct is built or pattern matched with a field that is not explicitly defined. Silently returning a null value when an undefined or null field is accessed is a common problem in interpreted languages such as Ruby or JavaScript and the source of many null exception errors.

Finally, the Monograph and Interaction modules define their own parameter structs (Monograph.Params and Interaction.Params) to be used with the Request.request/2 function. These modules are essentially responsible for anything that cannot be generalized in the Request module. For instance, the Monograph module contains the path for the monographs endpoint[10] as well as the parameters that the endpoint expects. All of this is accomplished with very few lines of code, leading to very legible, simple, and maintainable code.
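The recursive key transformation described for Request.transform/2 can be illustrated with a minimal sketch. This is not the project's actual implementation: the module name KeyTransformSketch and the helper underscore/1 are hypothetical, and the sketch assumes the response has already been decoded into plain Elixir maps and lists.

```elixir
defmodule KeyTransformSketch do
  # Recursively walk a decoded JSON value, applying `fun` to every map key.
  def transform(value, fun) when is_map(value) do
    Map.new(value, fn {key, val} -> {fun.(key), transform(val, fun)} end)
  end

  # Lists are traversed so that maps nested inside them are transformed too.
  def transform(value, fun) when is_list(value) do
    Enum.map(value, &transform(&1, fun))
  end

  # Scalars (strings, numbers, booleans, nil) are returned untouched.
  def transform(value, _fun), do: value

  # Hypothetical key function: "monograph-id" becomes "monograph_id".
  def underscore(key) when is_binary(key), do: String.replace(key, "-", "_")
end

response = %{
  "monograph-id" => 42,
  "alternate-names" => [%{"display-name" => "Calciferol"}]
}

cleaned = KeyTransformSketch.transform(response, &KeyTransformSketch.underscore/1)
```

Passing the key function as an argument keeps the traversal generic, so the same walk could apply any other key or naming convention.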

4.2.2 Monograph Lookup

The Natural Medicine API can look up details for a monograph via the monograph’s name[11] or internal identifier (ID). However, this means that queries to the API must provide either the monograph name or the ID, meaning this system must be the entity that determines the monographs. In order for the system to use the API, it must know about all possible monographs, and eventually reduce[12] the list of all monographs to only the ones related to the product in question.

        Luckily, the Natural Medicine API allows querying for many monographs using pagination[13]. By iterating through this endpoint and changing the page parameter, the system can eventually walk over all known monographs. However, the system cannot be expected to iterate over the API every time a monograph calculation has to be made, performing several hundred requests for a single product calculation. Thus, the list of all known monographs must be cached within the system.

        To allow the system to obtain and use all known monographs, the system performs two steps: monograph scanning, and creating the lookup.

4.2.2.1 Monograph Scanning

As mentioned previously, it is required to make several HTTP requests to the Natural Medicine API in order to iterate over the entire collection of monographs. As such, this package (Natural Medicine API) has a script component that performs this iteration. In order to respect the Natural Medicine servers, the limit parameter was set to 50 (thus only asking for 50 monographs) and an artificial delay of six seconds was added between requests.
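The scan loop can be sketched as follows. This is only an illustration under stated assumptions: fetch_page is a hypothetical function standing in for the HTTP request to the paginated endpoint, and the sketch assumes the endpoint returns an empty page once all monographs have been served, which the report does not confirm.

```elixir
defmodule MonographScanSketch do
  @limit 50        # page size used in the report
  @delay_ms 6_000  # six-second pause between requests

  # `fetch_page` is a hypothetical function (page, limit) -> list of monographs.
  # Assumption: an empty list signals that there are no more pages.
  def scan(fetch_page, delay_ms \\ @delay_ms) do
    1
    |> Stream.unfold(fn page ->
      case fetch_page.(page, @limit) do
        [] ->
          nil

        monographs ->
          # Be polite to the remote server before asking for the next page.
          Process.sleep(delay_ms)
          {monographs, page + 1}
      end
    end)
    |> Enum.concat()
  end
end

# Usage with an in-memory stand-in for the API (no delay, for demonstration).
fake_pages = %{1 => ["vitamin d", "zinc"], 2 => ["iron"]}
fake_fetch = fn page, _limit -> Map.get(fake_pages, page, []) end
all_monographs = MonographScanSketch.scan(fake_fetch, 0)
```

Injecting the fetch function keeps the iteration logic testable without touching the real servers.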

        A monograph response contains many fields; however, for the system, only three are needed: name, alternate-names, and scientific-names. The name field represents the definitive name of the monograph (i.e. all aliases will be calculated to this name). The alternate-names field is a list of other names (colloquial and more common names) that are synonyms for the primary name. Finally, the scientific-names field represents the scientific nomenclature for the monograph. These fields are written to a monograph data file[14]. The file is named after the monograph (vitamin_d.txt) and contains the name of the monograph as the first line, followed by the alternate names on several lines, and finally followed by the scientific names on several more lines. These data files are stored in the Natural Medicine Elixir package, but are not under version control.

4.2.2.2 Creating the Lookup

Once there exists a list of all possible monographs, Elixir’s macro system becomes extremely useful. Elixir is essentially compiled twice: once into AST (abstract syntax tree) form and then into Erlang bytecode. The macro system allows the user to hook into the AST and make any modifications before it is compiled to bytecode. This is commonly referred to as metaprogramming.

        Using the monograph data files from the scan, the system now has the ability to create a cache or a lookup table for all monographs. One solution for creating a lookup is to distribute these data files with the system and, upon initialization, to read these files into a global data structure. However, this technique presents the downside of longer initialization times. There are approximately 1400 monograph data files and processing them takes over 60 seconds. This would mean that every time the system is initialized it would take over a minute before it can start processing data. This approach, while it may not seem like an obvious detriment for one system, will in fact impact any other Elixir system that chooses to make the Natural Medicine API a dependency.

        By using the macro system, the monograph data files can be compiled directly into bytecode. This will increase the compile time but will have no impact on initialization. In order to accomplish this, the MonographLookup module processes the data files and dynamically creates lookup functions in the AST compilation phase. Elixir uses pattern matching, which means functions can be “overloaded” not just by arity, but also by specific arguments (similar to creating facts in the Prolog programming language). That is to say, during the first compilation phase, hundreds of functions (or facts) will be created in the MonographLookup module. The following is an example of what the functions would have looked like if they were written by hand:

# ...

def lookup("vitamin d"), do: "vitamin d"
def lookup("dihydrotachysterol 2"), do: "vitamin d"
def lookup("dichysterol"), do: "vitamin d"
def lookup("Vitamine D3"), do: "vitamin d"

# ...

And an example of their usage (with a comment representing the return value):

lookup("the") #=> nil
lookup("dog") #=> nil
lookup("vitamin d") #=> "vitamin d"
lookup("dichysterol") #=> "vitamin d"

Finally, a lookup function with a catch-all parameter is created to catch all other cases (seen above when passing the term “dog” to the lookup).
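A minimal sketch of how such clauses can be generated at compile time is shown here. It is not the project's MonographLookup module: the module name is hypothetical, and an inline list of sample data stands in for the monograph data files, which the real module would read while compiling.

```elixir
defmodule MonographLookupSketch do
  # Sample stand-in for the parsed data files: {definitive name, aliases}.
  @monographs [
    {"vitamin d", ["vitamin d", "calciferol", "dihydrotachysterol 2", "dichysterol"]},
    {"grapefruit", ["grapefruit"]}
  ]

  # This comprehension runs while the module is being compiled and
  # emits one lookup/1 clause per alias, each returning the definitive name.
  for {name, aliases} <- @monographs, term <- aliases do
    def lookup(unquote(term)), do: unquote(name)
  end

  # Catch-all clause for every term that is not a known alias.
  def lookup(_term), do: nil
end
```

If the data were read from files inside the module body, the @external_resource module attribute could be used to tell the compiler to recompile the module when a file changes.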

4.2.3 Text Analysis

Now that the system can test whether a term[15] is a monograph, it still needs an algorithm for parsing product descriptions and using the lookup functions. The algorithm simply iterates over every word in a product description and attempts a monograph lookup. However, many monographs are multiple words separated by whitespace or punctuation. Using this simple algorithm, common monographs such as vitamin d will never be found. To solve this problem, the algorithm chunks a product’s description into different window sizes. A window of size n will produce a term with n words. That is to say, the algorithm will iterate over the description with a window size of one, then two, then three, and finally four.

Given the example string “the quick brown fox jumps over the lazy dog”, the algorithm chunks it into the following windows:

["the", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]

Window size 1

["the quick", "quick brown", "brown fox", "fox jumps", "jumps over", "over the",
"the lazy", "lazy dog"]

Window size 2

["the quick brown", "quick brown fox", "brown fox jumps", "fox jumps over",
"jumps over the", "over the lazy", "the lazy dog"]

Window size 3

["the quick brown fox", "quick brown fox jumps", "brown fox jumps over",
"fox jumps over the", "jumps over the lazy", "over the lazy dog"]

Window size 4

Terms in all windows are passed to the monograph lookup which essentially[16] produces a list of empty values followed by monographs. The empty values are filtered out and the remaining monographs are reduced to a name and a hit-score, representing how many times those monographs were seen in the description. Finally, the results are sorted by hit-score and the top three are selected.
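The windowing, lookup, and scoring steps can be sketched as below. This is an illustrative sketch rather than the project's code: the module name, the tokenization rule (lowercasing and splitting on anything that is not a letter or digit), and the anonymous lookup function are assumptions made for the example.

```elixir
defmodule TextAnalysisSketch do
  @window_sizes 1..4

  # `lookup` is any function term -> monograph name | nil.
  def monographs(description, lookup, top \\ 3) do
    # Assumed tokenization: lowercase, split on non-alphanumeric characters.
    words =
      description
      |> String.downcase()
      |> String.split(~r/[^\p{L}\p{N}]+/u, trim: true)

    @window_sizes
    # Slide a window of each size over the words, one word at a time.
    |> Enum.flat_map(fn size -> Enum.chunk_every(words, size, 1, :discard) end)
    |> Enum.map(fn window -> lookup.(Enum.join(window, " ")) end)
    # Drop the empty values, then count hits per monograph.
    |> Enum.reject(&is_nil/1)
    |> Enum.frequencies()
    |> Enum.sort_by(fn {_name, hits} -> hits end, :desc)
    |> Enum.take(top)
  end
end

# Hypothetical lookup covering only the terms used in this example.
lookup = fn
  "vitamin d" -> "vitamin d"
  "calciferol" -> "vitamin d"
  "calcium" -> "calcium"
  _ -> nil
end

result =
  TextAnalysisSketch.monographs("Calcium 280 mg, Vitamin D (as Calciferol) 10 mcg", lookup)
#=> [{"vitamin d", 2}, {"calcium", 1}]
```

In this example vitamin d receives two hits because the ingredient appears under two names, once as a two-word window and once as a single word.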

4.3 Monograph API

A new package, labeled Monograph API, was created for the system with the purpose of acting as a server for any lookup requests. This package depends on the previously mentioned Fullscript Database Client and Natural Medicine API packages, and on an external library called Phoenix. Phoenix is a web framework similar to Rails, Django, or Spring. This package has two main subcomponents: the web UI and the JSON API.

4.3.1 HTTP Server (web UI)

The HTTP web UI serves as a debugging platform to easily compare product descriptions with the calculated monographs. The server only has one HTTP endpoint, /products, which is used for searching Fullscript products. The user interface is quite simple; it contains the following components (see Figure 3.1 and Figure 3.2): a search bar for searching products; a search button to initiate the search; a view description button which toggles the display of the product’s description; and a monographs button which fires a calculation message via WebSockets.

Figure 3.1: Depicting the search interface for products as well as the calculate all button.

Figure 3.2: Depicting a scrolled down view, with an expanded description as well as the calculated monographs.

The monograph calculation is performed via a WebSocket[17] connection. The “Monographs” buttons are bound using JavaScript to send a request through the WebSocket. The WebSocket connection is provided through an abstraction in Phoenix called channels. Using channels, the monograph calculation does not need to perform an entire request and a new page does not have to be loaded. The channel also uses a custom caching system implemented using an Elixir GenServer[18]. A GenServer (meaning “generic server”) is an abstraction provided in OTP (and thus is available in Elixir’s core). It is an abstraction that can hold state and that sends and receives messages. Once the channel receives a monograph calculation request, it forwards the message to the MonographCache module (backed by a GenServer), which in turn checks if it has an existing calculation for the product in question or if it must perform a new calculation. By forcing the calculation request to go through a channel and then a cache, the calculations can be quickly and efficiently performed if needed.
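A GenServer-backed cache of this kind can be sketched as follows. This is a simplified illustration, not the project's MonographCache module: the module name and the injected calculate function are hypothetical, and the sketch performs the calculation inside the server process, which serializes requests.

```elixir
defmodule MonographCacheSketch do
  use GenServer

  # `calculate` is a hypothetical function product_id -> list of monographs.
  def start_link(calculate) do
    GenServer.start_link(__MODULE__, calculate)
  end

  # Synchronous request: returns cached monographs or calculates them once.
  def monographs(cache, product_id) do
    GenServer.call(cache, {:monographs, product_id})
  end

  @impl true
  def init(calculate) do
    {:ok, %{calculate: calculate, results: %{}}}
  end

  @impl true
  def handle_call({:monographs, product_id}, _from, state) do
    case Map.fetch(state.results, product_id) do
      {:ok, cached} ->
        # Cache hit: reply without recalculating.
        {:reply, cached, state}

      :error ->
        # Cache miss: calculate, remember the result, then reply.
        result = state.calculate.(product_id)
        {:reply, result, %{state | results: Map.put(state.results, product_id, result)}}
    end
  end
end

{:ok, cache} = MonographCacheSketch.start_link(fn _product_id -> ["vitamin d"] end)
first = MonographCacheSketch.monographs(cache, 101)   # calculated
second = MonographCacheSketch.monographs(cache, 101)  # served from the cache
```

Keeping the results in the server's state means every channel process shares the same cache simply by sending it messages.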

4.3.2 HTTP Server (JSON API)

The JSON API portion of the Monograph API package serves as the backend for the mobile application, as well as any future integrations. The JSON API provides two major endpoints: product search and interaction checking. The product search endpoint returns a list of products based on a search query. The interaction endpoint accepts a series of product IDs and returns ingredient interactions for the products. The interactions endpoint must also perform a monograph calculation. The lifecycle of the request can be seen in the following diagram (Figure 3.3):

Figure 3.3: Interaction request lifecycle.

4.4 Mobile Application

The mobile application is written for Android. It supports a minimum SDK level of 15. The application has no extra dependencies and uses native Android classes to accomplish all of its tasks. The application is composed of two screens: the product screen and the interaction screen. The product screen is implemented using an Android ListView. The interaction results screen is also implemented via a ListView, albeit with custom-crafted items. The application has a direct dependency on the Monograph API JSON server.

4.5 Architecture

The system is composed of four main components (Figure 3.4): the Android application, which can be substituted for a generic third party application; the HTTP server, which serves for reporting, viewing results, analysing the system, and providing the JSON API; the Natural Medicine API package; and the Fullscript Database Client package. The Monograph API package acts as a central process which communicates with all other components.

Figure 3.4: System architecture

5 Results

5.1 HTTP Server

The web UI performs as expected; it produces results quickly and is useful for viewing the results of the monograph calculation algorithm. It accomplishes this by handling multiple monograph calculation requests in parallel. Upon testing, pressing the “Calculate All!” button on a set of 50 products performs all 50 calculations almost instantly. Increasing the set to 200 products still yields all 200 calculations in just over one second. Beyond 200 products a slowdown is noticeable. This, however, is due to JavaScript blocking and sequentially firing the lookup events to the channel. As soon as the JavaScript is complete, the monographs appear almost instantly.

        The JSON API portion of the server also performs well. It is able to deliver 5000 products (or 4.8 MB of data) in 451 milliseconds[19]. For a more traditional request of 50 products, it is able to deliver the payload in an average of 47 milliseconds.

5.2 Mobile Application

The mobile application, unlike the web UI, focuses on interactions and not monographs. The application uses ListViews to display both products and interactions. The products ListView simply displays the product name along with its ID. Pressing on a row will launch a new screen that will show that product’s description. Long pressing on a row will change the list into a multi-selection mode. Once the user has made their selection, the CALCULATE menu item will launch a new activity which queries the Monograph API server to calculate interactions for the selected products. Figure 4.1 depicts the ListView of products and Figure 4.2 depicts an example of selecting multiple products for interaction calculation.

Screenshot_1481906843.png

Screenshot_1481908396.png

Figure 4.1: The list of products, with search functionality.

Figure 4.2: Selecting multiple products for interaction calculation after performing a search for “grapefruit”.

Interactions have four components: the title of the interaction (or summary), the likelihood of it occurring, the severity of the interaction, and a detailed description of the interaction (see Figure 4.3). Likelihood is one of unlikely, possible, probable, or likely; severity is one of mild, moderate, or high.

        All interactions are calculated globally rather than between the selected products. As a result, the application also shows interactions with products outside of the Fullscript ecosystem (e.g. pharmaceuticals).

Screenshot_1481908179.png

Figure 4.3: The interface for viewing interactions.

5.3 Calculating monographs

The technique of using a window to parse a product’s description and look up monographs proves to be quite successful (discussed below). It also proves to be quite fast. Because the algorithm performs approximately 4n + 6 lookups for a description of n words, and each lookup takes O(1) time, the calculation algorithm runs in O(n) time.
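The linear growth in the number of lookups can be checked with a small sketch. It assumes a plain sliding window over whitespace-separated words with window sizes 4 down to 1, and a hypothetical two-entry lookup map; the module name WindowLookup is also an assumption, and the project's own scanning may split terms differently. Under these assumptions the number of lookups is exactly 4n - 6 for n >= 4, which grows linearly like the approximate count quoted above. The sketch assumes Elixir 1.5 or later for Enum.chunk_every/4.

```elixir
defmodule WindowLookup do
  @sizes [4, 3, 2, 1]

  # Returns every run of `size` consecutive words, joined by spaces.
  def windows(words, size) do
    words
    |> Enum.chunk_every(size, 1, :discard)
    |> Enum.map(&Enum.join(&1, " "))
  end

  # Looks up every window of every size in the `lookup` map.
  # Returns {hits, number_of_lookups}.
  def scan(text, lookup) do
    words = text |> String.downcase() |> String.split()
    candidates = Enum.flat_map(@sizes, &windows(words, &1))
    hits = for term <- candidates, Map.has_key?(lookup, term), do: {term, lookup[term]}
    {hits, length(candidates)}
  end
end

# Hypothetical lookup with two names for the same monograph.
lookup = %{
  "thiamine hcl" => "thiamine (vitamin b1)",
  "thiamine" => "thiamine (vitamin b1)"
}

{hits, count} = WindowLookup.scan("B-1 Thiamine HCL 250 mg 100 vtabs", lookup)
IO.inspect(hits)
IO.inspect(count)
```

For this seven-word description the sketch performs 4 + 5 + 6 + 7 = 22 lookups, which equals 4 * 7 - 6.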

However, while most results appear to be positive, there are two problems with the monograph calculation. The first is that it does not account for negation. Some product descriptions explicitly say “does not include…” followed by a list of ingredients; a monograph calculation on such a description will produce monographs that are explicitly stated not to be present in the product. The second problem is that the monograph calculation does not account for dosages. For example, a product description might read “Calcium 280 mg”, but if another monograph with a lower dosage amount is mentioned more frequently, then that monograph will be weighted more heavily. The weighting is further skewed because an ingredient is occasionally mentioned by multiple names. For example, the description “Vitamin E (as d-Alpha Tocopheryl) 10mg” will create two hits for vitamin E.

The negation and duplicate-name problems can be solved by windowing on sentences instead of the entire text. The algorithm can then apply rules such as: count a monograph only once per sentence (ignoring multiple occurrences), and check whether the sentence contains a negation (e.g. “does not contain,” “has no,” “no”). The problem of dosage amounts is more difficult, as monographs have different standard dosages. For example, if calcium has a standard dosage of 1000 mg and vitamin D has a standard dosage of 200 mg, calcium should not be weighted higher simply because its amount is larger. However, the standard dosage amount is not available within the system, nor is it available in the Natural Medicine Comprehensive Database.
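The sentence-level rules can be sketched as follows. This is an illustration under stated assumptions and not the project's implementation: the module name SentenceRules, the short list of negation cues, and the sample lookup map are hypothetical, and plain substring matching stands in for the window lookup to keep the example short. It assumes Elixir 1.10 or later for Enum.frequencies/1.

```elixir
defmodule SentenceRules do
  # Illustrative negation cues; a real list would need to be longer.
  @negations ["does not contain", "does not include", "has no"]

  # Splits text into sentences after ".", "!" or "?" followed by whitespace.
  def sentences(text) do
    String.split(text, ~r/(?<=[.!?])\s+/, trim: true)
  end

  # True when the sentence contains one of the negation cues.
  def negated?(sentence) do
    lowered = String.downcase(sentence)
    Enum.any?(@negations, &String.contains?(lowered, &1))
  end

  # Counts each monograph at most once per non-negated sentence.
  # `lookup` maps a lowercase ingredient name to its monograph.
  def count(text, lookup) do
    text
    |> sentences()
    |> Enum.reject(&negated?/1)
    |> Enum.flat_map(fn sentence ->
      lowered = String.downcase(sentence)

      lookup
      |> Enum.filter(fn {name, _monograph} -> String.contains?(lowered, name) end)
      |> Enum.map(fn {_name, monograph} -> monograph end)
      |> Enum.uniq()
    end)
    |> Enum.frequencies()
  end
end

# Hypothetical lookup: two names resolve to the same monograph.
lookup = %{
  "vitamin e" => "vitamin e",
  "d-alpha tocopheryl" => "vitamin e",
  "calcium" => "calcium",
  "soy" => "soy"
}

text = "Vitamin E (as d-Alpha Tocopheryl) 10mg. Calcium 280 mg. Does not contain soy or wheat."
counts = SentenceRules.count(text, lookup)
IO.inspect(counts)
```

In this example vitamin E is counted once even though two of its names appear in the first sentence, and soy is not counted at all because it only appears in a negated sentence.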

5.4 Monograph accuracy

To determine whether the monograph calculation was accurate, monographs were calculated for 15 random products (Table 4.1) and ten of the most popular products (Table 4.2). Accuracy is rated on a five-point scale: very low, low, moderate, high, and very high. The rating was determined by comparing the product’s name and description with its three most frequent monographs.

Figure 4.4 depicts the accuracy distribution of the monograph calculations. Only two products produced a low or very low result, and the majority of calculations rated as high. The very low, low, and moderate results can be improved by altering the lookup algorithm and are not a limitation of the system. Out of the 25 selected products, one (TMG 50 gms) produced a “negative” result: its top monographs were derived from a negation, that is, the description stated that they were not present as ingredients in the product.

Product name | Top 3 Monographs | Accuracy
Circulatory Tonic | burdock, organic food, hawthorn | high
Red Root 4 oz[20] | bloodroot, organic food, new jersey tea | high
TMG 50 gms[21] | betaine hydrochloride, glycine | very low
Devil's Claw 500 mg[22] | devil's claw, soy, wheat bran | moderate
Trehalose Complex | pectin, ribose, wheat bran | high
Eskimo® PurEFA™ 1000 mg | lecithin, glycerol, epa (eicosapentaenoic acid) | high
B-50 Complex 50mg 100 tabs | choline, vitamin b12, thiamine (vitamin b1) | high
Vitamin A 10,000 IU[23] | vitamin a, cod liver oil, wheat bran | very high
HistaEze - CA ONLY | potassium, stinging nettle, vitamin c (ascorbic acid) | high
Balancing Gel Cleanser | sage, willow bark, salvia divinorum | high
Energy Formula Pro | stinging nettle, ginseng, panax, ginkgo | very high
Lecithin 1200 mg | lecithin, soy, inositol | very high
Genoma EQ[24] | saw palmetto, stinging nettle, magnesium | high
Middle Mover 2 oz | bitter orange, pinellia ternata, sweet orange | high
Ultra Preventive® III w Iron[25] | vitamin e, diosmin, pantothenic acid (vitamin b5) | low

Table 4.1: 15 random products


Product name | Top 3 Monographs | Accuracy
Cortisol Manager™[26] | magnolia, ashwagandha, theanine | moderate
Ther-Biotic Complete[27] | lactobacillus, bifidobacteria, peanut oil | very high
O.N.E. Multivitamin | vitamin c (ascorbic acid), vitamin a, pyridoxine (vitamin b6) | moderate
Active B Complex | vitamin b12, pyridoxine (vitamin b6), thiamine (vitamin b1) | very high
D-5,000 | vitamin d, silicon, branched-chain amino acids | high
Vitamin D/K2 Liquid 1 oz | vitamin k, vitamin d, vitamin e | high
MethylGuard Plus | vitamin b12, pyridoxine (vitamin b6), calcium | high
Basic Nutrients 2/Day | vitamin e, calcium, vitamin b12 | high
Basic Prenatal | pyridoxine (vitamin b6), folic acid, calcium | moderate
B-Complex Plus | pyridoxine (vitamin b6), pantothenic acid (vitamin b5), vitamin c (ascorbic acid) | moderate

Table 4.2: Top 10 products on Fullscript

Figure 4.4: Monograph accuracy distribution
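The distribution can be reproduced by tallying the accuracy ratings listed in Tables 4.1 and 4.2. The snippet below is a small sketch that transcribes those ratings in table order and counts them; the variable names are illustrative, and it assumes Elixir 1.10 or later for Enum.frequencies/1.

```elixir
# Accuracy ratings transcribed from Table 4.1 (15 random products).
random = ~w(high high very_low moderate high high high very_high high high
            very_high very_high high high low)

# Accuracy ratings transcribed from Table 4.2 (top 10 products).
popular = ~w(moderate very_high moderate very_high high high high high
             moderate moderate)

# Number of products per rating across all 25 products.
tally = Enum.frequencies(random ++ popular)
IO.inspect(tally)
```

The tally gives 13 high, 5 very high, 5 moderate, 1 low, and 1 very low rating, which matches the statement that only two products rated low or very low and that the majority rated high.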

6 Conclusion

The completed system accomplishes its main goals. It creates a bridge between Fullscript and Natural Medicine, an API layer to hook into, and a proof-of-concept application. Most importantly, it provides the ability to create safer and more beneficial treatment plans. Figure 4.3 shows the interaction between grapefruit and BuSpar. This interaction between a Fullscript product and a pharmaceutical was not determinable before the completion of this system.

The motivation for creating this system was largely influenced by the Elixir programming language. It was predicted that Elixir would be a good candidate for this problem, which proved to be true. From its package management and dependency system, to its ability to create robust systems, to the ease with which it creates concurrent processes, Elixir proved to be an invaluable tool in the creation of this system.

The system is by no means production ready, but it serves its main purpose as a proof of concept, paving the way for a production level application.

6.1 Future Work

There are many improvements that can be made to the system: improving the monograph calculation algorithm (discussed previously), improving the mobile application interface, and improving the Monograph API’s functionality and responses.

        On the subject of improving the mobile UI, several changes can be made. For instance, selecting products is only possible within a single search. That is to say, it is impossible to search for a product, select it, and then perform another search. This is a limitation of the mobile UI only, however, and not of the system itself. Improvements can also be made to the interactions view. Currently, it only displays a list of interactions with no direct link to a monograph or product, making it difficult to attribute an interaction to a single product when multiple products are selected.

        On the subject of improving the server, again, several improvements can be made. The server currently does not perform any kind of authentication and is thus completely open. Ideally, all requests would be validated with a Fullscript API token (if using Fullscript products) and a Natural Medicine token (as the one used in this system would not be valid for other parties). Basic security improvements can also be made by serving the API over an SSL connection.

        The Natural Medicine API is limited when it comes to checking multiple monographs against one another for interactions. As such, it would be desirable for this system to provide that functionality. The calculation is possible by requesting all interactions for each product and then determining whether any monographs of the other products are found in an interaction’s description. The result can then be filtered down to the interactions that relate the selected products (i.e. product A’s monographs are present in product B’s interaction descriptions or vice versa).
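The proposed filtering can be sketched as follows. This is future work in the report, so nothing here reflects existing code: the module name CrossInteractions, the map shapes (:monographs, :interactions, :title, :description), and the placeholder descriptions are all hypothetical.

```elixir
defmodule CrossInteractions do
  # Keeps the interactions whose description mentions at least one
  # of the given monograph names (case-insensitive substring match).
  def between(interactions, monographs) do
    Enum.filter(interactions, fn %{description: description} ->
      lowered = String.downcase(description)
      Enum.any?(monographs, &String.contains?(lowered, String.downcase(&1)))
    end)
  end

  # Checks both directions: A's interactions against B's monographs
  # and B's interactions against A's monographs.
  def mutual(a, b) do
    between(a.interactions, b.monographs) ++ between(b.interactions, a.monographs)
  end
end

# Placeholder data; the descriptions are not real interaction texts.
product_a = %{
  monographs: ["calcium"],
  interactions: [
    %{title: "A1", description: "Placeholder text that mentions Vitamin D."},
    %{title: "A2", description: "Placeholder text that mentions nothing relevant."}
  ]
}

product_b = %{
  monographs: ["vitamin d"],
  interactions: [%{title: "B1", description: "Placeholder text about iron."}]
}

shared = CrossInteractions.mutual(product_a, product_b)
IO.inspect(Enum.map(shared, & &1.title))
```

Only interaction A1 is kept here, because its description is the only one that mentions a monograph of the other product.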


7 Appendix

7.1 Monograph Data File

The following is an example of a monograph data file.

andiroba.txt

ANDIROBA

Andiroba-Saruba

Bastard Mahogany

Brazilian Mahogany

Caoba Bastarda

Caoba del Brasil

Caobilla

Carapa

Carapa Rouge

Cedro

Cedro Macho

Crabwood

Iandirova

Mahogany

Najesí

Requia

Carapa guianensis


7.2 Text Analysis Resultant Structure

The resultant structure is in the following format:

“Pseudo” ABNF:

return = { product_name, results }
results = [ window_results(4), window_results(3),
            window_results(2), window_results(1) ]
window_results(n) =
    [ ]
    [ monograph_hit ]
    [ monograph_hit, window_results(n) ]
monograph_hit = { matching_text, monograph }
product_name = value
matching_text = value
monograph = value
list =
    [ ]
    [ value, list ]
tuple = { value, value }
value = "0"-"9" / "a"-"z" / " " / "(" / ")" / "-"

Example structure:

{"B-1 Thiamine HCL 250 mg 100 vtabs",
 [
   # window size 4
   [],
   # window size 3
   [],
   # window size 2
   [{"thiamine hcl", "thiamine (vitamin b1)"},
    {"vitamin b-1", "thiamine (vitamin b1)"}],
   # window size 1
   [{"thiamine", "thiamine (vitamin b1)"},
    {"thiamin", "thiamine (vitamin b1)"},
    {"pyruvate", "pyruvate"},
    {"alpha-ketoglutarate", "alpha-ketoglutarate"},
    {"thiamin", "thiamine (vitamin b1)"},
    {"thiamin", "thiamine (vitamin b1)"},
    {"thiamin", "thiamine (vitamin b1)"},
    {"palm", "palm oil"},
    {"silica", "silicon"},
    {"magnesium", "magnesium"}]
 ]
}
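One way to consume this structure is to flatten the per-window lists and count how often each monograph occurs. The sketch below assumes an unweighted count across all window sizes; the module name TopMonographs is hypothetical, the report does not state whether hits from larger windows are weighted differently, and the input is a shortened version of the example above. It assumes Elixir 1.10 or later for Enum.frequencies/1 and the :desc sorter.

```elixir
defmodule TopMonographs do
  # Takes a {product_name, results} tuple and returns the `count` most
  # frequent monographs as {monograph, hits} pairs, most frequent first.
  def top({_product_name, results}, count \\ 3) do
    results
    |> List.flatten()
    |> Enum.map(fn {_matching_text, monograph} -> monograph end)
    |> Enum.frequencies()
    |> Enum.sort_by(fn {_monograph, hits} -> hits end, :desc)
    |> Enum.take(count)
  end
end

# Shortened version of the example structure.
result =
  {"B-1 Thiamine HCL 250 mg 100 vtabs",
   [
     [],
     [],
     [{"thiamine hcl", "thiamine (vitamin b1)"}],
     [{"thiamine", "thiamine (vitamin b1)"}, {"palm", "palm oil"}]
   ]}

IO.inspect(TopMonographs.top(result, 1))
```

For the shortened input the most frequent monograph is thiamine (vitamin b1) with two hits, one from the window of size 2 and one from the window of size 1.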


7.3 URL Schemes as Structures

URL parameters prefixed with a ? are optional.

7.3.1 Natural Medicine

Base URL https://natmed-api.herokuapp.com/. All requests require an X-API-Key header for authentication.

Path | Parameters | Description
/ | (none) | The entrypoint of the API. It lists all available routes.
/monographs/:id | limit, page, ?name | Returns a list of monographs with corresponding details, or a single monograph if :id is provided.
/interactions/:id | ?monograph_ids | Returns all interactions relating to the provided monograph_ids. Note: it does not calculate interactions between monographs.

7.3.2 Monograph API

Path | Parameters | Description
/products/:id | ?limit, ?shuffle, ?search[q] | Returns a list of products with corresponding details, or a single product if :id is provided.
/interactions | product_ids | Returns all interactions relating to the provided product_ids.
/sockets | Accepts custom parameters | Used to establish a WebSocket connection.
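As an illustration of the scheme above, a client might assemble a /products request URL as follows. The module name MonographApiUrl and the base URL are hypothetical, since the report does not state where the server is hosted; only the path and parameter names come from the table.

```elixir
defmodule MonographApiUrl do
  # Hypothetical base URL; the report does not state the server's address.
  @base "http://localhost:4000"

  # Builds the URL for the /products route with optional query parameters,
  # given as a list of {name, value} pairs.
  def products(params \\ []) do
    @base <> "/products" <> query(params)
  end

  defp query([]), do: ""
  defp query(params), do: "?" <> URI.encode_query(params)
end

url = MonographApiUrl.products([{"limit", 50}, {"search[q]", "grapefruit"}])
IO.puts(url)
```

URI.encode_query/1 percent-encodes the square brackets in search[q], so the printed URL ends in ?limit=50&search%5Bq%5D=grapefruit.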


7.4 Open source dependencies

Project | Description | Website | License
Elixir | The language itself | http://elixir-lang.org/ | Apache 2
Poison | Elixir JSON parser | https://github.com/devinus/poison | CC0 1.0
HTTPoison | HTTP client | https://github.com/edgurgel/httpoison | MIT
Phoenix | Web framework | http://www.phoenixframework.org/ | MIT
Ecto | Database ORM/tooling | https://github.com/elixir-ecto/ecto | Apache 2
Mariaex | MySQL adapter for Ecto | https://github.com/xerions/mariaex | Apache 2
gettext | Internationalization | https://github.com/elixir-lang/gettext | Apache 2
Cowboy | HTTP server | https://github.com/ninenines/cowboy | https://github.com/ninenines/cowboy/blob/master/LICENSE

7.5 Resources used but not cited in report

Resource | Description | Website
PlantUML | Language for creating sequence diagrams | http://plantuml.com/
Android Developer Resources | Android developer references | https://developer.android.com/index.html


[1] "Development & Approval Process (Drugs) - FDA." 29 Jan. 2016, http://www.fda.gov/Drugs/DevelopmentApprovalProcess/. Accessed Dec. 2016.

[2] "How Drugs are Reviewed in Canada." 15 Dec. 2015, http://www.hc-sc.gc.ca/dhp-mps/prodpharma/activit/fs-fi/reviewfs_examenfd-eng.php. Accessed Dec. 2016.

[3] "Nutraceuticals / Functional Foods and Health Claims on ...." 4 Oct. 2002, http://www.hc-sc.gc.ca/fn-an/label-etiquet/claims-reclam/nutra-funct_foods-nutra-fonct_aliment-eng.php. Accessed Dec. 2016.

[4] "Labeling & Nutrition - FDA." http://www.fda.gov/Food/IngredientsPackagingLabeling/LabelingNutrition/. Accessed Dec. 2016.

[5] Advokat, Claire D., Joseph E. Comaty, and Robert M. Julien. Julien's Primer of Drug Action: A Comprehensive Guide to the Actions, Uses, and Side Effects of Psychoactive Drugs. New York: Worth, 2014. 35. Print.

[6] Sahu, Nihal. "An Interview with Elixir Creator José Valim." SitePoint. 30 Nov. 2015. Web. Dec. 2016.        

[7] [Next Day Video]. (2013, July 12). 26 years with Erlang or How I got my grey hairs. [Video File]. Retrieved from https://www.youtube.com/watch?v=HCwRGHj5jOE.

[8] Sahu, Nihal. "An Interview with Elixir Creator José Valim." SitePoint. 30 Nov. 2015. Web. Dec. 2016.        

[9] Functions will be referred to in the following format: {Module name}.{function name}/{arity}

[10] An endpoint in this case refers to the URL path that represents a resource.

[11] Only the name of the monograph can be used for this query, alternate-names and scientific-names (discussed further down) cannot be used.

[12] Reducing the list of all known monographs to the set related to a product will be known as calculating the monograph.

[13] "Pagination - Wikipedia." https://en.wikipedia.org/wiki/Pagination. Accessed Dec. 2016.

[14] See appendix for example.

[15] A term in the context represents a string of one to many words separated by whitespace or punctuation.

[16] See appendix for data structure.

[17] "WebSockets - Web APIs | MDN - Mozilla Developer Network." 6 Nov. 2016, https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API. Accessed Dec. 2016.

[18] GenServer means “generic server” it is provided

[19] Note these requests were made to a local IP.

[20] New Jersey Tea is the prime monograph for redroot and red root.

[21] This is an example of a negation; in this case the product does not contain betaine hydrochloride.

[22] There are two instances of this product in the database with slightly different descriptions. Devil’s claw is the primary monograph; however, the latter two are uncertain.

[23] Similar to Devil’s Claw, there are four different descriptions for this product; all four have very similar monographs.

[24] This product identifies stinging nettle correctly under the name Urtica dioica Root.

[25] This product has a very large ingredient list; the top three monographs are probably not good identifiers for the entire product.

[26] This product contains a large list of ingredients, so the top three are unlikely to be representative of the important monographs.

[27] This product has a very clear ingredient list and the top three monographs are correct.