1 of 22

DoctorAI and Medical Embeddings: Developing a Medical Sense

Connor Favreau

Data Science Intern at Providence Health and Services

2 of 22

The Takeaways

  1. What a recurrent neural network is
  2. How to generate word embeddings from the word2vec skip-gram algorithm
  3. That word embeddings can provide a good “time-averaged” sense of an input in a time series and, as such, can be combined with RNNs to yield greater predictive accuracy
  4. That RNNs and embeddings can be applied to time series data beyond words
  5. How RNNs and embeddings can be applied to the medical field

3 of 22

Neural Networks

  • Input (scalar, vector, or matrix)
  • Hidden layer (nonlinear activation function, weights, biases)
  • Calculated output (scalar, vector, or matrix)
  • Actual output (scalar, vector, or matrix)
  • Cost function (measures how far the calculated output is from the actual output)
  • Backpropagation (gradients of the cost with respect to the weights and biases)
  • Training (iteratively updating weights and biases to reduce the cost)
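The pipeline above can be sketched end to end. A minimal sketch with made-up sizes, seed, and learning rate, fitting the XOR function with one hidden layer:

```python
import numpy as np

# Minimal one-hidden-layer network trained by backpropagation on a toy
# task (XOR). Sizes, seed, and learning rate are illustrative choices.
rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # inputs
Y = np.array([[0], [1], [1], [0]], dtype=float)              # actual outputs

W1, b1 = rng.normal(0, 1, (2, 8)), np.zeros(8)   # hidden weights and biases
W2, b2 = rng.normal(0, 1, (8, 1)), np.zeros(1)   # output weights and biases

lr = 0.1
for step in range(3000):
    # Forward pass: input -> hidden (nonlinear activation) -> calculated output
    H = np.tanh(X @ W1 + b1)
    Y_hat = H @ W2 + b2

    # Cost function: mean squared error, calculated vs. actual output
    cost = np.mean((Y_hat - Y) ** 2)

    # Backpropagation: chain rule from the cost back to every parameter
    dY = 2 * (Y_hat - Y) / len(X)
    dW2, db2 = H.T @ dY, dY.sum(0)
    dH = (dY @ W2.T) * (1 - H ** 2)      # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dH, dH.sum(0)

    # Training: gradient-descent updates
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

After training, the calculated outputs sit close to the actual 0/1 targets, which is the whole loop the slide summarizes.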

4 of 22

Recurrent Neural Networks

[Figure: a recurrent neural network unrolled through time]

Britz D., 2015

5 of 22

Recurrent Neural Network Spin-offs

  • RNNs suffer from vanishing and exploding gradients.
    • tanh and sigmoid derivatives approach zero for large-magnitude inputs. Backpropagation multiplies one such factor per time step, so gradients shrink toward zero for inputs further back in time.

LSTMs (Long Short Term Memory)

  • First proposed in 1997.
  • The most robust of the three, and the best at capturing non-adjacent sequence dependencies.
  • More parameters to train.

GRUs (Gated Recurrent Units)

  • First proposed in 2014.
  • Fewer parameters to train than an LSTM.
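The vanishing-gradient effect motivating these variants can be seen numerically. A toy sketch (invented sizes, seed, and weight scale): the Jacobian of one tanh RNN step is diag(1 − h²)·W, and multiplying 30 of them drives the gradient toward zero.

```python
import numpy as np

# Toy sketch of vanishing gradients in a plain tanh RNN. The Jacobian of
# one step is diag(1 - h^2) @ W; chaining 30 of them shrinks the gradient
# with respect to the earliest hidden state toward zero.
rng = np.random.default_rng(1)
n = 16
W = rng.normal(0, 0.05, (n, n))   # small recurrent weights (spectral norm < 1)
x = rng.normal(0, 1, n)           # constant input, for simplicity
h = np.zeros(n)

grad = np.eye(n)                  # accumulates d h_t / d h_0
norms = []
for t in range(30):
    h = np.tanh(W @ h + x)
    grad = np.diag(1 - h ** 2) @ W @ grad   # chain rule through one step
    norms.append(np.linalg.norm(grad))
```

The recorded norms decay roughly geometrically with distance in time; LSTM/GRU gating adds additive paths through the state that keep gradients from collapsing this fast.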

6 of 22

Adding Embeddings (word2vec/skipgram) Into RNNs

  • Traditionally, one-hot vectors could be used for inputs and outputs.
  • Instead, word2vec embeddings can be learned first, based on the co-occurrence of items.
  • Ex.: Predicting words in a sentence.

The cat chased a bird flying through the window.
(context: the surrounding words, e.g. “chased a” and “flying through”)

The cat chased a bird __?__.
(target: the missing word to predict)
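The one-hot versus embedding contrast can be made concrete. A sketch with an invented five-word vocabulary and a random (untrained) embedding matrix:

```python
import numpy as np

# Two ways to represent a word as a network input. The vocabulary and
# embedding values below are made up for illustration.
vocab = ["the", "cat", "chased", "a", "bird"]
word_to_id = {w: i for i, w in enumerate(vocab)}

# 1) One-hot: a sparse vector as long as the vocabulary.
def one_hot(word):
    v = np.zeros(len(vocab))
    v[word_to_id[word]] = 1.0
    return v

# 2) Embedding: a dense matrix of learned vectors; looking a word up
#    just selects its row.
rng = np.random.default_rng(0)
E = rng.normal(0, 1, (len(vocab), 3))   # 5 words -> 3-dimensional vectors

def embed(word):
    return E[word_to_id[word]]
```

Selecting row i of E equals multiplying the one-hot vector into E (`one_hot(w) @ E`), which is why a learned embedding layer can replace one-hot inputs directly.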

7 of 22

The Word2Vec Skip-Gram Algorithm

  • A text-learning neural net algorithm that maps words to vectors.
  • Each vector roughly represents the conditional probability of a word appearing near the other words in the vocabulary.
  • First introduced in Mikolov et al. 2013 paper “Efficient Estimation of Word Representations in Vector Space”
  • Skip-gram task: Predict the context words from the target (center) word.

The cat chased a bird flying through the window.
(target: “bird”; context: surrounding words such as “chased”, “a”, “flying”, “through”)
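The labeled sentence generates (target, context) training pairs for the skip-gram task. A short sketch of that pair extraction (the window size of 2 is an arbitrary choice):

```python
# Generate skip-gram training pairs: each pair asks the model to predict
# a context word given the target (center) word.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # skip the target itself
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the cat chased a bird flying through the window".split()
pairs = skipgram_pairs(sentence, window=2)
```

With target “bird”, the pairs produced are exactly (“bird”, “chased”), (“bird”, “a”), (“bird”, “flying”), and (“bird”, “through”).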

8 of 22

What are Embeddings?

[Figures: visualizations of learned word-vector relationships; Mikolov et al., 2013b; Mikolov et al., 2013c]
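The cited figures show that embedding arithmetic captures relations such as king − man + woman ≈ queen. A toy sketch of that vector-offset idea, using hand-made (untrained) vectors rather than real embeddings:

```python
import numpy as np

# Hand-made 3-d vectors illustrating the vector-offset property of
# embeddings (these are invented values, not trained word2vec output).
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cosine(u, v):
    # Cosine similarity: 1.0 means same direction.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Analogy query: king - man + woman should land nearest to queen.
query = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], query))
```

With real trained embeddings the same nearest-neighbor search over the offset vector recovers the analogies shown in the Mikolov figures.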

9 of 22

Applications

  • Natural Language Processing
    • Sentences as time sequences
    • Can train an RNN to predict the next word in a sentence and to generate text.
      • Shakespeare
  • Speech Recognition
  • Medical predictions

10 of 22

Recurrent Neural Networks and Embeddings in the Medical Field

11 of 22

Medical Data Usable for RNNs

  • ICD-9 and ICD-10 codes
    • “International Statistical Classification of Diseases and Related Health Problems”
    • Alphanumeric codes for patient diagnoses.
    • ICD-10 replaced ICD-9 in October 2015, and has over 69,000 unique codes.
    • Ex.: “F25.0” corresponds to “cyclic schizophrenia”
  • Current Procedural Terminology (CPT) codes
    • Thousands of five-character codes describing medical events.
    • Typically used in outpatient processing and billing.
    • Ex.: “2014F” corresponds to “Mental status assessed”
  • General Product Identifier (GPI) codes
    • 14-character codes to identify drugs.
    • Ex.: “6410001000” corresponds to “Aspirin”
  • Unstructured notes
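Before any of these codes reach an embedding layer or RNN, they are mapped to integer ids. A minimal sketch with invented patient visits (the codes are the examples from the bullets above):

```python
# Turn heterogeneous medical codes (ICD-10, CPT, GPI) into integer ids,
# the form an embedding layer or RNN consumes. Visits are invented.
visits = [
    ["F25.0", "2014F"],        # visit 1: diagnosis code + procedure code
    ["F25.0", "6410001000"],   # visit 2: diagnosis code + drug (GPI) code
]

# Build a vocabulary over every code seen in the data.
code_to_id = {}
for visit in visits:
    for code in visit:
        code_to_id.setdefault(code, len(code_to_id))

# Each visit becomes a list of ids, one time step of the sequence.
sequence = [[code_to_id[c] for c in visit] for visit in visits]
```

The shared diagnosis code maps to the same id in both visits, which is what lets the model learn one embedding per code across a patient's history.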

12 of 22

DoctorAI: Medical Predictions from a GRU Framework

Choi E. et al., 2015

13 of 22

GRU Networks

  • Update gate: controls how much the hidden state is updated at each step.
  • Reset gate: controls how much value to assign new inputs versus the previous state.

Schraudolph N., 2014

Hsu C., 2017
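A single GRU step with the two gates above can be sketched directly; the dimensions and random weights below are toy values, not a trained model.

```python
import numpy as np

# One GRU step: update gate z, reset gate r, candidate state h_tilde.
rng = np.random.default_rng(0)
n_in, n_h = 4, 8

def init(shape):
    return rng.normal(0, 0.1, shape)

Wz, Uz, bz = init((n_h, n_in)), init((n_h, n_h)), np.zeros(n_h)  # update gate
Wr, Ur, br = init((n_h, n_in)), init((n_h, n_h)), np.zeros(n_h)  # reset gate
Wh, Uh, bh = init((n_h, n_in)), init((n_h, n_h)), np.zeros(n_h)  # candidate

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h_prev):
    z = sigmoid(Wz @ x + Uz @ h_prev + bz)   # update gate: how much to update
    r = sigmoid(Wr @ x + Ur @ h_prev + br)   # reset gate: how much old state to mix in
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h_prev) + bh)  # candidate from new input
    return (1 - z) * h_prev + z * h_tilde    # blend old state and candidate

h = np.zeros(n_h)
for x in rng.normal(0, 1, (5, n_in)):        # run five time steps
    h = gru_step(x, h)
```

Because the new state is a convex blend of the old state and a tanh candidate, the gates let gradients flow through the `(1 - z) * h_prev` path, which is what mitigates the vanishing-gradient problem.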

14 of 22

From GRU to Predictions

y_t = softmax(W_code · h_t + b_code)

d_t = ReLU(w_time · h_t + b_time)

  • y is the prediction of the patient's codes at the next time step (also the next input)
    • Use softmax to produce a probability distribution reflecting co-occurring codes
  • d is the prediction of the time duration until the next visit (also the next input)
    • Use ReLU/max(0, ·) to reflect the “unbounded” non-negative nature of time

Guo B., 2016
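The two output heads can be sketched on top of a GRU hidden state. The parameter names (W_code, w_time) follow the common Doctor AI description, but the dimensions and values below are invented:

```python
import numpy as np

# Two Doctor AI-style output heads on a hidden state h: softmax over
# codes for the next-visit prediction y, ReLU for the non-negative,
# unbounded duration d. Toy dimensions and random weights.
rng = np.random.default_rng(0)
n_h, n_codes = 8, 5
h = rng.normal(0, 1, n_h)                 # hidden state from the GRU

W_code, b_code = rng.normal(0, 0.1, (n_codes, n_h)), np.zeros(n_codes)
w_time, b_time = rng.normal(0, 0.1, n_h), 0.0

def softmax(z):
    e = np.exp(z - z.max())               # shift for numerical stability
    return e / e.sum()

y = softmax(W_code @ h + b_code)          # probabilities over next-visit codes
d = max(0.0, float(w_time @ h + b_time))  # ReLU keeps predicted duration >= 0
```

softmax guarantees y is a valid probability distribution over codes, while the ReLU head can output any non-negative duration, matching the bullets above.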

15 of 22

For Training

Cost: cross-entropy between the predicted and actual code distributions, plus the squared difference between the predicted and actual visit durations, summed over all time steps.

Goal: Minimize this cost function
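A sketch of that cost, with invented predictions and targets (the 1e-12 term only guards the logarithm):

```python
import numpy as np

# Training cost: cross-entropy over predicted code probabilities plus
# squared error on visit duration, summed over time steps.
def cost(y_hat_seq, y_true_seq, d_hat_seq, d_true_seq):
    total = 0.0
    for y_hat, y_true, d_hat, d_true in zip(y_hat_seq, y_true_seq,
                                            d_hat_seq, d_true_seq):
        total += -np.sum(y_true * np.log(y_hat + 1e-12))  # cross-entropy term
        total += (d_hat - d_true) ** 2                    # squared-duration term
    return total

# Two invented time steps: predicted code distributions, one-hot actuals,
# and predicted vs. actual durations in days.
y_hat_seq = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.8, 0.1])]
y_true_seq = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
d_hat_seq, d_true_seq = [30.0, 60.0], [28.0, 65.0]
c = cost(y_hat_seq, y_true_seq, d_hat_seq, d_true_seq)
```

Gradient descent through the GRU minimizes this total, trading off code accuracy against duration accuracy.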

16 of 22

Medical Embeddings… Not Just Words in Sentences

  • Can perform “word2vec” skip-gram training based on the co-occurrence of codes and/or text within specific time “bins” (periods of time).

Collection of a patient’s codes over a given period of time

Choi Y. et al., 2016
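The binning step can be sketched as grouping timestamped codes into fixed windows, each window acting as one co-occurrence “sentence”; the records and bin size below are invented:

```python
from collections import defaultdict

# Bin a patient's timestamped codes into fixed windows; each bin plays
# the role of a "sentence" whose codes co-occur for skip-gram training.
records = [
    (3, "F25.0"), (5, "2014F"),          # (days since first visit, code)
    (40, "F25.0"), (41, "6410001000"),
]

def bin_codes(records, bin_days=30):
    bins = defaultdict(list)
    for day, code in records:
        bins[day // bin_days].append(code)   # integer-divide day into a bin
    return [codes for _, codes in sorted(bins.items())]

sentences = bin_codes(records, bin_days=30)
```

Here days 3 and 5 fall in the first 30-day bin and days 40 and 41 in the second, so each bin's codes are treated as co-occurring.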

17 of 22

Med2Vec

  • Converts codes (ICD, CPT, medication) and demographic information into lower-dimensional embeddings that can be fed into DoctorAI for predictions.
  • Based on word2vec skip-gram model.
  • Written in Python/Theano.

Choi E. et al., 2016

18 of 22

Medical Embeddings from Text

  • Annotate text to produce a list of medical concepts, noting whether each is negated.
  • Perform skip-gram/co-occurrence training (word2vec).
  • Previous works have binned notes per patient over time frames ranging from 1-day to 1-year bins.

Finlayson et al., 2014
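The negation-aware annotation step can be sketched as token construction, so that negated and affirmed mentions receive distinct embeddings; the annotations below are invented examples:

```python
# Build tokens for co-occurrence training from annotated note concepts,
# marking negated mentions so "no chest pain" and "chest pain" end up
# with different embeddings. The annotations are invented.
annotations = [
    {"concept": "chest_pain", "negated": True},   # e.g. "denies chest pain"
    {"concept": "aspirin", "negated": False},     # e.g. "taking aspirin"
]

tokens = [("NEG_" if a["negated"] else "") + a["concept"]
          for a in annotations]
```

These tokens then feed the same skip-gram/co-occurrence training used for codes, with bins standing in for sentences.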

19 of 22

Medical Embeddings from Text

Choi Y. et al., 2016

20 of 22

DoctorAI Results

Choi E. et al., 2016

21 of 22

The Takeaways

  1. What a recurrent neural network is
  2. How to generate word embeddings from the word2vec skip-gram algorithm
  3. That word embeddings can provide a good “time-averaged” sense of an input in a time series and, as such, can be combined with RNNs to yield greater predictive accuracy
  4. That RNNs and embeddings can be applied to time series data beyond words
  5. How RNNs and embeddings can be applied to the medical field

22 of 22

References/Good Links

  1. Britz D., 2015. Recurrent Neural Networks Tutorial [Blog Post]; WildML – Artificial Intelligence, Deep Learning, and NLP. http://www.wildml.com/
  2. Choi E., Bahadori M.T., Sun J. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks. CoRR. 2015; abs/1511.05942. https://arxiv.org/pdf/1511.05942.pdf
  3. Choi E., Bahadori M.T., Searles E., Coffey C., Sun J. Multi-layer Representation Learning for Medical Concepts. In KDD, 2016. http://www.kdd.org/kdd2016/papers/files/rpp0303-choiA.pdf
  4. Choi Y., Chiu C.Y., Sontag D. Learning Low-Dimensional Representations of Medical Concepts. AMIA Jt Summits Transl Sci Proc., 2016: 41-50. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5001761/
  5. De Vine L., Zuccon G., Koopman B., Sitbon L., Bruza P. Medical Semantic Similarity with a Neural Language Model; Proceedings of CIKM ’14; New York, NY, USA: ACM; 2014. pp. 1819–1822.
  6. Finlayson S., LePendu P., Shah N. Data from: Building the graph of medicine from millions of clinical narratives. Dryad Digital Repository. 2014.
  7. Guo B., 2016. ReLu compared against Sigmoid, Softmax, Tanh [Blog Post]; Quora. https://algorithmsdatascience.quora.com/ReLu-compared-against-Sigmoid-Softmax-Tanh
  8. Hsu C., 2017. Logistic Regression: Sigmoid Function Explained in Plain English [Blog Post]; LinkedIn. https://www.linkedin.com/pulse/logistic-regression-sigmoid-function-explained-plain-english-hsu
  9. Mikolov T., Chen K., Corrado G., Dean, J., 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781. https://arxiv.org/pdf/1301.3781.pdf
  10. Mikolov T., Sutskever I., Chen K., Corrado G., Dean, J., 2013. Distributed Representations of Words and Phrases and their Compositionality. arXiv preprint arXiv:1310.4546v1. https://arxiv.org/pdf/1310.4546.pdf
  11. Mikolov T., Yih W., Zweig G., 2013. Linguistic Regularities in Continuous Space Word Representations. http://www.aclweb.org/anthology/N13-1090
  12. Rong X., 2016. word2vec Parameter Learning Explained. arXiv preprint arXiv:1411.2738v4. https://arxiv.org/pdf/1411.2738.pdf
  13. Schraudolph N., 2014. Multi-layer networks. https://nic.schraudolph.org/teach/NNcourse/multilayer.html
  14. SyTrue, 2015. Why Structured Data Holds the Key to Intelligent Healthcare Systems. http://hitconsultant.net/2015/03/31/tapping-unstructured-data-healthcares-biggest-hurdle-realized/