Tutorial on PSSP:

Patient-Specific Survival Prediction

PSSP is an automated software tool that can learn (from a dataset of historical patients) the patterns that allow it to produce “personalized” survival curves for novel patients, which incorporate all of the features of each patient -- hence, this survival PREDICTION task is fundamentally different from survival ANALYSIS. This document summarizes the PSSP website http://pssp.srv.ualberta.ca/ , and also explains, at a high level, why PSSP’s predictions are meaningful. See [Yu et al, 2011] for more details about PSSP itself, and see http://pssp.srv.ualberta.ca/home/instructions for instructions on how to use this website. Section 7 also provides some background.

### 1. Overview

This is still a draft … please send comments to rgreiner @ ualberta . ca


To get started in understanding PSSP, go to http://pssp.srv.ualberta.ca/predictors/34 , which shows the predictors learned from the Northern Alberta Cancer Data (L2) dataset. From there, clicking through to the prediction page shows (at the top) the "Survival Curves" of 15 of the 2402 patients, as well as the overall Kaplan-Meier plot (dotted in grey).

The line at the y-value of “50” shows the median survival times predicted for these patients. PSSP predicts that some patients will survive for a much shorter time than KM's median of about 21 months, while others should live much longer: the bottom "tan" patient has a median survival time of only around 3 months, while the curve of the upper darker-tan patient suggests over 80 months of survival.
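For readers unfamiliar with the Kaplan-Meier plot, here is a minimal sketch of the standard Kaplan-Meier estimator. The function name and the toy data are hypothetical; this is not the code used by the website.

```python
def kaplan_meier(times, observed):
    """Kaplan-Meier estimate of the population survival function.

    times    -- event or censoring time for each patient
    observed -- True if the death was observed, False if censored
    Returns a list of (time, survival_probability) steps, one for each
    distinct time at which at least one death was observed.
    """
    steps = []
    survival = 1.0
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        # Observed deaths exactly at time t (censored patients do not count).
        deaths = sum(1 for u, obs in zip(times, observed) if u == t and obs)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            steps.append((t, survival))
    return steps


# Toy data (hypothetical): five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_observed = [True, True, False, True, False]
km_steps = kaplan_meier(toy_times, toy_observed)
print(km_steps)
```

Note that this produces a single curve for the whole population, which is why the Kaplan-Meier line is the same for every patient, while PSSP produces a different curve for each patient.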

We can also see the predictions for an individual patient by clicking on one of these lines.  Here, clicking on the bottom tan line leads to http://pssp.srv.ualberta.ca/subjects/209452 (Subject#347):

In general, each (x,y) point on this curve (the dark-red line) is a prediction: that there is a y% probability that this patient (here Subject#347) will live at least x months from the start time. Hence, the point (3.0, 50) means that PSSP predicts there is a 50% chance that Subject#347 will live at least 3.0 months -- ie, this 3.0 is her median survival time.   The solid vertical line means that Subject#347 actually survived 2.33 months, which is very close to that median.
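To make this reading of the curve concrete, here is a minimal sketch that finds the time at which a curve drops to a given percentage. It assumes the curve is available as a list of (months, percent) points; the function name, the sample points, and the use of linear interpolation are all assumptions of this sketch, not a description of the website's code.

```python
def time_at_percent(curve, percent=50.0):
    """Return the time at which a survival curve drops to `percent`.

    `curve` is a list of (months, survival_percent) points, sorted by
    increasing months, with survival_percent non-increasing.
    Linear interpolation between points is an assumption of this sketch.
    Returns None if the curve never drops to `percent`.
    """
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if p0 >= percent >= p1:
            if p0 == p1:  # flat segment sitting exactly at `percent`
                return t0
            frac = (p0 - percent) / (p0 - p1)
            return t0 + frac * (t1 - t0)
    return None


# Hypothetical points, not taken from the website.
sample_curve = [(0.0, 100.0), (2.0, 70.0), (4.0, 30.0), (8.0, 10.0)]
sample_median = time_at_percent(sample_curve)  # 50% is crossed between 2 and 4 months
print(sample_median)
```

Calling the same function with `percent=75.0` or `percent=25.0` gives the other quartile times of the predicted distribution.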

The grey line is the Kaplan-Meier plot for this person (based on the ⅘ of the patients NOT in Subject#347’s cross-validation fold).

You can also zoom in on the figure, to see more details about the graph.

This survival curve was computed by running the model, learned by PSSP on a set of earlier patients, on this Subject#347, based on her features.  To see these values (which might help understand why this person died earlier than most), you can scroll down the page to the "Feature Values for this Subject" section, which lists the 51 feature values for this patient.

Going back a page and clicking on the highest (darker tan) line, leads to http://pssp.srv.ualberta.ca/subjects/210174  (Subject#1069).

Again this graph was predicted by PSSP, based on the 51 characteristics for this patient, shown on this page.

Going back and selecting the rust-colored line, around the “middle”, leads to http://pssp.srv.ualberta.ca/subjects/210189  (Subject#1084).

This person was ALIVE at her last visit, at around 54.1 months -- ie, was censored (note the vertical line there is dotted).

Going back to the “Prediction page”, we note that this top figure shows the curves of only 15 patients. That is enough to show one important difference between PSSP and the Cox Proportional Hazards model: note that these survival curves do cross, while those produced by the Cox PH model will not.
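To see why curves from a Cox PH model cannot cross, recall that such a model gives each patient the curve S0(t) raised to the power exp(risk score), where S0 is a shared baseline curve. The sketch below illustrates this with hypothetical baseline values and risk scores; the function names are ours.

```python
import math


def cox_survival(baseline, risk_score):
    """Survival curve under proportional hazards: S0(t) ** exp(risk_score)."""
    return [s0 ** math.exp(risk_score) for s0 in baseline]


def curves_cross(curve_a, curve_b):
    """True if curve_a is strictly above curve_b somewhere and strictly below elsewhere."""
    above = any(a > b for a, b in zip(curve_a, curve_b))
    below = any(a < b for a, b in zip(curve_a, curve_b))
    return above and below


# Hypothetical baseline survival S0(t) on a grid of time points.
baseline = [1.0, 0.9, 0.7, 0.4, 0.2]
low_risk = cox_survival(baseline, -0.5)
high_risk = cox_survival(baseline, 0.8)

# Every baseline value is raised to a larger power for the high-risk patient,
# so the high-risk curve is never above the low-risk curve.
print(curves_cross(low_risk, high_risk))  # False

# Two unconstrained curves, like the ones PSSP produces, can cross.
print(curves_cross([1.0, 0.9, 0.3], [1.0, 0.6, 0.5]))  # True
```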

We can examine any of the 2402 patients in this study, using the "Individual Predictions" table below.  Note that you can sort these entries by any of the fields [Censored; Event Time; Predicted % P(Label); Predicted Median Survival; Absolute Error], and also select any patient by clicking the associated Details button.  You can also download a CSV file of all 2402 predictions.

### 2. Calibrated Distribution

Consider Subject#2050 (http://pssp.srv.ualberta.ca/subjects/211155), whose predicted plot claims that this patient has a 75% chance of living at least another 6.73 months, a 50% chance of living at least 10.40 months, and a 25% chance of living at least 17.51 months.

Why should the patient believe these predictions?  If we had previously observed 1000 patients exactly identical to this Subject#2050, we could compare these predictions to the actual survival times, and believe the predictions if they were accurate -- ie, if around 250 die in the first 6.73 months, another 250 between 6.73 and 10.40 months, another 250 between 10.40 and 17.51 months, and the final 250 after 17.51 months.

Of course, we do not have 1000 “copies” of Subject#2050.  But here we do have 1000 other subjects, each with his[1] own characteristic survival curve, including say the 4 curves shown in Fig 1.  For these historical patients, we know the actual event time for each patient, which has a corresponding “event percentage” -- ie, y-value for the associated time.   (Note that all 4 here are uncensored.) See Table 1.  Here, if our predictor is working correctly, we would expect that two of these 4 would pass away before his/her respective median time, and the other 2 after his/her median time.  Indeed, we would actually expect 1 to die in each of the four 25%-quartiles. This actually happened here -- see the final column in Table 1.

Table 1: Description of 4 patients from the NACD Dataset (see also Figure 1)

| Subject ID | Event time (months) | Event %age | Quantile |
|---|---|---|---|
| 1069 | 38.67 | 90.3 | #1 |
| 347 | 2.33 | 60.1 | #2 |
| 2050 | 12.50 | 39.4 | #3 |
| 1962 | 28.10 | 13.7 | #4 |

Figure 1: Predicted survival curves for Subject#1069, Subject#347, Subject#2050 and Subject#1962.

This is a fairly crude measure, involving only 4 of the 2402 patients, and only size-25% “bins”.  We can instead consider a histogram over all 2402 patients, with size-5% bins.  Here, if our PSSP predictor were perfectly “calibrated”, we would expect 5% of the patients to fall in each 5%-bin.

The corresponding button on the website (leading to the calibration histogram) shows that this is essentially true.  (There are some issues in dealing with censored data.)

In general, a model is “calibrated” if this “probability that patient#i’s survival curve assigns to patient#i’s event” is uniformly distributed over the patients.  Note this will happen when the model corresponds to the “true” survival curve for each patient -- which means this criterion is related to the log-likelihood of the model, which is one of the measures we use to evaluate the quality of the model; see Section 3.
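The binning idea can be sketched in a few lines. The function below counts how many "event percentages" fall into each equal-width bin; it is a simplified illustration (the function name is ours, and it handles only uncensored patients, ignoring the censoring issues mentioned above). The example uses the four event percentages from Table 1.

```python
def calibration_counts(event_percents, n_bins=4):
    """Count how many event percentages fall into each equal-width bin of [0, 100]."""
    counts = [0] * n_bins
    width = 100.0 / n_bins
    for p in event_percents:
        # A value of exactly 100.0 is placed in the last bin.
        index = min(int(p // width), n_bins - 1)
        counts[index] += 1
    return counts


# Event percentages of the 4 uncensored patients in Table 1.
table1_percents = [90.3, 60.1, 39.4, 13.7]
print(calibration_counts(table1_percents))  # [1, 1, 1, 1]
```

With `n_bins=20` and the event percentages of all patients, a well-calibrated model would give roughly equal counts in every bin.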

The bottom line, here, is that the model that PSSP learned on this data appears nicely calibrated, which means it is appropriate to tell a patient that he has even odds of surviving until his median date, and so forth, for other Survival-probabilities on the y-axis.

### 3. Global “Accuracy” statistics

After training PSSP on a dataset, it is useful to determine whether the resulting learned model is “accurate”.  We suggested one measure: the “calibration” graph mentioned above, which relates to Log-likelihood.  Returning now to the top-level page http://pssp.srv.ualberta.ca/predictors/34 ,  we see many statistics of our PSSP tool on this dataset, including Log-Likelihood.

5-Fold Cross-Validation Statistics

| Measure | Your Predictor* | Baseline* |
|---|---|---|
| Concordance Index | 0.76 ± 0.02 | 0.5 ± 0.0 |
| L1 Loss | 9.99 ± 0.93 | 14.06 ± 0.07 |
| L2 Loss | 16.33 ± 1.46 | 18.76 ± 0.1 |
| L1 Log-Loss | 0.57 ± 0.03 | 0.85 ± 0.01 |
| L2 Log-Loss | 0.74 ± 0.11 | 1.07 ± 0.03 |
| Log-Likelihood Loss | 3.62 ± 0.18 | |

(Note that each is based on 5-fold cross validation -- ie, train on ⅘ of the data, then evaluate on the remaining ⅕; do this 5 times, and return the average ± empirical standard deviation.)
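As a small illustration of how such "average ± standard deviation" entries arise, here is a sketch that splits patient indices into 5 folds and summarizes per-fold scores. The round-robin split, the use of the sample standard deviation, and the per-fold scores are all assumptions of this sketch; the website's exact procedure is not described here.

```python
import statistics


def fold_indices(n, k=5):
    """Split indices 0..n-1 into k folds by round-robin assignment (an assumption)."""
    return [list(range(i, n, k)) for i in range(k)]


def summarize_folds(fold_scores):
    """Return (mean, sample standard deviation) of the per-fold scores."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)


folds = fold_indices(2402)
print([len(fold) for fold in folds])

# Hypothetical per-fold concordance values, one per held-out fold.
fold_scores = [0.74, 0.78, 0.75, 0.77, 0.76]
cv_mean, cv_std = summarize_folds(fold_scores)
print(f"{cv_mean:.2f} ± {cv_std:.2f}")
```

Each patient is in exactly one held-out fold, so every patient's prediction comes from a model that was trained without that patient.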

We see that

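• The Log-Likelihood Loss explicitly views PSSP as returning a distribution -- ie, the curve for patient#r is plotting $P(d_r > t \mid f_r, \theta)$ vs time $t$, where $d_r$ is the time of death of patient r, $f_r$ is the facts about patient r, and $\theta$ is the learned values of the parameters. The negative of its derivative is the density $p(d_r = t \mid f_r, \theta)$.
• Let $e_r$ be the “event” of patient r -- which is either the time of death $d_r$ (for uncensored) or the time of the final visit (for censored). Let U be the indices of the uncensored patients, and C be the indices of the censored patients.
• The likelihood of an uncensored patient uses the derivative: it is the density $p(d_r = e_r \mid f_r, \theta)$, as the death was observed at time $e_r$.
• The likelihood of a censored patient is a point on the survival curve, $P(d_r > e_r \mid f_r, \theta)$, as we know only that the patient was still alive at time $e_r$.
• The Log Likelihood Loss of a predictor is the negative log of these likelihoods, averaged over the $n = |U| + |C|$ patients:

$$-\frac{1}{n}\left[\sum_{r \in U} \log p(d_r = e_r \mid f_r, \theta) \;+\; \sum_{r \in C} \log P(d_r > e_r \mid f_r, \theta)\right]$$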

• PSSP’s Concordance Index (using the median value for each patient) is significantly better than random guessing.
• A predictor’s Concordance Index considers all comparable pairs of patients (r,s), and gives a score of +1 if the predictor correctly predicted which of r and s lived longer, and a score of 0 if the predicted order does not match the truth.
• The index is the average of these scores over the pairs of “comparable” patients (both uncensored, or r censored at a time after s’s death).
• See Section 5.
• PSSP’s L1 Loss is significantly better than just assigning each patient to the median survival time.
• The L1 Loss of a predictor is $\frac{1}{n}\sum_{i=1}^{n} |t_i - \hat{m}_i|$, where $t_i$ is the true survival time for the i-th patient and $\hat{m}_i$ is the median of the patient's predicted survival distribution, over the n uncensored patients being used for evaluation.  (Note this does not use censored patients.)
• PSSP’s L2 Loss is also better than just assigning each patient to the mean survival time.
• The L2 Loss of a predictor is $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (t_i - \hat{\mu}_i)^2}$, where $t_i$ is the true survival time for the i-th patient and $\hat{\mu}_i$ is the mean of the patient's predicted survival distribution, over the n uncensored patients being used for evaluation.
• We also consider the “Log” terms, which evaluate the LOG of the (predicted) survival times; these results are also significantly better.
• Using the terms defined above, the L1 Log-Loss is $\frac{1}{n}\sum_{i=1}^{n} |\log t_i - \log \hat{m}_i|$ and the L2 Log-Loss is $\sqrt{\frac{1}{n}\sum_{i=1}^{n} (\log t_i - \log \hat{\mu}_i)^2}$.
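The L1 and L2 losses described above can be sketched as follows. The function names are ours, and reading the L2 Loss as a root-mean-square error (rather than a plain mean of squares) is an assumption of this sketch. The example uses the event times and predicted medians quoted earlier for Subject#347 and Subject#2050; a real evaluation would pass predicted means, not medians, to the L2 functions.

```python
import math


def l1_loss(true_times, predicted_times):
    """Mean absolute error between true and predicted survival times."""
    n = len(true_times)
    return sum(abs(t - p) for t, p in zip(true_times, predicted_times)) / n


def l2_loss(true_times, predicted_times):
    """Root-mean-square error (the square root is an assumption of this sketch)."""
    n = len(true_times)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true_times, predicted_times)) / n)


def log_times(times):
    """Natural log of each time, used for the "Log" versions of the losses."""
    return [math.log(t) for t in times]


# Uncensored patients only: Subject#347 and Subject#2050.
true_times = [2.33, 12.50]
predicted_medians = [3.0, 10.40]

example_l1 = l1_loss(true_times, predicted_medians)
example_l1_log = l1_loss(log_times(true_times), log_times(predicted_medians))
print(example_l1, example_l1_log)
```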

For this dataset, all of the measures suggest that this PSSP model is working very well -- certainly significantly better than just chance.
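For the Log-Likelihood Loss, here is a minimal sketch that follows the standard convention for censored data: an uncensored patient contributes the density at the observed death time, while a censored patient contributes the survival probability at the censoring time. The function names, the averaging over patients, and the exponential stand-in model are assumptions of this sketch, not PSSP's actual model.

```python
import math


def log_likelihood_loss(patients):
    """Average negative log-likelihood over patients.

    Each patient is a tuple (event_time, censored, survival, density), where
    survival(t) = P(death > t) and density(t) = p(death = t) are that
    patient's predicted functions.
    """
    total = 0.0
    for event_time, censored, survival, density in patients:
        if censored:
            # We know only that the patient was alive at event_time.
            likelihood = survival(event_time)
        else:
            # The death was observed at event_time.
            likelihood = density(event_time)
        total -= math.log(likelihood)
    return total / len(patients)


def exponential_model(rate):
    """Hypothetical stand-in for a learned model: exponential survival."""
    survival = lambda t: math.exp(-rate * t)
    density = lambda t: rate * math.exp(-rate * t)
    return survival, density


surv, dens = exponential_model(0.1)
toy_patients = [(2.0, False, surv, dens), (5.0, True, surv, dens)]
toy_loss = log_likelihood_loss(toy_patients)
print(toy_loss)
```

A lower value means the model placed more probability on what actually happened to each patient.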

### 4. Classification Accuracy

In some situations, we may have a specific classification task: eg, to predict whether each subject will survive at least 3 years, or not.  PSSP allows us to measure the accuracy of this predictive task, for some user-specified cut-off time.

On the http://pssp.srv.ualberta.ca/predictors/34 webpage, note the "Examine Classification Accuracy" box. It contains a "Cutoff" field (here set to 18.4) and the instruction: Enter a cutoff value, and click "Submit" to examine a hypothetical classifier, predicting either above or below that cutoff.

Clicking the “[Submit]” button leads to http://pssp.srv.ualberta.ca/predictors/34/examine_classification with the material

| Classifier Performance for Cutoff = 18.4 | |
|---|---|
| Subjects Examined * | 2096 / 2402 |
| Label Distribution | > 18.4 → 1047 (50.0%); ≤ 18.4 → 1049 (50.0%) |
| Accuracy | 73.3% |
| Precision (Positive Predictive Value) | 72.2% |
| Sensitivity (Recall) | 75.7% |
| Specificity | 70.9% |

* Subjects are excluded from analysis if they are censored, and the censored time occurs before the cutoff time.

Here, the “Label distribution” box states that 50% of patients survived more than 18.4 months, and 50% survived at most 18.4 months, which means that 18.4 months is the median survival time -- more precisely, over the 2096 patients that are either uncensored, or censored at a time after 18.4 months.  (The remaining 2402 - 2096 = 306 patients were censored, with the last visit before 18.4 months -- which means we cannot know their “18.4 month survival” status.)

PSSP then computed the median survival time for each of these patients, and predicted “Yes, >18.4 months” if that median time is over 18.4 months, and “No” otherwise.  This predictor has a (5 fold cross-validation) accuracy of 73.3%.  We also see similar values for Precision, Sensitivity and Specificity.  (As these classes are balanced -- ie, the baseline here is 50% -- these other measures are not particularly relevant.)
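The procedure just described can be sketched as follows: drop the patients whose status at the cutoff is unknown, label the rest, predict from each patient's median, and tally the four outcomes. The function name and toy data are hypothetical. Treating a patient censored exactly at the cutoff as unknown is an assumption of this sketch (the website's footnote says only "before").

```python
def examine_classification(subjects, cutoff):
    """subjects: list of (event_time, censored, predicted_median) tuples."""
    tp = fp = tn = fn = 0
    for event_time, censored, predicted_median in subjects:
        if censored and event_time <= cutoff:
            continue  # status at the cutoff is unknown, so exclude
        actual = event_time > cutoff          # survived past the cutoff
        predicted = predicted_median > cutoff
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        return num / den if den else float("nan")

    examined = tp + fp + tn + fn
    return {
        "examined": examined,
        "accuracy": ratio(tp + tn, examined),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


# Toy data (hypothetical): (event_time, censored, predicted_median).
toy_subjects = [
    (30.0, False, 25.0),  # died after cutoff, predicted after
    (10.0, False, 12.0),  # died before cutoff, predicted before
    (20.0, True, 15.0),   # censored after cutoff: known to have survived it
    (5.0, True, 40.0),    # censored before cutoff: excluded
    (8.0, False, 22.0),   # died before cutoff, predicted after
    (40.0, False, 50.0),  # died after cutoff, predicted after
]
report = examine_classification(toy_subjects, cutoff=18.4)
print(report)
```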

Returning to http://pssp.srv.ualberta.ca/predictors/34, we can ask about 1-year survival by filling in the “Cutoff” box with “12”; here, we see

| Classifier Performance for Cutoff = 12.0 | |
|---|---|
| Subjects Examined * | 2147 / 2402 |
| Label Distribution | > 12.0 → 1373 (63.9%); ≤ 12.0 → 774 (36.1%) |
| Accuracy | 75.7% |
| Precision (Positive Predictive Value) | 78.0% |
| Sensitivity (Recall) | 86.3% |
| Specificity | 56.8% |

That is, while the baseline (always predicting the more common label) had 63.9% accuracy, PSSP did better, at 75.7%. As the classes are not as balanced, it makes sense to consider precision and recall; here we see reasonable values for both.
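The baseline accuracy here is simply the share of the more common label. A short sketch, using the label counts reported for the two cutoffs (the function name is ours):

```python
def majority_baseline_accuracy(n_above, n_below):
    """Accuracy of always predicting the more common label."""
    return max(n_above, n_below) / (n_above + n_below)


baseline_12 = majority_baseline_accuracy(1373, 774)     # cutoff = 12.0
baseline_18 = majority_baseline_accuracy(1047, 1049)    # cutoff = 18.4
print(round(100 * baseline_12, 1))  # 63.9
print(round(100 * baseline_18, 1))  # 50.0
```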

Similarly, we can consider 36-month or 60-month survival (ie, 3 years or 5 years), or any other cutoff.

### 5. Concordance Score

Imagine we have a treatment that is expensive, but very likely to work.  We might then want to sort the patients by their respective “expected survival time”, and then allocate this treatment to the ones with the shortest anticipated survival times, in order.  Here, we want to know that our predictor ranks the patients correctly. The concordance score measures this:

It considers all (n choose 2) pairs of patients, and selects the m pairs that are “comparable” -- either both uncensored, or with one censored at some time and the other uncensored at an earlier time.  For each such pair of patients, PSSP makes an ordering prediction, specifying which patient it thinks will die first (here using the mean value of the distribution for each patient).  The concordance score is computed by adding a “1” for each pair where the prediction is correct, and a “0” for each other pair, then dividing by m.

Here, we see that the concordance is 0.76, which is much better than the baseline of 0.5.
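The computation described above can be sketched directly. The function name and the toy data are hypothetical; skipping pairs with tied event times and scoring tied predictions as 0 are assumptions of this sketch.

```python
from itertools import combinations


def concordance(subjects):
    """subjects: list of (event_time, censored, predicted_time) tuples.

    Returns (score, m), where m is the number of comparable pairs.
    """
    correct = 0
    comparable = 0
    for a, b in combinations(subjects, 2):
        # Order the pair so that `first` has the earlier event time.
        first, second = sorted((a, b), key=lambda s: s[0])
        if first[0] == second[0]:
            continue  # tied event times are skipped (an assumption)
        if first[1]:
            continue  # earlier patient was censored: cannot know who died first
        comparable += 1
        # `first` died first; the prediction is correct if it says so too.
        if first[2] < second[2]:
            correct += 1
    if comparable == 0:
        return float("nan"), 0
    return correct / comparable, comparable


# Toy data (hypothetical): (event_time, censored, predicted_time).
toy_cohort = [
    (2.33, False, 3.0),
    (12.50, False, 10.4),
    (54.10, True, 8.0),
    (5.00, True, 4.0),
]
score, m = concordance(toy_cohort)
print(score, m)
```

In this toy cohort, 4 of the 6 pairs are comparable; the two pairs whose earlier patient was censored at 5.00 months are dropped.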

### 6. Other Datasets

This document has focused on the NACD dataset.  We have also applied PSSP to many other datasets, including two public ones:

Knaus WA, Harrell FE, Lynn J et al. (1995): The SUPPORT prognostic model: Objective estimates of survival for seriously ill hospitalized adults. Annals of Internal Medicine 122:191-203.

The other, currently private, datasets deal with a wide variety of ailments, including liver transplants, breast cancer and brain tumors.

### 7. Foundations

(To be completed)

Survival analysis

Censored

CDF

PDF

? Logistic separator

This curve corresponds to 1-CDF(t), where CDF(.) is the cumulative distribution function. By clicking Show PDF, one can find the associated PDF (probability density function):

The green dots are the specific values directly computed; the purple line is a smoothed version of this.
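The relation between the two views can be sketched numerically: since the survival curve is 1-CDF(t), the PDF on each interval is approximately the drop in the survival curve divided by the interval length. The function name and the sample values are hypothetical, and this finite-difference approximation is not necessarily how the website computes or smooths its PDF.

```python
def pdf_from_survival(times, survival):
    """Approximate the density on each interval as the drop in survival per unit time."""
    density = []
    for i in range(len(times) - 1):
        dt = times[i + 1] - times[i]
        density.append((survival[i] - survival[i + 1]) / dt)
    return density


# Hypothetical survival values, as probabilities rather than percentages.
grid_times = [0.0, 1.0, 2.0, 4.0]
grid_survival = [1.0, 0.8, 0.5, 0.1]
grid_density = pdf_from_survival(grid_times, grid_survival)
print(grid_density)  # approximately [0.2, 0.3, 0.2]
```

Multiplying each density value by its interval length and summing recovers the total drop in the survival curve over the grid.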

[1] To simplify our descriptions, we will just use the male gender for the subjects.  Of course, everything continues to work for female subjects.