1 of 56

A Rigorous Framework for Trial Design via Simulation

September 8, 2021

Michael Sklar

Stein Postdoctoral Fellow at Stanford

https://mikesklar.github.io/thesis/

2 of 56

The Innovation Process?

Clinician

Statistician

New Design

Pharma Runs Trial

3 of 56

The Innovation Process

New Design

Pharma Runs Trial

Journal Submission

4 of 56

The Innovation Process

New Design

Pharma Runs Trial

Journal Submission

Persuade Pharma Decisionmakers

5 of 56

The Innovation Process

New Design

Pharma Runs Trial

Journal Submission

Persuade Pharma Decisionmakers

High Reward/

Low Execution Risk

6 of 56

The Innovation Process

New Design

Pharma Runs Trial

Journal Submission

Persuade Pharma Decisionmakers

High Reward/

Low Execution Risk

Low Risk of Slow Processing or Denial

7 of 56

The Innovation Process

New Design

Pharma Runs Trial

Journal Submission

Persuade Pharma Decisionmakers

Persuade FDA Decisionmakers

High Reward/

Low Execution Risk

Low Risk of Slow Processing or Denial

8 of 56

The Innovation Process

New Design

Journal Submission

Persuade Pharma Decisionmakers

High Reward/

Low Execution Risk

Pass FDA validation and negotiations

Type I Error Proof

Low Risk of Slow Processing or Denial

Pharma Runs Trial

Persuade FDA Decisionmakers

11 of 56

Simulation slices out pain points

New Design

Journal Submission

Persuade Pharma Decisionmakers

High Reward/

Low Execution Risk

Pass FDA validation and negotiations

Type I Error Proof

Low Risk of Slow Processing or Denial

Pharma Runs Trial

Persuade FDA Decisionmakers

12 of 56

(Lifted from FDA website – John Scott, 2018)

A Rigorous Framework for Type I Error Control in Complex Trial Design

  • Idea: Provably control Type I error by filling in the gaps between simulation grid-points

  • Can this speed up validation?

16 of 56

What could automated validation do?

  • Reduce burden on regulators
  • Speed up validation for pharma & consumers
  • Increase predictability for designers

  • Proof can answer concerns about powerful “black box” optimizations moving Type I Error to places where it isn’t checked

17 of 56

Challenges

  • Massive computational power
    • [Example uses 10^10 simulations for 2 unknown parameters]
    • Difficulty quickly increases with complexity and # of parameters

  • Regulatory agreement on the model class for simulations

  • Well-behaved model for the data (exponential family)
    • Includes Gaussian, binomial, exponential, gamma, some Weibull models
    • (Censoring and adaptive sampling are OK)

  • Justification for focusing on a compact part of null hypothesis space

18 of 56

Roadmap for the Talk:

  • How to prove Type I Error control over continuous space with simulation
  • Examples
  • Challenges, recommendations, and additional features
    • Tuning critical values
    • Re-design and Platform trials
  • Questions
  • Discussion on possibilities for software implementation

  • Further math details

19 of 56

Consider the most basic test

  •  

21 of 56

Zoom in on Type I Error near the Ho boundary

22 of 56

Assume we know the exact Type I Error only at a few points…

24 of 56

Assume we know the true derivative of Type I Error at those points too

25 of 56

First-order approximation is close but not conservative

26 of 56

Taylor’s Theorem Describes the Error

27 of 56

But using a worst-case bound on the second derivative, we get a conservative approximation
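In symbols (writing f for the Type I Error as a function of the parameter θ, with grid point θ₀, and any constant M dominating the second derivative):

```latex
f(\theta)\;\le\; f(\theta_0) \,+\, f'(\theta_0)\,(\theta-\theta_0) \,+\, \tfrac{M}{2}\,(\theta-\theta_0)^2,
\qquad M \;\ge\; \sup_{\tilde\theta}\,\bigl|f''(\tilde\theta)\bigr|.
```

The linear Taylor piece tracks the true curve; the quadratic penalty makes the approximation conservative everywhere between grid points.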

28 of 56

Double the number of simulation points

32 of 56

Monte Carlo simulation leaves uncertainty

34 of 56

Key Steps:

  • Find a general upper bound on the second derivative
  • Use Monte Carlo simulation to construct confidence intervals for the point estimate of Type I Error and for its derivative
    • Give each interval confidence 1 - 𝛿 / 2
  • Maximize the Taylor-expansion upper bound over these confidence intervals

Result: A (pointwise) 1 - 𝛿 upper confidence bound on the true Type I Error function
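The steps above can be sketched end-to-end on a toy one-sided z-test (reject H₀: θ ≤ 0 when Z > c, with Z ~ N(θ, 1)). The test, the score-function derivative estimator, and the constant M = e^(−1/2)/√(2π) bounding |f''| are illustrative assumptions for this toy case, not the thesis's general construction:

```python
import random
from statistics import NormalDist

nd = NormalDist()

def upper_confidence_bound(theta0, h, c=1.96, nsim=200_000, delta=0.01, seed=0):
    """1 - delta upper confidence bound on the Type I Error f(theta) of the
    toy test "reject when Z > c, Z ~ N(theta, 1)", valid for every
    theta in [theta0, theta0 + h]."""
    rng = random.Random(seed)
    z_crit = nd.inv_cdf(1 - delta / 4)   # each CI gets confidence 1 - delta/2
    M = 0.242  # here |f''(theta)| = |(c - theta) phi(c - theta)| <= e^(-1/2)/sqrt(2 pi)

    rejects, scores = [], []
    for _ in range(nsim):
        z = rng.gauss(theta0, 1.0)
        r = 1.0 if z > c else 0.0
        rejects.append(r)
        scores.append(r * (z - theta0))  # score-function estimator of f'(theta0)

    def mean_and_se(xs):
        m = sum(xs) / len(xs)
        var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
        return m, (var / len(xs)) ** 0.5

    f_hat, f_se = mean_and_se(rejects)
    g_hat, g_se = mean_and_se(scores)
    f_hi = f_hat + z_crit * f_se  # upper CI end for f(theta0)
    g_hi = g_hat + z_crit * g_se  # upper CI end for f'(theta0)

    # Maximize the Taylor upper bound over the cell [theta0, theta0 + h]
    return f_hi + max(0.0, g_hi) * h + 0.5 * M * h ** 2

# Grid point at the null boundary theta0 = 0, spacing h = 0.05: the bound
# sits a little above the true worst case 1 - Phi(1.96 - 0.05) ~ 0.028.
ub = upper_confidence_bound(0.0, 0.05)
print(ub)
```

Taking the maximum of these cell-wise bounds across the whole grid gives the global pointwise upper confidence bound.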

35 of 56

Example: FWER for Two Arms

What if you did two independent z-tests?

Here is a display of the true FWER, as a function of the parameters
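For this toy setting the surface is available in closed form; a minimal sketch, assuming each arm yields one statistic Z_i ~ N(μ_i, 1) tested one-sided at level .025 (the level and sidedness are assumptions here, not stated on the slide):

```python
from statistics import NormalDist

def fwer(mu1: float, mu2: float, alpha: float = 0.025) -> float:
    """Family-wise error rate of two independent one-sided z-tests.

    Each test observes Z_i ~ N(mu_i, 1) and rejects when Z_i > c, where
    c is the upper-alpha normal quantile.  Under independence,
    FWER = 1 - P(no rejection) = 1 - Phi(c - mu1) * Phi(c - mu2).
    """
    nd = NormalDist()
    c = nd.inv_cdf(1 - alpha)  # per-test critical value, ~1.96 for alpha = .025
    return 1 - nd.cdf(c - mu1) * nd.cdf(c - mu2)

# The FWER is largest at the corner of the null, mu1 = mu2 = 0:
# 1 - (1 - .025)^2 = .049375
print(fwer(0.0, 0.0))
print(fwer(-2.0, -2.0))  # deep inside the null the FWER is nearly 0
```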

36 of 56

Our 99% confidence upper bound

37 of 56

Adaptive Trial Example: Thompson Sampling

  • Two arms
  • Bernoulli (𝛉i) outcomes
  • Ho : 𝛉i < .6
  • N = 100
  • Beta(1,1) prior

  • Reject arm i at the end if posterior P(𝛉i > .6) > 95%

With a cluster: ~800,000 Monte Carlo samples per point

~16,000 grid points
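A minimal simulation sketch of this design. The precise allocation rule is an assumption here (each patient goes to the arm with the larger posterior draw), and the posterior tail probability is computed exactly via the Beta-binomial identity:

```python
import random
from math import comb

def beta_tail(a: int, b: int, t: float) -> float:
    """P(theta > t) for theta ~ Beta(a, b), integer a, b, via the
    identity I_t(a, b) = P(Binomial(a + b - 1, t) >= a)."""
    n = a + b - 1
    return sum(comb(n, k) * t**k * (1 - t) ** (n - k) for k in range(a))

def simulate_trial(theta, n=100, cut=0.6, posterior_level=0.95, rng=None):
    """One Thompson-sampling trial: two Bernoulli(theta_i) arms,
    Beta(1,1) priors, n patients; arm i is rejected at the end when
    the posterior gives P(theta_i > cut) > posterior_level."""
    rng = rng or random.Random()
    a, b = [1, 1], [1, 1]  # Beta posterior parameters per arm
    for _ in range(n):
        draws = [rng.betavariate(a[i], b[i]) for i in range(2)]
        arm = 0 if draws[0] > draws[1] else 1  # Thompson allocation
        if rng.random() < theta[arm]:
            a[arm] += 1
        else:
            b[arm] += 1
    return [beta_tail(a[i], b[i], cut) > posterior_level for i in range(2)]

# Rough FWER estimate at the corner theta_1 = theta_2 = 0.6 (the talk's
# cluster run uses ~800,000 samples per grid point; 2,000 is a smoke test).
rng = random.Random(0)
n_sims = 2_000
fwer_hat = sum(any(simulate_trial((0.6, 0.6), rng=rng)) for _ in range(n_sims)) / n_sims
print(f"estimated FWER at the corner: {fwer_hat:.3f}")
```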

38 of 56

Setting: Data is Exponential Family

    • Includes your favorite R.V.s: Gaussian, Binomial, Exponential, Gamma, Weibull, …

    • Adaptive Data Collection is OK!
    • Censored Data is OK!

    • Applies to Gaussian Processes, Brownian Motion
      • Hence, for the limiting distribution of maximum-likelihood estimates with i.i.d. data
      • Or, Cox regression under a proportional hazards assumption

39 of 56

Assumptions

  •  

40 of 56

A workflow issue?

  •  

41 of 56

Solution:

  • Tune our rejection threshold while we perform the validation to guarantee an overall .025 bound

  • Key Idea: Use a monotone family of rejection rules, and use only one set of Monte Carlo simulations to find the rejection rule which exactly hits the .025 bound for that set of sims

(Details in thesis)
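The key idea can be sketched with the simplest monotone family, {reject iff T > c}: one shared batch of null simulations determines every rule in the family at once, so the cutoff can be read off an order statistic. (The thesis's actual construction also handles adaptive designs and the confidence-bound machinery; this is only the ordering idea.)

```python
import random

def tune_threshold(null_stats, alpha=0.025):
    """Choose c in the monotone family {reject iff T > c} so that the
    empirical Type I error on these simulations is the largest
    achievable rate not exceeding alpha."""
    stats = sorted(null_stats, reverse=True)
    k = int(alpha * len(stats))  # number of rejections allowed
    # "T > stats[k]" rejects exactly the k largest simulated statistics
    return stats[k]

rng = random.Random(0)
sims = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
c = tune_threshold(sims)
print(c)  # close to the nominal z cutoff 1.96 for alpha = .025
```

Because the family is monotone in c, moving the cutoff never requires re-simulation: the same set of draws evaluates every candidate rule.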

42 of 56

Suggestion for discussion:

  • Should a ledger be developed for recording and locking in the outcomes of Monte Carlo simulations?

  • This would prevent any gaming involving re-running of simulations.

43 of 56

Flexible re-design

  • Having an accurate understanding of Type I Error means we could apply conditional-error arguments for Type I Error to multi-dimensional hypothesis spaces

  • So, in theory, one could change to a new design which is provably below the old design’s Type I Error profile

  • A variation of this argument lets us add new arms to the trial. Perhaps this is usable for platform trials

(Further details in thesis)

44 of 56

Open floor for questions

45 of 56

Questions for you

  • What part of this work appeals to you the most?

  • What problems would you suggest to focus on next?

  • Recommendations for proof-of-concept applications?

46 of 56

Possibility: A high-speed validation pipeline

Can we set out a large class of designs, where software validation of Type I Error removes the need for any further negotiation and mathematical review by FDA statisticians?

  • “Always approved” model classes:
    • Binomial outcomes
    • Asymptotic Gaussian statistics
    • Depending on context, a class of survival models:
      • Proportional hazards (for log-rank and Cox models)
      • Exponential or gamma survival distributions

47 of 56

END OF SEPT 8th TALK. See presenter notes on this slide for detailed notes on the seminar and follow-up discussion

48 of 56

Returning to the math

49 of 56

Underlying Idea: Taylor Expansion

  •  

where

50 of 56

Monte Carlo on grid points

Can get estimates with Monte Carlo

51 of 56

 

  •  

52 of 56

 

  •  

53 of 56

 

54 of 56

What about ?

Martingales + Upper Bounds on Sample Sizes

Bound on the covariance matrix of

 

 

55 of 56

What about ?

 

56 of 56

Further questions?

  • End slides