Model Data Formats

A DDP file describes a dose-response relationship with a limited collection of response outcomes. The dose and response values may be either categorical or sampled at a collection of numerical levels. The format of a DDP file is:

<format>,<y-value-1>,<y-value-2>,...

<x-value-1>,p(y1|x1),p(y2|x1),...

<x-value-2>,p(y1|x2),p(y2|x2),...

…

<format> may be one of the following values:

- ddp1 - the p(.) values are simple probabilities (0 <= p(.) <= 1 and, for each x, the sum over y of p(y|x) = 1)
- ddp2 - the p(.) values are log probabilities

<y-value-1>, …, <y-value-N> and <x-value-1>, …, <x-value-M> are either strings (for named categories) or numerical values.

Below is a sample categorical DDP file:

ddp1,live,dead

control,.5,.5

treated,.9,.1

Below is a sample numerical DDP file:

ddp1,-10.0,-3.33333333333,3.33333333333,10.0

0.0,0.5,0.5,0.0,0.0

13.3333333333,0.0,0.5,0.5,0.0

26.6666666667,0.0,0.0,0.5,0.5

40.0,0.0,0.0,0.0,0.5
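
The DDP layout described above can be read with a few lines of code. The following is a minimal sketch, not the tool's own reader: the function name parse_ddp is hypothetical, blank lines between rows are skipped, and ddp2 values are assumed to be natural-log probabilities (the format description does not state the base).

```python
import math

def parse_ddp(text):
    """Parse DDP text into (y_values, {x_value: [p(y|x), ...]})."""
    # Skip blank lines, which may separate the rows of the file.
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    header = rows[0].split(",")
    fmt, y_values = header[0], header[1:]
    if fmt not in ("ddp1", "ddp2"):
        raise ValueError(f"unknown DDP format: {fmt}")
    table = {}
    for row in rows[1:]:
        fields = row.split(",")
        x_value = fields[0]
        values = [float(v) for v in fields[1:]]
        if len(values) != len(y_values):
            raise ValueError(f"row {x_value!r} has {len(values)} values, "
                             f"expected {len(y_values)}")
        if fmt == "ddp2":
            # Assumption: log probabilities use the natural logarithm.
            values = [math.exp(v) for v in values]
        table[x_value] = values
    return y_values, table

sample = """ddp1,live,dead
control,.5,.5
treated,.9,.1
"""
y_values, table = parse_ddp(sample)
print(y_values)          # ['live', 'dead']
print(table["treated"])  # [0.9, 0.1]
```

The x and y values are kept as strings here, so the same sketch handles categorical and numerical files; a caller that knows the values are numerical can convert them with float().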

Each line in a model spline file represents a polynomial segment in log-probability space. The format is as follows:

spp1

<x>,<y0>,<y1>,<a0>,<a1>,<a2>

...

Each line describes a segment of a probability distribution of y, conditional on x = <x>. The segment spans from <y0> to <y1>, where the lowest value of <y0> may be -inf, and the highest value of <y1> may be inf. The <x> values may be categorical or numerical. If they are numerical, they are assumed to be samples of a smoothly varying function (a cubic spline in x for every y).

The values <a0>, <a1> and <a2> are the polynomial coefficients in y. Because the polynomial is at most quadratic, only normal or exponential tails are possible. Within a segment, the probability function is

exp(a0 + a1 y + a2 y^2)
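
To make the segment formula concrete, here is a minimal sketch that reads spp1 text and evaluates exp(a0 + a1 y + a2 y^2) on the segment containing y. The function names parse_spline and density are hypothetical, and the sample row is illustrative only: it encodes a standard normal as a single segment from -inf to inf. Interpolation between numerical x values is not attempted.

```python
import math

def parse_spline(text):
    """Parse spp1 text into {x: [(y0, y1, a0, a1, a2), ...]}."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if rows[0] != "spp1":
        raise ValueError(f"unknown spline format: {rows[0]}")
    segments = {}
    for row in rows[1:]:
        x, *numbers = row.split(",")
        if len(numbers) != 5:
            raise ValueError(f"expected 5 numbers after x, got {len(numbers)}")
        # float() accepts "-inf" and "inf", so open-ended tails parse directly.
        segments.setdefault(x, []).append(tuple(float(v) for v in numbers))
    return segments

def density(segments, x, y):
    """Evaluate exp(a0 + a1*y + a2*y^2) on the segment of x that contains y."""
    for y0, y1, a0, a1, a2 in segments[x]:
        if y0 <= y <= y1:
            return math.exp(a0 + a1 * y + a2 * y * y)
    return 0.0  # y lies outside every segment listed for x

# Illustrative data: a standard normal has a0 = -0.5*log(2*pi), a1 = 0, a2 = -0.5.
a0 = -0.5 * math.log(2 * math.pi)
sample = f"spp1\ncontrol,-inf,inf,{a0!r},0,-0.5\n"
segments = parse_spline(sample)
print(density(segments, "control", 0.0))  # approximately 0.3989, i.e. 1/sqrt(2*pi)
```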

The probability features file has the following format:

dpc1,<p-header-1>,<p-header-2>,...

<x-value-1>,g1(y | x1),g2(y | x1),...

<x-value-2>,g1(y | x2),g2(y | x2),...

...

Each <p-header> can be any of the following; the corresponding values (<p-value-ij>) appear in the rows beneath it.

- mean: E[y | xi]
- var: E[(y - E[y | xi])^2 | xi]
- sdev: sqrt(E[(y - E[y | xi])^2 | xi])
- skew: E[((y - E[y | xi]) / sqrt(E[(y - E[y | xi])^2 | xi]))^3 | xi]
- mode: the value of y that maximizes f(y | xi)
- numeric (0 - 1): F^-1(pj | xi), the quantile of y at cumulative probability pj

The row headers (<x-value>) can be numeric, in which case a continuous spline bridges them, or categorical strings.

Below is a sample features file:

dpc1,mean,var

treated,0,1

control,4,4
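
A features file can be loaded into a per-row dictionary as sketched below. This is an illustration under assumptions, not the tool's reader: the name parse_features is hypothetical, and any header that is not one of the listed keywords is treated as a quantile level that must parse as a number between 0 and 1.

```python
KEYWORDS = {"mean", "var", "sdev", "skew", "mode"}

def parse_features(text):
    """Parse dpc1 text into {x_value: {header: value, ...}}."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    header = rows[0].split(",")
    if header[0] != "dpc1":
        raise ValueError(f"unknown features format: {header[0]}")
    names = header[1:]
    for name in names:
        if name not in KEYWORDS:
            # Anything else must be a quantile level in [0, 1];
            # float() raises ValueError for non-numeric headers.
            level = float(name)
            if not 0.0 <= level <= 1.0:
                raise ValueError(f"quantile header out of range: {name}")
    features = {}
    for row in rows[1:]:
        fields = row.split(",")
        if len(fields) - 1 != len(names):
            raise ValueError(f"row {fields[0]!r} does not match the header")
        features[fields[0]] = dict(zip(names, (float(v) for v in fields[1:])))
    return features

sample = """dpc1,mean,var
treated,0,1
control,4,4
"""
features = parse_features(sample)
print(features["control"])  # {'mean': 4.0, 'var': 4.0}
```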

You can import a collection of Gaussian models at once with the Normal Model Import tool. The format of the file is:

gmi1,effect,sd

<Name 1>,<mean>,<standard deviation>

<Name 2>,<mean>,<standard deviation>

<Name 3>,<mean>,<standard deviation>

…

A new model will be created for each row, with a name given by the first value in the row.
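
The row-to-model mapping can be sketched as follows. The function name parse_gaussian_import and the model names in the sample are hypothetical; the sketch splits each row from the right, which assumes the last two fields are always the mean and the standard deviation.

```python
def parse_gaussian_import(text):
    """Parse gmi1 text into {name: (mean, standard_deviation)}."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if rows[0] != "gmi1,effect,sd":
        raise ValueError(f"unexpected header: {rows[0]}")
    models = {}
    for row in rows[1:]:
        # Split from the right so the name is whatever precedes the two numbers.
        name, mean, sd = row.rsplit(",", 2)
        models[name] = (float(mean), float(sd))
    return models

# Hypothetical sample data, for illustration only.
sample = """gmi1,effect,sd
Model A,0.5,0.1
Model B,-1.25,0.4
"""
models = parse_gaussian_import(sample)
for name, (mean, sd) in models.items():
    print(name, mean, sd)
```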

Multivariate Model

TO BE DESCRIBED

Bin Model

A bin model represents bins of different spans, where the distribution is constant over each bin. It is a combination of information describing the bins and an underlying categorical model of one of the other types.

The format is:

bin1

<x0>,<x1>,<x2>, …

<underlying model>

Example:

bin1

10,29,50

dpc1,mean,sdev

1,0.0314,0.068

2,-0.622,0.068
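
The two-part layout of a bin model can be separated as in the sketch below. The name parse_bin_model is hypothetical. The sketch assumes that N + 1 bin edges define N consecutive bins and that the i-th row of the underlying model belongs to the i-th bin; the example above is consistent with this reading, but the format description does not state it.

```python
def parse_bin_model(text):
    """Split bin1 text into (bins, underlying_model_text)."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if rows[0] != "bin1":
        raise ValueError(f"unknown bin format: {rows[0]}")
    edges = [float(v) for v in rows[1].split(",")]
    # Assumption: consecutive edges delimit one bin each.
    bins = list(zip(edges, edges[1:]))
    # Everything after the edges is the underlying model, left unparsed here.
    underlying = "\n".join(rows[2:])
    return bins, underlying

sample = """bin1
10,29,50
dpc1,mean,sdev
1,0.0314,0.068
2,-0.622,0.068
"""
bins, underlying = parse_bin_model(sample)
print(bins)                        # [(10.0, 29.0), (29.0, 50.0)]
print(underlying.splitlines()[0])  # dpc1,mean,sdev
```

The underlying text can then be handed to whichever reader matches its own format tag (dpc1 in this example).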

Delta Model

TO BE DESCRIBED

Mean-Size Model

In Mean-Size models, each point is characterized only by a value and the size of the population that went into estimating that value. As such, a Mean-Size model does not have enough information to generate a full distribution. It can be safely combined with other Mean-Size models, or approximated with a Gaussian whose variance equals the absolute value of the mean for size = 1 and decreases with the square root of the size, according to the Central Limit Theorem.

The format is:

msx1,mean,size

<x0>,<mean0>,<size0>

<x1>,<mean1>,<size1>

...
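
The Gaussian approximation described for Mean-Size models can be sketched as follows. This is one reading of the stated rule, taken as variance = |mean| / sqrt(size), so that the variance equals |mean| at size = 1; the function names and the sample values are hypothetical.

```python
import math

def parse_mean_size(text):
    """Parse msx1 text into {x: (mean, size)}."""
    rows = [line.strip() for line in text.splitlines() if line.strip()]
    if rows[0] != "msx1,mean,size":
        raise ValueError(f"unexpected header: {rows[0]}")
    points = {}
    for row in rows[1:]:
        x, mean, size = row.split(",")
        points[x] = (float(mean), float(size))
    return points

def gaussian_approximation(mean, size):
    """Return (mean, variance) under the assumed rule |mean| / sqrt(size)."""
    # Assumption: "decreases with the square root of the size" is read
    # as dividing the size = 1 variance, |mean|, by sqrt(size).
    variance = abs(mean) / math.sqrt(size)
    return mean, variance

# Hypothetical sample data, for illustration only.
sample = "msx1,mean,size\n0,4,1\n1,-4,16\n"
points = parse_mean_size(sample)
for x, (mean, size) in points.items():
    print(x, gaussian_approximation(mean, size))
# 0 (4.0, 4.0)
# 1 (-4.0, 1.0)
```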