Guide
Basics
Introduction
A probabilistic program is Julia code wrapped in a @model macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. To help avoid those we provide a Turing-safe datatype TArray that can be used to create mutable arrays in Turing programs.
To specify distributions of random variables, Turing programs should use the ~ notation:

x ~ distr

where x is a symbol and distr is a distribution. If x is undefined in the model function, inside the probabilistic program this puts a random variable named x, distributed according to distr, in the current scope. distr can be a value of any type that implements rand(distr), which samples a value from the distribution distr. If x is defined, this is used for conditioning in a style similar to Anglican (another PPL). In this case, x is an observed value, assumed to have been drawn from the distribution distr. The likelihood is computed using logpdf(distr, x). The observe statements should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow.
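To illustrate the constraint, the following sketch contrasts an observe statement placed inside stochastic control flow with a version that avoids it (the model names bad_obs and good_obs are made up for this example; syntax follows the style used throughout this guide):

# Problematic: whether y is observed depends on the random value of z,
# so different runs traverse different observe statements.
@model bad_obs(y) = begin
    z ~ Bernoulli(0.5)
    if z == 1
        y ~ Normal(0, 1)
    end
end

# Safer: every run observes y exactly once, in the same place;
# the stochastic choice only changes the distribution's parameters.
@model good_obs(y) = begin
    z ~ Bernoulli(0.5)
    y ~ Normal(z == 1 ? 0.0 : 2.0, 1)
end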
Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA), and the No-U-Turn Sampler (NUTS).
Simple Gaussian Demo
Below is a simple Gaussian demo illustrating the basic usage of Turing.jl.
# Import packages.
using Turing
using StatsPlots
# Define a simple Normal model with unknown mean and variance.
@model gdemo(x, y) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
end
Note: As a sanity check, the expectation of s is 49/24 (2.04166666…) and the expectation of m is 7/6 (1.16666666…).
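These values follow from the standard conjugate Normal-Inverse-Gamma update for this model (a sketch of the calculation, reading the prior parameters off the code above: μ₀ = 0, κ₀ = 1, α₀ = 2, β₀ = 3, with data x = 1.5, y = 2.0, so n = 2 and x̄ = 1.75):

\[
\mu_n = \frac{\kappa_0 \mu_0 + n\bar{x}}{\kappa_0 + n} = \frac{7}{6}, \qquad
\alpha_n = \alpha_0 + \frac{n}{2} = 3, \qquad
\beta_n = \beta_0 + \frac{1}{2}\sum_i (x_i - \bar{x})^2 + \frac{\kappa_0 n (\bar{x} - \mu_0)^2}{2(\kappa_0 + n)} = 3 + \frac{1}{16} + \frac{49}{48} = \frac{49}{12},
\]

so that E[m] = μ_n = 7/6 and E[s] = β_n / (α_n - 1) = (49/12) / 2 = 49/24.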
We can perform inference by using the sample function, the first argument of which is our probabilistic program and the second of which is a sampler. More information on each sampler is located in the API.
# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(1000))
c2 = sample(gdemo(1.5, 2), PG(10,1000))
c3 = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))
c4 = sample(gdemo(1.5, 2), Gibbs(1000, PG(10, 2, :m), HMC(2, 0.1, 5, :s)))
c5 = sample(gdemo(1.5, 2), HMCDA(1000, 0.15, 0.65))
c6 = sample(gdemo(1.5, 2), NUTS(1000, 0.65))
The MCMCChain module (which is re-exported by Turing) provides plotting tools for the Chain objects returned by a sample function. See the MCMCChain repository for more information on the suite of tools available for diagnosing MCMC chains.
# Summarise results
describe(c3)
# Plot results
plot(c3)
savefig("gdemo-plot.png")
The arguments for each sampler are:
- SMC: number of particles.
- PG: number of particles, number of iterations.
- HMC: number of samples, leapfrog step size, leapfrog step numbers.
- Gibbs: number of samples, component sampler 1, component sampler 2, …
- HMCDA: number of samples, total leapfrog length, target accept ratio.
- NUTS: number of samples, target accept ratio.
For detailed information on the samplers, please review Turing.jl’s API documentation.
Modelling Syntax Explained
Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model on data. Subsequently, the sample function can be used to generate samples from the posterior distribution. In the following example, the defined model is conditioned on the data (arg_1 = 1, arg_2 = 2) by passing (1, 2) to the model function.
@model model_name(arg_1, arg_2) = begin
    ...
end
The conditioned model can then be passed to the sample function to run posterior inference.
model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.
The returned chain contains samples of the variables in the model.
var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.
Note that the key (:var_1) should be a symbol. For example, to fetch x[1], one would need to do chn[Symbol("x[1]")].
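For instance (a hypothetical sketch; it assumes the model contains an array-valued variable x, which the model_name stub above does not actually declare):

# Hypothetical: collect all posterior draws of the first element of an
# array-valued variable x, then summarise them.
x1_draws = chn[Symbol("x[1]")]
mean(x1_draws)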
Turing does not have a declarative form. More generally, the order in which you place the lines of a @model macro matters. For example, the following example works:
# Define a model with a Poisson prior on s and a Normal likelihood for y.
@model model_function(y) = begin
    s ~ Poisson(1)
    y ~ Normal(s, 1)
    return y
end
sample(model_function(10), SMC(100))
But if we switch the s ~ Poisson(1) and y ~ Normal(s, 1) lines, the model will no longer sample correctly:
# The same model with the two ~ lines switched; s is used before it is defined.
@model model_function(y) = begin
    y ~ Normal(s, 1)
    s ~ Poisson(1)
    return y
end
sample(model_function(10), SMC(100))
Sampling from the Prior
Turing allows you to sample from a declared model’s prior by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo model which accepts two inputs, x and y.
@model gdemo(x, y) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    x ~ Normal(m, sqrt(s))
    y ~ Normal(m, sqrt(s))
    return x, y
end
Assign the function without inputs to a variable, and Turing will produce a sample from the prior distribution.
g = gdemo()
Output:
(0.685690547873451, -1.1972706455914328)
Generative Models
Using Missing
The simplest way to treat a model as generative is to pass in an Array{Missing} value for the parameter(s) you wish to generate. Turing v0.6.7 supports the following syntax:
@model gdemo(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in eachindex(x)
        x[i] ~ Normal(m, sqrt(s))
    end
end
# Treat x as a vector of missing values.
model = gdemo(fill(missing, 2))
c = sample(model, HMC(500, 0.01, 5))
The above case tells the model compiler the dimensions of the values it needs to generate. The generated values for x can be extracted from the Chains object using c[:x].
Using Argument Defaults
Turing models can also be treated as generative by providing default values in the model declaration, and then calling that model without arguments.
Suppose we wish to generate data according to the model

s ~ InverseGamma(2, 3)
m ~ Normal(0, sqrt(s))
x[i] ~ Normal(m, sqrt(s)),  i = 1, …, 10

Each x[i] can be generated by Turing. In the model below, if x is not provided when the function is called, x will default to Vector{Real}(undef, 10), a 10-element array of Real values. The sampler will then treat x as a parameter and generate those quantities.
using Turing
# Declare a model with a default value.
@model generative(x = Vector{Real}(undef, 10)) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in 1:length(x)
        x[i] ~ Normal(m, sqrt(s))
    end
    return s, m
end
This model can be called in a traditional fashion, with an argument vector of any size:
# The values 1.5 and 2.0 will be observed by the sampler.
m = generative([1.5,2.0])
chain = sample(m, HMC(1000, 0.01, 5))
We can generate observations by providing no arguments in the sample call.
# This call will generate a vector of 10 values
# every sampler iteration.
generated = sample(generative(), HMC(1000, 0.01, 5))
The generated quantities can then be accessed by pulling them out of the chain.
xs = generated[:x]
xs[15]
Output:
15-element Array{Any,1}:
2.5087545935216204
1.9705471514656179
1.5909436324555706
0.657277570452348
1.7559808833945953
⋮
1.393870026816734
0.517867487068248
1.198359051797093
1.7438714106828728
What to Use as a Default Value
Currently, the actual value of the default argument does not matter. Only the dimensions and type of a non-atomic value are relevant. Turing uses default values to pre-allocate vectors when they are treated as parameters, because if the value is not provided, the model will not know the size or type of a vector. Consider the following model:
@model generator(x) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in 1:length(x)
        x[i] ~ Normal(m, sqrt(s))
    end
    return s, m
end
If we are trying to generate random values from the generator model and we call sample(generator(), HMC(1000, 0.01, 5)), we will receive an error. This is because there is no way to determine length(x), whether x is a vector, or the type of the values in x.
A sensible default value might be:
@model generator(x = zeros(10)) = begin
    s ~ InverseGamma(2, 3)
    m ~ Normal(0, sqrt(s))
    for i in 1:length(x)
        x[i] ~ Normal(m, sqrt(s))
    end
    return s, m
end
In this case, the model compiler can now determine that x is a Vector{Float64} of length 10, and the model will work as intended. It doesn’t matter what the values in the vector are; at present, x will be treated as a parameter if it assumes its default value, i.e. if no value was provided in the function call for that variable.
The element type of the vector (or matrix) should match the type of the random variable, <: Integer for discrete random variables and <: AbstractFloat for continuous random variables. Moreover, if the continuous random variable is to be sampled using a Hamiltonian sampler, the vector’s element type needs to be Real to enable auto-differentiation through the model, which uses special number types that are sub-types of Real. Finally, when using a particle sampler, a TArray should be used.
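As a sketch of these guidelines (the model names gen_continuous and gen_discrete are made up for illustration; only the default values matter here):

# Continuous variable generated with a Hamiltonian sampler:
# use an element type of Real so the model can be auto-differentiated.
@model gen_continuous(x = Vector{Real}(undef, 10)) = begin
    m ~ Normal(0, 1)
    for i in 1:length(x)
        x[i] ~ Normal(m, 1)
    end
end

# Discrete variable generated with a particle sampler:
# use a TArray with an Integer element type.
@model gen_discrete(x = tzeros(Int, 10)) = begin
    p ~ Beta(2, 2)
    for i in 1:length(x)
        x[i] ~ Bernoulli(p)
    end
end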
Beyond the Basics
Compositional Sampling Using Gibbs
Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.
@model simple_choice(xs) = begin
    p ~ Beta(2, 2)
    z ~ Bernoulli(p)
    for i in 1:length(xs)
        if z == 1
            xs[i] ~ Normal(0, 1)
        else
            xs[i] ~ Normal(2, 1)
        end
    end
end
simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(1000, HMC(1,0.2,3,:p), PG(20,1,:z)))
The Gibbs sampler can be used to specify unique automatic differentiation backends for different variable spaces. Please see the Automatic Differentiation article for more.
For more details of compositional sampling in Turing.jl, please check the corresponding paper.
Working with MCMCChain.jl
Turing.jl wraps its samples using MCMCChain.Chain so that all the functions that work on an MCMCChain.Chain can be re-used in Turing.jl. Two typical functions are MCMCChain.describe and MCMCChain.plot, which can be used as follows for an obtained chain chn. For more information on MCMCChain, please see the GitHub repository.
using MCMCChain: describe, plot
describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.
There are numerous functions in addition to describe and plot in the MCMCChain package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.
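For instance, a Gelman, Rubin and Brooks diagnostic can be run on a chain (a sketch; it assumes the gelmandiag function is available in the installed MCMCChain version, and it is most informative when given multiple chains):

using MCMCChain: gelmandiag

gelmandiag(chn)  # potential scale reduction factors for each parameter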
Working with Libtask.jl
The Libtask.jl library provides copy-on-write data structures that are safe for use in Turing’s particle-based samplers. One data structure in particular is often required: the TArray. The following sampler types require the use of a TArray to store distributions:
- IPMCMC
- IS
- PG
- PMMH
- SMC
If you do not use a TArray to store arrays of distributions when using a particle-based sampler, you may experience errors.
Here is an example of how the TArray (using a TArray constructor function called tzeros) can be applied in this way:
# Turing model definition.
# N (the number of observations) and K (the number of hidden states)
# are assumed to be defined in the enclosing scope.
@model BayesHmm(y) = begin
    # Declare a TArray with a length of N.
    s = tzeros(Int, N)
    m = Vector{Real}(undef, K)
    T = Vector{Vector{Real}}(undef, K)
    for i = 1:K
        T[i] ~ Dirichlet(ones(K)/K)
        m[i] ~ Normal(i, 0.01)
    end
    # Draw from a distribution for each element in s.
    s[1] ~ Categorical(K)
    for i = 2:N
        s[i] ~ Categorical(vec(T[s[i-1]]))
        y[i] ~ Normal(m[s[i]], 0.1)
    end
    return (s, m)
end;
Changing Default Settings
Some of Turing.jl’s default settings can be changed for better usage.
AD Chunk Size
ForwardDiff (Turing’s default AD backend) uses forward-mode chunk-wise AD. The chunk size can be manually set by setchunksize(new_chunk_size); alternatively, use the auto-tuning helper function auto_tune_chunk_size!(mf::Function, rep_num=10), which will profile various chunk sizes. Here mf is the model function, e.g. gdemo(1.5, 2), and rep_num is the number of repetitions during profiling.
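For example (a minimal sketch; the chunk size of 8 is an arbitrary choice and gdemo refers to the model defined earlier in this guide):

# Set the ForwardDiff chunk size by hand.
setchunksize(8)

# Or profile several chunk sizes for a given model function.
auto_tune_chunk_size!(gdemo(1.5, 2))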
AD Backend
Since #428, Turing.jl supports Flux.Tracker as a backend for reverse-mode autodiff. To switch between ForwardDiff.jl and Flux.Tracker, one can call the function setadbackend(backend_sym), where backend_sym can be :forward_diff or :reverse_diff.
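For example (a minimal sketch reusing the gdemo model from earlier; the HMC settings are arbitrary):

setadbackend(:reverse_diff)   # use Flux.Tracker for reverse-mode AD
chn = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))
setadbackend(:forward_diff)   # switch back to the ForwardDiff default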
For more information on Turing’s automatic differentiation backend, please see the Automatic Differentiation article.
Progress Meter
Turing.jl uses ProgressMeter.jl to show the progress of sampling, which may lead to slow down of inference or even cause bugs in some IDEs due to I/O. This can be turned on or off by turnprogress(true)
and turnprogress(false)
, of which the former is set as default.