A probabilistic program is Julia code wrapped in a `@model` macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. To help avoid those, we provide a Turing-safe datatype `TArray` that can be used to create mutable arrays in Turing programs.
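As a sketch of the idea (assuming the `tzeros` convenience constructor for zero-initialised `TArray`s, and a hypothetical model name):

```julia
using Turing

# A model that fills a mutable array with draws from a random walk.
# `tzeros` allocates a zero-initialised TArray, which is safe to mutate
# under the task copying used by particle-based samplers such as SMC/PG.
@model random_walk_demo(n) = begin
  x = tzeros(Float64, n)
  x[1] ~ Normal(0, 1)
  for i in 2:n
    x[i] ~ Normal(x[i - 1], 1)
  end
  return x
end
```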
To specify distributions of random variables, Turing programs should use the `~` notation: `x ~ distr`, where `x` is a symbol and `distr` is a distribution. If `x` is undefined in the model function, inside the probabilistic program this puts a random variable named `x`, distributed according to `distr`, in the current scope. `distr` can be a value of any type that implements `rand(distr)`, which samples a value from the distribution `distr`. If `x` is defined, this is used for conditioning in a style similar to Anglican (another PPL). In this case, `x` is an observed value, assumed to have been drawn from the distribution `distr`, and the likelihood is computed using `logpdf(distr, x)`. Observe statements should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow.
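To make the two cases concrete, here is a small illustrative model (`coinflip` is a hypothetical name, not part of the library):

```julia
@model coinflip(y) = begin
  # `p` is not among the model's arguments, so this line introduces a
  # random variable (an "assume" statement) drawn from Beta(1, 1).
  p ~ Beta(1, 1)
  # `y` is passed in as data, so each iteration conditions the model on
  # an observed value (an "observe" statement). Every run traverses the
  # same observe statements in the same order, as required.
  for i in 1:length(y)
    y[i] ~ Bernoulli(p)
  end
  return p
end

coinflip_f = coinflip([1, 1, 0, 1])
chn = sample(coinflip_f, SMC(1000))
```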
Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA), and the No-U-Turn Sampler (NUTS).
Simple Gaussian Demo
Below is a simple Gaussian demo illustrating the basic usage of Turing.jl.
```julia
# Import packages.
using Turing
using StatPlots

# Define a simple Normal model with unknown mean and variance.
@model gdemo(x, y) = begin
  s ~ InverseGamma(2, 3)
  m ~ Normal(0, sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return s, m
end
```
Note: As a sanity check, the expectation of `s` is 49/24 (2.04166666…) and the expectation of `m` is 7/6 (1.16666666…).
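These values follow from the standard Normal-inverse-gamma conjugate update applied to the two observations `x = 1.5` and `y = 2.0`; the plain-Julia sketch below (not part of Turing's API) reproduces them:

```julia
# Prior: s ~ InverseGamma(α0, β0), m | s ~ Normal(μ0, sqrt(s / κ0)).
κ0, μ0, α0, β0 = 1.0, 0.0, 2.0, 3.0
data = [1.5, 2.0]
n = length(data)
x̄ = sum(data) / n

# Standard conjugate posterior update.
κn = κ0 + n
μn = (κ0 * μ0 + n * x̄) / κn                                      # E[m] = 7/6
αn = α0 + n / 2
βn = β0 + sum((data .- x̄) .^ 2) / 2 + κ0 * n * (x̄ - μ0)^2 / (2κn)
E_s = βn / (αn - 1)                                              # E[s] = 49/24
```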
We can perform inference by using the `sample` function, the first argument of which is our probabilistic program and the second of which is a sampler. More information on each sampler is located in the API.
```julia
# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(1000))
c2 = sample(gdemo(1.5, 2), PG(10, 1000))
c3 = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))
c4 = sample(gdemo(1.5, 2), Gibbs(1000, PG(10, 2, :m), HMC(2, 0.1, 5, :s)))
c5 = sample(gdemo(1.5, 2), HMCDA(1000, 0.15, 0.65))
c6 = sample(gdemo(1.5, 2), NUTS(1000, 0.65))
```
The `MCMCChain` module (which is re-exported by Turing) provides plotting tools for the `Chain` objects returned by the `sample` function. See the MCMCChain repository for more information on the suite of tools available for diagnosing MCMC chains.
```julia
# Summarise results.
describe(c3)

# Plot results.
plot(c3)
savefig("gdemo-plot.png")
```
The arguments for each sampler are:
- SMC: number of particles.
- PG: number of particles, number of iterations.
- HMC: number of samples, leapfrog step size, number of leapfrog steps.
- Gibbs: number of samples, component sampler 1, component sampler 2, …
- HMCDA: number of samples, total leapfrog length, target accept ratio.
- NUTS: number of samples, target accept ratio.
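As an illustrative mapping of these argument lists onto constructor calls (using the same values as the demo above):

```julia
SMC(1000)                                       # 1000 particles.
PG(10, 1000)                                    # 10 particles, 1000 iterations.
HMC(1000, 0.1, 5)                               # 1000 samples, step size 0.1, 5 leapfrog steps.
Gibbs(1000, PG(10, 2, :m), HMC(2, 0.1, 5, :s))  # 1000 samples, two component samplers.
HMCDA(1000, 0.15, 0.65)                         # 1000 samples, leapfrog length 0.15, target accept ratio 0.65.
NUTS(1000, 0.65)                                # 1000 samples, target accept ratio 0.65.
```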
For detailed information on the samplers, please review Turing.jl’s API documentation.
Modelling Syntax Explained
Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model on data. Subsequently, the `sample` function can be used to generate samples from the posterior distribution.
In the following example, the defined model is conditioned on the data (`arg_1 = 1`, `arg_2 = 2`) by passing `(1, 2)` to the model function.
```julia
@model model_name(arg_1, arg_2) = begin
  ...
end
```
The conditioned model can then be passed to the `sample` function to run posterior inference.
```julia
model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.
```
The returned chain contains samples of the variables in the model.
```julia
var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.
```
Note that the key (`:var_1`) should be a symbol. For example, to fetch `x`, one would need to do `chn[:x]`.
Sampling from the Prior
Turing allows you to sample from a declared model's prior by calling the model without specifying inputs or a sampler. In the below example, we specify a `gdemo` model which accepts two inputs, `x` and `y`.
```julia
@model gdemo(x, y) = begin
  s ~ InverseGamma(2, 3)
  m ~ Normal(0, sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return x, y
end
```
Assign the function without inputs to a variable, and Turing will produce a sample from the prior distribution.
```julia
g = gdemo()
g # Output: (0.685690547873451, -1.1972706455914328)
```
Beyond the Basics
Compositional Sampling Using Gibbs
Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.
```julia
@model simple_choice(xs) = begin
  p ~ Beta(2, 2)
  z ~ Categorical([p, 1 - p]) # Categorical expects a probability vector.
  for x = xs
    if z == 1
      x ~ Normal(0, 1)
    else
      x ~ Normal(2, 1)
    end
  end
end

simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(1000, HMC(1, 0.2, 3, :p), PG(20, 1, :z)))
```
For details of compositional sampling in Turing.jl, please check the corresponding paper.
Working with MCMCChain.jl
Turing.jl wraps its samples using `MCMCChain.Chain` so that all the functions that work with `MCMCChain.Chain` can be re-used in Turing.jl. Two typical functions are `MCMCChain.describe` and `MCMCChain.plot`, which can be used as follows for an obtained chain `chn`.
```julia
using MCMCChain: describe, plot

describe(chn) # Lists statistics of the samples.
plot(chn)     # Plots statistics of the samples.
```
There are numerous functions in addition to `describe` and `plot` in the `MCMCChain` package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.
Changing Default Settings
Some of Turing.jl's default settings can be changed to better suit your use case.
AD Chunk Size
Turing.jl uses ForwardDiff.jl for automatic differentiation, which uses forward-mode chunk-wise AD. The chunk size can be set manually by `setchunksize(new_chunk_size)`; alternatively, use the auto-tuning helper function `auto_tune_chunk_size!(mf::Function, rep_num=10)`, which will profile various chunk sizes. Here `mf` is the model function, e.g. `gdemo(1.5, 2)`, and `rep_num` is the number of repetitions during profiling.
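For example, to set the chunk size by hand:

```julia
using Turing

setchunksize(32) # Use ForwardDiff chunks of size 32 from now on.
```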
AD Backend
Since #428, Turing.jl supports `Flux.Tracker` as a backend for reverse-mode autodiff. To switch between `ForwardDiff.jl` and `Flux.Tracker`, one can call the function `setadbackend(backend_sym)`, where `backend_sym` can be `:forward_diff` or `:reverse_diff`.
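For example (a short sketch; `gdemo` is the model defined earlier):

```julia
using Turing

setadbackend(:reverse_diff) # Use the Flux.Tracker backend.
chn_rev = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))

setadbackend(:forward_diff) # Switch back to ForwardDiff, the default.
```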
Progress Meter
Turing.jl uses ProgressMeter.jl to show the progress of sampling, which may slow down inference or even cause bugs in some IDEs due to I/O. Progress logging can be turned on or off by `turnprogress(true)` and `turnprogress(false)`, of which the former is the default.
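For example, to disable the progress meter for a session:

```julia
using Turing

turnprogress(false) # Turn the progress meter off during sampling.
```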