Guide

Basics

Introduction

A probabilistic program is Julia code wrapped in a @model macro. It can use arbitrary Julia code, but to ensure correctness of inference, it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. To help avoid these bugs, we provide a Turing-safe datatype, TArray, that can be used to create mutable arrays in Turing programs.

To specify distributions of random variables, Turing programs should use the ~ notation:

x ~ distr, where x is a symbol and distr is a distribution. If x is undefined in the model function, then inside the probabilistic program this introduces a random variable named x, distributed according to distr, into the current scope. distr can be a value of any type that implements rand(distr), which samples a value from the distribution distr. If x is defined, it is used for conditioning, in a style similar to Anglican (another PPL). In this case, x is an observed value, assumed to have been drawn from the distribution distr, and the likelihood is computed using logpdf(distr, x). Observe statements should be arranged so that every possible run traverses all of them in exactly the same order; this is equivalent to demanding that they are not placed inside stochastic control flow.
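For instance, consider the minimal sketch below (the model name coin_sketch and its distributions are illustrative and not part of the demo that follows): p is undefined inside the model, so it becomes a random variable to infer, while the argument x is bound to data when the model is conditioned and is therefore observed.

using Turing

@model coin_sketch(x) = begin
  p ~ Beta(1, 1)    # p is undefined here: a random variable is introduced (assume).
  x ~ Bernoulli(p)  # x is a model argument bound to data: an observation (observe).
end

chn = sample(coin_sketch(1), SMC(100))  # condition on the observation x = 1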

Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA), and the No-U-Turn Sampler (NUTS).

Simple Gaussian Demo

Below is a simple Gaussian demo illustrating the basic usage of Turing.jl.

# Import packages.
using Turing
using StatPlots

# Define a simple Normal model with unknown mean and variance.
@model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return s, m
end

Note: As a sanity check, the posterior expectation of s given the observations x = 1.5 and y = 2 is 49/24 (2.04166666…) and the posterior expectation of m is 7/6 (1.16666666…).
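These values follow from standard Normal-InverseGamma conjugacy; the brief check below is added here for clarity (it is not part of the original demo), using μ₀ = 0 and λ₀ = 1 from the prior on m, α₀ = 2 and β₀ = 3 from the prior on s, and n = 2, ȳ = 1.75 from the data.

$$
\mathbb{E}[m \mid x, y] = \frac{\lambda_0 \mu_0 + n\bar{y}}{\lambda_0 + n} = \frac{2 \cdot 1.75}{3} = \frac{7}{6}
$$
$$
\alpha_n = \alpha_0 + \frac{n}{2} = 3, \qquad
\beta_n = \beta_0 + \frac{1}{2}\sum_i (y_i - \bar{y})^2 + \frac{n\lambda_0 (\bar{y} - \mu_0)^2}{2(\lambda_0 + n)} = 3 + \frac{1}{16} + \frac{49}{48} = \frac{49}{12}
$$
$$
\mathbb{E}[s \mid x, y] = \frac{\beta_n}{\alpha_n - 1} = \frac{49}{24}
$$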

We can perform inference by using the sample function, the first argument of which is our probabilistic program and the second of which is a sampler. More information on each sampler is located in the API documentation.

#  Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(1000))
c2 = sample(gdemo(1.5, 2), PG(10,1000))
c3 = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))
c4 = sample(gdemo(1.5, 2), Gibbs(1000, PG(10, 2, :m), HMC(2, 0.1, 5, :s)))
c5 = sample(gdemo(1.5, 2), HMCDA(1000, 0.15, 0.65))
c6 = sample(gdemo(1.5, 2), NUTS(1000,  0.65))

The MCMCChain module (which is re-exported by Turing) provides plotting tools for the Chain objects returned by the sample function. See the MCMCChain repository for more information on the suite of tools available for diagnosing MCMC chains.

# Summarise results
describe(c3)

# Plot results
plot(c3)
savefig("gdemo-plot.png")

The arguments for each sampler are:

  • SMC: number of particles.
  • PG: number of particles, number of iterations.
  • HMC: number of samples, leapfrog step size, leapfrog step numbers.
  • Gibbs: number of samples, component sampler 1, component sampler 2, …
  • HMCDA: number of samples, total leapfrog length, target accept ratio.
  • NUTS: number of samples, target accept ratio.

For detailed information on the samplers, please review Turing.jl’s API documentation.

Modelling Syntax Explained

Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model on data. Subsequently, the sample function can be used to generate samples from the posterior distribution.

In the following example, the defined model is conditioned on the data (arg_1 = 1, arg_2 = 2) by passing (1, 2) to the model function.

@model model_name(arg_1, arg_2) = begin
  ...
end

The conditioned model can then be passed to the sample function to run posterior inference.

model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.

The returned chain contains samples of the variables in the model.

var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.

Note that the key (:var_1) should be a symbol. For example, to fetch x[1], one would need to do chn[Symbol("x[1]")].
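For example, the sketch below assumes chn was produced by sampling a model with a scalar parameter m and an array-valued parameter x (these names are illustrative):

m_mean  = mean(chn[:m])         # samples of the scalar parameter m
x1_vals = chn[Symbol("x[1]")]   # samples of the array element x[1]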

Turing does not have a declarative form; more generally, the order in which you place the lines of a @model macro matters. For example, the following program works:

# Define a model in which the mean s of a Normal observation has a Poisson prior.
@model model_function(y) = begin
  s ~ Poisson(1)
  y ~ Normal(s, 1)
  return y
end

sample(model_function(10), SMC(100))

But if we switch the s ~ Poisson(1) and y ~ Normal(s, 1) lines, the model will no longer sample correctly, because s is not yet defined when y ~ Normal(s, 1) is evaluated:

# The same model with the two ~ lines swapped: s is used before it is defined.
@model model_function(y) = begin
  y ~ Normal(s, 1)
  s ~ Poisson(1)
  return y
end

sample(model_function(10), SMC(100))

Sampling from the Prior

Turing allows you to sample from a declared model’s prior by calling the model without specifying inputs or a sampler. In the below example, we specify a gdemo model which accepts two inputs, x and y.

@model gdemo(x, y) = begin
  s ~ InverseGamma(2,3)
  m ~ Normal(0,sqrt(s))
  x ~ Normal(m, sqrt(s))
  y ~ Normal(m, sqrt(s))
  return x, y
end

Assign the model called without inputs to a variable; calling that variable then produces a sample from the prior distribution.

g = gdemo()
g()
# Output: (0.685690547873451, -1.1972706455914328)

Beyond the Basics

Compositional Sampling Using Gibbs

Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.

@model simple_choice(xs) = begin
  p ~ Beta(2, 2)
  z ~ Bernoulli(p)
  for i in 1:length(xs)
    if z == 1
      xs[i] ~ Normal(0, 1)
    else
      xs[i] ~ Normal(2, 1)
    end
  end
end

simple_choice_f = simple_choice([1.5, 2.0, 0.3])

chn = sample(simple_choice_f, Gibbs(1000, HMC(1,0.2,3,:p), PG(20,1,:z)))

For details of compositional sampling in Turing.jl, please check the corresponding paper.

Working with MCMCChain.jl

Turing.jl wraps its samples using MCMCChain.Chain so that all the functions that work with MCMCChain.Chain can be re-used in Turing.jl. Two typical functions are MCMCChain.describe and MCMCChain.plot, which can be used as follows on an obtained chain chn.

using MCMCChain: describe, plot

describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.

There are numerous functions in addition to describe and plot in the MCMCChain package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.
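For instance, a hedged sketch of two such diagnostics is shown below; the exact set of exported diagnostic functions may vary between MCMCChain versions, so check the repository for the names available in your version.

using MCMCChain: gewekediag, heideldiag

gewekediag(chn)  # Geweke diagnostic: compares the means of early and late chain segments.
heideldiag(chn)  # Heidelberger-Welch diagnostic: stationarity and interval half-width tests.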

Working with Libtask.jl

The Libtask.jl library provides copy-on-write data structures that are safe for use in Turing’s particle-based samplers. One data structure in particular is often required – the TArray. The following sampler types require the use of a TArray to store distributions:

  • IPMCMC
  • IS
  • PG
  • PMMH
  • SMC

If you do not use a TArray to store arrays of distributions when using a particle-based sampler, you may experience errors.

Here is an example of how a TArray (created with the TArray constructor function tzeros) can be applied in this way:

# Turing model definition.
@model BayesHmm(y) = begin
    # Declare a TArray with a length of N.
    s = tzeros(Int, N)
    m = Vector{Real}(undef, K)
    T = Vector{Vector{Real}}(undef, K)
    for i = 1:K
        T[i] ~ Dirichlet(ones(K)/K)
        m[i] ~ Normal(i, 0.01)
    end

    # Draw from a distribution for each element in s.
    s[1] ~ Categorical(K)
    for i = 2:N
        s[i] ~ Categorical(vec(T[s[i-1]]))
        y[i] ~ Normal(m[s[i]], 0.1)
    end
    return(s, m)
end;
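The model above reads N (the number of observations) and K (the number of hidden states) from the enclosing scope, so both must be defined before sampling. A hypothetical usage sketch follows; the values of N and K, the synthetic data y, and the PG settings are illustrative only.

# Illustrative setup: N and K are globals read by the BayesHmm model above.
N = 10
K = 2
y = randn(N)

# Particle Gibbs (number of particles, number of iterations), as listed earlier.
chn = sample(BayesHmm(y), PG(20, 100))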

Changing Default Settings

Some of Turing.jl’s default settings can be changed to improve performance and usability.

AD Chunk Size

Turing.jl uses ForwardDiff.jl for automatic differentiation, which performs forward-mode, chunk-wise AD. The chunk size can be set manually with setchunksize(new_chunk_size); alternatively, the auto-tuning helper function auto_tune_chunk_size!(mf::Function, rep_num=10) will profile various chunk sizes. Here mf is the model function, e.g. gdemo(1.5, 2), and rep_num is the number of repetitions used during profiling.
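For example (a minimal sketch; the chunk size 32 is an arbitrary illustrative value):

using Turing

setchunksize(32)  # use ForwardDiff chunks of size 32

# Alternatively, let Turing profile several chunk sizes for a given model function.
auto_tune_chunk_size!(gdemo(1.5, 2), 10)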

AD Backend

Since #428, Turing.jl supports Flux.Tracker as a backend for reverse-mode automatic differentiation. To switch between ForwardDiff.jl and Flux.Tracker, one can call the function setadbackend(backend_sym), where backend_sym can be :forward_diff or :reverse_diff.
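For example:

using Turing

setadbackend(:reverse_diff)  # use Flux.Tracker for reverse-mode AD
# ... define the model and call sample(...) here ...
setadbackend(:forward_diff)  # switch back to ForwardDiff.jl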

Progress Meter

Turing.jl uses ProgressMeter.jl to show the progress of sampling, which may slow down inference or even cause bugs in some IDEs due to I/O. Progress reporting can be turned on or off with turnprogress(true) and turnprogress(false), respectively; the former is the default.