# Guide

## Basics

### Introduction

A probabilistic program is Julia code wrapped in a `@model`

macro. It can use arbitrary Julia code, but to ensure correctness of inference it should not have external effects or modify global state. Stack-allocated variables are safe, but mutable heap-allocated objects may lead to subtle bugs when using task copying. To help avoid those we provide a Turing-safe datatype `TArray`

that can be used to create mutable arrays in Turing programs.

To specify distributions of random variables, Turing programs should use the `~`

notation:

`x ~ distr`

where `x`

is a symbol and `distr`

is a distribution. If `x`

is undefined in the model function, inside the probabilistic program, this puts a random variable named `x`

, distributed according to `distr`

, in the current scope. `distr`

can be a value of any type that implements `rand(distr)`

, which samples a value from the distribution `distr`

. If `x`

is defined, this is used for conditioning in a style similar to Anglican (another PPL). In this case, `x`

is an observed value, assumed to have been drawn from the distribution `distr`

. The likelihood is computed using `logpdf(distr,y)`

. The observe statements should be arranged so that every possible run traverses all of them in exactly the same order. This is equivalent to demanding that they are not placed inside stochastic control flow.

Available inference methods include Importance Sampling (IS), Sequential Monte Carlo (SMC), Particle Gibbs (PG), Hamiltonian Monte Carlo (HMC), Hamiltonian Monte Carlo with Dual Averaging (HMCDA) and The No-U-Turn Sampler (NUTS).

### Simple Gaussian Demo

Below is a simple Gaussian demo illustrate the basic usage of Turing.jl.

```
# Import packages.
using Turing
using StatPlots
# Define a simple Normal model with unknown mean and variance.
@model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return s, m
end
```

Note: As a sanity check, the expectation of `s`

is 49/24 (2.04166666…) and the expectation of `m`

is 7/6 (1.16666666…).

We can perform inference by using the `sample`

function, the first argument of which is our probabalistic program and the second of which is a sampler. More information on each sampler is located in the API.

```
# Run sampler, collect results.
c1 = sample(gdemo(1.5, 2), SMC(1000))
c2 = sample(gdemo(1.5, 2), PG(10,1000))
c3 = sample(gdemo(1.5, 2), HMC(1000, 0.1, 5))
c4 = sample(gdemo(1.5, 2), Gibbs(1000, PG(10, 2, :m), HMC(2, 0.1, 5, :s)))
c5 = sample(gdemo(1.5, 2), HMCDA(1000, 0.15, 0.65))
c6 = sample(gdemo(1.5, 2), NUTS(1000, 0.65))
```

The `MCMCChain`

module (which is re-exported by Turing) provides plotting tools for the `Chain`

objects returned by a `sample`

function. See the MCMCChain repository for more information on the suite of tools available for diagnosing MCMC chains.

```
# Summarise results
describe(c3)
# Plot results
plot(c3)
savefig("gdemo-plot.png")
```

The arguments for each sampler are:

- SMC: number of particles.
- PG: number of particles, number of iterations.
- HMC: number of samples, leapfrog step size, leapfrog step numbers.
- Gibbs: number of samples, component sampler 1, component sampler 2, …
- HMCDA: number of samples, total leapfrog length, target accept ratio.
- NUTS: number of samples, target accept ratio.

For detailed information on the samplers, please review Turing.jl’s API documentation.

### Modelling Syntax Explained

Using this syntax, a probabilistic model is defined in Turing. The model function generated by Turing can then be used to condition the model onto data. Subsequently, the sample function can be used to generate samples from the posterior distribution.

In the following example, the defined model is conditioned to the date (arg*1 = 1, arg*2 = 2) by passing (1, 2) to the model function.

```
@model model_name(arg_1, arg_2) = begin
...
end
```

The conditioned model can then be passed onto the sample function to run posterior inference.

```
model_func = model_name(1, 2)
chn = sample(model_func, HMC(..)) # Perform inference by sampling using HMC.
```

The returned chain contains samples of the variables in the model.

```
var_1 = mean(chn[:var_1]) # Taking the mean of a variable named var_1.
```

Note that the key (`:var_1`

) should be a symbol. For example, to fetch `x[1]`

, one would need to do `chn[Symbol(:x[1])`

.

Turing does not have a declarative form. More generally, the order in which you place the lines of a `@model`

macro matters. For example, the following example works:

```
# Define a simple Normal model with unknown mean and variance.
@model model_function(y) = begin
s ~ Poisson(1)
y ~ Normal(s, 1)
return y
end
sample(model_function(10), SMC(100))
```

But if we switch the `s ~ Poisson(1)`

and `y ~ Normal(s, 1)`

lines, the model will no longer sample correctly:

```
# Define a simple Normal model with unknown mean and variance.
@model model_function(y) = begin
y ~ Normal(s, 1)
s ~ Poisson(1)
return y
end
sample(model_function(10), SMC(100))
```

### Sampling from the Prior

Turing allows you to sample from a declared model’s prior by calling the model without specifying inputs or a sampler. In the below example, we specify a `gdemo`

model which accepts two inputs, `x`

and `y`

.

```
@model gdemo(x, y) = begin
s ~ InverseGamma(2,3)
m ~ Normal(0,sqrt(s))
x ~ Normal(m, sqrt(s))
y ~ Normal(m, sqrt(s))
return x, y
end
```

Assign the function without inputs to a variable, and Turing will produce a sample from the prior distribution.

```
g = gdemo()
g
# Output: (0.685690547873451, -1.1972706455914328)
```

## Beyond the Basics

### Compositional Sampling Using Gibbs

Turing.jl provides a Gibbs interface to combine different samplers. For example, one can combine an HMC sampler with a PG sampler to run inference for different parameters in a single model as below.

```
@model simple_choice(xs) = begin
p ~ Beta(2, 2)
z ~ Categorical(p)
for x = xs
if z == 1
x ~ Normal(0, 1)
else
x ~ Normal(2, 1)
end
end
end
simple_choice_f = simple_choice([1.5, 2.0, 0.3])
chn = sample(simple_choice_f, Gibbs(1000, HMC(1,0.2,3,:p), PG(20,1,:z))
```

For details of compositional sampling in Turing.jl, please check the corresponding paper.

### Working with MCMCChain.jl

Turing.jl wraps its samples using `MCMCChain.Chain`

so that all the functions working for `MCMCChain.Chain`

can be re-used in Turing.jl. Two typical functions are `MCMCChain.describe`

and `MCMCChain.plot`

, which can be used as follows for an obtained chain `chn`

. For more information on `MCMCChain`

, please see the GitHub repository.

```
using MCMCChain: describe, plot
describe(chn) # Lists statistics of the samples.
plot(chn) # Plots statistics of the samples.
```

There are numerous functions in addition to `describe`

and `plot`

in the `MCMCChain`

package, such as those used in convergence diagnostics. For more information on the package, please see the GitHub repository.

### Working with Libtask.jl

The Libtask.jl library provides write-on-copy data structures that are safe for use in Turing’s particle-based samplers. One data structure in particular is often required for use – the `TArray`

. The following sampler types require the use of a `TArray`

to store distributions:

`IPMCMC`

`IS`

`PG`

`PMMH`

`SMC`

If you do not use a `TArray`

to store arrays of distributions when using a particle-based sampler, you may experience errors.

Here is an example of how the `TArray`

(using a `TArray`

constructor function called `tzeros`

) can be applied in this way:

```
# Turing model definition.
@model BayesHmm(y) = begin
# Declare a TArray with a length of N.
s = tzeros(Int, N)
m = Vector{Real}(undef, K)
T = Vector{Vector{Real}}(undef, K)
for i = 1:K
T[i] ~ Dirichlet(ones(K)/K)
m[i] ~ Normal(i, 0.01)
end
# Draw from a distribution for each element in s.
s[1] ~ Categorical(K)
for i = 2:N
s[i] ~ Categorical(vec(T[s[i-1]]))
y[i] ~ Normal(m[s[i]], 0.1)
end
return(s, m)
end;
```

### Changing Default Settings

Some of Turing.jl’s default settings can be changed for better usage.

#### AD Chunk Size

Turing.jl uses ForwardDiff.jl for automatic differentiation, which uses forward-mode chunk-wise AD. The chunk size can be manually set by `setchunksize(new_chunk_size)`

; alternatively, use an auto-tuning helper function `auto_tune_chunk_size!(mf::Function, rep_num=10)`

, which will profile various chunk sizes. Here `mf`

is the model function, e.g. `gdemo(1.5, 2)`

, and `rep_num`

is the number of repetitions during profiling.

#### AD Backend

Since #428, Turing.jl supports `Flux.Tracker`

as backend for reverse mode autodiff. To switch between `ForwardDiff.jl`

and `Flux.Tracker`

, one can call function `setadbackend(backend_sym)`

, where `backend_sym`

can be `:forward_diff`

or `:reverse_diff`

.

#### Progress Meter

Turing.jl uses ProgressMeter.jl to show the progress of sampling, which may lead to slow down of inference or even cause bugs in some IDEs due to I/O. This can be turned on or off by `turnprogress(true)`

and `turnprogress(false)`

, of which the former is set as default.