Hidden Markov Models (part II): forward-backward algorithm for marginal conditional probability of the states

(In the same series: HMM (part I): recurrence equations for filtering and prediction)

Consider a Hidden Markov Model (HMM) with hidden states x_t (for t \in \{1, 2, \cdots, T\}), initial probability p(x_1), observations y_t, transition probability p(x_t|x_{t-1}), and observation model p(y_t|x_t). The joint distribution of this model factorizes as
p(x_{1:T},y_{1:T}) = p(y_1|x_1)p(x_1)\prod_{t=2}^{T}p(y_t|x_t)p(x_t|x_{t-1}) . We will use the notation X=x_{1:T} to represent the set X=\{x_1,x_2,\cdots,x_T\}.
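To make the notation concrete, here is a minimal sketch of such a discrete HMM in Python/NumPy. The names pi0, fxx, fyx, and y match the annotations used in the derivations below; representing fxx and fyx as matrices (rather than functions) and the random parameter values are illustrative choices, not fixed by the model.

```python
import numpy as np

rng = np.random.default_rng(0)

K, M, T = 3, 4, 10   # number of hidden states, observation symbols, time steps

# initial probability p(x_1): vector of length K
pi0 = np.full(K, 1.0 / K)

# transition probabilities: fxx[k, ks] = p(x_t = k | x_{t-1} = ks)
fxx = rng.dirichlet(np.ones(K), size=K).T

# observation model: fyx[m, k] = p(y_t = m | x_t = k)
fyx = rng.dirichlet(np.ones(M), size=K).T

# an observation sequence y_{1:T} (random placeholder data)
y = rng.integers(0, M, size=T)
```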
In this post we present the details of the forward-backward method for computing the smoothing distribution p(x_t|y_{1:T}) of an HMM, given a set of observations y_{1:T}.
Our starting point is the marginal probability p(x_t|y_{1:T}) of x_t given all the observations y_{1:T}:

\begin{aligned} p(x_t|y_{1:T}) &= \frac{p(x_t,y_{1:T})}{p(y_{1:T})} \\ &= \frac{p(x_t,y_{1:t},y_{(t+1):T})}{p(y_{1:T})}\\ &= p(y_{(t+1):T}|x_t,y_{1:t})\,p(x_t,y_{1:t})\frac{1}{p(y_{1:T})} \\ &= \underbrace{p(y_{(t+1):T}|x_t)}_{\beta_t(x_t)}\underbrace{p(x_t,y_{1:t})}_{\alpha_t(x_t)}\frac{1}{p(y_{1:T})} \\ &= \frac{\alpha_t(x_t) \beta_t(x_t)}{p(y_{1:T})} \end{aligned}

where the fourth line uses the fact that, conditioned on x_t, the future observations y_{(t+1):T} are independent of the past observations y_{1:t}.

This marginal can be computed with dynamic programming by calculating the values of \alpha_t(x_t) and \beta_t(x_t) recursively. Starting with \alpha_t(x_t), initialized as \alpha_1(x_1) = p(y_1|x_1)p(x_1), we have
\begin{aligned} \alpha_t(x_t) &= p(x_t,y_{1:t}) \\ &= \sum_{x_{t-1}} p(x_t,x_{t-1},y_{1:t}) \\ &= \sum_{x_{t-1}} p(x_t,x_{t-1},y_{1:(t-1)},y_t) \\ &= \sum_{x_{t-1}} p(y_t|x_t,x_{t-1},y_{1:(t-1)})p(x_t|x_{t-1},y_{1:(t-1)})\underbrace{p(x_{t-1},y_{1:(t-1)})}_{\alpha_{t-1}(x_{t-1})} \\ \Rightarrow \underbrace{\alpha_t(x_t)}_{\text{\texttt{alpha[k,t]}, k=}x_t} & = \underbrace{p(y_t|x_t)}_{\text{\texttt{fyx}}} \sum_{x_{t-1}} \underbrace{p(x_t|x_{t-1})}_{\text{\texttt{fxx}}} \underbrace{\alpha_{t-1}(x_{t-1})}_{\text{\texttt{alpha[ks,t-1]}, ks=}x_{t-1}} \end{aligned}
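This recursion maps directly onto code. A minimal sketch of the forward pass, assuming the pi0, fxx, fyx, and y defined above (so fyx[y[t], :] plays the role of the lookup p(y_t|x_t), and the matrix-vector product implements the sum over x_{t-1}):

```python
def forward(pi0, fxx, fyx, y):
    """Forward pass: alpha[k, t] = p(x_t = k, y_{1:t})."""
    K, T = fxx.shape[0], len(y)
    alpha = np.zeros((K, T))
    # base case: alpha_1(x_1) = p(y_1 | x_1) p(x_1)
    alpha[:, 0] = fyx[y[0], :] * pi0
    for t in range(1, T):
        # alpha_t(x_t) = p(y_t | x_t) * sum_{x_{t-1}} p(x_t | x_{t-1}) alpha_{t-1}(x_{t-1})
        alpha[:, t] = fyx[y[t], :] * (fxx @ alpha[:, t - 1])
    return alpha
```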

For \beta_t(x_t), with the base case \beta_T(x_T) = 1 (the empty set of future observations has probability one), we have:
\begin{aligned} \beta_t(x_t) &= p(y_{(t+1):T}|x_t) \\ &= \sum_{x_{t+1}} \frac{p(y_{t+1},y_{(t+2):T},x_t,x_{t+1})} {p(x_t)} \\ &= \sum_{x_{t+1}} \frac{p(y_{(t+2):T}|y_{t+1},x_t,x_{t+1})p(y_{t+1}|x_{t+1},x_t)p(x_{t+1}|x_t)p(x_t)} {p(x_t)} \\ &= \sum_{x_{t+1}} \frac{p(y_{(t+2):T}|x_{t+1})p(y_{t+1}|x_{t+1})p(x_{t+1}|x_t){p(x_t)}} {p(x_t)} \\ &= \sum_{x_{t+1}} \underbrace{p(y_{(t+2):T}|x_{t+1})}_{\beta_{t+1}(x_{t+1})} p(y_{t+1}|x_{t+1})p(x_{t+1}|x_t) \\ \Rightarrow \underbrace{\beta_t(x_t)}_{\text{\texttt{beta[k,t]}, k=}x_t} &= \sum_{x_{t+1}} \underbrace{\beta_{t+1}(x_{t+1})}_{\text{\texttt{beta[ks,t+1]}, ks=}x_{t+1}} \underbrace{p(y_{t+1}|x_{t+1})}_{\text{\texttt{fyx}}} \underbrace{p(x_{t+1}|x_t)}_{\text{\texttt{fxx}}} \end{aligned}
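The backward pass is the mirror image, filled in from t = T down to t = 1; again a sketch under the same assumed matrix representation:

```python
def backward(fxx, fyx, y):
    """Backward pass: beta[k, t] = p(y_{(t+1):T} | x_t = k)."""
    K, T = fxx.shape[0], len(y)
    beta = np.zeros((K, T))
    # base case: beta_T(x_T) = 1
    beta[:, T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        # beta_t(x_t) = sum_{x_{t+1}} beta_{t+1}(x_{t+1}) p(y_{t+1} | x_{t+1}) p(x_{t+1} | x_t)
        beta[:, t] = fxx.T @ (fyx[y[t + 1], :] * beta[:, t + 1])
    return beta
```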

Considering a chain of length T and K possible states, both recursive calculations can be done in O(TK^2) time by storing the intermediate results in two K \times T arrays \text{\texttt{alpha}} and \text{\texttt{beta}}, using O(TK) space (where \text{\texttt{fxx}} and \text{\texttt{fyx}} compute p(x_{t+1}|x_t) and p(y_t|x_t) respectively). Given \text{\texttt{alpha}} and \text{\texttt{beta}}, the marginal probability of x_t given the observations y_{1:T} can then be calculated for all t in O(TK), since p(y_{1:T}) = \sum_{x_t} \alpha_t(x_t)\beta_t(x_t) for any t.
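Putting the two passes together gives the smoothing distribution; a sketch continuing with the functions above:

```python
alpha = forward(pi0, fxx, fyx, y)
beta = backward(fxx, fyx, y)

# p(y_{1:T}) equals sum_k alpha[k, t] * beta[k, t] for any t;
# the last column of alpha gives it directly since beta[:, T-1] = 1
evidence = alpha[:, -1].sum()

# smoothing distribution p(x_t | y_{1:T}), one column per time step
smooth = alpha * beta / evidence
assert np.allclose(smooth.sum(axis=0), 1.0)
```

Note that for long chains the unnormalized \alpha and \beta underflow in floating point; practical implementations rescale each column (or work in log space), which the sketch above omits for clarity.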
