Belief Propagation

一个图模型包含 $N$ 个随机变量 $\underline{x}=(x_1,\dots,x_N)$ ，从有限字母表 $\mathcal{X}$ 中取值. 需要解决一些问题，例如：

给定一些条件，求另一些变量的边缘分布；
一些后验估计相关

但是我们不会去直接计算联合分布 $p(\underline{x})$ ，因为它的状态空间是指数级别 $|\mathcal{X}|^N$ 个取值. BP 将这个问题转化为局部计算，利用图的稀疏性/特殊结构来避免指数级别的计算.

Example 1: Ising chain

Ferromagnetic Ising model has $\underline{\sigma}=(\sigma_1,\dots, \sigma_N), \sigma_i\in\{1,-1\}$ with joint distribution

\mu_\beta(\underline{\sigma}) = \frac{1}{Z} \exp\left(\beta \sum_{i=1}^{N-1} \sigma_i \sigma_{i+1}+\beta B\sum_{i=1}^N\sigma_i\right)

where $Z$ is the partition function.

我们计算 $\sigma_j$ 的 marginal distribution, 这里用 $\propto$ 来省略归一化因子:

\mu(\sigma_j)\propto\sum_{\sigma_1,\dots,\sigma_{j-1},\sigma_{j+1},\dots,\sigma_{N}}\exp\left(\beta \sum_{i=1}^{N-1} \sigma_i \sigma_{i+1}+\beta B\sum_{i=1}^N\sigma_i\right) \\=\sum_{\sigma_1,\dots,\sigma_{j-1},\sigma_{j+1},\dots,\sigma_{N}}\exp\left(\beta \sum_{i=1}^{j-1} \sigma_i \sigma_{i+1}+\beta B\sum_{i=1}^{j-1}\sigma_i\right)\exp\left(\beta\sum_{i=j}^{N-1} \sigma_i \sigma_{i+1}+\beta B\sum_{i=j+1}^{N}\sigma_i\right)\exp\left(\beta B\sigma_{j}\right)

分配拆开即可，是两个和的乘积. 定义 "message":

This decomposition is interesting because the various messages can be computed iteratively.

而这两个部分都是 local 信息，左边的求和和右边无关. 例如 $\hat{\nu}_{\to1}(\sigma_1)$ 和 $\hat{\nu}_{\leftarrow n}(\sigma_n)$ 就是均匀二项分布. 这样就可以递推求出各个分布了.

\hat{\nu}_{\to i}(\sigma_i) \propto \sum_{\sigma_{i-1}} \hat{\nu}_{\to(i-1)}(\sigma_{i-1}) \exp\!\left( \beta \, \sigma_{i-1}\sigma_i +\beta B \sigma_{i-1}\right)

\hat{\nu}_{\leftarrow i}(\sigma_i) \propto \sum_{\sigma_{i+1}} \hat{\nu}_{\leftarrow(i+1)}(\sigma_{i+1}) \exp\!\left( \beta \, \sigma_{i}\sigma_{i+1} +\beta B \sigma_{i+1}\right)

Example 2. a tree-parity-check code

We start from a concrete example.

左图是这个方程组的 factor graph.

已知纠错矩阵，将一个 $x$ 通过 BSC( $p$ ) 信道传输（每一 bit 以 $p$ 的概率翻转），接收方得到 $y$ . 考虑 distribution conditioned on $y$ ，令 $Q(i|i)=1-p,Q(i|j)=p,\forall\;i\neq j$ .

我们可以逐 bit 分析，需要在给定 $y$ 的情况下计算每个 $x_i$ 的 marginal distribution. 假设 $y=(1000010)$ .

\mu(x_1)\propto\sum_{x_0x_2,\dots,x_6} \mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)\, \mathbb{I}(x_{0} \oplus x_{3} \oplus x_{4} = 0)\, \mathbb{I}(x_{0} \oplus x_{5} \oplus x_{6} = 0)\, \prod_{i=0}^{6} Q(y_i | x_i) \,\\ =\sum_{x_0,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_0|x_0)Q(y_2|x_2)\\\times \sum_{x_3,x_4}\mathbb{I}(x_{0} \oplus x_{3} \oplus x_{4} = 0)Q(y_3|x_3)Q(y_4|x_4)\\\times \sum_{x_5,x_6}\mathbb{I}(x_{0} \oplus x_{5} \oplus x_{6} = 0)Q(y_5|x_5)Q(y_6|x_6)\\\times Q(y_1 | x_1) \, \\ = Q(y_1|x_1) \sum_{x_0,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_0|x_0)Q(y_2|x_2)\hat{\nu}_{b\to0}(x_0)\hat{\nu}_{c\to0}(x_0)

\mu(x_0)\propto\sum_{x_1,\dots,x_6} \mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)\, \mathbb{I}(x_{0} \oplus x_{3} \oplus x_{4} = 0)\, \mathbb{I}(x_{0} \oplus x_{5} \oplus x_{6} = 0)\, \prod_{i=0}^{6} Q(y_i | x_i) \, \\ =\sum_{x_1,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_1|x_1)Q(y_2|x_2)\\\times \sum_{x_3,x_4}\mathbb{I}(x_{0} \oplus x_{3} \oplus x_{4} = 0)Q(y_3|x_3)Q(y_4|x_4)\\\times \sum_{x_5,x_6}\mathbb{I}(x_{0} \oplus x_{5} \oplus x_{6} = 0)Q(y_5|x_5)Q(y_6|x_6)\\\times Q(y_0 | x_0) \,

乘积的第一项，就是指 $x_0$ 在 $\{0,a,1,2\}$ 子图中的 marginal distribution, 它和其他节点无关. 可以引入 $\hat{\nu}_{a\to0}(x_0)\propto\sum_{x_1,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_1|x_1)Q(y_2|x_2)$ 等等，得到

\mu(x_1)\propto Q(y_1|x_1)\hat{\nu}_{a\to1}(x_1)

\mu(x_0)\propto Q(y_0|x_0)\hat{\nu}_{a\to0}(x_0)\hat{\nu}_{b\to0}(x_0)\hat{\nu}_{c\to0}(x_0)

Belief Propagation on Trees

In this section,

\mu(\underline{x})=\frac{1}{Z}\psi_a(\underline{x}_{\partial a})

where

\psi_a

has input in

\{x_i:i\in\partial a\}

, i.e. the neighborhood of

a

. Similar sets are defined for notation

\partial i

We have BP rules:

\nu_{j \to a}(x_j) \;\cong\; \prod_{b \in \partial j \setminus a} \hat{\nu}_{b \to j}(x_j)\\ \hat{\nu}_{a \to j}(x_j) \;\cong\; \sum_{\underline{x}_{\partial a \setminus j}} \psi_a(\underline{x}_{\partial a}) \prod_{k \in \partial a \setminus j} \nu_{k \to a}(x_k).

Simple computation is able to show that the example of parity check code is consistent with the equations above.

If $j$ has only $1$ neighbor $a$ , then $\nu_{j \to a}(x_j)\propto1$ , thus uniform.

Let us verify $\hat{\nu}_{a\to0}(x_0)\propto\sum_{x_1,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_1|x_1)Q(y_2|x_2)$ :

$\text{LHS}=\sum_{x_1,x_2}\mathbb{I}(x_{0} \oplus x_{1} \oplus x_{2} = 0)Q(y_1|x_1)Q(y_2|x_2)\prod_{k=1,2}\nu_{k\to a}(x_k)$

A graph: and a clarifying lecnote:

实际上，以上的 BP 应该按照迭代的方式理解.

这个有点像 Bellmann-ford, 明明我们要对整体进行分析（比如计算边缘分布）但是我们可以对局部（也就是每一条边）进行迭代得到结果.

We have converging theorem:

Correlations and Energy

现在我们更进一步，想要计算某两个变量的联合边缘分布 $\mu_{ij}(x_i,x_j)$ . 因为我们可以计算 $\mu(x_i)$ , 只需计算 $\mathbb{P}_{\underline{x}\sim\mu}(\underline{x}_j=x_j|\underline{x}_i=x_i)$ 即可. (注意这里稍微有点符号混用，你可以将 $\underline{x}$ 看成一个随机变量). 可以定义

\mu(\underline{x}|x_i=b)\propto\prod_{a=1}^M\psi(\underline{x}_{\partial a})\mathbb{I}(x_i=b).

其实相当于加一条约束 $x_i=b$ , 在 factor graph 中的体现为 $i$ 号节点挂出去了一个方块节点.

记 $F_R$ 是某个由方块节点构成的集合（也就是由约束节点构成）， $V_R=\partial F_R$ 是和 $F_R$ 相连的变量节点，其中 $R$ 代表诱导子图. 令 $\underline{x}_R$ 代表 $R$ 中变量构成的联合向量. 不妨设 $R$ 是连通图.

根据 message 的物理意义，可以计算 $\underline{x}_R$ 的边缘联合分布，它由内部因子 $\phi_a(a\in F_R)$ 和外部 message (相当于对子树代表的变量积分) 构成:

\mu(\underline{x}_R)\propto\prod_{a\in F_R}\psi_a(\underline{x}_{\partial a})\prod_{a\in\partial R}\hat{\nu}_{a\to i(a)}(x_{i(a)})

其中 $i(a)$ 表示 $a$ 在 $R$ 中的唯一邻居（如果有两个邻居就总会成环，可以对连通图 $R$ 证明这一点）.

Internal Energy

In physics problems, the compatibility functions $\psi_a$ take the form $\psi_a(\underline{x}_{\partial a})=e^{-\beta E_a(\underline{x}_{\partial a})}$ .