Lecture notes by Elias Koutsoupias from Oxford

Markov Chains

Definitions

A discrete-time finite Markov chain is a sequence of random variables X=X_0,X_1,X_2,\cdots taking values in a finite set of states \{1,2,\cdots,n\} s.t. \forall t\in\mathbb N:

\Pr(X_{t+1}=j|X_t=i)=Q_{i,j},

where Q=(Q_{i,j}) is an n\times n stochastic transition matrix.

X_t represents the state of the Markov chain at time t and the matrix Q encodes the state-to-state transition probabilities. The characteristic property of a Markov chain is that, given X_{t-1}, the distribution of X_t does not depend on the earlier history. Moreover, the conditional probability above is independent of the time t (the chain is time-homogeneous).

The t-step transition probabilities are given by the power matrix Q^t, i.e.,

\Pr(X_{s+t}=j|X_s=i)=Q_{i,j}^t,

\forall s,t\in\mathbb N. Thus,

\Pr(X_t=j)=\sum_{i=1}^nQ_{i,j}^t\cdot\Pr(X_0=i).

A Markov chain is completely determined by the distribution of X_0 and the transition matrix Q.
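To make this concrete, here is a minimal NumPy sketch (the two-state matrix and initial distribution are arbitrary examples, not taken from the notes) of computing the distribution of X_t as the initial distribution multiplied by the t-th power of Q:

```python
import numpy as np

# Arbitrary 2-state stochastic matrix: each row sums to 1.
Q = np.array([[0.9, 0.1],
              [0.4, 0.6]])

pi0 = np.array([1.0, 0.0])  # start deterministically in the first state

def distribution_after(t, pi0, Q):
    """Row vector of probabilities Pr(X_t = j), computed as pi0 Q^t."""
    return pi0 @ np.linalg.matrix_power(Q, t)

print(distribution_after(1, pi0, Q))    # one step: [0.9, 0.1]
print(distribution_after(100, pi0, Q))  # long run: close to [0.8, 0.2]
```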

A Markov chain can be represented as a weighted directed graph whose vertices are the states of the chain and in which there is an edge (i,j) of weight Q_{i,j} whenever Q_{i,j}\gt0. We define the weight of a path to be the product of the weights of the edges along the path, i.e., the probability of taking that path. Then the sum of the weights of all length-k paths from i to j is Q_{i,j}^k.

The term random walk usually refers to the random process in which a particle moves on a graph, selecting the next node uniformly at random from the adjacent nodes. It is the special case of a Markov chain in which, for every state i, Q_{i,i}=0 and Q_{i,j} is the same for every neighbour j of i (and zero for non-neighbours).

A lazy random walk is similar, except that Q_{i,i}=1/2 and the remaining probability 1/2 is split uniformly among the neighbours.
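As a small illustration of these two definitions, the following sketch builds the transition matrices of the simple and the lazy random walk on an arbitrary example graph given as adjacency lists (the graph itself is an assumption for illustration):

```python
import numpy as np

# Arbitrary undirected example graph as adjacency lists.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n = len(adj)

# Simple random walk: from i, move to a uniformly random neighbour.
Q = np.zeros((n, n))
for u, neighbours in adj.items():
    for v in neighbours:
        Q[u, v] = 1.0 / len(neighbours)

# Lazy random walk: stay put with probability 1/2, otherwise move as above.
Q_lazy = 0.5 * np.eye(n) + 0.5 * Q

# Both matrices are stochastic: every row sums to 1.
assert np.allclose(Q.sum(axis=1), 1.0)
assert np.allclose(Q_lazy.sum(axis=1), 1.0)
```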

Randomized 2-SAT

Given a 2-CNF formula, either find a satisfying assignment or determine that no such assignment exists.

CNF stands for conjunctive normal form: a formula of the form ()\land()\land\cdots\land().

A 2-CNF formula has the form ()\land()\land\cdots\land(), where each clause (the part inside the parentheses) is a disjunction of two Boolean literals.

DNF stands for disjunctive normal form: a formula of the form ()\lor()\lor\cdots\lor().

2-SAT is solvable in polynomial time (in fact linear time).

There is a polynomial time randomized algorithm for 2-CNF formulas whose analysis is based on a random walk.

Pseudo-code

Randomized 2-SAT(\varphi)

  • Pick a random assignment
  • Repeat 2n^2 times or until a satisfying assignment is found
    • Pick an unsatisfied clause C
    • Pick a literal in C uniformly at random, and flip its value
  • Return current assignment if satisfying, else return unsatisfiable
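The pseudo-code above translates directly into the following Python sketch (the clause encoding, a list of literal pairs where a negative integer denotes a negated variable, is a common convention assumed here, not fixed by the notes):

```python
import random

def randomized_2sat(clauses, n, flips=None):
    """Sketch of the randomized 2-SAT algorithm above.
    clauses: list of pairs of literals, e.g. (1, -2) means (x1 OR NOT x2).
    n: number of variables (named x1..xn).
    Returns a satisfying assignment as a list of booleans (index 0 = x1),
    or None, meaning "probably unsatisfiable"."""
    if flips is None:
        flips = 2 * n * n                      # 2n^2 flips, as in the analysis below
    assign = [random.choice([False, True]) for _ in range(n + 1)]

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign[1:]
        lit = random.choice(random.choice(unsat))  # random literal of an unsatisfied clause
        assign[abs(lit)] = not assign[abs(lit)]    # flip its value

    return assign[1:] if all(satisfied(c) for c in clauses) else None
```

For example, randomized_2sat([(1, 2), (-1, 2), (1, -2)], 2) returns [True, True] with probability at least 1/2, per the analysis below.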

Analysis

If \varphi is not satisfiable then the algorithm returns unsatisfiable. However, if \varphi is satisfiable then the algorithm might not find a satisfying assignment. We analyze the probability that the algorithm gives the correct answer.

Suppose \varphi is satisfiable, and let S be a particular assignment that satisfies \varphi.

We bound the probability that the algorithm finds S. If \varphi is not satisfied by an assignment A, there must be an unsatisfied clause C. Observe that one or both literals of C must have different truth values under A and S. Thus if we flip a random literal in C, the probability that we increase the agreement between A and S is either 1/2 or 1.

Writing A_t for the assignment after t iterations of the main loop and X_t for the number of variables in which A_t agrees with S, we have

\Pr(X_{t+1}=1|X_t=0)=1\\
\Pr(X_{t+1}=i+1|X_t=i)\ge1/2,\quad i\in\{1,\cdots,n-1\}\\
\Pr(X_{t+1}=i-1|X_t=i)\le1/2,\quad i\in\{1,\cdots,n-1\}\\
\Pr(X_{t+1}=n|X_t=n)=1

If we replace the inequalities with equalities, then X becomes a Markov chain on the state space \{0,1,\cdots,n\}.

The chain describes the movement of a particle on a line: it is an example of a one-dimensional random walk with a reflecting barrier at 0 and an absorbing barrier at n.

The expected time until X_t=n is an upper bound on the expected number of steps for the 2-SAT algorithm to find S.

Let h_j be the expected time for X to first reach n starting at j. Then we have

h_n=0\\
h_0=1+h_1\\
h_j=1+\frac{h_{j-1}}{2}+\frac{h_{j+1}}{2},\quad j\in\{1,\cdots,n-1\}

This is a linear system of equations in n+1 unknowns.

A short induction shows that h_j=h_{j+1}+2j+1. Combining with h_n=0, we get h_j=n^2-j^2. Thus h_j\le n^2 for all j.
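For completeness, here is one way to carry out that deduction (an induction on j using only the system above). The base case is h_0=1+h_1=h_1+2\cdot0+1, and if h_{j-1}=h_j+2(j-1)+1 then

h_j=1+\frac{h_{j-1}}{2}+\frac{h_{j+1}}{2}=1+\frac{h_j+2j-1}{2}+\frac{h_{j+1}}{2}\implies h_j=h_{j+1}+2j+1.

Telescoping from j to n and using h_n=0 then gives

h_j=\sum_{k=j}^{n-1}(2k+1)=n^2-j^2.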

Let T denote the number of steps for the 2-SAT algorithm to find the assignment S starting from some arbitrary assignment. By Markov’s inequality, we have

\Pr(T\ge2n^2)\le\frac{\mathbb E[T]}{2n^2}\le \frac{n^2}{2n^2}=\frac{1}{2}

Theorem: If \varphi is satisfiable then the 2-SAT algorithm returns a satisfying assignment with probability at least 1/2. If \varphi is unsatisfiable then the 2-SAT algorithm is always correct.

The failure probability can be decreased to 2^{-N} by repeating the loop 2Nn^2 times instead of 2n^2 times.

Randomized 3-SAT

We now extend the randomized algorithm for 2-SAT to 3-SAT.

3-SAT is NP-complete. We can solve it by exhaustive search over all 2^n assignments, i.e., in time O(2^n) up to polynomial factors.

Suppose a given 3-CNF formula \varphi has a satisfying assignment S. Let A_t be the assignment after t steps and let X_t denote the number of propositional variables in which A_t agrees with S. Similarly to the 2-SAT case,

\Pr(X_{t+1}=j+1|X_t=j)\ge1/3\\
\Pr(X_{t+1}=j-1|X_t=j)\le2/3

By replacing the inequalities with equalities, we turn the process X into a Markov chain on \{0,1,\cdots,n\} that moves towards n with probability 1/3 and away from n with probability 2/3, with a reflecting barrier at 0 and an absorbing barrier at n.

The expected time for the resulting chain to reach state n is an upper bound for the expected time of the algorithm to find a satisfying valuation.

In this case the chain is twice as likely to move away from n as to move towards n.

Let h_j denote the expected number of steps to reach n when starting from state j. Then we have the following system of equations

h_n=0\\
h_j=\frac{2h_{j-1}}{3}+\frac{h_{j+1}}{3}+1,\quad j\in\{1,\cdots,n-1\}\\
h_0=h_1+1

By solving this linear system, we get h_j=h_{j+1}+2^{j+2}-3 for all j\lt n.

With h_n=0 this leads to h_j=2^{n+2}-2^{j+2}-3(n-j).
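To check these formulas (the same kind of induction as in the 2-SAT case, with base case h_0=h_1+1=h_1+2^{0+2}-3): assuming h_{j-1}=h_j+2^{j+1}-3, the middle equation gives

h_j=1+\frac{2(h_j+2^{j+1}-3)}{3}+\frac{h_{j+1}}{3}\implies\frac{h_j}{3}=\frac{h_{j+1}}{3}+\frac{2^{j+2}}{3}-1\implies h_j=h_{j+1}+2^{j+2}-3,

and telescoping with h_n=0 gives

h_j=\sum_{k=j}^{n-1}(2^{k+2}-3)=2^{n+2}-2^{j+2}-3(n-j).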

The above bound shows that the expected number of flips to find S is at most O(2^n), which is no better than exhaustive search. However, we can improve the performance of this procedure by incorporating random restarts into the search. A random truth assignment leads to a binomial distribution over the states of the Markov chain X, centered around the state n/2.

Since the Markov chain has a tendency to drift towards state 0, after running it for a little while we are better off restarting with a fresh random assignment than continuing to flip literals.

After a random restart, by Chernoff bounds the current assignment differs from S in approximately n/2 positions with high probability.

This leads to an algorithm that tries to find a satisfying assignment in N=O((4/3)^n\sqrt n) phases, where each phase starts with a random restart and consists of 3n literal flips.

Pseudo-code

Randomized 3-SAT(\varphi)

  • Repeat N times or until a satisfying assignment is found
    • Pick a random assignment
    • Repeat 3n times or until a satisfying assignment is found
      • Pick an unsatisfied clause C
      • Pick a literal in C uniformly at random, and flip its value
  • Return current assignment if satisfying, else return unsatisfiable
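A minimal Python sketch of this phase-based schema (same clause encoding as the 2-SAT sketch above; the number of phases N is left as a parameter, since the notes only specify its order of growth):

```python
import random

def randomized_3sat(clauses, n, phases):
    """Each phase: fresh random assignment, then up to 3n random literal flips.
    Returns a satisfying assignment (list of booleans, index 0 = x1)
    or None, meaning "probably unsatisfiable"."""
    def satisfied(assign, clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for _ in range(phases):
        assign = [random.choice([False, True]) for _ in range(n + 1)]
        for _ in range(3 * n):
            unsat = [c for c in clauses if not satisfied(assign, c)]
            if not unsat:
                return assign[1:]
            lit = random.choice(random.choice(unsat))  # random literal of an unsatisfied clause
            assign[abs(lit)] = not assign[abs(lit)]    # flip its value
        if all(satisfied(assign, c) for c in clauses):
            return assign[1:]
    return None
```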

Theorem: If \varphi is satisfiable then the 3-SAT algorithm returns a satisfying assignment with probability at least 1/2. If \varphi is unsatisfiable then the 3-SAT algorithm is always correct.

Let q_j be the probability to reach position n within 3n steps starting from position n-j. We have

q_j\ge C_{3j}^j(2/3)^j(1/3)^{2j},

since this is the probability to make exactly 2j moves towards n and j moves away from n in the first 3j\le3n steps.

By Stirling’s formula:

\sqrt{2\pi m}\left(\frac{m}{e}\right)^m\le m!\le2\sqrt{2\pi m}\left(\frac{m}{e}\right)^m,

we have

C_{3j}^j=\frac{(3j)!}{j!(2j)!}\ge\frac{c}{\sqrt j}\left(\frac{27}{4}\right)^j,\quad\text{where }c=\frac{\sqrt3}{8\sqrt\pi}.

Thus when j\gt0,

q_j\ge C_{3j}^j(2/3)^j(1/3)^{2j}\ge\frac{c}{\sqrt j}(27/4)^j(2/3)^j(1/3)^{2j}=\frac{c}{\sqrt j}\cdot\frac{1}{2^j}.

The probability of success drops exponentially with the distance j between the initial random assignment of the phase and the satisfying truth assignment. But by Chernoff bounds, the probability that this distance is much larger than n/2 also drops exponentially.

Having established a lower bound for q_j, we now obtain a lower bound for q, the probability to reach state n within 3n steps when the initial state of the random walk is determined by a random assignment. We have

q\ge\sum_{j=0}^n\Pr(\text{walk starts at }n-j)\cdot q_j\\
\ge\frac{1}{2^n}+\sum_{j=1}^nC_n^j(1/2)^n\frac{c}{\sqrt j}\frac{1}{2^j}\\
\ge\frac{c}{\sqrt n}(1/2)^n\sum_{j=0}^nC_n^j(1/2)^j\\
=\frac{c}{\sqrt n}(3/4)^n,

where the last step uses the binomial theorem: \sum_{j=0}^nC_n^j(1/2)^j=(3/2)^n.

The number of phases to find a satisfying assignment is a geometric random variable with parameter q, so the expected number of phases to find a satisfying assignment is 1/q. Thus, by Markov’s inequality, the algorithm finds a satisfying assignment with probability at least 1/2 after N=2/q phases.

Stationary Distributions

Definition

Let X be a Markov chain with transition matrix Q. A probability distribution \pi on the set of states is called a stationary distribution of X if

\pi=\pi Q.
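Numerically, a stationary distribution can be found by solving the linear system \pi=\pi Q together with the normalisation \sum_j\pi_j=1. A small sketch (the example matrix is arbitrary):

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi Q = pi with sum(pi) = 1 via least squares."""
    n = Q.shape[0]
    # Stack the equations (Q^T - I) pi^T = 0 with the normalisation row.
    A = np.vstack([Q.T - np.eye(n), np.ones(n)])
    b = np.concatenate([np.zeros(n), [1.0]])
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

Q = np.array([[0.9, 0.1],
              [0.4, 0.6]])         # arbitrary irreducible example
print(stationary_distribution(Q))  # approximately [0.8, 0.2]
```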

Given states i,j of a Markov chain, the hitting time H_{i,j} is a random variable giving the number of steps to first visit state j starting from state i. Formally we have

H_{i,j}=\min\{t\gt0:X_t=j|X_0=i\}.

We define h_{i,j}=\mathbb E[H_{i,j}] to be the mean hitting time.

In general h_{i,j} may be infinite. Certainly h_{i,j}=\infty if j is not reachable from i in the underlying graph, or if i can reach a state k from which j is no longer reachable. We therefore restrict attention to Markov chains whose underlying graph consists of a single strongly connected component. We call such chains irreducible.

We can still have h_{i,j}=\infty in an irreducible chain with infinitely many states, as the following example shows.

Consider the Markov chain X with set of states \{1,2,3,\cdots\} and transition matrix Q, where Q_{i,i+1}=i/(i+1) and Q_{i,1}=1/(i+1). Starting at state 1, the probability not to have returned to state 1 within t steps is

\prod_{j=1}^t\frac{j}{j+1}=\frac{1}{t+1}.

Thus we return to state 1 almost surely, but the expected return time is

h_{1,1}=\mathbb E[H_{1,1}]=\sum_{j=1}^\infty\Pr(H_{1,1}\ge j)=\sum_{j=1}^\infty\frac{1}{j}=\infty

In a finite irreducible Markov chain, the mean hitting time between any pair of states is always finite.

Theorem: For any pair of states i,j in a finite irreducible Markov chain, h_{i,j}\lt\infty.

Let d be the diameter of the underlying graph and let \epsilon\gt0 be the smallest positive entry of the transition matrix Q. Then for any pair of states i,j, if the chain is started in i then it visits j within d steps with probability at least \epsilon^d.

h_{i,j}=\mathbb E[H_{i,j}]=\sum_{t=1}^\infty\Pr(H_{i,j}\ge t)\le\sum_{t=1}^\infty(1-\epsilon^d)^{\lfloor t/d\rfloor}\lt\infty.

A finite irreducible Markov chain has a unique stationary distribution.

Theorem: A finite irreducible Markov chain has a unique stationary distribution \pi, where \pi_j=1/h_{j,j} for each state j.

The stationary distribution represents a time average: on average the chain visits state j every h_{j,j} steps, and thus spends a proportion \pi_j=1/h_{j,j} of its time in state j. We might expect that the stationary distribution is also a space average, i.e., that the distribution of X_t converges to \pi as t tends to infinity. However, in general this need not be true.

Definition of period: The period of a state i in a Markov chain is defined to be \gcd\{m\gt0:Q_{i,i}^m\gt0\}. A state is aperiodic if it has period 1. A Markov chain is aperiodic if all states are aperiodic.

A finite Markov chain is irreducible and aperiodic iff there exists some power of the transition matrix Q with all entries strictly positive.

Theorem: Consider a finite, irreducible, aperiodic Markov chain with stationary distribution \pi. Then \lim_{n\to\infty}Q^n is the matrix

Q^\infty=\left[\begin{array}{cccc} \pi_1&\pi_2&\cdots&\pi_n\\ \pi_1&\pi_2&\cdots&\pi_n\\ \vdots&\vdots&&\vdots\\ \pi_1&\pi_2&\cdots&\pi_n \end{array}\right]

It follows that for any initial distribution x we have xQ^n\to xQ^\infty=\pi as n\to\infty, i.e., the chain converges to the stationary distribution from any starting point.
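This convergence is easy to observe numerically; a sketch, reusing the same arbitrary two-state example as above:

```python
import numpy as np

Q = np.array([[0.9, 0.1],
              [0.4, 0.6]])               # arbitrary irreducible, aperiodic example

Q_high = np.linalg.matrix_power(Q, 200)  # a high power of Q
print(Q_high)                            # every row is approximately (0.8, 0.2)

x = np.array([0.3, 0.7])                 # an arbitrary initial distribution
print(x @ Q_high)                        # approximately the stationary distribution
```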

Random walks on undirected graphs

Let G=(V,E) be a finite, undirected and connected graph. The degree of a vertex v\in V is the number d(v) of edges incident to v. A random walk on G is a Markov chain modeling a particle that moves along the edges of G and which is equally likely to take any outgoing edge at a given vertex.

Formally, a random walk on G has state set V, and for each edge \{u,v\}\in E the probability of a transition from u to v is 1/d(u).

Claim: A random walk on G is aperiodic iff G is not bipartite.

Theorem: A random walk on a connected graph G has a unique stationary distribution \pi, where

\pi_v=\frac{d(v)}{2|E|}.

(The sum of the degrees of all vertices equals 2|E|.)

(\pi Q)_u=\sum_{v\in N(u)}\frac{d(v)}{2|E|}\cdot\frac{1}{d(v)}=\frac{d(u)}{2|E|}=\pi_u\quad\forall u\in V,

where N(u) denotes the set of neighbours of u.

Corollary: Let h_{u,v} denote the expected number of steps to go from vertex u to vertex v. Then for any vertex u we have

h_{u,u}=\frac{2|E|}{d(u)}

Theorem: If u,v are adjacent then h_{v,u}\lt 2|E|.

Proof omitted.

The cover time of a graph is the maximum, over all starting vertices, of the expected number of steps for a random walk to visit every vertex.

Theorem: The cover time of a connected graph G with n nodes and m edges is at most 4nm.

Proof omitted.

s-t connectivity

We use the bound on the cover time of random walks to design an impressively simple, space-efficient algorithm for the fundamental problem of s-t connectivity. In this problem we are given an undirected graph with n nodes, not necessarily connected, together with two nodes s,t, and we want to decide whether s and t are connected.

Pseudo-code

s-t connectivity(G,s,t)

  • Start a random walk from s
  • If t is reached within 4n^3 steps then return reachable, else return unreachable
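A Python sketch of this random-walk test (the adjacency-list representation of G is an assumption for illustration):

```python
import random

def st_connected(adj, s, t):
    """Random-walk s-t connectivity test: walk 4*n^3 steps from s.
    adj: dict mapping each vertex to the list of its neighbours (undirected).
    Never answers True when s and t are disconnected; may answer False
    when they are connected, but only with probability at most 1/2."""
    n = len(adj)
    v = s
    for _ in range(4 * n ** 3):
        if v == t:
            return True          # reachable
        if not adj[v]:
            return False         # isolated vertex: the walk is stuck
        v = random.choice(adj[v])
    return v == t                # unreachable (probably)
```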

The algorithm has no false positives: it never answers reachable when s and t are not connected.

By Markov’s inequality, if s and t are connected then the algorithm answers reachable with probability at least 1/2: the expected time for the walk to reach t is at most the cover time of the component of s, which is at most 4nm\le2n^3, so the probability that it exceeds 4n^3 steps is at most 1/2.

The algorithm only needs to store a constant number of vertex-addresses at any one time.

It follows that we can decide reachability in undirected graphs in O(\log n) space using randomization.