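- When the objective function is separable,
$$\text{minimize } f(x) + g(z), \quad \text{subject to } Ax + Bz \le c,$$
ADMM translates into the updates:
$$\begin{aligned}
x^{k+1} &:= \arg\min_x L_\rho(x, z^k, y^k) \\
z^{k+1} &:= \arg\min_z L_\rho(x^{k+1}, z, y^k) \\
y^{k+1} &:= y^k + \rho\,(Ax^{k+1} + Bz^{k+1} - c)
\end{aligned}$$
$x^k$ is only an intermediate result: the algorithm computes $z^{k+1}, y^{k+1}$ from the previous $z^k, y^k$.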
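The three updates can be tried on a toy problem. The following is a minimal sketch, not a general solver: it assumes quadratic $f(x) = \tfrac12\|x-a\|^2$ and $g(z) = \tfrac12\|z-b\|^2$ with the equality constraint $x - z = 0$ (so $A = I$, $B = -I$, $c = 0$), for which both subproblems have closed forms. The function name and the data `a`, `b` are illustrative.

```python
import numpy as np

def admm_two_block(a, b, rho=1.0, iters=100):
    """ADMM for: minimize 0.5*||x-a||^2 + 0.5*||z-b||^2  subject to x - z = 0.

    Assumed setup: A = I, B = -I, c = 0, and
    L_rho(x, z, y) = f(x) + g(z) + y^T (x - z) + (rho/2) * ||x - z||^2.
    """
    z = np.zeros_like(a, dtype=float)
    y = np.zeros_like(a, dtype=float)
    for _ in range(iters):
        # x-update: stationarity (x - a) + y + rho*(x - z) = 0
        x = (a - y + rho * z) / (1.0 + rho)
        # z-update: stationarity (z - b) - y - rho*(x - z) = 0
        z = (b + y + rho * x) / (1.0 + rho)
        # dual update on the constraint residual x - z
        y = y + rho * (x - z)
    return x, z, y

a = np.array([0.0, 1.0, -2.0])
b = np.array([2.0, 1.0, 4.0])
x, z, y = admm_two_block(a, b)
print(x, z)  # both should approach (a + b) / 2
```

Note that only `z` and `y` are carried from one iteration to the next; `x` is recomputed from them each time.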
Extensions of ADMM
- Extension of ADMM: multiple blocks (note: convergence is not guaranteed)
$$\text{minimize } \sum_{i=1}^{N} f_i(x_i), \quad \text{subject to } \sum_{i=1}^{N} A_i x_i \le c$$
The Gauss-Seidel update:
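$$\begin{aligned}
x_1^{k+1} &:= \arg\min_{x_1} L_\rho(x_1, x_2^k, \dots, x_N^k, y^k) \\
&\;\;\vdots \\
x_i^{k+1} &:= \arg\min_{x_i} L_\rho(x_1^{k+1}, x_2^{k+1}, \dots, x_i, x_{i+1}^k, \dots, x_N^k, y^k) \\
y^{k+1} &:= y^k + \rho\Big(\sum_{i=1}^{N} A_i x_i^{k+1} - c\Big)
\end{aligned}$$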
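A small sketch of the Gauss-Seidel sweep, under strong simplifying assumptions: each $f_i(x_i) = \tfrac12\|x_i - a_i\|^2$, every $A_i = I$, and the constraint is taken as the equality $\sum_i x_i = c$. On this strongly convex toy instance the sweep converges; that should not be read as a general guarantee for multi-block ADMM. All names are illustrative.

```python
import numpy as np

def multiblock_admm_gs(a, c, rho=1.0, sweeps=200):
    """Gauss-Seidel ADMM sketch for
        minimize sum_i 0.5*||x_i - a_i||^2  subject to sum_i x_i = c,
    with every A_i = I (illustrative assumptions).
    `a` has shape (N, n): one row per block.
    """
    N, n = a.shape
    x = np.zeros((N, n))
    y = np.zeros(n)
    for _ in range(sweeps):
        for i in range(N):
            # blocks before i already hold x^{k+1}; blocks after i still hold x^k
            others = x.sum(axis=0) - x[i]
            # stationarity: (x_i - a_i) + y + rho*(x_i + others - c) = 0
            x[i] = (a[i] - y - rho * (others - c)) / (1.0 + rho)
        # multiplier update after the full sweep
        y = y + rho * (x.sum(axis=0) - c)
    return x, y

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = np.array([3.0, 3.0])
x, y = multiblock_admm_gs(a, c)
x_star = a + (c - a.sum(axis=0)) / a.shape[0]  # closed-form solution of this toy problem
print(np.abs(x - x_star).max())
```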
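- Extension of ADMM: parallel (distributed) updates
The Jacobi-type update is possible under extra conditions (the $A_i$ are mutually orthogonal):
$$\begin{aligned}
x_i^{k+1} &:= \arg\min_{x_i} L_\rho(x_1^k, x_2^k, \dots, x_i, x_{i+1}^k, \dots, x_N^k, y^k) \\
y^{k+1} &:= y^k + \rho\Big(\sum_{i=1}^{N} A_i x_i^{k+1} - c\Big)
\end{aligned}$$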
- Extension of ADMM: additional regularization
$$x_i^{k+1} := \arg\min_{x_i} L_\rho(x_1^{k+1}, x_2^{k+1}, \dots, x_i, x_{i+1}^k, \dots, x_N^k, y^k) + \alpha \|x_i - x_i^k\|_2^2$$
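To see what the extra term does, here is a hedged sketch that applies it to the parallel (Jacobi) ordering rather than the sequential ordering written above. That swap is a deliberate choice for illustration. The toy problem is an assumption: $f_i(x_i) = \tfrac12\|x_i - a_i\|^2$, $A_i = I$, equality constraint $\sum_i x_i = c$. The $A_i$ are not mutually orthogonal here, so the plain Jacobi update is not covered by the condition stated earlier.

```python
import numpy as np

def prox_jacobi_admm(a, c, rho=1.0, alpha=2.0, sweeps=500):
    """Jacobi-style (parallel) ADMM with the extra term alpha*||x_i - x_i^k||^2 for
        minimize sum_i 0.5*||x_i - a_i||^2  subject to sum_i x_i = c   (A_i = I).
    `a` has shape (N, n): one row per block.
    """
    N, n = a.shape
    x = np.zeros((N, n))
    y = np.zeros(n)
    for _ in range(sweeps):
        # row i holds the sum of all other blocks at the previous iterate
        others = x.sum(axis=0) - x
        # per-block stationarity:
        # (x_i - a_i) + y + rho*(x_i + others_i - c) + 2*alpha*(x_i - x_i^k) = 0
        x = (a - y - rho * (others - c) + 2.0 * alpha * x) / (1.0 + rho + 2.0 * alpha)
        y = y + rho * (x.sum(axis=0) - c)
    return x, y

a = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = np.array([3.0, 3.0])
x_star = a + (c - a.sum(axis=0)) / a.shape[0]  # closed-form solution of this toy problem

x_reg, _ = prox_jacobi_admm(a, c, alpha=2.0)
x_plain, _ = prox_jacobi_admm(a, c, alpha=0.0, sweeps=50)
print(np.abs(x_reg - x_star).max(), np.abs(x_plain - x_star).max())
```

On this instance the run with `alpha=2.0` approaches the solution, while `alpha=0.0` (plain Jacobi) drifts away from it, which is consistent with the remark that the parallel update needs extra conditions.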
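- Separable objective function (Type I)
$$\text{minimize } f(x) + g(z), \quad \text{subject to } x - z = 0$$
- Constrained convex optimization (Type II)
$$\text{minimize } f(x), \quad \text{subject to } x \in C$$
This is Type I with $g(z)$ the indicator function of $C$:
$$g(z) = 1_C(z) = \begin{cases} 0, & \text{if } z \in C \\ +\infty, & \text{otherwise} \end{cases}$$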
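With an indicator function for $g$, the $z$-update reduces to a projection onto $C$. A minimal sketch, assuming $f(x) = \tfrac12\|x - a\|^2$ and a box $C = [lo, hi]^n$ (both chosen only so that every step has a closed form), and using the scaled dual variable $u = y/\rho$:

```python
import numpy as np

def admm_box_constrained(a, lo=0.0, hi=1.0, rho=1.0, iters=200):
    """ADMM sketch for: minimize 0.5*||x - a||^2  subject to x in C = [lo, hi]^n,
    written as f(x) + g(z) with x - z = 0 and g the indicator of C.
    """
    z = np.zeros_like(a, dtype=float)
    u = np.zeros_like(a, dtype=float)
    for _ in range(iters):
        # x-update: minimize 0.5*||x - a||^2 + (rho/2)*||x - z + u||^2
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: indicator plus quadratic is a Euclidean projection onto C
        z = np.clip(x + u, lo, hi)
        # scaled dual update
        u = u + x - z
    return x, z

a = np.array([2.0, -1.0, 0.3])
x, z = admm_box_constrained(a)
print(z)  # should approach the projection of a onto the box
```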
- Loss function + regularization
$$\text{minimize } l(x) + \lambda\|x\|_1$$
which in ADMM form becomes
$$\text{minimize } l(x) + \lambda\|z\|_1, \quad \text{subject to } x - z = 0$$
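A hedged sketch of this splitting for one concrete loss. The squared loss $l(x) = \tfrac12\|Ax - b\|^2$ is an assumption (the text leaves $l$ generic), and `A`, `b`, `lam` are illustrative data. The $z$-update is elementwise soft-thresholding, and $u$ is the scaled dual variable.

```python
import numpy as np

def soft_threshold(v, kappa):
    """Elementwise proximal operator of kappa*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, iters=500):
    """ADMM sketch for: minimize 0.5*||A x - b||^2 + lam*||z||_1  subject to x - z = 0."""
    n = A.shape[1]
    z = np.zeros(n)
    u = np.zeros(n)
    # the x-update solves (A^T A + rho I) x = A^T b + rho (z - u)
    M = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return x, z

# With A = I the minimizer is known in closed form: soft_threshold(b, lam)
A = np.eye(3)
b = np.array([3.0, -0.5, 1.5])
x, z = admm_lasso(A, b, lam=1.0)
print(z)
```

The sparse iterate is `z`; `x` agrees with it only in the limit.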
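$$\text{minimize } \|x\|_1, \quad \text{subject to } Ax - b = 0$$
which becomes
$$\text{minimize } \|x\|_1 + 1_C(z), \quad \text{subject to } x - z = 0, \quad \text{where } C = \{x \in \mathbb{R}^n \mid Ax = b\}$$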
Another example: Extremely Low Bit Neural Network: Squeeze the Last Bit Out with ADMM, AAAI 2017
2-Block ADMM Convergence analysis with variational inequality
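What is a variational inequality?
Given a convex set $\Omega \subset \mathbb{R}^n$ and a mapping $F: \mathbb{R}^n \to \mathbb{R}^n$, the variational inequality $\mathrm{VI}(\Omega, F)$ is to find an $x^* \in \Omega$ such that
$$(x - x^*)^T F(x^*) \ge 0, \quad \forall x \in \Omega.$$
For a differentiable convex problem $\min\{\Theta(x) \mid x \in \Omega\}$, if $x^*$ is a solution,
then, seen from the point $x^*$,
the set of descent directions $S_d(x^*) = \{s \in \mathbb{R}^n \mid s^T \nabla\Theta(x^*) < 0\}$
and the set of feasible directions $S_f(x^*) = \{s \in \mathbb{R}^n \mid s = x - x^*,\ x \in \Omega\}$
have no overlap:
$$S_d(x^*) \cap S_f(x^*) = \emptyset,$$
that is, $(x - x^*)^T \nabla\Theta(x^*) \ge 0$ for all $x \in \Omega$, which is $\mathrm{VI}(\Omega, \nabla\Theta)$.
If $\Theta(x)$ is twice differentiable, the Hessian matrix $\nabla^2\Theta(x)$ is symmetric.
In $\mathrm{VI}(\Omega, F)$, if $F$ is differentiable, we do not require the Jacobian matrix $\nabla F(x)$ to be symmetric.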
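The optimality condition can be checked numerically on a small instance. The sketch below assumes $\Theta(x) = \tfrac12\|x - a\|^2$ and $\Omega = [0,1]^3$ (illustrative choices), where the minimizer is the projection of $a$ onto the box, and evaluates $(x - x^*)^T \nabla\Theta(x^*)$ at randomly sampled feasible points.

```python
import numpy as np

rng = np.random.default_rng(0)
a = np.array([2.0, -1.0, 0.3])

# Theta(x) = 0.5*||x - a||^2 over the box Omega = [0, 1]^3 (an illustrative problem)
x_star = np.clip(a, 0.0, 1.0)   # minimizer: projection of a onto Omega
grad = x_star - a               # gradient of Theta at x_star

# sample feasible points x in Omega and evaluate (x - x_star)^T grad
samples = rng.uniform(0.0, 1.0, size=(1000, 3))
vi_values = (samples - x_star) @ grad
print(vi_values.min() >= 0.0)
```

No sampled feasible direction is a descent direction at `x_star`, which is the "no overlap" statement in coordinates.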
Proof of convergence
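- 2-Block ADMM (equality constraint as an example)
$$\text{minimize } f_1(x) + f_2(z), \quad \text{subject to } Ax + Bz = c$$
where $f_1: \mathbb{R}^{n_1} \to \mathbb{R}$, $f_2: \mathbb{R}^{n_2} \to \mathbb{R}$, and $c \in \mathbb{R}^m$. If we solve it with ADMM, we have
$$\begin{aligned}
x^{k+1} &:= \arg\min_x \Big( f_1(x) + \tfrac{\rho}{2}\big\|Ax + Bz^k - c - u^k/\rho\big\|_2^2 \Big) \\
z^{k+1} &:= \arg\min_z \Big( f_2(z) + \tfrac{\rho}{2}\big\|Ax^{k+1} + Bz - c - u^k/\rho\big\|_2^2 \Big) \\
u^{k+1} &:= u^k - \rho\,(Ax^{k+1} + Bz^{k+1} - c)
\end{aligned}$$
We denote $v = (z, u)$ and $w = (x, z, u)$. Our goal is to prove that, as the iteration count $k$ increases, $v^k$ goes to $v^*$ and $w^k$ goes to $w^*$.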
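Based on the min-max (saddle-point) property of the Lagrangian $L(x, z, u) = f_1(x) + f_2(z) - u^T(Ax + Bz - c)$, for the solution $(x^*, z^*, u^*)$ of the problem we have
$$L(x^*, z^*, u) \le L(x^*, z^*, u^*) \le L(x, z, u^*) \quad \text{for all } (x, z, u).$$
By defining
$$G(w) = \begin{pmatrix} -A^T u \\ -B^T u \\ Ax + Bz - c \end{pmatrix}, \quad \text{and} \quad \Omega = \mathbb{R}^{n_1} \times \mathbb{R}^{n_2} \times \mathbb{R}^m,$$
the original task is actually the problem $\mathrm{VI}(\Omega, G, F)$, where $F$ stands for the objective $f_1 + f_2$: find $w^* \in \Omega$ such that
$$f_1(x) + f_2(z) - f_1(x^*) - f_2(z^*) + (w - w^*)^T G(w^*) \ge 0, \quad \forall w \in \Omega.$$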
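The optimality condition of the $x$-subproblem gives
$$f_1(x) - f_1(x^{k+1}) + (x - x^{k+1})^T \big\{ -A^T u^k + \rho A^T (Ax^{k+1} + Bz^k - c) \big\} \ge 0,$$
which holds for any $x \in \mathbb{R}^{n_1}$; similarly,
$$f_2(z) - f_2(z^{k+1}) + (z - z^{k+1})^T \big\{ -B^T u^k + \rho B^T (Ax^{k+1} + Bz^{k+1} - c) \big\} \ge 0$$
holds for any $z \in \mathbb{R}^{n_2}$. Then, substituting $u^k = u^{k+1} + \rho\,(Ax^{k+1} + Bz^{k+1} - c)$ into the two inequalities, we get
$$f_1(x) - f_1(x^{k+1}) + (x - x^{k+1})^T \big\{ -A^T u^{k+1} + \rho A^T B (z^k - z^{k+1}) \big\} \ge 0,$$
which holds for any $x \in \mathbb{R}^{n_1}$, and
$$f_2(z) - f_2(z^{k+1}) + (z - z^{k+1})^T \big\{ -B^T u^{k+1} \big\} \ge 0,$$
which holds for any $z \in \mathbb{R}^{n_2}$.
Next, we add the two inequalities together and get:
$$f_1(x) + f_2(z) - f_1(x^{k+1}) - f_2(z^{k+1}) + (x - x^{k+1})^T \big\{ -A^T u^{k+1} + \rho A^T B (z^k - z^{k+1}) \big\} + (z - z^{k+1})^T \big\{ -B^T u^{k+1} \big\} \ge 0.$$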
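Denoting $\lambda = (x, z)$ and $F(\lambda) = f_1(x) + f_2(z)$, the above inequality can be rewritten in a more compact form:
$$F(\lambda) - F(\lambda^{k+1}) + \begin{pmatrix} x - x^{k+1} \\ z - z^{k+1} \end{pmatrix}^T \left\{ \begin{pmatrix} -A^T u^{k+1} \\ -B^T u^{k+1} \end{pmatrix} + \rho \begin{pmatrix} A^T \\ B^T \end{pmatrix} B (z^k - z^{k+1}) + \begin{pmatrix} 0 & 0 \\ 0 & \rho B^T B \end{pmatrix} \begin{pmatrix} x^{k+1} - x^k \\ z^{k+1} - z^k \end{pmatrix} \right\} \ge 0$$
holds for all $\lambda \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$. We additionally add the scaled dual variable $u$ to the inequality to get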
The last inequality holds for all $w = (x, z, u)$, so we can set $w = w^*$ and get
For simplicity, we rewrite it as
Then, moving several terms from the left side to the right, we get:
As the first row of $H_0$ is all zero, we can rewrite it as follows ($H$ denotes the last two rows of $H_0$):
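The inequalities referred to in this passage are not shown. As a hedged sketch, one reconstruction that is consistent with the compact form above and with the update $u^{k+1} = u^k - \rho\,(Ax^{k+1} + Bz^{k+1} - c)$ is given here. The explicit block forms of $H_0$ and $H$ are assumptions, inferred from the remark that the first row of $H_0$ is zero. The $u$-block uses $(Ax^{k+1} + Bz^{k+1} - c) + \tfrac{1}{\rho}(u^{k+1} - u^k) = 0$.

$$F(\lambda) - F(\lambda^{k+1}) + (w - w^{k+1})^T \left\{ G(w^{k+1}) + \rho \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B (z^k - z^{k+1}) + H_0 (w^{k+1} - w^k) \right\} \ge 0, \quad \forall w,$$

$$H_0 = \begin{pmatrix} 0 & 0 & 0 \\ 0 & \rho B^T B & 0 \\ 0 & 0 & \tfrac{1}{\rho} I \end{pmatrix}, \qquad H = \begin{pmatrix} \rho B^T B & 0 \\ 0 & \tfrac{1}{\rho} I \end{pmatrix}.$$

Setting $w = w^*$ and moving terms across, with $v = (z, u)$, this would read

$$(v^{k+1} - v^*)^T H (v^k - v^{k+1}) \ge F(\lambda^{k+1}) - F(\lambda^*) + (w^{k+1} - w^*)^T G(w^{k+1}) + \rho\, (w^{k+1} - w^*)^T \begin{pmatrix} A^T \\ B^T \\ 0 \end{pmatrix} B (z^k - z^{k+1}).$$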
On the other hand, for the second part, we have:
The former inequality holds because the mapping $G$ is monotone, and the latter inequality holds by the definition of the VI.
It is clear that
For the right side, we have
Recall that
holds for any $z \in \mathbb{R}^{n_2}$. We also know that
holds for any $z \in \mathbb{R}^{n_2}$. Setting $z = z^k$ in the former inequality and $z = z^{k+1}$ in the latter inequality,
and adding the two inequalities, we get:
It ensures
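The two inequalities being recalled are not shown. A sketch of what they would be under the conventions used here (a reconstruction, so the exact form in the original may differ) is the optimality condition of the $z$-subproblem at iterations $k+1$ and $k$:

$$f_2(z) - f_2(z^{k+1}) + (z - z^{k+1})^T (-B^T u^{k+1}) \ge 0, \qquad f_2(z) - f_2(z^k) + (z - z^k)^T (-B^T u^k) \ge 0, \quad \forall z \in \mathbb{R}^{n_2}.$$

Setting $z = z^k$ in the first, $z = z^{k+1}$ in the second, and adding gives

$$(u^k - u^{k+1})^T B (z^k - z^{k+1}) \ge 0.$$

Since $Ax^* + Bz^* = c$ and $u^k - u^{k+1} = \rho\,(Ax^{k+1} + Bz^{k+1} - c)$, this is the same quantity as the cross term $\rho\,\big(A(x^{k+1} - x^*) + B(z^{k+1} - z^*)\big)^T B (z^k - z^{k+1})$, so that term would be nonnegative.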
Overall, we have
Having the
It is easy to prove that
the sequence is monotonically decreasing (in other words, it converges). It can further be proved to converge at a rate of $O(1/t)$.
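The monotone decrease can be observed numerically. The sketch below assumes that "the sequence" is $\|v^k - v^*\|_H^2$ with $v = (z, u)$ and $H = \mathrm{diag}(\rho B^T B, \tfrac{1}{\rho} I)$, which is the usual quantity in this style of proof but is not stated explicitly here. The toy problem ($f_1 = \tfrac12\|x-a\|^2$, $f_2 = \tfrac12\|z-b\|^2$, $A = I$, $B = -I$, $c = 0$) is also an assumption, chosen so that $v^*$ is known in closed form.

```python
import numpy as np

def admm_trace(a, b, rho=1.0, iters=30):
    """Run the 2-block updates on
        minimize 0.5*||x-a||^2 + 0.5*||z-b||^2  subject to x - z = 0
    with the multiplier convention L = f1 + f2 - u^T (Ax + Bz - c),
    and record ||v^k - v*||_H^2 for v = (z, u).
    """
    z_star = (a + b) / 2.0
    u_star = (b - a) / 2.0          # from the x-stationarity x* - a - u* = 0
    z = np.zeros_like(a)
    u = np.zeros_like(a)
    dists = []
    for _ in range(iters):
        # H = diag(rho * B^T B, I / rho); here B^T B = I
        dists.append(rho * np.sum((z - z_star) ** 2) + np.sum((u - u_star) ** 2) / rho)
        x = (a + rho * z + u) / (1.0 + rho)   # x-subproblem in closed form
        z = (b + rho * x - u) / (1.0 + rho)   # z-subproblem in closed form
        u = u - rho * (x - z)                 # multiplier update
    return np.array(dists)

a = np.array([0.0, 1.0, -2.0])
b = np.array([2.0, 1.0, 4.0])
d = admm_trace(a, b, rho=0.5)
print(d[:5])
```

The recorded values should never increase from one iteration to the next; the decrease per step is what a bound of the form $\|v^{k+1} - v^*\|_H^2 \le \|v^k - v^*\|_H^2 - \|v^k - v^{k+1}\|_H^2$ would predict.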