Duality
2024-02-23

1 The Lagrange dual function#

1.1 The Lagrangian#

1.2 The Lagrange dual function#

  • The dual function is always concave, even when the primal problem is not convex.
  • Take care with the case where the Lagrangian $L$ is not differentiable.

1.3 Lower bounds on optimal value#

1.4 Linear approximation interpretation#

The Lagrangian and the lower bound property can be interpreted as a linear approximation of the indicator functions of the sets $\{ 0 \}$ and $-\mathbb{R}_{+}$.

$$\text{minimize} \quad f_{0}(x)+\sum_{i=1}^{m}I_{-}(f_{i}(x))+\sum_{i=1}^{p}I_{0}(h_{i}(x))$$

where $I_{-}:\mathbb{R} \to \mathbb{R}$ is the indicator function for the nonpositive reals,

$$I_{-}(u)=\begin{cases} 0 & u \le 0 \\ \infty & u>0 \end{cases}$$

and similarly, $I_{0}$ is the indicator function of $\{ 0 \}$.

Note: the indicator functions remove the constraints by imposing an infinite penalty whenever a constraint is violated. In the Lagrangian, the hard penalty $I_{-}(f_{i}(x))$ is replaced by the linear underestimator $\lambda_{i}f_{i}(x)$ (with $\lambda_{i}\geq 0$), and $I_{0}(h_{i}(x))$ by $\nu_{i}h_{i}(x)$. The whole idea is to replace the hard constraints with soft versions.

1.5 Examples#

  • Least-squares solution of linear equations (see the sketch after this list)
  • Standard form LP
  • Equality constrained norm minimization (!!!)
  • Two-way partitioning problem (nonconvex)
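
For the least-squares example, here is a minimal numeric sketch (my own construction with numpy, not from the text). For minimize $x^{T}x$ subject to $Ax=b$, minimizing the Lagrangian gives $g(\nu)=-\tfrac{1}{4}\nu^{T}AA^{T}\nu-b^{T}\nu$: every $\nu$ yields a lower bound on $p^{*}$, and the bound is tight at $\nu^{*}=-2(AA^{T})^{-1}b$.

```python
import numpy as np

# Least-squares solution of linear equations:
#   minimize x^T x  subject to  Ax = b
# Dual function: g(nu) = -(1/4) nu^T A A^T nu - b^T nu (a concave quadratic).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))            # fat matrix: Ax = b is solvable
b = rng.standard_normal(3)

x_star = np.linalg.pinv(A) @ b             # least-norm solution
p_star = x_star @ x_star

def g(nu):
    return -0.25 * nu @ (A @ A.T) @ nu - b @ nu

# Lower-bound property: g(nu) <= p* for every nu.
assert all(g(rng.standard_normal(3)) <= p_star + 1e-9 for _ in range(1000))

# The dual optimum nu* = -2 (A A^T)^{-1} b attains p* (strong duality).
nu_star = -2 * np.linalg.solve(A @ A.T, b)
print(p_star, g(nu_star))                  # the two values agree
```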

[!note] The infimum of a quadratic form $x^{T}Sx$

  • if $S\ge 0$, the infimum is 0.
  • if $S \ngeq 0$, then we can find a vector $y$ such that $y^{T}Sy<0$. Let $x=ty$, $t \in \mathbb{R}$; then $x^{T}Sx=t^{2}y^{T}Sy\to -\infty$ as $t\to \infty$. (This is a frequently used trick, sketched below.)
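
A small numpy sketch of this trick (my own example): when $S$ has a negative eigenvalue, scaling the corresponding eigenvector drives $x^{T}Sx$ to $-\infty$.

```python
import numpy as np

# inf_x x^T S x is 0 if S >= 0 (PSD), and -infinity otherwise.
def quadratic_form_along_bad_direction(S, t):
    eigvals, eigvecs = np.linalg.eigh(S)    # eigenvalues in ascending order
    if eigvals[0] >= 0:
        return 0.0                          # S is PSD: the infimum is 0 at x = 0
    y = eigvecs[:, 0]                       # y^T S y = lambda_min < 0
    x = t * y
    return x @ S @ x                        # = t^2 * lambda_min -> -inf as t grows

S = np.array([[1.0, 2.0], [2.0, 1.0]])      # indefinite: eigenvalues 3 and -1
print(quadratic_form_along_bad_direction(S, t=1e3))   # -1e6, unbounded below
```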

[!note] Dual norm

$$\lVert v \rVert_{*} = \sup_{\lVert u \rVert\leq 1} u^{T}v$$
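
A minimal numeric illustration (my own example, not from the text): the norm dual to $\ell_{1}$ is $\ell_{\infty}$, and the supremum is attained at a signed standard basis vector.

```python
import numpy as np

# Dual of the l1 norm: sup_{||u||_1 <= 1} u^T v = max_k |v_k| = ||v||_inf,
# attained at u = sign(v_k) e_k for k = argmax |v_k|.
rng = np.random.default_rng(1)
v = rng.standard_normal(6)

k = np.argmax(np.abs(v))
u = np.zeros_like(v)
u[k] = np.sign(v[k])                         # feasible: ||u||_1 = 1

assert np.isclose(u @ v, np.max(np.abs(v)))  # u^T v equals ||v||_inf
```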

Obtain the lower bound:

  • The Lagrangian is a good way to obtain a lower bound for a problem. This can be useful when we need to analyze some algorithms. (I guess)
  • Another way to get a lower bound is to enlarge the feasible set (relaxation of the constraints).

1.6 The Lagrange dual function and conjugate functions#

Consider an optimization problem with linear inequality and equality constraints,

$$\begin{align} \text{minimize} \quad &f_{0}(x) \\ \text{subject to} \quad &Ax\leq b \\ & Cx = d \end{align}$$

Then we can write the dual function as

$$g(\lambda, \nu)= -b^{T}\lambda - d^{T}\nu - f_{0}^{*}(-A^{T}\lambda - C^{T}\nu)$$

The domain of $g$ follows from the domain of $f_{0}^{*}$:

$$\mathbf{dom}\ g = \{ (\lambda, \nu) \mid -A^{T}\lambda - C^{T}\nu \in \mathbf{dom}\ f_{0}^{*} \}$$
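
As a sanity check of this formula, a sketch (my own, with inequality constraints only and $f_{0}(x)=x^{T}x$, whose conjugate is $f_{0}^{*}(y)=\tfrac{1}{4}y^{T}y$): the conjugate expression should match the dual function computed directly from its definition.

```python
import numpy as np
from scipy.optimize import minimize

# g(lambda) = -b^T lambda - f0*(-A^T lambda) with f0(x) = x^T x,
# so g(lambda) = -b^T lambda - (1/4) ||A^T lambda||^2.
rng = np.random.default_rng(2)
A, b = rng.standard_normal((4, 3)), rng.standard_normal(4)
lam = np.abs(rng.standard_normal(4))        # lambda >= 0

g_conjugate = -b @ lam - 0.25 * (A.T @ lam) @ (A.T @ lam)

# Direct definition: g(lambda) = inf_x f0(x) + lambda^T (Ax - b).
res = minimize(lambda x: x @ x + lam @ (A @ x - b), np.zeros(3))
print(g_conjugate, res.fun)                 # the two values agree
```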

Examples:

  • Equality constrained norm minimization #confusion
  • Entropy maximization: the conjugate of the function $u\log u$, with scalar variable $u$, is $e^{v-1}$ (checked numerically below).
  • Minimum volume covering ellipsoid
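
A quick numeric check of the entropy-maximization conjugate (my own, using scipy): $\sup_{u>0}(uv-u\log u)$ is attained at $u=e^{v-1}$, with optimal value $e^{v-1}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Conjugate of f(u) = u log u:  f*(v) = sup_u (u v - u log u) = e^{v-1}.
for v in [-1.0, 0.0, 0.7, 2.0]:
    res = minimize_scalar(lambda u: -(u * v - u * np.log(u)),
                          bounds=(1e-9, 50.0), method="bounded")
    assert np.isclose(-res.fun, np.exp(v - 1), atol=1e-5)
print("conjugate of u*log(u) is exp(v - 1)")
```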

Note: an equivalent form of $x^{T}Sx$ is $\mathbf{tr}(xx^{T}S)$.

2 The Lagrange dual problem#

$$\begin{align} &\text{maximize} && g(\lambda,\nu) \\ &\text{subject to} && \lambda\geq 0 \end{align}$$

  • Dual feasible describes a pair $(\lambda,\nu)$ with $\lambda \geq 0$ and $g(\lambda,\nu)>-\infty$.
  • The Lagrange dual problem is a convex optimization problem, whether or not the primal problem is convex (see the LP example below).
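
A sketch with scipy (my own example): solve a standard form LP and its Lagrange dual, and observe that the two optimal values agree (for LPs, strong duality holds whenever at least one of the two problems is feasible).

```python
import numpy as np
from scipy.optimize import linprog

# primal: minimize c^T x  s.t.  Ax = b, x >= 0
# dual:   maximize -b^T nu  s.t.  A^T nu + c >= 0
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 6))
b = A @ np.abs(rng.standard_normal(6))       # makes the primal feasible
c = -A.T @ rng.standard_normal(3) + np.abs(rng.standard_normal(6))  # dual feasible

primal = linprog(c, A_eq=A, b_eq=b, bounds=(0, None))
# linprog minimizes, so solve min b^T nu s.t. -A^T nu <= c and negate.
dual = linprog(b, A_ub=-A.T, b_ub=c, bounds=(None, None))

print(primal.fun, -dual.fun)                 # p* = d*: strong duality
```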

2.1 Making dual constraints explicit#

The domain of the dual function

$$\mathbf{dom}\ g = \{ (\lambda, \nu) \mid g(\lambda,\nu) > -\infty \}$$
  • Lagrange dual of standard form LP
  • Lagrange dual of inequality form LP

2.2 Weak duality#

We denote the optimal value of the Lagrange dual problem as $d^{*}$. Then, we have a simple inequality:

$$d^{*}\leq p^{*}$$

  • The weak duality inequality holds even when $d^{*}$ and $p^{*}$ are infinite.
  • The optimal duality gap of the original problem is $p^{*}-d^{*}$, which is always nonnegative.

2.3 Strong duality and Slater’s constraint qualification#

If the equality

$$d^{*}=p^{*}$$

holds, then we say that strong duality holds.

  • does not hold in general
  • (usually) holds for convex problems
  • sufficient conditions that guarantee strong duality in convex problems are called constraint qualifications

Slater’s condition: There exists an $x \in \mathbf{relint}\ D$ such that

$$f_{i}(x)<0, \quad i=1,\dots,m, \quad Ax=b.$$

Such a point is sometimes called strictly feasible. Slater’s theorem states that strong duality holds if Slater’s condition holds and the problem is convex.

  • It also guarantees that the dual optimum is attained when $d^{*}>-\infty$.
  • The affine inequalities do not need to hold with strict inequality.

2.4 Examples#

  • Least-squares solution of linear equations
  • Lagrange dual of LP: if either of the two problems is feasible, then strong duality holds.
  • Lagrange dual of QCQP:
  • Entropy maximization
  • Minimum volume covering ellipsoid
  • A nonconvex quadratic problem with strong duality: trust region problem

Note: for $Ax=b$, when $b \notin \mathcal{R}(A)$, there is a $z$ with $A^{T}z=0$, $b^{T}z\ne 0$. (A trick.)

Note: strong duality holds for any optimization problem with quadratic objective and one quadratic inequality constraint.

2.5 Mixed strategies for matrix games#

3 Geometric interpretation#

3.1 Weak and strong duality via set of values#

$$\mathcal{G}=\{(f_1(x),\ldots,f_m(x),h_1(x),\ldots,h_p(x),f_0(x))\in\mathbf{R}^m\times\mathbf{R}^p\times\mathbf{R}\mid x\in\mathcal{D}\}$$

which is the set of values taken on by the constraint and objective functions. The optimal value $p^{\star}$ of the primal problem is easily expressed as

$$p^{\star}=\inf\{t\mid(u,v,t)\in\mathcal{G},\ u\preceq 0,\ v=0\}$$

Then, we have

$$g(\lambda,\nu)=\inf\{(\lambda,\nu,1)^{T}(u,v,t)\mid(u,v,t)\in\mathcal{G}\}$$

In particular, the inequality

$$(\lambda,\nu,1)^{T}(u,v,t)\geq g(\lambda,\nu)$$

defines a (nonvertical) supporting hyperplane to $\mathcal{G}$.

  • Epigraph variation

3.2 Proof of strong duality under constraint qualification#

3.3 Multicriterion interpretation#

4 Saddle-point interpretation#

5 Optimality conditions#

If strong duality holds, then $x$ is primal optimal and $(\lambda,\nu)$ is dual optimal if:

  • $x$ is primal feasible
  • $(\lambda,\nu)$ is dual feasible
  • $f_{0}(x)=g(\lambda,\nu)$

5.1 Certificate of suboptimality and stopping criteria#

5.2 Complementary slackness#

Assume that $x$ satisfies the primal constraints and $\lambda\geq 0$. Then

$$\begin{align} g(\lambda,\nu) &= \inf_{\tilde{x}\in\mathcal{D}}\left(f_0(\tilde{x})+\sum_{i=1}^m\lambda_i f_i(\tilde{x})+\sum_{i=1}^p\nu_i h_i(\tilde{x})\right) \\ &\leq f_0(x)+\sum_{i=1}^m\lambda_i f_i(x)+\sum_{i=1}^p\nu_i h_i(x) \\ &\leq f_0(x) \end{align}$$

The equality $f_0(x)=g(\lambda,\nu)$ holds if and only if the two inequalities hold with equality:

  • First inequality: $x$ minimizes $L(\tilde{x},\lambda,\nu)$ over $\tilde{x}\in\mathcal{D}$.
  • Second inequality: $\lambda_i f_i(x)=0$ for $i=1,\ldots,m$, i.e. $$\lambda_i>0 \Longrightarrow f_i(x)=0, \qquad f_i(x)<0 \Longrightarrow \lambda_i=0$$ This is known as complementary slackness (a small numeric check follows this list).
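
A tiny analytic example of complementary slackness (my own, not from the text): project the origin onto a halfspace chosen so that the constraint is active at the optimum, and check that $\lambda^{*}>0$ goes with $f_{1}(x^{*})=0$.

```python
import numpy as np

# minimize x^T x  s.t.  f1(x) = a^T x - b <= 0, with b < 0 so that 0 is infeasible.
a, b = np.array([1.0, 2.0]), -1.0

x_star = (b / (a @ a)) * a         # projection of the origin onto {a^T x <= b}
lam_star = -2 * b / (a @ a)        # from stationarity: 2 x* + lam* a = 0

assert np.allclose(2 * x_star + lam_star * a, 0)      # x* minimizes L
f1 = a @ x_star - b                                   # = 0: constraint is active
print(lam_star > 0, np.isclose(lam_star * f1, 0.0))   # lam* > 0  =>  f1(x*) = 0
```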

5.3 KKT optimality conditions#

If strong duality holds, then $x$ is primal optimal and $(\lambda,\nu)$ is dual optimal if

  1. $f_i(x)\leq 0$ for $i=1,\ldots,m$ and $h_i(x)=0$ for $i=1,\ldots,p$
  2. $\lambda\geq 0$
  3. $\lambda_i f_i(x)=0$ for $i=1,\ldots,m$
  4. $x$ is a minimizer of $L(\cdot,\lambda,\nu)$

Conversely, these four conditions imply optimality of $x,(\lambda,\nu)$ and strong duality.

If the problem is convex and the functions $f_i,h_i$ are differentiable, condition 4 can be written as 4'. The gradient of the Lagrangian with respect to $x$ vanishes:

$$\nabla f_{0}(x)+\sum_{i=1}^{m}\lambda_{i}\nabla f_{i}(x)+\sum_{i=1}^{p}\nu_{i}\nabla h_{i}(x)=0$$

Conditions 1, 2, 3, 4' are known as the Karush-Kuhn-Tucker (KKT) conditions.
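
For a convex equality-constrained QP the KKT conditions reduce to a single linear system, so the primal and dual optima can be read off directly. A minimal numpy sketch (my own example):

```python
import numpy as np

# minimize (1/2) x^T P x + q^T x  s.t.  Ax = b, with P > 0.
# KKT: stationarity P x + q + A^T nu = 0 and primal feasibility A x = b.
rng = np.random.default_rng(4)
M = rng.standard_normal((4, 4))
P = M.T @ M + np.eye(4)                      # positive definite
q = rng.standard_normal(4)
A = rng.standard_normal((2, 4))
b = rng.standard_normal(2)

# Stack the KKT system [[P, A^T], [A, 0]] [x; nu] = [-q; b] and solve it.
KKT = np.block([[P, A.T], [A, np.zeros((2, 2))]])
x, nu = np.split(np.linalg.solve(KKT, np.concatenate([-q, b])), [4])

print(np.allclose(P @ x + q + A.T @ nu, 0), np.allclose(A @ x, b))
```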

6 Perturbation and sensitivity analysis#

6.1 The perturbed problem#

$$\begin{align} &\text{minimize} && f_{0}(x) \\ &\text{subject to} && f_{i}(x)\leq u_{i}, \quad i=1,\dots,m \\ & && h_{i}(x)= v_{i},\quad i=1,\dots,p \end{align}$$

  • $u_{i}$ positive: the $i$th inequality constraint is relaxed
  • $u_{i}$ negative: the $i$th inequality constraint is tightened

We define $p^{*}(u,v)$ as the optimal value of the perturbed problem:

$$p^{*}(u,v) = \inf \{ f_{0}(x) \mid \exists x \in \mathcal{D},\; f_{i}(x)\leq u_{i},\; i=1,\dots,m,\; h_{i}(x)=v_{i},\; i=1,\dots,p \}$$

  • we can have $p^{*}(u,v)=\infty$
  • $p^{*}(0,0)=p^{*}$
  • when the original problem is convex, the function $p^{*}$ is a convex function of $u$ and $v$.

6.2 A global inequality#

Assume that strong duality holds and that the dual optimum is attained. Let $(\lambda^{*},\nu^{*})$ be optimal for the dual of the unperturbed problem. Then for all $u$ and $v$ we have

$$p^{*}(u,v)\geq p^{*}(0,0)- \lambda^{*T}u-\nu^{*T}v$$

Sensitivity interpretations: ![[b21a7823aef5cf9d0fb8973dff1540e 1.jpg|450]]

  • If $\lambda_{i}^{*}$ is large and we tighten the $i$th constraint (i.e., choose $u_{i} < 0$), then the optimal value $p^{*}(u,v)$ is guaranteed to increase greatly.
  • If $\nu^{*}_{i}$ is large and positive and we take $v_{i} < 0$, or if $\nu^{*}_{i}$ is large and negative and we take $v_{i}>0$, then the optimal value $p^{*}(u,v)$ is guaranteed to increase greatly.
  • If $\lambda_{i}^{*}$ is small and we loosen the $i$th constraint ($u_{i} > 0$), then the optimal value $p^{*}(u,v)$ will not decrease too much.
  • If $\nu^{*}_{i}$ is small and positive and $v_{i} > 0$, or if $\nu^{*}_{i}$ is small and negative and $v_{i}<0$, then the optimal value $p^{*}(u,v)$ will not decrease too much.

The inequality just gives a lower bound on the perturbed optimal value.
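
A minimal check of the global inequality on a toy problem (my own 1-D example): minimize $x^{2}$ subject to $1-x\leq u$. Here $p^{*}(u)=\max(1-u,0)^{2}$, $p^{*}(0)=1$, and the dual optimum of the unperturbed problem is $\lambda^{*}=2$, so the bound reads $p^{*}(u)\geq 1-2u$.

```python
import numpy as np

# p*(u) = max(1 - u, 0)^2 for: minimize x^2 s.t. 1 - x <= u.
u = np.linspace(-2.0, 3.0, 1001)
p = np.maximum(1 - u, 0) ** 2

assert np.all(p >= 1 - 2 * u - 1e-12)        # p*(u) >= p*(0) - lambda* u
```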

6.3 Local sensitivity analysis#

Suppose that $p^{*}(u,v)$ is differentiable at $u=0,v=0$. Then, provided strong duality holds, the optimal dual variables $\lambda^{*},\nu^{*}$ are related to the gradient of $p^{*}$ at $u=0,v=0$:

$$\lambda^{*}_{i}=- \frac{\partial p^{*}(0,0)}{\partial u_{i}}, \quad \nu^{*}_{i}=- \frac{\partial p^{*}(0,0)}{\partial v_{i}}$$
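
A finite-difference check of this relation on the same toy problem as above ($p^{*}(u)=(1-u)^{2}$ near $u=0$, with multiplier $\lambda^{*}=2$):

```python
# p*(u) is differentiable at u = 0, and -dp*/du(0) should equal lambda* = 2.
def p_star(u):
    return max(1.0 - u, 0.0) ** 2

h = 1e-6
grad = (p_star(h) - p_star(-h)) / (2 * h)    # central finite difference
print(-grad)                                  # ~ 2.0 = lambda*
```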

If $f_{i}(x^{*})<0$, then the constraint is inactive, and it follows that the constraint can be tightened or loosened a small amount without affecting the optimal value. Suppose instead that $f_{i}(x^{*})=0$ (active). The $i$th optimal Lagrange multiplier tells us how active the constraint is:

  • If $\lambda_{i}^{*}$ is small, the constraint can be loosened or tightened a bit without much effect on the optimal value.
  • If $\lambda_{i}^{*}$ is large, loosening or tightening the constraint a bit has a great effect on the optimal value.

7 Examples#

Simple equivalent reformulations of a problem can lead to very different dual problems.

  • Introducing new variables and equality constraints: can add information for the Lagrangian dual function
  • Transforming the objective
  • Implicit constraints: make the explicit constraints implicit

8 Semidefinite optimization#

$$\begin{align} &\text{minimize} && c^{T}x \\ &\text{subject to} && x_1F_1+\cdots+x_nF_n\leq G \end{align}$$

where the matrices $F_1,\ldots,F_n,G$ are symmetric $m\times m$ matrices.

Lagrangian and dual function: we associate with the constraint a Lagrange multiplier $Z\in\mathbb{S}^m$ and define the Lagrangian as

$$\begin{align} L(x,Z) &= c^{T}x+\mathbf{tr}\left(Z(x_1F_1+\cdots+x_nF_n-G)\right) \\ &= \sum_{i=1}^n(\mathbf{tr}(F_iZ)+c_i)x_i-\mathbf{tr}(GZ) \end{align}$$

The dual function is

$$g(Z)=\inf_x L(x,Z)=\begin{cases} -\mathbf{tr}(GZ) & \mathbf{tr}(F_iZ)+c_i=0,\ i=1,\ldots,n \\ -\infty & \text{otherwise} \end{cases}$$

8.1 Dual semidefinite program#

$$\begin{align} &\text{maximize} && -\mathbf{tr}(GZ) \\ &\text{subject to} && \mathbf{tr}(F_{i} Z) + c_{i} = 0, \quad i=1,\dots,n \\ & && Z\geq 0 \end{align}$$

  • Weak duality: $p^{*}\geq d^{*}$ always
  • Strong duality: $p^{*}=d^{*}$ if the primal SDP or the dual SDP is strictly feasible

we can readily check whether strong duality holds before we start to solve the problem

If $X\geq 0$ and $Z\geq 0$, then $\mathbf{tr}(XZ)\geq 0$.

[!note] Matrix decomposition
For a matrix $X\geq 0$, we can decompose $X$ as

$$X=U U^{T}$$

For example, from the eigendecomposition $X=Q\Sigma Q^{T}$, let $X=Q\Sigma^{\frac{1}{2}}\Sigma^{\frac{1}{2}}Q^{T}=VV^{T}$ with $V=Q\Sigma^{\frac{1}{2}}$.
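
This decomposition gives a one-line proof of the fact above: $\mathbf{tr}(XZ)=\mathbf{tr}(UU^{T}VV^{T})=\lVert U^{T}V\rVert_{F}^{2}\geq 0$. A small numpy sketch (my own):

```python
import numpy as np

def psd_factor(X):                   # X = U U^T via the eigendecomposition
    w, Q = np.linalg.eigh(X)
    return Q @ np.diag(np.sqrt(np.clip(w, 0, None)))

rng = np.random.default_rng(5)
M, N = rng.standard_normal((4, 4)), rng.standard_normal((4, 4))
X, Z = M @ M.T, N @ N.T              # two random PSD matrices

U, V = psd_factor(X), psd_factor(Z)
# tr(XZ) = tr(U U^T V V^T) = ||U^T V||_F^2 >= 0
print(np.trace(X @ Z), np.linalg.norm(U.T @ V, "fro") ** 2)   # equal, >= 0
```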

8.2 Complementary slackness#

The primal and dual objective values at feasible $x, Z$ are equal if

$$0= \mathbf{tr}(XZ) \quad \text{where} \ X=G-x_{1}F_{1}-\cdots - x_{n}F_{n}$$

9 Theorems of alternatives#

Two related feasibility problems:

  • weak alternatives: if at most one is feasible
  • strong alternatives: if exactly one is feasible

Weak alternatives are easy to check; for strong alternatives, we also need to verify that it is impossible for both problems to be infeasible.

9.1 Nonlinear inequalities#

9.2 Theorem of alternatives for linear matrix inequality#

We can see that the proofs of the above statements are obtained via strong duality.
