EE5121 Optimization - KKT Conditions and Duality

Having established the first-order necessary (KKT) conditions, we now revisit them through a worked example and then develop the theory of duality, which lets us attack a constrained (primal) problem through an associated (dual) problem.

A motivating example

Start with the objective \(f(x) = \frac{2}{5}(x_1^2 + x_2^2)\) and solve the constrained problem

\[\begin{equation*} \text{P1}: \quad \underset{x}{\text{min}}\; f(x) \quad \text{s.t.}\quad c(x) = x_1 + x_2 - 2 \ge 0. \end{equation*}\]

Figure 1. Geometry of P1: contours of \(f\) (circles), the constraint line \(c(x)=0\), the feasible half-plane, and the true solution \((1,1)\) where \(\nabla f\) is parallel to \(\nabla c\).

The Lagrangian is \(\mathcal{L}(x,\lambda) = f(x) - \lambda c(x)\). As per the KKT conditions, at an optimal \(x^*\) we require \(\nabla_x \mathcal{L}(x^*,\lambda^*) = 0\) with \(\lambda^* \ge 0\) and the complementarity condition \(\lambda^* c(x^*) = 0\). The stationarity condition reads

\[\begin{equation*} \nabla f(x) = \lambda \nabla c(x) \quad\implies\quad 0.8 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} 1 \\ 1 \end{bmatrix}. \end{equation*}\]

Let us test two candidate points:

What if \(x = \frac{5}{4}\begin{bmatrix} 1 \\ 1\end{bmatrix}\)? Then \(\lambda = 1\), but \(c(x) = \frac{5}{4}+\frac{5}{4}-2 = \frac12 \ne 0\), so \(\lambda c(x) \ne 0\) — complementarity fails.
What if \(x^* = \begin{bmatrix} 1 \\ 1\end{bmatrix}\)? Then \(\lambda = 0.8\), and \(c(x) = 0\) (so \(\lambda c(x) = 0\)). All KKT conditions are satisfied at \((1,1)\). Since \(\nabla c(x) = \begin{bmatrix}1 & 1\end{bmatrix}^T \ne 0\), LICQ holds, so the optimal multiplier is unique.

Solving the same problem via the dual

Let us solve P1 differently, in two steps.

Step 1 — Define the Lagrangian dual function:

\[\begin{equation*} q(\lambda) = \underset{x}{\text{min}}\; \mathcal{L}(x,\lambda) = \underset{x}{\text{min}}\; f(x) - \lambda c(x). \end{equation*}\]

For a fixed (not-yet-optimal) \(\lambda\), the inner minimum occurs where \(\nabla_x \mathcal{L}(x,\lambda) = 0\):

\[\begin{equation*} \frac{4}{5}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \lambda \begin{bmatrix} 1 \\ 1 \end{bmatrix} \implies \begin{bmatrix} x_1^\circ \\ x_2^\circ \end{bmatrix} = \frac{5}{4}\begin{bmatrix} \lambda \\ \lambda \end{bmatrix}. \end{equation*}\]

Substituting back,

\[\begin{align*} q(\lambda) &= \frac{2}{5}\left( x_1^{\circ 2} + x_2^{\circ 2}\right) - \lambda\left(x_1^\circ + x_2^\circ - 2\right) = \frac{4}{5}\left(\frac{5\lambda}{4}\right)^2 - \lambda\left(\frac{10}{4}\lambda - 2\right) = -\frac{5}{4}\lambda^2 + 2\lambda. \end{align*}\]

Step 2 — Find \(\lambda \ge 0\) that maximizes \(q(\lambda)\):

\[\begin{equation*} \underset{\lambda}{\text{max}}\; q(\lambda) \implies -\frac{10}{4}\lambda + 2 = 0 \implies \lambda^\circ = \frac{4}{5} \quad (\text{got lucky}) \implies \begin{bmatrix} x_1^\circ \\ x_2^\circ \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \end{equation*}\]

the same answer as before. What did we just do?

Primal and dual problems

\[\begin{align*} \textbf{Primal problem}: &\quad \underset{x}{\text{min}}\; f(x) \quad \text{s.t.}\quad c(x) \ge 0. \\[4pt] \textbf{Dual problem}: &\quad \underset{\lambda}{\text{max}}\; q(\lambda) \quad \text{s.t.}\quad \lambda \ge 0, \quad \text{where } q(\lambda) = \underset{x}{\text{min}}\; \mathcal{L}(x,\lambda). \end{align*}\]

The dual problem is called the Lagrangian dual problem. In general, when equality constraints are also present, the constraint above becomes \(\lambda_i \ge 0\;\forall i \in \mathcal{I}\) (only the inequality multipliers are sign-constrained).

Two remarks:

It is not guaranteed to always work; it works for a class of problems.
When it works, it leads to a simpler optimization problem.

Proofs come later; we develop the geometry now.

Geometric view of duality

Our usual (primal) problem is in \(x = (x_1, x_2)\) space. Let’s map it to another space given by \((y=c(x), z=f(x))\) via

\[\begin{equation*} x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \longmapsto \begin{cases} y = c(x) = x_1 + x_2 - 2 \\ z = f(x) = \frac{2}{5}(x_1^2 + x_2^2) \end{cases} \end{equation*}\]

Figure 2. The map from \(x\)-space to the \((y,z)\)-plane. Every point \((x_1,x_2)\) lands in a region \(G\); feasibility \(c(x)\ge 0\) becomes simply \(y \ge 0\).

Every point \((x_1,x_2)\) is mapped into a region \(G\) in \((y,z)\) coordinates, and feasibility is now just \(y \ge 0\). To see the shape of \(G\), fix \(x_2 = a\); then \(y = x_1 + a - 2\) and

\[\begin{equation*} z = \frac{2}{5}(x_1^2 + a^2) = \frac{2}{5}\left((y + 2 - a)^2 + a^2\right), \end{equation*}\]

so each horizontal line \(x_2 = a\) maps to a parabola; for \(a = 0\), \(z = \frac{2}{5}(y+2)^2\). The geometry is now clear.

What is the primal problem? \(\underset{}{\text{min}}\; z\) while \(y \ge 0\). The marked point \(z_{\text{min}}\) on the \(z\)-axis is the solution.
What is the dual problem? Visualize \(\mathcal{L}(x,\lambda) = f(x) - \lambda c(x) = \alpha\) (say), i.e. \(\alpha = z - \lambda y\), or \(z = \lambda y + \alpha\). What is this geometrically? A line of slope \(\lambda\) and intercept \(\alpha\) on the \(z\) axis.

Figure 3. For a fixed slope \(\lambda \ge 0\), the line \(z = \lambda y + \alpha\) is pushed up until its intercept \(\alpha\) is the smallest value for which the line still meets the region \(G\). That smallest intercept is \(q(\lambda)\).

Let’s solve this in two steps:

Step 1 (dual function). For fixed \(\lambda\),

\[\begin{equation*} q(\lambda) = \underset{x}{\text{min}}\; \mathcal{L}(x,\lambda) = \underset{y,z \in G}{\text{min}}\; (z - \lambda y). \end{equation*}\]

In words: what is the minimum possible value of the intercept \(\alpha\), while keeping at least one point \((y,z) \in G\)?

But why is \(q(\lambda)\) finite at all — why can’t we slide the line down forever and drive \(\alpha \to -\infty\)? Because \(G\) is not the whole plane; it is bounded below. For a fixed \(y\) (i.e. fixed \(x_1+x_2=y+2\)), \(z=f\) is smallest when \(x_1 = x_2 = \frac{y+2}{2}\), giving \(z = \frac{1}{5}(y+2)^2\) — the \(a=0\) slice drawn above is just one curve sitting above this envelope. Thus

\[\begin{equation*} G = \left\{ (y,z) : z \ge \tfrac{1}{5}(y+2)^2 \right\}, \end{equation*}\]

the region on and above this parabola; in particular \(z \ge 0\), touching \(z=0\) only at the vertex \((-2,0)\). A slope-\(\lambda\) line slid downwards therefore jams against this parabolic floor — push it past the point of tangency and the whole line drops below \(G\), meeting nothing. So the smallest reachable intercept is finite, and that value is exactly \(q(\lambda)\). (The weaker fact \(z\ge 0\) alone would not pin it down: a tilted line could run off along \(z=0\) as \(y\to\infty\); it is the parabola’s quadratic growth outpacing \(\lambda y\) that supplies the floor.)

Step 2 (dual optimum). Finally,

\[\begin{equation*} \underset{\lambda \ge 0}{\text{max}}\; q(\lambda) = \underset{\lambda \ge 0}{\text{max}}\; (z^* - \lambda y^*), \end{equation*}\]

while maintaining the definition of \(q\) (which needs legitimate feasible \((y,z)\) points). That is, we maximize the intercept by varying \(\lambda\).

Figure 4. Varying the slope \(\lambda\) sweeps a family of supporting lines \(T_1, T_2, T_3, \dots\). The dual optimum is the one with the largest intercept (here \(T_3\)), giving \(z^* - \lambda^\circ y^* = q(\lambda^\circ)\).

In this example the optimal primal value \(z^*\) equals the optimal dual value — they are equal, so there is no gap: this is strong duality.

Properties of the dual

Recall \(q(\lambda) = \underset{x}{\text{min}}\; \mathcal{L}(x,\lambda)\). It can happen that for some \(\lambda\), \(q(\lambda) \to -\infty\). We therefore define the domain of \(q\) as

\[\begin{equation*} \mathcal{D} \triangleq \{ \lambda \mid q(\lambda) > -\infty \}. \end{equation*}\]

There are two important results, true for any \(f, c\):

\(q\) is concave.
\(\mathcal{D}\) is convex.

Implication: solving the dual problem is a convex optimization problem — regardless of the primal problem.

Proof of (1) — concavity. Consider \(\lambda_1, \lambda_2\) and \(\alpha \in [0,1]\). Since \(\mathcal{L}(x,\lambda) = f(x) - \lambda c(x)\) is affine in \(\lambda\),

\[\begin{equation*} \mathcal{L}\big(x, (1-\alpha)\lambda_1 + \alpha\lambda_2\big) = (1-\alpha)\mathcal{L}(x,\lambda_1) + \alpha\, \mathcal{L}(x,\lambda_2). \end{equation*}\]

Taking \(\underset{x}{\text{min}}\) of both sides and using \(\underset{x}{\text{min}}(a+b) \ge \underset{x}{\text{min}}\,a + \underset{x}{\text{min}}\,b\) (e.g. \(a = x, b = -x\)),

\[\begin{equation*} q\big((1-\alpha)\lambda_1 + \alpha\lambda_2\big) \ge (1-\alpha)q(\lambda_1) + \alpha\, q(\lambda_2) \quad\implies\quad q \text{ is concave.} \end{equation*}\]

Proof of (2) — convexity of \(\mathcal{D}\). In words, convexity of \(\mathcal{D}\) means: if \(\lambda_1 \in \mathcal{D}\) and \(\lambda_2 \in \mathcal{D}\), then \((1-\alpha)\lambda_1 + \alpha\lambda_2 \in \mathcal{D}\) too. Given \(\lambda_1, \lambda_2 \in \mathcal{D}\), we have \(q(\lambda_1) > -\infty\) and \(q(\lambda_2) > -\infty\). By the concavity of \(q\) just proved,

\[\begin{equation*} q\big((1-\alpha)\lambda_1 + \alpha\lambda_2\big) \ge (1-\alpha)q(\lambda_1) + \alpha\, q(\lambda_2) > -\infty \quad\implies\quad (1-\alpha)\lambda_1 + \alpha\lambda_2 \in \mathcal{D}. \end{equation*}\]

Thus a convex combination of \(\lambda_1, \lambda_2\) is also in \(\mathcal{D}\), i.e. \(\mathcal{D}\) is a convex set. To summarize: the primal problem can be anything, but the dual problem is always concave.

Weak duality

For the primal \(\underset{x}{\text{min}}\; f(x)\) s.t. \(c(x) \ge 0\): for any feasible \(\bar{x}\) and any \(\bar\lambda \ge 0\), the following holds:

\[\begin{equation*} q(\bar\lambda) \le f(\bar{x}). \end{equation*}\]

Proof: Weak duality

\[\begin{align*} q(\bar\lambda) &= \underset{x}{\text{min}}\; \mathcal{L}(x,\bar\lambda) = \underset{x}{\text{min}}\; f(x) - \bar\lambda c(x) \\ &\le f(\bar{x}) - \bar\lambda c(\bar{x}) \qquad (\bar{x} \text{ is one feasible candidate}) \\ &\le f(\bar{x}) \qquad\qquad\;\; (\text{since } \bar\lambda \ge 0 \text{ and } c(\bar{x}) \ge 0 \implies \bar\lambda c(\bar{x}) \ge 0). \qquad \blacksquare \end{align*}\]

Important corollary. Define the optimal primal and dual values

\[\begin{equation*} f^\circ = \inf\{ f(x) : x \in \Omega \}, \qquad q^\circ = \sup\{ q(\lambda) : \lambda \ge 0 \}. \end{equation*}\]

Then \(q^\circ \le f^\circ\): the optimum of the dual problem never exceeds the optimum of the primal. Consequently:

If \(q^\circ < f^\circ\), then \(f^\circ - q^\circ > 0\) is called a duality gap.
If \(q^\circ = f^\circ\), we have strong duality.
For convex optimization problems, the duality gap is always \(0\) under a constraint qualification (e.g. LICQ).

Worked example using duality

Consider

\[\begin{equation*} \underset{x_1, x_2 \in \Omega}{\text{min}}\; \big[-(x_1 + x_2)\big] \quad \text{where } \Omega: \begin{cases} 8 - x_1^2 - x_2^2 \ge 0 \\ 6 - x_2 \ge 0. \end{cases} \end{equation*}\]

Figure 5. Geometry of the example: the feasible set \(\Omega\) is the disk (the constraint \(6 - x_2 \ge 0\) is inactive). The objective lines \(-(x_1+x_2)=c\) slide until tangent to the disk at \(x^* = (2,2)\).

Primal. The objective level sets are \(-(x_1+x_2) = c \implies x_2 = -x_1 - c\); we want the minimum value of \(c\). Geometrically the solution is \(x^* = (2,2)\) with \(f^\circ = -4\). The problem is convex, so we expect the dual problem to give the same solution.

Dual.

The Lagrangian (with \(\lambda_i \ge 0\)):

\[\begin{align*} \mathcal{L}(x,\lambda) &= f(x) - \sum_i \lambda_i c_i(x) = -(x_1 + x_2) - \lambda_1(8 - x_1^2 - x_2^2) - \lambda_2(6 - x_2) \\ &= -\big[ -\lambda_1 x_1^2 - \lambda_1 x_2^2 + x_1 + x_2(1 - \lambda_2) + 8\lambda_1 + 6\lambda_2 \big], \end{align*}\]

which is clearly a convex function of \(x\) (for \(\lambda_1 > 0\)).
\(q(\lambda) = \underset{x}{\text{min}}\; \mathcal{L}(x,\lambda)\); by the observation above this minimization makes sense. Setting \(\nabla_x \mathcal{L} = 0\):

\[\begin{equation*} \nabla_x \mathcal{L} = \begin{bmatrix} 2\lambda_1 x_1 - 1 \\ 2\lambda_1 x_2 - 1 + \lambda_2 \end{bmatrix} = 0 \implies x_1^\circ = \frac{1}{2\lambda_1}, \quad x_2^\circ = \frac{1 - \lambda_2}{2\lambda_1}. \end{equation*}\]

Substituting back,

\[\begin{align*} q(\lambda) &= \lambda_1(x_1^{\circ 2} + x_2^{\circ 2}) - x_1^\circ + (\lambda_2 - 1)x_2^\circ - 8\lambda_1 - 6\lambda_2 \\ &= \frac{1 + (1-\lambda_2)^2}{4\lambda_1} - \frac{1}{2\lambda_1} - \frac{(1-\lambda_2)^2}{2\lambda_1} - 8\lambda_1 - 6\lambda_2 = -\frac{1 + (1-\lambda_2)^2}{4\lambda_1} - 8\lambda_1 - 6\lambda_2. \end{align*}\]
The dual problem is \(\underset{\lambda \ge 0}{\text{max}}\; q(\lambda)\). (One can verify concavity explicitly via \(\nabla q, \nabla^2 q\), or invoke the proof of the previous section.) Writing the maximization as a minimization of the negative,

\[\begin{equation*} \underset{\lambda \ge 0}{\text{max}}\; q(\lambda) = -\underset{\lambda \ge 0}{\text{min}}\; \left[ \frac{1 + (1-\lambda_2)^2}{4\lambda_1} + 8\lambda_1 + 6\lambda_2 \right], \qquad \lambda_1 \ge 0,\; \lambda_2 \ge 0. \end{equation*}\]

This inner minimization is itself a constrained problem, but it can be solved analytically. Introduce multipliers \(\mu_1, \mu_2\) for \(\lambda_1 \ge 0, \lambda_2 \ge 0\):

\[\begin{equation*} W(\lambda_1, \lambda_2, \mu_1, \mu_2) = \frac{1 + (1-\lambda_2)^2}{4\lambda_1} + 8\lambda_1 + 6\lambda_2 - \mu_1 \lambda_1 - \mu_2 \lambda_2, \end{equation*}\]

with the KKT conditions \(\mu_1, \mu_2 \ge 0\) and \(\mu_1 \lambda_1 = 0,\; \mu_2 \lambda_2 = 0\). Now \(\lambda_1 = 0\) is not a solution (as \(W\) blows up), so \(\mu_1 = 0\). Then \(\nabla_\lambda W = 0\) gives

\[\begin{align*} \frac{\partial W}{\partial \lambda_1} &= -\frac{1 + (1-\lambda_2)^2}{4\lambda_1^2} + 8 = 0, \\ \frac{\partial W}{\partial \lambda_2} &= \frac{2(\lambda_2 - 1)}{4\lambda_1} + 6 - \mu_2 = 0, \qquad \mu_2 \lambda_2 = 0. \end{align*}\]

Two cases:

\(\mu_2 > 0 \implies \lambda_2 = 0\). Then \(\frac{-2}{4\lambda_1^2} + 8 = 0 \implies \lambda_1 = \frac14\), and \(\frac{2(-1)}{4\lambda_1} + 6 - \mu_2 = -2 + 6 - \mu_2 = 0 \implies \mu_2 = 4 \ge 0\) — feasible.
\(\mu_2 = 0 \implies \frac{\lambda_2 - 1}{2\lambda_1} = -6 \implies \lambda_2 = 1 - 12\lambda_1\). Then \(1 + (1-\lambda_2)^2 = 32\lambda_1^2 \implies 1 + 144\lambda_1^2 = 32\lambda_1^2 \implies 112\lambda_1^2 + 1 = 0\) — not possible.

So the only KKT point is \(\begin{bmatrix} \lambda_1 \\ \lambda_2 \end{bmatrix} = \begin{bmatrix} 1/4 \\ 0 \end{bmatrix}\), which leads to

\[\begin{equation*} \underset{\lambda \ge 0}{\text{max}}\; q(\lambda) = -\left[ \frac{1+1}{1} + 8\cdot\tfrac14 + 0 \right] = -4 = q^\circ. \end{equation*}\]

This is the same as \(f^\circ = -4\), so we have strong duality. The recovered primal point is

\[\begin{equation*} \begin{bmatrix} x_1^\circ \\ x_2^\circ \end{bmatrix} = \begin{bmatrix} 1/(2\lambda_1) \\ (1 - \lambda_2)/(2\lambda_1) \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \quad \text{as before.} \end{equation*}\]