The Maximum Theorem


Nicolas Boumal


November 23, 2023

If an optimization problem depends on some parameters, then the optimal value is a function of those parameters. Sometimes, that function is continuous.

Consider a family of maximization problems, defined by some parameters. For example, \(\Gamma(t)\) is a set that depends on \(t \geq 0\), and we consider \[ \max_{x} f(x) \qquad \textrm{ subject to } \qquad x \in \Gamma(t). \] The Maximum Theorem (Bergé 1963, p116) (see it on google books) is a convenient set of statements about the continuity of the maximal value of that problem as a function of the parameter. In the example, the theorem requires \(\Gamma(t)\) to be compact and to vary continuously “in some sense”.

The full statement (which is purely topological) is technical and more general—see also the excellent Wikipedia article about it. It separates concerns between upper and lower semicontinuity, does not need a metric, and allows both \(f\) and \(\Gamma\) to vary with more than just a real parameter. I have found that the particular case below is often sufficient for my needs—see for example this post about retractions, where \(\Gamma(t)\) is (essentially) a product of two balls of radius \(t\). The proof below is that of Berge, only slightly simplified for this particular case.

Definition 1 A correspondence from \(A\) to \(B\) maps each element of \(A\) to a subset of \(B\).

Theorem 1 (Maximum Theorem) Let \(f \colon \calM \to \reals\) be continuous on a metric space \(\calM\). Let \(\Gamma\) be a correspondence from \([0, \infty)\) to \(\calM\) such that

  • \(\Gamma(t)\) is a compact subset of \(\calM\) for all \(t \geq 0\);
  • For all \(t > 0\), \(x \in \Gamma(t)\) and \(\delta > 0\), there exists \(x'\) and \(\underline{t} < t\) such that \(\dist(x, x') < \delta\) and \(x' \in \Gamma(t')\) for all \(\underline{t} \leq t' \leq t\); and
  • For all \(t \geq 0\), if \(\calU\) is a neighborhood of \(\Gamma(t)\) then there exists \(\overline{t} > t\) such that \(\Gamma(t') \subseteq \calU\) for all \(t \leq t' \leq \overline{t}\).

Then, \(F(t) = \max_{x \in \Gamma(t)} f(x)\) is continuous on \([0, \infty)\).

Proof. We prove upper and lower semicontinuity separately. For both, pick \(\varepsilon > 0\) and \(t \geq 0\).

If \(t = 0\), let \(\underline{t} = 0\). Now assume \(t > 0\). Choose \(x \in \Gamma(t)\) such that \(F(t) \leq f(x) + \varepsilon\) (we could also pick \(x\) such that \(f(x) = F(t)\) since \(f\) is continuous and \(\Gamma(t)\) is compact, but this is not needed for this part of the proof). Since \(f\) is continuous, there exists \(\delta > 0\) such that \[\begin{align} \dist(x, x') < \delta \qquad \implies \qquad f(x') \geq f(x) - \varepsilon \geq F(t) - 2\varepsilon. \end{align}\] Pick such a point \(x'\) and \(\underline{t} < t\) in order to have \(x' \in \Gamma(t')\) for all \(\underline{t} \leq t' \leq t\). Then, \[\begin{align} F(t') \geq f(x') \geq F(t) - 2\varepsilon, \end{align}\] and this holds for all \(t' \in [\underline{t}, t]\).

Now for the other direction, fix \(t \geq 0\). Notice that for each \(x \in \Gamma(t)\) there exists \(\delta_x > 0\) such that \[\begin{align} \dist(x, x') < \delta_x \qquad \implies \qquad f(x') \leq f(x) + \varepsilon. \end{align}\] This provides an open ball around each \(x \in \Gamma(t)\). Since \(\Gamma(t)\) is compact, we can select a finite number of points \(x_1, \ldots, x_n \in \Gamma(t)\) whose open balls cover \(\Gamma(t)\). The union of these \(n\) balls is an open neighborhood of \(\Gamma(t)\), hence there exists \(\overline{t} > t\) such that for all \(t \leq t' \leq \overline{t}\) the set \(\Gamma(t')\) is included in the union of the balls. Consequently, \[\begin{align} F(t') \leq \max_{i=1,\ldots, n} \left[ \max_{x': \dist(x_i, x') < \delta_{x_i}} f(x') \right] \leq \max_{i=1,\ldots,n} f(x_i) + \varepsilon \leq F(t) + \varepsilon. \end{align}\] This holds for all \(t' \in [t, \overline{t}]\).

Combined, we have shown that \(|F(t) - F(t')| \leq 2\varepsilon\) for all \(t' \in [\underline{t}, \overline{t}]\), i.e., \(F\) is continuous.

A couple of comments about reading Berge’s book:

Another reference for such results is the book by Bonnans and Shapiro (2000, Prop. 4.4).

If you are interested in the differential of the maximum value, check out Danskin’s theorem.


Bergé, C. 1963. Topological Spaces: Including a Treatment of Multi-Valued Functions, Vector Spaces, and Convexity. Oliver; Boyd, Ltd.
Bonnans, J. F., and A. Shapiro. 2000. Perturbation Analysis of Optimization Problems. Springer New York.