If a smooth function is globally PŁ and coercive, then it has a unique minimizer – Race to the bottom

Abstract

If a real-valued function \(f\) is twice continuously differentiable and coercive, and if it also satisfies the Polyak-Łojasiewicz condition globally, then it has a unique minimizer.

The fact

Consider a differentiable cost function \(f \colon \Rd \to \reals\), to be minimized. With \(\fmin := \inf_{x \in \Rd} f(x)\), let \[ S = \{ x \in \Rd : f(x) = \fmin \} \] denote the set of minimizers of \(f\).

On the one hand, we might assume that \(f\) is coercive (that is, its sublevel sets are compact). If so, then \(S\) is non-empty and compact.

On the other hand, we might assume that \(f\) satisfies the (global) Polyak-Łojasiewicz condition (PŁ) with parameter \(\mu > 0\), meaning \[\begin{align} f(x) - \fmin \leq \frac{1}{2\mu}\|\nabla f(x)\|^2 \tag{PŁ} \end{align}\] for all \(x \in \Rd\), where \(\nabla f\) is the gradient of \(f\). (This condition is also called gradient dominance.)

The PŁ condition has many convenient implications. First and foremost, we see that all critical points of \(f\) (points where \(\nabla f(x) = 0\)) are necessarily optimal. (In particular, local minimizers are in fact global minimizers.) Additionally, this condition is compatible with the possibility that \(f\) might have several minimizers. In fact, the minimizers may even be non-isolated. Going back to Polyak (1963), this has proved rather useful for the design and analysis of optimization algorithms, in part because PŁ can serve as a good replacement to strong convexity (which is restricted to cost functions with a unique minimizer).

From here, we may very well wonder: what can be said about functions \(f\) which are both coercive and PŁ?

A first observation follows from (Rebjock and Boumal 2024, Thm. 2.16), which implies the following (only using PŁ for now):

Proposition 1 If \(f\) is PŁ and twice continuously differentiable (\(C^2\)), then each connected component of \(S\) is an embedded submanifold of \(\Rd\) (without boundary), with regularity at least \(C^1\).

Thus, we see that for a \(C^2\) function \(f\), if it is PŁ and coercive, then its set of minimizers consists of one or more smooth manifolds without boundary contained in a compact set. What could this look like? Sets that come to mind include finite sets of points, circles, spheres and other compact manifolds.

In this post, we show that, in fact, the only possibility is that \(f\) has a unique minimizer.

Theorem 1 If \(f \colon \Rd \to \reals\) is \(C^2\) and coercive, and if it satisfies the PŁ condition globally, then it has a unique minimizer.

Proof. The steps of the proof are as follows:

Negative gradient flow on \(f\) provides a deformation retract from \(\Rd\) onto \(S\).
Since \(\Rd\) is contractible, this implies that \(S\) is contractible.
In particular, \(S\) is (simply) connected, so that it consists of a single embedded submanifold without boundary (by Proposition 1).
Thus, \(S\) is a non-empty, compact, contractible manifold without boundary in \(\Rd\).
As it turns out, all such manifolds are singletons.

The first and last steps deserve more context: see later in this post.

As a remark, notice that we only use coercivity to claim that \(S\) is compact (both PŁ and coercivity individually imply that \(S\) is non-empty). This is without loss of generality though. Indeed, if \(f\) satisfies PŁ globally, then \(f\) grows at least quadratically away from \(S\), that is, \(f(x) - \fmin \geq \frac{\mu}{2} \dist(x, S)^2\) (as per a classical argument (Otto and Villani 2000, Prop. 1’) repeated in (Rebjock and Boumal 2024, Lem. A.1) with further historical notes). Therefore, if also \(S\) is compact, then it must be that \(f\) is coercive.

The set of minimizers is contractible

We first want to argue that \(S\) is contractible (we won’t use coercivity yet, and it’s enough for \(f\) to be \(C^1\)). For good measure, let’s remember what that means.

Definition 1 Let \(X\) be a topological space. A continuous map \(F \colon X \times [0, 1] \to X\) is a deformation retract of \(X\) onto a topological subspace \(A \subset X\) if, for every \(x \in X\) and \(a \in A\), \[\begin{align*} F(x, 0) = x, && F(x, 1) \in A, && \textrm{ and } && F(a, 1) = a. \end{align*}\] We say \(X\) deformation retracts onto \(A\).

Definition 2 A topological space \(X\) is contractible if it deformation retracts onto a singleton.

To show that \(S\) is contractible, we show that the (negative) gradient flow of \(f\) defines a deformation retract from \(\Rd\) to \(S\). This is a well-known fact based on Łojasiewicz inequalities (Łojasiewicz 1963, Thm. 5), (Kurdyka 1998, Prop. 3). We take the opportunity to detail the arguments.

Since \(f\) is globally PŁ, we know negative gradient flow starting from any \(x_0 \in \Rd\) necessarily converges to a point in the solution set \(S\). More precisely, the curve \(t \mapsto x(t)\) which satisfies \[\begin{align*} \ddt x(t) = - \nabla f(x(t)) && \textrm{ and } && x(0) = x_0 \end{align*} \qquad(1)\] is defined for all times \(t \geq 0\), and \(x(\infty) := \lim_{t \to \infty} x(t)\) is in \(S\) (Łojasiewicz 1963).

Define a reparameterization \(t(s) = s/(1-s)\), which is strictly increasing and maps \([0, 1]\) to \([0, \infty]\). Define the map \(F \colon \Rd \times [0, 1] \to \Rd\) by \[ F(x_0, s) = x(t(s)) \] where \(t \mapsto x(t)\) is as in Eq. 1.

By the above properties, we have that \(F\) is well defined. Moreover, it is well known that \(F\) is continuous. This can be seen as a consequence of the continuity of the flow map, and the path-length bound provided by PŁ (see for example (Rebjock and Boumal 2024, Lem. A.1) for a classical argument about the path-length bound, or (Falconer 1983, Thm. 5.1) for a stronger result about the differentiability of \(x(\infty)\) with respect to \(x_0\)).

Moreover, for any \(x_0 \in \Rd\) we have \[\begin{align*} F(x_0, 0) = x(0) = x_0 && \textrm{ and } && F(x_0, 1) = x(\infty) \in S. \end{align*}\] Additionally, if \(x_0\) is in \(S\) then \(x(t) = x_0\) for all \(t \geq 0\) (points in \(S\) are fixed points of gradient flow), and hence \(F(x_0, 1) = x_0\) for all \(x_0 \in S\). We conclude that \(F\) is a deformation retract of \(\Rd\) onto \(S\).

Let \(\bar x\) be a point in \(S\) (it exists because \(f\) is PŁ). Note that \(\Rd\) deformation retracts onto \(\{\bar x\}\) via \(G \colon \Rd \times [0, 1] \to \Rd\) defined by \(G(x, t) = (1-t) x + t \bar x\). Therefore, \(S\) also deformation retracts onto \(\{\bar x\}\). Indeed, simply consider \(H \colon S \times [0, 1] \to S\) defined by \(H(x, t) = F(G(x, t), 1)\). We conclude the following.

Proposition 2 If \(f \colon \Rd \to \reals\) is \(C^1\) and it satisfies the PŁ condition globally, then its set of minimizers is contractible.

A contractible space is path-connected (\(0\)-connected) and simply connected (\(1\)-connected). More generally, a contractible space is \(k\)-connected for all \(k \geq 0\). For now, we mostly care that \(S\) is connected, as it combines well with Proposition 1 to imply that \(S\) consists of a single manifold (with a well defined dimension). Explicitly, we get this corollary:

Corollary 1 If \(f \colon \Rd \to \reals\) is \(C^2\) and PŁ, then \(S\) is a contractible (in particular, connected) embedded submanifold of \(\Rd\) (without boundary), with regularity at least \(C^1\).

We note in passing that if \(f\) is \(C^k\) with \(k \geq 2\) then \(S\) is \(C^{k-1}\) (and likewise with \(C^{\infty}\) and real-analytic): see (Rebjock and Boumal 2024, Thm. 2.16).

Manifolds that are compact and contractible contain at most one point

Let’s summarize what we know: under the assumptions of Theorem 1, \(S \subset \Rd\) is a non-empty, compact and contractible \(C^1\) manifold without boundary. What are such spaces? A point is certainly one example. What are other examples? A closed ball is not an example since it has a boundary. A sphere is also not an example since it is not contractible. It turns out single points are the only examples (and hence the proof of Theorem 1 is complete).

This is well known (see for example this subwiki). We sketch a proof here.

Proposition 3 Let \(M\) be a non-empty, compact and contractible topological manifold (without boundary). Then \(M\) is a point.

Proof. Let \(n\) denote the dimension of \(M\). Since \(M\) is contractible, it is simply connected, and so orientable (Hatcher 2002, Prop. 3.25). Therefore the top homology group of \(M\) is \(H_n(M) = \mathbb{Z}\) (Hatcher 2002, Thm. 3.26). Since \(M\) is contractible, it has the same homology groups as a point, because homology groups are invariant under homotopy equivalence (Hatcher 2002, sec. 2.1). The homology groups of a point are \(H_0 = \mathbb{Z}\) and \(H_k = 0\) for \(k \geq 1\). If \(n \geq 1\), we conclude that \(\mathbb{Z} = H_n(M) = 0\): a contradiction. Thus, \(n = 0\) and hence \(M\) is a collection of points. Since \(M\) is connected (by contractibility), \(M\) must be a single point.

References

Falconer, K. J. 1983. “Differentiation of the Limit Mapping in a Dynamical System.” Journal of the London Mathematical Society s2-27 (2): 356–72. https://doi.org/10.1112/jlms/s2-27.2.356.

Hatcher, A. 2002. Algebraic Topology. Cambridge University Press. https://pi.math.cornell.edu/~hatcher/AT/ATpage.html.

Kurdyka, K. 1998. “On Gradients of Functions Definable in o-Minimal Structures.” Annales de l’institut Fourier 48 (3): 769–83. https://doi.org/10.5802/aif.1638.

Łojasiewicz, S. 1963. “Une Propriété Topologique Des Sous-Ensembles Analytiques réels.” Les équations Aux dérivées Partielles 117: 87–89.

Otto, F., and C. Villani. 2000. “Generalization of an Inequality by Talagrand and Links with the Logarithmic Sobolev Inequality.” Journal of Functional Analysis 173 (2): 361–400.

Polyak, B. T. 1963. “Gradient Methods for the Minimisation of Functionals.” USSR Computational Mathematics and Mathematical Physics 3 (4): 864–78. https://doi.org/10.1016/0041-5553(63)90382-3.

Rebjock, Q., and N. Boumal. 2024. “Fast Convergence to Non-Isolated Minima: Four Equivalent Conditions for \(C^2\) Functions.” Mathematical Programming. https://doi.org/10.1007/s10107-024-02136-6.

Citation

BibTeX citation:

@online{criscitiello2025,
  author = {Criscitiello, Chris and Rebjock, Quentin and Boumal,
    Nicolas},
  title = {If a Smooth Function Is Globally {PŁ} and Coercive, Then It
    Has a Unique Minimizer},
  date = {2025-02-08},
  url = {www.racetothebottom.xyz/posts/PL-smooth-unique/},
  langid = {en},
  abstract = {If a real-valued function \$f\$ is twice continuously
    differentiable and coercive, and if it also satisfies the
    Polyak-Łojasiewicz condition globally, then it has a unique
    minimizer.}
}