Calculus of Optimization

V. The Basin

Interrogating the Floor: When Coordinates Become Terrain

It is tempting—perhaps too tempting—to rebrand the parameter vector \(x\) as the floor and the scalar objective \(y\) as altitude. The metaphor is seductively intuitive: hills, valleys, descent. But is this identification merely pedagogical, or is it mathematically defensible?

In classical calculus, \(x\) is an independent variable and \(y=f(x)\) its dependent image. The geometry is explicit: a curve embedded in \(\mathbb{R}^2\). Optimization, however, inverts the narrative. The coordinates \(x_i\) of a point \(x \in \mathbb{R}^n\) no longer index space; they are the space. The function \(y = L(x)\) does not describe motion through a landscape—it induces the landscape itself.

Under this inversion, it is not merely reasonable but precise to interpret \(x\) as the floor. Not because it is fixed, but because it is the only thing we are allowed to stand on. The loss \(y\) becomes altitude not in a Euclidean sense, but in an order-theoretic one: higher means worse, lower means better. The geometry is ordinal before it is metric.


The Gradient as Local Gravity

Once the floor is granted primacy, the gradient acquires physical meaning:

$$ \nabla L(x) $$

This is not a direction “toward the minimum” in any global sense. It is the direction of steepest local increase—straight uphill, the direction gravity works hardest against. Descent is therefore an act of compliance, not foresight.

In this framing, stochastic gradient descent is not an approximation to calculus; it is calculus stripped of its illicit privileges. Classical calculus assumes access to the full function \(L(x)\). SGD assumes only what the body assumes when walking in fog: contact with the ground and a sense of tilt.
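The fog-walking picture can be made concrete. Below is a minimal sketch (with a hypothetical quadratic loss and Gaussian noise, not anything from the text above): the walker never evaluates the loss at all, only a noisy local tilt, and still drifts toward the minimum.

```python
import random

# Hypothetical example: loss L(x) = (x - 3)^2, minimized at x = 3.
# The walker never sees L itself, only a noisy reading of its slope.
def noisy_gradient(x, target=3.0, noise=0.5):
    # True gradient of (x - target)^2 is 2*(x - target); add zero-mean noise.
    return 2.0 * (x - target) + random.gauss(0.0, noise)

def sgd(x0, lr=0.05, steps=500):
    x = x0
    for _ in range(steps):
        x -= lr * noisy_gradient(x)  # comply with the local tilt
    return x

print(sgd(10.0))  # hovers near 3.0 despite never evaluating the loss
```

Note that the loop uses only `noisy_gradient`: contact with the ground and a sense of tilt, nothing more.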


Why Stochasticity Does Not Break the Picture

The usual objection follows quickly: if \(y\) is noisy, sampled, or batch-dependent, how can it define a coherent altitude?

The answer is subtle but decisive. Altitude need not be stable to be navigable. What matters is not the absolute value of \(y\), but the directional derivative it induces on average.

$$ \mathbb{E}\left[\nabla \hat{L}(x)\right] = \nabla L(x) $$

Stochasticity perturbs height, not slope-in-expectation. The floor still tilts. The body still leans. Calculus survives—not as certainty, but in expectation.
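The unbiasedness claim is easy to check numerically. The sketch below (with a hypothetical dataset and a mean-squared loss, chosen for illustration) averages many minibatch gradients and compares them to the full-batch gradient at a fixed point.

```python
import random

# Hypothetical setup: full loss L(x) = mean_i (x - d_i)^2 over a dataset;
# a minibatch estimate uses only a sampled subset of the d_i.
data = [random.gauss(5.0, 2.0) for _ in range(1000)]

def full_gradient(x):
    # d/dx of mean_i (x - d_i)^2 is 2 * (x - mean(data)).
    return 2.0 * (x - sum(data) / len(data))

def minibatch_gradient(x, batch_size=10):
    batch = random.sample(data, batch_size)
    return 2.0 * (x - sum(batch) / len(batch))

x = 0.0
estimates = [minibatch_gradient(x) for _ in range(20000)]
avg = sum(estimates) / len(estimates)
print(avg, full_gradient(x))  # close to each other: the slope survives on average
```

Each individual estimate is noisy, but their average converges on the true gradient: height flickers, slope-in-expectation does not.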


The Collapse of the Distinction

At this point, the old distinction dissolves. What we once called “optimization algorithms” are revealed as dynamical systems generated by differential structure. SGD is Euler’s method with humility baked in. Learning rate is timestep. Noise is temperature.
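The Euler correspondence is literal, not rhetorical. Gradient flow is the ODE \(\dot{x} = -\nabla L(x)\), and its forward-Euler discretization with timestep \(h\) is exactly the gradient descent update. A minimal sketch, using the illustrative loss \(L(x) = x^2\):

```python
# Forward-Euler discretization of gradient flow dx/dt = -L'(x) for L(x) = x^2.
# The update x <- x - h * L'(x) is gradient descent with learning rate h.
def euler_gradient_flow(x0, h, steps):
    x = x0
    for _ in range(steps):
        x = x - h * 2.0 * x  # L'(x) = 2x; the timestep h doubles as the learning rate
    return x

# The exact flow gives x(t) = x0 * exp(-2t); Euler gives x0 * (1 - 2h)**steps,
# which approaches the exact solution as h shrinks.
print(euler_gradient_flow(1.0, 0.01, 100))
```

Replace the exact derivative with a sampled one and you have SGD; add noise deliberately and the temperature analogy follows.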

To think this way is not metaphorical excess. It is the gradual replacement of naïve spatial intuition with a representation that survives contact with uncertainty. Calculus does not compete with stochastic descent; it reappears inside it, chastened and local.

The floor is not smooth. The altitude flickers. And yet, step by step, the descent proceeds.