# Differentiation

Generalization of derivatives to higher dimensions:

* limit of the difference quotient: partial derivatives,
* linearization: total derivative.

## Partial derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and $\mathbf{a} \in D$. If the limits exist, the partial derivatives of $f$ are

$$
\begin{align*}
&\partial_1 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1 + h, a_2) - f(\mathbf{a})}{h}, \\
&\partial_2 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1, a_2 + h) - f(\mathbf{a})}{h}.
\end{align*}
$$

*Theorem*: suppose that two mixed $n$th order partial derivatives of a function $f$ involve the same differentiations but in different orders. If those partials are continuous at a point $\mathbf{a}$, and if $f$ and all partials of $f$ of order less than $n$ are continuous in a neighbourhood of $\mathbf{a}$, then the two mixed partials are equal at $\mathbf{a}$. For $n=2$ this reads

$$
\partial_{12} f(\mathbf{a}) = \partial_{21} f(\mathbf{a}).
$$

??? note "*Proof*:"

    Will be added later.

## Total derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) and let $f: D \to \mathbb{R}$. To determine an affine linear approximation of $f$ around $\mathbf{a} \in D$,

$$
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle L,\; \mathbf{x} - \mathbf{a} \big\rangle,
$$

write $f(\mathbf{x}) = p(\mathbf{x}) + r(\mathbf{x})$ and demand $\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{a}\|} \to 0$ as $\mathbf{x} \to \mathbf{a}$. If an $L \in \mathbb{R}^2$ exists that satisfies this, then $f$ is called totally differentiable at $\mathbf{a}$.

*Theorem*: if $f$ is totally differentiable at $\mathbf{a}$, then $f$ is partially differentiable at $\mathbf{a}$ and the partial derivatives are

$$
\partial_1 f(\mathbf{a}) = L_1, \qquad \partial_2 f(\mathbf{a}) = L_2,
$$

so that

$$
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle \nabla f(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle,
$$

with $\nabla f(\mathbf{a})$ the gradient of $f$.

??? note "*Proof*:"

    Will be added later.

## Chain rule

*Theorem*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and let $\mathbf{x}: \mathbb{R} \to D$ be a differentiable curve. Define $g: \mathbb{R} \to \mathbb{R}$ by

$$
g(t) = f\big(\mathbf{x}(t)\big).
$$

If $f$ is continuously differentiable, then $g$ is differentiable with

$$
g'(t) = \big\langle \nabla f\big(\mathbf{x}(t)\big),\; \mathbf{\dot x}(t) \big\rangle.
$$

## Gradients

*Definition*: at any point $\mathbf{x} \in D$ where the first partial derivatives of $f$ exist, we define the gradient vector $\nabla f$ by

$$
\nabla f(\mathbf{x}) = \begin{pmatrix} \partial_1 f(\mathbf{x}) \\ \partial_2 f(\mathbf{x}) \end{pmatrix}.
$$

The direction of the gradient is the direction of steepest increase of $f$ at $\mathbf{x}$.
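As a quick numerical sanity check of $g'(t) = \langle \nabla f(\mathbf{x}(t)), \mathbf{\dot x}(t) \rangle$, the sketch below uses a hypothetical example $f(x, y) = x^2 y + \sin y$ along the curve $\mathbf{x}(t) = (\cos t, t^2)$ (neither appears in the text) and compares the chain-rule value against a finite difference of $g$:

```python
import math

# Hypothetical example: f(x, y) = x^2 * y + sin(y), curve x(t) = (cos t, t^2).
def f(x, y):
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Analytic gradient: (2xy, x^2 + cos y)
    return (2 * x * y, x * x + math.cos(y))

def curve(t):
    return (math.cos(t), t * t)

def curve_dot(t):
    # Componentwise derivative of the curve
    return (-math.sin(t), 2 * t)

def g(t):
    return f(*curve(t))

t = 0.8

# Chain rule: g'(t) = <grad f(x(t)), x'(t)>
gx, gy = grad_f(*curve(t))
vx, vy = curve_dot(t)
chain = gx * vx + gy * vy

# Central finite difference of g as an independent check
h = 1e-6
numeric = (g(t + h) - g(t - h)) / (2 * h)
assert abs(chain - numeric) < 1e-6
```

The two values agree up to the discretization error of the finite difference, which is all one can expect from a numerical check.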
*Theorem*: gradients are orthogonal to level lines and level surfaces.

??? note "*Proof*:"

    Let $\mathbf{r}(t) = \big(x(t),\; y(t) \big)^T$ be a parameterization of the level curve of $f$ such that $\mathbf{r}(0) = \mathbf{a}$. Then for all $t$ near $0$, $f(\mathbf{r}(t)) = f(\mathbf{a})$. Differentiating this equation with respect to $t$ using the chain rule, we obtain

    $$
    \partial_1 f\big(\mathbf{r}(t)\big) \dot x(t) + \partial_2 f\big(\mathbf{r}(t)\big) \dot y(t) = 0.
    $$

    At $t=0$ we can rewrite this as

    $$
    \big\langle \nabla f(\mathbf{a}),\; \mathbf{\dot r}(0) \big\rangle = 0,
    $$

    so $\nabla f$ is orthogonal to $\mathbf{\dot r}$.

## Directional derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$, let $f: D \to \mathbb{R}$ and let $\mathbf{v} \in \mathbb{R}^n$ with $\|\mathbf{v}\| = 1$ be a unit vector. The directional derivative is the rate of change of $f$ near a point $\mathbf{a} \in D$ in the direction of $\mathbf{v}$:

$$
D_\mathbf{v} f(\mathbf{a}) = \big\langle \mathbf{v},\; \nabla f(\mathbf{a}) \big\rangle.
$$

## The general case

*Definition*: let $D \subseteq \mathbb{R}^n$ and let $\mathbf{f}: D \to \mathbb{R}^m$, with $f_i: D \to \mathbb{R}$, $i = 1, \dotsc, m$, the components of $\mathbf{f}$.

* $\mathbf{f}$ is continuous at $\mathbf{a} \in D$ $\iff$ all $f_i$ are continuous at $\mathbf{a}$,
* $\mathbf{f}$ is partially/totally differentiable at $\mathbf{a}$ $\iff$ all $f_i$ are partially/totally differentiable at $\mathbf{a}$.

For the linearization of each component $f_i$ we have

$$
f_i(\mathbf{x}) = f_i(\mathbf{a}) + \big\langle \nabla f_i(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle + r_i(\mathbf{x}),
$$

so in total we have

$$
\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a}) \big(\mathbf{x} - \mathbf{a}\big) + \mathbf{r}(\mathbf{x}),
$$

with $D\mathbf{f}(\mathbf{a})$ the Jacobian of $\mathbf{f}$.
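The defining property of the linearization is that the remainder $\mathbf{r}(\mathbf{x})$ vanishes faster than $\|\mathbf{x} - \mathbf{a}\|$. A minimal numerical sketch of this, using a hypothetical map $\mathbf{f}(x, y) = (x^2 y, \sin x + y)$ (an illustration, not from the text), checks that $\|\mathbf{r}(\mathbf{x})\| / \|\mathbf{x} - \mathbf{a}\|$ shrinks as $\mathbf{x} \to \mathbf{a}$:

```python
import math

# Hypothetical example map f: R^2 -> R^2, f(x, y) = (x^2 * y, sin(x) + y).
def f(x, y):
    return (x * x * y, math.sin(x) + y)

def Df(x, y):
    # Analytic Jacobian: row i holds the partial derivatives of f_i
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

a = (0.5, 1.2)
fa = f(*a)
J = Df(*a)

# r(x) = f(x) - f(a) - Df(a)(x - a); track ||r|| / ||x - a||
ratios = []
for h in (1e-2, 1e-3, 1e-4):
    x = (a[0] + h, a[1] + h)
    dx = (x[0] - a[0], x[1] - a[1])
    fx = f(*x)
    lin = [fa[i] + J[i][0] * dx[0] + J[i][1] * dx[1] for i in range(2)]
    r = math.hypot(fx[0] - lin[0], fx[1] - lin[1])
    ratios.append(r / math.hypot(*dx))

# For this smooth map the ratio is O(||x - a||), so each refinement shrinks it
assert ratios[0] > ratios[1] > ratios[2]
```

This is exactly the condition $\frac{\|\mathbf{r}(\mathbf{x})\|}{\|\mathbf{x} - \mathbf{a}\|} \to 0$ observed at a few discrete step sizes.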
*Definition*: the Jacobian is given by $\big[D\mathbf{f}(\mathbf{a}) \big]_{i,\;j} = \partial_j f_i(\mathbf{a})$.

### Chain rule

Let $D \subseteq \mathbb{R}^n$ and $E \subseteq \mathbb{R}^m$ be sets, let $\mathbf{f}: D \to \mathbb{R}^m$ and $\mathbf{g}: E \to \mathbb{R}^k$, with $\mathbf{f}$ differentiable at $\mathbf{x}$ and $\mathbf{g}$ differentiable at $\mathbf{f}(\mathbf{x})$. Then $D\mathbf{f}(\mathbf{x}) \in \mathbb{R}^{m \times n}$ and $D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) \in \mathbb{R}^{k \times m}$, and differentiating $\mathbf{g} \circ \mathbf{f}$ yields

$$
D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) D\mathbf{f}(\mathbf{x}).
$$

We have two interpretations:

* the composition of the linear maps,
* the multiplication of the Jacobian matrices.

??? note "*Proof*:"

    Will be added later.
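The matrix form of the chain rule can be checked numerically. The sketch below uses two hypothetical maps $\mathbf{f}, \mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^2$ (chosen only for illustration), approximates all Jacobians by finite differences, and compares $D(\mathbf{g} \circ \mathbf{f})(\mathbf{x})$ with the product $D\mathbf{g}(\mathbf{f}(\mathbf{x}))\, D\mathbf{f}(\mathbf{x})$:

```python
import math

# Hypothetical maps: f(x, y) = (x*y, x + y^2), g(u, v) = (sin(u), u*v)
def f(x, y):
    return (x * y, x + y * y)

def g(u, v):
    return (math.sin(u), u * v)

def jacobian(func, point, h=1e-6):
    """Forward-difference Jacobian: [D func]_{i,j} = d func_i / d x_j."""
    base = func(*point)
    m, n = len(base), len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        shifted = list(point)
        shifted[j] += h
        bumped = func(*shifted)
        for i in range(m):
            J[i][j] = (bumped[i] - base[i]) / h
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = (0.7, -0.3)
Df = jacobian(f, x)                              # D f(x),        2 x 2
Dg = jacobian(g, f(*x))                          # D g(f(x)),     2 x 2
lhs = jacobian(lambda a, b: g(*f(a, b)), x)      # D (g o f)(x) directly
rhs = matmul(Dg, Df)                             # D g(f(x)) * D f(x)

for i in range(2):
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-3
```

Both sides agree up to the finite-difference error, illustrating that the Jacobian of the composition is the product of the Jacobians evaluated at the matching points.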