# Differentiation

Generalization of derivatives to higher dimensions:

* limit of the difference quotient: partial derivatives,
* linearization: total derivative.

## Partial derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and $\mathbf{a} \in D$. If the limits exist, the partial derivatives of $f$ are

$$
\begin{align*}
&\partial_1 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1 + h, a_2) - f(\mathbf{a})}{h}, \\
&\partial_2 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1, a_2 + h) - f(\mathbf{a})}{h}.
\end{align*}
$$

*Theorem*: suppose that two mixed $n$th order partial derivatives of a function $f$ involve the same differentiations but in different orders. If those partials are continuous at a point $\mathbf{a}$, and if $f$ and all partials of $f$ of order less than $n$ are continuous in a neighbourhood of $\mathbf{a}$, then the two mixed partials are equal at $\mathbf{a}$. For $n=2$ this reads

$$
\partial_{12} f(\mathbf{a}) = \partial_{21} f(\mathbf{a}).
$$

??? note "*Proof*:"

    Will be added later.

## Total derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) and let $f: D \to \mathbb{R}$. To determine an affine linear approximation of $f$ around $\mathbf{a} \in D$,

$$
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle L,\; \mathbf{x} - \mathbf{a} \big\rangle,
$$

write $f(\mathbf{x}) = p(\mathbf{x}) + r(\mathbf{x})$ and demand $\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{a}\|} \to 0$ as $\mathbf{x} \to \mathbf{a}$. If an $L \in \mathbb{R}^2$ exists that satisfies this, then $f$ is called totally differentiable at $\mathbf{a}$.

*Theorem*: if $f$ is totally differentiable at $\mathbf{a}$, then $f$ is partially differentiable at $\mathbf{a}$ and the partial derivatives are

$$
\partial_1 f(\mathbf{a}) = L_1, \qquad \partial_2 f(\mathbf{a}) = L_2,
$$

so that

$$
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle \nabla f(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle,
$$

with $\nabla f(\mathbf{a})$ the gradient of $f$.

??? note "*Proof*:"

    Will be added later.

## Chain rule

*Theorem*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and let $\mathbf{x}: \mathbb{R} \to D$ be a differentiable curve. Define $g: \mathbb{R} \to \mathbb{R}$ by

$$
g(t) = f\big(\mathbf{x}(t)\big).
$$

If $f$ is continuously differentiable, then $g$ is differentiable with

$$
g'(t) = \big\langle \nabla f\big(\mathbf{x}(t)\big),\; \mathbf{\dot x}(t) \big\rangle.
$$

## Gradients

*Definition*: at any point $\mathbf{x} \in D$ where the first partial derivatives of $f$ exist, we define the gradient vector $\nabla f$ by

$$
\nabla f(\mathbf{x}) = \begin{pmatrix} \partial_1 f(\mathbf{x}) \\ \partial_2 f(\mathbf{x}) \end{pmatrix}.
$$

The direction of the gradient is the direction of steepest increase of $f$ at $\mathbf{x}$.
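As a quick numerical sanity check of $g'(t) = \langle \nabla f(\mathbf{x}(t)), \mathbf{\dot x}(t) \rangle$, the sketch below uses a hypothetical example $f(x, y) = x^2 y + \sin y$ along the curve $\mathbf{x}(t) = (\cos t, t^2)$ (neither appears in the text) and compares the chain-rule value against a finite difference of $g$:

```python
import math

# Hypothetical example: f(x, y) = x^2 * y + sin(y), curve x(t) = (cos t, t^2).
def f(x, y):
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Analytic gradient: (2xy, x^2 + cos y)
    return (2 * x * y, x * x + math.cos(y))

def curve(t):
    return (math.cos(t), t * t)

def curve_dot(t):
    # Componentwise derivative of the curve
    return (-math.sin(t), 2 * t)

def g(t):
    return f(*curve(t))

t = 0.8

# Chain rule: g'(t) = <grad f(x(t)), x'(t)>
gx, gy = grad_f(*curve(t))
vx, vy = curve_dot(t)
chain = gx * vx + gy * vy

# Central finite difference of g as an independent check
h = 1e-6
numeric = (g(t + h) - g(t - h)) / (2 * h)
assert abs(chain - numeric) < 1e-6
```

The two values agree up to the discretization error of the finite difference, which is all one can expect from a numerical check.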
*Theorem*: gradients are orthogonal to level lines and level surfaces.

??? note "*Proof*:"

    Let $\mathbf{r}(t) = \big(x(t),\; y(t) \big)^T$ be a parameterization of the level curve of $f$ such that $\mathbf{r}(0) = \mathbf{a}$. Then for all $t$ near $0$, $f(\mathbf{r}(t)) = f(\mathbf{a})$. Differentiating this equation with respect to $t$ using the chain rule, we obtain

    $$
    \partial_1 f\big(\mathbf{r}(t)\big) \dot x(t) + \partial_2 f\big(\mathbf{r}(t)\big) \dot y(t) = 0.
    $$

    At $t=0$ we can rewrite this as

    $$
    \big\langle \nabla f(\mathbf{a}),\; \mathbf{\dot r}(0) \big\rangle = 0,
    $$

    so $\nabla f$ is orthogonal to $\mathbf{\dot r}$.

## Directional derivatives

*Definition*: let $D \subseteq \mathbb{R}^n$, let $f: D \to \mathbb{R}$ and let $\mathbf{v} \in \mathbb{R}^n$ with $\|\mathbf{v}\| = 1$ be a unit vector. The directional derivative is the rate of change of $f$ near a point $\mathbf{a} \in D$ in the direction of $\mathbf{v}$:

$$
D_\mathbf{v} f(\mathbf{a}) = \big\langle \mathbf{v},\; \nabla f(\mathbf{a}) \big\rangle.
$$

## The general case

*Definition*: let $D \subseteq \mathbb{R}^n$ and let $\mathbf{f}: D \to \mathbb{R}^m$, with $f_i: D \to \mathbb{R}$, $i = 1, \dotsc, m$, the components of $\mathbf{f}$.

* $\mathbf{f}$ is continuous at $\mathbf{a} \in D$ $\iff$ all $f_i$ are continuous at $\mathbf{a}$,
* $\mathbf{f}$ is partially/totally differentiable at $\mathbf{a}$ $\iff$ all $f_i$ are partially/totally differentiable at $\mathbf{a}$.

For the linearization of each component $f_i$ we have

$$
f_i(\mathbf{x}) = f_i(\mathbf{a}) + \big\langle \nabla f_i(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle + r_i(\mathbf{x}),
$$

so in total we have

$$
\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a}) \big(\mathbf{x} - \mathbf{a}\big) + \mathbf{r}(\mathbf{x}),
$$

with $D\mathbf{f}(\mathbf{a})$ the Jacobian of $\mathbf{f}$.
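The defining property of the linearization is that the remainder $\mathbf{r}(\mathbf{x})$ vanishes faster than $\|\mathbf{x} - \mathbf{a}\|$. A minimal numerical sketch of this, using a hypothetical map $\mathbf{f}(x, y) = (x^2 y, \sin x + y)$ (an illustration, not from the text), checks that $\|\mathbf{r}(\mathbf{x})\| / \|\mathbf{x} - \mathbf{a}\|$ shrinks as $\mathbf{x} \to \mathbf{a}$:

```python
import math

# Hypothetical example map f: R^2 -> R^2, f(x, y) = (x^2 * y, sin(x) + y).
def f(x, y):
    return (x * x * y, math.sin(x) + y)

def Df(x, y):
    # Analytic Jacobian: row i holds the partial derivatives of f_i
    return [[2 * x * y, x * x],
            [math.cos(x), 1.0]]

a = (0.5, 1.2)
fa = f(*a)
J = Df(*a)

# r(x) = f(x) - f(a) - Df(a)(x - a); track ||r|| / ||x - a||
ratios = []
for h in (1e-2, 1e-3, 1e-4):
    x = (a[0] + h, a[1] + h)
    dx = (x[0] - a[0], x[1] - a[1])
    fx = f(*x)
    lin = [fa[i] + J[i][0] * dx[0] + J[i][1] * dx[1] for i in range(2)]
    r = math.hypot(fx[0] - lin[0], fx[1] - lin[1])
    ratios.append(r / math.hypot(*dx))

# For this smooth map the ratio is O(||x - a||), so each refinement shrinks it
assert ratios[0] > ratios[1] > ratios[2]
```

This is exactly the condition $\frac{\|\mathbf{r}(\mathbf{x})\|}{\|\mathbf{x} - \mathbf{a}\|} \to 0$ observed at a few discrete step sizes.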
*Definition*: the Jacobian is given by $\big[D\mathbf{f}(\mathbf{a}) \big]_{i,\;j} = \partial_j f_i(\mathbf{a})$.

### Chain rule

Let $D \subseteq \mathbb{R}^n$ and $E \subseteq \mathbb{R}^m$ be sets, let $\mathbf{f}: D \to \mathbb{R}^m$ and $\mathbf{g}: E \to \mathbb{R}^k$, with $\mathbf{f}$ differentiable at $\mathbf{x}$ and $\mathbf{g}$ differentiable at $\mathbf{f}(\mathbf{x})$. Then $D\mathbf{f}(\mathbf{x}) \in \mathbb{R}^{m \times n}$ and $D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) \in \mathbb{R}^{k \times m}$, and differentiating $\mathbf{g} \circ \mathbf{f}$ yields

$$
D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) D\mathbf{f}(\mathbf{x}).
$$

We have two interpretations:

* the composition of the linear maps,
* the multiplication of the Jacobian matrices.

??? note "*Proof*:"

    Will be added later.
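The matrix form of the chain rule can be checked numerically. The sketch below uses two hypothetical maps $\mathbf{f}, \mathbf{g}: \mathbb{R}^2 \to \mathbb{R}^2$ (chosen only for illustration), approximates all Jacobians by finite differences, and compares $D(\mathbf{g} \circ \mathbf{f})(\mathbf{x})$ with the product $D\mathbf{g}(\mathbf{f}(\mathbf{x}))\, D\mathbf{f}(\mathbf{x})$:

```python
import math

# Hypothetical maps: f(x, y) = (x*y, x + y^2), g(u, v) = (sin(u), u*v)
def f(x, y):
    return (x * y, x + y * y)

def g(u, v):
    return (math.sin(u), u * v)

def jacobian(func, point, h=1e-6):
    """Forward-difference Jacobian: [D func]_{i,j} = d func_i / d x_j."""
    base = func(*point)
    m, n = len(base), len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        shifted = list(point)
        shifted[j] += h
        bumped = func(*shifted)
        for i in range(m):
            J[i][j] = (bumped[i] - base[i]) / h
    return J

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = (0.7, -0.3)
Df = jacobian(f, x)                              # D f(x),        2 x 2
Dg = jacobian(g, f(*x))                          # D g(f(x)),     2 x 2
lhs = jacobian(lambda a, b: g(*f(a, b)), x)      # D (g o f)(x) directly
rhs = matmul(Dg, Df)                             # D g(f(x)) * D f(x)

for i in range(2):
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-3
```

Both sides agree up to the finite-difference error, illustrating that the Jacobian of the composition is the product of the Jacobians evaluated at the matching points.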