> *Definition 1*: two subspaces $S$ and $T$ of an inner product space $V$ are **orthogonal** if
>
> $$
> \langle \mathbf{u}, \mathbf{v} \rangle = 0,
> $$
>
> for all $\mathbf{u} \in S$ and $\mathbf{v} \in T$. Orthogonality of $S$ and $T$ may be denoted by $S \perp T$.
The notion of orthogonality is only defined in vector spaces equipped with an inner product.
> *Definition 2*: let $S$ be a subspace of an inner product space $V$. The set of all vectors in $V$ that are orthogonal to every vector in $S$ is denoted by $S^\perp$, that is
>
> $$
> S^\perp = \{\mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \; \forall \mathbf{u} \in S \}.
> $$
>
> The set $S^\perp$ is called the **orthogonal complement** of $S$.
For example, the subspaces $X = \mathrm{span}(\mathbf{e}_1)$ and $Y = \mathrm{span}(\mathbf{e}_2)$ of $\mathbb{R}^3$ are orthogonal, but they are not orthogonal complements. Indeed, $X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \neq Y$ and $Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3) \neq X$.
We may observe that if $S$ and $T$ are orthogonal subspaces of an inner product space $V$, then $S \cap T = \{\mathbf{0}\}$: for $\mathbf{v} \in S \cap T$ with $S \perp T$ we have $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ and hence $\mathbf{v} = \mathbf{0}$.
Additionally, we may observe that if $S$ is a subspace of an inner product space $V$, then $S^\perp$ is also a subspace of $V$. For $\mathbf{u} \in S^\perp$ and $a \in \mathbb{K}$ we have
$$
\langle a \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0
$$
for all $\mathbf{v} \in S$, therefore $a \mathbf{u} \in S^\perp$. Likewise, for $\mathbf{u}_1, \mathbf{u}_2 \in S^\perp$ we have $\langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0$ for all $\mathbf{v} \in S$, so $\mathbf{u}_1 + \mathbf{u}_2 \in S^\perp$ and $S^\perp$ is indeed a subspace.
Let $V = \mathbb{R}^n$ be a Euclidean inner product space with its inner product defined by the [scalar product](../inner-product-spaces/#euclidean-inner-product-spaces). With this inner product on $V$ the following theorem may be posed.
> *Theorem 1*: let $A$ be an $m \times n$ matrix, then
>
> $$
> N(A) = R(A^T)^\perp,
> $$
>
> and
>
> $$
> N(A^T) = R(A)^\perp,
> $$
>
> for all $A \in \mathbb{R}^{m \times n}$ with $R(A)$ denoting the column space of $A$ and $R(A^T)$ denoting the row space of $A$.
??? note "*Proof*:"
Let $A \in \mathbb{R}^{m \times n}$ with columns $\mathbf{a}_1, \dots, \mathbf{a}_n \in \mathbb{R}^m$ spanning the column space $R(A)$ and rows $\mathbf{\vec{a}}_1, \dots, \mathbf{\vec{a}}_m$ whose transposes span the row space $R(A^T)$.
For the first equation, let $\mathbf{v} \in R(A^T)^\perp$, then $\langle \mathbf{v}, \mathbf{\vec{a}}_i^T \rangle = \mathbf{\vec{a}}_i \mathbf{v} = 0$ for $i \in \mathbb{N}[i \leq m]$, which obtains
$$
A \mathbf{v} = \begin{pmatrix} \mathbf{\vec{a}}_1 \mathbf{v} \\ \vdots \\ \mathbf{\vec{a}}_m \mathbf{v} \end{pmatrix} = \mathbf{0},
$$
and hence $\mathbf{v} \in N(A)$, which implies that $R(A^T)^\perp \subseteq N(A)$. Conversely, let $\mathbf{w} \in N(A)$, then $A \mathbf{w} = \mathbf{0}$, so $\mathbf{\vec{a}}_i \mathbf{w} = 0$ for each $i$ and therefore $\mathbf{w}$ is orthogonal to every linear combination of the rows of $A$, that is $\mathbf{w} \in R(A^T)^\perp$. This implies $N(A) \subseteq R(A^T)^\perp$ and therefore $N(A) = R(A^T)^\perp$.
For the second equation, let $\mathbf{v} \in R(A)^\perp$, then $\langle \mathbf{v}, \mathbf{a}_i \rangle = \mathbf{a}_i^T \mathbf{v} = 0$ for $i \in \mathbb{N}[i \leq n]$, so $A^T \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A^T)$, which implies that $R(A)^\perp \subseteq N(A^T)$. Conversely, let $\mathbf{w} \in N(A^T)$, then $A^T \mathbf{w} = \mathbf{0}$, so $\mathbf{w}$ is orthogonal to every column of $A$ and hence to all of $R(A)$, that is $\mathbf{w} \in R(A)^\perp$, which implies that $N(A^T) \subseteq R(A)^\perp$. Therefore $N(A^T) = R(A)^\perp$.
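As a small numerical sketch of theorem 1 (using NumPy; the matrix below is an arbitrary example, not one from the text), a basis of $N(A)$ can be read off from the right singular vectors of $A$ and checked to be orthogonal to every row of $A$.

```python
import numpy as np

# Arbitrary example matrix A in R^{2 x 3}.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# The right singular vectors belonging to (numerically) zero singular
# values span the null space N(A).
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T          # columns span N(A)

# Every vector in N(A) is orthogonal to every row of A,
# illustrating N(A) = R(A^T)^perp.
print(np.allclose(A @ null_basis, 0.0))   # True
```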
Theorem 1 is known as the fundamental theorem of linear algebra and can be used to prove the following theorem.
> *Theorem 2*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> \dim S + \dim S^\perp = n.
> $$
>
> Furthermore, if $\{\mathbf{v}_i\}_{i=1}^r$ is a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ is a basis of $S^\perp$ then $\{\mathbf{v}_i\}_{i=1}^n$ is a basis of $V$.
??? note "*Proof*:"
If $S = \{\mathbf{0}\}$, then $S^\perp = V$ and
$$
\dim S + \dim S^\perp = 0 + n = n.
$$
If $S \neq \{\mathbf{0}\}$, then let $\{\mathbf{x}_i\}_{i=1}^r$ be a basis of $S$ and define $X \in \mathbb{R}^{r \times n}$ whose $i$th row is $\mathbf{x}_i^T$ for each $i$. Matrix $X$ has rank $r$ and $R(X^T) = S$. Then by theorem 1
$$
S^\perp = R(X^T)^\perp = N(X),
$$
and by the rank-nullity theorem $\dim S^\perp = \dim N(X) = n - r$, so that $\dim S + \dim S^\perp = r + (n - r) = n$.
For the second claim, suppose that $c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0}$. Let $\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r \in S$ and let $\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n \in S^\perp$. Then we have
$$
\mathbf{u} + \mathbf{w} = \mathbf{0},
$$
implies $\mathbf{u} = - \mathbf{w}$ and thus both vectors must be in $S \cap S^\perp$. However, $S \cap S^\perp = \{\mathbf{0}\}$, therefore $\mathbf{u} = \mathbf{w} = \mathbf{0}$. Since $\{\mathbf{v}_i\}_{i=1}^r$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ are linearly independent, all coefficients $c_i$ must be zero, so $\{\mathbf{v}_i\}_{i=1}^n$ is linearly independent and, consisting of $n$ vectors, forms a basis of $V$.
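The dimension count of theorem 2 can be checked numerically in the same spirit (a NumPy sketch; $X$ below is an arbitrary example whose rows span a subspace $S$ of $\mathbb{R}^4$).

```python
import numpy as np

# Arbitrary X whose rows span S = R(X^T), a subspace of R^4.
X = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
n = X.shape[1]

# dim S is the rank of X; S^perp = N(X) by theorem 1, and a basis of
# N(X) consists of the right singular vectors with zero singular value.
_, s, Vt = np.linalg.svd(X)
dim_S = int(np.sum(s > 1e-12))
dim_S_perp = Vt[dim_S:].shape[0]   # number of basis vectors of N(X)

print(dim_S + dim_S_perp == n)     # True: dim S + dim S^perp = n
```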
We may further extend this with the notion of a direct sum.
> *Definition 3*: if $U$ and $V$ are subspaces of a vector space $W$ and each $\mathbf{w} \in W$ can be written uniquely as
>
> $$
> \mathbf{w} = \mathbf{u} + \mathbf{v},
> $$
>
> with $\mathbf{u} \in U$ and $\mathbf{v} \in V$, then $W$ is a **direct sum** of $U$ and $V$, denoted by $W = U \oplus V$.
The following theorem states that a subspace and its orthogonal complement together make up the whole vector space as a direct sum, which extends theorem 2.
> *Theorem 3*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> V = S \oplus S^\perp.
> $$
??? note "*Proof*:"
Will be added later.
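Although the proof is deferred, the decomposition of theorem 3 is easy to compute: projecting $\mathbf{w}$ orthogonally onto $S$ gives the component $\mathbf{u} \in S$, and the remainder lies in $S^\perp$. A NumPy sketch follows; the subspace $S$ and the vector $\mathbf{w}$ are arbitrary examples.

```python
import numpy as np

# S is spanned by the columns of X; an orthonormal basis Q of S is
# obtained from the (reduced) QR factorization, and P = Q Q^T is the
# orthogonal projector onto S.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Q, _ = np.linalg.qr(X)
P = Q @ Q.T

w = np.array([1.0, 2.0, 3.0, 4.0])
u = P @ w          # component of w in S
v = w - u          # component of w in S^perp

print(np.allclose(u + v, w))     # True: w = u + v
print(np.isclose(u @ v, 0.0))    # True: u is orthogonal to v
```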
The following results emerge from these posed theorems.
Recall that the system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b} \in R(A)$. Since $R(A) = N(A^T)^\perp$, we have the following result.
> *Proposition 2*: let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then either there is a vector $\mathbf{x} \in \mathbb{R}^n$ such that
>
> $$
> A \mathbf{x} = \mathbf{b},
> $$
>
> or there is a vector $\mathbf{y} \in \mathbb{R}^m$ such that
>
> $$
> A^T \mathbf{y} = \mathbf{0} \quad \text{and} \quad \mathbf{y}^T \mathbf{b} \neq 0,
> $$
>
> but not both.
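A numerical illustration of this alternative (a NumPy sketch; the matrix $A$ and vector $\mathbf{b}$ below are arbitrary examples chosen so that $A \mathbf{x} = \mathbf{b}$ is inconsistent):

```python
import numpy as np

# For this A and b the system A x = b is inconsistent (1 + 1 != 3),
# so some y in N(A^T) must satisfy y^T b != 0.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])

# A basis vector of N(A^T) from the SVD of A^T.
_, s, Vt = np.linalg.svd(A.T)
rank = int(np.sum(s > 1e-12))
y = Vt[rank]                        # spans N(A^T)

print(np.allclose(A.T @ y, 0.0))    # True: y lies in N(A^T)
print(not np.isclose(y @ b, 0.0))   # True: y^T b != 0
```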
> *Theorem 4*: if $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$, then $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent.
For an orthogonal matrix $Q \in \mathbb{R}^{n \times n}$, that is a square matrix whose columns form an orthonormal set, multiplication preserves the Euclidean inner product: $\langle Q \mathbf{u}, Q \mathbf{v} \rangle = \mathbf{u}^T Q^T Q \mathbf{v} = \langle \mathbf{u}, \mathbf{v} \rangle$. In particular, if $\mathbf{u} = \mathbf{v}$ then $\|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2$ and hence $\|Q \mathbf{u}\| = \|\mathbf{u}\|$: multiplication by an orthogonal matrix preserves the lengths of vectors.
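A quick check of this length-preserving property (a NumPy sketch; the orthogonal matrix $Q$ is generated here from the QR factorization of a random matrix):

```python
import numpy as np

# Q from the QR factorization of a random square matrix is orthogonal,
# so multiplication by Q should preserve Euclidean lengths.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
u = rng.standard_normal(4)

print(np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))   # True
```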
## Orthogonalization process
Let $\{\mathbf{a}_i\}_{i=1}^n$ be a basis of an inner product space $V$. We may use the modified method of Gram-Schmidt to determine the orthonormal basis $\{\mathbf{q}_i\}_{i=1}^n$ of $V$.
Let $\mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1$ be the first step.
Then we may induce the following step for $i \in \mathrm{range}(2,n)$: subtract from $\mathbf{a}_i$ its components along the previously constructed vectors, one at a time,
$$
\mathbf{a}_i \leftarrow \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_j \rangle \mathbf{q}_j, \quad \text{for } j \in \mathrm{range}(1, i-1),
$$
and then normalize the result,
$$
\mathbf{q}_i = \frac{1}{\|\mathbf{a}_i\|} \mathbf{a}_i.
$$
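The process above may be written out as follows (a NumPy sketch using the Euclidean inner product; the function name `modified_gram_schmidt` and the example matrix are illustrative, not from the text).

```python
import numpy as np

def modified_gram_schmidt(A):
    # Orthonormalize the (linearly independent) columns of A with the
    # modified Gram-Schmidt process under the Euclidean inner product.
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    Q[:, 0] = A[:, 0] / np.linalg.norm(A[:, 0])
    for i in range(1, n):
        a = A[:, i]
        # Subtract the components along q_1, ..., q_{i-1} one at a time,
        # each time using the already updated vector a (the modified variant).
        for j in range(i):
            a = a - (Q[:, j] @ a) * Q[:, j]
        Q[:, i] = a / np.linalg.norm(a)
    return Q

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Q = modified_gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: the columns are orthonormal
```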
## Least squares problems

A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points. This implies that the sum of squared errors between the model and the data points is minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations.
For a system of equations $A \mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{m \times n}$, $m, n \in \mathbb{N}[m>n]$ and $\mathbf{b} \in \mathbb{R}^m$, a *residual* $\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m$ can be formed for each $\mathbf{x} \in \mathbb{R}^n$ by
$$
\mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}.
$$
The distance between $\mathbf{b}$ and $A \mathbf{x}$ is given by
$$
\| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|.
$$
We wish to find a vector $\mathbf{x} \in \mathbb{R}^n$ for which $\|\mathbf{r}(\mathbf{x})\|$ will be a minimum. A solution $\mathbf{\hat x}$ that minimizes $\|\mathbf{r}(\mathbf{x})\|$ is a *least squares solution* of the system $A \mathbf{x} = \mathbf{b}$. Do note that minimizing $\|\mathbf{r}(\mathbf{x})\|$ is equivalent to minimizing $\|\mathbf{r}(\mathbf{x})\|^2$.
A vector $\mathbf{\hat x}$ minimizes $\|\mathbf{r}(\mathbf{x})\|$ if and only if the residual $\mathbf{b} - A \mathbf{\hat x}$ is orthogonal to $R(A) = N(A^T)^\perp$, that is $A^T (\mathbf{b} - A \mathbf{\hat x}) = \mathbf{0}$, which yields the *normal equations* $A^T A \mathbf{x} = A^T \mathbf{b}$.
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$ and let $\mathbf{v}$ be a solution of
$$
A^T A \mathbf{x} = \mathbf{0},
$$
then $A \mathbf{v} \in N(A^T)$, but we also have that $A \mathbf{v} \in R(A) = N(A^T)^\perp$. Since $N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}$ it follows that
$$
A\mathbf{v} = \mathbf{0},
$$
so $\mathbf{v} = \mathbf{0}$, since $A$ has rank $n$ and therefore $N(A) = \{\mathbf{0}\}$. The matrix $A^T A$ is therefore nonsingular.
It follows that
$$
\mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b},
$$
is the unique solution of the normal equations and, consequently, the unique least squares solution of the system $A \mathbf{x} = \mathbf{b}$.
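A concluding NumPy sketch of the least squares solution via the normal equations (the data $A$ and $\mathbf{b}$ are arbitrary examples; in practice a QR- or SVD-based routine such as `np.linalg.lstsq` is preferred for numerical stability, but it returns the same minimizer here):

```python
import numpy as np

# Arbitrary overdetermined system (m > n) with A of full column rank.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

# Least squares solution from the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_hat, x_lstsq))   # True
```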