Orthogonality

Orthogonal subspaces

Definition 1: two subspaces S and T of an inner product space V are orthogonal if

\langle \mathbf{u}, \mathbf{v} \rangle = 0,

for all \mathbf{u} \in S and \mathbf{v} \in T. Orthogonality of S and T may be denoted by S \perp T.

The notion of orthogonality is only valid in vector spaces with a defined inner product.

Definition 2: let S be a subspace of an inner product space V. The set of all vectors in V that are orthogonal to every vector in S is denoted by S^\perp, that is,

S^\perp = \{ \mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \;\; \forall \mathbf{u} \in S \}.

The set S^\perp is called the orthogonal complement of S.

For example, the subspaces X = \mathrm{span}(\mathbf{e}_1) and Y = \mathrm{span}(\mathbf{e}_2) of \mathbb{R}^3 are orthogonal, but they are not orthogonal complements. Indeed,

X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \quad \text{and} \quad Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3).

We may observe that if S and T are orthogonal subspaces of an inner product space V, then S \cap T = \{\mathbf{0}\}: for \mathbf{v} \in S \cap T, the orthogonality of S and T gives \langle \mathbf{v}, \mathbf{v} \rangle = 0 and hence \mathbf{v} = \mathbf{0}.

Additionally, we may observe that if S is a subspace of an inner product space V, then S^\perp is also a subspace of V. Indeed, for \mathbf{u} \in S^\perp and a \in \mathbb{K} we have

\langle a \mathbf{u}, \mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0

for all \mathbf{v} \in S, therefore a \mathbf{u} \in S^\perp.

If \mathbf{u}_1, \mathbf{u}_2 \in S^\perp then

\langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0 + 0 = 0,

for all \mathbf{v} \in S, and hence \mathbf{u}_1 + \mathbf{u}_2 \in S^\perp. Therefore S^\perp is a subspace of V.

Fundamental subspaces

Let V = \mathbb{R}^n be a Euclidean inner product space with the inner product defined by the scalar product. With this definition of the inner product on V, the following theorem may be posed.

Theorem 1: let A be an m \times n matrix, then

N(A) = R(A^T)^\perp,

and

N(A^T) = R(A)^\perp,

where R(A) denotes the column space of A and R(A^T) denotes the row space of A.

??? note "Proof:"

Let $A \in \mathbb{R}^{m \times n}$ have columns $\mathbf{a}_i$ for $i \in \mathbb{N}[i \leq n]$ and rows $\mathbf{\vec{a}}_i$ for $i \in \mathbb{N}[i \leq m]$, so that $R(A) = \mathrm{span}(\mathbf{a}_1, \dots, \mathbf{a}_n)$ is the column space of $A$ and $R(A^T) = \mathrm{span}(\mathbf{\vec{a}}_1^T, \dots, \mathbf{\vec{a}}_m^T)$ is the row space of $A$.

For the first equation, let $\mathbf{v} \in R(A^T)^\perp$, then $\mathbf{v}^T \mathbf{\vec{a}}_i^T = 0$ for each $i$, which obtains

$$
    0 = \mathbf{v}^T \mathbf{\vec{a}}_i^T = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{\vec{a}}_i \mathbf{v},
$$

so $A \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A)$, which implies that $R(A^T)^\perp \subseteq N(A)$. Conversely, let $\mathbf{w} \in N(A)$, then $A \mathbf{w} = \mathbf{0}$, which obtains

$$
    0 = \mathbf{\vec{a}}_i \mathbf{w} = \big(\mathbf{\vec{a}}_i \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{\vec{a}}_i^T,
$$

and hence $\mathbf{w} \in R(A^T)^\perp$, which implies that $N(A) \subseteq R(A^T)^\perp$. Therefore $N(A) = R(A^T)^\perp$.

For the second equation, let $\mathbf{v} \in R(A)^\perp$, then $\mathbf{v}^T \mathbf{a}_i = 0$ for each $i$, which obtains

$$
    0 = \mathbf{v}^T \mathbf{a}_i = \big(\mathbf{v}^T \mathbf{a}_i \big)^T = \mathbf{a}_i^T \mathbf{v},
$$

so $A^T \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A^T)$, which implies that $R(A)^\perp \subseteq N(A^T)$. Conversely, let $\mathbf{w} \in N(A^T)$, then $A^T \mathbf{w} = \mathbf{0}$, which obtains

$$
    0 = \mathbf{a}_i^T \mathbf{w} = \big(\mathbf{a}_i^T \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{a}_i,
$$

and hence $\mathbf{w} \in R(A)^\perp$, which implies that $N(A^T) \subseteq R(A)^\perp$. Therefore $N(A^T) = R(A)^\perp$.
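
As a numerical illustration, the following sketch (assuming `numpy` and `scipy` are available; the matrix below is an arbitrary example) verifies that the null space and the row space of a matrix are orthogonal to each other and have complementary dimensions.

```python
import numpy as np
from scipy.linalg import null_space, orth

# A 3 x 4 matrix of rank 2 (the third row is the sum of the first two).
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 1.0],
              [1.0, 3.0, 1.0, 2.0]])

N = null_space(A)   # columns form an orthonormal basis of N(A)
R = orth(A.T)       # columns form an orthonormal basis of the row space R(A^T)

# Every basis vector of N(A) is orthogonal to every basis vector of R(A^T).
print(np.allclose(R.T @ N, 0.0))              # True
# The dimensions add up to n, as expected for orthogonal complements in R^n.
print(N.shape[1] + R.shape[1] == A.shape[1])  # True
```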

Theorem 1 is known as the fundamental theorem of linear algebra. It can be used to prove the following theorem.

Theorem 2: if S is a subspace of the inner product space V = \mathbb{R}^n, then

\dim S + \dim S^\perp = n.

Furthermore, if \{\mathbf{v}_i\}_{i=1}^r is a basis of S and \{\mathbf{v}_i\}_{i=r+1}^n is a basis of S^\perp then \{\mathbf{v}_i\}_{i=1}^n is a basis of V.

??? note "Proof:"

If $S = \{\mathbf{0}\}$, then $S^\perp = V$ and 

$$
    \dim S + \dim S^\perp = 0 + n = n.
$$

If $S \neq \{\mathbf{0}\}$, then let $\{\mathbf{x}_i\}_{i=1}^r$ be a basis of $S$ and define $X \in \mathbb{R}^{r \times n}$ whose $i$th row is $\mathbf{x}_i^T$ for each $i$. The matrix $X$ has rank $r$ and $R(X^T) = S$. Then by theorem 1

$$
    S^\perp = R(X^T)^\perp = N(X),
$$

from the [rank nullity theorem](../vector-spaces/#rank-and-nullity) it follows that

$$
    \dim S^\perp = \dim N(X) = n - r,
$$

and therefore 

$$
    \dim S + \dim S^\perp = r + n - r = n.
$$

Let $\{\mathbf{v}_i\}_{i=1}^r$ be a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ be a basis of $S^\perp$. Suppose that 

$$
    c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r + c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n = \mathbf{0}.
$$

Let $\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r$ and let $\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n$. Then we have

$$
    \mathbf{u} + \mathbf{w} = \mathbf{0},
$$

which implies $\mathbf{u} = - \mathbf{w}$. Since $\mathbf{u} \in S$ and $-\mathbf{w} \in S^\perp$, both $\mathbf{u}$ and $\mathbf{w}$ must lie in $S \cap S^\perp$. However, $S \cap S^\perp = \{\mathbf{0}\}$, therefore

$$
\begin{align*}
    c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r &= \mathbf{0}, \\
    c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n &= \mathbf{0},
\end{align*}
$$

since $\{\mathbf{v}_i\}_{i=1}^r$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ are each linearly independent, all coefficients $c_i$ must vanish. Hence $\{\mathbf{v}_i\}_{i=1}^n$ is linearly independent and, consisting of $n$ vectors, forms a basis of $V$.
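
Theorem 2 may also be checked numerically. The sketch below (assuming `numpy` and `scipy` are available; the subspace S is an arbitrary example) confirms that the dimensions add up to n and that the two bases together form a basis of \mathbb{R}^4.

```python
import numpy as np
from scipy.linalg import null_space

# S is spanned by the two rows of X, so S = R(X^T) and S^perp = N(X).
X = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])

basis_S = X.T                    # columns span S
basis_S_perp = null_space(X)     # columns form an orthonormal basis of S^perp

# dim S + dim S^perp = n
print(basis_S.shape[1] + basis_S_perp.shape[1] == 4)   # True

# The combined vectors are linearly independent, hence a basis of R^4.
B = np.hstack([basis_S, basis_S_perp])
print(np.linalg.matrix_rank(B) == 4)                   # True
```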

We may further extend this with the notion of a direct sum.

Definition 3: if U and V are subspaces of a vector space W and each \mathbf{w} \in W can be written uniquely as

\mathbf{w} = \mathbf{u} + \mathbf{v},

with \mathbf{u} \in U and \mathbf{v} \in V, then W is the direct sum of U and V, denoted by W = U \oplus V.

In the following theorem it will be posed that the direct sum of a subspace and its orthogonal complement makes up the whole vector space, which extends the notion of theorem 2.

Theorem 3: if S is a subspace of the inner product space V = \mathbb{R}^n, then

V = S \oplus S^\perp.

??? note "Proof:"

Will be added later.
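
The decomposition posed in theorem 3 may be illustrated numerically. The sketch below (assuming `numpy` and `scipy` are available; the subspace and the vector are arbitrary examples) splits a vector of \mathbb{R}^3 into a component in S and a component in S^\perp.

```python
import numpy as np
from scipy.linalg import orth

# S is the column space of X, a two-dimensional subspace of R^3.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])
Q = orth(X)                  # columns form an orthonormal basis of S

w = np.array([3.0, -1.0, 2.0])
u = Q @ (Q.T @ w)            # component of w in S (orthogonal projection onto S)
v = w - u                    # component of w in S^perp

print(np.allclose(u + v, w))      # True: w = u + v
print(np.allclose(X.T @ v, 0.0))  # True: v is orthogonal to S
```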

The following results emerge from these theorems.

Proposition 1: let S be a subspace of V, then (S^\perp)^\perp = S.

??? note "Proof:"

Will be added later.

Recall that the system A \mathbf{x} = \mathbf{b} is consistent if and only if \mathbf{b} \in R(A). Since R(A) = N(A^T)^\perp, we have the following result.

Proposition 2: let A \in \mathbb{R}^{m \times n} and \mathbf{b} \in \mathbb{R}^m, then either there is a vector \mathbf{x} \in \mathbb{R}^n such that

A \mathbf{x} = \mathbf{b},

or there is a vector \mathbf{y} \in \mathbb{R}^m such that

A^T \mathbf{y} = \mathbf{0} \;\land\; \mathbf{y}^T \mathbf{b} \neq 0.

??? note "Proof:"

Will be added later.
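
A minimal sketch (assuming `numpy` is available; the system is an arbitrary inconsistent example) exhibits the second alternative of proposition 2.

```python
import numpy as np

# An inconsistent system: the two equations demand x = 1 and x = 2.
A = np.array([[1.0],
              [1.0]])
b = np.array([1.0, 2.0])

# No x satisfies A x = b; instead the second alternative provides a certificate y.
y = np.array([1.0, -1.0])
print(np.allclose(A.T @ y, 0.0))  # True:  A^T y = 0
print(float(y @ b))               # -1.0, nonzero, so A x = b is inconsistent
```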

Orthonormal sets

In working with an inner product space V, it is generally desirable to have a basis of mutually orthogonal unit vectors.

Definition 4: the set of vectors \{\mathbf{v}_i\}_{i=1}^n in an inner product space V is orthogonal if

\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0,

whenever i \neq j. Then \{\mathbf{v}_i\}_{i=1}^n is said to be an orthogonal set of vectors.

For example, the standard basis vectors \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3 form an orthogonal set in \mathbb{R}^3.

Theorem 4: if \{\mathbf{v}_i\}_{i=1}^n is an orthogonal set of nonzero vectors in an inner product space V, then \{\mathbf{v}_i\}_{i=1}^n are linearly independent.

??? note "Proof:"

Suppose that $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$ and 

$$
    c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0},
$$

then, taking the inner product of both sides with $\mathbf{v}_j$,

$$
    c_1 \langle \mathbf{v}_j, \mathbf{v}_1 \rangle + \dots + c_n \langle \mathbf{v}_j, \mathbf{v}_n \rangle = 0,
$$

for each $j \in \mathbb{N}[j \leq n]$. By orthogonality this reduces to $c_j \|\mathbf{v}_j\|^2 = 0$, and since $\mathbf{v}_j \neq \mathbf{0}$ it follows that $c_j = 0$ for all $j \in \mathbb{N}[j \leq n]$. 

We may go even further and consider sets of orthogonal vectors in which each vector has length 1, that is, each is a unit vector.

Definition 5: an orthonormal set of vectors is an orthogonal set of unit vectors.

Equivalently, the set \{\mathbf{u}_i\}_{i=1}^n is orthonormal if and only if

\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij},

where

\delta_{ij} = \begin{cases} 1 &\text{ for } i = j, \\ 0 &\text{ for } i \neq j.\end{cases}

Theorem 5: let \{\mathbf{u}_i\}_{i=1}^n be an orthonormal basis of an inner product space V. If

\mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,

then c_i = \langle \mathbf{v}, \mathbf{u}_i \rangle for all i \in \mathbb{N}[i \leq n].

??? note "Proof:"

Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 

$$
    \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
$$

we have

$$
    \langle \mathbf{v}, \mathbf{u}_i \rangle = \Big\langle \sum_{j=1}^n c_j \mathbf{u}_j, \mathbf{u}_i \Big\rangle = \sum_{j=1}^n c_j \langle \mathbf{u}_j, \mathbf{u}_i \rangle = \sum_{j=1}^n c_j \delta_{ij} = c_i.
$$

This implies that it is much easier to calculate the coordinates of a given vector with respect to an orthonormal basis.

Corollary 1: let \{\mathbf{u}_i\}_{i=1}^n be an orthonormal basis of an inner product space V. If

\mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,

and

\mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,

then \langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^n a_i b_i.

??? note "Proof:"

Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 

$$
    \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
$$

and 

$$
    \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
$$

by theorem 5 we have 

$$
    \langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_{i=1}^n a_i \mathbf{u}_i, \mathbf{w} \Big\rangle = \sum_{i=1}^n a_i \langle \mathbf{w}, \mathbf{u}_i \rangle = \sum_{i=1}^n a_i b_i. 
$$

Corollary 2: let \{\mathbf{u}_i\}_{i=1}^n be an orthonormal basis of an inner product space V and

\mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,

then

\|\mathbf{v}\|^2 = \sum_{i=1}^n c_i^2.

??? note "Proof:"

Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 

$$
    \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
$$

then by corollary 1 we have

$$
    \|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^n c_i c_i = \sum_{i=1}^n c_i^2.
$$
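
Theorem 5 and corollary 2 may be illustrated with a simple orthonormal basis of \mathbb{R}^2. The sketch below (assuming `numpy` is available; the basis and the vector are arbitrary examples) recovers a vector from its coordinates and checks the norm identity.

```python
import numpy as np

# An orthonormal basis of R^2 (the standard basis rotated by 45 degrees).
u1 = np.array([1.0, 1.0]) / np.sqrt(2.0)
u2 = np.array([-1.0, 1.0]) / np.sqrt(2.0)

v = np.array([3.0, 1.0])

# Theorem 5: the coordinates are the inner products with the basis vectors.
c1, c2 = v @ u1, v @ u2
print(np.allclose(c1 * u1 + c2 * u2, v))   # True: v is recovered

# Corollary 2: the squared norm is the sum of the squared coordinates.
print(np.isclose(c1**2 + c2**2, v @ v))    # True
```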

Orthogonal matrices

Definition 6: an n \times n matrix Q is an orthogonal matrix if

Q^T Q = I.

The column vectors of an orthogonal matrix form an orthonormal set in \mathbb{R}^n, as may be posed in the following theorem.

Theorem 6: let Q = (\mathbf{q}_1, \dots, \mathbf{q}_n) be an orthogonal matrix, then \{\mathbf{q}_i\}_{i=1}^n is an orthonormal set.

??? note "Proof:"

Let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix. Then

$$
    Q^T Q = I,
$$

and hence $\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}$, such that for the Euclidean inner product defined by the scalar product we have

$$
    \langle \mathbf{q}_i, \mathbf{q}_j \rangle = \delta_{ij},
$$

so the columns $\{\mathbf{q}_i\}_{i=1}^n$ are mutually orthogonal unit vectors, that is, an orthonormal set. 

It follows then that if Q is an orthogonal matrix, then Q is nonsingular and Q^{-1} = Q^T.

In general scalar products are preserved under multiplication by an orthogonal matrix since

\langle Q \mathbf{u}, Q \mathbf{v} \rangle = (Q \mathbf{v})^T Q \mathbf{u} = \mathbf{v}^T Q^T Q \mathbf{u} = \langle \mathbf{u}, \mathbf{v} \rangle.

In particular, if \mathbf{u} = \mathbf{v} then \|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2 and hence \|Q \mathbf{u}\| = \|\mathbf{u}\|. Multiplication by an orthogonal matrix preserves the lengths of vectors.
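
A rotation matrix is a convenient concrete example. The sketch below (assuming `numpy` is available) checks the defining property Q^T Q = I, the identity Q^{-1} = Q^T and the preservation of lengths.

```python
import numpy as np

theta = 0.3
# A plane rotation is a standard example of an orthogonal matrix.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))     # True: Q^T Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))  # True: Q^{-1} = Q^T

u = np.array([2.0, -1.0])
# Multiplication by Q preserves scalar products, and hence lengths.
print(np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))  # True
```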

Orthogonalization process

Let \{\mathbf{a}_i\}_{i=1}^n be a basis of an inner product space V. We may use the Gram-Schmidt process to determine an orthonormal basis \{\mathbf{q}_i\}_{i=1}^n of V.

Let \mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1 be the first step.

Then for i = 2, \dots, n we repeat the following step:

\begin{align*} \mathbf{w} &= \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_1 \rangle \mathbf{q}_1 - \dots - \langle \mathbf{a}_i, \mathbf{q}_{i-1} \rangle \mathbf{q}_{i-1}, \\ \mathbf{q}_i &= \frac{1}{\|\mathbf{w}\|} \mathbf{w}. \end{align*}

??? note "Proof:"

Will be added later.
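
A minimal implementation sketch of the process is given below (assuming `numpy` is available; the helper `gram_schmidt` and the test vectors are illustrative only). The projections are subtracted one at a time, the so-called modified variant, which is mathematically equivalent to the step above but numerically more stable.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors in R^n."""
    basis = []
    for a in vectors:
        w = np.array(a, dtype=float)
        for q in basis:
            w = w - (w @ q) * q          # remove the component along q
        basis.append(w / np.linalg.norm(w))
    return basis

a = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0]),
     np.array([0.0, 1.0, 1.0])]
q = gram_schmidt(a)

Q = np.column_stack(q)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the q_i form an orthonormal set
```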

Least squares solutions of overdetermined systems

A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points. This means that the sum of squared errors between the model and the data points is minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations.

For a system of equations A \mathbf{x} = \mathbf{b} with A \in \mathbb{R}^{m \times n}, m, n \in \mathbb{N}[m > n], and \mathbf{b} \in \mathbb{R}^m, then for each \mathbf{x} \in \mathbb{R}^n a residual \mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m can be formed

\mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}.

The distance between \mathbf{b} and A \mathbf{x} is given by

\| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|.

We wish to find a vector \mathbf{x} \in \mathbb{R}^n for which \|\mathbf{r}(\mathbf{x})\| will be a minimum. A solution \mathbf{\hat x} that minimizes \|\mathbf{r}(\mathbf{x})\| is a least squares solution of the system A \mathbf{x} = \mathbf{b}. Do note that minimizing \|\mathbf{r}(\mathbf{x})\| is equivalent to minimizing \|\mathbf{r}(\mathbf{x})\|^2.

Theorem 7: let S be a subspace of \mathbb{R}^m. For each \mathbf{b} \in \mathbb{R}^m, there exists a unique \mathbf{p} \in S that satisfies

\|\mathbf{b} - \mathbf{s}\| > \|\mathbf{b} - \mathbf{p}\|,

for all \mathbf{s} \in S\backslash\{\mathbf{p}\} and \mathbf{b} - \mathbf{p} \in S^\perp.

??? note "Proof:"

Will be added later.

If \mathbf{p} = A \mathbf{\hat x} is the element of R(A) closest to \mathbf{b}, then it follows that

\mathbf{b} - \mathbf{p} = \mathbf{b} - A \mathbf{\hat x} = \mathbf{r}(\mathbf{\hat x}),

must be an element of R(A)^\perp. Thus, \mathbf{\hat x} is a solution to the least squares problem if and only if

\mathbf{r}(\mathbf{\hat x}) \in R(A)^\perp = N(A^T).

Thus to solve for \mathbf{\hat x} we have the normal equations given by

A^T A \mathbf{x} = A^T \mathbf{b}.

Uniqueness of \mathbf{\hat x} is obtained if A^T A is nonsingular, which will be posed in the following theorem.

Theorem 8: let A \in \mathbb{R}^{m \times n} be an m \times n matrix with rank n, then A^T A is nonsingular.

??? note "Proof:"

Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$. Let $\mathbf{v}$ be a solution of 

$$
    A^T A \mathbf{x} = \mathbf{0},
$$

then $A \mathbf{v} \in N(A^T)$, but we also have that $A \mathbf{v} \in R(A) = N(A^T)^\perp$. Since $N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}$ it follows that 

$$
    A\mathbf{v} = \mathbf{0},
$$

so $\mathbf{v} = \mathbf{0}$, since the columns of $A$ are linearly independent when $A$ has rank $n$. Hence $N(A^T A) = \{\mathbf{0}\}$ and $A^T A$ is nonsingular. 

It follows that

\mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b},

is the unique solution of the normal equations when A has rank n and, consequently, the unique least squares solution of the system A \mathbf{x} = \mathbf{b}.
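
As a closing illustration, the sketch below (assuming `numpy` is available; the data points are an arbitrary example) fits a straight line by solving the normal equations and checks that the residual lies in N(A^T) = R(A)^\perp.

```python
import numpy as np

# Overdetermined system: fit a line y = x0 + x1 * t to four data points.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.0, 2.2, 2.9, 4.1])
A = np.column_stack([np.ones_like(t), t])   # 4 x 2 matrix of rank 2

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual is orthogonal to R(A), i.e. it lies in N(A^T).
r = b - A @ x_hat
print(np.allclose(A.T @ r, 0.0))            # True

# np.linalg.lstsq solves the same problem with a more stable factorization.
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))  # True
```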