port from mathematics-physics notes

2025-08-26 15:48:53 +02:00 · 2025-08-26 15:48:53 +02:00 · c009ea53f0
commit c009ea53f0
parent a4e106ce02
124 changed files with 13224 additions and 0 deletions
--- a/docs/mathematics/linear-algebra/orthogonality.md
+++ b/docs/mathematics/linear-algebra/orthogonality.md
@ -0,0 +1,467 @@
+# Orthogonality
+
+## Orthogonal subspaces
+
+> *Definition 1*: two subspaces $S$ and $T$ of an inner product space $V$ are **orthogonal** if 
+>
+> $$
+>   \langle \mathbf{u}, \mathbf{v} \rangle = 0,
+> $$
+>
+> for all $\mathbf{u} \in S$ and $\mathbf{v} \in T$. Orthogonality of $S$ and $T$ may be denoted by $S \perp T$. 
+
+The notion of orthogonality is only valid in vector spaces with a defined inner product.
+
+> *Definition 2*: let $S$ be a subspace of an inner product space $V$. The set of all vectors in $V$ that are orthogonal to every vector in $S$ will be denoted by $S^\perp$. Which implies
+>
+> $$
+>   S^\perp = \{\mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \; \forall \mathbf{u} \in S \}.
+> $$
+>
+> The set $S^\perp$ is called the **orthogonal complement** of $S$. 
+
+For example the subspaces $X = \mathrm{span}(\mathbf{e}_1)$  and $Y = \mathrm{span}(\mathbf{e}_2)$ of $\mathbb{R}^3$ are orthogonal, but they are not orthogonal complements. Indeed,
+
+$$
+    X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \quad \text{and} \quad Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3). 
+$$
+
+We may observe that if $S$ and $T$ are orthogonal subspaces of an inner product space $V$, then $S \cap T = \{\mathbf{0}\}$. Since for $\mathbf{v} \in S \cap T$ and $S \perp T$ then $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ and hence $\mathbf{v} = \mathbf{0}$. 
+
+Additionally, we may also observe that if $S$ is a subspace of an inner product space $V$, then $S^\perp$ is also a subspace of $V$. Since for $\mathbf{u} \in S^\perp$ and $a \in \mathbb{K}$ then 
+
+$$
+    \langle a \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0
+$$
+
+for all $\mathbf{v} \in S$, therefore $a \mathbf{u} \in S^\perp$. 
+
+If $\mathbf{u}_1, \mathbf{u}_2 \in S^\perp$ then 
+
+$$
+    \langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0 + 0 = 0,
+$$
+
+for all $\mathbf{v} \in S$, and hence $\mathbf{u}_1 + \mathbf{u}_2 \in S^\perp$. Therefore $S^\perp$ is a subspace of $V$.
+
+### Fundamental subspaces
+
+Let $V$ be an Euclidean inner product space $V = \mathbb{R}^n$ with its inner product defined by the [scalar product](../inner-product-spaces/#euclidean-inner-product-spaces). With this definition of the inner product on $V$ the following theorem may be posed.
+
+> *Theorem 1*: let $A$ be an $m \times n$ matrix, then
+>
+> $$
+>   N(A) = R(A^T)^\perp,
+> $$
+>
+> and 
+>
+> $$
+>   N(A^T) = R(A)^\perp,
+> $$
+>
+> for all $A \in \mathbb{R}^{m \times n}$ with $R(A)$ denoting the column space of $A$ and $R(A^T)$ denoting the row space of $A$. 
+
+??? note "*Proof*:"
+
+    Let $A \in \mathbb{R}^{m \times n}$ with $R(A) = \mathrm{span}(\mathbf{\vec{a}}_i^T)$ for $i \in \mathbb{N}[i \leq n]$ denoting the column space of $A$ and $R(A^T) = \mathrm{span}(\mathbf{a}_i)$ for $i \in \mathbb{N}[i \leq m]$ denoting the row space of $A$. 
+
+    For the first equation, let $\mathbf{v} \in R(A^T)^\perp$ then $\mathbf{v}^T \mathbf{\vec{a}}_i^T = \mathbf{0}$ which obtains
+
+    $$
+        \mathbf{0} = \mathbf{v}^T \mathbf{\vec{a}}_i^T = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{\vec{a}}_i \mathbf{v},
+    $$
+
+    so $A \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A)$. Which implies that $R(A^T)^\perp \subseteq N(A)$. Similarly, let $\mathbf{w} \in N(A)$ then $A \mathbf{w} = \mathbf{0}$ which obtains
+
+    $$
+        \mathbf{0} = \mathbf{\vec{a}}_i \mathbf{v} = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{v}^T \mathbf{\vec{a}}_i^T,
+    $$
+
+    and hence $\mathbf{w} \in R(A^T)^\perp$ which implies that $N(A) \subseteq R(A^T)^\perp$. Therefore $N(A) = R(A^T)^\perp$.
+
+    For the second equation, let $\mathbf{v} \in R(A)^\perp$ then $\mathbf{v}^T \mathbf{a}_i = \mathbf{0}$ which obtains
+
+    $$
+        \mathbf{0} = \mathbf{v}^T \mathbf{a}_i = \big(\mathbf{v}^T \mathbf{a}_i \big)^T = \mathbf{a}_i^T \mathbf{v},
+    $$
+
+    so $A^T \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A^T)$. Which implies that $R(A)^\perp \subseteq N(A^T)$. Similarly, let $\mathbf{w} \in N(A^T)$ then $A^T \mathbf{w} = \mathbf{0}$ which obtains
+
+    $$
+        \mathbf{0} = \mathbf{a}_i^T \mathbf{w} = \big(\mathbf{a}_i^T \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{a}_i, 
+    $$
+
+    and hence $\mathbf{w} \in R(A)^\perp$ which implies that $N(A^T) \subseteq R(A)^\perp$. Therefore $N(A^T) = R(A)^\perp$. 
+
+Known as the fundamental theorem of linear algebra. Which can be used to prove the following theorem.
+
+> *Theorem 2*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then 
+>
+> $$
+>   \dim S + \dim S^\perp = n.
+> $$
+>
+> Furthermore, if $\{\mathbf{v}_i\}_{i=1}^r$ is a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ is a basis of $S^\perp$ then $\{\mathbf{v}_i\}_{i=1}^n$ is a basis of $V$. 
+
+??? note "*Proof*:"
+
+    If $S = \{\mathbf{0}\}$, then $S^\perp = V$ and 
+
+    $$
+        \dim S + \dim S^\perp = 0 + n = n.
+    $$
+
+    If $S \neq \{\mathbf{0}\}$, then let $\{\mathbf{x}_i\}_{i=1}^r$ be a basis of $S$ and define $X \in \mathbb{R}^{r \times m}$ whose $i$th row is $\mathbf{x}_i^T$ for each $i$. Matrix $X$ has rank $r$ and $R(X^T) = S$. Then by theorem 2
+
+    $$
+        S^\perp = R(X^T)^\perp = N(X),
+    $$
+
+    from the [rank nullity theorem](../vector-spaces/#rank-and-nullity) it follows that
+
+    $$
+        \dim S^\perp = \dim N(X) = n - r.
+    $$
+
+    and therefore 
+
+    $$
+        \dim S + \dim S^\perp = r + n - r = n.
+    $$
+
+    Let $\{\mathbf{v}_i\}_{i=1}^r$ be a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ be a basis of $S^\perp$. Suppose that 
+
+    $$
+        c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r + c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n = \mathbf{0}.
+    $$
+
+    Let $\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r$ and let $\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n$. Then we have
+
+    $$
+        \mathbf{u} + \mathbf{w} = \mathbf{0},
+    $$
+
+    implies $\mathbf{u} = - \mathbf{w}$ and thus both elements must be in $S \cap S^\perp$. However, $S \cap S^\perp = \{\mathbf{0}\}$, therefore
+
+    $$
+    \begin{align*}
+        c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r &= \mathbf{0}, \\
+        c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n &= \mathbf{0},
+    \end{align*}
+    $$
+
+    since $\{\mathbf{v}_i\}_{i=1}^r$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ are linearly independent, we must also have that $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent and therefore form a basis of $V$.
+
+We may further extend this with the notion of a direct sum.
+
+> *Definition 3*: if $U$ and $V$ are subspaces of a vector space $W$ and each $\mathbf{w} \in W$ can be written uniquely as
+>
+> $$
+>   \mathbf{w} = \mathbf{u} + \mathbf{v},
+> $$
+>
+> with $\mathbf{u} \in U$ and $\mathbf{v} \in V$ then $W$ is a **direct sum** of U and $V$ denoted by $W = U \oplus V$. 
+
+In the following theorem it will be posed that the direct sum of a subspace and its orthogonal complement make up the whole vector space, which extends the notion of theorem 2. 
+
+> *Theorem 3*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then 
+>
+> $$
+>   V = S \oplus S^\perp. 
+> $$
+
+??? note "*Proof*:"
+
+    Will be added later.
+
+The following results emerge from these posed theorems.
+
+> *Proposition 1*: let $S$ be a subspace of $V$, then $(S^\perp)^\perp = S$. 
+
+??? note "*Proof*:"
+
+    Will be added later.
+
+Recall that the system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b} \in R(A)$ since $R(A) = N(A^T)^\perp$ we have the following result.
+
+> *Proposition 2*: let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then either there is a vector $\mathbf{x} \in \mathbb{R}^n$ such that 
+>
+> $$
+>   A \mathbf{x} = \mathbf{b}, 
+> $$
+>
+> or there is a vector $\mathbf{y} \in \mathbb{R}^m$ such that
+>
+> $$
+>   A^T \mathbf{y} = \mathbf{0} \;\land\; \mathbf{y}^T \mathbf{b} \neq 0 .
+> $$
+
+??? note "*Proof*:"
+
+    Will be added later.
+
+## Orthonormal sets
+
+In working with an inner product space $V$, it is generally desirable to have a basis of mutually orthogonal unit vectors.
+
+> *Definition 4*: the set of vectors $\{\mathbf{v}_i\}_{i=1}^n$ in an inner product space $V$ is **orthogonal** if 
+>
+> $$
+>   \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0,
+> $$
+>
+> whenever $i \neq j$. Then $\{\mathbf{v}_i\}_{i=1}^n$ is said to be an **orthogonal set** of vectors.
+
+For example the trivial set $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ is an orthogonal set in $\mathbb{R}^3$. 
+
+> *Theorem 4*: if $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$, then $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent.
+
+??? note "*Proof*:"
+
+    Suppose that $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$ and 
+
+    $$
+        c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0},
+    $$
+
+    then
+
+    $$
+        c_1 \langle \mathbf{v}_j, \mathbf{v}_1 \rangle + \dots + c_n \langle \mathbf{v}_j, \mathbf{v}_n \rangle = 0,
+    $$
+
+    for $j \in \mathbb{N}[j \leq n]$ obtains $c_j \|\mathbf{v}_j\| = 0$ and hence $c_j = 0$ for all $j \in \mathbb{N}[j \leq n]$. 
+
+We may even go further and define a set of vectors that are orthogonal and have a length of $1$, a unit vector by definition. 
+
+> *Definition 5*: an **orthonormal** set of vectors is an orthogonal set of unit vectors.
+
+For example the set $\{\mathbf{u}_i\}_{i=1}^n$ will be orthonormal if and only if
+
+$$
+    \langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij},
+$$
+
+where 
+
+$$
+    \delta_{ij} = \begin{cases} 1 &\text{ for } i = j, \\ 0 &\text{ for } i \neq j.\end{cases}
+$$
+
+> *Theorem 5*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If
+>
+> $$
+>   \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
+> $$
+>
+> then $c_i = \langle \mathbf{v}, \mathbf{u}_i \rangle$ for all $i \in \mathbb{N}[i \leq n]$. 
+
+??? note "*Proof*:"
+
+    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 
+
+    $$
+        \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
+    $$
+
+    we have
+
+    $$
+        \langle \mathbf{v}, \mathbf{u}_i \rangle = \Big\langle \sum_{j=1}^n c_j \mathbf{u}_j, \mathbf{u}_i \Big\rangle = \sum_{j=1}^n c_j \langle \mathbf{u}_j, \mathbf{u}_i \rangle = \sum_{j=1}^n c_j \delta_{ij} = c_i.
+    $$
+
+Implying that it is much easier to calculate the coordinates of a given vector with respect to an orthonormal basis.
+
+> *Corollary 1*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If 
+>
+> $$
+>   \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
+> $$ 
+>
+> and 
+>
+> $$
+>   \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
+> $$
+>
+> then $\langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^n a_i b_i$. 
+
+??? note "*Proof*:"
+
+    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 
+
+    $$
+        \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
+    $$
+
+    and 
+
+    $$
+        \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
+    $$
+
+    by theorem 5 we have 
+
+    $$
+        \langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_{i=1}^n a_i \mathbf{u}_i, \mathbf{w} \Big\rangle = \sum_{i=1}^n a_i \langle \mathbf{w}, \mathbf{u}_i \rangle = \sum_{i=1}^n a_i b_i. 
+    $$
+
+> *Corollary 2*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and
+>
+> $$
+>   \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
+> $$
+>
+> then
+>
+> $$
+>   \|\mathbf{v}\|^2 = \sum_{i=1}^n c_i^2.
+> $$
+
+??? note "*Proof*:"
+
+    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let 
+
+    $$
+        \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
+    $$
+
+    then by corollary 1 we have
+
+    $$
+        \|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^n c_i \mathbf{u}_i.
+    $$
+
+### Orthogonal matrices
+
+> *Definition 6*: an $n \times n$ matrix $Q$ is an **orthogonal matrix** if 
+>
+> $$
+>   Q^T Q = I.
+> $$
+
+Orthogonal matrices have column vectors that form an orthonormal set in $V$, as may be posed in the following theorem.
+
+> *Theorem 6*: let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix, then $\{\mathbf{q}_i\}_{i=1}^n$ is an orthonormal set.
+
+??? note "*Proof*:"
+
+    Let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix. Then
+
+    $$
+        Q^T Q = I,
+    $$
+
+    and hence $\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}$ such that for an inner product space with a scalar product we have
+
+    $$
+        \langle \mathbf{q}_i, \mathbf{q}_j \rangle = 0,
+    $$
+
+    for $i \neq j$. 
+
+It follows then that if $Q$ is an orthogonal matrix, then $Q$ is nonsingular and $Q^{-1} = Q^T$. 
+
+In general scalar products are preserved under multiplication by an orthogonal matrix since 
+
+$$
+    \langle Q \mathbf{u}, Q \mathbf{v} \rangle = (Q \mathbf{v})^T Q \mathbf{u} = \mathbf{v}^T Q^T Q \mathbf{u} = \langle \mathbf{u}, \mathbf{v} \rangle. 
+$$
+
+In particular, if $\mathbf{u} = \mathbf{v}$ then $\|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2$ and hence $\|Q \mathbf{u}\| = \|\mathbf{u}\|$. Multiplication by an orthogonal matrix preserves the lengths of vectors.
+
+## Orthogonalization process
+
+Let $\{\mathbf{a}_i\}_{i=1}^n$ be a basis of an inner product space $V$. We may use the modified method of Gram-Schmidt to determine the orthonormal basis $\{\mathbf{q}_i\}_{i=1}^n$ of $V$.
+
+Let $\mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1$ be the first step. 
+
+Then we may induce the following step for $i \in \mathrm{range}(2,n)$:
+
+$$
+\begin{align*}
+    \mathbf{w} &= \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_1 \rangle \mathbf{q}_1 - \dots - \langle \mathbf{a}_i, \mathbf{q}_{i-1} \rangle \mathbf{q}_{i-1}, \\
+    \mathbf{q}_i &= \frac{1}{\|\mathbf{w}\|} \mathbf{w}.
+\end{align*}
+$$
+
+??? note "*Proof*:"
+
+    Will be added later.
+
+## Least squares solutions of overdetermined systems
+
+A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points. This implies that the sum of squares fo errors between the model and the data points are minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations. 
+
+For a system of equations $A \mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{m \times n}$ with $m, n \in \mathbb{N}[m>n]$ and $\mathbf{b} \in \mathbb{R}^m$ then for each $\mathbf{x} \in \mathbb{R}^n$ a *residual* $\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m$ can be formed
+
+$$
+    \mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}.
+$$
+
+The distance between $\mathbf{b}$ and $A \mathbf{x}$ is given by
+
+$$
+    \| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|,
+$$
+
+We wish to find a vector $\mathbf{x} \in \mathbb{R}^n$ for which $\|\mathbf{r}(\mathbf{x})\|$ will be a minimum. A solution $\mathbf{\hat x}$ that minimizes $\|\mathbf{r}(\mathbf{x})\|$ is a *least squares solution* of the system $A \mathbf{x} = \mathbf{b}$. Do note that minimizing $\|\mathbf{r}(\mathbf{x})\|$ is equivalent to minimizing $\|\mathbf{r}(\mathbf{x})\|^2$.
+
+> *Theorem 7*: let $S$ be a subspace of $\mathbb{R}^m$. For each $b \in \mathbb{R}^m$, there exists a unique $\mathbf{p} \in S$ that suffices
+>
+> $$
+>   \|\mathbf{b} - \mathbf{s}\| > \|\mathbf{b} - \mathbf{p}\|,
+> $$
+>
+> for all $\mathbf{s} \in S\backslash\{\mathbf{p}\}$ and $\mathbf{b} - \mathbf{p} \in S^\perp$. 
+
+??? note "*Proof*:"
+
+    Will be added later.
+
+If $\mathbf{p} = A \mathbf{\hat x}$ in $R(A)$ that is closest to $\mathbf{b}$ then it follows that
+
+$$
+    \mathbf{b} - \mathbf{p} = \mathbf{b} - A \mathbf{x} = \mathbf{r}(\mathbf{\hat x}),
+$$
+
+must be an element of $R(A)^\perp$. Thus, $\mathbf{\hat x}$ is a solution to the least squares problem if and only if 
+
+$$
+    \mathbf{r}(\mathbf{\hat x}) \in R(A)^\perp = N(A^T).
+$$
+
+Thus to solve for $\mathbf{\hat x}$ we have the *normal equations* given by
+
+$$
+    A^T A \mathbf{x} = A^T \mathbf{b}. 
+$$
+
+Uniqueness of $\mathbf{\hat x}$ can be obtained if $A^T A$ is nonsingular which will be posed in the following theorem.
+
+> *Theorem 8*: let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$, then $A^T A$ is nonsingular. 
+
+??? note "*Proof*:"
+
+    Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$. Let $\mathbf{v}$ be a solution of 
+
+    $$
+        A^T A \mathbf{x} = \mathbf{0},
+    $$
+
+    then $A \mathbf{v} \in N(A^T)$, but we also have that $A \mathbf{v} \in R(A) = N(A^T)^\perp$. Since $N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}$ it follows that 
+
+    $$
+        A\mathbf{v} = \mathbf{0},
+    $$
+
+    so $\mathbf{v} = \mathbf{0}$ by the nonsingularity of $A$. 
+
+It follows that 
+
+$$
+    \mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b},
+$$
+
+is the unique solution of the normal equations for $A$ nonsingular and consequently, the unique least squares solution of the system $A \mathbf{x} = \mathbf{b}$.