port from mathematics-physics notes

This commit is contained in:
Luc Bijl 2025-08-26 15:48:53 +02:00
parent a4e106ce02
commit c009ea53f0
124 changed files with 13224 additions and 0 deletions

View file

@ -0,0 +1,303 @@
# Determinants
## Definition
With each $n \times n$ matrix $A$, $n \in \mathbb{N}$, it is possible to associate a scalar, the determinant of $A$, denoted by $\det (A)$ or $|A|$.
> *Definition*: let $A = (a_{ij})$ be an $n \times n$ matrix and let $M_{ij}$ denote the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the row and column containing $a_{ij}$ with $n \in \mathbb{N}$ and $(i,j) \in \{1, \dots, n\} \times \{1, \dots, n\}$. The determinant of $M_{ij}$ is called the **minor** of $a_{ij}$. We define the **cofactor** of $a_{ij}$ by
>
> $$
> A_{ij} = (-1)^{i+j} \det(M_{ij}).
> $$
This definition is necessary to formulate a definition for the determinant, as may be observed below.
> *Definition*: the **determinant** of an $n \times n$ matrix $A$ with $n \in \mathbb{N}$, denoted by $\det (A)$ or $|A|$ is a scalar associated with the matrix $A$ that is defined inductively as
>
> $$
> \det (A) = \begin{cases}a_{11} &\text{ if } n = 1 \\ a_{11} A_{11} + a_{12} A_{12} + \dots + a_{1n} A_{1n} &\text{ if } n > 1\end{cases}
> $$
>
> where
>
> $$
> A_{1j} = (-1)^{1+j} \det (M_{1j})
> $$
>
> with $j \in \{1, \dots, n\}$ are the cofactors associated with the entries in the first row of $A$.
<br>
> *Theorem*: if $A$ is an $n \times n$ matrix with $n \in \mathbb{N} \backslash \{1\}$ then $\det(A)$ can be expressed as a cofactor expansion using any row or column of $A$.
??? note "*Proof*:"
Will be added later.
We then have, for an $n \times n$ matrix $A$ with $n \in \mathbb{N} \backslash \{1\}$,
$$
\begin{align*}
\det(A) &= a_{i1} A_{i1} + a_{i2} A_{i2} + \dots + a_{in} A_{in}, \\
&= a_{1j} A_{1j} + a_{2j} A_{2j} + \dots + a_{nj} A_{nj},
\end{align*}
$$
with $i,j \in \{1, \dots, n\}$.
For example, the determinant of a $4 \times 4$ matrix $A$ given by
$$
A = \begin{pmatrix} 0 & 2 & 3 & 0\\ 0 & 4 & 5 & 0\\ 0 & 1 & 0 & 3\\ 2 & 0 & 1 & 3\end{pmatrix}
$$
may be determined using the definition and the theorem above
$$
\det(A) = 2 \cdot (-1)^5 \det\begin{pmatrix} 2 & 3 & 0\\ 4 & 5 & 0\\ 1 & 0 & 3\end{pmatrix} = -2 \cdot 3 \cdot (-1)^6 \det\begin{pmatrix} 2 & 3 \\ 4 & 5\end{pmatrix} = 12.
$$
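As a numerical check (not part of the notes themselves), the following sketch implements the cofactor expansion along the first row with NumPy and compares it with `numpy.linalg.det` on the matrix above; the function name `det_cofactor` is just a placeholder.

```python
import numpy as np

def det_cofactor(A):
    """Determinant via cofactor expansion along the first row (the definition above)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        M = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # minor M_1j
        total += (-1) ** j * A[0, j] * det_cofactor(M)       # a_1j * A_1j (0-based j)
    return total

A = np.array([[0., 2, 3, 0],
              [0, 4, 5, 0],
              [0, 1, 0, 3],
              [2, 0, 1, 3]])
print(det_cofactor(A), np.linalg.det(A))   # both approximately 12
```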
## Properties of determinants
> *Theorem*: if $A$ is an $n \times n$ matrix then $\det (A^T) = \det (A)$.
??? note "*Proof*:"
It may be observed that the result holds for $n=1$. Assume that the result holds for all $k \times k$ matrices and that $A$ is a $(k+1) \times (k+1)$ matrix for some $k \in \mathbb{N}$. Expanding $\det (A)$ along the first row of $A$ obtains
$$
\det(A) = a_{11} \det(M_{11}) - a_{12} \det(M_{12}) + \dots + (-1)^{k+2} a_{1(k+1)} \det(M_{1(k+1)}),
$$
since the minors are all $k \times k$ matrices, it follows from the induction hypothesis that
$$
\det(A) = a_{11} \det(M_{11}^T) - a_{12} \det(M_{12}^T) + \dots + (-1)^{k+2} a_{1(k+1)} \det(M_{1(k+1)}^T).
$$
The right hand side of the above equation is the expansion by minors of $\det(A^T)$ using the first column of $A^T$, therefore $\det(A^T) = \det(A)$.
> *Theorem*: if $A$ is an $n \times n$ triangular matrix with $n \in \mathbb{N}$, then the determinant of $A$ equals the product of the diagonal elements of $A$.
??? note "*Proof*:"
Let $A$ be an $n \times n$ triangular matrix with $n \in \mathbb{N}$ given by
$$
A = \begin{pmatrix} a_{11} & \cdots &a_{1n}\\ & \ddots & \vdots \\ & & a_{nn} \end{pmatrix}.
$$
We claim that $\det(A) = a_{11} \cdot a_{22} \cdots a_{nn}$. We first check the claim for $n=1$ which is given by $\det(A) = a_{11}$.
Now suppose that for some $k \in \mathbb{N}$ the determinant of a $k \times k$ triangular matrix $A_{k}$ is given by
$$
\det(A_k) = a_{11} \cdot a_{22} \cdots a_{kk},
$$
then expanding $\det(A_{k+1})$ along its last row and using the assumption obtains
$$
\det(A_{k+1}) = \det\begin{pmatrix} A_k & \begin{matrix} a_{1(k+1)} \\ \vdots \\ a_{k(k+1)} \end{matrix} \\ \begin{matrix} 0 & \cdots & 0 \end{matrix} & a_{(k+1)(k+1)}\end{pmatrix} = a_{(k+1)(k+1)} \det(A_k) = a_{11} \cdot a_{22} \cdots a_{kk} \cdot a_{(k+1)(k+1)},
$$
Hence if the claim holds for some $k \in \mathbb{N}$ then it also holds for $k+1$. The principle of natural induction implies now that for all $n \in \mathbb{N}$ we have
$$
\det(A) = a_{11} \cdot a_{22} \cdots a_{nn}.
$$
> *Theorem*: let $A$ be an $n \times n$ matrix
>
> 1. if $A$ has a row or column consisting entirely of zeros, then $\det(A) = 0$.
> 2. if $A$ has two identical rows or two identical columns, then $\det(A) = 0$.
??? note "*Proof*:"
Will be added later.
> *Lemma*: let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$. If $A_{jk}$ denotes the cofactor of $a_{jk}$ for $j,k \in \{1, \dots, n\}$, then
>
> $$
> a_{i1} A_{j1} + a_{i2} A_{j2} + \dots + a_{in} A_{jn} = \begin{cases} \det(A) &\text{ if } i = j,\\ 0 &\text{ if } i \neq j.\end{cases}
> $$
??? note "*Proof*:"
If $i = j$ then we obtain the cofactor expansion of $\det(A)$ along the $i$th row of $A$.
If $i \neq j$, let $A^*$ be the matrix obtained by replacing the $j$th row of $A$ by the $i$th row of $A$
$$
A^* = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}
$$
with the $i$th row of $A$ appearing in both the $i$th and the $j$th row position. Since two rows of $A^*$ are the same its determinant must be zero. It follows from the cofactor expansion of $\det(A^*)$ along the $j$th row that
$$
\begin{align*}
0 &= \det(A^*) = a_{i1} A_{j1}^* + a_{i2} A_{j2}^* + \dots + a_{in} A_{jn}^*, \\
&= a_{i1} A_{j1} + a_{i2} A_{j2} + \dots + a_{in} A_{jn}.
\end{align*}
$$
> *Theorem*: let $E$ be an $n \times n$ elementary matrix and $A$ an $n \times n$ matrix with $n \in \mathbb{N}$ then we have
>
> $$
> \det(E A) = \det(E) \det(A),
> $$
>
> where
>
> $$
> \det(E) = \begin{cases} -1 &\text{ if $E$ is of type I},\\ \alpha \in \mathbb{R}\backslash \{0\} &\text{ if $E$ is of type II},\\ 1 &\text{ if $E$ is of type III}. \end{cases}
> $$
??? note "*Proof*:"
Will be added later.
Similar results hold for column operations, since for the elementary matrix $E$, $E^T$ is also an elementary matrix and $\det(A E) = \det((AE)^T) = \det(E^T A^T) = \det(E^T) \det(A^T) = \det(E) \det(A)$.
> *Theorem*: an $n \times n$ matrix A with $n \in \mathbb{N}$ is singular if and only if
>
> $$
> \det(A) = 0
> $$
??? note "*Proof*:"
Let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$. Matrix $A$ can be reduced to row echelon form with a finite number of row operations obtaining
$$
U = E_k E_{k-1} \cdots E_1 A,
$$
where $U$ is in $n \times n$ row echelon form and $E_i$ are $n \times n$ elementary matrices for $i \in \{1, \dots, k\}$. It follows then that
$$
\begin{align*}
\det(U) &= \det(E_k E_{k-1} \cdots E_1 A), \\
&= \det(E_k) \det(E_{k-1}) \cdots \det(E_1) \det(A).
\end{align*}
$$
Since the determinants of the elementary matrices are all nonzero, it follows that $\det(A) = 0$ if and only if $\det(U) = 0$. If $A$ is singular then $U$ has a row consisting entirely of zeros and hence $\det(U) = 0$. If $A$ is nonsingular then $U$ is triangular with 1's along the diagonal and hence $\det(U) = 1$.
From this theorem we obtain a method for computing $\det(A)$: reduce $A$ to row echelon form $U = E_k \cdots E_1 A$; if $A$ is nonsingular then $\det(U) = 1$ and hence
$$
\det(A) = \Big(\det(E_k) \det(E_{k-1}) \cdots \det(E_1)\Big)^{-1}.
$$
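The following sketch (an illustration added here, assuming NumPy) computes a determinant in this spirit: $A$ is reduced using only row interchanges (type I, factor $-1$) and row additions (type III, factor $1$), so the determinant is the signed product of the pivots.

```python
import numpy as np

def det_by_elimination(A):
    """Determinant via row reduction: only row interchanges (type I, factor -1)
    and row additions (type III, factor 1) are used, so
    det(A) = (-1)^(number of swaps) * product of the pivots."""
    U = A.astype(float).copy()
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        p = np.argmax(np.abs(U[k:, k])) + k      # partial pivoting
        if np.isclose(U[p, k], 0.0):
            return 0.0                           # no nonzero pivot: A is singular
        if p != k:
            U[[k, p]] = U[[p, k]]                # type I operation
            sign = -sign
        U[k+1:, k:] -= np.outer(U[k+1:, k] / U[k, k], U[k, k:])  # type III operations
    return sign * np.prod(np.diag(U))

A = np.array([[0., 2, 3, 0], [0, 4, 5, 0], [0, 1, 0, 3], [2, 0, 1, 3]])
print(det_by_elimination(A), np.linalg.det(A))   # both approximately 12
```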
> *Theorem*: let $A$ and $B$ be $n \times n$ matrices with $n \in \mathbb{N}$ then
>
> $$
> \det(AB) = \det(A) \det(B)
> $$
??? note "*Proof*:"
If $n \times n$ matrix $B$ is singular with $n \in \mathbb{N}$ then it follows that $AB$ is also singular and therefore
$$
\det(AB) = 0 = \det(A) \det(B).
$$
If $B$ is nonsingular, $B$ can be written as a product of elementary matrices, $B = E_k E_{k-1} \cdots E_1$. Therefore
$$
\begin{align*}
\det(AB) &= \det(A E_k \cdots E_1), \\
&= \det(A)\det(E_k)\cdots\det(E_1), \\
&= \det(A)\det(E_k \cdots E_1), \\
&= \det(A)\det(B).
\end{align*}
$$
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$, then we have
>
> $$
> \det(A^{-1}) = \frac{1}{\det(A)}.
> $$
??? note "*Proof*:"
Suppose $A$ is a nonsingular $n \times n$ matrix then
$$
A^{-1} A = I,
$$
and taking the determinant on both sides
$$
\det(A^{-1}A) = \det(A^{-1})\det(A) = \det(I) = 1,
$$
therefore
$$
\det(A^{-1}) = \frac{1}{\det(A)}.
$$
## The adjoint of a matrix
> *Definition*: let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$, the adjoint of $A$ is given by
>
> $$
> \mathrm{adj}(A) = \begin{pmatrix} A_{11} & A_{21} & \dots & A_{n1} \\ A_{12} & A_{22} & \dots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \dots & A_{nn}\end{pmatrix}
> $$
>
> with $A_{ij}$ for $(i,j) \in \{1, \dots, n\} \times \{1, \dots, n\}$ the cofactors of $A$.
The use of the adjoint becomes apparent in the following theorem, which generally saves a lot of time and brain capacity.
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$ then we have
>
> $$
> A^{-1} = \frac{1}{\det(A)} \text{ adj}(A).
> $$
??? note "*Proof*:"
Suppose $A$ is a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$, from the definition and the lemma above it follows that
$$
A \text{ adj}(A) = \det(A) I,
$$
multiplying both sides on the left by $A^{-1}$ and dividing by $\det(A)$ obtains
$$
A^{-1} = \frac{1}{\det(A)} \text{ adj}(A).
$$
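As an illustration (assuming NumPy), a direct computation of the adjoint from the cofactors and of $A^{-1} = \frac{1}{\det(A)} \text{ adj}(A)$; the $3 \times 3$ matrix is an arbitrary example.

```python
import numpy as np

def adjoint(A):
    """Adjoint of A: the transpose of the matrix of cofactors A_ij."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M = np.delete(np.delete(A, i, axis=0), j, axis=1)   # minor M_ij
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M)        # cofactor A_ij
    return C.T

A = np.array([[2., 1, 0], [1, 3, 1], [0, 1, 2]])
A_inv = adjoint(A) / np.linalg.det(A)            # A^{-1} = adj(A) / det(A)
print(np.allclose(A_inv @ A, np.eye(3)))         # True
```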
## Cramer's rule
> *Theorem*: let $A$ be an $n \times n$ nonsingular matrix with $n \in \mathbb{N}$ and let $\mathbf{b} \in \mathbb{R}^n$. Let $A_i$ be the matrix obtained by replacing the $i$th column of $A$ by $\mathbf{b}$. If $\mathbf{x}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$ then
>
> $$
> x_i = \frac{\det(A_i)}{\det(A)}
> $$
>
> for $i \in \{1, \dots, n\}$.
??? note "*Proof*:"
Let $A$ be an $n \times n$ nonsingular matrix with $n \in \mathbb{N}$ and let $\mathbf{b} \in \mathbb{R}^n$. If $\mathbf{x}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$ then we have
$$
\mathbf{x} = A^{-1} \mathbf{b} = \frac{1}{\det(A)} \text{ adj}(A) \mathbf{b}
$$
it follows that
$$
\begin{align*}
x_i &= \frac{b_1 A_{1i} + \dots + b_n A_{ni}}{\det(A)}, \\
&= \frac{\det(A_i)}{\det(A)}
\end{align*}
$$
for $i \in \{1, \dots, n\}$.
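A small sketch of Cramer's rule with NumPy, using an arbitrary nonsingular example system; `np.linalg.solve` serves as a reference.

```python
import numpy as np

def cramer(A, b):
    """Solve A x = b for nonsingular A with Cramer's rule."""
    det_A = np.linalg.det(A)
    x = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        A_i = A.copy()
        A_i[:, i] = b                    # replace the i-th column of A by b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2., 1, 0], [1, 3, 1], [0, 1, 2]])
b = np.array([1., 2, 3])
print(cramer(A, b))
print(np.linalg.solve(A, b))             # reference solution, should agree
```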

View file

@ -0,0 +1,57 @@
# Dual vector spaces
Let $V$ be a finite-dimensional vector space with $\dim V = n$ for $n \in \mathbb{N}$ and a basis $\{\mathbf{e}_i\}_{i=1}^n$. In the following sections we make use of the Einstein summation convention introduced in [vector analysis](/en/physics/mathematical-physics/vector-analysis/curvilinear-coordinates/) and take $\mathbb{K} = \mathbb{R} \lor\mathbb{K} = \mathbb{C}$.
> *Definition 1*: a map $\mathbf{\hat f}: V \to \mathbb{K}$ is called a **covector** or **linear functional** on $V$ if for all $\mathbf{v}_{1,2} \in V$ and $\lambda, \mu \in \mathbb{K}$ we have
>
> $$
> \mathbf{\hat f}(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda \mathbf{\hat f}(\mathbf{v}_1) + \mu \mathbf{\hat f}(\mathbf{v}_2).
> $$
Throughout this section covectors will be denoted by hats to increase clarity.
> *Definition 2*: let the dual space $V^* \overset{\text{def}} = \mathscr{L}(V, \mathbb{K})$ denote the vector space of covectors on the vector space $V$.
Each basis $\{\mathbf{e}_i\}$ of $V$ therefore induces a basis $\{\mathbf{\hat e}^i\}$ of $V^*$ by
$$
\mathbf{\hat e}^i(\mathbf{v}) = v^i,
$$
for all $\mathbf{v} = v^i \mathbf{e}_i \in V$.
> *Theorem 1*: the dual basis $\{\mathbf{\hat e}^i\}$ of $V^*$ is uniquely determined by
>
> $$
> \mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i,
> $$
>
> for each basis $\{\mathbf{e}_i\}$ of $V$.
??? note "*Proof*:"
Let $\mathbf{\hat f} = f_i \mathbf{\hat e}^i \in V^*$ and let $\mathbf{v} = v^i \mathbf{e}_i \in V$, then we have
$$
\mathbf{\hat f}(\mathbf{v}) = \mathbf{\hat f}(v^i \mathbf{e}_i) = \mathbf{\hat f}(\mathbf{e}_i) v^i = \mathbf{\hat f}(\mathbf{e}_i) \mathbf{\hat e}^i(\mathbf{v}) = f_i \mathbf{\hat e}^i (\mathbf{v}),
$$
therefore $\{\mathbf{\hat e}^i\}$ spans $V^*$.
Suppose $\mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i$ and $\lambda_i \mathbf{\hat e}^i = \mathbf{0} \in V^*$, then
$$
\lambda_i = \lambda_j \delta_i^j = \lambda_j \mathbf{\hat e}^j(\mathbf{e}_i) = (\lambda_j \mathbf{\hat e}^j)(\mathbf{e}_i) = 0,
$$
for all $i \in \mathbb{N}[i \leq n]$. Showing that $\{\mathbf{\hat e}^i\}$ is a linearly independent set.
Hence the vector space $V$ and its dual space $V^*$ have the same dimension $n$.
From theorem 1 it follows that for each covector basis $\{\mathbf{\hat e}^i\}$ of $V^*$ and each $\mathbf{\hat f} \in V^*$ there exists a unique collection of numbers $\{f_i\}$ such that $\mathbf{\hat f} = f_i \mathbf{\hat e}^i$.
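For a concrete sketch (assuming $V = \mathbb{R}^3$ with NumPy and an arbitrarily chosen basis), the dual basis covectors can be represented as the rows of the inverse of the matrix whose columns are the basis vectors, since these rows return the coordinates $v^i$ and satisfy $\mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i$.

```python
import numpy as np

# Basis {e_i} of V = R^3, stored as the columns of E (an arbitrary choice).
E = np.array([[1., 1, 0],
              [0, 1, 1],
              [1, 0, 1]])

# The dual basis covectors, represented as row vectors, are the rows of E^{-1}:
# row i applied to e_j gives δ^i_j, and applied to v it returns the coordinate v^i.
E_dual = np.linalg.inv(E)

print(np.allclose(E_dual @ E, np.eye(3)))     # ê^i(e_j) = δ^i_j

v = 2 * E[:, 0] - 1 * E[:, 1] + 3 * E[:, 2]   # v with coordinates (2, -1, 3)
print(E_dual @ v)                             # recovers the coordinates v^i
```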
> *Theorem 2*: the dual of the covector space $(V^*)^* \overset{\text{def}} = V^{**}$ is isomorphic to $V$.
??? note "*Proof*:"
Will be added later.

View file

@ -0,0 +1,267 @@
# Eigenspaces
## Eigenvalues and eigenvectors
If a linear transformation is represented by an $n \times n$ matrix $A$ and there exists a nonzero vector $\mathbf{x} \in V$ such that $A \mathbf{x} = \lambda \mathbf{x}$ for some $\lambda \in \mathbb{K}$, then for this transformation $\mathbf{x}$ is a natural choice to use as a basis vector for $V$.
> *Definition 1*: let $A$ be an $n \times n$ matrix, a scalar $\lambda \in \mathbb{K}$ is defined as an **eigenvalue** of $A$ if and only if there exists a vector $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ such that
>
> $$
> A \mathbf{x} = \lambda \mathbf{x},
> $$
>
> with $\mathbf{x}$ defined as an **eigenvector** belonging to $\lambda$.
This notion can be further generalized to a linear operator $L: V \to V$ such that
$$
L(\mathbf{x}) = \lambda \mathbf{x},
$$
note that for $L(\mathbf{x}) = A \mathbf{x}$ the two formulations coincide.
Furthermore, it follows from the definition that any nonzero scalar multiple of an eigenvector of $A$ is again an eigenvector of $A$, and more generally any nonzero linear combination of eigenvectors belonging to the same eigenvalue is again an eigenvector belonging to that eigenvalue.
> *Theorem 1*: let $A$ be an $n \times n$ matrix, then a scalar $\lambda \in \mathbb{K}$ is an eigenvalue of $A$ if and only if
>
> $$
> \det (A - \lambda I) = 0.
> $$
??? note "*Proof*:"
A scalar $\lambda \in \mathbb{K}$ is an eigenvalue of $A$ if and only if there exists a vector $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ such that
$$
A \mathbf{x} = \lambda \mathbf{x},
$$
obtains
$$
A \mathbf{x} - \lambda \mathbf{x} = (A - \lambda I) \mathbf{x} = \mathbf{0},
$$
which implies that $(A - \lambda I)$ is singular and $\det(A - \lambda I) = 0$ by [definition](../determinants/#properties-of-determinants).
The eigenvalues $\lambda$ may thus be determined from the **characteristic polynomial** of degree $n$ that is obtained from $\det (A - \lambda I) = 0$. In particular, the eigenvalues are the roots of this polynomial.
> *Theorem 2*: let $A$ be an $n \times n$ matrix and let $\lambda \in \mathbb{K}$ be an eigenvalue of $A$. A vector $\mathbf{x} \in V$ is an eigenvector of $A$ corresponding to $\lambda$ if and only if
>
> $$
> \mathbf{x} \in N(A - \lambda I) \backslash \{\mathbf{0}\}.
> $$
??? note "*Proof*:"
Let $A$ be an $n \times n$ matrix. A vector $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ is an eigenvector of $A$ if and only if
$$
A \mathbf{x} = \lambda \mathbf{x},
$$
for an eigenvalue $\lambda \in \mathbb{K}$. Therefore
$$
A \mathbf{x} - \lambda \mathbf{x} = (A - \lambda I) \mathbf{x} = \mathbf{0},
$$
which implies that $\mathbf{x} \in N(A - \lambda I)$.
This implies that the eigenvectors can be obtained by determining the null space of $A - \lambda I$.
> *Definition 2*: let $L: V \to V$ be a linear operator and let $\lambda \in \mathbb{K}$ be an eigenvalue of $L$. Let the **eigenspace** $E_\lambda$ of the corresponding eigenvalue $\lambda$ be defined as
>
> $$
> E_\lambda = \{\mathbf{x} \in V \;|\; L(\mathbf{x}) = \lambda \mathbf{x}\} = N(A - \lambda I),
> $$
>
> with $L(\mathbf{x}) = A \mathbf{x}$.
It may be observed that $E_\lambda$ is a subspace of $V$ consisting of the zero vector and the eigenvectors of $L$ or $A.$
### Properties
> *Theorem 3*: if $\lambda_1, \dots, \lambda_k \in \mathbb{K}$ are distinct eigenvalues of an $n \times n$ matrix $A$ with corresponding eigenvectors $\mathbf{x}_1, \dots, \mathbf{x}_k \in V\backslash \{\mathbf{0}\}$, then $\mathbf{x}_1, \dots, \mathbf{x}_k$ are linearly independent.
??? note "*Proof*:"
Will be added later.
If $A \in \mathbb{R}^{n \times n}$ and $A \mathbf{x} = \lambda \mathbf{x}$ for some $\mathbf{x} \in V$ and $\lambda \in \mathbb{K}$, then
$$
A \mathbf{\bar x} = \overline{A \mathbf{x}} = \overline{\lambda \mathbf{x}} = \bar \lambda \mathbf{\bar x}.
$$
Hence the complex conjugate of an eigenvector of a real matrix $A$ is also an eigenvector of $A$, with eigenvalue $\bar \lambda$.
> *Theorem 4*: let $A$ be an $n \times n$ matrix and let $\lambda_1, \dots, \lambda_n \in \mathbb{K}$ be the eigenvalues of $A$, counted with multiplicity. It follows that
>
> $$
> \det (A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda),
> $$
>
> and
>
> $$
> \det (A) = \lambda_1 \lambda_2 \cdots \lambda_n.
> $$
??? note "*Proof*:"
Let $A$ be an $n \times n$ matrix and let $\lambda_1, \dots, \lambda_n \in \mathbb{K}$ be the eigenvalues of $A$, counted with multiplicity. It follows from the [fundamental theorem of algebra](../../number-theory/complex-numbers/#roots-of-polynomials) that
$$
\det (A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda),
$$
by taking $\lambda = 0$ it follows that
$$
\det (A) = \lambda_1 \lambda_2 \cdots \lambda_n.
$$
By comparing the coefficients of $\lambda^{n-1}$ on both sides of $\det (A - \lambda I) = (\lambda_1 - \lambda) \cdots (\lambda_n - \lambda)$ it also follows that
$$
\mathrm{trace}(A) = \sum_{i=1}^n \lambda_i.
$$
> *Theorem 5*: let $A$ and $B$ be $n \times n$ matrices. If $B$ is similar to $A$, then $A$ and $B$ have the same eigenvalues.
??? note "*Proof*:"
Let $A$ and $B$ be similar $n \times n$ matrices, then there exists a nonsingular matrix $S$ such that
$$
B = S^{-1} A S.
$$
Let $\lambda \in \mathbb{K}$ be an eigenvalue of $B$ then
$$
\begin{align*}
0 &= \det(B - \lambda I), \\
&= \det(S^{-1} A S - \lambda I), \\
&= \det(S^{-1}(A - \lambda I) S), \\
&= \det(S^{-1}) \det(A - \lambda I) \det(S), \\
&= \det(A - \lambda I).
\end{align*}
$$
## Diagonalization
> *Definition 3*: an $n \times n$ matrix $A$ is **diagonalizable** if there exists a nonsingular diagonalizing matrix $X$ and a diagonal matrix $D$ such that
>
> $$
> A X = X D.
> $$
We may now pose the following theorem.
> *Theorem 6*: an $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n \in \mathbb{N}$ linearly independent eigenvectors.
??? note "*Proof*:"
Will be added later.
It follows from the proof that the column vectors of the diagonalizing matrix $X$ are eigenvectors of $A$ and the diagonal elements of $D$ are the corresponding eigenvalues of $A$. If $A$ is diagonalizable, then
$$
A = X D X^{-1},
$$
it follows then that
$$
A^k = X D^k X^{-1},
$$
for $k \in \mathbb{N}$.
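A numerical sketch of the diagonalization $AX = XD$ and of $A^k = X D^k X^{-1}$ (assuming NumPy; the $2 \times 2$ matrix is an arbitrary diagonalizable example).

```python
import numpy as np

A = np.array([[4., 1], [2, 3]])                  # arbitrary diagonalizable example
eigvals, X = np.linalg.eig(A)                    # columns of X are eigenvectors of A
D = np.diag(eigvals)

print(np.allclose(A @ X, X @ D))                 # A X = X D
print(np.allclose(A, X @ D @ np.linalg.inv(X)))  # A = X D X^{-1}

k = 5
A_k = X @ np.diag(eigvals ** k) @ np.linalg.inv(X)   # A^k = X D^k X^{-1}
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))
```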
### Hermitian case
The following section is for the special case that a matrix is [Hermitian](../matrices/matrix-arithmatic/#hermitian-matrix).
> *Theorem 7*: the eigenvalues of a Hermitian matrix are real.
??? note "*Proof*:"
Let $A$ be a Hermitian matrix and let $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ be an eigenvector of $A$ with corresponding eigenvalue $\lambda \in \mathbb{C}$. We have
$$
\begin{align*}
\lambda \mathbf{x}^H \mathbf{x} &= \mathbf{x}^H (\lambda \mathbf{x}), \\
&= \mathbf{x}^H (A \mathbf{x}), \\
&= (\mathbf{x}^H A) \mathbf{x}, \\
&= (A^H \mathbf{x})^H \mathbf{x} , \\
&= (A \mathbf{x})^H \mathbf{x}, \\
&= (\lambda \mathbf{x})^H \mathbf{x}, \\
&= \bar \lambda \mathbf{x}^H \mathbf{x},
\end{align*}
$$
since $\mathbf{x}^H \mathbf{x} > 0$ it follows that $\bar \lambda = \lambda$ and hence $\lambda \in \mathbb{R}$.
> *Theorem 8*: the eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are orthogonal.
??? note "*Proof*:"
Let $A$ be a Hermitian matrix and let $\mathbf{x}_1, \mathbf{x}_2 \in V \backslash \{\mathbf{0}\}$ be two eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \lambda_2 \in \mathbb{C}[\lambda_1 \neq \lambda_2]$. We have
$$
\begin{align*}
\lambda_1 \mathbf{x}_1^H \mathbf{x}_2 &= (\lambda_1 \mathbf{x}_1)^H \mathbf{x}_2, \\
&= (A \mathbf{x}_1)^H \mathbf{x}_2, \\
&= \mathbf{x}_1^H A^H \mathbf{x}_2, \\
&= \mathbf{x}_1^H A \mathbf{x}_2, \\
&= \mathbf{x}_1^H (\lambda_2 \mathbf{x}_2), \\
&= \lambda_2 \mathbf{x}_1^H \mathbf{x}_2,
\end{align*}
$$
since $\lambda_1 \neq \lambda_2$ it must follow that $\mathbf{x}_1^H \mathbf{x}_2 = 0$, implying orthogonality with respect to the Hermitian scalar product.
Theorems 7 and 8 motivate the following definition.
> *Definition 4*: an $n \times n$ matrix $U$ is **unitary** if the column vectors of $U$ form an orthonormal set in $V$.
Thus, $U$ is unitary if and only if $U^H U = I$. Then it also follows that $U^{-1} = U^H$. A real unitary matrix is an orthogonal matrix.
One may observe that theorem 8 implies that the diagonalizing matrix of a Hermitian matrix $A$ can be chosen unitary when $A$ has distinct eigenvalues, by normalizing the eigenvectors.
> *Lemma 1*: if the eigenvalues of a Hermitian matrix $A$ are distinct, then there exists a unitary matrix $U$ and a diagonal matrix $D$ such that
>
> $$
> A U = U D.
> $$
??? note "*Proof*:"
Will be added later.
With the column vectors of $U$ the eigenvectors of $A$ and the diagonal elements of $D$ the corresponding eigenvalues of $A$.
> *Theorem 9*: let $A$ be an $n \times n$ matrix, then there exists a unitary matrix $U$ and an upper triangular matrix $T$ such that
>
> $$
> A U = U T.
> $$
??? note "*Proof*:"
Will be added later.
The factorization $A = U T U^H$ is often referred to as the *Schur decomposition* of $A$.
> *Theorem 10*: if $A$ is Hermitian, then there exists a unitary matrix $U$ and a diagonal matrix $D$ such that
>
> $$
> A U = U D.
> $$
??? note "*Proof*:"
Will be added later.
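A short numerical sketch of theorem 10 (assuming NumPy; the Hermitian matrix is an arbitrary example): `numpy.linalg.eigh` returns real eigenvalues and a unitary matrix of eigenvectors.

```python
import numpy as np

# A Hermitian example matrix (A = A^H); an assumed example, not from the notes.
A = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])

eigvals, U = np.linalg.eigh(A)                  # eigh is intended for Hermitian matrices
D = np.diag(eigvals)

print(eigvals)                                  # real eigenvalues (theorem 7)
print(np.allclose(U.conj().T @ U, np.eye(2)))   # U is unitary: U^H U = I
print(np.allclose(A @ U, U @ D))                # A U = U D (theorem 10)
```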

View file

@ -0,0 +1,219 @@
# Inner product spaces
## Definition
An introduction of length in a vector space may be formulated in terms of an inner product space.
> *Definition 1*: an **inner product** on $V$ is an operation on $V$ that assigns, to each pair of vectors $\mathbf{x},\mathbf{y} \in V$, a scalar $\langle \mathbf{x},\mathbf{y}\rangle \in \mathbb{K}$ satisfying the following conditions
>
> 1. $\langle \mathbf{x},\mathbf{x}\rangle > 0, \text{ for } \mathbf{x} \in V\backslash\{\mathbf{0}\} \text{ and } \langle \mathbf{x},\mathbf{x}\rangle = 0, \; \text{for } \mathbf{x} = \mathbf{0}$,
> 2. $\langle \mathbf{x},\mathbf{y}\rangle = \overline{\langle \mathbf{y},\mathbf{x}\rangle}, \; \forall \mathbf{x}, \mathbf{y} \in V$,
> 3. $\langle a \mathbf{x} + b \mathbf{y}, \mathbf{z}\rangle = a \langle \mathbf{x},\mathbf{z}\rangle + b \langle \mathbf{y},\mathbf{z}\rangle, \; \forall \mathbf{x}, \mathbf{y}, \mathbf{z} \in V \text{ and } a,b \in \mathbb{K}$.
A vector space $V$ with an inner product is called an **inner product space**.
### Euclidean inner product spaces
The standard inner product on the Euclidean vector spaces $V = \mathbb{R}^n$ with $n \in \mathbb{N}$ is given by the scalar product defined by
$$
\langle \mathbf{x},\mathbf{y}\rangle = \mathbf{x}^T \mathbf{y},
$$
for all $\mathbf{x},\mathbf{y} \in V$.
??? note "*Proof*:"
Will be added later.
This can be extended to matrices $V = \mathbb{R}^{m \times n}$ with $m,n \in \mathbb{N}$ for which an inner product may be given by
$$
\langle A, B\rangle = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij},
$$
for all $A, B \in V$.
??? note "*Proof*:"
Will be added later.
### Function inner product spaces
Let $V$ be a function space with a domain $X$. An inner product on $V$ may be defined by
$$
\langle f, g\rangle = \int_X \bar f(x) g(x) dx
$$
for all $f,g \in V$.
??? note "*Proof*:"
Will be added later.
### Polynomial inner product spaces
Let $V$ be the space of polynomials of degree less than $n \in \mathbb{N}$ and let $x_1, \dots, x_n \in \mathbb{K}$ be distinct numbers. An inner product on $V$ may be defined by
$$
\langle p, q \rangle = \sum_{i=1}^n \bar p(x_i) q(x_i),
$$
for all $p,q \in V$.
??? note "*Proof*:"
Will be added later.
## Properties of inner product spaces
> *Definition 2*: let $V$ be an inner product space, the Euclidean length $\|\mathbf{v}\|$ of a vector $\mathbf{v}$ is defined as
>
> $$
> \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle},
> $$
>
> for all $\mathbf{v} \in V$.
This is consistent with Euclidean geometry. The distance between two vectors $\mathbf{v}, \mathbf{w} \in V$ is then given by $\|\mathbf{v} - \mathbf{w}\|$.
> *Definition 3*: let $V$ be an inner product space, the vectors $\mathbf{u}$ and $\mathbf{v}$ are orthogonal if
>
> $$
> \langle \mathbf{u}, \mathbf{v} \rangle = 0,
> $$
>
> for all $\mathbf{u}, \mathbf{v} \in V$.
A pair of orthogonal vectors will satisfy the theorem of Pythagoras.
> *Theorem 1*: let $V$ be an inner product space, if $\mathbf{u}$ and $\mathbf{v}$ are orthogonal then
>
> $$
> \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2,
> $$
>
> for all $\mathbf{u}, \mathbf{v} \in V$.
??? note "*Proof*:"
Let $V$ be an inner product space and let $\mathbf{u}, \mathbf{v} \in V$ be orthogonal, then
$$
\begin{align*}
\|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v}\rangle, \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + 2 \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle, \\
&= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.
\end{align*}
$$
Interpreted in $\mathbb{R}^2$ this is just the familiar Pythagorean theorem.
> *Definition 4*: let $V$ be an inner product space then the **scalar projection** $a$ of $\mathbf{u}$ onto $\mathbf{v}$ is defined as
>
> $$
> a = \frac{1}{\|\mathbf{v}\|} \langle \mathbf{u}, \mathbf{v} \rangle,
> $$
>
> for all $\mathbf{u} \in V$ and $\mathbf{v} \in V \backslash \{\mathbf{0}\}$.
>
> The **vector projection** $\mathbf{p}$ of $\mathbf{u}$ onto $\mathbf{v}$ is defined as
>
> $$
> \mathbf{p} = a \bigg(\frac{1}{\|\mathbf{v}\|} \mathbf{v}\bigg) = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \mathbf{v},
> $$
>
> for all $\mathbf{u} \in V$ and $\mathbf{v} \in V \backslash \{\mathbf{0}\}$.
It may be observed that $\mathbf{u} - \mathbf{p}$ and $\mathbf{p}$ are orthogonal since $\langle \mathbf{p}, \mathbf{p} \rangle = a^2$ and $\langle \mathbf{u}, \mathbf{p} \rangle = a^2$ which implies
$$
\langle \mathbf{u} - \mathbf{p}, \mathbf{p} \rangle = \langle \mathbf{u}, \mathbf{p} \rangle - \langle \mathbf{p}, \mathbf{p} \rangle = a^2 - a^2 = 0.
$$
Additionally, it may be observed that $\mathbf{u} = \mathbf{p}$ if and only if $\mathbf{u}$ is a scalar multiple of $\mathbf{v}$; $\mathbf{u} = b \mathbf{v}$ for some $b \in \mathbb{K}$. Since
$$
\mathbf{p} = \frac{\langle b \mathbf{v}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \mathbf{v} = b \mathbf{v} = \mathbf{u}.
$$
> *Theorem 2*: let $V$ be an inner product space then
>
> $$
> | \langle \mathbf{u}, \mathbf{v} \rangle | \leq \| \mathbf{u} \| \| \mathbf{v} \|,
> $$
>
> is true for all $\mathbf{u}, \mathbf{v} \in V$, with equality holding if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent.
??? note "*Proof*:"
Let $V$ be an inner product space and let $\mathbf{u}, \mathbf{v} \in V$. If $\mathbf{v} = \mathbf{0}$, then
$$
| \langle \mathbf{u}, \mathbf{v} \rangle | = 0 = \| \mathbf{u} \| \| \mathbf{v} \|,
$$
If $\mathbf{v} \neq \mathbf{0}$, then let $\mathbf{p}$ be the vector projection of $\mathbf{u}$ onto $\mathbf{v}$. Since $\mathbf{p}$ is orthogonal to $\mathbf{u} - \mathbf{p}$ it follows that
$$
\| \mathbf{p} \|^2 + \| \mathbf{u} - \mathbf{p} \|^2 = \| \mathbf{u} \|^2,
$$
thus
$$
\frac{1}{\|\mathbf{v}\|^2} \langle \mathbf{u}, \mathbf{v} \rangle^2 = \| \mathbf{p}\|^2 = \| \mathbf{u} \|^2 - \| \mathbf{u} - \mathbf{p} \|^2,
$$
and hence
$$
\langle \mathbf{u}, \mathbf{v} \rangle^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{p}\|^2 \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 \|\mathbf{v}\|^2,
$$
therefore
$$
| \langle \mathbf{u}, \mathbf{v} \rangle | \leq \| \mathbf{u} \| \| \mathbf{v} \|.
$$
Equality holds if and only if $\mathbf{u} = \mathbf{p}$. From the above observations, this condition may be restated to linear dependence of $\mathbf{u}$ and $\mathbf{v}$.
A consequence of the Cauchy-Schwarz inequality is that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in an inner product space then
$$
-1 \leq \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \|\mathbf{v}\|} \leq 1,
$$
and hence there is a unique angle $\theta \in [0, \pi]$ such that
$$
\cos \theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \|\mathbf{v}\|}.
$$
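A small numerical sketch (assuming NumPy and the scalar product on $\mathbb{R}^3$) of the Cauchy-Schwarz inequality and of the angle defined above; the vectors are arbitrary examples.

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

inner = u @ v                                   # <u, v> = u^T v (scalar product)
norm_u, norm_v = np.sqrt(u @ u), np.sqrt(v @ v)

print(abs(inner) <= norm_u * norm_v)            # Cauchy-Schwarz inequality holds
theta = np.arccos(inner / (norm_u * norm_v))    # the unique angle in [0, pi]
print(np.degrees(theta))
```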
## Normed spaces
> *Definition 5*: a vector space $V$ is said to be a **normed linear space** if to each vector $\mathbf{v} \in V$ there is associated a real number $\| \mathbf{v} \|$ satisfying the following conditions
>
> 1. $\|\mathbf{v}\| > 0, \text{ for } \mathbf{v} \in V\backslash\{\mathbf{0}\} \text{ and } \| \mathbf{v} \| = 0, \text{ for } \mathbf{v} = \mathbf{0}$,
> 2. $\|a \mathbf{v}\| = |a| \|\mathbf{v}\|, \; \forall \mathbf{v} \in V \text{ and } a \in \mathbb{K}$,
> 3. $\| \mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \| \mathbf{w}\|, \; \forall \mathbf{v}, \mathbf{w} \in V$,
>
> is called the **norm** of $\mathbf{v}$.
The third condition is known as the *triangle inequality*.
> *Theorem 3*: let $V$ be an inner product space then
>
> $$
> \| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle},
> $$
>
> for all $\mathbf{v} \in V$ defines a norm on $V$.
??? note "*Proof*:"
Will be added later.
We therefore have that the Euclidean length (definition 2) is a norm, justifying the notation.

View file

@ -0,0 +1,126 @@
# Linear transformations
## Definition
> *Definition*: let $V$ and $W$ be vector spaces, a mapping $L: V \to W$ is a **linear transformation** or **linear map** if
>
> $$
> L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2),
> $$
>
> for all $\mathbf{v}_{1,2} \in V$ and $\lambda, \mu \in \mathbb{K}$.
A linear transformation may also be called a **vector space homomorphism**. If the linear transformation is a bijection then it may be called a **linear isomorphism**.
In the case that the vector spaces $V$ and $W$ are the same; $V=W$, a linear transformation $L: V \to V$ will be referred to as a **linear operator** on $V$ or **linear endomorphism** .
## The image and kernel
Let $L: V \to W$ be a linear transformation from a vector space $V$ to a vector space $W$. In this section the effect is considered that $L$ has on subspaces of $V$. Of particular importance is the set of vectors in $V$ that get mapped into the zero vector of $W$.
> *Definition*: let $L: V \to W$ be a linear transformation. The **kernel** of $L$, denoted by $\ker(L)$, is defined by
>
> $$
> \ker(L) = \{\mathbf{v} \in V \;|\; L(\mathbf{v}) = \mathbf{0}\}.
> $$
The kernel is therefore a set consisting of vectors in $V$ that get mapped into the zero vector of $W$.
> *Definition*: let $L: V \to W$ be a linear transformation and let $S$ be a subspace of $V$. The **image** of $S$, denoted by $L(S)$, is defined by
>
> $$
> L(S) = \{\mathbf{w} \in W \;|\; \mathbf{w} = L(\mathbf{v}) \text{ for } \mathbf{v} \in S \}.
> $$
>
> The image of the entire vector space $L(V)$, is called the **range** of $L$.
With these definitions the following theorem may be posed.
> *Theorem*: if $L: V \to W$ is a linear transformation and $S$ is a subspace of $V$, then
>
> 1. $\ker(L)$ is a subspace of $V$.
> 2. $L(S)$ is a subspace of $W$.
??? note "*Proof*:"
Let $L: V \to W$ be a linear transformation and let $S$ be a subspace of $V$.
To prove 1, let $\mathbf{v}_{1,2} \in \ker(L)$ and let $\lambda, \mu \in \mathbb{K}$. Then
$$
L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2) = \lambda \mathbf{0} + \mu \mathbf{0} = \mathbf{0},
$$
therefore $\lambda \mathbf{v}_1 + \mu \mathbf{v}_2 \in \ker(L)$ and hence $\ker(L)$ is a subspace of $V$.
To prove 2, let $\mathbf{w}_{1,2} \in L(S)$, then there exist $\mathbf{v}_{1,2} \in S$ such that $\mathbf{w}_{1,2} = L(\mathbf{v}_{1,2})$. For any $\lambda, \mu \in \mathbb{K}$ we have
$$
\lambda \mathbf{w}_1 + \mu \mathbf{w}_2 = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2) = L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2),
$$
since $\lambda \mathbf{v}_1 + \mu \mathbf{v}_2 \in S$ it follows that $\lambda \mathbf{w}_1 + \mu \mathbf{w}_2 \in L(S)$ and hence $L(S)$ is a subspace of $W$.
## Matrix representations
> *Theorem*: let $L: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, then there is an $m \times n$ matrix $A$ such that
>
> $$
> L(\mathbf{x}) = A \mathbf{x},
> $$
>
> for all $\mathbf{x} \in \mathbb{R}^n$, with the $i$th column vector of $A$ given by
>
> $$
> \mathbf{a}_i = L(\mathbf{e}_i),
> $$
>
> for the standard basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ of $\mathbb{R}^n$ and $i \in \{1, \dots, n\}$.
??? note "*Proof*:"
For $i \in \{1, \dots, n\}$, define
$$
\mathbf{a}_i = L(\mathbf{e}_i),
$$
and let
$$
A = (\mathbf{a}_1, \dots, \mathbf{a}_n).
$$
If $\mathbf{x} = x_1 \mathbf{e}_1 + \dots + x_n \mathbf{e}_n$ is an arbitrary element of $\mathbb{R}^n$, then
$$
\begin{align*}
L(\mathbf{x}) &= x_1 L(\mathbf{e}_1) + \dots + x_n L(\mathbf{e}_n), \\
&= x_1 \mathbf{a}_1 + \dots + x_n \mathbf{a}_n, \\
&= A \mathbf{x}.
\end{align*}
$$
It has therefore been established that each linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented in terms of an $m \times n$ matrix.
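A minimal sketch of this construction (assuming NumPy and a hypothetical linear transformation $L: \mathbb{R}^3 \to \mathbb{R}^2$ chosen only for illustration): the columns of $A$ are the images of the standard basis vectors.

```python
import numpy as np

# Hypothetical linear transformation L: R^3 -> R^2, chosen only for illustration:
# L(x) = (x_1 + x_2, 2 x_3 - x_1).
def L(x):
    return np.array([x[0] + x[1], 2 * x[2] - x[0]])

n = 3
# Column i of A is the image of the i-th standard basis vector: a_i = L(e_i).
A = np.column_stack([L(e) for e in np.eye(n)])

x = np.array([1.0, -2.0, 3.0])
print(np.allclose(L(x), A @ x))   # L(x) = A x for every x in R^3
print(A)
```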
> *Theorem*: let $E = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ and $F = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be two ordered bases for a vector space $V$, and let $L: V \to V$ be a linear operator on $V$, $\dim V = n \in \mathbb{N}$. Let $S$ be the $n \times n$ transition matrix representing the change from $F$ to $E$,
>
> $$
> \mathbf{e}_i = S \mathbf{f}_i,
> $$
>
> for $i \in \mathbb{N}; i\leq n$.
>
> If $A$ is the matrix representing $L$ with respect to $E$, and $B$ is the matrix representing $L$ with respect to $F$, then
>
> $$
> B = S^{-1} A S.
> $$
??? note "*Proof*:"
Will be added later.
> *Definition*: let $A$ and $B$ be $n \times n$ matrices. $B$ is said to be **similar** to $A$ if there exists a nonsingular matrix $S$ such that $B = S^{-1} A S$.
It follows from the above theorem that if $A$ and $B$ are $n \times n$ matrices representing the same operator $L$, then $A$ and $B$ are similar.

View file

@ -0,0 +1,69 @@
# Elementary matrices
> *Definition*: an *elementary* matrix is a matrix obtained from the identity matrix $I$ by performing exactly one elementary row operation.
>
> 1. An elementary matrix of type 1, $E_1$, is obtained by interchanging two rows of $I$.
> 2. An elementary matrix of type 2, $E_2$, is obtained by multiplying a row of $I$ by a nonzero constant.
> 3. An elementary matrix of type 3, $E_3$, is obtained from $I$ by adding a multiple of one row to another row.
For example the elementary matrices could be given by
$$
E_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1\end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 3\end{pmatrix}, \qquad E_3 = \begin{pmatrix}1 & 0 & 3\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.
$$
> *Theorem*: if $E$ is an elementary matrix, then $E$ is nonsingular and $E^{-1}$ is an elementary matrix of the same type.
??? note "*Proof*:"
If $E$ is the elementary matrix of type 1 formed from $I$ by interchanging the $i$th and $j$th rows, then $E$ can be transformed back into $I$ by interchanging these same rows again. Therefore, $EE = I$ and hence $E$ is its own inverse.
If $E$ is the elementary matrix of type 2 formed by multiplying the $i$th row of $I$ by a nonzero scalar $\alpha$, then $E$ can be transformed into the identity matrix by multiplying either its $i$th row or its $i$th column by $1/\alpha$.
If $E$ is the elementary matrix of type 3 formed from $I$ by adding $m$ times the $i$th row to the $j$th row, then $E$ can be transformed back into $I$ either by subtracting $m$ times the $i$th row from the $j$th row or by subtracting $m$ times the $j$th column from the $i$th column.
> *Definition*: a matrix $B$ is **row equivalent** to a matrix $A$ if there exists a finite sequence $E_1, E_2, \dots, E_k$ of elementary matrices with $k \in \mathbb{N}$ such that
>
> $$
> B = E_k E_{k-1} \cdots E_1 A.
> $$
It may be observed that row equivalence is a reflexive, symmetric and transitive relation.
> *Theorem*: let $A$ be an $n \times n$ matrix, the following are equivalent
>
> 1. $A$ is nonsingular,
> 2. $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{0}$,
> 3. $A$ is row equivalent to $I$.
??? note "*Proof*:"
Let $A$ be a nonsingular $n \times n$ matrix and let $\mathbf{\hat x}$ be a solution of $A \mathbf{x} = \mathbf{0}$, then
$$
\mathbf{\hat x} = I \mathbf{\hat x} = (A^{-1} A)\mathbf{\hat x} = A^{-1} (A \mathbf{\hat x}) = A^{-1} \mathbf{0} = \mathbf{0}.
$$
Let $U$ be the row echelon form of $A$. If one of the diagonal elements of $U$ were 0, the last row of $U$ would consist entirely of zeros. But then $A \mathbf{x} = \mathbf{0}$ would have a nontrivial solution. Thus $U$ must be a strictly triangular matrix with diagonal elements all equal to 1. It then follows that $I$ is the reduced row echelon form of $A$ and hence $A$ is row equivalent to $I$.
If $A$ is row equivalent to $I$, there exist elementary matrices $E_1, E_2, \dots, E_k$ with $k \in \mathbb{N}$ such that
$$
A = E_k E_{k-1} \cdots E_1 I = E_k E_{k-1} \cdots E_1.
$$
Since $E_i$ is invertible for $i \in \{1, \dots, k\}$ the product $E_k E_{k-1} \cdots E_1$ is also invertible, hence $A$ is nonsingular.
If $A$ is nonsingular then $A$ is row equivalent to $I$ and hence there exist elementary matrices $E_1, \dots, E_k$ such that
$$
E_k E_{k-1} \cdots E_1 A = I,
$$
multiplying both sides on the right by $A^{-1}$ obtains
$$
E_k E_{k-1} \cdots E_1 = A^{-1}
$$
which gives a method for computing $A^{-1}$.
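A sketch of this procedure with NumPy (assuming $A$ is nonsingular): the row operations that reduce $A$ to $I$ are applied to the augmented matrix $(A \,|\, I)$, so the right half ends up as $A^{-1}$.

```python
import numpy as np

def inverse_by_row_reduction(A):
    """Reduce the augmented matrix (A | I) to (I | A^{-1}) with elementary
    row operations; assumes A is nonsingular."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for k in range(n):
        p = np.argmax(np.abs(M[k:, k])) + k      # choose a nonzero pivot (type I swap)
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                          # scale the pivot row to 1 (type II)
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]           # eliminate column k elsewhere (type III)
    return M[:, n:]

A = np.array([[2., 1, 0], [1, 3, 1], [0, 1, 2]])
print(np.allclose(inverse_by_row_reduction(A) @ A, np.eye(3)))  # True
```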

View file

@ -0,0 +1,94 @@
# Matrix algebra
> *Theorem*: let $A, B$ and $C$ be matrices and $\alpha$ and $\beta$ be scalars. Each of the following statements is valid
>
> 1. $A + B = B + A$,
> 2. $(A + B) + C = A + (B + C)$,
> 3. $(AB)C = A(BC)$,
> 4. $A(B + C) = AB + AC$,
> 5. $(A + B)C = AC + BC$,
> 6. $(\alpha \beta) A = \alpha(\beta A)$,
> 7. $\alpha (AB) = (\alpha A)B = A (\alpha B)$,
> 8. $(\alpha + \beta)A = \alpha A + \beta A$,
> 9. $\alpha (A + B) = \alpha A + \alpha B$.
??? note "*Proof*:"
Will be added later.
In the case where an $n \times n$ matrix $A$ is multiplied by itself $k$ times it is convenient to use exponential notation: $AA \cdots A = A^k$.
> *Definition*: the $n \times n$ **identity matrix** is the matrix $I = (\delta_{ij})$, where
>
> $$
> \delta_{ij} = \begin{cases} 1 &\text{ if } i = j, \\ 0 &\text{ if } i \neq j.\end{cases}
> $$
So for the multiplication of an $n \times n$ matrix $A$ with the identity matrix we obtain $A I = I A = A$.
> *Definition*: an $n \times n$ matrix $A$ is said to be **nonsingular** or **invertible** if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. The matrix $A^{-1}$ is said to be a **multiplicative inverse** of $A$.
If $B$ and $C$ are both multiplicative inverses of $A$ then
$$
B = BI = B(AC) = (BA)C = IC = C,
$$
thus a matrix can have at most one multiplicative inverse.
> *Definition*: an $n \times n$ matrix is said to be **singular** if it does not have a multiplicative inverse.
Equivalently, an $n \times n$ matrix $A$ is singular if $A \mathbf{x} = \mathbf{0}$ for some nontrivial $\mathbf{x} \in \mathbb{R}^n \backslash \{\mathbf{0}\}$. For a nonsingular matrix $A$, $\mathbf{x} = \mathbf{0}$ is the only solution to $A \mathbf{x} = \mathbf{0}$.
> *Theorem*: if $A$ and $B$ are nonsingular $n \times n$ matrices, then $AB$ is also nonsingular and
>
> $$
> (AB)^{-1} = B^{-1} A^{-1}.
> $$
??? note "*Proof*:"
Let $A$ and $B$ be nonsingular $n \times n$ matrices. Then
$$
(B^{-1} A^{-1})AB = B^{-1} (A^{-1} A) B = B^{-1} B = I, \\
AB(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A A^{-1} = I,
$$
hence $AB$ is nonsingular with $(AB)^{-1} = B^{-1} A^{-1}$.
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix, the inverse of $A$ given by $A^{-1}$ is nonsingular.
??? note "*Proof*:"
Let $A$ be a nonsingular $n \times n$ matrix, $A^{-1}$ its inverse and $\mathbf{x} \in \mathbb{R}^n$ a vector. Suppose $A^{-1} \mathbf{x} = \mathbf{0}$ then
$$
\mathbf{x} = I \mathbf{x} = (A A^{-1}) \mathbf{x} = A(A^{-1} \mathbf{x}) = \mathbf{0}.
$$
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix then the solution of the system $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = A^{-1} \mathbf{b}$ with $\mathbf{x}, \mathbf{b} \in \mathbb{R}^n$.
??? note "*Proof*:"
Let $A$ be a nonsingular $n \times n$ matrix, $A^{-1}$ its inverse and $\mathbf{x}, \mathbf{b} \in \mathbb{R}^n$ vectors. Suppose $\mathbf{x} = A^{-1} \mathbf{b}$ then we have
$$
A \mathbf{x} = A (A^{-1} \mathbf{b}) = (A A^{-1}) \mathbf{b} = \mathbf{b}.
$$
> *Corollary*: the system $A \mathbf{x} = \mathbf{b}$ of $n$ linear equations in $n$ unknowns has a unique solution if and only if $A$ is nonsingular.
??? note "*Proof*:"
The proof follows from the above theorem.
> *Theorem*: let $A$ and $B$ be matrices and $\alpha$ and $\beta$ be scalars. Each of the following statements is valid
>
> 1. $(A^T)^T = A$,
> 2. $(\alpha A)^T = \alpha A^T$,
> 3. $(A + B)^T = A^T + B^T$,
> 4. $(AB)^T = B^T A^T$.
??? note "*Proof*:"
Will be added later.

View file

@ -0,0 +1,105 @@
# Matrix arithmetic
## Definitions
> *Definition*: let $A$ be an $m \times n$ *matrix* given by
>
> $$
> A = \begin{pmatrix} a_{11} & a_{12}& \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
> $$
>
> with $a_{ij}$ referred to as the entries of $A$ or scalars in general, with $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$. For real entries in $A$ we may denote $A \in \mathbb{R}^{m \times n}$.
This matrix may be denoted in a shorter way by $A = (a_{ij})$.
> *Definition*: let $\mathbf{x}$ be a $1 \times n$ matrix, referred to as a *row vector*, given by
>
> $$
> \mathbf{x} = (x_1, x_2, \dots, x_n)
> $$
>
> with $x_i$ referred to as the entries of $\mathbf{x}$, with $i \in \{1, \dots, n\}$. For real entries we may denote $\mathbf{x} \in \mathbb{R}^n$.
<br>
> *Definition*: let $\mathbf{x}$ be an $n \times 1$ matrix, referred to as a *column vector*, given by
>
> $$
> \mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix}
> $$
>
> with $x_i$ referred to as the entries of $\mathbf{x}$, with $i \in \{1, \dots, n\}$. Also for the column vector we have for real entries $\mathbf{x} \in \mathbb{R}^n$.
From these two definitions it may be observed that row and column vectors may be used interchangeably, however when using both it is important to state the difference. Best practice is to consistently work with column vectors, as in the rest of these notes, and take the transpose when a row vector is needed.
## Matrix operations
> *Definition*: two $m \times n$ matrices $A$ and $B$ are said to be **equal** if $a_{ij} = b_{ij}$ for each $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$.
<br>
> *Definition*: if $A$ is an $m \times n$ matrix and $\alpha$ is a scalar, then $\alpha A$ is the $m \times n$ matrix whose $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$ entry is $\alpha a_{ij}$.
<br>
> *Definition*: if $A = (a_{ij})$ and $B = (b_{ij})$ are both $m \times n$ matrices, then the sum $A + B$ is the $m \times n$ matrix whose $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$ entry is $a_{ij} + b_{ij}$ for each ordered pair $(i,j)$.
If $A$ is an $m \times n$ matrix and $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then
$$
A \mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n
$$
with $A = (\mathbf{a_1}, \mathbf{a_2}, \dots, \mathbf{a_n})$.
> *Definition*: if $\mathbf{a_1}, \mathbf{a_2}, \dots, \mathbf{a_n}$ are vectors in $\mathbb{R}^m$ and $x_1, x_2, \dots, x_n$ are scalars, then a sum of the form
>
> $$
> x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n
> $$
>
> is said to be a **linear combination** of the vectors $\mathbf{a_1}, \mathbf{a_2}, \dots, \mathbf{a_n}$.
<br>
> *Theorem*: a linear system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ can be written as a linear combination of the column vectors of $A$.
??? note "*Proof*:"
Will be added later.
## Transpose matrix
> *Definition*: the **transpose** of an $m \times n$ matrix A is the $n \times m$ matrix $B$ defined by
>
> $$
> b_{ji} = a_{ij},
> $$
>
> for $j \in \{1, \dots, n\}$ and $i \in \{1, \dots m\}$. The transpose of $A$ is denoted by $A^T$.
<br>
> *Definition*: an $n \times n$ matrix $A$ is said to be **symmetric** if $A^T = A$.
## Hermitian matrix
> *Definition*: the **conjugate transpose** of an $m \times n$ matrix A is the $n \times m$ matrix $B$ defined by
>
> $$
> b_{ji} = \bar a_{ij},
> $$
>
> for $j \in \{1, \dots, n\}$ and $i \in \{1, \dots m\}$. The **conjugate transpose** of $A$ is denoted by $A^H$.
<br>
> *Definition*: an $n \times n$ matrix $A$ is said to be **Hermitian** if $A^H = A$.
## Matrix multiplication
> *Definition*: if $A = (a_{ij})$ is an $m \times n$ matrix and $B = (b_{ij})$ is an $n \times r$ matrix, then the product $A B = C = (c_{ij})$ is the $m \times r$ matrix whose entries are defined by
>
> $$
> c_{ij} = \mathbf{a}_i \mathbf{b}_j = \sum_{k=1}^n a_{ik} b_{kj},
> $$
>
> with $\mathbf{a}_i$ the $i$th row vector of $A$ and $\mathbf{b}_j$ the $j$th column vector of $B$.
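As an illustration of this definition (assuming NumPy for the comparison), a direct implementation of the entry formula; the matrices are arbitrary examples.

```python
import numpy as np

def matmul(A, B):
    """Matrix product from the definition: c_ij = sum_k a_ik b_kj."""
    m, n = A.shape
    n_b, r = B.shape
    assert n == n_b, "inner dimensions must agree"
    C = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(n))
    return C

A = np.array([[1., 2, 3], [4, 5, 6]])        # 2 x 3
B = np.array([[7., 8], [9, 10], [11, 12]])   # 3 x 2
print(np.allclose(matmul(A, B), A @ B))      # True
```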

View file

@ -0,0 +1,467 @@
# Orthogonality
## Orthogonal subspaces
> *Definition 1*: two subspaces $S$ and $T$ of an inner product space $V$ are **orthogonal** if
>
> $$
> \langle \mathbf{u}, \mathbf{v} \rangle = 0,
> $$
>
> for all $\mathbf{u} \in S$ and $\mathbf{v} \in T$. Orthogonality of $S$ and $T$ may be denoted by $S \perp T$.
The notion of orthogonality is only valid in vector spaces with a defined inner product.
> *Definition 2*: let $S$ be a subspace of an inner product space $V$. The set of all vectors in $V$ that are orthogonal to every vector in $S$ will be denoted by $S^\perp$. Which implies
>
> $$
> S^\perp = \{\mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \; \forall \mathbf{u} \in S \}.
> $$
>
> The set $S^\perp$ is called the **orthogonal complement** of $S$.
For example the subspaces $X = \mathrm{span}(\mathbf{e}_1)$ and $Y = \mathrm{span}(\mathbf{e}_2)$ of $\mathbb{R}^3$ are orthogonal, but they are not orthogonal complements. Indeed,
$$
X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \quad \text{and} \quad Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3).
$$
We may observe that if $S$ and $T$ are orthogonal subspaces of an inner product space $V$, then $S \cap T = \{\mathbf{0}\}$. Since for $\mathbf{v} \in S \cap T$ and $S \perp T$ then $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ and hence $\mathbf{v} = \mathbf{0}$.
Additionally, we may also observe that if $S$ is a subspace of an inner product space $V$, then $S^\perp$ is also a subspace of $V$. Since for $\mathbf{u} \in S^\perp$ and $a \in \mathbb{K}$ then
$$
\langle a \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0
$$
for all $\mathbf{v} \in S$, therefore $a \mathbf{u} \in S^\perp$.
If $\mathbf{u}_1, \mathbf{u}_2 \in S^\perp$ then
$$
\langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0 + 0 = 0,
$$
for all $\mathbf{v} \in S$, and hence $\mathbf{u}_1 + \mathbf{u}_2 \in S^\perp$. Therefore $S^\perp$ is a subspace of $V$.
### Fundamental subspaces
Let $V$ be an Euclidean inner product space $V = \mathbb{R}^n$ with its inner product defined by the [scalar product](../inner-product-spaces/#euclidean-inner-product-spaces). With this definition of the inner product on $V$ the following theorem may be posed.
> *Theorem 1*: let $A$ be an $m \times n$ matrix, then
>
> $$
> N(A) = R(A^T)^\perp,
> $$
>
> and
>
> $$
> N(A^T) = R(A)^\perp,
> $$
>
> for all $A \in \mathbb{R}^{m \times n}$ with $R(A)$ denoting the column space of $A$ and $R(A^T)$ denoting the row space of $A$.
??? note "*Proof*:"
Let $A \in \mathbb{R}^{m \times n}$, with $R(A) = \mathrm{span}(\mathbf{a}_1, \dots, \mathbf{a}_n)$ denoting the column space of $A$, spanned by its column vectors $\mathbf{a}_j$ for $j \in \mathbb{N}[j \leq n]$, and $R(A^T) = \mathrm{span}(\mathbf{\vec{a}}_1^T, \dots, \mathbf{\vec{a}}_m^T)$ denoting the row space of $A$, spanned by its transposed row vectors $\mathbf{\vec{a}}_i^T$ for $i \in \mathbb{N}[i \leq m]$.
For the first equation, let $\mathbf{v} \in R(A^T)^\perp$ then $\mathbf{v}^T \mathbf{\vec{a}}_i^T = \mathbf{0}$ which obtains
$$
\mathbf{0} = \mathbf{v}^T \mathbf{\vec{a}}_i^T = \big(\mathbf{v}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{\vec{a}}_i \mathbf{v},
$$
so $A \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A)$. Which implies that $R(A^T)^\perp \subseteq N(A)$. Similarly, let $\mathbf{w} \in N(A)$ then $A \mathbf{w} = \mathbf{0}$ which obtains
$$
\mathbf{0} = \mathbf{\vec{a}}_i \mathbf{w} = \big(\mathbf{w}^T \mathbf{\vec{a}}_i^T \big)^T = \mathbf{w}^T \mathbf{\vec{a}}_i^T,
$$
and hence $\mathbf{w} \in R(A^T)^\perp$ which implies that $N(A) \subseteq R(A^T)^\perp$. Therefore $N(A) = R(A^T)^\perp$.
For the second equation, let $\mathbf{v} \in R(A)^\perp$ then $\mathbf{v}^T \mathbf{a}_i = \mathbf{0}$ which obtains
$$
\mathbf{0} = \mathbf{v}^T \mathbf{a}_i = \big(\mathbf{v}^T \mathbf{a}_i \big)^T = \mathbf{a}_i^T \mathbf{v},
$$
so $A^T \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A^T)$. Which implies that $R(A)^\perp \subseteq N(A^T)$. Similarly, let $\mathbf{w} \in N(A^T)$ then $A^T \mathbf{w} = \mathbf{0}$ which obtains
$$
\mathbf{0} = \mathbf{a}_i^T \mathbf{w} = \big(\mathbf{a}_i^T \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{a}_i,
$$
and hence $\mathbf{w} \in R(A)^\perp$ which implies that $N(A^T) \subseteq R(A)^\perp$. Therefore $N(A^T) = R(A)^\perp$.
This result is known as the fundamental theorem of linear algebra and can be used to prove the following theorem.
> *Theorem 2*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> \dim S + \dim S^\perp = n.
> $$
>
> Furthermore, if $\{\mathbf{v}_i\}_{i=1}^r$ is a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ is a basis of $S^\perp$ then $\{\mathbf{v}_i\}_{i=1}^n$ is a basis of $V$.
??? note "*Proof*:"
If $S = \{\mathbf{0}\}$, then $S^\perp = V$ and
$$
\dim S + \dim S^\perp = 0 + n = n.
$$
If $S \neq \{\mathbf{0}\}$, then let $\{\mathbf{x}_i\}_{i=1}^r$ be a basis of $S$ and define $X \in \mathbb{R}^{r \times n}$ whose $i$th row is $\mathbf{x}_i^T$ for each $i$. Matrix $X$ has rank $r$ and $R(X^T) = S$. Then by theorem 1
$$
S^\perp = R(X^T)^\perp = N(X),
$$
from the [rank nullity theorem](../vector-spaces/#rank-and-nullity) it follows that
$$
\dim S^\perp = \dim N(X) = n - r,
$$
and therefore
$$
\dim S + \dim S^\perp = r + n - r = n.
$$
Let $\{\mathbf{v}_i\}_{i=1}^r$ be a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ be a basis of $S^\perp$. Suppose that
$$
c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r + c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n = \mathbf{0}.
$$
Let $\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r$ and let $\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n$. Then we have
$$
\mathbf{u} + \mathbf{w} = \mathbf{0},
$$
implies $\mathbf{u} = - \mathbf{w}$ and thus both elements must be in $S \cap S^\perp$. However, $S \cap S^\perp = \{\mathbf{0}\}$, therefore
$$
\begin{align*}
c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r &= \mathbf{0}, \\
c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n &= \mathbf{0},
\end{align*}
$$
since $\{\mathbf{v}_i\}_{i=1}^r$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ are linearly independent, we must also have that $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent and therefore form a basis of $V$.
We may further extend this with the notion of a direct sum.
> *Definition 3*: if $U$ and $V$ are subspaces of a vector space $W$ and each $\mathbf{w} \in W$ can be written uniquely as
>
> $$
> \mathbf{w} = \mathbf{u} + \mathbf{v},
> $$
>
> with $\mathbf{u} \in U$ and $\mathbf{v} \in V$ then $W$ is a **direct sum** of U and $V$ denoted by $W = U \oplus V$.
In the following theorem it will be posed that the direct sum of a subspace and its orthogonal complement make up the whole vector space, which extends the notion of theorem 2.
> *Theorem 3*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> V = S \oplus S^\perp.
> $$
??? note "*Proof*:"
Will be added later.
The following results emerge from these posed theorems.
> *Proposition 1*: let $S$ be a subspace of $V$, then $(S^\perp)^\perp = S$.
??? note "*Proof*:"
Will be added later.
Recall that the system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b} \in R(A)$. Since $R(A) = N(A^T)^\perp$, we have the following result.
> *Proposition 2*: let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then either there is a vector $\mathbf{x} \in \mathbb{R}^n$ such that
>
> $$
> A \mathbf{x} = \mathbf{b},
> $$
>
> or there is a vector $\mathbf{y} \in \mathbb{R}^m$ such that
>
> $$
> A^T \mathbf{y} = \mathbf{0} \;\land\; \mathbf{y}^T \mathbf{b} \neq 0 .
> $$
??? note "*Proof*:"
Will be added later.
## Orthonormal sets
In working with an inner product space $V$, it is generally desirable to have a basis of mutually orthogonal unit vectors.
> *Definition 4*: the set of vectors $\{\mathbf{v}_i\}_{i=1}^n$ in an inner product space $V$ is **orthogonal** if
>
> $$
> \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0,
> $$
>
> whenever $i \neq j$. Then $\{\mathbf{v}_i\}_{i=1}^n$ is said to be an **orthogonal set** of vectors.
For example the set of standard basis vectors $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is an orthogonal set in $\mathbb{R}^3$.
> *Theorem 4*: if $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$, then $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent.
??? note "*Proof*:"
Suppose that $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$ and
$$
c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0},
$$
then
$$
c_1 \langle \mathbf{v}_j, \mathbf{v}_1 \rangle + \dots + c_n \langle \mathbf{v}_j, \mathbf{v}_n \rangle = 0,
$$
for $j \in \mathbb{N}[j \leq n]$, which obtains $c_j \|\mathbf{v}_j\|^2 = 0$ and hence $c_j = 0$ for all $j \in \mathbb{N}[j \leq n]$.
We may even go further and define a set of vectors that are orthogonal and have a length of $1$, a unit vector by definition.
> *Definition 5*: an **orthonormal** set of vectors is an orthogonal set of unit vectors.
For example the set $\{\mathbf{u}_i\}_{i=1}^n$ will be orthonormal if and only if
$$
\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij},
$$
where
$$
\delta_{ij} = \begin{cases} 1 &\text{ for } i = j, \\ 0 &\text{ for } i \neq j.\end{cases}
$$
> *Theorem 5*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If
>
> $$
> \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
> $$
>
> then $c_i = \langle \mathbf{v}, \mathbf{u}_i \rangle$ for all $i \in \mathbb{N}[i \leq n]$.
??? note "*Proof*:"
Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let
$$
\mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
$$
we have
$$
\langle \mathbf{v}, \mathbf{u}_i \rangle = \Big\langle \sum_{j=1}^n c_j \mathbf{u}_j, \mathbf{u}_i \Big\rangle = \sum_{j=1}^n c_j \langle \mathbf{u}_j, \mathbf{u}_i \rangle = \sum_{j=1}^n c_j \delta_{ij} = c_i.
$$
This implies that it is much easier to calculate the coordinates of a given vector with respect to an orthonormal basis.
> *Corollary 1*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If
>
> $$
> \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
> $$
>
> and
>
> $$
> \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
> $$
>
> then $\langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^n a_i b_i$.
??? note "*Proof*:"
Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let
$$
\mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
$$
and
$$
\mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
$$
by theorem 5 we have
$$
\langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_{i=1}^n a_i \mathbf{u}_i, \mathbf{w} \Big\rangle = \sum_{i=1}^n a_i \langle \mathbf{w}, \mathbf{u}_i \rangle = \sum_{i=1}^n a_i b_i.
$$
> *Corollary 2*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and
>
> $$
> \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
> $$
>
> then
>
> $$
> \|\mathbf{v}\|^2 = \sum_{i=1}^n c_i^2.
> $$
??? note "*Proof*:"
Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let
$$
\mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
$$
then by corollary 1 we have
$$
\|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^n c_i^2.
$$
### Orthogonal matrices
> *Definition 6*: an $n \times n$ matrix $Q$ is an **orthogonal matrix** if
>
> $$
> Q^T Q = I.
> $$
Orthogonal matrices have column vectors that form an orthonormal set in $V$, as may be posed in the following theorem.
> *Theorem 6*: let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix, then $\{\mathbf{q}_i\}_{i=1}^n$ is an orthonormal set.
??? note "*Proof*:"
Let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix. Then
$$
Q^T Q = I,
$$
and hence $\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}$ such that for an inner product space with a scalar product we have
$$
\langle \mathbf{q}_i, \mathbf{q}_j \rangle = \delta_{ij},
$$
for $i, j \in \mathbb{N}[i,j \leq n]$, so the column vectors $\{\mathbf{q}_i\}_{i=1}^n$ form an orthonormal set.
It follows then that if $Q$ is an orthogonal matrix, then $Q$ is nonsingular and $Q^{-1} = Q^T$.
In general scalar products are preserved under multiplication by an orthogonal matrix since
$$
\langle Q \mathbf{u}, Q \mathbf{v} \rangle = (Q \mathbf{v})^T Q \mathbf{u} = \mathbf{v}^T Q^T Q \mathbf{u} = \langle \mathbf{u}, \mathbf{v} \rangle.
$$
In particular, if $\mathbf{u} = \mathbf{v}$ then $\|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2$ and hence $\|Q \mathbf{u}\| = \|\mathbf{u}\|$. Multiplication by an orthogonal matrix preserves the lengths of vectors.
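A brief numerical check of these properties (a sketch assuming NumPy; the orthogonal matrix is obtained from a QR factorisation of a random matrix, an illustrative construction rather than anything prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# An orthogonal matrix Q from the QR factorisation of a random 4 x 4 matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))

u = rng.standard_normal(4)
v = rng.standard_normal(4)

assert np.allclose(Q.T @ Q, np.eye(4))                       # Q^T Q = I
assert np.isclose((Q @ u) @ (Q @ v), u @ v)                  # scalar products preserved
assert np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u))  # lengths preserved
```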
## Orthogonalization process
Let $\{\mathbf{a}_i\}_{i=1}^n$ be a basis of an inner product space $V$. We may use the modified method of Gram-Schmidt to determine the orthonormal basis $\{\mathbf{q}_i\}_{i=1}^n$ of $V$.
Let $\mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1$ be the first step.
Then we may induce the following step for $i \in \{2, \dots, n\}$:
$$
\begin{align*}
\mathbf{w} &= \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_1 \rangle \mathbf{q}_1 - \dots - \langle \mathbf{a}_i, \mathbf{q}_{i-1} \rangle \mathbf{q}_{i-1}, \\
\mathbf{q}_i &= \frac{1}{\|\mathbf{w}\|} \mathbf{w}.
\end{align*}
$$
??? note "*Proof*:"
Will be added later.
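As a concrete sketch of the steps above (assuming the standard scalar product on $\mathbb{R}^m$, linearly independent columns and NumPy; the matrix below is a hypothetical example):

```python
import numpy as np

def gram_schmidt(A):
    # Orthonormalise the columns a_1, ..., a_n of A following the steps above;
    # A is assumed to have linearly independent columns.
    m, n = A.shape
    Q = np.zeros((m, n))
    for i in range(n):
        w = A[:, i].copy()
        for j in range(i):                    # subtract the projections on q_1, ..., q_{i-1}
            w -= (A[:, i] @ Q[:, j]) * Q[:, j]
        Q[:, i] = w / np.linalg.norm(w)       # normalise
    return Q

# Hypothetical basis of a three-dimensional subspace of R^4.
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
Q = gram_schmidt(A)
assert np.allclose(Q.T @ Q, np.eye(3))        # the columns q_1, q_2, q_3 are orthonormal
```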
## Least squares solutions of overdetermined systems
A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points, which means that the sum of squares of the errors between the model and the data points is minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations.
Consider a system of equations $A \mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{m \times n}$, $m, n \in \mathbb{N}[m>n]$ and $\mathbf{b} \in \mathbb{R}^m$. For each $\mathbf{x} \in \mathbb{R}^n$ a *residual* $\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m$ can be formed by
$$
\mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}.
$$
The distance between $\mathbf{b}$ and $A \mathbf{x}$ is given by
$$
\| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|.
$$
We wish to find a vector $\mathbf{x} \in \mathbb{R}^n$ for which $\|\mathbf{r}(\mathbf{x})\|$ is a minimum. A solution $\mathbf{\hat x}$ that minimizes $\|\mathbf{r}(\mathbf{x})\|$ is a *least squares solution* of the system $A \mathbf{x} = \mathbf{b}$. Do note that minimizing $\|\mathbf{r}(\mathbf{x})\|$ is equivalent to minimizing $\|\mathbf{r}(\mathbf{x})\|^2$.
> *Theorem 7*: let $S$ be a subspace of $\mathbb{R}^m$. For each $\mathbf{b} \in \mathbb{R}^m$, there exists a unique $\mathbf{p} \in S$ that satisfies
>
> $$
> \|\mathbf{b} - \mathbf{s}\| > \|\mathbf{b} - \mathbf{p}\|,
> $$
>
> for all $\mathbf{s} \in S\backslash\{\mathbf{p}\}$; moreover $\mathbf{b} - \mathbf{p} \in S^\perp$.
??? note "*Proof*:"
Will be added later.
If $\mathbf{p} = A \mathbf{\hat x}$ is the element of $R(A)$ that is closest to $\mathbf{b}$, then it follows that
$$
\mathbf{b} - \mathbf{p} = \mathbf{b} - A \mathbf{\hat x} = \mathbf{r}(\mathbf{\hat x}),
$$
must be an element of $R(A)^\perp$. Thus, $\mathbf{\hat x}$ is a solution to the least squares problem if and only if
$$
\mathbf{r}(\mathbf{\hat x}) \in R(A)^\perp = N(A^T).
$$
Thus, since $\mathbf{r}(\mathbf{\hat x}) \in N(A^T)$ means $A^T (\mathbf{b} - A \mathbf{\hat x}) = \mathbf{0}$, solving for $\mathbf{\hat x}$ amounts to solving the *normal equations* given by
$$
A^T A \mathbf{x} = A^T \mathbf{b}.
$$
Uniqueness of $\mathbf{\hat x}$ is obtained if $A^T A$ is nonsingular, as posed in the following theorem.
> *Theorem 8*: let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$, then $A^T A$ is nonsingular.
??? note "*Proof*:"
Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$. Let $\mathbf{v}$ be a solution of
$$
A^T A \mathbf{x} = \mathbf{0},
$$
then $A \mathbf{v} \in N(A^T)$, but we also have that $A \mathbf{v} \in R(A) = N(A^T)^\perp$. Since $N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}$ it follows that
$$
A\mathbf{v} = \mathbf{0},
$$
    so $\mathbf{v} = \mathbf{0}$, since the columns of $A$ are linearly independent by the rank assumption. Hence $A^T A \mathbf{x} = \mathbf{0}$ has only the trivial solution and $A^T A$ is nonsingular.
It follows that
$$
\mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b},
$$
is the unique solution of the normal equations when $A$ has rank $n$ and, consequently, the unique least squares solution of the system $A \mathbf{x} = \mathbf{b}$.
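A minimal numerical sketch of the normal equations (assuming NumPy; the data points are hypothetical). In practice a QR-based routine such as `numpy.linalg.lstsq` is usually preferred over forming $A^T A$ explicitly; it is used here only as a cross-check:

```python
import numpy as np

# Hypothetical overdetermined system: fit a line b ≈ x_1 + x_2 t through five data points.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 1.9, 3.2, 3.8, 5.1])
A = np.column_stack([np.ones_like(t), t])     # 5 x 2, rank 2

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual r(x_hat) = b - A x_hat lies in N(A^T).
r = b - A @ x_hat
assert np.allclose(A.T @ r, 0.0)

# Cross-check against NumPy's least squares routine.
assert np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0])
```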
# Systems of linear equations
> *Definition*: a *linear equation* in $n$ unknowns is an equation of the form
>
> $$
> a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b,
> $$
>
> with $a_i, b \in \mathbb{C}$ the constants and $x_i \in \mathbb{C}$ the variables for $i \in \{1, \dots, n\}$.
>
> A *linear system* of $m$ equations in $n$ unknowns is then a $m \times n$ system of the form
>
> $$
> \begin{align*}
> &a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1, \\
> &a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2, \\
> &\vdots \\
> &a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m,
> \end{align*}
> $$
>
> with $a_{ij}, b_i \in \mathbb{C}$ for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$.
A system of linear equations may have one solution, no solution or infinitely many solutions. Think of two lines in Euclidean space that may intersect at one point (one solution), be parallel (no solution) or be the same line (infinitely many solutions). If the system has at least one solution it is referred to as consistent; if it has none it is referred to as inconsistent.
> *Definition*: two systems of equations involving the same variables are to be **equivalent** if they have the same solution set.
A system may be transformed into an equivalent system by
1. changing the order of the equations,
2. multiplying an equation by a non-zero number,
3. and adding a multiple of an equation to another equation.
> *Definition*: a linear system is said to be *overdetermined* if there are more equations than unknowns. A linear system is said to be *underdetermined* if the opposite is true, there are fewer equations than unknowns.
Overdetermined systems are usually inconsistent and a consistent underdetermined system always has infinitely many solutions.
> *Definition*: a $n \times n$ system is said to be in **strict triangular form** if in the $k$th equation the coefficients of the first $k-1$ variables are all zero and the coefficient of $x_k$ is nonzero for $k \in \{1, \dots, n\}$ with $n \in \mathbb{N}$.
For example the system given by
$$
\begin{align*}
3x_1 + 2x_2 + x_3 &= 1, \\
x_2 - x_3 &= 2, \\
2x_3 &= 4,
\end{align*}
$$
with $x_i \in \mathbb{C}$ for $i \in \{1,2,3\}$ is in strict triangular form. This system can be solved with *back substitution* by finding $x_3 = 2$, then $x_2 = 4$ and $x_1 = -3$.
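The back substitution itself is easily mechanised; the following is a minimal sketch over the reals (assuming NumPy), applied to the example above:

```python
import numpy as np

def back_substitution(U, b):
    # Solve U x = b for an n x n system in strict triangular form
    # (upper triangular with nonzero diagonal entries).
    n = len(b)
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - U[k, k + 1:] @ x[k + 1:]) / U[k, k]
    return x

U = np.array([[3.0, 2.0, 1.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])
print(back_substitution(U, b))   # [-3.  4.  2.]
```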
An $m \times n$ system of equations may be represented by an augmented matrix of the form
$$
\left( \begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m\end{array} \right)
$$
with $a_{ij}, b_i \in \mathbb{C}$ for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$.
It may be solved using the following elementary row operations
1. interchange two rows,
2. multiply a row by a nonzero real number,
3. and replace a row by its sum with a multiple of another row.
which are based on the equivalence transformations listed above.
## Row echelon form
> *Definition*: a matrix is said to be in **row echelon form**
>
> * if the first nonzero entry in each nonzero row is 1, the pivots.
> * if row $k$ does not consist entirely of zeros, the number of leading zero entries in row $k+1$ is greater than the number of leading zero entries in row $k$.
> * if there are rows whose entries are all zero, they are below the rows having nonzero entries.
For example the following matrices are in row echelon form:
$$
\begin{pmatrix} 1 & 4 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1\end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 0\end{pmatrix}, \qquad \begin{pmatrix} 1 & 3 & 1 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0\end{pmatrix}.
$$
> *Definition*: the process of using row operations 1, 2 and 3 to transform a linear system into one whose augmented matrix is in row echelon form is called **Gaußian elimination**, obtaining a reduced matrix. The variables corresponding to the pivots of the reduced matrix are referred to as *lead variables* and the variables corresponding to the columns skipped in the process are referred to as *free variables*.
## Reduced row echelon form
> *Definition*: a matrix is said to be in **reduced row echelon form**
>
> * if the matrix is in row echelon form.
> * if the first nonzero entry in each row is the only nonzero entry in its column.
For example the following matrices are in reduced row echelon form:
$$
\begin{pmatrix}
1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1
\end{pmatrix}, \qquad \begin{pmatrix}
1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1
\end{pmatrix}, \qquad \begin{pmatrix}
0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0
\end{pmatrix}.
$$
The process of using elementary row operations to transform a matrix into reduced row echelon form is called *Gauß-Jordan reduction*.
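A plain sketch of Gauß-Jordan reduction over the reals (assuming NumPy; choosing the largest pivot in each column is a numerical convenience, not part of the definition):

```python
import numpy as np

def rref(M, tol=1e-12):
    # Transform a copy of M into reduced row echelon form
    # using the three elementary row operations.
    A = M.astype(float).copy()
    rows, cols = A.shape
    pivot_row = 0
    for j in range(cols):
        if pivot_row >= rows:
            break
        p = pivot_row + np.argmax(np.abs(A[pivot_row:, j]))  # pick a pivot (row interchange)
        if abs(A[p, j]) < tol:
            continue                                         # no pivot: free variable column
        A[[pivot_row, p]] = A[[p, pivot_row]]
        A[pivot_row] /= A[pivot_row, j]                      # make the pivot equal to 1
        for i in range(rows):
            if i != pivot_row:
                A[i] -= A[i, j] * A[pivot_row]               # clear the rest of the column
        pivot_row += 1
    return A

# Augmented matrix of the strict triangular example above.
aug = np.array([[3.0, 2.0, 1.0, 1.0],
                [0.0, 1.0, -1.0, 2.0],
                [0.0, 0.0, 2.0, 4.0]])
print(rref(aug))   # last column contains the solution (-3, 4, 2)
```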
## Homogeneous systems
> *Definition*: a system of linear equations is said to be *homogeneous* if the constants on the right-hand side are all zero.
Homogeneous systems are always consistent, due to their trivial solution: setting all the variables equal to zero.
> *Theorem*: an $m \times n$ homogeneous system of linear equations has a nontrivial solution if $n > m$.
??? note "*Proof*:"
    Since a homogeneous system is always consistent, the row echelon form of the matrix can have at most $m$ nonzero rows. Thus there are at most $m$ lead variables. Since there are $n$ variables altogether and $n > m$, there must be some free variables. The free variables can be assigned arbitrary values; for each assignment of values to the free variables, there is a solution of the system.
# Tensor formalism
We have a finite dimensional vector space $V$ with $\dim V = n$ for $n \in \mathbb{N}$, with a basis $\{\mathbf{e}_i\}_{i=1}^n$ and a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}.$ In the following sections we make use of the Einstein summation convention introduced in [vector analysis](/en/physics/mathematical-physics/vector-analysis/curvilinear-coordinates/) and $\mathbb{K} = \mathbb{R} \lor\mathbb{K} = \mathbb{C}.$
## Definition
> *Definition 1*: a **tensor** is a multilinear mapping of the type
>
> $$
> \mathbf{T}: \underbrace{V^* \times \dots \times V^*}_p \times \underbrace{V \times \dots \times V}_q \to \mathbb{K},
> $$
>
> with $p, q \in \mathbb{N}$. Tensors are collectively denoted as
>
> $$
> \mathbf{T} \in \underbrace{V \otimes \dots \otimes V}_p \otimes \underbrace{V^* \otimes \dots \otimes V^*}_q = \mathscr{T}_q^p(V),
> $$
>
> with $\mathscr{T}_0^0(V) = \mathbb{K}$.
We refer to $\mathbf{T} \in \mathscr{T}_q^p(V)$ as a $(p, q)$-tensor; a mixed tensor of **contravariant rank** $p$ and **covariant rank** $q.$ It may be observed that we have $\dim \mathscr{T}_q^p (V) = n^{p+q}$ with $\dim V = n \in \mathbb{N}$.
It follows from definition 1 and by virtue of the isomorphism between $V^{**}$ and $V$ that $\mathbf{T} \in \mathscr{T}_1^0(V) = V^*$ is a covector and $\mathbf{T} \in \mathscr{T}_0^1(V) = V$ is a vector.
## Kronecker tensor
> *Definition 2*: let the **Kronecker tensor** $\mathbf{k} \in \mathscr{T}_1^1(V)$ be defined such that
>
> $$
> \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_j) = \delta^i_j,
> $$
>
> with $\delta_j^i$ the Kronecker symbol.
Let $\mathbf{\hat u} = u_i \mathbf{\hat e}^i \in V^*$ and $\mathbf{v} = v^j \mathbf{e}_j \in V$ then the tensor properties and the definition of the Kronecker tensor imply that
$$
\begin{align*}
\mathbf{k}(\mathbf{\hat u}, \mathbf{v}) &= \mathbf{k}(u_i \mathbf{\hat e}^i, v^j \mathbf{e}_j), \\
&= u_i v^j \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_j), \\
&= u_i v^j \delta^i_j, \\
&= u_i v^i.
\end{align*}
$$
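A small numerical check of this contraction (a sketch assuming NumPy; the holors below are hypothetical):

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0])   # holor u_i of a covector
v = np.array([4.0, 0.0, 2.0])    # holor v^j of a vector

# k(u, v) = u_i v^j delta^i_j reduces to the contraction u_i v^i.
delta = np.eye(3)
assert np.isclose(np.einsum('i,j,ij->', u, v, delta), np.einsum('i,i->', u, v))
print(np.einsum('i,i->', u, v))  # 10.0
```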
## Outer product
> *Definition 3*: the outer product $f \otimes g: X \times Y \to \mathbb{K}$ of two scalar functions $f: X \to \mathbb{K}$ and $g: Y \to \mathbb{K}$ is defined as
>
> $$
> (f \otimes g)(x,y) = f(x) g(y),
> $$
>
> for all $(x,y) \in X \times Y$.
The outer product is associative and distributive with respect to addition and scalar multiplication, but not commutative.
Note that although the same symbol is used for the outer product and the denotation of a tensor space, these are not equivalent.
The following statements are given with $p=q=r=s=1$ without loss of generality.
> *Definition 4*: the mixed $(p, q)$-tensor $\mathbf{e}_i \otimes \mathbf{\hat e}^j \in \mathscr{T}_q^p(V)$ is defined as
>
> $$
> (\mathbf{e}_i \otimes \mathbf{\hat e}^j)(\mathbf{\hat u}, \mathbf{v}) = \mathbf{k}(\mathbf{\hat u}, \mathbf{e}_i) \mathbf{k}(\mathbf{\hat e}^j, \mathbf{v}),
> $$
>
> for all $(\mathbf{\hat u}, \mathbf{v}) \in V^* \times V$.
From this definition the subsequent theorem follows naturally.
> *Theorem 1*: let $\mathbf{T} \in \mathscr{T}_q^p(V)$ be a tensor, then there exist **holors** $T_j^i \in \mathbb{K}$ such that
>
> $$
> \mathbf{T} = T^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j,
> $$
>
> with $T^i_j = \mathbf{T}(\mathbf{\hat e}^i, \mathbf{e}_j)$.
??? note "*Proof*:"
Let $\mathbf{T} \in \mathscr{T}_q^p(V)$ such that
$$
\begin{align*}
\mathbf{T}(\mathbf{\hat e}^i, \mathbf{e}_j) &= T^k_l (\mathbf{e}_k \otimes \mathbf{\hat e}^l)(\mathbf{\hat e}^i, \mathbf{e}_j), \\
&= T^k_l \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_k) \mathbf{k}(\mathbf{\hat e}^l,\mathbf{e}_j), \\
&= T^k_l \delta^i_k \delta^l_j, \\
&= T^i_j.
\end{align*}
$$
For $\mathbf{T} \in \mathscr{T}^0_q(V)$ it follows that there exist holors $T_i \in \mathbb{K}$ such that $\mathbf{T} = T_i \mathbf{\hat e}^i$ with $T_i = \mathbf{T}(\mathbf{e}_i)$, which are referred to as the **covariant components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.
For $\mathbf{T} \in \mathscr{T}^p_0(V)$ it follows that there exist holors $T^i \in \mathbb{K}$ such that $\mathbf{T} = T^i \mathbf{e}_i$ with $T^i = \mathbf{T}(\mathbf{\hat e}^i)$, which are referred to as the **contravariant components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.
For $\mathbf{T} \in \mathscr{T}^p_q(V)$ it follows that there exist holors $T^i_j \in \mathbb{K}$ such that $\mathbf{T} = T^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j$, which are coined the **mixed components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.
By definition tensors are basis independent. Holors are basis dependent.
> *Theorem 2*: let $\mathbf{S} \in \mathscr{T}^p_q(V)$ and $\mathbf{T} \in \mathscr{T}^r_s(V)$ be tensors with
>
> $$
> \mathbf{S} = S^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j \quad \land \quad \mathbf{T} = T^r_s \mathbf{e}_r \otimes \mathbf{\hat e}^s,
> $$
>
> then the outer product of $\mathbf{S}$ and $\mathbf{T}$ is given by
>
> $$
> \mathbf{S} \otimes \mathbf{T} = S^i_j T^k_l \mathbf{e}_i \otimes \mathbf{e}_k \otimes \mathbf{\hat e}^j \otimes \mathbf{\hat e}^l,
> $$
>
> with $\mathbf{S} \otimes \mathbf{T} \in \mathscr{T}^{p+r}_{q+s}(V)$.
??? note "*Proof*:"
Let $\mathbf{S} \in \mathscr{T}^p_q(V)$ and $\mathbf{T} \in \mathscr{T}^r_s(V)$ with
$$
\mathbf{S} = S^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j \quad \land \quad \mathbf{T} = T^r_s \mathbf{e}_r \otimes \mathbf{\hat e}^s,
$$
then
$$
\begin{align*}
\mathbf{S} \otimes \mathbf{T} &= S^i_j (\mathbf{e}_i \otimes \mathbf{\hat e}^j) \otimes T^r_s (\mathbf{e}_r \otimes \mathbf{\hat e}^s), \\
&= S^i_j T^r_s \mathbf{e}_i \otimes \mathbf{e}_r \otimes \mathbf{\hat e}^j \otimes \mathbf{\hat e}^s.
\end{align*}
$$
    The result takes two covectors and two vectors as arguments, therefore $\mathbf{S} \otimes \mathbf{T} \in \mathscr{T}^{p+r}_{q+s}(V)$.
We have from theorem 2 that the outer product of two tensors yields another tensor, with ranks adding up.
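As a sketch of how the holors combine (assuming NumPy; the holors are random, hypothetical data), the outer product of two $(1,1)$-tensors has components $S^i_j T^k_l$:

```python
import numpy as np

n = 3
rng = np.random.default_rng(1)
S = rng.standard_normal((n, n))   # holor S^i_j
T = rng.standard_normal((n, n))   # holor T^k_l

# Holor of S ⊗ T, indexed as (S ⊗ T)^{ik}_{jl} = S^i_j T^k_l.
ST = np.einsum('ij,kl->ikjl', S, T)
assert ST.shape == (n, n, n, n)
assert np.isclose(ST[0, 1, 2, 0], S[0, 2] * T[1, 0])
```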
## Inner product
> *Definition 5*: an **inner product** on $V$ is a bilinear mapping $\bm{g}: V \times V \to \mathbb{K}$ which satisfies
>
> 1. for all $\mathbf{u}, \mathbf{v} \in V: \; \bm{g}(\mathbf{u}, \mathbf{v}) = \overline{\bm{g}(\mathbf{v}, \mathbf{u})},$
> 2. for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and $\lambda, \mu \in \mathbb{K}: \;\bm{g}(\mathbf{u}, \lambda \mathbf{v} + \mu \mathbf{w}) = \lambda \bm{g}(\mathbf{u}, \mathbf{v}) + \mu \bm{g}(\mathbf{u}, \mathbf{w}),$
> 3. for all $\mathbf{u} \in V\backslash \{\mathbf{0}\}: \bm{g}(\mathbf{u},\mathbf{u}) > 0,$
> 4. $\bm{g}(\mathbf{u},\mathbf{u}) = 0 \iff \mathbf{u} = \mathbf{0}.$
It may be observed that $\bm{g} \in \mathscr{T}_2^0(V)$. Unlike the Kronecker tensor, the existence of an inner product is never implied.
> *Definition 6*: let $G$ be the Gram matrix with its components $G \overset{\text{def}}= (g_{ij})$ defined as
>
> $$
> g_{ij} = \bm{g}(\mathbf{e}_i, \mathbf{e}_j).
> $$
For $\mathbf{u} = u^i \mathbf{e}_i, \mathbf{v} = v^j \mathbf{e}_j \in V$ we then have
$$
\begin{align*}
\bm{g}(\mathbf{u}, \mathbf{v}) &= \bm{g}(u^i \mathbf{e}_i, v^j \mathbf{e}_j), \\
&= u^i v^j \bm{g}(\mathbf{e}_i, \mathbf{e}_j), \\
&\overset{\text{def}}= u^i v^j g_{ij}.
\end{align*}
$$
> *Proposition 1*: the Gram matrix $G$ is symmetric and nonsingular such that
>
> $$
> g^{ik} g_{kj} = \delta^i_j,
> $$
>
> with $G^{-1} \overset{\text{def}}= (g^{ij})$.
??? note "*Proof*:"
    Let $G$ be the Gram matrix; symmetry of $G$ follows from definition 5. Suppose that $G$ is singular, then there exists $\mathbf{u} = u^i \mathbf{e}_i \in V \backslash \{\mathbf{0}\}$ such that $G \mathbf{u} = \mathbf{0} \implies u^i g_{ij} = 0$, as a result we find that
$$
\forall \mathbf{v} = v^j \mathbf{e}_j \in V: 0 = u^i g_{ij} v^j = u^i \bm{g}(\mathbf{e}_i, \mathbf{e}_j) v^j = \bm{g}(u^i \mathbf{e}_i, v^j \mathbf{e}_j) = \bm{g}(\mathbf{u}, \mathbf{v}),
$$
which contradicts the non-degeneracy of the pseudo inner product in definition 5.
> *Theorem 3*: there exists a bijective linear map $\mathbf{g}: V \to V^*$ with inverse $\mathbf{g}^{-1}$ such that
>
> 1. $\forall \mathbf{u}, \mathbf{v} \in V: \; \bm{g}(\mathbf{u}, \mathbf{v}) = \mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v})$,
> 2. $\forall \mathbf{\hat u} \in V^*, \mathbf{v} \in V: \; \bm{g}(\mathbf{g}^{-1}(\mathbf{\hat u}), \mathbf{v}) = \mathbf{k}(\mathbf{\hat u}, \mathbf{v})$,
>
> with $\mathbf{g}(\mathbf{v}) = G \mathbf{v}$ for all $\mathbf{v} \in V$.
??? note "*Proof*:"
Let $\mathbf{u} \in V$ and let $\mathbf{\hat u} \in V^*$, suppose $\mathbf{\hat u}: \mathbf{v} \mapsto \bm{g}(\mathbf{u}, \mathbf{v})$ then we may define $\mathbf{g}: V \to V^*: \mathbf{u} \mapsto \mathbf{g}(\mathbf{u}) \overset{\text{def}} = \mathbf{\hat u}$.
    Suppose there exists $\mathbf{v} \in V \backslash \{\mathbf{0}\}$ with $\mathbf{g}(\mathbf{v}) = \mathbf{0}$, then
$$
0 = \mathbf{k}(\mathbf{g}(\mathbf{v}), \mathbf{w}) \overset{\text{def}} = \bm{g}(\mathbf{v}, \mathbf{w}),
$$
    for all $\mathbf{w} \in V$, which contradicts the non-degeneracy of the pseudo inner product in definition 5. Hence $\mathbf{g}$ is injective, and since $\dim V$ is finite $\mathbf{g}$ is also bijective.
Let $\mathbf{u} = u^i \mathbf{e}_i, \mathbf{v} = v^j \mathbf{e}_j \in V$ and define $\mathbf{g}(\mathbf{e}_i) = \text{g}_{ij} \mathbf{\hat e}^j$ such that
$$
\mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v}) \overset{\text{def}} = \bm{g}(\mathbf{u}, \mathbf{v}) = g_{ij} u^i v^j,
$$
but also
$$
\mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v}) = \text{g}_{ij} u^i v^k\mathbf{k}(\mathbf{\hat e}^j, \mathbf{e}_k) = \text{g}_{ij} u^i v^k \delta^j_k = \text{g}_{ij} u^i v^j.
$$
Since $u^i, v^j \in \mathbb{K}$ are arbitrary it follows that $\text{g}_{ij} = g_{ij}$.
Consequently, the inverse $\mathbf{g}^{-1}: V^* \to V$ has the property $\mathbf{g}^{-1}(\mathbf{\hat u}) = G^{-1} \mathbf{\hat u}$ for all $\mathbf{\hat u} \in V^*$. The bijective linear map $\mathbf{g}$ is commonly known as the **metric** and $\mathbf{g}^{-1}$ as the **dual metric**.
It follows from theorem 3 that for $\mathbf{u} = u^i \mathbf{e}_i \in V$ and $\mathbf{\hat u} = u_i \mathbf{\hat e}^i \in V^*$ we have
$$
\mathbf{g}(\mathbf{u}) = g_{ij} u^i \mathbf{\hat e}^j = u_j \mathbf{\hat e}^j = \mathbf{\hat u},
$$
with $u_j = g_{ij} u^i$ and
$$
\mathbf{g}^{-1}(\mathbf{\hat u}) = g^{ij} u_i \mathbf{e}_j = u^j \mathbf{e}_j = \mathbf{u},
$$
with $u^j = g^{ij} u_i$.
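A small numerical sketch of lowering and raising an index with the Gram matrix and the dual metric (assuming NumPy; the Gram matrix and components below are hypothetical):

```python
import numpy as np

# Hypothetical Gram matrix (symmetric and nonsingular) and contravariant components u^i.
G = np.array([[2.0, 1.0],
              [1.0, 3.0]])
u_up = np.array([1.0, -1.0])

u_down = G @ u_up                          # lowering the index: u_j = g_{ij} u^i
G_inv = np.linalg.inv(G)                   # dual metric g^{ij}
assert np.allclose(G_inv @ u_down, u_up)   # raising it again recovers u^j
```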
> *Definition 7*: the basis $\{\mathbf{e}_i\}$ of $V$ induces a **reciprocal basis** $\{\mathbf{g}^{-1}(\mathbf{\hat e}^i)\}$ of $V$ given by
>
> $$
> \mathbf{g}^{-1}(\mathbf{\hat e}^i) = g^{ij} \mathbf{e}_j.
> $$
>
> Likewise, the basis $\{\mathbf{\hat e}^i\}$ of $V^*$ induces a **reciprocal dual basis** $\{\mathbf{g}(\mathbf{e}_i)\}$ of $V^*$ given by
>
> $$
> \mathbf{g}(\mathbf{e}_i) = g_{ij} \mathbf{\hat e}^j.
> $$
So far, a vector space $V$ and its associated dual space $V^*$ have been introduced as a priori independent entities. An inner product provides an explicit mechanism to relate the two, associating a unique covector with each vector by virtue of the metric.
# Tensor symmetries
We have a finite dimensional vector space $V$ with $\dim V = n$ for $n \in \mathbb{N}$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}$ and a pseudo inner product $\bm{g}$ on $V.$
## Symmetric tensors
> *Definition 1*: let $\pi = [\pi(1), \dots, \pi(k)]$ be any permutation of the set $\{1, \dots, k\}$, then $\mathbf{T} \in \mathscr{T}^0_q(V)$ is a **symmetric covariant** $q$-tensor if for all $\mathbf{v}_1, \dots, \mathbf{v}_q \in V$ we have
>
> $$
> \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}) = \mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}_q),
> $$
>
> with $k = q \in \mathbb{N}$.
>
> Likewise, $\mathbf{T} \in \mathscr{T}^p_0(V)$ is a **symmetric contravariant** $p$-tensor if for all $\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p \in V^*$ we have
>
> $$
> \mathbf{T}(\mathbf{\hat u}_{\pi(1)}, \dots, \mathbf{\hat u}_{\pi(p)}) = \mathbf{T}(\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p),
> $$
>
> with $k = p \in \mathbb{N}$.
This symmetry implies that the ordering of the (co)vector arguments in a tensor evaluation does not affect the outcome.
> *Definition 2*: the vector space of symmetric covariant $q$-tensors is denoted by $\bigvee_q(V) \subset \mathscr{T}^0_q(V)$ and the vector space of symmetric contravariant $p$-tensors is denoted by $\bigvee^p(V) \subset \mathscr{T}^p_0(V).$
Alternatively one may write $\bigvee_q(V) = V^* \otimes_s \cdots \otimes_s V^*$ and $\bigvee^p(V) = V \otimes_s \cdots \otimes_s V.$
## Antisymmetric tensors
> *Definition 3*: let $\pi = [\pi(1), \dots, \pi(k)]$ be any permutation of the set $\{1, \dots, k\}$, then $\mathbf{T} \in \mathscr{T}^0_q(V)$ is an **antisymmetric covariant** $q$-tensor if for all $\mathbf{v}_1, \dots, \mathbf{v}_q \in V$ we have
>
> $$
> \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}) = \mathrm{sign}(\pi) \mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}_q),
> $$
>
> with $k = q \in \mathbb{N}$.
>
> Likewise, $\mathbf{T} \in \mathscr{T}^p_0(V)$ is an **antisymmetric contravariant** $p$-tensor if for all $\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p \in V^*$ we have
>
> $$
> \mathbf{T}(\mathbf{\hat u}_{\pi(1)}, \dots, \mathbf{\hat u}_{\pi(p)}) = \mathrm{sign}(\pi)\mathbf{T}(\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p),
> $$
>
> with $k = p \in \mathbb{N}$.
This antisymmetry implies that reordering the (co)vector arguments in a tensor evaluation only changes the sign of the outcome, according to the sign of the permutation.
> *Definition 4*: the vector space of antisymmetric covariant $q$-tensors is denoted by $\bigwedge_q(V) \subset \mathscr{T}^0_q(V)$ and the vector space of antisymmetric contravariant $p$-tensors is denoted by $\bigwedge^p(V) \subset \mathscr{T}^p_0(V).$
Alternatively one may write $\bigwedge_q(V) = V^* \otimes_a \cdots \otimes_a V^*$ and $\bigwedge^p(V) = V \otimes_a \cdots \otimes_a V.$
It follows from the definitions of symmetric and antisymmetric tensors that for $0$-tensors we have
$$
{\bigvee}_0(V) = {\bigvee}^0(V) = {\bigwedge}_0(V) = {\bigwedge}^0(V) = \mathbb{K}.
$$
Furthermore, for $1$-tensors we have
$$
{\bigvee}_1(V) = {\bigwedge}_1(V) = V^*,
$$
and
$$
{\bigvee}^1(V) = {\bigwedge}^1(V) = V.
$$
## Symmetrisation maps
The following statements are given with the covariant $q$-tensor without loss of generality.
> *Definition 5*: the linear **symmetrisation map** $\mathscr{S}: \mathscr{T}^0_q(V) \to \bigvee_q(V)$ is given by
>
> $$
> \mathscr{S}(\mathbf{T})(\mathbf{v}_1, \dots, \mathbf{v}_q) = \frac{1}{q!} \sum_\pi \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ in which summation runs over all permutations $\pi$ of the set $\{1, \dots, q\}$.
Let $\mathbf{T} = T_{i_1 \cdots i_q} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \mathscr{T}^0_q(V)$, then we have $\mathscr{S}(\mathbf{T}) = T_{(i_1 \cdots i_q)} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \bigvee_q(V)$ with
$$
T_{(i_1 \cdots i_q)} = \frac{1}{q!} \sum_\pi T_{i_{\pi(1)} \cdots i_{\pi(q)}}.
$$
If $\mathbf{T} \in \bigvee_q(V)$ then $\mathbf{T} = \mathscr{S}(\mathbf{T})$. The symmetrisation map is idempotent such that $\mathscr{S} \circ \mathscr{S} = \mathscr{S}.$
> *Definition 6*: the linear **antisymmetrisation map** $\mathscr{A}: \mathscr{T}^0_q(V) \to \bigwedge_q(V)$ is given by
>
> $$
> \mathscr{A}(\mathbf{T})(\mathbf{v}_1, \dots, \mathbf{v}_q) = \frac{1}{q!} \sum_\pi \mathrm{sign}(\pi) \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ in which summation runs over all permutations $\pi$ of the set $\{1, \dots, q\}$.
Let $\mathbf{T} = T_{i_1 \cdots i_q} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \mathscr{T}^0_q(V)$, then we have $\mathscr{A}(\mathbf{T}) = T_{[i_1 \cdots i_q]} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \bigwedge_q(V)$ with
$$
T_{[i_1 \cdots i_q]} = \frac{1}{q!} \sum_\pi \mathrm{sign}(\pi) T_{i_{\pi(1)} \cdots i_{\pi(q)}}.
$$
If $\mathbf{T} \in \bigwedge_q(V)$ then $\mathbf{T} = \mathscr{A}(\mathbf{T})$. The antisymmetrisation map is idempotent such that $\mathscr{A} \circ \mathscr{A} = \mathscr{A}.$
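For $q = 2$ both maps reduce to the familiar symmetric and antisymmetric parts of the holor; a minimal sketch (assuming NumPy, with a hypothetical holor):

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3))      # holor T_{ij} of a covariant 2-tensor

T_sym = 0.5 * (T + T.T)              # T_{(ij)}
T_asym = 0.5 * (T - T.T)             # T_{[ij]}

assert np.allclose(T, T_sym + T_asym)
# Idempotence: symmetrising an already symmetric holor changes nothing.
assert np.allclose(T_sym, 0.5 * (T_sym + T_sym.T))
```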
## Symmetric product
The outer product does not preserve (anti)symmetry. For this reason alternative product operators are introduced which preserve (anti)symmetry. The following statements are given with covariant tensors without loss of generality.
> *Definition 7*: the **symmetric product** between two tensors is defined as
>
> $$
> \mathbf{T} \vee \mathbf{S} = (q+s)! \cdot \mathscr{S}(\mathbf{T} \otimes \mathbf{S}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ and $\mathbf{S} \in \mathscr{T}^0_s(V)$ with $q,s \in \mathbb{N}$.
It follows from definition 7 that the symmetric product is associative, bilinear and symmetric. Subsequently, we may write a basis of $\bigvee_q(V)$ as
$$
\mathscr{S}(\mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q}) = \frac{1}{q!} \mathbf{\hat e}^{i_1} \vee \cdots \vee \mathbf{\hat e}^{i_q},
$$
with $\{1 \leq i_1 \leq \dots \leq i_q \leq n\}$.
Let $\mathbf{T} \in \bigvee_q(V)$ and $\mathbf{S} \in \bigvee_s(V)$ then it follows that
$$
\mathbf{T} \vee \mathbf{S} = \mathbf{S} \vee \mathbf{T}.
$$
> *Definition 8*: the **antisymmetric product** between two tensors is defined as
>
> $$
> \mathbf{T} \wedge \mathbf{S} = (q+s)! \cdot \mathscr{A}(\mathbf{T} \otimes \mathbf{S}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ and $\mathbf{S} \in \mathscr{T}^0_s(V)$ with $q,s \in \mathbb{N}$.
It follows from definition 8 that the antisymmetric product is associative, bilinear and antisymmetric. Subsequently, we may write a basis of $\bigwedge_q(V)$ as
$$
\mathscr{A}(\mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q}) = \frac{1}{q!} \mathbf{\hat e}^{i_1} \wedge \cdots \wedge \mathbf{\hat e}^{i_q},
$$
with $\{1 \leq i_1 < \dots < i_q \leq n\}$.
Let $\mathbf{T} \in \bigwedge_q(V)$ and $\mathbf{S} \in \bigwedge_s(V)$ then it follows that
$$
\mathbf{T} \wedge \mathbf{S} = (-1)^{qs} \mathbf{S} \wedge \mathbf{T}.
$$
> *Theorem 1*: the dimension of the vector space of symmetric covariant $q$-tensors is given by
>
> $$
> \dim \Big({\bigvee}_q(V) \Big) = \binom{n+q-1}{q},
> $$
>
> and for antisymmetric covariant $q$-tensors the dimension is given by
>
> $$
> \dim \Big({\bigwedge}_q(V) \Big) = \binom{n}{q}.
> $$
??? note "*Proof*:"
Will be added later.
An interesting result of the definition of the symmetric and antisymmetric product is given in the theorem below.
> *Theorem 2*: let $\mathbf{\hat u}_{1,2} \in V^*$ be covectors, the symmetric product of $\mathbf{\hat u}_1$ and $\mathbf{\hat u}_2$ may be given by
>
> $$
> (\mathbf{\hat u}_1 \vee \mathbf{\hat u}_2)(\mathbf{v}_1, \mathbf{v}_2) = \mathrm{perm}\big(\mathbf{k}(\mathbf{\hat u}_i, \mathbf{v}_j)\big),
> $$
>
> for all $(\mathbf{v}_1, \mathbf{v}_2) \in V \times V$ with $(i,j)$ denoting the entry of the matrix over which the permanent is taken.
>
> The antisymmetric product of $\mathbf{\hat u}_1$ and $\mathbf{\hat u}_2$ may be given by
>
> $$
> (\mathbf{\hat u}_1 \wedge \mathbf{\hat u}_2)(\mathbf{v}_1, \mathbf{v}_2) = \det \big(\mathbf{k}(\mathbf{\hat u}_i, \mathbf{v}_j) \big),
> $$
>
> for all $(\mathbf{v}_1, \mathbf{v}_2) \in V \times V$ with $(i,j)$ denoting the entry of the matrix over which the determinant is taken.
??? note "*Proof*:"
Will be added later.
In some literature theorem 2 is used as the definition of the symmetric and antisymmetric product, from which the relation with the symmetrisation maps can be proven. Either approach is valid; here the products are defined in terms of the symmetrisation maps, which is more general.
# Tensor transformations
We have a finite dimensional vector space $V$ with $\dim V = n$ for $n \in \mathbb{N}$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}_{i=1}^n$ and a pseudo inner product $\bm{g}$ on $V.$
Let us introduce a different basis $\{\mathbf{f}_i\}_{i=1}^n$ of $V$ with a corresponding dual basis $\{\mathbf{\hat f}^i\}_{i=1}^n$ of $V^*$ which are related to the former basis $\{\mathbf{e}_i\}_{i=1}^n$ by
$$
\mathbf{f}_j = A^i_j \mathbf{e}_i,
$$
so that $\mathbf{\hat e}^i = A^i_j \mathbf{\hat f}^j$.
## Transformation of tensors
Recall from the section of [tensor-formalism]() that a holor depends on the chosen basis, but the corresponding tensor itself does not. This implies that holors transform in a particular way under a change of basis, which is characteristic for tensors.
> *Theorem 1*: let $\mathbf{T} \in \mathscr{T}^p_q(V)$ be a tensor with $p=q=1$ without loss of generality and $B = A^{-1}$. Then $\mathbf{T}$ may be decomposed into
>
> $$
> \begin{align*}
> \mathbf{T} &= T^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j, \\
> &= \overline T^i_j \mathbf{f}_i \otimes \mathbf{\hat f}^j,
> \end{align*}
> $$
>
> with the holors related by
>
> $$
> \overline T^i_j = B^i_k A^l_j T^k_l.
> $$
??? note "*Proof*:"
Will be added later.
The homogeneous nature of the tensor transformation implies that a holor equation of the form $T^i_j = 0$ holds relative to any basis if it holds relative to a particular one.
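A numerical sketch of the holor transformation of theorem 1 (assuming NumPy; the transformation matrix and holor are random, hypothetical choices), which in matrix form reads $\overline T = B\, T A$:

```python
import numpy as np

n = 3
rng = np.random.default_rng(3)

A = rng.standard_normal((n, n))      # basis transformation f_j = A^i_j e_i (invertible almost surely)
B = np.linalg.inv(A)                 # B = A^{-1}
T = rng.standard_normal((n, n))      # holor T^k_l relative to {e_i}

# Transformed holor: T_bar^i_j = B^i_k A^l_j T^k_l.
T_bar = np.einsum('ik,lj,kl->ij', B, A, T)
assert np.allclose(T_bar, B @ T @ A)
```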
## Transformation of volume forms
> *Lemma 1*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form with
>
> $$
> \begin{align*}
> \bm{\mu} &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \mu_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> then we have
>
> $$
> \overline \mu_{j_1 \dots j_n} = A^{i_1}_{j_1} \cdots A^{i_n}_{j_n} \mu_{i_1 \dots i_n} = \mu_{j_1 \dots j_n} \det (A).
> $$
??? note "*Proof*:"
Will be added later.
Then $\det(A)$ is the volume scaling factor of the transformation with $A$. So that if $\bm{\mu}(\mathbf{e}_1, \dots, \mathbf{e}_n) = 1$, then $\bm{\mu}(\mathbf{f}_1, \dots, \mathbf{f}_n) = \det(A).$
> *Theorem 2*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form with
>
> $$
> \begin{align*}
> \bm{\mu} &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \mu_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> and if we define
>
> $$
> \overline \mu_{i_1 \dots i_n} \overset{\text{def}}{=} \frac{1}{\det (A)} A^{j_1}_{i_1} \cdots A^{j_n}_{i_n} \mu_{j_1 \dots j_n},
> $$
>
> then $\mu_{i_1 \dots i_n} = \overline \mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$ is an invariant holor.
??? note "*Proof*:"
Will be added later.
## Transformation of Levi-Civita form
> *Theorem 3*: let $\bm{\epsilon} \in \bigwedge_n(V)$ be the Levi-Civita tensor with
>
> $$
> \begin{align*}
> \bm{\epsilon} &= \epsilon_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \epsilon_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> then $\epsilon_{i_1 \dots i_n} = \overline \epsilon_{i_1 \dots i_n}$ is an invariant holor.
??? note "*Proof*:"
Will be added later.
# Volume forms
We have a finite dimensional vector space $V$ with $\dim V = n$ for $n \in \mathbb{N}$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}_{i=1}^n$ and a pseudo inner product $\bm{g}$ on $V.$
## n-forms
> *Definition 1*: let $\bm{\mu} \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$, if
>
> $$
> \bm{\mu}(\mathbf{e}_1, \dots, \mathbf{e}_n) = 1,
> $$
>
> then $\bm{\mu}$ is the **unit volume form** with respect to the basis $\{\mathbf{e}_i\}$.
Note that $\dim \bigwedge_n(V) = 1$ and consequently if $\bm{\mu}_1, \bm{\mu}_2 \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$, then $\bm{\mu}_1 = \lambda \bm{\mu}_2$ with $\lambda \in \mathbb{K}$.
> *Proposition 1*: the unit volume form $\bm{\mu} \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$ may be given by
>
> $$
> \begin{align*}
> \bm{\mu} &= \mathbf{\hat e}^1 \wedge \dots \wedge \mathbf{\hat e}^n, \\
> &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \dots \otimes \mathbf{\hat e}^{i_n},
> \end{align*}
> $$
>
> with $\mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$.
??? note "*Proof*:"
Will be added later.
The normalisation of the unit volume form $\bm{\mu}$ requires a basis. Consequently, the identification $\mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$ holds only relative to the basis.
> *Definition 2*: let $(V, \bm{\mu})$ denote the vector space $V$ endowed with an **oriented volume form** $\bm{\mu}$. For $\bm{\mu} > 0$ we have a positive orientation of $(V, \bm{\mu})$ and for $\bm{\mu} < 0$ we have a negative orientation of $(V, \bm{\mu})$.
For a vector space with an oriented volume $(V, \bm{\mu})$ we may write
$$
\bm{\mu} = \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n},
$$
or, equivalently
$$
\bm{\mu} = \mu_{|i_1 \dots i_n|} \mathbf{\hat e}^{i_1} \wedge \cdots \wedge \mathbf{\hat e}^{i_n},
$$
by convention, to resolve ambiguity with respect to the meaning of $\mu_{i_1 \dots i_n}$ without using another symbol or extra accents.
Using theorem 2 in the section of [tensor symmetries]() we may state the following.
> *Proposition 2*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form, then we have
>
> $$
> \bm{\mu}(\mathbf{v}_1, \dots, \mathbf{v}_n) = \det \big(\mathbf{k}(\mathbf{\hat e}^i, \mathbf{v}_j) \big),
> $$
>
> for all $\mathbf{v}_1, \dots, \mathbf{v}_n \in V$ with $(i,j)$ denoting the entry of the matrix over which the determinant is taken.
??? note "*Proof*:"
Will be added later.
This reveals the role of the Kronecker tensor and thus the role of the dual space in the definition of $\bm{\mu}$. We may also conclude that an oriented volume form $\bm{\mu} \in \bigwedge_n(V)$ on a vector space $V$ does not require an inner product.
From proposition 2 it may also be observed that, within a geometrical context, the oriented volume form represents the area of a parallelogram for $n=2$ or the volume of a parallelepiped for $n=3$, spanned by its vector arguments.
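A small sketch of proposition 2 in $\mathbb{R}^3$ with the standard basis (assuming NumPy; the vectors are hypothetical):

```python
import numpy as np

# With the standard basis, k(e^i, v_j) is simply the i-th component of v_j,
# so the unit volume form evaluates to the determinant of the column matrix.
v1 = np.array([1.0, 0.0, 2.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = np.array([1.0, 1.0, 0.0])

volume = np.linalg.det(np.column_stack([v1, v2, v3]))
print(volume)   # signed volume of the parallelepiped spanned by v1, v2, v3
```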
## (n - k)-forms
> *Definition 3*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form and let $\mathbf{u}_1, \dots, \mathbf{u}_k \in V$ with $k \in \mathbb{N}[k < n]$. Let the $(n-k)$-form $\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k \in \bigwedge_{n-k}(V)$ be defined as
>
> $$
> \bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k(\mathbf{v}_{k+1}, \dots, \mathbf{v}_n) = \bm{\mu}(\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{v}_{k+1}, \dots, \mathbf{v}_n),
> $$
>
> for all $\mathbf{v}_{k+1}, \dots, \mathbf{v}_n \in V$ with $\lrcorner$ the insert operator.
It follows that $(n-k)$-form $\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k \in \bigwedge_{n-k}(V)$ can be written as
$$
\begin{align*}
\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k &= u_1^{i_1} \cdots u_k^{i_k} (\bm{\mu} \lrcorner \mathbf{e}_{i_1} \lrcorner \dots \lrcorner \mathbf{e}_{i_k}), \\
&= u_1^{i_1} \cdots u_k^{i_k} \mu_{i_1 \dots i_n} (\mathbf{\hat e}^{i_{k+1}} \wedge \cdots \wedge \mathbf{\hat e}^{i_{n}}),
\end{align*}
$$
for $\mathbf{u}_1, \dots, \mathbf{u}_k \in V$ with $k \in \mathbb{N}[k < n]$ and decomposition by $\mathbf{u}_q = u_q^{i_q} \mathbf{e}_{i_q}$ for $q \in \mathbb{N}[q \leq k]$.
If we have a unit volume form $\bm{\mu}$ with respect to $\{\mathbf{e}_i\}$ then
$$
\bm{\mu}\lrcorner\mathbf{e}_1 \lrcorner \dots \lrcorner \mathbf{e}_k = \mathbf{\hat e}^{k+1} \wedge \cdots \wedge \mathbf{\hat e}^{n},
$$
for $k \in \mathbb{N}[k < n]$.
## Levi-Civita form
> *Definition 4*: let $(V, \bm{\mu})$ be a vector space with a unit volume form with invariant holor. Let $\bm{\epsilon} \in \bigwedge_n(V)$ be the **Levi-Civita tensor** which is the unique unit volume form of positive orientation defined as
>
> $$
> \bm{\epsilon} = \sqrt{g} \bm{\mu},
> $$
>
> with $g \overset{\text{def}}{=} \det (G)$, the determinant of the [Gram matrix]().
Therefore, if we decompose the Levi-Civita tensor by
$$
\bm{\epsilon} = \epsilon_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \dots \otimes \mathbf{\hat e}^{i_n} = \epsilon_{|i_1 \dots i_n|} \mathbf{\hat e}^{i_1} \wedge \dots \wedge \mathbf{\hat e}^{i_n},
$$
then we have $\epsilon_{i_1 \dots i_n} = \sqrt{g} \mu_{i_1 \dots i_n}$ and $\epsilon_{|i_1 \dots i_n|} = \sqrt{g}$.
> *Theorem 2*: let $(V, \bm{\mu})$ be a vector space with a unit volume form with invariant holor. Let $\mathbf{g}(\bm{\epsilon}) \in \bigwedge^n(V)$ be the **reciprocal Levi-Civita tensor** which is given by
>
> $$
> \mathbf{g}(\bm{\epsilon}) = \frac{1}{\sqrt{g}} \bm{\mu}.
> $$
??? note "*Proof*:"
Will be added later.
We may decompose the reciprocal Levi-Civita tensor by
$$
\mathbf{g}(\bm{\epsilon}) = \epsilon^{i_1 \dots i_n} \mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_n} = \epsilon^{|i_1 \dots i_n|} \mathbf{e}_{i_1} \wedge \cdots \wedge \mathbf{e}_{i_n},
$$
then we have $\epsilon^{i_1 \dots i_n} = \frac{1}{\sqrt{g}} \mu^{i_1 \dots i_n}$ and $\epsilon^{|i_1 \dots i_n|} = \frac{1}{\sqrt{g}}$.
# Vector spaces
## Definition
> *Definition*: a **vector space** $V$ is a set on which the operations of addition and scalar multiplication are defined, such that for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ the vector $\mathbf{u} + \mathbf{v}$ is in $V$ and for each scalar $a$ the vector $a\mathbf{v}$ is in $V$, with the following axioms satisfied.
>
> 1. Associativity of vector addition: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ for any $\mathbf{u},\mathbf{v}, \mathbf{w} \in V$.
> 2. Commutativity of vector addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for any $\mathbf{u},\mathbf{v} \in V$.
> 3. Identity element of vector addition: $\exists \mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$.
> 4. Inverse element of vector addition: $\forall \mathbf{v} \in V \exists (-\mathbf{v}) \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
> 5. Distributivity of scalar multiplication with respect to vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$ for any scalar $a$ and any $\mathbf{u}, \mathbf{v} \in V$.
> 6. Distributivity of scalar multiplication with respect to field addition: $(a + b) \mathbf{v} = a \mathbf{v} + b \mathbf{v}$ for any scalars $a$ and $b$ and any $\mathbf{v} \in V$.
> 7. Compatibility of scalar multiplication with field multiplication: $a(b\mathbf{v}) = (ab) \mathbf{v}$ for any scalars $a$ and $b$ and any $\mathbf{v} \in V$.
> 8. Identity element of scalar multiplication: $1 \mathbf{v} = \mathbf{v}$ for all $\mathbf{v} \in V$.
Some important properties of a vector space can be derived from this definition; a few are listed in the following proposition.
> *Proposition*: if $V$ is a vector space and $\mathbf{u}$, $\mathbf{v}$ is in $V$, then
>
> 1. $0 \mathbf{v} = \mathbf{0}$.
> 2. $\mathbf{u} + \mathbf{v} = \mathbf{0} \implies \mathbf{u} = - \mathbf{v}$.
> 3. $(-1)\mathbf{v} = - \mathbf{v}$.
??? note "*Proof*:"
For 1, suppose $\mathbf{v} \in V$ then it follows from axioms 3, 6 and 8
$$
\mathbf{v} = 1 \mathbf{v} = (1 + 0)\mathbf{v} = 1 \mathbf{v} + 0 \mathbf{v} = \mathbf{v} + 0\mathbf{v},
$$
therefore
$$
\begin{align*}
-\mathbf{v} + \mathbf{v} &= - \mathbf{v} + (\mathbf{v} + 0\mathbf{v}) = (-\mathbf{v} + \mathbf{v}) + 0\mathbf{v}, \\
\mathbf{0} &= \mathbf{0} + 0\mathbf{v} = 0\mathbf{v}.
\end{align*}
$$
For 2, suppose for $\mathbf{u}, \mathbf{v} \in V$ that $\mathbf{u} + \mathbf{v} = \mathbf{0}$ then it follows from axioms 1, 3 and 4
$$
- \mathbf{v} = - \mathbf{v} + \mathbf{0} = - \mathbf{v} + (\mathbf{v} + \mathbf{u}),
$$
therefore
$$
-\mathbf{v} = (-\mathbf{v} + \mathbf{v}) + \mathbf{u} = \mathbf{0} + \mathbf{u} = \mathbf{u}.
$$
For 3, suppose $\mathbf{v} \in V$ then it follows from 1 and axioms 4 and 6
$$
\mathbf{0} = 0 \mathbf{v} = (1 + (-1))\mathbf{v} = 1\mathbf{v} + (-1)\mathbf{v},
$$
therefore
$$
\mathbf{v} + (-1)\mathbf{v} = \mathbf{0},
$$
from 2 it follows then that
$$
(-1)\mathbf{v} = -\mathbf{v}.
$$
### Euclidean spaces
Perhaps the most elementary vector spaces are the Euclidean vector spaces $V = \mathbb{R}^n$ with $n \in \mathbb{N}$. Given a nonzero vector $\mathbf{u} \in \mathbb{R}^n$ defined by
$$
\mathbf{u} = \begin{pmatrix}u_1 \\ \vdots \\ u_n\end{pmatrix},
$$
it may be associated with the directed line segment from $(0, \dots, 0)$ to $(u_1, \dots, u_n)$; more generally, it may be represented by any line segment of the same length and direction, for instance from $(a_1, \dots, a_n)$ to $(a_1 + u_1, \dots, a_n + u_n)$. Vector addition and scalar multiplication in $\mathbb{R}^n$ are respectively defined by
$$
\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{pmatrix} \quad \text{ and } \quad a \mathbf{u} = \begin{pmatrix} a u_1 \\ \vdots \\ a u_n \end{pmatrix},
$$
for any $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ and any scalar $a$.
This can be extended to matrices with $V = \mathbb{R}^{m \times n}$ with $m,n \in \mathbb{N}$, the set of all $m \times n$ matrices. Given a matrix $A \in \mathbb{R}^{m \times n}$ defined by $A = (a_{ij})$, matrix addition and scalar multiplication in $\mathbb{R}^{m \times n}$ are respectively defined by
$$
A + B = C \iff a_{ij} + b_{ij} = c_{ij} \quad \text{ and } \quad \alpha A = C \iff \alpha a_{ij} = c_{ij},
$$
for any $A, B, C \in \mathbb{R}^{m \times n}$ and any scalar $\alpha$.
### Function spaces
Let $V$ be a vector space over a field $F$ and let $X$ be any set. The functions $X \to F$ can be given the structure of a vector space over $F$ where the operations are defined by
$$
\begin{align*}
(f + g)(x) = f(x) + g(x), \\
(af)(x) = af(x),
\end{align*}
$$
for any $f,g: X \to F$, any $x \in X$ and any $a \in F$.
### Polynomial spaces
Let $P_n$ denote the set of all polynomials of degree less than $n \in \mathbb{N}$ where the operations are defined by
$$
\begin{align*}
(p+q)(x) = p(x) + q(x), \\
(ap)(x) = ap(x),
\end{align*}
$$
for any $p, q \in P_n$, any scalar $a$ and any $x$ in their common domain.
## Vector subspaces
> *Definition*: if $S$ is a nonempty subset of a vector space $V$ and $S$ satisfies the conditions
>
> 1. $a \mathbf{u} \in S$ whenever $\mathbf{u} \in S$ for any scalar $a$.
> 2. $\mathbf{u} + \mathbf{v} \in S$ whenever $\mathbf{u}, \mathbf{v} \in S$.
>
> then $S$ is said to be a **subspace** of $V$.
In a vector space $V$ it can be readily verified that $\{\mathbf{0}\}$ and $V$ are subspaces of $V$. All other subspaces are referred to as *proper subspaces* and $\{\mathbf{0}\}$ is referred to as the *zero subspace*.
> *Theorem*: Every subspace of a vector space is a vector space.
??? note "*Proof*:"
    This may be proved by verifying that all the vector space axioms remain valid for a subset satisfying the two conditions in the definition of a subspace.
### The null space of a matrix
> *Definition*: let $A \in \mathbb{R}^{m \times n}$, $\mathbf{x} \in \mathbb{R}^n$ and let $N(A)$ denote the set of all solutions of the homogeneous system $A\mathbf{x} = \mathbf{0}$. Therefore
>
> $$
> N(A) = \{\mathbf{x} \in \mathbb{R}^n \;|\; A \mathbf{x} = \mathbf{0}\},
> $$
>
> referred to as the null space of $A$.
We claim that $N(A)$ is a subspace of $\mathbb{R}^n$. Clearly $\mathbf{0} \in N(A)$ so $N(A)$ is nonempty. If $\mathbf{x} \in N(A)$ and $\alpha$ is a scalar then
$$
A(\alpha \mathbf{x}) = \alpha A\mathbf{x} = \alpha \mathbf{0} = \mathbf{0}
$$
and hence $\alpha \mathbf{x} \in N(A)$. If $\mathbf{x}, \mathbf{y} \in N(A)$ then
$$
A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y} = \mathbf{0} + \mathbf{0} = \mathbf{0}
$$
therefore $\mathbf{x} + \mathbf{y} \in N(A)$ and it follows that $N(A)$ is a subspace of $\mathbb{R}^n$.
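A numerical way to exhibit $N(A)$ is to compute an orthonormal basis for it; the sketch below uses the singular value decomposition (an assumption of this illustration, not something prescribed by the text) on a hypothetical matrix:

```python
import numpy as np

def null_space_basis(A, tol=1e-12):
    # Columns of the returned matrix form an orthonormal basis of N(A).
    _, s, Vt = np.linalg.svd(A)
    rank = np.sum(s > tol)
    return Vt[rank:].T

# Hypothetical 2 x 3 matrix: its null space is a one-dimensional subspace of R^3.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 3.0]])
N = null_space_basis(A)
assert np.allclose(A @ N, 0.0)
print(N.shape)   # (3, 1)
```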
### The span of a set of vectors
> *Definition*: let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be vectors in a vector space $V$ with $n \in \mathbb{N}$. A sum of the form
>
> $$
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
> $$
>
> with scalars $a_1, \dots, a_n$ is called a **linear combination** of $\mathbf{v}_1, \dots, \mathbf{v}_n$.
>
> The set of all linear combinations of $\mathbf{v}_1, \dots, \mathbf{v}_n$ is called the **span** of $\mathbf{v}_1, \dots, \mathbf{v}_n$ which is denoted by $\text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$.
The null space, for example, can be described as the span of a set of vectors.
> *Theorem*: if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are vectors in a vector space $V$ with $n \in \mathbb{N}$ then $\text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ is a subspace of $V$.
??? note "*Proof*:"
Let $b$ be a scalar and $\mathbf{u} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ given by
$$
a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
$$
with scalars $a_1, \dots, a_n$. Since
$$
b \mathbf{u} = (b a_1)\mathbf{v}_1 + \dots + (b a_n)\mathbf{v}_n,
$$
it follows that $b \mathbf{u} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$.
If we also have $\mathbf{w} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ given by
$$
b_1 \mathbf{v}_1 + \dots + b_n \mathbf{v}_n,
$$
with scalars $b_1, \dots, b_n$. Then
$$
\mathbf{u} + \mathbf{w} = (a_1 + b_1) \mathbf{v}_1 + \dots + (a_n + b_n)\mathbf{v}_n,
$$
it follows that $\mathbf{u} + \mathbf{w} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ is a subspace of $V$.
For example, a vector $\mathbf{x} \in \mathbb{R}^3$ is in $\text{span}(\mathbf{e}_1, \mathbf{e}_2)$ if and only if it lies in the $x_1 x_2$-plane in 3-space. Thus we can think of the $x_1 x_2$-plane as the geometrical representation of the subspace $\text{span}(\mathbf{e}_1, \mathbf{e}_2)$.
> *Definition*: the set $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ with $n \in \mathbb{N}$ is a spanning set for $V$ if and only if every vector $V$ can be written as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_n$.
## Linear independence
We have the following observations.
> *Proposition*: if $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$ span a vector space $V$ and one of these vectors can be written as a linear combination of the other $n-1$ vectors then those $n-1$ vectors span $V$.
??? note "*Proof*:"
suppose $\mathbf{v}_n$ with $n \in \mathbb{N}$ can be written as a linear combination of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_{n-1}$ given by
$$
\mathbf{v}_n = a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}.
$$
Let $\mathbf{v}$ be any element of $V$. Since we have
$$
\begin{align*}
\mathbf{v} &= b_1 \mathbf{v}_1 + \dots + b_{n-1} \mathbf{v}_{n-1} + b_n \mathbf{v}_n, \\
&= b_1 \mathbf{v}_1 + \dots + b_{n-1} \mathbf{v}_{n-1} + b_n (a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}), \\
&= (b_1 + b_n a_1)\mathbf{v}_1 + \dots + (b_{n-1} + b_n a_{n-1}) \mathbf{v}_{n-1},
\end{align*}
$$
we can write any vector $\mathbf{v} \in V$ as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_{n-1}$ and hence these vectors span $V$.
> *Proposition*: given $n$ vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$, it is possible to write one of the vectors as a linear combination of the other $n-1$ vectors if and only if there exist scalars $a_1, \dots, a_n$ not all zero such that
>
> $$
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0}.
> $$
??? note "*Proof*:"
Suppose that one of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$ can be written as a linear combination of the others
$$
\mathbf{v}_n = a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}.
$$
Subtracting $\mathbf{v}_n$ from both sides obtains
$$
a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1} - \mathbf{v}_n = \mathbf{0},
$$
we have $a_n = -1$ and
$$
a_1 \mathbf{v}_1 + \dots + a_n\mathbf{v}_n = \mathbf{0}.
$$
We may use these observations to state the following definitions.
> *Definition*: the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ in a vector space $V$ with $n \in \mathbb{N}$ are said to be **linearly independent** if
>
> $$
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0} \implies \forall i \in \{1, \dots, n\} [a_i = 0].
> $$
It follows from the above propositions that if $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is a minimal spanning set of a vector space $V$ then $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent. A minimal spanning set is called a basis of the vector space.
> *Definition*: the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ in a vector space $V$ with $n \in \mathbb{N}$ are said to be **linearly dependent** if there exists scalars $a_1, \dots, a_n$ not all zero such that
>
> $$
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0}.
> $$
It follows from the above propositions that if a set of vectors is linearly dependent then at least one vector is a linear combination of the other vectors.
> *Theorem*: let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be vectors in $\mathbb{R}^n$ with $n \in \mathbb{N}$ and let $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$. The vectors $\mathbf{x}_1, \dots, \mathbf{x}_n$ will be linearly dependent if and only if $X$ is singular.
??? note "*Proof*:"
Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be vectors in $\mathbb{R}^n$ with $n \in \mathbb{N}$ and let $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$. Suppose we have the linear combination given by
$$
a_1 \mathbf{x}_1 + \dots + a_n \mathbf{x}_n = \mathbf{0},
$$
    which can be rewritten as the matrix equation
$$
X\mathbf{a} = \mathbf{0},
$$
with $\mathbf{a} = (a_1, \dots, a_n)^T$. This equation will have a nontrivial solution if and only if $X$ is singular. Therefore $\mathbf{x}_1, \dots, \mathbf{x}_n$ will be linearly dependent if and only if $X$ is singular.
This result can be used to test whether $n$ vectors are linearly independent in $\mathbb{R}^n$ for $n \in \mathbb{N}$.
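A minimal sketch of this test (assuming NumPy; the vectors are hypothetical), using the determinant and, more robustly, the rank:

```python
import numpy as np

# Columns of X are the vectors x_1, x_2, x_3 in R^3; here x_3 = x_1 + x_2.
X = np.array([[1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0],
              [0.0, 1.0, 1.0]])

print(np.linalg.det(X))           # ~0: X is singular, so the vectors are linearly dependent
print(np.linalg.matrix_rank(X))   # 2 < 3 confirms the dependence
```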
> *Theorem*: let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be vectors in a vector space $V$ with $n \in \mathbb{N}$. A vector $\mathbf{v} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ can be written uniquely as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_n$ if and only if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent.
??? note "*Proof*:"
If $\mathbf{v} \in \text{span}(\mathbf{v}_1, \dots \mathbf{v}_n)$ with $n \in \mathbb{N}$ then $\mathbf{v}$ can be written as a linear combination
$$
\mathbf{v} = a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n.
$$
Suppose that $\mathbf{v}$ can also be expressed as a linear combination
$$
\mathbf{v} = b_1 \mathbf{v}_1 + \dots + b_n \mathbf{v}_n.
$$
If $\mathbf{v}_1, \dots \mathbf{v}_n$ are linearly independent then subtracting both expressions yields
$$
(a_1 - b_1)\mathbf{v}_1 + \dots + (a_n - b_n)\mathbf{v}_n = \mathbf{0}.
$$
By the linear independence of $\mathbf{v}_1, \dots \mathbf{v}_n$, the coefficients must all be 0, hence
$$
a_1 = b_1,\; \dots \;, a_n = b_n
$$
therefore the representation of $\mathbf{v}$ is unique when $\mathbf{v}_1, \dots \mathbf{v}_n$ are linearly independent.
    On the other hand, if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly dependent, then there exist scalars $c_1, \dots, c_n$ not all zero such that $c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0}$; adding this combination to one representation of $\mathbf{v}$ yields another with different coefficients. Therefore the representation of $\mathbf{v}$ is not unique when $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly dependent.
## Basis and dimension
> *Definition*: the vectors $\mathbf{v}_1,\dots,\mathbf{v}_n \in V$ form a basis if and only if
>
> 1. $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly independent,
> 2. $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$.
Therefore a basis determines the vector space it spans, but a vector space does not have a unique basis.
> *Theorem*: if $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ is a spanning set for a vector space $V$, then any collection of $m$ vectors in $V$ where $m>n$, is linearly dependent.
??? note "*Proof*:"
Let $\mathbf{u}_1, \dots, \mathbf{u}_m \in V$, where $m > n$. Then since $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ span $V$ we have
$$
\mathbf{u}_i = a_{i1} \mathbf{v}_1 + \dots + a_{in} \mathbf{v}_n,
$$
    for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$ with $a_{ij} \in \mathbb{R}$.
A linear combination $c_1 \mathbf{u}_1 + \dots + c_m \mathbf{u}_m$ can be written in the form
$$
    c_1 \sum_{j=1}^n a_{1j} \mathbf{v}_j + \dots + c_m \sum_{j=1}^n a_{mj} \mathbf{v}_j,
$$
obtaining
$$
c_1 \mathbf{u}_1 + \dots + c_m \mathbf{u}_m = \sum_{i=1}^m \bigg( c_i \sum_{j=1}^n a_{ij} \mathbf{v}_j \bigg) = \sum_{j=1}^n \bigg(\sum_{i=1}^m a_{ij} c_i \bigg) \mathbf{v}_j.
$$
Considering the system of equations
$$
\sum_{i=1}^m a_{ij} c_i = 0
$$
for $j \in \{1, \dots, n\}$, a homogeneous system with more unknowns than equations. Therefore the system must have a nontrivial solution $(\hat c_1, \dots, \hat c_m)^T$, but then
$$
\hat c_1 \mathbf{u}_1 + \dots + \hat c_m \mathbf{u}_m = \sum_{j=1}^n 0 \mathbf{v}_j = \mathbf{0},
$$
hence $\mathbf{u}_1, \dots, \mathbf{u}_m$ are linearly dependent.
> *Corollary*: if both $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ and $\{\mathbf{u}_1,\dots,\mathbf{u}_m\}$ are bases for a vector space $V$, then $n = m$.
??? note "*Proof*:"
Let both $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ and $\{\mathbf{u}_1,\dots,\mathbf{u}_m\}$ be bases for $V$. Since $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$ and $\mathbf{u}_1,\dots,\mathbf{u}_m$ are linearly independent then it follows that $m \leq n$, similarly $\mathbf{u}_1,\dots,\mathbf{u}_m$ span $V$ and $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly independent so $n \leq m$. Which must imply $n=m$.
With this result we may now refer to the number of elements in any basis for a given vector space. Which leads to the following definition.
> *Definition*: let $V$ be a vector space. If $V$ has a basis consisting of $n \in \mathbb{N}$ vectors, then $V$ has **dimension** $n$. The subspace $\{\mathbf{0}\}$ of $V$ is said to have dimension $0$. $V$ is said to be **finite dimensional** if there is a finite set of vectors that spans $V$, otherwise $V$ is **infinite dimensional**.
So a single nonzero vector must span one-dimension exactly. For multiple vectors we have the following theorem.
> *Theorem*: if $V$ is a vector space of dimension $n \in \mathbb{N} \backslash \{0\}$, then
>
> 1. any set of $n$ linearly independent vectors spans $V$,
> 2. any $n$ vectors that span $V$ are linearly independent.
??? note "*Proof*:"
To prove 1, suppose that $\mathbf{v}_1,\dots,\mathbf{v}_n \in V$ are linearly independent and $\mathbf{v} \in V$. Since $V$ has dimension $n$, it has a basis consisting of $n$ vectors and these vectors span $V$. It follows that $\mathbf{v}_1,\dots,\mathbf{v}_n, \mathbf{v}$ must be linearly dependent. Thus there exist scalars $c_1, \dots, c_n, c_{n+1}$ not all zero, such that
$$
c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n + c_{n+1} \mathbf{v} = \mathbf{0}.
$$
The scalar $c_{n+1}$ cannot be zero, since that would imply that $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly dependent, hence
$$
    \mathbf{v} = a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
$$
with
$$
a_i = - \frac{c_i}{c_{n+1}}
$$
for $i \in \{1, \dots, n\}$. Since $\mathbf{v}$ was an arbitrary vector in $V$ it follows that $\mathbf{v}_1, \dots, \mathbf{v}_n$ span $V$.
To prove 2, suppose that $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$. If $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly dependent, then one vector $\mathbf{v}_i$ can be written as a linear combination of the others, take $i=n$ without loss of generality. It follows that $\mathbf{v}_1,\dots,\mathbf{v}_{n-1}$ will still span $V$, which contradicts with $\dim V = n$, therefore $\mathbf{v}_1, \dots, \mathbf{v}_n$ must be linearly independent.
Consequently, if $\dim V = n$, no set of fewer than $n$ vectors can span $V$.
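For example, the vectors

$$
\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{v}_2 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \mathbf{v}_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}
$$

are linearly independent, since $c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + c_3 \mathbf{v}_3 = \mathbf{0}$ forces $c_3 = 0$ from the third component, then $c_2 = 0$ from the second and $c_1 = 0$ from the first. Since $\dim \mathbb{R}^3 = 3$, the theorem implies that these three vectors also span $\mathbb{R}^3$ and therefore form a basis for $\mathbb{R}^3$.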
### Change of basis
> *Definition*: let $V$ be a vector space and let $E = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be an ordered basis for $V$. If $\mathbf{v}$ is any element of $V$, then $\mathbf{v}$ can be written uniquely in the form
>
> $$
> \mathbf{v} = v_1 \mathbf{e}_1 + \dots + v_n \mathbf{e}_n,
> $$
>
> where $v_1, \dots, v_n \in \mathbb{R}$ are the **coordinates** of $\mathbf{v}$ relative to $E$.
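For example, take the ordered basis $E = \{\mathbf{e}_1, \mathbf{e}_2\}$ of $\mathbb{R}^2$ given by

$$
\mathbf{e}_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \mathbf{e}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}.
$$

The vector $\mathbf{v} = (3, 1)^T$ satisfies $\mathbf{v} = 2 \mathbf{e}_1 + \mathbf{e}_2$, so the coordinates of $\mathbf{v}$ relative to $E$ are $2$ and $1$, whereas its coordinates relative to the standard basis are $3$ and $1$.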
## Row space and column space
> *Definition*: if $A$ is an $m \times n$ matrix, the subspace of $\mathbb{R}^{n}$ spanned by the row vectors of $A$ is called the **row space** of $A$. The subspace of $\mathbb{R}^m$ spanned by the column vectors of $A$ is called the **column space** of $A$.
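For example, the row space of

$$
A = \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 1 \end{pmatrix}
$$

is the subspace of $\mathbb{R}^3$ spanned by the row vectors $(1, 0, 2)$ and $(0, 1, 1)$, while the column space is the subspace of $\mathbb{R}^2$ spanned by the three column vectors, which in this case is all of $\mathbb{R}^2$.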
With the definition of a row space the following theorem may be posed.
> *Theorem*: two row equivalent matrices have the same row space.
??? note "*Proof*:"
Let $A$ and $B$ be two matrices. If $B$ is row equivalent to $A$, then $B$ can be formed from $A$ by a finite sequence of row operations, thus the row vectors of $B$ must be linear combinations of the row vectors of $A$. Consequently, the row space of $B$ must be a subspace of the row space of $A$. Since $A$ is row equivalent to $B$, by the same reasoning the row space of $A$ is a subspace of the row space of $B$, hence the two row spaces are equal.
With the definition of a column space, a theorem posed in [systems of linear equations](systems-of-linear-equations.md) may be restated as follows.
> *Theorem*: a linear system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is in the column space of $A$.
??? note "*Proof*:"
For the proof, see the initial proof in [systems of linear equations](systems-of-linear-equations.md).
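For example, the column space of

$$
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}
$$

consists of all multiples of $(1, 2)^T$, so the system $A \mathbf{x} = \mathbf{b}$ is consistent for $\mathbf{b} = (1, 2)^T$ but inconsistent for $\mathbf{b} = (1, 1)^T$.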
With this restatement the following proposition may be posed.
> *Proposition*: let $A$ be an $m \times n$ matrix. The linear system $A \mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b} \in \mathbb{R}^m$ if and only if the column vectors of $A$ span $\mathbb{R}^m$.
>
> The system $A \mathbf{x} = \mathbf{b}$ has at most one solution for every $\mathbf{b}$ if and only if the column vectors of $A$ are linearly independent.
??? note "*Proof*:"
Let $A$ be an $m \times n$ matrix. By the preceding theorem, $A \mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b} \in \mathbb{R}^m$ if and only if every $\mathbf{b} \in \mathbb{R}^m$ lies in the column space of $A$, which is precisely the statement that the column vectors of $A$ span $\mathbb{R}^m$. To prove the second statement, suppose that $A \mathbf{x} = \mathbf{b}$ has at most one solution for every $\mathbf{b}$. Then in particular $A \mathbf{x} = \mathbf{0}$ can have only the trivial solution and hence the column vectors of $A$ must be linearly independent. Conversely, if the column vectors of $A$ are linearly independent, then $A \mathbf{x} = \mathbf{0}$ has only the trivial solution. If $\mathbf{x}_1$ and $\mathbf{x}_2$ were both solutions of $A \mathbf{x} = \mathbf{b}$, then $\mathbf{x}_1 - \mathbf{x}_2$ would be a solution of $A \mathbf{x} = \mathbf{0}$, since
$$
A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}.
$$
It follows that $\mathbf{x}_1 - \mathbf{x}_2 = \mathbf{0}$ and hence $\mathbf{x}_1 = \mathbf{x}_2$.
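For example, the column vectors of

$$
A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}
$$

span $\mathbb{R}^2$ but are not linearly independent, so $A \mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b} \in \mathbb{R}^2$ but never has a unique solution. The column vectors of $A^T$ are linearly independent but do not span $\mathbb{R}^3$, so $A^T \mathbf{x} = \mathbf{b}$ has at most one solution for every $\mathbf{b} \in \mathbb{R}^3$ but is not consistent for every $\mathbf{b}$.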
From this proposition the following corollary emerges.
> *Corollary*: an $n \times n$ matrix $A$ is nonsingular if and only if the column vectors of $A$ form a basis for $\mathbb{R}^n$.
??? note "*Proof*:"
Let $A$ be an $m \times n$ matrix. If the column vectors of $A$ span $\mathbb{R}^m$, then $n$ must be greater than or equal to $m$, since no set of fewer than $m$ vectors can span $\mathbb{R}^m$. If the column vectors of $A$ are linearly independent, then $n$ must be less than or equal to $m$, since every set of more than $m$ vectors in $\mathbb{R}^m$ is linearly dependent. Thus, if the column vectors of $A$ form a basis for $\mathbb{R}^m$, then $n = m$ and $A$ is square. For a square $n \times n$ matrix, $A$ is nonsingular if and only if $A \mathbf{x} = \mathbf{b}$ has a unique solution for every $\mathbf{b} \in \mathbb{R}^n$, which by the proposition holds if and only if the column vectors of $A$ span $\mathbb{R}^n$ and are linearly independent, that is, if and only if they form a basis for $\mathbb{R}^n$.
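For example, the matrix

$$
A = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}
$$

is nonsingular and its column vectors $(1, 1)^T$ and $(1, -1)^T$ are linearly independent, hence they form a basis for $\mathbb{R}^2$. The matrix of the previous example, which is singular, has linearly dependent column vectors, which accordingly do not form a basis for $\mathbb{R}^2$.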
> *Theorem*: if $A$ is an $m \times n$ matrix, the dimension of the row space of $A$ equals the dimension of the column space of $A$.
??? note "*Proof*:"
Will be added later.
## Rank and nullity
> *Definition*: the **rank** of a matrix $A$, denoted as $\text{rank}(A)$, is the dimension of the row space of $A$.
The rank of a matrix may be determined by reducing the matrix to row echelon form; the nonzero rows of the row echelon matrix then form a basis for the row space (see the example below). The rank may be interpreted as a measure of how far the matrix is from being singular: an $n \times n$ matrix is nonsingular if and only if its rank equals $n$.
> *Definition*: the **nullity** of a matrix $A$, denoted as $\text{nullity}(A)$, is the dimension of the null space of $A$.
Equivalently, the nullity of $A$ is the number of columns without a pivot in a row echelon form of $A$, that is, the number of free variables of the system $A \mathbf{x} = \mathbf{0}$.
> *Theorem*: if $A$ is an $m \times n$ matrix, then
>
> $$
> \text{rank}(A) + \text{nullity}(A) = n.
> $$
??? note "*Proof*:"
Let $U$ be the reduced row echelon form of $A$. The system $A \mathbf{x} = \mathbf{0}$ is equivalent to the system $U \mathbf{x} = \mathbf{0}$. If $A$ has rank $r$, then $U$ will have $r$ nonzero rows and consequently the system $U \mathbf{x} = \mathbf{0}$ will involve $r$ pivots and $n - r$ free variables. Since the dimension of the null space equals the number of free variables, $\text{nullity}(A) = n - r$ and hence $\text{rank}(A) + \text{nullity}(A) = r + (n - r) = n$.
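For example, the matrix

$$
A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 4 & 3 \end{pmatrix}
$$

reduces to the row echelon form

$$
U = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 1 \end{pmatrix},
$$

so $\text{rank}(A) = 2$ and the nonzero rows of $U$ form a basis for the row space of $A$. Only the second column lacks a pivot, hence $\text{nullity}(A) = 1$, in agreement with $\text{rank}(A) + \text{nullity}(A) = 2 + 1 = 3 = n$.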