port from mathematics-physics notes
This commit is contained in:
parent
a4e106ce02
commit
c009ea53f0
124 changed files with 13224 additions and 0 deletions
3
docs/index.md
Normal file
@@ -0,0 +1,3 @@
# Welcome

Here you can find some notes on various matters that serve as a fallback alongside the memory-leak-prone neuronal contraption of mine.
10
docs/javascripts/katex.js
Normal file
@@ -0,0 +1,10 @@
document$.subscribe(({ body }) => {
  renderMathInElement(body, {
    delimiters: [
      { left: "$$", right: "$$", display: true },
      { left: "$", right: "$", display: false },
      { left: "\\(", right: "\\)", display: false },
      { left: "\\[", right: "\\]", display: true }
    ],
  })
})
22
docs/mathematics/calculus/concavity-and-inflections.md
Executable file
@@ -0,0 +1,22 @@
# Concavity and inflections

## Concave up

A function $f$ is **concave up** on an open interval $I$ on which it is differentiable if the derivative $f'$ is an increasing function on $I$; where the second derivative exists this means $f'' > 0$. The graph then lies above its tangent lines.

## Concave down

A function $f$ is **concave down** on an open interval $I$ on which it is differentiable if the derivative $f'$ is a decreasing function on $I$; where the second derivative exists this means $f'' < 0$. The graph then lies below its tangent lines.

## Inflection points

The function $f$ has an inflection point at $x_0$ if

1. the tangent line at $(x_0, f(x_0))$ exists, and
2. the concavity of $f$ is opposite on opposite sides of $x_0$.

If $f$ has an inflection point at $x_0$ and $f''(x_0)$ exists, then $f''(x_0) = 0$.
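
The converse fails: $f''(x_0) = 0$ alone does not give an inflection point. For example, $f(x) = x^4$ has $f''(0) = 0$ yet is concave up on both sides of $0$, whereas $f(x) = x^3$ has $f''(0) = 0$ and a genuine inflection at $0$, since $f''(x) = 6x$ changes sign there.
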
## The second derivative test

...
53
docs/mathematics/calculus/continuity.md
Executable file
@@ -0,0 +1,53 @@
# Continuity

Continuity is a local property. A function $f$ is continuous at an interior point $c$ of its domain if

$$\lim_{x \to c} f(x) = f(c).$$

If either $\lim_{x \to c} f(x)$ fails to exist, or it exists but is not equal to $f(c)$, then $f$ is discontinuous at $c$.

## Right and left continuity

$f$ is **right continuous** at $c$ (in particular when $c$ is a left endpoint of its domain) if

$$\lim_{x \downarrow c} f(x) = f(c)$$

and **left continuous** at $c$ (in particular when $c$ is a right endpoint) if

$$\lim_{x \uparrow c} f(x) = f(c).$$

## Continuity on an interval

$f$ is continuous on the interval $I$ if and only if $f$ is continuous at each point of $I$. At endpoints, left or right continuity is sufficient.

$f$ is called a continuous function if and only if $f$ is continuous on its domain.

## Discontinuity

A discontinuity is removable if and only if the limit exists; otherwise the discontinuity is non-removable.
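
For instance, $f(x) = \frac{x^2 - 1}{x - 1}$ is undefined at $x = 1$ but $\lim_{x \to 1} f(x) = 2$, so the discontinuity is removable by setting $f(1) = 2$; by contrast, $\frac{1}{x}$ at $x = 0$ has no limit and the discontinuity is non-removable.
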
## Combining continuous functions

If the functions $f$ and $g$ are both defined on an interval containing $c$ and both are continuous at $c$, then the following functions are also continuous at $c$:

* the sum $f + g$ and the difference $f - g$;
* the product $f g$;
* the constant multiple $k f$, where $k$ is any number;
* the quotient $\frac{f}{g}$, provided $g(c) \neq 0$; and
* the *n*th root $(f(x))^{\frac{1}{n}}$, provided $f(c) > 0$ if $n$ is even.

This may be proved using the various [limit rules](limits.md/#limit-rules).

## The extreme value theorem

If $f(x)$ is continuous on the closed, bounded interval $[a,b]$, then there exist numbers $p$ and $q$ in $[a,b]$ such that $\forall x \in [a,b]$,

$$f(p) \leq f(x) \leq f(q).$$

Thus, $f$ has the absolute minimum value $m = f(p)$, taken on at the point $p$, and the absolute maximum value $M = f(q)$, taken on at the point $q$. This is a consequence of the completeness property of the real numbers.

## The intermediate value theorem

If $f(x)$ is continuous on the interval $[a,b]$ and if $s$ is a number between $f(a)$ and $f(b)$, then there exists a number $c$ in $[a,b]$ such that $f(c) = s$. This too is a consequence of the completeness property of the real numbers.

In particular, a continuous function defined on a closed interval takes on all values between its minimum value $m$ and its maximum value $M$, so its range is also a closed interval, $[m,M]$.
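
The intermediate value theorem is what makes root finding by bisection work: if a continuous $f$ changes sign on $[a,b]$, a root lies in between. A minimal sketch in JavaScript; the helper name `bisect`, the tolerance and the sample function are illustrative choices, not from the notes.

```js
// Bisection: the IVT guarantees a root of a continuous f with f(a) * f(b) < 0.
function bisect(f, a, b, tol = 1e-12) {
  if (f(a) * f(b) > 0) throw new Error("f must change sign on [a, b]");
  while (b - a > tol) {
    const m = (a + b) / 2;
    // Keep the half-interval on which the sign change (and thus a root) persists.
    if (f(a) * f(m) <= 0) b = m; else a = m;
  }
  return (a + b) / 2;
}

// Example: x^2 - 2 changes sign on [1, 2], so bisect approximates sqrt(2).
console.log(bisect(x => x * x - 2, 1, 2)); // ~1.4142135623
```
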
231
docs/mathematics/calculus/differentation.md
Executable file
@@ -0,0 +1,231 @@
# Differentiation

## The slope of a curve

The slope $a$ of a curve $C$ at a point $P$ is the slope of the tangent line to $C$ at $P$, if such a tangent line exists. In particular, the slope of the graph of $y = f(x)$ at the point $x_0$ is

$$
\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h} = a.
$$

### Normal line

If a curve $C$ has a tangent line $L$ at point $P$, then the straight line $N$ through $P$ perpendicular to $L$ is called the **normal** to $C$ at $P$. The slope $s$ of the normal is the negative reciprocal of the slope $a$ of the curve (provided $a \neq 0$), that is

$$
s = \frac{-1}{a}.
$$

## Derivative

The **derivative** of a function $f$ is another function $f'$ defined by

$$
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
$$

at all points $x$ for which the limit exists. If $f'(x)$ exists, then $f$ is **differentiable** at $x$.
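
The limit can be probed numerically with a difference quotient; a small sketch, where the helper name `diff` and the step size $h$ are illustrative choices.

```js
// Symmetric difference quotient approximating f'(x); its error is O(h^2).
function diff(f, x, h = 1e-6) {
  return (f(x + h) - f(x - h)) / (2 * h);
}

// Example: the derivative of sin should match cos.
console.log(diff(Math.sin, 1)); // ~0.5403023058
console.log(Math.cos(1));       // 0.5403023058...
```
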
## Differentiability implies continuity

If $f$ is differentiable at $x$, then $f$ is continuous at $x$.

**Proof:** Since $f$ is differentiable at $x$,

$$
\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = f'(x)
$$

must exist. Then, using the [limit rules](limits.md/#limit-rules),

$$
\lim_{h \to 0} f(x + h) - f(x) = \lim_{h \to 0} (\frac{f(x + h) - f(x)}{h}) (h) = (f'(x)) (0) = 0.
$$

This is equivalent to $\lim_{h \to 0} f(x + h) = f(x)$, which says that $f$ is continuous at $x$.

## Differentiation rules

* **Differentiation of a sum:** $(f + g)'(x) = f'(x) + g'(x)$.
    * **Proof:** Follows from the [limit rules](limits.md/#limit-rules)

$$
\begin{array}{ll}
(f + g)'(x) &= \lim_{h \to 0} \frac{(f + g)(x + h) - (f + g)(x)}{h}, \\
&= \lim_{h \to 0} (\frac{f(x + h) - f(x)}{h} + \frac{g(x + h) - g(x)}{h}), \\
&= f'(x) + g'(x).
\end{array}
$$

* **Differentiation of a constant multiple:** $(C f)'(x) = C f'(x)$.
    * **Proof:** Follows from the [limit rules](limits.md/#limit-rules)

$$
\begin{array}{ll}
(C f)'(x) &= \lim_{h \to 0} \frac{C f(x + h) - C f(x)}{h}, \\
&= C \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}, \\
&= C f'(x).
\end{array}
$$

* **Differentiation of a product:** $(f g)'(x) = f'(x) g(x) + f(x) g'(x)$.
    * **Proof:** Follows from the [limit rules](limits.md/#limit-rules)

$$
\begin{array}{ll}
(f g)'(x) &= \lim_{h \to 0} \frac{f(x+h) g(x+h) - f(x) g(x)}{h}, \\
&= \lim_{h \to 0} (\frac{f(x+h) - f(x)}{h} g(x+h) + f(x) \frac{g(x+h) - g(x)}{h}), \\
&= f'(x) g(x) + f(x) g'(x).
\end{array}
$$

* **Differentiation of the reciprocal:** $(\frac{1}{f})'(x) = \frac{-f'(x)}{(f(x))^2}$.
    * **Proof:** Follows from the [limit rules](limits.md/#limit-rules)

$$
\begin{array}{ll}
(\frac{1}{f})'(x) &= \lim_{h \to 0} \frac{\frac{1}{f(x+h)} - \frac{1}{f(x)}}{h}, \\
&= \lim_{h \to 0} \frac{f(x) - f(x+h)}{h f(x+h) f(x)}, \\
&= \lim_{h \to 0} (\frac{-1}{f(x+h) f(x)}) \frac{f(x+h) - f(x)}{h}, \\
&= \frac{-1}{(f(x))^2} f'(x).
\end{array}
$$

* **Differentiation of a quotient:** $(\frac{f}{g})'(x) = \frac{f'(x) g(x) - f(x) g'(x)}{(g(x))^2}$.
    * **Proof:** Follows from the product and reciprocal rules

$$
\begin{array}{ll}
(\frac{f}{g})'(x) &= (f \frac{1}{g})'(x), \\
&= f'(x) \frac{1}{g(x)} + f(x) (- \frac{g'(x)}{(g(x))^2}), \\
&= \frac{f'(x) g(x) - f(x) g'(x)}{(g(x))^2}.
\end{array}
$$

* **Differentiation of a composite:** $(f \circ g)'(x) = f'(g(x)) g'(x)$.
    * **Proof:** Follows from the [limit rules](limits.md/#limit-rules), assuming $g(a) \neq g(x)$ for $a$ near $x$ (the general proof treats the remaining case separately)

$$
\begin{array}{ll}
(f \circ g)'(x) &= \lim_{h \to 0} \frac{f(g(x+h)) - f(g(x))}{h} \quad \mathrm{let} \space h = a - x, \\
&= \lim_{a \to x} \frac{f(g(a)) - f(g(x))}{a - x}, \\
&= \lim_{a \to x} (\frac{f(g(a)) - f(g(x))}{g(a) - g(x)}) (\frac{g(a) - g(x)}{a -x}), \\
&= f'(g(x)) g'(x).
\end{array}
$$
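
For example, applying the composite (chain) rule to $\sin(x^2)$ with $f(u) = \sin u$ and $g(x) = x^2$ gives

$$
\frac{d}{dx} \sin(x^2) = \cos(x^2) \cdot 2x.
$$
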
## The derivative of the sine and cosine function

The derivative of the sine function is the cosine function, $\frac{d}{dx} \sin x = \cos x$.

**Proof:** using the definition of the derivative, the addition formula for the sine and the [limit rules](limits.md/#limit-rules)

$$
\begin{array}{ll}
\frac{d}{dx} \sin x &= \lim_{h \to 0} \frac{\sin(x+h) - \sin x}{h}, \\
&= \lim_{h \to 0} \frac{\sin x \cos h + \cos x \sin h - \sin x}{h}, \\
&= \lim_{h \to 0} (\sin x (\frac{\cos h - 1}{h}) + \cos x (\frac{\sin h}{h})), \\
&= (\sin x) \cdot (0) + (\cos x) \cdot (1) = \cos x.
\end{array}
$$

The derivative of the cosine function is the negative of the sine function, $\frac{d}{dx} \cos x = -\sin x$.

**Proof:** using the derivative of the sine and the composite (chain) rule

$$
\begin{array}{ll}
\frac{d}{dx} \cos x &= \frac{d}{dx} \sin (\frac{\pi}{2} - x), \\
&= (-1) \cos (\frac{\pi}{2} - x) = - \sin x.
\end{array}
$$

## Implicit differentiation

Implicit equations, equations that cannot be solved explicitly for $y$, may still be differentiated by implicit differentiation.

**Example:** $x y^2 + y = 4 x$

$$
\begin{array}{ll}
\frac{d}{dx}(x y^2 + y = 4 x) &\implies (y^2 + 2 x y \frac{dy}{dx} + \frac{dy}{dx} = 4), \\
&\implies (\frac{dy}{dx} = \frac{4 - y^2}{1 + 2 x y}).
\end{array}
$$

## Rolle's theorem

Suppose that the function $g$ is continuous on the closed and bounded interval $[a,b]$ and is differentiable in the open interval $(a,b)$. If $g(a) = g(b)$, then there exists a point $c$ in the open interval $(a,b)$ such that $g'(c) = 0$.

**Proof:** By the [extreme value theorem](continuity.md/#the-extreme-value-theorem) $g$ attains its maximum and its minimum in $[a,b]$. If these are both attained at the endpoints of $[a,b]$, then $g$ is constant on $[a,b]$ and so the derivative of $g$ is zero at every point in $(a,b)$.

Suppose then that the maximum is attained at an interior point $c$ of $(a,b)$. For a real $h$ such that $c + h$ is in $[a,b]$, the value $g(c + h)$ is less than or equal to $g(c)$ because $g$ attains its maximum at $c$.

Therefore, for every $h > 0$,

$$\frac{g(c + h) - g(c)}{h} \leq 0,$$

hence,

$$\lim_{h \downarrow 0} \frac{g(c + h) - g(c)}{h} \leq 0.$$

Similarly, for every $h < 0$,

$$\lim_{h \uparrow 0} \frac{g(c + h) - g(c)}{h} \geq 0.$$

Since $g$ is differentiable at $c$, both one-sided limits must equal $g'(c)$, thereby obtaining

$$\lim_{h \to 0} \frac{g(c + h) - g(c)}{h} = 0 = g'(c).$$

The proof for a minimum value at $c$ is similar.

## Mean-value theorem

Suppose that the function $f$ is continuous on the closed and bounded interval $[a,b]$ and is differentiable in the open interval $(a,b)$. Then there exists a point $c$ in the open interval $(a,b)$ such that

$$
\frac{f(b) - f(a)}{b - a} = f'(c).
$$

**Proof:** Define $g(x) = f(x) - r x$, where $r$ is a constant. Since $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$, the same is true for $g$. Now $r$ is chosen such that $g$ satisfies the conditions of [Rolle's theorem](differentation.md/#rolles-theorem). Namely

$$
\begin{array}{ll}
g(a) = g(b) &\iff f(a) - ra = f(b) - rb \\
&\iff r(b - a) = f(b) - f(a) \\
&\iff r = \frac{f(b) - f(a)}{b - a}
\end{array}
$$

By [Rolle's theorem](differentation.md/#rolles-theorem), since $g$ is differentiable and $g(a) = g(b)$, there is some $c$ in $(a,b)$ for which $g'(c) = 0$, and it follows from the equality $g(x) = f(x) - rx$ that

$$
\begin{array}{ll}
g'(x) &= f'(x) - r, \\
g'(c) &= f'(c) - r = 0 \implies f'(c) = r = \frac{f(b) - f(a)}{b - a}.
\end{array}
$$

## Generalized Mean-value theorem

If the functions $f$ and $g$ are both continuous on $[a,b]$ and differentiable on $(a,b)$, and if $g'(x) \neq 0$ for every $x$ in $(a,b)$, then there exists a $c \in (a,b)$ such that

$$
\frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.
$$

**Proof:** Let $h(x) = (f(b) - f(a))(g(x) - g(a)) - (g(b) - g(a))(f(x) - f(a))$.

Applying [Rolle's theorem](differentation.md/#rolles-theorem), since $h$ is differentiable and $h(a) = h(b) = 0$, there is some $c$ in $(a,b)$ for which $h'(c) = 0$,

$$
h'(c) = (f(b) - f(a))g'(c) - (g(b) - g(a))f'(c) = 0,
$$

$$
\begin{array}{ll}
\implies (f(b) - f(a))g'(c) = (g(b) - g(a))f'(c), \\
\implies \frac{f(b) - f(a)}{g(b) - g(a)} = \frac{f'(c)}{g'(c)}.
\end{array}
$$
47
docs/mathematics/calculus/extremes-values.md
Executable file
@@ -0,0 +1,47 @@
# Extreme values

## Absolute extreme values

A function $f$ has an **absolute maximum value** $f(x_0)$ at the point $x_0$ in its domain if $f(x) \leq f(x_0)$ holds for every $x$ in the domain of $f$.

Similarly, $f$ has an **absolute minimum value** $f(x_1)$ at the point $x_1$ in its domain if $f(x) \geq f(x_1)$ holds for every $x$ in the domain of $f$.

## Local extreme values

A function $f$ has a **local maximum value** $f(x_0)$ at the point $x_0$ in its domain provided there exists a number $h > 0$ such that $f(x) \leq f(x_0)$ whenever $x$ is in the domain of $f$ and $|x - x_0| < h$.

Similarly, $f$ has a **local minimum value** $f(x_1)$ at the point $x_1$ in its domain provided there exists a number $h > 0$ such that $f(x) \geq f(x_1)$ whenever $x$ is in the domain of $f$ and $|x - x_1| < h$.

## Critical points

A critical point is a point $x \in \mathrm{Dom}(f)$ where $f'(x) = 0$.

## Singular points

A singular point is a point $x \in \mathrm{Dom}(f)$ where $f'(x)$ is not defined.

## Endpoints

An endpoint is a point $x \in \mathrm{Dom}(f)$ that does not belong to any open interval contained in $\mathrm{Dom}(f)$.

## Locating extreme values

If the function $f$ is defined on an interval $I$ and has a local maximum or minimum at a point of $I$, then that point must be either a critical point of $f$, a singular point of $f$ or an endpoint of $I$.

**Proof:**

Suppose that $f$ has a local maximum value at $x_0$ and that $x_0$ is neither an endpoint of the domain of $f$ nor a singular point of $f$. Then for some $h > 0$, $f(x)$ is defined on the open interval $(x_0 - h, x_0 + h)$ and has an absolute maximum at $x_0$. Also, $f'(x_0)$ exists and, by the argument used in [Rolle's theorem](differentation.md#rolles-theorem), equals zero, so $x_0$ is a critical point.

## The first derivative test

If $f'$ changes from positive to negative at a critical or singular point, $f$ has a local maximum there; if it changes from negative to positive, a local minimum. Endpoints are classified by the sign of $f'$ on the adjoining side.

### Example

Find the local and absolute extreme values of $f(x) = x^4 - 2x^2 - 3$ on the interval $[-2,2]$.

$$f'(x) = 4x^3 - 4x = 4x(x^2 - 1) = 4x(x - 1)(x + 1)$$

| $x$ | $-2$| $-1$| $0$ | $1$ | $2$ |
| --- | --- | --- | --- | --- | --- |
| $f'$|     |- 0 +|+ 0 -|- 0 +|     |
| $f$ | max | min | max | min | max |
|     | EP  | CP  | CP  | CP  | EP  |
151
docs/mathematics/calculus/improper-integrals.md
Executable file
@@ -0,0 +1,151 @@
# Improper integrals

Proper integrals are [definite integrals](integration.md/#the-definite-integral) where the integrand $f$ is *continuous* on a *closed, finite* interval $[a,b]$. For positive $f$ the integral corresponds to the area of a **bounded region** of the plane, a region contained inside some disk of finite radius with centre at the origin. The definite integral can be extended by allowing two possibilities excluded in the situation described above.

1. We may have $a = -\infty$ or $b = \infty$, or both.
2. $f$ may be unbounded as $x$ approaches $a$ or $b$ or both.

Integrals satisfying 1 are called **improper integrals of type I** and integrals satisfying 2 are called **improper integrals of type II**.

## Improper integrals of type I

If $f$ is continuous on $[a,\infty)$, the improper integral of $f$ over $[a,\infty)$ is defined as a limit of proper integrals:

$$
\int_a^\infty f(x)dx = \lim_{R \to \infty} \int_a^R f(x)dx.
$$

Similarly, if $f$ is continuous on $(-\infty,b]$, then the improper integral is defined as:

$$
\int_{-\infty}^b f(x)dx = \lim_{R \to -\infty} \int_R^b f(x)dx.
$$

In either case, if the limit exists, the improper integral **converges**. If the limit does not exist, the improper integral **diverges**. If the limit is $\infty$ or $-\infty$, the improper integral **diverges to (negative) infinity**.
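
A quick numerical illustration of type I convergence, as a minimal sketch; the helper name `integrate`, the midpoint rule and the cutoff values are illustrative choices. Here $\int_1^\infty x^{-2}dx = 1$, and the truncations $\int_1^R$ approach it as $R$ grows.

```js
// Midpoint-rule approximation of the proper integral of f on [a, b].
function integrate(f, a, b, n = 100000) {
  const h = (b - a) / n;
  let sum = 0;
  for (let i = 0; i < n; i++) sum += f(a + (i + 0.5) * h);
  return sum * h;
}

// Truncations of the improper integral of x^(-2) over [1, R); the limit is 1.
for (const R of [10, 100, 1000]) {
  console.log(R, integrate(x => 1 / (x * x), 1, R)); // -> ~0.9, ~0.99, ~0.999
}
```
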
## Improper integrals of type II

If $f$ is continuous on the interval $(a,b]$ and is possibly unbounded near $a$, the improper integral may be defined as:

$$
\int_a^b f(x)dx = \lim_{c \downarrow a} \int_c^b f(x)dx.
$$

Similarly, if $f$ is continuous on $[a,b)$ and is possibly unbounded near $b$, the improper integral may be defined as:

$$
\int_a^b f(x)dx = \lim_{c \uparrow b} \int_a^c f(x)dx.
$$

These improper integrals may also converge, diverge or diverge to (negative) infinity.

## p-integrals

Summarizing the behaviour of improper integrals of types I and II for powers of $x$: if $0 < a < \infty$, then:

1.

$$
\int_a^\infty x^{-p}dx =
\begin{cases}
\text{converges to } \frac{a^{1-p}}{p-1} \quad \text{if } p > 1, \\
\text{diverges to } \infty \quad \text{if } p \leq 1.
\end{cases}
$$

2.

$$
\int_0^a x^{-p}dx =
\begin{cases}
\text{converges to } \frac{a^{1-p}}{1-p} \quad \text{if } p < 1, \\
\text{diverges to } \infty \quad \text{if } p \geq 1.
\end{cases}
$$

**Proof of 1:**

For $p = 1$:

$$
\int_a^\infty x^{-1}dx = \lim_{R \to \infty} \int_a^R x^{-1}dx = \lim_{R \to \infty} (\ln R - \ln a) = \infty.
$$

For $p < 1$:

$$
\begin{array}{ll}
\int_a^\infty x^{-p}dx &= \lim_{R \to \infty} \int_a^R x^{-p}dx, \\
&= \lim_{R \to \infty} [\frac{x^{-p+1}}{-p+1}]_a^R, \\
&= \lim_{R \to \infty} \frac{R^{1-p}-a^{1-p}}{1-p} = \infty.
\end{array}
$$

For $p > 1$:

$$
\begin{array}{ll}
\int_a^\infty x^{-p}dx &= \lim_{R \to \infty} \int_a^R x^{-p}dx, \\
&= \lim_{R \to \infty} [\frac{x^{-p+1}}{-p+1}]_a^R, \\
&= \lim_{R \to \infty} \frac{a^{-(p-1)}-R^{-(p-1)}}{p-1} = \frac{a^{1-p}}{p-1}.
\end{array}
$$

**Proof of 2:**

For $p = 1$:

$$
\int_0^a x^{-1}dx = \lim_{c \space\downarrow\space 0} \int_c^a x^{-1}dx = \lim_{c \space\downarrow\space 0} (\ln a - \ln c) = \infty.
$$

For $p > 1$:

$$
\begin{array}{ll}
\int_0^a x^{-p}dx &= \lim_{c \space\downarrow\space 0} \int_c^a x^{-p}dx, \\
&= \lim_{c \space\downarrow\space 0} [\frac{x^{-p+1}}{-p+1}]_c^a, \\
&= \lim_{c \space\downarrow\space 0} \frac{c^{-(p-1)} - a^{-(p-1)}}{p-1} = \infty.
\end{array}
$$

For $p < 1$:

$$
\begin{array}{ll}
\int_0^a x^{-p}dx &= \lim_{c \space\downarrow\space 0} \int_c^a x^{-p} dx, \\
&= \lim_{c \space\downarrow\space 0} [\frac{x^{-p+1}}{-p+1}]_c^a, \\
&= \lim_{c \space\downarrow\space 0} \frac{a^{1-p}-c^{1-p}}{1-p} = \frac{a^{1-p}}{1-p}.
\end{array}
$$

## Comparison theorem for integrals

Let $-\infty \leq a < b \leq \infty$, and suppose that the functions $f$ and $g$ are continuous on the interval $(a,b)$ and satisfy $0 \leq f(x) \leq g(x)$. If $\int_a^b g(x)dx$ converges, then so does $\int_a^b f(x)dx$, and:

$$
\int_a^b f(x)dx \leq \int_a^b g(x)dx.
$$

Equivalently, if $\int_a^b f(x)dx$ diverges to $\infty$, then so does $\int_a^b g(x)dx$.

**Proof:** Since both integrands are nonnegative, there are only two possibilities for each integral: it can either converge to a nonnegative number or diverge to $\infty$. Since $f(x) \leq g(x)$ on $(a,b)$, it follows by the [properties of the definite integral](integration.md/#properties) that if $a < r < s < b$, then:

$$
\int_r^s f(x)dx \leq \int_r^s g(x)dx.
$$

The result follows by taking limits as $r \space\downarrow\space a$ and $s \space\uparrow\space b$.

### To prove convergence

Find a function $g$ such that (a worked example follows these recipes)

1. $\forall x \in [a,\infty), \space 0 \leq f(x) \leq g(x)$;
2. $\int_a^\infty g(x)dx$ is convergent.

### To prove divergence

Find a function $f$ such that

1. $\forall x \in [a,\infty), \space g(x) \geq f(x) \geq 0$;
2. $\int_a^\infty f(x)dx$ is divergent.
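
As a worked example of the convergence recipe: to show that $\int_1^\infty e^{-x^2}dx$ converges, take $g(x) = e^{-x}$. For $x \geq 1$, $0 \leq e^{-x^2} \leq e^{-x}$, and

$$
\int_1^\infty e^{-x}dx = \lim_{R \to \infty} (e^{-1} - e^{-R}) = e^{-1},
$$

so the integral converges by the comparison theorem.
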
90
docs/mathematics/calculus/integration-techniques.md
Executable file
@@ -0,0 +1,90 @@
# Integration techniques

## Elementary integrals

$$
\int \frac{1}{a^2 + x^2} dx = \frac{1}{a} \arctan(\frac{x}{a}) + C
$$

$$
\int \frac{1}{\sqrt{a^2-x^2}} dx = \arcsin(\frac{x}{a}) + C
$$

## Linearity of the integral

$$
\int (Af(x) + Bg(x))dx = A\int f(x)dx + B\int g(x)dx
$$

**Proof:** is missing.

## Substitution

Suppose that $g$ is differentiable on $[a,b]$ and satisfies $g(a) = A$ and $g(b) = B$. Also suppose that $f$ is continuous on the range of $g$. Then, with $u = g(x)$ and $du = g'(x)dx$,

$$
\int_a^b f(g(x))g'(x)dx = \int_A^B f(u)du.
$$
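
For example, with $u = x^2$ and $du = 2x \, dx$:

$$
\int_0^1 2x \cos(x^2) dx = \int_0^1 \cos(u) du = \sin 1.
$$
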
## Inverse substitution

Inverse substitutions often appear to make the integral more complicated at first, so this strategy should be treated as a last resort. Substituting $x = g(u)$ in the integral

$$
\int_a^b f(x)dx,
$$

leads to the integral

$$
\int_{x=a}^{x=b} f(g(u))g'(u)du.
$$

## Integration by parts

Suppose $U(x)$ and $V(x)$ are two differentiable functions. According to the [product rule](differentation.md/#differentiation-rules),

$$
\frac{d}{dx}(U(x)V(x)) = U(x) \frac{dV}{dx} + V(x) \frac{dU}{dx}.
$$

Integrating both sides of this equation and transposing terms,

$$
\int U(x) \frac{dV}{dx} dx = U(x)V(x) - \int V(x) \frac{dU}{dx} dx,
$$

obtaining:

$$
\int U dV = U V - \int V dU.
$$

For definite integrals that is:

$$
\int_a^b f'(x)g(x)dx = [f(x)g(x)]_a^b - \int_a^b f(x)g'(x)dx.
$$
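
For example, with $U = x$ and $dV = \sin x \, dx$:

$$
\int_0^\pi x \sin x \, dx = [-x \cos x]_0^\pi + \int_0^\pi \cos x \, dx = \pi + [\sin x]_0^\pi = \pi.
$$
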
## Integration of rational functions

Let $P(x)$ and $Q(x)$ be polynomial functions with real coefficients, forming a rational function $\frac{P(x)}{Q(x)}$. Let $\frac{P(x)}{Q(x)}$ be a **strictly proper rational function**, that is, $\mathrm{deg}(P(x)) < \mathrm{deg}(Q(x))$. If the function is not strictly proper, it can be made so by **long division**, leaving a polynomial plus a strictly proper remainder.

Then, $Q(x)$ can be factored into the product of a constant $K$, real linear factors of the form $x-a_i$, and real quadratic factors of the form $x^2+b_ix+c_i$ having no real roots.

The rational function can be expressed as a sum of partial fractions. Corresponding to each factor $(x-a)^m$ of $Q(x)$ the decomposition contains a sum of fractions of the form

$$
\frac{A_1}{x-a} + \frac{A_2}{(x-a)^2} + ... + \frac{A_m}{(x-a)^m}.
$$

Corresponding to each factor $(x^2+bx+c)^n$ of $Q(x)$ the decomposition contains a sum of fractions of the form

$$
\frac{B_1x+C_1}{x^2+bx+c} + \frac{B_2x+C_2}{(x^2+bx+c)^2} + ... + \frac{B_nx+C_n}{(x^2+bx+c)^n}.
$$

The constants $A_1,A_2,...,A_m,B_1,B_2,...,B_n,C_1,C_2,...,C_n$ can be determined by adding up the fractions in the decomposition and equating the coefficients of like powers of $x$ in the numerator of the sum with those in $P(x)$.
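
For example,

$$
\frac{1}{x^2 - 1} = \frac{A_1}{x-1} + \frac{A_2}{x+1} = \frac{1/2}{x-1} - \frac{1/2}{x+1},
$$

so that

$$
\int \frac{dx}{x^2-1} = \frac{1}{2} \ln \Big| \frac{x-1}{x+1} \Big| + C.
$$
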
191
docs/mathematics/calculus/integration.md
Executable file
@@ -0,0 +1,191 @@
# Integration

## Sigma notation

If $m$ and $n$ are integers with $m \leq n$, and if $f$ is a function defined as $f: \{m,m+1,...,n\} \to \mathbb{R}$, the symbol $\sum_{i=m}^{n} f(i)$ represents the sum of the values of $f$ at those integers:

$$
\sum_{i=m}^{n} f(i) = f(m) + f(m+1) + f(m+2) + ... + f(n).
$$

The explicit sum appearing on the right side of this equation is the **expansion** of the sum represented in sigma notation on the left side.

## Partitions

Let $P$ be a finite set of points arranged in order between $a$ and $b$ on the real line,

$$
P = \{x_0, x_1, ... , x_{n-1}, x_n\},
$$

where $a = x_0 < x_1 < ... < x_{n-1} < x_n = b$. Such a set $P$ is called a **partition** of $[a,b]$; it divides $[a,b]$ into $n$ subintervals of which the *i*th is $[x_{i-1},x_i]$. The length of the *i*th subinterval of $P$ is

$$
\Delta x_i = x_i - x_{i-1} \quad \mathrm{for} \space 1 \leq i \leq n.
$$

Then, the **norm** of the partition $P$ is defined as

$$
\parallel P \parallel = \max_{1 \leq i \leq n} \Delta x_i.
$$

If the function $f$ is continuous on the interval $[a,b]$, it is continuous on each subinterval $[x_{i-1},x_i]$, and attains a maximum at some point $u_i$ and a minimum at some point $l_i$ of each subinterval by the [extreme value theorem](continuity.md/#the-extreme-value-theorem), such that

$$
f(l_i) \leq f(x) \leq f(u_i) \quad \forall x \in [x_{i-1},x_i].
$$

## Upper and lower Riemann sums

The **lower Riemann sum**, $L(f,P)$, and the **upper Riemann sum**, $U(f,P)$, for the function $f$ and the partition $P$ are defined by:

$$
\begin{array}{ll}
L(f,P) &= f(l_1)\Delta x_1 + f(l_2)\Delta x_2 + ... + f(l_n)\Delta x_n \\
&= \sum_{i=1}^n f(l_i)\Delta x_i, \\
U(f,P) &= f(u_1)\Delta x_1 + f(u_2)\Delta x_2 + ... + f(u_n)\Delta x_n \\
&= \sum_{i=1}^n f(u_i)\Delta x_i.
\end{array}
$$
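
A small numerical sketch; the helper name `riemannSums`, the function, interval and uniform partition are illustrative choices. For increasing $f(x) = x^2$ on $[0,1]$ the minimum and maximum on each subinterval sit at its endpoints, so $L$ and $U$ are easy to form and both squeeze $\frac{1}{3}$.

```js
// Lower and upper Riemann sums for an increasing f on a uniform partition of [a, b].
function riemannSums(f, a, b, n) {
  const dx = (b - a) / n;
  let L = 0, U = 0;
  for (let i = 1; i <= n; i++) {
    L += f(a + (i - 1) * dx) * dx; // minimum at the left endpoint (f increasing)
    U += f(a + i * dx) * dx;       // maximum at the right endpoint
  }
  return [L, U];
}

// Both sums squeeze the definite integral of x^2 on [0, 1], which is 1/3.
console.log(riemannSums(x => x * x, 0, 1, 1000)); // ~[0.3328, 0.3338]
```
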
**Theorem:** for any partitions $P$, $Q$ of $[a,b]$, every lower Riemann sum is smaller than or equal to every upper Riemann sum:

$$
L(f,P) \leq U(f,Q).
$$

**Proof:** let $P$, $Q$ be partitions of $[a,b]$ and define $R = P \cup Q$; $R$ is a refinement of both $P$ and $Q$. Then,

$$
L(f,P) \leq L(f,R) \leq U(f,R) \leq U(f,Q).
$$

## The definite integral

Suppose there exists exactly one number $I \in \mathbb{R}$ such that for every partition $P$ of $[a,b]$:

$$
L(f,P) \leq I \leq U(f,P).
$$

Then the function $f$ is integrable on $[a,b]$ and $I$ is called the definite integral,

$$
I = \int_a^b f(x) dx.
$$

**Theorem:** suppose that a function $f$ is bounded on the interval $[a,b]$, then $f$ is integrable on $[a,b]$ if and only if $\forall \varepsilon > 0$ there exists a partition $P$ of $[a,b]$ such that

$$
U(f,P) - L(f,P) < \varepsilon.
$$

**Proof sketch:** this rests on the fact that for $a,b \in \mathbb{R}$, if $|a-b| < \varepsilon$ for all $\varepsilon > 0$, then $a = b$; apply this to the supremum of the lower sums and the infimum of the upper sums.

**Theorem:** if $f$ is continuous on the interval $[a,b]$, then $f$ is integrable on $[a,b]$.

**Proof:** is missing...

### Properties

* If $a \leq b$ and $f(x) \leq g(x) \space \forall x \in [a,b]$:

$$
\int_a^b f(x)dx \leq \int_a^b g(x)dx.
$$

**Proof:** is missing...

* The **triangle inequality** for sums extends to definite integrals. If $a \leq b$, then

$$
|\int_a^b f(x)dx| \leq \int_a^b |f(x)|dx.
$$

**Proof:** is missing...

* Integral of an odd function $f(-x) = -f(x)$:

$$
\int_{-a}^a f(x)dx = 0.
$$

**Proof:** is missing...

* Integral of an even function $f(-x) = f(x)$:

$$
\int_{-a}^a f(x)dx = 2\int_0^a f(x)dx.
$$

**Proof:** is missing...

## The Mean-value theorem for integrals

If the function $f$ is continuous on $[a,b]$, then there exists a point $c$ in $[a,b]$ such that

$$
\int_a^b f(x)dx = (b-a)f(c).
$$

**Proof:** let $m \leq f(x) \leq M$ for all $x \in [a,b]$ (such bounds exist by the extreme value theorem), then

$$m(b-a)=\int_a^b mdx \leq \int_a^b f(x)dx \leq \int_a^b Mdx = M(b-a),$$

$$m \leq \frac{1}{b-a} \int_a^b f(x)dx \leq M.$$

According to [the intermediate value theorem](continuity.md/#the-intermediate-value-theorem) there exists a $c \in [a,b]$ such that

$$\frac{1}{b-a} \int_a^b f(x)dx = f(c).$$

## Piecewise continuous functions

Let $c_0 < c_1 < ... < c_n$ be a finite set of points on the real line. A function $f$ defined on $[c_0,c_n]$, except possibly at some of the points $c_i$, $(0 \leq i \leq n)$, is called piecewise continuous on that interval if for each $i$ $(1 \leq i \leq n)$ there exists a function $F_i$ continuous on the *closed* interval $[c_{i-1},c_i]$ such that

$$
f(x) = F_i(x) \quad \forall x \in (c_{i-1},c_i).
$$

In this case, the integral of $f$ from $c_0$ to $c_n$ is defined to be

$$
\int_{c_0}^{c_n} f(x)dx = \sum_{i=1}^n \int_{c_{i-1}}^{c_i} F_i(x)dx.
$$

## The fundamental theorem of calculus

Suppose that the function $f$ is continuous on an interval $I$ containing the point $a$.

**Part I.** Let the function $F$ be defined on $I$ by

$$
F(x) = \int_a^x f(t)dt.
$$

Then $F$ is differentiable on $I$, and $F'(x) = f(x)$ there. Thus, $F$ is an antiderivative of $f$ on $I$:

$$
\frac{d}{dx} \int_a^x f(t)dt = f(x).
$$

**Part II.** If $G(x)$ is *any* antiderivative of $f(x)$ on $I$, so that $G'(x) = f(x)$ on $I$, then for any $b$ in $I$ there is

$$
\int_a^b f(x)dx = G(b) - G(a).
$$

**Proof:** using the definition of the derivative and the [Mean-value theorem for integrals](integration.md/#the-mean-value-theorem-for-integrals),

$$
\begin{array}{ll}
F'(x) &= \lim_{h \to 0} \frac{F(x+h)-F(x)}{h}, \\
&= \lim_{h \to 0} \frac{1}{h}(\int_a^{x+h} f(t)dt - \int_a^x f(t)dt), \\
&= \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t)dt, \\
&= \lim_{h \to 0} \frac{1}{h} \cdot h f(c) \quad \mathrm{for \space some} \space c \in [x,x+h], \\
&= \lim_{c \to x} f(c), \\
&= f(x).
\end{array}
$$
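
Part I can be cross-checked numerically with a minimal sketch; the helper name `integrate`, the midpoint rule and the step sizes are illustrative choices. Differentiating $x \mapsto \int_0^x \cos t \, dt$ should reproduce $\cos x$.

```js
// Midpoint rule for the definite integral of f on [a, b].
function integrate(f, a, b, n = 20000) {
  const h = (b - a) / n;
  let s = 0;
  for (let i = 0; i < n; i++) s += f(a + (i + 0.5) * h);
  return s * h;
}

// F(x) = integral of cos from 0 to x; by the fundamental theorem, F'(x) = cos(x).
const F = x => integrate(Math.cos, 0, x);
const h = 1e-4;
console.log((F(1 + h) - F(1 - h)) / (2 * h)); // ~0.5403 = cos(1)
```
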
107
docs/mathematics/calculus/limits.md
Executable file
@@ -0,0 +1,107 @@
# Limits

If $f(x)$ is defined for all $x$ near $a$, except possibly at $a$ itself, and if it can be ensured that $f(x)$ is as close to $L$ as desired by taking $x$ close enough to $a$, but not equal to $a$, then $f$ approaches the **limit** $L$ as $x$ approaches $a$:

$$
\lim_{x \to a} f(x) = L
$$

## One-sided limits

If $f(x)$ is defined on some interval $(b,a)$ extending to the left of $x=a$, and if it can be ensured that $f(x)$ is as close to $L$ as desired by taking $x$ to the left of $a$ and close enough to $a$, then $f(x)$ has **left limit** $L$ at $x=a$ and:

$$
\lim_{x \uparrow a} f(x) = L.
$$

If $f(x)$ is defined on some interval $(a,b)$ extending to the right of $x=a$, and if it can be ensured that $f(x)$ is as close to $L$ as desired by taking $x$ to the right of $a$ and close enough to $a$, then $f(x)$ has **right limit** $L$ at $x=a$ and:

$$
\lim_{x \downarrow a} f(x) = L.
$$

## Limits at infinity

If $f(x)$ is defined on an interval $(a,\infty)$ and if it can be ensured that $f(x)$ is as close to $L$ as desired by taking $x$ large enough, then $f(x)$ **approaches the limit $L$ as $x$ approaches infinity** and

$$
\lim_{x \to \infty} f(x) = L
$$

## Limit rules

If $\lim_{x \to a} f(x) = L$, $\lim_{x \to a} g(x) = M$, and $k$ is a constant, then:

* **Limit of a sum:** $\lim_{x \to a}[f(x) + g(x)] = L + M$.
* **Limit of a difference:** $\lim_{x \to a}[f(x) - g(x)] = L - M$.
* **Limit of a multiple:** $\lim_{x \to a}k f(x) = k L$.
* **Limit of a product:** $\lim_{x \to a}f(x) g(x) = L M$.
* **Limit of a quotient:** $\lim_{x \to a}\frac{f(x)}{g(x)} = \frac{L}{M}$, if $M \neq 0$.
* **Limit of a power:** $\lim_{x \to a}[f(x)]^\frac{m}{n} = L^{\frac{m}{n}}$, provided $L^{\frac{m}{n}}$ is defined.

## Formal definition of a limit

The limit $\lim_{x \to a} f(x) = L$ means,

$$
\forall \varepsilon_{> 0} \exists \delta_{>0} \Big[ 0<|x-a|<\delta \implies |f(x) - L| < \varepsilon \Big].
$$

The limit $\lim_{x \to \infty} f(x) = L$ means,

$$
\forall \varepsilon_{> 0} \exists N_{>0} \Big[x > N \implies |f(x) - L | < \varepsilon \Big].
$$

The limit $\lim_{x \to a} f(x) = \infty$ means,

$$
\forall M_{> 0} \exists \delta_{>0} \Big[ 0<|x-a|<\delta \implies f(x) > M \Big].
$$

The limit $\lim_{x \to \infty} f(x) = \infty$ means,

$$
\forall M_{> 0} \exists N_{>0} \Big[ x > N \implies f(x) > M \Big].
$$

For one-sided limits there are similar formal definitions.

### Example

Applying the formal definition of a limit to show $\lim_{x \to 4}\sqrt{2x + 1} = 3$:

* Given $\varepsilon > 0$
* Choose $\delta = \frac{\varepsilon}{2}$
* Suppose $0 < |x - 4| < \delta$
* Check $|\sqrt{2x + 1} - 3|$

$$
\begin{array}{ll}
|\sqrt{2x + 1} - 3| &= |\frac{(\sqrt{2x + 1} - 3)(\sqrt{2x + 1} + 3)}{\sqrt{2x + 1} + 3}|\\
&= \frac{2|x - 4|}{\sqrt{2x + 1} + 3}\\
&< 2|x-4|\\
&< 2\delta = \varepsilon
\end{array}
$$

## Squeeze Theorem

Suppose that $f(x) \leq g(x) \leq h(x)$ holds for all $x$ in some open interval containing $a$, except possibly at $x=a$ itself. Suppose also that

$$\lim_{x \to a} f(x) = \lim_{x \to a} h(x) = L.$$

Then $\lim_{x \to a} g(x) = L$ also. Similar statements hold for left and right limits.

### Example

Applying the squeeze theorem to $\lim_{x \to 0} x^2 \cos(\frac{1}{x})$: for all $x \neq 0$,

$$
-1 \leq \cos(\frac{1}{x}) \leq 1 \implies -x^2 \leq x^2 \cos(\frac{1}{x}) \leq x^2.
$$

Since $\lim_{x \to 0} x^2 = \lim_{x \to 0} -x^2 = 0$,

$$
\lim_{x \to 0} x^2 \cos(\frac{1}{x}) = 0.
$$
197
docs/mathematics/calculus/taylor-polynomials.md
Executable file
@@ -0,0 +1,197 @@
# Taylor polynomials

## Linearization

A function $f(x)$ may be linearized about $x = a$ into

$$
P_1(x) = f(a) + f'(a)(x-a),
$$

obtaining a polynomial that matches the value and derivative of $f$ at $x = a$.

## Taylor's theorem

Even better approximations of $f(x)$ can be obtained by using higher degree polynomials if $f^{(n+1)}(t)$ exists for all $t$ in an interval containing $a$ and $x$, thereby matching more derivatives at $x = a$,

$$
P_n(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2+ ... + \frac{f^{(n)}(a)}{n!}(x-a)^n.
$$

Then the error $E_n(x) = f(x) - P_n(x)$ in the approximation $f(x) \approx P_n(x)$ is given by

$$
E_n(x) = \frac{f^{(n+1)}(s)}{(n+1)!}(x-a)^{n+1},
$$

where $s$ is some number between $a$ and $x$. The resulting formula

$$
f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + ... + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(s)}{(n+1)!}(x-a)^{n+1},
$$

for some $s$ between $a$ and $x$, is called **Taylor's formula with Lagrange remainder**; the Lagrange remainder is the error term $E_n(x)$.
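
A numerical sketch of the error term; the helper name `expTaylor`, the function, expansion point and degree are illustrative choices. For $f(x) = e^x$ about $a = 0$ every derivative at $s$ is $e^s \leq e^{|x|}$, so $|E_n(x)| \leq \frac{e^{|x|}}{(n+1)!}|x|^{n+1}$.

```js
// Degree-n Maclaurin polynomial of exp, evaluated at x.
function expTaylor(x, n) {
  let term = 1, sum = 1;
  for (let k = 1; k <= n; k++) {
    term *= x / k; // term is now x^k / k!
    sum += term;
  }
  return sum;
}

const x = 1, n = 5;
const error = Math.abs(Math.exp(x) - expTaylor(x, n));
const bound = Math.exp(x) * Math.pow(x, n + 1) / 720; // (n+1)! = 6! = 720
console.log(error, bound); // ~0.00162 <= ~0.00378
```
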
**Proof:**

Observe that the case $n=0$ of Taylor's formula, namely

$$
f(x) = P_0(x) + E_0(x) = f(a) + \frac{f'(s)}{1!}(x-a),
$$

is just the [Mean-value theorem](differentation.md#mean-value-theorem) for some $s$ between $a$ and $x$,

$$
\frac{f(x) - f(a)}{x-a} = f'(s).
$$

Using induction to prove the statement for $n > 0$: suppose it holds for $n = k-1$, where $k \geq 1$ is an integer, so that

$$
E_{k-1}(x) = \frac{f^{(k)}(s)}{k!}(x-a)^k,
$$

where $s$ is some number between $a$ and $x$. Consider the next higher case: $n=k$. Apply the [Generalized Mean-value theorem](differentation.md/#generalized-mean-value-theorem) to the functions $E_k(t)$ and $(t-a)^{k+1}$ on $[a,x]$. Since $E_k(a)=0$, a number $u$ in $(a,x)$ is obtained such that

$$
\frac{E_k(x) - E_k(a)}{(x-a)^{k+1} - (a-a)^{k+1}}= \frac{E_k(x)}{(x-a)^{k+1}} = \frac{E_k'(u)}{(k+1)(u - a)^k}.
$$

Since

$$
\begin{array}{ll}
E_k'(u)&=\frac{d}{dt}(f(t)-f(a)-f'(a)(t-a)-\frac{f''(a)}{2!}(t-a)^2-...-\frac{f^{(k)}(a)}{k!}(t-a)^k)|_{t=u} \\
&= f'(u) - f'(a) - f''(a)(u-a)-...-\frac{f^{(k)}(a)}{(k-1)!}(u-a)^{k-1}
\end{array}
$$

is just $E_{k-1}(u)$ for the function $f'$ instead of $f$, by the induction assumption it is equal to

$$
\frac{(f')^{(k)}(s)}{k!}(u-a)^k = \frac{f^{(k+1)}(s)}{k!}(u-a)^k
$$

for some $s$ between $a$ and $u$. Therefore,

$$
E_k(x) = \frac{f^{(k+1)}(s)}{(k+1)!}(x-a)^{k+1}.
$$

## Big-O notation

$f(x) = O(u(x))$ as $x \to a$ if and only if there exists a $k > 0$ such that

$$
|f(x)| \leq k|u(x)|
$$

for all $x$ in some open interval around $x=a$.

The following properties follow from the definition:

1. If $f(x) = O(u(x))$ as $x \to a$, then $Cf(x) = O(u(x))$ as $x \to a$ for any value of the constant $C$.
2. If $f(x) = O(u(x))$ as $x \to a$ and $g(x) = O(u(x))$ as $x \to a$, then $f(x) \pm g(x) = O(u(x))$ as $x \to a$.
3. If $f(x) = O((x-a)^ku(x))$ as $x \to a$, then $\frac{f(x)}{(x-a)^k} = O(u(x))$ as $x \to a$ for any constant $k$.

If $f(x) = Q_n(x) + O((x-a)^{n+1})$ as $x \to a$, where $Q_n$ is a polynomial of degree at most $n$, then $Q_n(x) = P_n(x)$.

**Proof:** Follows from the properties of the big-O notation.

Let $P_n$ be the Taylor polynomial, then properties 1 and 2 of big-O imply that $R_n(x) = Q_n(x) - P_n(x) = O((x - a)^{n+1})$ as $x \to a$. It must be shown that $R_n(x)$ is identically zero, so that $Q_n(x) = P_n(x)$ for all $x$. $R_n(x)$ may be written in the form

$$
R_n(x) = c_0 + c_1(x-a) + c_2(x-a)^2 + ... + c_n(x-a)^n.
$$

If $R_n(x)$ is not identically zero, then there is a smallest coefficient $c_k$, $k \leq n$, such that $c_k \neq 0$, but $c_j = 0$ for $0 \leq j \leq k - 1$, so

$$
R_n(x) = (x-a)^k(c_k + c_{k+1}(x-a) + ... + c_n(x-a)^{n-k}).
$$

Therefore,

$$
\lim_{x \to a} \frac{R_n(x)}{(x-a)^k} = c_k \neq 0.
$$

However, by property 3,

$$
\frac{R_n(x)}{(x-a)^k} = O((x-a)^{n+1-k}).
$$

Since $n+1-k > 0$, $\frac{R_n(x)}{(x-a)^k} \to 0$ as $x \to a$. This contradiction shows that $R_n(x)$ must be identically zero.

## Maclaurin formulas

Some Maclaurin formulas with errors in big-O notation. These may be used in constructing Taylor polynomials of composite functions. As $x \to 0$,

1. $$\frac{1}{1-x} = 1 + x + ... + x^n + O(x^{n+1}),$$
2. $$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - ... + (-1)^{n-1}\frac{x^n}{n} + O(x^{n+1}),$$
3. $$e^x = 1 + x + \frac{x^2}{2!} + ... + \frac{x^n}{n!} + O(x^{n+1}),$$
4. $$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - ... + (-1)^n\frac{x^{2n+1}}{(2n+1)!} + O(x^{2n+3}),$$
5. $$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - ... + (-1)^n\frac{x^{2n}}{(2n)!} + O(x^{2n+1}),$$
6. $$\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - ... + (-1)^n\frac{x^{2n+1}}{2n+1} + O(x^{2n+3}).$$

### Example

Construct $P_4(x)$ for $f(x) = e^{\sin x}$ around $x=0$.

$$
e^{\sin x} \approx 1 + (x - \frac{x^3}{3!} + \frac{x^5}{5!}) + \frac{1}{2!}(x - \frac{x^3}{3!} + \frac{x^5}{5!})^2 + \frac{1}{3!}(x - \frac{x^3}{3!} + \frac{x^5}{5!})^3
$$

$$
\begin{array}{ll}
P_4(x) &= 1 + x + \frac{1}{2}x^2 + (-\frac{1}{6} + \frac{1}{6})x^3 + (-\frac{1}{6} + \frac{1}{4!})x^4 + O(x^5), \\
&= 1 + x + \frac{1}{2}x^2 - \frac{1}{8}x^4 + O(x^5).
\end{array}
$$

## Evaluating limits with Taylor polynomials

Taylor and Maclaurin polynomials provide a method for evaluating limits of indeterminate forms.

### Example

Determine the limit $\lim_{x \to 0} \frac{x \arctan x - \ln(1+x^2)}{x \sin x - x^2}$.

$$
\begin{array}{ll}
x \sin x - x^2 = x^2 - \frac{x^4}{6} + O(x^6) - x^2 = - \frac{x^4}{6} + O(x^6), \\
x \arctan x - \ln(1+x^2) = x^2 - \frac{x^4}{3} + O(x^6) - x^2 + \frac{x^4}{2} + O(x^6) = \frac{x^4}{6} + O(x^6).
\end{array}
$$

$$
\lim_{x \to 0} \frac{\frac{x^4}{6} + O(x^6)}{- \frac{x^4}{6} + O(x^6)} = -1
$$

## L'Hôpital's rule

Suppose the functions $f$ and $g$ are differentiable on the interval $(a,b)$, and $g'(x) \neq 0$ there. Also suppose that $\lim_{x \downarrow a} f(x) = \lim_{x \downarrow a} g(x) = 0$; then, provided the right-hand limit exists,

$$
\lim_{x \downarrow a} \frac{f(x)}{g(x)} = \lim_{x \downarrow a} \frac{f'(x)}{g'(x)} = L.
$$

The outcome is exactly the same as using Taylor polynomials.

**Proof sketch:** using Taylor polynomials around $x = a$,

$$
\lim_{x \to a} \frac{f(x)}{g(x)} = \lim_{x \to a} \frac{f(a) + f'(a)(x - a) + \frac{f''(a)}{2}(x-a)^2 + O((x-a)^3)}{g(a) + g'(a)(x-a) + \frac{g''(a)}{2}(x-a)^2 + O((x-a)^3)}.
$$

If $f(a)$ and $g(a)$ are both zero, this equals

$$
\lim_{x \to a} \frac{f'(a)(x - a) + \frac{f''(a)}{2}(x-a)^2 + O((x-a)^3)}{g'(a)(x-a) + \frac{g''(a)}{2}(x-a)^2 + O((x-a)^3)},
$$

and so on.
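
For example,

$$
\lim_{x \downarrow 0} \frac{1 - \cos x}{x^2} = \lim_{x \downarrow 0} \frac{\sin x}{2x} = \frac{1}{2},
$$

in agreement with the Maclaurin formula $\cos x = 1 - \frac{x^2}{2!} + O(x^4)$.
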
@@ -0,0 +1,50 @@
# Exponential and logarithmic functions

## The natural logarithm

The natural logarithm is defined by having its derivative equal to $\frac{1}{x}$: for $x > 0$,

$$
\frac{d}{dx} \ln x = \frac{1}{x}.
$$

### Standard limit

$$
\lim_{h \to 0} \frac{\ln (1+h)}{h} = 1
$$

## The exponential function

The exponential function is defined as the inverse of the natural logarithm,

$$
\ln e^x = x.
$$

Furthermore $e$ may be defined by

$$
\begin{array}{ll}
\lim_{n \to \infty} (1 + \frac{1}{n})^n = e, \\
\lim_{n \to \infty} (1 + \frac{x}{n})^n = e^x.
\end{array}
$$
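
These limits converge rather slowly, which a short numerical sketch makes visible; the helper name `expLimit` and the sample values of $n$ are illustrative choices.

```js
// Compound-interest approximations (1 + x/n)^n of e^x for growing n.
const expLimit = (x, n) => Math.pow(1 + x / n, n);

for (const n of [10, 1000, 100000]) {
  console.log(n, expLimit(1, n)); // -> 2.5937..., 2.7169..., 2.71826...
}
console.log(Math.E); // 2.718281828459045
```
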
### Derivative of the exponential function

The derivative of $y = e^x$ may be calculated by [implicit differentiation](../differentation.md#implicit-differentiation):

$$
\begin{array}{ll}
y = e^x &\implies x = \ln y, \\
&\implies 1 = \frac{1}{y} \frac{dy}{dx}, \\
&\implies \frac{dy}{dx} = y = e^x.
\end{array}
$$

### Standard limit

$$
\lim_{h \to 0} \frac{e^h - 1}{h} = 1
$$
69
docs/mathematics/calculus/transcendental-functions/inverse-functions.md
Executable file
@@ -0,0 +1,69 @@
# Inverse functions

## Injectivity

A function $f$ is called injective if for all $x_1,x_2 \in \mathrm{Dom}(f)$, $x_1 \neq x_2$ implies that $f(x_1) \neq f(x_2)$. This means that for every $y \in \mathrm{Ran}(f)$ there is precisely one $x \in \mathrm{Dom}(f)$ such that $y = f(x)$.

## Inverse function

If $f$ is injective, then it has an inverse function $f^{-1}$. The value of $f^{-1}(x)$ is the unique number $y$ in the domain of $f$ for which $f(y) = x$. Thus,

$$
y = f^{-1}(x) \iff x = f(y)
$$

Suppose $f$ is a continuous function; then $f$ is injective if $f$ is strictly increasing or strictly decreasing, which for differentiable $f$ is the case when $f' > 0$ or $f' < 0$ throughout the interval.

### Derivative of inverse function

When $f$ is differentiable and injective, $(f^{-1})'(x) = \frac{1}{f'(f^{-1}(x))}$.

**Proof:** with $y = f^{-1}(x)$,

$$f(y) = x \implies f'(y) \frac{dy}{dx} = 1,$$

$$\frac{dy}{dx} = \frac{1}{f'(y)} = \frac{1}{f'(f^{-1}(x))}.$$

In this way a value of the derivative of the inverse may be determined without knowing the inverse function itself.
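
For example, $f(x) = x^3 + x$ is injective since $f'(x) = 3x^2 + 1 > 0$. From $f(1) = 2$ it follows, without solving for $f^{-1}$, that

$$
(f^{-1})'(2) = \frac{1}{f'(1)} = \frac{1}{4}.
$$
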
## The arcsine function

Always write $\arcsin$, not $\sin^{-1}$: the latter notation is misleading, since $\sin$ is not injective.

For $x \in [-\frac{\pi}{2},\frac{\pi}{2}]$: $\arcsin(\sin x) = x$.

For $x \in [-1,1]$: $\sin(\arcsin x) = x$.

The arccosine function is similar.

## Example question

Prove that $\forall x \geq 0$: $\arctan(x + 1) - \arctan(x) < \frac{1}{1 + x^2}$.

For $x = 0$: $\arctan 1 - \arctan 0 = \frac{\pi}{4} < 1$.

For $x > 0$: consider the function $f(t) = \arctan(t)$ on the interval $[x, x+1]$. Apply the [Mean-value theorem](../differentation.md/#mean-value-theorem) to $f$ on the interval $[x,x+1]$,

$$\frac{f(x+1) - f(x)}{(x+1) - x} = f'(c).$$

Let $\arctan(c) = y$; then $c = \tan y$,

$$
\begin{array}{ll}
\frac{dy}{dc} (c = \tan y) &\implies 1 = \sec^2 (y) \frac{dy}{dc} = (\tan^2 y + 1) \frac{dy}{dc} \\
&\implies 1 = (c^2 + 1) \frac{dy}{dc} \\
&\implies \frac{dy}{dc} = \frac{1}{c^2 + 1}.
\end{array}
$$

Obtaining, for some $c \in (x,x+1)$,

$$\arctan(x+1) - \arctan(x) = f'(c) = \frac{1}{c^2 + 1}.$$

Since $c > x$,

$$\frac{1}{1 + c^2} < \frac{1}{1 + x^2},$$

thereby

$$\arctan(x+1) - \arctan(x) < \frac{1}{1 + x^2}.$$
63
docs/mathematics/differential-geometry/curvature.md
Normal file
|
@ -0,0 +1,63 @@
|
||||||
|
# Curvature

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle, $V$ and $V^*$ the fiber and dual fiber bundle and $\mathscr{B}$ the tensor fiber bundle.

## Curvature operator

> *Definition 1*: the **curvature operator** $\Omega: \Gamma(\mathrm{TM})^3 \to \Gamma(\mathrm{TM})$ is defined as
>
> $$
> \Omega(\mathbf{v}, \mathbf{w}) \mathbf{u} = [\nabla_\mathbf{v}, \nabla_\mathbf{w}] \mathbf{u} - \nabla_{[\mathbf{v}, \mathbf{w}]}\mathbf{u},
> $$
>
> for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \Gamma(\mathrm{TM})$ with $[\cdot, \cdot]$ denoting the [Lie bracket]().

It then follows from the definition that the curvature operator $\Omega$ can be decomposed.

> *Proposition 1*: the decomposition of the curvature operator $\Omega$ relative to a basis $\{\partial_i\}_{i=1}^n$ of $\Gamma(\mathrm{TM})$ results in
>
> $$
> \Omega(\mathbf{v}, \mathbf{w}) \mathbf{u} = v^i w^j [D_i, D_j] u^l \partial_l,
> $$
>
> for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \Gamma(\mathrm{TM})$.

??? note "*Proof*:"

    Will be added later.

## Curvature tensor

> *Definition 2*: the **Riemann curvature tensor** $\mathbf{R}: \Gamma(\mathrm{T}^*\mathrm{M}) \times \Gamma(\mathrm{TM})^3 \to \mathbb{K}$ is defined as
>
> $$
> \mathbf{R}(\bm{\omega}, \mathbf{u}, \mathbf{v}, \mathbf{w}) = \mathbf{k}(\bm{\omega}, \Omega(\mathbf{v}, \mathbf{w}) \mathbf{u}),
> $$
>
> for all $\bm{\omega} \in \Gamma(\mathrm{T}^*\mathrm{M})$ and $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \Gamma(\mathrm{TM})$.

The Riemann curvature tensor defines the curvature of the differential manifold at a certain point $x \in \mathrm{M}$.

> *Proposition 2*: let $\mathbf{R}: \Gamma(\mathrm{T}^*\mathrm{M}) \times \Gamma(\mathrm{TM})^3 \to \mathbb{K}$ be the Riemann curvature tensor, with its decomposition given by
>
> $$
> \mathbf{R} = R^i_{jkl} \partial_i \otimes dx^j \otimes dx^k \otimes dx^l,
> $$
>
> then we have that its holor is given by
>
> $$
> R^i_{jkl} = \partial_k \Gamma^i_{jl} + \Gamma^m_{jl} \Gamma^i_{mk} - \partial_l \Gamma^i_{jk} - \Gamma^m_{jk} \Gamma^i_{ml},
> $$
>
> for all $(i,j,k,l) \in \{1, \dots, n\}^4$ with $\Gamma^i_{jk}$ denoting the linear connection symbols.

??? note "*Proof*:"

    Will be added later.

It may then be observed that $R^i_{jkl} = - R^i_{jlk}$ such that

$$
\mathbf{R} = \frac{1}{2} R^i_{jkl} \partial_i \otimes dx^j \otimes (dx^k \wedge dx^l).
$$
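
As a concrete illustration, the holor formula of proposition 2 can be evaluated symbolically. A minimal SymPy sketch (the unit 2-sphere with metric $g = \mathrm{diag}(1, \sin^2\theta)$ is an assumed example, not taken from these notes) computes $R^i_{jkl}$ from the Levi-Civita connection symbols and checks the antisymmetry in $(k,l)$:

```python
import sympy as sp

# Chart (theta, phi) on the unit 2-sphere, an assumed example manifold.
theta, phi = sp.symbols("theta phi")
x = [theta, phi]
g = sp.Matrix([[1, 0], [0, sp.sin(theta) ** 2]])
g_inv = g.inv()
n = 2

# Levi-Civita connection symbols Gamma[i][j][k] = Γ^i_{jk}.
Gamma = [[[sp.simplify(sum(
    g_inv[i, m] * (sp.diff(g[m, j], x[k]) + sp.diff(g[m, k], x[j])
                   - sp.diff(g[j, k], x[m])) for m in range(n)) / 2)
    for k in range(n)] for j in range(n)] for i in range(n)]

# Holor R^i_{jkl} = d_k G^i_{jl} - d_l G^i_{jk} + G^m_{jl} G^i_{mk} - G^m_{jk} G^i_{ml}.
def R(i, j, k, l):
    expr = sp.diff(Gamma[i][j][l], x[k]) - sp.diff(Gamma[i][j][k], x[l])
    expr += sum(Gamma[m][j][l] * Gamma[i][m][k]
                - Gamma[m][j][k] * Gamma[i][m][l] for m in range(n))
    return sp.simplify(expr)

print(R(0, 1, 0, 1))                               # sin(theta)**2
print(sp.simplify(R(0, 1, 0, 1) + R(0, 1, 1, 0)))  # 0: antisymmetry in (k, l)
```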

docs/mathematics/differential-geometry/derivatives.md

# Derivatives

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle, $V$ and $V^*$ the fiber and dual fiber bundle and $\mathscr{B}$ the tensor fiber bundle.

## Lie derivative

> *Definition 1*: the **Lie derivative** on a section of a tangent bundle $\mathscr{L}: \Gamma(\mathrm{TM}) \times \Gamma(\mathrm{TM}) \to \Gamma(\mathrm{TM})$ is a map defined by
>
> $$
> \mathscr{L}_\mathbf{w} \mathbf{v} = \mathbf{w} \circ \mathbf{v} - \mathbf{v} \circ \mathbf{w} = [\mathbf{w}, \mathbf{v}],
> $$
>
> for all $\mathbf{w}, \mathbf{v} \in \Gamma(\mathrm{TM})$.

The bracket formulation $[\mathbf{w}, \mathbf{v}]$ is also referred to as the **Lie bracket**.

> *Proposition 1*: the Lie derivative can be decomposed into
>
> $$
> \mathscr{L}_\mathbf{w} \mathbf{v} = (\mathscr{L}_\mathbf{w} \mathbf{v})^i \partial_i = (w^j \partial_j v^i - v^j \partial_j w^i) \partial_i,
> $$
>
> for all $\mathbf{w}, \mathbf{v} \in \Gamma(\mathrm{TM})$.

??? note "*Proof*:"

    Will be added later.
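
The component formula of proposition 1 lends itself to a direct symbolic check. A minimal SymPy sketch (the two-dimensional chart and the particular vector fields are assumptions for illustration):

```python
import sympy as sp

# Components (L_w v)^i = w^j d_j v^i - v^j d_j w^i on a 2-d chart.
x, y = sp.symbols("x y")
coords = [x, y]
w = [x * y, sp.sin(x)]   # w = x*y d_x + sin(x) d_y
v = [y ** 2, x + y]      # v = y^2 d_x + (x + y) d_y

lie_wv = [sp.simplify(sum(
    w[j] * sp.diff(v[i], coords[j]) - v[j] * sp.diff(w[i], coords[j])
    for j in range(2))) for i in range(2)]
print(lie_wv)
```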

## Exterior derivative

> *Definition 2*: the **exterior derivative** $d: \Gamma \big(\bigwedge_k(\mathrm{T}\mathrm{M}) \big) \to \Gamma \big(\bigwedge_{k+1}(\mathrm{T}\mathrm{M}) \big)$ of a $k$-form field, $k \in \mathbb{N}[k \leq n]$ is the $(k+1)$-form field
>
> $$
> \begin{align*}
> d \bm{\omega} &= d \omega_{|i_1 \dots i_k|} \wedge dx^{i_1} \wedge \dots \wedge dx^{i_k}, \\
> &= \partial_j \omega_{|i_1 \dots i_k|} dx^j \wedge dx^{i_1} \wedge \dots \wedge dx^{i_k},
> \end{align*}
> $$
>
> for all $\bm{\omega} \in \Gamma \big(\bigwedge_k(\mathrm{T}\mathrm{M}) \big)$.

From the definition of the exterior derivative the following results arise.

> *Theorem 1*: we have that
>
> 1. $\forall\bm{\omega} \in \Gamma \big(\bigwedge_n(\mathrm{T}\mathrm{M}) \big): d \bm{\omega} = \mathbf{0}$,
> 2. $\forall\bm{\omega} \in \Gamma \big(\bigwedge_k(\mathrm{T}\mathrm{M}) \big), k \in \mathbb{N}[k \leq n]: d^2 \bm{\omega} = \mathbf{0}$.

??? note "*Proof*:"

    Will be added later.
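
For a $0$-form the second statement reduces to the symmetry of mixed partial derivatives, which makes for a quick symbolic check. A minimal SymPy sketch (the function $f$ is an assumption for illustration):

```python
import sympy as sp

# d^2 f = 0 for a 0-form f: the components of d(df) are the antisymmetrised
# mixed partials d_i d_j f - d_j d_i f, which all vanish.
x, y, z = sp.symbols("x y z")
coords = [x, y, z]
f = sp.exp(x * y) * sp.cos(z)

ddf = [[sp.simplify(sp.diff(f, coords[i], coords[j])
                    - sp.diff(f, coords[j], coords[i]))
        for j in range(3)] for i in range(3)]
print(ddf)  # all entries are zero
```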

## Hodge star operator

> *Definition 3*: the **Hodge star operator** $*: \Gamma \big(\bigwedge_k(\mathrm{T}\mathrm{M}) \big) \to \Gamma \big(\bigwedge_{n-k}(\mathrm{T}\mathrm{M}) \big)$ with $k \in \mathbb{N}[k \leq n]$ has the following properties
>
> 1. $\forall \bm{\omega} \in \Gamma \big(\bigwedge_0(\mathrm{T}\mathrm{M}) \big): * \bm{\omega} = \bm{\epsilon}$,
> 2. $* (dx^{i_1} \wedge \dots \wedge dx^{i_k}) = \bm{\epsilon} \lrcorner \mathbf{g}^{-1}(dx^{i_1}) \lrcorner \dots \lrcorner \mathbf{g}^{-1}(dx^{i_k})$,
>
> for all $dx^{i_1} \wedge \dots \wedge dx^{i_k} \in \Gamma \big(\bigwedge_k(\mathrm{T}\mathrm{M}) \big)$ with $\bm{\epsilon}$ the Levi-Civita tensor $\bm{\epsilon} \in \Gamma\big(\bigwedge_n(\mathrm{T}\mathrm{M}) \big)$ and $\mathbf{g}^{-1}: \Gamma(\mathrm{T}^*\mathrm{M}) \to \Gamma(\mathrm{T}\mathrm{M})$ the [dual metric]().

# Differential manifolds

In the following sections of differential geometry we make use of the Einstein summation convention introduced in [vector analysis](/en/physics/mathematical-physics/vector-analysis/curvilinear-coordinates/) and $\mathbb{K} = \mathbb{R}$ or $\mathbb{K} = \mathbb{C}.$

## Definition

Differential geometry is concerned with *differential manifolds*, smooth continua that are locally Euclidean.

> *Definition 1*: let $n \in \mathbb{N}$, an $n$-dimensional **differential manifold** is a Hausdorff (T2) space $\mathrm{M}$ furnished with a family of smooth diffeomorphisms $\phi_\alpha: \mathscr{D}(\phi_\alpha) \to \mathscr{R}(\phi_\alpha)$ with $\mathscr{D}(\phi_\alpha) \subset \mathrm{M}$ and $\mathscr{R}(\phi_\alpha) \subset E$, with the following axioms
>
> 1. $\mathscr{D}(\phi_\alpha)$ is open and $\bigcup_{\alpha \in \mathbb{N}} \mathscr{D}(\phi_\alpha) = \mathrm{M}$,
> 2. if $\Omega = \mathscr{D}(\phi_\alpha) \cap \mathscr{D}(\phi_\beta) \neq \emptyset$ then $\phi_\alpha(\Omega), \phi_\beta(\Omega) \subset E$ are open sets and $\phi_\alpha \circ \phi_\beta^{-1}, \phi_\beta \circ \phi_\alpha^{-1}$ are diffeomorphisms,
> 3. the atlas $\mathscr{A} = \{(\mathscr{D}(\phi_\alpha), \phi_\alpha)\}$ is maximal,
>
> with $E$ an $n$-dimensional [Euclidean space]().

The last axiom ensures that any admissible chart is tacitly assumed to be already contained in the atlas.

## Coordinate transformations

> *Definition 2*: let $p,q \in \mathrm{M}$ be points on the differential manifold and let $\psi: \mathscr{D}(\psi) \to \mathrm{M}: p \mapsto \psi(p) \overset{\text{def}}{=} q$ be a **transformation** from $p$ to $q$ on the manifold, we define two diffeomorphisms
>
> $$
> \phi_\alpha: \mathscr{D}(\phi_\alpha) \to \mathscr{R}(\phi_\alpha): p \mapsto \phi_\alpha(p) \overset{\text{def}}{=} x,
> $$
>
> $$
> \phi_\beta: \mathscr{D}(\phi_\beta) \to \mathscr{R}(\phi_\beta): q \mapsto \phi_\beta(q) \overset{\text{def}}{=} y,
> $$
>
> with $\mathscr{D}(\phi_{\alpha,\beta}) \subset \mathrm{M}$ and $\mathscr{R}(\phi_{\alpha,\beta}) \subset E$. Then we have a **coordinate transformation** given by
>
> $$
> \phi_{\alpha \beta}^\psi = \phi_\beta \circ \psi \circ \phi_\alpha^{-1}: x \mapsto y,
> $$
>
> then $\phi_{\alpha \beta}^\psi$ is an **active transformation** if $p \neq q$ and $\phi_{\alpha \beta}^\psi$ is a **passive transformation** if $p = q$.

To clarify the definitions: a passive transformation corresponds only to a change of description, whereas an active transformation corresponds to a transformation on the manifold $\mathrm{M}$ itself.

A passive transformation may also be given directly by $\phi_\beta \circ \phi_\alpha^{-1}: x \mapsto y$ since $\psi = \mathrm{id}$ in this case. Note that the definitions could also have been given by the inverse as the transformations are all diffeomorphisms.

# Lengths and volumes

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle, $V$ and $V^*$ the fiber and dual fiber bundle and $\mathscr{B}$ the tensor fiber bundle.

## Riemannian geometry

> *Definition 1*: the length of a vector $\mathbf{v} \in \Gamma(\mathrm{TM})$ is defined by the norm $\|\cdot\|$ induced by the inner product $\bm{g}$ such that
>
> $$
> \|\mathbf{v}\| = \sqrt{\bm{g}(\mathbf{v},\mathbf{v})}.
> $$

In the context of a smooth curve $\mathbf{v}: \mathscr{D}(\mathbf{v}) \to \Gamma(\mathrm{TM}): t \mapsto \mathbf{v}(t)$ parameterized by an open interval $\mathscr{D}(\mathbf{v}) \subset \mathbb{R}$, the length $l_{12}$ of a closed section $[t_1, t_2] \subset \mathbb{R}$ of this curve is given by

$$
\begin{align*}
l_{12} &= \int_{t_1}^{t_2} \|\mathbf{\dot v}(t)\| dt, \\
&= \int_{t_1}^{t_2} \sqrt{\bm{g}(\mathbf{\dot v},\mathbf{\dot v})} dt, \\
&= \int_{t_1}^{t_2} \sqrt{g_{ij} \dot v^i \dot v^j} dt,
\end{align*}
$$

with $\mathbf{\dot v} = \dot v^i \partial_i \in \Gamma(\mathrm{TM})$.
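
The last expression can be evaluated numerically. A minimal NumPy sketch (the unit 2-sphere with $g = \mathrm{diag}(1, \sin^2\theta)$ and a latitude circle at $\theta_0 = \pi/3$ are assumptions for illustration; the exact length is $2\pi \sin\theta_0$):

```python
import numpy as np

# Arc length l = integral of sqrt(g_ij vdot^i vdot^j) dt for a latitude circle.
theta0 = np.pi / 3
t = np.linspace(0.0, 2.0 * np.pi, 10001)
curve = np.stack([np.full_like(t, theta0), t])   # (theta(t), phi(t))
vdot = np.gradient(curve, t, axis=1)             # (dtheta/dt, dphi/dt)

g = np.zeros((2, 2, t.size))
g[0, 0] = 1.0
g[1, 1] = np.sin(curve[0]) ** 2

speed = np.sqrt(np.einsum("ijt,it,jt->t", g, vdot, vdot))
length = np.sum((speed[1:] + speed[:-1]) / 2 * np.diff(t))  # trapezoid rule
print(length, 2 * np.pi * np.sin(theta0))        # both approx. 5.441
```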

> *Definition 2*: the volume $V$ spanned by the vectors $\{\mathbf{v}_i\}_{i=1}^n$ in $\Gamma(\mathrm{TM})$ is defined by
>
> $$
> V = \bm{\epsilon}(\mathbf{v}_1, \dots, \mathbf{v}_n) = \sqrt{g} \bm{\mu}(\mathbf{v}_1, \dots, \mathbf{v}_n),
> $$
>
> with $\bm{\epsilon}$ the unique unit volume form.

In the context of a subspace $S \subset M$ with $\dim S = k \in \mathbb{N}[k \leq n]$, the volume $V$ is given by

$$
V = \int_S \bm{\epsilon} = \int_S \sqrt{g} dx^1 \dots dx^k.
$$

It follows that for $k=1$

$$
\int_S \bm{\epsilon} = \int_S \sqrt{g} \, dx^1.
$$

## Finsler geometry

Will be added later.

docs/mathematics/differential-geometry/linear-connections.md

# Linear connections

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle, $V$ and $V^*$ the fiber and dual fiber bundle and $\mathscr{B}$ the tensor fiber bundle.

> *Definition 1*: a **linear connection** on the fiber bundle $\mathscr{B}$ is a map
>
> $$
> \nabla: \Gamma(\mathrm{TM}) \times \Gamma(\mathscr{B}) \to \Gamma(\mathscr{B}): (\mathbf{v}, \mathbf{T}) \mapsto \nabla_\mathbf{v} \mathbf{T},
> $$
>
> satisfying the following properties: if $f,g \in C^\infty(\mathrm{M})$, $\mathbf{v} \in \Gamma(\mathrm{TM})$ and $\mathbf{T}, \mathbf{S} \in \Gamma(\mathscr{B})$ then
>
> 1. $\nabla_{f\mathbf{v}} \mathbf{T} = f \nabla_\mathbf{v} \mathbf{T}$,
> 2. $\nabla_\mathbf{v} (f \mathbf{T} + g \mathbf{S}) = (\nabla_\mathbf{v} f) \mathbf{T} + f \nabla_\mathbf{v} \mathbf{T} + (\nabla_\mathbf{v} g) \mathbf{S} + g \nabla_{\mathbf{v}} \mathbf{S}$,
> 3. $\nabla_\mathbf{v} f = \mathbf{v} f = \mathbf{k}(df, \mathbf{v})$.

From property 3 it becomes clear that $\nabla_\mathbf{v}$ is an analogue of a directional derivative. The linear connection can also be defined in terms of the cotangent bundle and the dual fiber bundle.

Note that the first (trivial) element in the notation of the section $\Gamma$ is omitted; in full it should be $\Gamma(\mathrm{M}, \mathrm{TM})$, as the elements of this set are maps from $\mathrm{M}$ to $\mathrm{TM}$.

## Covariant derivative

> *Definition 2*: let $\mathbf{v} = v^i \mathbf{e}_i \in \Gamma(\mathscr{B})$ then the **covariant derivative** on $\mathbf{v}$ is defined as
>
> $$
> D_k \mathbf{v} \overset{\text{def}}= \nabla_{\partial_k} \mathbf{v} = (\partial_k v^i) \mathbf{e}_i + v^i \Gamma^j_{ik} \mathbf{e}_j = (\partial_k v^i + \Gamma^i_{jk} v^j)\mathbf{e}_i,
> $$
>
> with formally $\mathbf{k}(\mathbf{\hat e}^j, \nabla_{\partial_k} \mathbf{e}_i) = \Gamma^j_{ik}$ the **linear connection symbols**, in this case $\nabla_{\partial_k} \mathbf{e}_i = \Gamma^j_{ik} \mathbf{e}_j$.

The covariant derivative can thus be seen as a linear connection taken along a basis vector of the tangent bundle. The covariant derivative can also be applied to higher, mixed rank tensors $\mathbf{T} = T^{ij}_k \mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{\hat e}^k \in \Gamma(\mathscr{B})$, which obtains

$$
D_l \mathbf{T} = (\partial_l T^{ij}_k) \mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{\hat e}^k + T^{ij}_k (\Gamma_{il}^m\mathbf{e}_m) \otimes \mathbf{e}_j \otimes \mathbf{\hat e}^k + T^{ij}_k \mathbf{e}_i \otimes (\Gamma^m_{jl} \mathbf{e}_m) \otimes \mathbf{\hat e}^k + T^{ij}_k \mathbf{e}_i \otimes \mathbf{e}_j \otimes (\hat \Gamma^k_{ml} \mathbf{\hat e}^m),
$$

with the dual linear connection symbols given by $\mathbf{k}(\nabla_{\partial_k} \mathbf{\hat e}^i, \mathbf{e}_j) = \hat \Gamma^j_{ik}$ with $\nabla_{\partial_k} \mathbf{\hat e}^i = \hat \Gamma^j_{ik} \mathbf{\hat e}^j$. We then have the following proposition such that we can simplify the above expression.

> *Proposition 1*: let $\Gamma^j_{ik}$ be the linear connection symbols of a covariant derivative and let $\hat \Gamma^j_{ik}$ be the dual linear connection symbols given by $\mathbf{k}(\nabla_{\partial_k} \mathbf{\hat e}^i, \mathbf{e}_j) = \hat \Gamma^j_{ik}$, then we have that
>
> $$
> \hat \Gamma^j_{ik} = - \Gamma^j_{ik},
> $$
>
> for all $(i,j,k) \in \{1, \dots, n\}^3$.

??? note "*Proof*:"

    Will be added later.

With the result of proposition 1 we may write

$$
D_l \mathbf{T} = (\partial_l T^{ij}_k + \Gamma_{ml}^i T^{mj}_k + \Gamma_{ml}^j T^{im}_k - \Gamma_{kl}^m T^{ij}_m) \mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{\hat e}^k.
$$
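
The component formula $D_k \mathbf{v} = (\partial_k v^i + \Gamma^i_{jk} v^j)\mathbf{e}_i$ can be made concrete with a symbolic computation. A minimal SymPy sketch (polar coordinates $(r, \theta)$ with metric $g = \mathrm{diag}(1, r^2)$ and the particular vector field are assumptions for illustration):

```python
import sympy as sp

# Covariant derivative components d_k v^i + G^i_{jk} v^j in polar coordinates.
r, th = sp.symbols("r theta", positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r ** 2]])
g_inv = g.inv()
n = 2

# Levi-Civita symbols: G^r_tt = -r, G^t_rt = G^t_tr = 1/r.
Gamma = [[[sp.simplify(sum(
    g_inv[i, m] * (sp.diff(g[m, j], x[k]) + sp.diff(g[m, k], x[j])
                   - sp.diff(g[j, k], x[m])) for m in range(n)) / 2)
    for k in range(n)] for j in range(n)] for i in range(n)]

v = [r * sp.cos(th), sp.sin(th) / r]   # components v^r, v^theta

Dv = [[sp.simplify(sp.diff(v[i], x[k])
                   + sum(Gamma[i][j][k] * v[j] for j in range(n)))
       for k in range(n)] for i in range(n)]
print(sp.Matrix(Dv))   # rows: i, columns: k
```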

### Transformation of linear connection symbols

Will be added later.

## Intrinsic derivative

> *Definition 3*: let $\gamma: \mathscr{D}(\gamma) \to M: t \mapsto \gamma(t)$ be a smooth curve on the manifold parameterized by an open interval $\mathscr{D}(\gamma) \subset \mathbb{R}$ and let $\mathbf{v}: \mathscr{D}(\gamma) \to \mathrm{TM}: t \mapsto \mathbf{v}(t) = \mathbf{u} \circ \gamma(t)$ be a vector field defined along the curve with $\mathbf{u} \in \Gamma(\mathrm{TM})$, the **intrinsic derivative** of $\mathbf{v}$ is defined as
>
> $$
> D_t \mathbf{v}(t) = \nabla_{\dot\gamma} \mathbf{v}(t),
> $$
>
> for all $t \in \mathscr{D}(\gamma)$.

By decomposition of $\dot \gamma = \dot \gamma^i \partial_i$ and $\mathbf{v} = v^i \partial_i$ and using the chain rule we obtain

$$
\begin{align*}
\nabla_{\dot\gamma} \mathbf{v}(t) &= \dot \gamma^i \nabla_{\partial_i} (v^j \partial_j), \\
&= \dot \gamma^i \big((\partial_i v^j) \partial_j + v^j \Gamma_{ji}^k \partial_k \big), \\
&= (\dot \gamma^i \partial_i v^j + \dot \gamma^i \Gamma^j_{ki}v^k) \partial_j, \\
&= (\dot v^j + \Gamma^j_{ki} v^k \dot \gamma^i) \partial_j,
\end{align*}
$$

for all $t \in \mathscr{D}(\gamma)$. This notion of the intrinsic derivative can of course be extended to any tensor.

### Parallel transport

> *Definition 4*: let $\gamma: \mathscr{D}(\gamma) \to M: t \mapsto \gamma(t)$ be a smooth curve on the manifold parameterized by an open interval $\mathscr{D}(\gamma) \subset \mathbb{R}$ and let $\mathbf{v}: \mathscr{D}(\gamma) \to \mathrm{TM}: t \mapsto \mathbf{v}(t) = \mathbf{u} \circ \gamma(t)$ be a vector field defined along the curve with $\mathbf{u} \in \Gamma(\mathrm{TM})$, then **parallel transport** of $\mathbf{v}$ along the curve is defined as
>
> $$
> D_t \mathbf{v}(t) = \mathbf{0},
> $$
>
> for all $t \in \mathscr{D}(\gamma)$.

Parallel transport implies the transport of a vector that is held constant along the path: constant direction and magnitude. It then follows that for $\dot \gamma = \dot \gamma^i \partial_i$ and $\mathbf{v} = v^i \partial_i$ parallel transport obtains

$$
D_t \mathbf{v}(t) = (\dot v^j + \Gamma^j_{ki} v^k \dot \gamma^i) \partial_j = \mathbf{0},
$$

obtaining the equations

$$
\dot v^j + \Gamma^j_{ki} v^k \dot \gamma^i = 0,
$$

such that

$$
\dot v^j = - \Gamma^j_{ki} v^k \dot \gamma^i,
$$

for all $t \in \mathscr{D}(\gamma)$. Given a curve $\gamma$, these equations can be solved for $\mathbf{v}$, obtaining the components of the vector that stays constant along the curve.

If we let $\mathbf{v} = \dot \gamma^i \partial_i$ be the tangent vector along the curve then parallel transport of $\mathbf{v}$ preserves the tangent vector and we obtain the **geodesic equations** given by

$$
\dot v^j + \Gamma^j_{ki} v^k \dot \gamma^i = \ddot\gamma^j + \Gamma^j_{ki} \dot\gamma^k \dot\gamma^i = 0,
$$

for all $t \in \mathscr{D}(\gamma)$.
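
The geodesic equations form a system of second-order ODEs that can be integrated numerically. A minimal SciPy sketch (the unit 2-sphere with $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cos\theta/\sin\theta$ is an assumed example; a geodesic launched along the equator should remain a great circle):

```python
import numpy as np
from scipy.integrate import solve_ivp

# Geodesic equations on the unit 2-sphere in the chart (theta, phi).
def geodesic(t, y):
    th, ph, dth, dph = y
    ddth = np.sin(th) * np.cos(th) * dph ** 2             # -G^t_pp dphi^2
    ddph = -2.0 * (np.cos(th) / np.sin(th)) * dth * dph   # -2 G^p_tp dth dphi
    return [dth, dph, ddth, ddph]

# Start on the equator with unit speed along the phi direction.
sol = solve_ivp(geodesic, (0.0, 2.0 * np.pi), [np.pi / 2, 0.0, 0.0, 1.0],
                rtol=1e-10, atol=1e-10)
print(np.max(np.abs(sol.y[0] - np.pi / 2)))  # theta stays at pi/2: approx. 0
```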

One may interpret a geodesic as a generalization of the notion of a straight line or shortest path defined by $\gamma$, as follows from the following proposition.

> *Proposition 2*: let $\gamma: \mathscr{D}(\gamma) \to M: t \mapsto \gamma(t)$ be a smooth curve on the manifold parameterized by an open interval $\mathscr{D}(\gamma) \subset \mathbb{R}$ and let $\mathscr{L}$ be the Lagrangian defined by
>
> $$
> \mathscr{L} = \|\dot \gamma\|^2,
> $$
>
> for all $t \in \mathscr{D}(\gamma)$. By demanding [Hamilton's principle]() we obtain the geodesic equations
>
> $$
> \ddot\gamma^j + \Gamma^j_{ki} \dot\gamma^k \dot\gamma^i = 0,
> $$
>
> for all $t \in \mathscr{D}(\gamma)$.

??? note "*Proof*:"

    Will be added later.

It may be observed that by demanding stationarity of the length of the curve we obtain the geodesic equations.

## Contravariant derivative

Will be added later.

docs/mathematics/differential-geometry/tangent-spaces.md

# Tangent spaces

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section.

## Definition

> *Definition 1*: let $f \in C^{\infty}(\mathrm{M})$ with $C^{\infty}$ the class of [smooth functions]() and $\mathrm{M}$ a differential manifold. A **derivation** at $x \in \mathrm{M}$ is defined as a linear map $\mathbf{v}_x: C^\infty(\mathrm{M}) \to \mathbb{K}$ that satisfies
>
> $$
> \forall f,g \in C^{\infty}(\mathrm{M}): \mathbf{v}_x(f g) = (\mathbf{v}_xf) g + f (\mathbf{v}_x g).
> $$
>
> Let $\mathrm{T}_x\mathrm{M}$ be the set of all derivations at $x$ such that $\mathbf{v}_x \in \mathrm{T}_x\mathrm{M}$, with $\mathrm{T}_x\mathrm{M}$ denoted as the **tangent space** at $x$.

We may think of the tangent space at a point $x \in \mathrm{M}$ as a space attached to $x$ on the differential manifold $\mathrm{M}$.

## Properties of tangent spaces

> *Theorem 1*: let $\mathrm{M}$ be a differential manifold and let $x \in \mathrm{M}$, the tangent space $\mathrm{T}_x\mathrm{M}$ is a vector space.

??? note "*Proof*:"

    Will be added later.

Thus, the tangent space is a vector space attached to $x \in \mathrm{M}$ on the differential manifold. It follows that its vectors have interesting properties.

> *Theorem 2*: let $\mathrm{M}$ be a differential manifold, let $x \in \mathrm{M}$ and let $\mathbf{v}_x \in \mathrm{T}_x\mathrm{M}$, then we have that
>
> $$
> \forall f \in C^{\infty}(\mathrm{M}): \mathbf{v}_x f = v^i \partial_i f(x),
> $$
>
> such that $\mathbf{v}_x = v^i \partial_i \in \mathrm{T}_x\mathrm{M}$ is denoted as a **tangent vector** in the tangent space $\mathrm{T}_x\mathrm{M}$.

??? note "*Proof*:"

    Will be added later.

Theorem 2 adds the notion of tangent vectors to the explanation of the tangent space. The tangent space at a point on the manifold thus represents the space of tangent vectors.
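
A tangent vector acting on a function is just the directional derivative $v^i \partial_i f(x)$, which can be checked symbolically. A minimal SymPy sketch (the chart, the components $v^i$ and the function $f$ are assumptions for illustration):

```python
import sympy as sp

# v_x f = v^i d_i f(x) on a 2-d chart at the point (1, 0).
x1, x2 = sp.symbols("x1 x2")
coords = [x1, x2]
f = x1 ** 2 * sp.sin(x2)
v = [3, -1]                         # v = 3 d_1 - d_2
point = {x1: 1, x2: 0}

vf = sum(v[i] * sp.diff(f, coords[i]) for i in range(2))
print(vf.subs(point))               # 3*0 + (-1)*1 = -1
```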

> *Proposition 1*: let $\mathrm{M}$ be a differential manifold of $\dim\mathrm{M} = n \in \mathbb{N}$. The tangent space $\mathrm{T}_x\mathrm{M}$ has dimension $n$ such that
>
> $$
> \forall x \in \mathrm{M}: \dim \mathrm{T}_x\mathrm{M} = \dim\mathrm{M}
> $$
>
> and is spanned by the vector basis $\{\partial_i\}_{i=1}^n$.

??? note "*Proof*:"

    Will be added later.

Proposition 1 states that the tangent space is of the same dimension as the manifold and that its basis vectors are partial derivative operators. In the context of the [covariant basis](), this definition of the basis leaves out the coordinate map, but is in fact equivalent to the covariant basis.

As a last step in the explanation, we may think of the 2-dimensional surface of a sphere, which may define a differential manifold $\mathrm{M}$. The tangent space at a point $x \in \mathrm{M}$ on the surface of the sphere may then be compared to the tangent plane to the sphere attached at point $x \in \mathrm{M}$. The catch is that the 3-dimensional space necessary to understand this construction exists only in our imagination and not in the mathematical construct.

## Tangent bundle

> *Definition 2*: let $\mathrm{M}$ be a differential manifold, the collection of tangent spaces $\mathrm{T}_x\mathrm{M}$ for all $x \in \mathrm{M}$ define the **tangent bundle** as
>
> $$
> \mathrm{TM} = \bigcup_{x \in \mathrm{M}} \mathrm{T}_x\mathrm{M}.
> $$

In particular, we may think of the tangent bundle $\mathrm{TM}$ as a subspace $\mathrm{TM} \subset V$ of the fiber bundle $V$ for a differential manifold, with the special properties given in theorem 2 and proposition 1.

The connection of each tangent vector to its base point may be formalised with the projection map $\pi$ which in this case is given by

$$
\pi: \mathrm{TM} \to\mathrm{M}: (x, \mathbf{v}) \mapsto \pi(x, \mathbf{v}) \overset{\text{def}}{=} x,
$$

and its inverse

$$
\pi^{-1}:\mathrm{M} \to \mathrm{TM}: x \mapsto \pi^{-1}(x) \overset{\text{def}}{=} \mathrm{T}_x\mathrm{M}.
$$

> *Definition 3*: a vector field $\mathbf{v}$ on a differential manifold $\mathrm{M}$ is a section
>
> $$
> \mathbf{v} \in \Gamma(\mathrm{TM}),
> $$
>
> of the tangent bundle $\mathrm{TM}$.

## Cotangent spaces

> *Definition 4*: let $\mathrm{M}$ be a differential manifold and $\mathrm{T}_x\mathrm{M}$ the tangent space at $x \in \mathrm{M}$. We define the **cotangent space** $\mathrm{T}_x^*\mathrm{M}$ as the dual space of $\mathrm{T}_x\mathrm{M}$
>
> $$
> \mathrm{T}_x^*\mathrm{M} = (\mathrm{T}_x\mathrm{M})^*.
> $$
>
> Then every element $\bm{\omega}_x \in \mathrm{T}_x^*\mathrm{M}$ is a linear map $\bm{\omega}_x: \mathrm{T}_x\mathrm{M} \to \mathbb{K}$ denoted as a **cotangent vector**.

This definition is a logical consequence of the notion of the [dual vector space](). It then also follows that the dual cotangent space is isomorphic to the tangent space at a point $x \in \mathrm{M}$.

> *Theorem 3*: let $\mathrm{M}$ be a differential manifold of $\dim \mathrm{M} = n \in \mathbb{N}$, then we have that for every $x \in \mathrm{M}$ the basis $\{dx^i\}_{i=1}^n$ of $\mathrm{T}_x^*\mathrm{M}$ is uniquely determined by
>
> $$
> dx^i(\partial_j) = \delta^i_j,
> $$
>
> for each basis $\{\partial_j\}_{j=1}^n$ in $\mathrm{T}_x\mathrm{M}$.

??? note "*Proof*:"

    The proof follows directly from theorem 1 in [dual vector spaces]().

The choice of $dx^i$ can be explained by taking the differential $df = \partial_i f dx^i \in \mathrm{T}_x^*\mathrm{M}$ with $f \in C^\infty(\mathrm{M})$. Then if we take

$$
\mathbf{k}_x(df, \mathbf{v}) = \mathbf{k}(\partial_i f dx^i, v^j \partial_j) = v^j \partial_i f \mathbf{k}(dx^i, \partial_j) = v^j \partial_i f \delta^i_j = v^i \partial_i f = \mathbf{v} f,
$$

with $\mathbf{k}_x: \mathrm{T}_x^*\mathrm{M} \times \mathrm{T}_x\mathrm{M} \to \mathbb{K}$ the Kronecker tensor at $x \in \mathrm{M}$, which shows that defining the basis of the cotangent space by differentials is consistent with the basis of the tangent space.

So, a cotangent vector $\bm{\omega}_x \in \mathrm{T}_x^*\mathrm{M}$ may be decomposed into

$$
\bm{\omega}_x = \omega_i dx^i.
$$

In the context of the [contravariant basis](), this definition of the basis leaves out the coordinate map, but is in fact equivalent to the contravariant basis.

## Cotangent bundle

> *Definition 5*: let $\mathrm{M}$ be a differential manifold, the collection of cotangent spaces $\mathrm{T}_x^*\mathrm{M}$ for all $x \in \mathrm{M}$ define the **cotangent bundle** as
>
> $$
> \mathrm{T^*M} = \bigcup_{x \in \mathrm{M}} \mathrm{T}_x^*\mathrm{M}.
> $$

Thus, we may think of the cotangent bundle $\mathrm{T^*M}$ as a subspace $\mathrm{T^*M} \subset V^*$ of the dual fiber bundle $V^*$ for a differential manifold.

docs/mathematics/differential-geometry/torsion.md

# Torsion

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle, $V$ and $V^*$ the fiber and dual fiber bundle and $\mathscr{B}$ the tensor fiber bundle.

## Torsion operator

> *Definition 1*: the **torsion operator** $\Theta: \Gamma(\mathrm{TM}) \times \Gamma(\mathrm{TM}) \to \Gamma(\mathrm{TM})$ is defined as
>
> $$
> \Theta(\mathbf{u}, \mathbf{v}) = \nabla_\mathbf{u} \mathbf{v} - \nabla_\mathbf{v} \mathbf{u} - \mathscr{L}_\mathbf{u} \mathbf{v},
> $$
>
> for all $\mathbf{u}, \mathbf{v} \in \Gamma(\mathrm{TM})$ and $\mathscr{L}$ the [Lie derivative]().

Using this definition we obtain the following results.

> *Proposition 1*: the decomposition of the torsion operator results in
>
> $$
> \mathbf{k}(\bm{\omega}, \Theta(\mathbf{u}, \mathbf{v})) = \omega_i u^j v^k (\Gamma^i_{kj} - \Gamma^i_{jk}),
> $$
>
> for all $\bm{\omega} \in \Gamma(\mathrm{T}^*\mathrm{M})$ and $\mathbf{u}, \mathbf{v} \in \Gamma(\mathrm{TM})$.

??? note "*Proof*:"

    Will be added later.
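
The holor $\Gamma^i_{kj} - \Gamma^i_{jk}$ vanishes whenever the linear connection symbols are symmetric in their lower indices, as for a Levi-Civita connection. A minimal SymPy sketch (the polar-coordinate symbols $\Gamma^r_{\theta\theta} = -r$, $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$ are an assumed example):

```python
import sympy as sp

# Torsion holor G^i_{kj} - G^i_{jk} for symmetric (torsion-free) symbols.
r = sp.symbols("r", positive=True)
n = 2
Gamma = [[[sp.Integer(0)] * n for _ in range(n)] for _ in range(n)]
Gamma[0][1][1] = -r                          # G^r_{theta theta}
Gamma[1][0][1] = Gamma[1][1][0] = 1 / r      # G^theta_{r theta} = G^theta_{theta r}

torsion = [[[sp.simplify(Gamma[i][k][j] - Gamma[i][j][k])
             for k in range(n)] for j in range(n)] for i in range(n)]
print(torsion)   # all entries zero
```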

## Torsion tensor

As a result of proposition 1 we may view torsion as a locally defined mixed tensor of type $\mathbf{T} \in \mathrm{T}_x \mathrm{M} \otimes \mathrm{T}_x^* \mathrm{M} \otimes \mathrm{T}_x^* \mathrm{M}$.

> *Definition 2*: the **torsion tensor** $\mathbf{T}: \mathrm{T}_x^* \mathrm{M} \times \mathrm{T}_x \mathrm{M} \times \mathrm{T}_x \mathrm{M} \to \mathbb{K}$ with $x \in \mathrm{M}$ is defined as
>
> $$
> \mathbf{T}(\bm{\omega}, \mathbf{u}, \mathbf{v}) = \mathbf{k} \big(\bm{\omega}, \Theta(\mathbf{u}, \mathbf{v}) \big),
> $$
>
> for all $\bm{\omega} \in \mathrm{T}^*_x\mathrm{M}$ and $\mathbf{u}, \mathbf{v} \in \mathrm{T}_x \mathrm{M}$.

docs/mathematics/differential-geometry/transformations.md

# Transformations

Let $\mathrm{M}$ be a differential manifold with $\dim \mathrm{M} = n \in \mathbb{N}$ used throughout the section. Let $\mathrm{TM}$ and $\mathrm{T^*M}$ denote the tangent and cotangent bundle.

## Push forward and pull back

> *Definition 1*: let $\mathrm{M}, \mathrm{N}$ be two differential manifolds with $\dim \mathrm{N} \geq \dim \mathrm{M}$ and let $\psi: \mathrm{M} \to \mathrm{N}$ be the diffeomorphism between the manifolds. Then we define the **pull back** $\psi^*$ and **push forward** $\psi_*$ operators, such that for $\mathbf{v} \in \mathrm{T}_x \mathrm{M}$ and $\bm{\omega} \in \mathrm{T}_{\psi(x)}^* \mathrm{N}$ we have
>
> $$
> \mathbf{k}_x(\psi^* \bm{\omega}, \mathbf{v}) = \mathbf{k}_{\psi(x)}(\bm{\omega}, \psi_* \mathbf{v}),
> $$
>
> for all $x \in \mathrm{M}$.

This relation indicates the proper separation between the elements of both spaces.

## Basis transformation

Let $\psi: \mathscr{D}(\psi) \to \mathrm{M}: x \mapsto \psi(x) \overset{\text{def}}{=} \overline{x}$ be an active coordinate transformation from a point $x$ to a point $\overline{x}$ on $\mathrm{M}$. Then we have a basis $\{\partial_i\}_{i=1}^n \subset \mathrm{T}_x\mathrm{M}$ for the tangent space $\mathrm{T}_x\mathrm{M}$ at $x$ and a basis $\{\overline{\partial_i}\}_{i=1}^n \subset \mathrm{T}_{\overline{x}}\mathrm{M}$ for the tangent space $\mathrm{T}_{\overline{x}}\mathrm{M}$ at $\overline{x}$, which are related by

$$
\partial_i = J^j_i \overline{\partial_j} = \partial_i \psi^j(x) \overline{\partial_j},
$$

with $J^j_i = \partial_i \psi^j(x)$ the [Jacobian]() at $x \in \mathrm{M}$. For it to make sense, it helps to change notation to

$$
\frac{\partial}{\partial x^i} = \frac{\partial \overline{x}^j}{\partial x^i} \frac{\partial}{\partial \overline{x}^j} = \frac{\partial \psi^j}{\partial x^i} \frac{\partial}{\partial \overline{x}^j}.
$$

Similarly, we have a basis $\{dx^i\}_{i=1}^n \subset \mathrm{T}_x^*\mathrm{M}$ for the cotangent space $\mathrm{T}_x^*\mathrm{M}$ at $x$ and a basis $\{d\overline{x}^i\}_{i=1}^n \subset \mathrm{T}_{\overline{x}}^*\mathrm{M}$ for the cotangent space $\mathrm{T}_{\overline{x}}^*\mathrm{M}$ at $\overline{x}$, which are related by

$$
d\overline{x}^i = J^i_j dx^j = \partial_j \psi^i(x) dx^j.
$$
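
The Jacobian of a concrete chart change makes these relations tangible. A minimal SymPy sketch (the map $\psi$ from Cartesian to polar coordinates is an assumed example):

```python
import sympy as sp

# Jacobian J^j_i = d_i psi^j for psi: (x1, x2) -> (r, theta).
x1, x2 = sp.symbols("x1 x2", positive=True)
psi = [sp.sqrt(x1 ** 2 + x2 ** 2), sp.atan2(x2, x1)]   # (r, theta)

J = sp.Matrix([[sp.simplify(sp.diff(psi[j], xi)) for xi in (x1, x2)]
               for j in range(2)])
print(J)
print(sp.simplify(J.det()))   # 1/sqrt(x1**2 + x2**2) = 1/r: invertible for r > 0
```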

# Direct sums

> *Definition 1*: in a metric space $(X,d)$, the **distance** $\delta$ from an element $x \in X$ to a nonempty subset $M \subset X$ is defined as
>
> $$
> \delta = \inf_{\tilde y \in M} d(x,\tilde y).
> $$

In a normed space $(X, \|\cdot\|)$ this becomes

$$
\delta = \inf_{\tilde y \in M} \|x - \tilde y\|.
$$

> *Definition 2*: let $X$ be a vector space and let $x, y \in X$, the **line segment** $l$ between the vectors $x$ and $y$ is defined as
>
> $$
> l = \{z \in X \;|\; \exists \alpha \in [0,1]: z = \alpha x + (1 - \alpha) y\}.
> $$

Using definition 2, we may define the following.

> *Definition 3*: a subset $M \subset X$ of a vector space $X$ is **convex** if for all $x, y \in M$ the line segment between $x$ and $y$ is contained in $M$.

This definition is consistent with the convex lenses which have been discussed in [optics]().

We can now state the main theorem in this section.

> *Theorem 1*: let $X$ be an inner product space and let $M \subset X$ be a complete convex subset of $X$. Then for every $x \in X$ there exists a unique $y \in M$ such that
>
> $$
> \delta = \inf_{\tilde y \in M} \|x - \tilde y\| = \|x - y\|,
> $$
>
> if $M$ is a complete subspace $Y$ of $X$, then $x - y$ is orthogonal to $Y$.

??? note "*Proof*:"

    Will be added later.

Now that the foundation is set, we may introduce direct sums.

> *Definition 4*: a vector space $X$ is a **direct sum** $X = Y \oplus Z$ of two subspaces $Y \subset X$ and $Z \subset X$ of $X$ if each $x \in X$ has a unique representation
>
> $$
> x = y + z,
> $$
>
> for $y \in Y$ and $z \in Z$.

Then $Z$ is called an *algebraic complement* of $Y$ in $X$ and vice versa, and $Y$, $Z$ is called a *complementary pair* of subspaces in $X$.

In the case $Z = \{z \in X \;|\; z \perp Y\}$ we have that $Z$ is the *orthogonal complement* or *annihilator* of $Y$, also denoted by $Y^\perp$.

> *Proposition 1*: let $Y \subset X$ be any closed subspace of a Hilbert space $X$, then
>
> $$
> X = Y \oplus Y^\perp,
> $$
>
> with $Y^\perp = \{x\in X \;|\; x \perp Y\}$ the orthogonal complement of $Y$.

??? note "*Proof*:"

    Will be added later.

We have that $y \in Y$ for $x = y + z$ is called the *orthogonal projection* of $x$ on $Y$, which defines an operator $P: X \to Y: x \mapsto Px \overset{\mathrm{def}}= y$.
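
In a finite-dimensional Hilbert space the orthogonal projection can be written down explicitly. A minimal NumPy sketch (the subspace $Y$ spanned by the columns of $A$ in $\mathbb{R}^4$ and the sample vector are assumptions for illustration):

```python
import numpy as np

# Orthogonal projection Px = y onto Y = col(A), with z = x - Px orthogonal to Y.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))          # basis of the subspace Y
x = rng.standard_normal(4)

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projector onto Y
y = P @ x
z = x - y
print(np.allclose(A.T @ z, 0))           # z is orthogonal to Y: True
print(np.allclose(P @ P, P))             # P is idempotent: True
```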

> *Lemma 1*: let $Y \subset X$ be a closed subspace of a Hilbert space $X$ and let $P: X \to Y$ be the orthogonal projection operator, then we have
>
> 1. $P$ is a bounded linear operator,
> 2. $\|P\| = 1$,
> 3. $\mathscr{N}(P) = \{x \in X \;|\; Px = 0\} = Y^\perp$.

??? note "*Proof*:"

    Will be added later.

> *Lemma 2*: if $Y$ is a closed subspace of a Hilbert space $X$, then $Y = Y^{\perp \perp}$.

??? note "*Proof*:"

    Will be added later.

Then it follows that $X = Y^\perp \oplus Y^{\perp \perp}$.

??? note "*Proof*:"

    Will be added later.

> *Lemma 3*: for every non-empty subset $M \subset X$ of a Hilbert space $X$ we have
>
> $$
> \mathrm{span}(M) \text{ is dense in } X \iff M^\perp = \{0\}.
> $$

??? note "*Proof*:"

    Will be added later.

# Inner product spaces

> *Definition 1*: a vector space $X$ over a field $F$ is an **inner product space** if an **inner product** $\langle \cdot, \cdot \rangle: X \times X \to F$ is defined on $X$ satisfying
>
> 1. $\forall x \in X: \langle x, x \rangle \geq 0$,
> 2. $\langle x, x \rangle = 0 \iff x = 0$,
> 3. $\forall x, y \in X: \langle x, y \rangle = \overline{\langle y, x \rangle}$,
> 4. $\forall x, y \in X, \alpha \in F: \langle \alpha x, y \rangle = \alpha \langle x, y \rangle$,
> 5. $\forall x, y, z \in X: \langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$.

Similar to the case in normed spaces we have the following proposition.

> *Proposition 1*: an inner product $\langle \cdot, \cdot \rangle$ on a vector space $X$ defines a norm $\|\cdot\|$ on $X$ given by
>
> $$
> \|x\| = \sqrt{\langle x, x \rangle},
> $$
>
> for all $x \in X$ and is called the **norm induced by the inner product**.

??? note "*Proof*:"

    Will be added later.

This makes an inner product space also a normed space, and hence a metric space, referring to proposition 1 in normed spaces.

> *Definition 2*: a **Hilbert space** $H$ is a complete inner product space with its metric induced by the inner product.

Definition 2 makes a Hilbert space also a Banach space, using proposition 1.

## Properties of inner product spaces

> *Proposition 2*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space, then
>
> $$
> \| x + y \|^2 + \| x - y \|^2 = 2\big(\|x\|^2 + \|y\|^2\big),
> $$
>
> for all $x, y \in X$.

??? note "*Proof*:"

    Will be added later.

Proposition 2 is also called the parallelogram identity.
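
A quick numerical illustration of the parallelogram identity in the inner product space $\mathbb{C}^3$ with the standard inner product (a minimal NumPy sketch; the sample vectors are random):

```python
import numpy as np

# ||x + y||^2 + ||x - y||^2 = 2(||x||^2 + ||y||^2) in C^3.
rng = np.random.default_rng(1)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

norm2 = lambda v: np.vdot(v, v).real     # ||v||^2 = <v, v>
lhs = norm2(x + y) + norm2(x - y)
rhs = 2 * (norm2(x) + norm2(y))
print(np.isclose(lhs, rhs))              # True
```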

> *Lemma 1*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space, then
>
> 1. $\forall x, y \in X: |\langle x, y \rangle| \leq \|x\| \cdot \|y\|$,
> 2. $\forall x, y \in X: \|x + y\| \leq \|x\| + \|y\|$.

??? note "*Proof*:"

    Will be added later.

Statement 1 in lemma 1 is known as the Schwarz inequality and statement 2 is known as the triangle inequality; both will be used throughout the section on inner product spaces.

> *Lemma 2*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space and let $(x_n)_{n \in \mathbb{N}}$ and $(y_n)_{n \in \mathbb{N}}$ be sequences in $X$, if we have $x_n \to x$ and $y_n \to y$ as $n \to \infty$, then
>
> $$
> \lim_{n \to \infty} \langle x_n, y_n \rangle = \langle x, y \rangle.
> $$

??? note "*Proof*:"

    Will be added later.

## Completion

> *Definition 3*: an **isomorphism** $T$ of an inner product space $(X, \langle \cdot, \cdot \rangle_X)$ onto an inner product space $(\tilde X, \langle \cdot, \cdot \rangle_{\tilde X})$ over the same field $F$ is a bijective linear operator $T: X \to \tilde X$ which preserves the inner product
>
> $$
> \langle Tx, Ty \rangle_{\tilde X} = \langle x, y \rangle_X,
> $$
>
> for all $x, y \in X$.

As a first application of lemma 2, let us prove the following.

> *Theorem 1*: for every inner product space $(X, \langle \cdot, \cdot \rangle_X)$ there exists a Hilbert space $(\tilde X, \langle \cdot, \cdot \rangle_{\tilde X})$ that contains a subspace $W$ that satisfies the following conditions
>
> 1. $W$ is an inner product space isomorphic with $X$,
> 2. $W$ is dense in $\tilde X$.

??? note "*Proof*:"

    Will be added later.

Somewhat trivially, we have that a subspace $M$ of an inner product space $X$ is defined to be a vector subspace of $X$ taken with the inner product on $X$ restricted to $M \times M$.

> *Proposition 3*: let $Y$ be a subspace of a Hilbert space $X$, then
>
> 1. $Y$ is complete $\iff$ $Y$ is closed in $X$,
> 2. if $Y$ is finite-dimensional, then $Y$ is complete,
> 3. $Y$ is separable if $X$ is separable.

??? note "*Proof*:"

    Will be added later.

## Orthogonality

> *Definition 4*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space, a vector $x \in X$ is **orthogonal** to a vector $y \in X$ if
>
> $$
> \langle x, y \rangle = 0,
> $$
>
> and we write $x \perp y$.

Furthermore, we can also say that $x$ and $y$ *are orthogonal*.

> *Definition 5*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space and let $A, B \subset X$ be subspaces of $X$. Then $A$ is **orthogonal** to $B$ if for every $x \in A$ and $y \in B$ we have
>
> $$
> \langle x, y \rangle = 0,
> $$
>
> and we write $A \perp B$.

Similarly, we may state that $A$ and $B$ *are orthogonal*.

# Operator classes

## Hilbert-adjoint operator

> *Definition 1*: let $(X, \langle \cdot, \cdot \rangle_X)$ and $(Y, \langle \cdot, \cdot \rangle_Y)$ be Hilbert spaces over the field $F$ and let $T: X \to Y$ be a bounded linear operator. The **Hilbert-adjoint operator** $T^*$ of $T$ is the operator $T^*: Y \to X$ such that for all $x \in X$ and $y \in Y$
>
> $$
> \langle Tx, y \rangle_Y = \langle x, T^* y \rangle_X.
> $$

We should first prove that for a given $T$ such a $T^*$ exists.

> *Proposition 1*: the Hilbert-adjoint operator $T^*$ of $T$ exists, is unique and is a bounded linear operator with norm
>
> $$
> \|T^*\| = \|T\|.
> $$

??? note "*Proof*:"

    Will be added later.

The Hilbert-adjoint operator has the following properties.

> *Proposition 2*: let $T,S: X \to Y$ be bounded linear operators, then
>
> 1. $\forall x \in X, y \in Y: \langle T^* y, x \rangle_X = \langle y, Tx \rangle_Y$,
> 2. $(S + T)^* = S^* + T^*$,
> 3. $\forall \alpha \in F: (\alpha T)^* = \overline \alpha T^*$,
> 4. $(T^*)^* = T$,
> 5. $\|T^* T\| = \|T T^*\| = \|T\|^2$,
> 6. $T^*T = 0 \iff T = 0$,
> 7. $(ST)^* = T^* S^*, \text{ when } X = Y$.

??? note "*Proof*:"

    Will be added later.

## Self-adjoint operator

> *Definition 2*: a bounded linear operator $T: X \to X$ on a Hilbert space $X$ is **self-adjoint** if
>
> $$
> T^* = T.
> $$

If a basis for $\mathbb{C}^n$ $(n \in \mathbb{N})$ is given and a linear operator on $\mathbb{C}^n$ is represented by a matrix, then its Hilbert-adjoint operator is represented by the complex conjugate transpose of that matrix (the Hermitian transpose).
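
A minimal NumPy sketch of this matrix picture (random samples in $\mathbb{C}^3$; the inner product is taken linear in the first argument, matching definition 1 of inner product spaces):

```python
import numpy as np

# For matrices the Hilbert-adjoint is the conjugate transpose:
# check <Tx, y> = <x, T*y> with <a, b> = sum(a_i * conj(b_i)).
rng = np.random.default_rng(2)
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

T_star = T.conj().T
inner = lambda a, b: np.sum(a * b.conj())    # linear in the first argument
print(np.isclose(inner(T @ x, y), inner(x, T_star @ y)))  # True
```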

Propositions 3, 4 and 5 give some interesting results on self-adjoint operators.

> *Proposition 3*: let $T: X \to X$ be a bounded linear operator on a Hilbert space $(X, \langle \cdot, \cdot \rangle_X)$ over the field $\mathbb{C}$, then
>
> $$
> T \text{ is self-adjoint} \iff \forall x \in X: \langle Tx, x \rangle \in \mathbb{R}.
> $$

??? note "*Proof*:"

    Will be added later.

> *Proposition 4*: the product of two bounded self-adjoint linear operators $T$ and $S$ on a Hilbert space is self-adjoint if and only if
>
> $$
> ST = TS.
> $$

??? note "*Proof*:"

    Will be added later.

Commutation of two bounded self-adjoint operators therefore implies self-adjointness of their product.

> *Proposition 5*: let $(T_n)_{n \in \mathbb{N}}$ be a sequence of bounded self-adjoint operators $T_n: X \to X$ on a Hilbert space $X$. If $T_n \to T$ as $n \to \infty$, then $T$ is a bounded self-adjoint linear operator on $X$.

??? note "*Proof*:"

    Will be added later.

## Unitary operator

> *Definition 3*: a bounded linear operator $T: X \to X$ on a Hilbert space $X$ is **unitary** if $T$ is bijective and $T^* = T^{-1}$.

A bounded unitary linear operator has the following properties.

> *Proposition 6*: let $U, V: X \to X$ be bounded unitary linear operators on a Hilbert space $X$, then
>
> 1. $U$ is isometric,
> 2. $\|U\| = 1 \text{ if } X \neq \{0\}$,
> 3. $UV$ is unitary,
> 4. $U$ is normal, that is $U U^* = U^* U$,
> 5. $T \in \mathscr{B}(X,X)$ is unitary $\iff$ $T$ is isometric and surjective.

??? note "*Proof*:"

    Will be added later.

# Orthonormal sets

> *Definition 1*: an **orthogonal set** $M$ in an inner product space $X$ is a subset $M \subset X$ whose elements are pairwise orthogonal.

Pairwise orthogonality implies that $\forall x, y \in M: x \neq y \implies \langle x, y \rangle = 0$.

> *Definition 2*: an **orthonormal set** $M$ in an inner product space $X$ is an orthogonal set in $X$ whose elements have norm 1.

That is, for all $x, y \in M$:

$$
\langle x, y \rangle = \begin{cases}0 &\text{if } x \neq y, \\ 1 &\text{if } x = y.\end{cases}
$$

> *Lemma 1*: an orthonormal set is linearly independent.

??? note "*Proof*:"

    Will be added later.

In the case that an orthogonal or orthonormal set is countable, it can be arranged in a sequence and called an *orthogonal* or *orthonormal sequence*.

> *Theorem 1*: let $(e_n)_{n \in \mathbb{N}}$ be an orthonormal sequence in an inner product space $(X, \langle \cdot, \cdot \rangle)$, then
>
> $$
> \sum_{n=1}^\infty |\langle x, e_n \rangle|^2 \leq \|x\|^2,
> $$
>
> for all $x \in X$.

??? note "*Proof*:"

    Will be added later.

Theorem 1 is known as the Bessel inequality, and the $\langle x, e_n \rangle$ are called the Fourier coefficients of $x$ with respect to the orthonormal sequence $(e_n)_{n \in \mathbb{N}}$.
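
A minimal NumPy sketch of the Bessel inequality (a random orthonormal, non-total set of two vectors in $\mathbb{R}^5$, for illustration only):

```python
import numpy as np

# Bessel: sum |<x, e_n>|^2 <= ||x||^2 for an orthonormal set {e1, e2} in R^5.
rng = np.random.default_rng(3)
E, _ = np.linalg.qr(rng.standard_normal((5, 2)))   # orthonormal columns
x = rng.standard_normal(5)

fourier = E.T @ x                                  # coefficients <x, e_n>
print(np.sum(fourier ** 2) <= np.dot(x, x))        # True
```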

## Orthonormalisation process

Let $(x_n)_{n \in \mathbb{N}}$ be a linearly independent sequence in an inner product space $(X, \langle \cdot, \cdot \rangle)$, then we can use the **Gram-Schmidt process** to determine the corresponding orthonormal sequence $(e_n)_{n \in \mathbb{N}}$.

Let $e_1 = \frac{1}{\|x_1\|} x_1$ be the first step and let $e_n = \frac{1}{\|v_n\|} v_n$ be the $n$th step with

$$
v_n = x_n - \sum_{k=1}^{n-1} \langle x_n, e_k \rangle e_k.
$$
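
The process translates directly into code. A minimal NumPy sketch (the input vectors in $\mathbb{R}^3$ are an assumed example):

```python
import numpy as np

# Gram-Schmidt: e_n = v_n/||v_n|| with v_n = x_n - sum_{k<n} <x_n, e_k> e_k.
def gram_schmidt(xs):
    es = []
    for x in xs:
        v = x - sum(np.dot(x, e) * e for e in es)
        es.append(v / np.linalg.norm(v))
    return es

xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]),
      np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(xs)
print(np.round([[np.dot(a, b) for b in es] for a in es], 12))  # identity
```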

## Properties

> *Proposition 1*: let $(e_n)_{n \in \mathbb{N}}$ be an orthonormal sequence in a Hilbert space $(X, \langle \cdot, \cdot \rangle)$ and let $(\alpha_n)_{n \in \mathbb{N}}$ be a sequence in the field of $X$, then
>
> 1. the series $\sum_{n=1}^\infty \alpha_n e_n$ is convergent in $X$ $\iff$ the series $\sum_{n=1}^\infty |\alpha_n|^2$ is convergent,
> 2. if the series $\sum_{n=1}^\infty \alpha_n e_n$ is convergent in $X$ and $s = \sum_{n=1}^\infty \alpha_n e_n$, then $\alpha_n = \langle s, e_n \rangle$,
> 3. the series $\sum_{n=1}^\infty \langle x, e_n \rangle e_n$ is convergent in $X$ for all $x \in X$.

??? note "*Proof*:"

    Will be added later.

Furthermore, we also have the following.

> *Proposition 2*: let $M$ be an orthonormal set in an inner product space $(X, \langle \cdot, \cdot \rangle)$, then any $x \in X$ can have at most countably many nonzero Fourier coefficients $\langle x, e_k \rangle$ for $e_k \in M$, where $k$ ranges over the (possibly uncountable) index set $I$ of $M$.

??? note "*Proof*:"

    Will be added later.

# Representations of functionals

> *Lemma 1*: let $(X, \langle \cdot, \cdot \rangle)$ be an inner product space, then
>
> $$
> \forall z \in X: \langle x, z \rangle = \langle y, z \rangle \implies x = y,
> $$
>
> and in particular
>
> $$
> \forall z \in X: \langle x, z \rangle = 0 \implies x = 0.
> $$

??? note "*Proof*:"

    Will be added later.

Lemma 1 will be used in the following theorem.

> *Theorem 1*: for every bounded linear functional $f$ on a Hilbert space $(X, \langle \cdot, \cdot \rangle)$, there exists a $z \in X$ such that
>
> $$
> f(x) = \langle x, z \rangle,
> $$
>
> for all $x \in X$, with $z$ uniquely determined by $f$ and $\|z\| = \|f\|$.

??? note "*Proof*:"

    Will be added later.
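
In the finite-dimensional Hilbert space $\mathbb{C}^3$ the representing vector $z$ can be written down explicitly. A minimal NumPy sketch (the functional $f(x) = \sum_i a_i x_i$ is an assumed example; with the convention of definition 1 of inner product spaces, $z = \overline{a}$):

```python
import numpy as np

# Riesz representation: f(x) = <x, z> with <u, v> = sum(u_i * conj(v_i)).
rng = np.random.default_rng(4)
a = rng.standard_normal(3) + 1j * rng.standard_normal(3)
f = lambda x: a @ x                     # a bounded linear functional
z = a.conj()                            # its Riesz representer

x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
inner = lambda u, v: np.sum(u * v.conj())
print(np.isclose(f(x), inner(x, z)))                      # True
print(np.isclose(np.linalg.norm(z), np.linalg.norm(a)))   # ||z|| = ||f||: True
```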

## Sesquilinear form

> *Definition 1*: let $X$ and $Y$ be vector spaces over the field $F$. A **sesquilinear form** $h$ on $X \times Y$ is an operator $h: X \times Y \to F$ satisfying the following conditions
>
> 1. $\forall x_{1,2} \in X, y \in Y: h(x_1 + x_2, y) = h(x_1, y) + h(x_2, y)$,
> 2. $\forall x \in X, y_{1,2} \in Y: h(x, y_1 + y_2) = h(x, y_1) + h(x, y_2)$,
> 3. $\forall x \in X, y \in Y, \alpha \in F: h(\alpha x, y) = \alpha h(x,y)$,
> 4. $\forall x \in X, y \in Y, \beta \in F: h(x, \beta y) = \overline \beta h(x,y)$.

Hence, $h$ is linear in the first argument and conjugate linear in the second argument. Bilinearity of $h$ is only true for a real field $F$.

> *Definition 2*: let $X$ and $Y$ be normed spaces over the field $F$ and let $h: X \times Y \to F$ be a sesquilinear form, then $h$ is a **bounded sesquilinear form** if
>
> $$
> \exists c \geq 0: |h(x,y)| \leq c \|x\| \|y\|,
> $$
>
> for all $(x,y) \in X \times Y$ and the norm of $h$ is given by
>
> $$
> \|h\| = \sup_{\substack{x \in X \backslash \{0\} \\ y \in Y \backslash \{0\}}} \frac{|h(x,y)|}{\|x\| \|y\|} = \sup_{\|x\|=\|y\|=1} |h(x,y)|.
> $$

For example, the inner product is sesquilinear and bounded.

> *Theorem 2*: let $(X, \langle \cdot, \cdot \rangle_X)$ and $(Y, \langle \cdot, \cdot \rangle_Y)$ be Hilbert spaces over the field $F$ and let $h: X \times Y \to F$ be a bounded sesquilinear form. Then there exist bounded linear operators $T: X \to Y$ and $S: Y \to X$, such that
>
> $$
> h(x,y) = \langle Tx, y \rangle_Y = \langle x, Sy \rangle_X,
> $$
>
> for all $(x,y) \in X \times Y$, with $T$ and $S$ uniquely determined by $h$ with norms $\|T\| = \|S\| = \|h\|$.

??? note "*Proof*:"

    Will be added later.

# Total sets

> *Definition 1*: a **total set** in a normed space $(X, \|\cdot\|)$ is a subset $M \subset X$ whose span is dense in $X$.

Accordingly, an orthonormal set in $X$ which is total in $X$ is called a total orthonormal set in $X$.

> *Proposition 1*: let $M \subset X$ be a subset of an inner product space $(X, \langle \cdot, \cdot \rangle)$, then
>
> 1. if $M$ is total in $X$, then $M^\perp = \{0\}$,
> 2. if $X$ is complete and $M^\perp = \{0\}$, then $M$ is total in $X$.

??? note "*Proof*:"

    Will be added later.

## Total orthonormal sets

> *Theorem 1*: an orthonormal sequence $(e_n)_{n \in \mathbb{N}}$ in a Hilbert space $(X, \langle \cdot, \cdot \rangle)$ is total in $X$ if and only if
>
> $$
> \sum_{n=1}^\infty |\langle x, e_n \rangle|^2 = \|x\|^2,
> $$
>
> for all $x \in X$.

??? note "*Proof*:"

    Will be added later.
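
Theorem 1 is the Parseval relation; for a total orthonormal set the Bessel inequality becomes an equality. A minimal NumPy sketch (a random orthonormal basis of $\mathbb{R}^4$, for illustration only):

```python
import numpy as np

# Parseval: sum |<x, e_n>|^2 = ||x||^2 for a total orthonormal set in R^4.
rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # orthonormal basis
x = rng.standard_normal(4)

coeffs = Q.T @ x
print(np.isclose(np.sum(coeffs ** 2), np.dot(x, x)))   # True
```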

> *Lemma 1*: in every Hilbert space $X \neq \{0\}$ there exists a total orthonormal set.

??? note "*Proof*:"

    Will be added later.

> *Theorem 2*: all total orthonormal sets in a Hilbert space have the same cardinality.

??? note "*Proof*:"

    Will be added later.

This cardinality is called the Hilbert dimension or the orthogonal dimension of the Hilbert space.

> *Theorem 3*: let $X$ be a Hilbert space, then
>
> 1. if $X$ is separable, every orthonormal set in $X$ is countable,
> 2. if $X$ contains a countable total orthonormal set, then $X$ is separable.

??? note "*Proof*:"

    Will be added later.

> *Theorem 4*: two Hilbert spaces $X$ and $\tilde X$ over the same field are isomorphic if and only if they have the same Hilbert dimension.

??? note "*Proof*:"

    Will be added later.

# Completeness

> *Definition 1*: a sequence $(x_n)_{n \in \mathbb{N}}$ in a metric space $(X,d)$ is a **Cauchy sequence** if
>
> $$
> \forall \varepsilon > 0 \exists N \in \mathbb{N} \forall n,m > N: \quad d(x_n, x_m) < \varepsilon.
> $$

A convergent sequence $(x_n)_{n \in \mathbb{N}}$ in a metric space $(X,d)$ is always a Cauchy sequence since

$$
\forall \varepsilon > 0 \exists N \in \mathbb{N}: \quad d(x_n, x) < \frac{\varepsilon}{2},
$$

for all $n > N$. By axiom 4 of the definition of a metric space we have for $m, n > N$

$$
d(x_m, x_n) \leq d(x_m, x) + d(x, x_n) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon,
$$

showing that $(x_n)$ is Cauchy.

> *Definition 2*: a metric space $(X,d)$ is **complete** if every Cauchy sequence in $X$ is convergent.

Therefore, in a complete metric space every Cauchy sequence is a convergent sequence.
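
The converse fails in an incomplete space. A minimal Python sketch (Newton's iteration for $\sqrt{2}$, an assumed example): the iterates form a Cauchy sequence of rationals, yet the limit $\sqrt{2}$ is not rational, so $\mathbb{Q}$ with the usual metric is not complete.

```python
from fractions import Fraction

# Newton iteration x_{n+1} = (x_n + 2/x_n)/2: Cauchy in Q, limit sqrt(2) not in Q.
x = Fraction(2)
for _ in range(6):
    x = (x + 2 / x) / 2
    print(float(x), "x^2 - 2 =", float(x * x - 2))
```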

> *Proposition 1*: let $M \subset X$ be a nonempty subset of a metric space $(X,d)$ and let $\overline M$ be the closure of $M$, then
>
> 1. $x \in \overline M \iff \exists (x_n)_{n \in \mathbb{N}} \text{ in } M: x_n \to x$,
> 2. $M \text{ is closed } \iff M = \overline M$.

??? note "*Proof*:"

    To prove statement 1, let $x \in \overline M$. If $x \in M$, take the constant sequence $x_n = x$. If $x \notin M$ then $x$ is an accumulation point of $M$. Hence, for each $n \in \mathbb{N}$ the ball $B(x,\frac{1}{n})$ contains an $x_n \in M$ and $x_n \to x$ since $\frac{1}{n} \to 0$ as $n \to \infty$. Conversely, if $(x_n)_{n \in \mathbb{N}}$ is in $M$ and $x_n \to x$, then $x \in M$ or every neighbourhood of $x$ contains points $x_n \neq x$, so that $x$ is an accumulation point of $M$. Hence $x \in \overline M$.

    Statement 2 follows from statement 1.

We have that the following statement is equivalent to statement 2: if $(x_n)$ is in $M$ and $x_n \to x$, then $x \in M$.

> *Proposition 2*: let $M \subset X$ be a subset of a complete metric space $(X,d)$, then
>
> $$
> M \text{ is complete} \iff M \text{ is a closed subset of } X.
> $$

??? note "*Proof*:"

    Let $M$ be complete. By proposition 1 statement 1 we have that

    $$
    \forall x \in \overline M \exists (x_n)_{n \in \mathbb{N}} \text{ in } M: x_n \to x.
    $$

    Since $(x_n)$ is Cauchy and $M$ is complete, $x_n$ converges in $M$, with the limit being unique by statement 1 in [lemma 1](). Hence, $x \in M$, which proves that $M$ is closed because $x \in \overline M$ has been chosen arbitrarily.

    Conversely, let $M$ be closed and $(x_n)$ Cauchy in $M$. Then $x_n \to x \in X$, which implies that $x \in \overline M$ by statement 1 in proposition 1, and $x \in M$ since $M = \overline M$ by assumption. Hence, the arbitrary Cauchy sequence $(x_n)$ converges in $M$.

> *Proposition 3*: let $T: X \to Y$ be a map from a metric space $(X,d)$ to a metric space $(Y,\tilde d)$, then
>
> $$
> T \text{ is continuous in } x_0 \in X \iff \big(x_n \to x_0 \implies T(x_n) \to T(x_0)\big),
> $$
>
> for any sequence $(x_n)_{n \in \mathbb{N}}$ in $X$ as $n \to \infty$.

??? note "*Proof*:"

    Suppose $T$ is continuous at $x_0$. Then

    $$
    \forall \varepsilon > 0 \exists \delta > 0: \quad d(x, x_0) < \delta \implies \tilde d(Tx, Tx_0) < \varepsilon.
    $$

    Let $x_n \to x_0$, then

    $$
    \exists N \in \mathbb{N} \forall n > N: \quad d(x_n, x_0) < \delta.
    $$

    Hence,

    $$
    \forall n > N: \quad \tilde d(Tx_n, Tx_0) < \varepsilon,
    $$

    which means that $T(x_n) \to T(x_0)$.

    Conversely, suppose that $x_n \to x_0 \implies T(x_n) \to T(x_0)$ and that $T$ is not continuous. Then

    $$
    \exists \varepsilon > 0 \forall \delta > 0 \exists x \neq x_0: \quad d(x, x_0) < \delta \quad \text{however} \quad \tilde d(Tx, Tx_0) \geq \varepsilon.
    $$

    In particular, for $\delta = \frac{1}{n}$ there is an $x_n$ satisfying

    $$
    d(x_n, x_0) < \frac{1}{n} \quad \text{however} \quad \tilde d(Tx_n, Tx_0) \geq \varepsilon.
    $$

    Clearly $x_n \to x_0$ but $(Tx_n)$ does not converge to $Tx_0$, which contradicts $Tx_n \to Tx_0$.

## Completeness proofs

To show that a metric space $(X,d)$ is complete, one has to show that every Cauchy sequence in $(X,d)$ has a limit in $X$. This depends explicitly on the metric on $X$.

The steps in a completeness proof are as follows:

1. take an arbitrary Cauchy sequence $(x_n)_{n \in \mathbb{N}}$ in $(X,d)$,
2. construct for this sequence a candidate limit $x$,
3. prove that $x \in X$,
4. prove that $x_n \to x$ with respect to metric $d$.
> *Proposition 4*: the Euclidean space $\mathbb{R}^n$ with $n \in \mathbb{N}$ and the metric $d$ defined by
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> d(x,y) = \sqrt{\sum_{j=1}^n \big(x(j) - y(j) \big)^2},
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> for all $x,y \in \mathbb{R}^n$ is complete.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $(x_m)_{m \in \mathbb{N}}$ be a Cauchy sequence in $(\mathbb{R}^n, d)$, then we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\forall \varepsilon > 0 \exists N \in \mathbb{N}: \forall m, k > N: d(x_m, x_k) = \sqrt{\sum_{j=1}^n \big(x_m(j) - x_k(j) \big)^2} < \varepsilon,
|
||||||
|
$$
|
||||||
|
|
||||||
|
obtains for all $j \in \mathbb{N}$: $|x_m(j) - x_k(j)| < \varepsilon$.
|
||||||
|
|
||||||
|
Which shows that $(x_m(j))_{m \in \mathbb{N}}$ is a Cauchy sequence in $\mathbb{R}$. Suppose that it converged by $x_m(j) \to x(j)$ as $(m \to \infty)$ then $x \in \mathbb{R}^n$ since $x = \big(x(1), \dots, x(n)\big)$.
|
||||||
|
|
||||||
|
Thus for $(k \to \infty)$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
d(x_m, x) < \varepsilon \implies x_m \to x,
|
||||||
|
$$
|
||||||
|
|
||||||
|
which implies that $\mathbb{R}^n$ is complete.
|
||||||
|
|
||||||
|
A similar proof exists for the completeness of the Unitary space $\mathbb{C}^n$.
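
The componentwise argument above can be illustrated numerically. A small sketch, assuming `numpy`; the Cauchy sequence $x_m = (1 + \frac{1}{m}, 2^{-m})$ in $\mathbb{R}^2$ is an arbitrary example with candidate limit $x = (1, 0)$:

```python
import numpy as np

def x_seq(m: int) -> np.ndarray:
    # A Cauchy sequence in R^2 with limit (1, 0).
    return np.array([1.0 + 1.0 / m, 2.0 ** (-m)])

x = np.array([1.0, 0.0])

for m in [1, 10, 100, 1000]:
    # Both the mutual distances and the distance to the limit shrink.
    print(m, np.linalg.norm(x_seq(m) - x_seq(2 * m)), np.linalg.norm(x_seq(m) - x))
```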

> *Proposition 5*: the space $C([a,b])$ of all **real-valued continuous functions** on a closed interval $[a,b]$ with $a<b \in \mathbb{R}$ and the metric $d$ defined by
>
> $$
> d(x,y) = \max_{t \in [a,b]} |x(t) - y(t)|,
> $$
>
> for all $x, y \in C([a,b])$ is complete.

??? note "*Proof*:"

    Let $(x_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $(C([a,b]),d)$, then we have

    $$
    \forall \varepsilon > 0 \exists N \in \mathbb{N} \forall n, m > N: \quad d(x_n, x_m) = \max_{t \in [a,b]} |x_n(t) - x_m(t)| < \varepsilon,
    $$

    which obtains for all $t \in [a,b]$: $|x_n(t) - x_m(t)| < \varepsilon$.

    This shows that $(x_m(t))_{m \in \mathbb{N}}$ for fixed $t \in [a,b]$ is a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete the sequence converges; $x_m(t) \to x(t)$ as $m \to \infty$.

    Thus, for $m \to \infty$ we have

    $$
    d(x_n, x) = \max_{t \in [a,b]} | x_n(t) - x(t) | \leq \varepsilon,
    $$

    hence $\forall t \in [a,b]: | x_n(t) - x(t) | \leq \varepsilon$, obtaining uniform convergence $x_n \to x$ as $n \to \infty$. Since the uniform limit of continuous functions is continuous, $x \in C([a,b])$, which implies that $C([a,b])$ is complete.

In contrast, $C([a,b])$ with the metric $d$ defined by

$$
d(x,y) = \int_a^b |x(t) - y(t)| dt,
$$

for all $x,y \in C([a,b])$ is incomplete.

??? note "*Proof*:"

    Will be added later.
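
The standard counterexample behind this incompleteness can be probed numerically. A sketch assuming `numpy`: the continuous ramps $x_n$ on $[0,1]$ rising from $0$ to $1$ on $[\frac{1}{2}, \frac{1}{2} + \frac{1}{n}]$ form a Cauchy sequence in the integral metric, yet their pointwise limit is a discontinuous step function, so no limit exists inside $C([0,1])$:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 100_001)          # fine grid on [0, 1]

def x(n: int) -> np.ndarray:
    # Continuous ramp: 0 on [0, 1/2], linear up to 1 on [1/2, 1/2 + 1/n], then 1.
    return np.clip(n * (t - 0.5), 0.0, 1.0)

def d_int(u: np.ndarray, v: np.ndarray) -> float:
    # Integral (L^1) metric on [0, 1], approximated by a Riemann sum.
    return float(np.mean(np.abs(u - v)))

for n in [2, 4, 8, 16, 32]:
    print(n, d_int(x(n), x(2 * n)))          # mutual distances shrink like 1/(4n)

# The pointwise limit is the step function 1_{t > 1/2}, which is not continuous,
# so this Cauchy sequence has no limit in C([0, 1]).
```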

> *Proposition 6*: the space $l^p$ with $p \geq 1$ and the metric $d$ defined by
>
> $$
> d(x,y) = \Big(\sum_{j \in \mathbb{N}} | x(j) - y(j) |^p\Big)^\frac{1}{p},
> $$
>
> for all $x,y \in l^p$ is complete.

??? note "*Proof*:"

    Let $(x_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $(l^p,d)$, then we have

    $$
    \forall \varepsilon > 0 \exists N \in \mathbb{N} \forall n, m > N: \quad d(x_n, x_m) = \Big(\sum_{j \in \mathbb{N}} |x_n(j) - x_m(j)|^p\Big)^\frac{1}{p} < \varepsilon,
    $$

    which obtains for all $j \in \mathbb{N}$: $|x_n(j) - x_m(j)| < \varepsilon$.

    This shows that $(x_m(j))_{m \in \mathbb{N}}$ for fixed $j \in \mathbb{N}$ is a Cauchy sequence in $\mathbb{C}$. Since $\mathbb{C}$ is complete the sequence converges; $x_m(j) \to x(j)$ as $m \to \infty$.

    Thus, for $m \to \infty$ we have

    $$
    d(x_n, x) = \Big(\sum_{j \in \mathbb{N}} |x_n(j) - x(j)|^p\Big)^\frac{1}{p} \leq \varepsilon,
    $$

    which implies that $x_n - x \in l^p$ and hence $x = x_n - (x_n - x) \in l^p$, with $x_n \to x$ as $n \to \infty$, which implies that $l^p$ is complete.

> *Proposition 7*: the space $l^\infty$ with the metric $d$ defined by
>
> $$
> d(x,y) = \sup_{j \in \mathbb{N}} | x(j) - y(j) |,
> $$
>
> for all $x,y \in l^\infty$ is complete.

??? note "*Proof*:"

    Let $(x_n)_{n \in \mathbb{N}}$ be a Cauchy sequence in $(l^\infty,d)$, then we have

    $$
    \forall \varepsilon > 0 \exists N \in \mathbb{N} \forall n, m > N: \quad d(x_n, x_m) = \sup_{j \in \mathbb{N}} | x_n(j) - x_m(j) | < \varepsilon,
    $$

    which obtains for all $j \in \mathbb{N}$: $|x_n(j) - x_m(j)| < \varepsilon$.

    This shows that $(x_m(j))_{m \in \mathbb{N}}$ for fixed $j \in \mathbb{N}$ is a Cauchy sequence in $\mathbb{C}$. Since $\mathbb{C}$ is complete the sequence converges; $x_m(j) \to x(j)$ as $m \to \infty$.

    Thus, for $m \to \infty$ we have

    $$
    d(x_n, x) = \sup_{j \in \mathbb{N}} | x_n(j) - x(j) | \leq \varepsilon \implies |x_n(j) - x(j)| \leq \varepsilon.
    $$

    Since $x_n \in l^\infty$ there exists a $k_n \in \mathbb{R}: |x_n(j)| \leq k_n$ for all $j \in \mathbb{N}$. Hence

    $$
    |x(j)| \leq |x(j) - x_n(j)| + |x_n(j)| \leq \varepsilon + k_n,
    $$

    for all $j \in \mathbb{N}$, which implies that $x \in l^\infty$ and $x_n \to x$ as $n \to \infty$, obtaining that $l^\infty$ is complete.

# Completion

> *Definition 1*: let $(X,d)$ and $(\tilde X, \tilde d)$ be metric spaces, then
>
> 1. a mapping $T: X \to \tilde X$ is an **isometry** if $\forall x, y \in X: \tilde d(Tx, Ty) = d(x,y)$.
> 2. $(X,d)$ and $(\tilde X, \tilde d)$ are **isometric** if there exists a bijective isometry $T: X \to \tilde X$.

Hence, isometric spaces may differ at most by the nature of their points but are indistinguishable from the viewpoint of the metric.

> *Theorem 1*: for every metric space $(X,d)$ there exists a complete metric space $(\tilde X, \tilde d)$ that contains a subset $W$ that satisfies the following conditions
>
> 1. $W$ is a metric space isometric with $(X,d)$.
> 2. $W$ is dense in $\tilde X$.

??? note "*Proof*:"

    Will be added later.

In other words, the complete metric space $(\tilde X, \tilde d)$ is a completion of $(X,d)$, and it is unique up to isometry. For example, the completion of $\mathbb{Q}$ with the usual metric is $\mathbb{R}$.
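
To see why a completion is needed, one can exhibit a Cauchy sequence of rationals with no rational limit. A small sketch using Python's standard `fractions` module; the Newton iterates for $\sqrt{2}$ stay inside $\mathbb{Q}$, but their limit lies only in the completion $\mathbb{R}$:

```python
from fractions import Fraction

# Newton iteration x -> (x + 2/x) / 2 for sqrt(2), staying inside Q.
x = Fraction(2)
iterates = []
for _ in range(6):
    x = (x + 2 / x) / 2
    iterates.append(x)

# Successive distances |x_{n+1} - x_n| shrink rapidly: the sequence is Cauchy
# in Q, yet no rational can be its limit since sqrt(2) is irrational.
for a, b in zip(iterates, iterates[1:]):
    print(float(abs(b - a)))
```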

# Convergence

> *Definition 1*: a sequence $(x_n)_{n \in \mathbb{N}}$ in a metric space $(X,d)$ is **convergent** if there exists an $x \in X$ such that
>
> $$
> \lim_{n \to \infty} d(x_n, x) = 0.
> $$
>
> $x$ is the **limit** of $(x_n)$ and is denoted by
>
> $$
> \lim_{n \to \infty} x_n = x,
> $$
>
> or simply by $x_n \to x$, $(n \to \infty)$.

We say that $(x_n)$ *converges to* $x$ or *has the limit* $x$. If $(x_n)$ is not convergent then it is **divergent**.

We have that the limit of a convergent sequence must be a point of $X$.

> *Definition 2*: a non-empty subset $M \subset X$ of a metric space $(X,d)$ is **bounded** if there exists an $x_0 \in X$ and an $r > 0$ such that $M \subset B(x_0,r)$.

Furthermore, we call a sequence $(x_n)$ in $X$ a **bounded sequence** if the corresponding point set is a bounded subset of $X$.

> *Lemma 1*: let $(X,d)$ be a metric space, then
>
> 1. a convergent sequence in $X$ is bounded and its limit is unique,
> 2. if $x_n \to x$ and $y_n \to y$ then $d(x_n, y_n) \to d(x,y)$, $(n \to \infty)$.

??? note "*Proof*:"

    For statement 1, suppose that $x_n \to x$. Then, taking $\varepsilon = 1$, we can find an $N$ such that $d(x_n, x) < 1$ for all $n > N$. Hence $d(x_n, x) < 1 + \max\{d(x_1, x), \dots, d(x_N, x)\}$ for all $n$, which shows that $(x_n)$ is bounded. Suppose that $x_n \to x$ and $x_n \to z$, then by axiom 4 of the definition of a metric space we have

    $$
    d(x, z) \leq d(x, x_n) + d(x_n, z) \to 0,
    $$

    as $n \to \infty$, hence $d(x, z) = 0$ and by axiom 2 of the definition of a metric space it follows that $x = z$.

    For statement 2, we have that

    $$
    d(x_n,y_n) \leq d(x_n, x) + d(x, y) + d(y, y_n),
    $$

    by axiom 4 of the definition of a metric space. Hence we obtain

    $$
    d(x_n, y_n) - d(x, y) \leq d(x_n, x) + d(y_n, y),
    $$

    and interchanging the roles of $(x_n), (y_n)$ and $x, y$ gives the same bound for $d(x,y) - d(x_n,y_n)$, such that

    $$
    |d(x_n, y_n) - d(x, y)| \leq d(x_n, x) + d(y_n, y) \to 0,
    $$

    as $n \to \infty$.
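
Statement 2 of lemma 1 says the metric itself is continuous with respect to convergence in both arguments. A numerical sketch, assuming `numpy`, with two arbitrarily chosen convergent sequences in the Euclidean plane:

```python
import numpy as np

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])

for n in [1, 10, 100, 1000]:
    x_n = x + np.array([1.0 / n, 0.0])     # x_n -> x
    y_n = y + np.array([0.0, -1.0 / n])    # y_n -> y
    # |d(x_n, y_n) - d(x, y)| is bounded by d(x_n, x) + d(y_n, y).
    print(n, abs(np.linalg.norm(x_n - y_n) - np.linalg.norm(x - y)))
```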

# Metric spaces

> *Definition 1*: a **metric space** is a pair $(X,d)$, where $X$ is a set and $d$ is a metric on $X$, which is a function on $X \times X$ such that
>
> 1. $d$ is real, finite and nonnegative,
> 2. $\forall x,y \in X: \quad d(x,y) = 0 \iff x = y$,
> 3. $\forall x,y \in X: \quad d(x,y) = d(y,x)$,
> 4. $\forall x,y,z \in X: \quad d(x,y) \leq d(x,z) + d(y,z)$.

The metric $d$ is also referred to as a distance function; for $x,y \in X$, $d(x,y)$ is the distance from $x$ to $y$.
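
On a finite point set the four axioms can be checked by brute force. A minimal sketch using only the standard library; `points` and `dist` are placeholder examples, not part of the notes:

```python
from itertools import product

def is_metric(points, dist) -> bool:
    """Check the four metric axioms on a finite point set by enumeration."""
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        if d < 0:                              # axiom 1: nonnegative
            return False
        if (d == 0) != (x == y):               # axiom 2: d(x,y) = 0 iff x = y
            return False
        if d != dist(y, x):                    # axiom 3: symmetry
            return False
    for x, y, z in product(points, repeat=3):  # axiom 4: triangle inequality
        if dist(x, y) > dist(x, z) + dist(y, z):
            return False
    return True

# The absolute-value metric on a few reals satisfies all axioms.
print(is_metric([0.0, 1.0, 2.5], lambda x, y: abs(x - y)))       # True
# Squared distance violates the triangle inequality, so it is not a metric.
print(is_metric([0.0, 1.0, 2.5], lambda x, y: (x - y) ** 2))     # False
```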

## Examples of metric spaces

For the **real line** $\mathbb{R}$ the usual metric is defined by

$$
d(x,y) = |x - y|,
$$

for all $x,y \in \mathbb{R}$, obtaining the metric space $(\mathbb{R}, d)$.

??? note "*Proof*:"

    Will be added later.

For the **Euclidean space** $\mathbb{R}^n$ with $n \in \mathbb{N}$, the usual metric is defined by

$$
d(x,y) = \sqrt{\sum_{j=1}^n (x(j) - y(j))^2},
$$

for all $x,y \in \mathbb{R}^n$ with $x = (x(j))$ and $y = (y(j))$, obtaining the metric space $(\mathbb{R}^n, d)$.

??? note "*Proof*:"

    Will be added later.

Similar examples exist for the complex plane $\mathbb{C}$ and the unitary space $\mathbb{C}^n$.

For the space $C([a,b])$ of all **real-valued continuous functions** on a closed interval $[a,b]$ with $a<b \in \mathbb{R}$ the metric may be defined by

$$
d(x,y) = \max_{t \in [a,b]} |x(t) - y(t)|,
$$

for all $x,y \in C([a,b])$, obtaining the metric space $(C([a,b]), d)$.

??? note "*Proof*:"

    Will be added later.
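
The max metric on $C([a,b])$ can be approximated by evaluating both functions on a fine grid. A sketch assuming `numpy`; the grid resolution and the two example functions are arbitrary choices:

```python
import numpy as np

def d_max(x, y, a: float, b: float, resolution: int = 10_001) -> float:
    # Approximate max_{t in [a,b]} |x(t) - y(t)| on a uniform grid.
    t = np.linspace(a, b, resolution)
    return float(np.max(np.abs(x(t) - y(t))))

# d(sin, cos) on [0, pi]: the maximum of |sin t - cos t| is sqrt(2), at t = 3*pi/4.
print(d_max(np.sin, np.cos, 0.0, np.pi))   # ~1.41421
```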

> *Definition 2*: let $l^p$ with $p \geq 1$ be the set of sequences $x = (x(j))_{j \in \mathbb{N}}$ of complex numbers with the property that
>
> $$
> \sum_{j \in \mathbb{N}} | x(j) |^p \text{ is convergent}.
> $$

We have that a metric $d$ for $l^p$ may be defined by

$$
d(x,y) = \Big(\sum_{j \in \mathbb{N}} | x(j) - y(j) |^p\Big)^\frac{1}{p},
$$

for all $x,y \in l^p$.

??? note "*Proof*:"

    Will be added later.
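
For sequences with only finitely many nonzero terms the $l^p$ metric reduces to a finite sum. A small sketch assuming `numpy`; the truncation length is an arbitrary choice for illustration:

```python
import numpy as np

def d_lp(x: np.ndarray, y: np.ndarray, p: float) -> float:
    # l^p metric for sequences represented by their finitely many nonzero terms.
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

# x(j) = 1/2^j and y = 0, truncated after 50 terms; for p = 1 the exact value is 1.
j = np.arange(1, 51)
x = 0.5 ** j
y = np.zeros_like(x)
print(d_lp(x, y, p=1))   # ~1.0
print(d_lp(x, y, p=2))   # ~0.577 = 1/sqrt(3)
```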

From definition 2 the sequence space $l^\infty$ follows, which is defined as the set of all bounded sequences of complex numbers. A metric $d$ of $l^\infty$ may be defined by

$$
d(x,y) = \sup_{j \in \mathbb{N}} | x(j) - y(j) |,
$$

for all $x, y \in l^\infty$.

??? note "*Proof*:"

    Will be added later.

# Topological notions

> *Definition 1*: let $(X,d)$ be a metric space and let $x_0 \in X$ and $r > 0$, the following may be defined
>
> 1. **open ball**: $B(x_0, r) = \{x \in X \;|\; d(x,x_0) < r\}$,
> 2. **closed ball**: $\tilde B(x_0,r) = \{x \in X \;|\; d(x,x_0) \leq r\}$,
> 3. **sphere**: $S(x_0,r) = \{x \in X \;|\; d(x,x_0) = r\}$.

In all three cases $x_0$ can be thought of as the center and $r$ as the radius.

> *Definition 2*: a subset $M \subset X$ of a metric space $(X,d)$ is **open** if $\forall x_0 \in M \exists r > 0: B(x_0,r) \subset M$.
>
> $M$ is **closed** if $X \backslash M$ is open.

Therefore, one may observe that an open ball is an open set and a closed ball is a closed set.

## Neighbourhoods

> *Definition 3*: let $(X,d)$ be a metric space and let $x_0 \in X$, then $B(x_0, \varepsilon)$ is an **$\varepsilon$-neighbourhood** of $x_0$ for some $\varepsilon > 0$.

Using definition 3 we may define the following.

> *Definition 4*: a **neighbourhood** of $x_0$ is a set that contains an $\varepsilon$-neighbourhood of $x_0$ for some $\varepsilon > 0$.

Therefore $x_0$ is an element of each of its neighbourhoods and if $N$ is a neighbourhood of $x_0$ and $N \subset M$, then $M$ is also a neighbourhood of $x_0$.

> *Definition 5*: let $(X,d)$ be a metric space and let $M \subset X$, a point $x_0 \in M$ is an **interior point** of $M$ if $M$ is a neighbourhood of $x_0$.

One may think of an interior point of a subset as a point that lies within the interior of $M$.

> *Definition 6*: let $(X,d)$ be a metric space and let $M \subset X$, the **interior** of $M$, denoted by $M^\circ$, is the set of all interior points of $M$.

One may observe that $M^\circ$ is open and is the largest open set contained in $M$.

> *Lemma 1*: let $(X,d)$ be a metric space and let $\mathscr{T}$ be the set of all open subsets of $X$, then
>
> 1. $\emptyset \in \mathscr{T} \land X \in \mathscr{T}$,
> 2. the union of a collection of sets in $\mathscr{T}$ is itself a set in $\mathscr{T}$,
> 3. the intersection of a finite collection of sets in $\mathscr{T}$ is a set in $\mathscr{T}$.

??? note "*Proof*:"

    Statement 1 follows by noting that $\emptyset$ is open since $\emptyset$ has no elements, and $X$ is open.

    For statement 2 we have that any point $x$ of the union $U$ of open sets belongs to at least one of these sets $M$, and $M$ contains a ball $B$ about $x$. Then $B \subset U$, by the definition of a union.

    For statement 3 we have that if $y$ is any point of the intersection of open sets $M_1, \dots, M_n$ with $n \in \mathbb{N}$, then each $M_j$ contains a ball about $y$ and the smallest of these balls is contained in that intersection.

From statements 1, 2 and 3 of *lemma 1* we may define a topological space $(X,\mathscr{T})$ to be a set $X$ and a collection $\mathscr{T}$ of subsets of $X$ such that $\mathscr{T}$ satisfies these axioms. The set $\mathscr{T}$ is a topology for $X$, and it follows that a metric space is a topological space.

## Continuity

> *Definition 7*: let $(X,d)$ and $(Y,\tilde d)$ be metric spaces and let $T: X \to Y$ be a map. $T$ is **continuous in** $x_0 \in X$ if
>
> $$
> \forall \varepsilon > 0 \exists \delta > 0 \forall x \in X: \quad d(x,x_0) < \delta \implies \tilde d \big(T(x), T(x_0) \big) < \varepsilon.
> $$
>
> A mapping $T$ is **continuous** if it is continuous in all $x_0 \in X$.

Continuous mappings can be characterized in terms of open sets as follows.

> *Theorem 1*: let $(X,d)$ and $(Y,\tilde d)$ be metric spaces, a mapping $T: X \to Y$ is continuous if and only if the inverse image of any open subset of $Y$ is an open subset of $X$.

??? note "*Proof*:"

    Suppose that $T$ is continuous. Let $S \subset Y$ be open and $S_0$ the inverse image of $S$. If $S_0 = \emptyset$, it is open. Let $S_0 \neq \emptyset$. For any $x_0 \in S_0$ let $y_0 = T(x_0)$. Since $S$ is open, it contains an $\varepsilon$-neighbourhood $N$ of $y_0$. Since $T$ is continuous, $x_0$ has a $\delta$-neighbourhood $N_0$ which is mapped into $N$. Since $N \subset S$, we have $N_0 \subset S_0$, so that $S_0$ is open because $x_0 \in S_0$ is arbitrary.

    Suppose that the inverse image of every open set in $Y$ is an open set in $X$. Then for every $x_0 \in X$ and any $\varepsilon$-neighbourhood $N$ of $T(x_0)$, the inverse image $N_0$ of $N$ is open, since $N$ is open, and $N_0$ contains $x_0$. Hence, $N_0$ also contains a $\delta$-neighbourhood of $x_0$, which is mapped into $N$ because $N_0$ is mapped into $N$. Consequently, $T$ is continuous at $x_0$. Since $x_0 \in X$ was chosen arbitrarily, $T$ is continuous.

## Accumulation points

> *Definition 8*: let $M \subset X$ be a subset of a metric space $(X,d)$. A point $x_0 \in X$ is an **accumulation point** of $M$ if
>
> $$
> \forall \varepsilon > 0 \exists y \in M \backslash \{x_0\}: d(x_0,y) < \varepsilon.
> $$

An accumulation point of a subset $M$ is also sometimes called a limit point of $M$, since it is the limit of a sequence of points of $M$.

> *Definition 9*: the set consisting of all points of $M$ and all accumulation points of $M$ is the **closure** of $M$, denoted by $\overline M$.

Therefore, $\overline M$ is the smallest closed set containing $M$.

> *Definition 10*: let $(X,d)$ be a metric space and let $M$ be a subset of $X$. The set $M$ is **dense** in $X$ if $\overline M = X$.

Hence if $M$ is dense in $X$, then every ball in $X$, no matter how small, will contain points of $M$.

> *Definition 11*: a metric space $(X,d)$ is **separable** if $X$ contains a countable subset $M$ that is dense in $X$.

For example the real line $\mathbb{R}$ is separable, since the set $\mathbb{Q}$ of all rational numbers is countable and is dense in $\mathbb{R}$.

Furthermore, $l^\infty$ is not separable while $l^p$ is indeed separable.

??? note "*Proof*:"

    Will be added later.

# Compactness

> *Definition 1*: a metric space $X$ is **compact** if every sequence in $X$ has a convergent subsequence. A subset $M$ of $X$ is compact if every sequence in $M$ has a convergent subsequence whose limit is an element of $M$.

A general property of compact sets is expressed in the following proposition.

> *Proposition 1*: a compact subset $M$ of a metric space $(X,d)$ is closed and bounded.

??? note "*Proof*:"

    Will be added later.

The converse of this proposition is generally false.

??? note "*Proof*:"

    Will be added later.
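
A standard counterexample lives in $l^2$: the set of unit sequences $\{e_n\}$ is closed and bounded, yet any two distinct members are at distance $\sqrt 2$, so no subsequence can converge and the set is not compact. A finite truncation of the distance computation, assuming `numpy`:

```python
import numpy as np

# Truncated standard unit vectors e_1, ..., e_10 of l^2.
n = 10
E = np.eye(n)

# Every pair of distinct unit vectors is sqrt(2) apart, so the sequence (e_n)
# is bounded but has no Cauchy (hence no convergent) subsequence.
dists = [np.linalg.norm(E[i] - E[j]) for i in range(n) for j in range(i + 1, n)]
print(set(np.round(dists, 12)))   # a single value, ~1.414213562373
```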

However, for a finite dimensional normed space we have the following proposition.

> *Proposition 2*: in a finite dimensional normed space $(X, \|\cdot\|)$ a subset $M \subset X$ is compact if and only if $M$ is closed and bounded.

??? note "*Proof*:"

    Will be added later.

A source of interesting results is the following lemma.

> *Lemma 1*: let $Y$ and $Z$ be subspaces of a normed space $(X, \|\cdot\|)$, suppose that $Y$ is closed and that $Y$ is a strict subset of $Z$. Then for every $\alpha \in (0,1)$ there exists a $z \in Z$, such that
>
> 1. $\|z\| = 1$,
> 2. $\forall y \in Y: \|z - y\| \geq \alpha$.

??? note "*Proof*:"

    Will be added later.

Lemma 1 gives the following remarkable proposition.

> *Proposition 3*: if a normed space $(X, \|\cdot\|)$ has the property that the closed unit ball $M = \{x \in X \;|\; \|x\| \leq 1\}$ is compact, then $X$ is finite dimensional.

??? note "*Proof*:"

    Will be added later.

Compact sets have several basic properties similar to those of finite sets and not shared by non-compact sets, such as the following.

> *Proposition 4*: let $(X,d_X)$ and $(Y,d_Y)$ be metric spaces and let $T: X \to Y$ be a continuous mapping. Let $M$ be a compact subset of $(X,d_X)$, then $T(M)$ is a compact subset of $(Y,d_Y)$.

??? note "*Proof*:"

    Will be added later.

From this proposition we conclude that the following property carries over to metric spaces.

> *Corollary 1*: let $M \subset X$ be a compact subset of a metric space $(X,d)$, a continuous real-valued mapping $T: M \to \mathbb{R}$ attains a maximum and minimum value.

??? note "*Proof*:"

    Will be added later.

# Linear functionals

> *Definition 1*: a **linear functional** $f$ is a linear operator with its domain in a vector space $X$ and its range in the scalar field $F$ of $X$.

The norm $\|\cdot\|: X \to \mathbb{R}$ on a normed space $X$ is a functional on $X$, but it is not a linear functional, since it fails additivity.

> *Definition 2*: a **bounded linear functional** $f$ is a bounded linear operator with its domain in a vector space $X$ and its range in the scalar field $F$ of $X$.

## Dual space

> *Definition 3*: the set of linear functionals on a vector space $X$ is defined as the **algebraic dual space** $X^*$ of $X$.

From this definition we have the following.

> *Theorem 1*: the algebraic dual space $X^*$ of a vector space $X$ is a vector space.

??? note "*Proof*:"

    Will be added later.

Furthermore, a secondary type of dual space may be defined as follows.

> *Definition 4*: the set of bounded linear functionals on a normed space $X$ is defined as the **dual space** $X'$.

In this case, a rather interesting property of a dual space emerges.

> *Theorem 2*: the dual space $X'$ of a normed space $(X,\|\cdot\|_X)$ is a Banach space with its norm $\|\cdot\|_{X'}$ given by
>
> $$
> \|f\|_{X'} = \sup_{x \in X\backslash \{0\}} \frac{|f(x)|}{\|x\|_X} = \sup_{\substack{x \in X \\ \|x\|_X = 1}} |f(x)|,
> $$
>
> for all $f \in X'$.

??? note "*Proof*:"

    Will be added later.
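
In $\mathbb{R}^n$ every linear functional has the form $f(x) = \langle a, x \rangle$, and its dual norm with respect to the Euclidean norm equals $\|a\|_2$. A numerical sketch assuming `numpy`, estimating the supremum over random unit vectors:

```python
import numpy as np

rng = np.random.default_rng(1)
a = np.array([3.0, -4.0])            # f(x) = <a, x>, so ||f|| should be ||a||_2 = 5

# Estimate sup_{||x|| = 1} |f(x)| by sampling random unit vectors.
x = rng.standard_normal((100_000, 2))
x /= np.linalg.norm(x, axis=1, keepdims=True)
estimate = np.max(np.abs(x @ a))

print(estimate, np.linalg.norm(a))   # the estimate approaches 5 from below
```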

# Linear operators

> *Definition 1*: a **linear operator** $T$ is a linear mapping such that
>
> 1. the domain $\mathscr{D}(T)$ of $T$ is a vector space and the range $\mathscr{R}(T)$ of $T$ is contained in a vector space over the same field as $\mathscr{D}(T)$.
> 2. $\forall x, y \in \mathscr{D}(T): T(x + y) = Tx + Ty$.
> 3. $\forall x \in \mathscr{D}(T), \alpha \in F: T(\alpha x) = \alpha Tx$.

Observe the notation: $Tx$ and $T(x)$ are used interchangeably.

> *Definition 2*: let $\mathscr{N}(T)$ be the **null space** of $T$ defined as
>
> $$
> \mathscr{N}(T) = \{x \in \mathscr{D}(T) \;|\; Tx = 0\}.
> $$

We have the following properties.

> *Proposition 1*: let $T$ be a linear operator, then
>
> 1. $\mathscr{R}(T)$ is a vector space,
> 2. $\mathscr{N}(T)$ is a vector space,
> 3. if $\dim \mathscr{D}(T) = n \in \mathbb{N}$ then $\dim \mathscr{R}(T) \leq n$.

??? note "*Proof*:"

    Will be added later.

An immediate consequence of statement 3 is that linear operators preserve linear dependence.

> *Proposition 2*: let $Y$ be a vector space, a linear operator $T: \mathscr{D}(T) \to Y$ is injective if
>
> $$
> \forall x_1, x_2 \in \mathscr{D}(T): Tx_1 = Tx_2 \implies x_1 = x_2.
> $$

??? note "*Proof*:"

    Will be added later.

Injectivity of $T$ is equivalent to $\mathscr{N}(T) = \{0\}$.

??? note "*Proof*:"

    Will be added later.

> *Theorem 1*: if a linear operator $T: \mathscr{D}(T) \to \mathscr{R}(T)$ is injective there exists a mapping $T^{-1}: \mathscr{R}(T) \to \mathscr{D}(T)$ such that
>
> $$
> y = Tx \iff T^{-1} y = x,
> $$
>
> for all $x \in \mathscr{D}(T)$, denoted as the **inverse operator**.

??? note "*Proof*:"

    Will be added later.

> *Proposition 3*: let $T: \mathscr{D}(T) \to \mathscr{R}(T)$ be an injective linear operator, if $\mathscr{D}(T)$ is finite-dimensional, then
>
> $$
> \dim \mathscr{D}(T) = \dim \mathscr{R}(T).
> $$

??? note "*Proof*:"

    Will be added later.

> *Lemma 1*: let $X,Y$ and $Z$ be vector spaces and let $T: X \to Y$ and $S: Y \to Z$ be injective linear operators, then $(ST)^{-1}: Z \to X$ exists and
>
> $$
> (ST)^{-1} = T^{-1} S^{-1}.
> $$

??? note "*Proof*:"

    Will be added later.

We finish this subsection with a definition of the space of linear operators.

> *Definition 3*: let $\mathscr{L}(X,Y)$ denote the set of linear operators mapping from a vector space $X$ to a vector space $Y$.

From this definition the following theorem follows.

> *Theorem 2*: let $X$ and $Y$ be vector spaces, the set of linear operators $\mathscr{L}(X,Y)$ is a vector space.

??? note "*Proof*:"

    Will be added later.

Therefore, we may also call $\mathscr{L}(X,Y)$ the space of linear operators.

## Bounded linear operators

> *Definition 4*: let $(X, \|\cdot\|_X)$ and $(Y,\|\cdot\|_Y)$ be normed spaces over a field $F$ and let $T: \mathscr{D}(T) \to Y$ be a linear operator with $\mathscr{D}(T) \subset X$. Then $T$ is a **bounded linear operator** if
>
> $$
> \exists c \in \mathbb{R} \forall x \in \mathscr{D}(T): \|Tx\|_Y \leq c \|x\|_X.
> $$

In this case we may also define the set of all bounded linear operators.

> *Definition 5*: let $\mathscr{B}(X,Y)$ denote the set of bounded linear operators mapping from a normed space $X$ to a normed space $Y$.

We have the following theorem.

> *Theorem 3*: let $X$ and $Y$ be normed spaces, the set of bounded linear operators $\mathscr{B}(X,Y)$ is a subspace of $\mathscr{L}(X,Y)$.

??? note "*Proof*:"

    Will be added later.

Likewise, we may call $\mathscr{B}(X,Y)$ the space of bounded linear operators.

The smallest possible $c$ such that the statement in definition 4 still holds is denoted as the norm of $T$ in the following definition.

> *Definition 6*: the norm of a bounded linear operator $T \in \mathscr{B}(X,Y)$ is defined by
>
> $$
> \|T\|_{\mathscr{B}} = \sup_{x \in \mathscr{D}(T) \backslash \{0\}} \frac{\|Tx\|_Y}{\|x\|_X},
> $$
>
> with $X$ and $Y$ normed spaces.

The operator norm makes $\mathscr{B}(X,Y)$ into a normed space.

> *Lemma 2*: let $X$ and $Y$ be normed spaces, the norm of a bounded linear operator $T \in \mathscr{B}(X,Y)$ may be given by
>
> $$
> \|T\|_\mathscr{B} = \sup_{\substack{x \in \mathscr{D}(T) \\ \|x\|_X = 1}} \|Tx\|_Y,
> $$
>
> and the norm of a bounded linear operator is a norm.

??? note "*Proof*:"

    Will be added later.

Note that the second statement in lemma 2 is nontrivial, as the norm of a bounded linear operator is only introduced by a definition.
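
For a matrix operator on $(\mathbb{R}^n, \|\cdot\|_2)$ the operator norm of definition 6 is the largest singular value. A sketch assuming `numpy`, comparing a random-sampling estimate of the supremum with the exact value:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))       # a bounded linear operator on R^3

# Sample ||Ax|| over random unit vectors to estimate sup_{||x||=1} ||Ax||.
x = rng.standard_normal((200_000, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)
estimate = np.max(np.linalg.norm(x @ A.T, axis=1))

exact = np.linalg.norm(A, 2)          # largest singular value of A
print(estimate, exact)                # the estimate approaches the exact norm from below
```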

> *Proposition 4*: if $(X, \|\cdot\|)$ is a finite-dimensional normed space, then every linear operator on $X$ is bounded.

??? note "*Proof*:"

    Will be added later.

By linearity of the linear operators we have the following.

> *Theorem 4*: let $X$ and $Y$ be normed spaces and let $T: \mathscr{D}(T) \to Y$ be a linear operator with $\mathscr{D}(T) \subset X$. Then the following statements are equivalent
>
> 1. $T$ is bounded,
> 2. $T$ is continuous on $\mathscr{D}(T)$,
> 3. $T$ is continuous at a point of $\mathscr{D}(T)$.

??? note "*Proof*:"

    Will be added later.

> *Corollary 1*: let $T \in \mathscr{B}(X,Y)$ and let $(x_n)_{n \in \mathbb{N}}$ be a sequence in $\mathscr{D}(T)$, then we have that
>
> 1. $x_n \to x \in \mathscr{D}(T) \implies Tx_n \to Tx$ as $n \to \infty$,
> 2. $\mathscr{N}(T)$ is closed.

??? note "*Proof*:"

    Will be added later.

Furthermore, bounded linear operators have the property that

$$
\|T_1 T_2\| \leq \|T_1\| \|T_2\|,
$$

for $T_1, T_2 \in \mathscr{B}$ whenever the composition $T_1 T_2$ is defined.

??? note "*Proof*:"

    Will be added later.

> *Theorem 5*: if $X$ is a normed space and $Y$ is a Banach space, then $\mathscr{B}(X,Y)$ is a Banach space.

??? note "*Proof*:"

    Will be added later.

> *Definition 7*: let $T_1, T_2 \in \mathscr{L}$ be linear operators, $T_1$ and $T_2$ are **equal** if and only if
>
> 1. $\mathscr{D}(T_1) = \mathscr{D}(T_2)$,
> 2. $\forall x \in \mathscr{D}(T_1) : T_1x = T_2x$.

## Restriction and extension

> *Definition 8*: the **restriction** of a linear operator $T \in \mathscr{L}$ to a subspace $A \subset \mathscr{D}(T)$, denoted by $T|_A: A \to \mathscr{R}(T)$, is defined by
>
> $$
> T|_A x = Tx,
> $$
>
> for all $x \in A$.

Furthermore.

> *Definition 9*: the **extension** of a linear operator $T \in \mathscr{L}$ to a vector space $M \supset \mathscr{D}(T)$ is an operator denoted by $\tilde T: M \to \mathscr{R}(\tilde T)$ such that
>
> $$
> \tilde T|_{\mathscr{D}(T)} = T.
> $$

Which implies that $\tilde T x = Tx\; \forall x \in \mathscr{D}(T)$. Hence, $T$ is the restriction of $\tilde T$.

> *Theorem 6*: let $X$ be a normed space and let $Y$ be a Banach space. Let $T \in \mathscr{B}(M,Y)$ with $M \subset X$, then there exists an extension $\tilde T: \overline M \to Y$, with $\tilde T$ a bounded linear operator and $\| \tilde T \| = \|T\|$.

??? note "*Proof*:"

    Will be added later.

# Normed spaces

> *Definition 1*: a vector space $X$ is a **normed space** if a norm $\| \cdot \|: X \to \mathbb{R}$ is defined on $X$, satisfying
>
> 1. $\forall x \in X: \|x\| \geq 0$,
> 2. $\|x\| = 0 \iff x = 0$,
> 3. $\forall x \in X, \alpha \in F: \|\alpha x\| = |\alpha| \|x\|$,
> 4. $\forall x, y \in X: \|x + y\| \leq \|x\| + \|y\|$.

Also called a *normed vector space* or *normed linear space*.

> *Proposition 1*: a norm on a vector space $X$ defines a metric $d$ on $X$ given by
>
> $$
> d(x,y) = \|x - y\|,
> $$
>
> for all $x, y \in X$ and is called a **metric induced by the norm**.

??? note "*Proof*:"

    Will be added later.

Furthermore, there is a category of normed spaces with interesting properties which is given in the following definition.

> *Definition 2*: a **Banach space** is a complete normed space with its metric induced by the norm.

If we define the norm $\| \cdot \|$ of the Euclidean vector space $\mathbb{R}^n$ by

$$
\|x\| = \sqrt{\sum_{j=1}^n |x(j)|^2},
$$

for all $x \in \mathbb{R}^n$, then it yields the metric

$$
d(x,y) = \|x - y\| = \sqrt{\sum_{j=1}^n |x(j) - y(j)|^2},
$$

for all $x, y \in \mathbb{R}^n$, with respect to which $\mathbb{R}^n$ was shown to be complete. Therefore $(\mathbb{R}^n, \|\cdot\|)$ is a Banach space.

The same adaptation works for $C([a,b])$, $l^p$ and $l^\infty$, obtaining that $\mathbb{R}^n$, $C([a,b])$, $l^p$ and $l^\infty$ are all Banach spaces.

> *Lemma 1*: a metric $d$ induced by a norm on a normed space $(X, \|\cdot\|)$ satisfies
>
> 1. $\forall x, y, z \in X: d(x + z, y + z) = d(x,y)$,
> 2. $\forall x, y \in X, \alpha \in F: d(\alpha x, \alpha y) = |\alpha| d(x,y)$.

??? note "*Proof*:"

    We have

    $$
    d(x + z, y + z) = \|x + z - (y + z)\| = \|x - y\| = d(x,y),
    $$

    and

    $$
    d(\alpha x, \alpha y) = \|\alpha x - \alpha y\| = |\alpha| \|x - y\| = |\alpha| d(x,y).
    $$

By definition, a subspace $M$ of a normed space $X$ is a subspace of $X$ with its norm induced by the norm on $X$.

> *Definition 3*: let $M$ be a subspace of a normed space $X$, if $M$ is closed then $M$ is a **closed subspace** of $X$.

By definition, a subspace $M$ of a Banach space $X$ is a subspace of $X$ as a normed space. Hence, we do not require $M$ to be complete.

> *Theorem 1*: a subspace $M$ of a Banach space $X$ is complete if and only if $M$ is a closed subspace of $X$.

??? note "*Proof*:"

    Will be added later.

Convergence in normed spaces follows from the definition of convergence in metric spaces and the fact that the metric is induced by the norm.

## Convergent series

> *Definition 4*: let $(x_k)_{k \in \mathbb{N}}$ be a sequence in a normed space $(X, \|\cdot\|)$. We define the sequence of partial sums $(s_n)_{n \in \mathbb{N}}$ by
>
> $$
> s_n = \sum_{k=1}^n x_k.
> $$
>
> If $(s_n)$ converges to an $s \in X$, then the series
>
> $$
> \sum_{k=1}^\infty x_k,
> $$
>
> is **convergent**, and $s$ is the sum of the series, writing
>
> $$
> s = \lim_{n \to \infty } s_n = \sum_{k=1}^\infty x_k.
> $$
>
> If the series
>
> $$
> \sum_{k=1}^\infty \|x_k\|,
> $$
>
> is convergent in $\mathbb{R}$, then the series is **absolutely convergent**.

From the notion of absolute convergence the following theorem may be posed.

> *Theorem 2*: in a normed space $(X, \|\cdot\|)$, absolute convergence of every series implies its convergence if and only if $(X, \|\cdot\|)$ is complete.

??? note "*Proof*:"

    Will be added later.

## Schauder basis

> *Definition 5*: let $(X, \|\cdot\|)$ be a normed space and let $(e_k)_{k \in \mathbb{N}}$ be a sequence of vectors in $X$, such that for every $x \in X$ there exists a unique sequence of scalars $(\alpha_k)_{k \in \mathbb{N}}$ such that
>
> $$
> \lim_{n \to \infty} \Big\|x - \sum_{k=1}^n \alpha_k e_k\Big\| = 0,
> $$
>
> then $(e_k)_{k \in \mathbb{N}}$ is a **Schauder basis** of $(X, \|\cdot\|)$.

The expansion of an $x \in X$ with respect to a Schauder basis $(e_k)_{k \in \mathbb{N}}$ is given by

$$
x = \sum_{k=1}^\infty \alpha_k e_k.
$$

> *Lemma 2*: if a normed space has a Schauder basis then it is separable.

??? note "*Proof*:"

    Will be added later.

## Completion

> *Theorem 3*: for every normed space $(X, \|\cdot\|_X)$ there exists a Banach space $(Y, \|\cdot\|_Y)$ that contains a subspace $W$ that satisfies the following conditions
>
> 1. $W$ is a normed space isometric with $X$.
> 2. $W$ is dense in $Y$.

??? note "*Proof*:"

    Will be added later.

The Banach space $(Y, \|\cdot\|_Y)$ is unique up to isometry.

## Finite dimension

> *Lemma 3*: let $\{x_k\}_{k=1}^n$ with $n \in \mathbb{N}$ be a linearly independent set of vectors in a normed space $(X, \|\cdot\|)$, then there exists a $c > 0$ such that
>
> $$
> \Big\| \sum_{k=1}^n \alpha_k x_k \Big\| \geq c \sum_{k=1}^n |\alpha_k|,
> $$
>
> for all scalars $\alpha_1, \dots, \alpha_n \in F$.

??? note "*Proof*:"

    Will be added later.

As a first application of this lemma, let us prove the following.

> *Theorem 4*: every finite-dimensional subspace $M$ of a normed space $(X, \|\cdot\|)$ is complete.

??? note "*Proof*:"

    Will be added later.

In particular, every finite dimensional normed space is complete.

> *Proposition 2*: every finite-dimensional subspace $M$ of a normed space $(X, \|\cdot\|)$ is a closed subspace of $X$.

??? note "*Proof*:"

    Will be added later.

Another interesting property of a finite-dimensional vector space $X$ is that all norms on $X$ lead to the same topology for $X$. That is, the open subsets of $X$ are the same, regardless of the particular choice of a norm on $X$. The details are as follows.

> *Definition 6*: a norm $\|\cdot\|_1$ on a vector space $X$ is **equivalent** to a norm $\|\cdot\|_2$ on $X$ if there exist $a,b>0$ such that
>
> $$
> \forall x \in X: a \|x\|_1 \leq \|x\|_2 \leq b \|x\|_1.
> $$

This concept is motivated by the following proposition.

> *Proposition 3*: equivalent norms on $X$ define the same topology for $X$.

??? note "*Proof*:"

    Will be added later.

Using lemma 3 we may now prove the following theorem.

> *Theorem 5*: on a finite dimensional vector space $X$ any norm $\|\cdot\|_1$ is equivalent to any other norm $\|\cdot\|_2$.

??? note "*Proof*:"

    Will be added later.

This theorem is of considerable importance. For instance, it implies that convergence or divergence of a sequence in a finite dimensional vector space does not depend on the particular choice of a norm on that space.
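
Theorem 5 can be observed numerically on $\mathbb{R}^n$, where for instance $\|x\|_\infty \leq \|x\|_2 \leq \sqrt n \, \|x\|_\infty$ holds for all $x$. A sketch assuming `numpy`:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
x = rng.standard_normal((100_000, n))

norm_2 = np.linalg.norm(x, axis=1)
norm_inf = np.max(np.abs(x), axis=1)

ratio = norm_2 / norm_inf
# Equivalence constants: 1 <= ||x||_2 / ||x||_inf <= sqrt(n) for all x != 0.
print(ratio.min(), ratio.max(), np.sqrt(n))
```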

# Vector spaces

> *Definition 1*: a **vector space** $X$ over a **scalar field** $F$ is a non-empty set, on which two algebraic operations are defined; vector addition and scalar multiplication. Such that
>
> 1. $(X, +)$ is a commutative group with neutral element 0.
> 2. the scalar multiplication satisfies $\forall x, y \in X$ and $\lambda, \mu \in F$
>     * $\lambda (x + y) = \lambda x + \lambda y$,
>     * $(\lambda + \mu) x = \lambda x + \mu x$,
>     * $\lambda (\mu x) = (\lambda \mu) x$,
>     * $1 x = x$.

When $F = \mathbb{R}$ we have a real vector space, while when $F = \mathbb{C}$ we have a complex vector space.

We have that the metric spaces $\mathbb{R}^n$, $C([a,b])$, $l^p$ and $l^\infty$ are also vector spaces.

??? note "*Proof*:"

    I am too lazy to add this trivial proof. Maybe some time in the future, if I do not forget.

> *Definition 2*: a **subspace** of a vector space $X$ is a non-empty subset $M$ of $X$, such that $\forall x, y \in M$ and $\lambda, \mu \in F$:
>
> $$
> \lambda x + \mu y \in M,
> $$
>
> with $M$ itself a vector space.

A special subspace $M$ of a vector space $X$ is the *improper subspace* $M = X$. Every other subspace of $X$ is a *proper subspace*.

## Linear combinations

> *Definition 3*: a **linear combination** of the vectors $\{x_i\}_{i=1}^n$ with $n \in \mathbb{N}$ is a vector of the form
>
> $$
> \alpha_1 x_1 + \dots + \alpha_n x_n = \sum_{i=1}^n \alpha_i x_i,
> $$
>
> with $\{\alpha_i\}_{i=1}^n \in F$.

The set of all linear combinations of a set of vectors is defined as follows.

> *Definition 4*: the **span** of a subset $M \subset X$ of a vector space $X$, denoted by $\mathrm{span}(M)$, is the set of all linear combinations of vectors from $M$.

It follows that $\mathrm{span}(M)$ is a subspace of $X$.

## Linear independence

> *Definition 5*: a finite subset of vectors $M = \{x_i\}_{i=1}^n$ is **linearly independent** if
>
> $$
> \sum_{i=1}^n \alpha_i x_i = 0 \implies \forall i \in \{1, \dots, n\}: \alpha_i = 0.
> $$

The converse may also be defined.

> *Definition 6*: a finite subset of vectors $M = \{x_i\}_{i=1}^n$ is **linearly dependent** if $\exists \{\alpha_i\}_{i=1}^n \in F$ not all zero such that
>
> $$
> \sum_{i=1}^n \alpha_i x_i = 0.
> $$

The notions of linear dependence and independence may also be extended to infinite subsets.

> *Definition 7*: a subset $M$ of a vector space $X$ is **linearly independent** if every non-empty finite subset of $M$ is linearly independent.

The converse in this case is defined by negation.

> *Definition 8*: a subset $M$ of a vector space $X$ is **linearly dependent** if $M$ is not linearly independent.
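
For vectors in $F^n$, linear independence in the sense of definition 5 can be tested by comparing the rank of the matrix with the vectors as columns against the number of vectors. A sketch assuming `numpy`; the example vectors are arbitrary:

```python
import numpy as np

def linearly_independent(vectors: list[np.ndarray]) -> bool:
    # Vectors are independent iff the matrix having them as columns
    # has full column rank.
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == len(vectors)

v1, v2 = np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])
print(linearly_independent([v1, v2]))            # True
print(linearly_independent([v1, v2, v1 + v2]))   # False: third is a combination
```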

## Dimension and basis

> *Definition 9*: a vector space $X$ is **finite dimensional** if there exists an $n \in \mathbb{N}$, such that $X$ contains a set of $n$ linearly independent vectors, while every set of $n+1$ vectors in $X$ is linearly dependent. In this case $n$ is the dimension of $X$, denoted by $\dim X = n$.

By definition $X = \{0\}$ is finite dimensional and $\dim X = 0$.

> *Definition 10*: if a vector space $X$ is not finite dimensional then $X$ is **infinite dimensional**.

The following definition of a basis is relevant to both finite and infinite dimensional vector spaces.

> *Definition 11*: a **basis** $B$ of a vector space $X$ is a linearly independent subset of $X$ that spans $X$.

Such a set $B$ is also called a *Hamel basis* of $X$.

> *Theorem 1*: every vector space $X$ has a Hamel basis.

??? note "*Proof*:"

    Read it again, a proof is not necessary.

> *Theorem 2*: let $X$ be a vector space with $\dim X = n \in \mathbb{N}$. Then any proper subspace $M \subset X$ has dimension less than $n$.

??? note "*Proof*:"

    If $n = 0$, then $X = \{0\}$ and $X$ has no proper subspace.

    If $\dim M = 0$, then $M = \{0\}$ and $X \neq M \implies \dim X \geq 1$.

    If $\dim M = n$ then $M$ would have a basis of $n$ elements, which would also be a basis for $X$ since $\dim X = n$, so that $X = M$.

    This shows that any linearly independent set of vectors in $M$ must have fewer than $n$ elements, hence $\dim M < n$.

# Mathematics

# Determinants

## Definition

With each $n \times n$ matrix $A$ with $n \in \mathbb{N}$ it is possible to associate a scalar, the determinant of $A$, denoted by $\det (A)$ or $|A|$.

> *Definition*: let $A = (a_{ij})$ be an $n \times n$ matrix and let $M_{ij}$ denote the $(n-1) \times (n-1)$ matrix obtained from $A$ by deleting the row and column containing $a_{ij}$ with $n \in \mathbb{N}$ and $(i,j) \in \{1, \dots, n\} \times \{1, \dots, n\}$. The determinant of $M_{ij}$ is called the **minor** of $a_{ij}$. We define the **cofactor** of $a_{ij}$ by
>
> $$
> A_{ij} = (-1)^{i+j} \det(M_{ij}).
> $$

This definition is necessary to formulate a definition for the determinant, as may be observed below.

> *Definition*: the **determinant** of an $n \times n$ matrix $A$ with $n \in \mathbb{N}$, denoted by $\det (A)$ or $|A|$, is a scalar associated with the matrix $A$ that is defined inductively as
>
> $$
> \det (A) = \begin{cases}a_{11} &\text{ if } n = 1, \\ a_{11} A_{11} + a_{12} A_{12} + \dots + a_{1n} A_{1n} &\text{ if } n > 1,\end{cases}
> $$
>
> where
>
> $$
> A_{1j} = (-1)^{1+j} \det (M_{1j}),
> $$
>
> with $j \in \{1, \dots, n\}$ are the cofactors associated with the entries in the first row of $A$.

<br>

> *Theorem*: if $A$ is an $n \times n$ matrix with $n \in \mathbb{N} \backslash \{1\}$ then $\det(A)$ can be expressed as a cofactor expansion using any row or column of $A$.

??? note "*Proof*:"

    Will be added later.

We then have for an $n \times n$ matrix $A$ with $n \in \mathbb{N} \backslash \{1\}$

$$
\begin{align*}
\det(A) &= a_{i1} A_{i1} + a_{i2} A_{i2} + \dots + a_{in} A_{in}, \\
&= a_{1j} A_{1j} + a_{2j} A_{2j} + \dots + a_{nj} A_{nj},
\end{align*}
$$

with $i,j \in \{1, \dots, n\}$.

For example, the determinant of a $4 \times 4$ matrix $A$ given by

$$
A = \begin{pmatrix} 0 & 2 & 3 & 0\\ 0 & 4 & 5 & 0\\ 0 & 1 & 0 & 3\\ 2 & 0 & 1 & 3\end{pmatrix}
$$

may be determined using the definition and the theorem above

$$
\det(A) = 2 \cdot (-1)^5 \det\begin{pmatrix} 2 & 3 & 0\\ 4 & 5 & 0\\ 1 & 0 & 3\end{pmatrix} = -2 \cdot 3 \cdot (-1)^6 \det\begin{pmatrix} 2 & 3 \\ 4 & 5\end{pmatrix} = 12.
$$
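
The inductive definition translates directly into a recursive function. A sketch in plain Python (no external libraries), expanding along the first row exactly as in the definition; it is exponential in $n$ and meant only to mirror the formula, not for serious computation:

```python
def minor(A: list[list[float]], i: int, j: int) -> list[list[float]]:
    # The matrix M_ij: remove row i and column j (0-based indices here).
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A: list[list[float]]) -> float:
    # Cofactor expansion along the first row, as in the inductive definition.
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[0, 2, 3, 0],
     [0, 4, 5, 0],
     [0, 1, 0, 3],
     [2, 0, 1, 3]]
print(det(A))   # 12, matching the worked example above
```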
|
||||||
|
|
||||||
|
## Properties of determinants
|
||||||
|
|
||||||
|
> *Theorem*: if $A$ is an $n \times n$ matrix then $\det (A^T) = \det (A)$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
It may be observed that the result holds for $n=1$. Assume that the results holds for all $k \times k$ matrices and that $A$ is a $(k+1) \times (k+1)$ matrix for some $k \in \mathbb{N}$. Expanding $\det (A)$ along the first row of $A$ obtains
|
||||||
|
|
||||||
|
$$
|
||||||
|
\det(A) = a_{11} \det(M_{11}) - a_{12} \det(M_{12}) + \dots + (-1)^{k+2} a_{1(k+1)} \det(M_{1(k+1)}),
|
||||||
|
$$
|
||||||
|
|
||||||
|
since the minors are all $k \times k$ matrices it follows from the principle of natural induction that
|
||||||
|
|
||||||
|
$$
|
||||||
|
\det(A) = a_{11} \det(M_{11}^T) - a_{12} \det(M_{12}^T) + \dots + (-1)^{k+2} a_{1(k+1)} \det(M_{1(k+1)}^T).
|
||||||
|
$$
|
||||||
|
|
||||||
|
The right hand side of the above equation is the expansion by minors of $\det(A^T)$ using the first column of $A^T$, therefore $\det(A^T) = \det(A)$.
|
||||||
|
|
||||||
|
> *Theorem*: if $A$ is an $n \times n$ triangular matrix with $n \in \mathbb{N}$, then the determinant of $A$ equals the product of the diagonal elements of $A$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $A$ be a $n \times n$ triagular matrix with $n \in \mathbb{N}$ given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
A = \begin{pmatrix} a_{11} & \cdots &a_{1n}\\ & \ddots & \vdots \\ & & a_{nn} \end{pmatrix}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
We claim that $\det(A) = a_{11} \cdot a_{22} \cdots a_{nn}$. We first check the claim for $n=1$ which is given by $\det(A) = a_{11}$.
|
||||||
|
|
||||||
|
Now suppose for some $k \in \mathbb{N}$, the determinant of a $k \times k$ triangular $A_{k}$ is given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\det(A_k) = a_1{11} \cdot a_{22} \cdots a_{kk}
|
||||||
|
$$
|
||||||
|
|
||||||
|
then by assumption
|
||||||
|
|
||||||
|
$$
|
||||||
|
\det(A_{k+1}) = \begin{pmatrix} A_k & a_{(k+1)1}\\& \vdots\\ 0 \cdots 0 & a_{(k+1)(k+1)}\end{pmatrix} = a_{(k+1)(k+1)} \det(A_k) + 0 = a_{11}a_1{11} \cdot a_{22} \cdots a_{kk} \cdot a_{(k+1)(k+1)}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Hence if the claim holds for some $k \in \mathbb{N}$ then it also holds for $k+1$. The principle of natural induction implies now that for all $n \in \mathbb{N}$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\det(A) = a_{11} \cdot a_{22} \cdots a_{nn}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
> *Theorem*: let $A$ be an $n \times n$ matrix
|
||||||
|
>
|
||||||
|
> 1. if $A$ has a row or column consisting entirely of zeros, then $\det(A) = 0$.
|
||||||
|
> 2. if $A$ has two identical rows or two identical columns, then $\det(A) = 0$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
> *Lemma*: let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$. If $A_{jk}$ denotes the cofactor of $a_{jk}$ for $j,k \in \{1, \dots, n\}$ then
>
> $$
> a_{i1} A_{j1} + a_{i2} A_{j2} + \dots + a_{in} A_{jn} = \begin{cases} \det(A) &\text{ if } i = j,\\ 0 &\text{ if } i \neq j.\end{cases}
> $$

??? note "*Proof*:"

If $i = j$ then we obtain the cofactor expansion of $\det(A)$ along the $i$th row of $A$.

If $i \neq j$, let $A^*$ be the matrix obtained by replacing the $j$th row of $A$ by the $i$th row of $A$

$$
A^* = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{array}{l} \\ \\ i\text{th row} \\ \\ j\text{th row} \\ \\ \\ \end{array}
$$

since two rows of $A^*$ are the same its determinant must be zero. The cofactors of $A^*$ along the $j$th row do not involve the entries of the $j$th row itself, so $A_{jk}^* = A_{jk}$. It follows from the cofactor expansion of $\det(A^*)$ along the $j$th row that

$$
\begin{align*}
0 = \det(A^*) &= a_{i1} A_{j1}^* + a_{i2} A_{j2}^* + \dots + a_{in} A_{jn}^*, \\
&= a_{i1} A_{j1} + a_{i2} A_{j2} + \dots + a_{in} A_{jn}.
\end{align*}
$$
> *Theorem*: let $E$ be an $n \times n$ elementary matrix and $A$ an $n \times n$ matrix with $n \in \mathbb{N}$, then we have
>
> $$
> \det(E A) = \det(E) \det(A),
> $$
>
> where
>
> $$
> \det(E) = \begin{cases} -1 &\text{ if $E$ is of type I},\\ \alpha &\text{ if $E$ is of type II with scaling factor } \alpha \in \mathbb{R}\backslash \{0\},\\ 1 &\text{ if $E$ is of type III}. \end{cases}
> $$

??? note "*Proof*:"

Will be added later.

Similar results hold for column operations, since for an elementary matrix $E$, $E^T$ is also an elementary matrix and $\det(A E) = \det((AE)^T) = \det(E^T A^T) = \det(E^T) \det(A^T) = \det(E) \det(A)$.
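The three cases can be checked directly; a small sketch assuming NumPy, with hand-picked $2 \times 2$ elementary matrices (the names `E1`, `E2`, `E3` are illustrative):

```python
import numpy as np

A = np.array([[2.0, 1.0], [4.0, 3.0]])

E1 = np.array([[0.0, 1.0], [1.0, 0.0]])  # type I: swap two rows -> det = -1
E2 = np.array([[1.0, 0.0], [0.0, 5.0]])  # type II: scale a row by 5 -> det = 5
E3 = np.array([[1.0, 0.0], [3.0, 1.0]])  # type III: add multiple of a row -> det = 1

for E in (E1, E2, E3):
    # det(EA) = det(E) det(A)
    print(np.isclose(np.linalg.det(E @ A), np.linalg.det(E) * np.linalg.det(A)))
```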
> *Theorem*: an $n \times n$ matrix $A$ with $n \in \mathbb{N}$ is singular if and only if
>
> $$
> \det(A) = 0.
> $$

??? note "*Proof*:"

Let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$. Matrix $A$ can be reduced to row echelon form with a finite number of row operations, obtaining

$$
U = E_k E_{k-1} \cdots E_1 A,
$$

where $U$ is an $n \times n$ matrix in row echelon form and $E_i$ are $n \times n$ elementary matrices for $i \in \{1, \dots, k\}$. It follows then that

$$
\begin{align*}
\det(U) &= \det(E_k E_{k-1} \cdots E_1 A), \\
&= \det(E_k) \det(E_{k-1}) \cdots \det(E_1) \det(A).
\end{align*}
$$

Since the determinants of the elementary matrices are all nonzero, it follows that $\det(A) = 0$ if and only if $\det(U) = 0$. If $A$ is singular then $U$ has a row consisting entirely of zeros and hence $\det(U) = 0$. If $A$ is nonsingular then $U$ is triangular with 1's along the diagonal and hence $\det(U) = 1$.
From this theorem we may pose a method for computing $\det(A)$ for a nonsingular matrix $A$: since $\det(U) = 1$, we have

$$
\det(A) = \Big(\det(E_k) \det(E_{k-1}) \cdots \det(E_1)\Big)^{-1}.
$$
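In floating-point practice, row reduction is usually organized as an LU factorization, which realizes the same idea; a sketch assuming SciPy is available:

```python
import numpy as np
from scipy.linalg import lu

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

# A = P L U with L unit lower triangular, so
# det(A) = det(P) * prod(diag(U)), where det(P) = +-1
P, L, U = lu(A)
det_A = np.linalg.det(P) * np.prod(np.diag(U))
print(np.isclose(det_A, np.linalg.det(A)))  # True
```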
> *Theorem*: let $A$ and $B$ be $n \times n$ matrices with $n \in \mathbb{N}$, then
>
> $$
> \det(AB) = \det(A) \det(B).
> $$

??? note "*Proof*:"

If the $n \times n$ matrix $B$ is singular with $n \in \mathbb{N}$ then it follows that $AB$ is also singular and therefore

$$
\det(AB) = 0 = \det(A) \det(B).
$$

If $B$ is nonsingular, $B$ can be written as a product of elementary matrices, $B = E_k \cdots E_1$. Therefore

$$
\begin{align*}
\det(AB) &= \det(A E_k \cdots E_1), \\
&= \det(A)\det(E_k)\cdots\det(E_1), \\
&= \det(A)\det(E_k \cdots E_1), \\
&= \det(A)\det(B).
\end{align*}
$$
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$, then we have
>
> $$
> \det(A^{-1}) = \frac{1}{\det(A)}.
> $$

??? note "*Proof*:"

Suppose $A$ is a nonsingular $n \times n$ matrix, then

$$
A^{-1} A = I,
$$

and taking the determinant on both sides

$$
\det(A^{-1}A) = \det(A^{-1})\det(A) = \det(I) = 1,
$$

therefore

$$
\det(A^{-1}) = \frac{1}{\det(A)}.
$$
## The adjoint of a matrix

> *Definition*: let $A$ be an $n \times n$ matrix with $n \in \mathbb{N}$, the adjoint of $A$ is given by
>
> $$
> \mathrm{adj}(A) = \begin{pmatrix} A_{11} & A_{21} & \dots & A_{n1} \\ A_{12} & A_{22} & \dots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \dots & A_{nn}\end{pmatrix}
> $$
>
> with $A_{ij}$ for $(i,j) \in \{1, \dots, n\} \times \{1, \dots, n\}$ the cofactors of $A$.

The use of the adjoint becomes apparent in the following theorem, which generally saves a lot of time and brain capacity.
> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$, then we have
>
> $$
> A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A).
> $$

??? note "*Proof*:"

Suppose $A$ is a nonsingular $n \times n$ matrix with $n \in \mathbb{N}$, from the definition and the lemma above it follows that

$$
\mathrm{adj}(A) \, A = \det(A) I,
$$

this may be rewritten into

$$
A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A).
$$
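A small sketch (assuming NumPy; the helper `adjoint` is illustrative, not a library function) that builds $\mathrm{adj}(A)$ entrywise from cofactors and checks the formula:

```python
import numpy as np

def adjoint(A: np.ndarray) -> np.ndarray:
    """Adjoint (transposed cofactor matrix) of a square matrix A."""
    n = A.shape[0]
    adj = np.empty_like(A)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            # the cofactor A_ij lands at position (j, i): adj is the transpose
            adj[j, i] = (-1) ** (i + j) * np.linalg.det(minor)
    return adj

A = np.array([[2.0, 0.0, 1.0], [1.0, 3.0, 0.0], [0.0, 1.0, 4.0]])
print(np.allclose(np.linalg.inv(A), adjoint(A) / np.linalg.det(A)))  # True
```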
## Cramer's rule

> *Theorem*: let $A$ be an $n \times n$ nonsingular matrix with $n \in \mathbb{N}$ and let $\mathbf{b} \in \mathbb{R}^n$. Let $A_i$ be the matrix obtained by replacing the $i$th column of $A$ by $\mathbf{b}$. If $\mathbf{x}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$ then
>
> $$
> x_i = \frac{\det(A_i)}{\det(A)}
> $$
>
> for $i \in \{1, \dots, n\}$.

??? note "*Proof*:"

Let $A$ be an $n \times n$ nonsingular matrix with $n \in \mathbb{N}$ and let $\mathbf{b} \in \mathbb{R}^n$. If $\mathbf{x}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$ then we have

$$
\mathbf{x} = A^{-1} \mathbf{b} = \frac{1}{\det(A)} \mathrm{adj}(A) \, \mathbf{b},
$$

it follows that

$$
\begin{align*}
x_i &= \frac{b_1 A_{1i} + \dots + b_n A_{ni}}{\det(A)} \\
&= \frac{\det(A_i)}{\det(A)},
\end{align*}
$$

since the numerator is the cofactor expansion of $\det(A_i)$ along its $i$th column, for $i \in \{1, \dots, n\}$.
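A direct translation of the rule into code; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 3.0]])
b = np.array([4.0, 7.0])

x = np.empty_like(b)
for i in range(A.shape[0]):
    A_i = A.copy()
    A_i[:, i] = b          # replace the i-th column of A by b
    x[i] = np.linalg.det(A_i) / np.linalg.det(A)

print(np.allclose(x, np.linalg.solve(A, b)))  # True
```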
# Dual vector spaces
Let $V$ be a finite-dimensional vector space with $\dim V = n$ for some $n \in \mathbb{N}$, with a basis $\{\mathbf{e}_i\}_{i=1}^n.$ In the following sections we make use of the Einstein summation convention introduced in [vector analysis](/en/physics/mathematical-physics/vector-analysis/curvilinear-coordinates/) and take $\mathbb{K} = \mathbb{R} \lor\mathbb{K} = \mathbb{C}$.
> *Definition 1*: a map $\mathbf{\hat f}: V \to \mathbb{K}$ is a **covector** or **linear functional** on $V$ if for all $\mathbf{v}_{1,2} \in V$ and $\lambda, \mu \in \mathbb{K}$ we have
>
> $$
> \mathbf{\hat f}(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda \mathbf{\hat f}(\mathbf{v}_1) + \mu \mathbf{\hat f}(\mathbf{v}_2).
> $$

Throughout this section covectors will be denoted by hats to increase clarity.

> *Definition 2*: let the dual space $V^* \overset{\text{def}} = \mathscr{L}(V, \mathbb{K})$ denote the vector space of covectors on the vector space $V$.
Each basis $\{\mathbf{e}_i\}$ of $V$ therefore induces a basis $\{\mathbf{\hat e}^i\}$ of $V^*$ by

$$
\mathbf{\hat e}^i(\mathbf{v}) = v^i,
$$

for all $\mathbf{v} = v^i \mathbf{e}_i \in V$.
> *Theorem 1*: the dual basis $\{\mathbf{\hat e}^i\}$ of $V^*$ is uniquely determined by
>
> $$
> \mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i,
> $$
>
> for each basis $\{\mathbf{e}_i\}$ of $V$.

??? note "*Proof*:"

Let $\mathbf{\hat f} = f_i \mathbf{\hat e}^i \in V^*$ and let $\mathbf{v} = v^i \mathbf{e}_i \in V$, then we have

$$
\mathbf{\hat f}(\mathbf{v}) = \mathbf{\hat f}(v^i \mathbf{e}_i) = \mathbf{\hat f}(\mathbf{e}_i) v^i = \mathbf{\hat f}(\mathbf{e}_i) \mathbf{\hat e}^i(\mathbf{v}) = f_i \mathbf{\hat e}^i (\mathbf{v}),
$$

therefore $\{\mathbf{\hat e}^i\}$ spans $V^*$.

Suppose $\mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i$ and $\lambda_i \mathbf{\hat e}^i = \mathbf{0} \in V^*$, then

$$
\lambda_i = \lambda_j \delta_i^j = \lambda_j \mathbf{\hat e}^j(\mathbf{e}_i) = (\lambda_j \mathbf{\hat e}^j)(\mathbf{e}_i) = 0,
$$

for all $i \in \mathbb{N}[i \leq n]$, showing that $\{\mathbf{\hat e}^i\}$ is a linearly independent set.

Hence the vector space $V$ and its dual $V^*$ have the same dimension $n$.

From theorem 1 it follows that for each covector basis $\{\mathbf{\hat e}^i\}$ of $V^*$ and each $\mathbf{\hat f} \in V^*$ there exists a unique collection of numbers $\{f_i\}$ such that $\mathbf{\hat f} = f_i \mathbf{\hat e}^i$.
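In coordinates this becomes concrete: if the basis vectors $\mathbf{e}_i$ are stored as the columns of a matrix $E$, then the dual basis covectors are the rows of $E^{-1}$, since $E^{-1} E = I$ is exactly $\mathbf{\hat e}^i(\mathbf{e}_j) = \delta_j^i$. A minimal sketch, assuming NumPy:

```python
import numpy as np

# columns of E are the basis vectors e_1, e_2 of R^2
E = np.array([[1.0, 1.0],
              [0.0, 2.0]])

E_dual = np.linalg.inv(E)  # row i is the dual covector e^i

# e^i(e_j) = delta_ij
print(np.allclose(E_dual @ E, np.eye(2)))  # True

v = np.array([3.0, 4.0])
coords = E_dual @ v        # components v^i of v in the basis {e_i}
print(np.allclose(E @ coords, v))  # True
```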
> *Theorem 2*: the dual of the covector space $(V^*)^* \overset{\text{def}} = V^{**}$ is isomorphic to $V$.

??? note "*Proof*:"

Will be added later.
# Eigenspaces
## Eigenvalues and eigenvectors

If a linear transformation is represented by an $n \times n$ matrix $A$ and there exists a nonzero vector $\mathbf{x} \in V$ such that $A \mathbf{x} = \lambda \mathbf{x}$ for some $\lambda \in \mathbb{K}$, then for this transformation $\mathbf{x}$ is a natural choice to use as a basis vector for $V$.

> *Definition 1*: let $A$ be an $n \times n$ matrix, a scalar $\lambda \in \mathbb{K}$ is defined as an **eigenvalue** of $A$ if and only if there exists a vector $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ such that
>
> $$
> A \mathbf{x} = \lambda \mathbf{x},
> $$
>
> with $\mathbf{x}$ defined as an **eigenvector** belonging to $\lambda$.

This notion can be further generalized to a linear operator $L: V \to V$ such that

$$
L(\mathbf{x}) = \lambda \mathbf{x},
$$

note that $L(\mathbf{x}) = A \mathbf{x}$ for a matrix representation $A$ of $L$, which makes the two formulations equivalent.

Furthermore it follows from the definition that any nonzero linear combination of eigenvectors belonging to the same eigenvalue $\lambda$ is also an eigenvector of $A$ belonging to $\lambda$.
> *Theorem 1*: let $A$ be an $n \times n$ matrix, a scalar $\lambda \in \mathbb{K}$ is an eigenvalue of $A$ if and only if
>
> $$
> \det (A - \lambda I) = 0.
> $$

??? note "*Proof*:"

A scalar $\lambda \in \mathbb{K}$ is an eigenvalue of $A$ if and only if there exists a vector $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ such that

$$
A \mathbf{x} = \lambda \mathbf{x},
$$

which obtains

$$
A \mathbf{x} - \lambda \mathbf{x} = (A - \lambda I) \mathbf{x} = \mathbf{0},
$$

which implies that $(A - \lambda I)$ is singular and $\det(A - \lambda I) = 0$ by [definition](../determinants/#properties-of-determinants).

The eigenvalues $\lambda$ may thus be determined from the **characteristic polynomial** of degree $n$ that is obtained from $\det (A - \lambda I) = 0$. In particular, the eigenvalues are the roots of this polynomial.
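A quick numerical illustration, assuming NumPy (a sketch, not part of the notes): the roots of the characteristic polynomial agree with the eigenvalues returned by an eigenvalue solver.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# coefficients of the characteristic polynomial (its roots are the eigenvalues)
coeffs = np.poly(A)       # here: lambda^2 - 4 lambda + 3
roots = np.roots(coeffs)

print(np.sort(roots))                 # [1. 3.]
print(np.sort(np.linalg.eigvals(A)))  # [1. 3.]
```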
> *Theorem 2*: let $A$ be an $n \times n$ matrix and let $\lambda \in \mathbb{K}$ be an eigenvalue of $A$. A vector $\mathbf{x} \in V$ is an eigenvector of $A$ corresponding to $\lambda$ if and only if
>
> $$
> \mathbf{x} \in N(A - \lambda I) \backslash \{\mathbf{0}\}.
> $$

??? note "*Proof*:"

Let $A$ be an $n \times n$ matrix, $\mathbf{x} \in V$ is an eigenvector of $A$ if and only if

$$
A \mathbf{x} = \lambda \mathbf{x},
$$

for an eigenvalue $\lambda \in \mathbb{K}$. Therefore

$$
A \mathbf{x} - \lambda \mathbf{x} = (A - \lambda I) \mathbf{x} = \mathbf{0},
$$

which implies that $\mathbf{x} \in N(A - \lambda I)$.

This implies that the eigenvectors can be obtained by determining the null space of $A - \lambda I$.
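A sketch of this recipe, assuming SciPy is available for the null-space computation:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

lam = 3.0  # an eigenvalue of A (a root of the characteristic polynomial above)

# an orthonormal basis of N(A - lambda I): the eigenspace E_lambda
basis = null_space(A - lam * np.eye(2))
print(np.allclose(A @ basis, lam * basis))  # True
```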
> *Definition 2*: let $L: V \to V$ be a linear operator and let $\lambda \in \mathbb{K}$ be an eigenvalue of $L$. Let the **eigenspace** $E_\lambda$ of the corresponding eigenvalue $\lambda$ be defined as
>
> $$
> E_\lambda = \{\mathbf{x} \in V \;|\; L(\mathbf{x}) = \lambda \mathbf{x}\} = N(A - \lambda I),
> $$
>
> with $L(\mathbf{x}) = A \mathbf{x}$.

It may be observed that $E_\lambda$ is a subspace of $V$ consisting of the zero vector and the eigenvectors of $L$ or $A.$
### Properties

> *Theorem 3*: if $\lambda_1, \dots, \lambda_k \in \mathbb{K}$ are distinct eigenvalues of an $n \times n$ matrix $A$ with corresponding eigenvectors $\mathbf{x}_1, \dots, \mathbf{x}_k \in V\backslash \{\mathbf{0}\}$, then $\mathbf{x}_1, \dots, \mathbf{x}_k$ are linearly independent.

??? note "*Proof*:"

Will be added later.

If $A \in \mathbb{R}^{n \times n}$ and $A \mathbf{x} = \lambda \mathbf{x}$ for some $\mathbf{x} \in V$ and $\lambda \in \mathbb{K}$, then

$$
A \mathbf{\bar x} = \overline{A \mathbf{x}} = \overline{\lambda \mathbf{x}} = \bar \lambda \mathbf{\bar x}.
$$

The complex conjugate of an eigenvector of $A$ is thus also an eigenvector of $A$, with eigenvalue $\bar \lambda$.
> *Theorem 4*: let $A$ be an $n \times n$ matrix and let $\lambda_1, \dots, \lambda_n \in \mathbb{K}$ be the eigenvalues of $A$. It follows that
>
> $$
> \det (A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda),
> $$
>
> and
>
> $$
> \det (A) = \lambda_1 \lambda_2 \cdots \lambda_n.
> $$

??? note "*Proof*:"

Let $A$ be an $n \times n$ matrix and let $\lambda_1, \dots, \lambda_n \in \mathbb{K}$ be the eigenvalues of $A$. It follows from the [fundamental theorem of algebra](../../number-theory/complex-numbers/#roots-of-polynomials) that

$$
\det (A - \lambda I) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda),
$$

by taking $\lambda = 0$ it follows that

$$
\det (A) = \lambda_1 \lambda_2 \cdots \lambda_n.
$$

By comparing the coefficients of $\lambda^{n-1}$ on both sides of $\det (A - \lambda I) = (\lambda_1 - \lambda) \cdots (\lambda_n - \lambda)$ it must also follow that

$$
\mathrm{trace}(A) = \sum_{i=1}^n \lambda_i.
$$
> *Theorem 5*: let $A$ and $B$ be $n \times n$ matrices. If $B$ is similar to $A$, then $A$ and $B$ have the same eigenvalues.

??? note "*Proof*:"

Let $A$ and $B$ be similar $n \times n$ matrices, then there exists a nonsingular matrix $S$ such that

$$
B = S^{-1} A S.
$$

Let $\lambda \in \mathbb{K}$ be an eigenvalue of $B$ then

$$
\begin{align*}
0 &= \det(B - \lambda I), \\
&= \det(S^{-1} A S - \lambda I), \\
&= \det(S^{-1}(A - \lambda I) S), \\
&= \det(S^{-1}) \det(A - \lambda I) \det(S), \\
&= \det(A - \lambda I),
\end{align*}
$$

hence $\lambda$ is also an eigenvalue of $A$; the converse follows by symmetry.
## Diagonalization

> *Definition 3*: an $n \times n$ matrix $A$ is **diagonalizable** if there exists a nonsingular diagonalizing matrix $X$ and a diagonal matrix $D$ such that
>
> $$
> A X = X D.
> $$

We may now pose the following theorem.

> *Theorem 6*: an $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n \in \mathbb{N}$ linearly independent eigenvectors.

??? note "*Proof*:"

Will be added later.

It follows from the proof that the column vectors of the diagonalizing matrix $X$ are eigenvectors of $A$ and the diagonal elements of $D$ are the corresponding eigenvalues of $A$. If $A$ is diagonalizable, then

$$
A = X D X^{-1},
$$

it follows then that

$$
A^k = X D^k X^{-1},
$$

for $k \in \mathbb{N}$.
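The $A^k = X D^k X^{-1}$ trick is cheap because powering a diagonal matrix only powers its diagonal; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, X = np.linalg.eig(A)  # columns of X are eigenvectors of A
k = 5

# A^k = X D^k X^{-1}, where D^k just powers the diagonal entries
A_k = X @ np.diag(eigvals**k) @ np.linalg.inv(X)
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```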
### Hermitian case

The following section is for the special case that a matrix is [Hermitian](../matrices/matrix-arithmatic/#hermitian-matrix).

> *Theorem 7*: the eigenvalues of a Hermitian matrix are real.

??? note "*Proof*:"

Let $A$ be a Hermitian matrix and let $\mathbf{x} \in V \backslash \{\mathbf{0}\}$ be an eigenvector of $A$ with corresponding eigenvalue $\lambda \in \mathbb{C}$. We have

$$
\begin{align*}
\lambda \mathbf{x}^H \mathbf{x} &= \mathbf{x}^H (\lambda \mathbf{x}), \\
&= \mathbf{x}^H (A \mathbf{x}), \\
&= (\mathbf{x}^H A) \mathbf{x}, \\
&= (A^H \mathbf{x})^H \mathbf{x}, \\
&= (A \mathbf{x})^H \mathbf{x}, \\
&= (\lambda \mathbf{x})^H \mathbf{x}, \\
&= \bar \lambda \mathbf{x}^H \mathbf{x},
\end{align*}
$$

since $\mathbf{x}^H \mathbf{x} > 0$ this gives $\bar \lambda = \lambda$, and hence $\lambda \in \mathbb{R}$.
> *Theorem 8*: the eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are orthogonal.

??? note "*Proof*:"

Let $A$ be a Hermitian matrix and let $\mathbf{x}_1, \mathbf{x}_2 \in V \backslash \{\mathbf{0}\}$ be two eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \lambda_2 \in \mathbb{C}[\lambda_1 \neq \lambda_2]$. We have

$$
\begin{align*}
\lambda_1 \mathbf{x}_1^H \mathbf{x}_2 &= (\lambda_1 \mathbf{x}_1)^H \mathbf{x}_2, \\
&= (A \mathbf{x}_1)^H \mathbf{x}_2, \\
&= \mathbf{x}_1^H A^H \mathbf{x}_2, \\
&= \mathbf{x}_1^H A \mathbf{x}_2, \\
&= \mathbf{x}_1^H (\lambda_2 \mathbf{x}_2), \\
&= \lambda_2 \mathbf{x}_1^H \mathbf{x}_2,
\end{align*}
$$

since $\lambda_1 \neq \lambda_2$ this must imply that $\mathbf{x}_1^H \mathbf{x}_2 = 0$, implying orthogonality in terms of the Hermitian scalar product.

Theorems 7 and 8 motivate the following definition.

> *Definition 4*: an $n \times n$ matrix $U$ is **unitary** if the column vectors of $U$ form an orthonormal set in $V$.

Thus, $U$ is unitary if and only if $U^H U = I$. Then it also follows that $U^{-1} = U^H$. A real unitary matrix is an orthogonal matrix.

One may observe that theorem 8 implies that the diagonalizing matrix of a Hermitian matrix $A$ is unitary when $A$ has distinct eigenvalues.
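Both theorems are visible numerically; a sketch assuming NumPy, using the Hermitian-specialized solver:

```python
import numpy as np

A = np.array([[2.0, 1.0 - 1.0j],
              [1.0 + 1.0j, 3.0]])  # Hermitian: A equals its conjugate transpose

eigvals, U = np.linalg.eigh(A)    # eigh is specialized for Hermitian matrices

print(np.allclose(eigvals.imag, 0.0))            # eigenvalues are real
print(np.allclose(U.conj().T @ U, np.eye(2)))    # U is unitary
print(np.allclose(A @ U, U @ np.diag(eigvals)))  # A U = U D
```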
> *Lemma 1*: if the eigenvalues of a Hermitian matrix $A$ are distinct, then there exists a unitary matrix $U$ and a diagonal matrix $D$ such that
>
> $$
> A U = U D.
> $$

??? note "*Proof*:"

Will be added later.

With the column vectors of $U$ the eigenvectors of $A$ and the diagonal elements of $D$ the corresponding eigenvalues of $A$.
> *Theorem 9*: let $A$ be an $n \times n$ matrix, there exists a unitary matrix $U$ and an upper triangular matrix $T$ such that
>
> $$
> A U = U T.
> $$

??? note "*Proof*:"

Will be added later.

The factorization $A = U T U^H$ is often referred to as the *Schur decomposition* of $A$.

> *Theorem 10*: if $A$ is Hermitian, then there exists a unitary matrix $U$ and a diagonal matrix $D$ such that
>
> $$
> A U = U D.
> $$

??? note "*Proof*:"

Will be added later.
# Inner product spaces
## Definition

The notion of length in a vector space may be formulated in terms of an inner product.

> *Definition 1*: an **inner product** on $V$ is an operation on $V$ that assigns, to each pair of vectors $\mathbf{x},\mathbf{y} \in V$, a scalar $\langle \mathbf{x},\mathbf{y}\rangle \in \mathbb{K}$ satisfying the following conditions
>
> 1. $\langle \mathbf{x},\mathbf{x}\rangle > 0, \text{ for } \mathbf{x} \in V\backslash\{\mathbf{0}\} \text{ and } \langle \mathbf{x},\mathbf{x}\rangle = 0, \; \text{for } \mathbf{x} = \mathbf{0}$,
> 2. $\langle \mathbf{x},\mathbf{y}\rangle = \overline{\langle \mathbf{y},\mathbf{x}\rangle}, \; \forall \mathbf{x}, \mathbf{y} \in V$,
> 3. $\langle a \mathbf{x} + b \mathbf{y}, \mathbf{z}\rangle = a \langle \mathbf{x},\mathbf{z}\rangle + b \langle \mathbf{y},\mathbf{z}\rangle, \; \forall \mathbf{x}, \mathbf{y}, \mathbf{z} \in V \text{ and } a,b \in \mathbb{K}$.

A vector space $V$ with an inner product is called an **inner product space**.
### Euclidean inner product spaces

The standard inner product on the Euclidean vector spaces $V = \mathbb{R}^n$ with $n \in \mathbb{N}$ is given by the scalar product defined by

$$
\langle \mathbf{x},\mathbf{y}\rangle = \mathbf{x}^T \mathbf{y},
$$

for all $\mathbf{x},\mathbf{y} \in V$.

??? note "*Proof*:"

Will be added later.

This can be extended to matrices $V = \mathbb{R}^{m \times n}$ with $m,n \in \mathbb{N}$ for which an inner product may be given by

$$
\langle A, B\rangle = \sum_{i=1}^m \sum_{j=1}^n a_{ij} b_{ij},
$$

for all $A, B \in V$.

??? note "*Proof*:"

Will be added later.
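The matrix inner product is the entrywise sum above, which can equivalently be written as $\mathrm{trace}(A^T B)$; a minimal sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((2, 3))

# <A, B> = sum_ij a_ij b_ij, equivalently trace(A^T B)
inner = np.sum(A * B)
print(np.isclose(inner, np.trace(A.T @ B)))  # True
```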
### Function inner product spaces

Let $V$ be a function space with a domain $X$. An inner product on $V$ may be defined by

$$
\langle f, g\rangle = \int_X \bar f(x) g(x) \,dx,
$$

for all $f,g \in V$.

??? note "*Proof*:"

Will be added later.

### Polynomial inner product spaces

Let $V$ be a polynomial space of degree $n \in \mathbb{N}$ with a set of distinct numbers $\{x_i\}_{i=1}^n \subset \mathbb{K}$. An inner product on $V$ may be defined by

$$
\langle p, q \rangle = \sum_{i=1}^n \bar p(x_i) q(x_i),
$$

for all $p,q \in V$.

??? note "*Proof*:"

Will be added later.
## Properties of inner product spaces

> *Definition 2*: let $V$ be an inner product space, the Euclidean length $\|\mathbf{v}\|$ of a vector $\mathbf{v}$ is defined as
>
> $$
> \|\mathbf{v}\| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle},
> $$
>
> for all $\mathbf{v} \in V$.

Which is consistent with Euclidean geometry. According to definition 2 the distance between two vectors $\mathbf{v}, \mathbf{w} \in V$ is $\|\mathbf{v} - \mathbf{w}\|$.

> *Definition 3*: let $V$ be an inner product space, two vectors $\mathbf{u}, \mathbf{v} \in V$ are orthogonal if
>
> $$
> \langle \mathbf{u}, \mathbf{v} \rangle = 0.
> $$

A pair of orthogonal vectors will satisfy the theorem of Pythagoras.
> *Theorem 1*: let $V$ be an inner product space in which $\mathbf{u}$ and $\mathbf{v}$ are orthogonal, then
>
> $$
> \|\mathbf{u} + \mathbf{v}\|^2 = \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.
> $$

??? note "*Proof*:"

Let $V$ be an inner product space and let $\mathbf{u}, \mathbf{v} \in V$ be orthogonal, then

$$
\begin{align*}
\|\mathbf{u} + \mathbf{v}\|^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v}\rangle, \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle, \\
&= \|\mathbf{u}\|^2 + \|\mathbf{v}\|^2.
\end{align*}
$$

Interpreted in $\mathbb{R}^2$ this is just the familiar Pythagorean theorem.
> *Definition 4*: let $V$ be an inner product space, then the **scalar projection** $a$ of $\mathbf{u}$ onto $\mathbf{v}$ is defined as
>
> $$
> a = \frac{1}{\|\mathbf{v}\|} \langle \mathbf{u}, \mathbf{v} \rangle,
> $$
>
> for all $\mathbf{u} \in V$ and $\mathbf{v} \in V \backslash \{\mathbf{0}\}$.
>
> The **vector projection** $\mathbf{p}$ of $\mathbf{u}$ onto $\mathbf{v}$ is defined as
>
> $$
> \mathbf{p} = a \bigg(\frac{1}{\|\mathbf{v}\|} \mathbf{v}\bigg) = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \mathbf{v},
> $$
>
> for all $\mathbf{u} \in V$ and $\mathbf{v} \in V \backslash \{\mathbf{0}\}$.

It may be observed that $\mathbf{u} - \mathbf{p}$ and $\mathbf{p}$ are orthogonal since $\langle \mathbf{p}, \mathbf{p} \rangle = a^2$ and $\langle \mathbf{u}, \mathbf{p} \rangle = a^2$ which implies

$$
\langle \mathbf{u} - \mathbf{p}, \mathbf{p} \rangle = \langle \mathbf{u}, \mathbf{p} \rangle - \langle \mathbf{p}, \mathbf{p} \rangle = a^2 - a^2 = 0.
$$

Additionally, it may be observed that $\mathbf{u} = \mathbf{p}$ if and only if $\mathbf{u}$ is a scalar multiple of $\mathbf{v}$; $\mathbf{u} = b \mathbf{v}$ for some $b \in \mathbb{K}$, since

$$
\mathbf{p} = \frac{\langle b \mathbf{v}, \mathbf{v} \rangle}{\langle \mathbf{v}, \mathbf{v} \rangle} \mathbf{v} = b \mathbf{v} = \mathbf{u}.
$$
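The projection formula is a one-liner in coordinates; a minimal sketch assuming NumPy:

```python
import numpy as np

u = np.array([3.0, 1.0])
v = np.array([2.0, 0.0])

# vector projection of u onto v:  p = (<u,v> / <v,v>) v
p = (u @ v) / (v @ v) * v

print(p)                             # [3. 0.]
print(np.isclose((u - p) @ p, 0.0))  # u - p is orthogonal to p
```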
> *Theorem 2*: let $V$ be an inner product space, then
>
> $$
> | \langle \mathbf{u}, \mathbf{v} \rangle | \leq \| \mathbf{u} \| \| \mathbf{v} \|,
> $$
>
> is true for all $\mathbf{u}, \mathbf{v} \in V$, with equality holding if and only if $\mathbf{u}$ and $\mathbf{v}$ are linearly dependent.

??? note "*Proof*:"

Let $V$ be an inner product space and let $\mathbf{u}, \mathbf{v} \in V$. If $\mathbf{v} = \mathbf{0}$, then

$$
| \langle \mathbf{u}, \mathbf{v} \rangle | = 0 = \| \mathbf{u} \| \| \mathbf{v} \|.
$$

If $\mathbf{v} \neq \mathbf{0}$, then let $\mathbf{p}$ be the vector projection of $\mathbf{u}$ onto $\mathbf{v}$. Since $\mathbf{p}$ is orthogonal to $\mathbf{u} - \mathbf{p}$ it follows that

$$
\| \mathbf{p} \|^2 + \| \mathbf{u} - \mathbf{p} \|^2 = \| \mathbf{u} \|^2,
$$

thus

$$
\frac{1}{\|\mathbf{v}\|^2} \langle \mathbf{u}, \mathbf{v} \rangle^2 = \| \mathbf{p}\|^2 = \| \mathbf{u} \|^2 - \| \mathbf{u} - \mathbf{p} \|^2,
$$

and hence

$$
\langle \mathbf{u}, \mathbf{v} \rangle^2 = \|\mathbf{u}\|^2 \|\mathbf{v}\|^2 - \|\mathbf{u} - \mathbf{p}\|^2 \|\mathbf{v}\|^2 \leq \|\mathbf{u}\|^2 \|\mathbf{v}\|^2,
$$

therefore

$$
| \langle \mathbf{u}, \mathbf{v} \rangle | \leq \| \mathbf{u} \| \| \mathbf{v} \|.
$$

Equality holds if and only if $\mathbf{u} = \mathbf{p}$. From the above observations, this condition may be restated as linear dependence of $\mathbf{u}$ and $\mathbf{v}$.
A consequence of the Cauchy-Schwarz inequality is that if $\mathbf{u}$ and $\mathbf{v}$ are nonzero vectors in an inner product space then

$$
-1 \leq \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \|\mathbf{v}\|} \leq 1,
$$

and hence there is a unique angle $\theta \in [0, \pi]$ such that

$$
\cos \theta = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\|\mathbf{u}\| \|\mathbf{v}\|}.
$$
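This is how the angle between two vectors is computed in practice; a sketch assuming NumPy (the `clip` guards against floating-point round-off pushing the ratio slightly outside $[-1, 1]$):

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])

cos_theta = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

print(np.degrees(theta))  # 45.0
```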
## Normed spaces

> *Definition 5*: a vector space $V$ is said to be a **normed linear space** if to each vector $\mathbf{v} \in V$ there is associated a real number $\| \mathbf{v} \|$, called the **norm** of $\mathbf{v}$, satisfying the following conditions
>
> 1. $\|\mathbf{v}\| > 0, \text{ for } \mathbf{v} \in V\backslash\{\mathbf{0}\} \text{ and } \| \mathbf{v} \| = 0, \text{ for } \mathbf{v} = \mathbf{0}$,
> 2. $\|a \mathbf{v}\| = |a| \|\mathbf{v}\|, \; \forall \mathbf{v} \in V \text{ and } a \in \mathbb{K}$,
> 3. $\| \mathbf{v} + \mathbf{w}\| \leq \|\mathbf{v}\| + \| \mathbf{w}\|, \; \forall \mathbf{v}, \mathbf{w} \in V$.

The third condition is the *triangle inequality*.

> *Theorem 3*: let $V$ be an inner product space, then
>
> $$
> \| \mathbf{v} \| = \sqrt{\langle \mathbf{v}, \mathbf{v} \rangle},
> $$
>
> for all $\mathbf{v} \in V$ defines a norm on $V$.

??? note "*Proof*:"

Will be added later.

We therefore have that the Euclidean length (definition 2) is a norm, justifying the notation.
# Linear transformations
## Definition

> *Definition*: let $V$ and $W$ be vector spaces, a mapping $L: V \to W$ is a **linear transformation** or **linear map** if
>
> $$
> L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2),
> $$
>
> for all $\mathbf{v}_{1,2} \in V$ and $\lambda, \mu \in \mathbb{K}$.

A linear transformation may also be called a **vector space homomorphism**. If the linear transformation is a bijection then it may be called a **linear isomorphism**.

In the case that the vector spaces $V$ and $W$ are the same, $V=W$, a linear transformation $L: V \to V$ will be referred to as a **linear operator** on $V$ or **linear endomorphism**.

## The image and kernel

Let $L: V \to W$ be a linear transformation from a vector space $V$ to a vector space $W$. In this section the effect is considered that $L$ has on subspaces of $V$. Of particular importance is the set of vectors in $V$ that get mapped into the zero vector of $W$.
> *Definition*: let $L: V \to W$ be a linear transformation. The **kernel** of $L$, denoted by $\ker(L)$, is defined by
>
> $$
> \ker(L) = \{\mathbf{v} \in V \;|\; L(\mathbf{v}) = \mathbf{0}\}.
> $$

The kernel is therefore a set consisting of vectors in $V$ that get mapped into the zero vector of $W$.

> *Definition*: let $L: V \to W$ be a linear transformation and let $S$ be a subspace of $V$. The **image** of $S$, denoted by $L(S)$, is defined by
>
> $$
> L(S) = \{\mathbf{w} \in W \;|\; \mathbf{w} = L(\mathbf{v}) \text{ for } \mathbf{v} \in S \}.
> $$
>
> The image of the entire vector space, $L(V)$, is called the **range** of $L$.

With these definitions the following theorem may be posed.

> *Theorem*: if $L: V \to W$ is a linear transformation and $S$ is a subspace of $V$, then
>
> 1. $\ker(L)$ is a subspace of $V$.
> 2. $L(S)$ is a subspace of $W$.

??? note "*Proof*:"

Let $L: V \to W$ be a linear transformation and let $S$ be a subspace of $V$.

To prove 1, let $\mathbf{v}_{1,2} \in \ker(L)$ and let $\lambda, \mu \in \mathbb{K}$. Then

$$
L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2) = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2) = \lambda \mathbf{0} + \mu \mathbf{0} = \mathbf{0},
$$

therefore $\lambda \mathbf{v}_1 + \mu \mathbf{v}_2 \in \ker(L)$ and hence $\ker(L)$ is a subspace of $V$.

To prove 2, let $\mathbf{w}_{1,2} \in L(S)$, then there exist $\mathbf{v}_{1,2} \in S$ such that $\mathbf{w}_{1,2} = L(\mathbf{v}_{1,2})$. For any $\lambda, \mu \in \mathbb{K}$ we have

$$
\lambda \mathbf{w}_1 + \mu \mathbf{w}_2 = \lambda L(\mathbf{v}_1) + \mu L(\mathbf{v}_2) = L(\lambda \mathbf{v}_1 + \mu \mathbf{v}_2),
$$

since $\lambda \mathbf{v}_1 + \mu \mathbf{v}_2 \in S$ it follows that $\lambda \mathbf{w}_1 + \mu \mathbf{w}_2 \in L(S)$ and hence $L(S)$ is a subspace of $W$.
## Matrix representations

> *Theorem*: let $L: \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation, then there is an $m \times n$ matrix $A$ such that
>
> $$
> L(\mathbf{x}) = A \mathbf{x},
> $$
>
> for all $\mathbf{x} \in \mathbb{R}^n$, with the $i$th column vector of $A$ given by
>
> $$
> \mathbf{a}_i = L(\mathbf{e}_i),
> $$
>
> for a basis $\{\mathbf{e}_1, \dots, \mathbf{e}_n\} \subset \mathbb{R}^n$ and $i \in \{1, \dots, n\}$.

??? note "*Proof*:"

For $i \in \{1, \dots, n\}$, define

$$
\mathbf{a}_i = L(\mathbf{e}_i),
$$

and let

$$
A = (\mathbf{a}_1, \dots, \mathbf{a}_n).
$$

If $\mathbf{x} = x_1 \mathbf{e}_1 + \dots + x_n \mathbf{e}_n$ is an arbitrary element of $\mathbb{R}^n$, then

$$
\begin{align*}
L(\mathbf{x}) &= x_1 L(\mathbf{e}_1) + \dots + x_n L(\mathbf{e}_n), \\
&= x_1 \mathbf{a}_1 + \dots + x_n \mathbf{a}_n, \\
&= A \mathbf{x}.
\end{align*}
$$

It has therefore been established that each linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$ can be represented in terms of an $m \times n$ matrix.
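The construction in the proof is directly computable: apply $L$ to each standard basis vector and stack the results as columns. A minimal sketch assuming NumPy, with a rotation by $90\degree$ as the example map:

```python
import numpy as np

def L(x: np.ndarray) -> np.ndarray:
    """An example linear map R^2 -> R^2: rotation by 90 degrees."""
    return np.array([-x[1], x[0]])

# the i-th column of A is L applied to the i-th standard basis vector
A = np.column_stack([L(e) for e in np.eye(2)])

x = np.array([3.0, 4.0])
print(np.allclose(A @ x, L(x)))  # True
```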
> *Theorem*: let $E = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ and $F = \{\mathbf{f}_1, \dots, \mathbf{f}_n\}$ be two ordered bases for a vector space $V$, and let $L: V \to V$ be a linear operator on $V$, $\dim V = n \in \mathbb{N}$. Let $S$ be the $n \times n$ transition matrix representing the change from $F$ to $E$,
>
> $$
> \mathbf{e}_i = S \mathbf{f}_i,
> $$
>
> for $i \in \mathbb{N}[i \leq n]$.
>
> If $A$ is the matrix representing $L$ with respect to $E$, and $B$ is the matrix representing $L$ with respect to $F$, then
>
> $$
> B = S^{-1} A S.
> $$

??? note "*Proof*:"

Will be added later.

> *Definition*: let $A$ and $B$ be $n \times n$ matrices. $B$ is said to be **similar** to $A$ if there exists a nonsingular matrix $S$ such that $B = S^{-1} A S$.

It follows from the above theorem that if $A$ and $B$ are $n \times n$ matrices representing the same operator $L$, then $A$ and $B$ are similar.
# Elementary matrices
> *Definition*: an *elementary* matrix is defined as an identity matrix with exactly one elementary row operation undergone.
>
> 1. An elementary matrix of type 1 $E_1$ is obtained by interchanging two rows of $I$.
> 2. An elementary matrix of type 2 $E_2$ is obtained by multiplying a row of $I$ by a nonzero constant.
> 3. An elementary matrix of type 3 $E_3$ is obtained from $I$ by adding a multiple of one row to another row.

For example the elementary matrices could be given by

$$
E_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1\end{pmatrix}, \qquad E_2 = \begin{pmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 3\end{pmatrix}, \qquad E_3 = \begin{pmatrix}1 & 0 & 3\\ 0 & 1 & 0\\ 0 & 0 & 1\end{pmatrix}.
$$

> *Theorem*: if $E$ is an elementary matrix, then $E$ is nonsingular and $E^{-1}$ is an elementary matrix of the same type.

??? note "*Proof*:"

If $E$ is the elementary matrix of type 1 formed from $I$ by interchanging the $i$th and $j$th rows, then $E$ can be transformed back into $I$ by interchanging these same rows again. Therefore, $EE = I$ and hence $E$ is its own inverse.

If $E$ is the elementary matrix of type 2 formed by multiplying the $i$th row of $I$ by a nonzero scalar $\alpha$, then $E$ can be transformed into the identity matrix by multiplying either its $i$th row or its $i$th column by $1/\alpha$.

If $E$ is the elementary matrix of type 3 formed from $I$ by adding $m$ times the $i$th row to the $j$th row, then $E$ can be transformed back into $I$ either by subtracting $m$ times the $i$th row from the $j$th row or by subtracting $m$ times the $j$th column from the $i$th column.
> *Definition*: a matrix $B$ is **row equivalent** to a matrix $A$ if there exists a finite sequence $E_1, E_2, \dots, E_k$ of elementary matrices with $k \in \mathbb{N}$ such that
>
> $$
> B = E_k E_{k-1} \cdots E_1 A.
> $$

It may be observed that row equivalence is a reflexive, symmetric and transitive relation.

> *Theorem*: let $A$ be an $n \times n$ matrix, the following are equivalent
>
> 1. $A$ is nonsingular,
> 2. $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{0}$,
> 3. $A$ is row equivalent to $I$.

??? note "*Proof*:"

Let $A$ be a nonsingular $n \times n$ matrix and let $\mathbf{\hat x}$ be a solution of $A \mathbf{x} = \mathbf{0}$, then

$$
\mathbf{\hat x} = I \mathbf{\hat x} = (A^{-1} A)\mathbf{\hat x} = A^{-1} (A \mathbf{\hat x}) = A^{-1} \mathbf{0} = \mathbf{0}.
$$

Let $U$ be the row echelon form of $A$. If one of the diagonal elements of $U$ were 0, the last row of $U$ would consist entirely of zeros. But then $A \mathbf{x} = \mathbf{0}$ would have a nontrivial solution. Thus $U$ must be a strictly triangular matrix with diagonal elements all equal to 1. It then follows that $I$ is the reduced row echelon form of $A$ and hence $A$ is row equivalent to $I$.

If $A$ is row equivalent to $I$ there exist elementary matrices $E_1, E_2, \dots, E_k$ with $k \in \mathbb{N}$ such that

$$
A = E_k E_{k-1} \cdots E_1 I = E_k E_{k-1} \cdots E_1.
$$

Since $E_i$ is invertible for $i \in \{1, \dots, k\}$ the product $E_k E_{k-1} \cdots E_1$ is also invertible, hence $A$ is nonsingular.

If $A$ is nonsingular then $A$ is row equivalent to $I$ and hence there exist elementary matrices $E_1, \dots, E_k$ such that

$$
E_k E_{k-1} \cdots E_1 A = I,
$$

multiplying both sides on the right by $A^{-1}$ obtains

$$
E_k E_{k-1} \cdots E_1 = A^{-1},
$$

which gives a method for computing $A^{-1}$.
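This method is the Gauss-Jordan procedure: applying the same row operations to the augmented matrix $(A \,|\, I)$ turns it into $(I \,|\, A^{-1})$. A sketch assuming NumPy (the helper `inverse_gauss_jordan` is illustrative; partial pivoting is added for numerical stability):

```python
import numpy as np

def inverse_gauss_jordan(A: np.ndarray) -> np.ndarray:
    """Invert A by row-reducing the augmented matrix (A | I) to (I | A^{-1})."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for i in range(n):
        # type 1: swap in the row with the largest pivot (for stability)
        p = i + np.argmax(np.abs(M[i:, i]))
        M[[i, p]] = M[[p, i]]
        # type 2: scale the pivot row so the pivot becomes 1
        M[i] /= M[i, i]
        # type 3: eliminate the pivot column from all other rows
        for j in range(n):
            if j != i:
                M[j] -= M[j, i] * M[i]
    return M[:, n:]

A = np.array([[2.0, 1.0], [5.0, 3.0]])
print(np.allclose(inverse_gauss_jordan(A), np.linalg.inv(A)))  # True
```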
# Matrix algebra
> *Theorem*: let $A, B$ and $C$ be matrices and $\alpha$ and $\beta$ be scalars. Each of the following statements is valid
>
> 1. $A + B = B + A$,
> 2. $(A + B) + C = A + (B + C)$,
> 3. $(AB)C = A(BC)$,
> 4. $A(B + C) = AB + AC$,
> 5. $(A + B)C = AC + BC$,
> 6. $(\alpha \beta) A = \alpha(\beta A)$,
> 7. $\alpha (AB) = (\alpha A)B = A (\alpha B)$,
> 8. $(\alpha + \beta)A = \alpha A + \beta A$,
> 9. $\alpha (A + B) = \alpha A + \alpha B$.

??? note "*Proof*:"

Will be added later.

In the case where an $n \times n$ matrix $A$ is multiplied by itself $k$ times it is convenient to use exponential notation: $AA \cdots A = A^k$.
> *Definition*: the $n \times n$ **identity matrix** is the matrix $I = (\delta_{ij})$, where
>
> $$
> \delta_{ij} = \begin{cases} 1 &\text{ if } i = j, \\ 0 &\text{ if } i \neq j.\end{cases}
> $$

Multiplication of an $n \times n$ matrix $A$ with the identity matrix obtains $A I = A$.

> *Definition*: an $n \times n$ matrix $A$ is said to be **nonsingular** or **invertible** if there exists a matrix $A^{-1}$ such that $AA^{-1} = A^{-1}A = I$. The matrix $A^{-1}$ is said to be a **multiplicative inverse** of $A$.

If $B$ and $C$ are both multiplicative inverses of $A$ then

$$
B = BI = B(AC) = (BA)C = IC = C,
$$

thus a matrix can have at most one multiplicative inverse.

> *Definition*: an $n \times n$ matrix is said to be **singular** if it does not have a multiplicative inverse.

Or similarly, an $n \times n$ matrix $A$ is singular if $A \mathbf{x} = \mathbf{0}$ for some nontrivial $\mathbf{x} \in \mathbb{R}^n \backslash \{\mathbf{0}\}$. For a nonsingular matrix $A$, $\mathbf{x} = \mathbf{0}$ is the only solution to $A \mathbf{x} = \mathbf{0}$.
> *Theorem*: if $A$ and $B$ are nonsingular $n \times n$ matrices, then $AB$ is also nonsingular and
>
> $$
> (AB)^{-1} = B^{-1} A^{-1}.
> $$

??? note "*Proof*:"

Let $A$ and $B$ be nonsingular $n \times n$ matrices. We verify that $B^{-1} A^{-1}$ is the multiplicative inverse of $AB$:

$$
(B^{-1} A^{-1})AB = B^{-1} (A^{-1} A) B = B^{-1} B = I, \\
AB(B^{-1} A^{-1}) = A (B B^{-1}) A^{-1} = A A^{-1} = I.
$$

> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix, the inverse of $A$ given by $A^{-1}$ is nonsingular.

??? note "*Proof*:"

Let $A$ be a nonsingular $n \times n$ matrix, $A^{-1}$ its inverse and $\mathbf{x} \in \mathbb{R}^n$ a vector. Suppose $A^{-1} \mathbf{x} = \mathbf{0}$, then

$$
\mathbf{x} = I \mathbf{x} = (A A^{-1}) \mathbf{x} = A(A^{-1} \mathbf{x}) = \mathbf{0},
$$

hence $A^{-1} \mathbf{x} = \mathbf{0}$ has only the trivial solution and $A^{-1}$ is nonsingular.

> *Theorem*: let $A$ be a nonsingular $n \times n$ matrix then the solution of the system $A\mathbf{x} = \mathbf{b}$ is $\mathbf{x} = A^{-1} \mathbf{b}$ with $\mathbf{x}, \mathbf{b} \in \mathbb{R}^n$.

??? note "*Proof*:"

Let $A$ be a nonsingular $n \times n$ matrix, $A^{-1}$ its inverse and $\mathbf{x}, \mathbf{b} \in \mathbb{R}^n$ vectors. Suppose $\mathbf{x} = A^{-1} \mathbf{b}$, then we have

$$
A \mathbf{x} = A (A^{-1} \mathbf{b}) = (A A^{-1}) \mathbf{b} = \mathbf{b}.
$$

> *Corollary*: the system $A \mathbf{x} = \mathbf{b}$ of $n$ linear equations in $n$ unknowns has a unique solution if and only if $A$ is nonsingular.
??? note "*Proof*:"

The proof follows from the above theorem.

> *Theorem*: let $A$ and $B$ be matrices and $\alpha$ and $\beta$ be scalars. Each of the following statements is valid
>
> 1. $(A^T)^T = A$,
> 2. $(\alpha A)^T = \alpha A^T$,
> 3. $(A + B)^T = A^T + B^T$,
> 4. $(AB)^T = B^T A^T$.

??? note "*Proof*:"

Will be added later.
# Matrix arithmetic
## Definitions

> *Definition*: let $A$ be an $m \times n$ *matrix* given by
>
> $$
> A = \begin{pmatrix} a_{11} & a_{12}& \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}
> $$
>
> with $a_{ij}$ referred to as the entries of $A$ or scalars in general, with $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$. For real entries in $A$ we may denote $A \in \mathbb{R}^{m \times n}$.

This matrix may be denoted in a shorter way by $A = (a_{ij})$.

> *Definition*: let $\mathbf{x}$ be a $1 \times n$ matrix, referred to as a *row vector*, given by
>
> $$
> \mathbf{x} = (x_1, x_2, \dots, x_n)
> $$
>
> with $x_i$ referred to as the entries of $\mathbf{x}$, with $i \in \{1, \dots, n\}$. For real entries we may denote $\mathbf{x} \in \mathbb{R}^n$.

<br>

> *Definition*: let $\mathbf{x}$ be an $n \times 1$ matrix, referred to as a *column vector*, given by
>
> $$
> \mathbf{x} = \begin{pmatrix}x_1 \\ x_2 \\ \vdots \\ x_n\end{pmatrix}
> $$
>
> with $x_i$ referred to as the entries of $\mathbf{x}$, with $i \in \{1, \dots, n\}$. Also for the column vector we have for real entries $\mathbf{x} \in \mathbb{R}^n$.

From these two definitions it may be observed that row and column vectors may be used interchangeably, however when using both it is important to state the difference. Best practice is to always work with row vectors and take the transpose if necessary.
## Matrix operations

> *Definition*: two $m \times n$ matrices $A$ and $B$ are said to be **equal** if $a_{ij} = b_{ij}$ for each $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$.

<br>

> *Definition*: if $A$ is an $m \times n$ matrix and $\alpha$ is a scalar, then $\alpha A$ is the $m \times n$ matrix whose $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$ entry is $\alpha a_{ij}$.

<br>

> *Definition*: if $A = (a_{ij})$ and $B = (b_{ij})$ are both $m \times n$ matrices, then the sum $A + B$ is the $m \times n$ matrix whose entry is $a_{ij} + b_{ij}$ for each ordered pair $(i,j) \in \{1, \dots, m\} \times \{1, \dots, n\}$.

If $A$ is an $m \times n$ matrix and $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then

$$
A \mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n,
$$

with $A = (\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n)$.

> *Definition*: if $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$ are vectors in $\mathbb{R}^m$ and $x_1, x_2, \dots, x_n$ are scalars, then a sum of the form
>
> $$
> x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n
> $$
>
> is said to be a **linear combination** of the vectors $\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n$.

<br>

> *Theorem*: a linear system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ can be written as a linear combination of the column vectors of $A$.

??? note "*Proof*:"

Will be added later.
## Transpose matrix

> *Definition*: the **transpose** of an $m \times n$ matrix $A$ is the $n \times m$ matrix $B$ defined by
>
> $$
> b_{ji} = a_{ij},
> $$
>
> for $j \in \{1, \dots, n\}$ and $i \in \{1, \dots, m\}$. The transpose of $A$ is denoted by $A^T$.

<br>

> *Definition*: an $n \times n$ matrix $A$ is said to be **symmetric** if $A^T = A$.

## Hermitian matrix

> *Definition*: the **conjugate transpose** of an $m \times n$ matrix $A$ is the $n \times m$ matrix $B$ defined by
>
> $$
> b_{ji} = \bar a_{ij},
> $$
>
> for $j \in \{1, \dots, n\}$ and $i \in \{1, \dots, m\}$. The **conjugate transpose** of $A$ is denoted by $A^H$.

<br>

> *Definition*: an $n \times n$ matrix $A$ is said to be **Hermitian** if $A^H = A$.

## Matrix multiplication

> *Definition*: if $A = (a_{ij})$ is an $m \times n$ matrix and $B = (b_{ij})$ is an $n \times r$ matrix, then the product $A B = C = (c_{ij})$ is the $m \times r$ matrix whose entries are defined by
>
> $$
> c_{ij} = \mathbf{a}_i \mathbf{b}_j = \sum_{k=1}^n a_{ik} b_{kj}
> $$
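The entry formula translates directly into three nested loops; a minimal sketch assuming NumPy (the helper `matmul` is illustrative, library routines are far faster in practice):

```python
import numpy as np

def matmul(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Naive matrix product: c_ij = sum_k a_ik b_kj."""
    m, n = A.shape
    n2, r = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, r))
    for i in range(m):
        for j in range(r):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])
print(np.allclose(matmul(A, B), A @ B))  # True
```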
# Orthogonality
## Orthogonal subspaces

> *Definition 1*: two subspaces $S$ and $T$ of an inner product space $V$ are **orthogonal** if
>
> $$
> \langle \mathbf{u}, \mathbf{v} \rangle = 0,
> $$
>
> for all $\mathbf{u} \in S$ and $\mathbf{v} \in T$. Orthogonality of $S$ and $T$ may be denoted by $S \perp T$.

The notion of orthogonality is only valid in vector spaces with a defined inner product.

> *Definition 2*: let $S$ be a subspace of an inner product space $V$. The set of all vectors in $V$ that are orthogonal to every vector in $S$ will be denoted by $S^\perp$. Which implies
>
> $$
> S^\perp = \{\mathbf{v} \in V \;|\; \langle \mathbf{v}, \mathbf{u} \rangle = 0 \; \forall \mathbf{u} \in S \}.
> $$
>
> The set $S^\perp$ is called the **orthogonal complement** of $S$.

For example the subspaces $X = \mathrm{span}(\mathbf{e}_1)$ and $Y = \mathrm{span}(\mathbf{e}_2)$ of $\mathbb{R}^3$ are orthogonal, but they are not orthogonal complements. Indeed,

$$
X^\perp = \mathrm{span}(\mathbf{e}_2, \mathbf{e}_3) \quad \text{and} \quad Y^\perp = \mathrm{span}(\mathbf{e}_1, \mathbf{e}_3).
$$

We may observe that if $S$ and $T$ are orthogonal subspaces of an inner product space $V$, then $S \cap T = \{\mathbf{0}\}$. Since for $\mathbf{v} \in S \cap T$ with $S \perp T$ we have $\langle \mathbf{v}, \mathbf{v} \rangle = 0$ and hence $\mathbf{v} = \mathbf{0}$.

Additionally, we may also observe that if $S$ is a subspace of an inner product space $V$, then $S^\perp$ is also a subspace of $V$. Since for $\mathbf{u} \in S^\perp$ and $a \in \mathbb{K}$ we have

$$
\langle a \mathbf{u}, \mathbf{v} \rangle = a \langle \mathbf{u}, \mathbf{v} \rangle = a \cdot 0 = 0,
$$

for all $\mathbf{v} \in S$, therefore $a \mathbf{u} \in S^\perp$.

If $\mathbf{u}_1, \mathbf{u}_2 \in S^\perp$ then

$$
\langle \mathbf{u}_1 + \mathbf{u}_2, \mathbf{v} \rangle = \langle \mathbf{u}_1, \mathbf{v} \rangle + \langle \mathbf{u}_2, \mathbf{v} \rangle = 0 + 0 = 0,
$$

for all $\mathbf{v} \in S$, and hence $\mathbf{u}_1 + \mathbf{u}_2 \in S^\perp$. Therefore $S^\perp$ is a subspace of $V$.
|
||||||
|
|
||||||
|
### Fundamental subspaces
|
||||||
|
|
||||||
|
Let $V$ be an Euclidean inner product space $V = \mathbb{R}^n$ with its inner product defined by the [scalar product](../inner-product-spaces/#euclidean-inner-product-spaces). With this definition of the inner product on $V$ the following theorem may be posed.
|
||||||
|
|
||||||
|
> *Theorem 1*: let $A$ be an $m \times n$ matrix, then
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> N(A) = R(A^T)^\perp,
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> and
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> N(A^T) = R(A)^\perp,
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> for all $A \in \mathbb{R}^{m \times n}$ with $R(A)$ denoting the column space of $A$ and $R(A^T)$ denoting the row space of $A$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"

    Let $A \in \mathbb{R}^{m \times n}$ with $R(A) = \mathrm{span}(\mathbf{a}_i)$ for $i \in \mathbb{N}[i \leq n]$ denoting the column space of $A$ and $R(A^T) = \mathrm{span}(\mathbf{\vec{a}}_j^T)$ for $j \in \mathbb{N}[j \leq m]$ denoting the row space of $A$, where $\mathbf{a}_i$ is the $i$th column and $\mathbf{\vec{a}}_j$ the $j$th row of $A$.

    For the first equation, let $\mathbf{v} \in R(A^T)^\perp$, then $\mathbf{v}^T \mathbf{\vec{a}}_j^T = 0$ which gives

    $$
    0 = \mathbf{v}^T \mathbf{\vec{a}}_j^T = \big(\mathbf{v}^T \mathbf{\vec{a}}_j^T \big)^T = \mathbf{\vec{a}}_j \mathbf{v},
    $$

    so $A \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A)$, which implies that $R(A^T)^\perp \subseteq N(A)$. Similarly, let $\mathbf{w} \in N(A)$, then $A \mathbf{w} = \mathbf{0}$ which gives

    $$
    0 = \mathbf{\vec{a}}_j \mathbf{w} = \big(\mathbf{\vec{a}}_j \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{\vec{a}}_j^T,
    $$

    and hence $\mathbf{w} \in R(A^T)^\perp$, which implies that $N(A) \subseteq R(A^T)^\perp$. Therefore $N(A) = R(A^T)^\perp$.

    For the second equation, let $\mathbf{v} \in R(A)^\perp$, then $\mathbf{v}^T \mathbf{a}_i = 0$ which gives

    $$
    0 = \mathbf{v}^T \mathbf{a}_i = \big(\mathbf{v}^T \mathbf{a}_i \big)^T = \mathbf{a}_i^T \mathbf{v},
    $$

    so $A^T \mathbf{v} = \mathbf{0}$ and hence $\mathbf{v} \in N(A^T)$, which implies that $R(A)^\perp \subseteq N(A^T)$. Similarly, let $\mathbf{w} \in N(A^T)$, then $A^T \mathbf{w} = \mathbf{0}$ which gives

    $$
    0 = \mathbf{a}_i^T \mathbf{w} = \big(\mathbf{a}_i^T \mathbf{w} \big)^T = \mathbf{w}^T \mathbf{a}_i,
    $$

    and hence $\mathbf{w} \in R(A)^\perp$, which implies that $N(A^T) \subseteq R(A)^\perp$. Therefore $N(A^T) = R(A)^\perp$.
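The theorem can also be checked numerically. A minimal sketch, assuming numpy is available; the sample matrix is a hypothetical example:

```python
import numpy as np

# Verify N(A) = R(A^T)^perp: the SVD yields orthonormal bases of the
# row space and the null space of A.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])        # rank 1, so dim N(A) = 2

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
row_basis = Vt[:rank].T                # columns span R(A^T)
null_basis = Vt[rank:].T               # columns span N(A)

print(np.allclose(A @ null_basis, 0.0))            # indeed in N(A)
print(np.allclose(row_basis.T @ null_basis, 0.0))  # orthogonal to row space
```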
This result is known as the fundamental theorem of linear algebra, and it can be used to prove the following theorem.

> *Theorem 2*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> \dim S + \dim S^\perp = n.
> $$
>
> Furthermore, if $\{\mathbf{v}_i\}_{i=1}^r$ is a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ is a basis of $S^\perp$, then $\{\mathbf{v}_i\}_{i=1}^n$ is a basis of $V$.
??? note "*Proof*:"

    If $S = \{\mathbf{0}\}$, then $S^\perp = V$ and

    $$
    \dim S + \dim S^\perp = 0 + n = n.
    $$

    If $S \neq \{\mathbf{0}\}$, then let $\{\mathbf{x}_i\}_{i=1}^r$ be a basis of $S$ and define $X \in \mathbb{R}^{r \times n}$ whose $i$th row is $\mathbf{x}_i^T$ for each $i$. The matrix $X$ has rank $r$ and $R(X^T) = S$. Then by theorem 1

    $$
    S^\perp = R(X^T)^\perp = N(X),
    $$

    and from the [rank-nullity theorem](../vector-spaces/#rank-and-nullity) it follows that

    $$
    \dim S^\perp = \dim N(X) = n - r,
    $$

    and therefore

    $$
    \dim S + \dim S^\perp = r + n - r = n.
    $$

    Let $\{\mathbf{v}_i\}_{i=1}^r$ be a basis of $S$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ be a basis of $S^\perp$. Suppose that

    $$
    c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r + c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n = \mathbf{0}.
    $$

    Let $\mathbf{u} = c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r$ and let $\mathbf{w} = c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n$. Then

    $$
    \mathbf{u} + \mathbf{w} = \mathbf{0}
    $$

    implies $\mathbf{u} = - \mathbf{w}$, and thus both elements must be in $S \cap S^\perp$. However, $S \cap S^\perp = \{\mathbf{0}\}$, therefore

    $$
    \begin{align*}
    c_1 \mathbf{v}_1 + \dots + c_r \mathbf{v}_r &= \mathbf{0}, \\
    c_{r+1} \mathbf{v}_{r+1} + \dots + c_n \mathbf{v}_n &= \mathbf{0}.
    \end{align*}
    $$

    Since $\{\mathbf{v}_i\}_{i=1}^r$ and $\{\mathbf{v}_i\}_{i=r+1}^n$ are linearly independent sets, all coefficients $c_i$ must vanish. Hence $\{\mathbf{v}_i\}_{i=1}^n$ is linearly independent and therefore forms a basis of $V$.
We may further extend this with the notion of a direct sum.

> *Definition 3*: if $U$ and $V$ are subspaces of a vector space $W$ and each $\mathbf{w} \in W$ can be written uniquely as
>
> $$
> \mathbf{w} = \mathbf{u} + \mathbf{v},
> $$
>
> with $\mathbf{u} \in U$ and $\mathbf{v} \in V$, then $W$ is the **direct sum** of $U$ and $V$, denoted by $W = U \oplus V$.
In the following theorem it is posed that the direct sum of a subspace and its orthogonal complement makes up the whole vector space, which extends the notion of theorem 2.

> *Theorem 3*: if $S$ is a subspace of the inner product space $V = \mathbb{R}^n$, then
>
> $$
> V = S \oplus S^\perp.
> $$

??? note "*Proof*:"

    Will be added later.
The following results emerge from these posed theorems.

> *Proposition 1*: let $S$ be a subspace of $V$, then $(S^\perp)^\perp = S$.

??? note "*Proof*:"

    Will be added later.
Recall that the system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b} \in R(A)$. Since $R(A) = N(A^T)^\perp$, we have the following result.

> *Proposition 2*: let $A \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, then either there is a vector $\mathbf{x} \in \mathbb{R}^n$ such that
>
> $$
> A \mathbf{x} = \mathbf{b},
> $$
>
> or there is a vector $\mathbf{y} \in \mathbb{R}^m$ such that
>
> $$
> A^T \mathbf{y} = \mathbf{0} \;\land\; \mathbf{y}^T \mathbf{b} \neq 0.
> $$

??? note "*Proof*:"

    Will be added later.
## Orthonormal sets

In working with an inner product space $V$, it is generally desirable to have a basis of mutually orthogonal unit vectors.

> *Definition 4*: the set of vectors $\{\mathbf{v}_i\}_{i=1}^n$ in an inner product space $V$ is **orthogonal** if
>
> $$
> \langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0,
> $$
>
> whenever $i \neq j$. Then $\{\mathbf{v}_i\}_{i=1}^n$ is said to be an **orthogonal set** of vectors.
For example, the standard basis vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ form an orthogonal set in $\mathbb{R}^3$.

> *Theorem 4*: if $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$, then the vectors $\{\mathbf{v}_i\}_{i=1}^n$ are linearly independent.

??? note "*Proof*:"

    Suppose that $\{\mathbf{v}_i\}_{i=1}^n$ is an orthogonal set of nonzero vectors in an inner product space $V$ and

    $$
    c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0},
    $$

    then

    $$
    c_1 \langle \mathbf{v}_j, \mathbf{v}_1 \rangle + \dots + c_n \langle \mathbf{v}_j, \mathbf{v}_n \rangle = 0,
    $$

    for $j \in \mathbb{N}[j \leq n]$, which gives $c_j \|\mathbf{v}_j\|^2 = 0$ and, since $\mathbf{v}_j \neq \mathbf{0}$, hence $c_j = 0$ for all $j \in \mathbb{N}[j \leq n]$.
We may go even further and consider a set of vectors that are orthogonal and have length $1$, that is, unit vectors.

> *Definition 5*: an **orthonormal** set of vectors is an orthogonal set of unit vectors.

Equivalently, the set $\{\mathbf{u}_i\}_{i=1}^n$ is orthonormal if and only if

$$
\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij},
$$

where

$$
\delta_{ij} = \begin{cases} 1 &\text{ for } i = j, \\ 0 &\text{ for } i \neq j.\end{cases}
$$
> *Theorem 5*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If
>
> $$
> \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
> $$
>
> then $c_i = \langle \mathbf{v}, \mathbf{u}_i \rangle$ for all $i \in \mathbb{N}[i \leq n]$.

??? note "*Proof*:"

    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let

    $$
    \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
    $$

    then we have

    $$
    \langle \mathbf{v}, \mathbf{u}_i \rangle = \Big\langle \sum_{j=1}^n c_j \mathbf{u}_j, \mathbf{u}_i \Big\rangle = \sum_{j=1}^n c_j \langle \mathbf{u}_j, \mathbf{u}_i \rangle = \sum_{j=1}^n c_j \delta_{ij} = c_i.
    $$

This implies that it is much easier to calculate the coordinates of a given vector with respect to an orthonormal basis.
> *Corollary 1*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$. If
>
> $$
> \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
> $$
>
> and
>
> $$
> \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
> $$
>
> then $\langle \mathbf{v}, \mathbf{w} \rangle = \sum_{i=1}^n a_i b_i$.

??? note "*Proof*:"

    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let

    $$
    \mathbf{v} = \sum_{i=1}^n a_i \mathbf{u}_i,
    $$

    and

    $$
    \mathbf{w} = \sum_{i=1}^n b_i \mathbf{u}_i,
    $$

    then by theorem 5 we have

    $$
    \langle \mathbf{v}, \mathbf{w} \rangle = \Big\langle \sum_{i=1}^n a_i \mathbf{u}_i, \mathbf{w} \Big\rangle = \sum_{i=1}^n a_i \langle \mathbf{w}, \mathbf{u}_i \rangle = \sum_{i=1}^n a_i b_i.
    $$
> *Corollary 2*: let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and
>
> $$
> \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
> $$
>
> then
>
> $$
> \|\mathbf{v}\|^2 = \sum_{i=1}^n c_i^2.
> $$

??? note "*Proof*:"

    Let $\{\mathbf{u}_i\}_{i=1}^n$ be an orthonormal basis of an inner product space $V$ and let

    $$
    \mathbf{v} = \sum_{i=1}^n c_i \mathbf{u}_i,
    $$

    then by corollary 1 we have

    $$
    \|\mathbf{v}\|^2 = \langle \mathbf{v}, \mathbf{v} \rangle = \sum_{i=1}^n c_i^2.
    $$
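Theorem 5 and corollary 2 are easily illustrated numerically. A small sketch, assuming numpy; the orthonormal basis below is a hypothetical example:

```python
import numpy as np

# An orthonormal basis of R^3: theorem 5 gives the coordinates as inner
# products and corollary 2 gives the identity for the squared norm.
u1 = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
u2 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2)
u3 = np.array([0.0, 0.0, 1.0])
U = np.column_stack([u1, u2, u3])

v = np.array([3.0, -2.0, 5.0])
c = U.T @ v                             # c_i = <v, u_i>

print(np.allclose(U @ c, v))            # v = sum_i c_i u_i
print(np.isclose(np.sum(c**2), v @ v))  # ||v||^2 = sum_i c_i^2
```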
### Orthogonal matrices

> *Definition 6*: an $n \times n$ matrix $Q$ is an **orthogonal matrix** if
>
> $$
> Q^T Q = I.
> $$

Orthogonal matrices have column vectors that form an orthonormal set in $\mathbb{R}^n$, as posed in the following theorem.

> *Theorem 6*: let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix, then $\{\mathbf{q}_i\}_{i=1}^n$ is an orthonormal set.
??? note "*Proof*:"

    Let $Q = (\mathbf{q}_1, \dots, \mathbf{q}_n)$ be an orthogonal matrix. Then

    $$
    Q^T Q = I,
    $$

    and hence $\mathbf{q}_i^T \mathbf{q}_j = \delta_{ij}$, such that for an inner product space with a scalar product we have

    $$
    \langle \mathbf{q}_i, \mathbf{q}_j \rangle = \delta_{ij},
    $$

    so the columns are mutually orthogonal unit vectors, which form an orthonormal set.

It follows that if $Q$ is an orthogonal matrix, then $Q$ is nonsingular and $Q^{-1} = Q^T$.

In general, scalar products are preserved under multiplication by an orthogonal matrix, since

$$
\langle Q \mathbf{u}, Q \mathbf{v} \rangle = (Q \mathbf{v})^T Q \mathbf{u} = \mathbf{v}^T Q^T Q \mathbf{u} = \langle \mathbf{u}, \mathbf{v} \rangle.
$$

In particular, if $\mathbf{u} = \mathbf{v}$ then $\|Q \mathbf{u}\|^2 = \|\mathbf{u}\|^2$ and hence $\|Q \mathbf{u}\| = \|\mathbf{u}\|$: multiplication by an orthogonal matrix preserves the lengths of vectors.
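These properties can be observed in a brief numerical sketch, assuming numpy; the plane rotation is a hypothetical example of an orthogonal matrix:

```python
import numpy as np

# A rotation matrix is orthogonal: Q^T Q = I, Q^{-1} = Q^T and lengths
# are preserved under multiplication by Q.
t = 0.7
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
u = np.array([2.0, -1.0])

print(np.allclose(Q.T @ Q, np.eye(2)))
print(np.allclose(np.linalg.inv(Q), Q.T))
print(np.isclose(np.linalg.norm(Q @ u), np.linalg.norm(u)))
```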
## Orthogonalization process

Let $\{\mathbf{a}_i\}_{i=1}^n$ be a basis of an inner product space $V$. We may use the Gram-Schmidt process to determine an orthonormal basis $\{\mathbf{q}_i\}_{i=1}^n$ of $V$.

Let $\mathbf{q}_1 = \frac{1}{\|\mathbf{a}_1\|} \mathbf{a}_1$ be the first step.

Then we may iterate the following step for $i \in \{2, \dots, n\}$:

$$
\begin{align*}
\mathbf{w} &= \mathbf{a}_i - \langle \mathbf{a}_i, \mathbf{q}_1 \rangle \mathbf{q}_1 - \dots - \langle \mathbf{a}_i, \mathbf{q}_{i-1} \rangle \mathbf{q}_{i-1}, \\
\mathbf{q}_i &= \frac{1}{\|\mathbf{w}\|} \mathbf{w}.
\end{align*}
$$
??? note "*Proof*:"

    Will be added later.
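A sketch of the process in code, assuming numpy; the helper `gram_schmidt` is hypothetical and uses the modified (in-place) ordering of the updates, which is equivalent in exact arithmetic but numerically more stable:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalise a list of linearly independent vectors."""
    v = [np.array(a, dtype=float) for a in vectors]
    q = []
    for i in range(len(v)):
        q_i = v[i] / np.linalg.norm(v[i])
        q.append(q_i)
        for j in range(i + 1, len(v)):
            v[j] = v[j] - (v[j] @ q_i) * q_i   # remove the q_i component
    return q

Q = np.column_stack(gram_schmidt([[1.0, 1.0, 0.0],
                                  [1.0, 0.0, 1.0],
                                  [0.0, 1.0, 1.0]]))
print(np.allclose(Q.T @ Q, np.eye(3)))         # an orthonormal basis
```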
## Least squares solutions of overdetermined systems

A standard technique in mathematical and statistical modeling is to find a least squares fit to a set of data points. This implies that the sum of squared errors between the model and the data points is minimized. A least squares problem can generally be formulated as an overdetermined linear system of equations.

For a system of equations $A \mathbf{x} = \mathbf{b}$ with $A \in \mathbb{R}^{m \times n}$, where $m, n \in \mathbb{N}[m>n]$ and $\mathbf{b} \in \mathbb{R}^m$, for each $\mathbf{x} \in \mathbb{R}^n$ a *residual* $\mathbf{r}: \mathbb{R}^n \to \mathbb{R}^m$ can be formed

$$
\mathbf{r}(\mathbf{x}) = \mathbf{b} - A \mathbf{x}.
$$

The distance between $\mathbf{b}$ and $A \mathbf{x}$ is then given by

$$
\| \mathbf{b} - A \mathbf{x} \| = \|\mathbf{r}(\mathbf{x})\|.
$$

We wish to find a vector $\mathbf{x} \in \mathbb{R}^n$ for which $\|\mathbf{r}(\mathbf{x})\|$ will be a minimum. A solution $\mathbf{\hat x}$ that minimizes $\|\mathbf{r}(\mathbf{x})\|$ is a *least squares solution* of the system $A \mathbf{x} = \mathbf{b}$. Do note that minimizing $\|\mathbf{r}(\mathbf{x})\|$ is equivalent to minimizing $\|\mathbf{r}(\mathbf{x})\|^2$.
> *Theorem 7*: let $S$ be a subspace of $\mathbb{R}^m$. For each $\mathbf{b} \in \mathbb{R}^m$, there exists a unique $\mathbf{p} \in S$ that satisfies
>
> $$
> \|\mathbf{b} - \mathbf{s}\| > \|\mathbf{b} - \mathbf{p}\|,
> $$
>
> for all $\mathbf{s} \in S\backslash\{\mathbf{p}\}$. Furthermore, $\mathbf{b} - \mathbf{p} \in S^\perp$.

??? note "*Proof*:"

    Will be added later.
If $\mathbf{p} = A \mathbf{\hat x}$ is the vector in $R(A)$ that is closest to $\mathbf{b}$, then it follows that

$$
\mathbf{b} - \mathbf{p} = \mathbf{b} - A \mathbf{\hat x} = \mathbf{r}(\mathbf{\hat x}),
$$

must be an element of $R(A)^\perp$. Thus, $\mathbf{\hat x}$ is a solution to the least squares problem if and only if

$$
\mathbf{r}(\mathbf{\hat x}) \in R(A)^\perp = N(A^T).
$$

Thus to solve for $\mathbf{\hat x}$ we have the *normal equations* given by

$$
A^T A \mathbf{x} = A^T \mathbf{b}.
$$
Uniqueness of $\mathbf{\hat x}$ can be obtained if $A^T A$ is nonsingular, which is posed in the following theorem.

> *Theorem 8*: let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$, then $A^T A$ is nonsingular.

??? note "*Proof*:"

    Let $A \in \mathbb{R}^{m \times n}$ be an $m \times n$ matrix with rank $n$. Let $\mathbf{v}$ be a solution of

    $$
    A^T A \mathbf{x} = \mathbf{0},
    $$

    then $A \mathbf{v} \in N(A^T)$, but we also have that $A \mathbf{v} \in R(A) = N(A^T)^\perp$. Since $N(A^T) \cap N(A^T)^\perp = \{\mathbf{0}\}$ it follows that

    $$
    A\mathbf{v} = \mathbf{0},
    $$

    so $\mathbf{v} = \mathbf{0}$, since $A$ has rank $n$ and thus linearly independent columns. Hence $A^T A \mathbf{x} = \mathbf{0}$ has only the trivial solution and $A^T A$ is nonsingular.
It follows that

$$
\mathbf{\hat x} = (A^T A)^{-1} A^T \mathbf{b},
$$

is the unique solution of the normal equations when $A$ has rank $n$ and, consequently, the unique least squares solution of the system $A \mathbf{x} = \mathbf{b}$.
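A minimal numerical sketch, assuming numpy; the data points are hypothetical. It fits a line $y = x_1 + x_2 t$ by solving the normal equations and compares against the library's least squares solver:

```python
import numpy as np

# An overdetermined system: four equations, two unknowns.
t = np.array([0.0, 1.0, 2.0, 3.0])
b = np.array([1.1, 1.9, 3.2, 3.9])
A = np.column_stack([np.ones_like(t), t])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # built-in least squares

print(np.allclose(x_normal, x_lstsq))             # the same solution
```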
115
docs/mathematics/linear-algebra/systems-of-linear-equations.md
Normal file
@ -0,0 +1,115 @@
# Systems of linear equations

> *Definition*: a *linear equation* in $n$ unknowns is an equation of the form
>
> $$
> a_1 x_1 + a_2 x_2 + \dots + a_n x_n = b,
> $$
>
> with $a_i, b \in \mathbb{C}$ the constants and $x_i \in \mathbb{C}$ the variables for $i \in \{1, \dots, n\}$.
>
> A *linear system* of $m$ equations in $n$ unknowns is then an $m \times n$ system of the form
>
> $$
> \begin{align*}
> &a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1, \\
> &a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2, \\
> &\vdots \\
> &a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m,
> \end{align*}
> $$
>
> with $a_{ij}, b_i \in \mathbb{C}$ for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$.

A system of linear equations may have one solution, no solution or infinitely many solutions. Think of two lines in Euclidean space that may intersect at one point (one solution), be parallel (no solution) or be the same line (infinitely many solutions). If the system has at least one solution it is referred to as consistent; if it has none it is referred to as inconsistent.
> *Definition*: two systems of equations involving the same variables are said to be **equivalent** if they have the same solution set.

A system may be transformed into an equivalent system by

1. changing the order of the equations,
2. multiplying an equation by a non-zero number,
3. and adding a multiple of an equation to another equation.

> *Definition*: a linear system is said to be *overdetermined* if there are more equations than unknowns. A linear system is said to be *underdetermined* if the opposite is true, there are fewer equations than unknowns.

Overdetermined systems are usually inconsistent, and a consistent underdetermined system always has infinitely many solutions.

> *Definition*: an $n \times n$ system is said to be in **strict triangular form** if in the $k$th equation the coefficients of the first $k-1$ variables are all zero and the coefficient of $x_k$ is nonzero, for $k \in \{1, \dots, n\}$ with $n \in \mathbb{N}$.
For example, the system given by

$$
\begin{align*}
3x_1 + 2x_2 + x_3 &= 1, \\
x_2 - x_3 &= 2, \\
2x_3 &= 4,
\end{align*}
$$

with $x_i \in \mathbb{C}$ for $i \in \{1,2,3\}$ is in strict triangular form. This system can be solved with *back substitution*: first $x_3 = 2$, then $x_2 = 4$ and finally $x_1 = -3$.
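Back substitution is easily expressed in code. A sketch, assuming numpy, applied to the system above:

```python
import numpy as np

U = np.array([[3.0, 2.0, 1.0],
              [0.0, 1.0, -1.0],
              [0.0, 0.0, 2.0]])
b = np.array([1.0, 2.0, 4.0])

n = len(b)
x = np.zeros(n)
for k in range(n - 1, -1, -1):   # solve from the last equation upwards
    x[k] = (b[k] - U[k, k + 1:] @ x[k + 1:]) / U[k, k]

print(x)  # [-3.  4.  2.]
```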
An $m \times n$ system of equations may be represented by an augmented matrix of the form

$$
\left( \begin{array}{cccc|c} a_{11} & a_{12} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} & b_m\end{array} \right)
$$

with $a_{ij}, b_i \in \mathbb{C}$ for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$.

It may be solved using the following elementary row operations, based on the equivalence transformations:

1. interchange two rows,
2. multiply a row by a nonzero real number,
3. and replace a row by its sum with a multiple of another row.
## Row echelon form

> *Definition*: a matrix is said to be in **row echelon form**
>
> * if the first nonzero entry in each nonzero row is 1, the pivots.
> * if row $k$ does not consist entirely of zeros, the number of leading zero entries in row $k+1$ is greater than the number of leading zero entries in row $k$.
> * if there are rows whose entries are all zero, they are below the rows having nonzero entries.

For example the following matrices are in row echelon form:

$$
\begin{pmatrix} 1 & 4 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & 1\end{pmatrix}, \qquad \begin{pmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \\ 0 & 0 & 0\end{pmatrix}, \qquad \begin{pmatrix} 1 & 3 & 1 & 0 \\ 0 & 0 & 1 & 3 \\ 0 & 0 & 0 & 0\end{pmatrix}.
$$

> *Definition*: the process of using row operations 1, 2 and 3 to transform a linear system into one whose augmented matrix is in row echelon form is called **Gaußian elimination**, obtaining a reduced matrix. The variables corresponding to the pivots of the reduced matrix are referred to as *lead variables* and the variables corresponding to the columns skipped in the process are referred to as *free variables*.
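A compact sketch of the elimination, assuming numpy; the helper `row_echelon` is hypothetical and adds partial pivoting for numerical stability:

```python
import numpy as np

def row_echelon(M):
    """Reduce an augmented matrix to row echelon form with pivots 1."""
    M = M.astype(float).copy()
    rows, cols = M.shape
    r = 0
    for c in range(cols - 1):                # last column is the RHS
        p = r + np.argmax(np.abs(M[r:, c]))  # partial pivoting
        if np.isclose(M[p, c], 0.0):
            continue                         # no pivot in this column
        M[[r, p]] = M[[p, r]]                # operation 1: swap rows
        M[r] = M[r] / M[r, c]                # operation 2: scale pivot to 1
        for i in range(r + 1, rows):         # operation 3: eliminate below
            M[i] -= M[i, c] * M[r]
        r += 1
        if r == rows:
            break
    return M

aug = np.array([[1.0, 2.0, 1.0, 3.0],
                [3.0, -1.0, -3.0, -1.0],
                [2.0, 3.0, 1.0, 4.0]])
print(row_echelon(aug))
```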
## Reduced row echelon form

> *Definition*: a matrix is said to be in **reduced row echelon form**
>
> * if the matrix is in row echelon form.
> * if the first nonzero entry in each row is the only nonzero entry in its column.

For example the following matrices are in reduced row echelon form:

$$
\begin{pmatrix}
1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1
\end{pmatrix}, \qquad \begin{pmatrix}
1 & 0 & 0 & 3 \\ 0 & 1 & 0 & 2 \\ 0 & 0 & 1 & 1
\end{pmatrix}, \qquad \begin{pmatrix}
0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0
\end{pmatrix}.
$$

The process of using elementary row operations to transform a matrix into reduced row echelon form is called *Gauß-Jordan reduction*.
## Homogeneous systems

> *Definition*: a system of linear equations is said to be *homogeneous* if the constants on the righthand side are all zero.

Homogeneous systems are always consistent due to their trivial solution: setting all the variables equal to zero.

> *Theorem*: an $m \times n$ homogeneous system of linear equations has a nontrivial solution if $n > m$.

??? note "*Proof*:"

    Since a homogeneous system is always consistent, the row echelon form of the matrix can have at most $m$ nonzero rows. Thus there are at most $m$ lead variables. Since there are $n$ variables altogether and $n > m$, there must be some free variables. The free variables can be assigned arbitrary values, and for each assignment of values to the free variables there is a solution of the system.
242
docs/mathematics/linear-algebra/tensors/tensor-formalism.md
Normal file
@ -0,0 +1,242 @@
# Tensor formalism

We have an $n \in \mathbb{N}$ finite dimensional vector space $V$ such that $\dim V = n$, with a basis $\{\mathbf{e}_i\}_{i=1}^n$ and a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}.$ In the following sections we make use of the Einstein summation convention introduced in [vector analysis](/en/physics/mathematical-physics/vector-analysis/curvilinear-coordinates/) and $\mathbb{K} = \mathbb{R} \lor \mathbb{K} = \mathbb{C}.$

## Definition

> *Definition 1*: a **tensor** is a multilinear mapping of the type
>
> $$
> \mathbf{T}: \underbrace{V^* \times \dots \times V^*}_p \times \underbrace{V \times \dots \times V}_q \to \mathbb{K},
> $$
>
> with $p, q \in \mathbb{N}$. Tensors are collectively denoted as
>
> $$
> \mathbf{T} \in \underbrace{V \otimes \dots \otimes V}_p \otimes \underbrace{V^* \otimes \dots \otimes V^*}_q = \mathscr{T}_q^p(V),
> $$
>
> with $\mathscr{T}_0^0(V) = \mathbb{K}$.

We refer to $\mathbf{T} \in \mathscr{T}_q^p(V)$ as a $(p, q)$-tensor; a mixed tensor of **contravariant rank** $p$ and **covariant rank** $q.$ It may be observed that $\dim \mathscr{T}_q^p (V) = n^{p+q}$ with $\dim V = n \in \mathbb{N}$.

It follows from definition 1 and by virtue of the isomorphism between $V^{**}$ and $V$ that $\mathbf{T} \in \mathscr{T}_1^0(V) = V^*$ is a covector and $\mathbf{T} \in \mathscr{T}_0^1(V) = V$ is a vector.
## Kronecker tensor

> *Definition 2*: let the **Kronecker tensor** $\mathbf{k} \in \mathscr{T}_1^1(V)$ be defined such that
>
> $$
> \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_j) = \delta^i_j,
> $$
>
> with $\delta_j^i$ the Kronecker symbol.

Let $\mathbf{\hat u} = u_i \mathbf{\hat e}^i \in V^*$ and $\mathbf{v} = v^j \mathbf{e}_j \in V$, then the tensor properties and the definition of the Kronecker tensor imply that

$$
\begin{align*}
\mathbf{k}(\mathbf{\hat u}, \mathbf{v}) &= \mathbf{k}(u_i \mathbf{\hat e}^i, v^j \mathbf{e}_j), \\
&= u_i v^j \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_j), \\
&= u_i v^j \delta^i_j, \\
&= u_i v^i.
\end{align*}
$$
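In components the evaluation above is a plain contraction. A one-line sketch, assuming numpy, with hypothetical component arrays:

```python
import numpy as np

u = np.array([1.0, -2.0, 0.5])   # covector holor u_i
v = np.array([3.0, 1.0, 4.0])    # vector holor v^i

print(np.einsum("i,i->", u, v))  # k(u^, v) = u_i v^i
```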
## Outer product

> *Definition 3*: the outer product $f \otimes g: X \times Y \to \mathbb{K}$ of two scalar functions $f: X \to \mathbb{K}$ and $g: Y \to \mathbb{K}$ is defined as
>
> $$
> (f \otimes g)(x,y) = f(x) g(y),
> $$
>
> for all $(x,y) \in X \times Y$.

The outer product is associative and distributive with respect to addition and scalar multiplication, but not commutative.

Note that although the same symbol is used for the outer product and the denotation of a tensor space, these are not equivalent.

The following statements are given with $p=q=r=s=1$ without loss of generality.
> *Definition 4*: the mixed $(p, q)$-tensor $\mathbf{e}_i \otimes \mathbf{\hat e}^j \in \mathscr{T}_q^p(V)$ is defined as
>
> $$
> (\mathbf{e}_i \otimes \mathbf{\hat e}^j)(\mathbf{\hat u}, \mathbf{v}) = \mathbf{k}(\mathbf{\hat u}, \mathbf{e}_i) \mathbf{k}(\mathbf{\hat e}^j, \mathbf{v}),
> $$
>
> for all $(\mathbf{\hat u}, \mathbf{v}) \in V^* \times V$.

From this definition the subsequent theorem follows naturally.

> *Theorem 1*: let $\mathbf{T} \in \mathscr{T}_q^p(V)$ be a tensor, then there exist **holors** $T_j^i \in \mathbb{K}$ such that
>
> $$
> \mathbf{T} = T^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j,
> $$
>
> with $T^i_j = \mathbf{T}(\mathbf{\hat e}^i, \mathbf{e}_j)$.

??? note "*Proof*:"

    Let $\mathbf{T} \in \mathscr{T}_q^p(V)$ such that

    $$
    \begin{align*}
    \mathbf{T}(\mathbf{\hat e}^i, \mathbf{e}_j) &= T^k_l (\mathbf{e}_k \otimes \mathbf{\hat e}^l)(\mathbf{\hat e}^i, \mathbf{e}_j), \\
    &= T^k_l \mathbf{k}(\mathbf{\hat e}^i, \mathbf{e}_k) \mathbf{k}(\mathbf{\hat e}^l,\mathbf{e}_j), \\
    &= T^k_l \delta^i_k \delta^l_j, \\
    &= T^i_j.
    \end{align*}
    $$

For $\mathbf{T} \in \mathscr{T}^0_q(V)$ it follows that there exist holors $T_i \in \mathbb{K}$ such that $\mathbf{T} = T_i \mathbf{\hat e}^i$ with $T_i = \mathbf{T}(\mathbf{e}_i)$; these are referred to as the **covariant components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.

For $\mathbf{T} \in \mathscr{T}^p_0(V)$ it follows that there exist holors $T^i \in \mathbb{K}$ such that $\mathbf{T} = T^i \mathbf{e}_i$ with $T^i = \mathbf{T}(\mathbf{\hat e}^i)$; these are referred to as the **contravariant components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.

If $\mathbf{T} \in \mathscr{T}^p_q(V)$, it follows that there exist holors $T^i_j \in \mathbb{K}$, which are coined the **mixed components** of $\mathbf{T}$ relative to a basis $\{\mathbf{e}_i\}$.

By definition tensors are basis independent. Holors are basis dependent.
> *Theorem 2*: let $\mathbf{S} \in \mathscr{T}^p_q(V)$ and $\mathbf{T} \in \mathscr{T}^r_s(V)$ be tensors with
>
> $$
> \mathbf{S} = S^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j \quad \land \quad \mathbf{T} = T^k_l \mathbf{e}_k \otimes \mathbf{\hat e}^l,
> $$
>
> then the outer product of $\mathbf{S}$ and $\mathbf{T}$ is given by
>
> $$
> \mathbf{S} \otimes \mathbf{T} = S^i_j T^k_l \mathbf{e}_i \otimes \mathbf{e}_k \otimes \mathbf{\hat e}^j \otimes \mathbf{\hat e}^l,
> $$
>
> with $\mathbf{S} \otimes \mathbf{T} \in \mathscr{T}^{p+r}_{q+s}(V)$.

??? note "*Proof*:"

    Let $\mathbf{S} \in \mathscr{T}^p_q(V)$ and $\mathbf{T} \in \mathscr{T}^r_s(V)$ with

    $$
    \mathbf{S} = S^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j \quad \land \quad \mathbf{T} = T^k_l \mathbf{e}_k \otimes \mathbf{\hat e}^l,
    $$

    then

    $$
    \begin{align*}
    \mathbf{S} \otimes \mathbf{T} &= S^i_j (\mathbf{e}_i \otimes \mathbf{\hat e}^j) \otimes T^k_l (\mathbf{e}_k \otimes \mathbf{\hat e}^l), \\
    &= S^i_j T^k_l \mathbf{e}_i \otimes \mathbf{e}_k \otimes \mathbf{\hat e}^j \otimes \mathbf{\hat e}^l,
    \end{align*}
    $$

    which takes two covectors and two vectors as arguments, therefore $\mathbf{S} \otimes \mathbf{T} \in \mathscr{T}^{p+r}_{q+s}(V)$.

We have from theorem 2 that the outer product of two tensors yields another tensor, with ranks adding up.
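Holor-wise the outer product is a plain product of components. A sketch, assuming numpy, with hypothetical holors on a 3-dimensional space:

```python
import numpy as np

S = np.arange(9.0).reshape(3, 3)          # holor S^i_j of a (1,1)-tensor
T = np.arange(9.0, 18.0).reshape(3, 3)    # holor T^k_l of a (1,1)-tensor

ST = np.einsum("ij,kl->ikjl", S, T)       # holor of S (x) T, a (2,2)-tensor
print(ST.shape)                           # (3, 3, 3, 3): ranks add up
print(np.isclose(ST[1, 2, 0, 1], S[1, 0] * T[2, 1]))  # componentwise product
```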
## Inner product

> *Definition 5*: an **inner product** on $V$ is a bilinear mapping $\bm{g}: V \times V \to \mathbb{K}$ which satisfies
>
> 1. for all $\mathbf{u}, \mathbf{v} \in V: \; \bm{g}(\mathbf{u}, \mathbf{v}) = \overline{\bm{g}}(\mathbf{v}, \mathbf{u}),$
> 2. for all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and $\lambda, \mu \in \mathbb{K}: \;\bm{g}(\mathbf{u}, \lambda \mathbf{v} + \mu \mathbf{w}) = \lambda \bm{g}(\mathbf{u}, \mathbf{v}) + \mu \bm{g}(\mathbf{u}, \mathbf{w}),$
> 3. for all $\mathbf{u} \in V\backslash \{\mathbf{0}\}: \bm{g}(\mathbf{u},\mathbf{u}) > 0,$
> 4. $\bm{g}(\mathbf{u},\mathbf{u}) = 0 \iff \mathbf{u} = \mathbf{0}.$

It may be observed that $\bm{g} \in \mathscr{T}_2^0(V)$. Unlike the Kronecker tensor, the existence of an inner product is never implied.

> *Definition 6*: let $G$ be the Gram matrix with its components $G \overset{\text{def}}= (g_{ij})$ defined as
>
> $$
> g_{ij} = \bm{g}(\mathbf{e}_i, \mathbf{e}_j).
> $$
For $\mathbf{u} = u^i \mathbf{e}_i, \mathbf{v} = v^j \mathbf{e}_j \in V$ we then have

$$
\begin{align*}
\bm{g}(\mathbf{u}, \mathbf{v}) &= \bm{g}(u^i \mathbf{e}_i, v^j \mathbf{e}_j), \\
&= u^i v^j \bm{g}(\mathbf{e}_i, \mathbf{e}_j), \\
&\overset{\text{def}}= u^i v^j g_{ij}.
\end{align*}
$$

> *Proposition 1*: the Gram matrix $G$ is symmetric and nonsingular such that
>
> $$
> g^{ik} g_{kj} = \delta^i_j,
> $$
>
> with $G^{-1} \overset{\text{def}}= (g^{ij})$.

??? note "*Proof*:"

    Let $G$ be the Gram matrix; symmetry of $G$ follows from definition 5. Suppose that $G$ is singular, then there exists $\mathbf{u} = u^i \mathbf{e}_i \in V \backslash \{\mathbf{0}\}$ such that $G \mathbf{u} = \mathbf{0} \implies u^i g_{ij} = 0$. As a result we find that

    $$
    \forall \mathbf{v} = v^j \mathbf{e}_j \in V: 0 = u^i g_{ij} v^j = u^i \bm{g}(\mathbf{e}_i, \mathbf{e}_j) v^j = \bm{g}(u^i \mathbf{e}_i, v^j \mathbf{e}_j) = \bm{g}(\mathbf{u}, \mathbf{v}),
    $$

    which contradicts the non-degeneracy of the (pseudo) inner product in definition 5.
> *Theorem 3*: there exists a bijective linear map $\mathbf{g}: V \to V^*$ with inverse $\mathbf{g}^{-1}$ such that
>
> 1. $\forall \mathbf{u}, \mathbf{v} \in V: \; \bm{g}(\mathbf{u}, \mathbf{v}) = \mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v})$,
> 2. $\forall \mathbf{\hat u} \in V^*, \mathbf{v} \in V: \; \bm{g}(\mathbf{g}^{-1}(\mathbf{\hat u}), \mathbf{v}) = \mathbf{k}(\mathbf{\hat u}, \mathbf{v})$,
>
> with $\mathbf{g}(\mathbf{v}) = G \mathbf{v}$ for all $\mathbf{v} \in V$.

??? note "*Proof*:"

    Let $\mathbf{u} \in V$ and let $\mathbf{\hat u} \in V^*$, suppose $\mathbf{\hat u}: \mathbf{v} \mapsto \bm{g}(\mathbf{u}, \mathbf{v})$, then we may define $\mathbf{g}: V \to V^*: \mathbf{u} \mapsto \mathbf{g}(\mathbf{u}) \overset{\text{def}} = \mathbf{\hat u}$.

    Let $\mathbf{v} \in V \backslash \{\mathbf{0}\}$ with $\mathbf{g}(\mathbf{v}) = \mathbf{0}$, then

    $$
    0 = \mathbf{k}(\mathbf{g}(\mathbf{v}), \mathbf{w}) \overset{\text{def}} = \bm{g}(\mathbf{v}, \mathbf{w}),
    $$

    for all $\mathbf{w} \in V$, which contradicts the non-degeneracy of the (pseudo) inner product in definition 5. Hence $\mathbf{g}$ is injective, and since $\dim V$ is finite $\mathbf{g}$ is also bijective.

    Let $\mathbf{u} = u^i \mathbf{e}_i, \mathbf{v} = v^j \mathbf{e}_j \in V$ and define $\mathbf{g}(\mathbf{e}_i) = \text{g}_{ij} \mathbf{\hat e}^j$ such that

    $$
    \mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v}) \overset{\text{def}} = \bm{g}(\mathbf{u}, \mathbf{v}) = g_{ij} u^i v^j,
    $$

    but also

    $$
    \mathbf{k}(\mathbf{g}(\mathbf{u}), \mathbf{v}) = \text{g}_{ij} u^i v^k\mathbf{k}(\mathbf{\hat e}^j, \mathbf{e}_k) = \text{g}_{ij} u^i v^k \delta^j_k = \text{g}_{ij} u^i v^j.
    $$

    Since $u^i, v^j \in \mathbb{K}$ are arbitrary it follows that $\text{g}_{ij} = g_{ij}$.

Consequently, the inverse $\mathbf{g}^{-1}: V^* \to V$ has the property $\mathbf{g}^{-1}(\mathbf{\hat u}) = G^{-1} \mathbf{\hat u}$ for all $\mathbf{\hat u} \in V^*$. The bijective linear map $\mathbf{g}$ is commonly known as the **metric** and $\mathbf{g}^{-1}$ as the **dual metric**.

It follows from theorem 3 that for $\mathbf{u} = u^i \mathbf{e}_i \in V$ and $\mathbf{\hat u} = u_i \mathbf{\hat e}^i \in V^*$ we have

$$
\mathbf{g}(\mathbf{u}) = g_{ij} u^i \mathbf{\hat e}^j = u_j \mathbf{\hat e}^j = \mathbf{\hat u},
$$

with $u_j = g_{ij} u^i$, and

$$
\mathbf{g}^{-1}(\mathbf{\hat u}) = g^{ij} u_i \mathbf{e}_j = u^j \mathbf{e}_j = \mathbf{u},
$$

with $u^j = g^{ij} u_i$.
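Lowering and raising indices with the Gram matrix is a one-line contraction in components. A sketch, assuming numpy; the Gram matrix below is a hypothetical example for a non-orthonormal basis:

```python
import numpy as np

G = np.array([[2.0, 1.0],
              [1.0, 3.0]])                 # g_ij, symmetric and nonsingular
G_inv = np.linalg.inv(G)                   # g^ij

u_up = np.array([1.0, -1.0])               # contravariant components u^i
u_down = np.einsum("ij,i->j", G, u_up)     # u_j = g_ij u^i

print(u_down)
print(np.allclose(np.einsum("ij,i->j", G_inv, u_down), u_up))  # raised back
```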
> *Definition 7*: the basis $\{\mathbf{e}_i\}$ of $V$ induces a **reciprocal basis** $\{\mathbf{g}^{-1}(\mathbf{\hat e}^i)\}$ of $V$ given by
>
> $$
> \mathbf{g}^{-1}(\mathbf{\hat e}^i) = g^{ij} \mathbf{e}_j.
> $$
>
> Likewise, the basis $\{\mathbf{\hat e}^i\}$ of $V^*$ induces a **reciprocal dual basis** $\{\mathbf{g}(\mathbf{e}_i)\}$ of $V^*$ given by
>
> $$
> \mathbf{g}(\mathbf{e}_i) = g_{ij} \mathbf{\hat e}^j.
> $$

So far, a vector space $V$ and its associated dual space $V^*$ have been introduced as a priori independent entities. An inner product provides us with an explicit mechanism to construct a bijective linear mapping between them, associating a covector with each vector by virtue of the metric.
193
docs/mathematics/linear-algebra/tensors/tensor-symmetries.md
Normal file
@ -0,0 +1,193 @@
# Tensor symmetries

We have an $n \in \mathbb{N}$ finite dimensional vector space $V$ such that $\dim V = n$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}$ and a pseudo inner product $\bm{g}$ on $V.$

## Symmetric tensors

> *Definition 1*: let $\pi = [\pi(1), \dots, \pi(k)]$ be any permutation of the set $\{1, \dots, k\}$, then $\mathbf{T} \in \mathscr{T}^0_q(V)$ is a **symmetric covariant** $q$-tensor if for all $\mathbf{v}_1, \dots, \mathbf{v}_q \in V$ we have
>
> $$
> \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}) = \mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}_q),
> $$
>
> with $k = q \in \mathbb{N}$.
>
> Likewise, $\mathbf{T} \in \mathscr{T}^p_0(V)$ is a **symmetric contravariant** $p$-tensor if for all $\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p \in V^*$ we have
>
> $$
> \mathbf{T}(\mathbf{\hat u}_{\pi(1)}, \dots, \mathbf{\hat u}_{\pi(p)}) = \mathbf{T}(\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p),
> $$
>
> with $k = p \in \mathbb{N}$.

This symmetry implies that the ordering of the (co)vector arguments in a tensor evaluation does not affect the outcome.

> *Definition 2*: the vector space of symmetric covariant $q$-tensors is denoted by $\bigvee_q(V) \subset \mathscr{T}^0_q(V)$ and the vector space of symmetric contravariant $p$-tensors is denoted by $\bigvee^p(V) \subset \mathscr{T}^p_0(V).$

Alternatively one may write $\bigvee_q(V) = V^* \otimes_s \cdots \otimes_s V^*$ and $\bigvee^p(V) = V \otimes_s \cdots \otimes_s V.$
## Antisymmetric tensors

> *Definition 3*: let $\pi = [\pi(1), \dots, \pi(k)]$ be any permutation of the set $\{1, \dots, k\}$, then $\mathbf{T} \in \mathscr{T}^0_q(V)$ is an **antisymmetric covariant** $q$-tensor if for all $\mathbf{v}_1, \dots, \mathbf{v}_q \in V$ we have
>
> $$
> \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}) = \mathrm{sign}(\pi) \mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}_q),
> $$
>
> with $k = q \in \mathbb{N}$.
>
> Likewise, $\mathbf{T} \in \mathscr{T}^p_0(V)$ is an **antisymmetric contravariant** $p$-tensor if for all $\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p \in V^*$ we have
>
> $$
> \mathbf{T}(\mathbf{\hat u}_{\pi(1)}, \dots, \mathbf{\hat u}_{\pi(p)}) = \mathrm{sign}(\pi)\mathbf{T}(\mathbf{\hat u}_1, \dots, \mathbf{\hat u}_p),
> $$
>
> with $k = p \in \mathbb{N}$.

This antisymmetry implies that the ordering of the (co)vector arguments in a tensor evaluation only changes the sign of the outcome.

> *Definition 4*: the vector space of antisymmetric covariant $q$-tensors is denoted by $\bigwedge_q(V) \subset \mathscr{T}^0_q(V)$ and the vector space of antisymmetric contravariant $p$-tensors is denoted by $\bigwedge^p(V) \subset \mathscr{T}^p_0(V).$

Alternatively one may write $\bigwedge_q(V) = V^* \otimes_a \cdots \otimes_a V^*$ and $\bigwedge^p(V) = V \otimes_a \cdots \otimes_a V.$

It follows from the definitions of symmetric and antisymmetric tensors that for $0$-tensors we have

$$
{\bigvee}_0(V) = {\bigvee}^0(V) = {\bigwedge}_0(V) = {\bigwedge}^0(V) = \mathbb{K}.
$$

Furthermore, for $1$-tensors we have

$$
{\bigvee}_1(V) = {\bigwedge}_1(V) = V^*,
$$

and

$$
{\bigvee}^1(V) = {\bigwedge}^1(V) = V.
$$
## Symmetrisation maps

The following statements are given for covariant $q$-tensors without loss of generality.

> *Definition 5*: the linear **symmetrisation map** $\mathscr{S}: \mathscr{T}^0_q(V) \to \bigvee_q(V)$ is given by
>
> $$
> \mathscr{S}(\mathbf{T})(\mathbf{v}_1, \dots, \mathbf{v}_q) = \frac{1}{q!} \sum_\pi \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$, in which the summation runs over all permutations $\pi$ of the set $\{1, \dots, q\}$.

Let $\mathbf{T} = T_{i_1 \cdots i_q} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \mathscr{T}^0_q(V)$, then we have $\mathscr{S}(\mathbf{T}) = T_{(i_1 \cdots i_q)} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \bigvee_q(V)$ with

$$
T_{(i_1 \cdots i_q)} = \frac{1}{q!} \sum_\pi T_{i_{\pi(1)} \cdots i_{\pi(q)}}.
$$

If $\mathbf{T} \in \bigvee_q(V)$ then $\mathbf{T} = \mathscr{S}(\mathbf{T})$. The symmetrisation map is idempotent such that $\mathscr{S} \circ \mathscr{S} = \mathscr{S}.$

> *Definition 6*: the linear **antisymmetrisation map** $\mathscr{A}: \mathscr{T}^0_q(V) \to \bigwedge_q(V)$ is given by
>
> $$
> \mathscr{A}(\mathbf{T})(\mathbf{v}_1, \dots, \mathbf{v}_q) = \frac{1}{q!} \sum_\pi \mathrm{sign}(\pi) \mathbf{T}(\mathbf{v}_{\pi(1)}, \dots, \mathbf{v}_{\pi(q)}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$, in which the summation runs over all permutations $\pi$ of the set $\{1, \dots, q\}$.

Let $\mathbf{T} = T_{i_1 \cdots i_q} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \mathscr{T}^0_q(V)$, then we have $\mathscr{A}(\mathbf{T}) = T_{[i_1 \cdots i_q]} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q} \in \bigwedge_q(V)$ with

$$
T_{[i_1 \cdots i_q]} = \frac{1}{q!} \sum_\pi \mathrm{sign}(\pi) T_{i_{\pi(1)} \cdots i_{\pi(q)}}.
$$

If $\mathbf{T} \in \bigwedge_q(V)$ then $\mathbf{T} = \mathscr{A}(\mathbf{T})$. The antisymmetrisation map is idempotent such that $\mathscr{A} \circ \mathscr{A} = \mathscr{A}.$
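The maps translate directly into operations on holors. A sketch, assuming numpy; the helpers `symmetrise` and `antisymmetrise` are hypothetical and average over all axis permutations:

```python
import math
from itertools import permutations
import numpy as np

def symmetrise(T):
    """Holor of S(T): average over all permutations of the indices."""
    perms = list(permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

def antisymmetrise(T):
    """Holor of A(T): signed average over all permutations of the indices."""
    total = np.zeros_like(T)
    for p in permutations(range(T.ndim)):
        sign = round(np.linalg.det(np.eye(T.ndim)[list(p)]))  # sign(pi)
        total = total + sign * np.transpose(T, p)
    return total / math.factorial(T.ndim)

T = np.arange(9.0).reshape(3, 3)                   # holor T_ij, q = 2
S, A = symmetrise(T), antisymmetrise(T)
print(np.allclose(S, S.T), np.allclose(A, -A.T))   # (anti)symmetric parts
print(np.allclose(symmetrise(S), S))               # idempotent
```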
## Symmetric product

The outer product does not preserve (anti)symmetry. For this reason alternative product operators are introduced which do preserve (anti)symmetry. The following statements are given for covariant tensors without loss of generality.

> *Definition 7*: the **symmetric product** between two tensors is defined as
>
> $$
> \mathbf{T} \vee \mathbf{S} = (q+s)! \cdot \mathscr{S}(\mathbf{T} \otimes \mathbf{S}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ and $\mathbf{S} \in \mathscr{T}^0_s(V)$ with $q,s \in \mathbb{N}$.

It follows from definition 7 that the symmetric product is associative, bilinear and symmetric. Subsequently, we may write a basis of $\bigvee_q(V)$ as

$$
\mathscr{S}(\mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q}) = \frac{1}{q!} \mathbf{\hat e}^{i_1} \vee \cdots \vee \mathbf{\hat e}^{i_q},
$$

with $\{1 \leq i_1 \leq \dots \leq i_q \leq n\}$.

Let $\mathbf{T} \in \bigvee_q(V)$ and $\mathbf{S} \in \bigvee_s(V)$, then it follows that

$$
\mathbf{T} \vee \mathbf{S} = \mathbf{S} \vee \mathbf{T}.
$$
> *Definition 8*: the **antisymmetric product** between two tensors is defined as
>
> $$
> \mathbf{T} \wedge \mathbf{S} = (q+s)! \cdot \mathscr{A}(\mathbf{T} \otimes \mathbf{S}),
> $$
>
> for all $\mathbf{T} \in \mathscr{T}^0_q(V)$ and $\mathbf{S} \in \mathscr{T}^0_s(V)$ with $q,s \in \mathbb{N}$.

It follows from definition 8 that the antisymmetric product is associative, bilinear and antisymmetric. Subsequently, we may write a basis of $\bigwedge_q(V)$ as

$$
\mathscr{A}(\mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_q}) = \frac{1}{q!} \mathbf{\hat e}^{i_1} \wedge \cdots \wedge \mathbf{\hat e}^{i_q},
$$

with $\{1 \leq i_1 < \dots < i_q \leq n\}$.

Let $\mathbf{T} \in \bigwedge_q(V)$ and $\mathbf{S} \in \bigwedge_s(V)$, then it follows that

$$
\mathbf{T} \wedge \mathbf{S} = (-1)^{qs} \mathbf{S} \wedge \mathbf{T}.
$$
> *Theorem 1*: the dimension of the vector space of symmetric covariant $q$-tensors is given by
>
> $$
> \dim \Big({\bigvee}_q(V) \Big) = \binom{n+q-1}{q},
> $$
>
> and for antisymmetric covariant $q$-tensors the dimension is given by
>
> $$
> \dim \Big({\bigwedge}_q(V) \Big) = \binom{n}{q}.
> $$

??? note "*Proof*:"

    Will be added later.

An interesting result of the definition of the symmetric and antisymmetric product is given in the theorem below.

> *Theorem 2*: let $\mathbf{\hat u}_{1,2} \in V^*$ be covectors, then the symmetric product of $\mathbf{\hat u}_1$ and $\mathbf{\hat u}_2$ may be given by
>
> $$
> (\mathbf{\hat u}_1 \vee \mathbf{\hat u}_2)(\mathbf{v}_1, \mathbf{v}_2) = \mathrm{perm}\big(\mathbf{k}(\mathbf{\hat u}_i, \mathbf{v}_j)\big),
> $$
>
> for all $(\mathbf{v}_1, \mathbf{v}_2) \in V \times V$ with $(i,j)$ denoting the entry of the matrix over which the permanent is taken.
>
> The antisymmetric product of $\mathbf{\hat u}_1$ and $\mathbf{\hat u}_2$ may be given by
>
> $$
> (\mathbf{\hat u}_1 \wedge \mathbf{\hat u}_2)(\mathbf{v}_1, \mathbf{v}_2) = \det \big(\mathbf{k}(\mathbf{\hat u}_i, \mathbf{v}_j) \big),
> $$
>
> for all $(\mathbf{v}_1, \mathbf{v}_2) \in V \times V$ with $(i,j)$ denoting the entry of the matrix over which the determinant is taken.

??? note "*Proof*:"

    Will be added later.

In some literature theorem 2 is used as the definition of the symmetric and antisymmetric products, from which the relation with the symmetrisation maps can be proven. Either approach is valid; here the products are defined in terms of the symmetrisation maps, as this is the more general choice.
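The determinant case of theorem 2 may be checked numerically. A sketch, assuming numpy, with hypothetical covectors and vectors in $\mathbb{R}^3$ and pairings computed as scalar products:

```python
import numpy as np

u1, u2 = np.array([1.0, 2.0, 0.0]), np.array([0.0, 1.0, -1.0])
v1, v2 = np.array([1.0, 0.0, 1.0]), np.array([2.0, 1.0, 0.0])

K = np.array([[u1 @ v1, u1 @ v2],
              [u2 @ v1, u2 @ v2]])           # matrix of pairings k(u_i, v_j)

# (u1 ^ u2)(v1, v2) = k(u1,v1) k(u2,v2) - k(u1,v2) k(u2,v1)
wedge = K[0, 0] * K[1, 1] - K[0, 1] * K[1, 0]
print(np.isclose(wedge, np.linalg.det(K)))   # True
```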
@ -0,0 +1,97 @@
# Tensor transformations

We have an $n \in \mathbb{N}$ finite dimensional vector space $V$ such that $\dim V = n$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}_{i=1}^n$ and a pseudo inner product $\bm{g}$ on $V.$

Let us introduce a different basis $\{\mathbf{f}_i\}_{i=1}^n$ of $V$ with a corresponding dual basis $\{\mathbf{\hat f}^i\}_{i=1}^n$ of $V^*$, which are related to the former basis $\{\mathbf{e}_i\}_{i=1}^n$ by

$$
\mathbf{f}_j = A^i_j \mathbf{e}_i,
$$

so that $\mathbf{\hat e}^i = A^i_j \mathbf{\hat f}^j$.
## Transformation of tensors

Recall from the section on [tensor formalism]() that a holor depends on the chosen basis, but the corresponding tensor itself does not. This implies that holors transform in a particular way under a change of basis, which is characteristic for tensors.

> *Theorem 1*: let $\mathbf{T} \in \mathscr{T}^p_q(V)$ be a tensor with $p=q=1$ without loss of generality and $B = A^{-1}$. Then $\mathbf{T}$ may be decomposed into
>
> $$
> \begin{align*}
> \mathbf{T} &= T^i_j \mathbf{e}_i \otimes \mathbf{\hat e}^j, \\
> &= \overline T^i_j \mathbf{f}_i \otimes \mathbf{\hat f}^j,
> \end{align*}
> $$
>
> with the holors related by
>
> $$
> \overline T^i_j = B^i_k A^l_j T^k_l.
> $$

??? note "*Proof*:"

    Will be added later.

The homogeneous nature of the tensor transformation implies that a holor equation of the form $T^i_j = 0$ holds relative to any basis if it holds relative to a particular one.
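The transformation rule is a double contraction of the holor. A sketch, assuming numpy; the basis transformation $A$ and the holor are hypothetical examples:

```python
import numpy as np

# Change of basis f_j = A^i_j e_i with B = A^{-1}; the holor of a
# (1,1)-tensor transforms as Tbar^i_j = B^i_k A^l_j T^k_l.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])      # invertible (det = 3)
B = np.linalg.inv(A)
T = np.arange(9.0).reshape(3, 3)     # holor T^k_l

T_bar = np.einsum("ik,lj,kl->ij", B, A, T)

# Transforming back recovers the original holor.
print(np.allclose(np.einsum("ik,lj,kl->ij", A, B, T_bar), T))
```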
## Transformation of volume forms

> *Lemma 1*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form with
>
> $$
> \begin{align*}
> \bm{\mu} &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \mu_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> then we have
>
> $$
> \overline \mu_{j_1 \dots j_n} = A^{i_1}_{j_1} \cdots A^{i_n}_{j_n} \mu_{i_1 \dots i_n} = \mu_{j_1 \dots j_n} \det (A).
> $$

??? note "*Proof*:"

    Will be added later.

Then $\det(A)$ is the volume scaling factor of the transformation with $A$, so that if $\bm{\mu}(\mathbf{e}_1, \dots, \mathbf{e}_n) = 1$, then $\bm{\mu}(\mathbf{f}_1, \dots, \mathbf{f}_n) = \det(A).$
> *Theorem 2*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form with
>
> $$
> \begin{align*}
> \bm{\mu} &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \mu_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> and if we define
>
> $$
> \overline \mu_{i_1 \dots i_n} \overset{\text{def}}{=} \frac{1}{\det (A)} A^{j_1}_{i_1} \cdots A^{j_n}_{i_n} \mu_{j_1 \dots j_n},
> $$
>
> then $\mu_{i_1 \dots i_n} = \overline \mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$ is an invariant holor.

??? note "*Proof*:"

    Will be added later.
## Transformation of Levi-Civita form

> *Theorem 3*: let $\bm{\epsilon} \in \bigwedge_n(V)$ be the Levi-Civita tensor with
>
> $$
> \begin{align*}
> \bm{\epsilon} &= \epsilon_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n}, \\
> &= \overline \epsilon_{i_1 \dots i_n} \mathbf{\hat f}^{i_1} \otimes \cdots \otimes \mathbf{\hat f}^{i_n},
> \end{align*}
> $$
>
> then $\epsilon_{i_1 \dots i_n} = \overline \epsilon_{i_1 \dots i_n}$ is an invariant holor.

??? note "*Proof*:"

    Will be added later.
131
docs/mathematics/linear-algebra/tensors/volume-forms.md
Normal file
@ -0,0 +1,131 @@
# Volume forms

We have an $n \in \mathbb{N}$ finite dimensional vector space $V$ such that $\dim V = n$, with a basis $\{\mathbf{e}_i\}_{i=1}^n,$ a corresponding dual space $V^*$ with a basis $\{\mathbf{\hat e}^i\}_{i=1}^n$ and a pseudo inner product $\bm{g}$ on $V.$

## n-forms

> *Definition 1*: let $\bm{\mu} \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$, if
>
> $$
> \bm{\mu}(\mathbf{e}_1, \dots, \mathbf{e}_n) = 1,
> $$
>
> then $\bm{\mu}$ is the **unit volume form** with respect to the basis $\{\mathbf{e}_i\}$.

Note that $\dim \bigwedge_n(V) = 1$ and consequently if $\bm{\mu}_1, \bm{\mu}_2 \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$, then $\bm{\mu}_1 = \lambda \bm{\mu}_2$ with $\lambda \in \mathbb{K}$.

> *Proposition 1*: the unit volume form $\bm{\mu} \in \bigwedge_n(V) \backslash \{\mathbf{0}\}$ may be given by
>
> $$
> \begin{align*}
> \bm{\mu} &= \mathbf{\hat e}^1 \wedge \dots \wedge \mathbf{\hat e}^n, \\
> &= \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \dots \otimes \mathbf{\hat e}^{i_n},
> \end{align*}
> $$
>
> with $\mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$.

??? note "*Proof*:"

    Will be added later.

The normalisation of the unit volume form $\bm{\mu}$ requires a basis. Consequently, the identification $\mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$ holds only relative to that basis.
> *Definition 2*: let $(V, \bm{\mu})$ denote the vector space $V$ endowed with an **oriented volume form** $\bm{\mu}$. For $\bm{\mu} > 0$ we have a positive orientation of $(V, \bm{\mu})$ and for $\bm{\mu} < 0$ we have a negative orientation of $(V, \bm{\mu})$.
|
||||||
|
|
||||||
|
For a vector space with an oriented volume $(V, \bm{\mu})$ we may write
|
||||||
|
|
||||||
|
$$
|
||||||
|
\bm{\mu} = \mu_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \cdots \otimes \mathbf{\hat e}^{i_n},
|
||||||
|
$$
|
||||||
|
|
||||||
|
or, equivalently
|
||||||
|
|
||||||
|
$$
|
||||||
|
\bm{\mu} = \mu_{|i_1 \dots i_n|} \mathbf{\hat e}^{i_1} \wedge \cdots \wedge \mathbf{\hat e}^{i_n},
|
||||||
|
$$
|
||||||
|
|
||||||
|
by convention, to resolve ambiguity with respect to the meaning of $\mu_{i_1 \dots i_n}$ without using another symbol or extra accents.
|
||||||
|
|
||||||
|
Using theorem 2 in the section of [tensor symmetries]() we may state the following.
|
||||||
|
|
||||||
|
> *Proposition 2*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form, then we have
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \bm{\mu}(\mathbf{v}_1, \dots, \mathbf{v}_n) = \det \big(\mathbf{k}(\mathbf{\hat e}^i, \mathbf{v}_j) \big),
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> for all $\mathbf{v}_1, \dots, \mathbf{v}_n \in V$ with $(i,j)$ denoting the entry of the matrix over which the determinant is taken.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
Which reveals the role of the Kronecker tensor and thus the role of the dual space in the definition of $\bm{\mu}$. We may also conclude that an oriented volume $\bm{\mu} \in \bigwedge_n(V)$ on a vector space $V$ does not require an inner product.
|
||||||
|
|
||||||
|
From proposition 2 it may also be observed that, in a geometrical context, the oriented volume form represents the signed area of a parallelogram for $n=2$ or the signed volume of a parallelepiped for $n=3$, spanned by its arguments.
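For instance, for $n = 2$ with $\mathbf{u} = u^i \mathbf{e}_i$ and $\mathbf{v} = v^j \mathbf{e}_j$ the unit volume form evaluates to

$$
\bm{\mu}(\mathbf{u}, \mathbf{v}) = \mu_{ij} u^i v^j = u^1 v^2 - u^2 v^1,
$$

the signed area of the parallelogram spanned by $\mathbf{u}$ and $\mathbf{v}$, with the sign encoding the orientation.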
|
||||||
|
|
||||||
|
## (n - k)-forms
|
||||||
|
|
||||||
|
> *Definition 3*: let $(V, \bm{\mu})$ be a vector space with an oriented volume form and let $\mathbf{u}_1, \dots, \mathbf{u}_k \in V$ with $k \in \mathbb{N}[k < n]$. Let the $(n-k)$-form $\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k \in \bigwedge_{n-k}(V)$ be defined as
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k(\mathbf{v}_{k+1}, \dots, \mathbf{v}_n) = \bm{\mu}(\mathbf{u}_1, \dots, \mathbf{u}_k, \mathbf{v}_{k+1}, \dots, \mathbf{v}_n),
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> for all $\mathbf{v}_{k+1}, \dots, \mathbf{v}_n \in V$ with $\lrcorner$ the insert operator.
|
||||||
|
|
||||||
|
It follows that the $(n-k)$-form $\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k \in \bigwedge_{n-k}(V)$ can be written as
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
\bm{\mu} \lrcorner \mathbf{u}_1 \lrcorner \dots \lrcorner \mathbf{u}_k &= u_1^{i_1} \cdots u_k^{i_k} (\bm{\mu} \lrcorner \mathbf{e}_{i_1} \lrcorner \dots \lrcorner \mathbf{e}_{i_k}), \\
|
||||||
|
&= u_1^{i_1} \cdots u_k^{i_k} \mu_{i_1 \dots i_n} (\mathbf{\hat e}^{i_{k+1}} \wedge \cdots \wedge \mathbf{\hat e}^{i_{n}}),
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
for $\mathbf{u}_1, \dots, \mathbf{u}_k \in V$ with $k \in \mathbb{N}[k < n]$ and decomposition by $\mathbf{u}_q = u_q^{i_q} \mathbf{e}_{i_q}$ for $q \in \mathbb{N}[q \leq k]$.
|
||||||
|
|
||||||
|
If we have a unit volume form $\bm{\mu}$ with respect to $\{\mathbf{e}_i\}$ then
|
||||||
|
|
||||||
|
$$
|
||||||
|
\bm{\mu}\lrcorner\mathbf{e}_1 \lrcorner \dots \lrcorner \mathbf{e}_k = \mathbf{\hat e}^{i_{k+1}} \wedge \cdots \wedge \mathbf{\hat e}^{i_n},
|
||||||
|
$$
|
||||||
|
|
||||||
|
for $k \in \mathbb{N}[k < n]$.
|
||||||
|
|
||||||
|
## Levi-Civita form
|
||||||
|
|
||||||
|
> *Definition 4*: let $(V, \bm{\mu})$ be a vector space with a unit volume form with invariant holor. Let $\bm{\epsilon} \in \bigwedge_n(V)$ be the **Levi-Civita tensor** which is the unique unit volume form of positive orientation defined as
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \bm{\epsilon} = \sqrt{g} \bm{\mu},
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> with $g \overset{\text{def}}{=} \det (G)$, the determinant of the [Gram matrix]().
|
||||||
|
|
||||||
|
Therefore, if we decompose the Levi-Civita tensor by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\bm{\epsilon} = \epsilon_{i_1 \dots i_n} \mathbf{\hat e}^{i_1} \otimes \dots \otimes \mathbf{\hat e}^{i_n} = \epsilon_{|i_1 \dots i_n|} \mathbf{\hat e}^{i_1} \wedge \dots \wedge \mathbf{\hat e}^{i_n},
|
||||||
|
$$
|
||||||
|
|
||||||
|
then we have $\epsilon_{i_1 \dots i_n} = \sqrt{g} \mu_{i_1 \dots i_n}$ and $\epsilon_{|i_1 \dots i_n|} = \sqrt{g}$.
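In particular, for an orthonormal basis with respect to a positive definite inner product the Gram matrix is the identity, so $g = 1$ and the Levi-Civita tensor coincides with the unit volume form: $\epsilon_{i_1 \dots i_n} = \mu_{i_1 \dots i_n} = [i_1, \dots, i_n]$.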
|
||||||
|
|
||||||
|
> *Theorem 2*: let $(V, \bm{\mu})$ be a vector space with a unit volume form with invariant holor. Let $\mathbf{g}(\bm{\epsilon}) \in \bigwedge^n(V)$ be the **reciprocal Levi-Civita tensor** which is given by
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathbf{g}(\bm{\epsilon}) = \frac{1}{\sqrt{g}} \bm{\mu}.
|
||||||
|
> $$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
If we decompose the reciprocal Levi-Civita tensor by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{g}(\bm{\epsilon}) = \epsilon^{i_1 \dots i_n} \mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_n} = \epsilon^{|i_1 \dots i_n|} \mathbf{e}_{i_1} \wedge \cdots \wedge \mathbf{e}_{i_n},
|
||||||
|
$$
|
||||||
|
|
||||||
|
then we have $\epsilon^{i_1 \dots i_n} = \frac{1}{\sqrt{g}} \mu^{i_1 \dots i_n}$ and $\epsilon^{|i_1 \dots i_n|} = \frac{1}{\sqrt{g}}$.
|
504
docs/mathematics/linear-algebra/vector-spaces.md
Normal file
|
@ -0,0 +1,504 @@
|
||||||
|
# Vector spaces
|
||||||
|
|
||||||
|
## Definition
|
||||||
|
|
||||||
|
> *Definition*: a **vector space** $V$ is a set on which the operations of addition and scalar multiplication are defined, such that for all vectors $\mathbf{u}$ and $\mathbf{v}$ in $V$ the vector $\mathbf{u} + \mathbf{v}$ is in $V$ and for each scalar $a$ the vector $a\mathbf{v}$ is in $V$, with the following axioms satisfied.
|
||||||
|
>
|
||||||
|
> 1. Associativity of vector addition: $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ for any $\mathbf{u},\mathbf{v}, \mathbf{w} \in V$.
|
||||||
|
> 2. Commutativity of vector addition: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ for any $\mathbf{u},\mathbf{v} \in V$.
|
||||||
|
> 3. Identity element of vector addition: $\exists \mathbf{0} \in V$ such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$.
|
||||||
|
> 4. Inverse element of vector addition: $\forall \mathbf{v} \in V \exists (-\mathbf{v}) \in V$ such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$.
|
||||||
|
> 5. Distributivity of scalar multiplication with respect to vector addition: $a(\mathbf{u} + \mathbf{v}) = a\mathbf{u} + a\mathbf{v}$ for any scalar $a$ and any $\mathbf{u}, \mathbf{v} \in V$.
|
||||||
|
> 6. Distributivity of scalar multiplication with respect to field addition: $(a + b) \mathbf{v} = a \mathbf{v} + b \mathbf{v}$ for any scalars $a$ and $b$ and any $\mathbf{v} \in V$.
|
||||||
|
> 7. Compatibility of scalar multiplication with field multiplication: $a(b\mathbf{v}) = (ab) \mathbf{v}$ for any scalars $a$ and $b$ and any $\mathbf{v} \in V$.
|
||||||
|
> 8. Identity element of scalar multiplication: $1 \mathbf{v} = \mathbf{v}$ for all $\mathbf{v} \in V$.
|
||||||
|
|
||||||
|
Some important properties of a vector space can be derived from this definition; a few of them are listed in the following proposition.
|
||||||
|
|
||||||
|
> *Proposition*: if $V$ is a vector space and $\mathbf{u}$, $\mathbf{v}$ are in $V$, then
|
||||||
|
>
|
||||||
|
> 1. $0 \mathbf{v} = \mathbf{0}$.
|
||||||
|
> 2. $\mathbf{u} + \mathbf{v} = \mathbf{0} \implies \mathbf{u} = - \mathbf{v}$.
|
||||||
|
> 3. $(-1)\mathbf{v} = - \mathbf{v}$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
For 1, suppose $\mathbf{v} \in V$ then it follows from axioms 3, 6 and 8
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v} = 1 \mathbf{v} = (1 + 0)\mathbf{v} = 1 \mathbf{v} + 0 \mathbf{v} = \mathbf{v} + 0\mathbf{v},
|
||||||
|
$$
|
||||||
|
|
||||||
|
therefore
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
-\mathbf{v} + \mathbf{v} &= - \mathbf{v} + (\mathbf{v} + 0\mathbf{v}) = (-\mathbf{v} + \mathbf{v}) + 0\mathbf{v}, \\
|
||||||
|
\mathbf{0} &= \mathbf{0} + 0\mathbf{v} = 0\mathbf{v}.
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
For 2, suppose for $\mathbf{u}, \mathbf{v} \in V$ that $\mathbf{u} + \mathbf{v} = \mathbf{0}$ then it follows from axioms 1, 3 and 4
|
||||||
|
|
||||||
|
$$
|
||||||
|
- \mathbf{v} = - \mathbf{v} + \mathbf{0} = - \mathbf{v} + (\mathbf{v} + \mathbf{u}),
|
||||||
|
$$
|
||||||
|
|
||||||
|
therefore
|
||||||
|
|
||||||
|
$$
|
||||||
|
-\mathbf{v} = (-\mathbf{v} + \mathbf{v}) + \mathbf{u} = \mathbf{0} + \mathbf{u} = \mathbf{u}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
For 3, suppose $\mathbf{v} \in V$ then it follows from 1 and axioms 4 and 6
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{0} = 0 \mathbf{v} = (1 + (-1))\mathbf{v} = 1\mathbf{v} + (-1)\mathbf{v},
|
||||||
|
$$
|
||||||
|
|
||||||
|
therefore
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v} + (-1)\mathbf{v} = \mathbf{0},
|
||||||
|
$$
|
||||||
|
|
||||||
|
from 2 it follows then that
|
||||||
|
|
||||||
|
$$
|
||||||
|
(-1)\mathbf{v} = -\mathbf{v}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
### Euclidean spaces
|
||||||
|
|
||||||
|
Perhaps the most elementary vector spaces are the Euclidean vector spaces $V = \mathbb{R}^n$ with $n \in \mathbb{N}$. Given a nonzero vector $\mathbf{u} \in \mathbb{R}^n$ defined by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{u} = \begin{pmatrix}u_1 \\ \vdots \\ u_n\end{pmatrix},
|
||||||
|
$$
|
||||||
|
|
||||||
|
it may be associated with the directed line segment from $(0, \dots, 0)$ to $(u_1, \dots, u_n)$, or more generally with any line segment of the same length and direction, from $(a_1, \dots, a_n)$ to $(a_1 + u_1, \dots, a_n + u_n)$. Vector addition and scalar multiplication in $\mathbb{R}^n$ are respectively defined by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{pmatrix} \quad \text{ and } \quad a \mathbf{u} = \begin{pmatrix} a u_1 \\ \vdots \\ a u_n \end{pmatrix},
|
||||||
|
$$
|
||||||
|
|
||||||
|
for any $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$ and any scalar $a$.
|
||||||
|
|
||||||
|
This can be extended to matrices with $V = \mathbb{R}^{m \times n}$ for $m,n \in \mathbb{N}$, the set of all $m \times n$ matrices. A matrix $A \in \mathbb{R}^{m \times n}$ is denoted by $A = (a_{ij})$. Matrix addition and scalar multiplication in $\mathbb{R}^{m \times n}$ are respectively defined by
|
||||||
|
|
||||||
|
$$
|
||||||
|
A + B = C \iff a_{ij} + b_{ij} = c_{ij} \quad \text{ and } \quad \alpha A = C \iff \alpha a_{ij} = c_{ij},
|
||||||
|
$$
|
||||||
|
|
||||||
|
for any $A, B, C \in \mathbb{R}^{m \times n}$ and any scalar $\alpha$.
|
||||||
|
|
||||||
|
### Function spaces
|
||||||
|
|
||||||
|
Let $F$ be a field and let $X$ be any set. The functions $X \to F$ can be given the structure of a vector space over $F$ where the operations are defined by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
(f + g)(x) = f(x) + g(x), \\
|
||||||
|
(af)(x) = af(x),
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
for any $f,g: X \to F$, any $x \in X$ and any $a \in F$.
|
||||||
|
|
||||||
|
### Polynomial spaces
|
||||||
|
|
||||||
|
Let $P_n$ denote the set of all polynomials of degree less than $n \in \mathbb{N}$ where the operations are defined by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
(p+q)(x) = p(x) + q(x), \\
|
||||||
|
(ap)(x) = ap(x),
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
for any $p, q \in P_n$, any $x \in \mathbb{R}$ and any scalar $a$.
|
||||||
|
|
||||||
|
## Vector subspaces
|
||||||
|
|
||||||
|
> *Definition*: if $S$ is a nonempty subset of a vector space $V$ and $S$ satisfies the conditions
|
||||||
|
>
|
||||||
|
> 1. $a \mathbf{u} \in S$ whenever $\mathbf{u} \in S$ for any scalar $a$.
|
||||||
|
> 2. $\mathbf{u} + \mathbf{v} \in S$ whenever $\mathbf{u}, \mathbf{v} \in S$.
|
||||||
|
>
|
||||||
|
> then $S$ is said to be a **subspace** of $V$.
|
||||||
|
|
||||||
|
In a vector space $V$ it can be readily verified that $\{\mathbf{0}\}$ and $V$ are subspaces of $V$. All other subspaces are referred to as *proper subspaces* and $\{\mathbf{0}\}$ is referred to as the *zero subspace*.
|
||||||
|
|
||||||
|
> *Theorem*: Every subspace of a vector space is a vector space.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
This may be proved by verifying that all axioms of a vector space remain valid for $S$: the two closure conditions guarantee that the operations are well defined on $S$; in particular $\mathbf{0} = 0\mathbf{u} \in S$ and $-\mathbf{u} = (-1)\mathbf{u} \in S$ follow from closure under scalar multiplication, and the remaining axioms are inherited from $V$.
|
||||||
|
|
||||||
|
|
||||||
|
### The null space of a matrix
|
||||||
|
|
||||||
|
> *Definition*: let $A \in \mathbb{R}^{m \times n}$, $\mathbf{x} \in \mathbb{R}^n$ and let $N(A)$ denote the set of all solutions of the homogeneous system $A\mathbf{x} = \mathbf{0}$. Therefore
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> N(A) = \{\mathbf{x} \in \mathbb{R}^n \;|\; A \mathbf{x} = \mathbf{0}\},
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> referred to as the null space of $A$.
|
||||||
|
|
||||||
|
We claim that $N(A)$ is a subspace of $\mathbb{R}^n$. Clearly $\mathbf{0} \in N(A)$ so $N(A)$ is nonempty. If $\mathbf{x} \in N(A)$ and $\alpha$ is a scalar then
|
||||||
|
|
||||||
|
$$
|
||||||
|
A(\alpha \mathbf{x}) = \alpha A\mathbf{x} = \alpha \mathbf{0} = \mathbf{0}
|
||||||
|
$$
|
||||||
|
|
||||||
|
and hence $\alpha \mathbf{x} \in N(A)$. If $\mathbf{x}, \mathbf{y} \in N(A)$ then
|
||||||
|
|
||||||
|
$$
|
||||||
|
A(\mathbf{x} + \mathbf{y}) = A\mathbf{x} + A\mathbf{y} = \mathbf{0} + \mathbf{0} = \mathbf{0}
|
||||||
|
$$
|
||||||
|
|
||||||
|
therefore $\mathbf{x} + \mathbf{y} \in N(A)$ and it follows that $N(A)$ is a subspace of $\mathbb{R}^n$.
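As a computational sketch (assuming the SymPy library is available; the matrix is an arbitrary illustrative choice), a basis for the null space of a concrete matrix can be obtained as follows.

```python
from sympy import Matrix

# A 2x3 matrix whose null space is a subspace of R^3.
A = Matrix([[1, 2, 1],
            [2, 4, 2]])

# nullspace() returns a basis for N(A) = {x | A x = 0}.
for v in A.nullspace():
    print(v.T)        # basis vectors, e.g. [-2, 1, 0] and [-1, 0, 1]
    print((A * v).T)  # [0, 0], confirming A v = 0
```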
|
||||||
|
|
||||||
|
|
||||||
|
### The span of a set of vectors
|
||||||
|
|
||||||
|
> *Definition*: let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be vectors in a vector space $V$ with $n \in \mathbb{N}$. A sum of the form
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> with scalars $a_1, \dots, a_n$ is called a **linear combination** of $\mathbf{v}_1, \dots, \mathbf{v}_n$.
|
||||||
|
>
|
||||||
|
> The set of all linear combinations of $\mathbf{v}_1, \dots, \mathbf{v}_n$ is called the **span** of $\mathbf{v}_1, \dots, \mathbf{v}_n$ which is denoted by $\text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$.
|
||||||
|
|
||||||
|
The null space, for example, can be described as the span of a set of vectors.
|
||||||
|
|
||||||
|
> *Theorem*: if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are vectors in a vector space $V$ with $n \in \mathbb{N}$ then $\text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ is a subspace of $V$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $b$ be a scalar and $\mathbf{u} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
with scalars $a_1, \dots, a_n$. Since
|
||||||
|
|
||||||
|
$$
|
||||||
|
b \mathbf{u} = (b a_1)\mathbf{v}_1 + \dots + (b a_n)\mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
it follows that $b \mathbf{u} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$.
|
||||||
|
|
||||||
|
If we also have $\mathbf{w} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
b_1 \mathbf{v}_1 + \dots + b_n \mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
with scalars $b_1, \dots, b_n$. Then
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{u} + \mathbf{w} = (a_1 + b_1) \mathbf{v}_1 + \dots + (a_n + b_n)\mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
it follows that $\mathbf{u} + \mathbf{w} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$, hence $\text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ is a subspace of $V$.
|
||||||
|
|
||||||
|
For example, a vector $\mathbf{x} \in \mathbb{R}^3$ is in $\text{span}(\mathbf{e}_1, \mathbf{e}_2)$ if and only if it lies in the $x_1 x_2$-plane in 3-space. Thus we can think of the $x_1 x_2$-plane as the geometrical representation of the subspace $\text{span}(\mathbf{e}_1, \mathbf{e}_2)$.
|
||||||
|
|
||||||
|
> *Definition*: the set $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ with $n \in \mathbb{N}$ is a spanning set for $V$ if and only if every vector in $V$ can be written as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_n$.
|
||||||
|
|
||||||
|
## Linear independence
|
||||||
|
|
||||||
|
We have the following observations.
|
||||||
|
|
||||||
|
> *Proposition*: if $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$ span a vector space $V$ and one of these vectors can be written as a linear combination of the other $n-1$ vectors then those $n-1$ vectors span $V$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Suppose $\mathbf{v}_n$ with $n \in \mathbb{N}$ can be written as a linear combination of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_{n-1}$ given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v}_n = a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Let $\mathbf{v}$ be any element of $V$. Since we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
\mathbf{v} &= b_1 \mathbf{v}_1 + \dots + b_{n-1} \mathbf{v}_{n-1} + b_n \mathbf{v}_n, \\
|
||||||
|
&= b_1 \mathbf{v}_1 + \dots + b_{n-1} \mathbf{v}_{n-1} + b_n (a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}), \\
|
||||||
|
&= (b_1 + b_n a_1)\mathbf{v}_1 + \dots + (b_{n-1} + b_n a_{n-1}) \mathbf{v}_{n-1},
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
we can write any vector $\mathbf{v} \in V$ as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_{n-1}$ and hence these vectors span $V$.
|
||||||
|
|
||||||
|
> *Proposition*: given $n$ vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$, it is possible to write one of the vectors as a linear combination of the other $n-1$ vectors if and only if there exist scalars $a_1, \dots, a_n$ not all zero such that
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0}.
|
||||||
|
> $$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Suppose that one of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ with $n \in \mathbb{N}$ can be written as a linear combination of the others
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v}_n = a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Subtracting $\mathbf{v}_n$ from both sides yields
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_1 \mathbf{v}_1 + \dots + a_{n-1} \mathbf{v}_{n-1} - \mathbf{v}_n = \mathbf{0},
|
||||||
|
$$
|
||||||
|
|
||||||
|
so with $a_n = -1$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_1 \mathbf{v}_1 + \dots + a_n\mathbf{v}_n = \mathbf{0}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
We may use these observations to state the following definitions.
|
||||||
|
|
||||||
|
> *Definition*: the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ in a vector space $V$ with $n \in \mathbb{N}$ are said to be **linearly independent** if
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0} \implies \forall i \in \{1, \dots, n\} [a_i = 0].
|
||||||
|
> $$
|
||||||
|
|
||||||
|
It follows from the above propositions that if $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is a minimal spanning set of a vector space $V$ then $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent. A minimal spanning set is called a basis of the vector space.
|
||||||
|
|
||||||
|
> *Definition*: the vectors $\mathbf{v}_1, \dots, \mathbf{v}_n$ in a vector space $V$ with $n \in \mathbb{N}$ are said to be **linearly dependent** if there exists scalars $a_1, \dots, a_n$ not all zero such that
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n = \mathbf{0}.
|
||||||
|
> $$
|
||||||
|
|
||||||
|
It follows from the above propositions that if a set of vectors is linearly dependent then at least one vector is a linear combination of the other vectors.
|
||||||
|
|
||||||
|
> *Theorem*: let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be vectors in $\mathbb{R}^n$ with $n \in \mathbb{N}$ and let $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$. The vectors $\mathbf{x}_1, \dots, \mathbf{x}_n$ will be linearly dependent if and only if $X$ is singular.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $\mathbf{x}_1, \dots, \mathbf{x}_n$ be vectors in $\mathbb{R}^n$ with $n \in \mathbb{N}$ and let $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$. A linear combination given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_1 \mathbf{x}_1 + \dots + a_n \mathbf{x}_n = \mathbf{0},
|
||||||
|
$$
|
||||||
|
|
||||||
|
can be rewritten as the matrix equation
|
||||||
|
|
||||||
|
$$
|
||||||
|
X\mathbf{a} = \mathbf{0},
|
||||||
|
$$
|
||||||
|
|
||||||
|
with $\mathbf{a} = (a_1, \dots, a_n)^T$. This equation will have a nontrivial solution if and only if $X$ is singular. Therefore $\mathbf{x}_1, \dots, \mathbf{x}_n$ will be linearly dependent if and only if $X$ is singular.
|
||||||
|
|
||||||
|
This result can be used to test whether $n$ vectors are linearly independent in $\mathbb{R}^n$ for $n \in \mathbb{N}$.
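A minimal sketch of this test (assuming NumPy; the vectors are an illustrative choice with $\mathbf{x}_3 = \mathbf{x}_1 + \mathbf{x}_2$):

```python
import numpy as np

# Columns of X are the vectors x_1, x_2, x_3 in R^3; here x_3 = x_1 + x_2.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 3.0],
              [3.0, 3.0, 6.0]])

# The vectors are linearly dependent iff X is singular, i.e. det(X) = 0.
print(np.isclose(np.linalg.det(X), 0.0))  # True
# matrix_rank gives the same conclusion more robustly: rank < n.
print(np.linalg.matrix_rank(X))           # 2
```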
|
||||||
|
|
||||||
|
> *Theorem*: let $\mathbf{v}_1, \dots, \mathbf{v}_n$ be vectors in a vector space $V$ with $n \in \mathbb{N}$. A vector $\mathbf{v} \in \text{span}(\mathbf{v}_1, \dots, \mathbf{v}_n)$ can be written uniquely as a linear combination of $\mathbf{v}_1, \dots, \mathbf{v}_n$ if and only if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly independent.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
If $\mathbf{v} \in \text{span}(\mathbf{v}_1, \dots \mathbf{v}_n)$ with $n \in \mathbb{N}$ then $\mathbf{v}$ can be written as a linear combination
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v} = a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Suppose that $\mathbf{v}$ can also be expressed as a linear combination
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v} = b_1 \mathbf{v}_1 + \dots + b_n \mathbf{v}_n.
|
||||||
|
$$
|
||||||
|
|
||||||
|
If $\mathbf{v}_1, \dots \mathbf{v}_n$ are linearly independent then subtracting both expressions yields
|
||||||
|
|
||||||
|
$$
|
||||||
|
(a_1 - b_1)\mathbf{v}_1 + \dots + (a_n - b_n)\mathbf{v}_n = \mathbf{0}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
By the linear independence of $\mathbf{v}_1, \dots \mathbf{v}_n$, the coefficients must all be 0, hence
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_1 = b_1,\; \dots \;, a_n = b_n
|
||||||
|
$$
|
||||||
|
|
||||||
|
therefore the representation of $\mathbf{v}$ is unique when $\mathbf{v}_1, \dots \mathbf{v}_n$ are linearly independent.
|
||||||
|
|
||||||
|
On the other hand, if $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly dependent, then there exist scalars $c_1, \dots, c_n$ not all zero with $c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n = \mathbf{0}$, so $b_i = a_i + c_i$ gives a second representation with $a_i \neq b_i$ for some $i \in \{1, \dots, n\}$. Therefore the representation of $\mathbf{v}$ is not unique when $\mathbf{v}_1, \dots, \mathbf{v}_n$ are linearly dependent.
|
||||||
|
|
||||||
|
## Basis and dimension
|
||||||
|
|
||||||
|
> *Definition*: the vectors $\mathbf{v}_1,\dots,\mathbf{v}_n \in V$ form a basis if and only if
|
||||||
|
>
|
||||||
|
> 1. $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly independent,
|
||||||
|
> 2. $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$.
|
||||||
|
|
||||||
|
Therefore a basis determines a vector space, but a given vector space does not have a unique basis.
|
||||||
|
|
||||||
|
> *Theorem*: if $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ is a spanning set for a vector space $V$, then any collection of $m$ vectors in $V$ where $m>n$, is linearly dependent.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $\mathbf{u}_1, \dots, \mathbf{u}_m \in V$, where $m > n$. Then since $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ span $V$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{u}_i = a_{i1} \mathbf{v}_1 + \dots + a_{in} \mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
for $i \in \{1, \dots, m\}$ and $j \in \{1, \dots, n\}$ with $a_{ij} \in \mathbb{R}$.
|
||||||
|
|
||||||
|
A linear combination $c_1 \mathbf{u}_1 + \dots + c_m \mathbf{u}_m$ can be written in the form
|
||||||
|
|
||||||
|
$$
|
||||||
|
c_1 \sum_{j=1}^n a_{1j} \mathbf{v}_j + \dots + c_m \sum_{j=1}^n a_{mj} \mathbf{v}_j,
|
||||||
|
$$
|
||||||
|
|
||||||
|
obtaining
|
||||||
|
|
||||||
|
$$
|
||||||
|
c_1 \mathbf{u}_1 + \dots + c_m \mathbf{u}_m = \sum_{i=1}^m \bigg( c_i \sum_{j=1}^n a_{ij} \mathbf{v}_j \bigg) = \sum_{j=1}^n \bigg(\sum_{i=1}^m a_{ij} c_i \bigg) \mathbf{v}_j.
|
||||||
|
$$
|
||||||
|
|
||||||
|
Considering the system of equations
|
||||||
|
|
||||||
|
$$
|
||||||
|
\sum_{i=1}^m a_{ij} c_i = 0
|
||||||
|
$$
|
||||||
|
|
||||||
|
for $j \in \{1, \dots, n\}$, this is a homogeneous system with more unknowns than equations. Therefore the system must have a nontrivial solution $(\hat c_1, \dots, \hat c_m)^T$, but then
|
||||||
|
|
||||||
|
$$
|
||||||
|
\hat c_1 \mathbf{u}_1 + \dots + \hat c_m \mathbf{u}_m = \sum_{j=1}^n 0 \mathbf{v}_j = \mathbf{0},
|
||||||
|
$$
|
||||||
|
|
||||||
|
hence $\mathbf{u}_1, \dots, \mathbf{u}_m$ are linearly dependent.
|
||||||
|
|
||||||
|
> *Corollary*: if both $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ and $\{\mathbf{u}_1,\dots,\mathbf{u}_m\}$ are bases for a vector space $V$, then $n = m$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let both $\{\mathbf{v}_1,\dots,\mathbf{v}_n\}$ and $\{\mathbf{u}_1,\dots,\mathbf{u}_m\}$ be bases for $V$. Since $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$ and $\mathbf{u}_1,\dots,\mathbf{u}_m$ are linearly independent, it follows that $m \leq n$; similarly $\mathbf{u}_1,\dots,\mathbf{u}_m$ span $V$ and $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly independent, so $n \leq m$. This implies $n = m$.
|
||||||
|
|
||||||
|
With this result we may now refer to the number of elements in any basis for a given vector space, which leads to the following definition.
|
||||||
|
|
||||||
|
> *Definition*: let $V$ be a vector space. If $V$ has a basis consisting of $n \in \mathbb{N}$ vectors, then $V$ has **dimension** $n$. The subspace $\{\mathbf{0}\}$ of $V$ is said to have dimension $0$. $V$ is said to be **finite dimensional** if there is a finite set of vectors that spans $V$, otherwise $V$ is **infinite dimensional**.
|
||||||
|
|
||||||
|
So a single nonzero vector spans exactly a one-dimensional subspace. For multiple vectors we have the following theorem.
|
||||||
|
|
||||||
|
> *Theorem*: if $V$ is a vector space of dimension $n \in \mathbb{N} \backslash \{0\}$, then
|
||||||
|
>
|
||||||
|
> 1. any set of $n$ linearly independent vectors spans $V$,
|
||||||
|
> 2. any $n$ vectors that span $V$ are linearly independent,
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
To prove 1, suppose that $\mathbf{v}_1,\dots,\mathbf{v}_n \in V$ are linearly independent and $\mathbf{v} \in V$. Since $V$ has dimension $n$, it has a basis consisting of $n$ vectors and these vectors span $V$. It follows that $\mathbf{v}_1,\dots,\mathbf{v}_n, \mathbf{v}$ must be linearly dependent. Thus there exist scalars $c_1, \dots, c_n, c_{n+1}$ not all zero, such that
|
||||||
|
|
||||||
|
$$
|
||||||
|
c_1 \mathbf{v}_1 + \dots + c_n \mathbf{v}_n + c_{n+1} \mathbf{v} = \mathbf{0}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
The scalar $c_{n+1}$ cannot be zero, since that would imply that $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly dependent, hence
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{v} = a_1 \mathbf{v}_1 + \dots + a_n \mathbf{v}_n,
|
||||||
|
$$
|
||||||
|
|
||||||
|
with
|
||||||
|
|
||||||
|
$$
|
||||||
|
a_i = - \frac{c_i}{c_{n+1}}
|
||||||
|
$$
|
||||||
|
|
||||||
|
for $i \in \{1, \dots, n\}$. Since $\mathbf{v}$ was an arbitrary vector in $V$ it follows that $\mathbf{v}_1, \dots, \mathbf{v}_n$ span $V$.
|
||||||
|
|
||||||
|
To prove 2, suppose that $\mathbf{v}_1,\dots,\mathbf{v}_n$ span $V$. If $\mathbf{v}_1,\dots,\mathbf{v}_n$ are linearly dependent, then one vector $\mathbf{v}_i$ can be written as a linear combination of the others, take $i=n$ without loss of generality. It follows that $\mathbf{v}_1,\dots,\mathbf{v}_{n-1}$ will still span $V$, which contradicts $\dim V = n$, therefore $\mathbf{v}_1, \dots, \mathbf{v}_n$ must be linearly independent.
|
||||||
|
|
||||||
|
Therefore no set of fewer than $n$ vectors can span $V$ if $\dim V = n$.
|
||||||
|
|
||||||
|
### Change of basis
|
||||||
|
|
||||||
|
> *Definition*: let $V$ be a vector space and let $E = \{\mathbf{e}_1, \dots, \mathbf{e}_n\}$ be an ordered basis for $V$. If $\mathbf{v}$ is any element of $V$, then $\mathbf{v}$ can be written in the form
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \mathbf{v} = v_1 \mathbf{e}_1 + \dots + v_n \mathbf{e}_n,
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> where $v_1, \dots, v_n \in \mathbb{R}$ are the **coordinates** of $\mathbf{v}$ relative to $E$.
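For example, in $\mathbb{R}^2$ the vector $\mathbf{v} = (3, 2)^T$ has coordinates $(3, 2)$ relative to the standard basis, while relative to the ordered basis $\{(1, 1)^T, (1, -1)^T\}$ its coordinates $(v_1, v_2)$ solve $v_1 + v_2 = 3$ and $v_1 - v_2 = 2$, giving $(\frac{5}{2}, \frac{1}{2})$.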
|
||||||
|
|
||||||
|
## Row space and column space
|
||||||
|
|
||||||
|
> *Definition*: if $A$ is an $m \times n$ matrix, the subspace of $\mathbb{R}^{n}$ spanned by the row vectors of $A$ is called the **row space** of $A$. The subspace of $\mathbb{R}^m$ spanned by the column vectors of $A$ is called the **column space** of $A$.
|
||||||
|
|
||||||
|
With the definition of a row space the following theorem may be posed.
|
||||||
|
|
||||||
|
> *Theorem*: two row equivalent matrices have the same row space.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $A$ and $B$ be two matrices. If $B$ is row equivalent to $A$, then $B$ can be formed from $A$ by a finite sequence of row operations. Thus the row vectors of $B$ must be linear combinations of the row vectors of $A$. Consequently, the row space of $B$ must be a subspace of the row space of $A$. Since $A$ is row equivalent to $B$, by the same reasoning the row space of $A$ is a subspace of the row space of $B$, hence the two row spaces are equal.
|
||||||
|
|
||||||
|
With the definition of a column space, a theorem posed in [systems of linear equations](systems-of-linear-equations.md) may be restated as follows.
|
||||||
|
|
||||||
|
> *Theorem*: a linear system $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is in the column space of $A$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
For the proof, see the initial proof in [systems of linear equations](systems-of-linear-equations.md).
|
||||||
|
|
||||||
|
With this restatement the following statements may be proposed.
|
||||||
|
|
||||||
|
> *Proposition*: let $A$ be an $m \times n$ matrix. The linear system $A \mathbf{x} = \mathbf{b}$ is consistent for every $\mathbf{b} \in \mathbb{R}^m$ if and only if the column vectors of $A$ span $\mathbb{R}^m$.
|
||||||
|
>
|
||||||
|
> The system $A \mathbf{x} = \mathbf{b}$ has at most one solution for every $\mathbf{b}$ if and only if the column vectors of $A$ are linearly independent.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $A$ be an $m \times n$ matrix. By the previous theorem, $A \mathbf{x} = \mathbf{b}$ is consistent if and only if $\mathbf{b}$ is in the column space of $A$; this holds for every $\mathbf{b} \in \mathbb{R}^m$ if and only if the column vectors of $A$ span $\mathbb{R}^m$. For the second statement, if $A \mathbf{x} = \mathbf{b}$ has at most one solution for every $\mathbf{b}$, then $A \mathbf{x} = \mathbf{0}$ can have only the trivial solution and hence the column vectors of $A$ must be linearly independent. Conversely, if the column vectors of $A$ are linearly independent, $A \mathbf{x} = \mathbf{0}$ has only the trivial solution. If $\mathbf{x}_1, \mathbf{x}_2$ were both solutions of $A \mathbf{x} = \mathbf{b}$ then $\mathbf{x}_1 - \mathbf{x}_2$ would be a solution of $A \mathbf{x} = \mathbf{0}$
|
||||||
|
|
||||||
|
$$
|
||||||
|
A(\mathbf{x}_1 - \mathbf{x}_2) = A\mathbf{x}_1 - A\mathbf{x}_2 = \mathbf{b} - \mathbf{b} = \mathbf{0}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
It follows that $\mathbf{x}_1 - \mathbf{x}_2 = \mathbf{0}$ and hence $\mathbf{x}_1 = \mathbf{x}_2$.
|
||||||
|
|
||||||
|
From these propositions the following corollary emerges.
|
||||||
|
|
||||||
|
> *Corollary*: an $n \times n$ matrix $A$ is nonsingular if and only if the column vectors of $A$ form a basis for $\mathbb{R}^n$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
An $n \times n$ matrix $A$ is nonsingular if and only if $A \mathbf{x} = \mathbf{0}$ has only the trivial solution, which by the previous theorem holds if and only if the column vectors of $A$ are linearly independent. Since $n$ linearly independent vectors in $\mathbb{R}^n$ span $\mathbb{R}^n$, this is the case if and only if the column vectors of $A$ form a basis for $\mathbb{R}^n$.
|
||||||
|
|
||||||
|
> *Theorem*: if $A$ is an $m \times n$ matrix, the dimension of the row space of $A$ equals the dimension of the column space of $A$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## Rank and nullity
|
||||||
|
|
||||||
|
> *Definition*: the **rank** of a matrix $A$, denoted as $\text{rank}(A)$, is the dimension of the row space of $A$.
|
||||||
|
|
||||||
|
The rank of a matrix may be determined by reducing the matrix to row echelon form. The nonzero rows of the row echelon matrix will form a basis for the row space. The rank may be interpreted as a measure of how far the matrix is from being singular.
|
||||||
|
|
||||||
|
> *Definition*: the **nullity** of a matrix $A$, denoted as $\text{nullity}(A)$, is the dimension of the null space of $A$.
|
||||||
|
|
||||||
|
The nullity of $A$ is the number of columns without a pivot in the reduced echelon form.
|
||||||
|
|
||||||
|
> *Theorem*: if $A$ is an $m \times n$ matrix, then
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> \text{rank}(A) + \text{nullity}(A) = n.
|
||||||
|
> $$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $U$ be the reduced echelon form of $A$. The system $A \mathbf{x} = \mathbf{0}$ is equivalent to the system $U \mathbf{x} = \mathbf{0}$. If $A$ has rank $r$, then $U$ will have $r$ nonzero rows and consequently the system $U \mathbf{x} = \mathbf{0}$ will involve $r$ pivots and $n - r$ free variables. The dimension of the null space will equal the number of free variables, hence $\text{rank}(A) + \text{nullity}(A) = r + (n - r) = n$.
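A quick numerical check of this identity (a sketch assuming SymPy; the matrix is an illustrative choice with one redundant row):

```python
from sympy import Matrix

# A 3x4 matrix with row 3 = row 1 + row 2, so n = 4.
A = Matrix([[1, 2, 1, 0],
            [2, 4, 0, 2],
            [3, 6, 1, 2]])

rank = A.rank()
nullity = len(A.nullspace())
print(rank, nullity, rank + nullity)  # 2 2 4, i.e. rank + nullity = n
```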
|
56
docs/mathematics/logic.md
Normal file
|
@ -0,0 +1,56 @@
|
||||||
|
# Logic
|
||||||
|
|
||||||
|
> *Definition*: a statement is a sentence that is either true or false, never both.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Definition* **- Logical operators**: let $A$ and $B$ be assertions.
|
||||||
|
>
|
||||||
|
> * The assertion $A$ and $B$ ($A \land B$) is true, iff both $A$ and $B$ are true.
|
||||||
|
> * The assertion $A$ or $B$ ($A \lor B$) is true, iff at least one of $A$ and $B$ is true.
|
||||||
|
> * The negation of $A$ ($\neg A$) is true iff $A$ is false.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Definition* **- Implies**: if $A$ and $B$ are assertions then the assertion if $A$ then $B$ ($A \implies B$) is true iff
|
||||||
|
>
|
||||||
|
> * $A$ is true and $B$ is true,
|
||||||
|
> * $A$ is false and $B$ is true,
|
||||||
|
> * $A$ is false and $B$ is false.
|
||||||
|
>
|
||||||
|
> This also works the opposite way: if $B$ then $A$ ($A \Longleftarrow B$).
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Definition* **- If and only if**: if $A$ and $B$ are assertions then the assertion $A$ if and only if $B$ ($A \iff B$) is true iff
|
||||||
|
>
|
||||||
|
> * $(A \Longleftarrow B) \land (A \implies B)$.
|
||||||
|
>
|
||||||
|
> This leads to the following table.
|
||||||
|
|
||||||
|
| $A$ | $B$ | $A \implies B$ | $A \Longleftarrow B$ | $A \iff B$|
|
||||||
|
| :---: | :---: | :------------: | :------------------: | :-------: |
|
||||||
|
| true | true | true | true | true |
|
||||||
|
| true | false | false | true | false |
|
||||||
|
| false | true | true | false | false |
|
||||||
|
| false | false | true | true | true |
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Definition*: suppose $P$ and $Q$ are assertions. $P$ implies $Q$ if $P \implies Q$ is true. $P$ and $Q$ are equivalent if $P$ implies $Q$ and $Q$ implies $P$.
|
||||||
|
|
||||||
|
## Methods of proof
|
||||||
|
|
||||||
|
> *Direct proof*: for proving $P \implies Q$ only consider the case where $P$ is true.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Proof by contraposition*: proving $P \implies Q$ to be true by showing that $\neg Q \implies \neg P$ is true.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Proof by contradiction*: to prove a statement $P$, assume $\neg P$ is true and deduce a contradiction with some statement that is known to be true.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
> *Proof by cases*: dividing a proof into cases which makes use of the equivalence of $(P \lor Q) \implies R$ and $(P \implies R) \land (Q \implies R)$. Which together cover all situations under consideration.
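The equivalence used in the proof by cases can be checked by brute force over all truth values (a small Python sketch; `implies` is a hypothetical helper standing for $\implies$):

```python
from itertools import product

def implies(p, q):
    # p => q is false only when p is true and q is false.
    return (not p) or q

# Verify (P or Q) => R  is equivalent to  (P => R) and (Q => R).
for p, q, r in product([True, False], repeat=3):
    assert implies(p or q, r) == (implies(p, r) and implies(q, r))
print("equivalent in all 8 cases")
```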
|
151
docs/mathematics/multivariable-calculus/differentation.md
Normal file
|
@ -0,0 +1,151 @@
|
||||||
|
# Differentiation
|
||||||
|
|
||||||
|
Generalization of derivatives to higher dimensions:
|
||||||
|
|
||||||
|
* limit of difference quotient: partial derivatives,
|
||||||
|
* linearization: total derivative.
|
||||||
|
|
||||||
|
## Partial derivatives
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and let $\mathbf{a} \in D$; if the limits exist, the partial derivatives of $f$ are
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
&\partial_1 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1 + h, a_2) - f(\mathbf{a})}{h}, \\
|
||||||
|
&\partial_2 f(\mathbf{a}) := \lim_{h \to 0} \frac{f(a_1, a_2 + h) - f(\mathbf{a})}{h}.
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
*Theorem*: suppose that two mixed $n$th order partial derivatives of a function $f$ involve the same differentiations but in different orders. If those partials are continuous at a point $\mathbf{a}$ and if $f$ and all partials of $f$ of order less than $n$ are continuous in a neighbourhood of $\mathbf{a}$, then the two mixed partials are equal at the point $\mathbf{a}$. We have for $n=2$
|
||||||
|
|
||||||
|
$$
|
||||||
|
\partial_{12} f(\mathbf{a}) = \partial_{21} f(\mathbf{a}).
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## Total derivatives
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) and let $f: D \to \mathbb{R}$, determining an affine linear approximation of $f$ around $\mathbf{a} \in D$
|
||||||
|
|
||||||
|
$$
|
||||||
|
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle L,\; \mathbf{x} - \mathbf{a} \big\rangle,
|
||||||
|
$$
|
||||||
|
|
||||||
|
with $f(\mathbf{x}) = p(\mathbf{x}) + r(\mathbf{x})$, we demand $\frac{r(\mathbf{x})}{\|\mathbf{x} - \mathbf{a}\|} \to 0$ as $\mathbf{x} \to \mathbf{a}$.
|
||||||
|
|
||||||
|
If $L \in \mathbb{R}^2$ exists to satisfy this, then $f$ is called totally differentiable in $\mathbf{a}$.
|
||||||
|
|
||||||
|
*Theorem*: if $f$ is totally differentiable in $\mathbf{a}$, then $f$ is partially differentiable in $\mathbf{a}$ and the partial derivatives are
|
||||||
|
|
||||||
|
$$
|
||||||
|
\partial_1 f(\mathbf{a}) = L_1, \qquad \partial_2 f(\mathbf{a}) = L_2,
|
||||||
|
$$
|
||||||
|
|
||||||
|
obtaining
|
||||||
|
|
||||||
|
$$
|
||||||
|
p(\mathbf{x}) = f(\mathbf{a}) + \big\langle \nabla f(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle.
|
||||||
|
$$
|
||||||
|
|
||||||
|
with $\nabla f(\mathbf{a})$ the gradient of $f$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## Chain rule
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ and let $\mathbf{x}: \mathbb{R} \to D$ be a differentiable curve; also let $g: \mathbb{R} \to \mathbb{R}$ be given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
g(t) = f\big(\mathbf{x}(t)\big),
|
||||||
|
$$
|
||||||
|
|
||||||
|
if $f$ is continuously differentiable, then $g$ is differentiable with
|
||||||
|
|
||||||
|
$$
|
||||||
|
g'(t) = \big\langle \nabla f\big(\mathbf{x}(t)\big),\; \mathbf{\dot x}(t) \big\rangle.
|
||||||
|
$$
|
||||||
|
|
||||||
|
## Gradients
|
||||||
|
|
||||||
|
*Definition*: at any point $\mathbf{x} \in D$ where the first partial derivatives of $f$ exist, we define the gradient vector $\nabla f(\mathbf{x})$ by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\nabla f(\mathbf{x}) = \begin{pmatrix} \partial_1 f(\mathbf{x}) \\ \partial_2 f(\mathbf{x}) \end{pmatrix}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
The direction of the gradient is the direction of steepest increase of $f$ at $\mathbf{x}$.
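For example, for $f(x, y) = x^2 + y^2$ we have $\nabla f(\mathbf{x}) = (2x,\; 2y)^T$, which points radially outward: the direction of steepest increase at every point, orthogonal to the circular level lines $x^2 + y^2 = c$.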
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
*Theorem*: gradients are orthogonal to level lines and level surfaces.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Let $\mathbf{r}(t) = \big(x(t),\; y(t) \big)^T$ be a parameterization of the level curve of $f$ such that $\mathbf{r}(0) = \mathbf{a}$. Then for all $t$ near $0$, $f(\mathbf{r}(t)) = f(\mathbf{a})$. Differentiating this equation with respect to $t$ using the chain rule, we obtain
|
||||||
|
|
||||||
|
$$
|
||||||
|
\partial_1 f\big(\mathbf{r}(t)\big) \dot x(t) + \partial_2 f\big(\mathbf{r}(t)\big) \dot y(t) = 0,
|
||||||
|
$$
|
||||||
|
|
||||||
|
at $t=0$, we can rewrite this to
|
||||||
|
|
||||||
|
$$
|
||||||
|
\big\langle \nabla f(\mathbf{a}),\; \mathbf{\dot r}(0) \big\rangle = 0,
|
||||||
|
$$
|
||||||
|
|
||||||
|
obtaining that $\nabla f$ is orthogonal to $\mathbf{\dot r}$.
|
||||||
|
|
||||||
|
## Directional derivatives
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^n$ and let $f: D \to \mathbb{R}$ with $\mathbf{v} \in \mathbb{R}^n$, $\|\mathbf{v}\| = 1$ a unit vector. The directional derivative is then the change of $f$ near a point $\mathbf{a} \in D$ in the direction of $\mathbf{v}$
|
||||||
|
|
||||||
|
$$
|
||||||
|
D_\mathbf{v} f(\mathbf{a}) = \big\langle \mathbf{v},\; \nabla f(\mathbf{a}) \big\rangle.
|
||||||
|
$$
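For example, for $f(x, y) = x^2 + y^2$ and $\mathbf{v} = \frac{1}{\sqrt{2}}(1, 1)^T$ we obtain $D_\mathbf{v} f(\mathbf{a}) = \frac{1}{\sqrt{2}}(2a_1 + 2a_2) = \sqrt{2}(a_1 + a_2)$.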
|
||||||
|
|
||||||
|
## The general case
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^n$ and let $\mathbf{f}: D \to \mathbb{R}^m$, with $f_i: D \to \mathbb{R}$, with $i = 1, \dotsc, m$ being the components of $\mathbf{f}$.
|
||||||
|
|
||||||
|
* $\mathbf{f}$ is continuous at $\mathbf{a} \in D$ $\iff$ all $f_i$ continuous at $\mathbf{a}$,
|
||||||
|
* $\mathbf{f}$ is partially/totally differentiable at $\mathbf{a}$ $\iff$ all $f_i$ are partially/totally differentiable at $\mathbf{a}$.
|
||||||
|
|
||||||
|
For the linearization of every component $f_i$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
f_i(\mathbf{x}) = f_i(\mathbf{a}) + \big\langle \nabla f_i(\mathbf{a}),\; \mathbf{x} - \mathbf{a} \big\rangle + r_i(\mathbf{x}),
|
||||||
|
$$
|
||||||
|
|
||||||
|
so in total we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}) + D\mathbf{f}(\mathbf{a}) \big(\mathbf{x} - \mathbf{a}\big) + \mathbf{r}(\mathbf{x}),
|
||||||
|
$$
|
||||||
|
|
||||||
|
with $D\mathbf{f}(\mathbf{a})$ the Jacobian of $\mathbf{f}$.
|
||||||
|
|
||||||
|
*Definition*: the Jacobian is given by $\big[D\mathbf{f}(\mathbf{a}) \big]_{i,\;j} = \partial_j f_i(\mathbf{a}).$
|
||||||
|
|
||||||
|
### Chain rule
|
||||||
|
|
||||||
|
Let $D \subseteq \mathbb{R}^n$ and let $E \subseteq \mathbb{R}^m$ be sets and let $\mathbf{f}: D \to \mathbb{R}^m$ and let $\mathbf{g}: E \to \mathbb{R}^k$ with $\mathbf{f}$ differentiable at $\mathbf{x}$ and $\mathbf{g}$ differentiable at $\mathbf{f}(\mathbf{x})$. Then $D\mathbf{f}(\mathbf{x}) \in \mathbb{R}^{m \times n}$ and $D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) \in \mathbb{R}^{k \times m}$.
|
||||||
|
|
||||||
|
Then if we differentiate $\mathbf{g} \circ \mathbf{f}$ we obtain
|
||||||
|
|
||||||
|
$$
|
||||||
|
D(\mathbf{g} \circ \mathbf{f})(\mathbf{x}) = D\mathbf{g}\big(\mathbf{f}(\mathbf{x})\big) D\mathbf{f}(\mathbf{x}).
|
||||||
|
$$
|
||||||
|
|
||||||
|
We have two interpretations:
|
||||||
|
|
||||||
|
* the composition of linear maps,
|
||||||
|
* the matrix multiplication of the Jacobian.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
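As a sanity check, the matrix form of the chain rule can be verified symbolically on a concrete pair of maps (a sketch assuming SymPy; the maps are arbitrary illustrative choices):

```python
from sympy import symbols, Matrix, sin, cos, simplify

x, y, u, v = symbols('x y u v')

f = Matrix([x**2 * y, sin(y)])      # f: R^2 -> R^2
g = Matrix([u + v, u * v, cos(u)])  # g: R^2 -> R^3

Df = f.jacobian([x, y])                                # 2x2 Jacobian of f
Dg_at_f = g.jacobian([u, v]).subs({u: f[0], v: f[1]})  # 3x2, evaluated at f(x)

comp = g.subs({u: f[0], v: f[1]})   # the composition g o f
Dcomp = comp.jacobian([x, y])       # 3x2 Jacobian of g o f

print(simplify(Dcomp - Dg_at_f * Df))  # the zero matrix
```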
|
96
docs/mathematics/multivariable-calculus/extrema.md
Normal file
|
@ -0,0 +1,96 @@
|
||||||
|
# Extrema
|
||||||
|
|
||||||
|
*Definition*: for $D \subseteq \mathbb{R}^n$ let $f: D \to \mathbb{R}$ be differentiable and $D$ contains no boundary points (open). A point $\mathbf{x^*} \in D$ is called a critical point for $f$ $\iff \nabla f(\mathbf{x^*}) = \mathbf{0}$.
|
||||||
|
|
||||||
|
*Definition*: $f$ has (strict) global $\begin{matrix} \text{ maximum } \\ \text{ minimum } \end{matrix}$ in $\mathbf{x^*} \in D$ $\iff \forall \mathbf{x} \in D \backslash \{\mathbf{x^*}\} \Big[f(\mathbf{x^*}) \begin{matrix} (>) \\ \geq \\ \leq \\ (<) \end{matrix} f(\mathbf{x}) \Big]$.
|
||||||
|
|
||||||
|
*Definition*: $f$ has (strict) local $\begin{matrix} \text{ maximum } \\ \text{ minimum } \end{matrix}$ in $\mathbf{x^*} \in D$ $\iff \exists r_{>0} \forall \mathbf{x} \in D \backslash \{\mathbf{x^*}\} \Big[f(\mathbf{x^*}) \begin{matrix} (>) \\ \geq \\ \leq \\ (<) \end{matrix} f(\mathbf{x}) \;\land\; (0) < \|\mathbf{x} - \mathbf{x^*}\| < r \Big]$
|
||||||
|
|
||||||
|
*Theorem*: if $f$ has a local $\begin{matrix} \text{ maximum } \\ \text{ minimum } \end{matrix}$ at $\mathbf{x^*} \in D$ then $\mathbf{x^*}$ is a critical point for $f$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## A second derivative test
|
||||||
|
|
||||||
|
*Definition*: suppose $f: \mathbb{R}^n \to \mathbb{R}$ is twice differentiable with $\mathbf{x} \in \mathbb{R}^n$. The Hessian matrix of $f$ is defined as
|
||||||
|
|
||||||
|
$$
|
||||||
|
H_f(\mathbf{x}) := \begin{pmatrix} \partial_{11} f(\mathbf{x}) & \dots & \partial_{1n} f(\mathbf{x}) \\ \vdots & \ddots & \vdots \\ \partial_{n1} f(\mathbf{x}) & \dots & \partial_{nn} f(\mathbf{x}) \end{pmatrix}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
*Theorem*:
|
||||||
|
|
||||||
|
* If $H_f(\mathbf{x^*})$ is positive definite (all eigenvalues are positive), then $f$ has a local minimum at $\mathbf{x^*}$.
|
||||||
|
* If $H_f(\mathbf{x^*})$ is negative definite (all eigenvalues are negative), then $f$ has a local maximum at $\mathbf{x^*}$.
|
||||||
|
* If $H_f(\mathbf{x^*})$ is indefinite (both positive and negative eigenvalues), then $f$ has a saddle point at $\mathbf{x^*}$.
|
||||||
|
* If $H_f(\mathbf{x^*})$ is neither positive definite, negative definite, nor indefinite (some eigenvalues are zero), this test gives no information.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
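A sketch of the test on $f(x, y) = x^2 - y^2$, which has a critical point at the origin (assuming NumPy):

```python
import numpy as np

# Hessian of f(x, y) = x**2 - y**2 at the critical point (0, 0).
H = np.array([[2.0, 0.0],
              [0.0, -2.0]])

eig = np.linalg.eigvalsh(H)  # symmetric matrix, so eigenvalues are real
print(eig)                   # [-2.  2.]: indefinite, hence a saddle point
```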
|
||||||
|
|
||||||
|
## Extrema on restricted domains
|
||||||
|
|
||||||
|
*Theorem*: let $D \subseteq \mathbb{R}^n$ be bounded and closed ($D$ contains all boundary points). Let $f: D \to \mathbb{R}$ be continuous, then $f$ has a global maximum and minimum.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
**Procedure to find the global maximum and minimum**:
|
||||||
|
|
||||||
|
* Find critical points in the interior.
|
||||||
|
* Find global extrema on the boundary.
|
||||||
|
* Find the largest/smallest among them.
|
||||||
|
|
||||||
|
### Lagrange multipliers
|
||||||
|
|
||||||
|
*Theorem*: let $f: M \to \mathbb{R}$ and $g: \mathbb{R}^n \to \mathbb{R}$ with $M$ the boundary of $D$ given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
M := \big\{\mathbf{x} \in \mathbb{R}^n \;\big|\; g(\mathbf{x}) = 0 \big\} \subseteq D,
|
||||||
|
$$
|
||||||
|
|
||||||
|
suppose that there is a global maximum or minimum $\mathbf{x^*} \in M$ of $f$ that is not an endpoint of $M$ and $\nabla g(\mathbf{x^*}) \neq \mathbf{0}$. Then there exists a $\lambda^* \in \mathbb{R}$ such that $(\mathbf{x^*}, \lambda^*)$ is a critical point of the Lagrange function
|
||||||
|
|
||||||
|
$$
|
||||||
|
L(\mathbf{x}, \lambda) := f(\mathbf{x}) - \lambda g(\mathbf{x}).
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
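For example, to find the extrema of $f(x, y) = xy$ on the unit circle $g(x, y) = x^2 + y^2 - 1 = 0$, the critical points of $L(x, y, \lambda) = xy - \lambda(x^2 + y^2 - 1)$ satisfy

$$
y = 2\lambda x, \qquad x = 2\lambda y, \qquad x^2 + y^2 = 1,
$$

giving $x = \pm y = \pm \frac{1}{\sqrt{2}}$, so the global maximum is $f = \frac{1}{2}$ and the global minimum is $f = -\frac{1}{2}$.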
|
||||||
|
|
||||||
|
### The general case
|
||||||
|
|
||||||
|
*Theorem*: let $f: S \to \mathbb{R}$ and $\mathbf{g}: \mathbb{R}^n \to \mathbb{R}^m$ with $m \leq n - 1$ restrictions given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
S := \big\{\mathbf{x} \in \mathbb{R}^n \;\big|\; \mathbf{g}(\mathbf{x}) = 0 \big\} \subseteq D,
|
||||||
|
$$
|
||||||
|
|
||||||
|
suppose that there is a global maximum or minimum $\mathbf{x^*} \in S$ of $f$ that is not an endpoint of $S$ and $D \mathbf{g}(\mathbf{x^*})$ has full rank. Then there exists a $\mathbf{\lambda^*} \in \mathbb{R}^m$ such that $(\mathbf{x^*}, \mathbf{\lambda^*})$ is a critical point of the Lagrange function
|
||||||
|
|
||||||
|
$$
|
||||||
|
L(\mathbf{x}, \mathbf{\lambda}) := f(\mathbf{x}) - \big\langle \mathbf{\lambda},\; \mathbf{g}(\mathbf{x}) \big\rangle.
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
#### Example
|
||||||
|
|
||||||
|
Let $f: M_1 \cap M_2 \to \mathbb{R}$ and $g_{1,2}: \mathbb{R}^n \to \mathbb{R}$ with the restrictions given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
M_{1,2} := \big\{\mathbf{x} \in \mathbb{R}^n \;\big|\; g_{1,2}(\mathbf{x}) = 0 \big\} \subseteq D,
|
||||||
|
$$
|
||||||
|
|
||||||
|
suppose that there is a global maximum or minimum $\mathbf{x^*} \in M_1 \cap M_2$ of $f$ that is not an endpoint of $M_1 \cap M_2$ and $\nabla g_{1,2}(\mathbf{x^*}) \neq \mathbf{0}$. Then there exist $\lambda_1^*, \lambda_2^* \in \mathbb{R}$ such that $(\mathbf{x^*}, \lambda_1^*, \lambda_2^*)$ is a critical point of the Lagrange function
|
||||||
|
|
||||||
|
$$
|
||||||
|
L(\mathbf{x}, \lambda_1, \lambda_2) := f(\mathbf{x}) - \lambda_1 g_1(\mathbf{x}) - \lambda_2 g_2(\mathbf{x}).
|
||||||
|
$$
|
|
@ -0,0 +1,74 @@
|
||||||
|
# Functions of several variables
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^m$ with $m>1$ and let $f: D \to \mathbb{R}^n$; then $f$ is a function of several variables where:
|
||||||
|
|
||||||
|
* for $n=1$, $f$ is a scalar function,
|
||||||
|
* for $n>1$, $f$ is a vector valued function.
|
||||||
|
|
||||||
|
<br>
|
||||||
|
|
||||||
|
*Definition*: the domain convention specifies that the domain of a function of $m$ variables is the largest set of points for which the function makes sense as a real number, unless that domain is explicitly stated to be a smaller set.
|
||||||
|
|
||||||
|
## Graphical representations of scalar valued functions
|
||||||
|
|
||||||
|
### Graphs
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^2$ and let $f: D \to \mathbb{R}$ then $G_f := \big\{\big(x, y, f(x,y)\big) \;\big|\; (x, y) \in D\big\}$ is the graph of $f$. Observe that $G_f \subseteq \mathbb{R}^3$.
|
||||||
|
|
||||||
|
### Level sets
|
||||||
|
|
||||||
|
*Definition*: let $D \subseteq \mathbb{R}^2$ and let $f: D \to \mathbb{R}$ then for $c \in \mathbb{R}$ we have $S_c := \big\{(x, y) \in D \;\big|\; f(x,y) = c \big\}$ is the level set of $f$. Observe that $S_c \subseteq \mathbb{R}^2$.
|
||||||
|
|
||||||
|
## Multi-index notation
|
||||||
|
|
||||||
|
*Definition*: an $n$-dimensional multi-index is an $n$-tuple of non-negative integers
|
||||||
|
|
||||||
|
$$
|
||||||
|
\alpha = (\alpha_1, \alpha_2, \dotsc, \alpha_n), \qquad \text{with } \alpha_i \in \mathbb{N}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
### Properties
|
||||||
|
|
||||||
|
For the sum of components we have: $|\alpha| := \alpha_1 + \dotsc + \alpha_n$.
|
||||||
|
|
||||||
|
For $n$-dimensional multi-indices $\alpha, \beta$ we have componentwise sum and difference
|
||||||
|
|
||||||
|
$$
|
||||||
|
\alpha \pm \beta := (\alpha_1 \pm \beta_1, \dotsc, \alpha_n \pm \beta_n).
|
||||||
|
$$
|
||||||
|
|
||||||
|
For the products of powers with $\mathbf{x} \in \mathbb{R}^n$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\mathbf{x}^\alpha := x_1^{\alpha_1} x_2^{\alpha_2} \dotsc x_n^{\alpha_n}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
For factorials we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\alpha ! = \alpha_1 ! \cdot \alpha_2 ! \cdots \alpha_n !
|
||||||
|
$$
|
||||||
|
|
||||||
|
For the binomial coefficient we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{pmatrix} \alpha \\ \beta \end{pmatrix} = \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} \begin{pmatrix} \alpha_2 \\ \beta_2 \end{pmatrix} \cdots \begin{pmatrix} \alpha_n \\ \beta_n \end{pmatrix} = \frac{\alpha !}{\beta ! (\alpha - \beta)!}
|
||||||
|
$$
|
||||||
|
|
||||||
|
For polynomials of degree less or equal to $m$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
p(\mathbf{x}) = \sum_{|\alpha| \leq m} c_\alpha \mathbf{x}^\alpha,
|
||||||
|
$$
|
||||||
|
|
||||||
|
as an example for $m=2$ and $n=2$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
p(\mathbf{x}) = c_1 + c_2 x_1 + c_3 x_2 + c_4 x_1 x_2 + c_5 x_1 ^2 + c_6 x_2^2 \qquad c_{1,2,3,4,5,6} \in \mathbb{R}
|
||||||
|
$$
|
||||||
|
|
||||||
|
For partial derivatives of $f: \mathbb{R}^n \to \mathbb{R}$ we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\partial^\alpha f(\mathbf{x}) = \partial^{\alpha_1}_{x_1} \dotsc \partial^{\alpha_n}_{x_n} f(\mathbf{x}).
|
||||||
|
$$
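For example, for $n = 2$ and $\alpha = (1, 2)$ we have $|\alpha| = 3$, $\alpha! = 1! \cdot 2! = 2$ and $\partial^\alpha f(\mathbf{x}) = \partial_{x_1} \partial_{x_2}^2 f(\mathbf{x})$.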
|
|
@ -0,0 +1,58 @@
|
||||||
|
# Implicit equations
|
||||||
|
|
||||||
|
*Theorem*: for $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity), let $f: D \to \mathbb{R}$ be continuously differentiable and $\mathbf{a} \in D$. Assume
|
||||||
|
|
||||||
|
* $f(\mathbf{a}) = 0$,
|
||||||
|
* $\partial_2 f(\mathbf{a}) \neq 0$, nondegeneracy.
|
||||||
|
|
||||||
|
then there exist an interval $I$ around $a_1$, an interval $J$ around $a_2$ and a differentiable function $\phi: I \to J$ such that
|
||||||
|
|
||||||
|
$$
|
||||||
|
\forall x \in I, y \in J: f(x,y) = 0 \iff y = \phi(x).
|
||||||
|
$$
|
||||||
|
|
||||||
|
Now calculating $\phi' (x)$ with the chain rule
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
f\big(x,\phi(x)\big) &= 0, \\
|
||||||
|
\partial_1 f\big(x,\phi(x)\big) + \partial_2 f\big(x,\phi(x)\big) \phi' (x) &= 0,
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
and we obtain
|
||||||
|
|
||||||
|
$$
|
||||||
|
\phi' (x) = - \frac{\partial_1 f\big(x,\phi(x)\big)}{\partial_2 f\big(x,\phi(x)\big)}.
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
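For example, for $f(x, y) = x^2 + y^2 - 1$ near a point $\mathbf{a}$ on the upper half of the unit circle we have $\partial_2 f(\mathbf{a}) = 2a_2 \neq 0$ and $\phi(x) = \sqrt{1 - x^2}$, and the formula gives

$$
\phi'(x) = -\frac{2x}{2\phi(x)} = -\frac{x}{\sqrt{1 - x^2}},
$$

in agreement with differentiating $\phi$ directly.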
|
||||||
|
|
||||||
|
## General case
|
||||||
|
|
||||||
|
*Theorem*: let $\mathbf{F}: \mathbb{R}^{n+m} \to \mathbb{R}^m$ be given with $\mathbf{F}(\mathbf{x},\mathbf{y}) = \mathbf{0}$ for $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. Suppose $\mathbf{F}$ is continuously differentiable and assume $D_2 \mathbf{F}(\mathbf{x},\mathbf{y}) \in \mathbb{R}^{m \times m}$ is nonsingular. Then there exist neighbourhoods $I$ of $\mathbf{x}$ and $J$ of $\mathbf{y}$ with $I \subseteq \mathbb{R}^n,\; J \subseteq \mathbb{R}^m$, and a differentiable function $\mathbf{\phi}: I \to J$ such that
|
||||||
|
|
||||||
|
$$
|
||||||
|
\forall (\mathbf{x},\mathbf{y}) \in I \times J: \mathbf{F}(\mathbf{x},\mathbf{y}) = \mathbf{0} \iff \mathbf{y} = \mathbf{\phi}(\mathbf{x}).
|
||||||
|
$$
|
||||||
|
|
||||||
|
Now calculating $D \mathbf{\phi}(\mathbf{x})$ with the generalized chain rule
|
||||||
|
|
||||||
|
$$
|
||||||
|
\begin{align*}
|
||||||
|
\mathbf{F}\big(\mathbf{x},\mathbf{\phi}(\mathbf{x})\big) &= \mathbf{0}, \\
|
||||||
|
D_1 \mathbf{F}\big(\mathbf{x},\mathbf{\phi}(\mathbf{x})\big) + D_2 \mathbf{F}\big(\mathbf{x},\mathbf{\phi}(\mathbf{x})\big) D \mathbf{\phi}(\mathbf{x}) &= \mathbf{0}, \\
|
||||||
|
\end{align*}
|
||||||
|
$$
|
||||||
|
|
||||||
|
and we obtain
|
||||||
|
|
||||||
|
$$
|
||||||
|
D \mathbf{\phi}(\mathbf{x}) = - \Big(D_2 \mathbf{F}\big(\mathbf{x},\mathbf{\phi}(\mathbf{x})\big) \Big)^{-1} D_1 \mathbf{F}\big(\mathbf{x},\mathbf{\phi}(\mathbf{x})\big).
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
55
docs/mathematics/multivariable-calculus/integration.md
Normal file
|
@ -0,0 +1,55 @@
|
||||||
|
# Integration
|
||||||
|
|
||||||
|
*Theorem*: for $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) with $D = X \times Y$, let $f: D \to \mathbb{R}$ then we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\iint_D f = \int_X \Big(\int_Y f(x,y)dy \Big)dx = \int_Y \Big(\int_X f(x,y)dx \Big)dy
|
||||||
|
$$
|
||||||
|
|
||||||
|
implying that the order of integration can be interchanged; this holds for any $n \in \mathbb{N}$.
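For example, for $D = [0,1] \times [0,2]$ and $f(x,y) = xy$ both orders give $\iint_D f = \int_0^1 x\,dx \int_0^2 y\,dy = \frac{1}{2} \cdot 2 = 1$.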
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## Iteration of integrals
|
||||||
|
|
||||||
|
*Theorem*: for $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) bounded with piecewise smooth boundary, let $f: D \to \mathbb{R}$ be bounded and continuous. Let $R$ be a rectangle with $D \subseteq R$, then
|
||||||
|
|
||||||
|
$$
|
||||||
|
\iint_D f \,dA = \iint_R F \,dA, \qquad \text{where } F(\mathbf{x}) = \begin{cases} f(\mathbf{x}) \quad &\mathbf{x} \in D, \\ 0 \quad &\mathbf{x} \notin D. \end{cases}
|
||||||
|
$$
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
## Coordinate transformation for integrals
|
||||||
|
|
||||||
|
*Theorem*: for $D \subseteq \mathbb{R}^n$ ($n=2$ for simplicity) bounded with piecewise smooth boundary, let $f: D \to \mathbb{R}$ be bounded and continuous and let $\phi: E \to \mathbb{R}^n$ be continuously differentiable and injective on $E \subseteq \mathbb{R}^n$ with
|
||||||
|
|
||||||
|
$$
|
||||||
|
D = \phi(E),
|
||||||
|
$$
|
||||||
|
|
||||||
|
then we have
|
||||||
|
|
||||||
|
$$
|
||||||
|
\iint_D f = \iint_E (f \circ \phi) \;\Big|\det \big(D\phi \big) \Big|,
|
||||||
|
$$
|
||||||
|
|
||||||
|
with $D\phi$ the Jacobian of $\phi$.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
### Example
|
||||||
|
|
||||||
|
Let $D = \big\{(x,y) \in \mathbb{R}^2 \;\big|\; x^2 + y^2 \leq 4 \land 0 \leq y \leq x \big\}$ and let $\phi: E \to \mathbb{R}^2$ be given by
|
||||||
|
|
||||||
|
$$
|
||||||
|
\phi(r,\theta) = \begin{pmatrix} r\cos \theta \\ r\sin \theta \end{pmatrix},
|
||||||
|
$$
|
||||||
|
|
||||||
|
define $E := \phi(D) = [0,2] \times [0, \frac{\pi}{4}]$. Then $E$ is a rectangle which can be more easily integrated.
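
A short sympy sketch of this example; the integrand $f(x,y) = x^2 + y^2$ is an arbitrary choice of mine to make the computation concrete:

```python
# Integrate f(x, y) = x^2 + y^2 over the sector D in polar coordinates:
# the integrand becomes (f ∘ phi) |det D_phi| = r^2 * r.
import sympy as sp

r, theta = sp.symbols("r theta", nonnegative=True)

x = r * sp.cos(theta)  # phi(r, theta)
y = r * sp.sin(theta)
jac = sp.Matrix([x, y]).jacobian([r, theta])

jac_det = sp.simplify(jac.det())    # = r, the polar Jacobian determinant
f_polar = sp.simplify(x**2 + y**2)  # f ∘ phi = r^2

I = sp.integrate(f_polar * jac_det, (r, 0, 2), (theta, 0, sp.pi / 4))
print(I)  # pi: the integral of r^3 over [0, 2] x [0, pi/4]
```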

@@ -0,0 +1,21 @@

# Limits and continuity

## Limit

*Definition*: let $D \subseteq \mathbb{R}^m$ and let $f: D \to \mathbb{R}^n$, with $m,n \in \mathbb{N}$, and let $\mathbf{a}$ be the point $\mathbf{x}$ approaches. Then $f$ approaches the limit $L \in \mathbb{R}^n$ according to

$$
\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = L \iff \forall \varepsilon_{>0} \exists \delta_{>0} \Big[0 < \|\mathbf{x} - \mathbf{a}\|< \delta \implies \|f(\mathbf{x}) - L\| < \varepsilon \Big],
$$

with $\mathbf{a}, \mathbf{x} \in \mathbb{R}^m$.

## Continuity

*Definition*: let $D \subseteq \mathbb{R}^m$ and let $f: D \to \mathbb{R}^n$, with $m,n \in \mathbb{N}$. Then $f$ is called continuous at $\mathbf{a}$ if

$$
\lim_{\mathbf{x} \to \mathbf{a}} f(\mathbf{x}) = f(\mathbf{a}),
$$

with $\mathbf{a}, \mathbf{x} \in \mathbb{R}^m$.

@@ -0,0 +1,39 @@

# Taylor polynomials

For $D \subseteq \mathbb{R}^n$ let $f: D \to \mathbb{R}$ be sufficiently often differentiable and let $\mathbf{a} \in D$. We seek a polynomial $T: \mathbb{R}^n \to \mathbb{R}$ such that for all multi-indices $\beta$ with $|\beta| \leq n$

$$
\partial^\beta T(\mathbf{a}) = \partial^\beta f(\mathbf{a}).
$$

Ansatz: let $T(\mathbf{x}) = \sum_{|\alpha| \leq n} c_\alpha (\mathbf{x} - \mathbf{a})^\alpha$. Then

$$
\partial^\beta T(\mathbf{x}) = \sum_{|\alpha| \leq n,\; \alpha \geq \beta} c_\alpha \frac{\alpha!}{(\alpha - \beta)!} (\mathbf{x} - \mathbf{a})^{\alpha - \beta}.
$$

Choosing $\mathbf{x} = \mathbf{a}$ leaves only the term $\alpha = \beta$: $\partial^\beta T(\mathbf{a}) = c_\beta \beta! = \partial^\beta f(\mathbf{a}) \implies c_\beta = \frac{\partial^\beta f(\mathbf{a})}{\beta!}$. Therefore we obtain

$$
T(\mathbf{x}) = \sum_{|\alpha| \leq n} \frac{\partial^\alpha f(\mathbf{a})}{\alpha!} (\mathbf{x} - \mathbf{a})^\alpha.
$$

*Theorem*: suppose $\mathbf{x} \in D$ and the line segment $[\mathbf{a},\mathbf{x}]$ lies completely in $D$. Set $\mathbf{h} = \mathbf{x} - \mathbf{a}$. Then there is a $\theta \in (0,1)$ such that

$$
f(\mathbf{x}) = T(\mathbf{x}) + \frac{1}{(n+1)!} \partial_\mathbf{h}^{n+1} f(\mathbf{a} + \theta \mathbf{h}).
$$

??? note "*Proof*:"

    Apply Taylor's theorem in 1D and the chain rule to the function $\phi : [0, 1] \to \mathbb{R}$ given by

    $$
    \phi(\theta) := f(\mathbf{a} + \theta \mathbf{h}).
    $$

## Other methods

Multivariable Taylor polynomials can also be created by composing 1D Taylor polynomials of the individual variables, as sketched below.

### Example
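
A minimal sympy sketch of this composition, with $f(x,y) = e^x \sin y$ as an arbitrarily chosen function:

```python
# Compose 1D Maclaurin polynomials of exp and sin to get a 2-variable
# Taylor polynomial of f(x, y) = exp(x) * sin(y) around (0, 0).
import sympy as sp

x, y = sp.symbols("x y")
n = 4  # expansion order

exp_poly = sp.exp(x).series(x, 0, n).removeO()  # 1 + x + x**2/2 + x**3/6
sin_poly = sp.sin(y).series(y, 0, n).removeO()  # y - y**3/6

T = sp.expand(exp_poly * sin_poly)
# Discard cross terms of total degree >= n to match the multivariable
# Taylor polynomial summed over |alpha| < n.
T = sum(t for t in T.as_ordered_terms()
        if sp.Poly(t, x, y).total_degree() < n)
print(T)  # y + x*y + x**2*y/2 - y**3/6
```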

230
docs/mathematics/number-theory/complex-numbers.md
Normal file
@@ -0,0 +1,230 @@

# Complex numbers

## Definition

Let $p: A \to B$ be a quadratic polynomial given by

$$
p(x) = ax^2 + bx + c, \qquad x \in A.
$$

If we have $A,B \subseteq \mathbb{R}$ we can describe the discriminant $D$ of $p$ as

$$
D = b^2 - 4ac \begin{cases}>0: \text{two real solutions},\\ =0: \text{one real solution},\\ <0: \text{no real solution}.\end{cases}
$$

We may now define a set of numbers such that the discriminant $D$ of $p$ can be expressed as

$$
D = b^2 - 4ac \begin{cases}>0: \text{two solutions},\\ =0: \text{one solution},\\ <0: \text{two solutions},\end{cases}
$$

with $A,B \subseteq \mathbb{C}$. We call these the complex numbers.

> *Definition*: $z=a+bi$ with $a,b \in \mathbb{R}$ and $i^2 = -1$ is the definition of a complex number. The set of complex numbers is denoted by $\mathbb{C}$, so that we have $z \in \mathbb{C}$.
>
> * The real part of the complex number $z$ is given by $\mathrm{Re}(z) = a$.
> * The imaginary part of the complex number $z$ is given by $\mathrm{Im}(z) = b$.
> * The modulus of the complex number $z$ is given by $|z| = \sqrt{a^2+b^2}$.
> * The conjugate of the complex number $z$ is given by $\overline z = a - bi$.

## Properties of the complex numbers

> *Proposition*: let $z = a + bi$ be a complex number. The product of $z$ with its conjugate $\overline z$ is given by $z \overline z = |z|^2$.

??? note "*Proof*:"

    Suppose $z = a + bi$ is a complex number, then $\overline z = a - bi$ is its conjugate and we have

    $$
    z \overline z = (a+bi)(a-bi) = a^2 - b^2 i^2 = a^2 + b^2 = |z|^2.
    $$

> *Proposition*: let $z_\alpha = a_\alpha + b_\alpha i$ with $\alpha \in \{1,2\}$ be two complex numbers.
>
> * Addition of two complex numbers is given by $z_1 + z_2 = (a_1 + a_2) + (b_1 + b_2)i$.
> * Multiplication of two complex numbers is given by $z_1 z_2 = (a_1 a_2 - b_1 b_2) + (a_1 b_2 + a_2 b_1)i$.
> * Division of two complex numbers is given by $\frac{z_1}{z_2} = \frac{a_1 a_2 + b_1 b_2}{a_2^2 + b_2^2} + \frac{-a_1 b_2 + a_2 b_1}{a_2^2 + b_2^2}i = \frac{z_1 \overline z_2}{|z_2|^2}$.

??? note "*Proof*:"

    Suppose $z_\alpha = a_\alpha + b_\alpha i$ with $\alpha \in \{1,2\}$ are two complex numbers.

    For addition we have

    $$
    z_1 + z_2 = (a_1 + b_1 i) + (a_2 + b_2 i) = (a_1 + a_2) + (b_1 + b_2)i.
    $$

    For multiplication we have

    $$
    z_1 z_2 = (a_1 + b_1 i)(a_2 + b_2 i) = (a_1 a_2 + b_1 b_2 i^2) + (b_1 a_2 + a_1 b_2) i = (a_1 a_2 - b_1 b_2) + (a_1 b_2 + a_2 b_1)i.
    $$

    For division we have

    $$
    \frac{z_1}{z_2} = \frac{z_1 \overline z_2}{z_2 \overline z_2} = \frac{z_1 \overline z_2}{|z_2|^2} = \frac{(a_1 + b_1 i)(a_2 - b_2 i)}{|z_2|^2} = \frac{a_1 a_2 - b_1 b_2 i^2 + (b_1 a_2 - b_2 a_1)i}{|z_2|^2} = \frac{a_1 a_2 + b_1 b_2}{a_2^2 + b_2^2} + \frac{-a_1 b_2 + a_2 b_1}{a_2^2 + b_2^2}i,
    $$

    which is thus also a complex number.
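
These rules can be spot-checked with Python's built-in complex type; the sample values are arbitrary:

```python
import cmath

z1, z2 = 3 + 4j, 1 - 2j
a1, b1, a2, b2 = z1.real, z1.imag, z2.real, z2.imag

assert z1 + z2 == (a1 + a2) + (b1 + b2) * 1j
assert z1 * z2 == (a1 * a2 - b1 * b2) + (a1 * b2 + a2 * b1) * 1j
# Division agrees with z1 * conj(z2) / |z2|^2 up to floating point.
assert cmath.isclose(z1 / z2, (z1 * z2.conjugate()) / abs(z2) ** 2)
```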

For real numbers the calculation rules are in agreement with the ordinary calculation rules for addition, multiplication and division of real numbers. Consequently, the complex number system is an extension of the real number system.

## Geometry of complex numbers

Complex numbers may be represented as vectors in the complex plane, spanned by the real and imaginary axes. The addition of complex numbers may then be observed as vector addition.

> *Definition*: let $z \in \mathbb{C}$ be a complex number and $C$ a circle with radius $r \in \mathbb{R}^+$ and center $c \in \mathbb{C}$. Then the circle $C$ is given by
>
> $$
> |z - c| = r.
> $$
>
> The unit circle is given by $|z| = 1 = z \overline z$.

Each point on the unit circle has rectangular coordinates of the form $(\cos \varphi, \sin \varphi)$, with $\varphi$ the angle with respect to the positive real axis.

> *Proposition*: let $z \in \mathbb{C}$ be a complex number given by
>
> $$
> z = \cos \varphi + i \sin \varphi,
> $$
>
> with $\varphi \in [0, 2\pi)$; then $z$ describes a point on the unit circle.

??? note "*Proof*:"

    Let $z \in \mathbb{C}$ be a complex number given by $z = \cos \varphi + i \sin \varphi$ with $\varphi \in [0, 2\pi)$. We have

    $$
    |z|= |\cos \varphi + i \sin \varphi| = \sqrt{\cos^2 \varphi + \sin^2 \varphi} = \sqrt{1} = 1.
    $$

The angle $\varphi$ is called the argument of its corresponding complex number $z$, denoted by $\varphi = \mathrm{arg}(z)$.

> *Proposition*: let $z_\alpha = \cos \varphi_\alpha + i \sin \varphi_\alpha$ with $\varphi_\alpha \in \mathbb{R}$ and $\alpha \in \{1,2\}$ be two complex numbers on the unit circle.
>
> * Multiplication of two complex numbers on the unit circle gives $z_1 z_2 = \cos (\varphi_1 + \varphi_2) + i \sin (\varphi_1 + \varphi_2)$.
> * Division of two complex numbers on the unit circle gives $\frac{z_1}{z_2} = \cos (\varphi_1 - \varphi_2) + i \sin (\varphi_1 - \varphi_2).$

??? note "*Proof*:"

    Let $z_\alpha = \cos \varphi_\alpha + i \sin \varphi_\alpha$ with $\varphi_\alpha \in \mathbb{R}$ and $\alpha \in \{1,2\}$ be two complex numbers on the unit circle.

    We have for multiplication

    $$
    \begin{align*}
    z_1 z_2 &= (\cos \varphi_1 + i \sin \varphi_1)(\cos \varphi_2 + i \sin \varphi_2), \\
    &= (\cos \varphi_1 \cos \varphi_2 - \sin \varphi_1 \sin \varphi_2) + i (\cos \varphi_1 \sin \varphi_2 + \sin \varphi_1 \cos \varphi_2), \\
    &= \cos (\varphi_1 + \varphi_2) + i \sin (\varphi_1 + \varphi_2).
    \end{align*}
    $$

    We have then for division

    $$
    \begin{align*}
    \frac{z_1}{z_2} &= \frac{z_1 \overline z_2}{|z_2|^2}, \\
    &= \frac{(\cos \varphi_1 + i \sin \varphi_1)(\cos \varphi_2 - i \sin \varphi_2)}{1}, \\
    &= (\cos \varphi_1 \cos \varphi_2 + \sin \varphi_1 \sin \varphi_2) + i (\sin \varphi_1 \cos \varphi_2 - \cos \varphi_1 \sin \varphi_2), \\
    &= \cos (\varphi_1 - \varphi_2) + i \sin (\varphi_1 - \varphi_2).
    \end{align*}
    $$

In argument notation we then have

* For multiplication: $\mathrm{arg}(z_1 z_2) = \mathrm{arg}(z_1) + \mathrm{arg}(z_2)$.
* For division: $\mathrm{arg}(\frac{z_1}{z_2}) = \mathrm{arg}(z_1) - \mathrm{arg}(z_2)$.

## Euler's formula

> *Theorem*: a point on the unit circle can also be described by
>
> $$
> e^{i \varphi} = \cos \varphi + i \sin \varphi,
> $$
>
> with $\varphi \in \mathbb{R}$.

??? note "*Proof*:"

    Using the power series of $e^x$ given by

    $$
    e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \frac{x^6}{6!} + \frac{x^7}{7!} + \dots,
    $$

    taking $x = i \varphi$ obtains

    $$
    \begin{align*}
    e^{i \varphi} &= 1 + i \varphi + \frac{(i \varphi)^2}{2!} + \frac{(i \varphi)^3}{3!} + \frac{(i \varphi)^4}{4!} + \frac{(i \varphi)^5}{5!} + \frac{(i \varphi)^6}{6!} + \frac{(i \varphi)^7}{7!} + \dots, \\
    &= 1 + i \varphi - \frac{\varphi^2}{2!} - \frac{i \varphi^3}{3!} + \frac{\varphi^4}{4!} + \frac{i \varphi^5}{5!} - \frac{\varphi^6}{6!} - \frac{i \varphi^7}{7!} + \dots, \\
    &= \Big(1 - \frac{\varphi^2}{2!} + \frac{\varphi^4}{4!} - \frac{\varphi^6}{6!} + \dots \Big) + i \Big(\varphi - \frac{\varphi^3}{3!} + \frac{\varphi^5}{5!} - \frac{\varphi^7}{7!} + \dots\Big), \\
    &= \cos \varphi + i \sin \varphi,
    \end{align*}
    $$

    where in the last step the two series are recognized as the Maclaurin series for $\cos \varphi$ and $\sin \varphi$. The rearrangement of terms is justified because each series is absolutely convergent.

We may obtain Euler's identity $e^{i \pi} + 1 = 0$ from this theorem by taking $\varphi = \pi$.

> *Theorem*: for any $\varphi \in \mathbb{R}$ and $n \in \mathbb{N}$ it holds that
>
> $$
> (\cos \varphi + i \sin \varphi)^n = \cos n \varphi + i \sin n \varphi,
> $$
>
> known as de Moivre's theorem.

??? note "*Proof*:"

    Let $\varphi \in \mathbb{R}$ and $n \in \mathbb{N}$; the proof follows from Euler's formula by taking

    $$
    (\cos \varphi + i \sin \varphi)^n = (e^{i \varphi})^n = e^{i \varphi n} = \cos n \varphi + i \sin n \varphi.
    $$

With de Moivre's theorem the trigonometric identities can be derived. For example by taking $n=2$, let $z$ be a complex number given by

$$
z = (\cos \varphi + i \sin \varphi)^2 = \cos^2 \varphi - \sin^2 \varphi + 2i \cos \varphi \sin \varphi = \cos 2 \varphi + i \sin 2 \varphi,
$$

then $\mathrm{Re}(z) = \cos^2 \varphi - \sin^2 \varphi = \cos 2 \varphi$ and $\mathrm{Im}(z) = 2 \cos \varphi \sin \varphi = \sin 2 \varphi$.
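
A quick numerical illustration of de Moivre's theorem; the sample values are arbitrary and this is an illustration, not a proof:

```python
import cmath
import math

for n in range(1, 6):
    for phi in (0.3, 1.0, 2.5):
        lhs = complex(math.cos(phi), math.sin(phi)) ** n
        rhs = cmath.exp(1j * n * phi)  # cos(n phi) + i sin(n phi)
        assert cmath.isclose(lhs, rhs, abs_tol=1e-12)
```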

## Roots of polynomials

> *Definition*: let $p$ be a complex polynomial of degree $n$ given by
>
> $$
> p(z) = \alpha_0 + \alpha_1 z + \alpha_2 z^2 + \dots + \alpha_n z^n,
> $$
>
> with $\alpha_i, z \in \mathbb{C}$ for $i \in \mathbb{N}$.

If for a certain $z_0 \in \mathbb{C}$ we have $p(z_0) = 0$ then $z_0$ is called a *zero* of the polynomial.

> *Lemma*: if $z_0 \in \mathbb{C}$ is a zero of $p$ then there exists a complex polynomial $q$ such that $p(z) = (z-z_0)q(z)$ for all $z \in \mathbb{C}$.

??? note "*Proof*:"

    Will be added later.

> *Theorem* **- Fundamental theorem of algebra**: for each $n^\text{th}$-degree complex polynomial $p$ with $n \in \mathbb{N}$ there are $n$ complex numbers $z_1, \dots, z_n$ and a constant $\gamma \in \mathbb{C}$ such that $p(z) = \gamma (z - z_1)(z - z_2) \cdots (z - z_n)$ for all $z \in \mathbb{C}$.

??? note "*Proof*:"

    Will be added later.

> *Theorem*: each real polynomial can be written as a product of real linear factors and real quadratic factors with a negative discriminant.

??? note "*Proof*:"

    Will be added later.

From this theorem it follows that if $z \in \mathbb{C}$ is a zero of a real polynomial $p$, then its conjugate $\overline z$ is also a zero of $p$, since $a = \overline a$ for $a \in \mathbb{R}$.

1
docs/mathematics/number-theory/integer-arithmetic.md
Normal file
@@ -0,0 +1 @@

# Integer arithmetic

1
docs/mathematics/number-theory/modular-arithmetic.md
Normal file
@@ -0,0 +1 @@

# Modular arithmetic

66
docs/mathematics/ordinary-differential-equations/first-order-ode.md
Executable file
@@ -0,0 +1,66 @@

# First-order differential equations

## First-order linear differential equations

A first-order **linear** differential equation is one of the type

$$
\frac{dy}{dx} + p(x) y = q(x),
$$

where $p(x)$ and $q(x)$ are given functions, which may be assumed to be continuous. The equation is called **nonhomogeneous** unless $q(x)$ is identically zero. The corresponding **homogeneous** equation

$$
\frac{dy}{dx} + p(x)y = 0,
$$

is separable and so is easily solved. By separation of variables

$$
\frac{dy}{dx} = -p(x)y \implies \int \frac{1}{y}dy = -\int p(x)dx.
$$

Though, pay attention to absolute values.

There are two methods for solving nonhomogeneous equations.

### Integrating factor

The first method is by using an integrating factor. Let $\mu(x)$ be an antiderivative of $p(x)$ and multiply the equation by $e^{\mu(x)}$:

$$
\begin{array}{ll}
e^{\mu(x)} \frac{dy}{dx} + e^{\mu(x)} p(x) y = e^{\mu(x)} q(x) &\implies \frac{d}{dx}(e^{\mu(x)} y) = q(x) e^{\mu(x)}, \\
&\implies e^{\mu(x)} y = \int q(x) e^{\mu(x)}dx, \\
&\implies y(x) = e^{-\mu(x)} \int q(x) e^{\mu(x)}dx.
\end{array}
$$

### Variation of the constant

The second method is by variation of a constant. Let $\mu(x)$ be an antiderivative of $p(x)$ and solve the homogeneous equation

$$
\frac{dy}{dx} + p(x)y = 0 \implies y(x) = k e^{-\mu(x)}.
$$

Try $y(x) = k(x) e^{-\mu(x)}$:

$$
y'(x) + p(x) y(x) = k'(x) e^{-\mu(x)} = q(x),
$$

thus $k'(x) = q(x) e^{\mu(x)}$.

#### Example

Solve $\frac{dy}{dx} + 2xy = x$ with $y(0) = 3$.

First solving the homogeneous equation

$$
\begin{array}{ll}
\frac{dy}{dx} + 2xy = 0 &\implies \int \frac{1}{y} dy = -2\int xdx \\
&\implies y(x) = k e^{-x^2}.
\end{array}
$$
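
The remaining steps (a particular solution and the initial condition) can be cross-checked with sympy's solver as an independent reference:

```python
# Cross-check the initial value problem dy/dx + 2xy = x, y(0) = 3.
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

ivp = sp.Eq(y(x).diff(x) + 2 * x * y(x), x)
sol = sp.dsolve(ivp, y(x), ics={y(0): 3})
print(sol)  # y(x) = 1/2 + (5/2) * exp(-x**2)
```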

@@ -0,0 +1,167 @@

# The Laplace transform

*Definition*: let $f: (0,\infty) \to \mathbb{R}$ be a piecewise continuous function that complies with the demand $\exists s_0 \geq 0, \mu > 0: |f(t)| \leq \mu e^{s_0 t}$, then the **Laplace transform** $\mathcal{L}[f]$ is defined by

$$
\mathcal{L}[f](s) := \int_0^\infty e^{-st} f(t)dt = F(s),
$$

where $F(s)$ exists for all $s > s_0$.

## Basic properties

**Linearity**: if $f,g: (0,\infty) \to \mathbb{R}$ both have Laplace transforms, then $f + g$ also has a Laplace transform, and

$$
\mathcal{L}[f + g] = \mathcal{L}[f] + \mathcal{L}[g],
$$

on the interval where both are defined.

??? note "*Proof*:"

    Will be added later.

If $c \in \mathbb{R}$ then $cf$ also has a Laplace transform, and

$$
\mathcal{L}[cf] = c \mathcal{L}[f].
$$

**Shifting**: if $f$ has a Laplace transform $F$ on $(s_0,\infty)$ and $a \in \mathbb{R}$ then the function $g$ given by

$$
g(t) = e^{at} f(t)
$$

has a Laplace transform $G$ on $(\mathrm{max}(s_0 + a, 0),\infty)$, and

$$
G(s) = F(s-a)
$$

on this interval.

??? note "*Proof*:"

    Will be added later.

**More shifting**: let $a>0$. If $f$ has a Laplace transform $F$ on $(s_0, \infty)$ then the function $g$ given by

$$
g(t) = \begin{cases} f(t-a) \qquad &\text{if } t \geq a, \\ 0 \qquad &\text{if } t < a \end{cases}
$$

has a Laplace transform $G$ on $(s_0,\infty)$, and

$$
G(s) = e^{-as}F(s)
$$

on this interval.

??? note "*Proof*:"

    Will be added later.

**Scaling**: let $a > 0$. If $f$ has a Laplace transform $F$ on $(s_0, \infty)$ then the function $g$ given by

$$
g(t) = f(at)
$$

has a Laplace transform $G$ on $(as_0, \infty)$, and

$$
G(s) = \frac{1}{a} F\Big(\frac{s}{a}\Big)
$$

on this interval.

??? note "*Proof*:"

    Will be added later.

**Derivatives**: if $f$ has a derivative $g$ having a Laplace transform $G$ on the interval $(s_0,\infty)$ then $f$ has a Laplace transform on the same interval, and

$$
G(s) = sF(s) - f(0).
$$

More generally, for higher derivatives we have (under analogous assumptions)

$$
\mathcal{L}[f^{(n)}](s) = s^n F(s) - \sum_{k=0}^{n-1} s^k f^{(n-1-k)}(0).
$$

??? note "*Proof*:"

    For large enough $s$, the case $n=1$ follows by integration by parts:

    $$
    \begin{align*}
    \mathcal{L}[f'](s) &= \int_0^\infty e^{-st} f'(t)dt, \\
    &= \Big[e^{-st} f(t) \Big]_0^\infty + s\int_0^\infty e^{-st}f(t)dt, \\
    &= sF(s) - f(0).
    \end{align*}
    $$

    Suppose $\mathcal{L}[f^{(k)}](s) = s^k F(s) - \sum_{r=0}^{k-1} s^r f^{(k-1-r)}(0)$ is true for $k \in \mathbb{N}$. Then by assumption

    $$
    \begin{align*}
    \mathcal{L}[f^{(k+1)}](s) &= \int_0^\infty e^{-st} f^{(k+1)}(t)dt, \\
    &= \Big[e^{-st} f^{(k)}(t) \Big]_0^\infty + s\int_0^\infty e^{-st}f^{(k)}(t)dt, \\
    &= s \mathcal{L}[f^{(k)}](s) - f^{(k)}(0), \\
    &= s \Big(s^k F(s) - \sum_{r=0}^{k-1} s^r f^{(k-1-r)}(0)\Big) - f^{(k)}(0), \\
    &= s^{k+1} F(s) - \sum_{r=0}^{k} s^r f^{(k-r)}(0).
    \end{align*}
    $$
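
The derivative rule can be verified symbolically for a concrete choice of $f$; here $f(t) = e^{-t}\sin t$ is an arbitrary test function:

```python
# Verify L[f'](s) = s F(s) - f(0) for f(t) = exp(-t) * sin(t).
import sympy as sp

t, s = sp.symbols("t s", positive=True)
f = sp.exp(-t) * sp.sin(t)

F = sp.laplace_transform(f, t, s, noconds=True)
G = sp.laplace_transform(sp.diff(f, t), t, s, noconds=True)
assert sp.simplify(G - (s * F - f.subs(t, 0))) == 0
```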

## Examples

**Solving a second-order linear ODE**: with $y: \mathbb{K} \to \mathbb{R}$ given by

$$
\ddot y + 4 \dot y + 4y = t \qquad \text{with } y(0) = 1 \text{ and } \dot y(0) = 0,
$$

using the Laplace transform

$$
\begin{align*}
\mathcal{L}[\ddot y + 4 \dot y + 4y](s) &= \frac{1}{s^2} \qquad \text{let } \mathcal{L}[y](s) = Y(s), \\
s^2 Y(s) - s + 4(sY(s) - 1) + 4 Y(s) &= \frac{1}{s^2}, \\
(s^2 + 4s + 4)Y(s) &= \frac{1}{s^2} + s + 4, \\
Y(s) &= \frac{s^3 + 4s^2 +1}{s^2(s+2)^2},
\end{align*}
$$

then it may be solved with partial fraction decomposition and the inverse transform, as sketched below.
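
A sketch of the remaining bookkeeping, letting sympy's partial-fraction and inverse-transform routines do the work:

```python
# Invert Y(s) = (s^3 + 4s^2 + 1) / (s^2 (s+2)^2).
import sympy as sp

t, s = sp.symbols("t s", positive=True)
Y = (s**3 + 4 * s**2 + 1) / (s**2 * (s + 2) ** 2)

y = sp.inverse_laplace_transform(sp.apart(Y, s), s, t)
print(sp.simplify(y))
# Expected: -1/4 + t/4 + (5/4 + 9*t/4)*exp(-2*t), times Heaviside(t),
# since the transform is one-sided (t > 0).
```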

**Solving a linear system of ODEs**: with $\mathbf{y}: \mathbb{K} \to \mathbb{R}^2$ given by

$$
\mathbf{\dot y}(t) = \begin{pmatrix} 5 & 1 \\ 1 & 5 \end{pmatrix} \mathbf{y}(t) \qquad \text{with } \mathbf{y}(0) = \begin{pmatrix} -3 \\ 7 \end{pmatrix},
$$

using the Laplace transform

$$
\begin{align*}
\mathcal{L}[\mathbf{\dot y}](s) &= \begin{pmatrix} 5 & 1 \\ 1 & 5 \end{pmatrix} \mathcal{L}[\mathbf{y}](s) \qquad \text{let } \mathcal{L}[\mathbf{y}](s) = \mathbf{Y}(s), \\
s \mathbf{Y}(s) - \mathbf{y}(0) &= \begin{pmatrix} 5 & 1 \\ 1 & 5 \end{pmatrix} \mathbf{Y}(s), \\
s \mathbf{Y}(s) + \begin{pmatrix} 3 \\ -7 \end{pmatrix} &= \begin{pmatrix} 5 & 1 \\ 1 & 5 \end{pmatrix} \mathbf{Y}(s), \\
\begin{pmatrix} 5 - s & 1 \\ 1 & 5 - s \end{pmatrix} \mathbf{Y}(s) &= \begin{pmatrix} 3 \\ -7 \end{pmatrix},
\end{align*}
$$

using Cramer's rule

$$
\begin{align*}
&Y_1(s) = \frac{\mathrm{det}\begin{pmatrix} 3 & 1 \\ -7 & 5 - s \end{pmatrix}}{(5-s)^2-1}, \\
\\
&Y_2(s) = \frac{\mathrm{det}\begin{pmatrix} 5 - s & 3 \\ 1 & -7\end{pmatrix}}{(5-s)^2-1},
\end{align*}
$$

both can be solved with partial fraction decomposition and the inverse transform.

@@ -0,0 +1,133 @@

# Second-order ordinary differential equations

For simplicity, all definitions and statements are for complex-valued functions and vector spaces over $\mathbb{C}$.

## Linear second-order ODEs with constant coefficients

Let $L[y] = f$ be given by

$$
L[y] = \ddot y + p \dot y + qy = f \qquad (*),
$$

with constant coefficients $p,q \in \mathbb{R}$ and $f$ a given function.

*Definition*: the set of all solutions to $(*)$ is called the general solution.

*Property*: if $y_1,y_2$ are both solutions to the homogeneous case $L[y]=0$ then for all $c_1,c_2 \in \mathbb{R}$, $y=c_1y_1 + c_2y_2$ is a solution, since

$$
L[y] = L[c_1y_1 + c_2y_2] = c_1L[y_1] + c_2L[y_2].
$$

The consequence is that the general solution of the homogeneous equation is a linear space.

$(*)$ is said to have **resonance** if $f$ can be split into linearly independent terms of which at least one lies in the solution space of $L[y] = 0$.

### Solving homogeneous linear second-order ODEs with constant coefficients

We first solve

$$
L[y] = \ddot y + p \dot y + qy = 0.
$$

Ansatz: let $y(t) = e^{\lambda t}$ with $\lambda \in \mathbb{C}$. Then

$$
L[y(t)] = \lambda^2 e^{\lambda t} + p \lambda e^{\lambda t} + q e^{\lambda t} = e^{\lambda t} (\lambda^2 + p \lambda + q) = 0,
$$

obtaining the characteristic equation $\chi(\lambda) = \lambda^2 + p \lambda + q = 0$. If two distinct roots $\lambda_1,\lambda_2 \in \mathbb{C}$ are found the solution space is

$$
y(t) = c_1 e^{\lambda_1 t} + c_2 e^{\lambda_2 t}, \quad c_1,c_2 \in \mathbb{C},
$$

if instead one double root $\lambda_1 \in \mathbb{C}$ is found the solution space is

$$
y(t) = (c_1 + c_2t) e^{\lambda_1 t}.
$$

??? note "*Proof*:"

    Will be added later.

#### Example

Let the homogeneous linear second-order ODE be given by $\ddot y + 4 \dot y + 8y = 0$. Then the characteristic equation is given by $\chi(\lambda) = \lambda^2 + 4\lambda + 8 = 0$ with solutions $\lambda_1 = -2 + 2i$ and $\lambda_2 = -2 - 2i$. Then the general solution is given by

$$
y(t) = c_1 e^{(-2 + 2i) t} + c_2 e^{(-2 - 2i) t}, \quad c_1,c_2 \in \mathbb{C},
$$

and we can write the real solution as

$$
y(t) = e^{-2t}\big(d_1\cos 2t + d_2 \sin 2t \big), \quad d_1,d_2 \in \mathbb{R}.
$$
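
A cross-check of this example with sympy's ODE solver:

```python
# Solve y'' + 4 y' + 8 y = 0 symbolically.
import sympy as sp

t = sp.symbols("t")
y = sp.Function("y")

sol = sp.dsolve(y(t).diff(t, 2) + 4 * y(t).diff(t) + 8 * y(t))
print(sol)  # y(t) = (C1*sin(2*t) + C2*cos(2*t))*exp(-2*t)
```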

### Solving inhomogeneous linear second-order ODEs with constant coefficients

*Theorem*: let $y_p$ be a particular solution to $(*)$. Then the general solution to $(*)$ is given by

$$
y = y_h + y_p,
$$

with $y_h$ the general solution to the homogeneous case.

??? note "*Proof*:"

    Let $y$ be a solution to $(*)$, then $L[y - y_p] = L[y] - L[y_p] = f - f = 0$. Therefore $y = (y - y_p) + y_p = y_h + y_p$.

#### Method of variation of parameters

We need the general solution to the homogeneous case

$$
y_h(t) = c_1 y_1(t) + c_2 y_2(t), \qquad c_1,c_2 \in \mathbb{C}.
$$

Ansatz: let $y_p(t) = c_1(t) y_1(t) + c_2(t) y_2(t)$, then taking the derivative of $y_p(t)$

$$
\dot y_p(t) = \dot c_1(t) y_1(t) + \dot c_2(t) y_2(t) + c_1(t) \dot y_1(t) + c_2(t) \dot y_2(t),
$$

we demand that $\dot c_1(t) y_1(t) + \dot c_2(t) y_2(t) = 0$. Then taking the second derivative of $y_p(t)$

$$
\ddot y_p(t) = \dot c_1(t) \dot y_1(t) + \dot c_2(t) \dot y_2(t) + c_1(t) \ddot y_1(t) + c_2(t) \ddot y_2(t),
$$

then we have for $(*)$

$$
\ddot y_p(t) + p \dot y_p(t) + q y_p(t) = c_1\big(\ddot y_1 + p \dot y_1 + q y_1\big) + c_2\big(\ddot y_2 + p \dot y_2 + q y_2\big) + \dot c_1 \dot y_1 + \dot c_2 \dot y_2 = f,
$$

in which the brackets vanish since $y_1$ and $y_2$ solve the homogeneous equation, so we demand that $\dot c_1 \dot y_1 + \dot c_2 \dot y_2 = f$. Then we can collect the demands into a linear system

$$
\begin{pmatrix} y_1 && y_2 \\ \dot y_1 && \dot y_2\end{pmatrix} \begin{pmatrix} \dot c_1 \\ \dot c_2 \end{pmatrix} = \begin{pmatrix} 0 \\ f \end{pmatrix},
$$

whose coefficient matrix is the Wronskian matrix, and we can solve for $c_1(t)$ and $c_2(t)$ by integration.

#### Ansatz method

Let $f(t) = p(t)e^{\lambda t}$; as a rule of thumb, $y_p$ is of a type related to the inhomogeneity $f$. Then for $A_n, B_n$ and $P_n$ polynomials of degree $\leq n$ and $\alpha \in \mathbb{R}$

| Inhomogeneity | Particular solution |
| ------ | --------------- |
| $L[y] = P_n$ | $t^m A_n$ |
| $L[y] = P_n e^{\alpha t}$ | $t^m A_n e^{\alpha t}$ |
| $L[y] = P_n \cos \omega t$ | $t^m \big(A_n \cos \omega t + B_n \sin \omega t \big)$ |
| $L[y] = P_n \sin \omega t$ | $t^m \big(A_n \cos \omega t + B_n \sin \omega t \big)$ |
| $L[y] = P_n e^{\alpha t} \cos \omega t$ | $t^m e^{\alpha t} \big(A_n \cos \omega t + B_n \sin \omega t \big)$ |
| $L[y] = P_n e^{\alpha t} \sin \omega t$ | $t^m e^{\alpha t} \big(A_n \cos \omega t + B_n \sin \omega t \big)$ |

Choose $m \in \mathbb{N} \cup \{0\}$ as small as possible such that no term in the ansatz solves the homogeneous equation $L[y] = 0$, as in the sketch below.
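
A small sketch of the resonance case, using $\ddot y - y = e^t$ as an arbitrary example where $m = 1$ is needed, since $e^t$ solves the homogeneous equation:

```python
# Ansatz y_p = A * t * exp(t); matching coefficients gives A = 1/2.
import sympy as sp

t, A = sp.symbols("t A")
y_p = A * t * sp.exp(t)

residual = sp.diff(y_p, t, 2) - y_p - sp.exp(t)  # L[y_p] - f
print(sp.solve(sp.simplify(residual / sp.exp(t)), A))  # [1/2]
```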

@@ -0,0 +1,77 @@

# Systems of linear ordinary differential equations

## Homogeneous systems of linear ODEs with constant coefficients

Let $\mathbb{K} = \mathbb{R} \lor \mathbb{C}$, $n \in \mathbb{N}$ and $A \in \mathbb{R}^{n \times n}$. Seek differentiable functions $\mathbf{y}:\mathbb{R} \to \mathbb{K}^n$ such that

$$
\mathbf{\dot y}(t) = A \mathbf{y}(t), \qquad t \in \mathbb{R}.
$$

The solutions form a linear space, therefore the general solution can be written as

$$
\mathbf{y}(t) = \sum_{k=1}^n c_k \mathbf{y}_k(t), \qquad c_k \in \mathbb{K},
$$

where $\{\mathbf{y_1}, \dots, \mathbf{y_n}\}$ is a linearly independent set of solutions, i.e. a basis of the solution space.

Assume now that $A$ is diagonalizable, and let $\{\mathbf{v_1}, \dots, \mathbf{v_n}\}$ be a basis of $\mathbb{K}^n$ consisting of eigenvectors of $A$:

$$
AV = VD, \qquad \text{with } D = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix},
$$

then $A = VDV^{-1}$; let $\mathbf{z}(t) = V^{-1} \mathbf{y}(t)$, so that

$$
\begin{array}{ll}
&\mathbf{\dot z} = V^{-1} \mathbf{\dot y} = V^{-1} A \mathbf{y} = V^{-1} V D V^{-1} \mathbf{y} = D \mathbf{z}, \\
& \mathbf{\dot z} = D \mathbf{z} \implies z_k(t) = c_k e^{\lambda_k t}.
\end{array}
$$

Obtaining the general solution

$$
\mathbf{y}(t) = V \mathbf{z}(t) = \sum_{k=1}^n c_k \mathbf{v_k} e^{\lambda_k t}.
$$
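
A numerical sketch of this diagonalization recipe; the $2 \times 2$ matrix and the initial value are sample choices of mine:

```python
# Solve y' = A y via eigendecomposition, with y(0) = (-3, 7).
import numpy as np

A = np.array([[5.0, 1.0], [1.0, 5.0]])
lam, V = np.linalg.eig(A)            # columns of V are eigenvectors

c = np.linalg.solve(V, [-3.0, 7.0])  # coefficients from y(0) = V c

def y(t):
    return V @ (c * np.exp(lam * t))  # sum_k c_k v_k e^{lambda_k t}

# Sanity check: a small finite-difference step approximates y' = A y.
eps = 1e-6
assert np.allclose((y(eps) - y(0.0)) / eps, A @ y(0.0), atol=1e-3)
```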

## Inhomogeneous systems of linear ODEs with constant coefficients

Let $I \subseteq \mathbb{R}$ be an interval and $\mathbf{f}: I \to \mathbb{R}^n$ continuous. Find functions $\mathbf{y}: I \to \mathbb{R}^n$ such that

$$
\mathbf{\dot y}(t) = A \mathbf{y}(t) + \mathbf{f}(t), \qquad t \in I. \qquad (*)
$$

*Theorem*: let $\mathbf{y}_p: I \to \mathbb{R}^n$ be a particular solution for $(*)$ and $\mathbf{y}_h$ the general solution to the homogeneous system. Then the general solution of the inhomogeneous system $(*)$ is given by

$$
\mathbf{y}(t) = \mathbf{y}_p(t) + \mathbf{y}_h(t), \qquad t \in I.
$$

??? note "*Proof*:"

    Similar to the 1D case, will be added later.

### Method of variation of parameters

Let $\{\mathbf{y_1}, \dotsc, \mathbf{y_n}\}$ be a basis for the solution space of the homogeneous system. Ansatz:

$$
\mathbf{y}_p(t) = \sum_{k=1}^n c_k(t) \mathbf{y}_k(t) = (\mathbf{y}_1, \dots, \mathbf{y}_n) \begin{pmatrix} c_1(t) \\ \vdots \\ c_n(t) \end{pmatrix} = Y(t) \mathbf{c}(t),
$$

where $c_1(t), \dots, c_n(t): I \to \mathbb{R}$ are to be determined.

Then:

$$
\begin{align*}
\mathbf{\dot y}_p &= \sum_{k=1}^n \dot c_k(t) \mathbf{y}_k(t) + \sum_{k=1}^n c_k(t) \mathbf{\dot y}_k(t), \\
&= \sum_{k=1}^n \dot c_k(t) \mathbf{y}_k(t) + A \sum_{k=1}^n c_k(t) \mathbf{y}_k(t), \\
&= Y(t) \mathbf{\dot c}(t) + A \mathbf{y}_p(t).
\end{align*}
$$

Demanding that $Y(t) \mathbf{\dot c}(t) = \mathbf{f}(t)$, where $Y(t)$ is the Wronskian matrix, gives $\mathbf{\dot c}(t) = Y^{-1}(t) \mathbf{f}(t)$ since $Y(t)$ is nonsingular. Then solve for $\mathbf{c}(t)$ by integration.

27
docs/mathematics/set-theory/additional-axioms.md
Normal file
@@ -0,0 +1,27 @@

# Additional axioms

## Axiom of choice

> *Axiom*: let $C$ be a collection of nonempty sets. Then there exists a map
>
> $$
> f: C \to \bigcup_{A \in C} A
> $$
>
> with $f(A) \in A$.
>
> * The image of $f$ is a subset of $\bigcup_{A \in C} A$.
> * The function $f$ is called a **choice function**.

The following statements are equivalent to the axiom of choice.

* For any two sets $A$ and $B$ there exists a surjective map from $A$ to $B$ or from $B$ to $A$.
* The cardinality of an infinite set $A$ is equal to the cardinality of $A \times A$.
* Every vector space has a basis.
* For every surjective map $f: A \to B$ there is a map $g: B \to A$ with $f(g(b)) = b$ for all $b \in B$.

## Axiom of regularity

> *Axiom*: let $X$ be a nonempty set of sets. Then $X$ contains an element $Y$ with $X \cap Y = \varnothing$.

As a result of this axiom no set $S$ can contain itself.

67
docs/mathematics/set-theory/cardinalities.md
Normal file
@@ -0,0 +1,67 @@

# Cardinalities

## Cardinality

> *Definition*: two sets $A$ and $B$ have the same **cardinality** if there exists a bijection from $A$ to $B$.

For example, two finite sets have the same cardinality if and only if they have the same number of elements. The sets $\mathbb{N}$ and $\mathbb{Z}$ have the same cardinality: consider the map $f: \mathbb{N} \to \mathbb{Z}$ defined by $f(2n) = n$ for $n \geq 1$ and $f(2n+1) = -n$ for $n \geq 0$, which may be observed to be a bijection, as illustrated below.
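
The first values of this map, as a small illustration:

```python
# The bijection f: N -> Z with f(2n) = n and f(2n+1) = -n.
def f(k: int) -> int:
    return k // 2 if k % 2 == 0 else -(k // 2)

print([f(k) for k in range(1, 10)])  # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```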

> *Theorem*: having the same cardinality is an equivalence relation.

??? note "*Proof*:"

    Let $A$ be a set. Then the identity map is a bijection from $A$ to itself, so $A$ has the same cardinality as $A$. Therefore we obtain reflexivity.

    Suppose $A$ has the same cardinality as $B$. Then there is a bijection $f: A \to B$. Now $f$ has an inverse $f^{-1}$, which is a bijection from $B$ to $A$. So $B$ has the same cardinality as $A$, obtaining symmetry.

    Suppose $A$ has the same cardinality as $B$ and $B$ the same cardinality as $C$. So, there exist bijections $f: A \to B$ and $g: B \to C$. Then $g \circ f: A \to C$ is a bijection from $A$ to $C$. So $A$ has the same cardinality as $C$, obtaining transitivity.

## Countable sets

> *Definition*: a set is called **finite** if it is empty or has the same cardinality as the set $\mathbb{N}_n := \{1, 2, \dots, n\}$ for some $n \in \mathbb{N}$, and **infinite** otherwise.

<br>

> *Definition*: a set is called **countable** if it is finite or has the same cardinality as the set $\mathbb{N}$. An infinite set that is not countable is called **uncountable**.

<br>

> *Theorem*: every infinite set contains an infinite countable subset.

??? note "*Proof*:"

    Suppose $A$ is an infinite set. Since $A$ is infinite, we can keep enumerating distinct elements $a_1, a_2, \dots$ of $A$. This yields a sequence of elements in $A$, and the set of all elements in this sequence forms a countable subset of $A$.

> *Theorem*: let $A$ be a set. If there is a surjective map from $\mathbb{N}$ to $A$ then $A$ is countable.

??? note "*Proof*:"

    Will be added later.

## Uncountable sets

> *Lemma*: the set $\{0,1\}^\mathbb{N}$ is uncountable.

??? note "*Proof*:"

    Let $F: \mathbb{N} \to \{0,1\}^\mathbb{N}$. By $f_i$ we denote the function $F(i)$ from $\mathbb{N}$ to $\{0,1\}$. ...

The power set of $\mathbb{N}$ has the same cardinality as $\{0,1\}^\mathbb{N}$, therefore it is also uncountable.

> *Lemma*: the interval $[0,1)$ is uncountable.

??? note "*Proof*:"

    Will be added later.

> *Theorem*: $\mathbb{R}$ is uncountable.

??? note "*Proof*:"

    As $\mathbb{R}$ contains the uncountable subset $[0,1)$, it is uncountable.

## Cantor-Schröder-Bernstein theorem

> *Theorem*: let $A$ and $B$ be sets and assume that there are two maps $f: A \to B$ and $g: B \to A$ which are injective. Then there exists a bijection $h: A \to B$.
>
> Therefore $A$ and $B$ have the same cardinality.

109
docs/mathematics/set-theory/maps.md
Normal file
@@ -0,0 +1,109 @@

# Maps

## Definition

> *Definition*: a relation $f$ from a set $A$ to a set $B$ is called a map or function from $A$ to $B$ if for each $a \in A$ there is one and only one $b \in B$ with $afb$.
>
> * To indicate that $f$ is a map from $A$ to $B$ we may write $f:A \to B$.
> * If $a \in A$ and $b \in B$ is the unique element with $afb$ then we may write $b=f(a)$.
> * The set of all maps from $A$ to $B$ is denoted by $B^A$.
> * A partial map $f$ from $A$ to $B$ is a relation with the property that for each $a \in A$ there is at most one $b \in B$ with $afb$.

For example, $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = \sqrt{x}$ is a partial map, since not all of $\mathbb{R}$ is mapped: negative numbers have no image.

<br>

> *Proposition*: let $f: A \to B$ and $g: B \to C$ be maps, then the composition $g$ after $f$: $g \circ f = f;g$ is a map from $A$ to $C$.

??? note "*Proof*:"

    Let $a \in A$, then $g(f(a))$ is an element in $C$ in relation $f;g$ with $a$. If $c \in C$ is an element in $C$ that is in relation $f;g$ with $a$, then there is a $b \in B$ with $afb$ and $bgc$. But then, as $f$ is a map, $b=f(a)$ and as $g$ is a map, $c=g(b)$. Hence $c=g(b)=g(f(a))$ is the unique element in $C$ which is in relation $g \circ f$ with $a$.

<br>

> *Definition*: let $f: A \to B$ be a map.
>
> * The set $A$ is called the *domain* of $f$ and the set $B$ the *codomain*.
> * If $a \in A$ then the element $b=f(a)$ is called the image of $a$ under $f$.
> * The subset of $B$ consisting of the images of the elements of $A$ under $f$ is called the image or range of $f$ and is denoted by $\text{Im}(f)$.
> * If $a \in A$ and $b=f(a)$ then the element $a$ is called a pre-image of $b$. The set of all pre-images of $b$ is denoted by $f^{-1}(b)$.

Notice that $b$ can have more than one pre-image. Indeed if $f: \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^2$ for all $x \in \mathbb{R}$, then both $-2$ and $2$ are pre-images of $4$.

If $A'$ is a subset of $A$ then the image of $A'$ under $f$ is the set $f(A') = \{f(a) \;|\; a \in A'\}$, so $\text{Im}(f) = f(A)$.

If $B'$ is a subset of $B$ then the pre-image of $B'$, denoted by $f^{-1}(B')$, is the set of elements $a$ from $A$ that are mapped to an element $b$ of $B'$.

<br>

> *Theorem*: let $f: A \to B$ be a map.
>
> * If $A' \subseteq A$, then $f^{-1}(f(A')) \supseteq A'$.
> * If $B' \subseteq B$, then $f(f^{-1}(B')) \subseteq B'$.

??? note "*Proof*:"

    Let $a' \in A'$, then $f(a') \in f(A')$ and hence $a' \in f^{-1}(f(A'))$. Thus $A' \subseteq f^{-1}(f(A'))$.

    Let $a \in f^{-1}(B')$, then $f(a) \in B'$. Thus $f(f^{-1}(B')) \subseteq B'$.

## Special maps

> *Definition*: let $f: A \to B$ be a map.
>
> * $f$ is called **surjective**, if for each $b \in B$ there is at least one $a \in A$ with $b = f(a)$. Thus $\text{Im}(f) = B$.
> * $f$ is called **injective** if for each $b \in B$, there is at most one $a$ with $f(a) = b$.
> * $f$ is called **bijective** if it is both surjective and injective. So, if for each $b \in B$ there is a unique $a \in A$ with $f(a) = b$.

For example the map $\sin: \mathbb{R} \to \mathbb{R}$ is neither surjective nor injective. The map $\sin: [-\frac{\pi}{2},\frac{\pi}{2}] \to \mathbb{R}$ is injective but not surjective and the map $\sin: \mathbb{R} \to [-1,1]$ is surjective but not injective. To conclude, the map $\sin: [-\frac{\pi}{2},\frac{\pi}{2}] \to [-1,1]$ is a bijective map.

<br>

> *Theorem*: let $A$ be a set of size $n$ and $B$ a set of size $m$. Let $f: A \to B$ be a map between the sets $A$ and $B$.
>
> * If $n < m$ then $f$ can not be surjective.
> * If $n > m$ then $f$ can not be injective.
> * If $n = m$ then $f$ is injective if and only if it is surjective.

??? note "*Proof*:"

    Think of pigeonholes. (Not really a proof.)

<br>

> *Proposition*: let $f: A \to B$ be a bijection. Then for all $a \in A$ and $b \in B$ we have $f^{-1}(f(a)) = a$ and $f(f^{-1}(b)) = b$. In particular, $f^{-1}$ is the inverse of $f$.

??? note "*Proof*:"

    Let $a \in A$. Then $f^{-1}(f(a)) = a$ by definition of $f^{-1}$. If $b \in B$ then by surjectivity of $f$ there is an $a \in A$ with $b = f(a)$. So, by the above $f(f^{-1}(b)) = f(f^{-1}(f(a))) = f(a) = b$.

<br>

> *Theorem*: let $f: A \to B$ and $g: B \to C$ be two maps.
>
> 1. If $f$ and $g$ are surjective then so is $g \circ f$.
> 2. If $f$ and $g$ are injective then so is $g \circ f$.
> 3. If $f$ and $g$ are bijective then so is $g \circ f$.

??? note "*Proof*:"

    1. Suppose $f$ and $g$ are surjective, let $c \in C$. By surjectivity of $g$ there is a $b \in B$ with $g(b) = c$. Since $f$ is surjective there is also an $a \in A$ with $f(a) = b$. Therefore $g \circ f(a) = g(f(a)) = g(b) = c$.
    2. Suppose $f$ and $g$ are injective, let $a,a' \in A$ with $g \circ f(a) = g \circ f(a')$. Then $g(f(a)) = g(f(a'))$ and by injectivity of $g$ we find $f(a) = f(a')$. Injectivity of $f$ implies $a = a'$.
    3. Proofs 1. and 2. imply 3. by definition of bijectivity.

<br>

> *Proposition*: if $f: A \to B$ and $g: B \to A$ are maps with $f \circ g = I_B$ and $g \circ f = I_A$, where $I_A$ and $I_B$ denote the identity maps on $A$ and $B$, respectively, then $f$ and $g$ are bijections and $f^{-1} = g$ and $g^{-1} = f$.

??? note "*Proof*:"

    Suppose $f: A \to B$ and $g: B \to A$ are maps with $f \circ g = I_B$ and $g \circ f = I_A$. Let $b \in B$, then $f(g(b)) = b$, thus $f$ is surjective. If $a,a' \in A$ with $f(a) = f(a')$, then $a = g(f(a)) = g(f(a')) = a'$ and hence $f$ is injective. Therefore $f$ is bijective and by symmetry $g$ is also bijective.

<br>

> *Proposition*: suppose $f: A \to B$ and $g: B \to C$ are bijective maps. Then the inverse of the map $g \circ f$ equals $f^{-1} \circ g^{-1}$.

??? note "*Proof*:"

    Suppose $f: A \to B$ and $g: B \to C$ are bijective maps. Then for all $a \in A$ we have $(f^{-1} \circ g^{-1}) (g \circ f)(a) = f^{-1}(g^{-1}(g(f(a)))) = f^{-1}(f(a)) = a$.

55
docs/mathematics/set-theory/orders.md
Normal file
@@ -0,0 +1,55 @@

# Orders

## Orders and posets

> *Definition*: a relation $\sqsubseteq$ on a set $P$ is called an **order** if it is reflexive, antisymmetric and transitive.
>
>* The pair $(P, \sqsubseteq)$ is called a **partially ordered set** or for short **poset**.
>* Two elements $x$ and $y$ in a poset $(P, \sqsubseteq)$ are called comparable if $x \sqsubseteq y$ or $y \sqsubseteq x$. The elements are incomparable if $x \not\sqsubseteq y$ and $y \not\sqsubseteq x$.
>* If any two elements are comparable then the relation is called a linear order.

For example on the set of real numbers $\mathbb{R}$ the relation $\leq$ is an order relation. For any two numbers $x,y \in \mathbb{R}$ we have $x \leq y$ or $y \leq x$. This makes $\leq$ into a linear order.

> *Definition* **- Hasse diagram**: let $(P, \sqsubseteq)$ be a poset. The Hasse diagram of $(P, \sqsubseteq)$ is the graph with vertex set $P$ and two vertices $x,y \in P$ adjacent if and only if $x \sqsubseteq y$ and there is no $z \in P$ different from $x$ and $y$ with $x \sqsubseteq z$ and $z \sqsubseteq y$.

## Maximal and minimal elements

> *Definition*: let $(P, \sqsubseteq)$ be a partially ordered set and $A \subseteq P$. An element $a \in A$ is called the **maximum** ($\top$) of $A$, if for all $a' \in A$ we have $a' \sqsubseteq a$. An element $a \in A$ is called **maximal** if for all $a' \in A$ we have that either $a' \sqsubseteq a$ or $a$ and $a'$ are incomparable.
>
> Similarly we can define the notion of **minimum** ($\bot$) and **minimal** element.

If we consider the poset of all subsets of a set $S$ then the empty set $\varnothing$ is the minimum of the poset, whereas the whole set $S$ is the maximum. The atoms are the subsets of $S$ containing just a single element.

> *Definition*: if a poset $(P, \sqsubseteq)$ has a minimum $\bot$, then the minimal elements of $P\backslash \{\bot\}$ are called the atoms of $P$.

<br>

> *Lemma*: let $(P, \sqsubseteq)$ be a partially ordered set. Then $P$ contains at most one maximum and one minimum.

??? note "*Proof*:"

    Suppose $p,q \in P$ are maxima. Then $p \sqsubseteq q$ as $q$ is a maximum. Similarly $q \sqsubseteq p$ as $p$ is a maximum. By antisymmetry of $\sqsubseteq$ we have $p = q$.

> *Lemma*: let $(P, \sqsubseteq)$ be a finite poset, then $P$ contains a minimal and a maximal element.

??? note "*Proof*:"

    Consider the directed graph associated to $(P, \sqsubseteq)$ and pick a vertex in this graph. If the vertex is not maximal, then there is an edge leaving it. Move along this edge to the neighbour. Repeat this as long as no maximal element is found. Since the graph contains no cycles, a vertex will never be met twice. Hence, as $P$ is finite, the procedure has to stop, implying a maximal element has been found. A minimal element of $(P, \sqsubseteq)$ is a maximal element of $(P, \sqsupseteq)$ and thus also exists.

> *Definition*: if $(P, \sqsubseteq)$ is a poset and $A \subseteq P$ then an **upperbound** for $A$ is an element $u$ with $a \sqsubseteq u$ for all $a \in A$. A **lowerbound** for $A$ is an element $u$ with $u \sqsubseteq a$ for all $a \in A$.
>
> If the set of all upperbounds of $A$ has a minimal element then this element is called the **least upperbound** or **supremum** of $A$. Such an element is denoted by $\mathrm{sup} A$.
>
> If the set of all lowerbounds of $A$ has a maximal element then this element is called the **largest lowerbound** or **infimum** of $A$. Such an element is denoted by $\mathrm{inf} A$.

For example let $S$ be a set. In $(\wp(S), \subseteq)$ any set $A$ of subsets of $S$ has a least upperbound and a largest lowerbound. Indeed

$$
\mathrm{sup} A = \bigcup_{X \in A} X \;\text{ and }\; \mathrm{inf} A = \bigcap_{X \in A} X.
$$

If $(P, \sqsubseteq)$ is a finite poset then the elements of $P$ can be ordered as $p_1, p_2, \dots, p_n$ such that $p_i \sqsubseteq p_j$ implies $i \leq j$. This implies that the adjacency matrix of $\sqsubseteq$ is upper triangular, which means that it has nonzero entries only on or above the main diagonal.
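
Such an ordering is a topological sort. A small sketch with Python's standard graphlib, using the divisibility order on $\{1,\dots,8\}$ as a sample poset of my own choosing:

```python
# Order a finite poset so that p_i ⊑ p_j implies i <= j.
from graphlib import TopologicalSorter

elements = range(1, 9)
# For divisibility: the predecessors of b are its proper divisors.
predecessors = {b: {a for a in elements if b % a == 0 and a != b}
                for b in elements}

order = list(TopologicalSorter(predecessors).static_order())
print(order)  # e.g. [1, 2, 3, 5, 7, 4, 6, 8]; divisors always come first
```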

> *Definition*: an **ascending chain** in a poset $(P, \sqsubseteq)$ is a sequence $p_1 \sqsubseteq p_2 \sqsubseteq \dots$ of elements $p_i \in P, i \in \mathbb{N}$. A **descending chain** in $(P, \sqsubseteq)$ is a sequence $p_1 \sqsupseteq p_2 \sqsupseteq \dots$ of elements $p_i \in P, i \in \mathbb{N}$.
>
> The poset $(P, \sqsubseteq)$ is called **well founded** if any descending chain is finite.

197
docs/mathematics/set-theory/permutations.md
Normal file
@@ -0,0 +1,197 @@

# Permutations

## Definition

> *Definition*: let $X$ be a set.
>
> * A bijection of $X$ to itself is called a permutation of $X$. The set of all permutations of $X$ is denoted by $\text{Sym}(X)$ and is called the symmetric group on $X$.
> * The product $g \cdot h$ of two permutations $g,h$ in $\text{Sym}(X)$ is defined as the composition $g \circ h$ of $g$ and $h$.
> * If $X = \{1, \dots, n\}$ we write $\mathrm{Sym}_n$ instead of $\mathrm{Sym}(X)$.

<br>

> *Definition*: the identity map is defined as $\mathrm{id}: X \to X$ with $g = g \cdot \mathrm{id} = \mathrm{id} \cdot g$ for all $g$ in $\mathrm{Sym}(X)$. The inverse of $g$, denoted by $g^{-1}$, satisfies $g^{-1} \cdot g = g \cdot g^{-1} = \mathrm{id}$.

In matrix notation: let $g = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1\end{pmatrix}$ and $h = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3\end{pmatrix}$ with $g,h \in \mathrm{Sym}_3$, then we can take

$$
g \cdot h = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \\ \hline 2 & 1 & 3 \\ 3 & 2 & 1\end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1\end{pmatrix},
$$

and we have $g^{-1} = \begin{pmatrix} 2 & 3 & 1 \\1 & 2 & 3 \end{pmatrix}$.

<br>

> *Theorem*: $\mathrm{Sym}_n$ has exactly $n!$ elements.

??? note "*Proof*:"

    A permutation can be described in matrix notation by a $2$ by $n$ matrix with the numbers $1,\dots,n$ in the first row and the images in the second row. There are $n!$ possibilities to fill the second row.

We can also omit the matrix notation and use the list notation for permutations; then we have $g = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1\end{pmatrix} = [2,3,1]$, as the first row speaks for itself.

<br>

> *Definition*: the order of a permutation $g$ is the smallest positive integer $m$ such that $g^m = \mathrm{id}$.

For example the order of the permutation $[2,1,3]$ in $\mathrm{Sym}_3$ is 2.

If $g$ is a permutation in $\mathrm{Sym}_n$ then the permutations $g, g^2, g^3, \dots$ can not all be distinct, since there are only $n!$ distinct permutations in $\mathrm{Sym}_n$. So there must exist $r < s$ such that $g^r = g^s$. Since $g$ is a bijection there must be $g^{s-r} = \mathrm{id}$. So there exist positive numbers $m$ with $g^m = \mathrm{id}$ and in particular a smallest such number. Therefore each permutation $g$ has a well-defined order.
## Cycles

> *Definition*: the **fixed** points of a permutation $g$ of $\mathrm{Sym}(X)$ are the elements $x \in X$ for which $g(x) = x$ holds. The set of all fixed points is $\mathrm{fix}(g) = \{x \in X \;|\; g(x) = x\}$.
>
> The **support** of $g$ is the complement in $X$ of $\mathrm{fix}(g)$, denoted by $\mathrm{support}(g)$.

For example consider the permutation $g = [1,3,2,5,4,6] \in \mathrm{Sym}_6$. The fixed points of $g$ are 1 and 6. So $\mathrm{fix}(g) = \{1,6\}$. Thus the points moved by $g$ form the set $\mathrm{support}(g) = \{2,3,4,5\}$.

<br>

> *Definition*: let $g \in \mathrm{Sym}_n$ be a permutation with $\mathrm{support}(g) = \{a_1, \dots, a_m\}$ with $a_i$ pairwise distinct.
>
> We say $g$ is an $m$-cycle if $g(a_i) = a_{i+1}$ for all $i \in \{1, \dots, m-1\}$ and $g(a_m) = a_1$. For such a cycle $g$ we also use the cycle notation $(a_1, \dots, a_m)$.
>
> 2-cycles are called transpositions.

The composition of permutations in $\mathrm{Sym}_n$ is not commutative. This implies that for $g, h \in \mathrm{Sym}_n$ the products $g \cdot h$ and $h \cdot g$ are in general not the same.

Two cycles are called disjoint if the intersection of their supports is empty. Two disjoint cycles always commute.

For example in $\mathrm{Sym}_4$ the permutation $[2,1,4,3]$ is not a cycle, but it is the product of two disjoint cycles $(1,2)$ and $(3,4)$.

<br>

> *Theorem*: every permutation in $\mathrm{Sym}_n$ is a product of disjoint cycles. This product is unique up to rearrangement of the factors.

??? note "*Proof*:"

    Will be added later.

For example consider the permutation $g = [8,4,1,6,7,2,5,3]$ in $\mathrm{Sym}_8$. The following steps lead to the disjoint cycles decomposition.

Choose an element in the support of $g$, for example 1. Now construct the cycle

$$
(1,g(1),g^2(1),\dots),
$$

obtaining the cycle $(1,8,3)$.

Next choose an element in the support of $g$, but outside $\{1,3,8\}$, for example 2. Construct the cycle

$$
(2,g(2),g^2(2),\dots),
$$

obtaining the cycle $(2,4,6)$.

Choose an element in the support of $g$ but outside $\{1,2,3,4,6,8\}$, for example 5. Construct the cycle

$$
(5,g(5),g^2(5),\dots),
$$

obtaining the cycle $(5,7)$. Then $g$ and $(1,8,3) \cdot (2,4,6) \cdot (5,7)$ coincide on $\{1,\dots,8\}$ and the decomposition is finished. As these cycles are disjoint they commute, implying that $g$ can also be written as $(5,7) \cdot (1,8,3) \cdot (2,4,6)$ and $(2,4,6) \cdot (5,7) \cdot (1,8,3)$. The same procedure is implemented below.
|
||||||
|
|
||||||
|
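The construction in this example is already an algorithm; a sketch of it (a hypothetical helper, same conventions as the earlier snippet):

```python
def cycle_decomposition(g):
    """Disjoint cycle decomposition of a permutation in list notation (1-based)."""
    seen, cycles = set(), []
    for start in range(1, len(g) + 1):
        if start in seen or g[start - 1] == start:
            continue  # skip fixed points and elements already placed in a cycle
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = g[x - 1]
        cycles.append(tuple(cycle))
    return cycles

print(cycle_decomposition([8, 4, 1, 6, 7, 2, 5, 3]))
# [(1, 8, 3), (2, 4, 6), (5, 7)]
```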
<br>

> *Definition*: the cycle structure of a permutation $g$ is the sequence of the cycle lengths in an expression of $g$ as a product of disjoint cycles.

This means that every permutation has a unique cycle structure.

## Conjugation

The choice $X = \{1, \dots, n\}$ fixes the set $X$ under consideration. Suppose a different numbering of the elements in $X$ is chosen. How may a permutation of $X$ be compared with respect to two different numberings?

> *Lemma*: let $h$ be a permutation in $\mathrm{Sym}_n$.
>
> * For every cycle $(a_1, \dots, a_m)$ in $\mathrm{Sym}_n$ we have
> $$
> h \cdot (a_1, \dots, a_m) \cdot h^{-1} = (h(a_1), \dots, h(a_m)).
> $$
>
> * If $g_1, \dots, g_k$ are in $\mathrm{Sym}_n$, then $h \cdot g_1 \cdots g_k \cdot h^{-1} = h g_1 h^{-1} \cdots h g_k h^{-1}$. In particular, if $g_1, \dots, g_k$ are disjoint cycles, then $h \cdot g_1 \cdots g_k \cdot h^{-1}$ is the product of the disjoint cycles $h g_1 h^{-1}, \dots, h g_k h^{-1}$.

??? note "*Proof*:"

    Will be added later.

Conjugation is similar to basis transformation in linear algebra.

<br>

> *Theorem*: two permutations $g$ and $h$ in $\mathrm{Sym}_n$ have the same cycle structure if and only if there exists a permutation $k$ in $\mathrm{Sym}_n$ with $g = k \cdot h \cdot k^{-1}$.

??? note "*Proof*:"

    Will be added later.

<br>

> *Corollary*: being conjugate is an equivalence relation on $\mathrm{Sym}_n$.

??? note "*Proof*:"

    Two elements in $\mathrm{Sym}_n$ are conjugate if and only if they have the same cycle structure. But having the same cycle structure is reflexive, symmetric and transitive.

For example, in $\mathrm{Sym}_4$ the permutations $g = [2,1,4,3]$ and $h=[3,4,1,2]$ are conjugate, since both have the cycle structure $2,2$: $g = (1,2) \cdot (3,4)$ and $h = (1,3) \cdot (2,4)$. A permutation $k$ such that $k \cdot g \cdot k^{-1} = h$ is $k = [1,3,2,4] = (2,3)$.
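A quick numerical check of this example (reusing the hypothetical `compose` helper from the earlier sketch):

```python
def inverse(g):
    """Inverse of a permutation in list notation (1-based)."""
    inv = [0] * len(g)
    for i, v in enumerate(g, start=1):
        inv[v - 1] = i
    return inv

g = [2, 1, 4, 3]  # (1,2)·(3,4)
k = [1, 3, 2, 4]  # (2,3)
print(compose(compose(k, g), inverse(k)))  # [3, 4, 1, 2], i.e. (1,3)·(2,4) = h
```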
<br>

> *Theorem*: let $n \geq 2$. Every permutation of $\mathrm{Sym}_n$ is the product of transpositions.

??? note "*Proof*:"

    Since every permutation in $\mathrm{Sym}_n$ can be written as a product of disjoint cycles, it suffices to show that every cycle is a product of 2-cycles. Now every $m$-cycle $(a_1, \dots, a_m)$ is equal to the product

    $$
    (a_1, a_2) \cdot (a_2, a_3) \cdots (a_{m-1}, a_m).
    $$

## Alternating groups

To distinguish permutations by the parity (even or odd) of the number of factors in such products, the following result is needed.

> *Theorem*: if a permutation can be written in two ways as a product of 2-cycles, then both products have even length or both products have odd length.

??? note "*Proof*:"

    Will be added later.

From this theorem the following definition follows.

> *Definition*: let $g$ be a permutation of $\mathrm{Sym}_n$. The sign of $g$, denoted by $\mathrm{sign}(g)$, is defined as
>
> * $1$ if $g$ can be written as a product of an even number of 2-cycles, and
> * $-1$ if $g$ can be written as a product of an odd number of 2-cycles.
>
> We say that $g$ is even if $\mathrm{sign}(g)=1$ and odd if $\mathrm{sign}(g)=-1$.

<br>

> *Theorem*: for all permutations $g,h$ in $\mathrm{Sym}_n$, we have
>
> $$
> \mathrm{sign}(g \cdot h) = \mathrm{sign}(g) \cdot \mathrm{sign}(h).
> $$

??? note "*Proof*:"

    Let $g$ and $h$ be elements of $\mathrm{Sym}_n$. If one of the permutations is even and the other is odd, then $g \cdot h$ can be written as the product of an odd number of 2-cycles and is therefore odd. If $g$ and $h$ are both even or both odd, then the product $g \cdot h$ can be written as the product of an even number of 2-cycles, so that $g \cdot h$ is even.

The fact that sign is multiplicative implies that products and inverses of even permutations are even; this gives rise to the following definition.

> *Definition*: by $\mathrm{Alt}_n$ we denote the set of even permutations in $\mathrm{Sym}_n$, called the alternating group on $n$ letters.
>
> The alternating group is closed with respect to taking products and inverse elements.

For example, for $n=3$ the even permutations are given by $[1,2,3] = \mathrm{id}$, $[3,1,2] = (1,3,2)$ and $[2,3,1] = (1,2,3)$.
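Since an $m$-cycle is a product of $m-1$ transpositions, the sign can be read off a disjoint cycle decomposition; a sketch reusing the hypothetical `cycle_decomposition` helper from above:

```python
from itertools import permutations

def sign(g):
    """sign(g): each m-cycle in the decomposition contributes (-1)^(m-1)."""
    return (-1) ** sum(len(c) - 1 for c in cycle_decomposition(g))

alt3 = [list(p) for p in permutations([1, 2, 3]) if sign(list(p)) == 1]
print(alt3)  # [[1, 2, 3], [2, 3, 1], [3, 1, 2]], i.e. 3!/2 = 3 permutations
```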
<br>

> *Theorem*: for $n > 1$ the alternating group $\mathrm{Alt}_n$ contains precisely $\frac{n!}{2}$ permutations.

??? note "*Proof*:"

    A permutation $g$ of $\mathrm{Sym}_n$ is even if and only if the product $g \cdot (1,2)$ is odd. Hence the map $g \mapsto g \cdot (1,2)$ defines a bijection between the even and the odd permutations of $\mathrm{Sym}_n$. Then half of the $n!$ permutations of $\mathrm{Sym}_n$ are even.
99
docs/mathematics/set-theory/recursion-induction.md
Normal file
@ -0,0 +1,99 @@
# Recursion and induction

## Recursion

A recursively defined function $f$ needs two ingredients:

* a *base*, where the function value $f(n)$ is defined, for some value of $n$.
* a *recursion*, in which the value of the function at $n$ is expressed in terms of its values at arguments smaller than $n$.

For example, the sum

$$
\begin{align*}&\sum_{i=1}^1 i = 1,\\ &\sum_{i=1}^{n+1} i = (n + 1) + \sum_{i=1}^{n} i.\end{align*}
$$

Or the product

$$
\begin{align*}&\prod_{i=1}^0 i = 1,\\ &\prod_{i=1}^{n+1} i = (n+1) \cdot \prod_{i=1}^{n} i.\end{align*}
$$
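These two definitions translate directly into recursive functions; a minimal sketch (our own illustration, not from the notes):

```python
def rec_sum(n):
    """Recursive sum 1 + 2 + ... + n: base case n = 1, recursion on n - 1."""
    return 1 if n == 1 else n + rec_sum(n - 1)

def rec_prod(n):
    """Recursive product 1 * 2 * ... * n with the empty product 1 as the base."""
    return 1 if n == 0 else n * rec_prod(n - 1)

print(rec_sum(5))   # 15
print(rec_prod(5))  # 120
```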
## Induction

> *Principle* **- Natural induction**: suppose $P(n)$ is a predicate for $n \in \mathbb{Z}$, let $b \in \mathbb{Z}$. If the following holds
>
> * $P(b)$ is true,
> * for all $k \in \mathbb{Z}$, $k \geq b$ we have that $P(k)$ implies $P(k+1)$.
>
> Then $P(n)$ is true for all $n \geq b$.

For example, we claim that $\forall n \in \mathbb{N}$ we have

$$
\sum_{i=1}^n i = \frac{n}{2} (n+1).
$$

We first check the claim for $n=1$:

$$
\sum_{i=1}^1 i = \frac{1}{2} (1+1) = 1.
$$

Now suppose that for some $k \in \mathbb{N}$

$$
\sum_{i=1}^k i = \frac{k}{2} (k+1).
$$

Then by assumption

$$
\begin{align*}
\sum_{i=1}^{k+1} i &= \sum_{i=1}^k i + (k+1), \\
&= \frac{k}{2}(k+1) + (k+1), \\
&= \frac{k+1}{2}(k+2).
\end{align*}
$$

Hence if the claim holds for some $k \in \mathbb{N}$ then it also holds for $k+1$. The principle of natural induction now implies that $\forall n \in \mathbb{N}$ we have

$$
\sum_{i=1}^n i = \frac{n}{2}(n+1).
$$
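A one-line numerical spot check of this closed form (illustration only):

```python
assert all(sum(range(1, n + 1)) == n * (n + 1) // 2 for n in range(1, 100))
```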
> *Principle* **- Strong induction**: suppose $P(n)$ is a predicate for $n \in \mathbb{Z}$, let $b \in \mathbb{Z}$. If the following holds
>
> * $P(b)$ is true,
> * for all $k \in \mathbb{Z}$, $k \geq b$ we have that $P(b), P(b+1), \dots, P(k-1)$ and $P(k)$ together imply $P(k+1)$.
>
> Then $P(n)$ is true for all $n \geq b$.

For example, we claim for the recursion

$$
\begin{align*}
&a_1 = 1, \\
&a_2 = 3, \\
&a_n = a_{n-2} + 2 a_{n-1}
\end{align*}
$$

that $a_n$ is odd $\forall n \in \mathbb{N}$.

We first check the claim for $n=1$ and $n=2$; from the definition of the recursion it may be observed that it is true.

Now suppose that $a_i$ is odd for all $i \in \{1, \dots, k\}$.

Then by assumption

$$
a_{k+1} = a_{k-1} + 2 a_k,
$$

where $a_{k-1}$ is odd by the induction hypothesis and $2 a_k$ is even, so $a_{k+1}$ is odd.
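A brief numerical check of the claim (illustration only):

```python
def a(n):
    """The recursively defined sequence a_1 = 1, a_2 = 3, a_n = a_{n-2} + 2 a_{n-1}."""
    seq = [1, 3]
    while len(seq) < n:
        seq.append(seq[-2] + 2 * seq[-1])
    return seq[n - 1]

print(all(a(n) % 2 == 1 for n in range(1, 30)))  # True
```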
183
docs/mathematics/set-theory/relations.md
Normal file
@ -0,0 +1,183 @@
# Relations

## Binary relations

> *Definition*: a binary relation $R$ between the sets $S$ and $T$ is a subset of the Cartesian product $S \times T$.
>
> * If $(a,b) \in R$ then $a$ is in relation $R$ to $b$, denoted by $aRb$.
> * The set $S$ is called the domain of the relation $R$ and the set $T$ the codomain.
> * If $S=T$ then $R$ is a relation on $S$.
> * This definition can be expanded to $n$-ary relations.

<br>

> *Definition*: let $R$ be a relation from a set $S$ to a set $T$. Then for each element $a \in S$ we define $[a]_R$ to be the set
>
> $$
> [a]_R := \{b \in T \;|\; aRb\}.
> $$
>
> This set is called the ($R$-) image of $a$.
>
> For $b \in T$ the set
>
> $$
> _R[b] := \{a \in S \;|\; aRb\}
> $$
>
> is called the ($R$-) pre-image of $b$ or $R$-fiber of $b$.

<br>

Relations between finite sets can be described using matrices.

> *Definition*: if $S = \{s_1, \dots, s_n\}$ and $T = \{t_1, \dots, t_m\}$ are finite sets and $R \subseteq S \times T$ is a binary relation, then the adjacency matrix $A_R$ of the relation $R$ is the $n \times m$ matrix whose rows are indexed by $S$ and columns by $T$, defined by
>
> $$
> (A_R)_{s,t} = \begin{cases} 1 &\text{ if } (s,t) \in R, \\ 0 &\text{ otherwise}. \end{cases}
> $$

For example, the adjacency matrix of the relation $\leq$ on the set $\{1,2,3,4,5\}$ is the upper triangular matrix

$$
\begin{pmatrix} 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1\end{pmatrix}.
$$
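Such adjacency matrices are straightforward to build from the defining predicate; a minimal sketch (illustration only):

```python
S = range(1, 6)
A = [[1 if s <= t else 0 for t in S] for s in S]  # adjacency matrix of <= on {1,...,5}
for row in A:
    print(row)  # prints the upper triangular matrix above, row by row
```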
<br>

Some relations have special properties.

> *Definitions*: let $R$ be a relation on a set $S$. Then $R$ is called
>
> * *Reflexive* if $\forall x \in S$ we have $(x,x) \in R$.
> * *Irreflexive* if $\forall x \in S$ we have $(x,x) \notin R$.
> * *Symmetric* if $\forall x,y \in S$ we have that $xRy \implies yRx$.
> * *Antisymmetric* if $\forall x,y \in S$ we have that $xRy \land yRx \implies x = y$.
> * *Transitive* if $\forall x,y,z \in S$ we have that $xRy \land yRz \implies xRz$.

## Equivalence relations

> *Definition*: a relation $R$ on a set $S$ is called an equivalence relation on $S$ if and only if it is reflexive, symmetric and transitive.

<br>

> *Lemma*: let $R$ be an equivalence relation on a set $S$. If $b \in [a]_R$, then $[b]_R = [a]_R$.

??? note "*Proof*:"

    Suppose $b \in [a]_R$, therefore $aRb$. If $c \in [b]_R$, then $bRc$ and, as $aRb$, by transitivity $aRc$. In particular $[b]_R \subseteq [a]_R$. By symmetry of $R$, $aRb \implies bRa$ and hence $a \in [b]_R$, obtaining $[a]_R \subseteq [b]_R$.

<br>

> *Definition*: let $R$ be an equivalence relation on a set $S$. Then the sets $[s]_R$ where $s \in S$ are called the $R$-equivalence classes on $S$. The set of $R$-equivalence classes is denoted by $S/R$.

<br>

> *Theorem*: let $R$ be an equivalence relation on a set $S$. Then the set $S/R$ of $R$-equivalence classes partitions the set $S$.

??? note "*Proof*:"

    Let $\Pi_R$ be the set of $R$-equivalence classes. Then by reflexivity of $R$ we find that each element $a \in S$ is inside the class $[a]_R$ of $\Pi_R$. If an element $a \in S$ is in the classes $[b]_R$ and $[c]_R$ of $\Pi_R$, then by the previous lemma we find $[b]_R = [a]_R$ and $[c]_R = [a]_R$. Then $[b]_R = [c]_R$, therefore each element $a \in S$ is inside a unique member of $\Pi_R$, which therefore is a partition of $S$.

## Composition of relations

If $R_1$ and $R_2$ are two relations between a set $S$ and $T$, new relations can be formed between $S$ and $T$ by taking the intersection $R_1 \cap R_2$, the union $R_1 \cup R_2$ or the complement $R_1 \backslash R_2$. Furthermore, a relation $R^\top$ from $T$ to $S$ can be considered as the relation $\{(t,s) \in T \times S \;|\; (s,t) \in R\}$, and the identity relation is given by $I = \{(s, t) \in S \times T \;|\; s = t\}$.

Another way of making new relations out of existing ones is by taking the composition.

> *Definition*: if $R_1$ is a relation between $S$ and $T$ and $R_2$ is a relation between $T$ and $U$ then the composition $R = R_1;R_2$ is the relation between $S$ and $U$ defined by $sRu$ for $s \in S$ and $u \in U$, if and only if there is a $t \in T$ with $sR_1t$ and $tR_2u$.

<br>

> *Proposition*: suppose $R_1$ is a relation from $S$ to $T$, $R_2$ a relation from $T$ to $U$ and $R_3$ a relation from $U$ to $V$. Then $R_1;(R_2;R_3) = (R_1;R_2);R_3$. Composing relations is associative.

??? note "*Proof*:"

    Suppose $s \in S$ and $v \in V$ with $s(R_1;(R_2;R_3))v$. Then a $t \in T$ with $sR_1t$ and $t(R_2;R_3)v$ can be found. Then there is also a $u \in U$ with $tR_2u$ and $uR_3v$. For this $u$ we have $s(R_1;R_2)u$ and $uR_3v$ and hence $s((R_1;R_2);R_3)v$.

    Similarly, if $s \in S$ and $v \in V$ with $s((R_1;R_2);R_3)v$, then a $u \in U$ with $s(R_1;R_2)u$ and $uR_3v$ can be found. Then there is also a $t \in T$ with $sR_1t$ and $tR_2u$. For this $t$ we have $t(R_2;R_3)v$ and $sR_1t$ and hence $s(R_1;(R_2;R_3))v$.
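A small sketch of the definition of composition, with relations represented as sets of pairs (illustration only):

```python
def compose_relations(r1, r2):
    """Composition R1;R2: pairs (s, u) with some t such that (s, t) in R1 and (t, u) in R2."""
    return {(s, u) for (s, t1) in r1 for (t2, u) in r2 if t1 == t2}

R1 = {(1, "a"), (2, "b")}
R2 = {("a", "x"), ("b", "y")}
print(compose_relations(R1, R2))  # {(1, 'x'), (2, 'y')}
```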
## Transitive closure

> *Lemma*: let $\ell$ be a collection of relations $R$ on a set $S$. If all relations $R$ in $\ell$ are transitive, reflexive or symmetric, then the relation $\bigcap_{R \in \ell} R$ is also transitive, reflexive or symmetric respectively.

??? note "*Proof*:"

    Let $\bar R = \bigcap_{R \in \ell} R$. Suppose all members of $\ell$ are transitive. Then for all $a,b,c \in S$ with $a \bar R b$ and $b \bar R c$ there is $aRb$ and $bRc$ for all $R \in \ell$. Thus by transitivity of each $R \in \ell$ there is also $aRc$ for each $R \in \ell$. Thus there is $a \bar R c$. Hence $\bar R$ is also transitive.

    Proof for symmetric relation will follow.

    Proof for reflexive relation will follow.

The above lemma makes it possible to define the reflexive, symmetric or transitive closure of a relation $R$ on a set $S$. It is the smallest reflexive, symmetric or transitive relation containing $R$.

For example, suppose $R = \{(1,2), (2,2), (2,3), (5,4)\}$ is a relation on $S = \{1, 2, 3, 4, 5\}$.

: The reflexive closure of $R$ is then the relation

$$
\big\{(1,1), (1,2), (2,2), (2,3), (3,3), (4,4), (5,5), (5,4) \big\},
$$

the symmetric closure of $R$ is then the relation

$$
\big\{ (1,2), (2,1), (2,2), (2,3), (3,2), (4,5), (5,4) \big\},
$$

and the transitive closure of $R$ is then the relation

$$
\{(1,2), (1,3), (2,2), (2,3), (5,4)\}.
$$

It may be observed that the reflexive closure of $R$ equals the relation $I \cup R$ and the symmetric closure equals $R \cup R^\top$. For the transitive closure we have the following:

> *Proposition*: $\bigcup_{n > 0} R^n$ is the transitive closure of the relation $R$ on a set $S$.

??? note "*Proof*:"

    Define $\bar R = \bigcup_{n>0} R^n$. To show that $\bar R$ is the least transitive relation containing $R$, $\bar R$ must contain $R$, must be transitive and must be the smallest set with both of those properties.

    By construction $\bar R$ contains all of the $R^i$, $i \in \mathbb{N}$, so in particular $\bar R$ contains $R$.

    If $(s_1, s_2), (s_2, s_3) \in \bar R$, then $(s_1, s_2) \in R^j$ and $(s_2, s_3) \in R^k$ for some $j,k$. Since composition is [associative](#composition-of-relations), $R^{j+k} = R^j ; R^k$ and hence $(s_1, s_3) \in R^{j+k} \subseteq \bar R$.

    We claim that if $T$ is any transitive relation containing $R$, then $\bar R \subseteq T$. It suffices to show that $R^n \subseteq T$ for all $n \in \mathbb{N}$, which we prove by induction.

    : We first check for $n=1$

    $$
    R^1 = R \subseteq T.
    $$

    : Now suppose that for some $k \in \mathbb{N}$ we have $R^k \subseteq T$. Let $(s_1, s_3) \in R^{k+1} = R^k ; R$, then $(s_1, s_2) \in R^k$ and $(s_2, s_3) \in R$ for some $s_2$. Hence $(s_1, s_2), (s_2, s_3) \in T$ and by transitivity of $T$, $(s_1, s_3) \in T$.

    Hence if the claim holds for some $k \in \mathbb{N}$ then it also holds for $k+1$. The principle of natural induction now implies that $R^n \subseteq T$ for all $n \in \mathbb{N}$, and therefore $\bar R \subseteq T$.

Suppose a relation $R$ on a finite set $S$ of size $n$ is given by its adjacency matrix $A_R$. Then Warshall's algorithm is a method for finding the adjacency matrix of the transitive closure of the relation $R$.

> *Algorithm* **- Warshall's algorithm**: for an adjacency matrix $A_R = M_0$ of a relation $R$ on $n$ elements there will be $n$ steps taken to obtain the adjacency matrix of the transitive closure of the relation $R$. Let $C_i$ and $R_i$ be the sets of indices of the nonzero entries in the $i$th column and $i$th row of $M_{i-1}$. In each step a new matrix $M_i$ is obtained by adding the pairs in $C_i \times R_i$ to $M_{i-1}$. After $n$ steps $A_{\bar R}$ is obtained.

For example, let $R$ be a relation on $S = \{1,2,3,4\}$ with $R = \{(2,1), (2,3), (3,1), (3,4), (4,1), (4,3)\}$; we determine the transitive closure $\bar R$ of $R$ with Warshall's algorithm.

: The adjacency matrix of the relation $R$ is given by

$$
A_R = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 0\end{pmatrix}.
$$

We have $C_1 = \{2,3,4\}$ and $R_1 = \varnothing$, therefore $C_1 \times R_1 = \varnothing$ and no additions will be made, $M_1 = A_R$.

We have $C_2 = \varnothing$ and $R_2 = \{1,3\}$, therefore $C_2 \times R_2 = \varnothing$ and no additions will be made, $M_2 = M_1$.

We have $C_3 = \{2,4\}$ and $R_3 = \{1,4\}$, therefore $C_3 \times R_3 = \{(2,1), (2,4), (4,1), (4,4)\}$, obtaining the matrix

$$
M_3 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1\end{pmatrix}.
$$

We have $C_4 = \{2,3,4\}$ and $R_4 = \{1,3,4\}$, therefore $C_4 \times R_4 = \{(2,1), (2,3), (2,4), (3,1), (3,3), (3,4), (4,1), (4,3), (4,4)\}$, obtaining the final matrix

$$
M_4 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1\end{pmatrix} = A_{\bar R}.
$$
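A compact sketch of Warshall's algorithm (illustration only, 0-based indices), verifying the worked example:

```python
def warshall(m):
    """Transitive closure of a 0/1 adjacency matrix via Warshall's algorithm."""
    n = len(m)
    m = [row[:] for row in m]  # work on a copy
    for i in range(n):
        col = [s for s in range(n) if m[s][i]]  # C_i: rows with a 1 in column i
        row = [t for t in range(n) if m[i][t]]  # R_i: columns with a 1 in row i
        for s in col:
            for t in row:
                m[s][t] = 1                     # add the pairs C_i x R_i
    return m

A = [[0, 0, 0, 0],
     [1, 0, 1, 0],
     [1, 0, 0, 1],
     [1, 0, 1, 0]]
print(warshall(A))  # [[0,0,0,0], [1,0,1,1], [1,0,1,1], [1,0,1,1]], matching M_4
```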
165
docs/mathematics/set-theory/sets.md
Normal file
@ -0,0 +1,165 @@
# Sets

## Sets and subsets

> *Definition*: a set is a collection of elements uniquely determined by these elements.

Examples are $\mathbb{N}$, the set of natural numbers; $\mathbb{Z}$, the set of integers; $\mathbb{Q}$, the set of rational numbers; $\mathbb{R}$, the set of real numbers; and $\mathbb{C}$, the set of complex numbers.

<br>

> *Definition*: suppose $A$ and $B$ are sets. Then $A$ is called a subset of $B$, if for every element $a \in A$ there also is $a \in B$. Then $B$ contains $A$, denoted by $A \subseteq B$.

A subset $A$ of a set $B$ which is not the empty set $\varnothing$ nor the full set $B$ is called a proper subset of $B$, denoted by $A \subsetneq B$; the struck line under the symbol indicates properness. For example $\mathbb{N} \subsetneq \mathbb{Z}$.

<br>

> *Definition*: if $B$ is a set, then $\wp(B)$ denotes the set of all subsets $A$ of $B$. The set $\wp(B)$ is called the power set of $B$.

Suppose for example that $B = \{x,y,z\}$, then $\wp(B) = \{\varnothing,\{x\},\{y\},\{z\},\{x,y\},\{x,z\},\{y,z\},\{x,y,z\}\}$.

<br>

> *Proposition*: let $B$ be a set with $n$ elements. Then its power set $\wp(B)$ contains $2^n$ elements.

??? note "*Proof*:"

    Let $B$ be a set with $n$ elements. A subset $A$ of $B$ is completely determined by its elements. For each element $b \in B$ there are two options, it is in $A$ or it is not. So, there are $2^n$ options and thus $2^n$ different subsets $A$ of $B$.
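A brief sketch illustrating the $2^n$ count (illustration only):

```python
from itertools import combinations

def powerset(b):
    """All subsets of b, enumerated by size."""
    return [set(c) for r in range(len(b) + 1) for c in combinations(b, r)]

B = {"x", "y", "z"}
print(len(powerset(B)))  # 8 = 2^3
```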
<br>

> *Proposition*: suppose $A$, $B$ and $C$ are sets. Then the following hold:
>
> 1. if $A \subseteq B$ and $B \subseteq C$ then $A \subseteq C$,
> 2. if $A \subseteq B$ and $B \subseteq A$ then $A = B$.

??? note "*Proof*:"

    To prove 1, suppose that $A \subseteq B$ and $B \subseteq C$. Let $a \in A$, then $a \in B$ and therefore $a \in C$.

    To prove 2, every element of $A$ is in $B$ and every element of $B$ is in $A$. As a set is uniquely determined by its elements, $A = B$.

<br>

> *Definition*: let $P$ be a predicate with reference set $X$, then
>
>$$
> \big\{x \in X \;\big|\; P(x) \big\}
>$$
>
> denotes the subset of $X$ consisting of all elements $x \in X$ for which statement $P(x)$ is true.

## Operations on sets

> *Definition*: let $A$ and $B$ be sets.
>
> * The intersection of $A$ and $B$ $(A \cap B)$ is the set of all elements contained in both $A$ and $B$.
> * The union of $A$ and $B$ $(A \cup B)$ is the set of elements that are in at least one of $A$ or $B$.
> * $A$ and $B$ are disjoint if the intersection $(A \cap B)$ is the empty set $\varnothing$.

<br>

> *Definition*: suppose $I$ is a set (an index set) and for each element $i \in I$ there exists a set $A_i$, then
>
> $$
> \bigcup_{i \in I} A_i := \big\{x \;\big|\; \text{there is an } i \in I \text{ with } x \in A_i \big\},
> $$
>
> and
>
> $$
> \bigcap_{i \in I} A_i := \big\{x \;\big|\; \text{for all } i \in I \text{ there is } x \in A_i \big\}.
> $$

This defines unions and intersections taken over an index set. For example, suppose for each $i \in \mathbb{N}$ the set $A_i$ is defined as $\{x \in \mathbb{R} \;|\; 0 \leq x \leq i \}$, then

$$
\bigcap_{i \in \mathbb{N}} A_i = \{0\},
$$

and

$$
\bigcup_{i \in \mathbb{N}} A_i = \mathbb{R}_{\geq 0}.
$$

<br>

> *Definition*: if $C$ is a collection of sets, then
>
> $$
> \bigcup_{A \in C} A := \big\{x \;\big|\; \text{there is an } A \in C \text{ with } x \in A \big\},
> $$
>
> and
>
> $$
> \bigcap_{A \in C} A := \big\{x \;\big|\; \text{for all } A \in C \text{ there is } x \in A \big\}.
> $$

<br>

> *Definition*: let $A$ and $B$ be sets. The difference of $A$ and $B$ $(A \backslash B)$ is the set of all elements from $A$ that are not in $B$.
>
>: The symmetric difference of $A$ and $B$ $(A \triangle B)$ is the set consisting of all elements that are in exactly one of $A$ or $B$.
>
>: If one is working inside a fixed set $U$ and only considering subsets of $U$, then the difference $U \backslash A$ is also called the complement of $A$ in $U$, denoted by $A^*$. In this case the set $U$ is called the universe.
## Cartesian products

Suppose $a_1, a_2, \dots, a_k$ are elements from some set, then the ordered $k$-tuple of $a_1, a_2, \dots, a_k$ is denoted by $(a_1, a_2, \dots, a_k)$.

> *Definition*: the Cartesian product $A_1 \times \dots \times A_k$ of sets $A_1, \dots , A_k$ is the set of all ordered $k$-tuples $(a_1, a_2, \dots, a_k)$ where $a_i \in A_i$ for $1 \leq i \leq k$.
>
>: If $A$ and $B$ are sets then
>
> $$
> A \times B = \big\{ (a,b) \;\big|\; a \in A,\; b \in B \big\}.
> $$

Notice that if $A_i = A$ for all $1 \leq i \leq k$, then $A_1 \times \dots \times A_k$ is also denoted by $A^k$.

## Partitions

> *Definition*: let $S$ be a nonempty set. A collection $\Pi$ of subsets is called a partition if and only if
>
> * $\varnothing \notin \Pi$,
> * $\bigcup_{X \in \Pi} X = S$,
> * for all $X \neq Y \in \Pi$ we have $X \cap Y = \varnothing$.

For example, the set $\{1,2, \dots , 10\}$ can be partitioned into the sets $\{1,2,3\}$, $\{4,5\}$ and $\{6,7,8,9,10\}$.

## Quantifiers

> *Definitions*: the universal quantifier "for all" is denoted by $\forall$ and the existential quantifier "there exists" is denoted by $\exists$.

<br>

> *Proposition* **- De Morgan's rule**: the statement
>
> $$
> \neg (\forall x \in X \;[P(x)])
> $$
>
> is equivalent to the statement
>
> $$
> \exists x \in X \;[\neg (P(x))].
> $$
>
> The statement
>
> $$
> \neg (\exists x \in X \;[P(x)])
> $$
>
> is equivalent to the statement
>
> $$
> \forall x \in X \; [\neg (P(x))].
> $$

??? note "*Proof*:"

    Will be added later.
57
docs/mathematics/topology/fiber-bundles.md
Normal file
@ -0,0 +1,57 @@
# Fiber bundles

Let $X$ be a manifold over a field $F$.

> *Definition 1*: a **fiber** $V_x$ at a point $x \in X$ on a manifold is a finite dimensional vector space. With the collection of fibers $V_x$ for all $x \in X$ define the **fiber bundle** as
>
> $$
> V = \bigcup_{x \in X} V_x.
> $$

Then by definition we have the projection map $\pi$ given by

$$
\pi: V \to X: (x,\mathbf{v}) \mapsto \pi(x, \mathbf{v}) \overset{\text{def}}{=} x,
$$

and its inverse

$$
\pi^{-1}: X \to V: x \mapsto \pi^{-1}(x) \overset{\text{def}}{=} V_x.
$$

Similarly, a dual fiber $V_x^*$ may be defined for $x \in X$, with its fiber bundle defined by

$$
V^* = \bigcup_{x \in X} V_x^*.
$$

> *Definition 2*: a **tensor fiber** $\mathscr{B}_x$ at a point $x \in X$ on a manifold is defined as
>
> $$
> \mathscr{B}_x = \bigcup_{p,q \in \mathbb{N}} \mathscr{T}^p_q(V_x).
> $$
>
> With the collection of tensor fibers $\mathscr{B}_x$ for all $x \in X$ define the **tensor fiber bundle** as
>
> $$
> \mathscr{B} = \bigcup_{x \in X} \mathscr{B}_x.
> $$

Then for a point $x \in X$ we have a tensor $\mathbf{T} \in \mathscr{B}_x$ such that

$$
\mathbf{T} = T^{ij}_k \mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{\hat e}^k,
$$

with $T^{ij}_k \in F$ the holors of $\mathbf{T}$. Furthermore, we have a basis $\{\mathbf{e}_i\}_{i=1}^n$ of $V_x$ and a basis $\{\mathbf{\hat e}^i\}_{i=1}^n$ of $V_x^*$.

> *Definition 3*: a tensor field $\mathbf{T}$ on a manifold $X$ is a [section]()
>
> $$
> \mathbf{T} \in \Gamma(X, \mathscr{B}),
> $$
>
> of the tensor fiber bundle $\mathscr{B}$.

Therefore, a tensor field assigns a tensor fiber (or tensor) to each point on a section of the manifold. These tensors may vary smoothly along the section of the manifold.
@ -0,0 +1,136 @@
# Equations of Hamilton

## The Hamiltonian

> *Definition 1*: let $\mathcal{L}: (\mathbf{q},\mathbf{q}',t) \mapsto \mathcal{L}(\mathbf{q},\mathbf{q}',t)$ be the Lagrangian of the system, suppose that the generalized momenta $\mathbf{p}$ are defined in terms of the active variables $\mathbf{q}'$ and the passive variables $(\mathbf{q},t)$ such that
>
> $$
> \mathbf{p} = \nabla_{\mathbf{q}'}\mathcal{L}(\mathbf{q},\mathbf{q}',t),
> $$
>
> for all $t \in \mathbb{R}$.

We may now pose that there exists a function that inverts this relation, which can be obtained with Legendre transforms.

> *Theorem 1*: there exists a function $\mathcal{H}: (\mathbf{q},\mathbf{p},t) \mapsto \mathcal{H}(\mathbf{q},\mathbf{p},t)$ such that
>
> $$
> \mathbf{q}' = \nabla_{\mathbf{p}} \mathcal{H}(\mathbf{q},\mathbf{p},t),
> $$
>
> for all $t \in \mathbb{R}$, where $\mathcal{H}$ is the Hamiltonian of the system and is related to the Lagrangian $\mathcal{L}$ by
>
> $$
> \mathcal{H}(\mathbf{q},\mathbf{p},t) = \langle \mathbf{q'}, \mathbf{p} \rangle - \mathcal{L}(\mathbf{q},\mathbf{q}',t),
> $$
>
> for all $t \in \mathbb{R}$ with $\mathcal{L}$ and $\mathcal{H}$ the Legendre transforms of each other.

??? note "*Proof*:"

    Will be added later.
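As an illustration (a standard example, not part of the original notes), consider the one-dimensional harmonic oscillator with Lagrangian $\mathcal{L}(q,q') = \frac{1}{2} m q'^2 - \frac{1}{2} m \omega^2 q^2$. Then $p = \partial_{q'} \mathcal{L} = m q'$, so $q' = p/m$ and

$$
\mathcal{H}(q,p) = q' p - \mathcal{L}(q,q') = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2.
$$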
## The equations of Hamilton

> *Corollary 1*: the partial derivatives of $\mathcal{L}$ and $\mathcal{H}$ with respect to the passive variables are related by
>
> $$
> \begin{align*}
> \nabla_{\mathbf{q}} \mathcal{H}(\mathbf{q},\mathbf{p},t) &= - \nabla_{\mathbf{q}} \mathcal{L}(\mathbf{q},\mathbf{q}',t), \\
> \partial_t \mathcal{H}(\mathbf{q},\mathbf{p},t) &= - \partial_t \mathcal{L}(\mathbf{q},\mathbf{q}',t),
> \end{align*}
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

Obtaining the equations of Hamilton

$$
\begin{align*}
\mathbf{p}' &= -\nabla_{\mathbf{q}} \mathcal{H}(\mathbf{q},\mathbf{p},t), \\
\mathbf{q}' &= \nabla_{\mathbf{p}} \mathcal{H}(\mathbf{q},\mathbf{p},t),
\end{align*}
$$

for all $t \in \mathbb{R}$.
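Continuing the oscillator illustration from above: with $\mathcal{H}(q,p) = \frac{p^2}{2m} + \frac{1}{2} m \omega^2 q^2$ the equations of Hamilton give

$$
q' = \partial_p \mathcal{H} = \frac{p}{m}, \qquad p' = -\partial_q \mathcal{H} = -m \omega^2 q,
$$

which combine to $q'' = -\omega^2 q$, the familiar equation of motion.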
> *Proposition 1*: when the Hamiltonian $\mathcal{H}$ has no explicit time dependence it is a constant of motion.

??? note "*Proof*:"

    Will be added later.

To put it differently: a Hamiltonian of a conservative autonomous system is conserved.

> *Theorem 2*: for conservative autonomous systems, the Hamiltonian $\mathcal{H}$ may be expressed as
>
> $$
> \mathcal{H}(\mathbf{q},\mathbf{p}) = T(\mathbf{q},\mathbf{p}) + V(\mathbf{q}),
> $$
>
> for all $t \in \mathbb{R}$ with $T: (\mathbf{q},\mathbf{p}) \mapsto T(\mathbf{q},\mathbf{p})$ and $V: \mathbf{q} \mapsto V(\mathbf{q})$ the kinetic and potential energy of the system.

??? note "*Proof*:"

    Will be added later.

It may be observed that the Hamiltonian $\mathcal{H}$ and [generalized energy](/en/physics/mechanics/lagrangian-mechanics/lagrange-generalizations/#the-generalized-energy) $h$ are identical. Note however that $\mathcal{H}$ must be expressed in $(\mathbf{q},\mathbf{p},t)$, which is not the case for $h$.

> *Proposition 2*: a coordinate $q_j$ is cyclic if
>
> $$
> \partial_{q_j} \mathcal{H}(\mathbf{q},\mathbf{p},t) = 0,
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

> *Proposition 3*: the Hamiltonian is separable if there exist two mutually independent subsystems.

??? note "*Proof*:"

    Will be added later.

## Poisson brackets

> *Definition 2*: let $G: (\mathbf{q},\mathbf{p},t) \mapsto G(\mathbf{q},\mathbf{p},t)$ be an arbitrary observable, its time derivative may be given by
>
> $$
> \begin{align*}
> d_t G(\mathbf{q},\mathbf{p},t) &= \sum_{j=1}^f \Big(\partial_{q_j} G q_j' + \partial_{p_j} G p_j' \Big) + \partial_t G, \\
> &= \sum_{j=1}^f \Big(\partial_{q_j} G \partial_{p_j} \mathcal{H} - \partial_{p_j} G \partial_{q_j} \mathcal{H} \Big) + \partial_t G, \\
> &\overset{\mathrm{def}}= \{G, \mathcal{H}\} + \partial_t G,
> \end{align*}
> $$
>
> for all $t \in \mathbb{R}$ with $\mathcal{H}$ the Hamiltonian and $\{G, \mathcal{H}\}$ the Poisson bracket of $G$ and $\mathcal{H}$.
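For instance (a standard identity, added here for illustration), applying the same bracket to the coordinates and momenta themselves gives the canonical relations

$$
\{q_i, q_j\} = 0, \qquad \{p_i, p_j\} = 0, \qquad \{q_i, p_j\} = \delta_{ij},
$$

which follow directly from the definition since $\partial_{q_k} q_i = \delta_{ik}$, $\partial_{p_k} q_i = 0$ and likewise for the momenta.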
The Poisson bracket may simplify expressions; it has distinct properties that are true for any observables. The following theorem demonstrates its usefulness even more.

> *Theorem 3*: let $f: (\mathbf{q}, \mathbf{p}, t) \mapsto f(\mathbf{q}, \mathbf{p}, t)$ and $g: (\mathbf{q}, \mathbf{p}, t) \mapsto g(\mathbf{q}, \mathbf{p}, t)$ be two integrals of Hamilton's equations given by
>
> $$
> \begin{align*}
> f(\mathbf{q}, \mathbf{p}, t) = c_1, \\
> g(\mathbf{q}, \mathbf{p}, t) = c_2,
> \end{align*}
> $$
>
> for all $t \in \mathbb{R}$ with $c_{1,2} \in \mathbb{R}$. Then
>
> $$
> \{f,g\} = c_3
> $$
>
> with $c_3 \in \mathbb{R}$ for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.
@ -0,0 +1,96 @@
# Hamiltonian formalism of mechanics

The Hamiltonian formalism of mechanics is based on the definitions posed by [Lagrangian mechanics](/en/physics/mechanics/lagrangian-mechanics/lagrangian-formalism) and the axioms, postulates and principles posed in the [Newtonian formalism](/en/physics/mechanics/newtonian-mechanics/newtonian-formalism/).

Where the Lagrangian formalism used the [principle of virtual work](/en/physics/mechanics/lagrangian-mechanics/lagrange-equations/#principle-of-virtual-work) to derive the Lagrangian equations of motion, the Hamiltonian formalism will derive the Lagrangian equations with the stationary action principle, a descendant of Fermat's principle of least time.

In Hamilton's formulation the stationary action principle is referred to as Hamilton's principle.

## Hamilton's principle

> *Principle 1*: of all the kinematically possible motions that take a mechanical system from one given configuration to another within a time interval $T \subset \mathbb{R}$, the actual motion is the stationary point of the time integral of the Lagrangian $\mathcal{L}$ of the system. Let $S$ be the functional of the trajectories of the system, then
>
> $$
> S = \int_T \mathcal{L} dt,
> $$
>
> has stationary points.

The functional $S$ is often referred to as the action of the system. With this principle the equations of Lagrange can be derived.

> *Theorem 1*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, the equations of Lagrange are given by
>
> $$
> \partial_{q_j} \mathcal{L}(\mathbf{q}, \mathbf{q'}) - d_t \Big(\partial_{q_j'} \mathcal{L}(\mathbf{q}, \mathbf{q'}) \Big) = 0,
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Let the redefined generalized coordinates $\mathbf{q}: (t,a) \mapsto \mathbf{q}(t,a)$ be given by

    $$
    \mathbf{q}(t,a) = \mathbf{\hat q}(t) + a \varepsilon(t),
    $$

    with $\mathbf{\hat q}: t \mapsto \mathbf{\hat q}(t)$ the generalized coordinates of the system and $\varepsilon: t \mapsto \varepsilon(t)$ a smooth function.

    Let $S: a \mapsto S(a)$ be the action of the system and let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian of the system, according to Hamilton's principle

    $$
    S(a) = \int_T \mathcal{L}(\mathbf{q}, \mathbf{q'})dt,
    $$

    for all $a \in \mathbb{R}$. To determine the stationary points we must have that $S'(0) = 0$. We have that $S'$ is given by

    $$
    \begin{align*}
    S'(a) &= \int_T \partial_a \mathcal{L}(\mathbf{q}, \mathbf{q'})dt, \\
    &= \int_T \sum_{j=1}^f \bigg(\partial_{q_j} \mathcal{L} \partial_a q_j + \partial_{q_j'} \mathcal{L} \partial_a q_j'\bigg)dt, \\
    &= \int_T \sum_{j=1}^f \bigg(\partial_{q_j} \mathcal{L} \varepsilon_j(t) + \partial_{q_j'} \mathcal{L} \partial_a \partial_t q_j\bigg)dt.
    \end{align*}
    $$

    Integration by parts may be used for the second term:

    $$
    \begin{align*}
    \int_T \partial_{q_j'} \mathcal{L} \partial_a \partial_t q_j dt &= \Big[\partial_{q_j'} \mathcal{L} \partial_a q_j \Big]_T - \int_T \partial_a q_j d_t (\partial_{q_j'} \mathcal{L})dt, \\
    &= \Big[\partial_{q_j'} \mathcal{L} \varepsilon_j(t) \Big]_T - \int_T \partial_a q_j d_t (\partial_{q_j'} \mathcal{L})dt.
    \end{align*}
    $$

    Choose $\varepsilon_j$ such that

    $$
    \Big[\partial_{q_j'} \mathcal{L} \varepsilon_j(t) \Big]_T = 0.
    $$

    This obtains

    $$
    \int_T \partial_{q_j'} \mathcal{L} \partial_a \partial_t q_j dt = - \int_T \partial_a q_j d_t (\partial_{q_j'} \mathcal{L})dt.
    $$

    The general expression of $S'$ may now be given by

    $$
    \begin{align*}
    S'(a) &= \int_T \sum_{j=1}^f \bigg(\partial_{q_j} \mathcal{L} \varepsilon_j(t) - \partial_a q_j d_t (\partial_{q_j'} \mathcal{L})\bigg)dt, \\
    &= \int_T \sum_{j=1}^f \bigg(\partial_{q_j} \mathcal{L} \varepsilon_j(t) - \varepsilon_j(t) d_t (\partial_{q_j'} \mathcal{L})\bigg)dt, \\
    &= \sum_{j=1}^f \int_T \varepsilon_j(t) \Big(\partial_{q_j} \mathcal{L} - d_t (\partial_{q_j'} \mathcal{L})\Big)dt.
    \end{align*}
    $$

    Then

    $$
    S'(0) = \sum_{j=1}^f \int_T \varepsilon_j(t) \Big(\partial_{q_j} \mathcal{L} - d_t (\partial_{q_j'} \mathcal{L})\Big)dt = 0,
    $$

    since $\varepsilon_j$ can be chosen arbitrarily this implies that

    $$
    \partial_{q_j} \mathcal{L} - d_t (\partial_{q_j'} \mathcal{L}) = 0.
    $$
@ -0,0 +1,97 @@
# The equations of Lagrange

## Principle of virtual work

> *Definition 1*: a virtual displacement is a displacement at a fixed moment in time that is consistent with the constraints at that moment.

The following principle addresses the problem that the constraint forces are generally unknown.

> *Principle 1*: let $\delta \mathbf{x}_i \in \mathbb{R}^m$ be a virtual displacement and let $\mathbf{F}_i: \mathbf{q} \mapsto \mathbf{F}_i(\mathbf{q})$ be the total force excluding the constraint forces. Then
>
> $$
> \sum_{i=1}^n \Big\langle \mathbf{F}_i(\mathbf{q}) - m_i \mathbf{x}_i''(\mathbf{q}), \delta \mathbf{x}_i \Big\rangle = 0,
> $$
>
> is true for scleronomic constraints and all $t \in \mathbb{R}$.

This implies that the constraint forces do not do any (net) virtual work.

## The equations of Lagrange

> *Theorem 1*: let $T: (\mathbf{q}, \mathbf{q}') \mapsto T(\mathbf{q}, \mathbf{q'})$ be the kinetic energy of the system. For holonomic constraints we have that
>
> $$
> d_t \Big(\partial_{q_j'} T(\mathbf{q},\mathbf{q}') \Big) - \partial_{q_j} T(\mathbf{q},\mathbf{q}') = Q_j(\mathbf{q}),
> $$
>
> for all $t \in \mathbb{R}$, with $Q_j: \mathbf{q} \mapsto Q_j(\mathbf{q})$ the generalized forces of type I given by
>
> $$
> Q_j(\mathbf{q}) = \sum_{i=1}^n \Big\langle \mathbf{F}_i(\mathbf{q}), \partial_j \mathbf{x}_i(\mathbf{q}) \Big\rangle,
> $$
>
> for all $t \in \mathbb{R}$ with $\mathbf{F}_i: \mathbf{q} \mapsto \mathbf{F}_i(\mathbf{q})$ the total force excluding the constraint forces.

??? note "*Proof*:"

    Will be added later.

Obtaining the equations of Lagrange. Note that the position of each point mass $\mathbf{x}_i$ is defined in the [Lagrangian formalism](lagrangian-formalism.md#generalizations).

### Conservative systems

For conservative systems we may express the force $\mathbf{F}_i: \mathbf{q} \mapsto \mathbf{F}_i(\mathbf{q})$ in terms of a potential energy $V: X \mapsto V(X)$ by

$$
\mathbf{F}_i(\mathbf{q}) = -\nabla_i V(X),
$$

for $X: \mathbf{q} \mapsto X(\mathbf{q}) \overset{\mathrm{def}}= \{\mathbf{x}_i(\mathbf{q})\}_{i=1}^n$.

> *Lemma 1*: for a conservative holonomic system the generalized forces of type I $Q_j: \mathbf{q} \mapsto Q_j(\mathbf{q})$ may be expressed in terms of the potential energy $V: \mathbf{q} \mapsto V(\mathbf{q})$ by
>
> $$
> Q_j(\mathbf{q}) = -\partial_{q_j} V(\mathbf{q}),
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

The equations of Lagrange may now be rewritten, which obtains the following lemma.

> *Lemma 2*: let $T: (\mathbf{q}, \mathbf{q}') \mapsto T(\mathbf{q}, \mathbf{q'})$ and $V: \mathbf{q} \mapsto V(\mathbf{q})$ be the kinetic and potential energy of the system. The Lagrange equations for conservative systems are given by
>
> $$
> d_t \Big(\partial_{q_j'} T(\mathbf{q},\mathbf{q}')\Big) - \partial_{q_j}T(\mathbf{q},\mathbf{q}') = - \partial_{q_j} V(\mathbf{q}),
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

> *Definition 2*: let $T: (\mathbf{q}, \mathbf{q}') \mapsto T(\mathbf{q}, \mathbf{q'})$ and $V: \mathbf{q} \mapsto V(\mathbf{q})$ be the kinetic and potential energy of the system. The Lagrangian $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ is defined as
>
> $$
> \mathcal{L}(\mathbf{q}, \mathbf{q'}) = T(\mathbf{q},\mathbf{q}') - V(\mathbf{q}),
> $$
>
> for all $t \in \mathbb{R}$.

With this definition we may write the Lagrange equations in a more formal way.

> *Theorem 2*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, the equations of Lagrange for conservative holonomic systems are given by
>
> $$
> d_t \Big(\partial_{q_j'} \mathcal{L}(\mathbf{q}, \mathbf{q'}) \Big) - \partial_{q_j} \mathcal{L}(\mathbf{q}, \mathbf{q'}) = 0,
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.
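As a brief illustration (a standard example, not part of the original notes): for a mass $m$ on a spring with stiffness $k$, take $T(q') = \frac{1}{2} m q'^2$ and $V(q) = \frac{1}{2} k q^2$, so $\mathcal{L} = \frac{1}{2} m q'^2 - \frac{1}{2} k q^2$. Theorem 2 then gives

$$
d_t(m q') + k q = 0 \quad \Longleftrightarrow \quad m q'' = -k q,
$$

which is Newton's second law for the harmonic oscillator.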
@ -0,0 +1,96 @@
# Lagrange generalizations

## The generalized momentum and force

> *Definition 1*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, the **generalized momentum** $p_j: (\mathbf{q}, \mathbf{q}') \mapsto p_j(\mathbf{q},\mathbf{q}')$ is defined as
>
> $$
> p_j(\mathbf{q},\mathbf{q}') = \partial_{q_j'} \mathcal{L}(\mathbf{q}, \mathbf{q'}),
> $$
>
> for all $t \in \mathbb{R}$.

The generalized momentum may also be referred to as the canonical or conjugate momentum. Recall that $j \in \mathbb{N}[j\leq f]$.

> *Definition 2*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, the **generalized force of type II** $F_j: (\mathbf{q}, \mathbf{q}') \mapsto F_j(\mathbf{q},\mathbf{q}')$ is defined as
>
> $$
> F_j(\mathbf{q},\mathbf{q}') = \partial_{q_j} \mathcal{L}(\mathbf{q}, \mathbf{q'}),
> $$
>
> for all $t \in \mathbb{R}$.

We may also write $\mathbf{p} = \{p_j\}_{j=1}^f$ and $\mathbf{F} = \{F_j\}_{j=1}^f$.

## The generalized energy

> *Theorem 1*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, the generalized energy $h: (\mathbf{q}, \mathbf{q'},\mathbf{p}) \mapsto h(\mathbf{q}, \mathbf{q'},\mathbf{p})$ is given by
>
> $$
> h(\mathbf{q}, \mathbf{q'}, \mathbf{p}) = \sum_{j=1}^f \big(p_j q_j' \big) - \mathcal{L}(\mathbf{q}, \mathbf{q'}),
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

This generalizes the concept of energy.

* If the Lagrangian $\mathcal{L}: (\mathbf{q}, \mathbf{q'},t) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'},t)$ is explicitly time-dependent, $\partial_t \mathcal{L}(\mathbf{q}, \mathbf{q'},t) \neq 0$, then the generalized energy $h$ is not conserved.
* If the Lagrangian $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ is not explicitly time-dependent, $\partial_t \mathcal{L}(\mathbf{q}, \mathbf{q'}) = 0$, then the generalized energy $h$ is conserved.

> *Theorem 2*: for autonomous systems with only conservative forces the generalized energy $h: (\mathbf{q}, \mathbf{q'}) \mapsto h(\mathbf{q}, \mathbf{q'})$ is conserved and is given by
>
> $$
> h(\mathbf{q}, \mathbf{q'}) = T(\mathbf{q},\mathbf{q}') + V(\mathbf{q}) \overset{\mathrm{def}}= E,
> $$
>
> for all $t \in \mathbb{R}$ with $T: (\mathbf{q}, \mathbf{q}') \mapsto T(\mathbf{q}, \mathbf{q'})$ and $V: \mathbf{q} \mapsto V(\mathbf{q})$ the kinetic and potential energy of the system and $E \in \mathbb{R}$ the total energy of the system.

??? note "*Proof*:"

    Will be added later.

In this case the generalized energy $h$ is conserved and is equal to the total energy $E$ of the system.

## Conservation of generalized momentum

> *Definition 3*: let $\mathcal{L}: (\mathbf{q}, \mathbf{q'}) \mapsto \mathcal{L}(\mathbf{q}, \mathbf{q'})$ be the Lagrangian, a coordinate $q_j$ is **cyclic** if
>
> $$
> \partial_{q_j} \mathcal{L}(\mathbf{q}, \mathbf{q'}) = 0,
> $$
>
> for all $t \in \mathbb{R}$.

Therefore the Lagrangian is independent of a cyclic coordinate.

> *Proposition 1*: the generalized momentum $p_j$ corresponding to a cyclic coordinate $q_j$ is conserved.

??? note "*Proof*:"

    Will be added later.

## Separable systems

> *Proposition 2*: the Lagrangian is separable if there exist two mutually independent subsystems.

??? note "*Proof*:"

    Will be added later.

Obtaining a decoupled set of differential equations.

## Invariances

> *Proposition 3*: the Lagrangian is invariant under gauge transformations and therefore **not unique**.

??? note "*Proof*:"

    Will be added later.
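For instance (a standard fact, added here for illustration): adding a total time derivative of an arbitrary function $F: (\mathbf{q},t) \mapsto F(\mathbf{q},t)$,

$$
\tilde{\mathcal{L}} = \mathcal{L} + d_t F(\mathbf{q},t),
$$

changes the action only by boundary terms, so $\tilde{\mathcal{L}}$ and $\mathcal{L}$ yield the same equations of motion.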
There can exist multiple Lagrangians that may lead to the same equation of motion.

According to the theorem of Noether, the invariance of a closed system with respect to continuous transformations implies that corresponding conservation laws exist.
@ -0,0 +1,75 @@
# Lagrangian formalism of mechanics

The Lagrangian formalism of mechanics is based on the axioms, postulates and principles posed in the [Newtonian formalism](/en/physics/mechanics/newtonian-mechanics/newtonian-formalism/).

## Configuration of a system

Consider a system of $n \in \mathbb{N}$ point masses $m_i \in \mathbb{R}$ with positions $\mathbf{x}_i \in \mathbb{R}^m$ in dimension $m \in \mathbb{N}$, for $i \in \mathbb{N}[i \leq n]$.

> *Definition 1*: the set of positions $\{\mathbf{x}_i\}_{i=1}^n$ is defined as the configuration of the system.

Obtaining an $nm$ dimensional configuration space of the system.

> *Definition 2*: let $N = nm$, the set of time dependent coordinates $\{q_i: t \mapsto q_i(t)\}_{i=1}^N$ at a time $t \in \mathbb{R}$ is a point in the $N$ dimensional configuration space of the system.

<br>

> *Definition 3*: let the generalized coordinates be a minimal set of coordinates which are sufficient to specify the configuration of a system completely and uniquely.

The minimum required number of generalized coordinates is called the number of degrees of freedom of the system.

## Classification of constraints

> *Definition 4*: geometric constraints define the range of the positions $\{\mathbf{x}_i\}_{i=1}^n$.

<br>

> *Definition 5*: holonomic constraints are defined as constraints that can be formulated as an equation of generalized coordinates and time.

For example, $g: (q_1, \dots, q_N, t) \mapsto g(q_1, \dots, q_N, t) = 0$ is a holonomic constraint.

> *Definition 6*: a constraint that depends on velocities is defined as a kinematic constraint.

If the kinematic constraint is integrable and can be formulated as a holonomic constraint it is referred to as an integrable kinematic constraint.

> *Definition 7*: a constraint that explicitly depends on time is defined as a rheonomic constraint. Otherwise the constraint is defined as a scleronomic constraint.

If a system of $n$ point masses is subject to $k$ independent holonomic constraints, then these $k$ equations can be used to eliminate $k$ of the $N$ coordinates. Therefore there remain $f \overset{\mathrm{def}}= N - k$ "independent" generalized coordinates.

## Generalizations

> *Definition 8*: the set of generalized velocities $\{q_i'\}_{i=1}^N$ at a time $t \in \mathbb{R}$ is the velocity at a point along its trajectory through configuration space.

The position of each point mass may be given by

$$
\mathbf{x}_i: \mathbf{q} \mapsto \mathbf{x}_i(\mathbf{q}),
$$

with $\mathbf{q} = \{q_i\}_{i=1}^f$ the generalized coordinates.

Therefore the velocity of each point mass is given by

$$
\mathbf{x}_i'(\mathbf{q}) = \sum_{r=1}^f \partial_r \mathbf{x}_i(\mathbf{q}) q_r',
$$

for all $t \in \mathbb{R}$ (implicitly).
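Substituting this expression into the total kinetic energy $\sum_{i=1}^n \frac{1}{2} m_i \big\langle \mathbf{x}_i', \mathbf{x}_i' \big\rangle$ and collecting the coefficients of $q_r' q_s'$ motivates the quadratic form in the following theorem (a connecting remark added here; the theorem itself is unchanged).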
> *Theorem 1*: the total kinetic energy $T: (\mathbf{q}, \mathbf{q}') \mapsto T(\mathbf{q}, \mathbf{q}')$ of the system is given by
>
> $$
> T(\mathbf{q}, \mathbf{q}') = \sum_{r,s=1}^f a_{rs}(\mathbf{q}) q_r' q_s',
> $$
>
> with
>
> $$
> a_{rs}(\mathbf{q}) = \sum_{i=1}^n \frac{1}{2} m_i \Big\langle \partial_r \mathbf{x}_i(\mathbf{q}), \partial_s \mathbf{x}_i(\mathbf{q}) \Big\rangle,
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.
@ -0,0 +1,58 @@
# Energy

## Potential energy

> *Definition 1*: a force field $\mathbf{F}$ is conservative if it is [irrotational](../../mathematical-physics/vector-analysis/vector-operators/#potentials)
>
> $$
> \nabla \times \mathbf{F} = 0,
> $$
>
> obtaining a scalar potential $V$ such that
>
> $$
> \mathbf{F} = - \nabla V,
> $$
>
> referred to as the potential energy.

## Kinetic energy

> *Definition 2*: the kinetic energy $T: t \mapsto T(t)$ of a point mass $m \in \mathbb{R}$ with position $x: t \mapsto x(t)$ subject to a force $\mathbf{F}: x \mapsto \mathbf{F}(x)$ is defined as
>
> $$
> T(t) - T(0) = \int_0^t \langle \mathbf{F}(x), dx \rangle,
> $$
>
> for all $t \in \mathbb{R}$.

<br>

> *Proposition 1*: the kinetic energy $T: t \mapsto T(t)$ of a point mass $m \in \mathbb{R}$ with position $x: t \mapsto x(t)$ subject to a force $\mathbf{F}: x \mapsto \mathbf{F}(x)$ is given by
>
> $$
> T(t) - T(0) = \frac{1}{2} m \|x'(t)\|^2 - \frac{1}{2} m \|x'(0)\|^2,
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.
## Energy conservation
|
||||||
|
|
||||||
|
> *Theorem 1*: for a pointmass $m \in \mathbb{R}$ with position $x: t \mapsto x(t)$ subject to a force $\mathbf{F}: x \mapsto \mathbf{F}(x)$ we have that
|
||||||
|
>
|
||||||
|
> $$
|
||||||
|
> T(x) + V(x) = T(0) + V(0) \overset{\mathrm{def}} = E,
|
||||||
|
> $$
|
||||||
|
>
|
||||||
|
> for all x, with $T: x \mapsto T(x)$ and $V: x \mapsto V(x)$ the kinetic and potential energy of the point mass.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
Obtaining conservation of energy with $E \in \mathbb{R}$ the total (constant) energy of the system.
|
||||||
|
|
|
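A short sketch of why $E$ is constant, assuming $\mathbf{F} = -\nabla V$ and proposition 1: along a trajectory $x: t \mapsto x(t)$ we have

$$
d_t \big( T + V \big) = \langle m x''(t), x'(t) \rangle + \langle \nabla V(x(t)), x'(t) \rangle = \langle \mathbf{F}(x(t)), x'(t) \rangle - \langle \mathbf{F}(x(t)), x'(t) \rangle = 0.
$$
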
# Momentum

> *Definition 1*: the **momentum** $\mathbf{p}$ of a particle is defined as the product of the mass and velocity of the particle
>
> $$
> \mathbf{p} = m \mathbf{v},
> $$
>
> with $m$ the mass of the particle and $\mathbf{v}$ the velocity of the particle.

For the case that the velocity is time dependent, $\mathbf{v}: t \mapsto \mathbf{v}(t)$ with $\mathbf{v}'(t) = \mathbf{a}(t)$, we have the following theorem.

> *Theorem 1*: let $\mathbf{v}$, $\mathbf{a}$ be the velocity and acceleration of a particle respectively, if we have
>
> $$
> \mathbf{v}: t \mapsto \mathbf{v}(t) \implies \forall t \in \mathbb{R}: \mathbf{v}'(t) = \mathbf{a}(t),
> $$
>
> then
>
> $$
> \mathbf{p}'(t) = \mathbf{F}(t),
> $$
>
> for all $t \in \mathbb{R}$, with $\mathbf{F}: t \mapsto \mathbf{F}(t)$ the net force acting on the particle.

??? note "*Proof*:"

    Will be added later.

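For a constant mass the statement reduces to a single chain of identities, assuming the force definition $\mathbf{F} = m \mathbf{a}$ posed in the section on the Newtonian formalism:

$$
\mathbf{p}'(t) = d_t \big( m \mathbf{v}(t) \big) = m \mathbf{v}'(t) = m \mathbf{a}(t) = \mathbf{F}(t).
$$
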
# Newtonian formalism of mechanics

## Fundamental assumptions

> *Postulate 1*: there exists an absolute space in which the axioms of Euclidean geometry hold.

The properties of space are constant, immutable and entirely independent of the presence of objects and of all dynamical processes that occur within it.

> *Postulate 2*: there exists an absolute time, entirely independent of space and of all dynamical processes.

From postulate 1 and 2 we obtain the notion that simultaneity is absolute, in the sense that events that occur simultaneously in one reference system occur simultaneously in all reference systems, independent of their mutual dynamic states or relations.

The definition of a reference system will follow in the next section.

> *Principle of relativity*: all physical axioms are of identical form in all **inertial** reference systems.

It follows from the principle of relativity that the notion of absolute velocity does not exist.

> *Postulate 3*: space and time are continuous, homogeneous and isotropic.

Implying that there is no fundamental limit to the precision of measurements of spatial positions, velocities and time intervals. There are no special locations or instants in time: all positions and times are equivalent, so the properties of space and time are invariant under translations. There are no special directions: all directions are equivalent, so the properties of space and time are invariant under rotations and reflections.

## Galilean transformations

> *Definition 1*: a **reference system** is an abstract coordinate system whose origin, orientation, and scale are specified by a set of geometric points whose position is identified both mathematically and physically.

From the definition of a reference system and postulates 1, 2 and 3 the Galilean transformations may be posed, which may be used to transform between the coordinates of two reference systems.

> *Principle 1*: let $(\mathbf{x},t) \in \mathbb{R}^4$ be a general point in spacetime.
>
> A uniform motion with velocity $\mathbf{v}$ is given by
>
> $$
> (\mathbf{x},t) \mapsto (\mathbf{x} + \mathbf{v}t,t),
> $$
>
> for all $\mathbf{v}\in \mathbb{R}^3$.
>
> A translation by $(\mathbf{a},s)$ is given by
>
> $$
> (\mathbf{x},t) \mapsto (\mathbf{x} + \mathbf{a},t + s),
> $$
>
> for all $(\mathbf{a},s) \in \mathbb{R}^4$.
>
> A rotation by $R$ is given by
>
> $$
> (\mathbf{x},t) \mapsto (R \mathbf{x},t),
> $$
>
> for all orthogonal transformations $R: \mathbb{R}^3 \to \mathbb{R}^3$.

Together these transformations generate the Galilean group, which is a Lie group.

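A minimal numerical sketch of the three transformations and their composition; the composition order and the sample values are chosen purely for illustration:

```python
import numpy as np

def boost(x, t, v):
    """Uniform motion: (x, t) -> (x + v t, t)."""
    return x + v * t, t

def translate(x, t, a, s):
    """Translation: (x, t) -> (x + a, t + s)."""
    return x + a, t + s

def rotate(x, t, R):
    """Rotation: (x, t) -> (R x, t), with R orthogonal."""
    return R @ x, t

# A sample spacetime point and transformation parameters.
x, t = np.array([1.0, 0.0, 0.0]), 2.0
v = np.array([0.0, 3.0, 0.0])
a, s = np.array([0.0, 0.0, 1.0]), 0.5
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

x, t = rotate(*translate(*boost(x, t, v), a, s), R)
print(x, t)  # the composed Galilean transformation of (x, t)
```
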
## Axioms of Newton

> *Axiom 1*: in the absence of external forces, a particle moves with a constant speed along a straight line.
>
> *Axiom 2:* the net force on a particle is equal to the rate at which the particle's momentum changes with time.
>
> *Axiom 3:* if two particles exert forces onto each other, then the mutual forces have equal magnitudes but opposite directions.

From axiom 1 and the principle of relativity the definition of an inertial reference system may be posed.

> *Definition 2*: an **inertial reference system** is a reference system in which the first axiom of Newton holds.

This implies that an inertial reference system is a reference system not undergoing any acceleration. Therefore we may postulate the following.

> *Postulate 4*: inertial reference systems exist.

<br>

> *Definition 3*: consider two particles $i \in \{1,2\}$ which exert forces onto each other, having accelerations $\mathbf{a}_i$. Since by the 2nd and 3rd axiom the accelerations are antiparallel and the ratio of their magnitudes is a constant, we define the ratio of the inertial masses by
>
> $$
> \frac{m_1}{m_2} = \frac{\|\mathbf{a}_2\|}{\|\mathbf{a}_1\|}.
> $$

A particle with a mass can be considered as a point mass, which is defined below.

> *Definition 4*: a point mass is defined as a point in space and time appointed with a mass.

## Forces

> *Definition 5*: a force $\mathbf{F}$ is defined as
>
> $$
> \mathbf{F} = m \mathbf{a},
> $$
>
> with $m \in \mathbb{R}$ the inertial mass and $\mathbf{a}$ the acceleration of the particle.

Definition 5 also implies the equation of motion: for a given force, a second order ordinary differential equation for the position.

> *Proposition 1*: in the case that a force only depends on position, the equation of motion is invariant to time inversion and time translation.

??? note "*Proof*:"

    Will be added later.

This implies that for a particle moving in a force field it cannot be deduced at what point in time the motion occurred, nor whether the particle is moving forward or backward in time.

> *Definition 6*: a central force $\mathbf{F}$ representing the interaction between two point masses at positions $\mathbf{x}_1$ and $\mathbf{x}_2$ is defined as
>
> $$
> \mathbf{F} = F(\mathbf{x}_1,\mathbf{x}_2) \frac{\mathbf{x}_2 - \mathbf{x}_1}{\|\mathbf{x}_2 - \mathbf{x}_1\|} \overset{\mathrm{def}} = F(\mathbf{x}_1,\mathbf{x}_2) \mathbf{e}_r,
> $$
>
> with $F: (\mathbf{x}_1,\mathbf{x}_2) \mapsto F(\mathbf{x}_1,\mathbf{x}_2)$ the magnitude.

For an isotropic central force the magnitude depends only on the distance between the point masses $\|\mathbf{x}_2 - \mathbf{x}_1\|$.

### Gravitational force of Newton

> *Postulate 5*: the force $\mathbf{F}$ between two particles described by their positions $\mathbf{x}_{1,2}: t \mapsto \mathbf{x}_{1,2}(t)$ is given by
>
> $$
> \mathbf{F} = G \frac{m_1 m_2}{\|\mathbf{x}_2 - \mathbf{x}_1\|^2} \mathbf{e}_r,
> $$
>
> with $m_{1,2} \in \mathbb{R}$ the gravitational mass of both particles and $G \in \mathbb{R}$ the gravitational constant.

According to the observation of Galilei, all objects fall equally fast (in the absence of air friction), which implies that the ratio of inertial and gravitational mass is a constant for any kind of matter.

> *Principle 2*: the inertial and gravitational mass of a particle are equal.

# Particle systems

For a system of particles, the mutual forces among the selected particles are referred to as internal forces, all other forces as external forces. If there are no external forces, the system is called closed, otherwise open.

> *Definition 1*: the internal interaction forces $\mathbf{F}_i$ in a system of $n \in \mathbb{N}$ particles with position $\mathbf{x}_i$ may be approximated by pairwise interaction forces given by
>
> $$
> \mathbf{F}_i (\mathbf{x}_i) = \sum_{j=1}^n \mathbf{F}_{ij}(\mathbf{x}_i, \mathbf{x}_j) \epsilon_{ij},
> $$
>
> for all $\mathbf{x}_i$ with $\mathbf{F}_{ij}$ the pairwise interaction force between particle $i$ and $j$.

For high density systems this approximation breaks down.

## Systems with conservative internal forces

Consider a system of $n \in \mathbb{N}$ particles with position $\mathbf{x}_i$ and mass $m_i \in \mathbb{R}$ with conservative external forces $\mathbf{F}_i$. For each particle an equation of motion can be formulated using the pairwise interaction approximation (definition 1), obtaining

$$
m_i \mathbf{x}_i''(t) = \mathbf{F}_i(\mathbf{x}_i(t)) + \sum_{j=1}^n \mathbf{F}_{ij}(\mathbf{x}_i, \mathbf{x}_j) \epsilon_{ij},
$$

for all $t \in \mathbb{R}$ with $\mathbf{F}_{ij}$ the pairwise interaction force.

> *Definition 2*: the total mass $M$ of the system is defined as
>
> $$
> M = \sum_{i=1}^n m_i.
> $$

<br>

> *Definition 3*: the center of mass $\mathbf{R}: t \mapsto \mathbf{R}(t)$ of the system is defined as
>
> $$
> \mathbf{R}(t) = \frac{1}{M} \sum_{i=1}^n m_i \mathbf{x}_i(t),
> $$
>
> for all $t \in \mathbb{R}$.

<br>

> *Definition 4*: the total momentum $\mathbf{P}$ and angular momentum $\mathbf{J}$ of the system are defined as
>
> $$
> \begin{align*}
> \mathbf{P} &= \sum_{i=1}^n \mathbf{p}_i, \\
> \mathbf{J} &= \sum_{i=1}^n \mathbf{x}_i \times \mathbf{p}_i,
> \end{align*}
> $$
>
> with $\mathbf{p}_i$ the momentum of each particle.

We have for $\mathbf{P}: t \mapsto \mathbf{P}(t)$ the total momentum equivalently given by

$$
\mathbf{P}(t) = M \mathbf{R}'(t),
$$

for all $t \in \mathbb{R}$ with $\mathbf{R}: t \mapsto \mathbf{R}(t)$ the center of mass.

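This equivalent form follows directly from the definitions of $\mathbf{P}$, $\mathbf{p}_i$ and $\mathbf{R}$:

$$
\mathbf{P}(t) = \sum_{i=1}^n m_i \mathbf{x}_i'(t) = d_t \sum_{i=1}^n m_i \mathbf{x}_i(t) = d_t \big( M \mathbf{R}(t) \big) = M \mathbf{R}'(t).
$$
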
> *Definition 5*: the total external force $\mathbf{F}$ and torque $\mathbf{\Gamma}$ of the system are defined as
>
> $$
> \begin{align*}
> \mathbf{F} &= \sum_{i=1}^n \mathbf{F}_i, \\
> \mathbf{\Gamma} &= \sum_{i=1}^n \mathbf{x}_i \times \mathbf{F}_i,
> \end{align*}
> $$
>
> with $\mathbf{F}_i$ the conservative external force.

<br>

> *Proposition 1*: the total momentum $\mathbf{P}: t \mapsto \mathbf{P}(t)$ is related to the total external force $\mathbf{F}: t \mapsto \mathbf{F}(t)$ by
>
> $$
> \mathbf{P}'(t) = \mathbf{F}(t),
> $$
>
> for all $t \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

> *Proposition 2*: the total angular momentum $\mathbf{J}: t \mapsto \mathbf{J}(t)$ is related to the total external torque $\mathbf{\Gamma}: t \mapsto \mathbf{\Gamma}(t)$ by
>
> $$
> \mathbf{J}'(t) = \mathbf{\Gamma}(t),
> $$
>
> for all $t \in \mathbb{R}$ if the internal forces are central forces.

??? note "*Proof*:"

    Will be added later.

### Orbital and spin angular momentum

Consider internal position vectors relative to the center of mass, $\mathbf{r}_i = \mathbf{x}_i - \mathbf{R}$. The total angular momentum $\mathbf{J}$ can then be expressed as a superposition of the orbital $\mathbf{L}$ and spin $\mathbf{S}$ angular momentum components given by

$$
\mathbf{J} = \mathbf{L} + \mathbf{S}.
$$

??? note "*Proof*:"

    Will be added later.

> *Definition 6*: the orbital angular momentum $\mathbf{L}$ of the system is defined as
>
> $$
> \mathbf{L} = \mathbf{R} \times \mathbf{P},
> $$
>
> with $\mathbf{R}$ the center of mass and $\mathbf{P}$ the total momentum of the system.

<br>

> *Definition 7*: the spin angular momentum $\mathbf{S}: t \mapsto \mathbf{S}(t)$ of the system is defined as
>
> $$
> \mathbf{S}(t) = \sum_{i=1}^n \mathbf{r}_i(t) \times m_i \mathbf{r}'_i(t),
> $$
>
> for all $t \in \mathbb{R}$ with $\mathbf{r}_i$ the internal position.

Analogously the orbital and spin torque may be defined.

> *Definition 8*: the orbital and spin torque $\mathbf{\Gamma}_{o,s}$ of the system are defined as
>
> $$
> \begin{align*}
> \mathbf{\Gamma}_o &= \mathbf{R} \times \mathbf{F}, \\
> \mathbf{\Gamma}_s &= \sum_{i=1}^n \mathbf{r}_i \times \mathbf{F}_i,
> \end{align*}
> $$
>
> with $\mathbf{R}$ the center of mass, $\mathbf{r}_i$ the internal position and $\mathbf{F}_i$ the conservative external force.

Similarly, the total torque $\mathbf{\Gamma}$ of the system is the superposition of the orbital and spin torque $\mathbf{\Gamma}_{o,s}$ given by

$$
\mathbf{\Gamma} = \mathbf{\Gamma}_o + \mathbf{\Gamma}_s.
$$

??? note "*Proof*:"

    Will be added later.

> *Proposition 3*: let $\mathbf{L}: t \mapsto \mathbf{L}(t)$ be the orbital angular momentum and let $\mathbf{S}: t \mapsto \mathbf{S}(t)$ be the spin angular momentum. Then we have
>
> $$
> \begin{align*}
> \mathbf{L}'(t) &= \mathbf{\Gamma}_o(t), \\
> \mathbf{S}'(t) &= \mathbf{\Gamma}_s(t),
> \end{align*}
> $$
>
> for all $t \in \mathbb{R}$ with $\mathbf{\Gamma}_o: t \mapsto \mathbf{\Gamma}_o(t)$ and $\mathbf{\Gamma}_s: t \mapsto \mathbf{\Gamma}_s(t)$ the orbital and spin torque.

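The decomposition $\mathbf{J} = \mathbf{L} + \mathbf{S}$ can be checked numerically on arbitrary data; a minimal sketch, with random example masses, positions and velocities:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
m = rng.uniform(1.0, 2.0, n)          # masses
x = rng.normal(size=(n, 3))           # positions
v = rng.normal(size=(n, 3))           # velocities

M = m.sum()                           # total mass
R = (m[:, None] * x).sum(axis=0) / M  # center of mass
P = (m[:, None] * v).sum(axis=0)      # total momentum
r = x - R                             # internal positions
u = v - P / M                         # internal velocities r_i'

J = np.cross(x, m[:, None] * v).sum(axis=0)  # total angular momentum
L = np.cross(R, P)                           # orbital part
S = np.cross(r, m[:, None] * u).sum(axis=0)  # spin part

print(np.allclose(J, L + S))  # True
```
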
### Energy

> *Definition 9*: the total kinetic energy $T$ of the system is defined as
>
> $$
> T = \sum_{i=1}^n \frac{1}{2} m_i \|\mathbf{x}_i'\|^2,
> $$
>
> with $\mathbf{x}_i$ the position of each particle.

<br>

> *Definition 10*: the orbital and internal kinetic energy $T_{o,r}$ of the system are defined as
>
> $$
> \begin{align*}
> T_o &= \frac{1}{2} M \|\mathbf{R}'\|^2, \\
> T_r &= \sum_{i=1}^n \frac{1}{2} m_i \|\mathbf{r}_i'\|^2,
> \end{align*}
> $$
>
> with $M$ the total mass, $\mathbf{R}$ the center of mass and $\mathbf{r}_i$ the internal position of each particle.

<br>

> *Proposition 4*: the total kinetic energy $T$ of the system is a superposition of the orbital and internal kinetic energy given by
>
> $$
> T = T_o + T_r.
> $$

??? note "*Proof*:"

    Will be added later.

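A short sketch of the decomposition, using $\mathbf{x}_i' = \mathbf{R}' + \mathbf{r}_i'$ and $\sum_{i=1}^n m_i \mathbf{r}_i = 0$:

$$
T = \sum_{i=1}^n \frac{1}{2} m_i \|\mathbf{R}' + \mathbf{r}_i'\|^2 = \frac{1}{2} M \|\mathbf{R}'\|^2 + \Big\langle \mathbf{R}', \sum_{i=1}^n m_i \mathbf{r}_i' \Big\rangle + \sum_{i=1}^n \frac{1}{2} m_i \|\mathbf{r}_i'\|^2 = T_o + T_r,
$$

since the cross term vanishes.
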
> *Proposition 5*: the dynamics of the orbital kinetic energy $T_o: t \mapsto T_o(t)$ is decoupled
>
> $$
> T_o'(t) = \langle \mathbf{F}, \mathbf{R}'(t) \rangle,
> $$
>
> for all $t \in \mathbb{R}$ with $\mathbf{F}$ the total external force and $\mathbf{R}$ the center of mass.
>
> The dynamics of the internal kinetic energy $T_r: t \mapsto T_r(t)$ is not decoupled
>
> $$
> T_r'(t) = \sum_{i=1}^n \langle \mathbf{f}_i, \mathbf{r}_i'(t) \rangle,
> $$
>
> for all $t \in \mathbb{R}$ with $\mathbf{f}_i$ the sum of both external and internal forces for each particle.

??? note "*Proof*:"

    Will be added later.

# Rotation

Rotation is always viewed with respect to the axis of rotation, therefore in the following definitions the origin of the position is always implied to be on the axis of rotation.

## Angular momentum

> *Definition 1*: the angular momentum $\mathbf{L}$ of a point mass with position $\mathbf{r}$ and a momentum $\mathbf{p}$ is defined as
>
> $$
> \mathbf{L} = \mathbf{r} \times \mathbf{p},
> $$
>
> for all $\mathbf{r}$ and $\mathbf{p}$.

## Torque

> *Definition 2*: the torque $\mathbf{\Gamma}$ acting on a point mass with position $\mathbf{r}$ for a force $\mathbf{F}$ is defined as
>
> $$
> \mathbf{\Gamma} = \mathbf{r} \times \mathbf{F},
> $$
>
> for all $\mathbf{r}$ and $\mathbf{F}$.

The torque is related to the angular momentum by the following proposition.

> *Proposition 1*: let $\mathbf{L}: t \mapsto \mathbf{L}(t)$ be the angular momentum of a point mass, then it holds that
>
> $$
> \mathbf{L}'(t) = \mathbf{\Gamma}(t),
> $$
>
> for a constant $\mathbf{r}$ and all $t \in \mathbb{R}$ with $\mathbf{\Gamma}: t \mapsto \mathbf{\Gamma}(t)$ the torque acting on the point mass.

??? note "*Proof*:"

    Let $\mathbf{L}: t \mapsto \mathbf{L}(t)$ be the angular momentum of a point mass and suppose $\mathbf{r}$ is constant, then

    $$
    \mathbf{L}'(t) \overset{\mathrm{def}} = d_t (\mathbf{r} \times \mathbf{p}(t)) = \mathbf{r} \times \mathbf{p}'(t),
    $$

    by the [proposition on momentum](momentum.md) we have $\mathbf{p}'(t) = \mathbf{F}(t)$, therefore

    $$
    \mathbf{L}'(t) = \mathbf{r} \times \mathbf{F}(t) \overset{\mathrm{def}} = \mathbf{\Gamma}(t).
    $$

# Maxwell equations

# Diffraction

## Huygens principle

Huygens principle will be used to derive equations for diffraction.

> *Assumption*: according to Huygens principle each point on the wavefront of an electromagnetic wave acts as a source of secondary wavelets. When summed over an extended unobstructed wavefront the secondary wavelets recreate the next wavefront. It is assumed that this principle is valid as it is consistent with the laws of reflection and refraction.

The following law follows from Huygens principle.

> *Law*: the net disturbance $E_P: \mathbb{R} \to \mathbb{R}$ at an observation point $P$ for a wave leaving a source point $S$, travelling a distance $r' \in \mathbb{R}$ to an aperture opening defined for the points in $D \subseteq \mathbb{R}$ and then travelling a distance $r \in \mathbb{R}$ towards $P$, is given by
>
> $$
> E_P(t) = E_0 k e^{-i \omega t} \int_D \frac{1}{2 r r'} (1 + \cos \theta) e^{ik (r+r')} dA,
> $$
>
> for all $t \in \mathbb{R}$ with $E_0 \in \mathbb{R}$, $k \in \mathbb{R}$ the wavenumber of the light, $\omega \in \mathbb{R}$ the angular frequency of the light and $\theta \in [0, 2\pi)$ the angle between the source, aperture and observation point.

??? note "*Proof*:"

    Will be added later.

<br>

> *Law*: consider two complementary apertures that, when taken together, form a single opaque screen. Let $E_1$ and $E_2$ be the field at point $P$ for each aperture respectively. Then the combination of these fields must give the unobstructed wave $E_0$. Therefore
>
> $$
> E_0 = E_1 + E_2.
> $$

??? note "*Proof*:"

    Will be added later.

## Fraunhofer diffraction

The above law for the diffraction at an observation point $P$ can be simplified under certain conditions such that the integral can be evaluated more easily.

> *Corollary*: for small angles $\theta$ between the source, aperture and observation point, implying that the source and observation points are far away and the aperture opening is small, the net disturbance $E_P: \mathbb{R} \to \mathbb{R}$ at the observation point may in reasonable approximation be given by
>
> $$
> E_P = E_0 \int_D e^{ikr}dA,
> $$
>
> with $E_0 \in \mathbb{R}$ and $k \in \mathbb{R}$ the wavenumber. Under the condition that
>
> $$
> r \gg \frac{h^2}{2\lambda},
> $$
>
> with $h \in \mathbb{R}$ the height of the aperture and $\lambda \in \mathbb{R}$ the wavelength of the light.

??? note "*Proof*:"

    Will be added later.

From this simplification the net disturbance caused by several apertures can be derived, given in the corollaries below.

> *Corollary*: the net disturbance $E: \mathbb{R} \to \mathbb{R}$ of the electric field for a single slit aperture is given by
>
> $$
> E(\theta) = E_0 \text{ sinc } \beta(\theta),
> $$
>
> for all $\theta \in \mathbb{R}$ with $\beta(\theta) = \frac{kb}{2} \sin \theta$ and $E_0, k, b \in \mathbb{R}$ the magnitude of the electric field, the wavenumber and the width of the slit.

??? note "*Proof*:"

    Will be added later.

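A small numerical sketch of the corresponding relative irradiance $I/I_0 = \text{sinc}^2 \beta(\theta)$; the wavelength and slit width are arbitrary example values:

```python
import numpy as np

wavelength = 500e-9          # example: 500 nm light
b = 10e-6                    # example: 10 micrometre slit width
k = 2 * np.pi / wavelength   # wavenumber

theta = np.linspace(-0.2, 0.2, 9)   # angles in radians
beta = 0.5 * k * b * np.sin(theta)
E = np.sinc(beta / np.pi)           # np.sinc(x) = sin(pi x)/(pi x)
I = E**2                            # relative irradiance I/I_0

for th, i in zip(theta, I):
    print(f"theta = {th:+.3f} rad  I/I0 = {i:.4f}")
```
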
<br>

> *Corollary*: the net disturbance $E: \mathbb{R} \to \mathbb{R}$ of the electric field for a rectangular aperture is given by
>
> $$
> E(\theta, \varphi) = E_0 \text{ sinc } \alpha(\theta) \text{ sinc } \beta(\varphi),
> $$
>
> for all $(\theta, \varphi) \in \mathbb{R}^2$ with $\alpha(\theta) = \frac{ka}{2} \sin \theta$, $\beta(\varphi) = \frac{kb}{2} \sin \varphi$ and $E_0, k, a, b \in \mathbb{R}$ the magnitude of the electric field, the wavenumber, the height and the width of the rectangle.

??? note "*Proof*:"

    Will be added later.

<br>

> *Corollary*: the net disturbance $E: \mathbb{R} \to \mathbb{R}$ of the electric field for a circular aperture is given by
>
> $$
> E(\theta) = E_0 \frac{2 J_1(\sigma(\theta))}{\sigma(\theta)},
> $$
>
> for all $\theta \in \mathbb{R}$ with $J_1: \mathbb{R} \to \mathbb{R}$ the Bessel function of the first kind of order one, $\sigma(\theta) = \frac{kd}{2} \sin \theta$ and $E_0, k, d \in \mathbb{R}$ the magnitude of the electric field, the wavenumber and the diameter of the circle.

??? note "*Proof*:"

    Will be added later.

<br>

> *Corollary*: the net disturbance $E: \mathbb{R} \to \mathbb{R}$ of the electric field for a $N$-slit aperture with $N \in \mathbb{N}$ is given by
>
> $$
> E(\theta) = E_0 \text{ sinc } \beta(\theta) \frac{\sin N \gamma(\theta)}{N \sin \gamma(\theta)},
> $$
>
> for all $\theta \in \mathbb{R}$ with $\beta(\theta) = \frac{kb}{2} \sin \theta$, $\gamma(\theta) = \frac{kd}{2} \sin \theta$ and $E_0, k, d, b \in \mathbb{R}$ the magnitude of the electric field, the wavenumber, the distance between the slits and the width of the slits.

??? note "*Proof*:"

    Will be added later.

When taking $N \to \infty$ for the $N$-slit aperture at normal incidence, principal maxima are obtained for $\gamma(\theta) = m \pi$ with $m \in \mathbb{Z}$, therefore

$$
d \sin \theta = m \lambda,
$$

with $d, \lambda \in \mathbb{R}$ the distance between the slits and the wavelength of the light.

When the incidence $\theta_i \in \mathbb{R}$ is not normal, the principal maxima are given by

$$
d (\sin \theta_i + \sin \theta) = m \lambda,
$$

also known as the grating equation.

??? note "*Proof*:"

    Will be added later.

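As a worked example of the grating equation at normal incidence, solving $d \sin \theta = m \lambda$ for the diffraction angles; the grating period and wavelength are example values:

```python
import numpy as np

d = 2e-6             # example: grating period of 2 micrometres
wavelength = 500e-9  # example: 500 nm light

# Principal maxima at normal incidence: d sin(theta) = m lambda.
for m in range(0, 5):
    s = m * wavelength / d
    if abs(s) <= 1:  # orders with |sin(theta)| > 1 do not propagate
        print(f"m = {m}: theta = {np.degrees(np.arcsin(s)):.2f} deg")
```
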
<br>

> *Definition*: two point sources given by the net disturbances of the electric field $E_{1,2}: D \to \mathbb{R}$ with $D \subseteq \mathbb{R}$ such that $E_{1,2}$ are bijective can be resolved if they satisfy the Rayleigh criterion given by
>
> $$
> \min E_2^{-1}(E_{02}) \geq \min E_1^{-1}(0),
> $$
>
> $$
> \min E_1^{-1}(E_{01}) \geq \min E_2^{-1}(0),
> $$
>
> with $E_{0(1,2)} \in \mathbb{R}$ the electric field amplitudes.

This definition will be used in the following propositions.

> *Proposition*: the chromatic resolving power $\mathcal{R}$ of a $N$-slit aperture based on the Rayleigh criterion can be determined by
>
> $$
> \mathcal{R} = N m,
> $$
>
> with $m \in \mathbb{Z}$ the order of the principal maxima and $N \in \mathbb{N}$ the number of slits.

??? note "*Proof*:"

    Will be added later.

<br>

> *Proposition*: the free spectral range $\text{FSR}$ of a $N$-slit aperture can be determined by
>
> $$
> \text{FSR} = \frac{\lambda}{m},
> $$
>
> with $m \in \mathbb{Z}$ the order and $\lambda \in \mathbb{R}$ the wavelength.

??? note "*Proof*:"

    Will be added later.

# Electromagnetic waves

This section is a direct follow-up on the section [Maxwell equations](../maxwell-equations.md), where the wave equations for the electric field $\mathbf{E}: U \to \mathbb{R}^3$ and magnetic field $\mathbf{B}: U \to \mathbb{R}^3$ in vacuum ($\varepsilon = \varepsilon_0, \mu = \mu_0$) have been determined, given by

$$
\begin{align*}
&\nabla^2 \mathbf{E}(\mathbf{v}, t) = \varepsilon_0 \mu_0 \partial_t^2 \mathbf{E}(\mathbf{v}, t), \\
&\nabla^2 \mathbf{B}(\mathbf{v}, t) = \varepsilon_0 \mu_0 \partial_t^2 \mathbf{B}(\mathbf{v}, t),
\end{align*}
$$

for all $(\mathbf{v}, t) \in U$.

It may be observed that the electric and magnetic field comply with the $3 + 1$ dimensional wave equation posed in the section [waves](waves.md), obtaining the speed $v \in \mathbb{R}$ given by

$$
v = \frac{1}{\sqrt{\varepsilon_0 \mu_0}} = c,
$$

defined by $c$ the speed of light, or more generally the speed of information in the universe. Outside vacuum we have

$$
v = \frac{1}{\sqrt{\varepsilon \mu}} = \frac{c}{n},
$$

with $n = \sqrt{K_E K_B}$ the index of refraction.

> *Proposition*: let $\mathbf{E},\mathbf{B}: U \to \mathbb{R}^3$, a solution for the wave equations of the electric and magnetic field may be harmonic linearly polarized plane waves satisfying Maxwell's equations given by
>
> $$
> \begin{align*}
> \mathbf{E}(\mathbf{v}, t) &= \text{Im}\Big(\mathbf{E}_0 \exp i \big(\langle \mathbf{k}, \mathbf{v} \rangle - \omega t + \varphi\big) \Big), \\
> \mathbf{B}(\mathbf{v}, t) &= \text{Im} \Big(\mathbf{B}_0 \exp i \big(\langle \mathbf{k}, \mathbf{v} \rangle - \omega t + \varphi\big) \Big),
> \end{align*}
> $$
>
> for all $(\mathbf{v}, t) \in U$ with $\mathbf{E}_0, \mathbf{B}_0 \in \mathbb{R}^3$.

??? note "*Proof*:"

    Will be added later.

The above proposition gives an example of a light wave, but note that there are many more solutions that satisfy Maxwell's equations.

> *Law*: the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$ for all solutions of the posed wave equations are orthogonal to the direction of propagation $\mathbf{k}$. Therefore electromagnetic waves are transverse.

??? note "*Proof*:"

    Will be added later.

<br>

> *Law*: the electric field $\mathbf{E}$ and the magnetic field $\mathbf{B}$ in an electromagnetic wave are orthogonal to each other; $\langle \mathbf{E}, \mathbf{B} \rangle = 0$.

??? note "*Proof*:"

    Will be added later.

<br>

> *Corollary*: it follows from the above law that the magnitudes of the electric and magnetic fields $E, B: U \to \mathbb{R}$ in an electromagnetic wave are related by
>
> $$
> E(\mathbf{v}, t) = v B(\mathbf{v}, t),
> $$
>
> for all $(\mathbf{v}, t) \in U$ with $v = \frac{c}{n}$ the wave speed.

??? note "*Proof*:"

    Will be added later.

## Energy flow

> *Law*: the energy flux density $\mathbf{S}: U \to \mathbb{R}^3$ of an electromagnetic wave is given by
>
> $$
> \mathbf{S}(\mathbf{v}, t) = \frac{1}{\mu_0} \mathbf{E}(\mathbf{v}, t) \times \mathbf{B}(\mathbf{v}, t),
> $$
>
> for all $(\mathbf{v}, t) \in U$. $\mathbf{S}$ is also called the Poynting vector.

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: the time average of the magnitude of $\mathbf{S}$ is called the irradiance.

<br>

> *Proposition*: the irradiance $I \in \mathbb{R}$ for harmonic linearly polarized plane electromagnetic waves is given by
>
> $$
> I = \frac{\varepsilon_0 c}{2} E_0^2,
> $$
>
> with $E_0$ the magnitude of $\mathbf{E}_0$.

??? note "*Proof*:"

    Will be added later.

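As a numerical illustration, inverting this relation to recover the field amplitude of a beam of given irradiance; the beam parameters are arbitrary example values:

```python
import numpy as np

eps0 = 8.854e-12   # vacuum permittivity in F/m
c = 2.998e8        # speed of light in m/s

# Example: a 1 mW laser beam spread over a 1 mm^2 spot.
P = 1e-3           # power in W
A = 1e-6           # spot area in m^2
I = P / A          # irradiance in W/m^2

# Invert I = (eps0 c / 2) E0^2 for the field amplitude E0.
E0 = np.sqrt(2 * I / (eps0 * c))
B0 = E0 / c        # magnitude relation E = c B in vacuum

print(f"I  = {I:.1f} W/m^2")
print(f"E0 = {E0:.1f} V/m")
print(f"B0 = {B0:.2e} T")
```
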
# Geometric optics

> *Definition*: surfaces that reflect or refract rays leaving a source point $s$ to a conjugate point $p$ are defined as Cartesian surfaces.

<br>

> *Definition*: a perfect image of a point is possible with a stigmatic system. For the set of conjugated points no diffraction and aberrations occur, obtaining reversible rays.

<br>

> *Assumption*: in geometric optics use will be made of the paraxial approximation which states that for small angles $\theta$
>
> $$
> \tan \theta \approx \sin \theta \approx \theta,
> $$
>
> and
>
> $$
> \cos \theta \approx 1,
> $$
>
> which comes down to using the first term of the Taylor series approximation.

<br>

## Spherical surfaces

> *Law*: for a spherical reflecting interface in paraxial approximation the relation between the object and image distance $s_{o,i} \in \mathbb{R}$ and the radius $R \in \mathbb{R}$ of the interface is given by
>
> $$
> \frac{1}{s_o} + \frac{1}{s_i} = \frac{2}{R}.
> $$

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: for an object distance $s_o \to \infty$ we let the image distance $s_i = f$ with $f \in \mathbb{R}$ the focal length defining the focal point of the spherical interface.

Then it follows from the definition that

$$
\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f}.
$$

> *Law*: for a spherical refracting interface in paraxial approximation the relation between the object and image distance $s_{o,i} \in \mathbb{R}$ and the radius $R \in \mathbb{R}$ of the interface is given by
>
> $$
> \frac{n_i}{s_o} + \frac{n_t}{s_i} = \frac{n_t - n_i}{R},
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium.

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: the transverse magnification $M$ for an optical system is defined as
>
> $$
> M = \frac{y'}{y},
> $$
>
> with $y, y' \in \mathbb{R}$ the object and image size.

<br>

> *Corollary*: the transverse magnification $M$ for a spherical refracting interface in paraxial approximation is given by
>
> $$
> M = - \frac{n_i s_i}{n_t s_o},
> $$
>
> with $s_{o,i} \in \mathbb{R}$ the object and image distance and $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium.

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: a lens is defined by two intersecting spherical interfaces with radius $R_1, R_2 \in \mathbb{R}$ respectively.

<br>

> *Law*: for a thin lens in paraxial approximation the radii $R_1, R_2 \in \mathbb{R}$ are related to the focal length $f \in \mathbb{R}$ of the lens by
>
> $$
> \frac{1}{f} = \frac{n_t - n_i}{n_i} \bigg( \frac{1}{R_1} - \frac{1}{R_2} \bigg),
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium.
>
> With the transverse magnification $M$ given by
>
> $$
> M = - \frac{s_i}{s_o},
> $$
>
> with the object and image distance $s_{o,i} \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

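A small numerical sketch combining the lensmaker relation with $\frac{1}{s_o} + \frac{1}{s_i} = \frac{1}{f}$ and the thin lens magnification; the lens data are arbitrary example values:

```python
# Thin lens in air: lensmaker's equation and imaging, example values.
n_i, n_t = 1.0, 1.5      # incident medium (air) and lens glass
R1, R2 = 0.10, -0.10     # radii of a biconvex lens in metres

f = 1.0 / ((n_t - n_i) / n_i * (1.0 / R1 - 1.0 / R2))
print(f"focal length f = {f:.3f} m")

s_o = 0.3                            # object distance in metres
s_i = 1.0 / (1.0 / f - 1.0 / s_o)    # from 1/s_o + 1/s_i = 1/f
M = -s_i / s_o                       # transverse magnification
print(f"image distance s_i = {s_i:.3f} m, magnification M = {M:.3f}")
```
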
## Sign convention

Converging optics have positive focal lengths and diverging optics have negative focal lengths.

Objects are located left of the optic by a positive object distance and images are located right of the optic by a positive image distance.

## Ray tracing

> *Assumption*: using the paraxial approximation and assuming that all optical elements have rotational symmetry and are aligned coaxially along a single optical axis.

A ray matrix model may be introduced where the ray is defined according to its intersection with a reference plane.

> *Definition*: a ray may be defined by its intersection with a reference plane by
>
> * the parameter $y \in \mathbb{R}$, the perpendicular distance between the optical axis and the intersection point,
> * the angle $\theta \in [0, 2\pi)$, the angle the ray makes with the horizontal.

<br>

> *Proposition*: for the translation of the ray between two reference planes within the same medium separated by a horizontal distance $d \in \mathbb{R}$ the relation
>
> $$
> \begin{pmatrix} y_2 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} 1 & d \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ \theta_1 \end{pmatrix},
> $$
>
> holds, for $y_{1,2} \in \mathbb{R}$ and $\theta_{1,2} \in [0, 2\pi)$.

??? note "*Proof*:"

    Will be added later.

<br>

> *Proposition*: for the reflection of the ray at the plane of incidence at a spherical interface of radius $R \in \mathbb{R}$ the relation
>
> $$
> \begin{pmatrix} y_2 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 2 / R & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ \theta_1 \end{pmatrix},
> $$
>
> holds, for $y_{1,2} \in \mathbb{R}$ and $\theta_{1,2} \in [0, 2\pi)$.

??? note "*Proof*:"

    Will be added later.

This matrix may also be given in terms of the focal length $f \in \mathbb{R}$ by

$$
\begin{pmatrix} 1 & 0 \\ 1/f & 1 \end{pmatrix}.
$$

> *Proposition*: for the refraction of the ray at the plane of incidence at a spherical interface of radius $R \in \mathbb{R}$ the relation
>
> $$
> \begin{pmatrix} y_2 \\ \theta_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ - \frac{n_t - n_i}{n_t R} & \frac{n_i}{n_t} \end{pmatrix} \begin{pmatrix} y_1 \\ \theta_1 \end{pmatrix},
> $$
>
> holds, for $y_{1,2} \in \mathbb{R}$, $\theta_{1,2} \in [0, 2\pi)$ and $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium.

??? note "*Proof*:"

    Will be added later.

This matrix may also be given in terms of the focal length $f \in \mathbb{R}$ by

$$
\begin{pmatrix} 1 & 0 \\ -\frac{1}{f} & 1 \end{pmatrix}.
$$

> *Law*: the ray matrix model taken as a linear sequence of interfaces and translations can be used to model optical systems of arbitrary complexity under the posed assumptions.

??? note "*Proof*:"

    Will be added later.

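A minimal sketch of such a sequence: propagation over the object distance, refraction at a thin element written in terms of its focal length, and propagation to the image plane. The matrices multiply right to left in the order the ray meets the elements; all numbers are example values:

```python
import numpy as np

def translation(d):
    """Free propagation over a horizontal distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_element(f):
    """Refraction matrix written in terms of the focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

f = 0.1                      # example focal length in metres
s_o, s_i = 0.3, 0.15         # conjugate distances from 1/s_o + 1/s_i = 1/f

ray = np.array([0.0, 0.01])  # ray leaving an axial object point at 0.01 rad
system = translation(s_i) @ thin_element(f) @ translation(s_o)
y, theta = system @ ray

print(f"y = {y:.6f} m")      # ~0: the ray crosses the axis in the image plane
```
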
## Aberrations

> *Definition*: an aberration is any effect that prevents a lens from forming a perfect image.

Various aberrations are:

* Spherical aberration: error of the paraxial approximation.
* Chromatic aberration: error due to a different index of refraction for different wavelengths of light.
* Astigmatism: deviation from the cylindrical symmetry.

# Interference

> *Definition*: when waves are combined in phase they give a larger amplitude: constructive interference occurs. When waves are combined out of phase they tend to cancel: destructive interference occurs.

## Two source interference

For interference between two monochromatic electromagnetic waves given by

$$
\begin{align*}
\mathbf{E}_1(\mathbf{v}, t) &= \mathbf{E}_{01} \exp i \big(\langle \mathbf{k}_1, \mathbf{v} - \mathbf{s}_1 \rangle - \omega_1 t + \varphi_1 \big), \\
\mathbf{E}_2(\mathbf{v}, t) &= \mathbf{E}_{02} \exp i \big(\langle \mathbf{k}_2, \mathbf{v} - \mathbf{s}_2 \rangle - \omega_2 t + \varphi_2 \big),
\end{align*}
$$

for all $(\mathbf{v}, t) \in U$ with $\mathbf{k}_{1,2} \in \mathbb{R}^3$ the wavenumbers and $\mathbf{s}_{1,2} \in \mathbb{R}^3$ the positions of the sources, the combined disturbance at $\mathbf{v}$ is given by

$$
\begin{align*}
\mathbf{E}(\mathbf{v}, t) &= \mathbf{E}_1(\mathbf{v}, t) + \mathbf{E}_2(\mathbf{v}, t), \\
&= \mathbf{E}_{01} \exp i \delta_1(\mathbf{v},t) + \mathbf{E}_{02} \exp i \delta_2(\mathbf{v},t),
\end{align*}
$$

for all $(\mathbf{v}, t) \in U$ with $\delta_i$ the phase for $i \in \{1,2\}$ given by

$$
\delta_i(\mathbf{v}, t) = \langle \mathbf{k}_i, \mathbf{v} - \mathbf{s}_i \rangle - \omega_i t + \varphi_i.
$$

> *Law*: the irradiance at point $\mathbf{v}$ is then given by
>
> $$
> I(\mathbf{v}, t) = I_1 + I_2 + 2\sqrt{I_1 I_2} \cos \Big(\delta_2(\mathbf{v}, t) - \delta_1(\mathbf{v}, t) \Big),
> $$
>
> for all $(\mathbf{v}, t) \in U$ with $I_{1,2} \in \mathbb{R}$ the irradiance for each wave separately.

??? note "*Proof*:"

    Will be added later.

Let $\delta(\mathbf{v}, t) = \delta_2(\mathbf{v}, t) - \delta_1(\mathbf{v}, t)$, then we have for $\delta(\mathbf{v}, t) = 2 m \pi$ with $m \in \mathbb{Z}$ constructive interference and for $\delta(\mathbf{v}, t) = (2m + 1) \pi$ we have destructive interference.

Writing out $\delta$ for plane waves of the same angular frequency $\omega = \omega_1 = \omega_2$ and propagation in the $x$-direction gives

$$
\delta(x, t) = k(x_2 - x_1) + (\varphi_2 - \varphi_1) = \frac{2\pi}{\lambda_0} n (x_2 - x_1) + (\varphi_2 - \varphi_1),
$$

for all $(x,t) \in \mathbb{R}^2$ and $n \in \mathbb{R}$ the index of refraction of the medium. The optical path difference is defined as $n (x_2 - x_1)$.

### Double slit interference

Interference is created by plane waves illuminating both slits, creating disturbances at both slits that are correlated in time. Assuming the slits are point sources and the waves have the same frequency, we have a superposition point $P$ described vertically with $y \in \mathbb{R}$ and $r_{1,2} \in \mathbb{R}$ the traveling distances from the slits to this point. Obtaining a phase difference

$$
\delta = k(r_2 - r_1) + (\varphi_2 - \varphi_1).
$$

??? note "*Proof*:"

    Will be added later.

<br>

If we have $L \in \mathbb{R}$ the horizontal length between the slits and the point $P$ and $d \in \mathbb{R}$ the distance between the slits and assume $L \gg d$ and $\varphi_2 - \varphi_1 = 0$ then

$$
\delta(\theta) = kd \sin \theta,
$$

for all $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ with $\tan \theta = \frac{y}{L}$.

??? note "*Proof*:"

    Will be added later.

## Thin film interference

Interference is created by plane waves illuminating a thin film of thickness $l \in \mathbb{R}$ and index of refraction $n_l \in \mathbb{R}$ under an angle of incidence $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ deposited on a substrate with index of refraction $n_i \in \mathbb{R}$. A phase shift is introduced between the first external and internal reflected rays, obtaining a phase difference $\delta$ given by

$$
\delta(\theta) = k 2l \sqrt{n_l^2 - n_i^2 \sin^2 \theta},
$$

for all $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ with $k \in \mathbb{R}$ the wavenumber.

??? note "*Proof*:"

    Will be added later.

## Michelson interferometer

Interference created by splitting and recombining plane waves that have a difference in optical path. With a setup of two mirrors displaced with lengths $L_1, L_2 \in \mathbb{R}$ from the beam splitter under an angle $\theta$ with respect to the incoming plane wave, and assuming the setup is in *one* medium with index of refraction $n \in \mathbb{R}$, we obtain a phase difference $\delta$ given by

$$
\delta(\theta) = k 2n(L_2 - L_1) \cos \theta + \pi,
$$

for all $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ with $k \in \mathbb{R}$ the wavenumber.

??? note "*Proof*:"

    Will be added later.

## Fabry-Perot interferometer

Interference created by a difference in optical path length, with a setup consisting of two parallel flat reflective surfaces separated by a distance $d \in \mathbb{R}$. If both surfaces have reflection and transmission amplitude ratios $r,t \in [0,1]$ then the phase difference $\delta$ between two adjacent transmitted rays under an angle $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ is given by

$$
\delta(\theta) = 2 kd \cos \theta + 2 \varphi,
$$

for all $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ with $\varphi \in [0, 2\pi)$ the phase change due to reflection dependent on the amplitude ratios.

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: the finesse $\mathcal{F}$ and the coefficient of finesse $F$ of a Fabry-Perot interferometer are defined by
>
> $$
> F = \frac{4R}{(1-R)^2} \quad\text{ and }\quad \mathcal{F} = \frac{\pi \sqrt{F}}{2} = \frac{\pi \sqrt{R}}{1 - R},
> $$
>
> with $R \in [0,1]$ the reflectance. The finesse can be seen as the measure of sharpness of the interference pattern.

<br>

> *Proposition*: the transmitted irradiance $I$ of a Fabry-Perot interferometer is given by
>
> $$
> I(\theta) = \frac{I_0}{1 + 4 (\mathcal{F} / \pi)^2 \sin^2 (\delta(\theta) / 2)},
> $$
>
> for all $\theta \in [-\frac{\pi}{2}, \frac{\pi}{2}]$ with $I_0 \in \mathbb{R}$.

??? note "*Proof*:"

    Will be added later.

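A small numerical sketch of this transmission profile, using $4 (\mathcal{F} / \pi)^2 = F$ and an example reflectance:

```python
import numpy as np

R = 0.9                         # example reflectance
F = 4 * R / (1 - R) ** 2        # coefficient of finesse
finesse = np.pi * np.sqrt(F) / 2
print(f"finesse = {finesse:.1f}")

def transmission(delta):
    """Relative transmitted irradiance I/I0 as a function of the phase."""
    return 1.0 / (1.0 + F * np.sin(delta / 2) ** 2)

for delta in (0.0, 0.05, 0.2, np.pi):
    print(f"delta = {delta:.2f}: I/I0 = {transmission(delta):.4f}")
```
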
### The chromatic resolving power and free spectral range

The chromatic resolving power and free spectral range are measures that define the ability to distinguish certain features in interference or diffraction patterns.

> *Definition*: the full width at half maximum $\text{FWHM}$ for the interference pattern of the Fabry-Perot interferometer is defined to be
>
> $$
> \text{FWHM} = \frac{4}{\sqrt{F}},
> $$
>
> with $F \in \mathbb{R}$ the coefficient of finesse.

<br>

> *Definition*: the chromatic resolving power $\mathcal{R}$ is defined by
>
> $$
> \mathcal{R} = \frac{\lambda}{\Delta \lambda},
> $$
>
> with $\lambda \in \mathbb{R}$ the base wavelength of the light and $\Delta \lambda \in \mathbb{R}$ the spectral resolution at the wavelength $\lambda$.

<br>

> *Proposition*: the chromatic resolving power $\mathcal{R}$ of a Fabry-Perot interferometer based on the $\text{FWHM}$ can be determined by
>
> $$
> \mathcal{R} = \mathcal{F} m,
> $$
>
> with $m \in \mathbb{Z}$ the order of the principal maxima and $\mathcal{F} \in \mathbb{R}$ the finesse.

??? note "*Proof*:"

    Will be added later.

<br>

> *Definition*: the free spectral range $\text{FSR}$ is the largest wavelength range for a given order that does not overlap the same range in an adjacent order.

<br>

> *Proposition*: the free spectral range $\text{FSR}$ of a Fabry-Perot interferometer can be determined by
>
> $$
> \text{FSR} = \frac{\lambda}{m},
> $$
>
> with $m \in \mathbb{Z}$ the order and $\lambda \in \mathbb{R}$ the wavelength.

??? note "*Proof*:"

    Will be added later.

# Polarisation

If we consider an electromagnetic wave $\mathbf{E}: \mathbb{R}^2 \to \mathbb{R}^3$ with wavenumber $k \in \mathbb{R}$ and angular frequency $\omega \in \mathbb{R}$ propagating in the positive $z$-direction given by

$$
\mathbf{E}(z,t) = \exp i(kz - \omega t + \varphi_1) E_0^{(x)} \mathbf{e}_{(x)} + \exp i(kz - \omega t + \varphi_2) E_0^{(y)}\mathbf{e}_{(y)},
$$

for all $(z,t) \in \mathbb{R}^2$ with $E_0^{(x)}, E_0^{(y)} \in \mathbb{R}$ the magnitude of the wave in the $x$ and $y$ direction. We define $\Delta \varphi = \varphi_2 - \varphi_1$.

> *Definition*: the electromagnetic wave $\mathbf{E}$ is linear polarised if and only if
>
> $$
> \Delta \varphi = \pi m,
> $$
>
> for some $m \in \mathbb{Z}$.

With polarisation angle $\theta \in [0, 2\pi)$ given by

$$
\theta = \arctan \Bigg( \frac{\max E_0^{(y)}}{\max E_0^{(x)}} \Bigg).
$$

??? note "*Proof*:"

    Will be added later.

> *Definition*: the electromagnetic wave $\mathbf{E}$ is left circular polarised if and only if
>
> $$
> \Delta \varphi = \frac{\pi}{2} \;\land\; E_0^{(x)} = E_0^{(y)},
> $$
>
> and right circular polarised if and only if
>
> $$
> \Delta \varphi = -\frac{\pi}{2} \;\land\; E_0^{(x)} = E_0^{(y)}.
> $$

For every state in between we have elliptical polarisation with a polarisation angle $\theta \in [0, 2\pi)$ given by

$$
\theta = \frac{1}{2} \arctan \Bigg(\frac{2 E_0^{(x)} E_0^{(y)} \cos \Delta\varphi}{ \big(E_0^{(x)} \big)^2- \big( E_0^{(y)} \big)^2} \Bigg).
$$

??? note "*Proof*:"

    Will be added later.

> *Definition*: natural light is defined as light consisting of all linear polarisation states.

## Linear polarisation

> *Definition*: a linear polariser selectively removes light that is linearly polarised along a direction perpendicular to its transmission axis.

We may make this definition concrete by the following statement, known as Malus' law.

> *Law*: for a light beam with amplitude $E_0$ incident on a linear polariser the transmitted beam has amplitude $E_0 \cos \theta$ with $\theta \in [0, 2\pi)$ the polarisation angle of the light with respect to the transmission axis. The transmitted irradiance $I: [0, 2\pi) \to \mathbb{R}$ is then given by
>
> $$
> I(\theta) = I_0 \cos^2 \theta,
> $$
>
> for all $\theta \in [0, 2\pi)$ with $I_0 \in \mathbb{R}$ the irradiance of the incident light.

??? note "*Proof*:"

    Will be added later.

For natural light the average over all angles must be taken; since the average of $\cos^2 \theta$ over all angles is $\frac{1}{2}$, we have the relation $I = \frac{1}{2} I_0$ for natural light.

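A minimal numerical check of Malus' law and of the factor $\frac{1}{2}$ for natural light, averaging the transmitted fraction over a fine grid of angles:

```python
import numpy as np

I0 = 1.0  # incident irradiance

for deg in (0, 30, 45, 60, 90):
    theta = np.radians(deg)
    print(f"theta = {deg:2d} deg: I/I0 = {I0 * np.cos(theta)**2:.4f}")

# Natural light: average the transmitted irradiance over all angles.
theta = np.linspace(0, 2 * np.pi, 100001)
print(f"average I/I0 = {np.mean(np.cos(theta)**2):.4f}")  # ~0.5
```
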
## Birefringence
|
||||||
|
|
||||||
|
Natural light can be polarised in several ways, some are listed below.
|
||||||
|
|
||||||
|
1. Polarisation by absorption of the other component. This can be done with a wiregrid or dichroic materials for smaller wavelengths.
|
||||||
|
2. Polarisation by scattering. Dipole radiation has distinct polarisation depending on the position.
|
||||||
|
3. Polarisation by Brewster angle, which boils down to scattering.
|
||||||
|
4. Polarisation by birefringence, the double refraction of light obtaining two orthogonal components polarised.
|
||||||
|
|
||||||
|
??? note "*Proof*:"
|
||||||
|
|
||||||
|
Will be added later.
|
||||||
|
|
||||||
|
> *Definition*: birefringence is a double refraction in a material (often crystalline) and can be derived from the Fresnel equations without assuming isotropic dielectric properties.

If isotropic dielectric properties are not assumed, the refractive index may also depend on the polarisation and propagation direction of the light.

Using the properties of birefringence, wave plates (retarders) can be created. They introduce a phase difference between the two orthogonal polarisation components via a speed difference along the fast and slow axes.

* A half-wave plate introduces a $\Delta \varphi = \pi$ phase difference.
* A quarter-wave plate introduces a $\Delta \varphi = \frac{\pi}{2}$ phase difference.

## Jones formalism of polarisation

The Jones formalism describes polarisation with vectors and matrices, which makes it easier to calculate the effect of optical elements such as linear polarisers and wave plates.

> *Definition*: for an electromagnetic wave $\mathbf{E}: \mathbb{R}^2 \to \mathbb{R}^3$ with wavenumber $k \in \mathbb{R}$ and angular frequency $\omega \in \mathbb{R}$ propagating in the positive $z$-direction, given by
>
> $$
> \mathbf{E}(z,t) = \mathbf{E}_0 \exp i(kz - \omega t),
> $$
>
> for all $(z,t) \in \mathbb{R}^2$, the Jones vector $\mathbf{\tilde E}$ is defined as
>
> $$
> \mathbf{\tilde E} = \mathbf{E}_0,
> $$
>
> possibly normalised such that $\|\mathbf{\tilde E}\| = 1$.

For linearly polarised light under an angle $\theta \in [0, 2\pi)$ the Jones vector $\mathbf{\tilde E}$ is given by

$$
\mathbf{\tilde E} = \begin{pmatrix}\cos \theta \\ \sin \theta\end{pmatrix}.
$$

For left circularly polarised light the Jones vector $\mathbf{\tilde E}$ is given by

$$
\mathbf{\tilde E} = \begin{pmatrix} 1 \\ i \end{pmatrix},
$$

and for right circularly polarised light

$$
\mathbf{\tilde E} = \begin{pmatrix} 1 \\ -i \end{pmatrix}.
$$

> *Definition*: Jones matrices $M_i$ with $i \in \{1, \dots, n\}$ and $n \in \mathbb{N}$ may be used to model several optical elements along an optical axis, the transmitted Jones vector $\mathbf{\tilde E}_t$ being obtained from the incident Jones vector $\mathbf{\tilde E}_i$ by
>
> $$
> \mathbf{\tilde E}_t = M_n \cdots M_1 \mathbf{\tilde E}_i.
> $$

The Jones matrices for several optical elements are now given.

> *Proposition*: the Jones matrix $M$ of a linear polariser is given by
>
> $$
> M = \begin{pmatrix} \cos^2 \theta & \frac{1}{2} \sin 2\theta \\ \frac{1}{2} \sin 2\theta & \sin^2 \theta \end{pmatrix},
> $$
>
> with $\theta \in [0, 2\pi)$ the angle of the transmission axis of the linear polariser.

??? note "*Proof*:"

    Will be added later.

<br>

> *Proposition*: the Jones matrix $M$ of a half-wave plate is given by
>
> $$
> M = \begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix},
> $$
>
> with $\theta \in [0, 2\pi)$ the angle of the fast axis of the half-wave plate.

??? note "*Proof*:"

    Will be added later.

<br>

> *Proposition*: the Jones matrix $M$ of a quarter-wave plate is given by
>
> $$
> M = \begin{pmatrix} \cos^2 \theta + i \sin^2 \theta & (1 - i) \sin \theta \cos \theta \\ (1 - i) \sin \theta \cos \theta & \sin^2 \theta + i \cos^2 \theta \end{pmatrix},
> $$
>
> with $\theta \in [0, 2\pi)$ the angle of the fast axis of the quarter-wave plate.

??? note "*Proof*:"

    Will be added later.

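As a sanity check on the formalism, the following Python sketch builds the three matrices above and sends linearly polarised light at $45°$ through a quarter-wave plate with fast axis at $\theta = 0$; by the definitions above the output should be left circularly polarised. This is a minimal sketch with illustrative helper names, and the matrices are only defined up to an overall phase.

```python
import numpy as np

def polariser(theta):
    """Jones matrix of a linear polariser, transmission axis at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c * c, c * s], [c * s, s * s]])

def half_wave(theta):
    """Jones matrix of a half-wave plate, fast axis at angle theta."""
    c2, s2 = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[c2, s2], [s2, -c2]])

def quarter_wave(theta):
    """Jones matrix of a quarter-wave plate, fast axis at angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c * c + 1j * s * s, (1 - 1j) * s * c],
                     [(1 - 1j) * s * c, s * s + 1j * c * c]])

E_in = np.array([1.0, 1.0]) / np.sqrt(2)   # linear polarisation at 45 degrees
E_out = quarter_wave(0.0) @ E_in

print(E_out)                  # proportional to (1, i): left circular
print(half_wave(0.0) @ E_in)  # (1, -1)/sqrt(2): polarisation mirrored
```
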
<br>

@@ -0,0 +1,149 @@

# Reflection and refraction

> *Definition*: light rays are perpendicular to electromagnetic wave fronts.

Reflection and refraction occur whenever light rays enter a new medium with index of refraction $n \in \mathbb{R}$. Reflection may be informally defined as the change of direction of the rays that stay within the initial medium; refraction as the change of direction of the rays that pass into the other medium.

> *Law*: the law of reflection states that the angle of reflection of a light ray equals the angle of incidence.

??? note "*Proof*:"

    Will be added later.

<br>

> *Law*: the law of refraction states that the angle of refraction $\theta_t \in [0, 2\pi)$ is related to the angle of incidence $\theta_i \in [0, 2\pi)$ by
>
> $$
> n_i \sin \theta_i = n_t \sin \theta_t,
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium.

??? note "*Proof*:"

    Will be added later.

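A minimal numerical illustration of the law of refraction, computing the angle of refraction for light entering glass from air; the refractive indices are illustrative values.

```python
import math

def refraction_angle(theta_i, n_i, n_t):
    """Angle of refraction from Snell's law: n_i sin(theta_i) = n_t sin(theta_t)."""
    return math.asin(n_i * math.sin(theta_i) / n_t)

theta_i = math.radians(30.0)
theta_t = refraction_angle(theta_i, n_i=1.0, n_t=1.5)  # air -> glass
print(math.degrees(theta_t))  # ~19.5 degrees, bent towards the normal
```
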
## Fresnel equations

In this section the fractions of reflected and transmitted power for specific electromagnetic waves will be derived.

> *Lemma*: for the electric field perpendicular to the plane of incidence (s-polarisation) the Fresnel amplitude ratios for reflection $r_s \in \mathbb{R}$ and transmission $t_s \in \mathbb{R}$ are given by
>
> $$
> \begin{align*}
> r_s &= \frac{n_i \cos \theta_i - n_t \cos \theta_t}{n_i \cos \theta_i + n_t \cos \theta_t}, \\
> \\
> t_s &= \frac{2 n_i \cos \theta_i}{n_i \cos \theta_i + n_t \cos \theta_t},
> \end{align*}
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium and $\theta_{i,t} \in [0, 2\pi)$ the angle of incidence and refraction.

??? note "*Proof*:"

    Will be added later.

<br>

> *Lemma*: for the electric field parallel to the plane of incidence (p-polarisation) the Fresnel amplitude ratios for reflection $r_p \in \mathbb{R}$ and transmission $t_p \in \mathbb{R}$ are given by
>
> $$
> \begin{align*}
> r_p &= \frac{n_i \cos \theta_t - n_t \cos \theta_i}{n_i \cos \theta_t + n_t \cos \theta_i}, \\
> \\
> t_p &= \frac{2 n_i \cos \theta_i}{n_i \cos \theta_t + n_t \cos \theta_i},
> \end{align*}
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium and $\theta_{i,t} \in [0, 2\pi)$ the angle of incidence and refraction.

??? note "*Proof*:"

    Will be added later.

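The Fresnel amplitude ratios above translate directly into code; this sketch evaluates them for an air-to-glass interface (illustrative indices), using Snell's law from the previous section to obtain the angle of refraction.

```python
import math

def fresnel(theta_i, n_i, n_t):
    """Fresnel amplitude ratios (r_s, t_s, r_p, t_p) for a dielectric interface."""
    theta_t = math.asin(n_i * math.sin(theta_i) / n_t)  # Snell's law
    ci, ct = math.cos(theta_i), math.cos(theta_t)
    r_s = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
    t_s = 2 * n_i * ci / (n_i * ci + n_t * ct)
    r_p = (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)
    t_p = 2 * n_i * ci / (n_i * ct + n_t * ci)
    return r_s, t_s, r_p, t_p

print(fresnel(math.radians(45.0), n_i=1.0, n_t=1.5))
# r_s is negative here: the reflected s-polarised wave is phase shifted by pi.
```
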
<br>

> *Law*: the fraction of the incident power that is reflected is called the reflectivity $R \in [0,1]$ and is given by
>
> $$
> R = r^2,
> $$
>
> with $r \in \mathbb{R}$ the Fresnel amplitude ratio for reflection.

??? note "*Proof*:"

    Will be added later.

<br>

> *Law*: the fraction of the incident power that is transmitted is called the transmissivity $T \in [0,1]$ and is given by
>
> $$
> T = \bigg(\frac{n_t \cos \theta_t}{n_i \cos \theta_i}\bigg) t^2,
> $$
>
> with $t \in \mathbb{R}$ the Fresnel amplitude ratio for transmission, $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium and $\theta_{i,t} \in [0, 2\pi)$ the angle of incidence and refraction.

??? note "*Proof*:"

    Will be added later.

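Energy conservation at the interface requires $R + T = 1$ for each polarisation, and the formulas above satisfy this identically; a quick numerical check with illustrative values:

```python
import math

theta_i, n_i, n_t = math.radians(45.0), 1.0, 1.5
theta_t = math.asin(n_i * math.sin(theta_i) / n_t)   # Snell's law
ci, ct = math.cos(theta_i), math.cos(theta_t)

r_s = (n_i * ci - n_t * ct) / (n_i * ci + n_t * ct)
t_s = 2 * n_i * ci / (n_i * ci + n_t * ct)
r_p = (n_i * ct - n_t * ci) / (n_i * ct + n_t * ci)
t_p = 2 * n_i * ci / (n_i * ct + n_t * ci)

ratio = (n_t * ct) / (n_i * ci)   # prefactor in the transmissivity
print(r_s**2 + ratio * t_s**2)    # 1.0 for s-polarisation
print(r_p**2 + ratio * t_p**2)    # 1.0 for p-polarisation
```
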
<br>

## Limiting cases

> *Corollary*: we have $r_p = 0$ for an incident angle $\theta_b$ given by
>
> $$
> \tan \theta_b = \frac{n_t}{n_i},
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium. The angle $\theta_b$ is called the Brewster angle.

??? note "*Proof*:"

    Will be added later.

At the Brewster angle the reflectivity is therefore zero for p-polarisation; no such angle exists for s-polarisation.

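For an air-to-glass interface (illustrative indices) the Brewster angle follows directly:

```python
import math

n_i, n_t = 1.0, 1.5             # air -> glass
theta_b = math.atan(n_t / n_i)  # tan(theta_b) = n_t / n_i
print(math.degrees(theta_b))    # ~56.3 degrees
```
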
> *Corollary*: we have $|r_s| = 1$, or total reflection, for $n_i > n_t$ and an incident angle given by
>
> $$
> \sin \theta_i > \frac{n_t}{n_i},
> $$
>
> with $n_{i,t} \in \mathbb{R}$ the index of refraction of the incident and transmitted medium. The angle $\theta_c$ defined by
>
> $$
> \sin \theta_c = \frac{n_t}{n_i},
> $$
>
> is called the critical angle.

??? note "*Proof*:"

    Will be added later.

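Going the other way, from glass into air (illustrative indices), total internal reflection sets in beyond the critical angle:

```python
import math

n_i, n_t = 1.5, 1.0             # glass -> air
theta_c = math.asin(n_t / n_i)  # sin(theta_c) = n_t / n_i
print(math.degrees(theta_c))    # ~41.8 degrees
# For theta_i > theta_c, Snell's law has no real solution: total reflection.
```
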
## Phase changes on reflection

> *Proposition*: a reflected light ray may obtain a phase shift as follows:
>
> 1. for all incident angles with $n_i < n_t$ the reflected light ray is phase shifted by $\pi$,
> 2. for incident angles $\theta_i < \theta_c$ and $n_i > n_t$ the reflected light ray is not phase shifted,
> 3. the transmitted light ray is never phase shifted.

??? note "*Proof*:"

    Will be added later.

For incident angles $\theta_i > \theta_c$ and $n_i > n_t$ the reflection coefficients become complex and the phase shift depends on the angle of incidence.

## Dispersion

Will be added later.

## Scattering

Will be added later.

82
docs/physics/electromagnetism/optics/waves.md
Normal file

@@ -0,0 +1,82 @@

# Waves

> *Definition*: a wave is a propagating disturbance transporting energy and momentum. A $1 + 1$ dimensional travelling wave $\Psi: \mathbb{R}^2 \to \mathbb{R}$ can be defined as a linear combination of a right and a left travelling function $f,g: \mathbb{R} \to \mathbb{R}$, obtaining
>
> $$
> \Psi(x,t) = f(x - vt) + g(x + vt),
> $$
>
> for all $(x,t) \in \mathbb{R}^2$, with $v \in \mathbb{R}$ the speed of the wave. It satisfies the $1 + 1$ dimensional wave equation
>
> $$
> \partial_x^2 \Psi(x,t) = \frac{1}{v^2} \partial_t^2 \Psi(x,t).
> $$

The derivation of the wave equation can be obtained in section...

> *Theorem*: a right travelling harmonic wave $\Psi: \mathbb{R}^2 \to \mathbb{R}$ with $\lambda, T, A, \varphi \in \mathbb{R}$ the wavelength, period, amplitude and phase of the wave is given by
>
> $$
> \begin{align*}
> \Psi(x,t) &= A \sin \big(k(x-vt) + \varphi\big), \\
> &= A \sin(kx-\omega t + \varphi), \\
> &= A \sin \Big(2\pi \Big(\frac{x}{\lambda} - \frac{t}{T} \Big) + \varphi \Big),
> \end{align*}
> $$
>
> for all $(x,t) \in \mathbb{R}^2$, with $k = \frac{2\pi}{\lambda}$ the wavenumber, $\omega = \frac{2\pi}{T}$ the angular frequency and $v = \frac{\lambda}{T}$ the wave speed.

??? note "*Proof*:"

    Will be added later.

A right travelling harmonic wave $\Psi: \mathbb{R}^2 \to \mathbb{R}$ can also be represented in the complex plane, given by

$$
\Psi(x,t) = \text{Im} \big(A \exp i(kx - \omega t + \varphi )\big),
$$

for all $(x,t) \in \mathbb{R}^2$.

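A small numerical check that the real and complex representations agree; the sampled parameter values are illustrative.

```python
import numpy as np

A, wavelength, T, phi = 2.0, 1.0, 0.5, 0.3
k, omega = 2 * np.pi / wavelength, 2 * np.pi / T

x = np.linspace(0.0, 3.0, 1000)
t = 0.25

psi_real = A * np.sin(k * x - omega * t + phi)
psi_complex = np.imag(A * np.exp(1j * (k * x - omega * t + phi)))

print(np.allclose(psi_real, psi_complex))  # True
```
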
> *Theorem*: let $\Psi: \mathbb{R}^4 \to \mathbb{R}$ be a $3 + 1$ dimensional wave, then it satisfies the $3 + 1$ dimensional wave equation given by
>
> $$
> \nabla^2 \Psi(\mathbf{x},t) = \frac{1}{v^2} \partial_t^2 \Psi(\mathbf{x},t),
> $$
>
> for all $(\mathbf{x},t) \in \mathbb{R}^4$.

??? note "*Proof*:"

    Will be added later.

We may formulate various solutions $\Psi: \mathbb{R}^4 \to \mathbb{R}$ for this wave equation.

The first solution may be the plane wave, which follows Cartesian symmetry and is therefore best described in a Cartesian coordinate system $\mathbf{v}(x,y,z)$. The solution is given by

$$
\Psi(\mathbf{v}, t) = \text{Im}\big(A \exp i(\langle \mathbf{k}, \mathbf{v} \rangle - \omega t + \varphi) \big),
$$

for all $(\mathbf{v}, t) \in \mathbb{R}^4$ with $\mathbf{k} \in \mathbb{R}^3$ the wavevector.

The second solution may be the cylindrical wave, which follows cylindrical symmetry and is therefore best described in a cylindrical coordinate system $\mathbf{v}(r,\theta,z)$. The solution is given by

$$
\Psi(\mathbf{v}, t) = \text{Im}\Bigg(\frac{A}{\sqrt{\|\mathbf{v}\|}} \exp i(k \|\mathbf{v} \| - \omega t + \varphi) \Bigg),
$$

for all $(\mathbf{v}, t) \in \mathbb{R}^4$.

The third solution may be the spherical wave, which follows spherical symmetry and is therefore best described in a spherical coordinate system $\mathbf{v}(r, \theta, \varphi)$. The solution is given by

$$
\Psi(\mathbf{v}, t) = \text{Im}\Bigg(\frac{A}{\|\mathbf{v}\|} \exp i(k\|\mathbf{v}\| - \omega t + \varphi) \Bigg),
$$

for all $(\mathbf{v}, t) \in \mathbb{R}^4$.

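The three solutions differ in how the amplitude falls off with the distance from the source; a small sketch comparing them along a ray, with illustrative parameter values:

```python
import numpy as np

A, k, omega, phi, t = 1.0, 2 * np.pi, 2 * np.pi, 0.0, 0.0
r = np.array([1.25, 2.25, 4.25, 8.25])  # distances where sin(kr) = 1

phase = np.exp(1j * (k * r - omega * t + phi))
print(np.imag(A * phase))               # plane: constant amplitude
print(np.imag(A / np.sqrt(r) * phase))  # cylindrical: falls off as 1/sqrt(r)
print(np.imag(A / r * phase))           # spherical: falls off as 1/r
```
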
> *Principle*: the principle of superposition is valid for waves, since the solution space of the wave equation is linear.

From this principle we obtain the property of constructive and destructive interference of waves, as the sketch below illustrates.

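A minimal sketch of superposition: two harmonic waves of equal amplitude are added with phase difference $0$ (constructive) and $\pi$ (destructive); the parameter values are illustrative.

```python
import numpy as np

A, k, omega, t = 1.0, 2 * np.pi, 2 * np.pi, 0.0
x = np.linspace(0.0, 2.0, 5)

psi1 = A * np.sin(k * x - omega * t)
constructive = psi1 + A * np.sin(k * x - omega * t)          # dphi = 0
destructive = psi1 + A * np.sin(k * x - omega * t + np.pi)   # dphi = pi

print(constructive)  # twice the amplitude of psi1
print(destructive)   # ~0 everywhere: complete cancellation
```
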