Tutorial 2: Analytic Geometry¶
Course: Mathematics for Machine Learning Instructor: Mohammed Alnemari
📚 Learning Objectives¶
By the end of this tutorial, you will understand:
- Norms and their role in measuring vector magnitude
- Inner products and their defining axioms
- How lengths, distances, angles, and orthogonality arise from inner products
- Orthogonal matrices, orthonormal bases, and orthogonal complements
- Orthogonal projections and the Gram-Schmidt process
- Rotation matrices and their geometric meaning
Part 1: Norms¶
1.1 What is a Norm?¶
A norm is a function \(\|\cdot\| : \mathbb{R}^n \to \mathbb{R}\) that assigns a non-negative "length" to every vector.
Think of it as... a ruler for vectors. Different norms are like different ways of measuring distance — walking along city blocks versus flying in a straight line.
A norm must satisfy these properties for all \(\mathbf{x}, \mathbf{y} \in \mathbb{R}^n\) and all \(\lambda \in \mathbb{R}\):
| Property | Statement | Intuition |
|---|---|---|
| Non-negativity | \(\|\mathbf{x}\| \geq 0\) | Lengths are never negative |
| Definiteness | \(\|\mathbf{x}\| = 0 \iff \mathbf{x} = \mathbf{0}\) | Only the zero vector has zero length |
| Absolute homogeneity | \(\|\lambda \mathbf{x}\| = |\lambda| \, \|\mathbf{x}\|\) | Scaling a vector scales its length by \(|\lambda|\) |
| Triangle inequality | \(\|\mathbf{x} + \mathbf{y}\| \leq \|\mathbf{x}\| + \|\mathbf{y}\|\) | The shortcut is never longer than going around |
1.2 Common Norms¶
| Norm | Name | Formula | Also Called |
|---|---|---|---|
| \(\ell_1\) | Manhattan norm | \(\|\mathbf{x}\|_1 = \displaystyle\sum_{i=1}^{n} |x_i|\) | Taxicab norm |
| \(\ell_2\) | Euclidean norm | \(\|\mathbf{x}\|_2 = \sqrt{\displaystyle\sum_{i=1}^{n} x_i^2}\) | Standard norm |
| \(\ell_\infty\) | Max norm | \(\|\mathbf{x}\|_\infty = \displaystyle\max_{i} |x_i|\) | Chebyshev norm |
1.3 Worked Example: Computing Norms¶
Let \(\mathbf{x} = \begin{bmatrix} 3 \\ -4 \\ 2 \end{bmatrix}\).
\(\ell_1\) norm: $\(\|\mathbf{x}\|_1 = |3| + |-4| + |2| = 3 + 4 + 2 = 9\)$
\(\ell_2\) norm: $\(\|\mathbf{x}\|_2 = \sqrt{3^2 + (-4)^2 + 2^2} = \sqrt{9 + 16 + 4} = \sqrt{29} \approx 5.39\)$
\(\ell_\infty\) norm: $\(\|\mathbf{x}\|_\infty = \max\{|3|, |-4|, |2|\} = 4\)$
Think of it as... The \(\ell_1\) norm counts total blocks walked in a grid city. The \(\ell_2\) norm is the straight-line (as the crow flies) distance. The \(\ell_\infty\) norm is the longest single step you take along any one axis.
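The three values above can be checked numerically; here is a minimal sketch using NumPy's `np.linalg.norm` (the code is an illustration, not part of the tutorial's own material):

```python
import numpy as np

x = np.array([3.0, -4.0, 2.0])

l1 = np.linalg.norm(x, ord=1)         # |3| + |-4| + |2| = 9
l2 = np.linalg.norm(x, ord=2)         # sqrt(29) ≈ 5.39
linf = np.linalg.norm(x, ord=np.inf)  # max{3, 4, 2} = 4

print(l1, l2, linf)
```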
Part 2: Inner Products¶
2.1 Definition¶
An inner product on a vector space \(V\) is a function \(\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}\) that satisfies four axioms:
| Axiom | Statement | For all |
|---|---|---|
| Symmetry | \(\langle \mathbf{x}, \mathbf{y} \rangle = \langle \mathbf{y}, \mathbf{x} \rangle\) | \(\mathbf{x}, \mathbf{y} \in V\) |
| Linearity in 1st argument | \(\langle \lambda\mathbf{x} + \mathbf{z}, \mathbf{y} \rangle = \lambda\langle \mathbf{x}, \mathbf{y} \rangle + \langle \mathbf{z}, \mathbf{y} \rangle\) | \(\mathbf{x}, \mathbf{y}, \mathbf{z} \in V,\ \lambda \in \mathbb{R}\) |
| Positive semi-definiteness | \(\langle \mathbf{x}, \mathbf{x} \rangle \geq 0\) | \(\mathbf{x} \in V\) |
| Positive definiteness | \(\langle \mathbf{x}, \mathbf{x} \rangle = 0 \iff \mathbf{x} = \mathbf{0}\) | \(\mathbf{x} \in V\) |
Think of it as... an inner product is a generalized way of multiplying two vectors together to get a single number that tells you "how much" the vectors agree in direction.
2.2 The Dot Product¶
The most common inner product in \(\mathbb{R}^n\) is the dot product: $\(\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y} = \sum_{i=1}^{n} x_i y_i\)$
Example: $\(\left\langle \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 4 \\ 0 \\ -1 \end{bmatrix} \right\rangle = 1(4) + 2(0) + 3(-1) = 4 + 0 - 3 = 1\)$
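The same dot product as a one-line sanity check (assuming NumPy):

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 0, -1])

# dot product: sum of elementwise products
print(np.dot(x, y))  # 1*4 + 2*0 + 3*(-1) = 1
```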
2.3 General Inner Products and Positive Definite Matrices¶
Not every inner product is the dot product. We can define a more general inner product using a symmetric positive definite matrix \(A\): $\(\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T A \mathbf{y}\)$
A symmetric matrix \(A\) is positive definite if: $\(\mathbf{x}^T A \mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}\)$
Example: Let \(A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\) and \(\mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\). Then \(A\mathbf{x} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}\), so $\(\mathbf{x}^T A \mathbf{x} = 1(3) + 1(3) = 6 > 0\)$
Think of it as... the standard dot product uses the identity matrix \(I\) as \(A\). Choosing a different positive definite \(A\) stretches or skews the geometry, like measuring distance on a tilted surface instead of a flat table.
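A quick numerical sketch of the \(A\)-weighted quadratic form, plus an eigenvalue check of positive definiteness (assuming NumPy):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = np.array([1.0, 1.0])

# quadratic form x^T A x from the positive-definiteness condition
q = x @ A @ x
print(q)  # 6.0

# a symmetric matrix is positive definite iff all eigenvalues are positive
print(np.linalg.eigvalsh(A))  # [1. 3.]
```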
Part 3: Lengths and Distances¶
3.1 Induced Norm¶
Every inner product induces a norm: $\(\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}\)$
For the standard dot product this gives the Euclidean norm: $\(\|\mathbf{x}\|_2 = \sqrt{\mathbf{x}^T \mathbf{x}} = \sqrt{\sum_{i=1}^{n} x_i^2}\)$
3.2 Distance¶
The distance between two vectors \(\mathbf{x}\) and \(\mathbf{y}\) is: $\(d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\| = \sqrt{\langle \mathbf{x} - \mathbf{y}, \mathbf{x} - \mathbf{y} \rangle}\)$
A distance function (metric) satisfies:
| Property | Statement |
|---|---|
| Non-negativity | \(d(\mathbf{x}, \mathbf{y}) \geq 0\) |
| Identity | \(d(\mathbf{x}, \mathbf{y}) = 0 \iff \mathbf{x} = \mathbf{y}\) |
| Symmetry | \(d(\mathbf{x}, \mathbf{y}) = d(\mathbf{y}, \mathbf{x})\) |
| Triangle inequality | \(d(\mathbf{x}, \mathbf{z}) \leq d(\mathbf{x}, \mathbf{y}) + d(\mathbf{y}, \mathbf{z})\) |
3.3 Cauchy-Schwarz Inequality¶
One of the most important inequalities in all of mathematics: $\(|\langle \mathbf{x}, \mathbf{y} \rangle| \leq \|\mathbf{x}\| \, \|\mathbf{y}\|\)$
Equality holds if and only if \(\mathbf{x}\) and \(\mathbf{y}\) are linearly dependent (i.e., one is a scalar multiple of the other).
Think of it as... the dot product can never exceed the product of the lengths. This is what guarantees that the cosine of the angle between two vectors always stays between \(-1\) and \(1\).
Example: Let \(\mathbf{x} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\) and \(\mathbf{y} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}\).
- \(|\langle \mathbf{x}, \mathbf{y} \rangle| = |1(3) + 2(1)| = |5| = 5\)
- \(\|\mathbf{x}\| \cdot \|\mathbf{y}\| = \sqrt{1+4}\,\sqrt{9+1} = \sqrt{5}\,\sqrt{10} = \sqrt{50} \approx 7.07\)
- Check: \(5 \leq 7.07\) ✓
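The inequality for this example can be verified directly (a small sketch, assuming NumPy):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

lhs = abs(np.dot(x, y))                      # |<x, y>| = 5
rhs = np.linalg.norm(x) * np.linalg.norm(y)  # sqrt(50) ≈ 7.07

assert lhs <= rhs  # Cauchy-Schwarz holds
```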
Part 4: Angles and Orthogonality¶
4.1 Angle Between Vectors¶
The angle \(\theta \in [0, \pi]\) between two non-zero vectors \(\mathbf{x}\) and \(\mathbf{y}\) is defined via: $\(\cos\theta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\| \, \|\mathbf{y}\|}\)$
The Cauchy-Schwarz inequality guarantees that the right-hand side lies in \([-1, 1]\), so \(\theta\) is well-defined.
Example: Find the angle between \(\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\) and \(\mathbf{y} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\): $\(\cos\theta = \frac{1(1) + 0(1)}{1 \cdot \sqrt{2}} = \frac{1}{\sqrt{2}} \quad \Rightarrow \quad \theta = 45^\circ\)$
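The angle formula translates directly to code; a small sketch (assuming NumPy):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])

cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
theta_deg = np.degrees(np.arccos(cos_theta))
print(theta_deg)  # 45.0 (up to floating-point rounding)
```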
4.2 Orthogonality¶
Two vectors are orthogonal (perpendicular), written \(\mathbf{x} \perp \mathbf{y}\), if their inner product is zero: $\(\langle \mathbf{x}, \mathbf{y} \rangle = 0\)$
Think of it as... orthogonal vectors carry completely independent information — knowing one tells you nothing about the other. This is exactly the idea behind "uncorrelated features" in machine learning.
Example: $\(\left\langle \begin{bmatrix} 1 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\rangle = 1(1) + (-1)(1) = 0 \quad \checkmark \text{ Orthogonal!}\)$
4.3 Orthogonal and Orthonormal Sets¶
A set of vectors \(\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}\) is:
| Term | Condition |
|---|---|
| Orthogonal | \(\langle \mathbf{v}_i, \mathbf{v}_j \rangle = 0\) for all \(i \neq j\) |
| Orthonormal | Orthogonal and \(\|\mathbf{v}_i\| = 1\) for all \(i\) |
Example of an orthonormal set in \(\mathbb{R}^2\): $\(\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\)$
- \(\langle \mathbf{e}_1, \mathbf{e}_2 \rangle = 0\) (orthogonal)
- \(\|\mathbf{e}_1\| = 1\) and \(\|\mathbf{e}_2\| = 1\) (unit length)
Part 5: Orthogonal Matrices¶
5.1 Definition¶
A square matrix \(A \in \mathbb{R}^{n \times n}\) is orthogonal if its columns form an orthonormal set. Equivalently: $\(A^T A = A A^T = I, \quad \text{i.e.} \quad A^{-1} = A^T\)$
Think of it as... an orthogonal matrix performs a "rigid" transformation — it can rotate or reflect vectors but never stretches or squishes them.
5.2 Key Properties¶
| Property | Statement |
|---|---|
| Inverse equals transpose | \(A^{-1} = A^T\) |
| Columns are orthonormal | \(\langle \mathbf{a}_i, \mathbf{a}_j \rangle = \delta_{ij}\) |
| Rows are orthonormal | \(A A^T = I\) |
| Preserves lengths | \(\|A\mathbf{x}\| = \|\mathbf{x}\|\) |
| Preserves angles | \(\langle A\mathbf{x}, A\mathbf{y} \rangle = \langle \mathbf{x}, \mathbf{y} \rangle\) |
| Determinant | \(\det(A) = \pm 1\) |
| Product is orthogonal | If \(A, B\) orthogonal, then \(AB\) is orthogonal |
Proof that orthogonal matrices preserve lengths: $\(\|A\mathbf{x}\|^2 = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^T A^T A \mathbf{x} = \mathbf{x}^T I \mathbf{x} = \mathbf{x}^T \mathbf{x} = \|\mathbf{x}\|^2\)$
5.3 Example¶
Let \(A = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\[4pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}\) (a \(45^\circ\) rotation). Verify \(A^T A = I\): $\(A^T A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\[4pt] -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\[4pt] \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \quad \checkmark\)$
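These properties are easy to confirm numerically for the \(45^\circ\) rotation matrix (a sketch, assuming NumPy):

```python
import numpy as np

# the 45-degree rotation matrix from the example
A = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)

assert np.allclose(A.T @ A, np.eye(2))     # orthogonal: A^T A = I
assert np.allclose(np.linalg.inv(A), A.T)  # inverse equals transpose

x = np.array([3.0, 4.0])
assert np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x))  # lengths preserved
print(np.linalg.det(A))  # 1.0, so A is a rotation
```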
Part 6: Orthonormal Basis¶
6.1 Definition¶
An orthonormal basis (ONB) for a subspace \(U \subseteq \mathbb{R}^n\) is a basis \(\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}\) such that: $\(\langle \mathbf{u}_i, \mathbf{u}_j \rangle = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}\)$
6.2 Why Orthonormal Bases are Useful¶
With an orthonormal basis, finding coordinates becomes trivially easy. If \(\{\mathbf{u}_1, \ldots, \mathbf{u}_k\}\) is an ONB for \(U\) and \(\mathbf{x} \in U\), then: $\(\mathbf{x} = \sum_{i=1}^{k} \langle \mathbf{x}, \mathbf{u}_i \rangle \, \mathbf{u}_i\)$
Think of it as... with an orthonormal basis, you find each coordinate by simply taking a dot product — no system of equations to solve. It is the easiest possible coordinate system.
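This coordinate trick can be sketched in NumPy: stack an ONB as the columns of a matrix `U`, and the coordinates of any vector in its span are just `U.T @ x` (the particular basis below is an illustrative choice, not from the tutorial):

```python
import numpy as np

# an orthonormal basis of R^2, stored as columns (illustrative example)
U = np.array([[1.0, -1.0],
              [1.0,  1.0]]) / np.sqrt(2)
x = np.array([2.0, 4.0])

# each coordinate is a single dot product <x, u_i>
coords = U.T @ x

# the coordinates reconstruct x exactly
assert np.allclose(U @ coords, x)
print(coords)  # [3*sqrt(2), sqrt(2)] ≈ [4.243, 1.414]
```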
6.3 How to Find an Orthonormal Basis¶
Given any basis, use the Gram-Schmidt process (covered in Part 9) to convert it into an orthonormal basis.
Part 7: Orthogonal Complement¶
7.1 Definition¶
Let \(U\) be a subspace of \(\mathbb{R}^n\). The orthogonal complement \(U^\perp\) is the set of all vectors orthogonal to every vector in \(U\): $\(U^\perp = \{\mathbf{v} \in \mathbb{R}^n : \langle \mathbf{v}, \mathbf{u} \rangle = 0 \ \text{for all } \mathbf{u} \in U\}\)$
Think of it as... if \(U\) is a plane through the origin in 3D, then \(U^\perp\) is the line perpendicular to that plane. Together they account for all of \(\mathbb{R}^3\).
7.2 Key Properties¶
| Property | Statement |
|---|---|
| Subspace | \(U^\perp\) is itself a subspace |
| Dimension | \(\dim(U) + \dim(U^\perp) = n\) |
| Double complement | \((U^\perp)^\perp = U\) |
| Direct sum | \(\mathbb{R}^n = U \oplus U^\perp\) (every vector splits uniquely) |
7.3 Connection to the Kernel and Row Space¶
For a matrix \(A \in \mathbb{R}^{m \times n}\): $\(\ker(A) = \left(\text{row}(A)\right)^\perp\)$
This means: a vector \(\mathbf{x}\) is in the null space of \(A\) if and only if \(\mathbf{x}\) is orthogonal to every row of \(A\).
Example: Let \(A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}\).
The row space is \(\text{span}\left\{\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right\}\) (the rows are linearly dependent).
The null space is \(\ker(A) = \text{span}\left\{\begin{bmatrix} -2 \\ 1 \end{bmatrix}\right\}\).
Check: \(\left\langle \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} -2 \\ 1 \end{bmatrix} \right\rangle = 1(-2) + 2(1) = 0\) ✓
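The orthogonality between the null-space vector and the rows can be verified mechanically (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])
v = np.array([-2.0, 1.0])  # the null-space vector from the example

assert np.allclose(A @ v, 0)    # v is in ker(A)
assert np.isclose(A[0] @ v, 0)  # orthogonal to row 1
assert np.isclose(A[1] @ v, 0)  # orthogonal to row 2
```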
Part 8: Orthogonal Projections¶
8.1 Projection onto a Line¶
Given a non-zero vector \(\mathbf{b}\) (defining a line through the origin), the projection of \(\mathbf{x}\) onto the line spanned by \(\mathbf{b}\) is: $\(\pi_{\mathbf{b}}(\mathbf{x}) = \frac{\langle \mathbf{x}, \mathbf{b} \rangle}{\langle \mathbf{b}, \mathbf{b} \rangle} \, \mathbf{b}\)$
The projection matrix is: $\(P_\pi = \frac{\mathbf{b}\mathbf{b}^T}{\mathbf{b}^T\mathbf{b}}\)$
Think of it as... shining a flashlight straight down onto a line and seeing where the shadow of your vector lands. The projection is that shadow.
Example: Project \(\mathbf{x} = \begin{bmatrix} 3 \\ 1 \end{bmatrix}\) onto \(\mathbf{b} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\): $\(\pi_{\mathbf{b}}(\mathbf{x}) = \frac{3(1) + 1(2)}{1^2 + 2^2}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{5}{5}\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}\)$
8.2 Projection onto a General Subspace¶
Let \(U = \text{span}\{\mathbf{b}_1, \ldots, \mathbf{b}_k\}\) and define \(B = [\mathbf{b}_1 \mid \cdots \mid \mathbf{b}_k]\). The projection of \(\mathbf{x}\) onto \(U\) is: $\(\pi_U(\mathbf{x}) = B(B^T B)^{-1} B^T \mathbf{x}\)$
The projection matrix is: $\(P = B(B^T B)^{-1} B^T\)$
Properties of projection matrices:
| Property | Statement |
|---|---|
| Idempotent | \(P^2 = P\) (projecting twice is the same as projecting once) |
| Symmetric | \(P^T = P\) |
| Residual | \(\mathbf{x} - P\mathbf{x}\) is orthogonal to \(U\) |
8.3 Connection to the Pseudo-Inverse¶
The Moore-Penrose pseudo-inverse of \(B\) (when \(B\) has linearly independent columns) is: $\(B^\dagger = (B^T B)^{-1} B^T\)$
So the projection simplifies to: $\(\pi_U(\mathbf{x}) = B B^\dagger \mathbf{x}\)$
The pseudo-inverse is central to solving least-squares problems: when \(A\mathbf{x} = \mathbf{b}\) has no exact solution, the best approximate solution is \(\hat{\mathbf{x}} = A^\dagger \mathbf{b}\).
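In practice you rarely form \((A^T A)^{-1}\) by hand; NumPy's `pinv` and `lstsq` both produce the least-squares solution. A small sketch with made-up data (`A` and `b` here are illustrative, not from the tutorial):

```python
import numpy as np

# an overdetermined system with no exact solution (illustrative data)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

x_hat = np.linalg.pinv(A) @ b                 # pseudo-inverse solution
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares solver

assert np.allclose(x_hat, x_ls)               # same answer both ways
assert np.allclose(A.T @ (b - A @ x_hat), 0)  # residual orthogonal to col(A)
```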
8.4 Worked Example: Projection onto a Subspace¶
Project \(\mathbf{x} = \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix}\) onto \(U = \text{span}\left\{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}\right\}\).
Step 1: Form the matrix \(B\): $\(B = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}\)$
Step 2: Compute \(B^T B\): $\(B^T B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\)$
Step 3: Compute \((B^T B)^{-1}\): $\((B^T B)^{-1} = \frac{1}{2(2) - 1(1)} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\)$
Step 4: Compute \(B^T \mathbf{x}\): $\(B^T \mathbf{x} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 6 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 6 \\ 0 \end{bmatrix}\)$
Step 5: Compute \((B^T B)^{-1} B^T \mathbf{x}\): $\(\frac{1}{3}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} 6 \\ 0 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 12 \\ -6 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \end{bmatrix}\)$
Step 6: Compute the projection: $\(\pi_U(\mathbf{x}) = B \begin{bmatrix} 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 4 \\ -2 \\ 2 \end{bmatrix}\)$
Verify: The residual \(\mathbf{x} - \pi_U(\mathbf{x}) = \begin{bmatrix} 2 \\ 2 \\ -2 \end{bmatrix}\) should be orthogonal to both basis vectors:
- \(\langle \begin{bmatrix} 2 \\ 2 \\ -2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \rangle = 2 + 0 - 2 = 0\) ✓
- \(\langle \begin{bmatrix} 2 \\ 2 \\ -2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \rangle = 0 + 2 - 2 = 0\) ✓
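The six steps above collapse to a few lines of NumPy (a sketch reproducing the same numbers):

```python
import numpy as np

B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
x = np.array([6.0, 0.0, 0.0])

# projection matrix P = B (B^T B)^{-1} B^T
P = B @ np.linalg.inv(B.T @ B) @ B.T
proj = P @ x
print(proj)  # [ 4. -2.  2.]

assert np.allclose(P @ P, P)             # idempotent
assert np.allclose(P.T, P)               # symmetric
assert np.allclose(B.T @ (x - proj), 0)  # residual orthogonal to U
```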
Part 9: Gram-Schmidt Process¶
9.1 The Algorithm¶
The Gram-Schmidt process takes any set of linearly independent vectors and produces an orthonormal set spanning the same subspace.
Given linearly independent vectors \(\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k\}\):
Step 1: Orthogonalize (produce orthogonal vectors \(\mathbf{u}_i\)), starting with \(\mathbf{u}_1 = \mathbf{b}_1\).
In general: $\(\mathbf{u}_i = \mathbf{b}_i - \sum_{j=1}^{i-1} \frac{\langle \mathbf{b}_i, \mathbf{u}_j \rangle}{\langle \mathbf{u}_j, \mathbf{u}_j \rangle} \mathbf{u}_j\)$
Step 2: Normalize (produce unit vectors \(\mathbf{e}_i\)): $\(\mathbf{e}_i = \frac{\mathbf{u}_i}{\|\mathbf{u}_i\|}\)$
Think of it as... taking each new vector and "subtracting off" all the parts that point in the directions you have already handled. What remains is the genuinely new direction. Then you scale it to length 1.
9.2 Worked Example¶
Apply Gram-Schmidt to \(\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\) and \(\mathbf{b}_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\).
Step 1: Set \(\mathbf{u}_1 = \mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\).
Step 2: Compute the projection coefficient: $\(\frac{\langle \mathbf{b}_2, \mathbf{u}_1 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle} = \frac{1(1) + 0(1) + 1(0)}{1^2 + 1^2 + 0^2} = \frac{1}{2}\)$
Step 3: Subtract the projection: $\(\mathbf{u}_2 = \mathbf{b}_2 - \frac{1}{2}\mathbf{u}_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -1/2 \\ 1 \end{bmatrix}\)$
Verify orthogonality: $\(\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 1\!\left(\tfrac{1}{2}\right) + 1\!\left(-\tfrac{1}{2}\right) + 0(1) = 0 \quad \checkmark\)$
Step 4: Normalize: $\(\|\mathbf{u}_1\| = \sqrt{1+1+0} = \sqrt{2}, \quad \mathbf{e}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}\)$ $\(\|\mathbf{u}_2\| = \sqrt{\tfrac{1}{4}+\tfrac{1}{4}+1} = \sqrt{\tfrac{3}{2}}, \quad \mathbf{e}_2 = \sqrt{\tfrac{2}{3}}\begin{bmatrix} 1/2 \\ -1/2 \\ 1 \end{bmatrix} = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}\)$
Result: The orthonormal basis is: $\(\mathbf{e}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \qquad \mathbf{e}_2 = \frac{1}{\sqrt{6}}\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}\)$
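The algorithm translates almost line for line into code; the helper `gram_schmidt` below is an illustrative sketch (assuming NumPy), checked against this example:

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormalize linearly independent vectors."""
    basis = []
    for b in vectors:
        u = b.astype(float).copy()
        for e in basis:
            u -= (b @ e) * e  # subtract the component along each earlier direction
        basis.append(u / np.linalg.norm(u))  # normalize to unit length
    return basis

e1, e2 = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                       np.array([1.0, 0.0, 1.0])])

assert np.isclose(e1 @ e2, 0)             # orthogonal
assert np.isclose(np.linalg.norm(e1), 1)  # unit length
assert np.isclose(np.linalg.norm(e2), 1)
print(e2)  # ≈ (1/sqrt(6)) * [1, -1, 2]
```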
Part 10: Rotations¶
10.1 Rotation Matrix in 2D¶
A rotation by angle \(\theta\) (counter-clockwise) in \(\mathbb{R}^2\) is given by: $\(R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\)$
Think of it as... every point in the plane is swung around the origin by the angle \(\theta\). The matrix encodes where the two standard basis vectors land after the rotation.
10.2 Properties of Rotation Matrices¶
| Property | Statement |
|---|---|
| Orthogonal | \(R(\theta)^T R(\theta) = I\) |
| Determinant | \(\det(R(\theta)) = 1\) (no reflection) |
| Inverse is reverse rotation | \(R(\theta)^{-1} = R(-\theta) = R(\theta)^T\) |
| Composition | \(R(\alpha) R(\beta) = R(\alpha + \beta)\) |
| Preserves lengths | \(\|R(\theta)\mathbf{x}\| = \|\mathbf{x}\|\) |
| Preserves angles | Angles between vectors are unchanged |
10.3 Worked Example¶
Rotate \(\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\) by \(\theta = 90^\circ\): $\(R(90^\circ)\mathbf{x} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}\)$
This is exactly the unit vector pointing straight up — a \(90^\circ\) counter-clockwise rotation of the unit vector pointing right. ✓
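A short sketch of the rotation matrix in code, including a check of the composition property from the table above (assuming NumPy; angles in radians):

```python
import numpy as np

def R(theta):
    """2D counter-clockwise rotation by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

x = np.array([1.0, 0.0])
y = R(np.pi / 2) @ x  # rotate 90 degrees

assert np.allclose(y, [0.0, 1.0])                        # points straight up
assert np.isclose(np.linalg.norm(y), np.linalg.norm(x))  # length preserved
assert np.allclose(R(0.3) @ R(0.5), R(0.8))              # R(a) R(b) = R(a+b)
```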
10.4 Rotations in 3D (Preview)¶
In \(\mathbb{R}^3\), a rotation about the \(z\)-axis by angle \(\theta\) is: $\(R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\)$
General 3D rotations can be composed from rotations about the three coordinate axes.
Summary: Key Takeaways¶
Norms and Inner Products¶
- The \(\ell_1\), \(\ell_2\), and \(\ell_\infty\) norms each measure vector size differently
- An inner product \(\langle \cdot, \cdot \rangle\) must satisfy symmetry, linearity, and positive definiteness
- Every inner product induces a norm: \(\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}\)
Geometry from Inner Products¶
- Angles: \(\cos\theta = \frac{\langle \mathbf{x}, \mathbf{y} \rangle}{\|\mathbf{x}\|\|\mathbf{y}\|}\)
- Orthogonality: \(\langle \mathbf{x}, \mathbf{y} \rangle = 0\)
- Cauchy-Schwarz: \(|\langle \mathbf{x}, \mathbf{y} \rangle| \leq \|\mathbf{x}\|\|\mathbf{y}\|\)
Orthogonal Structures¶
- Orthogonal matrices satisfy \(A^{-1} = A^T\) and preserve geometry
- Orthogonal complements: \(\ker(A) = \text{row}(A)^\perp\)
- Gram-Schmidt converts any basis to an orthonormal basis
Projections and Rotations¶
- Projection onto a line: \(P_\pi = \frac{\mathbf{b}\mathbf{b}^T}{\mathbf{b}^T\mathbf{b}}\)
- Projection onto a subspace: \(P = B(B^TB)^{-1}B^T\)
- 2D rotation: \(R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\)
Practice Problems¶
Problem 1¶
Compute the \(\ell_1\), \(\ell_2\), and \(\ell_\infty\) norms of: $\(\mathbf{x} = \begin{bmatrix} -2 \\ 6 \\ -3 \end{bmatrix}\)$
Problem 2¶
Let \(\mathbf{a} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}\) and \(\mathbf{b} = \begin{bmatrix} 1 \\ -2 \\ 3 \end{bmatrix}\). Compute the angle \(\theta\) between them.
Problem 3¶
Verify that \(A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\) is an orthogonal matrix and determine whether it represents a rotation or a reflection.
Problem 4¶
Project \(\mathbf{x} = \begin{bmatrix} 4 \\ 3 \end{bmatrix}\) onto the line spanned by \(\mathbf{b} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}\).
Problem 5¶
Apply the Gram-Schmidt process to the vectors \(\mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\) and \(\mathbf{b}_2 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix}\) to produce an orthonormal basis.
Problem 6¶
Let \(\mathbf{x} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}\). Find \(R(\theta)\mathbf{x}\) for \(\theta = 60^\circ\) and verify that the result has the same norm as \(\mathbf{x}\).
Solutions¶
Solution 1: $\(\|\mathbf{x}\|_1 = |-2| + |6| + |-3| = 11, \quad \|\mathbf{x}\|_2 = \sqrt{4 + 36 + 9} = \sqrt{49} = 7, \quad \|\mathbf{x}\|_\infty = 6\)$
Solution 2:
First compute the dot product: $\(\langle \mathbf{a}, \mathbf{b} \rangle = 2(1) + 1(-2) + (-1)(3) = 2 - 2 - 3 = -3\)$
Then compute the norms: $\(\|\mathbf{a}\| = \sqrt{4 + 1 + 1} = \sqrt{6}, \quad \|\mathbf{b}\| = \sqrt{1 + 4 + 9} = \sqrt{14}\)$
Therefore: $\(\cos\theta = \frac{-3}{\sqrt{6}\sqrt{14}} = \frac{-3}{\sqrt{84}} = \frac{-3}{2\sqrt{21}} \approx -0.327, \quad \theta \approx 109.1^\circ\)$
Solution 3:
Check \(A^T A = I\): $\(A^T A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = \begin{bmatrix} 0(0)+(-1)(-1) & 0(1)+(-1)(0) \\ 1(0)+0(-1) & 1(1)+0(0) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I \quad \checkmark\)$
So \(A\) is orthogonal.
Determine rotation vs. reflection: $\(\det(A) = 0(0) - (1)(-1) = 1\)$
Since \(\det(A) = +1\), this is a rotation (not a reflection). Specifically, this is a rotation by \(-90^\circ\) (or equivalently \(270^\circ\) counter-clockwise).
Solution 4: $\(\pi_{\mathbf{b}}(\mathbf{x}) = \frac{\langle \mathbf{x}, \mathbf{b} \rangle}{\langle \mathbf{b}, \mathbf{b} \rangle}\mathbf{b} = \frac{4(1) + 3(1)}{1^2 + 1^2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \frac{7}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 7/2 \\ 7/2 \end{bmatrix}\)$
Verify: The residual \(\mathbf{x} - \pi_{\mathbf{b}}(\mathbf{x}) = \begin{bmatrix} 4 - 7/2 \\ 3 - 7/2 \end{bmatrix} = \begin{bmatrix} 1/2 \\ -1/2 \end{bmatrix}\) should be orthogonal to \(\mathbf{b}\): $\(\left\langle \begin{bmatrix} 1/2 \\ -1/2 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\rangle = \tfrac{1}{2} - \tfrac{1}{2} = 0 \quad \checkmark\)$
Solution 5:
Step 1: Set \(\mathbf{u}_1 = \mathbf{b}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\).
Step 2: Compute the projection coefficient: $\(\frac{\langle \mathbf{b}_2, \mathbf{u}_1 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle} = \frac{0(1) + 1(1) + 2(1)}{1+1+1} = \frac{3}{3} = 1\)$
Step 3: Subtract the projection: $\(\mathbf{u}_2 = \mathbf{b}_2 - 1 \cdot \mathbf{u}_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\)$
Check orthogonality: \(\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = -1 + 0 + 1 = 0\) ✓
Step 4: Normalize: $\(\mathbf{e}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} = \frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{e}_2 = \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} = \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\)$
Orthonormal basis: \(\left\{\frac{1}{\sqrt{3}}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix},\ \frac{1}{\sqrt{2}}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\right\}\)
Solution 6: $\(R(60^\circ)\mathbf{x} = \begin{bmatrix} \cos 60^\circ & -\sin 60^\circ \\ \sin 60^\circ & \cos 60^\circ \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1/2 \\ \sqrt{3}/2 \end{bmatrix}\)$
Verify the norm is preserved: $\(\left\| \begin{bmatrix} 1/2 \\ \sqrt{3}/2 \end{bmatrix} \right\| = \sqrt{\tfrac{1}{4} + \tfrac{3}{4}} = 1 = \|\mathbf{x}\|\)$
The norm is preserved, confirming that \(R(60^\circ)\) is an orthogonal (length-preserving) transformation.
Next: Tutorial 3 - Matrix Decompositions