Linear Algebra
:label:sec_linear-algebra
Notes from D2L
Scalars:
The expression $x \in \mathbb{R}$ is a formal way to say that $x$ is a real-valued scalar. The symbol $\in$ (pronounced “in”) denotes membership in a set. For example, $x, y \in \{0, 1\}$ indicates that $x$ and $y$ are variables that can only take values $0$ or $1$.
Vectors:
\[\mathbf{x} =\begin{bmatrix}x_{1} \\ \vdots \\x_{n}\end{bmatrix}\]
Caution: in Python, as in most programming languages, vector indices start at $0$, also known as zero-based indexing, whereas in linear algebra subscripts begin at $1$ (one-based indexing).
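A quick sketch of the off-by-one difference (the vector x here is illustrative):
import torch

x = torch.arange(4)   # tensor([0, 1, 2, 3])
x[2]                  # Python: index 2 is the *third* element, tensor(2)
# in math notation the same element would be written x_3 (one-based)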
Matrices:
Just as scalars are $0^{\textrm{th}}$-order tensors and vectors are $1^{\textrm{st}}$-order tensors, matrices are $2^{\textrm{nd}}$-order tensors.
The expression $\mathbf{A} \in \mathbb{R}^{m \times n}$ indicates that a matrix $\mathbf{A}$ contains $m \times n$ real-valued scalars, arranged as $m$ rows and $n$ columns. When $m = n$, we say that a matrix is square.
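For instance, a matrix in $\mathbb{R}^{2 \times 3}$ can be sketched in PyTorch as follows (the name A is illustrative):
import torch

A = torch.arange(6).reshape(2, 3)   # 2 rows, 3 columns
A.shape                             # torch.Size([2, 3]); not square, since 2 != 3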
Transpose:
Flip the axes: rows become columns and vice versa. Formally, we signify a matrix $\mathbf{A}$’s transpose by $\mathbf{A}^\top$; if $\mathbf{B} = \mathbf{A}^\top$, then $b_{ij} = a_{ji}$ for all $i$ and $j$.
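In PyTorch the transpose of a matrix is available as .T; a minimal check of the definition above (A is an illustrative 2×3 matrix):
import torch

A = torch.arange(6).reshape(2, 3)
A.T                      # shape (3, 2): rows and columns swapped
A.T[1, 0] == A[0, 1]     # b_ij equals a_ji: tensor(True)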
Properties
$(\mathbf{A}^\top)^\top = \mathbf{A}$ and $(\mathbf{A} + \mathbf{B})^\top = \mathbf{A}^\top + \mathbf{B}^\top$; a square matrix with $\mathbf{A} = \mathbf{A}^\top$ is called symmetric.
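A small sketch verifying these identities numerically (A and B are illustrative):
import torch

A = torch.arange(6).reshape(2, 3)
B = torch.ones(2, 3, dtype=torch.long)
torch.equal((A.T).T, A)             # True: transposing twice recovers A
torch.equal((A + B).T, A.T + B.T)   # True: transpose distributes over addition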
Hadamard Product
The elementwise product of two matrices is called their Hadamard product (denoted $\odot$). We can spell out the entries of the Hadamard product of two matrices $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m \times n}$:
\[\mathbf{A} \odot \mathbf{B} = \begin{bmatrix} a_{11} b_{11} & a_{12} b_{12} & \dots & a_{1n} b_{1n} \\ a_{21} b_{21} & a_{22} b_{22} & \dots & a_{2n} b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} b_{m1} & a_{m2} b_{m2} & \dots & a_{mn} b_{mn} \end{bmatrix}.\]
A * B   # Hadamard (elementwise) product in Python
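A self-contained sketch with concrete values (A and B are illustrative; since B is a copy of A, the Hadamard product squares each entry):
import torch

A = torch.arange(6).reshape(2, 3)
B = A.clone()
A * B
# output
# tensor([[ 0,  1,  4],
#         [ 9, 16, 25]])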
Reduction
Sum
Sum of a vector's elements: $\sum_{i=1}^n x_i$.
Sum of the elements of an $m \times n$ matrix: $\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}$.
A.sum()            # sum over all elements of the tensor
A.sum(axis=0)      # sum along axis 0: the row axis is reduced, one entry per column
A.sum(axis=[0, 1]) # reduce over both axes; same as A.sum()
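A concrete sketch, defining the matrix A that the following snippets assume:
import torch

A = torch.arange(6, dtype=torch.float32).reshape(2, 3)
# tensor([[0., 1., 2.],
#         [3., 4., 5.]])
A.sum()        # tensor(15.)
A.sum(axis=0)  # tensor([3., 5., 7.])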
Mean
A.mean(), A.sum() / A.numel()               # two equivalent ways; with A as above: tensor(2.5000)
A.mean(axis=0), A.sum(axis=0) / A.shape[0]  # mean along axis 0: tensor([1.5000, 2.5000, 3.5000])
Non-Reduction Sum
Sometimes it is useful to keep the number of axes unchanged when computing the sum: pass keepdims=True so the reduced axis is retained with size $1$.
sum_A = A.sum(axis=1, keepdims=True)
sum_A, sum_A.shape
# output
# (tensor([[ 3.],
#          [12.]]),
#  torch.Size([2, 1]))
A.sum(axis=1)
# output
# tensor([ 3., 12.])
For instance, since sum_A keeps its two axes after summing each row, we can divide A by sum_A with broadcasting to create a matrix where each row sums to $1$.
A / sum_A
# output
# tensor([[0.0000, 0.3333, 0.6667],
# [0.2500, 0.3333, 0.4167]])
Dot Products
Given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, their dot product $\mathbf{x}^\top \mathbf{y}$ (also known as inner product, $\langle \mathbf{x}, \mathbf{y} \rangle$) is a sum over the products of the elements at the same position: $\mathbf{x}^\top \mathbf{y} = \sum_{i=1}^{d} x_i y_i$.
x = torch.arange(3, dtype=torch.float32)  # tensor([0., 1., 2.])
y = torch.ones(3, dtype=torch.float32)    # tensor([1., 1., 1.])
torch.sum(x * y)   # elementwise multiply, then sum: tensor(3.)
# or, equivalently
torch.dot(x, y)    # tensor(3.)
For example, given some set of values, denoted by a vector $\mathbf{x} \in \mathbb{R}^n$, and a set of weights, denoted by $\mathbf{w} \in \mathbb{R}^n$, the weighted sum of the values in $\mathbf{x}$ according to the weights $\mathbf{w}$ can be expressed as the dot product $\mathbf{x}^\top \mathbf{w}$. When the weights are nonnegative and sum to $1$, i.e., $\sum_{i=1}^{n} w_i = 1$, the dot product expresses a weighted average.
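A small sketch of a weighted average via the dot product (the values and weights are illustrative):
import torch

x = torch.tensor([10.0, 20.0, 30.0])   # values
w = torch.tensor([0.2, 0.3, 0.5])      # nonnegative weights; w.sum() == 1
torch.dot(x, w)                        # weighted average: tensor(23.)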