Tutorial: Refresher on working with matrices#
In this tutorial and the next one we’ll walk through some math concepts that will be helpful for understanding the lectures and homeworks. If you’re comfortable with these concepts already, feel free to simply skim and reference as needed to do the homeworks.
Goals:#
Get everyone on the same page with vectors, sums and products, and matrices
Images courtesy of Wikipedia.
Vectors#
A vector is a series of numbers, arranged in either a column or a row. The numbers stored in a vector are called elements.
data_point <- 3 #this is one element
print(data_point)
vector_1 <- c(data_point, data_point + 2)
vector_2 <- c(data_point + 4, data_point + 6)
print(vector_1)
print(vector_2)
[1] 3
[1] 3 5
[1] 7 9
A data point: \(3\)
A column vector with elements 3 and 5: \(\begin{bmatrix}3\\5\end{bmatrix}\)
A row vector with elements 7 and 9: \(\begin{bmatrix} 7 & 9\end{bmatrix}\)
When vectors are referred to in equations, they’re often simplified to a single variable even though they contain many elements. For example, the error formula is written as \(e_i = y_i - \hat{y}_i\), where \(i\) identifies the particular observation \(y_i\) for which you’d like to calculate error. \(\hat{y}_i\) (“y hat i”) is the model’s predicted value for \(y_i\) (notation note: in this class, the \(\hat{}\) symbol over any variable means that variable’s predicted value). But the formula could also be written in bold as \(\bf{e} = \bf{y} - \bf{\hat{y}}\), to define the error vector for all observations at once instead of one particular observation. When there are only three observations, this formula translates to
\(\begin{bmatrix}e_1\\e_2\\e_3\end{bmatrix} = \begin{bmatrix}y_1\\y_2\\y_3\end{bmatrix} - \begin{bmatrix}\hat{y}_1\\\hat{y}_2\\\hat{y}_3\end{bmatrix}\)
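In R, this element-wise subtraction is exactly what the `-` operator does on vectors. Here’s a quick sketch with made-up numbers:

y <- c(2, 4, 6) #observed values (made up for illustration)
y_hat <- c(2.5, 3.5, 6) #the model's predictions (also made up)
e <- y - y_hat #the error vector: each element is y_i minus y-hat_i
print(e) #[1] -0.5  0.5  0.0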
My last note about vectors is more of a notation note: the mean value of all elements in a vector will be expressed with a \(\bar{}\) symbol. So the mean value of the vector \(\bf{x}\) is \(\bar{x}\) (“x bar”).
Working with Sums and Products in Equations#
While we’re talking about vectors and indexing, let’s address a few tricks for dealing with summation (\(\Sigma\)) and product (\(\Pi\)) notation in equations. This will be handy for understanding the math shown in lecture and for completing the homeworks.
First thing is to understand the structure of summation notation:
\(\sum_{i=1}^{n}x_i\)
In plain language, you can read this as: “Take the sum of all values from \(x_1\) to \(x_n\).”
So if you have the vector \(x = \begin{bmatrix}x_1\\x_2\\x_3\end{bmatrix}\), you can express the sum of all the elements in that vector as \(\sum_{i=1}^{3}x_i\).
It’s the same as typing `sum(x)` in R.
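For example:

x <- c(3, 5, 7)
sum(x) #returns 15, the same as 3 + 5 + 7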
You might also see nested sums occasionally:
\(\sum_{i=1}^{n} \sum_{j=1}^{m} x_{i,j}\)
This means taking the sum of \(x_{i,j}\) values for each value of \(i\), and then summing those sums. It’s the same as typing `sum(colSums(x))` in R, where `x` is an \(n\) by \(m\) matrix (more on matrices later).
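A quick check with a small matrix:

x <- matrix(1:6, nrow = 2) #a small 2 by 3 matrix
sum(colSums(x)) #sum within each column, then add those sums: 21
sum(x) #the same total, computed directly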
Product notation is similar, except instead of using a \(\Sigma\) you’re using a \(\Pi\), and you’re now multiplying the elements together:
\(\prod_{i=1}^{3}x_i = x_1 \times x_2 \times x_3\)
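The R analogue of \(\Pi\) is the prod() function:

x <- c(3, 5, 7)
prod(x) #returns 105, the same as 3 * 5 * 7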
Below I’ll describe three tricks for working with summation notation that may be handy when you’re doing the homework.
Trick 1: Pulling terms out of a sum#
The first trick that I’ll cover is knowing when and how to pull terms out of a sum. If the expression inside of the sum includes terms that don’t involve the variable being iterated over, i.e., terms that don’t include \(x_i\), you can take those out of the sum expression and just multiply them by the number of times they would’ve been added in. Here’s an example, trying to simplify the expression of the function \(g\) of vector \(x\):
\(g(x) = \sum_{i=1}^{3}(3 - x_i)^2\)
I can expand this using the “FOIL” method:
\(g(x) = \sum_{i=1}^{3}(3 - x_i)(3 - x_i) = \sum_{i=1}^{3}(9 - 6x_i + {x_i}^2)\)
9 doesn’t have an \(x_i\) in it, so I can just take that term out. However, be careful! As the expression is written right now, 9 would be added once for every iteration over \(i\), so when I take it out of the sum I need to multiply it by the number of iterations. In this case, the number of iterations is 3.
\(g(x) = 27 + \sum_{i=1}^{3}(- 6x_i + {x_i}^2)\)
Ta-da!
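Here’s a quick numeric check in R that pulling the constant out didn’t change anything (the test values are arbitrary):

x <- c(1, 2, 3) #any values will do
sum((3 - x)^2) #the original g(x): 5
27 + sum(-6 * x + x^2) #with the constant pulled out: also 5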
Trick 2: Splitting up a sum#
Now we can actually simplify the expression above a bit further. Any terms that are separated by a \(+\) or a \(-\) can be split into separate sum terms.
\(g(x) = 27 + \sum_{i=1}^{3}- 6x_i + \sum_{i=1}^{3}{x_i}^2\)
This makes sense if you think about the commutative property of addition: if you’re adding things together, it doesn’t matter what order you add them in. \(-6x_1 + x_{1}^2 - 6x_2 + x_{2}^2 - 6x_3 + x_{3}^2\) is the same as \( -6x_1-6x_2-6x_3 + x_{1}^2 + x_{2}^2 + x_{3}^2\)
Trick 3: Pulling coefficients out of a sum#
Lastly, since \(-6(x_1 + x_2 + x_3)\) is the same as \(-6x_1 - 6x_2 - 6x_3\), we can simplify the \(\sum_{i=1}^{3}- 6x_i\) by bringing the \(-6\) outside:
\(g(x) = 27 - 6\sum_{i=1}^{3}x_i + \sum_{i=1}^{3}{x_i}^2\)
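And a final check in R that all three tricks together preserved the value of \(g(x)\) (again with arbitrary test values):

x <- c(1, 2, 3)
sum((3 - x)^2) #the original g(x): 5
27 - 6 * sum(x) + sum(x^2) #the fully simplified form: also 5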
This final form is easier to work with than what we started with. OK, now that we’re comfortable thinking about vectors, indices, and sums, let’s talk about matrices!
What is a matrix?#
A matrix is a rectangular array of numbers arranged in rows and columns. Matrices are described by the number of rows (\(m\)) and the number of columns (\(n\)) they have; a matrix with \(m\) rows and \(n\) columns is an \(m \times n\) or “\(m\) by \(n\)” matrix. The image below demonstrates such a matrix, and the element labels show you how to index a particular element in the matrix: \(a_{i,j}\) refers to the element in the \(i\)th row and the \(j\)th column.
You can create a matrix in R by calling the function `matrix()` and supplying it with data, the number of rows and/or columns you would like, and a method for filling the matrix (by row or by column).
data <- 1:10
nrow <- 2
ncol <- 5
##First look at the matrix function by typing ?matrix
sample_matrix_by_col <- matrix(data, nrow, ncol, byrow = FALSE) #fill matrix by column (the default), not by row
sample_matrix_by_row <- matrix(data, nrow, ncol, byrow = TRUE) #fill matrix by row
#note the difference in the structure of the matrix when it's filled by row vs. by column
head(sample_matrix_by_col)
head(sample_matrix_by_row)
1 | 3 | 5 | 7 | 9 |
2 | 4 | 6 | 8 | 10 |
1 | 2 | 3 | 4 | 5 |
6 | 7 | 8 | 9 | 10 |
Only one dimension (`nrow` or `ncol`) is technically needed when specifying a matrix in R; R can infer `nrow` if `ncol` is specified, for example.
However, to avoid errors, it’s best to be explicit to the extent that it makes your code easier to read and error-check.
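For example, R infers the missing dimension from the length of the data:

inferred <- matrix(1:10, nrow = 2) #ncol is inferred to be 5
dim(inferred) #returns 2 5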
Basic kinds of matrices#
These types of matrices will come up at different points in the class (and in life?). Just trust us that they’ll be useful :)
Square#
A square matrix is a matrix that has an equal number of rows and columns (\(m = n\)). So a matrix with 4 columns and 4 rows is square, a matrix with 10 rows and 10 columns is square, and so on.
square_matrix_4 <- matrix(1:16, nrow = 4)
square_matrix_4
square_matrix_10 <- matrix(1:100, nrow=10)
square_matrix_10
1 | 5 | 9 | 13 |
2 | 6 | 10 | 14 |
3 | 7 | 11 | 15 |
4 | 8 | 12 | 16 |
1 | 11 | 21 | 31 | 41 | 51 | 61 | 71 | 81 | 91 |
2 | 12 | 22 | 32 | 42 | 52 | 62 | 72 | 82 | 92 |
3 | 13 | 23 | 33 | 43 | 53 | 63 | 73 | 83 | 93 |
4 | 14 | 24 | 34 | 44 | 54 | 64 | 74 | 84 | 94 |
5 | 15 | 25 | 35 | 45 | 55 | 65 | 75 | 85 | 95 |
6 | 16 | 26 | 36 | 46 | 56 | 66 | 76 | 86 | 96 |
7 | 17 | 27 | 37 | 47 | 57 | 67 | 77 | 87 | 97 |
8 | 18 | 28 | 38 | 48 | 58 | 68 | 78 | 88 | 98 |
9 | 19 | 29 | 39 | 49 | 59 | 69 | 79 | 89 | 99 |
10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
Symmetric#
A square matrix is symmetric if its elements are mirrored across the main diagonal. You can imagine that if you folded a square matrix in half along the diagonal, the reflected elements would touch.
You can also think of a symmetric matrix as a matrix equal to its transposed form: \(A = A^T\)
symmetric_matrix <- matrix(1:16,nrow = 4)
#replace lower triangular portion with lower triangular portion of transpose
symmetric_matrix[lower.tri(symmetric_matrix)] <- t(symmetric_matrix)[lower.tri(symmetric_matrix)]
symmetric_matrix #an example of a symmetric matrix
#a transposed symmetric matrix is equal to the original matrix
t(symmetric_matrix) == symmetric_matrix
1 | 5 | 9 | 13 |
5 | 6 | 10 | 14 |
9 | 10 | 11 | 15 |
13 | 14 | 15 | 16 |
TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE |
TRUE | TRUE | TRUE | TRUE |
Identity#
An identity matrix is a special case of a square, symmetric matrix. Specifically, it has ones on the diagonal line and zeros everywhere else.
Why is this important? As you’ll see later, the identity matrix acts like the number 1 in the real number system, but for matrix operations.
mat <- matrix(0, 5, 5)
diag(mat) <- 1
mat
1 | 0 | 0 | 0 | 0 |
0 | 1 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 0 | 1 |
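As a shortcut, calling diag() with a single number builds an identity matrix in one step:

identity_5 <- diag(5) #a 5 x 5 identity matrix in one call
identity_5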
Summary#
A square matrix has an equal number of rows and columns.
Symmetric matrices are a subset of square matrices.
The identity matrix is a special case of a square, symmetric matrix, with useful properties that you’ll need later.
Matrix manipulation#
Basic indexing#
Review: We talked about indexing a little bit in the tutorial on basic R. The same principle applies to matrices.
matrix[row_index, column_index]
Note that R uses one-based indexing (some coding languages start indexing at 0 instead of 1, but this isn’t the case in R). For example, the first command below finds the element in the 1st row and the 5th column of a matrix.
Using our prior matrix as an example:
sample_matrix_by_col
1 | 3 | 5 | 7 | 9 |
2 | 4 | 6 | 8 | 10 |
sample_matrix_by_col[1, 5] #element in the 1st row and 5th column
sample_matrix_by_col[ ,2] #accessing all rows of column 2
sample_matrix_by_col[1, ] #accessing all columns of row 1
sample_matrix_by_col[,1:3] #getting all rows of columns 1-3
9
- 3
- 4
- 1
- 3
- 5
- 7
- 9
1 | 3 | 5 |
2 | 4 | 6 |
Basic matrix operations#
Now that we know what a matrix is, how to create one in R, and how to access its elements, let’s review how to manipulate matrices.
mat_1 <- matrix(1:6, nrow = 2) #initialize two matrices to manipulate
mat_2 <- matrix(7:12, nrow = 2)
print(mat_1)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
print(mat_2)
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
Addition and subtraction#
Addition and subtraction are element-wise operations in the world of matrices. That is, when two matrices are added, each element of the first matrix is added to the corresponding element of the second matrix. Importantly, matrices may only be added or subtracted when they have the same dimensions.
Note: while R will raise a “non-conformable arrays” error if you try to add two matrices of different dimensions, it will silently recycle a plain vector added to a matrix, so watch out for this.
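For example, adding a shorter vector to a matrix recycles the vector’s elements with no warning:

mat_1 <- matrix(1:6, nrow = 2)
mat_1 + c(10, 20) #c(10, 20) is recycled down the columns; R gives no warning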
#addition
print(mat_1 + mat_2) #each element of mat_1 is added to the corresponding element of mat_2
[,1] [,2] [,3]
[1,] 8 12 16
[2,] 10 14 18
If you add a constant to a matrix, then that constant is added to all elements, and the rules are the same for subtraction, multiplication, and division by a constant.
print(mat_1 + 3) #each element increases by 3
[,1] [,2] [,3]
[1,] 4 6 8
[2,] 5 7 9
#subtraction
print(mat_2 - mat_1) #each element of mat_1 is subtracted from the corresponding element of mat_2
[,1] [,2] [,3]
[1,] 6 6 6
[2,] 6 6 6
print(mat_2 - 3)
[,1] [,2] [,3]
[1,] 4 6 8
[2,] 5 7 9
Transposition#
When a matrix is transposed, the columns become the rows and the rows become the columns. R uses the `t()` function to transpose matrices. In equations, a transposed matrix is denoted \(A'\) (or \(A^T\), as above).
As an example, look at the original matrix, `mat_2`:
print(mat_2)
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
The transposition switches rows and columns:
#transposition
transposed_mat_2 <- t(mat_2)
print(transposed_mat_2)
[,1] [,2]
[1,] 7 8
[2,] 9 10
[3,] 11 12
And if we transpose the transposed matrix, we return it to its original form:
print(t(transposed_mat_2))
[,1] [,2] [,3]
[1,] 7 9 11
[2,] 8 10 12
Multiplication#
As mentioned before, multiplying a matrix by a constant multiplies each element of a matrix by that number.
print(mat_1 * 3)
[,1] [,2] [,3]
[1,] 3 9 15
[2,] 6 12 18
You can also do “element-wise” multiplication in R using the typical `*` operator. The product will be a matrix where the corresponding elements of the factor matrices are multiplied together.
#element-wise multiplication
print(mat_1*mat_2)
[,1] [,2] [,3]
[1,] 7 27 55
[2,] 16 40 72
Matrix multiplication. Element-wise multiplication isn’t the typical way to multiply matrices. Normal matrix multiplication, for two \(2 \times 2\) matrices, is shown below.
\(\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} e & f \\ g & h \end{bmatrix} = \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}\)
#matrix multiplication
print(mat_1 %*% t(mat_2))
[,1] [,2]
[1,] 89 98
[2,] 116 128
Note that order matters in matrix multiplication, and that the number of columns of the first matrix must equal the number of rows of the second matrix. Matrix multiplication in R uses the `%*%` operator.
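Here’s a quick demonstration, with two small square matrices, that the order of multiplication changes the result:

A <- matrix(1:4, nrow = 2)
B <- matrix(5:8, nrow = 2)
print(A %*% B) #gives one result...
print(B %*% A) #...and reversing the order gives a different one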
Division#
Matrices can be divided element-wise.
#division
print(mat_1 / mat_2)
[,1] [,2] [,3]
[1,] 0.1428571 0.3333333 0.4545455
[2,] 0.2500000 0.4000000 0.5000000
And you can divide a matrix by a constant: each element will be divided by that constant.
print(mat_1 / 2)
[,1] [,2] [,3]
[1,] 0.5 1.5 2.5
[2,] 1.0 2.0 3.0
Inversion#
The inverse of a matrix can be thought of as the matrix version of taking the reciprocal of a real number. In equations, the inverse of the matrix A is denoted \(\bf{A^{-1}}\). Usually, inversion can only be performed on square matrices. The formula for the inverse of a \(2 \times 2\) matrix is:
\(\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}\)
The method of calculating inverses for larger matrices is more complicated, and most statistical software (including R’s solve() function, used below) will handle it for you.
#taking the inverse (solve() requires a square matrix, so we take a 2 x 2 slice of mat_2)
print(solve(mat_2[, 2:3]))
print(mat_2[, 2:3] %*% solve(mat_2[, 2:3])) #a matrix times its inverse gives the identity matrix (up to rounding error)
[,1] [,2]
[1,] -6 5.5
[2,] 5 -4.5
[,1] [,2]
[1,] 1 1.776357e-15
[2,] 0 1.000000e+00
Determinant#
In the matrix inversion formula shown above, the term \(ad - bc\) is the determinant of the matrix, which can also be written as \(\det(A)\) or \(|A|\). For any \(2 \times 2\) matrix \(A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}\), the determinant is
\(\det(A) = ad - bc\)
Things get more complicated when the matrix is bigger, which we won’t discuss here.
If the determinant of a square matrix is 0, the matrix is a singular matrix and is not invertible. This is the main reason determinants will come up in this class: as a tool to determine whether a matrix is invertible.
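R’s det() function computes determinants, which makes checking invertibility easy:

invertible <- matrix(c(9, 10, 11, 12), nrow = 2) #the same 2 x 2 matrix we inverted above
det(invertible) #-2: nonzero, so an inverse exists
singular <- matrix(c(1, 2, 2, 4), nrow = 2) #the second column is twice the first
det(singular) #0: singular, so solve() would fail on this matrix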
Use matrix manipulation to solve a linear regression problem “by hand”#
First, a brief introduction to linear regression.
Linear regression usually takes the basic form of
\(y = X\beta + \epsilon\)
where \(y\) is a vector of the response variable, \(X\) is a matrix of predictors (including a column of 1s for the intercept), \(\beta\) is a vector of parameters that we want to estimate (the intercept and slope), and \(\epsilon\) is the error term. The error term represents irreducible error, or the variance in \(y\) unassociated with our predictors.
The shape of these variables will look like this:
We will talk more about this in lecture. But for now, know that we can solve the linear regression equation and obtain the best values for our coefficients \(\beta\) by solving the formula:
\(\hat{\beta} = (X'X)^{-1}X'y\)
\(X'\) is \(X\) transposed, and \((X'X)^{-1}\) is the inverse of the matrix \(X'X\). That’s why knowing how to transpose, invert, and calculate determinants will be useful!
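To make this concrete, here’s a minimal sketch that solves a simulated regression problem “by hand” using the formula above, then checks the answer against R’s built-in lm(). The data and the true coefficient values (intercept 2, slope 3) are made up for illustration:

set.seed(1) #make the simulated data reproducible
n <- 100
x <- runif(n) #one predictor
y <- 2 + 3 * x + rnorm(n, sd = 0.5) #response: true intercept 2, true slope 3, plus noise
X <- cbind(1, x) #design matrix: a column of 1s for the intercept
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y #the formula above, written with t(), %*%, and solve()
print(beta_hat) #estimates should be close to 2 and 3
print(coef(lm(y ~ x))) #lm() produces the same estimates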
Notebook by Krista Bond and Patience Stevens and edited by Charles Wu, Amy Sentis, and Fiona Horner.