# define a variable x, containing 12
Vectors
Introduction to vectors
Previously, we were just defining simple objects that hold a single value, such as x.
Let’s create a vector that contains the ages of 5 people:
# Use c() to define a variable age that contains the ages of 5 people at once
Let’s ask what the type of age is:
# identify the class of age
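One way to fill in the two steps above, as a minimal sketch. The five ages are the ones used later in this section.

```r
# combine five ages into a single vector with c()
age <- c(12, 18, 22, 21, 17)
age
# [1] 12 18 22 21 17

# identify the class of age
class(age)
# [1] "numeric"
```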
Let’s try to create a vector, multi_vec, that contains values of several different types
# try to create a vector, multi_vec, with numeric and character values
Let’s check the class of multi_vec
# check the class of multi_vec
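A possible version of these two steps is sketched below. The values placed in multi_vec are illustrative assumptions, not taken from the original lesson.

```r
# mix numeric and character values in one vector (illustrative values)
multi_vec <- c(1, "apple", 3)

# every entry is now stored as a character string: "1", "apple", "3"
multi_vec

# check the class of multi_vec
class(multi_vec)
# [1] "character"
```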
This is an example of “type conversion”: a vector can only hold one type, so R converts the numeric values to character strings.
Let’s try to combine numeric and logical values in a vector
# try to create a vector with numeric and logical values
In type conversion, there is a hierarchy of types: character > numeric > logical.
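The hierarchy can be checked with a small sketch like the one below. The vector name num_logical and its values are assumptions chosen for illustration.

```r
# numeric + logical: the logical values are converted to numbers
num_logical <- c(3, TRUE, FALSE, 10)
num_logical
# [1]  3  1  0 10
class(num_logical)
# [1] "numeric"

# adding a single character value converts everything to character
class(c(3, TRUE, "a"))
# [1] "character"
```

TRUE becomes 1 and FALSE becomes 0 when converted to numeric, which is what later makes sum() useful for counting.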
Exercise
What will be the output and type of the following code?
vector_example <- c(TRUE, 4, "hello", FALSE, 0)
vector_example
[1] "TRUE" "4" "hello" "FALSE" "0"
# check the class
class(vector_example)
[1] "character"
Working with vectors
age <- c(12, 18, 22, 21, 17)
The following is an example of a “vectorized” operation:
# subtract 1 from age
# define a new vector, age2, corresponding to age * 2
We can do computations with the two vectors together:
# subtract age from age2
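The three steps above might look like this. The operation is applied to every entry at once, which is what “vectorized” means here.

```r
age <- c(12, 18, 22, 21, 17)

# subtract 1 from every entry of age
age - 1
# [1] 11 17 21 20 16

# define a new vector, age2, corresponding to age * 2
age2 <- age * 2
age2
# [1] 24 36 44 42 34

# subtract age from age2, entry by entry
age2 - age
# [1] 12 18 22 21 17
```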
What happens when you try to do a computation with vectors of different lengths?
# try to subtract the vector c(1, 2) from age
When you do mathematical computations on vectors of different lengths, R uses recycling: the entries of the shorter vector are repeated until it matches the length of the longer one.
The above subtraction is equivalent to
age - c(1, 2, 1, 2, 1)
[1] 11 16 21 19 16
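To confirm the equivalence, a short sketch (the names short_result and long_result are only for illustration):

```r
age <- c(12, 18, 22, 21, 17)

# c(1, 2) is recycled to c(1, 2, 1, 2, 1); R also gives a warning here
# because the longer length (5) is not a multiple of the shorter length (2)
short_result <- age - c(1, 2)

# the explicit version gives the same values, with no warning
long_result <- age - c(1, 2, 1, 2, 1)

identical(short_result, long_result)
# [1] TRUE
```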
Vectorized logical operations
Let’s ask which entries in age are greater than or equal to 18
# ask which entries in age are greater than or equal to 18
Let’s ask which entries are equal to 17
# ask which entries in age are equal to 17
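Both questions can be asked with a comparison operator, which returns one TRUE/FALSE per entry. A minimal sketch:

```r
age <- c(12, 18, 22, 21, 17)

# which entries in age are greater than or equal to 18?
age >= 18
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# which entries in age are equal to 17?
age == 17
# [1] FALSE FALSE FALSE FALSE  TRUE
```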
Let’s ask which entries are equal to 17 or 18
# try to use == c(17, 18) to ask which entries are equal to 17 or 18
This is only working because we got lucky with our recycling.
This breaks if we switch the order of 18 and 17.
# switch the order of 17 and 18
We can instead use the %in% operator:
# use %in% to ask which entries in age are equal to 17 or 18
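The comparisons discussed above can be run side by side to see why == is fragile here and %in% is not. This is a sketch using the same age vector.

```r
age <- c(12, 18, 22, 21, 17)

# c(17, 18) is recycled to c(17, 18, 17, 18, 17) and compared position by
# position (with a recycling warning); the matches line up only by luck
age == c(17, 18)
# [1] FALSE  TRUE FALSE FALSE  TRUE

# c(18, 17) is recycled to c(18, 17, 18, 17, 18); now nothing lines up
age == c(18, 17)
# [1] FALSE FALSE FALSE FALSE FALSE

# %in% asks, for each entry of age, whether it appears anywhere in the set
age %in% c(17, 18)
# [1] FALSE  TRUE FALSE FALSE  TRUE
age %in% c(18, 17)
# [1] FALSE  TRUE FALSE FALSE  TRUE
```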
Functions for vectors
The sum() function can be used to add up all of the entries of a vector
# compute the sum of all entries in age
The mean() function computes the mean:
# compute the mean of all entries in age
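For the age vector, these two calls might look like:

```r
age <- c(12, 18, 22, 21, 17)

# compute the sum of all entries in age
sum(age)
# [1] 90

# compute the mean of all entries in age
mean(age)
# [1] 18
```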
Note that if your vector contains missing values (NA), the mean will be NA, so you will need to provide the na.rm = TRUE argument to ignore the missing values.
# compute the mean of a vector with missing values
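A small sketch of the difference. The vector age_missing is a hypothetical example, not part of the original lesson.

```r
# a vector with one missing value (hypothetical example)
age_missing <- c(12, NA, 22)

# without na.rm, the missing value makes the result NA
mean(age_missing)
# [1] NA

# with na.rm = TRUE, the NA is dropped before averaging: (12 + 22) / 2
mean(age_missing, na.rm = TRUE)
# [1] 17
```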
The function length() tells us how many entries the vector contains
# compute the length of the age vector
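For example, with the age vector:

```r
age <- c(12, 18, 22, 21, 17)

# compute the length of the age vector
length(age)
# [1] 5
```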
We can use the sum() function to tally up the output of logical operations
# compute the sum of the logical vector c(TRUE, FALSE, FALSE, TRUE)
# add up the number of age entries that are 17 or 18
# add up the number of age entries that are over 18
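Because TRUE is counted as 1 and FALSE as 0, summing a logical vector counts the TRUEs. A sketch of the three tallies above:

```r
age <- c(12, 18, 22, 21, 17)

# compute the sum of a logical vector: two TRUEs
sum(c(TRUE, FALSE, FALSE, TRUE))
# [1] 2

# number of age entries that are 17 or 18
sum(age %in% c(17, 18))
# [1] 2

# number of age entries that are over 18 (22 and 21)
sum(age > 18)
# [1] 2
```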
Exercise
What is the proportion of entries in the age vector that are greater than 18?
Indexing and subsetting vectors
We use the square bracket indexing syntax to extract entries from a vector
# extract the first entry from age
# extract the fourth entry from age
To extract the last entry in a vector:
# extract the final entry from age
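A sketch of these extractions. Using length(age) for the last position is one common approach; it keeps working if the vector changes length.

```r
age <- c(12, 18, 22, 21, 17)

# extract the first entry from age
age[1]
# [1] 12

# extract the fourth entry from age
age[4]
# [1] 21

# extract the final entry: its position equals the length of the vector
age[length(age)]
# [1] 17
```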
To remove an entry from a vector, you can use the - syntax (a negative index).
# remove the first entry from age
Has age changed?
# print age
# remove the fourth entry from age
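These steps might look like the following. Note what printing age shows after the first removal.

```r
age <- c(12, 18, 22, 21, 17)

# remove the first entry from age
age[-1]
# [1] 18 22 21 17

# print age: the original vector is unchanged, because the result
# of age[-1] was not assigned back to age
age
# [1] 12 18 22 21 17

# remove the fourth entry from age
age[-4]
# [1] 12 18 22 17
```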
Extracting/removing multiple entries from a vector
Extract the first and third entries from age
# extract entries 1 and 3 from age
Remove the first and third entries
# remove entries 1 and 3 from age
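Both operations take a vector of positions built with c(). A minimal sketch:

```r
age <- c(12, 18, 22, 21, 17)

# extract entries 1 and 3 from age
age[c(1, 3)]
# [1] 12 22

# remove entries 1 and 3 from age
age[-c(1, 3)]
# [1] 18 21 17
```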
Defining integer sequences
If we want to quickly define a sequence of integers, we can use the : syntax
# use c() to create a vector 1, 2, 3, 4
# use the : syntax to create a vector 1, 2, 3, 4
# use the : syntax to create a vector of integers from 5 to 25
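A sketch comparing the two ways of writing the same sequence:

```r
# use c() to create a vector 1, 2, 3, 4
c(1, 2, 3, 4)
# [1] 1 2 3 4

# use the : syntax to create the same values
1:4
# [1] 1 2 3 4

# the values match, although : produces integers and c(1, 2, 3, 4) numerics
all(c(1, 2, 3, 4) == 1:4)
# [1] TRUE
class(1:4)
# [1] "integer"

# integers from 5 to 25 (21 values in total)
5:25
length(5:25)
# [1] 21
```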
We can use these sequences to extract entries from vectors
# Use : to extract the first 4 entries from age
What if I want to remove sequences of entries?
# Use : to remove the first 4 entries from age
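A sketch of both steps. The parentheses in -(1:4) matter: without them, -1:4 is read as the sequence from -1 to 4, which mixes negative and positive positions and gives an error when used as an index.

```r
age <- c(12, 18, 22, 21, 17)

# use : to extract the first 4 entries from age
age[1:4]
# [1] 12 18 22 21

# use : to remove the first 4 entries from age
age[-(1:4)]
# [1] 17
```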
The seq() function
The seq() function allows us to create a sequence of integers with non-unit increments
# use seq() to create a sequence from 4 to 20 that increments by 2
# include argument names
# exclude the argument names
# use seq() to extract every second entry from age
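These calls might look like the sketch below. Starting “every second entry” at position 1 is an assumption; starting at 2 would be equally valid.

```r
age <- c(12, 18, 22, 21, 17)

# sequence from 4 to 20 in steps of 2, with argument names
seq(from = 4, to = 20, by = 2)
# [1]  4  6  8 10 12 14 16 18 20

# same call without argument names (arguments are matched by position)
seq(4, 20, 2)
# [1]  4  6  8 10 12 14 16 18 20

# extract every second entry from age: positions 1, 3, 5
age[seq(1, length(age), by = 2)]
# [1] 12 22 17
```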
Exercise
Create a vector that contains the values 5, 9, 2, 21, 34, 56, 2, -1, 5, 9
Extract the 4th and 9th entries
Extract the 1st and last entry
Extract every 3rd entry starting from the 2nd entry
Named vectors
You can provide names to the entries of a vector
# Add the names "Dean", "Xiao", "Sara", "Ravi", "Maya" to the age vector
You can extract/index entries from a vector using their names:
# extract Ravi's age using named indexing
# extract Maya and Ravi's ages using named indexing
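One way to do this is with names(), assigning one name per entry in order. A minimal sketch:

```r
age <- c(12, 18, 22, 21, 17)

# attach one name to each entry, in order
names(age) <- c("Dean", "Xiao", "Sara", "Ravi", "Maya")

# extract Ravi's age using named indexing (the value 21, labelled Ravi)
age["Ravi"]

# extract Maya and Ravi's ages (17 and 21, in the order requested)
age[c("Maya", "Ravi")]
```

The result keeps the names attached; unname() can be used if only the values are needed.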
Logical subsetting
We can use logical vectors to subset a vector
# use a logical vector of TRUE/FALSEs to extract the first, fourth, and fifth entries from age
To identify which entries in a vector satisfy a criterion, we can ask a logical question.
# reminder: identify which entries in age are at least 18
We can then use that logical expression to subset the vector
# use age >= 18 to extract all ages that are at least 18
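The steps in this section can be sketched as follows. Entries are kept wherever the logical vector is TRUE.

```r
age <- c(12, 18, 22, 21, 17)

# a hand-written logical vector keeps the first, fourth, and fifth entries
age[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
# [1] 12 21 17

# reminder: identify which entries in age are at least 18
age >= 18
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# use that logical expression directly inside the brackets
age[age >= 18]
# [1] 18 22 21
```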
Subsetting with multiple conditions
How can we combine multiple conditions when subsetting, e.g., >= 16 and < 20?
# Identify which entries in age are both >= 16 and < 20
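A sketch of this step, joining the two conditions with the & operator so that both must hold for an entry:

```r
age <- c(12, 18, 22, 21, 17)

# TRUE only where both conditions hold for the same entry
age >= 16 & age < 20
# [1] FALSE  TRUE FALSE FALSE  TRUE

# use the combined condition to subset
age[age >= 16 & age < 20]
# [1] 18 17
```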
The “AND” operator is “&”
# What will be the output of the following & operations:
TRUE & TRUE
[1] TRUE
TRUE & FALSE
[1] FALSE
The “OR” operator is “|”
# What will be the output of the following | operations:
TRUE | TRUE
[1] TRUE
TRUE | FALSE
[1] TRUE
# what do you think will happen when we ask if age <= 16 OR age > 20
# extract all of the ages that are either <= 16 or > 20
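A sketch of the OR version, where an entry is kept if at least one condition holds:

```r
age <- c(12, 18, 22, 21, 17)

# TRUE where either condition holds
age <= 16 | age > 20
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# extract all of the ages that are either <= 16 or > 20
age[age <= 16 | age > 20]
# [1] 12 22 21
```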
Exercise
vec <- c(4, 19, 2, 2, 3, 90, 55, 12)
Write some code for extracting the entries that are
less than 10
less than 25 but greater than 10
either less than 10 or equal to 55