Previously, we were just defining simple objects such as:
# define a variable x, containing 12
x <- 12
x
[1] 12
Let’s create a vector that contains the ages of 5 people:
# Use c() to define a variable age that contains the ages of 5 people at once
age <- c(12, 19, 22, 35, 18)
age
[1] 12 19 22 35 18
Let’s ask what age’s type is:
# identify the class of age
class(age)
[1] "numeric"
Let’s try to create a vector containing values of several different types:
# try to create a vector, multi_vec, with numeric and character values
multi_vec <- c(1, 9, "banana", 10, -1)
multi_vec
[1] "1"      "9"      "banana" "10"     "-1"
Let’s check the class of multi_vec:
# check the class of multi_vec
class(multi_vec)
[1] "character"
This is an example of “type conversion”: a vector can hold only one type of value, so R converted every entry to character.
Let’s try to combine numeric and logical values in a vector:
# try to create a vector with numeric and logical values
multi_vec2 <- c(1, 5, TRUE, FALSE, -9)
multi_vec2
[1]  1  5  1  0 -9
In type conversion, there is a hierarchy of types: character > numeric > logical. When types are mixed, every entry is converted to the highest type present, which is why TRUE and FALSE became 1 and 0 above.
What will be the output and type of the following code?
vector_example <- c(TRUE, 4, "hello", FALSE, 0)
vector_example
[1] "TRUE"  "4"     "hello" "FALSE" "0"
# check the class
class(vector_example)
[1] "character"
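The hierarchy above can be checked directly with class(). The sketch below is illustrative only: the variable names (num_logical, num_char, log_char, mixed, as_num) are made up for this example. It also shows as.numeric(), one way to convert a character vector back to numbers, where strings that are not numbers become NA.

```r
# Combine pairs of types and check which type wins
num_logical <- class(c(TRUE, 1))      # logical + numeric
num_char    <- class(c(1, "a"))       # numeric + character
log_char    <- class(c(TRUE, "a"))    # logical + character
c(num_logical, num_char, log_char)

# Explicit conversion back to numeric (hypothetical example vector).
# Strings that are not numbers, like "banana", become NA with a warning,
# which is suppressed here to keep the output clean.
mixed  <- c(1, 9, "banana", 10, -1)
as_num <- suppressWarnings(as.numeric(mixed))
as_num
```

In each pair, the result is the type that sits higher in the hierarchy.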
# redefine the age vector
age <- c(12, 18, 22, 21, 17)
The following is an example of a “vectorized” operation, which is applied to every entry of the vector at once:
# subtract 1 from age
age - 1
[1] 11 17 21 20 16
# define a new vector, age2, corresponding to age * 2
age2 <- age * 2
age2
[1] 24 36 44 42 34
We can do computations with the two vectors together:
# subtract age from age2
age2 - age
[1] 12 18 22 21 17
What happens when you try to do a computation with vectors of different lengths?
# try to subtract the vector c(1, 2) from age
age - c(1, 2)
Warning in age - c(1, 2): longer object length is not a multiple of shorter
object length
[1] 11 16 21 19 16
When you do mathematical computations on vectors of different lengths, R uses recycling: the entries of the shorter vector are repeated until it is as long as the longer vector.
The above subtraction is equivalent to:
age - c(1, 2, 1, 2, 1)
[1] 11 16 21 19 16
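To make the recycling step explicit, the sketch below builds the recycled vector by hand with rep_len() and checks that it gives the same values as the implicit version. The names recycled, result_explicit, and result_implicit are hypothetical and used only for illustration.

```r
age <- c(12, 18, 22, 21, 17)

# Repeat c(1, 2) until it has the same length as age: 1 2 1 2 1
recycled <- rep_len(c(1, 2), length(age))

# Subtract the hand-recycled vector
result_explicit <- age - recycled

# Let R recycle implicitly (the warning is suppressed for this comparison)
result_implicit <- suppressWarnings(age - c(1, 2))

# Both approaches give the same values
identical(result_explicit, result_implicit)
```

The warning appears because 5 is not a multiple of 2. When the longer length is an exact multiple of the shorter one, R recycles silently, so unintended recycling can go unnoticed.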
Let’s ask which age entries are greater than or equal to 18:
# ask which entries in age are greater than or equal to 18
age >= 18
[1] FALSE  TRUE  TRUE  TRUE FALSE
Let’s ask which entries are equal to 17:
# ask which entries in age are equal to 17
age == 17
[1] FALSE FALSE FALSE FALSE  TRUE
Let’s ask which entries are equal to 17 or 18:
# try to use == c(17, 18) to ask which entries are equal to 17 or 18
age == c(17, 18)
Warning in age == c(17, 18): longer object length is not a multiple of shorter
object length
[1] FALSE  TRUE FALSE FALSE  TRUE
This only works because we got lucky with the recycling: == compares age entry by entry against the recycled vector c(17, 18, 17, 18, 17), and the 18 and the 17 in age happen to line up with matching values.
This breaks if we switch the order of 18 and 17.
# switch the order of 17 and 18
age == c(18, 17)
Warning in age == c(18, 17): longer object length is not a multiple of shorter
object length
[1] FALSE FALSE FALSE FALSE FALSE
We can instead use the %in% operator:
# use %in% to ask which entries in age are equal to 17 or 18
age %in% c(17, 18)
[1] FALSE  TRUE FALSE FALSE  TRUE
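As a rough check of what %in% is doing here, the sketch below compares it with two separate == tests joined by the | (“or”) operator, which is covered further below. The names in_result, or_result, and in_reversed are hypothetical.

```r
age <- c(12, 18, 22, 21, 17)

# %in% checks each entry of age against the whole set of values
in_result <- age %in% c(17, 18)

# Equivalent formulation: one == test per value, joined with "or"
or_result <- (age == 17) | (age == 18)
identical(in_result, or_result)

# Unlike ==, the order of the values on the right does not matter
in_reversed <- age %in% c(18, 17)
identical(in_result, in_reversed)
```

Because %in% involves no recycling, it also produces no warning about object lengths.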
The sum() function can be used to add up all of the entries of a vector:
# compute the sum of all entries in age
sum(age)
[1] 90
The mean() function computes the mean:
# compute the mean of all entries in age
mean(age)
[1] 18
Note that if your vector contains missing values (NA), the mean will be NA. Provide the argument na.rm = TRUE to ignore the missing values.
# compute the mean of a vector with missing values
mean(c(1, 4, 2, 8, NA))
[1] NA
mean(c(1, 4, 2, 8, NA), na.rm = TRUE)
[1] 3.75
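The same na.rm argument also works with sum(). The sketch below is a small illustration with a hypothetical vector named scores; it uses is.na() to locate the missing values and checks that na.rm = TRUE matches dropping them by hand (using the ! “not” operator and logical subsetting, both assumptions beyond what has been introduced so far).

```r
# Hypothetical vector with one missing value
scores <- c(1, 4, 2, 8, NA)

# is.na() flags the missing entries; summing the flags counts them
n_missing <- sum(is.na(scores))

# na.rm = TRUE drops the NA before computing
total <- sum(scores, na.rm = TRUE)
avg   <- mean(scores, na.rm = TRUE)

# Same result as removing the missing entries manually
avg_manual <- mean(scores[!is.na(scores)])
c(n_missing, total, avg, avg_manual)
```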
The function length() tells us how many entries the vector contains:
# compute the length of the age vector
length(age)
[1] 5
We can use the sum() function to tally up the output of logical operations:
# compute the sum of the logical vector c(TRUE, FALSE, FALSE, TRUE)
sum(c(TRUE, FALSE, FALSE, TRUE))
[1] 2
# add up the number of age entries that are 17 or 18
sum(age %in% c(17, 18))
[1] 2
# add up the number of age entries that are greater than 18
sum(age > 18)
[1] 2
What is the proportion of entries in the age vector that are greater than 18?
Two possible answers:
sum(age > 18) / length(age)
[1] 0.4
mean(age > 18)
[1] 0.4
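Both answers agree because arithmetic on a logical vector treats TRUE as 1 and FALSE as 0, so the mean of a logical vector is the proportion of TRUE entries. A minimal sketch, with hypothetical names over_18, prop_sum, and prop_mean:

```r
age <- c(12, 18, 22, 21, 17)

# Logical vector: which entries are greater than 18
over_18 <- age > 18

# Converting to numeric shows the 0/1 values used in the arithmetic
as.numeric(over_18)

# Count of TRUE divided by the number of entries
prop_sum <- sum(over_18) / length(over_18)

# The mean of the 0/1 values is the same proportion
prop_mean <- mean(over_18)
c(prop_sum, prop_mean)
```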
We use the square bracket indexing syntax to extract entries from a vector:
# extract the first entry from age
age[1]
[1] 12
# extract the fourth entry from age
age[4]
[1] 21
To extract the last entry in a vector:
# extract the final entry from age
age[length(age)]
[1] 17
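As an alternative sketch, the tail() and head() functions return entries from the end or the start of a vector without computing the length yourself. The variable names below are hypothetical.

```r
age <- c(12, 18, 22, 21, 17)

# Last entry, two ways
last_by_index <- age[length(age)]
last_by_tail  <- tail(age, 1)

# Last two and first two entries
last_two  <- tail(age, 2)
first_two <- head(age, 2)

identical(last_by_index, last_by_tail)
```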
To remove an entry from a vector, you can use the - syntax.
# remove the first entry from age
age[-1]
[1] 18 22 21 17
Has age changed?
age
[1] 12 18 22 21 17
# remove the fourth entry from age
age[-4]
[1] 12 18 22 17
Extract the first and third entries from age:
# extract entries 1 and 3 from age
age[c(1, 3)]
[1] 12 22
Remove the first and third entries:
# remove entries 1 and 3 from age
age[-c(1, 3)]
[1] 18 21 17
If we want to quickly define a sequence of integers, we can use the : syntax:
# use c() to create a vector 1, 2, 3, 4
c(1, 2, 3, 4)
[1] 1 2 3 4
# use the : syntax to create a vector 1, 2, 3, 4
1:4
[1] 1 2 3 4
# use the : syntax to create a vector of integers from 5 to 25
5:25
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
We can use these sequences to extract entries from vectors:
# Use : to extract the first 4 entries from age
age[1:4]
[1] 12 18 22 21
What if I want to remove sequences of entries?
# Use : to remove the first 4 entries from age
age[-(1:4)]
[1] 17
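The parentheses in -(1:4) matter because the minus sign is applied before the : operator. The sketch below shows the difference; the names without_parens, with_parens, and err are hypothetical, and tryCatch() is used only to capture the error message instead of stopping the script.

```r
age <- c(12, 18, 22, 21, 17)

# Without parentheses, this is the sequence from -1 up to 4
without_parens <- -1:4
without_parens

# With parentheses, every entry of 1:4 is negated
with_parens <- -(1:4)
with_parens

# Indexing with a mix of negative and positive positions is an error;
# capture the message rather than halting
err <- tryCatch(age[-1:4], error = function(e) conditionMessage(e))
err
```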
The seq() function allows us to create a sequence of numbers with an increment other than 1:
# use seq() to create a sequence from 4 to 20 that increments by 2
# include argument names
seq(from = 4, to = 20, by = 2)
[1]  4  6  8 10 12 14 16 18 20
# exclude the argument names
seq(4, 20, 2)
[1]  4  6  8 10 12 14 16 18 20
# use seq() to extract every second entry from age
age[seq(1, 5, 2)]
[1] 12 22 17
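A few variations on seq() may be useful, sketched below with hypothetical variable names: using length(age) instead of a hard-coded 5 so the code still works if the vector changes length, the length.out argument for a fixed number of evenly spaced values, and seq_along() for the positions of a vector.

```r
age <- c(12, 18, 22, 21, 17)

# Every second position, without hard-coding the vector length
odd_positions <- seq(1, length(age), by = 2)
every_second  <- age[odd_positions]

# Non-integer increments are allowed
by_quarter <- seq(0, 1, by = 0.25)

# Ask for a number of evenly spaced values instead of an increment
five_values <- seq(0, 1, length.out = 5)

# Positions 1, 2, ..., length(age)
positions <- seq_along(age)
positions
```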
Create a vector that contains the values 5, 9, 2, 21, 34, 56, 2, -1, 5, 9
Extract the 4th and 9th entries
Extract the 1st and last entry
Extract every 3rd entry starting from the 2nd entry
my_vec <- c(5, 9, 2, 21, 34, 56, 2, -1, 5, 9)
my_vec[c(4, 9)]
[1] 21  5
my_vec[c(1, length(my_vec))]
[1] 5 9
my_vec[seq(2, length(my_vec), 3)]
[1]  9 34 -1
You can provide names to the entries of a vector:
# Add the names "Dean", "Xiao", "Sara", "Ravi", "Maya" to the age vector
names(age) <- c("Dean", "Xiao", "Sara", "Ravi", "Maya")
age
Dean Xiao Sara Ravi Maya
  12   18   22   21   17
You can extract/index entries from a vector using their names:
# extract Ravi's age using named indexing
age["Ravi"]
Ravi
  21
# extract Maya and Ravi's ages using named indexing
age[c("Maya", "Ravi")]
Maya Ravi
  17   21
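Names can also be supplied when the vector is created. The sketch below uses a hypothetical vector called ages_named to show this, along with two ways of getting values without their names: double brackets for a single entry and unname() for the whole vector.

```r
# Define the names and values in one step (hypothetical vector name)
ages_named <- c(Dean = 12, Xiao = 18, Sara = 22, Ravi = 21, Maya = 17)

# Retrieve the names
names(ages_named)

# Single brackets keep the name attached to the value
ravi <- ages_named["Ravi"]

# Double brackets return just the value
ravi_value <- ages_named[["Ravi"]]

# unname() strips the names from the whole vector
plain <- unname(ages_named)
plain
```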
We can use logical vectors to subset a vector:
# use a logical vector to extract the first, fourth, and fifth entries from age
age[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
Dean Ravi Maya
  12   21   17
To identify which entries in a vector satisfy a criterion, we can ask a logical question.
# reminder: identify which entries in age are at least 18
age >= 18
 Dean  Xiao  Sara  Ravi  Maya
FALSE  TRUE  TRUE  TRUE FALSE
We can use logical expressions to subset a vector too:
# use age >= 18 to extract all ages that are at least 18
age[age >= 18]
Xiao Sara Ravi
  18   22   21
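How can we combine multiple conditions when subsetting, e.g., >= 16 and < 20?
# Try to combine the two conditions with a comma (,) inside c()
age[c(age >= 16, age < 20)]
Xiao Sara Ravi Maya <NA> <NA> <NA>
  18   22   21   17   NA   NA   NA
This does not work: c() joins the two logical vectors into a single logical vector of length 10, so the positions beyond the end of age return NA.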
We need to use the & (“and”) operator, which is TRUE only where both conditions are TRUE:
# Identify which entries in age are both >= 16 and < 20
(age >= 16) & (age < 20)
 Dean  Xiao  Sara  Ravi  Maya
FALSE  TRUE FALSE FALSE  TRUE
# What will be the output of the following & operations:
TRUE & TRUE
[1] TRUE
TRUE & FALSE
[1] FALSE
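To finish the subsetting step, the combined condition can go inside the square brackets. The sketch below re-creates the named age vector so that it runs on its own; both and teens are hypothetical names.

```r
age <- c(12, 18, 22, 21, 17)
names(age) <- c("Dean", "Xiao", "Sara", "Ravi", "Maya")

# c() produces one logical value per condition per entry: 10 in total
length(c(age >= 16, age < 20))

# & combines the conditions entry by entry, giving one value per entry
both <- (age >= 16) & (age < 20)
length(both)

# Use the combined condition to subset
teens <- age[both]
teens
```

Here teens contains the ages for Xiao (18) and Maya (17).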
The “or” operator is |, which is TRUE where at least one of the conditions is TRUE:
# What will be the output of the following | operations:
TRUE | TRUE
[1] TRUE
TRUE | FALSE
[1] TRUE
# what do you think will happen when we ask if age <= 16 OR age > 20
age <= 16 | age > 20
 Dean  Xiao  Sara  Ravi  Maya
 TRUE FALSE  TRUE  TRUE FALSE
# extract all of the ages that are either <= 16 or > 20
age[age <= 16 | age > 20]
Dean Sara Ravi
  12   22   21
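Two related tools, sketched here as optional extras rather than part of the material above: which() returns the positions of the TRUE entries, and the ! (“not”) operator flips a logical vector so you can extract the complement. The names outside, positions, inside, and count are hypothetical, and the age vector is left unnamed to keep the output simple.

```r
age <- c(12, 18, 22, 21, 17)

# Entries that are <= 16 or > 20
outside <- age <= 16 | age > 20

# Positions of the TRUE entries
positions <- which(outside)
positions

# ! negates the condition: entries that are > 16 and <= 20
inside <- age[!outside]
inside

# Number of entries satisfying the condition
count <- sum(outside)
```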
vec <- c(4, 19, 2, 2, 3, 90, 55, 12)
Write some code for extracting the entries that are
a. less than 10
b. less than 25 but greater than 10
c. either less than 10 or equal to 55
# a
vec[vec < 10]
[1] 4 2 2 3
# b
vec[(vec < 25) & (vec > 10)]
[1] 19 12
# c
vec[(vec < 10) | (vec == 55)]
[1]  4  2  2  3 55