Previously, we were just defining simple objects such as
# define a variable x, containing 12
x <- 12
x
[1] 12
Let’s create a vector that contains the ages of 5 people:
# Use c() to define a variable age that contains the ages of 5 people at once
age <- c(12, 19, 22, 35, 18)
age
[1] 12 19 22 35 18
Let’s ask what the type of age is:
# identify the class of age
class(age)
[1] "numeric"
Let’s try to create a vector that contains values of several different types:
# try to create a vector, multi_vec, with numeric and character values
multi_vec <- c(1, 9, "banana", 10, -1)
multi_vec
[1] "1" "9" "banana" "10" "-1"
Let’s check the class of multi_vec
# check the class of multi_vec
class(multi_vec)
[1] "character"
This is an example of “type conversion”: a vector can hold only one type, so R converted the numeric values to character strings.
Let’s try to combine numeric and logical values in a vector
# try to create a vector with numeric and logical values
multi_vec2 <- c(1, 5, TRUE, FALSE, -9)
multi_vec2
[1] 1 5 1 0 -9
In type conversion, there is a hierarchy of types: character > numeric > logical.
What will be the output and type of the following code?
vector_example <- c(TRUE, 4, "hello", FALSE, 0)
vector_example
[1] "TRUE" "4" "hello" "FALSE" "0"
# check the class
class(vector_example)
[1] "character"
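The hierarchy can be checked one pair of types at a time. The following is a minimal sketch; the variable names (logical_numeric and so on) are made up for illustration and do not appear elsewhere in these notes.

```r
# each call to c() mixes two types; the result takes the "highest" type present
logical_numeric   <- c(TRUE, 2)    # logical + numeric
numeric_character <- c(2, "a")     # numeric + character
logical_character <- c(TRUE, "a")  # logical + character

class(logical_numeric)    # "numeric"
class(numeric_character)  # "character"
class(logical_character)  # "character"

# TRUE was converted to the number 1
logical_numeric           # 1 2
```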
age <- c(12, 18, 22, 21, 17)
The following is an example of a “vectorized” operation:
# subtract 1 from age
age - 1
[1] 11 17 21 20 16
# define a new vector, age2, corresponding to age * 2
age2 <- age * 2
age2
[1] 24 36 44 42 34
We can do computations with the two vectors together:
# subtract age from age2
age2 - age
[1] 12 18 22 21 17
What happens when you try to do a computation with vectors of different lengths?
# try to subtract the vector c(1, 2) from age
age - c(1, 2)
Warning in age - c(1, 2): longer object length is not a multiple of shorter
object length
[1] 11 16 21 19 16
When you do mathematical computations on vectors of different lengths, the computation uses recycling: the entries of the shorter vector are repeated until it is as long as the longer vector.
The above subtraction is equivalent to:
age - c(1, 2, 1, 2, 1)
[1] 11 16 21 19 16
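One way to see the recycled vector explicitly is to build it with rep(). This is only a sketch for illustration: the names recycled, manual, and automatic are hypothetical, and suppressWarnings() is used just to keep the length-mismatch warning out of the comparison.

```r
age <- c(12, 18, 22, 21, 17)

# repeat c(1, 2) until it has as many entries as age
recycled <- rep(c(1, 2), length.out = length(age))
recycled  # 1 2 1 2 1

# subtracting the explicitly recycled vector gives the same result
manual    <- age - recycled
automatic <- suppressWarnings(age - c(1, 2))
identical(manual, automatic)  # TRUE
```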
Let’s ask which entries of age are greater than or equal to 18:
# ask which entries in age are greater than or equal to 18
age >= 18
[1] FALSE TRUE TRUE TRUE FALSE
Let’s ask which entries are equal to 17:
# ask which entries in age are equal to 17
age == 17
[1] FALSE FALSE FALSE FALSE TRUE
Let’s ask which entries are equal to 17 or 18
# try to use == c(17, 18) to ask which entries are equal to 17 or 18
age == c(17, 18)
Warning in age == c(17, 18): longer object length is not a multiple of shorter
object length
[1] FALSE TRUE FALSE FALSE TRUE
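This only works because we got lucky with the recycling: age is compared entry by entry against c(17, 18, 17, 18, 17), and the 18 and 17 in age happen to sit in matching positions.
It breaks if we switch the order of 17 and 18.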
# switch the order of 17 and 18
age == c(18, 17)
Warning in age == c(18, 17): longer object length is not a multiple of shorter
object length
[1] FALSE FALSE FALSE FALSE FALSE
We can instead use the %in% operator:
# use %in% to ask which entries in age are equal to 17 or 18
age %in% c(17, 18)
[1] FALSE TRUE FALSE FALSE TRUE
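Unlike ==, %in% does not depend on the order of the values on its right-hand side, because each entry of age is checked against the whole set. A small sketch (the names in_17_18 and in_18_17 are made up for illustration):

```r
age <- c(12, 18, 22, 21, 17)

in_17_18 <- age %in% c(17, 18)
in_18_17 <- age %in% c(18, 17)

in_18_17                       # FALSE TRUE FALSE FALSE TRUE
identical(in_17_18, in_18_17)  # TRUE
```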
The sum() function can be used to add up all of the entries of a vector:
# compute the sum of all entries in age
sum(age)
[1] 90
The mean() function computes the mean:
# compute the mean of all entries in age
mean(age)
[1] 18
Note that if your vector contains missing values, mean() will return NA, so you will need to provide the na.rm = TRUE argument to ignore the missing values.
# compute the mean of a vector with missing values
mean(c(1, 4, 2, 8, NA))
[1] NA
mean(c(1, 4, 2, 8, NA), na.rm = TRUE)
[1] 3.75
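The same na.rm argument is accepted by sum(). The sketch below also uses is.na(), which is not covered in these notes, to count how many entries are missing; the name x is just a placeholder.

```r
x <- c(1, 4, 2, 8, NA)

sum(x)                # NA
sum(x, na.rm = TRUE)  # 15

# is.na() returns TRUE for each missing entry
is.na(x)              # FALSE FALSE FALSE FALSE TRUE
sum(is.na(x))         # 1
```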
The function length() tells us how many entries the vector contains:
# compute the length of the age vector
length(age)
[1] 5
We can use the sum() function to tally up the output of logical operations, because each TRUE counts as 1 and each FALSE counts as 0:
# compute the sum of the logical vector c(TRUE, FALSE, FALSE, TRUE)
sum(c(TRUE, FALSE, FALSE, TRUE))
[1] 2
# add up the number of age entries that are 17 or 18
sum(age %in% c(17, 18))
[1] 2
# add up the number of age entries that are over 18
sum(age > 18)
[1] 2
What is the proportion of entries in the age vector that are greater than 18?
Two possible answers:
sum(age > 18) / length(age)
[1] 0.4
mean(age > 18)
[1] 0.4
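The two answers agree because a logical vector is treated as 0s and 1s in arithmetic, so its mean is the fraction of TRUE entries. A short sketch that makes the conversion explicit (over_18 is a hypothetical name):

```r
age <- c(12, 18, 22, 21, 17)

over_18 <- age > 18
as.numeric(over_18)        # 0 0 1 1 0

# the mean of the 0/1 values is the proportion of TRUE entries
mean(as.numeric(over_18))  # 0.4
```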
We use the square bracket indexing syntax to extract entries from a vector
# extract the first entry from age
age[1]
[1] 12
# extract the fourth entry from age
age[4]
[1] 21
To extract the last entry in a vector:
# extract the final entry from age
age[length(age)]
[1] 17
To remove an entry from a vector, you can use the - syntax, i.e., a negative index:
# remove the first entry from age
age[-1]
[1] 18 22 21 17
Has age changed?
age
[1] 12 18 22 21 17
# remove the fourth entry from age
age[-4]
[1] 12 18 22 17
Extract the first and third entries from age
# extract entries 1 and 3 from age
age[c(1, 3)]
[1] 12 22
Remove the first and third entries
# remove entries 1 and 3 from age
age[-c(1, 3)]
[1] 18 21 17
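Because indexing returns a new vector and leaves age untouched, the result has to be assigned to a name if we want to keep it. A minimal sketch, where age_trimmed is a made-up name:

```r
age <- c(12, 18, 22, 21, 17)

# store the result of removing entries 1 and 3 in a new vector
age_trimmed <- age[-c(1, 3)]
age_trimmed   # 18 21 17

# the original vector still has all five entries
length(age)   # 5
```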
If we want to quickly define a sequence of integers, we can use the : syntax:
# use c() to create a vector 1, 2, 3, 4
c(1, 2, 3, 4)
[1] 1 2 3 4
# use the : syntax to create a vector 1, 2, 3, 4
1:4
[1] 1 2 3 4
# use the : syntax to create a vector of integers from 5 to 25
5:25
[1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
We can use these sequences to extract entries from vectors
# Use : to extract the first 4 entries from age
age[1:4]
[1] 12 18 22 21
What if I want to remove sequences of entries?
# Use : to remove the first 4 entries from age
age[-(1:4)]
[1] 17
The seq() function allows us to create a sequence of integers with non-unit increments:
# use seq() to create a sequence from 4 to 20 that increments by 2
# include argument names
seq(from = 4, to = 20, by = 2)
[1] 4 6 8 10 12 14 16 18 20
# exclude the argument names
seq(4, 20, 2)
[1] 4 6 8 10 12 14 16 18 20
# use seq() to extract every second entry from age
age[seq(1, 5, 2)]
[1] 12 22 17
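Both : and seq() can also count downwards, which gives one possible way to reverse a vector. This is a sketch rather than something covered in these notes; a negative by value is needed when from is larger than to.

```r
age <- c(12, 18, 22, 21, 17)

# a decreasing sequence needs a negative increment
seq(20, 4, by = -2)  # 20 18 16 14 12 10 8 6 4

# : counts down when the first number is larger
5:1                  # 5 4 3 2 1

# indexing with a decreasing sequence reverses the vector
age[5:1]             # 17 21 22 18 12
```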
Create a vector that contains the values 5, 9, 2, 21, 34, 56, 2, -1, 5, 9
Extract the 4th and 9th entries
Extract the 1st and last entry
Extract every 3rd entry starting from the 2nd entry
my_vec <- c(5, 9, 2, 21, 34, 56, 2, -1, 5, 9)
my_vec[c(4, 9)]
[1] 21 5
my_vec[c(1, length(my_vec))]
[1] 5 9
my_vec[seq(2, length(my_vec), 3)]
[1] 9 34 -1
You can provide names to the entries of a vector
# Add the names "Dean", "Xiao", "Sara", "Ravi", "Maya" to the age vector
names(age) <- c("Dean", "Xiao", "Sara", "Ravi", "Maya")
age
Dean Xiao Sara Ravi Maya
12 18 22 21 17
You can extract/index entries from a vector using their names:
# extract Ravi's age using named indexing
"Ravi"] age[
Ravi
21
# extract Maya and Ravi's ages using named indexing
age[c("Maya", "Ravi")]
Maya Ravi
17 21
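Names can also be supplied when the vector is created, using name = value pairs inside c(). The sketch below uses a separate, hypothetical vector age_named so that it does not interfere with age.

```r
# define the names and values in a single step
age_named <- c(Dean = 12, Xiao = 18, Sara = 22, Ravi = 21, Maya = 17)

names(age_named)           # "Dean" "Xiao" "Sara" "Ravi" "Maya"

# named indexing works the same way as before
age_named["Sara"]

# unname() drops the name and keeps only the value
unname(age_named["Sara"])  # 22
```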
We can use logical vectors to subset a vector
# use a logical vector to extract the first, fourth, and fifth entries from age
age[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
Dean Ravi Maya
12 21 17
To identify which entries in a vector satisfy a criterion, we can ask a logical question.
# reminder: identify which entries in age are at least 18
age >= 18
Dean Xiao Sara Ravi Maya
FALSE TRUE TRUE TRUE FALSE
We can also use logical expressions to subset a vector:
# use age >= 18 to extract all ages that are at least 18
age[age >= 18]
Xiao Sara Ravi
18 22 21
How can we combine multiple conditions when subsetting, e.g., >= 16 and < 20?
# Try to combine the two conditions with a comma (,)
age[c(age >= 16, age < 20)]
Xiao Sara Ravi Maya <NA> <NA> <NA>
18 22 21 17 NA NA NA
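The NA entries appear because c() glues the two logical vectors together into one vector of length 10, while age only has 5 entries; each TRUE in positions 6 to 10 asks for an entry that does not exist. A quick sketch of the length mismatch (combined is a hypothetical name):

```r
age <- c(12, 18, 22, 21, 17)

# c() concatenates the two conditions rather than combining them
combined <- c(age >= 16, age < 20)
combined          # FALSE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE TRUE

length(combined)  # 10
length(age)       # 5
```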
We need to use the & operator:
# Identify which entries in age are both >= 16 and < 20
(age >= 16) & (age < 20)
Dean Xiao Sara Ravi Maya
FALSE TRUE FALSE FALSE TRUE
# What will be the output of the following & operations:
TRUE & TRUE
[1] TRUE
TRUE & FALSE
[1] FALSE
The “or” operator is |:
# What will be the output of the following | operations:
TRUE | TRUE
[1] TRUE
TRUE | FALSE
[1] TRUE
# what do you think will happen when we ask if age <= 16 OR age > 20
age <= 16 | age > 20
Dean Xiao Sara Ravi Maya
TRUE FALSE TRUE TRUE FALSE
# extract all of the ages that are either <= 16 or > 20
age[age <= 16 | age > 20]
Dean Sara Ravi
12 22 21
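The parentheses around each condition are optional here, because R evaluates comparisons such as <= and > before & and |. They are often kept for readability. A small sketch comparing the two forms (without_parens and with_parens are made-up names):

```r
age <- c(12, 18, 22, 21, 17)

without_parens <- age <= 16 | age > 20
with_parens    <- (age <= 16) | (age > 20)

# both forms produce the same logical vector
identical(without_parens, with_parens)  # TRUE
age[with_parens]                        # 12 22 21
```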
vec <- c(4, 19, 2, 2, 3, 90, 55, 12)
Write some code for extracting the entries that are
less than 10
less than 25 but greater than 10
either less than 10 or equal to 55
# a
vec[vec < 10]
[1] 4 2 2 3
# b
vec[(vec < 25) & (vec > 10)]
[1] 19 12
# c
vec[(vec < 10) | (vec == 55)]
[1] 4 2 2 3 55