Vectors

Author

Rebecca Barter

Introduction to vectors

Previously, we were defining simple objects that contain a single value, such as:

# define a variable x, containing 12
x <- 12
x
[1] 12

Let’s create a vector that contains the ages of 5 people:

# Use c() to define a variable age that contains the ages of 5 people at once
age <- c(12, 19, 22, 35, 18)
age
[1] 12 19 22 35 18

Let’s ask what age’s type is:

# identify the class of age
class(age)
[1] "numeric"
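Vectors of the other basic types are created the same way with c(). Here is a short sketch; the variable names people and is_adult are made up for illustration:

```r
# a character vector: each entry is a text string in quotes
people <- c("Dean", "Xiao", "Sara")
class(people)
# [1] "character"

# a logical vector: each entry is TRUE or FALSE (no quotes)
is_adult <- c(FALSE, TRUE, TRUE)
class(is_adult)
# [1] "logical"
```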

Let’s try to create a vector that combines values of different types:

# try to create a vector, multi_vec, with numeric and character values
multi_vec <- c(1, 9, "banana", 10, -1)
multi_vec
[1] "1"      "9"      "banana" "10"     "-1"    

Let’s check the class of multi_vec:

# check the class of multi_vec
class(multi_vec)
[1] "character"

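This is an example of “type conversion” (also called coercion): a vector can only hold values of one type, so R converted the numbers into character strings.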

Let’s try to combine numeric and logical values in a vector:

# try to create a vector with numeric and logical values
multi_vec2 <- c(1, 5, TRUE, FALSE, -9)
multi_vec2
[1]  1  5  1  0 -9

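In type conversion, R converts every entry to the most flexible type present, following the hierarchy character > numeric > logical. In the example above, the logical values were converted to numbers: TRUE became 1 and FALSE became 0.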
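To see the hierarchy in action, the sketch below checks the class of the numeric/logical mix and then uses R's explicit conversion functions as.numeric(), as.logical(), and as.character() to convert between types yourself:

```r
# the numeric/logical mix is stored as numeric
multi_vec2 <- c(1, 5, TRUE, FALSE, -9)
class(multi_vec2)
# [1] "numeric"

# logical to numeric: TRUE becomes 1, FALSE becomes 0
as.numeric(c(TRUE, FALSE))
# [1] 1 0

# numeric to logical: 0 becomes FALSE, any other number becomes TRUE
as.logical(c(0, 1, 2))
# [1] FALSE  TRUE  TRUE

# numeric to character: each number becomes a text string
as.character(c(1, 9, 10))
# [1] "1"  "9"  "10"
```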

Exercise

What will be the output and type of the following code?

vector_example <- c(TRUE, 4, "hello", FALSE, 0)
vector_example
[1] "TRUE"  "4"     "hello" "FALSE" "0"    
# check the class
class(vector_example)
[1] "character"

Working with vectors

# redefine age with a new set of ages
age <- c(12, 18, 22, 21, 17)

Arithmetic in R is “vectorized”: an operation on a vector is applied to every entry at once. For example:

# subtract 1 from age
age - 1
[1] 11 17 21 20 16
# define a new vector, age2, corresponding to age * 2
age2 <- age * 2
age2
[1] 24 36 44 42 34
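To make “vectorized” concrete, here is a sketch of what age - 1 does, written out one entry at a time with a for loop. Loops are not covered in this section and are shown only for comparison; age_minus_one is an illustrative name:

```r
age <- c(12, 18, 22, 21, 17)

# start with a numeric vector of zeros, one slot per entry of age
age_minus_one <- numeric(length(age))

# fill it in entry by entry
for (i in seq_along(age)) {
  age_minus_one[i] <- age[i] - 1
}
age_minus_one
# [1] 11 17 21 20 16

# the vectorized version gives exactly the same result in one step
identical(age_minus_one, age - 1)
# [1] TRUE
```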

We can do computations with the two vectors together:

# subtract age from age2
age2 - age
[1] 12 18 22 21 17

What happens when you try to do a computation with vectors of different lengths?

# try to subtract the vector c(1, 2) from age
age - c(1, 2)
Warning in age - c(1, 2): longer object length is not a multiple of shorter
object length
[1] 11 16 21 19 16

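When you do arithmetic on vectors of different lengths, R “recycles” the shorter vector: it repeats the shorter vector's entries from the beginning until it is as long as the longer vector. If the longer length is not a multiple of the shorter length, R still does the computation but gives a warning.

The subtraction above is therefore equivalent to: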

age - c(1, 2, 1, 2, 1)
[1] 11 16 21 19 16
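One way to see the recycling explicitly is to build the recycled vector yourself with rep(), whose length.out argument sets how long the result should be. A short sketch (recycled is just an illustrative name):

```r
age <- c(12, 18, 22, 21, 17)

# repeat c(1, 2) until it has as many entries as age
recycled <- rep(c(1, 2), length.out = length(age))
recycled
# [1] 1 2 1 2 1

# subtracting the expanded vector gives the same answer, with no warning
age - recycled
# [1] 11 16 21 19 16

# when the longer length is a multiple of the shorter one, R recycles silently
c(10, 20, 30, 40) - c(1, 2)
# [1]  9 18 29 38
```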

Vectorized logical operations

Let’s ask which age entries are greater than or equal to 18:

# ask which entries in age are greater than or equal to 18
age >= 18
[1] FALSE  TRUE  TRUE  TRUE FALSE

Let’s ask which entries are equal to 17:

# ask which entries in age are equal to 17
age == 17
[1] FALSE FALSE FALSE FALSE  TRUE

Let’s ask which entries are equal to 17 or 18:

# try to use == c(17, 18) to ask which entries are equal to 17 or 18
age == c(17, 18)
Warning in age == c(17, 18): longer object length is not a multiple of shorter
object length
[1] FALSE  TRUE FALSE FALSE  TRUE

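This only appears to work because we got lucky with the recycling: c(17, 18) is recycled to c(17, 18, 17, 18, 17), so each entry of age is compared with only one of the two values, and the 18 and 17 in age happened to line up with a matching value.

It breaks if we switch the order of 17 and 18: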

# switch the order of 17 and 18
age == c(18, 17)
Warning in age == c(18, 17): longer object length is not a multiple of shorter
object length
[1] FALSE FALSE FALSE FALSE FALSE

Instead, we can use the %in% operator, which asks whether each entry of age appears anywhere in the vector on the right:

# use %in% to ask which entries in age are equal to 17 or 18
age %in% c(17, 18) 
[1] FALSE  TRUE FALSE FALSE  TRUE
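Unlike ==, %in% does not depend on the order of the values on the right, because it does not recycle. It is not symmetric, though: the result always has one entry per element of the left-hand vector. A small sketch:

```r
age <- c(12, 18, 22, 21, 17)

# switching the order of 17 and 18 gives the same answer
age %in% c(18, 17)
# [1] FALSE  TRUE FALSE FALSE  TRUE

# swapping the sides asks a different question:
# does each of 17, 18, 30 appear somewhere in age?
c(17, 18, 30) %in% age
# [1]  TRUE  TRUE FALSE
```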

Functions for vectors

The sum() function adds up all of the entries of a vector:

# compute the sum of all entries in age
sum(age)
[1] 90

The mean() function computes the mean:

# compute the mean of all entries in age
mean(age)
[1] 18

Note that if your vector contains missing values (NA), mean() will return NA. To ignore the missing values, provide the argument na.rm = TRUE:

# compute the mean of a vector with missing values
mean(c(1, 4, 2, 8, NA))
[1] NA
mean(c(1, 4, 2, 8, NA), na.rm = TRUE)
[1] 3.75
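sum() behaves the same way and accepts the same na.rm argument. The is.na() function is a handy way to see which entries are missing; in this sketch, x is just an illustrative name:

```r
x <- c(1, 4, 2, 8, NA)

# without na.rm, a single NA makes the whole sum NA
sum(x)
# [1] NA

# with na.rm = TRUE, the NA is dropped before adding
sum(x, na.rm = TRUE)
# [1] 15

# is.na() returns TRUE for each missing entry
is.na(x)
# [1] FALSE FALSE FALSE FALSE  TRUE
```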

The length() function tells us how many entries the vector contains:

# compute the length of the age vector
length(age)
[1] 5

Because R treats TRUE as 1 and FALSE as 0 in arithmetic, we can use sum() to count how many entries of a logical vector are TRUE:

# compute the sum of the logical vector c(TRUE, FALSE, FALSE, TRUE)
sum(c(TRUE, FALSE, FALSE, TRUE))
[1] 2
# add up the number of age entries that are 17 or 18
sum(age %in% c(17, 18))
[1] 2
# add up the number of age entries that are over 18
sum(age > 18)
[1] 2
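To see exactly what sum() is adding up, you can convert the logical vector to numbers yourself with as.numeric():

```r
age <- c(12, 18, 22, 21, 17)

# each TRUE becomes 1 and each FALSE becomes 0
as.numeric(age > 18)
# [1] 0 0 1 1 0

# adding up the 1s counts the TRUE entries
sum(as.numeric(age > 18))
# [1] 2
```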

Exercise

What is the proportion of entries in the age vector that are greater than 18?

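Two possible answers (the second works because the mean of a logical vector, with TRUE treated as 1 and FALSE as 0, is the proportion of TRUE entries):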

sum(age > 18) / length(age)
[1] 0.4
mean(age > 18)
[1] 0.4

Indexing and subsetting vectors

We use square bracket indexing to extract entries from a vector. R starts counting from 1, so age[1] is the first entry:

# extract the first entry from age
age[1]
[1] 12
# extract the fourth entry from age
age[4]
[1] 21

To extract the last entry in a vector:

# extract the final entry from age
age[length(age)]
[1] 17
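The same idea gives other positions near the end of a vector. As an alternative, the tail() function (from the utils package that R attaches by default) returns the last n entries. A short sketch:

```r
age <- c(12, 18, 22, 21, 17)

# second-to-last entry: subtract 1 from the length
age[length(age) - 1]
# [1] 21

# tail(age, 1) returns the last entry
tail(age, 1)
# [1] 17
```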

To remove an entry from a vector, put a minus sign (-) in front of its index:

# remove the first entry from age
age[-1]
[1] 18 22 21 17

Has age changed?

age
[1] 12 18 22 21 17
# remove the fourth entry from age
age[-4]
[1] 12 18 22 17
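As the output shows, age has not changed: age[-1] returns a new vector and leaves age itself alone. To keep the shortened vector, assign it to a name. The sketch below uses the illustrative name age_trimmed; writing age <- age[-1] instead would overwrite age.

```r
age <- c(12, 18, 22, 21, 17)

# save the result of removing the first entry
age_trimmed <- age[-1]
age_trimmed
# [1] 18 22 21 17

# the original vector still has all 5 entries
length(age)
# [1] 5
```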

Extracting/removing multiple entries from a vector

To extract the first and third entries from age, provide a vector of positions:

# extract entries 1 and 3 from age
age[c(1, 3)]
[1] 12 22

To remove the first and third entries, put a minus sign in front of that vector:

# remove entries 1 and 3 from age
age[-c(1, 3)]
[1] 18 21 17
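When extracting multiple entries, the result follows the order of the positions you provide, and a position listed twice is returned twice. A quick sketch:

```r
age <- c(12, 18, 22, 21, 17)

# entries come back in the order the positions are listed
age[c(3, 1)]
# [1] 22 12

# repeating a position repeats the entry
age[c(1, 1, 2)]
# [1] 12 12 18
```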

Defining integer sequences

If we want to quickly define a sequence of consecutive integers, we can use the colon (:) syntax:

# use c() to create a vector 1, 2, 3, 4
c(1, 2, 3, 4)
[1] 1 2 3 4
# use the : syntax to create a vector 1, 2, 3, 4
1:4
[1] 1 2 3 4
# use the : syntax to create a vector of integers from 5 to 25
5:25
 [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
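Two more details about the : syntax: it counts down when the first number is larger, and it produces values of class "integer", whereas typing the same numbers into c() produces "numeric" values. The two print identically:

```r
# : counts down when the first number is larger than the second
4:1
# [1] 4 3 2 1

# : produces integers
class(1:4)
# [1] "integer"

# c() with plain numbers produces numeric (double) values
class(c(1, 2, 3, 4))
# [1] "numeric"
```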

We can use these sequences to extract entries from vectors:

# Use : to extract the first 4 entries from age
age[1:4]
[1] 12 18 22 21

What if we want to remove a sequence of entries? Wrap the sequence in parentheses and put a minus sign in front:

# Use : to remove the first 4 entries from age
age[-(1:4)]
[1] 17
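The parentheses matter because the minus sign binds more tightly than :. Without them, -1:4 means (-1):4, which mixes negative and positive positions; R does not allow that inside square brackets and stops with an error. Comparing the two vectors directly:

```r
# without parentheses: the sequence runs from -1 up to 4
-1:4
# [1] -1  0  1  2  3  4

# with parentheses: build 1:4 first, then negate every entry
-(1:4)
# [1] -1 -2 -3 -4
```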

The seq() function

The seq() function lets us create sequences that increase by steps other than 1, using the by argument:

# use seq() to create a sequence from 4 to 20 that increments by 2
# include argument names
seq(from = 4, to = 20, by = 2)
[1]  4  6  8 10 12 14 16 18 20
# exclude the argument names
seq(4, 20, 2)
[1]  4  6  8 10 12 14 16 18 20
# use seq() to extract every second entry from age
age[seq(1, 5, 2)]
[1] 12 22 17
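The step given to by does not have to be a whole number, and seq() also accepts a length.out argument that asks for a fixed number of evenly spaced values instead of a step size. Using length(age) as the end point, as in the sketch below, avoids hard-coding the 5:

```r
age <- c(12, 18, 22, 21, 17)

# a step of 0.25
seq(0, 1, by = 0.25)
# [1] 0.00 0.25 0.50 0.75 1.00

# 4 evenly spaced values from 1 to 10
seq(1, 10, length.out = 4)
# [1]  1  4  7 10

# every second entry, for a vector of any length
age[seq(1, length(age), by = 2)]
# [1] 12 22 17
```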

Exercise

  1. Create a vector that contains the values 5, 9, 2, 21, 34, 56, 2, -1, 5, 9

  2. Extract the 4th and 9th entries

  3. Extract the 1st and last entry

  4. Extract every 3rd entry starting from the 2nd entry

my_vec <- c(5, 9, 2, 21, 34, 56, 2, -1, 5, 9)
my_vec[c(4, 9)]
[1] 21  5
my_vec[c(1, length(my_vec))]
[1] 5 9
my_vec[seq(2, length(my_vec), 3)]
[1]  9 34 -1

Named vectors

You can give names to the entries of a vector using names():

# Add the names "Dean", "Xiao", "Sara", "Ravi", "Maya" to the age vector
names(age) <- c("Dean", "Xiao", "Sara", "Ravi", "Maya")
age
Dean Xiao Sara Ravi Maya 
  12   18   22   21   17 

You can extract/index entries from a vector using their names:

# extract Ravi's age using named indexing
age["Ravi"]
Ravi 
  21 
# extract Maya and Ravi's ages using named indexing
age[c("Maya", "Ravi")]
Maya Ravi 
  17   21 
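Names can also be supplied directly inside c() when the vector is created, and calling names() without an assignment returns them. This sketch uses the illustrative name age_named so that age itself is left as it is:

```r
# name = value pairs inside c()
age_named <- c(Dean = 12, Xiao = 18, Sara = 22, Ravi = 21, Maya = 17)

# names() on its own returns the names as a character vector
names(age_named)
# [1] "Dean" "Xiao" "Sara" "Ravi" "Maya"

# named indexing works just as before
age_named["Sara"]
# Sara 
#   22 
```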

Logical subsetting

We can also subset a vector with a logical vector, which keeps the entries in the positions where it is TRUE:

# use a logical vector to extract the first, fourth, and fifth entries from age
age[c(TRUE, FALSE, FALSE, TRUE, TRUE)]
Dean Ravi Maya 
  12   21   17 

To identify which entries in a vector satisfy a criterion, we can ask a logical question:

# reminder: identify which entries in age are at least 18
age >= 18
 Dean  Xiao  Sara  Ravi  Maya 
FALSE  TRUE  TRUE  TRUE FALSE 

We can then put this logical expression inside the square brackets to extract those entries:

# use age >= 18 to extract all ages that are at least 18
age[age >= 18]
Xiao Sara Ravi 
  18   22   21 
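The same logical vector can be used to subset other vectors of the same length, such as the names, and sum() counts how many entries pass. A short sketch using a named copy, age_named (an illustrative name), built from the same ages:

```r
age_named <- c(Dean = 12, Xiao = 18, Sara = 22, Ravi = 21, Maya = 17)

# who is at least 18?
names(age_named)[age_named >= 18]
# [1] "Xiao" "Sara" "Ravi"

# how many people are at least 18?
sum(age_named >= 18)
# [1] 3
```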

Subsetting with multiple conditions

How can we combine multiple conditions when subsetting, e.g., >= 16 and < 20?

# Try to combine the two conditions by putting them together in c()
age[c(age >= 16, age < 20)]
Xiao Sara Ravi Maya <NA> <NA> <NA> 
  18   22   21   17   NA   NA   NA 

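This doesn't work: c() glues the two logical vectors into a single vector of length 10, so R also returns NA for the positions beyond the end of age. Instead, we need the & (“and”) operator, which is TRUE only where both conditions are TRUE: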

# Identify which entries in age are both >= 16 and < 20
(age >= 16) & (age < 20)
 Dean  Xiao  Sara  Ravi  Maya 
FALSE  TRUE FALSE FALSE  TRUE 
# What will be the output of the following & operations:
TRUE & TRUE
[1] TRUE
TRUE & FALSE
[1] FALSE

The “or” operator is |, which is TRUE when at least one of the two conditions is TRUE:

# What will be the output of the following | operations:
TRUE | TRUE
[1] TRUE
TRUE | FALSE
[1] TRUE
# what do you think will happen when we ask if age <= 16 OR age > 20
age <= 16 | age > 20
 Dean  Xiao  Sara  Ravi  Maya 
 TRUE FALSE  TRUE  TRUE FALSE 
# extract all of the ages that are either <= 16 or > 20
age[age <= 16 | age > 20]
Dean Sara Ravi 
  12   22   21 
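The & and | operators can be combined in a single condition; parentheses make the grouping explicit. The sketch below (using a named copy of the ages called age_named, an illustrative name) keeps ages from 18 up to but not including 22, plus anyone aged exactly 12, and also prints the remaining & and | combinations:

```r
age_named <- c(Dean = 12, Xiao = 18, Sara = 22, Ravi = 21, Maya = 17)

# (at least 18 AND under 22) OR exactly 12
age_named[(age_named >= 18 & age_named < 22) | age_named == 12]
# Dean Xiao Ravi 
#   12   18   21 

# & is TRUE only when both sides are TRUE
FALSE & FALSE
# [1] FALSE

# | is FALSE only when both sides are FALSE
FALSE | FALSE
# [1] FALSE
```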

Exercise

vec <- c(4, 19, 2, 2, 3, 90, 55, 12)

Write some code for extracting the entries that are:

  1. less than 10

  2. less than 25 but greater than 10

  3. either less than 10 or equal to 55

# 1. less than 10
vec[vec < 10]
[1] 4 2 2 3
# 2. less than 25 but greater than 10
vec[(vec < 25) & (vec > 10)]
[1] 19 12
# 3. either less than 10 or equal to 55
vec[(vec < 10) | (vec == 55)]
[1]  4  2  2  3 55