Data frames with base R
Working with data frames using “Base R”
Let’s load the gapminder dataset.
# load the gapminder dataset using read.csv()
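As a minimal sketch of what loading with read.csv() looks like, the code below writes a tiny CSV to a temporary file and reads it back. The file contents are made up for illustration (only the column names mirror gapminder), and the name gapminder_toy is an assumption, not part of the original material. With the real data you would pass the path to your own gapminder CSV file instead.

```r
# Write a tiny CSV to a temporary file so this sketch runs without the real data.
# The values are made up; only the column names mirror gapminder.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("country,continent,year,lifeExp,pop,gdpPercap",
    "Australia,Oceania,2007,81.2,20400000,34400",
    "Brazil,Americas,2007,72.4,190000000,9100"),
  csv_path
)

# read.csv() returns a data frame; the first line of the file becomes the column names
gapminder_toy <- read.csv(csv_path)

str(gapminder_toy)   # inspect the column types
head(gapminder_toy)  # look at the first rows
```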
To extract individual entries from a data frame using base R, use square brackets with a row index and a column index, as in df[row, col]:
# extract the entry in the 3rd row and 4th column
# extract the entries in the 3rd and 4th rows and 4th column
# extract the entries in the 3rd and 4th rows and 4th and 5th columns
What type of object are these?
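One way to explore this is to run the three extractions on a small stand-in data frame and check each result with class(). The data frame gapminder_toy below is hypothetical (made-up values, same column order as gapminder) and is only there so the sketch runs on its own.

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

# entry in the 3rd row and 4th column
one_entry <- gapminder_toy[3, 4]

# entries in the 3rd and 4th rows and 4th column
two_entries <- gapminder_toy[c(3, 4), 4]

# entries in the 3rd and 4th rows and 4th and 5th columns
block <- gapminder_toy[c(3, 4), c(4, 5)]

class(one_entry)    # "numeric": a single column is simplified to a vector
class(two_entries)  # "numeric"
class(block)        # "data.frame": several columns stay a data frame
```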
Extracting entire columns from a data frame
There are many ways to extract a single column from a data frame:
# Extract the 4th column using df[, col]
# extract the lifeExp column (using df[, "col"] syntax)
# extract the lifeExp column (using df$col syntax)
What type of object are these?
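The three column-extraction syntaxes can be compared side by side. This is a sketch on a hypothetical data frame, gapminder_toy, with made-up values and the same column order as gapminder.

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

by_position <- gapminder_toy[, 4]          # df[, col]
by_name     <- gapminder_toy[, "lifeExp"]  # df[, "col"]
by_dollar   <- gapminder_toy$lifeExp       # df$col

# all three return the same numeric vector
class(by_position)
identical(by_position, by_name)
identical(by_name, by_dollar)
```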
What do you think the output of the following code will be?
# head(gapminder[3])
# head(gapminder["year"])
A data frame can be thought of as a collection (technically a “list”) of vectors, so the third entry of a data frame is its third vector, that is, its third column.
Notice the difference in the output between these two ways of extracting the third column:
# head(gapminder[, 3])
# head(gapminder[3])
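To make the difference concrete, here is a sketch on a hypothetical data frame, gapminder_toy (made-up values, same column order as gapminder). The drop = FALSE line is an extra illustration that is not in the original material.

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

as_vector <- gapminder_toy[, 3]  # matrix-style indexing: simplified to a vector
as_df     <- gapminder_toy[3]    # list-style indexing: a one-column data frame

class(as_vector)  # "numeric"
class(as_df)      # "data.frame"

# drop = FALSE stops df[, col] from simplifying to a vector
kept_df <- gapminder_toy[, 3, drop = FALSE]
class(kept_df)    # "data.frame"
```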
To extract the third vector/column directly, you can use double square brackets [[]]. This is actually list notation.
# extract the third column with `[[]]` using numbered indexing
# extract the third column with `[[]]` using named indexing ("year")
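A short sketch of both forms of [[ ]] indexing, again on a hypothetical gapminder_toy data frame with made-up values:

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

year_by_number <- gapminder_toy[[3]]       # numbered indexing
year_by_name   <- gapminder_toy[["year"]]  # named indexing

# both give the plain vector, the same as gapminder_toy$year
identical(year_by_number, year_by_name)
identical(year_by_name, gapminder_toy$year)
```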
Exercise
- Extract the gdpPercap entries for the fourth and fifth rows
- Extract the entire lifeExp column in as many different ways as you can (you may want to just look at the head() of your outputs).
Using logical indexing
Let’s create a logical vector, called is_aus, that is TRUE when country is “Australia” and FALSE otherwise.
# create vector is_aus
# test that is_aus contains at least some TRUE values
Use is_aus to filter to just the rows for Australia.
# use is_aus to filter to just the rows for Australia
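The logical-indexing steps might look like the sketch below. It uses a hypothetical gapminder_toy data frame with made-up values so that it runs on its own; with the real data you would use gapminder in its place.

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

# logical vector: TRUE where country is "Australia", FALSE otherwise
is_aus <- gapminder_toy$country == "Australia"

# check that is_aus contains at least some TRUE values
any(is_aus)
sum(is_aus)  # number of TRUE values

# put the logical vector in the row position; leave the column position empty
aus_rows <- gapminder_toy[is_aus, ]
aus_rows
```

The trailing comma in gapminder_toy[is_aus, ] matters: without it, the logical vector would be used to select columns rather than rows.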
Removing columns using negative indexing
You can use negative indexing to remove columns:
# remove the third column from gapminder (don't overwrite gapminder)
# if you wanted to update the gapminder dataset:
# gapminder <- gapminder[-3]
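As a sketch, negative indexing on a hypothetical gapminder_toy data frame (made-up values, same column order as gapminder) returns a copy without the third column and leaves the original untouched unless you reassign it:

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

# drop the third column (year); the result is a new data frame
without_year <- gapminder_toy[-3]
names(without_year)

# the original still has all six columns
ncol(gapminder_toy)
```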
Adding columns
You can also use the syntaxes above to add new columns: assigning to a column name that does not exist yet creates that column.
# create a copy of gapminder called gapminder_tmp
# add a new column to gapminder_tmp called gap, which is the product of pop and gdpPercap
# look at the head of gapminder_tmp
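One possible way to carry out these steps uses the $ syntax, shown here on a hypothetical gapminder_toy data frame with made-up values (with the real data, the copy would be made from gapminder). The [[ ]] line is an equivalent alternative and is an addition for illustration.

```r
# Toy stand-in for gapminder: made-up values, same column order
gapminder_toy <- data.frame(
  country   = c("Australia", "Australia", "Brazil", "Brazil"),
  continent = c("Oceania", "Oceania", "Americas", "Americas"),
  year      = c(2002, 2007, 2002, 2007),
  lifeExp   = c(80.4, 81.2, 71.0, 72.4),
  pop       = c(19500000, 20400000, 180000000, 190000000),
  gdpPercap = c(30700, 34400, 8100, 9100)
)

# work on a copy so the original data frame is left unchanged
gapminder_tmp <- gapminder_toy

# assigning to a name that does not exist yet creates the column
gapminder_tmp$gap <- gapminder_tmp$pop * gapminder_tmp$gdpPercap

# equivalent, using list notation:
# gapminder_tmp[["gap"]] <- gapminder_tmp$pop * gapminder_tmp$gdpPercap

head(gapminder_tmp)
```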
Exercise
Modify the lifeExp column of gapminder_tmp so that it is rounded to the nearest integer (use round()).
Challenge: do this JUST for the Australia rows. Check your output by looking at just the country and lifeExp columns for the first 100 rows.
Hint: to undo any changes to gapminder_tmp, reassign it to the original gapminder object: gapminder_tmp <- gapminder