We’ve seen the x and y point aesthetics, but there are many others too.
For example, you can specify the color of the points using the color aesthetic:
# use gapminder_2007 to create a scatterplot of gdpPercap (x) against lifeExp (y)
# where color is based on continent
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent))
To specify a global aesthetic that does not depend on a column in your data, you need to specify it outside the aes() function.
# use gapminder_2007 to create a scatterplot of gdpPercap (x) against lifeExp (y)
# where all points are colored blue
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp), color = "blue")
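A common slip is to put the fixed color inside aes(). The sketch below contrasts the two placements on a small made-up data frame (toy_2007 is a hypothetical stand-in for gapminder_2007, not part of the original data). Inside aes(), ggplot2 treats "blue" as a data value with a single level, so the points are drawn in the first default palette color and a legend appears.

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_2007 (values are made up)
toy_2007 <- data.frame(
  gdpPercap = c(1000, 5000, 20000),
  lifeExp   = c(55, 68, 80)
)

# Inside aes(): "blue" is mapped like a data column, not used as a color name
p_mapped <- ggplot(toy_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = "blue"))

# Outside aes(): every point is drawn in blue, and no legend is added
p_fixed <- ggplot(toy_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp), color = "blue")
```

Printing p_mapped and p_fixed side by side shows the difference; layer_data(p_fixed)$colour is one way to inspect the colors that were actually used.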
Exercise
Specify the shape aesthetic of each point in two ways:
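The two ways are not listed in the text above. One plausible reading, following the color examples, is (1) a shape that depends on a column and (2) a single fixed shape for all points. The sketch below assumes that reading and uses a small made-up data frame (toy_2007 is hypothetical).

```r
library(ggplot2)

# Hypothetical stand-in for gapminder_2007 (values are made up)
toy_2007 <- data.frame(
  gdpPercap = c(1000, 5000, 20000, 30000),
  lifeExp   = c(55, 68, 80, 82),
  continent = c("Africa", "Asia", "Europe", "Europe")
)

# Way 1: map shape to a column inside aes(), giving one shape per continent
p_shape_mapped <- ggplot(toy_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, shape = continent))

# Way 2: set one fixed shape for all points outside aes()
# (shape 17 is a filled triangle)
p_shape_fixed <- ggplot(toy_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp), shape = 17)
```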
Sometimes when you have a lot of data points, you might want to add some transparency. You can do this using the alpha argument. alpha takes values between 0 and 1. alpha = 1 is not transparent at all, and alpha = 0 is completely transparent.
# add transparency to the 2007 scatterplot of gdpPercap (x) against lifeExp (y)
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp), alpha = 0.5)
Exercise
Recreate the 2007 gdpPercap vs lifeExp plot in which you color by continent, size is determined by population, and the points have a transparency of 0.5.
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop),
             alpha = 0.5)
Line plots
Let’s create a line plot of lifeExp by year for each country in the Americas.
# create a line plot for each country in the Americas
gapminder |>
  filter(continent == "Americas") |>
  ggplot() +
  geom_line(aes(x = year, y = lifeExp,
                # if you want separate lines, you need to provide a group variable
                group = country))
Exercise
Compute the average life expectancy for each continent for each year, and then create a line plot of the average life expectancy for each continent over time.
gapminder |>
  group_by(continent, year) |>
  summarize(mean_life_exp = mean(lifeExp)) |>
  ggplot() +
  geom_line(aes(x = year, y = mean_life_exp, color = continent))
`summarise()` has grouped output by 'continent'. You can override using the
`.groups` argument.
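The message above is informational: after summarize(), the result is still grouped by continent. A minimal sketch of silencing it is to set .groups = "drop", which returns an ungrouped tibble. The data frame toy_life below is made up for illustration.

```r
library(dplyr)

# Hypothetical stand-in for gapminder (values are made up)
toy_life <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe", "Europe", "Europe"),
  year      = c(2007, 2007, 2002, 2007, 2007, 2002),
  lifeExp   = c(70, 74, 68, 78, 80, 77)
)

# .groups = "drop" removes all grouping from the result,
# so summarize() no longer prints the grouping message
mean_by_group <- toy_life |>
  group_by(continent, year) |>
  summarize(mean_life_exp = mean(lifeExp), .groups = "drop")
```

The plots are the same either way; the argument only makes the grouping of the summarized data explicit.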
Boxplots
Let’s create some boxplots of lifeExp for each continent.
# create boxplots of the lifeExp for each continent
ggplot(gapminder) +
  geom_boxplot(aes(x = continent, y = lifeExp))
Histograms
Let’s create a histogram of lifeExp.
# create a histogram of lifeExp
ggplot(gapminder) +
  geom_histogram(aes(x = lifeExp))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
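The message above is a reminder that 30 bins is only a default. A minimal sketch of choosing the bin width yourself, using made-up values (toy_life is hypothetical); binwidth = 5 means each bar spans 5 years of life expectancy.

```r
library(ggplot2)

# Hypothetical life expectancy values (made up)
toy_life <- data.frame(
  lifeExp = c(42, 48, 51, 57, 63, 66, 71, 74, 78, 81)
)

# Set the width of each bin explicitly instead of relying on bins = 30
p_hist <- ggplot(toy_life) +
  geom_histogram(aes(x = lifeExp), binwidth = 5)
```

Alternatively, the bins argument sets the number of bins rather than their width.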
Bar charts
You can create a bar chart of counts by providing a categorical (character/factor) variable as the x aesthetic to geom_bar().
# create a bar chart of the continent *counts*
ggplot(gapminder) +
  geom_bar(aes(x = continent))
If you want to create bar charts where you specify the height of each bar based on a variable in your data, you need to use geom_col() instead of geom_bar().
# create a bar chart of the average lifeExp for each continent using geom_col()
gapminder |>
  group_by(continent) |>
  summarize(mean_life_exp = mean(lifeExp)) |>
  ggplot() +
  geom_col(aes(x = continent, y = mean_life_exp))
Layering geom_layers
You can add multiple layers of geoms in the same plot.
# (from the exercise above) compute the average lifeExp for each continent-year
# combination, then create a line plot of the mean_life_exp over time for each
# continent, and then add the points on top of the line
gapminder |>
  group_by(continent, year) |>
  summarize(mean_life_exp = mean(lifeExp)) |>
  ggplot(aes(x = year, y = mean_life_exp, color = continent)) +
  geom_line() +
  geom_point()
`summarise()` has grouped output by 'continent'. You can override using the
`.groups` argument.
Getting fancy with ggplot2
Transformations
You can apply a log-scale transformation to an axis by adding a scale layer.
# for the 2007 gdpPercap-lifeExp scatterplot colored by continent
# add a log10 scale layer to the x-axis
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  scale_x_log10()
Labels
You can clean up the labels of your figure using the labs() function.
# take your previous plot, add nice labels using `labs()`
# save the ggplot2 object as my_scatter
my_scatter <- ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  scale_x_log10() +
  labs(x = "GDP per capita",
       y = "Life expectancy",
       title = "GDP per cap vs life expectancy")
my_scatter
Themes
You can change the theme of your figure by adding a theme layer.
# try out a few theme layers: theme_classic(), theme_bw(), theme_dark()
my_scatter + theme_classic()
my_scatter + theme_bw()
my_scatter + theme_dark()
ggplot(gapminder_2007) +
  geom_point(aes(x = gdpPercap, y = lifeExp, color = continent)) +
  scale_x_log10() +
  labs(x = "GDP per capita",
       y = "Life expectancy",
       title = "GDP per cap vs life expectancy") +
  theme_dark()
Faceted grids
You can create a grid of plots using facet_wrap().
# create a line plot of lifeExp over time for each country, separately for each continent
ggplot(gapminder) +
  geom_line(aes(x = year, y = lifeExp, group = country),
            alpha = 0.2) +
  facet_wrap(~continent, ncol = 2)
Project exercise: world happiness
Load in the world happiness dataset (whr_2023.csv). Look at the data dictionary provided. Identify which variable indicates the country’s happiness score.
Note that there are many missing values (NA) in this data. To compute the mean of a variable with missing values, you need to specify na.rm = TRUE. If you need to, you can also use the drop_na() function from the tidyr package to remove all rows with missing values (but this is not necessarily recommended, since it discards rows that are missing a value in any column).
mean(c(1, 4, NA, 2))
[1] NA
mean(c(1, 4, NA, 2), na.rm = TRUE)
[1] 2.333333
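To see why dropping every row with a missing value can be too aggressive, here is a small sketch with drop_na() (from the tidyr package) on made-up data; toy_happy is a hypothetical stand-in for the happiness data. Naming a column restricts the check to that column.

```r
library(tidyr)

# Hypothetical stand-in for the happiness data (values are made up)
toy_happy <- data.frame(
  country_name = c("A", "B", "C"),
  life_ladder  = c(5.1, NA, 6.3),
  generosity   = c(NA, 0.2, 0.1)
)

# Drops every row with an NA in any column: only country C survives
all_complete <- drop_na(toy_happy)

# Drops rows only where life_ladder is NA: countries A and C survive
ladder_known <- drop_na(toy_happy, life_ladder)
```

If the plot only uses life_ladder, the second form keeps country A even though its generosity value is missing.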
Conduct some explorations of the data using your dplyr and ggplot2 skills. Create at least one interesting polished plot. You are welcome to look at just one year, or even just one country!
Make sure that your plot has a clear takeaway message. Remember that less is sometimes more: just because you can add a billion things to your plot, doesn’t mean that you should!
One idea: Look at Australia’s happiness score (life_ladder) over time.
happiness <- read_csv("data/whr_2023.csv")
Rows: 2970 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (1): country_name
dbl (10): year, life_ladder, log_GDP_per_capita, social_support, healthy_lif...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
happiness
# A tibble: 2,970 × 11
   country_name  year life_ladder log_GDP_per_capita social_support
   <chr>        <dbl>       <dbl>              <dbl>          <dbl>
 1 Afghanistan   2005          NA                 NA             NA
 2 Afghanistan   2006          NA                 NA             NA
 3 Afghanistan   2007          NA                 NA             NA
 4 Afghanistan   2008        3.72               7.35          0.451
 5 Afghanistan   2009        4.40               7.51          0.552
 6 Afghanistan   2010        4.76               7.61          0.539
 7 Afghanistan   2011        3.83               7.58          0.521
 8 Afghanistan   2012        3.78               7.66          0.521
 9 Afghanistan   2013        3.57               7.68          0.484
10 Afghanistan   2014        3.13               7.67          0.526
# ℹ 2,960 more rows
# ℹ 6 more variables: healthy_life_expectancy_at_birth <dbl>,
# freedom_to_make_life_choices <dbl>, generosity <dbl>,
# perceptions_of_corruption <dbl>, positive_affect <dbl>,
# negative_affect <dbl>
happiness |>
  filter(country_name == "Australia", year >= 2010) |>
  ggplot() +
  geom_line(aes(x = year, y = life_ladder),
            col = "firebrick", linewidth = 1.1) +
  theme_classic() +
  labs(x = "Year",
       y = "Happiness score",
       title = "Australia's decreasing happiness trend")