Mapping Data to Graphics

Session 3

PMAP 8921: Data Visualization with R
Andrew Young School of Policy Studies
Summer 2022

Plan for today

Data, aesthetics, & the grammar of graphics

Grammatical layers

Aesthetics in extra dimensions

Tidy data

Data, aesthetics, & the grammar of graphics

Source: Wikipedia

Long distance!

Moscow to Vilnius

Very cold!

Lots of people died!

Mapping data to aesthetics

Aesthetic: a visual property of a graph (position, shape, color, etc.)

Data: a column in a dataset

Data            Aesthetic          Graphic/Geometry
Longitude       Position (x-axis)  Point
Latitude        Position (y-axis)  Point
Army size       Size               Path
Army direction  Color              Path
Date            Position (x-axis)  Line + text
Temperature     Position (y-axis)  Line + text

Mapping data to aesthetics

Data            aes()  geom
Longitude       x      geom_point()
Latitude        y      geom_point()
Army size       size   geom_path()
Army direction  color  geom_path()
Date            x      geom_line() + geom_text()
Temperature     y      geom_line() + geom_text()

ggplot() template

This is a dataset named troops:

longitude  latitude  direction  survivors
24         54.9      A          340000
24.5       55        A          340000
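
As a rough sketch of how the template could be filled in for this data, assuming ggplot2 is installed: the general pattern is ggplot(data = DATA, mapping = aes(AESTHETICS)) + GEOM(). The data frame below contains only the two rows shown above (the real dataset has more), and the object name march is an illustrative assumption.

```r
library(ggplot2)

# Only the two rows shown above; the real troops dataset has many more
troops <- data.frame(
  longitude = c(24, 24.5),
  latitude  = c(54.9, 55),
  direction = c("A", "A"),
  survivors = c(340000, 340000)
)

# Template: ggplot(data = DATA, mapping = aes(AESTHETICS)) + GEOM()
march <- ggplot(data = troops,
                mapping = aes(x = longitude,
                              y = latitude,
                              color = direction,
                              size = survivors)) +
  geom_path()
```

Printing march draws the path. Recent ggplot2 releases prefer linewidth over size for line widths, so a message about that may appear; size is used here to match the mapping table.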

Source: Gapminder

Mapping data to aesthetics

Data                      aes()  geom
Wealth (GDP/capita)       x      geom_point()
Health (Life expectancy)  y      geom_point()
Continent                 color  geom_point()
Population                size   geom_point()

This is a dataset named gapminder_2007:

country      continent  gdpPercap    lifeExp  pop
Afghanistan  Asia       974.5803384  43.828   31889923
Albania      Europe     5937.029526  76.423   3600523
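
A minimal sketch of the four mappings applied to this data, assuming ggplot2 is installed. Only the two rows shown above are typed in by hand; where the full data comes from (for example, the gapminder package filtered to 2007) is an assumption, and health_wealth is an illustrative name.

```r
library(ggplot2)

# Two rows copied from above; the full dataset has one row per country
gapminder_2007 <- data.frame(
  country   = c("Afghanistan", "Albania"),
  continent = c("Asia", "Europe"),
  gdpPercap = c(974.5803384, 5937.029526),
  lifeExp   = c(43.828, 76.423),
  pop       = c(31889923, 3600523)
)

# Wealth on x, health on y, continent as color, population as size
health_wealth <- ggplot(data = gapminder_2007,
                        mapping = aes(x = gdpPercap,
                                      y = lifeExp,
                                      color = continent,
                                      size = pop)) +
  geom_point()
```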

Health and wealth

Grammatical layers

Grammar components as layers

So far we know about data, aesthetics, and geometries

Think of these components as layers

Add them to the foundational ggplot() call with +
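
One way to see the layering idea, sketched with the mpg data that ships with ggplot2: a plot object starts with no layers, and each + adds one. The object names are illustrative assumptions.

```r
library(ggplot2)

# Foundation: data and aesthetics only, no layers yet
base <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Each + adds one layer on top of the foundation
with_points <- base + geom_point()
with_both   <- with_points + geom_smooth(method = "lm")

# Count the layers stored in each plot object
n_layers <- c(length(base$layers),
              length(with_points$layers),
              length(with_both$layers))
```

Because the result of each + is itself a plot, intermediate objects like with_points can be saved and extended later.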

 

Possible aesthetics

color (discrete)

color (continuous)

size

fill

shape

alpha
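
A few of these aesthetics, sketched with the mpg data bundled with ggplot2. Which columns go with which aesthetic is an arbitrary choice for illustration, and geom_bar() is used only because it has a fill to show.

```r
library(ggplot2)

# color mapped to a discrete column (drive type)
p_discrete <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# color mapped to a continuous column (city mileage)
p_continuous <- ggplot(mpg, aes(x = displ, y = hwy, color = cty)) +
  geom_point()

# size and shape mapped to columns; alpha set to a constant for every point
p_size_shape <- ggplot(mpg, aes(x = displ, y = hwy, size = cty, shape = drv)) +
  geom_point(alpha = 0.5)

# fill mapped to a column
p_fill <- ggplot(mpg, aes(x = drv, fill = drv)) +
  geom_bar()
```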

Possible geoms

Example geom    What it makes
geom_col()      Bar charts
geom_text()     Text
geom_point()    Points
geom_boxplot()  Boxplots
geom_sf()       Maps

There are dozens of possible geoms, and each class session will cover different ones.

See the ggplot2 documentation for complete examples of all the different geom layers.

Additional layers

There are many other grammatical layers we can use to describe graphs!

We sequentially add layers onto the foundational ggplot() plot to create complex figures

Scales

Scales change the properties of the variable mapping

Example layer                     What it does
scale_x_continuous()              Make the x-axis continuous
scale_x_continuous(breaks = 1:5)  Manually specify axis ticks
scale_x_log10()                   Log the x-axis
scale_color_gradient()            Use a gradient
scale_fill_viridis_d()            Fill with discrete viridis colors

scale_x_log10()

scale_color_viridis_d()
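
As a small sketch of how these two scale layers attach to a plot, using the mpg data bundled with ggplot2 (the choice of columns is an assumption for illustration):

```r
library(ggplot2)

scaled <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  scale_x_log10() +          # log-transform the x-axis
  scale_color_viridis_d()    # discrete viridis palette for the color mapping

# Two scales were added explicitly: one for x, one for color
n_scales <- length(scaled$scales$scales)
```

Aesthetics without an explicit scale layer (here, y) get a default scale when the plot is drawn.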

Facets

Facets show subplots for different subsets of data

Example layer                      What it does
facet_wrap(vars(continent))        Plot for each continent
facet_wrap(vars(continent, year))  Plot for each continent/year
facet_wrap(..., ncol = 1)          Put all facets in one column
facet_wrap(..., nrow = 1)          Put all facets in one row

facet_wrap(vars(continent))

facet_wrap(vars(continent, year))
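
The same faceting pattern, sketched with the mpg data bundled with ggplot2 instead of gapminder (drv and year stand in for continent and year; that substitution is an assumption made to keep the example self-contained):

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One panel per drive type
by_drive <- p + facet_wrap(vars(drv))

# One panel per drive type and model year combination
by_drive_year <- p + facet_wrap(vars(drv, year))

# Same panels as by_drive, stacked in a single column
one_column <- p + facet_wrap(vars(drv), ncol = 1)
```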

Coordinates

Change the coordinate system

Example layer                     What it does
coord_cartesian()                 Standard x-y coordinate system
coord_cartesian(ylim = c(1, 10))  Zoom in where y is 1–10
coord_flip()                      Switch x and y
coord_polar()                     Use circular polar system

coord_cartesian(ylim = c(70, 80), xlim = c(10000, 30000))

coord_flip()
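
A sketch of both coordinate layers with the mpg data bundled with ggplot2. The limits below are arbitrary values chosen for this dataset, not the ones from the gapminder example above.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Zoom in on a window of the plot; points outside it are hidden, not dropped
zoomed <- p + coord_cartesian(xlim = c(2, 4), ylim = c(20, 30))

# Swap the x and y axes
flipped <- p + coord_flip()
```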

Labels

Add labels to the plot with a single labs() layer

Example layer                What it does
labs(title = "Neat title")   Title
labs(caption = "Something")  Caption
labs(y = "Something")        y-axis
labs(size = "Population")    Title of size legend

ggplot(gapminder_2007,
       aes(x = gdpPercap, y = lifeExp,
           color = continent, size = pop)) +
  geom_point() +
  scale_x_log10() +
  labs(title = "Health and wealth grow together",
       subtitle = "Data from 2007",
       x = "Wealth (GDP per capita)",
       y = "Health (life expectancy)",
       color = "Continent",
       size = "Population",
       caption = "Source: The Gapminder Project")

Theme

Change the appearance of anything in the plot

There are many built-in themes

Example layer    What it does
theme_grey()     Default grey background
theme_bw()       Black and white
theme_dark()     Dark
theme_minimal()  Minimal

theme_dark()

theme_minimal()

There are collections of pre-built themes online, like the ggthemes package

Organizations often make their own custom themes, like the BBC

Theme options

Make theme adjustments with theme()

There are a billion options here!
We have a whole class session dedicated to this!

theme_bw() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"),
        panel.grid = element_blank(),
        axis.title.y = element_text(face = "italic"))

So many possibilities!

These were just a few examples of layers!

See the ggplot2 documentation for complete examples of everything you can do

Putting it all together

We can build a plot sequentially to see how each grammatical layer changes the appearance

Start with data and aesthetics

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv))

Add a point geom

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point()

Add a smooth geom

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth()

Make it straight

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm")

Use a viridis color scale

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d()

Facet by drive

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d() +
  facet_wrap(vars(drv), ncol = 1)

Add labels

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d() +
  facet_wrap(vars(drv), ncol = 1) +
  labs(x = "Displacement", y = "Highway MPG",
       color = "Drive",
       title = "Heavier cars get lower mileage",
       subtitle = "Displacement indicates weight(?)",
       caption = "I know nothing about cars")

Add a theme

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d() +
  facet_wrap(vars(drv), ncol = 1) +
  labs(x = "Displacement", y = "Highway MPG",
       color = "Drive",
       title = "Heavier cars get lower mileage",
       subtitle = "Displacement indicates weight(?)",
       caption = "I know nothing about cars") +
  theme_bw()

Modify the theme

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d() +
  facet_wrap(vars(drv), ncol = 1) +
  labs(x = "Displacement", y = "Highway MPG",
       color = "Drive",
       title = "Heavier cars get lower mileage",
       subtitle = "Displacement indicates weight(?)",
       caption = "I know nothing about cars") +
  theme_bw() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))

Finished!

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = drv)) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_color_viridis_d() +
  facet_wrap(vars(drv), ncol = 1) +
  labs(x = "Displacement", y = "Highway MPG",
       color = "Drive",
       title = "Heavier cars get lower mileage",
       subtitle = "Displacement indicates weight(?)",
       caption = "I know nothing about cars") +
  theme_bw() +
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold"))

A true grammar

With the grammar of graphics, we don't talk about specific chart types

Without it, you hunt through Excel menus for a stacked bar chart and manually reshape your data to work with it

Excel chart types

With the grammar of graphics, we do talk about specific chart elements

Map a column to the x-axis, fill by a different variable, and add geom_col() to get stacked bars

Geoms can be interchangeable (e.g., switch geom_violin() to geom_boxplot())
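
Both ideas can be sketched with the mpg data bundled with ggplot2. The counts table is built here with base R purely for illustration; the column choices and object names are assumptions.

```r
library(ggplot2)

# Count cars by class and drive; as.data.frame(table()) names the count column Freq
counts <- as.data.frame(table(class = mpg$class, drv = mpg$drv))

# x from one column, fill from another; geom_col() stacks the bars by default
stacked <- ggplot(counts, aes(x = class, y = Freq, fill = drv)) +
  geom_col()

# Same data and aesthetics, different geom
base <- ggplot(mpg, aes(x = drv, y = hwy))
violins  <- base + geom_violin()
boxplots <- base + geom_boxplot()
```

Nothing about the data or the mappings changes between violins and boxplots; only the geom layer is swapped.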

Grammar of graphics layers

Describing graphs with the grammar

Map wealth to the x-axis, health to the y-axis, add points, color by continent, size by population, scale the x-axis with a log, and facet by year

ggplot(data = filter(gapminder, year %in% c(2002, 2007)),
       mapping = aes(x = gdpPercap,
                     y = lifeExp,
                     color = continent,
                     size = pop)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(vars(year), ncol = 1)

Map health to the x-axis, add a histogram with bins for every 5 years, fill and facet by continent

ggplot(data = gapminder_2007,
       mapping = aes(x = lifeExp,
                     fill = continent)) +
  geom_histogram(binwidth = 5,
                 color = "white") +
  guides(fill = "none") +  # Turn off legend
  facet_wrap(vars(continent))

Map continent to the x-axis, health to the y-axis, add violin plots and semi-transparent boxplots, fill by continent

ggplot(data = gapminder,
       mapping = aes(x = continent,
                     y = lifeExp,
                     fill = continent)) +
  geom_violin() +
  geom_boxplot(alpha = 0.5) +
  guides(fill = "none")  # Turn off legend

Aesthetics in extra dimensions

Time

Use gganimate to map variables to a time aesthetic

ggplot(gapminder, aes(x = gdpPercap, y = lifeExp,
                      size = pop, color = country)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(2, 12)) +
  scale_x_log10(labels = scales::dollar) +
  guides(size = "none", color = "none") +
  facet_wrap(~continent) +
  # Special gganimate stuff
  labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') +
  transition_time(year) +
  ease_aes('linear')

Sound

Visualize internal rhyming schemes in music

http://graphics.wsj.com/hamilton/

Daveed Diggs in 'Washington On Your Side'
Kendrick Lamar in 'good kid, m.A.A.d city'

Animation, time, and sound

Tidy data

Data shapes

For ggplot() to work, your data needs to be in a tidy format

This doesn't mean that it's clean. It refers to the structure of the data

All the packages in the tidyverse work best with tidy data; that's why it's called that!

Tidy data

Each variable has its own column

Each observation has its own row

Each value has its own cell

Untidy data example

Real-world data is often untidy, like this:

Example of untidy data

Tidy data example

Here's the tidy version of that same data:

Example of tidy data

This is plottable!

Wide vs. long

Tidy data is also called "long" data

Example of tidy data

Figure by Garrick Aden-Buie in tidyexplain

Moving from wide to long

Nowadays, gather() has been superseded by pivot_longer() and spread() by pivot_wider()

Figure by Garrick Aden-Buie in tidyexplain
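
A minimal sketch of moving from wide to long and back, assuming the tidyr package is installed. The data frame is hypothetical (made-up countries and values), and the column names year and value are arbitrary choices.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
wide <- data.frame(
  country = c("A", "B"),
  `2002`  = c(10, 20),
  `2007`  = c(11, 21),
  check.names = FALSE
)

# Wide to long: year columns become rows, giving one row per country-year
long <- pivot_longer(wide,
                     cols = c(`2002`, `2007`),
                     names_to = "year",
                     values_to = "value")

# Long back to wide: the reverse operation
wide_again <- pivot_wider(long,
                          names_from = year,
                          values_from = value)
```

In the long version each variable (country, year, value) has its own column and each country-year observation has its own row, so it can be passed straight to ggplot().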
