Session 3
PMAP 8921: Data Visualization with R
Andrew Young School of Policy Studies
Summer 2022
Data, aesthetics, & the grammar of graphics
Grammatical layers
Aesthetics in extra dimensions
Tidy data
Source: Wikipedia
Source: Wikimedia Commons
Aesthetic
Visual property of a graph
Position, shape, color, etc.
Data
A column in a dataset
Data | Aesthetic | Graphic/Geometry |
---|---|---|
Longitude | Position (x-axis) | Point |
Latitude | Position (y-axis) | Point |
Army size | Size | Path |
Army direction | Color | Path |
Date | Position (x-axis) | Line + text |
Temperature | Position (y-axis) | Line + text |
Data | aes() | geom |
---|---|---|
Longitude | x | geom_point() |
Latitude | y | geom_point() |
Army size | size | geom_path() |
Army direction | color | geom_path() |
Date | x | geom_line() + geom_text() |
Temperature | y | geom_line() + geom_text() |
ggplot() template
This is a dataset named troops:
longitude | latitude | direction | survivors |
---|---|---|---|
24 | 54.9 | A | 340000 |
24.5 | 55 | A | 340000 |
… | … | … | … |
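As a minimal sketch of how the mappings in the table translate into code, the snippet below rebuilds only the two rows of troops visible above (the full dataset has more) and maps each column to an aesthetic. The object name troops_plot is just for illustration. Recent ggplot2 releases may suggest linewidth instead of size for paths; size is kept here to match the table.

```r
library(ggplot2)

# Only the two rows shown in the table above; the real dataset is longer
troops <- data.frame(
  longitude = c(24, 24.5),
  latitude  = c(54.9, 55),
  direction = c("A", "A"),
  survivors = c(340000, 340000)
)

# Data + aesthetic mappings + one geom layer
troops_plot <- ggplot(
  data = troops,
  mapping = aes(x = longitude, y = latitude,
                color = direction, size = survivors)
) +
  geom_path()

troops_plot
```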
Source: New York Times
Source: Gapminder
Data | aes() | geom |
---|---|---|
Wealth (GDP/capita) | x | geom_point() |
Health (Life expectancy) | y | geom_point() |
Continent | color | geom_point() |
Population | size | geom_point() |
This is a dataset named gapminder_2007:
country | continent | gdpPercap | lifeExp | pop |
---|---|---|---|---|
Afghanistan | Asia | 974.5803384 | 43.828 | 31889923 |
Albania | Europe | 5937.029526 | 76.423 | 3600523 |
… | … | … | … | … |
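A sketch of the same four mappings in code. It assumes gapminder_2007 is the 2007 slice of the gapminder data frame from the gapminder R package; the slides do not show how it was created, so that step is an assumption.

```r
library(ggplot2)
library(gapminder)  # assumed source of the data

# Assumption: gapminder_2007 is just the rows for the year 2007
gapminder_2007 <- subset(gapminder, year == 2007)

# Wealth on x, health on y, continent as color, population as size
health_wealth <- ggplot(
  data = gapminder_2007,
  mapping = aes(x = gdpPercap, y = lifeExp,
                color = continent, size = pop)
) +
  geom_point()

health_wealth
```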
So far we know about data, aesthetics, and geometries
Think of these components as layers
Add them to the foundational ggplot() with +
Layer analogy borrowed from Thomas Lin Pedersen and his "Drawing Anything with ggplot2" workshop.
Aesthetics shown: color (discrete), color (continuous), size, fill, shape, alpha
Example geom | What it makes |
---|---|
geom_col() | Bar charts |
geom_text() | Text |
geom_point() | Points |
geom_boxplot() | Boxplots |
geom_sf() | Maps |
There are dozens of possible geoms and each class session will cover different ones.
See the ggplot2 documentation for complete examples of all the different geom layers
There are many other grammatical layers we can use to describe graphs!
We sequentially add layers onto the foundational ggplot() plot to create complex figures
Scales change the properties of the variable mapping
Example layer | What it does |
---|---|
scale_x_continuous() | Make the x-axis continuous |
scale_x_continuous(breaks = 1:5) | Manually specify axis ticks |
scale_x_log10() | Log the x-axis |
scale_color_gradient() | Use a gradient |
scale_fill_viridis_d() | Fill with discrete viridis colors |
Examples: scale_x_log10() and scale_color_viridis_d()
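A short sketch of adding scale layers to a plot, assuming gapminder_2007 is the 2007 slice of the gapminder package data (an assumption, as the slides do not show its construction). Each scale_*() call changes how one mapped variable is displayed without touching the data itself.

```r
library(ggplot2)
library(gapminder)  # assumed source of the data

gapminder_2007 <- subset(gapminder, year == 2007)

scaled_plot <- ggplot(gapminder_2007,
                      aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10() +         # log-transform the x-axis
  scale_color_viridis_d()   # discrete viridis palette for continent

scaled_plot
```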
Facets show subplots for different subsets of data
Example layer | What it does |
---|---|
facet_wrap(vars(continent)) | Plot for each continent |
facet_wrap(vars(continent, year)) | Plot for each continent/year |
facet_wrap(..., ncol = 1) | Put all facets in one column |
facet_wrap(..., nrow = 1) | Put all facets in one row |
Examples: facet_wrap(vars(continent)) and facet_wrap(vars(continent, year))
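A minimal faceting sketch, again assuming gapminder_2007 is the 2007 slice of the gapminder package data. With five continents in the data, facet_wrap(vars(continent)) should produce five subplots; nrow = 1 places them side by side.

```r
library(ggplot2)
library(gapminder)  # assumed source of the data

gapminder_2007 <- subset(gapminder, year == 2007)

faceted_plot <- ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(vars(continent), nrow = 1)  # one panel per continent, one row

faceted_plot
```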
Change the coordinate system
Example layer | What it does |
---|---|
coord_cartesian() | Standard x/y coordinate system (the default) |
coord_cartesian(ylim = c(1, 10)) | Zoom in where y is 1–10 |
coord_flip() | Switch x and y |
coord_polar() | Use circular polar system |
Examples: coord_cartesian(ylim = c(70, 80), xlim = c(10000, 30000)) and coord_flip()
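A sketch of zooming with coord_cartesian(), assuming gapminder_2007 is the 2007 slice of the gapminder package data. The coordinate layer only changes the visible window; every row stays in the plot data, which matters once statistical layers such as smoothers are involved.

```r
library(ggplot2)
library(gapminder)  # assumed source of the data

gapminder_2007 <- subset(gapminder, year == 2007)

zoomed <- ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  coord_cartesian(ylim = c(70, 80), xlim = c(10000, 30000))

# The layer still holds every country, not just the ones in view
nrow(layer_data(zoomed))

zoomed
```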
Add labels to the plot with a single labs() layer
Example layer | What it does |
---|---|
labs(title = "Neat title") | Title |
labs(caption = "Something") | Caption |
labs(y = "Something") | y-axis |
labs(size = "Population") | Title of size legend |
ggplot(gapminder_2007, aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) + geom_point() + scale_x_log10() + labs(title = "Health and wealth grow together", subtitle = "Data from 2007", x = "Wealth (GDP per capita)", y = "Health (life expectancy)", color = "Continent", size = "Population", caption = "Source: The Gapminder Project")
Change the appearance of anything in the plot
There are many built-in themes
Example layer | What it does |
---|---|
theme_grey() | Default grey background |
theme_bw() | Black and white |
theme_dark() | Dark |
theme_minimal() | Minimal |
Examples: theme_dark() and theme_minimal()
Make theme adjustments with theme()
There are a billion options here!
We have a whole class session dedicated to this!
theme_bw() + theme(legend.position = "bottom", plot.title = element_text(face = "bold"), panel.grid = element_blank(), axis.title.y = element_text(face = "italic"))
These were just a few examples of layers!
See the ggplot2 documentation for complete examples of everything you can do
We can build a plot sequentially to see how each grammatical layer changes the appearance
Start with data and aesthetics
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv))
Add a point geom
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point()
Add a smooth geom
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth()
Make it straight
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm")
Use a viridis color scale
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d()
Facet by drive
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1)
Add labels
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars")
Add a theme
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars") + theme_bw()
Modify the theme
ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) + geom_point() + geom_smooth(method = "lm") + scale_color_viridis_d() + facet_wrap(vars(drv), ncol = 1) + labs(x = "Displacement", y = "Highway MPG", color = "Drive", title = "Heavier cars get lower mileage", subtitle = "Displacement indicates weight(?)", caption = "I know nothing about cars") + theme_bw() + theme(legend.position = "bottom", plot.title = element_text(face = "bold"))
Finished!
With the grammar of graphics, we don't talk about specific chart types
Hunt through Excel menus for a stacked bar chart and manually reshape your data to work with it
With the grammar of graphics, we do talk about specific chart elements
Map a column to the x-axis, fill by a different variable, and add geom_col() to get stacked bars
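A sketch of that recipe using the mpg data that ships with ggplot2. The counting step with dplyr::count() and the choice of class and drv as the two variables are illustrative assumptions, not something the slides specify.

```r
library(ggplot2)
library(dplyr)

# Count cars per class and drive type; count() adds a column named n
class_counts <- count(mpg, class, drv)

# x = one column, fill = a different column, geom_col() stacks by default
stacked_bars <- ggplot(class_counts, aes(x = class, y = n, fill = drv)) +
  geom_col()

stacked_bars
```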
Geoms can be interchangeable (e.g. switch geom_violin() to geom_boxplot())
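One way to see this is to store the data and aesthetics once and swap only the geom layer. The sketch below uses the mpg data from ggplot2; the object names are made up for illustration.

```r
library(ggplot2)

# Data and aesthetic mappings stay the same
base_plot <- ggplot(mpg, aes(x = drv, y = hwy, fill = drv))

# Only the geom layer changes
violin_version  <- base_plot + geom_violin()
boxplot_version <- base_plot + geom_boxplot()

violin_version
boxplot_version
```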
Map wealth to the x-axis, health to the y-axis, add points, color by continent, size by population, scale the x-axis with a log, and facet by year
ggplot(data = filter(gapminder, year %in% c(2002, 2007)), mapping = aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) + geom_point() + scale_x_log10() + facet_wrap(vars(year), ncol = 1)
Map health to the x-axis, add a histogram with bins for every 5 years, fill and facet by continent
ggplot(data = gapminder_2007, mapping = aes(x = lifeExp, fill = continent)) + geom_histogram(binwidth = 5, color = "white") + guides(fill = "none") + facet_wrap(vars(continent)) # guides(fill = "none") turns off the legend
Map continent to the x-axis, health to the y-axis, add violin plots and semi-transparent boxplots, fill by continent
ggplot(data = gapminder, mapping = aes(x = continent, y = lifeExp, fill = continent)) + geom_violin() + geom_boxplot(alpha = 0.5) + guides(fill = "none") # Turn off legend
Use gganimate to map variables to a time aesthetic
ggplot(gapminder, aes(x = gdpPercap, y = lifeExp, size = pop, color = country)) + geom_point(alpha = 0.7) + scale_size(range = c(2, 12)) + scale_x_log10(labels = scales::dollar) + guides(size = "none", color = "none") + facet_wrap(~continent) + labs(title = 'Year: {frame_time}', x = 'GDP per capita', y = 'life expectancy') + transition_time(year) + ease_aes('linear') # The {frame_time} title, transition_time(), and ease_aes() are the gganimate-specific parts
Erika Navarro for the Weather Channel: https://www.pewtrusts.org/en/research-and-analysis/articles/2018/12/03/the-weather-channel-uses-animation-to-show-dangers-of-storm-surge
For ggplot() to work, your data needs to be in a tidy format
This doesn't mean that it's clean; it refers to the structure of the data
All the packages in the tidyverse work best with tidy data; that's why it's called that!
Each variable has its own column
Each observation has its own row
Each value has its own cell
Real world data is often untidy, like this:
Here's the tidy version of that same data:
This is plottable!
Tidy data is also called "long" data
Figure by Garrick Aden-Buie in tidyexplain
Nowadays, gather() has been superseded by pivot_longer() and spread() by pivot_wider()
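A small sketch of reshaping between the two forms with tidyr. The data frame and its values are hypothetical, made up only to show the mechanics: the wide version has one column per year, and the long (tidy) version has one row per country-year observation.

```r
library(tidyr)

# Hypothetical untidy ("wide") data: the year variable is spread across columns
wide <- data.frame(
  country = c("A", "B"),
  `2002`  = c(1, 3),
  `2007`  = c(2, 4),
  check.names = FALSE  # keep the numeric-looking column names as-is
)

# Tidy ("long") version: each variable in a column, each observation in a row
long <- pivot_longer(
  wide,
  cols = c(`2002`, `2007`),
  names_to = "year",
  values_to = "value"
)

# pivot_wider() reverses the operation
wide_again <- pivot_wider(long, names_from = year, values_from = value)
```

The long version is the one ggplot() can use directly, for example with aes(x = year, y = value, color = country).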