Outline

Why ggplot2?

0) Prep the Data

Load packages we will be using:

library(dplyr)
library(ggplot2)

GOAL: Use dplyr tools to reshape our data so that we can make visualizations of the mammals data.

Load data:

download.file("http://kbroman.org/datacarp/portal_clean.csv",
              "portal_clean.csv")
surveys <- read.csv("portal_clean.csv")

Create four datasets:

#just_dm
just_dm <- surveys %>% filter(species_id=="DM")
str(just_dm)
## 'data.frame':    9727 obs. of  13 variables:
##  $ record_id      : int  226 233 245 251 257 259 268 346 350 354 ...
##  $ month          : int  9 9 10 10 10 10 10 11 11 11 ...
##  $ day            : int  13 13 16 16 16 16 16 12 12 12 ...
##  $ year           : int  1977 1977 1977 1977 1977 1977 1977 1977 1977 1977 ...
##  $ plot_id        : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ species_id     : Factor w/ 19 levels "BA","DM","DO",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ sex            : Factor w/ 2 levels "F","M": 2 2 2 2 2 2 1 1 2 2 ...
##  $ hindfoot_length: int  37 25 37 36 37 36 36 37 37 38 ...
##  $ weight         : int  51 44 39 49 47 41 55 36 47 44 ...
##  $ genus          : Factor w/ 9 levels "Baiomys","Chaetodipus",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ species        : Factor w/ 18 levels "albigula","baileyi",..: 12 12 12 12 12 12 12 12 12 12 ...
##  $ taxa           : Factor w/ 1 level "Rodent": 1 1 1 1 1 1 1 1 1 1 ...
##  $ plot_type      : Factor w/ 5 levels "Control","Long-term Krat Exclosure",..: 1 1 1 1 1 1 1 1 1 1 ...
#stat_summary
stat_summary <- surveys %>%
    group_by(species_id) %>%
    summarize(mean_wt=mean(weight),
              mean_hfl=mean(hindfoot_length),
              n=n())
str(stat_summary)
## Classes 'tbl_df', 'tbl' and 'data.frame':    19 obs. of  4 variables:
##  $ species_id: Factor w/ 19 levels "BA","DM","DO",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ mean_wt   : num  8.6 43.1 48.9 120.2 158.8 ...
##  $ mean_hfl  : num  13 36 35.6 50 32.2 ...
##  $ n         : int  45 9727 2790 2023 1045 905 2081 2803 1198 1469 ...
#year_summary
year_summary <- surveys %>%
    group_by(species_id, year, sex) %>%
    summarize(mean_wt=mean(weight),
              mean_hfl=mean(hindfoot_length),
              n=n())
str(year_summary)
## Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame':  613 obs. of  6 variables:
##  $ species_id: Factor w/ 19 levels "BA","DM","DO",..: 1 1 1 1 1 1 1 2 2 2 ...
##  $ year      : int  1989 1990 1990 1991 1991 1992 1992 1977 1977 1978 ...
##  $ sex       : Factor w/ 2 levels "F","M": 2 1 2 1 2 1 2 1 2 1 ...
##  $ mean_wt   : num  7 8.38 7 9.74 7.67 ...
##  $ mean_hfl  : num  13 13.8 14 12.8 13 ...
##  $ n         : int  3 8 3 19 6 4 2 75 106 165 ...
##  - attr(*, "vars")= chr  "species_id" "year"
##  - attr(*, "drop")= logi TRUE
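
Note that str(year_summary) reports a 'grouped_df': summarize() removes only the last grouping variable (sex), so the result is still grouped by species_id and year. Here is a minimal sketch of that behavior, using a small made-up data frame (toy is hypothetical and not part of the Portal data):

```r
library(dplyr)

# Hypothetical toy data standing in for the surveys table
toy <- data.frame(
  species_id = c("DM", "DM", "DM", "DO", "DO"),
  year       = c(1977, 1977, 1978, 1977, 1977),
  sex        = c("F", "M", "F", "F", "F"),
  weight     = c(40, 44, 42, 50, 52)
)

# Same pattern as year_summary: three grouping variables, then summarize
by_year <- toy %>%
  group_by(species_id, year, sex) %>%
  summarize(mean_wt = mean(weight), n = n())

group_vars(by_year)        # grouping columns left after summarize()

# ungroup() removes the remaining grouping
flat <- ungroup(by_year)
group_vars(flat)
```

If the leftover grouping gets in the way in a later step, ungroup() returns an ordinary table.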
#count_by_year
count_by_year <- surveys %>%
    group_by(year) %>%
    tally()
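
tally() is a shortcut for counting the rows in each group. A minimal sketch, assuming a made-up data frame (toy is hypothetical), suggests it gives the same result as summarize(n = n()):

```r
library(dplyr)

# Hypothetical toy data: one row per captured animal
toy <- data.frame(year = c(1977, 1977, 1978, 1979, 1979, 1979))

# Count rows per year with tally()
via_tally <- toy %>%
  group_by(year) %>%
  tally()

# The same count written out with summarize()
via_summarize <- toy %>%
  group_by(year) %>%
  summarize(n = n())

via_tally
via_summarize
```

Both versions return one row per year with the count in a column named n.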

1) ggplot() function

Goal: scatterplot of weight (x) by hindfoot_length (y) using the surveys dataset.

ggplot(surveys, aes(x = weight, y = hindfoot_length)) 

Empty plot! By default, ggplot() draws only the axes; we still need to tell it what kind of plot we want. The plot type is selected by adding a geom (short for geometric object).

2) geoms

ggplot(surveys, aes(x = weight, y = hindfoot_length)) + geom_point()
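
One way to see why a bare ggplot() call draws only the axes is to count the layers stored in the plot object. This is a sketch on a small made-up data frame (toy is hypothetical), and it peeks at the layers element of the plot object, which is an implementation detail rather than something the lesson relies on:

```r
library(ggplot2)

# Hypothetical toy data with the same column names as surveys
toy <- data.frame(weight = c(40, 44, 120),
                  hindfoot_length = c(36, 37, 50))

# Data and aesthetic mapping only: no geom, so nothing to draw
axes_only <- ggplot(toy, aes(x = weight, y = hindfoot_length))

# Adding geom_point() adds one layer of points
with_points <- axes_only + geom_point()

length(axes_only$layers)     # layers in the axes-only plot
length(with_points$layers)   # layers after adding geom_point()
```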

We can assign this plot to an object:

p1 <- ggplot(surveys, aes(x = weight, y = hindfoot_length)) + geom_point()
# the assignment draws nothing; type the object's name to display the plot
p1

This makes it easy to try different things using the + operator.

#log scale for x-axis
p1 + scale_x_log10()

#square root scale for x-axis
p1 + scale_x_sqrt()
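
To check what a scale does to the data, layer_data() returns the values ggplot2 computes for a layer. The sketch below uses a small made-up data frame (toy and the p_* names are hypothetical) instead of surveys. It suggests that each + returns a new plot and leaves the stored one unchanged:

```r
library(ggplot2)

# Hypothetical toy data, chosen so the transformed values are easy to read
toy <- data.frame(weight = c(10, 100, 1000),
                  hindfoot_length = c(20, 30, 40))

p_toy  <- ggplot(toy, aes(x = weight, y = hindfoot_length)) + geom_point()
p_log  <- p_toy + scale_x_log10()   # new plot; p_toy itself is not modified
p_sqrt <- p_toy + scale_x_sqrt()

layer_data(p_toy)$x    # original weights
layer_data(p_log)$x    # log10 of the weights
layer_data(p_sqrt)$x   # square roots of the weights
```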

CHALLENGE 1: Make a scatterplot of hindfoot_length vs. weight but only for species_id “DM”

  • Use the dataset we created, just_dm
  • Use our ggplot() code above but with this new dataset in place of surveys.
#Challenge solution

ggplot(just_dm, aes(x = weight, y = hindfoot_length)) + geom_point()
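
An alternative to creating a separate dataset first is to filter and plot in one pipeline. This is only a sketch on a made-up data frame (toy is hypothetical). Note that the dplyr steps are joined with %>% while the ggplot2 layers are joined with +:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data with the same column names as surveys
toy <- data.frame(
  species_id      = c("DM", "DO", "DM", "PB"),
  weight          = c(40, 50, 44, 30),
  hindfoot_length = c(36, 35, 37, 26)
)

# Filter to one species, then pass the result straight to ggplot()
p_dm <- toy %>%
  filter(species_id == "DM") %>%
  ggplot(aes(x = weight, y = hindfoot_length)) +
  geom_point()

nrow(p_dm$data)   # number of rows that reach the plot
```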

Other Aesthetics

ggplot(surveys, aes(x = weight, y = hindfoot_length)) +
    geom_point(shape="triangle")

#assign base plot to p2 to avoid extra typing
p2 <- ggplot(surveys, aes(x = weight, y = hindfoot_length))
p2 + geom_point(size=0.5)
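
Several fixed aesthetics can be set in the same geom_point() call. The sketch below uses a made-up data frame (toy is hypothetical) and assumes a ggplot2 version that accepts shape names as strings. The alpha (transparency) setting is an addition for illustration and does not come from the examples above:

```r
library(ggplot2)

# Hypothetical toy data with the same column names as surveys
toy <- data.frame(weight = c(40, 44, 120, 160),
                  hindfoot_length = c(36, 37, 50, 32))

p_toy <- ggplot(toy, aes(x = weight, y = hindfoot_length))

# Values set outside aes() apply to every point in the layer
p_fixed <- p_toy + geom_point(size = 0.5, shape = "triangle", alpha = 0.3)

# Inspect the per-point values ggplot2 will use when drawing
layer_data(p_fixed)[, c("size", "alpha")]
```

Because these values are set outside aes(), they do not depend on the data and no legend is produced for them.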