Remember to

  • download surveys.csv.
  • download species.csv.
  • download portal_mammals.sqlite.

  • Consider removing the dplyr package so you can demonstrate installing it.
    • Linux users: you may not want to do this because the source install is slow

INSTALL WARNING: Some Windows users have issues installing dplyr receiving errors that the associate files cannot be copied. This is an issue with McAfee anti-virus. Streaming protection should be temporarily disabled to allow installation.


Basic dplyr

  • Start a new project (modeling good practice)
surveys <- read.csv("surveys.csv")

Do Exercise 2 - Shrub Volume Data Basics.


surveys_by_species <- group_by(surveys, species_id)

Do Exercise 3 - Shrub Volume Aggregation.


species <- read.csv("species.csv")
combined <- inner_join(surveys, species, by = "species_id")

Do Exercise 4 - Shrub Volume Join.


surveys_DS <- filter(surveys, species_id == "DS")
surveys_DS_by_yr <- group_by(surveys_DS, year)
avg_weight_DS_by_yr <- summarize(surveys_DS_by_yr, 
                                 avg_weight = mean(weight, na.rm=TRUE))
surveys %>%
  filter(species_id == "DS") %>%
  group_by(year) %>%
  summarize(avg_weight = mean(weight, na.rm=TRUE))

Do Fix the Code.

Using dplyr with databases

portaldb <- src_sqlite("portal_mammals.sqlite")
surveys <- tbl(portaldb, "surveys")
query <- "SELECT year, month, day, genus, species
          FROM surveys JOIN species
          ON surveys.species_id = species.species_id"
tbl(portaldb, sql(query))
# Loading from SQLite completes instantly
bbs_sqlite <- src_sqlite("bbs.sqlite")
bbs_counts <- tbl(bbs_sqlite, "bbs_counts")

# Loading from csv takes 30 seconds
bbs_counts_csv <- read.csv("BBS_counts.csv")

Do Links to Databases.