Learning Objectives

After completing this assignment, students should be able to:

  • use version control to keep track of changes to code
  • collaborate with someone else via a remote repository


Lecture Notes

  1. Version Control
  2. Project Structure


  1. -- Set Up Git --

    This exercise and the Version Control Basics assignment reference the Data Management Review problem. You do not need to complete the Data Management Review exercise for this assignment, though we encourage you to review it and evaluate your own problem-solving wizardry.

    You’re continuing your analyses of house-elves with Dr. Granger. Unfortunately you weren’t using version control, and one day your cat jumped all over your keyboard and managed to replace your analysis code with gibberish before somehow hitting Ctrl-s and overwriting all of your hard work.

    Determined not to let this happen again, you’ve committed to using Git for version control.

    Install Git for your operating system following the setup instructions. Then create a new repo in the GitHub organization for the class:

    1. Navigate to GitHub in a web browser and log in.
    2. Click the + in the upper right corner of the page (hovering over it shows the words Create new...) and choose New repository.
    3. Choose the class organization (e.g., dcsemester) as the Owner of the repo.
    4. Fill in a Repository name that follows the form FirstnameLastname.
    5. Select Private.
    6. Select Initialize this repository with a README.
    7. Click Create Repository.

    Next, set up a project for this assignment in RStudio with the following steps:

    1. In RStudio, choose File -> New Project -> Version Control -> Git.
    2. In your web browser, navigate to your new repo on GitHub -> Click the Clone or download button -> Click the Copy to clipboard button.
    3. Back in RStudio, paste the URL into the Repository URL: field. A suggested Project directory name: should be generated automatically.
    4. Choose a location under Create project as subdirectory of:.
    5. Click Create Project.
    6. Check to make sure you have a Git tab in the upper right window.
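
    For reference, the cloning that RStudio performs in the steps above can be sketched on the command line. This is a minimal sketch, not the required workflow: a local bare repository stands in for the GitHub repo (normally you would paste the URL copied from GitHub), and the temporary directory is an assumption made only so the example runs anywhere.

```shell
# Throwaway working area so nothing real is touched.
work="$(mktemp -d)"

# Stand-in for the repository on GitHub (assumption: a local bare repo
# replaces the URL you would normally copy from the GitHub page).
git init -q --bare "$work/FirstnameLastname.git"

# Clone it, which is what RStudio's Version Control -> Git dialog does.
git clone -q "$work/FirstnameLastname.git" "$work/FirstnameLastname"
cd "$work/FirstnameLastname"

# The clone remembers where it came from under the name "origin".
git remote -v
```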

  2. -- First Commit --

    This is a follow-up to Set Up Git.

    Create a new file for your analysis named houseelf-analysis.R and add a comment at the top describing what the analysis is intended to do.

    Commit this file to version control with a good commit message. Then check to see if you can see this commit in the history.
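
    The same stage, commit, and check-history cycle can be sketched with Git on the command line. This is a sketch under stated assumptions: it uses a throwaway repository, a placeholder name and email, and an illustrative header comment and commit message rather than anything prescribed by the assignment.

```shell
# Throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Placeholder identity (assumption), set only for this repository.
git config user.name "Example Student"
git config user.email "student@example.com"

# Create the analysis file with a descriptive comment at the top.
printf '# Analysis of house-elf ear length and DNA data\n' > houseelf-analysis.R

# Stage the file, then commit it with a message describing the change.
git add houseelf-analysis.R
git commit -q -m "Add analysis script with header comment"

# The commit should now appear in the history.
git log --oneline
```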

  3. -- Importing Data --

    This is a follow-up to First Commit.

    1. Download a copy of the main data file and save it to a data subdirectory in your project folder.
    2. Commit this file to version control.
    3. Add some code to houseelf-analysis.R that imports the data into R.
    4. Commit these changes to version control.
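
    The steps above make two separate commits, one for the data and one for the code. A command-line sketch of that sequence follows. The CSV here is a one-line stand-in for the real download (an assumption, since the real file comes from the course site), and the read.csv() line is just one possible way to import the data.

```shell
# Throwaway repository with the analysis script already committed.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"  # placeholder identity
printf '# Analysis of house-elf ear length and DNA data\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add analysis script with header comment"

# Steps 1-2: put the data file in a data subdirectory and commit it.
mkdir -p data
printf 'id,earlength\n' > data/houseelf-earlength-dna-data.csv  # stand-in content
git add data/houseelf-earlength-dna-data.csv
git commit -q -m "Add house-elf ear length and DNA data"

# Steps 3-4: add import code to the script and commit that change.
printf 'data <- read.csv("data/houseelf-earlength-dna-data.csv")\n' >> houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Import house-elf data"

# Three commits should now be listed, newest first.
git log --oneline
```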

  4. -- Commit Multiple Files --

    This is a follow-up to Importing Data.

    After talking with Dr. Granger you realize that houseelf-earlength-dna-data.csv is only the first of many files to come. To help keep track of the files you’ll need to number them, so rename the current file to houseelf_earlength_dna_data_1.csv and change your R code to reflect the new name.

    Git will initially think you’ve deleted houseelf-earlength-dna-data.csv and created a new file houseelf_earlength_dna_data_1.csv. But once you click on both the old and new files to stage them, Git will recognize the change as a rename and mark it with an R.

    Commit the renamed data file and the changes to the R file together in a single commit.
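
    On the command line, the rename and the matching code change can be staged together and committed once. The sketch below assumes a throwaway repository with stand-in file contents; git mv is used here as a shortcut that renames the file and stages both sides of the rename in one step.

```shell
# Throwaway repository holding the script and the original data file.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"  # placeholder identity
mkdir -p data
printf 'id,earlength\n' > data/houseelf-earlength-dna-data.csv  # stand-in content
printf 'data <- read.csv("data/houseelf-earlength-dna-data.csv")\n' > houseelf-analysis.R
git add .
git commit -q -m "Add data and analysis script"

# Rename the data file; git mv stages the old and new names together.
git mv data/houseelf-earlength-dna-data.csv data/houseelf_earlength_dna_data_1.csv

# Update the file name in the R code (written to a temp file for portability).
sed 's/houseelf-earlength-dna-data\.csv/houseelf_earlength_dna_data_1.csv/' \
  houseelf-analysis.R > houseelf-analysis.R.tmp
mv houseelf-analysis.R.tmp houseelf-analysis.R
git add houseelf-analysis.R

# The data file should be listed with an R (renamed), the script with an M.
git status --short

# One commit containing both changes.
git commit -q -m "Number the data file and update the import path"
```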

  5. -- Pushing Changes --

    Now that you’ve set up your GitHub repository for collaborating with Dr. Granger and made some changes, you’d better send her some work so she can see what you’re doing.

    1. Write a function to calculate the GC-content of a sequence, regardless of the capitalization of that sequence. (Hint: using the function str_to_lower() or str_to_upper() in the stringr package might be useful). This function should also be able to take a vector of sequences and return a vector of GC-contents (it probably does this without any extra work so give it a try).
    2. Commit this change.
    3. Once you’ve committed the change, click the Push button in the upper right corner of the window and then click OK when Git is done pushing.
    4. You should be able to see the changes you made on GitHub.
    5. Email your teacher to let them know you’ve finished this exercise. Include in the email a link to your GitHub repository.
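
    The Push button corresponds to git push on the command line. The following is a rough sketch, assuming a local bare repository as a stand-in for GitHub and a placeholder line in place of your actual GC-content function.

```shell
# Throwaway area: a bare repo plays the role of GitHub (assumption).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/project"
cd "$work/project"
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"  # placeholder identity

# Commit a change locally (placeholder for the GC-content function).
printf '# GC-content function would go here\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add GC-content function"

# Push the current branch; -u records the remote branch as its upstream
# so later pushes and pulls need no extra arguments.
git push -q -u origin HEAD

# The remote should now list the commit that was just pushed.
git ls-remote origin
```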

  6. -- Pulling and Pushing --

    This is a follow-up to Pushing Changes.

    STOP: Make sure you emailed your teacher a link to your GitHub repository after the last exercise, and wait until your teacher tells you they’ve updated your repository before starting this one.

    While you were working on your vectorized GC-content function, Dr. Granger (who has suddenly developed some pretty impressive computational skills) wrote some code to generate a data.frame with dplyr. To get it you’ll need to pull the most recent changes from GitHub.

    1. On the Git tab click on the Pull button with the blue arrow. You should see some text that looks like:

      From github.com:ethanwhite/gryffindorforever
         1e24ac8..815e600  master     -> origin/master
      Updating 1e24ac8..815e600
       testme.txt | 1 +
       1 file changed, 1 insertion(+)
      create mode 100644 youareawesome.txt
    2. Click OK.
    3. You should see the new lines of code in your houseelf-analysis.R file:

      data_size_class <-
        data %>% 
        rowwise() %>% 
        transmute(id = id, earlengthcat = get_ear_len_cat(earlength, 10))
    4. Modify the code so that the resulting data.frame also has a gccontent column, alongside the id and earlengthcat for each individual. The gccontent column should hold the results of your GC-content function.
    5. Save this data frame as a CSV file using write.csv().
    6. Commit the new code and the resulting CSV file, then push the results to GitHub.
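
    The pull, edit, commit, and push cycle above can be sketched from the command line. Everything here is a stand-in: a local bare repository plays the role of GitHub, a second clone plays the role of Dr. Granger, and the appended lines are placeholders for real code.

```shell
# A bare repo stands in for GitHub (assumption).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Your clone: commit a script and push it.
git clone -q "$work/remote.git" "$work/you"
cd "$work/you"
git config user.name "Example Student"       # placeholder identity
git config user.email "student@example.com"  # placeholder identity
printf '# House-elf analysis\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add analysis script"
git push -q -u origin HEAD

# A second clone plays the collaborator, who adds code and pushes it.
git clone -q "$work/remote.git" "$work/collaborator"
cd "$work/collaborator"
git config user.name "Example Collaborator"       # placeholder identity
git config user.email "collaborator@example.com"  # placeholder identity
printf '# code added by a collaborator\n' >> houseelf-analysis.R
git commit -q -am "Add ear length categories"
git push -q origin HEAD

# Back in your clone: pull to receive the collaborator's commit.
cd "$work/you"
git pull -q --ff-only
cat houseelf-analysis.R

# Make your own change on top, then commit and push it back.
printf '# gccontent column added here\n' >> houseelf-analysis.R
git commit -q -am "Add gccontent column"
git push -q
```

    Pulling before you start editing keeps your history a simple continuation of the remote's, which is why --ff-only succeeds in this sketch.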

  7. -- Project Proposal --

    Familiarize yourself with the guidelines for the class project.

    Set up a new GitHub repository for your class project in the class organization like you did in Set Up Git. This time name the repo FirstnameLastnameProject.

    Write a 1-2 paragraph project proposal in a .txt file. Commit and push it to your class project repo to submit it for instructor feedback.