Learning Objectives
Following this assignment students should be able to:
- use version control to keep track of changes to code
- collaborate with someone else via a remote repository
Reading
Lecture Notes
Exercises
-- Set Up Git --
This exercise and the Version Control Basics assignment reference the Data Management Review problem. You do not need to complete the Data Management Review exercise for this assignment, though we encourage you to review it and evaluate your problem-solving wizardry.
You’re continuing your analyses of house-elves with Dr. Granger. Unfortunately you weren’t using version control, and one day your cat jumped all over your keyboard and managed to replace your analysis code with:
asd;fljkzbvc;iobv;iojre,nmnmbveaq389320pr9c9cd ds8 a d8of8pp
before somehow hitting Ctrl-s and overwriting all of your hard work.
Determined not to let this happen again, you’ve committed to using git for version control.
Install Git for your operating system following the setup instructions. Then create a new repo in the GitHub organization for the class:
- Navigate to GitHub in a web browser and log in.
- Click the + at the upper right corner of the page, which shows the words Create new... when you hover over it, and choose New repository.
- Choose the class organization (e.g., dcsemester) as the Owner of the repo.
- Fill in a Repository name that follows the form FirstnameLastname.
- Select Private.
- Select Initialize this repository with a README.
- Click Create Repository.
Next, set up a project for this assignment in RStudio with the following steps:
- File -> New Project -> Version Control -> Git
- Navigate to your new Git repo -> Click the Clone or download button -> Click the Copy to clipboard button.
- Paste the URL into the Repository URL: field. A suggested Project directory name: should be generated automatically.
- Under Create project as subdirectory of:, choose where to create the project.
- Click Create Project.
- Check to make sure you have a Git tab in the upper right window.
-- First Commit --
This is a follow up to Set Up Git.
Create a new file for your analysis named houseelf-analysis.R and add a comment at the top describing what the analysis is intended to do.
Commit this file to version control with a good commit message. Then check to see if you can see this commit in the history.
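If you are curious what RStudio does behind the scenes, the same first commit can be sketched on the command line. This is only an illustration under stated assumptions: it works in a throwaway directory rather than your class repo, and the user name, email, comment text, and commit message are placeholders.

```shell
# Work in a throwaway directory so this sketch never touches a real project.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# Create the analysis file with a descriptive comment at the top.
echo "# Analysis of house-elf ear length and DNA data" > houseelf-analysis.R

# Stage the file, commit it with a descriptive message, then view the history.
git add houseelf-analysis.R
git commit -q -m "Add analysis script with description comment"
git log --oneline
```

The last command lists one line per commit, which is the same information shown in the history view of the RStudio Git tab.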
-- Importing Data --
This is a follow up to First Commit.
- Download a copy of the main data file and save it to a data subdirectory in your project folder.
- Commit this file to version control.
- Add some code to houseelf-analysis.R that imports the data into R.
- Commit these changes to version control.
-- Commit Multiple Files --
This is a follow up to Importing Data.
After talking with Dr. Granger you realize that houseelf-earlength-dna-data.csv is only the first of many files to come. To help keep track of the files you’ll need to number them, so rename the current file houseelf_earlength_dna_data_1.csv and change your R code to reflect this name change.
Git will initially think you’ve deleted houseelf-earlength-dna-data.csv and created a new file houseelf_earlength_dna_data_1.csv. But once you click on both the old and new files to stage them, git will recognize the change as a rename and mark it with an R.
In a single commit, add the renaming of the data file and the changes to the R file.
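The rename behavior described above can also be seen from the command line. The sketch below is an illustration, not the required workflow: it uses a throwaway repository, a placeholder identity, and placeholder file contents instead of the real data and analysis code.

```shell
# Throwaway repository with a data file and a script that reads it.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity
mkdir data
echo "placeholder contents standing in for the real data" > data/houseelf-earlength-dna-data.csv
echo 'data <- read.csv("data/houseelf-earlength-dna-data.csv")' > houseelf-analysis.R
git add .
git commit -q -m "Add data file and import code"

# Rename the data file and update the path used in the R code.
mv data/houseelf-earlength-dna-data.csv data/houseelf_earlength_dna_data_1.csv
echo 'data <- read.csv("data/houseelf_earlength_dna_data_1.csv")' > houseelf-analysis.R

# Stage the deletion, the new file, and the edited script together.
git add -A
git status --short   # the data file should now be listed as a rename (R)

# Record the rename and the code change in a single commit.
git commit -q -m "Number the data file and update the import path"
```

Before staging, git status would list the old file as deleted and the new one as untracked. Git only pairs them up as a rename once both are staged, which is why both files need to be checked in the Git tab.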
-- Pushing Changes --
Now that you’ve set up your GitHub repository for collaborating with Dr. Granger and made some changes, you’d better share some work with her so she can see what you’re doing.
- Write a function to calculate the GC-content of a sequence, regardless of the capitalization of that sequence. (Hint: the function str_to_lower() or str_to_upper() in the stringr package might be useful.) This function should also be able to take a vector of sequences and return a vector of GC-contents (it probably does this without any extra work, so give it a try).
- Commit this change.
- Once you’ve committed the change, click the Push button in the upper right corner of the window and then click OK when git is done pushing.
- You should be able to see the changes you made on GitHub.
- Email your teacher to let them know you’ve finished this exercise. Include in the email a link to your GitHub repository.
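The commit and push steps above correspond roughly to the commands below. This is a sketch under assumptions: a local bare repository stands in for your GitHub repo so the example runs without a network connection, and the identity, file contents, and commit message are placeholders.

```shell
# A local bare repository stands in for the remote GitHub repo (assumption).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# Clone it, as RStudio did when you created the project.
git clone -q "$work/remote.git" "$work/project"
cd "$work/project"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity

# Commit a change locally; at this point the remote knows nothing about it.
echo "# GC-content function goes here" > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add GC-content function"

# Push the current branch to the remote so collaborators can see it.
git push -q origin HEAD
```

Committing only records the change on your computer. The change becomes visible on the remote after the push, which is why the exercise asks you to check GitHub afterwards.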
-- Pulling and Pushing --
This is a follow up to Pushing Changes.
STOP: Make sure you sent your teacher an email following the last exercise with a link to your Github repository and wait until your teacher has told you they’ve updated your repository before doing this one.
While you were working on your vectorized GC-content function, Dr. Granger (who has suddenly developed some pretty impressive computational skills) wrote some code to generate a data.frame with dplyr. To get it you’ll need to pull the most recent changes from GitHub.
- On the Git tab click on the Pull button with the blue arrow. You should see some text that looks like:
From github.com:ethanwhite/gryffindorforever
   1e24ac8..815e600  master     -> origin/master
Updating 1e24ac8..815e600
Fast-forward
 testme.txt | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 youareawesome.txt
- Click OK.
- You should see the new lines of code in your houseelf-analysis.R.
library(dplyr)
data_size_class <- data %>%
  rowwise() %>%
  transmute(id = id, earlengthcat = get_ear_len_cat(earlength, 10))
- Modify the code to add a gccontent column to the data.frame that includes the id and earlengthcat for each individual. The gccontent column should hold the results of your GC-content function.
- Save this data frame as a CSV file using write.csv().
- Commit the new code and the resulting CSV file and push the results to GitHub.
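The pull, commit, and push cycle in this exercise can be sketched on the command line as follows. Everything here is a stand-in: a local bare repository plays the role of GitHub, a second clone plays the role of Dr. Granger, and the identities, commit messages, and the CSV file name houseelf-results.csv are hypothetical.

```shell
# Local stand-ins for GitHub and for the two collaborators (assumptions).
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"

# You: make a first commit and push it.
git clone -q "$work/remote.git" "$work/student"
cd "$work/student"
git config user.name "Example Student"        # placeholder identity
git config user.email "student@example.com"   # placeholder identity
echo "# analysis script" > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add analysis script"
git push -q origin HEAD
branch="$(git symbolic-ref --short HEAD)"     # name of the branch you are on

# Collaborator: clone, add code, and push.
git clone -q "$work/remote.git" "$work/granger"
cd "$work/granger"
git config user.name "Example Collaborator"      # placeholder identity
git config user.email "collaborator@example.com" # placeholder identity
echo "library(dplyr)" >> houseelf-analysis.R
git commit -q -am "Add dplyr code"
git push -q origin HEAD

# You: pull the collaborator's commit (a fast-forward), then add your own work.
cd "$work/student"
git pull -q --ff-only origin "$branch"
echo "id,earlengthcat,gccontent" > houseelf-results.csv   # hypothetical file name, header only
git add houseelf-results.csv
git commit -q -m "Add results CSV"
git push -q origin HEAD
```

Pulling before you start editing keeps your history a simple continuation of the collaborator's, which is what the Fast-forward line in the pull output indicates.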
-- Project Proposal --
Familiarize yourself with the guidelines for the class project.
Set up a new GitHub repository for your class project in the class organization like you did in Set Up Git. This time name the repo FirstnameLastnameProject.
Write a 1-2 paragraph project proposal in a .txt file. Commit and push it to your class project repo to submit it for instructor feedback.