Learning Objectives
Following this assignment students should be able to:
- use version control to keep track of changes to code
- collaborate with someone else via a remote repository
Reading
Lecture Notes
Exercises
-- Set Up Git --
This exercise and the Version Control Basics assignment reference the Data Management Review problem. You do not need to complete the Data Management Review exercise for this assignment, though we encourage the review and self-evaluation of your problem solving wizardry.
You’re continuing your analyses of house-elves with Dr. Granger. Unfortunately you weren’t using version control and one day your cat jumped all over your keyboard and managed to replace your analysis code with:
    asd;fljkzbvc;iobv;iojre,nmnmbveaq389320pr9c9cd ds8 a d8of8pp
before somehow hitting `Ctrl-s` and overwriting all of your hard work.
Determined to not let this happen again you’ve committed to using `git` for version control.
Install Git for your operating system following the setup instructions. Then create a new repo at the Github organization for the class:
- Navigate to Github in a web browser and log in.
- Click the `+` at the upper right corner of the page that shows the words `Create new...` when you hover over it and choose `New repository`.
- Choose the class organization (e.g., `dcsemester`) as the `Owner` of the repo.
- Fill in a `Repository name` that follows the form `FirstnameLastname`.
- Select `Private`.
- Select `Initialize this repository with a README`.
- Click `Create Repository`.
Next, set up a project for this assignment in RStudio with the following steps:
- File -> New Project -> Version Control -> Git
- Navigate to your new Git repo -> Click the `Clone or download` button -> Click the `Copy to clipboard` button.
- Paste the `Repository URL:`. A suggested `Project directory name:` should be automatically generated.
- Choose where to `Create project as subdirectory of:`.
- Click `Create Project`.
- Check to make sure you have a `Git` tab in the upper right window.
-- First Commit --
This is a follow up to Set Up Git.
Create a new file for your analysis named `houseelf-analysis.R` and add a comment at the top describing what the analysis is intended to do.
Commit this file to version control with a good commit message. Then check to see if you can see this commit in the history.
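For reference, the same stage, commit, and check-the-history cycle can be sketched on the command line. This is only an illustration under stated assumptions: it runs in a throwaway directory rather than your class repo, and the user name, email, comment text, and commit message are placeholders.

```shell
set -eu

# Work in a throwaway directory so nothing touches a real project.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identity for this scratch repo only (placeholder values).
git config user.name "Example Student"
git config user.email "student@example.com"

# A new analysis file with a descriptive comment at the top.
printf '# Analysis of house-elf ear length and DNA data\n' > houseelf-analysis.R

# Stage the file, then commit it with a descriptive message.
git add houseelf-analysis.R
git commit -q -m "Add analysis script with header comment"

# Check that the commit shows up in the history.
git log --oneline
```

Ticking the `Staged` box and clicking `Commit` in the RStudio `Git` tab corresponds to the `git add` and `git commit` steps; the `History` view shows what `git log` prints.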
-- Importing Data --
This is a follow up to First Commit.
- Download a copy of the main data file and save it to a `data` subdirectory in your project folder.
- Commit this file to version control.
- Add some code to `houseelf-analysis.R` that imports the data into R.
- Commit these changes to version control.
-- Commit Multiple Files --
This is a follow up to Importing Data.
After talking with Dr. Granger you realize that `houseelf-earlength-dna-data.csv` is only the first of many files to come. To help keep track of the files you’ll need to number them, so rename the current file to `houseelf_earlength_dna_data_1.csv` and change your R code to reflect this name change.
Git will initially think you’ve deleted `houseelf-earlength-dna-data.csv` and created a new file `houseelf_earlength_dna_data_1.csv`. But once you click on both the old and new files to stage them, git will recognize that the file has been renamed and indicate this with an `R`.
In a single commit, add the renaming of the data file and the changes to the R file.
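The rename behaviour described above can be reproduced on the command line. The sketch below is a hedged illustration in a scratch repository: the CSV contents, the `read.csv()` line, the identity, and the commit messages are made-up placeholders, not the real course data or the expected solution.

```shell
set -eu

# Scratch repository with placeholder identity.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Student"
git config user.email "student@example.com"

# Placeholder data file and import code (contents are invented).
mkdir data
printf 'id,earlength\n1,10.2\n' > data/houseelf-earlength-dna-data.csv
printf 'data <- read.csv("data/houseelf-earlength-dna-data.csv")\n' > houseelf-analysis.R
git add .
git commit -q -m "Add data file and import code"

# Rename the file outside of git, then update the path in the R code.
mv data/houseelf-earlength-dna-data.csv data/houseelf_earlength_dna_data_1.csv
printf 'data <- read.csv("data/houseelf_earlength_dna_data_1.csv")\n' > houseelf-analysis.R

# Before staging, git reports a deleted file and an untracked file.
git status --short

# After staging both sides, git pairs them up as a rename.
git add -A
status=$(git status --short)
printf '%s\n' "$status"

# One commit containing both the rename and the code change.
git commit -q -m "Number first data file and update import path"
```

After `git add -A` the data file should be listed with status `R` and the script with status `M`, so both changes land in the same commit.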
-- Pushing Changes --
Now that you’ve set up your Github repository for collaborating with Dr. Granger and made some changes, you’d better push your work so she can see what you’re doing.
- Write a function to calculate the GC-content of a sequence, regardless of the capitalization of that sequence. (Hint: using the function `str_to_lower()` or `str_to_upper()` in the `stringr` package might be useful). This function should also be able to take a vector of sequences and return a vector of GC-contents (it probably does this without any extra work so give it a try).
- Commit this change.
- Once you’ve committed the change click the `Push` button in the upper right corner of the window and then click `OK` when `git` is done pushing.
- You should be able to see the changes you made on Github.
- Email your teacher to let them know you’ve finished this exercise. Include in the email a link to your Github repository.
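As a rough command-line counterpart to the `Push` button, the sketch below commits a change and pushes it. It makes several assumptions: a local bare repository stands in for Github, the file content is a placeholder rather than a real GC-content function, and the identity and commit message are invented.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the Github remote in this sketch.
git init -q --bare remote.git

# Clone it, as RStudio does when creating a project from version control.
git clone -q remote.git project
cd project
git config user.name "Example Student"
git config user.email "student@example.com"

# Placeholder change standing in for the GC-content function.
printf '# GC-content function goes here\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add GC-content function"

# Push the current branch and remember the remote branch for later pushes.
git push -q -u origin HEAD

# List what the remote now holds.
git ls-remote origin
```

A commit only exists on your computer until it is pushed, which is why the change appears on Github only after this last step.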
-- Pulling and Pushing --
This is a follow up to Pushing Changes.
STOP: Make sure you sent your teacher an email following the last exercise with a link to your Github repository and wait until your teacher has told you they’ve updated your repository before doing this one.
While you were working on your vectorized GC-content function, Dr. Granger (who has suddenly developed some pretty impressive computational skills) wrote some code to generate a `data.frame` with `dplyr`. To get it you’ll need to `pull` the most recent changes from Github.
- On the `Git` tab click on the `Pull` button with the blue arrow. You should see some text that looks like:

    From github.com:ethanwhite/gryffindorforever
       1e24ac8..815e600  master     -> origin/master
    Updating 1e24ac8..815e600
    Fast-forward
     testme.txt | 1 +
     1 file changed, 1 insertion(+)
     create mode 100644 youareawesome.txt

- Click `OK`.
- You should see the new lines of code in your `houseelf-analysis.R`.

    library(dplyr)
    data_size_class <- data %>%
      rowwise() %>%
      transmute(id = id, earlengthcat = get_ear_len_cat(earlength, 10))

- Modify the code to add a `gccontent` column to the `data.frame` that includes the `id` and `earlengthcat` for each individual. The `gccontent` column should hold the results of your GC-content function.
- Save this data frame as a `CSV` file using `write.csv()`.
- Commit the new code and the resulting `CSV` file and push the results to Github.
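The pull step can be simulated locally to see what a fast-forward looks like. In this sketch a bare repository stands in for Github and a second clone stands in for Dr. Granger's copy; the directory names, identities, file contents, and commit messages are all assumptions made for illustration.

```shell
set -eu

work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git

# Your clone: make a first commit and push it.
git clone -q remote.git mine
cd mine
git config user.name "Example Student"
git config user.email "student@example.com"
printf '# GC-content function goes here\n' > houseelf-analysis.R
git add houseelf-analysis.R
git commit -q -m "Add analysis script"
git push -q -u origin HEAD
cd ..

# A collaborator's clone: add a line and push it back to the remote.
git clone -q remote.git collaborator
cd collaborator
git config user.name "Example Collaborator"
git config user.email "collaborator@example.com"
printf 'library(dplyr)\n' >> houseelf-analysis.R
git commit -q -am "Load dplyr"
git push -q
cd ..

# Back in your clone: pull the collaborator's commit.
cd mine
git pull -q --ff-only
git log --oneline
```

Because your clone has no new commits of its own, the pull is a fast-forward: your branch simply moves forward to the collaborator's commit. The `--ff-only` flag makes the pull stop instead of merging if that is not the case.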
-- Project Proposal --
Familiarize yourself with the guidelines for the class project.
Set up a new GitHub repository for your class project in the class organization like you did in Set Up Git. This time name the repo `FirstnameLastnameProject`.
Write a 1-2 paragraph project proposal in a `.txt` file. Commit and push it to your class project repo to submit it for instructor feedback.
