Exercises

  1. -- List Review --

    The number of birds banded at a series of sampling sites has been counted by your field crew and entered into the following list. The first item in each sublist is an alphanumeric code for the site and the second value is the number of birds banded. Cut and paste the list into your assignment and then answer the following questions by printing the answers to the screen.

    data = [['A1', 28], ['A2', 32], ['A3', 1], ['A4', 0],
            ['A5', 10], ['A6', 22], ['A7', 30], ['A8', 19],
            ['B1', 145], ['B2', 27], ['B3', 36], ['B4', 25],
            ['B5', 9], ['B6', 38], ['B7', 21], ['B8', 12],
            ['C1', 122], ['C2', 87], ['C3', 36], ['C4', 3],
            ['D1', 0], ['D2', 5], ['D3', 55], ['D4', 62],
            ['D5', 98], ['D6', 32]]
    
    1. How many sites are there?
    2. How many birds were counted at the 7th site?
    3. How many birds were counted at the last site?
    4. What is the total number of birds counted across all sites?
    5. What is the average number of birds seen on a site?
    6. What is the total number of birds counted on sites with codes beginning with C? (Don’t just identify these sites by eye; in the real world there could be hundreds or thousands of sites.)
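
    A minimal sketch of the list operations these questions rely on, assuming Python 3 and using a short hypothetical list (sample) rather than the survey data above. The same indexing, len, sum, and str.startswith calls should carry over to the full list.

```python
# Hypothetical three-site list in the same [site_code, count] layout
sample = [['X1', 4], ['X2', 10], ['Y1', 6]]

number_of_sites = len(sample)        # number of sublists
second_site_count = sample[1][1]     # indexing starts at 0, so [1] is the 2nd site
last_site_count = sample[-1][1]      # negative indices count from the end

# Pull the counts out of the sublists, then summarize them
counts = [count for code, count in sample]
total = sum(counts)
average = total / len(counts)

# Sum only the sites whose code starts with a given letter
total_x = sum(count for code, count in sample if code.startswith('X'))

print(number_of_sites, second_site_count, last_site_count)
print(total, average, total_x)
```
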
  2. -- Data Management Review --

    Dr. Granger is interested in studying the relationship between the length of house-elves’ ears and aspects of their DNA. This research is part of a larger project attempting to understand why house-elves possess such powerful magic. She has obtained DNA samples and ear measurements from a small group of house-elves to conduct a preliminary analysis (prior to submitting a grant application to the Ministry of Magic) and she would like you to conduct the analysis for her (she might know everything there is to know about magic, but she sure doesn’t know much about computers). She has placed the file on the web for you to download.

    You might be able to do this analysis by hand in Excel, but counting all of those bases would be a lot of work, and besides, Dr. Granger seems to always get funded, which means that you’ll be doing this again soon with a much larger dataset. So, you decide to write a script so that it will be easy to do the analysis again.

    Write a Python script that:

    1. Imports the data into a data structure of your choice
    2. Loops over the rows in the dataset
    3. For each row in the dataset checks to see if the ear length is large (>10 cm) or small (<=10 cm) and determines the GC-content of the DNA sequence (i.e., the percentage of bases that are either G or C)
    4. Stores this information in a table where the first column has the ID for the individual, the second column contains the string ‘large’ or the string ‘small’ depending on the size of the individual’s ears, and the third column contains the GC-content of the DNA sequence.
    5. Prints the average GC-content for both large-eared elves and small-eared elves to the screen.
    6. Exports the table of individual level GC values to a csv (comma delimited text) file titled grangers_analysis.csv.

    This code should use functions to break the code up into manageable pieces. For example, here’s a function for importing the data from the web:

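    import csv
    import io
    import urllib.request

    def get_data_from_web(url):
        """Download a CSV file and return its rows as a list of lists."""
        with urllib.request.urlopen(url) as webpage:
            # urlopen returns bytes; wrap it as text (UTF-8 is assumed here)
            text = io.TextIOWrapper(webpage, encoding='utf-8')
            datareader = csv.reader(text)
            data = []
            for row in datareader:
                data.append(row)
        return data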
    

    This function imports the data as a list of lists. Another good option would be to use either a pandas data frame or a NumPy array. An example function using pandas looks like:

    import pandas as pd

    def get_data_from_web(url):
        data = pd.read_csv(url)
        return data
    

    Throughout the assignment feel free to use whatever data structures you prefer. Ask your instructor if you have questions about the best choices.
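
    For step 3 above, here is a minimal sketch of how the GC-content and ear-size checks might be split into functions, assuming Python 3. The function names (gc_content, ear_size_class), the example rows, and their column order are hypothetical; the real file's layout may differ.

```python
def gc_content(sequence):
    """Return the percentage of bases in `sequence` that are G or C."""
    sequence = sequence.upper()
    if len(sequence) == 0:
        return 0.0
    gc_count = sequence.count('G') + sequence.count('C')
    return 100.0 * gc_count / len(sequence)


def ear_size_class(ear_length_cm):
    """Label an ear length 'large' (>10 cm) or 'small' (<=10 cm)."""
    if ear_length_cm > 10:
        return 'large'
    return 'small'


# Hypothetical rows: (ID, ear length in cm, DNA sequence).
# The real file's column order is not given here, so this layout is assumed.
rows = [('elf1', 12.5, 'GGCCAT'), ('elf2', 8.0, 'ATATGC')]

# Build the [ID, size class, GC-content] table described in step 4
table = [[elf_id, ear_size_class(ear), gc_content(dna)]
         for elf_id, ear, dna in rows]
for record in table:
    print(record)
```

    Keeping each calculation in its own small function makes it easy to check on short sequences before running it on the full dataset.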

  3. -- Unit Conversion Challenge --

    Measures of the amount of energy used by biological processes are critical to understanding many aspects of biology from cellular physiology to ecosystem ecology. There are many different units for energy use and their utilization varies across methods, research areas, and lab groups. Write a function, convert_energy_units(energy_value, input_unit, output_unit), to convert between the following energy units: Joules (J), Kilojoules (KJ), Calories (CAL), and Kilocalories (KCAL; this is the unit used for labeling the amount of energy contained in food). A Kilojoule is 1000 Joules, a Calorie is 4.1868 Joules, and a Kilocalorie is 4186.8 Joules. An example of a call to this function would look like:

    energy_in_cals = 200
    
    energy_in_joules = convert_energy_units(energy_in_cals, "CAL", "J")
    

    Make this function more efficient by using ‘else if’ (elif) statements. If either the input unit or the output unit does not match one of the four types given above, have the function print “Sorry, I don’t know how to convert ” followed by the name of the unit provided. Use your function to answer the following questions.

    a. What is the daily metabolic energy used by a human (~2500 KCAL) in Joules?

    b. How many times more energy does a common seal use than a human? The common seal uses ~52,500 KJ/day (Nagy et al. 1999). Use the daily human metabolic cost given above.

    c. How many ergs (ERG) are there in one kilocalorie? [Since we didn’t include the erg conversion, this should trigger our ‘don’t know how to convert’ message.]

    Instead of writing an individual conversion between each pair of units (which would require 12 if statements), you could convert every input to a common scale and then convert from that common scale to the output unit. This approach also makes it much easier to add new units later.
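
    The common-scale idea can be sketched as follows, assuming Python 3 and using Joules as the common scale. Note that this sketch uses a dictionary lookup in place of the elif chain the exercise asks for, and the names convert_via_joules and JOULES_PER_UNIT are hypothetical.

```python
# Joules per one unit of each supported type (values from the exercise text)
JOULES_PER_UNIT = {
    'J': 1.0,
    'KJ': 1000.0,
    'CAL': 4.1868,
    'KCAL': 4186.8,
}


def convert_via_joules(energy_value, input_unit, output_unit):
    """Convert by way of Joules; return None if a unit is unknown."""
    for unit in (input_unit, output_unit):
        if unit not in JOULES_PER_UNIT:
            print("Sorry, I don't know how to convert " + unit)
            return None
    # Step 1: input unit -> Joules. Step 2: Joules -> output unit.
    joules = energy_value * JOULES_PER_UNIT[input_unit]
    return joules / JOULES_PER_UNIT[output_unit]


energy_in_joules = convert_via_joules(200, 'CAL', 'J')
print(energy_in_joules)
```

    With this layout, supporting a new unit means adding one entry to the dictionary rather than a new conversion for every existing unit.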

  4. -- Tree Biomass --

    Understanding the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions. Measuring the mass of entire trees is difficult, and it’s pretty much impossible to weigh an entire forest (even if we were willing to clear cut a forest for science), but fortunately we can estimate the mass of a tree based on its diameter.

    There are lots of equations for estimating the mass of a tree from its diameter, but one good option is the equation M = 0.124*D^(2.53), where M is measured in kg of dry (above-ground) biomass and D is in cm d.b.h. (Brown 1997). We’re going to estimate the total tree biomass for trees in a 96 hectare area of the Western Ghats in India.

    1. Write a function that takes an array/Series of tree diameters as an argument and returns an array/Series of tree masses.
    2. The raw data is available on Ecological Archives, but unfortunately, due to poor database structure, using all of the trees would be a hassle. You could try to solve this problem yourself, but it turns out that someone else has already solved it for you. Install the EcoData Retriever and use it to download and clean up this data automatically (using the command line interface, the command would be retriever install csv Ramesh2010, and the data will be stored in Ramesh2010-macroplots.csv), then import it into Python.
    3. If you look at the file or the metadata carefully you’ll notice that the data is actually in girth (i.e., circumference, which is equal to pi * diameter) rather than diameter. Write a function that takes an array/Series of circumferences as an argument and returns an array/Series of diameters. Use the math module to get an accurate value of pi.
    4. Use the two functions you’ve written to estimate the total biomass (i.e., the sum of the masses) of trees in this dataset and print the result to the screen.
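
    A minimal sketch of how the two functions might fit together, assuming Python 3. It uses plain lists so that it runs without extra packages; with a NumPy array or pandas Series the same arithmetic applies element-wise without the list comprehensions. The function names and the girth values are hypothetical, and the girths are assumed to be in cm.

```python
import math


def girth_to_diameter(girths):
    """Convert circumferences to diameters (same length unit in and out)."""
    return [girth / math.pi for girth in girths]


def diameter_to_mass(diameters):
    """Estimate dry above-ground biomass (kg) from d.b.h. (cm), Brown 1997."""
    return [0.124 * diameter ** 2.53 for diameter in diameters]


# Hypothetical girths (assumed cm), standing in for the girth column of the CSV
girths = [31.4, 62.8, 94.2]
masses = diameter_to_mass(girth_to_diameter(girths))
total_biomass = sum(masses)
print(total_biomass)
```
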