Exercises: Basic

The use of the c and sum functions

This exercise uses epidemiological data. Vicente et al. (2006) analysed data from observations of wild boar and red deer reared on a number of estates in Spain. The dataset contains information on tuberculosis (Tb) in both species, and on the parasite Elaphostrongylus cervi, which only infects red deer.

In Zuur et al. (2009), Tb was modelled as a function of the continuous explanatory variable LengthCT, the length of the animal (CT is an abbreviation of cabeza-tronco, Spanish for head-body). Tb and Ecervi are each coded as zeros and ones, representing the absence or presence of Tb and of E. cervi larvae. The first seven rows of the spreadsheet containing the deer data are given below.

Farm  Month  Year  Sex  LengthClass  LengthCT  Ecervi  Tb
MO    11     00    1    1            75        0       0
MO    07     00    2    1            85        0       0
MO    07     01    2    1            91.6      0       1
MO    NA     NA    2    1            95        NA      NA
LN    09     03    1    1            NA        0       0
SE    09     03    2    1            105.5     0       0
QM    11     02    2    1            106       0       0

  1. Using the c() function, create a variable that contains the length values of the seven animals.
  2. Create a variable that contains the Tb values. Include the NAs.
  3. What is the average length of the seven animals?
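
One possible starting point is sketched below. The variable names Length and Tb are choices made here for illustration; the values are copied from the seven rows shown above. Because one length is missing, sum() and mean() return NA unless the missing value is dropped with na.rm = TRUE.

```r
# Length and Tb values copied from the seven rows of the deer data
Length <- c(75, 85, 91.6, 95, NA, 105.5, 106)
Tb     <- c(0, 0, 1, NA, 0, 0, 0)

# With a missing value present, the plain sum is NA
sum(Length)

# Dropping the NA gives the sum and mean of the six recorded lengths
sum(Length, na.rm = TRUE)
mean(Length, na.rm = TRUE)
```

Note that the mean obtained this way is the average of six animals, not seven, because the fifth length was not recorded.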

The use of the cbind() function using epidemiological data

Continue with the deer data.

  1. Create variables Farm and Month that contain the relevant information.
    - Note that Farm is a string of characters.
  2. Use the cbind() function to combine the month, length, and Tb data, and store the result in a variable named Boar.
    - Make sure that you can extract rows, columns, and elements of Boar.
  3. Use the dim(), nrow(), and ncol() functions to determine the number of animals and variables in Boar.
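
A minimal sketch of these steps is given below, assuming the values are typed in from the table in the first exercise. The variable names Month, Length, and Tb are illustrative choices; Boar is the name requested above.

```r
# Values copied from the seven rows of the deer data
Farm   <- c("MO", "MO", "MO", "MO", "LN", "SE", "QM")  # character strings
Month  <- c(11, 7, 7, NA, 9, 9, 11)
Length <- c(75, 85, 91.6, 95, NA, 105.5, 106)
Tb     <- c(0, 0, 1, NA, 0, 0, 0)

# Combine the three numeric variables column-wise into a matrix
Boar <- cbind(Month, Length, Tb)

Boar[3, ]          # third row (third animal)
Boar[, "Length"]   # one column, selected by name
Boar[3, 2]         # a single element: length of the third animal

dim(Boar)          # number of rows and columns
nrow(Boar)         # number of animals
ncol(Boar)         # number of variables
```

Farm is left out of cbind() here on purpose: a matrix holds a single data type, so including a character vector would convert every column to character.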

The use of the vector function using epidemiological data

Continue with the deer data.

  1. In the first exercise you used the c() function to combine the Tb data. Can you create the same variable with the vector() function?
  2. Give the vector a different name, for example, Tb2.
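
One way this could be approached is sketched below: vector() creates an empty numeric vector of a given length (filled with zeros), and individual elements are then assigned by index. Only the two values that differ from zero need to be changed.

```r
# A numeric vector of length seven; every element starts as 0
Tb2 <- vector(mode = "numeric", length = 7)

# Fill in the values that are not zero in the deer data
Tb2[3] <- 1    # third animal tested positive
Tb2[4] <- NA   # fourth animal has a missing Tb value

Tb2
```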

Using the read.table() function and accessing variables from a data frame with epidemiological data

The file BirdFlu.xls contains the annual number of confirmed human cases of avian influenza A/(H5N1) for several countries, as reported to the World Health Organization (WHO). The data were taken from the WHO website (www.who.int/en/) and reproduced for educational purposes.

  1. Prepare the spreadsheet and import these data into R.
    - If you are a non-Windows user, start with the file BirdFlu.txt.
    - Note that you will need to adjust the column names and some of the country names.
  2. Use the names() and str() functions in R to view the data.
  3. What is the total number of bird flu cases in 2003 and in 2005?
  4. Which country has had the most cases?
  5. Which country has had the fewest bird flu deaths?
  6. What is the total number of bird flu cases per country?
  7. What is the total number of cases per year?
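
The general pattern of importing a tab-delimited text file and summing over rows and columns can be sketched as follows. The example does not use the BirdFlu files: it writes a small made-up file to a temporary location, and the column names (Year, RegionA, RegionB) and all numbers are hypothetical. The layout of the real file may differ, so the code will need adapting.

```r
# Write a tiny, made-up tab-delimited file (NOT the real bird flu data)
toy_file <- tempfile(fileext = ".txt")
writeLines(c("Year\tRegionA\tRegionB",
             "2003\t1\t3",
             "2004\t4\t0",
             "2005\t2\t5"), toy_file)

# Import it; header = TRUE uses the first line as column names
Toy <- read.table(toy_file, header = TRUE, sep = "\t")
names(Toy)
str(Toy)

# Drop the Year column, then total the counts by row and by column
cases      <- Toy[, -1]
per_year   <- rowSums(cases)
names(per_year) <- Toy$Year
per_region <- colSums(cases)

per_year                       # totals per year
per_region                     # totals per (hypothetical) region
names(which.max(per_region))   # region with the largest total
```

For the real data, the file argument would be the path to BirdFlu.txt (or the text file exported from the spreadsheet), and which.min() can be used in the same way as which.max().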

Using the read.table() function and accessing subsets of a data frame with deep sea research data

Import the data from the ISIT.xls file.

  1. In R, extract the data from station 1.
  2. How many observations were made at this station?
  3. What are the minimum, median, mean, and maximum sampled depth at station 1?
  4. What are the minimum, median, mean, and maximum sampled depth at station 2?
  5. At station 3?
  6. Identify any stations with considerably fewer observations.
  7. Create a new data frame omitting these stations.
  8. Extract the data from 2002.
  9. Extract the data from April (of all years).
  10. Extract the data that were measured at depths greater than 2000 meters (from all years and months).
  11. Show the data sorted by increasing depth.
  12. Show the data that were measured at depths greater than 2000 meters in April.
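
The subsetting techniques needed for these tasks can be sketched on a small made-up data frame. The column names (Station, SampleDepth, Month, Year) and all values below are assumptions for illustration only; check names() on the imported ISIT data to see what the real columns are called.

```r
# A made-up data frame standing in for the imported ISIT data
Toy <- data.frame(
  Station     = c(1, 1, 1, 2, 2, 3),
  SampleDepth = c(500, 2500, 1500, 3000, 800, 2200),
  Month       = c(4, 4, 4, 8, 8, 4),
  Year        = c(2001, 2001, 2001, 2001, 2001, 2002)
)

# Rows from one station, and summary statistics for its depths
St1 <- Toy[Toy$Station == 1, ]
nrow(St1)
summary(St1$SampleDepth)

# Number of observations per station
table(Toy$Station)

# Omit a station, here station 3
Kept <- Toy[Toy$Station != 3, ]

# Conditions can be combined with & (and) or | (or)
Deep      <- Toy[Toy$SampleDepth > 2000, ]
DeepApril <- Toy[Toy$SampleDepth > 2000 & Toy$Month == 4, ]

# Sort all rows by increasing depth
Sorted <- Toy[order(Toy$SampleDepth), ]
```

The comma inside the square brackets matters: the condition before the comma selects rows, and leaving the part after the comma empty keeps all columns.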

Using the write.table() function with deep sea research data

In the final step of the previous exercise, data measured at depths greater than 2000 meters in April were extracted.

  1. Export these data to a new ASCII file.
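
A minimal sketch of the export step is shown below. The data frame DeepApril, its column names, its values, and the output file name are all made up for illustration; in the exercise, the data frame extracted from the ISIT data would be used instead, and the file would be written to a directory of your choice rather than a temporary one.

```r
# A made-up data frame standing in for the extracted subset
DeepApril <- data.frame(
  Station     = c(1, 3),
  SampleDepth = c(2500, 2200),
  Month       = c(4, 4)
)

# Write a tab-delimited text file without quotes or row names
out_file <- file.path(tempdir(), "DeepApril.txt")
write.table(DeepApril, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back in to check that the export worked
Check <- read.table(out_file, header = TRUE, sep = "\t")
Check
```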

Using the factor() function and accessing subsets of a data frame with deep sea research data

Stations 1 through 5 were sampled in April 2001, stations 6 through 11 in August 2001, stations 12 through 15 in March 2002, and stations 16 through 19 in October 2002.

  1. Create two new variables in R to identify the month and the year.
    - Note that these are factors.
    - Do this by adding the new variables inside the data frame.
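
One possible approach is sketched below on a made-up data frame that contains only a Station column; the column name Station and the new names fMonth and fYear are assumptions for illustration. The station ranges follow the sampling schedule given above.

```r
# A made-up data frame with a few station numbers
Toy <- data.frame(Station = c(1, 5, 6, 11, 12, 15, 16, 19))

# Build the month labels from the station ranges
month_name <- rep(NA_character_, nrow(Toy))
month_name[Toy$Station >= 1  & Toy$Station <= 5]  <- "April"
month_name[Toy$Station >= 6  & Toy$Station <= 11] <- "August"
month_name[Toy$Station >= 12 & Toy$Station <= 15] <- "March"
month_name[Toy$Station >= 16 & Toy$Station <= 19] <- "October"

# Add both variables to the data frame as factors;
# the levels argument puts the months in calendar order
Toy$fMonth <- factor(month_name,
                     levels = c("March", "April", "August", "October"))
Toy$fYear  <- factor(ifelse(Toy$Station <= 11, 2001, 2002))

str(Toy)
```

Assigning with Toy$fMonth <- ... stores the new variable inside the data frame, so it stays aligned with the other columns when rows are subsetted or sorted.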

data: ISIT.xls ISIT.txt deer.xls

data: Birdflu.txt Birdflu.xls Birdflu_Corrected.txt