Exercises: Basic
The use of the c and sum functions
This exercise uses epidemiological data. Vicente et al. (2006) analysed data from observations of
wild boar and red deer reared on a number of estates in Spain. The dataset contains information
on tuberculosis (Tb) in both species, and on the parasite Elaphostrongylus cervi, which only infects
red deer.
In Zuur et al. (2009), Tb was modelled as a function of the continuous explanatory variable, length
of the animal, denoted by LengthCT (CT is an abbreviation of cabeza-tronco, which is Spanish for
head-body). Tb and Ecervi are shown as a vector of zeros and ones representing absence or
presence of Tb and E. cervi larvae. Below, the first seven rows of the spreadsheet containing the
deer data are given.
| Farm | Month | Year | Sex | LengthClass | LengthCT | Ecervi | Tb |
|------|-------|------|-----|-------------|----------|--------|----|
| MO | 11 | 00 | 1 | 1 | 75 | 0 | 0 |
| MO | 07 | 00 | 2 | 1 | 85 | 0 | 0 |
| MO | 07 | 01 | 2 | 1 | 91.6 | 0 | 1 |
| MO | NA | NA | 2 | 1 | 95 | NA | NA |
| LN | 09 | 03 | 1 | 1 | NA | 0 | 0 |
| SE | 09 | 03 | 2 | 1 | 105.5 | 0 | 0 |
| QM | 11 | 02 | 2 | 1 | 106 | 0 | 0 |
- Using the c() function, create a variable that contains the length values of the seven animals.
- Create a variable that contains the Tb values. Include the NAs.
- What is the average length of the seven animals?
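As a minimal sketch of how c() and sum() fit together here, the values below are copied from the LengthCT and Tb columns of the table. The names n_obs and mean_length are illustrative only. One length is missing, so the sketch assumes the average should be taken over the six recorded lengths.

```r
# Length values (LengthCT) of the seven animals, copied from the table
LengthCT <- c(75, 85, 91.6, 95, NA, 105.5, 106)

# Tb values, keeping the missing observation as NA
Tb <- c(0, 0, 1, NA, 0, 0, 0)

# sum() returns NA if any element is NA, so missing values are dropped
# explicitly (assumption: average over the recorded lengths only)
n_obs <- sum(!is.na(LengthCT))
mean_length <- sum(LengthCT, na.rm = TRUE) / n_obs
mean_length
```

The built-in mean(LengthCT, na.rm = TRUE) should give the same value and is the shorter route once the role of sum() is clear.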
The use of the cbind function using epidemiological data
Continue with the deer data.
- Create variables Farm and Month that contain the relevant information.
- Note that Farm is a string of characters.
- Use the cbind() function to combine the month, length, and Tb data, and store the result in the variable Boar.
- Make sure that you can extract rows, columns, and elements of Boar.
- Use the dim(), nrow(), and ncol() functions to determine the number of animals and variables in Boar.
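One possible way to work through these steps is sketched below, with values copied from the table. Storing Month as numbers (so that 07 becomes 7) is an assumption. Farm is left out of cbind() because mixing character and numeric vectors would turn the whole matrix into character strings.

```r
# Values copied from the first seven rows of the table
Farm     <- c("MO", "MO", "MO", "MO", "LN", "SE", "QM")  # character strings
Month    <- c(11, 7, 7, NA, 9, 9, 11)                    # assumption: stored as numbers
LengthCT <- c(75, 85, 91.6, 95, NA, 105.5, 106)
Tb       <- c(0, 0, 1, NA, 0, 0, 0)

# cbind() places the vectors side by side as columns of a matrix
Boar <- cbind(Month, LengthCT, Tb)

Boar[1, ]       # first row (first animal)
Boar[, "Tb"]    # one column, selected by name
Boar[3, 2]      # single element: length of the third animal

dim(Boar)       # rows and columns together
nrow(Boar)      # number of animals
ncol(Boar)      # number of variables
```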
The use of the vector function using epidemiological data
Continue with the deer data.
- Instead of the c() function that you used earlier to create the Tb data, can you do the same with the vector() function?
- Give the vector a different name, for example, Tb2.
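A hedged sketch of the vector() route: the vector is created with a fixed length and then filled by index. A numeric vector starts as all zeros, so only the entries that differ from zero need to be assigned. The values come from the Tb column of the table.

```r
# Create an empty numeric vector of length 7 (all elements start at 0)
Tb2 <- vector(mode = "numeric", length = 7)

# Fill in the entries that are not zero, using the Tb column of the table
Tb2[3] <- 1    # third animal tested positive
Tb2[4] <- NA   # fourth animal has a missing value

Tb2
```

Compared with c(), this approach is mainly useful when the length is known in advance and the values are filled in later, for example inside a loop.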
Using the read.table() function and accessing variables from a data frame with epidemiological data
The file BirdFlu.xls contains the annual number of confirmed cases of human Avian Influenza A/(H5N1) for several countries reported to the World Health Organization (WHO). The data were taken from the WHO website (www.who.int/en/) and reproduced for educational purposes.
- Prepare the spreadsheet and import these data into R.
- If you are a non-Windows user, start with the file BirdFlu.txt.
- Note that you will need to adjust the column names and some of the country names.
- Use the names() and str() functions in R to view the data.
- What is the total number of bird flu cases in 2003 and in 2005?
- Which country has had the most cases?
- Which country has had the fewest bird flu deaths?
- What is the total number of bird flu cases per country?
- What is the total number of cases per year?
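The layout of the real file is not shown here, so the sketch below uses a small made-up table to illustrate the functions involved. The country labels, column names (cases2003 and so on), and counts are all hypothetical and are not the WHO figures. For the real data, a call such as read.table(file = "BirdFlu.txt", header = TRUE) would be the starting point, possibly with a sep argument depending on how the file was saved.

```r
# Hypothetical toy table, NOT the real BirdFlu data
txt <- "Country cases2003 cases2004 cases2005
A 1 0 4
B 0 3 2
C 2 5 0"

# read.table() can read from a text string as well as from a file
Flu <- read.table(text = txt, header = TRUE)

names(Flu)   # column names
str(Flu)     # structure: type and first values of every column

# Total cases in one year: sum over a single column
sum(Flu$cases2003)

# Totals per country (rows) and per year (columns), leaving out the
# Country column
cases_per_country <- rowSums(Flu[, -1])
names(cases_per_country) <- Flu$Country
cases_per_year <- colSums(Flu[, -1])

# Country with the most cases overall
top <- Flu$Country[which.max(cases_per_country)]
```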
Using the read.table() function and accessing subsets of a data frame with deep sea research data
Import the data from the ISIT.xls file.
- In R, extract the data from station 1.
- How many observations were made at this station?
- What are the minimum, median, mean, and maximum sampled depth at station 1?
- What are the minimum, median, mean, and maximum sampled depth at station 2?
- At station 3?
- Identify any stations with considerably fewer observations.
- Create a new data frame omitting these stations.
- Extract the data from 2002.
- Extract the data from April (of all years).
- Extract the data that were measured at depths greater than 2000 meters (from all years and months).
- Show the data according to increasing depth values.
- Show the data that were measured at depths greater than 2000 meters in April.
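The subsetting operations in this list all follow the same pattern, sketched below on a tiny made-up data frame. The column names Station, Month, Year, and SampleDepth are assumptions about the real file, and the values are purely illustrative.

```r
# Hypothetical toy data; column names and values are assumptions
ISIT <- data.frame(
  Station     = c(1, 1, 2, 2, 3),
  Month       = c(4, 4, 4, 8, 8),
  Year        = c(2001, 2001, 2001, 2002, 2002),
  SampleDepth = c(500, 2500, 2100, 1800, 3000)
)

# Rows for one station: a logical condition before the comma
St1 <- ISIT[ISIT$Station == 1, ]
nrow(St1)                   # number of observations at station 1
summary(St1$SampleDepth)    # min, median, mean, max (plus quartiles)

# Observations per station, useful for spotting sparse stations
table(ISIT$Station)

# Drop a station (here station 3, chosen only for illustration)
ISIT2 <- ISIT[ISIT$Station != 3, ]

# Subsets by year, month, and depth
Y2002 <- ISIT[ISIT$Year == 2002, ]
April <- ISIT[ISIT$Month == 4, ]
Deep  <- ISIT[ISIT$SampleDepth > 2000, ]

# Sort by increasing depth with order()
Sorted <- ISIT[order(ISIT$SampleDepth), ]

# Combine two conditions with &
DeepApril <- ISIT[ISIT$SampleDepth > 2000 & ISIT$Month == 4, ]
```

Forgetting the comma after the condition is a common slip: without it, R treats the condition as a column selector rather than a row selector.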
Using the write.table() function with deep sea research data
In the final step of the previous exercise, data measured at depths greater than 2000 meters in April
were extracted.
- Export these data to a new ASCII file.
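A minimal sketch of the export step, using a made-up stand-in for the extracted data. The object name DeepApril, its columns, and the tab separator are assumptions. A temporary file is used so that the example does not overwrite anything; in practice a file name of your own choice would replace it.

```r
# Hypothetical stand-in for the extracted subset
DeepApril <- data.frame(Station = c(1, 2), SampleDepth = c(2500, 2100))

# Temporary output path for the sketch
out_file <- tempfile(fileext = ".txt")

# Write a tab-separated text file without quotes or row names
write.table(DeepApril, file = out_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read it back to confirm that the file has the expected layout
check <- read.table(out_file, header = TRUE, sep = "\t")
```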
Using the factor() function and accessing subsets of a data frame with deep sea research data
Stations 1 through 5 were sampled in April 2001, stations 6 through 11 in August 2001, stations 12
through 15 in March 2002, and stations 16 through 19 in October 2002.
- Create two new variables in R to identify the month and the year.
- Note that these are factors.
- Do this by adding the new variables inside the data frame.
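One way to derive both variables from the station number is sketched below. The column name Station and the new names fMonth and fYear are assumptions, and the toy data frame holds one row per station rather than the real observations.

```r
# Toy data frame with one row per station; the real data have many rows
ISIT <- data.frame(Station = 1:19)

# Month follows from the station ranges given in the text
ISIT$fMonth <- factor(
  ifelse(ISIT$Station <= 5,  "April",
  ifelse(ISIT$Station <= 11, "August",
  ifelse(ISIT$Station <= 15, "March", "October"))),
  levels = c("March", "April", "August", "October")  # calendar order
)

# Stations 1 to 11 were sampled in 2001, stations 12 to 19 in 2002
ISIT$fYear <- factor(ifelse(ISIT$Station <= 11, 2001, 2002))

# Cross-tabulate to check the assignment
table(ISIT$fMonth, ISIT$fYear)
```

Setting levels explicitly keeps the months in calendar order; without it, factor() sorts the labels alphabetically.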
data: ISIT.xls ISIT.txt deer.xls
data: Birdflu.txt Birdflu.xls Birdflu_Corrected.txt