M & W 14:30 – 15:50, Kroon G01, Lecture and lab
F 10:00 – 13:00, Kroon 319, R Bootcamp (similar to office hours); attendance is optional but recommended
Lecturer: Simon Queenborough
TF: Andrew Muehleisen
This course will teach you how to use the computer programming language R.
Many natural and social scientists use R to explore, analyze, and present their data.
This course is designed to help you learn and understand a new language, as well as provide guidance on best practice.
As with learning any new language, it will often be frustrating as you grapple with new words, meanings, and the nuts and bolts of grammar and syntax. However, the end result is beautiful. A whole new world will open before you, and you will be equipped with a powerful tool and the principles to guide you through it.
The course assumes no prior knowledge of R, of programming, or of how to interact with a computer via the command-line interface.
By the end of the course you will be able to:
import and export data
produce publication-quality graphics
analyze data and write up results correctly
be confident in continuing to learn R
articulate the principles of best practice in data management, data analysis, graphics, workflow, and statistical approaches, and use them!
Programming, writing code, using the command-line interface, and collecting, storing, analysing, and displaying data are all transferable skills that will be useful whatever software you end up using or work you end up doing.
Please note, this is not a statistics course. Links to background material will be provided, but we will not teach you statistics.
The literature on learning suggests that three elements help people learn a new skill quickly: repetition, assessment, and rapid feedback.
R is the perfect environment in which to learn the R language. Each week we will repeat earlier tasks and introduce new parts of the language. R evaluates each command as soon as it is entered, but R itself does not provide helpful assessment or feedback. We will use a program called SWIRL to provide immediate assessment and helpful feedback in R as you work through the lessons.
The labs will repeat much of the material of each lesson, but with new data and without the feedback from SWIRL. Submitted labs will be assessed regularly throughout the week, and multiple submissions are encouraged.
Finally, analysing your own, or another’s, data will motivate you to try out the code and analyses outside of class, to explore these data and reinforce the class material. Sufficiently motivated students could collaborate on a peer-reviewed publication.
The lecturer and TF will be available for any and all questions during class time.
Friday R Bootcamps are designed to provide extra time with the instructors as students work through lessons, labs, or assignments. We encourage all students to attend, complete their labs and assignments, and have them graded during the bootcamps.
There are four main (jokingly-titled) approaches to this course:
You attend at least one class during shopping period and decide to take the c(o)urse next year.
You want to learn R but may have no immediate need. You may have no data of your own and no interest in working toward a collaborative publication. You will learn the basics of a new language, complete all the lessons, labs, best-practice assignments, but not put much effort into the data project. This approach will probably result in a Pass (work of acceptable character).
You have data! You want to learn R to analyse it! You will demonstrate proficiency in the material, demonstrate understanding of best practice, and contribute to the data project in a meaningful way, either with your own data or through a group project. This will probably result in a High Pass (work of outstanding character, above average).
By the end of the semester, you have developed proficiency in the material covered in the course, as well as the self-learning skills to happily teach yourself further material. Your data project report will highlight these skills and best practice. This approach will probably result in an Honors (work of exceptional character, professional-level).
Successful students will be able to:
This course will provide an overview and introduction to the statistical software R. Class time will primarily be used for working through examples and problems as a class or individually. The best way to improve and feel comfortable in R is to use it frequently and regularly.
We will move through topics in the sequence below. How long we spend on each section depends on how comfortable everyone is. Reading and problems will be assigned in class, on the course website, and via email each week. There will also be occasional guest lectures on specific topics.
After October recess we will choose some advanced topics that will be useful to students, and students will also work on one data project. Data projects can be done singly or in groups. You may bring your own data, or we will have some data sets available that will help partners of F&ES in their work. Contribution to a peer-reviewed publication is a possibility.
The course will be assessed via:
There are no examinations.
Lessons are short (20-30 minute) scripts that each student works through independently, during class and/or afterwards in their own time or during the R Bootcamp sessions.
These lessons will walk through R commands and ideas, providing direct real-time feedback as the student writes code in RStudio.
Students must complete all lessons.
Students will have five days to complete each lesson (i.e., lessons must be submitted by Friday of each week).
Each lesson will have an associated lab that reinforces the material and develops understanding and coding skills.
Students must complete all labs.
Labs will be graded frequently and may be retaken as often as the student wants.
The lab final grade will be recorded one week after the lab is released (i.e., the Friday after the associated lesson is due).
Each week, students will be assigned a task that reflects best practice.
These tasks will include revising a figure, cleaning up some code, cleaning a data set, and writing up the result of a statistical test.
The goal is to use this assignment either to advance the analysis of your own data in a single-author project, or data from a collaborator as a group project.
Several data sets will be made available, including from the Wildlife Conservation Society and African People & Wildlife.
Students will document their analysis of an existing data set.
The data set could be
the student’s own, or
provided by another researcher (e.g., their faculty advisor or collaborator), or
a generated or simulated data set with the same columns, format, etc., as data that you will collect in the future, or
a data set archived in an online repository (e.g., Data Dryad, Figshare, Ecology data archive, the Social Science Information System), or associated with a published paper. R also has a datasets package that could be useful.
Your question should not already have been addressed using that specific data set, but please talk to us if you are unsure.
The projects will proceed in two stages.
Before October recess (Oct 17 2017, 23:59), please submit a proposal (max. 1 page A4).
Include the following sections:
Introduction: A short introductory paragraph describing the scientific rationale for the question.
Question: The specific scientific question/s or hypothesis/es you will address (e.g., what is the effect of providing cookies on student attendance at class?).
Data set: A brief description of the data (e.g., unit of observation, number of observations, summaries of covariates and response variables), and either a link to the data set or the data set itself (if it is not online and data sharing is agreed with the data owner).
We will check that the questions and data are feasible and appropriate.
Group data projects will be presented in class, either by the Lecturer or in a talk by a collaborator.
Students will submit three documents for each project (due in December):
The raw data set or link to it (please see Simon or Andrew if there are issues of data sharing).
A file of documented R code, walking through the whole analysis, from data entry to completed publication-quality figures and/or tables. The length of this document will depend on the analysis. The file should be plain text, either .txt or .R.
A summary document written in the style of an academic paper, containing the question, and methods and results sections pertinent to that question. This summary should be no more than 1 page of A4. The document should be submitted as a PDF.
Grades will be based on:
comprehensive exploration of the data (plotting, testing of model assumptions, etc) (5%),
sufficient commenting of code to ensure understanding by other readers (5%),
correct statistical analysis of the data (5%),
a summary document that is sufficient to replicate the analysis and that correctly reports and describes the results (5%),
publication-quality graphics demonstrating best practice (5%).
Sat/Sun: Reading before class
Mondays: Background lecture on topic and lessons
Wednesdays: Best Practice lecture and labs
Fridays: R Bootcamp
Students are expected to comply with the Yale Graduate School’s Programs and Policies, especially the policy on personal conduct.
In particular, students should note the following:
The Graduate School specifically prohibits the following forms of behavior by graduate students:
1. Cheating on examinations, problem sets, and any other form of test; also, falsification and/or fabrication of data.
2. Plagiarism, that is, the failure in a dissertation, essay, or other written exercise to acknowledge ideas, research, or language taken from others.
3. Multiple submission of the same work without obtaining explicit written permission from both instructors before the material is submitted.
With regards to this class, all work submitted for assessment should be the individual student’s own work.
The Yale Community is diverse: in race, background, age, religion, and in many other ways. This is certainly the case at F&ES, where more than 30% of students are international, and our domestic students come from a wide range of backgrounds. The personal actions of each community member must maintain and foster an inclusive and supportive environment that is respectful of our diverse community. Principles of free speech remain paramount at Yale, but it is vital that the learning environment is welcoming to students of all backgrounds and free of conscious or unintended bias or harassment. Respect for the rights and dignities of all members of our community, regardless of their differences, is paramount.
Please familiarize yourself with Yale’s Equal Opportunity Statement and Statement on Sexual Harassment
Eradicating sexual misconduct is of the very highest priority at Yale. Therefore, please familiarize yourself with Yale’s definitions, policies, procedures, and resources for preventing and responding to sexual misconduct:
The Yale Sexual Misconduct Policies and Related Definitions outline behaviors that need to be reported. If you are unsure whether an incident does (or could be perceived to) fall within the University definition of sexual misconduct, you should consult with the F&ES Title IX coordinator to make a determination.
The University’s Sexual Misconduct Response website summarizes options for reporting and responding to sexual misconduct, as well as links to more detailed information.
The Preventing and Responding to Sexual Misconduct booklet includes the definitions and resources above, and offers additional guidance on effective prevention, intervention, and response.
The Rights and Options handout.