Writing functions

In R, functions let you extend, modify and manipulate objects and analyses. A function is a named, reusable piece of code that performs a specific action; it is a building block of a programme rather than a complete programme in itself. R packages generally contain functions that carry out specific actions for a given type of analysis.

We have already used a number of functions, e.g. mean(), sd(), plot() etc.

Why might you want to write your own function?

Generally, if you find yourself writing the same code a few times, it will be worthwhile to try and create a function.
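As a rough illustration of this rule of thumb, the sketch below repeats the same calculation for two vectors and then wraps it in a function. The data and the name standardise() are invented for this example.

```r
# Hypothetical data, invented for illustration
height <- c(150, 160, 170, 180)
weight <- c(50, 60, 70, 80)

# Repeated code: the same calculation written out twice
height.scaled <- (height - mean(height)) / sd(height)
weight.scaled <- (weight - mean(weight)) / sd(weight)

# The same calculation wrapped in a function (the name is illustrative)
standardise <- function(x) {
    (x - mean(x)) / sd(x)
}

# One call per vector replaces each repeated line
height.scaled2 <- standardise(height)
weight.scaled2 <- standardise(weight)
```

If the calculation ever needs to change, it now only has to be changed in one place.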

Function template

Functions in R take zero or more inputs (arguments), perform some actions on these inputs, and return an output.

The basic template for an R function is

function_name <- function(function_argument1, function_argument2){
  function_body
  function_return_value
}

For example

Test.function1 <- function(a, b) {
    
    # function body
    ab.prod <- a * b
    ab.sum <- a + b
    prod.sum = ab.prod - ab.sum
    
    # return statement, tells the function what value you want to output
    return(prod.sum)
}
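The explicit return() is optional: if it is left out, an R function returns the value of the last expression it evaluates. A minimal sketch, using two hypothetical variants of the calculation above:

```r
# Variant with an explicit return()
with.return <- function(a, b) {
    return(a * b - (a + b))
}

# Variant without return(): the value of the last expression is returned
without.return <- function(a, b) {
    a * b - (a + b)
}

with.return(2, 3)      # 1
without.return(2, 3)   # 1
```

Which style to use is largely a matter of taste; an explicit return() can make the output easier to spot in longer functions.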

Let's go through each part of a function.

function_name

The function name can be anything you like, but it is best to choose a name that describes what the function does. As with object names, it is possible to overwrite existing objects and functions, so take care.
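As a cautionary sketch, the lines below deliberately define an object called mean in the workspace, which masks the built-in mean() until it is removed again. The replacement function is invented purely for illustration.

```r
# Defining an object called 'mean' in the workspace masks base::mean()
mean <- function(x) "not the mean"

masked <- mean(c(1, 2, 3))             # calls our version
original <- base::mean(c(1, 2, 3))     # the base version is still reachable

# Remove our version; mean() refers to base::mean() again
rm(mean)
restored <- mean(c(1, 2, 3))
```

Here masked holds the string "not the mean", while original and restored both hold 2.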

Once defined, you call the function as normal: function_name():

Test.function1(a = 2, b = 3)
## [1] 1
Test.function1(a = 7, b = 2)
## [1] 5

As normal, you can save the output of the function as an object:

a4.b5 <- Test.function1(a = 4, b = 5)
a4.b5
## [1] 11

If we type the function name without parentheses, R prints the function definition instead of running it:

Test.function1
## function(a, b) {
##     
##     # function body
##     ab.prod <- a * b
##     ab.sum <- a + b
##     prod.sum = ab.prod - ab.sum
##     
##     # return statement, tells the function what value you want to output
##     return(prod.sum)
## }

function

The keyword function tells R that the object you are creating is a function. The object that is assigned to function_name has class "function".
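A quick way to check this is to ask R for the class of a function object. The function add.one() below is a hypothetical example, not part of the material above.

```r
# Hypothetical one-line function, for illustration only
add.one <- function(x) x + 1

class(add.one)         # "function"
is.function(add.one)   # TRUE
class(mean)            # "function": built-in functions are objects of the same class
```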

function arguments

The arguments to a function tell R what to run the function on, what kind of actions to perform, or supply any other settings the function needs.

In Test.function1(), the function arguments provide the data.

Other options include:

No arguments

Sometimes a function is used as a convenience and always does the same thing, so no input is needed. An example is the ubiquitous "hello world" example found in just about any computer science book.

hello.world <- function() {
    print("hello world")
}

hello.world()
## [1] "hello world"

For functions whose body is a single expression, you can leave out the curly braces { }.

hello.world <- function() print("hello world")

hello.world()
## [1] "hello world"

An argument

We could personalize this function, using an argument for the name. Here we call another function within our function: paste().

hello.someone <- function(name) {
    # paste() already separates its arguments with a single space
    print(paste("hello", name))
}

hello.someone("fred")
## [1] "hello fred"
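paste() joins its arguments with a single space by default, so a trailing space inside "hello " would produce two spaces in the result. A short sketch of the separator behaviour:

```r
paste("hello", "fred")                # "hello fred"
paste("hello ", "fred")               # "hello  fred" (two spaces)
paste0("hello ", "fred")              # "hello fred": paste0() adds no separator
paste("hello", "fred", sep = ", ")    # "hello, fred"
```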

defaults for arguments

What happens if you try hello.someone() without an argument?

hello.someone()
## Error: argument "name" is missing, with no default

R returns an error because name has no default value, so we should supply a sensible one. Defaults are defined within the parentheses when we define the function, using argument = default.

hello.someone <- function(name = "world") {
    print(paste("hello", name))
}
hello.someone()
## [1] "hello world"
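A default is only used when the caller does not supply a value. The sketch below uses a hypothetical function greet(), which mirrors hello.someone() but returns the string rather than printing it, and shows one way to inspect a default with formals().

```r
# Hypothetical function mirroring hello.someone(), returning the string
greet <- function(name = "world") {
    paste("hello", name)
}

greet()                  # "hello world": the default is used
greet(name = "fred")     # "hello fred": the default is overridden
formals(greet)$name      # "world": the default stored in the definition
```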

argument order

We define a function that simulates n random numbers from a normal distribution (by default with a mean of 10 and a standard deviation of 5) and then returns the sum of those numbers.

sim.t <- function(n, mu = 10, sigma = 5) {
    X <- rnorm(n, mu, sigma)
    return(sum(X))
}

There are numerous ways that we can call this function:

sim.t(4)                        # using defaults

sim.t(4, 3, 10)                 # n = 4, mu = 3, sigma = 10

sim.t(4, 5)                     # n = 4, mu = 5, sigma the default 5

sim.t(4, sigma = 100)           # n = 4, mu the default 10, sigma = 100

sim.t(4, sigma = 100, mu = 1)   # named arguments don't need to be in order

Using named arguments, such as sim.t(4, sigma = 100, mu = 1), allows you to switch the order and to specify only the values you want to change from their defaults. For functions with lots of arguments this is very convenient.
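One way to convince yourself that positional and named calls are equivalent is to fix the random seed before each call. This is a minimal sketch that repeats the definition of sim.t() so it can be run on its own; the seed value is arbitrary.

```r
sim.t <- function(n, mu = 10, sigma = 5) {
    X <- rnorm(n, mu, sigma)
    return(sum(X))
}

# Same seed before each call, so both calls draw the same random numbers
set.seed(1)
by.position <- sim.t(4, 1, 100)             # n = 4, mu = 1, sigma = 100

set.seed(1)
by.name <- sim.t(sigma = 100, n = 4, mu = 1)

identical(by.position, by.name)             # TRUE
```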

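the '...' argument

Adding ... to the end of a function's argument list (written as , ... after the named arguments) lets the function accept any additional arguments supplied in the call and pass them on to a function called inside it. This is especially useful with graphics.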

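plot.f <- function(f, a, b, ...) {
    # 100 evenly spaced x values between a and b
    xvals <- seq(a, b, length.out = 100)
    # any extra arguments in ... are passed on to plot()
    plot(xvals, f(xvals), type = "l", ...)
}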

This code will plot the sine curve from 0 to 2*pi:

plot.f(f = sin, a = 0, b = 2 * pi)

Fig. Plot of sine curve.

Because we included ... in the function definition, we can easily modify the plot without changing the function.

plot.f(f = sin, a = 0, b = 2 * pi, lty = 4)

Fig. Plot of sine curve with different line type.

We could not do this if ... was not an argument in the function:

plot.f <- function(f, a, b) {
    xvals <- seq(a, b, length.out = 100)
    plot(xvals, f(xvals), type = "l")
}

plot.f(f = sin, a = 0, b = 2 * pi, lty = 4)
## Error: unused argument (lty = 4)
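The same mechanism works outside graphics. The sketch below uses two hypothetical helpers: show.dots() reports the names of whatever was captured by ..., and my.mean() forwards the extra arguments to mean().

```r
# Hypothetical helper: report the names of the extra arguments captured by ...
show.dots <- function(x, ...) {
    extras <- list(...)
    names(extras)
}

show.dots(1, lty = 4, col = "red")     # "lty" "col"
show.dots(1)                           # NULL: nothing was captured

# Hypothetical wrapper: forward the extra arguments to mean()
my.mean <- function(x, ...) {
    mean(x, ...)
}

my.mean(c(1, NA, 3))                   # NA
my.mean(c(1, NA, 3), na.rm = TRUE)     # 2
```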

lazy evaluation

default arguments and lazy evaluation in R
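As a brief illustration of lazy evaluation: R does not evaluate an argument until its value is actually needed inside the function, and a default can therefore refer to another argument. The function names below are hypothetical and serve only as a sketch.

```r
# 'unused' is never needed in the body, so the stop() passed to it never runs
lazy.demo <- function(x, unused) {
    x * 2
}

lazy.demo(3, stop("this is never evaluated"))   # 6

# A default can refer to another argument; it is evaluated when first used
rect.area <- function(width, height = width) {
    width * height
}

rect.area(3)       # 9: height defaults to width
rect.area(3, 2)    # 6
```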


Exercises