R is a language and environment for statistical computing and graphics. In more practical terms:
You’re already used to programming since spreadsheets are code:
=SUM(OFFSET($data.$A$1,MATCH($T197,$data.$C$1:$C$1048576,0)-1,MATCH($AL$7&$AL$6,$data.$A$1:$AMJ$1,0)-29,1,2))/1000+SUM(OFFSET($data.$A$1,MATCH($T197,$data.$C$1:$C$1048576,0)-1,RIGHT(V$53,2)*8-7,1,2))/1000-SUM(OFFSET($data.$A$1,MATCH($T197,$data.$C$1:$C$1048576,0)-1,MATCH($AL$7&$AL$6,$data.$A$1:$AMJ$1,0)-37,1,2))/1000
This is a formula from an actual spreadsheet in use by a company. This code is a nightmare to maintain and it’s very difficult to check if there are any errors in it. Spreadsheets create a soup of data storage, processing and presentation (i.e. numbers, formulas and graphs). This hinders transparency and as spreadsheets get more complex, it becomes very difficult to trace how the data analysis is actually done. With R, it’s possible to use code to read from data files and then automatically generate reports in the form of Word documents, pdfs, or html pages. Separating your data, processing and presentation of results into separate components will help you to stay organized.
You need to have both R and RStudio set up. You can use R without RStudio, but if you do, the interface will look like the one on the left image below, and this can be a bit intimidating. You’ll see this interface also referred to as the “R console”. RStudio runs on top of the R console and provides a more visual interface, in addition to greatly assisting you with various aspects of programming.
[Figure: the R console (left) vs. the R console with RStudio (right)]
Mathematics & Statistics -> R.
On your own computer you will have to install both R and RStudio.
Once you open up RStudio, you’ll see that it has four panels, which are:
Console - Bottom Left
Environment/History - Top Right
R Script Files - Top Left
Files/Plots/Packages/Help - Bottom Right
Everything shown in the large gray boxes below is code that you can run by copy/pasting into the Console tab in RStudio. For example:
print("hello world")
## [1] "hello world"
Gray bits of text like this usually refer to individual R commands, package names, or the exact name of things that you will see in the user interface.
For sections labelled Exercise we only show the results of the code and not the code itself. You should check to make sure that you get the same answers.
All of the commands that you use in the console are recorded in the History tab. You can also access previous commands by pressing the up key on your keyboard. When doing the exercises, it may take a few attempts for you to figure out the correct commands to use, and by using the up key you can easily recall your previous commands and modify them.
You can also collect these statements in a new R file (File -> New File -> R Script) which you can then run.
If you do create a new R file, you can run the code line by line by first placing your cursor on a line of code and then clicking on the Run button or pressing Ctrl + Enter. You can do the same after selecting several lines of code.
Additionally, it is possible to run all the lines of code at once via Ctrl + Alt + R. This will be useful once you work on your projects for this course.
You can use R just as you would use a calculator, for example, to add two numbers:
3 + 2
## [1] 5
We can assign the result of the calculation 3 + 2 to a variable named a by using <-:
a <- 3 + 2
If you look in the top right quadrant of RStudio, you’ll see that the Environment tab shows that the variable a now has a value of 5. In simple terms, the environment shows all of the variables which have some value assigned to them. This means that you can reuse each of these variables later on for other calculations.
Note that in R you can also just use = to assign the result. You will often see people use both forms.
a = 3 + 2
Now print out the value of a:
a
## [1] 5
You can also do the same with the print function:
print(a)
## [1] 5
Although you assigned a number to the variable a, you can also assign different types of data to it, like text:
a <- "this is some text"
a
## [1] "this is some text"
In the examples above, we just used a variable simply named a. R actually allows you to create much more descriptive names for variables. A variable name must start with a letter, but after that you can use a mix of letters, numbers, as well as periods and underscores.
In short the requirements are:
First character: a-z or A-Z
Remaining characters: a-z, A-Z, 0-9, _, .
Valid variable names:
abc
ABC
a123
a_123
a_12.3
theta_a_b
Invalid variable names:
2abc
_a
a-123
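If you are unsure whether a name is valid, one way to check is sketched below. It assumes that base R's make.names() is an acceptable test, since that function returns its input unchanged only when the input is already a syntactically valid name. The helper is_valid_name is a hypothetical name introduced just for this illustration.

```r
# Hypothetical helper: a name counts as valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  name == make.names(name)
}

valid_examples   <- c("abc", "ABC", "a123", "a_123", "a_12.3", "theta_a_b")
invalid_examples <- c("2abc", "_a", "a-123")

is_valid_name(valid_examples)    # TRUE for every element
is_valid_name(invalid_examples)  # FALSE for every element

# make.names() also shows how R would repair an invalid name
make.names(invalid_examples)
```

Note that this check also rejects reserved words such as if, which goes slightly beyond the character rules listed above.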
In general, it’s a good strategy to use descriptive “self-documenting” variable names like temperature_Groningen instead of a so that it’s easier for others (and you later on) to understand your code.
Assign the result of five times three to a variable named b and print out the resulting value:
print(b)
## [1] 15
Assign the result of b divided by 10 to a new variable c and print out the results using print(c):
print(c)
## [1] 1.5
We can also assign a vector of values to a variable instead of just a single value. Vectors can be useful for representing information like hourly temperature readings over the course of a year.
Below we use the c() function which is used to concatenate or combine values together.
a <- c(3, 7, 1, 6)
a
## [1] 3 7 1 6
Note that when we print out the results, the commas are replaced by spaces.
We can also use c() to add values to the beginning or end. Below we add 10 to the beginning and 21 to the end.
a <- c(10, a)
a
## [1] 10 3 7 1 6
a <- c(a, 21)
a
## [1] 10 3 7 1 6 21
We can also concatenate two vectors together:
b <- c(9, 2, 5)
c <- c(a, b)
c
## [1] 10 3 7 1 6 21 9 2 5
Note that here we have a variable named c and we also use the function c(). R is able to understand that when we use c with parentheses like c(a,b) we’re referring to the function c(), while when we mention it without parentheses like with c <- 3, we’re referring to the variable c and not to the function.
With vectors you can do element-wise operations. Below we divide each element of c by 10.
c / 10
## [1] 1.0 0.3 0.7 0.1 0.6 2.1 0.9 0.2 0.5
We can divide one vector by another:
a <- c(10, 6)
b <- c(2, 5)
a / b
## [1] 5.0 1.2
This result is equivalent to c(10/2, 6/5), in other words, dividing the first element of a by the first element of b and then dividing the second element of a by the second element of b.
If we divide vectors that are not of the same length, then we get a bit of a strange result:
a <- c(10, 6, 4)
b <- c(2, 5)
a / b
## Warning in a/b: longer object length is not a multiple of shorter object
## length
## [1] 5.0 1.2 2.0
Even though R gives us a result, we get a warning that the length of the longer vector is not a multiple of the length of the shorter one. What is happening here is that R will wrap around the shorter vector. Behind the scenes, it’s doing a calculation like c(10/2, 6/5, 4/2). Note that the 2 appears twice as it is the first element of b.
Above we divided the longer vector by the shorter vector. Below we divide the shorter vector by the longer vector:
b / a
## Warning in b/a: longer object length is not a multiple of shorter object
## length
## [1] 0.2000000 0.8333333 0.5000000
Here the calculation being performed is c(2/10, 5/6, 2/4). Note that in both cases the resulting vector has three elements, meaning that the result always has the same number of elements as the longer vector.
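The warning only appears when the shorter vector does not fit a whole number of times into the longer one. As a small sketch (the variable names long_vec and short_vec are made up for this illustration), dividing a vector of four elements by a vector of two elements reuses the shorter vector exactly twice, and R stays silent:

```r
long_vec  <- c(10, 6, 4, 8)  # four elements
short_vec <- c(2, 5)         # two elements, reused exactly twice

# Computed as c(10/2, 6/5, 4/2, 8/5), without a warning
result <- long_vec / short_vec
result
```

Because this kind of silent reuse can hide mistakes, it is worth checking the lengths of your vectors with length() when a result looks odd.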
Sometimes we don’t want to use all the values in a vector, but only want certain values. For the examples below we’ll work with a vector of ten random values:
x <- c(-1, 0, 0, -9, 1, 4, 8, -2, 3, 5)
Get the third element of the vector:
x[3]
## [1] 0
Note that in R, indices start at one, while in other languages they may start at zero. For example, in Python, x[3] would give you the fourth element in the vector.
If you try to access elements in the vector at invalid locations, R will not generate an error, but will return numeric(0) or NA. In a later practical we’ll discuss what these mean, but for now you should keep in mind that if you see something like this, it may mean that you’re trying to access an element that doesn’t exist.
x[0] # there is no element at location 0
## numeric(0)
x[11] # only ten elements in the vector
## [1] NA
If you do run into issues like this, you can always use the length() function to see how many items are contained within a vector.
length(x)
## [1] 10
Return everything except for the seventh element:
x[-7]
## [1] -1 0 0 -9 1 4 -2 3 5
Get the fifth, sixth and seventh element:
x[5:7]
## [1] 1 4 8
Return all elements except for the fifth, sixth and seventh element:
x[-(5:7)]
## [1] -1 0 0 -9 -2 3 5
Return elements at locations 5 and 7:
x[c(5,7)]
## [1] 1 8
Note that while x[c(5:7)] will give you the same results as x[5:7], i.e. you don’t need to include the c(), you will get an error if instead of x[c(5,7)], you try x[5,7]:
x[5,7]
## Error in x[5, 7]: incorrect number of dimensions
The reason is that this is the syntax that is used to access elements of a matrix, specifically the element at row 5 and column 7. We will discuss this later on in the practical.
Find all values equal to zero:
x[x == 0]
## [1] 0 0
Note that zero is listed twice as we have two zeros in the vector.
Find all values less than four:
x[x < 4]
## [1] -1 0 0 -9 1 -2 3
Find all values in x which are in the set of numbers one through four:
x[x %in% 1:4]
## [1] 1 4 3
The values are returned in the order in which they are found, which is why the 4 appears before the 3.
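It can help to look at what the condition inside the square brackets produces on its own. The sketch below uses the same vector x; the name is_zero is made up for this illustration, and which() is a base R function that returns the positions of the TRUE values.

```r
x <- c(-1, 0, 0, -9, 1, 4, 8, -2, 3, 5)

# The comparison gives one TRUE or FALSE per element of x
is_zero <- x == 0
is_zero

which(is_zero)  # positions where the condition is TRUE
x[is_zero]      # same result as x[x == 0]
```

In other words, a filter like x[x == 0] keeps exactly those elements for which the condition is TRUE.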
With vectors you can also perform operations like finding the sum, cumulative sum, mean, median and standard deviation:
sum(x)
## [1] 9
cumsum(x) # cumulative sum
## [1] -1 -1 -1 -10 -9 -5 3 1 4 9
mean(x) # average of all values
## [1] 0.9
median(x)
## [1] 0.5
sd(x) # standard deviation
## [1] 4.629615
For this exercise, you will work with the vector y:
y = c(8, 3, 3, 6, 8, -4, -3, -2, -2, 10, 8, -4, 4, 0, -8, -6, 1, -8, -4, 4)
Find the average:
## [1] 0.7
Find the average and standard deviation for all values less than zero:
## [1] -4.555556
## [1] 2.297341
Create a vector containing the elements 4 2 7 1 and divide it by y.
## [1] 0.5000000 0.6666667 2.3333333 0.1666667 0.5000000 -0.5000000
## [7] -2.3333333 -0.5000000 -2.0000000 0.2000000 0.8750000 -0.2500000
## [13] 1.0000000 Inf -0.8750000 -0.1666667 4.0000000 -0.2500000
## [19] -1.7500000 0.2500000
Note that when printing the contents of a vector, R will wrap the results in order to fit the screen. If your results look different, then first check that the individual numbers are the same. For example, when you see [1], [7], [13], and [19], that means that the values next to them are those at positions 1, 7, 13 and 19 in the vector.
In other words, if I print out the vector 1:8, then depending on the size of the screen, R might print it out like either of the two variants below:
[1] 1 2 3 4
[5] 5 6 7 8
[1] 1 2 3
[4] 4 5 6
[7] 7 8
In R you will often be creating vectors that contain sequences of numbers. For many of the sequences you will need to create, R has several techniques that allow you to generate these without having to specify every element individually.
Based on what we showed previously, if you would like to create a vector for the sequence of integers from 3 to 7, you can do:
c(3, 4, 5, 6, 7)
## [1] 3 4 5 6 7
However, in this case, there’s no point in specifying the intermediate numbers since we’re just taking consecutive integers. What we want R to do is to start at a number and keep counting by one until we get to another number.
Using the : operator, we can shorten our code so that it looks like start_number:end_number
3:7
## [1] 3 4 5 6 7
You will also sometimes see people use c() to do this. This also gives the same result.
c(3:7)
## [1] 3 4 5 6 7
If the second number is smaller, then you will get a sequence that counts down.
7:3
## [1] 7 6 5 4 3
It’s possible to use this technique with real numbers and not just integers.
3.1459:10
## [1] 3.1459 4.1459 5.1459 6.1459 7.1459 8.1459 9.1459
Note that the final value is 9.1459 and not 10. This is because the next number in the sequence would be 10.1459 and this is greater than 10. In general, the final value in a sequence will be a value less than or equal to the end value you specified.
Yet another way to do this is via the seq() function:
seq(3, 7, by = 1)
## [1] 3 4 5 6 7
What’s special about this is the by argument which allows us to specify by which value we would like to increment the sequence:
seq(3, 7, by = 0.5)
## [1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Again, the final value will be less than or equal to the end value we request:
seq(3, 7, by = 3.14159)
## [1] 3.00000 6.14159
The seq() function can also calculate the increment needed in order to have a sequence of a specified length with certain start and end values. In this case, we want a sequence with ten elements that starts with 3 and ends with 7.
seq(3, 7, length.out = 10)
## [1] 3.000000 3.444444 3.888889 4.333333 4.777778 5.222222 5.666667
## [8] 6.111111 6.555556 7.000000
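To see which increment R picked here, you can work it out by hand: ten elements leave nine gaps between the start and end values. The sketch below checks this with diff(), a base R function that returns the differences between consecutive elements. The names s and step are made up for this illustration.

```r
s <- seq(3, 7, length.out = 10)

# The increment is (end - start) / (number of elements - 1)
step <- (7 - 3) / (10 - 1)
step

# Differences between consecutive elements of the sequence
diff(s)
```

Every value returned by diff(s) should match step, up to the usual rounding of decimal numbers.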
Instead of creating sequences that increment by a fixed value, you may also want to create sequences of repeating elements. R allows you to do this with the rep() function.
Just like with the seq() function, there are different arguments we can use to get different behaviour. If we use the times argument, we can repeat a sequence multiple times like we would with c(1:3, 1:3, 1:3, 1:3):
rep(1:3, times=4)
## [1] 1 2 3 1 2 3 1 2 3 1 2 3
If we want to repeat each element in a vector multiple times in a row, then we can use the each argument.
rep(1:3, each = 4)
## [1] 1 1 1 1 2 2 2 2 3 3 3 3
You can even call rep() multiple times to generate very complex sequences:
rep(rep(1:3, times=2), each=2)
## [1] 1 1 2 2 3 3 1 1 2 2 3 3
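The each and times arguments can also be combined in a single call to rep(), in which case each is applied first and times second. As a sketch (the names nested and combined are made up for this illustration), the following gives the same sequence as the nested call:

```r
nested   <- rep(rep(1:3, times = 2), each = 2)
combined <- rep(1:3, each = 2, times = 2)

nested
combined

identical(nested, combined)  # check that both approaches agree
```

Which form you use is mostly a matter of taste; the single call is shorter, while the nested call makes the order of the steps explicit.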
Create a sequence from -10 to 10, where each value is incremented by 0.5
## [1] -10.0 -9.5 -9.0 -8.5 -8.0 -7.5 -7.0 -6.5 -6.0 -5.5 -5.0
## [12] -4.5 -4.0 -3.5 -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5
## [23] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
## [34] 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
Create a sequence from -10 to 10, with only five values
## [1] -10 -5 0 5 10
A matrix can be thought of as tabular data contained in a set of rows and columns. As a simple example, here we initialize a matrix of zeros with three columns and two rows.
b <- matrix(0, ncol=3, nrow=2)
b
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
Then set the value at row 1 and column 2 to 3. The syntax we use here is of the form b[row_number, column_number]
b[1,2] <- 3
b
## [,1] [,2] [,3]
## [1,] 0 3 0
## [2,] 0 0 0
We can also convert a vector of values into a matrix. In this example, the vector consists of six elements, while the matrix consists of three columns and two rows. R will just take the values in the vector and arrange them from top to bottom in the columns, moving from left to right.
matrix(1:6, ncol=3, nrow=2)
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 6
In the previous example, the vector had the same number of elements as the matrix. We can also create a matrix where the vector is repeated. Note how the values are repeated.
matrix(1:3, ncol=3, nrow=2)
## [,1] [,2] [,3]
## [1,] 1 3 2
## [2,] 2 1 3
Note that you will get a warning if the length of the vector is not a sub-multiple or multiple of the number of rows in the matrix. R will still recycle the elements though.
matrix(1:5, ncol=3, nrow=2)
## Warning in matrix(1:5, ncol = 3, nrow = 2): data length [5] is not a sub-
## multiple or multiple of the number of rows [2]
## [,1] [,2] [,3]
## [1,] 1 3 5
## [2,] 2 4 1
We can also control how R populates the matrix with vector values by using the byrow=TRUE argument, which fills the matrix along the rows from top to bottom. By default, R places vector values in a matrix by going down the columns from left to right.
matrix(1:6, ncol=3, nrow=2, byrow=TRUE)
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
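To compare the two fill orders side by side, the sketch below builds both matrices from the same vector and looks at one position. The names by_col and by_row are made up for this illustration, and t() is the base R function that transposes a matrix (swaps rows and columns).

```r
by_col <- matrix(1:6, ncol = 3, nrow = 2)                # default: fill down the columns
by_row <- matrix(1:6, ncol = 3, nrow = 2, byrow = TRUE)  # fill along the rows

by_col[1, 2]  # third value of the vector when filling by column
by_row[1, 2]  # second value of the vector when filling by row

# Filling a 3 x 2 matrix by column and transposing it gives the same
# matrix as filling a 2 x 3 matrix by row
identical(by_row, t(matrix(1:6, ncol = 2, nrow = 3)))
```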
You can append a matrix to the end of another matrix using rbind, which means that you will bind the rows together.
b <- matrix(0, ncol=3, nrow=2)
c <- matrix(1, ncol=3, nrow=2)
d <- rbind(b,c)
d
## [,1] [,2] [,3]
## [1,] 0 0 0
## [2,] 0 0 0
## [3,] 1 1 1
## [4,] 1 1 1
You can also bind the columns together using cbind
cbind(b,c)
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 0 0 0 1 1 1
## [2,] 0 0 0 1 1 1
R also has functions that run operations such as mean and sum on matrix rows and columns:
rowMeans(d)
## [1] 0 0 1 1
rowSums(d)
## [1] 0 0 3 3
colMeans(d)
## [1] 0.5 0.5 0.5
colSums(d)
## [1] 2 2 2
We can also use the same functions as we used for vectors. These will return a single value as they analyze all elements in the matrix.
mean(d)
## [1] 0.5
sd(d)
## [1] 0.522233
sum(d)
## [1] 6
median(d)
## [1] 0.5
Here we create a new matrix:
a <- matrix(1:12, nrow=3, ncol=4, byrow = TRUE)
a
## [,1] [,2] [,3] [,4]
## [1,] 1 2 3 4
## [2,] 5 6 7 8
## [3,] 9 10 11 12
We can subset elements of a matrix by using square brackets, similar to how we accessed elements of a vector. The main difference is that for matrices we need to specify two numbers: the first corresponds to the row number and the second to the column number.
Select values from row 2:
a[2,]
## [1] 5 6 7 8
Select values from column 2:
a[,2]
## [1] 2 6 10
Select element at row 1, column 2:
a[1,2]
## [1] 2
It’s still possible to access matrix elements by a single index, although this is a bit dangerous as it’s not always intuitive where the value is coming from.
a[10]
## [1] 4
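The reason a[10] returns 4 is that a single index counts down the columns, from left to right, regardless of how the matrix was originally filled. A minimal sketch of that conversion is shown below; the names index, row_number and column_number are made up for this illustration, while %% (remainder) and %/% (integer division) are standard R operators.

```r
a <- matrix(1:12, nrow = 3, ncol = 4, byrow = TRUE)

index <- 10
row_number    <- (index - 1) %% nrow(a) + 1   # position within the column
column_number <- (index - 1) %/% nrow(a) + 1  # which column the index falls in

c(row_number, column_number)
a[row_number, column_number]  # same element as a[index]

# Base R can also do this conversion for you
arrayInd(index, dim(a))
```

With three rows per column, the first nine indices cover columns one to three, so index 10 is the first row of the fourth column.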
Just like with vectors, we can still access elements that meet some condition
a[a > 5]
## [1] 9 6 10 7 11 8 12
By using the summary() function we can get a quick overview of statistical properties of the columns of the matrix. This helps to give you an idea about the distribution of values by showing the min, max, median, etc. values per column.
summary(a)
## V1 V2 V3 V4
## Min. :1 Min. : 2 Min. : 3 Min. : 4
## 1st Qu.:3 1st Qu.: 4 1st Qu.: 5 1st Qu.: 6
## Median :5 Median : 6 Median : 7 Median : 8
## Mean :5 Mean : 6 Mean : 7 Mean : 8
## 3rd Qu.:7 3rd Qu.: 8 3rd Qu.: 9 3rd Qu.:10
## Max. :9 Max. :10 Max. :11 Max. :12
Other functions show you the number of rows, columns and dimensions of the matrix
nrow(a) # number of rows in a
## [1] 3
ncol(a) # number of columns in a
## [1] 4
dim(a) # dimension of a (number of rows and number of columns)
## [1] 3 4
Find the average of all values in the second column of a
## [1] 6
Find the standard deviation of all values in a which are less than 6.5
## [1] 1.870829
Find the averages of the first and third rows of a. You should just need a single line of code for this.
## [1] 2.5 10.5
One issue with matrices is that they can only hold one type of data. This is not always ideal as in your own research, you will often use data that is a mix of both text and numbers. An example of this would be data containing the names of weather stations and then numerical information about temperature, humidity, etc.
The example below shows what happens when you try to combine matrices with numbers and text:
a <- matrix(1:12, nrow=4, ncol=3)
a
## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
b <- matrix(c("this", "is", "some", "text"), nrow=4, ncol=1)
b
## [,1]
## [1,] "this"
## [2,] "is"
## [3,] "some"
## [4,] "text"
cbind(a,b)
## [,1] [,2] [,3] [,4]
## [1,] "1" "5" "9" "this"
## [2,] "2" "6" "10" "is"
## [3,] "3" "7" "11" "some"
## [4,] "4" "8" "12" "text"
What’s happened is that by default R converted all the elements to text since it can’t combine numbers and text together. You can tell this since the values are surrounded by quotation marks like ", although there are some cases where R will not display these.
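If the quotation marks are not shown, you can ask R directly what type of data a matrix holds. The sketch below uses a smaller example than the one above; the names numbers, text and combined are made up for this illustration, and typeof() and as.numeric() are base R functions.

```r
numbers  <- matrix(1:4, nrow = 2, ncol = 2)
text     <- matrix(c("a", "b"), nrow = 2, ncol = 1)
combined <- cbind(numbers, text)

typeof(numbers)   # "integer"
typeof(combined)  # "character"

# The numbers are now stored as text, so convert them back before doing arithmetic
as.numeric(combined[, 1]) + 1
```

Trying to calculate with the text values directly, for example combined[, 1] + 1, would result in an error.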
A way around this is to use data frames. A key difference from matrices is that data frames only require that you have a single data type per column. In the example below, the first two columns are numbers, while the last column is text. Note that when we specify x =, y = and z = below, we are actually creating a type of table where the columns will be labelled x, y and z.
a <- data.frame(x = c(1:3),
y = c(4:6),
z = c("a", "b", "c"))
a
## x y z
## 1 1 4 a
## 2 2 5 b
## 3 3 6 c
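To confirm that each column keeps its own data type, you can inspect the columns as sketched below. The name df is made up for this illustration, and stringsAsFactors = FALSE is added here as an assumption so that the text column stays text in older versions of R as well (since R 4.0 this is the default).

```r
df <- data.frame(x = 1:3,
                 y = 4:6,
                 z = c("a", "b", "c"),
                 stringsAsFactors = FALSE)  # keep z as text in older R versions too

sapply(df, class)  # one class per column
str(df)            # compact overview of the structure of the data frame
```

Here sapply() applies the class() function to every column, so you get one answer per column rather than one for the whole table.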
Data frames are similar to matrices in that you can access elements based on their row and column indices. For example, to get the element in the 2nd row and third column:
a[2,3]
## [1] "b"
We can also get just the 2nd row:
a[2,]
## x y z
## 2 2 5 b
Or just the 3rd column:
a[,3]
## [1] "a" "b" "c"
One of the nice things about data frames is that you can use the names of the columns (combined with the $ sign) to directly access the values in that column. So if we want to see the values of only the z column, we can use a$z:
a$z
## [1] "a" "b" "c"
Multiple columns can be selected. Note that we have to include a comma to indicate that we want all rows
a[,c("x", "y")]
## x y
## 1 1 4
## 2 2 5
## 3 3 6
Same, but rows two and three for columns x and y:
a[2:3,c("x", "y")]
## x y
## 2 2 5
## 3 3 6
You can also add a new column to an existing data frame. Here we add a new column t by using the syntax a$t
a$t <- c(10, 13, 17)
a
## x y z t
## 1 1 4 a 10
## 2 2 5 b 13
## 3 3 6 c 17
We can remove an existing column by assigning it a value of NULL
a$x <- NULL
a
## y z t
## 1 4 a 10
## 2 5 b 13
## 3 6 c 17
We’ll now look at the mtcars data set that is included with R. If you type ?mtcars in the console, you’ll see more documentation. Looking at the first few lines of the mtcars data frame, we see the following which shows data in several columns: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear and carb.
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The head command just shows us the top few rows, and you can also use the tail command to look at the bottom rows.
tail(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.5 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.5 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.6 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.6 1 1 4 2
For specific columns, we can use the mean and sd functions to find the average and standard deviation.
mean(mtcars$mpg)
## [1] 20.09062
sd(mtcars$mpg)
## [1] 6.026948
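The filtering techniques shown earlier for vectors carry over to data frame columns, since each column is itself a vector. As a sketch, the code below restricts the calculation to cars with four cylinders; the name four_cyl is made up for this illustration.

```r
four_cyl <- mtcars$cyl == 4   # TRUE for cars with four cylinders

sum(four_cyl)                 # number of cars that match the condition
mean(mtcars$mpg[four_cyl])    # average mpg of the four-cylinder cars
sd(mtcars$mpg[four_cyl])      # standard deviation of mpg for the same cars
```

Using sum() on a logical vector counts the TRUE values, which is a quick way to check how many rows a condition selects.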
Just like with matrices, we can also run the summary() function to get an overview of the range and distribution of values in each column:
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
The same functions used to understand the size of matrices can be used for data frames too
nrow(mtcars) # number of rows
## [1] 32
ncol(mtcars) # number of columns
## [1] 11
dim(mtcars) # dimension (number of rows and number of columns)
## [1] 32 11
Create a data frame with three columns apples, pears and oranges with the data values shown below:
## apples pears oranges
## 1 Groningen 10 1
## 2 Amsterdam 9 2
## 3 Rotterdam 8 3
## 4 Utrecht 7 4
Find the averages of the pears and oranges columns
## pears oranges
## 8.5 2.5
Comments
You may also see comments included in the code, as indicated by text following the # sign. These comments can also be placed at the end of a line of R code.
For your own work, it is generally a good idea to include comments to help other people understand your code, and they can also be useful for yourself if you haven’t looked at a piece of code in a long time.
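As a short illustration of both styles of comment, consider the sketch below. The variable readings and its values are made up for this example.

```r
# Compute the average of three temperature readings (made-up values)
readings <- c(12.5, 14.0, 13.5)
mean(readings)  # a comment can also follow code on the same line
```

R ignores everything from the # sign to the end of the line, so comments do not change what the code does.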