1 Introduction

In this session you’ll learn some of the basics in R and RStudio. You will learn how to use R as a calculator and store mathematical calculations in objects, what a vector is and how you can create one. We will also cover the different types of vectors (variables), and how you can extract information from a vector. You will even learn how to manually calculate the standard deviation of a vector in R! We will also teach you how you can create a dataframe in R, as well as how to obtain information on elements inside a given dataframe. Lastly, we’ll introduce you to missing values in R, which are labelled as NA.

Curriculum

The curriculum of this session is chapter 2 in R for Data Science (R4DS).

2 Using R as a calculator

In its most basic form, R can be used as a powerful calculator and the common arithmetic operators can be used:

Operation        Operator   Example code   Example result
Addition         +          95+5           100
Subtraction      -          73-41          32
Division         /          81/9           9
Multiplication   *          9*6            54
Exponentiation   ^          8^3            512
Square root      sqrt()     sqrt(81)       9

Exercises

Try it out yourself.

1. Write the R code required to add two plus two:

2+2

2. Write the R code required to subtract 43 from 72:

72-43

3. Write the R code required to divide 25 by 5:

25/5

4. Write the R code required to multiply 30 by 10:

30*10

5. Write the R code required to exponentiate 5 to the power of 2:

5^2

2.1 More complex expressions

These are basic arithmetic expressions. However, if we want to calculate more complex expressions we need to be careful. You might expect R to evaluate an expression strictly from left to right, but here’s the catch: some arithmetic operators take precedence over others, and this determines the order in which R evaluates them.

For example:

10+14*2
## [1] 38

gives a result of 38. Why? Because R did not evaluate the expression from left to right. Instead, R actually evaluated 14*2 and then added the result (28) to 10.

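This is caused by the operator precedence rules in R (and more generally in math), which follow the PEMDAS order: Parentheses (), Exponentiation (^), Multiplication (*) and Division (/), and finally Addition (+) and Subtraction (-). Multiplication and division share the same precedence and are evaluated from left to right, and the same holds for addition and subtraction.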

If the intention was to add 10 and 14 and then multiply the result by 2, we should include parentheses in the expression, which impose the order of evaluation:

(10+14)*2
## [1] 48

By including parentheses, we force R to first evaluate the addition and then to multiply the result by 2.
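A few more expressions illustrate the same rules. This is only a small sketch; the numbers are chosen here for illustration and do not come from the curriculum.

```r
2 * 3^2      # exponentiation first: 2 * 9 = 18
8 / 4 * 2    # same precedence, left to right: (8 / 4) * 2 = 4
10 - 4 + 3   # same precedence, left to right: (10 - 4) + 3 = 9
(2 * 3)^2    # parentheses first: 6^2 = 36
```

When in doubt, adding parentheses costs nothing and makes the intended order explicit.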

Exercises

1. I expected to get a result of 14 when executing the following expression: add the result of 2 multiplied by 5 to the result of 6 multiplied by 3, and divide that sum by 2. Instead I got 19. What is wrong with my code? Modify the code below to get the expected result of 14:

(2*5) + (6*3) / 2
## [1] 19
"Have you included enough parentheses?"
((2*5) + (6*3)) / 2

2.2 The assignment operator

Further, you can also store calculations (see the section on calculating standard deviation of a vector) or a number in objects by using the assignment operator <-. Remember from the first tutorial session that when you use the assignment operator in combination with a name for your object, the object will be stored in the environment pane in RStudio.

In the following example, we use the assignment operator to create variables which contain numbers. After creating these objects (or variables if you will), we can use them in mathematical operations. In the example below, for a single ice cream, we have created two variables, price and cost, and given them values by using the assignment operator. Then we computed the profit per ice cream and assigned the result to the variable profit. Finally, we printed the profit per ice cream.

price <- 27
cost <- 19
profit <- price - cost
profit
## [1] 8

Let’s say the store sold 224 ice-creams today. Given this information, we can further compute our daily profit:

units_sold <- 224
daily_profit <- units_sold * profit
daily_profit
## [1] 1792

Exercises

1. Use the console below to write in the correct code to solve the following tasks…

a. …Assign a variable weekly_hours to the value of 22

weekly_hours <- 22
"Replace the dotted lines with the correct number to solve the task"

weekly_hours <- ...

b. Assign a variable salary_per_hour to the value of 170

salary_per_hour <- 170
"Replace the dotted lines with the correct number to solve the task"

salary_per_hour <- ...

c. Assign a variable weekly_salary and compute it based on the variables weekly_hours and salary_per_hour

___ <- ____ * weekly_hours
weekly_salary <- salary_per_hour * weekly_hours
"Replace the dotted lines with the correct variable names to solve the task"

weekly_salary <- ... * ...

d. There are five working days per week. Assign a variable daily_salary and compute it.

daily_salary <- weekly_salary / 5
"Replace the dotted lines with the correct specifications to solve the task"

daily_salary <- ... / ...

2.3 Calculating the square root in R

What if we wanted to calculate the square root of a number or a variable? Using the sqrt() function allows us to do just that:

sqrt(2400)
## [1] 48.98979
sqrt(daily_profit)
## [1] 42.33202

Exercise

1. The following R code computes the square root of 3567. Change it so it computes the square root of 6783.

sqrt(3567)
## [1] 59.72437
sqrt(6783)

2.4 Logical operators

In addition to the common arithmetic operators, logical operators are also used in the R programming language. Logical operators are used to assess whether or not a statement is true. For example, we could ask R whether 12 equals 12. R answers this type of question with TRUE or FALSE, also called a boolean value. Of course we know that 12 equals 12, so in this case R would give the answer TRUE. Below is a table of the logical operators used in R, examples of how to use them and the answer R would provide when evaluating the statements.

Operation                  Operator   Example              Answer
Greater than               >          12 > 20              FALSE
Greater than or equal to   >=         12 >= 12             TRUE
Less than                  <          30 < 20              FALSE
Less than or equal to      <=         30 <= 40             TRUE
Equal to                   ==         5 == 4               FALSE
Not equal to               !=         5 != 4               TRUE
Not                        !          !(5 == 5)            FALSE
And                        &          (1==1) & (10==9)     FALSE
Or                         |          (1==1) | (10==9)     TRUE
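The operators in the table can also be applied to stored objects. The sketch below uses a made-up variable called temperature, which is not part of the tutorial data, to show how &, | and ! combine comparisons.

```r
temperature <- 15   # hypothetical value, used only for illustration

# Both comparisons must be TRUE for & to return TRUE
(temperature > 10) & (temperature < 20)   # TRUE

# At least one comparison must be TRUE for | to return TRUE
(temperature < 0) | (temperature > 30)    # FALSE

# ! flips TRUE to FALSE and FALSE to TRUE
!(temperature == 15)                      # FALSE
```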

Exercises

1. Type in the value R would return when evaluating the following: (5+6 == 9) | (5+6 == 10).

FALSE
"Is it TRUE or FALSE?"

2. In exercise 1d of section 2.2, you created a variable named daily_salary. Is this week’s daily salary greater than last week’s daily salary of 600 and less than next week’s expected daily salary of 800? Write the code that answers this question.

(daily_salary > 600) & (daily_salary < 800)
" Finish the code below to solve the exercise"

(daily_salary > 600) ... (...)
" Finish the code below to solve the exercise"

(daily_salary > 600) & (...)

3. Type in the value R would return when evaluating this statement: 7 <= 14

"Is 7 less than or equal to 14?"
"Is 7 less than or equal to 14? I don't think it's FALSE"

3 Vectors

A vector is a basic data structure in R. The R manual defines a vector as a single entity consisting of a collection of things. Basically, a vector is a variable consisting of multiple values of the same data type. A single vector cannot contain values of different data types. A “numeric” vector, for example, can only hold “numeric” data. More on the different types of vectors will follow later in this session, but for now we will focus on numeric vectors.

Creating a vector

There are several ways to create a vector, and the most general one is to use the c() function combined with the assignment operator <-:

x1 <- c(35, 42, 56, 38, 40)
x1
## [1] 35 42 56 38 40

The : operator is especially useful if we want to create a vector consisting of consecutive numbers:

x2 <- 5:10
x2
## [1]  5  6  7  8  9 10

A third way to create a vector is to use the seq() function. This function is useful if we want to create more complex sequences of numbers, e.g. to specify the step size between the values:

x3 <- seq(1, 5, by = 0.5)          # specify step size between values
x3
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0
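Instead of a step size, seq() can also be told how many values to produce through its length.out argument. A minimal sketch, where the name x4 is only chosen for illustration:

```r
# Ask for exactly five evenly spaced values between 0 and 1
x4 <- seq(0, 1, length.out = 5)
x4
## [1] 0.00 0.25 0.50 0.75 1.00
```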

As mentioned, the most common way of creating a vector is using the c() function. This is probably the one you will become most familiar with. For the rest of this session, let’s use a conceptual example. Suppose we have a company that wants to record the number of hours its employees work per week. The company has nine employees who work 37, 18, 26, 42, 15, 12, 28, 30 and 33 hours per week. The hours worked by each employee can be stored in a vector:

hours_per_week <- c(37,18,26,42,15,12,28,30,33)
hours_per_week
## [1] 37 18 26 42 15 12 28 30 33

As we can see, the information on all the employees’ work hours is now stored in the vector hours_per_week.

Obtaining information on vectors: indexing

Assume that the employer wanted to check the amount of hours for the first employee. To obtain this information, you have to reference the first element in the vector by using square brackets and indexing the number of the element:

hours_per_week[1]
## [1] 37

Likewise, if you wanted to obtain information on the second value in the vector, you would type:

hours_per_week[2]
## [1] 18

It is also possible to use logical indexing to obtain information on a variable. Let’s assume the employer wanted to know how many of the employees worked more than 22 hours per week. For this purpose we can use the logical operators learned previously to pick out the matching values, which we can then count:

hours_per_week[hours_per_week > 22]
## [1] 37 26 42 28 30 33

Further, if you want to obtain information about the length (number of elements) of a vector, you can use the length() function:

length(hours_per_week)
## [1] 9

That makes sense - we already know that the employer has nine employees, and the vector hours_per_week holds one element for each employee, which amounts to nine elements.
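Returning to the question of how many employees worked more than 22 hours: logical indexing returns the matching values, and the count can be obtained from them. The sketch below uses sum(), a function not yet introduced in this session, and the helper name more_than_22 is chosen here only for illustration.

```r
hours_per_week <- c(37, 18, 26, 42, 15, 12, 28, 30, 33)

# The comparison returns one TRUE/FALSE value per employee
more_than_22 <- hours_per_week > 22

# TRUE counts as 1 and FALSE as 0, so sum() gives the number of TRUE values
sum(more_than_22)
## [1] 6

# Equivalent: count the elements that remain after logical indexing
length(hours_per_week[more_than_22])
## [1] 6
```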

Exercises

1. Write the correct R code to create a vector called y1 consisting of the numbers 5, 8, 10 and 12 using the c() function:

y1 <- c(5,8,10,12)

2. Write the correct R code to create a vector called y2 which consists of consecutive numbers from 1 to 10 using the : operator:

y2 <- 1:10

3. Write the correct R code to create a vector called y3 using the seq() function with numbers from -50 to +50 and specify the step size between the values to 5:

y3 <- seq(-50, 50, by=5)
y3 <- seq(-50, 50, ...)

3.1 Calculating the standard deviation of a vector (variable)

You can even manually calculate the standard deviation of a vector in R! Let’s calculate the standard deviation of the vector age. As you can see, we’ve created the vector age by using the assignment operator <- combined with the c() function. age is a vector because it consists of multiple values of the same data type. In this particular case, age is a numeric vector and contains five observations:

age <- c(39,52,28,33,46)
age
## [1] 39 52 28 33 46

The process of calculating the standard deviation consists of the following six steps (as described in example 2.7 in BPS (p. 57-58)):

  • 1. Find the mean: Sum all the scores and divide the results by the number of observations
  • 2. Find each score’s deviation from the mean: Subtract the mean from each score
  • 3. Square each deviation from the mean
  • 4. Find the sum of the squared deviations
  • 5. Find the variance: divide the sum of the squared deviations by one less than the number of observations (n - 1)
  • 6. Find the square root of the variance by using the sqrt() function.

In the example below, we’ll calculate the standard deviation of the vector age by following these steps, and storing the results in different objects. Remember that calculations can be stored in objects by using the assignment operator, and once they’re stored in your environment in RStudio they can be used for further mathematical operations. We start off by calculating the mean of age:

(39+52+28+33+46) / 5 
## [1] 39.6

Okay, so R tells us that the mean of age is 39.6. Now we can proceed with the remaining steps to calculate the standard deviation. In line 1 of the code snippet below, we execute steps 2, 3 and 4 all in one line of code. We subtract the mean from each score, square each deviation from the mean and calculate the sum of the squared deviations. The result is then stored in an object named sum_sq_dev.

In line 2, we then proceed to find the variance (step 5) by dividing the sum of the squared deviations (sum_sq_dev) by n-1 (5-1) and store it in an object named variance.

In line 3, we find the standard deviation of age (step 6) by calculating the square root of the variance using sqrt(variance) and storing the result in the object std_dev.

Finally, in line 4, we round the standard deviation to two decimal places by using round(std_dev, 2).

sum_sq_dev <- ((39-39.6)^2 + (52-39.6)^2 + (28-39.6)^2 + (33-39.6)^2 + (46-39.6)^2)

variance <- sum_sq_dev / (5-1)

std_dev <- sqrt(variance)

round(std_dev, 2)
## [1] 9.66

As you can see from the output, R tells us that the standard deviation for age is 9.66. Roughly speaking, this means that the scores typically deviate from the mean by about 9.66.

We can also manually calculate the standard deviation of age by using just a single line of code!

sqrt(sum((age-mean(age))^2/(length(age)-1)))
## [1] 9.659193

As we can see, this line of code gives us the same result as the former procedure (before rounding). The sqrt() function surrounds the whole line of code. However, the code also uses some unfamiliar functions like sum() and mean(). We will learn more about these functions later, but in this particular case mean() gives the mean of age and sum() adds up the squared deviations from that mean. The squared deviations are divided by length(age)-1, where length() is the function we learned about previously. Remember that length() tells us the number of elements (or observations if you will) of a vector. Since we know the length of age is 5 (5 observations), n-1 for age is 4, and alternatively you could also write:

sqrt(sum((age-mean(age))^2)/4)
## [1] 9.659193
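As a cross-check, base R also ships a built-in sd() function that computes the sample standard deviation with the same n-1 denominator. The sketch below compares it to the manual calculation; the object names manual_sd and builtin_sd are chosen here only for illustration.

```r
age <- c(39, 52, 28, 33, 46)

# Manual calculation, following the six steps described earlier
manual_sd <- sqrt(sum((age - mean(age))^2) / (length(age) - 1))

# Built-in function from base R
builtin_sd <- sd(age)

round(manual_sd, 2)
## [1] 9.66
round(builtin_sd, 2)
## [1] 9.66

# all.equal() compares the two numbers while tolerating tiny rounding differences
all.equal(manual_sd, builtin_sd)
## [1] TRUE
```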

Exercises

1. a) Assign a vector named test_scores five different values: 50, 32, 44, 46 and 19.

test_scores <- c(50,32,44,46,19)
"I think the c() function may be useful"

test_scores <- c(..., ...)

b) Calculate the mean of test_scores.

(50+32+44+46+19) / 5
"Calculating mean: with parentheses you get further ahead"
"Replace the dotted lines with the correct numbers to solve the exercise:"

(... + ... + ... + ... + ...) / ...

2. Compute the standard deviation of test_scores by following the four-line example code described above. Start each step on a new line. (PS: give the objects the same names as in the example above, and remember to round the result to two decimal places at the end!).

sum_sq_dev <- ((50-38.2)^2 + (32-38.2)^2 + (44-38.2)^2 + (46-38.2)^2 + (19-38.2)^2)
variance <- sum_sq_dev / (5-1)
std_dev <- sqrt(variance)
round(std_dev, 2)
"The first step is calculating the sum of the squared deviations. Replace the dotted line with the correct mathematical operations to finish this part"

sum_sq_dev <- (...)    
"The next step is calculating the variance. Replace the dotted line with the correct code for calculating the variance to solve this step."

sum_sq_dev <- ((50-38.2)^2 + (32-38.2)^2 + (44-38.2)^2 + (46-38.2)^2 + (19-38.2)^2)

variance <- ...
"The third step is calculating the standard deviation. Replace the dotted line with the correct code for calculating the standard deviation to solve this step."

sum_sq_dev <- ((50-38.2)^2 + (32-38.2)^2 + (44-38.2)^2 + (46-38.2)^2 + (19-38.2)^2)
variance <- sum_sq_dev / (5-1)

std_dev <- ...
"The final step is to round the standard deviation to two decimal places. Replace the dotted line with the correct code for rounding to solve this step, and complete the exercise."

sum_sq_dev <- ((50-38.2)^2 + (32-38.2)^2 + (44-38.2)^2 + (46-38.2)^2 + (19-38.2)^2)
variance <- sum_sq_dev / (5-1)
std_dev <- sqrt(variance)

round(..., ...)

3.2 Different types of vectors

Basically a vector is a variable containing multiple values of the same data type. Going back to our conceptual example, the vector hours_per_week is a “numeric” vector, and consists of only numbers: the amount of hours worked per week. However, there are also other data types, and the most common ones are numeric, character and factor. These are the ones we will focus on.

Character vectors

In our conceptual example, the employer wants to have a register of hours worked per week so the paychecks are correctly paid out. However, for this register to work properly, it would be useful if the employees’ working hours were associated with ids for the employees. We can create a vector called id for this purpose:

id <- c("employee1", "employee2", "employee3", "employee4", "employee5", "employee6", "employee7", "employee8", "employee9")
id
## [1] "employee1" "employee2" "employee3" "employee4" "employee5" "employee6"
## [7] "employee7" "employee8" "employee9"

As you can see, each value in this vector combines text and a number. What kind of data type does the vector id consist of? We can ask R by using the class() function:

class(id)
## [1] "character"

R tells us that the vector id is of class character. To create a vector holding the “character” data type, the values have to be enclosed in quotes (double quotes are the usual convention), as you can see from the example code where we created the variable id.
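The class() function also shows what happens when we try to mix data types in one vector. Because a vector can only hold a single type, R converts the values to a common type. A minimal sketch, where the object name mixed is chosen only for illustration:

```r
# Mixing a number and text: R converts everything to character
mixed <- c(1, "employee1")
mixed
## [1] "1"         "employee1"
class(mixed)
## [1] "character"

# A purely numeric vector keeps its numeric class
class(c(37, 18, 26))
## [1] "numeric"
```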

Factor vectors

Besides numeric and character data types, we also have factors. Factors are used in R to represent categorical variables, i.e. variables which have a known and fixed set of values. We will learn more about the use of factors later. For now, we’ll just introduce the data type and learn how to create a factor vector.

The employer in our conceptual example also wanted to add information about which department the employees work in. There are nine employees and three departments, so we start off by making a vector indicating which of the departments the employees work in. Then we turn department into a factor by using the factor() function, and finally we print the factor by typing its name.

department <- c(1,2,2,3,1,1,3,2,3)
department <- factor(department)
department
## [1] 1 2 2 3 1 1 3 2 3
## Levels: 1 2 3

As you can see, the factor department consists of nine values and three levels corresponding to the three different departments. Let’s say the company is planning to hire some new employees and establish a fourth department. Even though there are currently no employees working in department four, we can specify the levels to include the fourth department:

department <- factor(department, levels = c(1,2,3,4))
department
## [1] 1 2 2 3 1 1 3 2 3
## Levels: 1 2 3 4

From the output we can see that the factor now has four levels. Further, we can also label the levels so that they are more informative:

department <- factor(department, levels = c(1,2,3,4), labels = c("dep_1", "dep_2", "dep_3", "dep_4"))
department
## [1] dep_1 dep_2 dep_2 dep_3 dep_1 dep_1 dep_3 dep_2 dep_3
## Levels: dep_1 dep_2 dep_3 dep_4

(It is important to mention that only numeric data types can be used for mathematical operations. Character vectors cannot be used in arithmetic operations. Factors cannot be used directly in arithmetic operations either, even when their levels look like numbers: R returns NA with a warning, so a factor has to be converted back to numbers first.)
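It is worth checking how a factor actually behaves in arithmetic. The sketch below rebuilds the unlabelled department factor and tries to add 1 to it; the name dep_numbers is chosen here only for illustration, and suppressWarnings() is used just to keep the output tidy.

```r
department <- factor(c(1, 2, 2, 3, 1, 1, 3, 2, 3))

# Arithmetic on a factor is not meaningful: R warns and returns only NA values
suppressWarnings(department + 1)
## [1] NA NA NA NA NA NA NA NA NA

# To recover the original numbers, convert to character first, then to numeric
dep_numbers <- as.numeric(as.character(department))
dep_numbers + 1
## [1] 2 3 3 4 2 2 4 3 4
```

Going through as.character() matters: calling as.numeric() directly on a factor returns the internal level codes, which only happen to match the original numbers when the levels are 1, 2, 3 and so on.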

Help documentation

To see the help-pages for the function factor(), just run the following code and the help-page will open in your browser.

help(factor)

Exercises

1.

a) Create a vector called sick_days with 12 values, one for each month in 2020. The values for each month, starting from January, are the following: 6.2, 7.5, 9.0, 5.6, 4.1, 6.3, 2.3, 4.2, 3.8, 7.9, 8.1, 5.2.

sick_days <- c(6.2, 7.5, 9.0, 5.6, 4.1, 6.3, 2.3, 4.2, 3.8, 7.9, 8.1, 5.2)
sick_days <- c(..., ...)
"You can copy and paste all the numbers from the text."

b) Create a vector called month with 12 values (1-12), one for each month in a year. (Hint: remember the : operator!)

month <- 1:12
"For more efficient code, remember the : operator"
1:12
"But you can also use seq() function or write all the numbers using c()."

c) Convert month to a factor variable and label the levels the different names of each month (“January”, “February” etc..). Assign the result to a new object month_fac.

TIP: R has the month names built-in as month.name, so you do not have to type them all. So, if you write month.name and click “Run Code”, you can copy-paste it into your code.

month_fac ___ factor(month, ___ = seq(___, ___, by = ___), labels = c("___", "___", "___", "April", "May", "June", "July", "August", "September", "October", "November", "December"))
month_fac <- factor(month, 
                    levels = seq(1, 12, by = 1),
                    labels = c("January", "February", "March", "April", "May", 
                               "June", "July", "August", "September", "October", "November", "December"))
"The factor() function combined with the assignment operator, <-, will get you further"
"Assignment, factor, levels and labels"

month_fac <- factor(month, ...)
month_fac <- factor(month, levels = c(...), labels = c(...))
"You can actually also use these objects directly in the code:"
month_fac <- factor(month, levels=month, labels = month.name)
"The results are the same."

4 Data frames

As we mentioned in the first session, a data frame is a two-dimensional table structure which is used to hold the data values of multiple vectors. It is a special case of a list where each component (e.g. a vector) has equal length. Each column in the data frame contains the values of one variable, and each row contains the values of one observation across all columns. Different data types can be stored in different columns of a data frame, for example numeric data, character data and/or factor data. To be more specific, in R, a data frame consists of one or more vectors of equal length.

In this chapter, you’ll learn the basics of the data frame structure in R. But first, we’ll learn how to create a data frame by following our conceptual example of the employer who wanted to create a register of the employees’ working hours to make sure the paychecks are paid out correctly.

4.1 Creating a data frame

Even though we have now created the vectors with the information we need in our conceptual example (hours_per_week, id, department), the vectors are still not associated with each other. For the employer to be able to pay out the correct paycheck to each employee, we have to know which number of hours corresponds to which employee. Essentially, what we want is to create a table with three columns: id, hours_per_week and department. By using the data.frame() function we are able to do just that. Remember that a data frame basically is a structure consisting of one or more vectors of equal length.

pay <- data.frame(id, hours_per_week, department)
pay

That looks exactly like we wanted! We now have a table structure with three columns corresponding to the three vectors, and nine rows which correspond to the nine employees.
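To confirm the structure of the new data frame, we can ask R for its column names, its number of rows and columns, and a compact overview. The sketch below rebuilds the three vectors so that it runs on its own; paste0() is used here only as a shortcut for typing the nine ids, and names(), nrow(), ncol() and str() are base R functions not yet covered in this session.

```r
id <- paste0("employee", 1:9)   # shortcut for "employee1", ..., "employee9"
hours_per_week <- c(37, 18, 26, 42, 15, 12, 28, 30, 33)
department <- factor(c(1, 2, 2, 3, 1, 1, 3, 2, 3),
                     levels = 1:4,
                     labels = c("dep_1", "dep_2", "dep_3", "dep_4"))

pay <- data.frame(id, hours_per_week, department)

names(pay)   # column names
nrow(pay)    # number of rows (one per employee)
ncol(pay)    # number of columns (one per vector)
str(pay)     # compact overview: one line per column with its data type
```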

Help documentation

To see the help-pages for the function data.frame(), just run the following code and the help-page will open in your browser.

help(data.frame)

4.1.1 Exercises

1. Write the correct code to create a data frame called sick_2020 which consists of the vectors month and sick_days that you created in the previous exercise, then print the data frame by typing its name in a new line of code, and hit the “run code” button before you submit your answer.

sick_2020 <- data.frame(month, sick_days)
sick_2020
"Remember to use the data.frame() function"
sick_2020 <- data.frame(..., ...)
sick_2020 <- data.frame(month, sick_days)
"Type in the name of the data frame to print the data frame:"

sick_2020 <- data.frame(month, sick_days)

sick_2020

4.2 Obtaining information of a data frame

There are several ways to obtain information on a data frame or the elements inside it. In addition to the functions head(), glimpse() and view(), which give information on the complete dataset and which we covered in the first session (Essentials in R), we can also get information on the different elements that a data frame contains. In this section, we’ll show you how to extract information on elements (i.e. variables or observations) inside a data frame. But first, we’ll introduce you to the dim() function.

From here on, we will use the dataset issp to illustrate the different commands for extracting information from a data frame. The dataset is loaded in your workspace.

The dim() function

Remember how we used the function length() for counting the elements stored in a vector? The dim() function basically does the same for a data frame. However, compared to the single number we obtained when using the length() function on a vector, the dim() function provides us with two numbers. This is because data frames are two-dimensional data structures, consisting of rows and columns, while vectors are one-dimensional and only have a length. The first number in the output refers to the number of rows (observations) in the dataset, while the second number refers to the number of columns (vectors/variables). Simply put, the dim() function gives us information on the dimensions of a data frame.
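The difference between length() and dim() is easiest to see on a small example. The sketch below uses a made-up data frame called toy, built from two hypothetical vectors, rather than the issp dataset.

```r
# Hypothetical vectors, used only for illustration
label <- c("a", "b", "c")
score <- c(10, 20, 30)
toy <- data.frame(label, score)

length(score)   # a vector has a single length
## [1] 3
dim(toy)        # a data frame has rows first, then columns
## [1] 3 2
```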

4.2.1 Exercises

1. Use the code chunk below to write the correct code for obtaining information about the numbers of rows and columns in the issp dataset. Press the “run code” button before submitting your answer so you’ll be able to see the output.

dim(issp)
"Replace the dotted lines with the name of the dataset:"

dim(...)

4.3 Indexing

In our first tutorial session ‘Essentials in R’, we learned how to obtain information of a complete dataset. However, what if we wanted to extract information about a specific variable or row in a given dataset? In this part we’ll teach you how you can extract information about specific variables, rows or a combination of the two. We’ll be using the issp dataset to illustrate the examples.

Let’s say we wanted to extract information on the age variable in the issp dataset. Considering that R may have several datasets loaded in the workspace at any given time, we have to tell R which dataset and which variable we want information on. To obtain information on the variable age, we can use the $ operator to reference the dataset and the variable of interest:

issp$age
##    [1] 31 71 68 71 59 78 64 50 52 56 61 52 75 66 49 29 54 40 58 72 46 20 66 67
##   [25] 42 65 37 29 58 56 53 45 58 24 57 51 63 68 27 73 47 18 32 27 32 41 44 28
##   [49] 54 68 75 34 77 20 47 54 77 34 49 53 64 46 26 71 34 53 59 26 70 78 54 74
##   [73] 60 53 31 24 34 54 79 58 75 50 51 42 43 36 39 47 28 66 75 39 32 35 69 19
##   [97] 50 32 57 68 30 62 59 61 20 63 52 68 68 27 70 41 62 34 65 53 50 74 44 42
##  [121] 50 27 23 32 40 70 21 68 41 20 67 47 65 56 52 66 59 77 73 48 59 48 42 63
##  [145] 49 68 37 70 77 22 24 66 72 71 34 24 64 61 68 36 66 37 22 75 31 53 40 30
##  [169] 68 20 59 59 67 54 79 58 67 23 53 48 36 59 27 40 76 31 32 53 18 34 71 73
##  [193] 38 46 63 55 71 48 38 78 18 61 72 64 73 66 22 33 49 71 30 53 34 41 48 42
##  [217] 62 63 73 60 56 76 32 27 51 67 24 34 53 75 51 58 48 48 52 24 58 21 31 18
##  [241] 55 59 21 50 62 60 67 30 25 62 23 60 72 67 21 68 39 46 56 18 23 34 49 23
##  [265] 53 25 53 57 64 65 73 67 70 69 74 66 77 45 37 59 60 52 30 38 66 35 65 53
##  [289] 20 61 53 29 51 75 36 63 43 45 25 66 70 27 71 60 68 57 64 26 47 51 55 63
##  [313] 44 50 53 69 49 67 18 44 57 31 69 62 67 73 32 76 34 65 46 25 67 30 62 55
##  [337] 73 49 27 56 23 70 73 39 57 71 31 50 74 65 41 36 34 72 53 70 62 44 27 78
##  [361] 57 31 38 45 46 51 64 65 22 43 30 37 72 57 35 50 60 23 38 72 39 55 64 73
##  [385] 72 43 25 58 35 70 43 45 76 59 57 50 46 46 26 52 78 52 64 74 55 29 50 46
##  [409] 51 53 48 23 55 48 37 24 65 53 37 55 60 34 57 63 38 28 30 30 22 68 71 60
##  [433] 36 66 75 68 50 26 42 54 35 64 71 57 56 38 73 23 76 57 74 40 55 33 50 64
##  [457] 62 33 51 76 53 72 18 25 40 25 53 54 52 76 58 44 32 39 54 57 18 71 51 56
##  [481] 74 35 54 22 64 38 25 20 50 27 58 38 29 50 57 32 49 31 46 25 78 47 55 68
##  [505] 60 58 55 68 21 53 47 67 70 22 54 48 26 50 47 57 76 32 49 34 45 54 43 36
##  [529] 41 44 74 33 39 19 77 46 34 48 70 55 59 59 71 34 48 60 23 69 30 48 79 72
##  [553] 64 41 37 40 57 70 66 20 20 59 53 48 78 69 22 62 68 39 68 19 25 24 60 44
##  [577] 48 27 74 39 73 39 50 49 33 37 20 37 32 58 61 29 48 45 63 58 47 79 21 45
##  [601] 55 53 47 50 20 79 64 69 55 75 21 37 45 47 70 61 50 65 62 33 61 52 33 52
##  [625] 45 65 60 41 30 59 74 71 71 32 72 21 53 41 30 60 69 56 18 59 20 22 54 70
##  [649] 54 49 30 66 42 75 45 43 28 32 64 46 28 56 50 48 54 50 61 65 43 52 38 61
##  [673] 65 72 61 64 49 39 55 44 18 60 37 52 62 18 53 58 56 75 32 36 35 72 60 33
##  [697] 48 56 32 74 72 63 48 54 61 73 77 21 45 56 70 63 30 29 48 71 78 36 57 56
##  [721] 68 58 51 52 19 37 37 44 32 78 64 68 58 63 78 42 72 37 27 25 68 26 60 63
##  [745] 28 67 69 18 61 22 58 37 31 61 50 38 47 47 73 63 31 64 57 73 29 50 41 38
##  [769] 31 35 53 47 20 71 74 38 55 70 48 45 57 46 35 70 68 51 30 21 63 52 67 39
##  [793] 73 64 59 77 47 36 NA 68 72 56 76 63 34 43 69 50 42 57 74 53 64 76 62 31
##  [817] 21 35 77 78 39 51 28 45 60 59 47 67 65 37 NA 76 31 43 65 61 66 31 71 60
##  [841] 76 25 54 73 67 33 44 20 51 54 65 36 59 57 38 43 21 77 18 79 52 19 68 63
##  [865] 42 22 57 66 79 54 51 45 40 50 69 65 46 40 37 61 77 65 50 51 73 27 22 21
##  [889] 49 51 48 58 77 21 50 51 63 36 55 37 34 28 79 66 57 62 51 51 56 53 47 20
##  [913] 49 71 63 68 45 31 64 50 59 41 45 55 48 49 53 29 75 70 56 40 46 23 47 21
##  [937] 43 58 46 46 62 41 25 54 43 43 52 33 71 60 76 31 45 35 42 39 37 65 42 77
##  [961] 48 58 44 47 18 65 45 25 71 50 28 77 60 35 79 31 33 27 43 55 20 62 40 53
##  [985] 40 66 48 55 22 76 23 70 32 71 65 70 71 35 65 72 50 64 48 54 48 74 19 23
## [1009] 64 41 22 53 78 53 37 48 58 31 44 73 56 25 68 66 66 52 74 66 51 47 27 54
## [1033] 46 51 63 59 46 35 55 42 66 49 54 63 69 49 56 51 32 60 30 58 54 54 70 68
## [1057] 74 73 50 46 29 26 77 67 74 74 57 42 27 24 19 62 61 68 61 22 44 64 54 39
## [1081] 19 68 76 25 52 70 70 73 53 39 59 21 41 33 62 69 25 34 36 61 66 75 63 68
## [1105] 22 50 42 47 73 46 69 64 72 42 54 39 40 37 30 71 56 52 51 74 55 19 72 74
## [1129] 37 65 26 46 53 39 58 74 52 46 64 18 68 35 31 25 50 32 55 60 73 58 35 54
## [1153] 69 50 33 32 47 51 59 32 18 31 34 46 55 62 65 33 76 58 24 28 37 60 43 20
## [1177] 47 52 43 49 47 18 38 21 49 79 49 65 62 63 26 68 64 48 76 46 65 55 46 70
## [1201] 47 74 54 69 55 65 46 70 25 60 69 75 33 53 19 30 68 74 68 70 24 35 68 61
## [1225] 74 35 49 38 52 23 64 64 40 68 56 73 70 56 61 40 18 51 39 20 20 42 52 69
## [1249] 52 56 51 71 77 67 37 29 44 51 72 67 44 25 56 48 44 53 34 28 60 49 37 62
## [1273] 52 29 27 57 52 77 65 54 76 33 58 53 48 56 49 56 75 49 58 29 75 59 58 32
## [1297] 57 39 76 57 64 73 64 41 74 56 70 30 53 37 19 56 27 65 63 42 27 35 47 20
## [1321] 73 68 50

In the code above we first type the name of the dataset, then the $ followed by the specific variable of interest. As you can see, the output is pretty overwhelming because of the large number of observations (N=1323) in this dataset, so the $ operator is rarely used by itself like this. However, it is very useful in combination with other functions to extract information.

Remember how we used indexing on a vector? This also works when operating on a data frame structure. Let’s say we want to know how many respondents in the issp dataset are less than 30 years old. Combining the $ operator with indexing and logical operators, we can extract the matching values from the age variable:

issp$age[issp$age<30]
##   [1] 29 20 29 24 27 18 27 28 20 26 26 24 28 19 20 27 27 23 21 20 22 24 24 22 20
##  [26] 23 27 18 18 22 27 24 24 21 18 21 25 23 21 18 23 23 25 20 29 25 27 26 18 25
##  [51] 27 23 27 22 23 25 26 29 23 24 28 22 26 23 18 25 25 18 22 25 20 27 29 25 21
##  [76] 22 26 19 23 20 20 22 19 25 24 27 20 29 21 20 21 21 18 20 22 28 28 18 18 21
## [101] 29 19 27 25 26 28 18 22 29 20 21 NA 21 28 NA 25 20 21 18 19 22 27 22 21 21
## [126] 28 20 29 23 21 25 18 25 28 27 20 22 23 19 23 22 25 27 29 26 27 24 19 22 19
## [151] 25 21 25 22 19 26 18 25 18 24 28 20 18 21 26 25 19 24 23 18 20 20 29 25 28
## [176] 29 27 29 19 27 27 20
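The output above lists the matching ages rather than a single count. As a minimal sketch of how a count could be obtained, the example below uses a small made-up vector (toy_age is hypothetical and not part of the issp dataset) and assumes there are no missing values:

```r
# Hypothetical toy vector standing in for issp$age (no missing values)
toy_age <- c(29, 45, 18, 62, 30, 27)

# The comparison returns one TRUE/FALSE per element
under_30 <- toy_age < 30

# Option 1: count the elements that survive the indexing
n_by_length <- length(toy_age[under_30])

# Option 2: sum the logical vector (TRUE counts as 1, FALSE as 0)
n_by_sum <- sum(under_30)

n_by_length
n_by_sum
```

Both approaches return 3 here, because 29, 18 and 27 are the only values below 30.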

Further, we could also extract the ages of respondents who are younger than 30 and older than 20:

issp$age[issp$age<30 & issp$age>20]
##   [1] 29 29 24 27 27 28 26 26 24 28 27 27 23 21 22 24 24 22 23 27 22 27 24 24 21
##  [26] 21 25 23 21 23 23 25 29 25 27 26 25 27 23 27 22 23 25 26 29 23 24 28 22 26
##  [51] 23 25 25 22 25 27 29 25 21 22 26 23 22 25 24 27 29 21 21 21 22 28 28 21 29
##  [76] 27 25 26 28 22 29 21 NA 21 28 NA 25 21 22 27 22 21 21 28 29 23 21 25 25 28
## [101] 27 22 23 23 22 25 27 29 26 27 24 22 25 21 25 22 26 25 24 28 21 26 25 24 23
## [126] 29 25 28 29 27 29 27 27
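To see how the & operator combines the two conditions, it may help to store each comparison separately. The sketch below again uses a made-up vector (toy_age and the other object names are hypothetical):

```r
# Hypothetical toy vector standing in for issp$age (no missing values)
toy_age <- c(29, 20, 45, 24, 35, 21)

# Each comparison is evaluated element by element
younger_than_30 <- toy_age < 30   # TRUE TRUE FALSE TRUE FALSE TRUE
older_than_20   <- toy_age > 20   # TRUE FALSE TRUE TRUE TRUE TRUE

# & is TRUE only where both conditions are TRUE
both <- younger_than_30 & older_than_20

# Indexing with the combined logical vector keeps the matching values
in_twenties <- toy_age[both]
in_twenties
```

Only 29, 24 and 21 satisfy both conditions, so only those values are returned. The value 20 is dropped because it is not strictly greater than 20.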

Or what if we wanted to see the first two rows of the dataset, with all variables included? Leaving the position after the comma empty returns every column:

issp[1:2, ]

We can also extract specific rows and columns. The code below returns the values in the first three rows of the third and fourth variables:

issp[1:3, c(3,4)]

As you may have noticed, the first set of numbers (1:3) refers to the rows, while the second set (c(3,4)) refers to the columns.
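The row and column positions can be tried out on a small data frame. The following is only a sketch: the data frame toy and its variables are invented for illustration and do not come from the issp dataset.

```r
# Hypothetical data frame with four rows and four variables
toy <- data.frame(
  id     = 1:4,
  age    = c(34, 51, 27, 68),
  gender = c("Female", "Male", "Female", "Male"),
  region = c("North", "South", "East", "West")
)

# Rows 1 to 2, all columns (nothing after the comma)
first_rows <- toy[1:2, ]

# Rows 1 to 3 of the third and fourth columns, selected by position
by_position <- toy[1:3, c(3, 4)]

# The same selection, using the column names instead of positions
by_name <- toy[1:3, c("gender", "region")]

first_rows
by_position
```

Selecting columns by name gives the same result as selecting by position here, and is less likely to break if the column order changes.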

NA and other special values

NA

The keen observer may have noticed that some entries in the age variable have a strange value called NA. NA stands for ‘not available’ and marks a missing value: a value was supposed to be stored, but it is unknown. Missing data occur all the time, and you will see them a lot when you work on your own projects. It is important to note that when you use summary functions for descriptive statistics, such as mean(), you have to specify that NA values should be removed from the calculation. If you do not, the result will itself be NA. Take this example:

mean(issp$age)
## [1] NA

The reason we get NA as output from the mean() function is that whenever the vector (age) contains unknown values, the mean of the vector is also unknown. If we want to check whether there are missing values in a variable, we can use the is.na() function, which returns TRUE (the value is NA) or FALSE (the value is not NA) for each element:

is.na(issp$age)
##    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##   [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [157] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [169] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [205] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [217] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [229] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [253] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [265] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [277] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [301] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [313] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [325] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [349] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [361] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [373] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [397] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [409] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [421] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [433] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [445] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [457] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [469] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [481] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [493] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [505] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [517] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [529] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [541] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [553] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [565] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [577] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [589] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [601] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [613] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [625] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [637] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [649] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [661] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [673] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [685] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [697] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [709] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [721] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [733] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [745] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [757] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [769] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [781] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [793] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
##  [805] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [817] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [829] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [841] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [853] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [865] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [877] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [889] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [901] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [913] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [925] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [937] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [949] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [961] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [973] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [985] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [997] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1009] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1021] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1033] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1045] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1057] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1069] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1081] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1093] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1105] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1117] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1141] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1153] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1165] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1189] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1201] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1213] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1237] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1249] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1261] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1285] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1297] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1309] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [1321] FALSE FALSE FALSE

As we can see from the output, almost everything is FALSE, but in the output rows labelled [793] and [829] we can observe the value TRUE (at positions 799 and 831 of the vector). To get a meaningful result from the mean() function, we therefore have to specify that we want NAs to be removed before R computes the mean:

mean(issp$age, na.rm = TRUE)
## [1] 50.52233

We now get the mean of age in the output. na.rm is a logical argument that tells the function whether or not to remove NA values before the calculation; the name is short for ‘NA remove’. If the argument is set to FALSE, which is the default for mean(), NAs will not be removed. More about descriptive statistics and summary functions will follow in later sessions. In the next session, you will learn how to recode NA values.
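Scanning a long is.na() output by eye is error-prone. As a hedged sketch, the functions sum() and which() can summarise it instead; the vector toy_age below is made up for illustration:

```r
# Hypothetical toy vector with two missing values
toy_age <- c(40, NA, 22, 58, NA, 30)

# Number of missing values: TRUE counts as 1 when summed
n_missing <- sum(is.na(toy_age))

# Positions of the missing values in the vector
missing_at <- which(is.na(toy_age))

# Without na.rm the mean is NA; with na.rm = TRUE the NAs are dropped first
mean_with_na <- mean(toy_age)
mean_age     <- mean(toy_age, na.rm = TRUE)

n_missing
missing_at
mean_age
```

Here n_missing is 2, missing_at is 2 and 5, and mean_age is 37.5, the mean of the four known values 40, 22, 58 and 30.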

Other special values: NaN and Inf

In addition to NA, you may also come across other special values such as NaN and Inf. These are not as common as NA, but they are worth a brief mention. NaN means ‘Not a Number’ and indicates that the result of a calculation is not mathematically defined. For example, 0/0 returns NaN:

0/0
## [1] NaN

Inf, on the other hand, represents infinity. If you divide a positive number by 0, R returns the value Inf:

436/0
## [1] Inf
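Base R also provides functions to test for these special values. The sketch below uses a made-up vector (values is a hypothetical name) to show how is.na(), is.nan() and is.infinite() differ:

```r
# Dividing a negative number by 0 gives negative infinity
neg_inf <- -436 / 0

# Hypothetical vector mixing an ordinary number with the special values
values <- c(1, NA, 0 / 0, 1 / 0, neg_inf)

# is.na() is TRUE for both NA and NaN
na_check <- is.na(values)

# is.nan() is TRUE only for NaN
nan_check <- is.nan(values)

# is.infinite() is TRUE for Inf and -Inf
inf_check <- is.infinite(values)

na_check
nan_check
inf_check
```

Note that is.na() also flags NaN, so is.nan() is the function to use when the two need to be told apart.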

4.3.1 Exercises

1. In the issp dataset, the variable gender refers to the gender of the respondents where 1 = “Male” and 2 = “Female”. Modify the code below to find out how many respondents are male.

issp$gender[issp$gender=="Female"]
issp$gender[issp$gender=="Male"]

2. How many respondents in the issp dataset are older than 70 years old? Write the correct code to obtain this information.

issp$age[issp$age>70]
Hint: Replace the dotted lines with the correct logical operator followed by the correct number.

issp$age[issp$age...]

3. Write the correct code to print the first four rows of the second and third column in the issp dataset.

issp[1:4, c(2,3)]
Hint: Replace the dotted lines with the specified rows and the specified columns. First the rows, then the columns.

issp[..., c(...)]

4. Challenge! In exercise 9, you created a data frame called sick_2020 with the variables sick_days and month. By using this dataset…

a) …How would you extract the number of sick days for the month of October? Write the correct code to obtain this information.

sick_2020$sick_days[sick_2020$month=="October"]
Hint: You are interested in the number of sick_days for the month of October. Using indexing, first type sick_days, then month inside the square brackets.

sick_2020$...[sick_2020$...]
sick_2020$sick_days[sick_2020$month...]
Hint: Remember the equality operator: ==

b) …How would you extract the months where the number of sick days was less than 6?

sick_2020$month[sick_2020$sick_days<6]
Hint: Using indexing, first type month, then sick_days inside the square brackets, followed by the correct logical operator and condition.

sick_2020$...[sick_2020$...]
Hint: Remember the less-than operator: <

Published 1 September 2021.
Revised: 12 September, 2021