Descriptive tables

1 Simple table

In this section, we will create publication-ready tables, moving from relatively simple tables to more complicated setups. The package gt provides a “grammar of tables”. The package gtsummary is a wrapper for gt which makes it easier to create summary tables from data. These packages fit seamlessly into a tidyverse workflow, and you will need to use functions you have learned in the previous chapters.

Documentation and vignettes for gtsummary are available at the author’s homepage: http://www.danieldsjoberg.com/gtsummary/

We will work with data from the International Social Survey Programme, http://w.issp.org/menu-top/home/

1.1 Look at the data

The packages tidyverse and gtsummary are loaded in your workspace along with the dataset issp.

Get familiar with the structure of the dataset using head(), glimpse() and summary(). Replace the dotted lines with the name of the dataset.

The dataset contains just a few variables:

kjonn - the sex of the respondent
religion_viktig - to what extent the respondent thinks religion is important
boerbuti - the respondent’s opinion about how much shop employees should earn (in NOK)
boerlege - the respondent’s opinion about how much medical doctors should earn (in NOK)

head(...)

head(issp)

"Fill in the dataset name for ..."
glimpse(issp)

summary(issp)

1.2 Make a simple table

Use the issp dataset and select only the variable religion_viktig and then make a table with descriptive statistics using tbl_summary(). Since the select()-statement only keeps one variable, you do not need to specify variables in tbl_summary(). Save the table to an object with name tab_religion, and print that object to see the results.

... <- issp %>% 
  select(religion_viktig) ... 
  ... 

tab_religion

'Did you add a new line with tbl_summary()?'

'Have you remembered to use pipes? '

'Have you stored the results in an object denoted *tab_religion*? And printed it afterwards?'

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary()

tab_religion

2 Modify how the table looks

By default, the table shows the variable name, which may or may not give a hint of what kind of information the variable contains. Sometimes the variable name is a bit cryptic, and you would typically like a label that is more meaningful. We can do this within tbl_summary() by specifying the label = argument as follows:

tbl_summary(label = VariableName ~ "yourLabel")

Thus, the label is specified with the variable name first, followed by ~ and then the text you would like to show. Remember to put the text in quotes.
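When a table contains several variables, the label = argument can also take a list of such formulas, one per variable. The following is a minimal sketch of that pattern. It assumes the trial dataset that ships with gtsummary as a stand-in for issp, and the label texts are made up for illustration.

```r
library(dplyr)
library(gtsummary)

# `trial` is an example dataset bundled with gtsummary. It is used here
# only so that the sketch runs outside the tutorial; it is not the issp data.
tab_labels <- trial %>%
  select(age, grade) %>%
  tbl_summary(label = list(age ~ "Age (years)",      # illustrative label text
                           grade ~ "Tumour grade"))  # illustrative label text

tab_labels
```

Each formula follows the same VariableName ~ "yourLabel" form as above; variables that are not mentioned keep their default label.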

For these data, the respondents were asked the following question: “How important is religion in getting ahead?”, which they answered on a Likert scale from 1 to 5. The data have here been re-coded to a dummy variable so that 0 = not important (<4 on the original scale) and 1 = important (4 or 5 on the original scale).

Make the table again, but modify the code to add a label which gives a better description. Use the question the respondents were asked as the label.

Replace the ___ with the correct code, store the results in an object named tab_religion and print the results.

tab_religion <- issp %>% 
  ...(religion_viktig) %>% 
  tbl_summary(___ = religion_viktig ~ "___") 

___

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  

tab_religion

All columns have a header, and the automatically selected header might not be to your liking, so you may want to change it. To do so, add another line (remember to use the pipe operator, %>%) and use the modify_header() function.

You do this by referring to the names of the headers and set a new text for each. These names in the table are fixed, and the first is named label and the first column with figures is named stat_0. To check the names of a given table, use show_header_names() as follows:

tab <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  

tab %>% 
  show_header_names()

## ℹ As a usage guide, the code below re-creates the current column headers.

## modify_header(update = list(
##   label ~ '**Characteristic**',
##   stat_0 ~ '**N = 1,323**'
## ))

## 
## 
## Column Name   Column Header      
## ------------  -------------------
## label         **Characteristic** 
## stat_0        **N = 1,323**

You can now update the headers in your table by specifying a list of the headers you would like to change. If you would like the new header to be in bold font, add two asterisks (**) on each side of the label. The following example updates the two column headers with new text.

tab <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  

tab %>% 
  modify_header(update = list(
    label ~ "**Variable name**",
    stat_0 ~ "**N**")
    )

Variable name	N¹
How important is religion in getting ahead?
Not important	952 (72%)
Important	220 (17%)
Do not know	151 (11%)
¹ n (%)

A couple of details for you to try out are:

To make the text bold, add ** before and after the word(s) within the quotation marks of the label.
To make the text italic, add * before and after the word(s) within the quotation marks of the label.
To remove the header, add a blank space: label ~ " "
Change only one of the column headers by removing the other from the code.

Now, let’s create the table again, but include such modifications. Replace the dotted lines with suitable code. Play around with this code and make modifications. This exercise will not be corrected.

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  %>% 
  modify_header(update = list(
    label ~ "...",
    stat_0 ~ "...")
    )  

tab_religion

Now that you have tried this out a bit, correct the following code so that the first column does not have a header and the second column has the header “**Count** (percent)”. (Yes, put only the word Count in bold, just like that.)

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  %>% 
  modify_header(update = list(
    label ~ "...",
    stat_0 ~ "...")
    )  

tab_religion

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(label = religion_viktig ~ "How important is religion in getting ahead?")  %>% 
  modify_header(update = list(
    label ~ "",
    stat_0 ~ "**Count** (percent)")
    )  
tab_religion

'To remove a header, just specify it as "". '

'You can make just a single word bold within the header'

The function tbl_summary() recognizes whether the variables are continuous or categorical. A factor variable will thus be interpreted as categorical, and the table will show frequencies and percentages, as these are typically the quantities of interest for such variables. This is also why the column header includes the total number of observations, and why a footnote indicates that the figures are counts with percentages in parentheses.

Similarly, if we had included a continuous variable, tbl_summary() would by default have returned the median and the interquartile range (first and third quartiles).

You might like to change that. Perhaps you would like only the raw counts for categorical variables, and only the mean for continuous variables. If so, you can specify the statistics for each variable type. We do so by providing a list of specifications, as this example shows:

issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(statistic = list(all_continuous() ~ "{mean}",
                               all_categorical() ~ "{n}"))

Characteristic	N = 1,323¹
religion_viktig
Not important	952
Important	220
Do not know	151
¹ n
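The placeholders in curly brackets can also be combined with ordinary text in one string, so that several statistics share a cell. The sketch below illustrates this under the assumption that the bundled trial dataset is an acceptable stand-in for issp (it contains both a continuous and a categorical variable); the particular patterns are just one possible choice.

```r
library(dplyr)
library(gtsummary)

# `trial` ships with gtsummary and is used here only as a stand-in dataset.
tab_stats <- trial %>%
  select(age, grade) %>%
  tbl_summary(
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",    # mean with standard deviation
      all_categorical() ~ "{n} / {N} ({p}%)"  # count, denominator and percent
    ),
    missing = "no"                            # do not show a row for missing values
  )

tab_stats
```

Everything outside the curly brackets, such as the parentheses and the percent sign, is printed as written.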

Now create the table again, but include only percentages by specifying p within the curly brackets. You must also change the column header to “Percent” to indicate that the figures are percentages.

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(statistic = list(all_categorical() ~ "{...}"))  %>% 
    modify_header(update = list(
      label ~ " ",
      stat_0 ~ "**...**")
    )  
...

tab_religion <- issp %>% 
  select(religion_viktig) %>% 
  tbl_summary(statistic = list(all_categorical() ~ "{p}"))  %>% 
    modify_header(update = list(
      label ~ " ",
      stat_0 ~ "**Percent**")
    )  
tab_religion

You might still not be quite happy with this table. You might find the footnote pointless in this case, and you might like to add a row at the bottom with the total number of observations. While this can be done in a number of ways, one convenient way, a bit quick-and-dirty but perhaps the easiest for such a simple table, is as follows:

Add a new variable to the dataset using mutate() and give it the value 1 for all cases. For ease, you might name that variable Total to avoid changing the label in the table later.
Include the total count for this variable in the table by specifying it in the list of statistics.

In the statistic = list, each variable can be specified separately. In this case, we would like the variable Total to show only the total N, specified as {N}, and religion_viktig to show percentages, specified as {p}. Fill in the code below to get it right.

issp %>% 
  select(religion_viktig) %>% 
  mutate(Total = 1) %>% 
  tbl_summary(statistic = list(Total ~ "{...}", 
                               religion_viktig ~ "{...}")
              ) %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**Percent**")
    )

issp %>% 
  select(religion_viktig) %>% 
  mutate(Total = 1) %>% 
  tbl_summary(statistic = list(Total ~ "{N}", 
                               religion_viktig ~ "{p}")
              ) %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**Percent**")
    )

The table is almost done, but the footnote is no longer needed, and the variable name does not need to be shown either. (If you write a report, you will add a table caption that explains what this table shows, and explain the variable shown). Let’s remove them. You remove the footnote by adding:

modify_footnote(update = everything() ~ NA)

which specifies that all footnotes are set to missing, NA.

You remove the variable name by removing that particular row. This is done by:

remove_row_type(religion_viktig, type = "header")

which specifies that the header-row of that variable is to be removed.

Replace the dotted lines in the code below with the two pieces of code shown above.

issp %>% 
  select(religion_viktig) %>% 
  mutate(Total = 1) %>% 
  tbl_summary(statistic = list(Total ~ "{...}", 
                               religion_viktig ~ "{...}")
              ) %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**Percent**")
    ) %>% 
  ...  %>% 
  ...

issp %>% 
  select(religion_viktig) %>% 
  mutate(Total = 1) %>% 
  tbl_summary(statistic = list(Total ~ "{N}", 
                               religion_viktig ~ "{p}")
              ) %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**Percent**")
    ) %>% 
  modify_footnote(update = everything() ~ NA)  %>% 
  remove_row_type(religion_viktig, type = "header")

3 Crosstabulations

The importance attached to religion might differ between men and women. In tbl_summary() you can specify a variable whose categories define the columns. Make the table again, but now split by kjonn by adding by = kjonn inside tbl_summary(). Save the result in an object called tab_cross.

tab_cross <- issp %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by = ... ) %>% 
  modify_header(label ~ "**Variable**")

...

tab_cross <- issp %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by=kjonn) %>% 
   modify_header(label ~ '**Variable**')
tab_cross

'Did you add kjonn to the select-statement? The variable must exist to work. '

Notice that the variable for sex only has the values 1 and 2, and these should be replaced by meaningful text. This can be done with gtsummary functions, but a simpler solution is to convert the variable to a factor with labels using mutate() (see the tutorials and exercises on creating factor variables in the earlier chapter on dplyr).
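As a reminder of how factor() attaches labels to numeric codes, here is a small base R sketch. The vector sex_codes is made up for illustration, and the coding 1 = female, 2 = male is an assumption that follows the order of the labels used in this chapter.

```r
# Hypothetical codes, assuming 1 = female and 2 = male.
sex_codes <- c(1, 2, 2, 1, 2)

# `levels` lists the codes as they occur in the data,
# `labels` gives the text for each code, in the same order.
sex_factor <- factor(sex_codes,
                     levels = c(1, 2),
                     labels = c("Female", "Male"))

sex_factor
table(sex_factor)
```

Stating levels explicitly is optional when the codes already sort in the intended order, but it makes the mapping between codes and labels easier to check.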

Make the table again, but add a mutate()-statement converting the variable kjonn to a factor with the labels “Female” and “Male”.

tab_cross <- issp %>% 
  mutate(kjonn = factor(..., labels = c(...))) %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by=kjonn) %>% 
   modify_header(label ~ '**Variable**')

tab_cross <- issp %>% 
  mutate(kjonn = factor(kjonn, labels = c("Female", "Male"))) %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by=kjonn) %>% 
   modify_header(label ~ '**Variable**')

tab_cross
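With a by = variable, the statistic columns are named stat_1, stat_2 and so on, and their headers can be rewritten in one step. The sketch below is one way to do this, assuming the bundled trial dataset as a stand-in for issp. It uses the selector all_stat_cols() and the header placeholders {level} and {n}, which gtsummary fills in with the name of each group and the number of observations in it.

```r
library(dplyr)
library(gtsummary)

# `trial` ships with gtsummary; `trt` (the treatment group) plays the role
# that kjonn plays in the exercise. This is a stand-in, not the issp data.
tab_by <- trial %>%
  select(trt, grade) %>%
  tbl_summary(by = trt) %>%
  modify_header(label ~ "**Variable**",
                all_stat_cols() ~ "**{level}**, N = {n}")

tab_by
```

Because the header text is generated from the data, it stays correct if the factor labels or the sample change.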

3.1 Adding margins

You might like to have the marginal summary as well. You can just add the function add_overall() at the end.

tab_cross <- issp %>% 
  mutate(kjonn = factor(kjonn, labels = c("Female", "Male"))) %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by=kjonn) %>% 
  modify_header(label ~ '**Variable**') %>% 
  ...  

...

tab_cross <- issp %>% 
  mutate(kjonn = factor(kjonn, labels = c("Female", "Male"))) %>% 
  select(religion_viktig, kjonn) %>% 
  tbl_summary(by=kjonn) %>% 
  modify_header(label ~ '**Variable**') %>% 
  add_overall()

tab_cross

'Have you replaced all the dots?'

'Did you remember to use pipes?'
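The overall column does not have to come first or keep its default header. As a hedged sketch, again assuming the bundled trial dataset in place of issp, the arguments last = and col_label = of add_overall() move the column to the right-hand side and set its header directly:

```r
library(dplyr)
library(gtsummary)

# `trial` ships with gtsummary and is used here only as a stand-in dataset.
tab_overall <- trial %>%
  select(trt, grade) %>%
  tbl_summary(by = trt) %>%
  add_overall(last = TRUE,           # place the overall column after the groups
              col_label = "**All**") # header text for the overall column

tab_overall
```

Setting the header here is an alternative to renaming stat_0 afterwards with modify_header().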

4 Larger tables

You would typically like to have one large table rather than many small ones. Drop the select()-statement to include all variables. Make a table by gender using the kjonn variable and include the overall summaries.

issp %>% 
  ...(by=kjonn) %>% 
  ...()

issp %>% 
  tbl_summary(by=kjonn) %>% 
  add_overall()

Now, let’s clean up this table as we did with the simpler tables.

Add a row for total by adding a new variable with mutate(), and modify the statistic and labels.
Use modify_header() to update the column headers, and remove the footnote using modify_footnote().

issp %>% 
  mutate(...) %>% 
  tbl_summary(by=kjonn, 
              statistic = list(Total ~ "{...}", 
                               all_categorical() ~ "{n}",
                               all_continuous() ~ "{mean}"),
              label = list(religion_viktig ~ "Religion important (n)", 
                           boerbuti ~ "Work in shops should earn (NKr)",
                           boerlege ~ "Doctors should earn (NKr)")
              ) %>% 
  add_overall() %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**All**",
      ...  ~ "**Women**",
      stat_2 ~ "..."
      )
    ) %>% 
  ... (update = everything() ~ NA)

issp %>% 
  mutate(Total = 1) %>% 
  tbl_summary(by=kjonn, 
              statistic = list(Total ~ "{N}", 
                               all_categorical() ~ "{n}",
                               all_continuous() ~ "{mean}"),
              label = list(religion_viktig ~ "Religion important (n)", 
                           boerbuti ~ "Work in shops should earn (NKr)",
                           boerlege ~ "Doctors should earn (NKr)")
              ) %>% 
  add_overall() %>% 
  modify_header(update = list(
      label ~ "",
      stat_0 ~ "**All**",
      stat_1 ~ "**Women**",
      stat_2 ~ "**Men**"
      )
    ) %>% 
  modify_footnote(update = everything() ~ NA)