Introduction to R and Quarto

Luong Nguyen Thanh // April 28-29, 2025
Basic R // GU Workshop 2025

Before we start…

Open the workshop website and make sure you’ve completed the pre-work and have the required software, packages, and exercises.

ntluong95.quarto.pub/gothenburg_ws2025/prework.

If you have not done it yet, you still have some times.

05:00

Learning objectives

Understand the fundamentals of R and its development environment.
Get introduced to Quarto and learn how to create dynamic, reproducible reports.
Convert a Quarto document into an interactive sharable report.
Be able to clean and transforming data with R

Schedule

Time	Topic
9:00 - 9:30	Welcome & Overview of R and Quarto
9:30 - 10:30	Basics functions used in importing and cleaning datasets
10:30 - 10:45	Break
10:45 - 12:00	Transforming data for analysis using participants’ examples
11:45 - 12:00	Lunch Break

Working with R

Install R alone would be enough for analysis
Working with R before was very difficult due to lacking of supporting features of the development environment like code completion, syntax highlighting, project management, etc.
RStudio IDE released in 2013 and Positron IDE released in 2024 makes working with R much easier

Working directory

A data analysis project must be organized in a folder structure
If you work with Rstudio, you should create a Rstudio project
It helps addressing a lot of stress regarding setwd(), getwd()
If you work with Positron, you don’t need to create a project
In STATA or SPSS, you work with .dta and .sav file, which is assumed as your project, but also a data file
For R and general programming language, a project and a data file is two different concepts

Import and export data

Use of the rio package to flexibly import() and export() many types of files
Use of the here package to locate files relative to an R project root - to prevent complications from file paths that are specific to one computer
Specific import scenarios, such as:
- Specific Excel sheets
- Messy headers and skipping rows
- From Google sheets
- From data posted to websites
- With APIs
- Importing the most recent file
Manual data entry
R-specific file types such as RDS and RData
Exporting/saving files and plots

Import and export data

import() function is versatile to import any kind of data, while here() function is used to construct the file path.
The use of here() is to create a relative file path, avoiding cross OS system, different computers problems.

here::here("data", "Linet_14.04.2025.dta")

[1] "D:/04 EXPERIMENTS/gothenburg_ws/data/Linet_14.04.2025.dta"

data <- import(here("data", "Linet_14.04.2025.dta"))

DON’T

data <- import("D:/04 EXPERIMENTS/gothenburg_ws/data/Linet_14.04.2025.dta")

R allows you to work with multple datasets at the same time
You can export safely your cleaned data into a .Rds file, share with colleagues and re-use but it is not a common practice
WHY? Because file size is usually large. In the era of big data, it is not optimal. Also having no cleaned data file forces you to write a reproducible code

Basic R syntax

Pipes

Used to chain (connect) several functions together
2014+ magrittr pipe %>%
2021+ (R \(\geq\) 4.1.0) native R pipe |>

do_something(arg1, arg2, arg3, ...)

arg1 |>
  do_something(arg2, arg3)

mean(0:10)

0:10 |>
  mean()

Keyboard shortcut to create pipe:

Windows: Ctrl + Shift + M

Mac: Cmd + Shift + M

Load packages

R provides built-in baseR functions, which does not require to load
Functions from external libraries needs to be loaded before hand
Classically, load single package with library(package_name) function
The modern syntax using pacman::p_load() function (pacman: package manager)
pacman will automatically install and load package

# Classic syntax
install.packages("tidyverse")
install.packages("dlnm")

library(tidyverse)
library(dlnm)

# Modern syntax
pacman::p_load(
  tidyverse,
  dlnm
)

Namespacing

package::function()

dplyr::select()

used when we just need to use 1 function from a package 1,2 times only
the :: notations means to access function() from the package
i.e., tells R explicitly to use the function select from the package dplyr
helps avoid name conflicts (e.g., MASS::select())
does not require to call library(dplyr)

Data wrangling with `tidyverse packages`

There are many functions and packages in R, you are not supposed to remember them all
Remember core packages of tidyverse meta-packages solves 80% of your daily data cleaning tasks
- dplyr: for transforming data. Key functions includs group_by(), select(), filter(), rename(), mutate(), arrange(), distinct()
- tidyr: for tidying data. Key functions includes drop_na(), pivot_longer(), pivot_wider()
- stringr: for working with string and characters. Key functions includes str_detect(), str_extract()
- forcats: for working with categorical variables (factor).
- lubridate: for working date variables
- ggplot2: for creating plots

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Quarto

Quarto is a publishing system that supports many multiple & outputs like docx, pdf, html, pptx
This website and slides are create with Quarto
It allows for open science and data communication
A simple Quarto document is a single .qmd file consists of:
- A YAML healding: always on top of the document
- Code blocks: any places in the document
- Text blocks (markdown): any places in the document
If you want to create a more complex output like a website, book, you need a Quarto project

Practice!!!

Create a folder dedicated for a data analysis project according to the recommended structure
Use your IDE of choice, either RStudio or Positron to start the project
Create a Quarto document
Start importing your data
Write code to explore your data and write your comments about them
Write to clean your data and write your comments on how data changed
Export your cleaned data for later use

Introduction to R and Quarto

Before we start…

Learning objectives

Schedule

Working with R

Working directory

Import and export data

Import and export data

Basic R syntax

Pipes

Load packages

Namespacing

Data wrangling with tidyverse packages

Quarto

Practice!!!

Data wrangling with `tidyverse packages`