Introduction to R (the Basics)

Udi Alter

2022-05-23

1 R and RStudio Installation

1.1 Step 1: Installing R (should be done before installing RStudio)

  • To install R, visit cran.r-project.org and click Download R for [your operating system]. For example, because I’m using an Apple computer, I clicked on Download R for macOS.

  • Then, under ‘Latest release’, click on the first/top one. For macOS users, this should be R-4.2.0.pkg (if you don’t have an Apple silicon chip; usually older, Intel-based computers) or R-4.2.0-arm64.pkg (if your Apple computer has an M1 or newer Apple silicon chip).

  • For Windows, you’ll need to click on ‘base’ (or ‘install R for the first time,’ same thing) to download the most recent version of R. To finish installing R on your computer, all that is left to do is to run the .exe file.

1.2 Step 2: Installing RStudio (should be done after installing R)

  • Once R is installed, you can proceed to install RStudio at www.rstudio.com/products/rstudio/download/#download. Under ‘Download RStudio Desktop’, click on ‘DOWNLOAD RSTUDIO FOR [your operating system]’.

  • For Windows: Run the .exe file to finish installing RStudio on your computer.

  • To access R from here on, you only need to open RStudio. Feel free to explore it, but don’t worry about this too much; I will explain RStudio when we first meet.

  • Here’s a brief video with R and RStudio installation instructions for your reference.

  • If you’re unable to install R and RStudio before class, you can use RStudio Cloud instead: rstudio.cloud. RStudio Cloud is a cloud-based solution that allows anyone to use R and RStudio without any prior installation. Signing up is free! After you log in, all you need to do is go to New Project > New RStudio Project. From then on, you can save, share, and download all R projects created in your workspace.

Note: If you run into any issues with the installation process or have any questions before we start, please feel free to e-mail me at .

2 Firing Up RStudio

2.0.1 Before a new data analysis:

  • Create a folder for each data analysis project.
  • Open RStudio.
  • Go to File > New Project > Existing Directory
  • Navigate to the folder you created
  • Click Create Project. A “.Rproj” file will be created in the folder. Next time, simply double click this file to open your project.
  • Go to File > New File > R Script and click OK.
  • Save the new R Script file.

3 Basic Operations

  • Hashtags (#) are used to write comments; R ignores everything that follows a # on the same line.
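
As a small illustration of commenting (the object name total is made up for this sketch), a comment can take a whole line or follow code on the same line:

```r
# A whole-line comment: R skips this line entirely.
total <- 1 + 2  # an inline comment: everything after the # is ignored
total
```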

3.1 Math Operations

1+2
[1] 3
2-3
[1] -1
5*6
[1] 30
1/6
[1] 0.1666667
3^2 # to the power of 2
[1] 9
(2+3)*5 #BEDMAS
[1] 25

3.2 Logical Operations

1>2 # greater than
[1] FALSE
2<17 # less than
[1] TRUE
2==7 # equals
[1] FALSE
2<=7 # less or equal to
[1] TRUE
7>=6 # greater or equal to
[1] TRUE
2*2 != 3 # does not equal
[1] TRUE
(2 < 7) | (3 > 7) # or: TRUE if at least one comparison is TRUE
[1] TRUE
(2 == 3) & (3 == 3) # and: TRUE only if both comparisons are TRUE
[1] FALSE
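
The operators `&` (and), `|` (or), and `!` (not) are usually applied to complete comparisons. The sketch below assumes a made-up value x, chosen only for illustration:

```r
x <- 5                            # hypothetical value

in_range <- (x > 1) & (x < 10)    # TRUE only when both comparisons are TRUE
either   <- (x < 1) | (x < 10)    # TRUE when at least one comparison is TRUE
not_big  <- !(x > 10)             # ! flips TRUE to FALSE and FALSE to TRUE

c(in_range, either, not_big)
```

Wrapping each comparison in parentheses is optional here, but it makes the intent easier to read.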

4 Functions

Functions are little programs. Almost everything we do in R requires functions. Functions can import data, manipulate/clean our data, and export our data to use elsewhere. They are the fundamental building blocks within R. We can use functions by typing them directly into the Console pane or into a script file. Here are a few examples of functions in R:
NOTE: spaces do not matter, but R is case sensitive

sqrt(9) # square root of 9
[1] 3
factorial(3) # 3 factorial (3!), i.e., 3*2*1
[1] 6
seq(1,5) # sequence of numbers from 1 to 5, i.e., 1, 2, 3, 4, 5
[1] 1 2 3 4 5
class(1)
[1] "numeric"
class(2>=3)
[1] "logical"
#' You can even combine functions:
seq(1,3)*factorial(2) # multiply each number in the sequence of 1-3 by 2!
[1] 2 4 6
#' or incorporate them into one another:
factorial(sqrt(9)) # the factorial of the square root of 9
[1] 6
seq(1,sqrt(9)) # sequence from 1 to 3 (square root of 9)
[1] 1 2 3
# Functions' arguments
seq(from = 1, to = 5, by=1)
[1] 1 2 3 4 5
seq(from = 1, to = 5, by=2)
[1] 1 3 5
plot(x=1:10, y=11:20) # spaces do not matter, but R is case sensitive
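
Arguments can be matched by position or by name. The following sketch (object names are made up) suggests that both styles give the same result, and uses args() to list the arguments a function accepts:

```r
by_position <- seq(2, 10, 4)                   # matched in order: from, to, by
by_name     <- seq(by = 4, to = 10, from = 2)  # named arguments can come in any order

identical(by_position, by_name)                # both calls build the same sequence

args(seq.default)                              # the arguments seq() accepts for numbers
```

Naming arguments takes more typing, but it makes a call easier to read later.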

5 Getting Help with Functions

?mean # single ? opens the help page of a function from a loaded package
?seq
??neg.reg 
# double ?? searches the help pages of all installed packages (same as the search bar in the Help tab)
# Try the search box in the Help tab, and look at what's inside the {}, that's the package name!
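
When you only remember part of a function's name, apropos() may help. This is a minimal sketch; the object name found is made up:

```r
# apropos() lists objects on the search path whose names contain a piece of text
found <- apropos("mean")
head(found)

# The help shortcuts also have long forms:
# help("mean")          # same as ?mean
# help.search("mean")   # same as ??mean
```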

5.0.1 Google is programmers’ best friend!

It’s True!

R’s built-in help is not considered great, but as you get more familiar with R, you will find yourself using it. Even expert R users often look for help; nobody knows or remembers all commands, functions, and packages (not to mention that these change and evolve all the time). The R documentation is a bit dry, and you will often find more comprehensive explanations and examples online.

Ask anyone: looking up code (including stuff that you know and have tried before) is the bread and butter of R (and other software) users!

6 Objects

In R, we temporarily save the results of operations or functions inside an object. Think of objects as bins containing what you need. To create a bin, you should name it first. It is SO helpful (both for you and others) to select meaningful names for your bins so that they intuitively reveal their content. It is also helpful to define (or know) the type of content in your object, e.g., data frame, vector, integer, list, etc.

6.0.1 Assignment Operator <-

R uses a special operator for creating objects to hold our results: <-. It is frequently read as “gets” or “assign.”
NOTE: An object name cannot start with a number (but numbers are fine in the middle or at the end of the name)

meow <- 1 # NOT a good name, why?

variable <- c(1,2,3,4,5, NA) # How about this?

calc_results <- 2*3 # Good object name!

# Also good name:
data <- data.frame(x=1:10, y=11:20) # creating an object called data that will be a dataframe with variables x and y

Take a look at the Environment tab! Remember that R is case sensitive, so calc_results will NOT be the same as Calc_Results. You can use either “snake case” e.g., calc_results or “CamelCase,” e.g., CalcResults, or “dot case,” e.g., calc.results.

You can call on the object to see its content, for example:

calc_results
[1] 6
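
Once an object exists, it can be reused, overwritten, or removed. A short sketch with made-up object names and values:

```r
drinks_per_week <- c(3, 0, 7)          # hypothetical data
total_drinks <- sum(drinks_per_week)   # use an object inside a function
total_drinks <- total_drinks + 1       # assigning again overwrites the old value
total_drinks

rm(drinks_per_week)                    # remove an object you no longer need
```

After rm(), the removed object also disappears from the Environment tab.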

7 Udi’s Recommendation Corner

  • Use # to comment on your code as much as you can. Think of comments as little post-its you are leaving for others and your future self! The better your comments, the easier it will be for you and others to understand your code.
  • Always pick object names that are meaningful. An object name should be a good hint about what results are saved inside an object. The time you spend now choosing an appropriate object name is worth it compared to the time you will likely spend later trying to figure out which object is which.

8 Packages

You can think of a package as a collection of functions, data, and help files collated into a well-defined standard structure which you can download and install in R. These packages can be downloaded from a variety of sources, but the most popular are CRAN, Bioconductor, and GitHub. The base installation of R comes with many useful packages as standard. These packages contain many of the functions you will use on a daily basis. However, as you start using R for more diverse projects (and as your own use of R evolves), you will find that there comes a time when you need to extend R’s capabilities. Fortunately, many thousands of R users have developed useful code and shared this code as installable packages - this is what is meant by open-source!

To use functions from a package, you must first:

8.0.1 Install the package:

install.packages("car")

A package only needs to be installed once, so make sure to remove the install.packages() line (or make it a comment) before saving. You can also install packages from the Packages tab.

8.0.2 Load the package:

library(car)

8.0.3 Use any of the functions from this package:

Example:

data <- data.frame(x=1:10, y=11:20) # creating an object called data that will be a dataframe with variables x and y
scatterplotMatrix(data)
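
One possible way to avoid reinstalling a package every time a script runs is to check for it first. In this sketch, is_installed() is a hypothetical helper, not a function that ships with R:

```r
# requireNamespace() reports whether a package is installed, without attaching it
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)  # hypothetical helper

if (!is_installed("car")) {
  message("car is not installed yet; run install.packages(\"car\") once.")
}

# package::function() calls a function without library(); stats ships with R
stats::sd(c(1, 2, 3, 4, 5))
```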

9 Types of Data & Data Structure

9.1 Atomic (Data) Types

There are 6 atomic types of data in R, but for now you only need to know (and use) 4 of them:

9.1.1 Logical (TRUE/FALSE)

1 > 2
[1] FALSE
2 > 1
[1] TRUE

9.1.2 Integer

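e.g., 1L, 96L, 78910L (whole numbers; the L suffix tells R to store the value as an integer, otherwise a typed number such as 1 is stored as numeric). A missing value, NA, can also sit in an integer vector.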

9.1.3 Numeric

e.g., 2, 4.67, pi

9.1.4 Character

e.g., “hello”, “34”

"I like a lot of pulp in my orange juice"
[1] "I like a lot of pulp in my orange juice"
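
To see how R tells these types apart, class() can be applied to single values. A minimal sketch:

```r
class(1L)     # "integer": the L suffix asks for an integer
class(1)      # "numeric": a plain number is stored as a decimal (double)
class("34")   # "character": the quotes make it text, not a number
class(TRUE)   # "logical"

as.numeric("34") + 1   # convert text to a number before doing math with it
```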

9.2 Data Structures

Data structures are the ways in which R stores data. Just like the name implies, a data structure tells us exactly how our data is structured or organized within R. R, like most programming languages, uses a variety of data structures. The most common are:

9.2.1 Vectors

A series of data which must have the same data type. Vectors can be any of the basic data types, e.g., logical, integer, numeric, character

vector_example1 <- 1:5
vector_example2 <- c(1,2,3,4,5) # c function is meant to **combine** values to form a vector
vector_example3 <- c(T, F, T, F, T) # T and F are shorthand for TRUE and FALSE
vector_example4 <- c("My", "mom", "is", "a", "teacher")
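
Because a vector holds a single data type, R converts the values when types are mixed. The sketch below uses made-up vectors to show this, along with element-wise math and indexing:

```r
scores <- c(10, 20, 30)    # hypothetical values
scores * 2                 # math is applied to every element
scores[2]                  # square brackets pick an element by position

mixed <- c(1, "a", TRUE)   # mixing types converts everything to character
class(mixed)
```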

9.2.2 Matrices

Must be 2 dimensional. Must be of the same data type.

(matrix_example <- matrix(1:10, nrow = 1)) # 1 row; R works out that 10 columns are needed
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    2    3    4    5    6    7    8    9    10
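
A matrix with more than one row may make the two dimensions easier to see. This is a sketch with made-up object names; nrow, ncol, and byrow are arguments of matrix():

```r
m <- matrix(1:6, nrow = 2, ncol = 3)             # filled column by column
m
dim(m)                                           # number of rows, then columns
m[2, 3]                                          # value in row 2, column 3

m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)  # fill row by row instead
m_by_row
```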

9.2.3 Arrays

Similar to matrices, but they can have more than 2 dimensions.

(array_example <- array(dim = c(3,4,5)))
, , 1

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA

, , 2

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA

, , 3

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA

, , 4

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA

, , 5

     [,1] [,2] [,3] [,4]
[1,]   NA   NA   NA   NA
[2,]   NA   NA   NA   NA
[3,]   NA   NA   NA   NA
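
An array can also be filled with data and indexed with one number per dimension. A small sketch (the object name a is made up):

```r
a <- array(1:24, dim = c(2, 3, 4))  # 2 rows, 3 columns, 4 layers
dim(a)
a[1, 2, 3]                          # row 1, column 2, layer 3
```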

9.2.4 Lists

A list can have mixed data types, but they can be a bit trickier to use.

list_example <- list("cat", 3, TRUE)
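
Naming the elements of a list can make it less tricky to work with. In this sketch, the list pet and its contents are made up:

```r
pet <- list(species = "cat", age = 3, vaccinated = TRUE)  # hypothetical named list

pet$species         # access an element by name
pet[["age"]] + 1    # double brackets return the element itself
class(pet["age"])   # single brackets return a smaller list
```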

9.2.5 Data frames

Used for tabular data, think of them as a basic spreadsheet, but each column is a vector.

(dataframe_example <- data.frame(x=1:10, y=11:20))
    x  y
1   1 11
2   2 12
3   3 13
4   4 14
5   5 15
6   6 16
7   7 17
8   8 18
9   9 19
10 10 20
(dataframe_example2 <- data.frame(eyecolour = c("blue", "brown", "grey"),
                                  height = c(1.75, 2.1, 1.56),
                                  registered = c(T, F, T)))
  eyecolour height registered
1      blue   1.75       TRUE
2     brown   2.10      FALSE
3      grey   1.56       TRUE
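
Since each column is a vector, columns can be pulled out with $ and new ones can be added the same way. A sketch with a made-up data frame called people:

```r
people <- data.frame(name = c("Ana", "Ben", "Cy"),
                     height = c(1.75, 2.10, 1.56))  # hypothetical data

people$height                        # one column, returned as a vector
people$tall <- people$height > 1.70  # add a new logical column
nrow(people)                         # number of rows
ncol(people)                         # number of columns
```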

9.2.6 Tibbles

Similar to data frames with a few differences. Tibbles are the standard data structures within the tidyverse.

library(tibble) # tibble() comes from the tibble package (part of the tidyverse)
(tibble_example <- tibble(x=letters, y=1:length(letters)))
# A tibble: 26 × 2
   x         y
   <chr> <int>
 1 a         1
 2 b         2
 3 c         3
 4 d         4
 5 e         5
 6 f         6
 7 g         7
 8 h         8
 9 i         9
10 j        10
# … with 16 more rows

If you are not sure what type of data or data structure an object (or value) is, you can check using:

class(dataframe_example)
[1] "data.frame"
class(1.34)
[1] "numeric"
class(TRUE)
[1] "logical"
class(list_example)
[1] "list"
class(dataframe_example[1,2]) # dataframe_example[1,2] means the value in the 1st row, 2nd column
[1] "integer"
# It is always: [Rows, Columns]

dataframe_example
    x  y
1   1 11
2   2 12
3   3 13
4   4 14
5   5 15
6   6 16
7   7 17
8   8 18
9   9 19
10 10 20
dataframe_example[1,2]
[1] 11
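
The same [Rows, Columns] pattern can select whole rows, whole columns, or rows that meet a condition. This sketch builds a data frame df with the same shape as dataframe_example:

```r
df <- data.frame(x = 1:10, y = 11:20)

df[3, ]          # whole 3rd row (the column slot is left empty)
df[, 2]          # whole 2nd column
df[df$x > 8, ]   # only the rows where x is greater than 8
df[1:2, "y"]     # rows 1 and 2 of the column named y
```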

10 Importing Data

R is very flexible and can import many data formats. RStudio will help you with that, using the necessary packages. In this workshop, I will show you how to do it through RStudio and also through an R script. One thing to note is that if you have an RStudio project in the same folder as your data sets, you don’t need to specify file paths to import your data. By default, R will read from and save to that folder.

10.0.1 R allows you to read in data from many different formats:

  • SPSS
  • Excel
  • SAS
  • .csv
  • .txt
  • .dat

The easiest way to get started with reading data into R is to go to the Environment tab, click on the Import Dataset button, and then read in the data accordingly (note that csv files are a type of text file).
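
For .csv files, base R's read.csv() can also be called from a script. The sketch below writes a tiny made-up data set to a temporary file first, so that it runs without any existing file; in practice you would pass the name of your own file:

```r
csv_path <- tempfile(fileext = ".csv")        # temporary file, for illustration only
write.csv(data.frame(id = 1:3, score = c(4.5, 3.2, 5.0)),  # hypothetical data
          csv_path, row.names = FALSE)

survey <- read.csv(csv_path)                  # read the file into a data frame
str(survey)
```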

Let’s start by importing an SPSS dataset (.sav) using the read_sav() function in the haven package:

10.0.2 Install the haven package

install.packages("haven")

10.0.3 Load the haven package

library(haven)

10.0.4 Use read_sav()

You want to make sure the file is saved in the same folder as your R Script (your project’s working directory), or set the path to where the data is. And don’t forget to store the data in an object!

aggression_data <- read_sav("aggression.sav")

class(aggression_data) # what data structure is the dataset 
[1] "tbl_df"     "tbl"        "data.frame"
str(aggression_data) # structure, variables and their data type and structure
tibble [275 × 8] (S3: tbl_df/tbl/data.frame)
 $ age    : num [1:275] 18 18 20 17 17 17 17 17 17 17 ...
  ..- attr(*, "format.spss")= chr "F2.0"
 $ BPAQ   : num [1:275] 2.62 2.24 2.72 1.93 2.72 ...
  ..- attr(*, "label")= chr "Aggression total score"
  ..- attr(*, "format.spss")= chr "F12.10"
  ..- attr(*, "display_width")= int 14
 $ AISS   : num [1:275] 2.65 2.85 3.05 2.65 2.95 1.95 2.55 2.3 2 2.15 ...
  ..- attr(*, "label")= chr "sensation seeking total score"
  ..- attr(*, "format.spss")= chr "F4.2"
 $ alcohol: num [1:275] 28 NA 80 28 10 12 21 3 21 0 ...
  ..- attr(*, "label")= chr "alcohol consumption (in drinks)"
  ..- attr(*, "format.spss")= chr "F2.0"
 $ BIS    : num [1:275] 2.15 3.08 3 1.85 2.08 ...
  ..- attr(*, "label")= chr "Impulsivity total score"
  ..- attr(*, "format.spss")= chr "F12.10"
  ..- attr(*, "display_width")= int 14
 $ NEOc   : num [1:275] 2.83 2.5 2.75 3.42 3.58 ...
  ..- attr(*, "label")= chr "Conscientiousness total score"
  ..- attr(*, "format.spss")= chr "F12.10"
  ..- attr(*, "display_width")= int 14
 $ gender : dbl+lbl [1:275] 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, ...
   ..@ label      : chr "biological sex of participant"
   ..@ format.spss: chr "F1.0"
   ..@ labels     : Named num [1:2] 0 1
   .. ..- attr(*, "names")= chr [1:2] "male" "female"
 $ NEOo   : num [1:275] 2.92 4.17 3.92 4.17 3.5 ...
  ..- attr(*, "label")= chr "openness to experience total score"
  ..- attr(*, "format.spss")= chr "F12.10"
  ..- attr(*, "display_width")= int 14
head(aggression_data) # inspect the first 6 rows (the head) of the data
# A tibble: 6 × 8
    age  BPAQ  AISS alcohol   BIS  NEOc     gender  NEOo
  <dbl> <dbl> <dbl>   <dbl> <dbl> <dbl>  <dbl+lbl> <dbl>
1    18  2.62  2.65      28  2.15  2.83 1 [female]  2.92
2    18  2.24  2.85      NA  3.08  2.5  1 [female]  4.17
3    20  2.72  3.05      80  3     2.75 1 [female]  3.92
4    17  1.93  2.65      28  1.85  3.42 1 [female]  4.17
5    17  2.72  2.95      10  2.08  3.58 0 [male]    3.5 
6    17  2.45  1.95      12  2.62  3.83 1 [female]  3.25
names(aggression_data) # variable/column names
[1] "age"     "BPAQ"    "AISS"    "alcohol" "BIS"     "NEOc"    "gender" 
[8] "NEOo"   

10.0.5 finding the value located in the 2nd row, 3rd column

aggression_data[2,3]
# A tibble: 1 × 1
   AISS
  <dbl>
1  2.85

10.0.6 looking only at the age variable/column

aggression_data$age
 [1] 18 18 20 17 17 17 17 17 17 17 17 17 17 18 18 18 18 18 18 18 18 18 18 18 18
[26] 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
[51] 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18 18
 [ reached getOption("max.print") -- omitted 200 entries ]
attr(,"format.spss")
[1] "F2.0"

10.0.7 finding the age of the 3rd participant

aggression_data$age[3]
[1] 20

10.0.8 finding the mean age of all participants

mean(aggression_data$age)
[1] 20.21091

10.0.9 finding the standard deviation of age

sd(aggression_data$age)
[1] 4.960342

10.0.10 finding the median of age

median(aggression_data$age)
[1] 18

10.0.11 finding how many values are in age

length(aggression_data$age)
[1] 275
nrow(aggression_data)
[1] 275
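
The age column has no missing values, but alcohol does (note the NA in the head() output). Functions such as mean() return NA when any value is missing unless told otherwise. A sketch using the first six alcohol values shown above (the object name alcohol_first6 is made up):

```r
alcohol_first6 <- c(28, NA, 80, 28, 10, 12)  # first six alcohol values from head()

mean(alcohol_first6)                 # NA: one missing value makes the mean unknown
mean(alcohol_first6, na.rm = TRUE)   # na.rm = TRUE drops missing values first
sum(is.na(alcohol_first6))           # count how many values are missing
```

The same na.rm argument is available in sd() and median().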

10.0.12 plotting age vs. alcohol

plot(aggression_data$age, aggression_data$alcohol)

10.1 CONGRATULATIONS! You are now officially an R user! Next week, we will learn more about data exploration and manipulation!

