Tutorial 1: Introduction to R

10.4 Basic Coding

There are a number of ways in which we can work with data in R. Here, we will primarily rely on data.table, which is one of the ways of coding in R, the other prominent dialect being tidyverse. But to develop an intuition, let’s start off with base R, specifically–matrices.

Consider a sequence of observations:

a <- c(1,0,4,3,2,6)
a
## [1] 1 0 4 3 2 6

This sequence, unlike a vector, has no dimensions. But we can transform it to a \(n \times 1\) vector using the as.matrix() function:

b <- as.matrix(a)
b
##      [,1]
## [1,]    1
## [2,]    0
## [3,]    4
## [4,]    3
## [5,]    2
## [6,]    6

The result is a \(6 \times 1\) vector, or a column matrix. To obtain a \(1 \times 6\) vector, or a row matrix, we transpose the aforementioned vector using the t() function:

bt <- t(b)
bt
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    0    4    3    2    6
dim(bt)
## [1] 1 6

We can create any \(n \times k\) matrix, using the matrix() function. For example, consider a \(3 \times 2\) matrix:

B <- matrix(a,nrow=3,ncol=2)
B
##      [,1] [,2]
## [1,]    1    3
## [2,]    0    2
## [3,]    4    6

Note that we included the earlier generated sequence of six values, which we assigned to a, as the elements of this matrix.

We can add column names and row names to this matrix:

colnames(B) <- c("c1","c2")
rownames(B) <- c("r1","r2","r3")
B
##    c1 c2
## r1  1  3
## r2  0  2
## r3  4  6

If, at this point, we would like to only work with, say, the first column of the matrix, we can call it using its column number or the column name as follows:

B[,"c1"]
## r1 r2 r3 
##  1  0  4

Similarly, if we want to refer to a matrix element, say \(b_{3,2}\), we can do this by entering the respective indices in the brackets:

B[3,2]
## [1] 6

Matrix multiplication is done using %*% command, granted that the two matrices are compatible. For example, we obtain a product of matrix \(B\) and a new \(2 \times 1\) vector, \(d\), as follows:

d <- as.matrix(c(5,-2))
Bd <- B%*%d
Bd
##    [,1]
## r1   -1
## r2   -4
## r3    8

We can add columns (and rows) to the existing matrix using a cbind() function:

c3 <- c(0,1,0)
D <- cbind(B,c3)
D
##    c1 c2 c3
## r1  1  3  0
## r2  0  2  1
## r3  4  6  0

We can invert a(n invertible) matrix using the solve() function:

Di <- solve(D)
Di
##            r1 r2         r3
## c1 -1.0000000  0  0.5000000
## c2  0.6666667  0 -0.1666667
## c3 -1.3333333  1  0.3333333

10.5 Advanced Coding

By now, we have covered enough ground to obtain the least squares estimator. For that, we will generate vectors of dependent and independent variables, and then estimate the vector of parameters. Specifically, we will generate a sequence of 200 binary variables, which will serve as our independent variable x, and then we will construct the dependent variable y using the following formula: \(y=2+0.5x+e\), where e is a sequence of 200 standard normal random variables.

set.seed(1)
x <- sample(c(0,1),200,replace=T)
set.seed(2)
e <- rnorm(200)
y <- 2+0.5*x+e

X <- cbind(1,x)
b <- solve(t(X)%*%X)%*%t(X)%*%y
b
##        [,1]
##   1.8699318
## x 0.7639315

The foregoing “do it by hand” exercise can be easily replicated using the lm() function:

ols <- lm(y~x)
ols
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##      1.8699       0.7639

We can apply the summary() function to see the complete set of regression results:

summary(ols)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.44493 -0.77746 -0.06555  0.80647  2.22089 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.8699     0.1058  17.674  < 2e-16 ***
## x             0.7639     0.1511   5.054  9.8e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.069 on 198 degrees of freedom
## Multiple R-squared:  0.1143, Adjusted R-squared:  0.1098 
## F-statistic: 25.55 on 1 and 198 DF,  p-value: 9.8e-07

Page built: 2022-12-24 using R version 4.2.2 (2022-10-31 ucrt)