Collin Knezevich 2022-07-17

Of the machine learning methods we learned about, I found Random Forests to be the most interesting. The idea of a Random Forest model is to fit many different models on many different re-samples of data, then average all predictions from these models. In addition, each model will be fit using a random selection of predictor variables. This is done to reduce variance in the model and correlations between predictions. All of this will result in more accurate predictions.

Fitting a Random Forest Model

We will use the randomForest package to fit a model, and we will use the mtcars dataset.

library(randomForest)

## randomForest 4.7-1.1

## Type rfNews() to see new features/changes/bug fixes.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::combine()  masks randomForest::combine()
## ✖ dplyr::filter()   masks stats::filter()
## ✖ dplyr::lag()      masks stats::lag()
## ✖ ggplot2::margin() masks randomForest::margin()

mtcars <- mtcars

Now we can build our random forest model, with MPG as our response variable.

rfModel <- randomForest(mpg ~ ., data = mtcars, 
                        mtry = ncol(mtcars)/3, 
                        ntree = 200, 
                        importance = TRUE)

Finally, we can evaluate the accuracy of the model by creating predictions from our model, and using them to calculate the root MSE.

preds <- predict(rfModel, newdata = select(mtcars, -mpg))
rmse <- sqrt(mean(preds - mtcars$mpg)^2)
rmse

## [1] 0.004355878

Project 2 Reflections

Blog 5