
Fangraphs Seasonal Batting Data, 2000 – 2019

This data set contains seasonal batting data for Major League Baseball (MLB) players from 2000 to 2019, drawn from Fangraphs.

Download a Zip Archive of the Data and Script (includes CSV and RDS)

Click here for an explanation of the variables

Wrangling Operation

The operation requires the following packages, particularly Bill Petti's excellent baseballr package for wrangling MLB data:

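# Download the Baseball R Package:
# devtools::install_github(repo = "BillPetti/baseballr")

library(baseballr)  # Fangraphs scrapers and the Chadwick ID lookup
library(rvest)      # HTML scraping
library(plyr)       # arrange(), used below to sort the final table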

First, I create a table of player identifiers from the Chadwick Baseball Bureau using get_chadwick_lu() in baseballr. These identifiers make it easier to merge this table with other baseball data sources.

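# Player Identifiers
dat.playerid <- get_chadwick_lu()
# Keep a subset of columns by position; the positions reflect the register's
# column order when this script was written, so check names(dat.playerid) first
dat.playerid <- dat.playerid[c(1:7,13:15,19,25:28)]
saveRDS(dat.playerid, "Player Identifiers.RDS")
identifiers <- readRDS("Player Identifiers.RDS")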
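Selecting columns by position depends on the Chadwick register keeping the same column order between downloads. A minimal sketch of selecting by name instead, assuming the register uses column names such as key_person, key_mlbam, key_fangraphs, name_last and name_first; the small data frame is hypothetical stand-in data, not real register output, and the wanted list is only an illustration.

```r
# Hypothetical stand-in for the Chadwick register (toy values, not real IDs)
register <- data.frame(
  key_person    = c("a1", "b2"),
  key_mlbam     = c(100L, 200L),
  key_fangraphs = c(1L, 2L),
  name_last     = c("Smith", "Jones"),
  name_first    = c("Al", "Bo"),
  key_npb       = c(NA, NA),
  stringsAsFactors = FALSE
)

# Columns we want, listed by name rather than by position (illustrative list)
wanted <- c("key_person", "key_mlbam", "key_fangraphs",
            "name_last", "name_first", "birth_year")

# Report any requested column the register does not contain
missing_cols <- setdiff(wanted, names(register))
if (length(missing_cols) > 0) {
  message("Columns not found: ", paste(missing_cols, collapse = ", "))
}

# Keep the requested columns that exist, in the requested order
ids <- register[, intersect(wanted, names(register)), drop = FALSE]
print(names(ids))
```

If the register's layout changes, this version keeps working and reports what is missing rather than silently picking up the wrong columns.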

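Next, I download a seasonal MLB batting leaderboard from Fangraphs for each year with fg_bat_leaders(). Setting qual = "n" removes the qualified-batter requirement, so every batter is included rather than only those with enough plate appearances to qualify.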

# Scraping Batting Data
for (i in 2000:2019){
  temp <- fg_bat_leaders(i, i, league = "all", qual = "n", ind = 1)
  assign(paste0("fg_bat_", i), temp)
}
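The assign() loop creates a separate object for every season in the global environment. A common R alternative is to collect the seasons in a named list with lapply(). In this sketch, scrape_season() is a hypothetical stand-in that returns a toy table; in the real workflow its body would be the fg_bat_leaders() call above.

```r
# Hypothetical stand-in for the scraper; in practice this body would be
# fg_bat_leaders(season, season, league = "all", qual = "n", ind = 1)
scrape_season <- function(season) {
  data.frame(playerid = c("1", "2"), Season = season, HR = c(10L, 20L),
             stringsAsFactors = FALSE)
}

seasons <- 2000:2019

# One list element per season, named by year, instead of one object per season
fg_bat <- lapply(seasons, scrape_season)
names(fg_bat) <- seasons

# Individual seasons remain easy to reach by name
head(fg_bat[["2005"]])
```

Keeping the seasons in one list also means the whole collection can be stacked in a single do.call(rbind, fg_bat) call.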

Because these tables share the same columns and differ only in the season they cover, I can stack them together using rbind():

dat.bat <- fg_bat_2000
for (i in 2001:2019){
  # Append each subsequent season to the running table
  dat.bat <- rbind(dat.bat, get(paste0("fg_bat_", i)))
}
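Growing a table inside a loop copies the accumulated rows on every pass. A minimal sketch of the one-call alternative applies do.call(rbind, ...) to a list of season tables; the three toy tables below are hypothetical stand-ins for the Fangraphs downloads. rbind() on data frames matches columns by name and stops with an error when the names differ, which doubles as a check that every season really has the same layout.

```r
# Toy season tables standing in for the per-season Fangraphs downloads
season_tables <- lapply(2000:2002, function(yr) {
  data.frame(playerid = c("1", "2"), Season = yr, HR = c(5L, 15L),
             stringsAsFactors = FALSE)
})

# Stack every season in a single call instead of growing a table in a loop
dat.bat <- do.call(rbind, season_tables)
print(dim(dat.bat))

# A table with a renamed column is rejected rather than silently misaligned
bad <- data.frame(playerid = "3", Year = 2003L, HR = 1L,
                  stringsAsFactors = FALSE)
stacked_ok <- tryCatch({ rbind(dat.bat, bad); TRUE },
                       error = function(e) FALSE)
```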

I rename the identifier column in the Fangraphs table so that it matches the Chadwick Bureau identifier data, then merge the two sets so that the batting data can more readily be merged with other sources.
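By default merge() performs an inner join, so any batter whose Fangraphs ID has no match in the Chadwick register is dropped without warning. A minimal sketch with hypothetical toy tables shows how to count those rows before merging and how all.x = TRUE keeps them, with NA identifiers, if that is preferred. The toy batting IDs are assumed to be stored as character, so the keys are aligned explicitly before joining.

```r
# Toy batting table; IDs are assumed (hypothetically) to be stored as character
bat <- data.frame(key_fangraphs = c("1", "2", "3"), Season = 2019L,
                  HR = c(30L, 12L, 4L), stringsAsFactors = FALSE)

# Toy identifier table; player "3" is deliberately absent
ids <- data.frame(key_fangraphs = c(1L, 2L), key_mlbam = c(111L, 222L))

# Compare keys as character so "1" and 1L are treated as the same ID
unmatched <- setdiff(bat$key_fangraphs, as.character(ids$key_fangraphs))
cat("Rows without an identifier match:", length(unmatched), "\n")

# Align key types explicitly before joining
bat$key_fangraphs <- as.integer(bat$key_fangraphs)

inner <- merge(bat, ids, by = "key_fangraphs")               # drops player 3
left  <- merge(bat, ids, by = "key_fangraphs", all.x = TRUE) # keeps player 3, key_mlbam is NA
```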

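# The first Fangraphs column holds the player ID
names(dat.bat)[1] <- "key_fangraphs"

# Inner join: batters without a match in the Chadwick register are dropped
temp <- merge(dat.bat, identifiers, by = "key_fangraphs")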

# Clean Up Data Types and Sort
temp$key_fangraphs <- as.numeric(temp$key_fangraphs)
temp$Season <- as.numeric(temp$Season)
temp <- arrange(temp, Name, Season)

# Write data
write.csv(temp, "Fangraphs Batting Leaders 2000 - 2019.csv", row.names = FALSE)
saveRDS(temp, "Fangraphs Batting Leaders 2000 - 2019.RDS")

# Clean up memory 
rm(temp)
rm(list=ls(pattern = "fg_bat"))
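Saving both formats has a practical point: an RDS file stores the R object exactly, column classes included, while a CSV is plain text whose types read.csv() has to guess again. A minimal round-trip sketch with a hypothetical toy table and temporary files; the factor column is there only to show a class that plain text cannot record.

```r
df <- data.frame(key_fangraphs = c(1001, 1002),
                 Name = c("Player A", "Player B"),
                 Season = c(2018, 2019),
                 stringsAsFactors = FALSE)
df$Season <- factor(df$Season)  # a class that a CSV cannot record

csv_path <- tempfile(fileext = ".csv")
rds_path <- tempfile(fileext = ".rds")

# row.names = FALSE avoids an extra unnamed index column when re-reading
write.csv(df, csv_path, row.names = FALSE)
saveRDS(df, rds_path)

from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)

identical(from_rds, df)   # RDS restores the object exactly
class(from_csv$Season)    # the CSV version comes back as a plain number column
```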

Using the Data

To load the data in your analysis:

# To load the CSV
dat.bat <- read.csv(file.path(data.directory, "Fangraphs Batting Leaders 2000 - 2019.csv"))

# To load the RDS
dat.bat <- readRDS(file.path(data.directory, "Fangraphs Batting Leaders 2000 - 2019.RDS"))

Where data.directory is the path to the folder containing the data files.
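A small helper can choose the reader from the file extension and fail early when the path is wrong. load_batting() is a hypothetical convenience function, not part of baseballr or the downloadable script; it assumes data.directory is a character string giving the folder path, as described above.

```r
# Hypothetical helper: read either the CSV or the RDS version of the data set
load_batting <- function(data.directory,
                         file = "Fangraphs Batting Leaders 2000 - 2019.RDS") {
  path <- file.path(data.directory, file)
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  # Use readRDS() for .rds files (any case), read.csv() for everything else
  if (grepl("\\.rds$", path, ignore.case = TRUE)) {
    readRDS(path)
  } else {
    read.csv(path, stringsAsFactors = FALSE)
  }
}

# Usage (the folder path is a placeholder):
# dat.bat <- load_batting("path/to/folder")
# dat.bat <- load_batting("path/to/folder", "Fangraphs Batting Leaders 2000 - 2019.csv")
```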

