Collin Knezevich 6/19/2022
In this vignette, I will walk you through how to read data from this Pokemon data API. I will be creating functions to easily query data from the API, in such a way that it will be easy to call the function and query specific data.
Additionally, here is the link to my Github repository.
In order to query data from the Pokemon API, we will need the following packages:
tidyverse
httr
jsonlite
The Pokemon API stores data about a specific Pokemon in two different places (both containing different info): a general “Pokemon” location, and a “Pokemon Species” location. This first function will query data from the former.
The returnPkmn
function takes either the ID number of a Pokemon, or
the name of a Pokemon (as a string), as its input. It will return a data
frame with the following relevant information from this location in the
API:
returnPkmn <- function(idName){
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/pokemon/"
endURL <- "/"
# if id specified
if (typeof(idName) == "double"){
apiDat <- GET(paste0(startURL, as.character(idName), endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# if name specified
else if (typeof(idName) == "character"){
name <- tolower(idName)
apiDat <- GET(paste0(startURL, name, endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# error control - if input improperly specified
else {
stop("Input incorrectly specified.")
}
# constructing data frame with relevant information
df <- data.frame(Name = fullDat$name,
ID = fullDat$id,
Type1 = fullDat$types$type$name[1],
Type2 = fullDat$types$type$name[2],
HP = fullDat$stats$base_stat[1],
Attack = fullDat$stats$base_stat[2],
Defense = fullDat$stats$base_stat[3],
SpAttack = fullDat$stats$base_stat[4],
SpDefense = fullDat$stats$base_stat[5],
Speed = fullDat$stats$base_stat[6])
return(df)
}
The returnPkmnSpecies
function takes in either the ID number of a
Pokemon, or the name of a Pokemon, as its input. It will return a data
frame with the following particularly relevant information from the
API:
# Pokemon species function
returnPkmnSpecies <- function(idName){
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/pokemon-species/"
endURL <- "/"
# if id specified
if (typeof(idName) == "double"){
apiDat <- GET(paste0(startURL, as.character(idName), endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# if name specified
else if (typeof(idName) == "character"){
name <- tolower(idName)
apiDat <- GET(paste0(startURL, name, endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# error control - if input improperly specified
else {
stop("Input incorrectly speficied.")
}
# constructing data frame with relevant information
df <- data.frame(Name = fullDat$name,
ID = fullDat$id,
Gen = fullDat$generation$name,
PreEvo = ifelse(is.null(fullDat$evolves_from_species$name),
NA, fullDat$evolves_from_species$name),
EggGrp1 = fullDat$egg_groups$name[1],
EggGrp2 = fullDat$egg_groups$name[2],
Legendary = fullDat$is_legendary,
Mythical = fullDat$is_mythical)
return(df)
}
The returnMove
function will take in a move (as a string) as its
input, and will return the following information about the move:
# Move function
returnMove <- function(move){
# convert input to lowercase, and replace spaces with dashes
move <- tolower(move)
move <- gsub(" ", "-", move)
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/move/"
endURL <- "/"
apiDat <- GET(paste0(startURL, move, endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
df <- data.frame(Move = fullDat$name,
Power = ifelse(is.null(fullDat$power), 0, fullDat$power),
Accuracy = fullDat$accuracy,
PP = fullDat$pp,
Priority = fullDat$priority,
Type = fullDat$type$name,
DmgType = fullDat$damage_class$name)
return(df)
}
The returnLocation
function will take in the name of a location (as a
string) as its input, and wil return the following information about the
location:
returnLocation <- function(name){
# convert input to lowercase, and replace spaces with -'s
name <- tolower(name)
name <- gsub(" ", "-", name)
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/location/"
endURL <- "/"
apiDat <- GET(paste0(startURL, name, endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
df <- data.frame(Location = fullDat$name,
Region = fullDat$region$name,
Generation = fullDat$game_indices$generation$name)
return(df)
}
The returnItem
function will take in either the ID number for an item,
or the name of an item (as a string), as its input. It will return the
following information about the item:
returnItem <- function(idName){
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/item/"
endURL <- "/"
# if id specified
if (typeof(idName) == "double"){
apiDat <- GET(paste0(startURL, as.character(idName), endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# if name specified
else if (typeof(idName) == "character"){
name <- tolower(idName)
name <- gsub(" ", "-", name)
apiDat <- GET(paste0(startURL, name, endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
}
# error control if input improperly specified
else {
stop("Input improperly specified.")
}
df <- data.frame(Item = fullDat$name,
Cost = fullDat$cost,
Type = fullDat$category$name,
Effect = fullDat$effect_entries$short_effect)
return(df)
}
Technical Machines (TMs) are items you can use to teach your Pokemon a
move - the returnTM
function will take the ID number for a TM as its
input, and will return the following information about the TM:
returnTM <- function(id){
# set up URL for paste functions
startURL <- "https://pokeapi.co/api/v2/machine/"
endURL <- "/"
apiDat <- GET(paste0(startURL, as.character(id), endURL))
rawDat <- rawToChar(apiDat$content)
fullDat <- fromJSON(rawDat)
df <- data.frame(ID = fullDat$id,
Name = fullDat$item$name,
Move = fullDat$move$name,
Game = fullDat$version_group$name)
return(df)
}
Throughout this Exploratory Data Analysis exercise, I will be comparing the attributes of Pokemon across different “eras” of Pokemon games. The eras are defined as follows:
First, we will obtain data on all Pokemon from the Pokemon and Pokemon Species endpoints, and combine the two datasets.
# query from Pokemon endpoint
pkmn <- lapply(X = as.list(as.double(seq(1, 898))), FUN = returnPkmn) %>%
bind_rows()
# query from Species endpoint
pkmnSpec <- lapply(X = as.list(as.double(seq(1, 898))), FUN = returnPkmnSpecies) %>%
bind_rows()
# removing Name column from second query - redundant information
pkmnSpec <- pkmnSpec %>% select(-Name)
# merging two queries on Pokemon name
allPkmn <- merge(pkmn, pkmnSpec, by = "ID")
Now, we will create two new variables based on the information in this data frame. First, we will create the “Era” variable, based on a Pokemon’s generation, as specified above. Second, we will create “BST” - Base Stat Total - the sum of a Pokemon’s base stats (HP, Attack, Defense, SpAttack, SpDefense, Speed). This will give an estimate of a Pokemon’s overall level of strength.
# creating era variable
allPkmn <- allPkmn %>%
mutate(Era = if_else(Gen == "generation-i", 1,
if_else(Gen == "generation-ii", 1,
if_else(Gen == "generation-iii", 1,
if_else(Gen == "generation-iv", 2,
if_else(Gen == "generation-v", 2, 3))))))
# creating BST variable
allPkmn <- allPkmn %>%
mutate(BST = (HP + Attack + Defense + SpAttack + SpDefense + Speed))
Now we are ready to begin our analysis. We will start by creating some contingency tables.
First, we will create a table to show how many Pokemon are in each Era that we have defined.
table(allPkmn$Era)
##
## 1 2 3
## 386 263 249
The table shows us that there are fewer Pokemon in each new Era. It is notable that Era 2 had more Pokemon introduced than Era 3, despite only spanning two generations rather than three.
Next, we will create a table to show the number of Legendary Pokemon in each Era.
table(allPkmn$Era, allPkmn$Legendary)
##
## FALSE TRUE
## 1 369 17
## 2 245 18
## 3 226 23
It appears that the proportion of Pokemon that are Legendary is increasing with each new Era. Each new Era has more Legendaries than the previous, despite having fewer new Pokemon (based on previous table).
We will continue our analysis by comparing some numeric variables across the different Eras. First, we will observe the mean, median, and standard deviation Base Stat Total across each Era.
allPkmn %>% group_by(Era) %>% summarise(avg = mean(BST), med = median(BST),
sd = sd(BST))
## # A tibble: 3 × 4
## Era avg med sd
## <dbl> <dbl> <dbl> <dbl>
## 1 1 406. 410 109.
## 2 2 434. 464 109.
## 3 3 438. 474 117.
Both the mean and median BSTs increase with each new Era. Notably, the median BSTs for Eras 2 and 3 are much higher than the median BST for the first Era. This could indicate that overall, most newer Pokemon are stronger than those from the first Era.
Next, we will compare specifically the Speed stat across the different Eras.
allPkmn %>% group_by(Era) %>% summarise(avg = mean(Speed), med = median(Speed),
sd = sd(Speed))
## # A tibble: 3 × 4
## Era avg med sd
## <dbl> <dbl> <dbl> <dbl>
## 1 1 64.5 64 27.2
## 2 2 67.8 65 27.9
## 3 3 66.3 63 30.8
This comparison reveals that the Speed of Pokemon across the different Eras is more or less the same.
Finally, we will compare the BSTs of Pokemon across different primary Types. We will sort the output by mean BST so that the table is easier to understand.
allPkmn %>% group_by(Type1) %>% summarise(avg = mean(BST), med = median(BST),
sd = sd(BST)) %>%
arrange(desc(avg))
## # A tibble: 18 × 4
## Type1 avg med sd
## <chr> <dbl> <dbl> <dbl>
## 1 dragon 491. 490 135.
## 2 steel 469. 485 110.
## 3 psychic 450. 470 130.
## 4 fire 444. 465 102.
## 5 rock 439. 460 96.2
## 6 dark 436. 450. 112.
## 7 electric 435. 440 106.
## 8 ghost 434. 474 101.
## 9 ice 432. 472. 114.
## 10 fighting 427. 455 104.
## 11 fairy 426. 450 130.
## 12 ground 424. 430 103.
## 13 flying 420 475 136.
## 14 water 417. 440 104.
## 15 poison 412. 448 103.
## 16 grass 408. 416. 103.
## 17 normal 397. 415 110.
## 18 bug 373. 390 116.
The output shows that the Dragon type has the highest average BST, while the Bug type has the lowest average BST. This tells us that Dragon types are, overall, much stronger than Bug types (and are a good bit stronger than most other types, as well).
Finally, we will create graphs to visualize our analysis. First, let us visualize the table we created earlier comparing the number of Legendary Pokemon in each Era. We will create a filled bar plot for this.
ggplot(data = allPkmn, aes(x = Era)) +
geom_bar(aes(fill = Legendary), position = "fill") +
labs(title = "Proportion of Legendary Pokemon for Each Era", y = "Proportion")
Despite the overall low proportion of Legendary Pokemon across all Eras,
it is clear that the proportion of Legendary Pokemon is increasing with
each Era.
Next, we will create a histogram to show the distribution of BSTs across the different Eras.
ggplot(data = allPkmn, aes(x = BST, y = ..density..)) +
geom_histogram(color = "black", bins = 15) +
facet_grid(~ Era) +
labs(title = "Histograms of BSTs by Era", y = "Density", x = "Base Stat Total")
The distributions of BSTs for each Era appear to be bimodal. However, even though we have seen that the mean and median BSTs for each Era are different, the overall distributions shown in the histograms do not appear to be so different.
Now, we will create a kernel density plot to show this same distribution.
ggplot(data = allPkmn, aes(x = BST, color = as.factor(Era))) +
geom_density(alpha = 0.5) +
labs(title = "Kernel Density Plot of BSTs by Era", x = "Base Stat Totals",
y = "Density", color = "Era")
The kernel density plots of the BSTs show the same bimodal pattern as the histograms. Additionally, the relationship between Era and BSTs is easier to see in this graph. In the distribution of BSTs for Era 1, the second peak is shifted to the left of the other distributions, and is a lower peak. Also, in the distribution of BSTs for Era 3, the first peak is lower tha the other distributions.
Next, we will create a box plot to further visualize this distribution, and potentially locate any outliers.
ggplot(data = allPkmn, aes(x = as.factor(Era), y = BST)) +
geom_boxplot() +
geom_jitter(aes(color = as.factor(Era))) +
labs(title = "Boxplot of BSTs by Era", x = "Era", color = "Era",
y = "Base Stat Total")
The boxplots show that the median BSTs, and overall distributions, are shifted upwards for the second two Eras compared to the first. Additionally, there do not appear to be any outliers present in the data. Another tangential yet interesting observation is that there appears to be many Pokemon concentrated at a BST of exactly 600, as well as another concentration at approximately 575.
Finally, we will create a scatterplot to show the relationship between Speed and Defense. We will also color the observations by Era.
ggplot(data = allPkmn, aes(x = Speed, y = Defense)) +
geom_point(aes(color = as.factor(Era))) +
geom_smooth(method = lm) +
labs(title = "Scatterplot: Speed vs. Defense", color = "Era")
## `geom_smooth()` using formula 'y ~ x'
There is no relationship between Speed and Defense - as a Pokemon’s speed increases, its defense does not tend to increase nor decrease.
This was an fun project to work on and I enjoyed the process of it. One of the most difficult parts of the project was querying data from the API. I did not find that the documentation for this particular API was tremendously helpful in explaining how to make more specific queries. Still, the documentation was helpful in explaining the variables you can query from each endpoint, and how those sets of data are related.
Another point of difficulty was with Github pages and getting that set up properly. I think I just need more experience with this to become more comfortable with it.
In the future, I would probably look into more creative ways to combine data from the different endpoints. I believe it could lead to some interesting analysis.