League of Legends Role Analysis

by Trey Wayment (twayment@umich.edu)

Introduction

This project uses post-game data from professional League of Legends (LoL) matches played in 2022 to explore how player roles affect performance. The dataset comes from Oracle’s Elixir. Each match has 12 relevant rows: one for each of the 5 players on both teams and 2 containing summary data for the two teams. The dataset contains 150,588 rows and 161 columns.

My project is centered around the question: Which role “carries” (performs best) on their team more often: ADCs (Bot laners) or Mid laners?

This question is interesting because it’s something that League players constantly debate — do ADCs or Mid laners carry more often? By digging into real pro-level data, we can actually get some objective insight into that debate.

This dataset is valuable because it captures how different roles behave statistically in high-level play. That kind of information is helpful for coaches, analysts, and players trying to understand what performance looks like in each position.

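The following columns from the dataset are most relevant to my analysis: position (the player’s role), kills, deaths, assists, xpat10 (experience at 10 minutes), totalgold, visionscore, damageshare (the fraction of a team’s damage dealt by the player), and cspm (creep score per minute).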

Data Cleaning and Exploratory Data Analysis

Data Cleaning

To get the dataset ready for analysis, I took a few steps to clean things up and focus on the parts that actually matter for my analysis:

  1. Kept only complete rows
    The dataset has a datacompleteness column that marks whether each row is fully recorded. I filtered the data to include only rows marked as 'complete', which helped get rid of any matches that were missing information.

  2. Cleaned up binary columns
    A lot of the columns that start with 'first' — like firstblood and firstdragon — are essentially yes-or-no values, but they weren’t stored as bool types in the dataset. I converted them (along with the result column) to True/False so they’d be easier to work with if needed. I didn’t end up using these columns in the final analysis, but it was still useful to clean them for completeness.

  3. Kept only the relevant columns
    To simplify things further, I selected just the columns used in my analysis — the same ones I highlighted as most relevant in the introduction.
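The three steps above can be sketched in pandas roughly as follows. This is a minimal illustration on a tiny made-up frame, not the project's actual code: the values are invented, and the variable names (raw, complete, converted, cleaned) are assumptions for the example.

```python
import pandas as pd

# Tiny made-up stand-in for the Oracle's Elixir export (the real file has 161 columns).
raw = pd.DataFrame({
    "datacompleteness": ["complete", "complete", "partial", "complete"],
    "position": ["mid", "bot", "top", "sup"],
    "result": [1, 0, 1, 0],
    "firstblood": [1.0, 0.0, 0.0, 1.0],
    "firstdragon": [0.0, 1.0, 1.0, 0.0],
    "kills": [2, 4, 1, 0],
    "cspm": [6.76, 7.92, 8.09, 1.47],
})

# Step 1: keep only rows flagged as fully recorded.
complete = raw[raw["datacompleteness"] == "complete"]

# Step 2: convert the 0/1 indicator columns (and result) to booleans.
# Caution: astype(bool) turns NaN into True, so missing values
# would need to be handled before this step on real data.
binary_cols = [c for c in complete.columns if c.startswith("first")] + ["result"]
converted = complete.astype({col: bool for col in binary_cols})

# Step 3: keep only the columns used in the analysis (a subset shown here).
cleaned = converted[["position", "kills", "cspm"]].reset_index(drop=True)
```

On the real data, the column list in step 3 would be the full set of relevant columns rather than the three used here.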

These are the first 5 rows of the cleaned dataset:

position  kills  deaths  assists  xpat10  totalgold  visionscore  damageshare  cspm
top       2      3       2        4909    10934      26           0.278784     8.0911
jng       2      5       6        3484    9138       48           0.208009     5.1839
mid       2      2       3        4556    9715       29           0.252086     6.7601
bot       2      4       2        3103    10605      25           0.196358     7.9159
sup       1      5       6        2161    6678       69           0.0647631    1.4711

Univariate Analysis

This histogram displays the distribution of Damage Share, or the percentage of a team’s damage dealt by each player.
The shape of the distribution suggests that damage output isn’t evenly distributed among all players, which supports the idea that certain roles — like Mid or ADC — are more likely to carry.

This histogram shows that most players fall between 5–10 Creep Score Per Minute (CSPM), with a right-skewed distribution. Since high farm rates are often linked to carry potential, CSPM is useful for comparing Mid laners and ADCs.

Bivariate Analysis

This box plot shows that ADCs tend to have slightly higher kill counts than Mid laners. This supports the idea that ADCs frequently take on the primary carry role by securing kills.

This box plot shows that ADCs also tend to have higher and more consistent CSPM than Mid laners. This suggests that ADCs contribute through superior farming efficiency.

Interesting Aggregates

The table below shows the average damage share, kills, and Creep Score Per Minute (CSPM) for each player role.

position  damageshare  kills  cspm
bot       0.26         4.26   8.73
jng       0.16         3.09   5.68
mid       0.26         3.55   8.27
sup       0.08         0.9    1.13
top       0.23         2.8    7.81

Bot laners (ADCs) and Mid laners have the highest damage share and CSPM, which fits their role as primary damage dealers. Bot laners also have the highest average number of kills, suggesting that ADCs may carry their team more often than Mid laners.
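A per-role table like this is typically produced with a groupby. The sketch below assumes the cleaned data sits in a DataFrame (called players here, a hypothetical name) and uses only the five sample rows shown earlier, so its output will not match the full-dataset averages in the table.

```python
import pandas as pd

# The five sample rows from the cleaned dataset, one per role.
players = pd.DataFrame({
    "position": ["top", "jng", "mid", "bot", "sup"],
    "damageshare": [0.278784, 0.208009, 0.252086, 0.196358, 0.0647631],
    "kills": [2, 2, 2, 2, 1],
    "cspm": [8.0911, 5.1839, 6.7601, 7.9159, 1.4711],
})

# Average each statistic within a role, rounded to two decimals.
role_means = (
    players.groupby("position")[["damageshare", "kills", "cspm"]]
    .mean()
    .round(2)
)
```

With one row per role, each mean is just that row's value. Run on all of the complete player rows, the same call averages over every game.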

Imputation

I didn’t perform any imputation because all of the columns used in my analysis were complete after filtering the dataset to only include rows where datacompleteness == 'complete'. Since none of the key columns had missing values, there was no need to fill in or estimate any data. This is ideal, as it means my analysis is based entirely on actual observed values rather than approximations.
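One way to confirm that no imputation is needed is to count missing values per column. The sketch below runs that check on the five sample rows shown earlier; the names cleaned, missing_per_column, and needs_imputation are assumptions for illustration, and on the real data the check would run over every filtered row.

```python
import pandas as pd

# The five sample rows from the cleaned dataset.
cleaned = pd.DataFrame({
    "position": ["top", "jng", "mid", "bot", "sup"],
    "kills": [2, 2, 2, 2, 1],
    "deaths": [3, 5, 2, 4, 5],
    "assists": [2, 6, 3, 2, 6],
    "xpat10": [4909, 3484, 4556, 3103, 2161],
    "totalgold": [10934, 9138, 9715, 10605, 6678],
    "visionscore": [26, 48, 29, 25, 69],
    "damageshare": [0.278784, 0.208009, 0.252086, 0.196358, 0.0647631],
    "cspm": [8.0911, 5.1839, 6.7601, 7.9159, 1.4711],
})

# Count missing entries in each column; any nonzero count would call for imputation.
missing_per_column = cleaned.isna().sum()
needs_imputation = bool(missing_per_column.any())
```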

Framing a Prediction Problem

The goal of my prediction task is to determine which role (top, jungle, mid, bot/ADC, or support) a player played based on their post-game data. This is a multiclass classification task since there are five possible roles.

The response variable is position, and I chose it because roles are central to understanding player behavior in League of Legends. Predicting a player’s role based on their performance can reveal how distinct the roles are and whether they exhibit unique patterns.

I used the post-game features kills, deaths, assists, totalgold, visionscore, xpat10, and cspm. These are all values that are available after a match finishes, so everything used is known at the time of prediction.

I evaluated the model using accuracy, since all five classes are relatively balanced in size. I didn’t use F1-score because the classification problem isn’t highly imbalanced — so tracking precision and recall separately wasn’t necessary.
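The class balance can be checked directly from the position column. The sketch below assumes every team fields exactly one player per role and builds a toy column for two hypothetical games; it also shows the accuracy a model would get by always guessing the most common role, which is a useful floor when reading the accuracy figures.

```python
import pandas as pd

roles = ["top", "jng", "mid", "bot", "sup"]

# Two hypothetical games with two teams each, one player per role per team.
positions = pd.Series(roles * 4, name="position")

# Share of rows belonging to each role.
class_share = positions.value_counts(normalize=True)

# Accuracy of always predicting the most frequent role.
majority_baseline = class_share.max()
```

With five equally sized classes, the majority-class floor is 20%, so accuracy is easy to interpret here.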

Baseline Model

For my baseline model, I used three quantitative features: kills, xpat10, and cspm. These features were chosen because they reflect key aspects of a player’s performance that differ across roles. In particular, xpat10 is a strong indicator for identifying bot lane players, since ADCs and supports share a lane and typically have less experience than solo laners. I passed the features directly into a RandomForestClassifier inside a simple pipeline, without any scaling or feature engineering.

The model achieved an accuracy of 80%, which I think is pretty good for a simple model. It shows that even basic stats like kills, early-game XP, and CS per minute can go a long way in predicting player roles.
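A baseline of this shape could look roughly like the sketch below. It runs on synthetic data, not the real dataset: the role centers loosely follow the tables above, while the spreads, the sample size, the 75/25 train/test split, and the random seeds are all assumptions made for illustration. Its accuracy will therefore differ from the 80% reported here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in data. Centers are (kills, xpat10, cspm) per role;
# the spreads below are invented for illustration only.
centers = {
    "top": (2.8, 4900, 7.8),
    "jng": (3.1, 3500, 5.7),
    "mid": (3.6, 4600, 8.3),
    "bot": (4.3, 3100, 8.7),
    "sup": (0.9, 2200, 1.1),
}
n_per_role = 200
frames = []
for role, (kills, xp, cs) in centers.items():
    frames.append(pd.DataFrame({
        "position": role,
        "kills": rng.poisson(kills, n_per_role),
        "xpat10": rng.normal(xp, 400, n_per_role),
        "cspm": rng.normal(cs, 0.8, n_per_role),
    }))
df = pd.concat(frames, ignore_index=True)

X = df[["kills", "xpat10", "cspm"]]
y = df["position"]

# Assumed split: hold out 25% of rows, stratified by role.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Raw features go straight into the forest: no scaling, no feature engineering.
baseline = Pipeline([("forest", RandomForestClassifier(random_state=0))])
baseline.fit(X_train, y_train)

baseline_accuracy = accuracy_score(y_test, baseline.predict(X_test))
```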

Final Model

For my final model, I used five quantitative features: kills, xpat10, cspm, and two that I engineered, kda and vision_efficiency. I created these using a FunctionTransformer directly inside the pipeline. KDA, computed as (kills + assists) / deaths, reflects a player’s impact while staying alive. Roles like Mid and ADC often have high KDA due to their carry potential, while Supports and Top laners often show lower values, making it a useful signal for classification. Vision efficiency, computed as vision score / total gold, captures how effectively a player contributes vision relative to their resources. This is particularly helpful for identifying Supports, who tend to have lower gold income but are expected to have high vision scores. Including these features adds more role-specific context to the model, making it better at distinguishing between different styles of play tied to each position.
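The two engineered features can be sketched with a FunctionTransformer as below, applied to the five sample rows from the cleaned dataset. The helper name add_engineered_features is hypothetical, and treating zero deaths as one death to avoid division by zero is an assumption of this sketch; how the project handles deathless games isn't stated.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def add_engineered_features(X):
    """Return the five model features, including kda and vision_efficiency."""
    X = X.copy()
    # Assumption: count 0 deaths as 1 so the ratio stays finite.
    X["kda"] = (X["kills"] + X["assists"]) / X["deaths"].clip(lower=1)
    X["vision_efficiency"] = X["visionscore"] / X["totalgold"]
    return X[["kills", "xpat10", "cspm", "kda", "vision_efficiency"]]


# Stateless transformer that can be placed as a step in a Pipeline.
engineer = FunctionTransformer(add_engineered_features)

# The five sample rows from the cleaned dataset (top, jng, mid, bot, sup).
sample = pd.DataFrame({
    "kills": [2, 2, 2, 2, 1],
    "deaths": [3, 5, 2, 4, 5],
    "assists": [2, 6, 3, 2, 6],
    "xpat10": [4909, 3484, 4556, 3103, 2161],
    "totalgold": [10934, 9138, 9715, 10605, 6678],
    "visionscore": [26, 48, 29, 25, 69],
    "cspm": [8.0911, 5.1839, 6.7601, 7.9159, 1.4711],
})

features = engineer.fit_transform(sample)
```

In this small sample the support row (the last one) has by far the highest vision_efficiency, which is the pattern the feature is meant to pick up.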

Since I used a RandomForestClassifier, I didn’t apply StandardScaler. Random Forests aren’t sensitive to the scale of features, so scaling would’ve added unnecessary complexity without any benefit.

For hyperparameter tuning, I used GridSearchCV with 5-fold cross-validation to test combinations of n_estimators and max_depth. The best combination turned out to be n_estimators=100 and max_depth=10, with a best cross-validation accuracy of 82.8%.
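A grid search of this kind might be set up as in the sketch below. The grid values are assumptions, since the candidates tried in the project aren't listed, and the data is a synthetic five-class problem from make_classification standing in for the real features, so the selected parameters and score will not match the ones reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic five-class stand-in for the five-feature role data.
X, y = make_classification(
    n_samples=500,
    n_features=5,
    n_informative=5,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

pipe = Pipeline([("forest", RandomForestClassifier(random_state=0))])

# Hypothetical grid; step-name prefixes route each value to the forest.
param_grid = {
    "forest__n_estimators": [50, 100],
    "forest__max_depth": [5, 10],
}

# 5-fold cross-validation, scored by accuracy.
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_params = search.best_params_
best_cv_accuracy = search.best_score_
```

Here best_cv_accuracy is the mean accuracy across the five folds for the winning combination, which is the kind of figure reported as the best cross-validation accuracy.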

The final model achieved an accuracy of 82.5%, improving on the baseline model’s 80%. This boost likely comes from the engineered features, which added role-specific insights beyond just raw performance stats. In other words, the model correctly predicts a player’s role from their post-game data 82.5% of the time. I’m pretty happy with how it turned out: it feels like a solid model that does a good job without being overly complicated.