Picking a fantasy football team

What’s the optimal run in a season?
Published

September 21, 2023

Modified

November 4, 2024

This post is a work in progress.

Cartola is a fantasy football league following the Brazilian Série A, where players assume the role of team managers. For the past couple of seasons, I’ve been collecting historical data to attempt to answer the question: what’s the optimal run in a given season?

The problem

Before each round t = 1, \dots, 38, managers are presented with N_t candidate players. Candidates have costs \mathbf{c}_{t} \in \mathbb{R}_+^{N_t} and positions \mathbf{p}_{t} \in \{1, \dots, 6\}^{N_t}. For convenience, positions can be encoded as dummies P_t \in \{0, 1\}^{N_t \times 6}. There are i = 1, \dots, 7 valid formations F \in \mathbb{N}^{7 \times 6}, where F_{ij} indicates exactly how many players of position j are allowed in formation i. All formations include 11 players and 1 coach, or \sum_{j=1}^6 F_{ij} = 12 for all i. The manager begins each round with a budget b_t \in \mathbb{R}_+ and they must pick a team \mathbf{x}_{t} \in \{0, 1\}^{N_t} following a formation \mathbf{y}_{t} \in \{0, 1\}^{7}. At the end of the round, players receive scores \mathbf{s}_t \in \mathbb{R}^{N_t} according to their in-game performance. The manager’s goal is to maximize the team score \mathbf{s}_t^T \mathbf{x}_t.
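As a minimal sketch of the encoding above, the snippet below builds the dummy matrix P_t from a position vector and counts picked players per position. The position codes (0-indexed here for convenience), the two formation rows, and the toy picks are hypothetical placeholders, not Cartola's actual data.

```python
import numpy as np

# Hypothetical position codes (0-indexed): 0..4 are on-field positions, 5 is the coach.
positions = np.array([0, 1, 1, 2, 3, 4, 5, 3])

# One-hot (dummy) encoding: row n has a single 1 in the column of player n's position.
P = np.eye(6, dtype=int)[positions]

# Two hypothetical formation rows; each must sum to 11 players + 1 coach = 12.
F = np.array([
    [1, 2, 2, 4, 2, 1],
    [1, 2, 2, 3, 3, 1],
])

picks = np.array([1, 1, 0, 1, 0, 1, 1, 1])  # a toy, incomplete team
counts = P.T @ picks  # number of picked players per position
print(counts)
```

A team follows formation i exactly when `counts` equals row i of `F`.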

Since the manager doesn’t know the scores when picking their team, they must rely on score predictions \hat{\mathbf{s}}_t \in \mathbb{R}^{N_t}. However, predictions aren’t always accurate, and scores of players from the same team are correlated. To reduce the risk of picking many players from a single team and having that team perform badly, the manager might want to include the covariance between player scores \Sigma_t \in \mathbb{R}^{N_t \times N_t} in the problem. One way to do this is to set a risk aversion \gamma \in \mathbb{R}_+ and maximize

\hat{\mathbf{s}}_t^T \mathbf{x}_t - \gamma \mathbf{x}_t^T \Sigma_t \mathbf{x}_t.
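To illustrate how the penalty term behaves, here is a small sketch that evaluates this objective for two toy teams with equal predicted scores. The numbers and the helper name `objective` are made up for illustration.

```python
import numpy as np


def objective(predictions, covariance, picks, risk_aversion):
    """Predicted team score minus a penalty on the team's score variance."""
    expected = predictions @ picks
    variance = picks @ covariance @ picks
    return expected - risk_aversion * variance


predictions = np.array([5.0, 5.0, 5.0])
# Players 0 and 1 are teammates (positively correlated); player 2 is independent.
covariance = np.array([
    [4.0, 3.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
])
teammates = np.array([1, 1, 0])
mixed = np.array([1, 0, 1])

print(objective(predictions, covariance, teammates, 0.5))
print(objective(predictions, covariance, mixed, 0.5))
```

With \gamma = 0 both teams are equally attractive. With \gamma = 0.5 the mixed team scores higher, because the positive covariance between the two teammates inflates the variance of the first team.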

Finally, the team is subject to the constraints:

  1. Cost less than or equal to the budget \mapsto \mathbf{c}_t^T \mathbf{x}_t \leq b_t
  2. Follow a single formation \mapsto \mathbf{1}^T \mathbf{y}_t = 1
  3. Follow a valid formation \mapsto P_t^T \mathbf{x}_t = F^T \mathbf{y}_t.
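The three constraints can be checked directly for a candidate team. The sketch below does so on a toy instance with only two positions and two formations. The helper `is_feasible` and all numbers are hypothetical.

```python
import numpy as np


def is_feasible(picks, formation, costs, positions_onehot, formations, budget):
    """Check the budget, single-formation, and valid-formation constraints."""
    within_budget = costs @ picks <= budget
    single_formation = formation.sum() == 1
    # Picked players per position must match the chosen formation row exactly.
    valid_formation = np.array_equal(
        positions_onehot.T @ picks, formations.T @ formation
    )
    return bool(within_budget and single_formation and valid_formation)


# Toy instance: 4 candidates, 2 positions, 2 formations (not real Cartola data).
costs = np.array([3, 4, 5, 2])
positions_onehot = np.eye(2, dtype=int)[[0, 0, 1, 1]]
formations = np.array([[1, 1], [2, 1]])
budget = 10

print(is_feasible(np.array([1, 1, 0, 1]), np.array([0, 1]),
                  costs, positions_onehot, formations, budget))
```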

This problem resembles the mean-variance portfolio selection problem of Modern Portfolio Theory (Markowitz 1952), with binary picks in place of continuous asset weights.

Markowitz, Harry. 1952. “Portfolio Selection.” The Journal of Finance 7 (1): 77–91. http://www.jstor.org/stable/2975974.
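import cvxpy as cp


def problem(predictions, covariance, costs, positions, budget, risk_aversion):
    # `formations` is the 7 x 6 matrix F, assumed to be defined at module level.
    # One binary decision per candidate player and per formation.
    picks = cp.Variable(predictions.size, "picks", boolean=True)
    formation = cp.Variable(7, "formation", boolean=True)
    # Predicted team score minus the risk-aversion penalty on team variance.
    objective = cp.Maximize(
        predictions.T @ picks - risk_aversion * cp.quad_form(picks, covariance)
    )
    constraints = [
        costs.T @ picks <= budget,  # stay within the budget
        cp.sum(formation) == 1,  # follow a single formation
        positions.T @ picks == formations.T @ formation,  # follow a valid formation
    ]
    return cp.Problem(objective, constraints)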
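On tiny instances, a solver's answer can be cross-checked by enumerating every possible team. The sketch below is such a brute-force check, not the method used in this post. The function `brute_force` and the toy data are hypothetical, and the approach is only practical for a handful of candidates.

```python
from itertools import product

import numpy as np


def brute_force(predictions, covariance, costs, positions, formations, budget, risk_aversion):
    """Enumerate all 2^N teams and keep the best feasible one."""
    best_value, best_picks = -np.inf, None
    for bits in product([0, 1], repeat=len(predictions)):
        picks = np.array(bits)
        if costs @ picks > budget:
            continue
        counts = positions.T @ picks
        # Feasible only if the position counts match some formation row.
        if not any(np.array_equal(counts, row) for row in formations):
            continue
        value = predictions @ picks - risk_aversion * (picks @ covariance @ picks)
        if value > best_value:
            best_value, best_picks = value, picks
    return best_value, best_picks


# Toy instance: 4 candidates, 2 positions, 2 formations (not real Cartola data).
predictions = np.array([4.0, 6.0, 5.0, 2.0])
covariance = np.zeros((4, 4))
costs = np.array([3, 4, 5, 2])
positions = np.eye(2, dtype=int)[[0, 0, 1, 1]]
formations = np.array([[1, 1], [2, 1]])

value, picks = brute_force(predictions, covariance, costs, positions, formations, 10, 0.0)
print(value, picks)
```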

Backtesting

So far, I’ve simplified the manager’s goal to maximize \mathbf{s}_t^T \mathbf{x}_t for each round. The manager’s true final goal is to maximize their total score at the end of the season \sum_t \mathbf{s}_t^T \mathbf{x}_t. These two objectives aren’t necessarily the same, because players increase or decrease in valuation according to scores. Since \mathbf{s}_t^T \mathbf{x}_t depends on the budget b_t, which depends on the scores \mathbf{s}_{t - 1}^T \mathbf{x}_{t - 1}, one could argue that it might be a good idea to maximize a balance between scoring and valuation. In the next section, I’ll show that maximizing the score for each round is sufficient for maximizing the total score, given good enough predictions.

For now, I’ll define a function to simulate the manager’s performance across an entire season. At the start of the season, the manager has a budget of b_1 = 100. Then, for each round t:

  1. Solve the team picking problem \mapsto \mathbf{x}_t
  2. Calculate the round score \mapsto r_t = \mathbf{s}_t^T \mathbf{x}_t
  3. If t < 38, update the budget \mapsto b_{t + 1} = b_t + (\mathbf{c}_{t + 1} - \mathbf{c}_t)^T \mathbf{x}_t
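As a worked example of the budget update in step 3, the snippet below uses made-up costs for three candidates. It assumes the same players are indexed identically in rounds t and t + 1.

```python
import numpy as np

costs_now = np.array([10.0, 8.0, 5.0])   # c_t (hypothetical)
costs_next = np.array([11.5, 7.0, 5.0])  # c_{t+1} (hypothetical)
picks = np.array([1, 1, 0])              # x_t: the first two players were picked
budget = 100.0                           # b_t

# b_{t+1} = b_t + (c_{t+1} - c_t)^T x_t
budget_next = budget + (costs_next - costs_now) @ picks
print(budget_next)
```

The first pick appreciates by 1.5 and the second depreciates by 1.0, so the budget grows by 0.5. The unpicked third player has no effect on the budget.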
import numpy as np


def backtest(
    initial_budget,
    scores,
    predictions,
    covariance,
    costs,
    appreciations,
    positions,
    risk_aversion,
):
    budget = initial_budget
    rounds = len(predictions)
    run = np.empty(rounds)  # round scores r_t
    for t in range(rounds):
        prob = problem(
            predictions[t], covariance[t], costs[t], positions[t], budget, risk_aversion
        )
        prob.solve()
        picks = prob.var_dict["picks"].value
        run[t] = scores[t].T @ picks
        # No budget update after the final round.
        if t < rounds - 1:
            # appreciations[t] holds c_{t+1} - c_t for the candidates of round t.
            budget += appreciations[t].T @ picks
    return run

Scenarios

  1. Perfect predictions \mapsto \hat{\mathbf{s}}_t = \mathbf{s}_t, \gamma = 0
  2. Perfect predictions and infinite budget \mapsto \hat{\mathbf{s}}_t = \mathbf{s}_t, \gamma = 0, b_1 \gg 100
  3. Simple predictions and varying levels of risk aversion \mapsto \hat{\mathbf{s}}_t = \bar{\mathbf{s}}_{1:(t - 1)} (note 1), \gamma \in \{0, 0.5, 1\}
  4. Random predictions \mapsto \hat{\mathbf{s}}_t \sim N(\mathbf{0}, I_{N_t}), \gamma = 0
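The prediction schemes in the scenarios above can be sketched as follows. This assumes, for simplicity, a fixed set of players so that scores fit in a single (rounds x players) array, and it predicts zero in the first round, when there is no history yet. Both are assumptions of this sketch, as are the function names.

```python
import numpy as np


def perfect_predictions(scores):
    """Scenarios 1 and 2: predictions equal the realized scores."""
    return scores.copy()


def running_mean_predictions(scores):
    """Scenario 3: each player's mean score over the previous rounds."""
    rounds, players = scores.shape
    predictions = np.zeros((rounds, players))  # assumption: predict 0 with no history
    cumulative = np.cumsum(scores, axis=0)
    for t in range(1, rounds):
        predictions[t] = cumulative[t - 1] / t
    return predictions


def random_predictions(scores, seed=0):
    """Scenario 4: independent standard normal draws, one per player and round."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal(scores.shape)


# Toy scores for 3 rounds and 2 players (made up).
scores = np.array([[2.0, 4.0], [4.0, 0.0], [6.0, 2.0]])
print(running_mean_predictions(scores))
```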

1 Explain that this is player-level…

2 Unfortunately, data for the 38th round is missing…

I’ll plot… (note 2)

Other ideas

Consider valuation, improve predictions, team leader…

Readings:

  • https://peterellisjones.com/posts/fantasy-machine-learning/
  • https://www.alexmolas.com/2024/07/15/fantasy-knapsack.html

Citation

BibTeX citation:
@online{assunção2023,
  author = {Assunção, Luís},
  title = {Picking a Fantasy Football Team},
  date = {2023-09-21},
  url = {https://assuncaolfi.github.io/site/blog/fantasy-football/},
  langid = {en}
}
For attribution, please cite this work as:
Assunção, Luís. 2023. “Picking a Fantasy Football Team.” September 21, 2023. https://assuncaolfi.github.io/site/blog/fantasy-football/.