P2PY: A New Football Analytical Tool


The development of a simple analytical tool that allows prediction of game scores and season long results is desirable for both fans and handicappers.


At the 2013 Sloan Conference, Tim Chou gave a presentation on the use of Yards Per Point (YPP) as a “new” efficiency metric to rate football teams. In essence, this stat measures how many yards it takes an offense to score a point (a value the offense wants to minimize) and how many yards a defense gives up before allowing a point (a value the defense wants to maximize). The difference between a team’s defensive YPP and offensive YPP correlates to the overall strength of that team: large positive values of YPP differential indicate good performance, and negative values mean a team has performed poorly. In his presentation, Chou showed a correlation of YPP differential to win-% of 77% based on 2012 FBS values.

As it turns out, though, YPP was not a new stat; handicappers have used it for years to assess efficiency. In 2004, Aaron Schatz wrote about the differences between YPP and his VOA statistics. Many other articles on the topic exist.

When considering overall team performance, we must consider not only efficiency but also overall productivity, and YPP is missing the latter. By YPP standards, a team with 100 yards gained and 10 points scored in a game is the same as one with 500 yards and 50 points. Obviously, these two teams are not the same: overall output matters along with efficiency.

The simplest productivity measurements for football are points scored per game (PPG) for the offense and points allowed per game (PPGA) for the defense. Whether a team needs to drive 90 yards or 50 yards to score, if the possession ends in a touchdown, that team has likely scored seven points. If the team manages to score at a high productivity rate (even with a potentially low efficiency), it can still win games.

Below is my attempt to develop a simple yet effective predictive metric for determining game results.


By combining the concepts of efficiency (YPP) and productivity (points per game), we can define Points Squared Per Yard (P2PY) with the simple equations below.

Offensive P2PY (OP2PY) = PPG * (PPG / YPG)

Defensive P2PY (DP2PY) = PPGA * (PPGA / YPGA)

P2PY Differential = OP2PY – DP2PY
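As a quick illustration, the three equations above can be written as a minimal Python sketch. The function and variable names are chosen only for illustration and do not come from any library. The two example offenses are the 100-yard/10-point and 500-yard/50-point teams described earlier.

```python
def p2py(points_per_game: float, yards_per_game: float) -> float:
    """Points squared per yard: PPG * (PPG / YPG)."""
    if yards_per_game <= 0:
        raise ValueError("yards_per_game must be positive")
    return points_per_game * (points_per_game / yards_per_game)


def p2py_differential(ppg: float, ypg: float, ppga: float, ypga: float) -> float:
    """Offensive P2PY minus defensive P2PY."""
    return p2py(ppg, ypg) - p2py(ppga, ypga)


def yards_per_point(yards_per_game: float, points_per_game: float) -> float:
    """YPP, included only for comparison with P2PY."""
    return yards_per_game / points_per_game


# The two offenses from the earlier example: identical YPP, different output.
low_output = {"ppg": 10.0, "ypg": 100.0}
high_output = {"ppg": 50.0, "ypg": 500.0}

for name, team in (("low output", low_output), ("high output", high_output)):
    ypp = yards_per_point(team["ypg"], team["ppg"])
    op2py = p2py(team["ppg"], team["ypg"])
    print(f"{name}: YPP = {ypp:.1f}, OP2PY = {op2py:.1f}")
```

Both offenses have a YPP of 10.0, but their OP2PY values are 1.0 and 5.0, so the more productive offense is no longer rated the same as the less productive one.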

As you can see, the difference between P2PY and YPP is two-fold. First, instead of yards per point, I am using points per yard. Offenses inherently want to maximize the number of points they score. They don’t try to minimize the yards it takes to get points. Similarly, defenses try to minimize the points they give up, not maximize the yards they allow without giving up points. I feel that points per yard is a more intuitive ratio for what teams actually strive for. Second, rather than using only the efficiency metric, we’re now multiplying efficiency by total output, or productivity: points per game.

As with Chou’s use of YPP, here we will also have offensive (OP2PY) and defensive (DP2PY) ratings along with the overall rating (P2PY Differential). When it comes to building predictive power into this measurement, all three of these calculations will be used.



I wanted to compare the new P2PY calculation to YPP. To do so, I analyzed YPP differential on a wider data set than Chou originally used: all of the FBS offensive and defensive stats from 2006 to 2015, which were taken from Yahoo!. As a note, teams that entered FBS during this time are included only for their full FBS years, not for prior seasons. Note also that these data are full-season averages without a strength of schedule adjustment, and they don’t necessarily reflect how a team may have improved or declined over the course of a season.

Over the 10-year data set, YPP differential is a slightly worse predictor of win percentage than Chou found for the 2012 season. Each point in the graph below represents one team’s performance for one year in terms of win percentage and YPP differential. For this analysis, we find an R-squared value of 0.70, lower than the 0.77 he cited in the 2013 presentation.
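For readers who want to reproduce this kind of comparison, here is a minimal sketch of an R-squared calculation. It assumes a simple straight-line fit, in which case R-squared equals the squared Pearson correlation. The helper name and the sample numbers are hypothetical and are not the Yahoo! data used in this article.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a straight-line fit of ys on xs.

    For simple linear regression this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain some variation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical team-seasons: (rating differential, win percentage).
# These values are made up purely to show the call.
differentials = [-6.0, -2.5, 0.0, 1.5, 4.0, 7.5]
win_pcts = [0.17, 0.33, 0.50, 0.58, 0.75, 0.92]

print(f"R-squared = {r_squared(differentials, win_pcts):.2f}")
```

The same helper can be applied to any pair of columns, for example a rating differential against win percentage or against absolute point differential.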


Comparing YPP differential to absolute point differential (PPG – PPGA) gives a better fit of 0.80, but this still doesn’t necessarily indicate a high correlation.


As a point of reference, absolute point differential corresponds better to win percentage than YPP differential and yields an R-squared value of 0.84. This indicates that just using a measure of productivity is considerably better than using only efficiency.


Using P2PY differential as our basis for comparison to win percentage, we get an R-squared of 0.82, which is very close to that of absolute point differential and still considerably better than YPP. The reason may be that, again, efficiency alone is not an effective measure of performance; productivity must be considered as well.


When we plot P2PY versus absolute point differential, we get a very high coefficient of determination of 0.98, indicating a very strong correlation. You might say that this is simply due to putting more weight on points at the expense of yards. This is true, but as we continue to increase the power of points beyond 2, the correlation begins to erode. It would appear that points squared per yard is nearly the ideal weighting system for the combination of productivity and efficiency when correlating to win-% and absolute point differential.
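One way to read "increasing the power of points" is as a family of ratings of the form points^k / yards, where k = 2 gives P2PY. That form is an assumption on the reader's side of what the comparison looks like, and the names and team numbers in the sketch below are hypothetical.

```python
def points_power_rating(points_per_game: float, yards_per_game: float,
                        power: float = 2.0) -> float:
    """Generalized rating PPG**power / YPG; power=2 reproduces P2PY."""
    if yards_per_game <= 0:
        raise ValueError("yards_per_game must be positive")
    return points_per_game ** power / yards_per_game


def points_power_differential(ppg: float, ypg: float, ppga: float, ypga: float,
                              power: float = 2.0) -> float:
    """Offensive rating minus defensive rating at the chosen power."""
    offense = points_power_rating(ppg, ypg, power)
    defense = points_power_rating(ppga, ypga, power)
    return offense - defense


# Hypothetical season averages for one team (not real data).
team = {"ppg": 35.0, "ypg": 450.0, "ppga": 21.0, "ypga": 350.0}

for power in (1.0, 2.0, 3.0):
    diff = points_power_differential(team["ppg"], team["ypg"],
                                     team["ppga"], team["ypga"], power)
    print(f"power = {power:.0f}: differential = {diff:.3f}")
```

Computing this differential for every team-season at each power, and then comparing the resulting R-squared values against win percentage, would be one way to check where the correlation peaks.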


It’s noted again here that both YPP and P2PY are very simple calculations. Many other factors contribute to points (special teams performance, turnovers, strength of schedule, etc.), and not explicitly factoring in these terms may limit the overall predictive power.


I believe that throughout 2016, we will be able to use a form of P2PY to effectively rank teams, predict scores, and estimate final conference standings. In a weekly column during the season, I will use this concept to make win/loss and against-the-spread predictions for games. As part of this exercise, I expect to build on P2PY in the following ways.

  • First, we will collect some sample size (3 games at least) before trying to predict the performance through the rest of the year. Preseason polls are fun and interesting but are primarily based on opinion and gut feel.
  • Second, we will discount performances that are far in the past in favor of recent games. By the time we get to week 9 or 10, a team’s week 1 or 2 games will be worth less than week 8, for example.
  • Third, I will include Strength of Schedule (SOS) to adjust a team’s rating. The source of SOS will be determined later.
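The first two items can be sketched in a few lines of Python. The exact weighting scheme has not been decided, so the exponential decay, the 0.9 decay factor, and the function name below are all assumptions used only to illustrate the idea of a minimum sample and recency discounting.

```python
def recency_weighted_average(values, decay=0.9, min_games=3):
    """Average per-game values, weighting recent games more heavily.

    `values` is ordered oldest to newest. The newest game has weight 1 and
    each earlier game is multiplied by `decay` once more (hypothetical scheme).
    """
    if len(values) < min_games:
        raise ValueError(f"need at least {min_games} games")
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical points scored in weeks 1 through 5 (not real data).
points_by_week = [17.0, 24.0, 31.0, 38.0, 45.0]

plain = sum(points_by_week) / len(points_by_week)
weighted = recency_weighted_average(points_by_week)
print(f"plain PPG = {plain:.1f}, recency-weighted PPG = {weighted:.1f}")
```

Because this hypothetical team improved every week, the recency-weighted figure comes out higher than the plain average. The same weighting could be applied to yards before computing P2PY.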

In the meantime, I will perform more analysis on the P2PY metric across the data of the last decade to determine what else can be learned from this simple but seemingly effective calculation. Topics to be evaluated include how offenses and defenses have developed through the decade, conference bias, and certainly an all-decade top ten.
