Evaluating Relievers in the Modern Game
Evaluating relievers has become more challenging over the last 35 years or so as their in-game roles have changed. Even before that shift, it was difficult for a simple reason: the available statistics were severely limited. The vast majority of them were developed for starters, and even the ones created for relievers have their holes. With so many statistics to choose from, it can be hard to separate the useful ones from the noise. Furthermore, a statistic that has been around for a long time isn’t necessarily the best tool for evaluating relievers.
So what are we to make of all this in the data-driven era we now live in? We need to focus on what a pitcher’s job is and, specifically, on what a reliever’s job is. The most important step is finding the statistics that measure those aspects of a reliever’s game. Beyond that, we need to identify which statistics are designed more for starters than for relievers and then, where possible, find a way to make them work.
What Makes an Effective Pitcher
A pitcher’s job is to keep the opponent from scoring enough runs to defeat his team. He accomplishes that in three main ways: keeping runs off the board, getting opponents out, and pitching with command. Together, these make up the process. However, many of the stats that gain the most attention focus on the product (win, loss, save) rather than the process. A pitcher with a good win/loss record might not have earned it by being dominant but rather by being fortunate enough to be on the mound while his team outslugged the opponent.
With relievers, process statistics are even more important than product statistics. That is because the only positive “product” statistics tend to go to closers and set-up men, while the negative ones can go to anyone: long relievers, middle relievers, set-up men, and closers alike. Therefore, when evaluating relievers, we must focus on the statistics that measure how well a reliever does the three main jobs of a pitcher. We must look solely at statistics that measure run prevention, avoiding baserunners, and pitching with command.
The main job of a pitcher is to prevent the other team from scoring runs. Two stats show how many runs a pitcher has allowed: runs and earned runs. Earned runs are runs that score without the aid of a fielding, throwing, or catching error by the defense. In other words, they’re the runs that are, for lack of a better term, a pitcher’s fault.
Raw run totals do not tell the full story. Pitchers who pitch more often will almost always give up more cumulative runs. To counter that, we need a “rate” stat for run prevention. Baseball has used Earned Run Average (ERA) for over a century to measure run prevention. Its premise is simple: it measures how many earned runs a pitcher would allow, on average, if he were to pitch a full nine-inning game.
However, numbers mean little without context. We need to know what counts as a “good” ERA, and that has changed over the years. Run-scoring environments shift with obvious factors like ballpark size, elevation, the number of teams in the league, and the designated hitter. Subtler factors, such as the baseball itself and how hot or dry a summer is, also come into play. To account for these, we need more than ERA alone.
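As a sketch of the arithmetic, the functions below compute ERA from earned runs and innings pitched. They assume the usual box-score convention in which “6.2” means 6 2/3 innings, and they count outs internally (27 outs make nine innings) so thirds of an inning stay exact. The function names are illustrative, not taken from any particular library.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ("6.2" means 6 2/3 innings) into outs recorded."""
    whole, _, thirds_str = ip.partition(".")
    thirds = int(thirds_str) if thirds_str else 0
    if thirds not in (0, 1, 2):
        raise ValueError(f"invalid innings notation: {ip!r}")
    return int(whole) * 3 + thirds


def era(earned_runs: int, ip: str) -> float:
    """ERA = earned runs x 9 / innings pitched, computed as ER x 27 / outs."""
    outs = innings_to_outs(ip)
    if outs == 0:
        raise ValueError("ERA is undefined for zero innings pitched")
    return earned_runs * 27 / outs


# A starter who allows three earned runs over six innings
print(f"{era(3, '6.0'):.2f}")
# A reliever who allows one earned run over 2 1/3 innings
print(f"{era(1, '2.1'):.2f}")
```

Working in outs avoids the common mistake of treating “6.2” as 6.2 innings rather than 6 2/3.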
Comparing a pitcher’s ERA to the league average from that season gives a much better snapshot of how well he prevents runs. Taking the pitcher’s home ballpark into consideration makes the snapshot even better. Baseball-Reference uses ERA-plus (ERA+) to do this, while FanGraphs uses ERA-minus (ERA–). ERA– is the better stat of the two. It adjusts a pitcher’s ERA for his home ballpark’s park factor, divides the result by the league-average ERA, and multiplies by 100 for readability, so a score of 100 is exactly league average.
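A rough sketch of the ERA– idea follows. It assumes a park factor on a scale where 100 is neutral and applies the park adjustment as a plain division; that is a simplifying assumption for illustration, and FanGraphs’ exact formula may differ. The function name is hypothetical.

```python
def era_minus(era: float, park_factor: float, league_era: float) -> float:
    """Simplified ERA-: 100 means league average; lower is better.

    Assumption: park_factor uses a 100 = neutral scale and the park adjustment
    is a plain division. This is an illustration, not FanGraphs' exact formula.
    """
    park_adjusted_era = era / (park_factor / 100)  # deflate ERA earned in hitter-friendly parks
    return 100 * park_adjusted_era / league_era


# Same 3.20 ERA against a 4.00 league ERA, in a neutral park and a hitter-friendly one
print(round(era_minus(3.20, park_factor=100, league_era=4.00)))
print(round(era_minus(3.20, park_factor=110, league_era=4.00)))
```

The hitter-friendly park produces a lower (better) ERA–, reflecting that the same ERA is harder to post there.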
Why ERA– Is Better than ERA+
ERA– is better than ERA+ for two reasons. First, the lower the number, the better, just as with ERA. Second, ERA– compares the pitcher to the rest of the league, while ERA+ compares the rest of the league to the pitcher. That might sound like the same thing, but it’s not. To illustrate: 80 cents is 80% of a dollar (80/100 = .80 = 80%), while a dollar is 125% of 80 cents (100/80 = 1.25 = 125%). Ask most people which of these two representations makes more sense, and they will likely pick the former.
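To see the two orientations side by side, the sketch below ignores park factors (an assumption made only for clarity) and computes both for a pitcher with a 3.20 ERA in a league averaging 4.00. ERA– comes out to 80, which reads naturally as “80% of league average,” while ERA+ comes out to 125, the same inversion as the dollar-and-cents example. Both function names are hypothetical.

```python
def era_minus_neutral(era: float, league_era: float) -> float:
    """Pitcher relative to league (park-neutral): 100 = average, lower is better."""
    return 100 * era / league_era


def era_plus_neutral(era: float, league_era: float) -> float:
    """League relative to pitcher (park-neutral): 100 = average, higher is better."""
    return 100 * league_era / era


# A 3.20 ERA in a 4.00 league mirrors "80 cents versus a dollar"
print(round(era_minus_neutral(3.20, 4.00)))
print(round(era_plus_neutral(3.20, 4.00)))
```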
Problem for Relievers with ERA and ERA–
These statistics work well for starting pitchers, largely because ERA was designed with starters in mind. The reason lies in how it’s calculated: earned runs, multiplied by nine, divided by innings pitched. In a start that goes well, a starting pitcher typically pitches between six and nine innings. Relievers, however, rarely pitch more than one inning per outing. As a result, one bad inning, or sometimes one mistake pitch, can do long-lasting damage to a reliever’s ERA.
Think of it this way. Say a starting pitcher throws a scoreless first inning but gives up a three-run home run in the second. If he doesn’t give up any more runs, chances are good that he’ll last through the sixth inning. That one bad inning doesn’t hurt his ERA all that much, since he has four more scoreless innings to bring it back down. Now say a reliever has the same bad inning. How long will it take him to pitch enough innings to bring his ERA back down to Earth? A week? Two?
Take Arizona Diamondbacks reliever J.B. Wendelken as an example of how two pitches can, as The Athletic’s Zach Buchanan once put it in a press conference, “nuke” a reliever’s ERA. Wendelken joined the team on August 15 and pitched 18 2/3 innings over 20 appearances, mostly in late innings. In 18 of those appearances, he gave up four earned runs over 17 innings, a 2.12 ERA that is strong by any measure. In his other two appearances, he gave up a three-run homer in Denver and a two-run homer in Seattle, the latter in a game where he still earned the save. Those two mistake pitches ran his ERA as a Diamondback up to 4.34, deceptively high given his performance in the other games.
Fractional innings compound this problem. Say a reliever comes in with one out and the bases empty. He retires the first hitter, walks the second, and then gives up a two-run homer before getting the hook. The ERA formula (earned runs times nine, divided by innings pitched) gives us 2 × 9, or 18, divided by one-third of an inning. Recall that dividing by a fraction means inverting it and multiplying, so 18 divided by one-third is 18 × 3, an ERA of 54.00.
Since relievers typically pitch only one inning per outing, it would take at least six straight scoreless outings to pull his ERA below 3.00 (two earned runs over 6 1/3 innings works out to a 2.84 ERA). That would take, at minimum, almost a week; more likely, it would take closer to two.
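The figures in the Wendelken example can be checked with a few lines of arithmetic. The sketch counts outs (17 innings = 51 outs; 18 2/3 innings = 56 outs) and assumes all five runs on the two homers were earned, which is what the quoted 4.34 total implies. The helper name is hypothetical.

```python
def era_from_outs(earned_runs: int, outs: int) -> float:
    """ERA = ER x 9 / IP; with IP = outs / 3 this simplifies to ER x 27 / outs."""
    return earned_runs * 27 / outs


# 18 appearances: 4 earned runs over 17 innings
clean_stretch = era_from_outs(4, 17 * 3)
# All 20 appearances: add the three-run and two-run homers, 18 2/3 innings total
full_stint = era_from_outs(4 + 3 + 2, 18 * 3 + 2)

print(f"{clean_stretch:.2f}")
print(f"{full_stint:.2f}")
```

Five runs spread over just 1 2/3 innings more than double his ERA for the whole stint.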
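The fractional-innings scenario can be reproduced by counting outs. The hypothetical helpers below compute the 54.00 ERA from one out and two earned runs, then find how many additional scoreless outs are needed to drop strictly below 3.00. Taking `target_era` as a string lets `Fraction` parse it exactly, so the boundary case is not thrown off by floating-point rounding.

```python
import math
from fractions import Fraction


def era_from_outs(earned_runs: int, outs: int) -> float:
    """ERA = ER x 27 / outs (27 outs make nine innings)."""
    return earned_runs * 27 / outs


def scoreless_outs_needed(earned_runs: int, outs: int, target_era: str) -> int:
    """Extra scoreless outs needed before ERA falls strictly below target_era.

    ERA < target  <=>  total_outs > earned_runs * 27 / target.
    """
    threshold = Fraction(earned_runs * 27) / Fraction(target_era)
    smallest_total = math.floor(threshold) + 1  # smallest whole number of outs above threshold
    return max(0, smallest_total - outs)


# One out recorded, two earned runs allowed
print(f"{era_from_outs(2, 1):.2f}")
extra = scoreless_outs_needed(2, 1, "3.00")
print(extra, "outs, or", extra / 3, "innings")
print(f"{era_from_outs(2, 1 + extra):.2f}")
```

Eighteen more outs is six full innings, which matches the six one-inning outings described above.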
Reliever Solution: Scoreless Outing Percentage
So why not look, instead, at how often a reliever has a scoreless outing? A reliever’s job is to get outs without allowing runs, and there is little to no margin for error. Either he allowed runs or he didn’t.
Scoreless Outing Percentage shows how often a reliever keeps runs off the board. To calculate it, divide the number of scoreless outings a reliever has by his total relief appearances, then convert the decimal to a percentage. For context, one can also note how many appearances he had in which he allowed no earned runs.
Example: 46 scoreless outings out of 51 total relief appearances
46/51 = .902, or 90.2%
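The same calculation as a short sketch, run over a hypothetical game log of runs allowed per relief appearance (46 scoreless outings among 51 appearances, matching the example). Passing earned runs per appearance instead gives the earned-run version mentioned above. The function name is illustrative.

```python
def scoreless_outing_pct(runs_by_appearance: list[int]) -> float:
    """Percentage of relief appearances in which the pitcher allowed no runs."""
    if not runs_by_appearance:
        raise ValueError("no relief appearances to evaluate")
    scoreless = sum(1 for runs in runs_by_appearance if runs == 0)
    return 100 * scoreless / len(runs_by_appearance)


# Hypothetical game log: runs allowed in each of 51 relief appearances
game_log = [0] * 46 + [1, 2, 1, 3, 1]
print(f"{scoreless_outing_pct(game_log):.1f}%")
```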
WHIP: Getting Outs
How do pitchers keep runs off the board? By getting hitters out, or, in other words, by avoiding baserunners. A batter who isn’t put out reaches base, almost always via a walk or a hit. The stat that measures how many baserunners a pitcher allows per inning is Walks and Hits per Inning Pitched (WHIP), calculated as walks plus hits, divided by innings pitched. This stat has been used for decades, and its inventors reported it to three decimal places. When the Elias Sports Bureau adopted it just recently, they, in their “infinite wisdom,” chose to use only two decimal places, making it look too much like ERA. But we digress.
Obviously, the lower a pitcher’s WHIP, the better. The league average typically hovers in the 1.300s. A WHIP in the 1.200s is All-Star caliber; a WHIP in the 1.100s, sustained over a career, is Hall of Fame caliber.
A common criticism of WHIP is that it counts walks and hits without distinguishing between types of hits. However, that is not the goal of the stat, which simply measures how many baserunners a pitcher allows per inning. To see what types of hits a pitcher allows, we need something else.
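A minimal WHIP sketch, using a hypothetical season line of 50 walks and 160 hits over 170 innings. Innings are passed as outs, as in the ERA examples, and the result is printed both ways to show how the two-decimal version starts to resemble an ERA. The function name is illustrative.

```python
def whip(walks: int, hits: int, outs: int) -> float:
    """(Walks + hits) per inning pitched, with innings counted as outs / 3."""
    if outs == 0:
        raise ValueError("WHIP is undefined for zero innings pitched")
    return (walks + hits) / (outs / 3)


# Hypothetical season: 50 walks and 160 hits over 170 innings (510 outs)
value = whip(50, 160, 170 * 3)
print(f"{value:.3f}")  # the traditional three decimal places
print(f"{value:.2f}")  # two places, which reads deceptively like an ERA
```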
Two statistics work together to show us that. One is extra-base hit percentage (XBH%), which measures how often a pitcher gives up an extra-base hit. To calculate this stat, take the number of extra-base hits (doubles, triples, and home runs) a pitcher allows and divide it by the total batsmen faced (TBF or BF).
The other statistic is X/H%, which shows the percentage of a pitcher’s hits allowed that go for extra bases. To calculate this, use the same formula as XBH%, but instead of dividing extra-base hits allowed by total batsmen faced, divide them by the number of hits a pitcher allows.
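Both formulas share a numerator and differ only in the denominator, as this sketch shows. The reliever’s line (12 doubles, 1 triple, 7 home runs, 55 hits, 260 batters faced) is hypothetical, and the function names are illustrative.

```python
def xbh_pct(doubles: int, triples: int, homers: int, batters_faced: int) -> float:
    """XBH%: extra-base hits allowed per 100 batters faced."""
    return 100 * (doubles + triples + homers) / batters_faced


def x_per_h_pct(doubles: int, triples: int, homers: int, hits: int) -> float:
    """X/H%: share of hits allowed that went for extra bases."""
    return 100 * (doubles + triples + homers) / hits


# Hypothetical reliever: 12 doubles, 1 triple, 7 homers, 55 hits, 260 batters faced
print(f"XBH%: {xbh_pct(12, 1, 7, 260):.1f}")
print(f"X/H%: {x_per_h_pct(12, 1, 7, 55):.1f}")
```

Read together, the two say how often any batter goes for extra bases and how damaging the hits he allows tend to be.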
Pitching with Command
The most effective pitchers force their opponents to hit their way on base. In other words, they pitch with control, striking out opponents far more often than they walk them. One of the best at this was Curt Schilling, who struck out 3,116 batters while walking only 711 in his career. If he were ever to make the Hall of Fame, he would have the best strikeout-to-walk ratio of any Hall of Fame pitcher.
Among currently active players, Max Scherzer excels at striking out far more batters than he walks: 3,020 to 677. Justin Verlander also does well in that regard, at 3,013 to 851, although his ratio is not quite as good as Scherzer’s.
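The strikeout-to-walk ratios behind these comparisons are simple division. The sketch uses only the career totals quoted above, which are a snapshot in time; active players’ totals will have changed.

```python
# Career strikeout and walk totals as quoted above (a snapshot; active totals change)
career_k_bb = {
    "Curt Schilling": (3116, 711),
    "Max Scherzer": (3020, 677),
    "Justin Verlander": (3013, 851),
}

for pitcher, (strikeouts, walks) in career_k_bb.items():
    print(f"{pitcher}: {strikeouts / walks:.2f} strikeouts per walk")
```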
Problem with K/9 and BB/9
To measure command, statisticians came up with strikeouts per nine innings (K/9) and walks per nine innings (BB/9). K/9 takes a pitcher’s strikeouts, multiplies them by nine, and divides the result by his innings pitched. BB/9 works the same way, using walks instead of strikeouts.
There’s only one problem with using these stats: not all innings are the same length, since an inning can take three batters or many more. Furthermore, longer innings aren’t always the pitcher’s fault. Errors, whether fielding, catching, throwing, or mental, are notorious culprits for extending an inning.
Or consider this scenario: two pitchers each strike out the side. One does so in 1-2-3 fashion, while the other strikes out three batters but also allows five hits. Which pitcher was more dominant? Obviously the first one, yet both will have the same K/9.
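The scenario above is easy to check: both pitchers record three strikeouts over three outs, so a per-nine rate cannot tell them apart. The batters-faced figures (3 versus 8, assuming the five hits are the only other plate appearances) are included only to show what K/9 ignores. The helper name is hypothetical.

```python
def per_nine(count: int, outs: int) -> float:
    """Generic "per nine innings" rate (K/9, BB/9): count x 27 / outs."""
    return count * 27 / outs


# Both pitchers record all three outs of the inning by strikeout
one_two_three = {"strikeouts": 3, "outs": 3, "batters_faced": 3}
five_hits_allowed = {"strikeouts": 3, "outs": 3, "batters_faced": 8}

for label, line in [("1-2-3 inning", one_two_three), ("five hits allowed", five_hits_allowed)]:
    print(label, "K/9 =", per_nine(line["strikeouts"], line["outs"]))
```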
The Solution: K%, BB%, K–BB%
The solution to this statistical hole comes in three statistics used in conjunction with each other: strikeout percentage (K%), walk percentage (BB%), and strikeout-minus-walk percentage (K–BB%). Strikeout percentage is the share of all batters faced by a pitcher who end up striking out; it is calculated by dividing a pitcher’s strikeouts by his total batsmen faced. Walk percentage is calculated the same way, using walks instead of strikeouts. The third stat is the difference, in percentage points, between the two.
For relievers, using percentages instead of “per nine” stats to measure strikeouts and walks is crucial, again because they pitch far fewer innings than starters, both over a season and per outing.
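Applied to the two strike-out-the-side pitchers described earlier, the percentage stats separate them immediately. This sketch returns all three rates from strikeouts, walks, and batters faced; the function name is illustrative.

```python
def k_bb_rates(strikeouts: int, walks: int, batters_faced: int) -> tuple[float, float, float]:
    """Return (K%, BB%, K-BB%) as percentages of batters faced."""
    k_pct = 100 * strikeouts / batters_faced
    bb_pct = 100 * walks / batters_faced
    return k_pct, bb_pct, k_pct - bb_pct


# The two strike-out-the-side pitchers from the K/9 scenario
print(k_bb_rates(3, 0, 3))   # three batters, three strikeouts
print(k_bb_rates(3, 0, 8))   # eight batters, three strikeouts
```

Both had a 27.0 K/9, but one struck out every batter he faced while the other struck out 37.5% of them.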
Saves: The Problem
Now for the granddaddy of all reliever stats, which is also among the most flawed: the save. Saves can be incredibly deceptive, largely because of the qualifications for earning one. A save can go only to the relief pitcher who finishes a game his team wins, entering with a lead that he never surrenders and without being credited with the win. He must also meet one of the following conditions: pitch at least three innings; enter the game with the tying run on base, at bat, or on deck; or enter with a lead of three runs or fewer and pitch at least one inning.
Middle relievers who would never, ever get a save simply because of their roles can still be charged with a blown save. Saves can also go to pitchers whose sole accomplishment was not blowing a three-run lead in one inning, a lead that teams hold 24 out of 25 times. (Teams that enter the last inning with a three-run lead have a 96% Win Probability.) Pitchers who enter a blowout in the seventh inning with the lead and finish the game also get the save, regardless of the final score.
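The qualifications above translate into a short boolean check. This is a simplified sketch with hypothetical field names; it takes facts such as “tying run on deck” as given rather than deriving them from the base-out state. The minimum of one out reflects the official scoring rule’s requirement of at least one-third of an inning.

```python
from dataclasses import dataclass


@dataclass
class ReliefOuting:
    finished_game: bool       # he was the last pitcher in a game his team won
    winning_pitcher: bool     # credited with the win
    lead_at_entry: int        # his team's lead, in runs, when he entered
    never_lost_lead: bool     # the lead was never surrendered while he pitched
    outs_recorded: int
    tying_run_on_base_at_bat_or_on_deck: bool


def earns_save(o: ReliefOuting) -> bool:
    """Simplified save check following the conditions described above."""
    if not o.finished_game or o.winning_pitcher:
        return False
    # Official scoring also requires at least one out (one-third of an inning)
    if o.lead_at_entry <= 0 or not o.never_lost_lead or o.outs_recorded < 1:
        return False
    return (
        o.outs_recorded >= 9                                # at least three innings
        or o.tying_run_on_base_at_bat_or_on_deck            # tying run threatening
        or (o.lead_at_entry <= 3 and o.outs_recorded >= 3)  # small lead, full inning
    )


# Three-run lead, clean ninth: a save
print(earns_save(ReliefOuting(True, False, 3, True, 3, False)))
# Eight-run lead, enters in the seventh and finishes: still a save
print(earns_save(ReliefOuting(True, False, 8, True, 9, False)))
# Four-run lead, bases empty, one inning: no save
print(earns_save(ReliefOuting(True, False, 4, True, 3, False)))
```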
Clutch Late Innings: The Goose Egg
Nate Silver saw the flaws in the save and created a statistic for clutch late innings that fills in its holes. Long-time readers of this author’s analyses will be quite familiar with it: the Goose Egg.
Goose Eggs (GE) go to relief pitchers only, and they are awarded by inning. To get a Goose Egg, a pitcher must pitch in the seventh inning or later and record three outs; he must enter with the game tied, with his team leading by two runs or fewer, or with his team leading and the tying run on base or at bat; and he must not allow a run of any type to score. If a pitcher in a Goose Egg situation is charged with an earned run, he gets a Broken Egg (BE). If he allows an inherited runner to score, or is charged with an unearned run, he gets a Meh (M).
There are some special cases. A pitcher who doesn’t record all three outs of an inning can still get a GE if the number of runners on base when he enters plus the number of outs he records is three or more. A pitcher also gets a Meh rather than a Broken Egg in an inning where he gives up an earned run but still finishes off a win.
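The rules and special cases above can be encoded as a per-inning classifier. This is a hypothetical sketch of the rules as described in this article, not an official implementation of Silver’s method; situations it does not model (such as multiple pitchers splitting an inning’s credit) are left out, and the field names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InningAppearance:
    inning: int
    margin_at_entry: int                 # his team's runs minus the opponent's; 0 = tied
    outs_recorded: int
    runners_on_at_entry: int = 0
    tying_run_on_base_or_at_bat: bool = False
    earned_runs_charged: int = 0
    unearned_runs_charged: int = 0
    inherited_runners_scored: int = 0
    finished_win: bool = False           # he finished off a win for his team


def goose_egg_result(a: InningAppearance) -> Optional[str]:
    """Return "GE", "BE", "M", or None for one inning, per the rules described above."""
    in_situation = a.inning >= 7 and a.margin_at_entry >= 0 and (
        a.margin_at_entry <= 2 or a.tying_run_on_base_or_at_bat
    )
    if not in_situation:
        return None
    if a.earned_runs_charged > 0:
        # An earned run is a Broken Egg unless he still finishes off the win
        return "M" if a.finished_win else "BE"
    if a.unearned_runs_charged > 0 or a.inherited_runners_scored > 0:
        return "M"
    # Three outs, or fewer if runners on base at entry make up the difference
    if a.runners_on_at_entry + a.outs_recorded >= 3:
        return "GE"
    return None


print(goose_egg_result(InningAppearance(inning=8, margin_at_entry=1, outs_recorded=3)))
print(goose_egg_result(InningAppearance(inning=8, margin_at_entry=2, outs_recorded=3,
                                        earned_runs_charged=1)))
print(goose_egg_result(InningAppearance(inning=9, margin_at_entry=2, outs_recorded=3,
                                        earned_runs_charged=1, finished_win=True)))
print(goose_egg_result(InningAppearance(inning=7, margin_at_entry=0, outs_recorded=1,
                                        runners_on_at_entry=2)))
```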
The All-Purpose Reliever Stats: WPA, Shutdowns, Meltdowns
Goose Eggs work well for clutch late innings. However, they cover only specific scenarios, and they’re akin to a pass/fail grade. Sometimes a reliever’s performance is lights-out dominant. Other times he gets the job done but gives his coaches and fans a heart attack in the process. On the negative side, sometimes a reliever pitches fairly well but doesn’t quite get over the hump. And, of course, some outings are utter disasters.
How do we quantify these? Using Win Probability (also called Win Expectancy), we can look at a reliever’s Win Probability Added (WPA) game by game and categorize his outings. If he increases his team’s Win Probability by .060 (6 percentage points) or more in an outing, it’s a shutdown. The opposite, decreasing his team’s Win Probability by .060 or more, is a meltdown. These two categories, tracked on FanGraphs, work together to give us an in-depth look at how effective a reliever has been over a season.
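Given a reliever’s WPA for each outing, tallying shutdowns and meltdowns is a simple threshold check. The per-outing WPA values below are hypothetical, and the function name is illustrative; the .060 cutoff is the one described above.

```python
from collections import Counter
from typing import Optional

THRESHOLD = 0.060  # WPA cutoff for shutdowns and meltdowns, as described above


def classify_outing(wpa: float) -> Optional[str]:
    """Label a single relief outing by its Win Probability Added."""
    if wpa >= THRESHOLD:
        return "shutdown"
    if wpa <= -THRESHOLD:
        return "meltdown"
    return None  # neither: a middling outing


# Hypothetical per-outing WPA values for one reliever
season_wpa = [0.12, 0.03, -0.08, 0.065, -0.01, 0.20, -0.35]
tally = Counter(label for label in map(classify_outing, season_wpa) if label)
print(tally["shutdown"], "shutdowns,", tally["meltdown"], "meltdowns")
print(f"Season WPA: {sum(season_wpa):+.3f}")
```

Counting the two categories separately shows what a single season WPA total hides: a reliever can post several shutdowns and still finish slightly negative because of one disastrous outing.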
Outlook for Evaluating Relievers
Evaluating relievers in the modern game can seem daunting at first glance. However, keeping certain principles in mind makes it less difficult. A statistic used in a way it was not designed to be used loses its validity, and we need to make sure to avoid that when looking at relievers. Ultimately, it boils down to answering whether a relief pitcher did his job. However, we must answer that in a way that doesn’t paint a deceptive picture of the reliever’s performance, and these statistics should go a long way toward doing so.