Monday, January 25, 2010

When FIP Lies

FIP (Fielding Independent Pitching) is one of my favorite baseball statistics. It does exactly what it is supposed to do and does it with surprising effectiveness and reliability. It is one of the first things I look at when I want to gauge a pitcher's true level of performance. In fact, I think that many statistically minded baseball thinkers have come to rely on FIP as the backbone of most pitching-related analysis. However, I do not think that FIP is totally infallible.

When I am bored, one of the things I will do to pass the time is scroll through FanGraphs' statistics pages and just look for something interesting. This is exactly how my article on cutters evolved, and that article has probably gotten more positive feedback than anything I have written recently. I was looking through FIPs and started to see a couple of trends that I felt were worth pointing out.

First of all, a pitcher's actual ERA will generally fluctuate a bit on either side of his FIP from season to season (depending mostly on his luck in a given year), but typically the two average out and are similar over a larger sample size. This makes FIP very useful for determining just how lucky a pitcher was in a given year. Perhaps the greatest secret to the game of baseball is just how much the outcome of each game is dependent on luck, and having FIP to remove some of the statistical noise luck creates is invaluable.

Knowing this, what happens when a pitcher's actual ERA routinely falls a decent amount away from his FIP? I'm not talking about a pitcher under-performing or over-performing his FIP for two years; I'm talking about a player doing this over the course of many years. At some point the sample size becomes big enough that I think we can logically conclude that FIP just does not evaluate the pitcher properly and that there is something inherent within him that makes him consistently miss his FIP number. I wanted to try and figure out exactly why this happens, and that is what this article is about: finding the types of pitchers that FIP tends to miss on a bit.

I will not lie, the inspiration for this article is Javier Vazquez. Vazquez has a career ERA of 4.19 and a career FIP of 3.83. That may not seem like a huge difference, but over the course of 2,490 innings (Vazquez' career total) it is actually quite large. We actually have a pretty good idea of at least one reason why Vazquez continually under-performs his FIP; when no one is on base, he allows an OPS of .697 and when there are men on base that number jumps to .773. Vazquez simply does not pitch as well out of the stretch as he does from the windup. FIP assumes that this is mostly bad luck but I think everyone will agree in Vazquez' case that it is more just him at this point.
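To put a number on that gap, here is a quick back-of-the-envelope sketch in Python. It assumes only that ERA and FIP are both expressed in runs per nine innings, so the difference multiplied by innings and divided by nine gives the earned runs separating the two. The function name is my own, purely for illustration.

```python
def extra_earned_runs(era, fip, innings):
    """Earned runs allowed beyond what FIP implies over `innings` innings.

    Both ERA and FIP are per-nine-inning rates, so the gap is scaled
    by innings / 9.
    """
    return (era - fip) * innings / 9.0


# Career figures quoted above for Javier Vazquez.
vazquez_gap = extra_earned_runs(era=4.19, fip=3.83, innings=2490)
print(f"Vazquez: about {vazquez_gap:.0f} more earned runs than FIP implies")
```

That works out to roughly 100 earned runs over his career, which is why a gap of 0.36 is bigger than it looks.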

In other words, Vazquez is an outlier, but we at least know one reason why. But he's not the only outlier. The goal here is to find the other outliers and try to figure out what, if anything, they all have in common.

First, we have to find the outliers, in both directions. My method for doing this was to look at all pitchers who qualified for the ERA title over the last 10 years and count the number of seasons in which they were in the top 20 in either ERA-FIP or FIP-ERA. A simple tabulation of the players who appear most commonly should reveal who the greatest outliers are. Only very large sample sizes will actually be of use for this which is why I'm going back 10 years.
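For anyone who wants to repeat the tabulation, here is a minimal sketch of how it could be done. It is not the exact process I used, just one reasonable reading of it: group qualified seasons by year, rank them by ERA minus FIP, and count how often each pitcher lands in the top N on either side. The function name, the input layout, and the toy seasons at the bottom are all hypothetical.

```python
from collections import Counter, defaultdict


def count_outlier_seasons(seasons, top_n=20):
    """Count top-N outlier seasons per pitcher.

    seasons: iterable of (year, pitcher, era, fip) for ERA-title qualifiers.
    Returns (era_above_fip, era_below_fip) as two Counters.
    """
    by_year = defaultdict(list)
    for year, pitcher, era, fip in seasons:
        by_year[year].append((era - fip, pitcher))

    era_above_fip = Counter()
    era_below_fip = Counter()
    for rows in by_year.values():
        ranked = sorted(rows, key=lambda row: row[0])  # smallest gap first
        for gap, pitcher in ranked[-top_n:]:
            if gap > 0:  # only count seasons where ERA really exceeded FIP
                era_above_fip[pitcher] += 1
        for gap, pitcher in ranked[:top_n]:
            if gap < 0:  # only count seasons where ERA really beat FIP
                era_below_fip[pitcher] += 1
    return era_above_fip, era_below_fip


# Made-up seasons for three placeholder pitchers, just to show the shape.
toy_seasons = [
    (2008, "Pitcher A", 4.50, 4.00),
    (2008, "Pitcher B", 3.50, 4.00),
    (2008, "Pitcher C", 4.10, 4.00),
    (2009, "Pitcher A", 4.40, 4.00),
    (2009, "Pitcher B", 3.90, 4.00),
    (2009, "Pitcher C", 3.60, 4.00),
]
above, below = count_outlier_seasons(toy_seasons, top_n=1)
print(above.most_common())
print(below.most_common())
```

With real data you would feed in every qualified season from 2000 through 2009 and leave top_n at 20.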

I will spare you the boring details of individual season data and simply give you the list of the players who appeared most commonly with the number of appearances, as well as some relevant career statistics.

Actual ERA Higher than FIP
1. Kevin Millwood - 5 (2,314.1 IP, 4.02 ERA, 3.91 FIP; 2 appearances on opposite list)
2. Mike Mussina - 4 (3,562.2 IP, 3.68 ERA, 3.57 FIP; 0 appearances on opposite list)
2. Jeremy Bonderman - 4 (1,005 IP, 4.78 ERA, 4.17 FIP; 0 appearances on opposite list)
2. Andy Pettitte - 4 (2,926.1 IP, 3.91 ERA, 3.74 FIP; 1 appearance on opposite list)
2. Javier Vazquez - 4 (2,490 IP, 4.19 ERA, 3.83 FIP; 0 appearances on opposite list)
2. Livan Hernandez - 4 (2,734 IP, 4.45 ERA, 4.44 FIP; 2 appearances on opposite list) *
7. Brad Penny - 3 (1,633.2 IP, 4.14 ERA, 4.00 FIP; 1 appearance on opposite list)
7. Josh Beckett - 3 (1,401 IP, 3.79 ERA, 3.61 FIP; 0 appearances on opposite list)
7. Joel Pineiro - 3 (1,456.1 IP, 4.39 ERA, 4.23 FIP; 1 appearance on opposite list)
7. Kenny Rogers - 3 (3,302.2 IP, 4.27 ERA, 4.38 FIP; 1 appearance on opposite list) *
7. Mark Redman - 3 (1,238.2 IP, 4.85 ERA, 4.41 FIP; 0 appearances on opposite list)
7. Randy Johnson - 3 (4,125.1 IP, 3.29 ERA, 3.19 FIP; 0 appearances on opposite list)
7. Curt Schilling - 3 (3,261 IP, 3.46 ERA, 3.23 FIP; 0 appearances on opposite list)
7. Esteban Loaiza - 3 (2,099 IP, 4.65 ERA, 4.34 FIP; 0 appearances on opposite list)

First let me just explain briefly why Hernandez and Rogers are asterisk'd. I do not feel they are worth using for analysis because their career numbers do not fall in line with the rest of the group as expected, even though my method plucked them out. This happened for 2 major reasons in both cases. The biggest reason is that a great deal of both of their careers came in the years prior to the time period I was looking at (2000-2009), and during those earlier years they did not generally have an actual ERA that was higher than their FIP. Secondly, I believe there is a legitimate reason why they appear on both lists several times: both Rogers and Hernandez tend to allow a ton of balls to be put in play. Balls in play are the ones that are the most subject to "luck" variables, and therefore we would expect these two to deviate from the mean more than a pitcher who allows fewer balls in play. In the end I feel that because of these reasons including them in this study would be more counterproductive than anything, so they will be ignored going forward.

Not surprisingly, the inspiration for this article, Javier Vazquez, stands out prominently on this list. Bonderman, Loaiza and Redman also jump out for how much their career ERAs and FIPs differ. Mussina, Johnson and Schilling do not show as much variance, but because their sample sizes are so huge I do believe the difference is significant, and that makes them very useful for this study.

So that gives us 12 outliers: Millwood, Mussina, Bonderman, Pettitte, Vazquez, Penny, Beckett, Pineiro, Redman, Johnson, Schilling, Loaiza. What could they possibly have in common that makes FIP perceive them as better than they actually are?

Theory 1: Well, we already established at least one reason why Vazquez is on this list, so why not see if all the other members here also struggle when men are on base?

Much worse w/men on: Millwood, Vazquez, Penny, Redman, Schilling
Slightly worse w/ men on: Mussina, Bonderman, Pettitte, Beckett, Pineiro, Johnson, Loaiza

This is honestly not surprising; nearly every pitcher allows a higher OPS with men on base. Still, we had several where the gap was more significant than most, and none who were all that close to being neutral. So maybe that is part of it.

Theory 2: There are a lot of pitchers here who have had, for at least most of their career, above average control. Schilling, Mussina, Vazquez and mid-late career Johnson had some of the lowest walk rates in the last 20 years. Also, there is nobody on this list with particularly bad control. Perhaps FIP favors pitchers with good control a little too strongly?

Theory 3: This is totally subjective, but I am going to throw it out there. I have watched most of these guys pitch a lot in my life; several played for or against the Yankees often in their careers, and all of them had what I would call "substantial" time in the big leagues where they would be visible.

There was one thing that jumped out at me that all of these guys have in common when I went back and thought about watching them pitch. When they missed their target, they all tended to miss in the strikezone. Yes, they had good control for the most part, but when their control let them down I feel like they all tended to err on the side of throwing a fat pitch rather than issuing a ball.

Just a few subjective examples that I have noticed:
- Vazquez throws his curveball over the middle of the plate all the time, and most often hitters take it for a strike. But a decent number of those who do pull the trigger hit it hard because of the location.
- Randy Johnson left so many sliders that were supposed to be off the plate to his glove side over the middle of the plate it became a running joke in my house. I can still envision Posada setting up inside to a right-handed batter, putting his glove on the ground behind the batter's feet and then having to reach across his body as the pitch came in right over the middle of the plate. Johnson did this with his other teams as well and it continued to get more pronounced as he aged.
- Schilling loved throwing his four-seam fastball on the outside part of the plate early in the count, particularly with nobody on, and even when he missed over the plate hitters swung through it fairly often, but some connected and hit it very far.
- Mussina commonly got ahead of hitters early in the count by throwing strikes, but some hitters pounced on them and hit them hard. Mussina also would just about always throw a fat strike in a 3-2 count, usually over the middle of the plate and hitters knew this.
- Beckett, particularly early in his career, loved throwing his fastball over the plate as hard as he could because he did not think batters could handle its velocity. Unfortunately, some could.

This brings me to another point these pitchers have in common. Other than perhaps Mussina, all of them trusted their stuff. Schilling, Johnson, Vazquez, Bonderman, Penny... really all of them except Mussina never nibbled at the plate. They threw strikes and dared the batters to hit them.

Also, most of these pitchers have been homer-prone in their career. Even players like Pettitte and Pineiro tend to give up more home runs than other groundball pitchers. Schilling, Vazquez, Johnson (later in his career), Beckett, Mussina... pretty much all of these guys have given up home runs at an above average rate. And, if you want to extend that comparison farther, they have given up home runs at a higher rate than their contemporaries of similar quality.

I think this is actually a significant part of why these players tend to have higher ERAs than FIP says they should. FIP knows walks are a very bad thing and these pitchers avoid walks. However, I think occasionally that in the process of avoiding walks, they give up more hard-hit balls because they are leaving pitches over the plate more often. This is merely a mindset within the pitcher and I don't actually expect a formula to be able to parse it out.

A Few Important Things

- Most of the pitchers above were excellent pitchers; I think it is possible 4 or even 5 out of the 12 may get into the Hall of Fame one day. I am merely hypothesizing that their style may have been over-favored by FIP.

- The pitchers that FIP was overly optimistic about are far outnumbered by the pitchers FIP said were worse than their ERAs would indicate. The gap between actual ERA and FIP is also much larger in the upcoming group.

- After looking at the above group my admiration for FIP has actually grown. These were the players it missed by the most on to one side of the spectrum and it did not miss by much on most of them.

Actual ERA Lower than FIP
1. Tom Glavine - 6 (4413.1 IP, 3.54 ERA, 3.95 FIP; 0 appearances on opposite list)
1. Jamie Moyer - 6 (3908.2 IP, 4.22 ERA, 4.44 FIP; 0 appearances on opposite list)
1. Jarrod Washburn - 6 (1853.2 IP, 4.10 ERA, 4.60 FIP; 0 appearances on opposite list)
4. Jeff Suppan - 4 (2410.2 IP, 4.68 ERA, 4.85 FIP; 2 appearances on opposite list)
4. Steve Trachsel - 4 (2501 IP, 4.39 ERA, 4.84 FIP; 0 appearances on opposite list)
6. Kirk Rueter - 3 (1918 IP, 4.27 ERA, 4.66 FIP; 0 appearances on opposite list)
6. Tim Hudson - 3 (2059.2 IP, 3.49 ERA, 3.79 FIP; 0 appearances on opposite list)
6. Tim Wakefield - 3 (2931.2 IP, 4.33 ERA, 4.72 FIP; 0 appearances on opposite list)
6. Barry Zito - 3 (1999 IP, 3.84 ERA, 4.30 FIP; 1 appearance on opposite list)
6. Carlos Zambrano - 3 (1551.1 IP, 3.51 ERA, 3.95 FIP; 0 appearances on opposite list)
6. Oliver Perez - 3 (1065.1 IP, 4.54 ERA, 4.78 FIP; 0 appearances on opposite list)
6. Jake Peavy - 3 (1362.2 IP, 3.26 ERA, 3.46 FIP; 1 appearance on opposite list)
6. Mark Buehrle - 3 (2061.0 IP, 3.80 ERA, 4.17 FIP; 0 appearances on opposite list)
6. Johan Santana - 3 (1709.2 IP, 3.12 ERA, 3.38 FIP; 0 appearances on opposite list)
*. John Lannan - (423 IP, 3.91 ERA, 4.80 FIP)
*. Woody Williams - (2216.1 IP, 4.19 ERA, 4.63 FIP)

Wow. I don't really need to even go in depth on that list to already notice some very serious trends. Unlike the previous list, no one needs to be excluded for not actually fulfilling the criteria (which is interesting in itself and I will touch on why in a bit) but I have actually added two names that my method did not pluck out.

John Lannan missed the original search because he has only pitched a little more than two seasons, and Williams missed because the vast majority of his career happened in the years prior to my predetermined range of dates. However, I have included them because in the corners of their careers that my research did catch they were so blatantly trending in one direction I felt like they belonged with these other players.

First of all I must say that the gap between actual ERA and FIP is much larger in this group than the previous. It appears as though FIP is more prone to underestimating pitchers than it is to overestimating them. Just a simple observation.
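Here is a rough check of that observation using nothing but the career ERA and FIP figures from the two lists above. It is a simple unweighted average, so a 423-inning career counts the same as a 4,000-inning one; treat it as a sanity check under that assumption rather than a proper study. Hernandez and Rogers are left out of the first group, as explained earlier, and Lannan and Williams are included in the second.

```python
from statistics import mean

# (career ERA, career FIP) copied from the lists above.
era_higher_than_fip = {
    "Millwood": (4.02, 3.91), "Mussina": (3.68, 3.57),
    "Bonderman": (4.78, 4.17), "Pettitte": (3.91, 3.74),
    "Vazquez": (4.19, 3.83), "Penny": (4.14, 4.00),
    "Beckett": (3.79, 3.61), "Pineiro": (4.39, 4.23),
    "Redman": (4.85, 4.41), "Johnson": (3.29, 3.19),
    "Schilling": (3.46, 3.23), "Loaiza": (4.65, 4.34),
}
era_lower_than_fip = {
    "Glavine": (3.54, 3.95), "Moyer": (4.22, 4.44),
    "Washburn": (4.10, 4.60), "Suppan": (4.68, 4.85),
    "Trachsel": (4.39, 4.84), "Rueter": (4.27, 4.66),
    "Hudson": (3.49, 3.79), "Wakefield": (4.33, 4.72),
    "Zito": (3.84, 4.30), "Zambrano": (3.51, 3.95),
    "Perez": (4.54, 4.78), "Peavy": (3.26, 3.46),
    "Buehrle": (3.80, 4.17), "Santana": (3.12, 3.38),
    "Lannan": (3.91, 4.80), "Williams": (4.19, 4.63),
}

# Average size of the miss in each direction, in runs per nine innings.
mean_overrated = mean([era - fip for era, fip in era_higher_than_fip.values()])
mean_underrated = mean([fip - era for era, fip in era_lower_than_fip.values()])

print(f"FIP too optimistic by {mean_overrated:.2f} on average")
print(f"FIP too pessimistic by {mean_underrated:.2f} on average")
```

The averages come out to roughly 0.24 for the first group and 0.38 for the second, which lines up with what the lists look like at a glance.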

Well let's break this group down with simple bullet points.

- 13 out of the 16 pitchers listed here do not throw any straight pitches. The lone exceptions are Zito, Santana and Perez, who throw fairly straight fastballs.

- At least 12 out of the 16 do not have a fastball with plus velocity.

- 9 out of the 16 had no real problem with issuing you a walk if you did not offer at their pitch.

- 6 out of the 9 lefties fall under the stereotype of "crafty."

- 4 out of the 7 righties could be described as "crafty" as well.

That pretty much sums it up. I remember a conversation I had with a friend that must have been almost eight years ago now while watching Trachsel pitch for the Mets together. After watching him get through five innings, putting about nine guys on base and allowing no runs I said something along the lines of, "How does he keep getting out of trouble!?!?!!" My friend calmly responded, "He's got a lot of moxie."

That's how I would describe this group for the most part, they've got a lot of moxie. The exceptions to this are: Santana, Perez and Peavy, who all actually have a relatively small disparity between their FIP and ERA anyway. So why are they on this list with all of these other pitchers?

Perez, I think, is the easiest to pin down. He is, and I actually hate to use this term because I think it is ridiculous on its surface, "effectively wild." He is totally capable of leading off an inning with three walks, striking out the next two guys and getting a popup. If I saw him do this I wouldn't even blink because that is just who he is. FIP does not like players like this. Glavine is here because he often walked people he did not want to face in order to get a better matchup, and it worked for him. Perez is here because he often walked people because he could not help it, but then got the next guy anyway because he had such great stuff. FIP sees a loose cannon with Perez, and he is one, but it underrates his ability to sneak out of trouble because his stuff is so good.

Peavy is here, I think, mostly because he has played his whole career in such a pitcher-friendly environment. I know that FIP indirectly takes park elements into account, but Peavy has been so far to the extreme in pitcher friendliness I think his inclusion here is an accident. His FIP and ERA are not actually all that far apart, which further reinforces this belief of mine. I think he is an outlier, within a group of outliers, because his pitching situation is so unique.

That leaves Santana. It's actually hard to figure out what he has in common with this group. The only reason he is here is because his career BABIP (batting average on balls in play) allowed is a surprisingly low .287. FIP tends to think that things like that are not sustainable, but for Santana it has been. In fact his BABIP allowed hasn't been over .300 since 2002. I think this, combined with the fact that the difference between his FIP and ERA is not as much as most of the other pitchers in this list explains him.

Random Notes

- The pitchers in the first group have mostly been very durable, I think partially because they are so efficient. Even Beckett who has not been durable has missed most of his time because of blisters rather than an arm or shoulder problem.

- Mark Redman is in the first group but certainly falls under the "crafty lefty" label that would seem to put him in the second group... why? As strange as it sounds I think he threw too many strikes. Even though he didn't have as much pure stuff as the other players in his group he challenged hitters anyway.

- Nobody on the second list seems to fit the stereotype of the first group the way Redman appears to belong in the second group. As a result of this, I have more confidence in the conclusions I have drawn regarding FIP's relationship with the second group because the pitchers in it are much more closely related to each other.

- Most of the pitchers in the first group throw 4-seam fastballs. Most of the pitchers in the second group throw 2-seam fastballs.

- I think that most of the pitchers in the second group pitch the way they do (nibbling at the plate and trying to get batters to chase) because they are incapable of pitching like the guys in the first group. If they tried they would get beat up the way Mark Redman did (he had the highest career ERA of anyone in either group).

- If you were to build a pitcher that fit the first group's stereotype perfectly they would be: efficient, have good stuff, challenge hitters, issue very few walks, give up a few too many home runs... oh look it's Javier Vazquez!

- If you were to build a pitcher that fit the second group's stereotype perfectly they would be: left-handed, throw less than 90 MPH, not throw any straight pitches, have an underwhelming BB:K ratio... oh look it's Tom Glavine!

- Just to build on the previous two points for a second: Glavine's career ERA is comfortably more than half a run better than Vazquez' but he has a higher career FIP. Is it more logical to assume that they both were largely affected by luck for their entire, long careers in opposite directions, or that perhaps the metric we trust most to evaluate their true effectiveness is a bit off?

What Does It All Mean?

- FIP may slightly undervalue crafty pitchers who are willing to issue walks because they know they lack the stuff to routinely challenge hitters.

- FIP may slightly overvalue aggressive pitchers who fearlessly throw strikes rather than trying to get hitters to chase.

- In order to make FIP as accurate as possible we should probably find some way to make home runs hurt a pitcher's FIP more and/or find a way to make walks hurt a pitcher's FIP a bit less. As I said a few times, there are more pitchers overall that have a higher FIP than ERA and they also tend to skew farther away from their FIP number than people on the opposite side. Doesn't it make sense to push FIP a little bit in their favor so that the number that deviate to each side are closer to even?
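For anyone who wants to play with that idea, here is a minimal sketch. It uses the commonly published form of FIP, (13*HR + 3*(BB+HBP) - 2*K) / IP plus a league constant. Everything else is an assumption for illustration only: the constant of 3.10 is a placeholder (the real one changes every season), the two stat lines are invented, and the alternative weights of 14 for home runs and 2.5 for walks are numbers I picked to show the direction of the change, not values fitted to any data.

```python
def fip(hr, bb, hbp, k, ip,
        hr_weight=13.0, bb_weight=3.0, k_weight=2.0, constant=3.10):
    """FIP with adjustable weights.

    The defaults are the commonly published weights; `constant` is a
    placeholder for the league constant, which varies by season.
    """
    return (hr_weight * hr + bb_weight * (bb + hbp) - k_weight * k) / ip + constant


# Invented 200-inning seasons for two stereotypes discussed above.
strike_thrower = dict(hr=28, bb=40, hbp=5, k=190, ip=200.0)
nibbler = dict(hr=18, bb=75, hbp=5, k=120, ip=200.0)

for label, line in (("strike thrower", strike_thrower), ("nibbler", nibbler)):
    standard = fip(**line)
    # Hypothetical reweighting: home runs hurt more, walks hurt less.
    tweaked = fip(**line, hr_weight=14.0, bb_weight=2.5)
    print(f"{label}: standard {standard:.2f}, tweaked {tweaked:.2f}")
```

With these made-up inputs the strike thrower's number creeps up slightly and the nibbler's drops by about a tenth of a run, which is the direction argued for above. One caveat: after changing the weights, the constant would have to be recalculated so that league FIP still matches league ERA, and this sketch does not do that.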


Snatchmike said...

Great article. You have just clearly demonstrated that FIP is not as useful as some seem to believe, and should not be thought of as a replacement for ERA, but rather a complement to it. Thank you for the work you do in putting perspective on advanced statistics. Far too often, people view stats like FIP as gospel, without truly understanding their role, importance, and fallibility.

Anonymous said...

Ding them a little for home runs, but aren't walks still worse than HRs from a pitching perspective?

If you want to make walks hurt less, could they be judged like errors at all? See how many runs then are "unearned" and nail their FIPs.

Anonymous said...

Really interesting piece, thanks. One thing I would say is that your description of Mussina (fat 3-2 strikes, not trusting his stuff, nibbling, etc.) is certainly true for 2004-2008 Mussina, but most definitely not true for 1992-2003 Mussina. When he lost his fastball in 2004, he became a completely different pitcher.

James Esatto said...

I don't think walks are worse than home runs for a pitcher, giving up a home run is the worst thing you can do in baseball....

I have a part 2 and 3 possibly planned for this where I go about actually trying to fix the formula, no real timetable for those at the moment though, I'm currently buried in prospect material I want to put on the site.

Re: Mussina... I do agree with you somewhat. Most of the data I was looking at for this article came during Mussina's later years so my description was more referring to that version of him. I think his timidness may have crept in a little sooner than you suggest but that is hard to prove; he certainly did adjust as he got older and change his style to a degree.

Also thank you for the kind words, I always appreciate feedback.