Friar Watch

Keeping an eye on the San Diego Padres pitchers


How to use MLB Gameday data

April 23rd, 2007 · 11 Comments

There’s a lot of good information in MLB.com’s Enhanced Gameday XML data, but unfortunately MLB.com doesn’t provide any documentation. Here’s what I’ve been able to uncover.

The real starting point for anyone interested in Gameday XML hacking has to be Joe P. Sheehan’s excellent articles at Baseball Analysts. Joe was the first person I’m aware of to present this information to a wider audience.

Now for the actual data. There’s no direct link to it on MLB.com, at least none that I could find. The Gameday application pulls the data from this URL:

http://gd2.mlb.com/components/game/mlb/year_2007/
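The directories under this URL appear to follow a date-and-game pattern. Here is a minimal Python sketch of building the address of one pitcher’s file. The month_MM/day_DD/gid_… layout and the placement of pbp/pitchers inside the game directory are assumptions based on game URLs seen on the site, and pitcher_file_url is a hypothetical helper name. The sketch only builds the string; it does not check that the file exists for that game.

```python
from datetime import date

# Base address the Gameday application reads from.
BASE_URL = "http://gd2.mlb.com/components/game/mlb"


def pitcher_file_url(game_date: date, game_id: str, pitcher_id: int) -> str:
    """Build the URL of one pitcher's XML file for one game.

    Assumed layout (not documented by MLB.com):
    year_YYYY/month_MM/day_DD/<game_id>/pbp/pitchers/<pitcher_id>.xml
    """
    return (
        f"{BASE_URL}/year_{game_date.year:04d}"
        f"/month_{game_date.month:02d}"
        f"/day_{game_date.day:02d}"
        f"/{game_id}/pbp/pitchers/{pitcher_id}.xml"
    )


# Illustrative values only: a 2007 game ID and Greg Maddux's pitcher ID.
url = pitcher_file_url(date(2007, 4, 19), "gid_2007_04_19_arimlb_sdnmlb_1", 118120)
print(url)
```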

You can drill down to the game you’re interested in. The files I’m most interested in are in the pbp/pitchers directories. Each pitcher is assigned an ID number, so you need to know the number of the pitcher you’re interested in. For instance, Greg Maddux is 118120.

There are a couple of ways to get this data into Excel. You can right-click on 118120.xml and save it to disk, then open Excel and load the file. Some older versions of Excel won’t display the file properly unless you insert a version tag at the beginning of the file, like so:

<?xml version="1.0"?>
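If you have a lot of saved files, adding the tag by hand gets tedious. Here is a small Python sketch that adds it only when it is missing. The function names ensure_xml_declaration and fix_file are hypothetical, and the sketch assumes the saved file is UTF-8 text.

```python
XML_DECLARATION = '<?xml version="1.0"?>'


def ensure_xml_declaration(text: str) -> str:
    """Return text with an XML declaration at the top, adding one only if missing."""
    if text.lstrip().startswith("<?xml"):
        return text  # already has a declaration, leave it alone
    return XML_DECLARATION + "\n" + text


def fix_file(path: str) -> None:
    """Rewrite the file at path in place so that it starts with an XML declaration."""
    with open(path, encoding="utf-8") as f:  # UTF-8 is an assumption
        original = f.read()
    fixed = ensure_xml_declaration(original)
    if fixed != original:
        with open(path, "w", encoding="utf-8") as f:
            f.write(fixed)
```

Calling fix_file("118120.xml") on a saved copy would then leave a file that starts with the version tag.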

An easier way to import the data into Excel is to go to the Data menu and select Import External Data/New Web Query. Simply paste the URL of the desired page into the address bar and click Import. It should bring in all the data, perfectly formatted.
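If Excel’s import doesn’t cooperate, another route is to convert the XML to CSV yourself and open that instead. The Python sketch below collects the attributes of every element with a given tag name into rows and writes them out as CSV text. The tag name "pitch" and the SAMPLE document are assumptions made up for illustration, not a copy of a real Gameday file, so check the tag name against a file you have downloaded.

```python
import csv
import io
import xml.etree.ElementTree as ET


def pitch_rows(xml_text: str, tag: str = "pitch") -> list[dict[str, str]]:
    """Return one dict of attributes per element named `tag`.

    The default tag name "pitch" is an assumption about the file layout.
    """
    root = ET.fromstring(xml_text)
    return [dict(elem.attrib) for elem in root.iter(tag)]


def rows_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows to CSV text; the header is every key seen, in first-seen order."""
    columns: list[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing attributes are written as empty cells
    return buffer.getvalue()


# Made-up sample with the assumed structure, for demonstration only.
SAMPLE = """<player id="118120">
  <atbat>
    <pitch des="Ball" px="0.9" pz="2.1"/>
    <pitch des="Called Strike" px="0.1" pz="2.5"/>
  </atbat>
</player>"""

rows = pitch_rows(SAMPLE)
print(rows_to_csv(rows))
```

Saving the returned text to a .csv file gives something any version of Excel should open as a plain table, one pitch per row.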

Now that you have the data in Excel, you’ll notice that some of the column headings are obvious; des, for example, is a description of the play. Others are not so obvious. Here’s a list of what I’ve been able to figure out:

x and y: Location of the pitch using the old Gameday coordinate system. I don’t use this at all.
Start Speed, End Speed: Speed in mph of the pitch at 55 feet from home plate and as it crosses the plate.
sz top and sz bot: Top and bottom of the hitter’s strike zone, in feet, as measured by the Gameday system.
px and pz: Location of the pitch as it crosses home plate, in feet.
x0, y0, z0: Release point, in feet.
break y, break angle and break length: Measures the direction and magnitude of the movement on the pitch. I’m still trying to figure out how these values work to represent what the pitch is doing.
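As one example of putting these fields to work, here is a rough Python sketch that checks whether a pitch crossed the plate inside the hitter’s strike zone. It assumes px is measured from the center of the plate and pz from the ground, which is my reading rather than anything documented, and it ignores the radius of the ball. The function name in_strike_zone is hypothetical. The only outside fact used is that home plate is 17 inches wide.

```python
# Home plate is 17 inches wide; half of that, converted to feet.
PLATE_HALF_WIDTH_FT = 17 / 12 / 2


def in_strike_zone(px: float, pz: float, sz_top: float, sz_bot: float) -> bool:
    """True if the center of the ball crosses the plate inside the zone.

    Assumes px is the horizontal distance from the center of the plate and
    pz the height above the ground, both in feet. Ignores the ball's radius.
    """
    inside_horizontally = abs(px) <= PLATE_HALF_WIDTH_FT
    inside_vertically = sz_bot <= pz <= sz_top
    return inside_horizontally and inside_vertically


# Made-up numbers for illustration, all in feet.
print(in_strike_zone(px=0.1, pz=2.5, sz_top=3.4, sz_bot=1.6))
print(in_strike_zone(px=0.9, pz=2.5, sz_top=3.4, sz_bot=1.6))
```

Values read from the XML arrive as text, so convert them with float() before passing them in.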

There are several other variables in these files but I don’t know what they’re measuring. If I figure it out I’ll be sure to write it up.

Tags: General

11 responses so far ↓

  • 1 LynchMob // Apr 24, 2007 at 10:58 pm

    Yowza! THANKS for this! I’ve done a little bit of exploring the XML files … but ended up with more questions than answers … this gets me several steps farther!

  • 2 LynchMob // Apr 25, 2007 at 9:47 am

    I see that you don’t use the x,y data in the pitching files … but do you know what it represents? I’m interested to understand what the x,y represents in hitting files such as http://gd2.mlb.com/components/game/mlb/year_2007/month_04/day_19/gid_2007_04_19_arimlb_sdnmlb_1/inning/inning_hit.xml … it’s not obvious from a simple xy plot of the data …

  • 3 joe p // Apr 25, 2007 at 4:24 pm

    the x,y coordinates in the file you mentioned give information about the location where balls-in-play land on the field.

  • 4 Bill // May 21, 2007 at 11:29 pm

    I can’t get it to work… it just loads all of that text in Row 1, Column A with all the brackets and stuff.

    I tried adding the version tag by editing the file and sticking it at the top, but it didn’t work.

    Any idea what I’m doing wrong?

    I’m using Microsoft Excel 2000, by the way. Should I “get” a more recent version or will 2000 work with it?

    Thanks.

  • 5 JOhn S // Jun 6, 2007 at 4:31 pm

    For some reason I can’t get any info other than the basic stats. Could you help me out?

  • 6 Enhanced Gameday analysis cataloged by date « Fast Balls // Sep 1, 2007 at 7:07 pm

    […] April 23, Anthony posted “How to use MLB Gameday Data”, describing some of the PITCHf/x data […]

  • 7 Enhanced Gameday analysis cataloged by author « Fast Balls // Sep 1, 2007 at 7:35 pm

    […] April 23, he posted “How to use MLB Gameday Data”, describing some of the PITCHf/x data […]

  • 8 Catalog of Enhanced Gameday analysis « Fast Balls // Sep 1, 2007 at 8:53 pm

    […] April 23, he posted “How to use MLB Gameday Data”, describing some of the PITCHf/x data […]

  • 9 Glossary of the Gameday pitch fields « Fast Balls // Sep 11, 2007 at 9:13 pm

    […] article “How to Use MLB Gameday Data” by Anthony at Friar […]

  • 10 Jesse Litsch is 22 Years Old « The Mockingbird // Sep 20, 2007 at 2:29 am

    […] Litsch is 22 Years Old Jump to Comments I’ve been rather obsessed with enhanced pitch data lately. The folks at pitch f/x took the wind out of this post by only getting it running 7 innings […]

  • 11 crashburnalley.com » D-Rays, Twins Swap Young, Garza, and more // Nov 28, 2007 at 6:28 pm

    […] Gameday Pitch F/X data in Microsoft Excel (2000), but I couldn’t get it to work. I followed the directions from Friar Watch, but when I imported the data, it simply went into Row 1, Column A as XML […]
