Download Play By Play Retrosheet Data Mac Chadwick


Statistical Databases and Websites

It's a good time to be a baseball researcher with a computer. This was true in 2000 when 'How to Do Baseball Research' was originally published and it is even more true today. And it isn't only that computer and internet speeds are more than an order of magnitude faster than they were back at the end of the last century; there has been a tremendous increase in the amount of information available as well. This chapter will describe some of the internet's best sources of baseball data. Few of these sites even existed back at the beginning of the millennium and most of those that did have changed almost beyond recognition.


So what can we find on the internet and where can we find it? We will break the answer to this question into two parts: sites that let us browse statistical data and sites that let us download it.

Browsing Baseball Data

This section will focus on data that is made available primarily for browsing. Of course, just about anything that can be browsed on the internet can also be downloaded ('File'->'Save Page As'), but these sites normally present their data scattered over thousands of pages and in many different formats. And while they will let you search for data (by player or team name, for example) and will often even let you sort the data, they are not usually designed to let you answer complicated questions. In many ways, these sites simply provide on-line baseball encyclopedias and their visitors access them in much the same way people have been accessing print encyclopedias for decades.

Note: some of these sites also provide data for downloading, but this will be discussed in the next section.

So let's start by answering a sample question: how many hits did Derek Jeter have in 2004? Here are some places that will answer that question:

While each of the sites above provided the same answer to our original question (Derek Jeter had 188 hits in 2004), they also provided a wide range of presentation styles and additional information. The rest of this section will discuss what is available on each of these sites (and several others not included above). It is important to note that we are focusing on the statistical data available on the sites below. In many cases, statistical data is only a small part of what these sites have to offer.

ESPN offers a variety of statistics on players who are currently active. Both regular and some sabermetric stats (IsoP, SecA, PSN) are included. What is available varies from year to year. For example, the number of pitches is included starting in 2002 and ground ball/fly ball information starts in 2004. Batter-pitcher match-ups as well as batting and pitching splits are available for active players. Samples are here (Derek Jeter's match-ups against the Tigers pitchers) and here (Derek Jeter's 2005 batting splits).

They also provide sortable regular and post-season stats on all major league players back to 2002 and player game logs on active players back to 2001. Samples are here (2003 major league hitters sorted by triples) and here (Nomar Garciaparra's 2002 game log).

This site is free, although they sell an insider subscription permitting you to access all their columnists as well as getting some other services (mostly relating to fantasy baseball).

The next site has statistics on both active and 'historical' players. Derek Jeter's sample page above shows what is included for active players. Ground ball/fly ball and pitch count data starts in 1999. One feature is a leaderboard section showing all seasons a player was in the top 25 in a major statistical category. A sample showing Del Bissonette's entries on the leaderboard is here. Another feature for active players is a hit chart showing the location of various types of batted balls (singles, doubles, triples, home runs, ground outs and fly outs) in each park back to 2000. There are also game logs and splits for active players for the current season.

They provide sortable team and player stats for seasons (including multiple seasons) back to 1871. Here is the main page for this. If you always wondered what player hit the most triples in the AL from 1923 to 1926 (Goose Goslin, with 70), this is the place for you. The flexible interface allows you to specify the league (AL, NL or both), the hitting or pitching stat, and the seasons of interest (hint: hold down the button to add years to the list). From 1999 onward, it also looks like you should be able to add position and situational splits to the selection criteria, but most of these queries didn't end well ('There was a problem retrieving your requested stats. Please try again later').

This site is free.

baseball-reference contains a wealth of data on all major league players. Both the usual and unusual (BAbip, OWn%, Rtz, Rtzhm and many more) are included. Two features of the data display warrant mention: you can sort the years in each display by any category and you can sum the stats for any group of years by highlighting them.

Other things included on the site are box scores (most with play-by-play descriptions), batting and pitching splits and daily game logs back to the early 1950s. Here is a link to the schedule and results page (with links to daily box scores) for the 1961 Milwaukee Braves, here is a link to Sandy Koufax's 1965 pitching game log, and here is a link to Johnny Bench's career splits. Each player's page also contains similarity scores (which players are most like the one in question), a leaderboard section showing all the times the player appeared in the top ten in any category, and a home run log, accessed by clicking 'HR Log' at the top of the Standard Batting and Pitching sections (Mantle's home run log is here).

In addition to major league players, baseball-reference also has season and career data on minor league players, as well as the minor league records of major league players. A sample, Joe Hauser's minor league record, is here.

The things mentioned so far are free. You can also purchase a subscription that allows you access to the play index, a tool that permits you to query seasonal data as well as Retrosheet game data back to 1954. There is a very flexible and powerful interface that allows subscribers to answer all sorts of questions.

Here are just a few examples of the types of questions you can answer using this tool:

1) When was the last time a player who was not in the starting lineup had 7 or more RBIs in a game? To answer this, you would go to the Batting Gamelog Finder, change the Batter's Defensive Position from 'Either' to 'Sub' and the first set of stat fields to 'RBI', '>=' and '7'. Pressing 'Get Results' quickly told us that John Mayberry had 7 RBIs for the Blue Jays on June 26, 1978 and Roy Sievers did the same for the White Sox on June 21, 1961.

2) Who was the last pitcher to win a game in which he gave up 5 home runs or more? To answer this, you would go to the Pitching Gamelog Finder, change the Pitcher Decision to 'Win' and the first set of stat fields to 'HR', '>=' and '5'. Pressing 'Get Results' showed us that Tim Wakefield last did this on August 8, 2004.

3) Who was the last Boston Red Sox switch-hitter to have 30 or more doubles and home runs in a season? The Batting Season Finder can tell us the answer to this one. Once there, change League to 'American League', team to 'Boston Red Sox', Bats to 'Switch' and the first two sets of stats fields to '2B', '>=', '30' and 'HR', '>=', '30'. The answer: Carl Everett in 2000.

Retrosheet contains box scores with play-by-play data covering 1952-2008 for the NL and 1953-2008 for the AL. In addition, it has box scores (without play-by-play data) for 1872 and 1874 for the National Association, 1911 for the NL and 1920 to 1931 for both leagues. Here is a sample game log (with links to box scores) for the 1965 New York Mets.

It has encyclopedia entries for all players like the Derek Jeter page shown above. In addition to the usual statistical data, each player page will have links to game logs (like this one for Rogers Hornsby's 1922 season), splits (Billy Williams' 1970 splits are here), a top performance page (like Lou Gehrig's), and batting and pitching matchups (Joe Pepitone's are here).


There are all-time top performance pages covering top statistical marks in one through eight consecutive games, as well as top performance pages for each baseball franchise (the Mets' page is here). Finally, there are ballpark pages containing various splits (the Ebbets Field page is here).

Data for the current year is not available until late November.

Much of the data used to generate the pages on Retrosheet's site is also available for downloading, but that will be discussed in the next section.

This site is free.

Baseball Prospectus contains the kind of statistical data you'd expect from a sabermetrically sophisticated site like theirs, which can be seen from Derek Jeter's sample entry above. In addition, they also have seasonal data that can be sorted by a variety of fields here. Each report permits you to select a year back to 1954 (it doesn't look like a range of years is supported), an optional defensive position (for batting stats), and a series of statistics to sort on. As an example, here is the report of 1996 pitchers, sorted by the number of fly balls they allowed.

The stuff described above is free, although they do offer a subscription service (which includes Custom Sortable Stats).

In addition to offering player encyclopedia entries on major league players with a wide range of normal as well as advanced sabermetric data (see the example above), FanGraphs also has extensive win probability data (since 1974) as well as statistics on batted balls, pitch type and plate discipline (since 2002). They also have player game logs available since 2002 (Manny Ramirez's 2003 log is here), a play log containing every batting play in each season (also since 2002) that can be sorted by a host of categories (Travis Hafner's 2006 log, sorted by win probability added, is here), as well as a series of graphs showing, among other things, how each player has compared to league averages and players his own age in several rate categories. Here is how Barry Bonds compared to his league in on-base percentage during his career. And here are Ichiro Suzuki's daily graphs in a host of rate categories since 2002.

Minor league data is also included starting in 2006.

This site is free.

Baseball Musings gives you access to daily logs and player splits from 1957 to 2009 (here) in a flexible format that allows you to select games to include using a wide range of criteria. A few examples of the kinds of reports you can generate:

1) A list of Ernie Banks' games played from 1957 to 1971 against the Cardinals in Wrigley Field is here. Note the summary line at the end of the report giving the totals of all the games displayed.

2) Albert Pujols' splits from 2001 to 2003 are here.

3) Todd Helton's yearly road record from 1997 to the present is here.

4) Jorge Posada's batter-pitcher matchups from 2003 to 2006 covering only those games played in Yankee Stadium are here.

You can also generate lists of batters, again using a wide selection of both inclusion and sorting criteria. Two examples:

1) A list of all the players who reached first base on catcher's interference at least 10 times from 1957 to the present is here.

2) A list of the visiting pitchers from 1957 to 2008 with the most shutouts at Yankee Stadium is here.

This site is free.

Baseball Almanac has encyclopedia entries for all players (see the example above). They also have a tool called statmaster that will allow you to generate team reports containing a variety of statistical categories. There are box scores available for many teams from 1958 to 2004 (the 1960 White Sox main page is here) and player logs from 1954 to 2008 (Stan Musial's 1954 log is here).

In addition, the site has an extensive section on baseball records.

The site is free.

The Baseball Cube contains major league statistical data back to 1903, minor league data starting in 1978 and NCAA data from 2002. Tim Lincecum's page, showing all three types of data, is here. Baseball box scores are available from 1957 to 2008 here and player logs are also available for the same years. A sample game log, covering Greg Maddux's 1995 season, is available here.

The site is free.

Howe Sports Data has major and minor league data on all active players, major league daily logs back to 2002 and minor league logs back to 1999. A sample log, Jeff Francis' 2004 Texas League record, is here.

This site is free.

Minor League Baseball Split displays minor league splits for players back to 2005. A sample page is here.

This site is free.

BrooksBaseball.net allows you to look at PitchFX data for games since 2007. PitchFX data captures information on each pitch, including speed and vertical and horizontal position, movement and spin angle. Note: not all 2007 games have this data. From the main page, you select a date, game and pitcher and can see a variety of data on that pitcher's pitching in the game. For example, Jon Garland's pitches in his September 3, 2009 start for the Dodgers against the Diamondbacks are here.

Josh Kalk's website has a PitchFX data interface. A description of it (with user comments) is here.

There is an on-line encyclopedia of Japanese baseball stats available here, including player registers, yearly standings and team records. There are separate sections for Japanese and non-Japanese players. A sample batting register (Oh - Oishi) is here.

Downloading Baseball Data

In this section, we will discuss websites that provide data you can download or purchase on CD.

Baseball-Databank contains a number of tables that together comprise an encyclopedia of seasonal data. There are 27 tables in all, with everything from the usual (batting, pitching and fielding data) to the less usual (salary data, award and hall of fame voting). The tables are available both as comma-delimited text files and as a MySQL database.

A good tutorial on how to use this data is Statistically Speaking (part 2 of the tutorial is here), which contains a good description of how to get and install MySQL, how to add the Baseball-Databank data into it, and how to query it.
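Once a comma-delimited batting table is on disk, a question such as "who hit the most triples in one league over several seasons" comes down to a filter and a sum. The following is only a minimal sketch of that idea in Python, not the tutorial's method: the column names (playerID, yearID, lgID, 3B) are assumed, and the rows and player IDs are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the shape of a seasonal batting table. The column names
# are an assumption and the player IDs are fictional.
SAMPLE_BATTING_CSV = """playerID,yearID,lgID,3B
fakeaa01,1923,AL,10
fakeaa01,1924,AL,12
fakebb01,1923,AL,15
fakebb01,1925,NL,20
fakecc01,1926,AL,9
"""


def triples_leaders(csv_text, league, first_year, last_year):
    """Sum triples per player for one league over a range of seasons."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row["yearID"])
        if row["lgID"] == league and first_year <= year <= last_year:
            totals[row["playerID"]] += int(row["3B"])
    # Highest total first; ties are broken by player ID for a stable order.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


leaders = triples_leaders(SAMPLE_BATTING_CSV, "AL", 1923, 1926)
print(leaders)
```

With a real download you would read the file from disk instead of the inline sample; the same filter can of course be written as a SQL query once the tables are loaded into MySQL.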

There is also a good discussion on the data and how to use it in Joseph Adler's 2006 book Baseball Hacks.

There is also a yahoo egroup to discuss the data available at this website here.

The data is free.

The Baseball Archive contains the same data that is available at Baseball-Databank, but it is available here in some different formats, including Microsoft Access (free) and on a CD-Rom (not free). This site also contains documentation on the tables in the database here.

Also available at this site is The Baseball Statistics System, a free Windows application developed by Randy Myers, which is an interactive interface to the data. Documentation on this is available here.

Retrosheet contains two basic types of game data: event files and game logs. Event files come in two varieties: regular event files, containing a play-by-play description of a game, and box score event files, which contain information sufficient to generate a box score for a game but do not contain play descriptions. Game logs contain a wide variety of information on each game (not all of the information is available for each year) back to 1871.

Both types of event files can be downloaded here. The format of the regular event files is described here and the format of the box score event files is described here.
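To give a feel for what a regular event file looks like to a program, here is a minimal Python sketch that groups play records by game. It assumes the comma-separated layout from the format description, in which an "id" record starts each game and a "play" record carries inning, batting side (0 for visitors, 1 for home), batter ID, count, pitch sequence and event text. The sample records, game IDs and player IDs below are made up.

```python
import csv
import io

# Made-up records in the assumed shape of a regular event file.
SAMPLE_EVENT_TEXT = """id,XXX201301010
info,visteam,YYY
info,hometeam,XXX
start,fakea001,"Fake Player A",0,1,8
play,1,0,fakea001,12,CBFX,S7
play,1,0,fakeb001,00,X,64(1)3/GDP
com,"made-up comment"
play,1,1,fakec001,32,BBCFBX,HR/7
id,XXX201301020
play,1,0,fakea001,01,CX,8/F
"""


def plays_by_game(text):
    """Return {game_id: [play dicts]} from event-file text."""
    games = {}
    current = None
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue
        kind = record[0]
        if kind == "id":
            # An id record opens a new game.
            current = record[1]
            games[current] = []
        elif kind == "play" and current is not None:
            _, inning, side, batter, count, pitches, event = record
            games[current].append({
                "inning": int(inning),
                "home_batting": side == "1",
                "batter": batter,
                "count": count,
                "pitches": pitches,
                "event": event,
            })
    return games


games = plays_by_game(SAMPLE_EVENT_TEXT)
for game_id, plays in games.items():
    print(game_id, len(plays))
```

Decoding the event text itself (for example "64(1)3/GDP") is the hard part, which is why most people hand that job to existing tools rather than writing their own parser.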

There is a step-by-step example showing how to use the event files here.

Retrosheet makes some software available for accessing regular event files (running on Windows) called BOX, BEVENT and BGAME. They are described here.

Chadwick is an excellent software package, written by Ted Turocy, that can be used to access both regular and box score event files. A description of how it works, as well as how to download and install it, is here.

Retrosheet's game logs are available here and their format is described here.

An excellent tutorial on how to use Retrosheet data (with Chadwick) is at Retrosheet Database. There is also a fine discussion on using their data in Joseph Adler's Baseball Hacks.

There is also a yahoo egroup to discuss the data available at this website here.

The Complete Baseball Encyclopedia was developed by Lee Sinins and allows users to sort player data and generate lists in a variety of ways. You cannot actually download the data, but you can purchase it on CD.

Old-Time Data is the brainchild of Pat Doyle and is actually two products for purchase on CD: Professional Baseball Players Database, containing a few batting and pitching stats for both the minor and major leagues from 1922 to 2004, and Professional Baseball Players Statistical Database, containing a lot more statistics for the same group of players from 1920 to 1945.

Since most of the major league data is covered better elsewhere, the focus here is on the minor league data. And while it is true that SABR and baseball-reference now cover much the same territory with their on-line data, Pat Doyle's products are valuable because they provide a second, independent view of this data. They also have the advantage of not requiring internet access, since they are installed directly on your computer.

National Pastime Almanac is a free download that runs on Windows and lets you do a wide variety of sorting and selecting on seasonal and career player and manager data. Its user interface is a little old-fashioned (it really likes to take over your entire screen and you need to continually resize it if you don't want it hogging all the screen's real estate) but it does let you run a wide variety of queries on its data. For example, if you've ever wondered how many pitchers walked 100 or more batters, struck out 200 or more batters and posted an ERA under 2.00, this tool will quickly tell you all about Jack Coombs (1910), Hal Newhouser (1945) and Sam McDowell (1968).

The Seamheads Ballparks Database is an MS Access database produced by Kevin Johnson and contains statistical and descriptive historical ballpark data. A description of what the database contains is here.

Pro Yakyu Now contains two separate databases for downloading (available in its data section): Michael Westbay's Pro Yakyu Database and Michael Eng's Japanese Baseball Database. Both are available in comma-delimited text (csv) files. It looks like there is more coverage of the older players in Michael Eng's database.

PitchFX Data

We mentioned PitchFX above briefly (when discussing the BrooksBaseball.net site). The raw PitchFX data can also be downloaded. There is a separate directory for each year, month, day and, finally, game. For example, the directory containing the PitchFX for the Mets-Rockies game on July 13, 2008 is here. A good tutorial on how to capture and use this data is here. One of the problems it deals with is how to set up scripts to automatically download the data from all of these directories into a single database. There is one website that has done some of the work in collecting all the data from these directories into a single SQL database from 2007 to 2009.

In our book, Max and I describe the process of downloading Retrosheet play-by-play data (Appendix A.1) and computing run values of all events (Chapter 5). Here we illustrate some updated functions for downloading the data and computing the run values.

Downloading the Data
Our book assumes that you have a Windows environment, but now I use a Mac laptop and so I was motivated to adapt our downloading instructions for a Mac.

1. If you have a Mac, you need to install the Chadwick files. Here is an excellent description of how to install the Chadwick Software on a Mac.

2. In the current working directory, create a “download.folder”, and two subfolders “unzipped” and “zipped” inside “download.folder”. From the book script and data web site, download the file fields.csv and put this file in the “unzipped” folder. (For a Windows computer, you need to have the Chadwick cwevent.exe inside the “unzipped” folder.)

3. I updated a function parse.retrosheet2.pbp (a slight modification of the one provided in our book) that downloads the Retrosheet play-by-play and roster files for a particular season and uses the Chadwick program to extract the data and creates two data files. What’s new is that the function will work for both Mac and Windows. You can see the function here — you can read this function into R by typing …
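For readers who want to see the preparation steps spelled out, here is a rough Python sketch of the folder layout from step 2 and of the kind of Chadwick command the function runs. It is not the R function itself: the helper names are hypothetical, the event file name 2013XXX.EVA is a placeholder, and the cwevent options shown (-y for the season, -f for the range of output fields) are an assumption about a typical invocation. The command is only built here, not executed.

```python
import tempfile
from pathlib import Path


def make_download_folders(base_dir):
    """Create download.folder with its zipped and unzipped subfolders."""
    root = Path(base_dir) / "download.folder"
    zipped = root / "zipped"
    unzipped = root / "unzipped"
    for folder in (zipped, unzipped):
        folder.mkdir(parents=True, exist_ok=True)
    return zipped, unzipped


def cwevent_command(season, event_file):
    """Build (but do not run) an assumed cwevent command line."""
    # Assumption: -y names the season and -f the range of fields to write.
    return ["cwevent", "-y", str(season), "-f", "0-96", event_file]


# Demonstrate in a scratch directory rather than the real working directory.
with tempfile.TemporaryDirectory() as scratch:
    zipped, unzipped = make_download_folders(scratch)
    layout_ok = zipped.is_dir() and unzipped.is_dir()

command = cwevent_command(2013, "2013XXX.EVA")
print(layout_ok, " ".join(command))
```

In practice you would pass your working directory to make_download_folders and run the command from inside the unzipped folder once the season's files have been extracted there.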

Computing Run Values
1. In Chapter 5, I describe how to compute the run values for all plays. I have put all of the R code into a new function compute.runs.expectancy which is found here and can be sourced into R:

2. Now we’re ready to download, say, all of the 2013 season play-by-play data, and compute the run values.


3. The data frame d2013 contains all of the 2013 play data. I wrote a short function runs.expectancy to compute the expected runs in the remainder of the inning for all 24 bases/outs situations.
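The idea behind the expected-runs calculation can be sketched in a few lines. The Python below is only an illustration of the averaging step, not the R function: for each play it takes the runs scored from that point to the end of the half-inning, then averages those values within each bases/outs state. The play tuples are made up, and the sketch skips refinements such as restricting the data to complete half-innings.

```python
from collections import defaultdict

# Made-up plays: (half_inning_id, bases, outs, runs_scored_on_play).
# "100" means a runner on first only; "000" means bases empty.
SAMPLE_PLAYS = [
    ("g1_1_0", "000", 0, 0),
    ("g1_1_0", "100", 0, 2),
    ("g1_1_0", "000", 0, 0),
    ("g1_1_0", "000", 1, 0),
    ("g1_1_0", "000", 2, 0),
    ("g1_1_1", "000", 0, 0),
    ("g1_1_1", "000", 1, 0),
    ("g1_1_1", "000", 2, 0),
]


def runs_expectancy(plays):
    """Average runs scored in the rest of the half-inning per state."""
    # First pass: total runs in each half-inning.
    inning_totals = defaultdict(int)
    for half, _, _, runs in plays:
        inning_totals[half] += runs

    # Second pass: runs still to come at the start of each play.
    scored_so_far = defaultdict(int)
    sums = defaultdict(int)
    counts = defaultdict(int)
    for half, bases, outs, runs in plays:
        remaining = inning_totals[half] - scored_so_far[half]
        state = (bases, outs)
        sums[state] += remaining
        counts[state] += 1
        scored_so_far[half] += runs

    return {state: sums[state] / counts[state] for state in sums}


table = runs_expectancy(SAMPLE_PLAYS)
for state in sorted(table):
    print(state, round(table[state], 3))
```

A full season of play data would fill in all 24 states; the toy sample above only touches four of them.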

The 2013 expected runs are similar to those found from 2011 season data in Chapter 5 of the book. In the next post, I’ll use this play data frame to see which players last season were best at performing in the clutch.