Winner's Curse
This page provides notes for the Winner's Curse Project with Jim Brander. Access to this page is restricted.
Data
The Postgres database hosting the data is on my Sauder workstation (137.82.145.70) and is named 'NHL' (note the caps). You can log in remotely (providing my VPN is down) from PowerShell or any psql-capable client with:
psql -h 137.82.145.70 -U Ed NHL bottle
The two primary datasets are:
- The CapGeek data
- The NHL.com data
CapGeek Data
All contract data was taken from http://www.CapGeek.com by a custom web crawler. This includes all buyouts known to CapGeek: http://www.capgeek.com/buyouts.php. Data was retrieved using player IDs, which are unique, start at 11, and end at 2352 (assigned to Biggs, Taylor, who signed a new entry level contract on 8/9 at 19:46. See http://www.capgeek.com/latest_contracts.php and look for the highest entry for a new signing.) Not all IDs exist.
Data retrieved includes:
- The player name (in the page level 1 header)
- The player's position and team name (in the page level 2 header)
- The full contract history. Notes on contracts (i.e. known clauses) are included in the database but not the Stata export.
CapGeek provides contract data on 2241 players who held contracts from 2001 to the present day, of which 1455 appear in the NHL statistics data. 194 CapGeek players have teams marked as 'Inactive', and these invariably do not appear in the NHL data. The missing CapGeek players presumably play in the minors.
NHL Data
The NHL data was retrieved from NHL.com using a custom web crawler. The data covers the years 1997-present, except for the 2004-2005 season (which was a lockout). Data was retrieved separately for skaters and goalies. The final dataset has data on all 2865 NHL players that have stats recorded on NHL.com.
Joining to CapGeek
NHL.com does not provide unique player identifiers. These were created using player names and dates of birth. Players were matched to CapGeek using their names; when names were not unique, matches were established manually by checking birth dates, teams and positions in both sources. Birth-date and player-name dyads were found to be unique. For a small percentage of players (approximately 45/1459), names were recorded differently in each source. Fuzzy-matching software provided candidate matches, and all matches were checked by hand.
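As a rough illustration of the candidate-matching step, here is a minimal sketch using Python's standard difflib module. The fuzzy-matching software actually used is not named in these notes, so this is a stand-in technique; the function names, the 0.8 cutoff, and the sample names are all hypothetical.

```python
import difflib

def player_key(name, dob):
    """Build a (name, date-of-birth) key; the name is normalised to lower case."""
    return (name.strip().lower(), dob)

def candidate_matches(nhl_name, capgeek_names, cutoff=0.8):
    """Return CapGeek names that resemble nhl_name, best first, for manual review."""
    lowered = {n.lower(): n for n in capgeek_names}
    close = difflib.get_close_matches(nhl_name.lower(), list(lowered), n=3, cutoff=cutoff)
    return [lowered[c] for c in close]

# Hypothetical names, not taken from either dataset.
capgeek = ["Alex Sample", "Sidney Placeholder", "Pierre-Marc Example"]
print(candidate_matches("Alexander Sample", capgeek))
```

Every candidate returned this way would still need to be checked by hand against birth date, team and position, as described above.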
Of the 2865 NHL players with statistics, we have CapGeek contract records for 1455 (51%). Part of the discrepancy is due to retirement. 442 players played their last game before 2001 and do not have CapGeek records. This leaves 968 (34%) of the NHL players that do not have contracts in CapGeek for some other reason - most likely that CapGeek's coverage is incomplete.
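The coverage figures above can be checked with a few lines of arithmetic. The variable names are just labels for the counts quoted in the text.

```python
nhl_players = 2865        # NHL players with statistics
with_capgeek = 1455       # of those, players with CapGeek contract records
retired_pre_2001 = 442    # last game before 2001, so no CapGeek record expected

# Players missing from CapGeek for some reason other than early retirement.
unexplained = nhl_players - with_capgeek - retired_pre_2001

print(unexplained)                               # 968
print(round(100 * with_capgeek / nhl_players))   # 51
print(round(100 * unexplained / nhl_players))    # 34
```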
Variables
General
Each record has:
- nhlid - a unique player index based on name and date of birth
- skater - 1 for skaters, 0 for goalies
- syear - the season starting year identifier (YYYY); for the 2000-2001 season this is 2000
- playerage - in years, calculated as the age as of 1st October for the syear
- capgeek - 1 if the player-season is in CapGeek, 0 otherwise
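A minimal sketch of how syear and playerage could be derived, assuming the season is stored as a code such as 20002001 and that age is counted in whole years. The function names are hypothetical and the rounding rule is an assumption; the notes only say "in years".

```python
from datetime import date

def syear_from_season(season):
    """Return the starting year of a season code such as 20002001."""
    return int(str(season)[:4])

def player_age(dob, syear):
    """Age in whole years on 1 October of the season's starting year."""
    cutoff = date(syear, 10, 1)
    # Has the player's birthday already occurred on or before the cutoff date?
    had_birthday = (dob.month, dob.day) <= (cutoff.month, cutoff.day)
    return cutoff.year - dob.year - (0 if had_birthday else 1)

syear = syear_from_season(20002001)
print(syear, player_age(date(1980, 12, 15), syear))
```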
From the NHL data, each record has:
- season - the NHL season YYYYXXXX (e.g., 20002001)
- jerseyno
- player - the player name
- teamstring - the team(s) the NHL records the player as coming from (for the season)
- teamcount - the number of teams the player played for (in the season)
- team1 - the 1st team the NHL records the player as coming from
- team2 - the 2nd team if applicable
- team3 - the 3rd team if applicable
- team4 - the 4th team if applicable
- pos - the player's listed position (C, L, R, D, G)
- dob - the player's Date of Birth
- biobirth_city
- biosorp - state or province
- bioctry - country
- bioht - height in inches
- biowt - weight in lbs
- biohanded - L or R handed
- biorhanded - 1 if R, 0 if L
- biork - Rookie (1 or 0)
Common NHL variables that are repeated for the Playoffs as separate variables:
- biogp - games played (int)
- biog - goals (int)
- bioa - assists (int)
- biopim - penalty minutes (int)
- biotoiperg - time on ice per game (secs), calculated for goalies
Note that Playoff data is included by repeating variables with a PO_ prefix. The unit of observation is a player-season during the regular season.
Calculated NHL and CapGeek (combined) variables:
- CTeam - the contract team, deduced iff a player played for only one team during the contract
- NoTeamsDuringContract - the number of teams a player played for during a contract
- CTeamCommonFirst - the contract team, deduced iff a player always played primarily for the same team every season of the contract
- FirstTeamServedPerContract - the team that a player primarily played for at the start of contract
- ChangedCTeam - 1 if a player changed CTeam from this contract to the last, 0 otherwise.
- ChangedCTeamCommonFirst - 1 if a player changed CTeamCommonFirst from this contract to the last, 0 otherwise.
- SeasonDataButNoContractData - 1 if there is season data but no contract data, 0 otherwise
- playedfinalcteam - 1 if the player was playing for the final contract team this season, 0 otherwise
- playedfinalcteamonly - 1 if the player only ever played for his final contract team, 0 otherwise.
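The contract-team deductions above can be sketched as follows. This is an illustration, not the code used to build the dataset. It assumes that the first team listed for a season (team1) is the team the player primarily played for, that a contract has at least one season with at least one team, and that the team codes "ATL" and "WPG" stand for the Thrashers and the Jets (treated as the same team); the function names are hypothetical.

```python
# Relocated franchise treated as one team; the codes are assumed for illustration.
SAME_FRANCHISE = {"ATL": "WPG"}

def canonical(team):
    """Map a team code to its canonical franchise code."""
    return SAME_FRANCHISE.get(team, team)

def contract_team_vars(seasons):
    """seasons: one list of team codes per contract season, primary team first."""
    all_teams = {canonical(t) for season in seasons for t in season}
    first_teams = {canonical(season[0]) for season in seasons}
    return {
        "NoTeamsDuringContract": len(all_teams),
        # Deduced only if the player played for exactly one team.
        "CTeam": next(iter(all_teams)) if len(all_teams) == 1 else None,
        # Deduced only if the primary team was the same in every season.
        "CTeamCommonFirst": next(iter(first_teams)) if len(first_teams) == 1 else None,
        "FirstTeamServedPerContract": canonical(seasons[0][0]),
    }

print(contract_team_vars([["VAN", "CGY"], ["VAN"]]))
```

In this example the player appeared for two teams, so CTeam is left undetermined, while CTeamCommonFirst is still "VAN" because that was the primary team in both seasons.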
Contract Data
When a player has a record in CapGeek, there is contract data. This is joined to the NHL data using unique player identifiers (based on name and dob) and the season year. For these records the 'capgeek' variable is set to 1.
- id - Capgeek ID number
- length - contract length (years)
- value - contract total value
- type - contract type code: 1=ENTRY LEVEL, 2=STANDARD, 3=35-PLUS
- typetext - the contract type
- expstatus - The contract's expiry status code: 0=RFA, 1=UFA
- expstatustext - the contract's expiry status
- ahlsalary
- nhlsalary
- pbonuses
- sbonus
- caphit
- cyear - contract year signed (i.e., first contract year)
- cage - contract age for a given year (i.e. 1 for first year, 2 for second year, etc).
- firstcyearserved - the first year of the contract that the player actually played (c.f. cage).
- cgteam - the team code of the team that holds the player's contract (according to CapGeek)
- playedcteam - 1 if the player played for the contract holder in the season, 0 otherwise
- playedcteaonly - 1 if the player ONLY played for the contract holder in the season, 0 otherwise
- buyout - 1 if the player had a buyout in that season year, 0 otherwise
- byear - the buyout year
- blength - the number of contract years bought out
- bamount - the amount of the buyout
- status - the type of the contract as a code: 1=ENTry, 2=RFA, 4=UFA, 9=UNKnown, 0=NONE (Because it is the first contract and previous type can't be deduced).
- statustext - the text version of the above: ENT, RFA, UFA, UNK, NONE
- prevstatus - the type of the previous contract as a code
- contractid - unique contract ids
- relcontractid - unique relative contract ids (i.e., within player sequential ids).
- firstcontract - 1 if this is the first contract listed, 0 otherwise
- typechanged - 1 if the status (RFA/UFA/etc) of this contract is different from the type of the previous contract
- rfa2ufa - 1 if the transition was from RFA last contract to UFA this period, 0 otherwise
- ent2rfa
- ent2ufa
- unk2rfa
- unk2ufa
- transitioncode - 0=No Transition, 1=ENT->RFA, 2=ENT->UFA, 3=RFA->UFA, 4=UNK->RFA, 5=UNK->UFA, 9=UFA->RFA, -1=The previous contract status can't be deduced, because the current contract is the first listed.
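The transitioncode mapping can be written as a small lookup. This is a sketch only: it assumes that "No Transition" means the previous and current status are the same, and it returns None for pairs that these notes do not assign a code to. The function name and the first_contract flag are hypothetical.

```python
# (previous status, current status) -> transitioncode, as listed above.
TRANSITION_CODES = {
    ("ENT", "RFA"): 1,
    ("ENT", "UFA"): 2,
    ("RFA", "UFA"): 3,
    ("UNK", "RFA"): 4,
    ("UNK", "UFA"): 5,
    ("UFA", "RFA"): 9,
}

def transition_code(prev_status, status, first_contract=False):
    """Map status text (ENT, RFA, UFA, UNK) for two consecutive contracts to a code."""
    if first_contract:
        return -1   # first listed contract: previous status cannot be deduced
    if prev_status == status:
        return 0    # assumed meaning of "No Transition"
    # Pairs not listed in the notes have no defined code; None flags them.
    return TRANSITION_CODES.get((prev_status, status))

print(transition_code("RFA", "UFA"))   # 3
```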
Note that if a player played for the Atlanta Thrashers and the Winnipeg Jets, this counts as the same team.
The count of distinct NHLIDs per transition code is:
 transitioncode | count
----------------+-------
             -1 |  1405
              0 |   348
              1 |   302
              2 |     5
              3 |   117
              5 |   383
              9 |    13
Note that -1 indicates NHLIDs that had no previous contracts in CapGeek (such that I couldn't deduce the status, because the player didn't come in as ENTRY-LEVEL type). There are just 117 NHLIDs that changed from RFA->UFA.
Player's Data
Variables are named 'view-name-prefix' followed by 'transformed NHL.com code'. The view name prefixes are three letter codes, listed in brackets next to the view names themselves. The untransformed NHL.com codes used are provided for each view name in the display box. The transformation rules are replacements as follows: % -> pc, / -> or OR / -> per (depending on context), + -> plus, - -> minus, " " -> "", 1 -> one, and so forth.
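The naming rule can be sketched as below. Only the replacements spelled out above are implemented (the "and so forth" cases are not), the choice between "per" and "or" for "/" is left to the caller, and lower-casing the result is an assumption based on names such as biogp. The function names are hypothetical.

```python
def nhl_code_to_suffix(code, slash="per"):
    """Transform an NHL.com column code into a variable-name suffix.

    slash is the replacement for '/': 'per' or 'or', depending on context.
    """
    replacements = [("%", "pc"), ("/", slash), ("+", "plus"),
                    ("-", "minus"), (" ", ""), ("1", "one")]
    for old, new in replacements:
        code = code.replace(old, new)
    return code.lower()   # lower-casing is assumed, not stated in the notes

def variable_name(prefix, code, slash="per"):
    """Join a view-name prefix and the transformed NHL.com code."""
    return prefix + nhl_code_to_suffix(code, slash)

print(variable_name("bio", "TOI/G"))   # biotoiperg
```

For example, "TOI/G" in the Bios view becomes biotoiperg, which matches the variable listed under the common NHL variables.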
Skaters
Bios (bio)
# - Sweater Number
Player - Player Name
Team - Team
Pos - Position
DOB - Date of Birth
Birth City - Birth City
S/P - State/Province
Ctry - Country
Ht - Height in Inches
Wt - Weight in Pounds
S - Shoots
Rk - Rookie
GP - Games Played
G - Goals
A - Assists
Pts - Points
+/- - Plus/Minus
PIM - Penalty Minutes
TOI/G - Time On Ice Per Game
Summary (sum)
PP - Power Play Goals
SH - Short Handed Goals
GW - Game Winning Goals
GT - Game Tying Goals
OT - Overtime Goals
S - Shots
S% - Shooting Percentage
Sft/G - Average Shifts Per Game
FO% - Faceoff Win Percentage
Assists (ast)
ESA - Even Strength Assists
SHA - Short Handed Assists
PPA - Power Play Assists
HmA - Home Assists
RdA - Road Assists
DvA - Own Division Assists
ODvA - Other Division Assists
A/G - Average Assists Per Game
Division Scoring (dsc)
Dv GP - Division Games Played
Dv G - Division Goals
Dv A - Division Assists
Dv P - Division Points
Dv +/- - Division Plus/Minus
ODGP - Other Division Games Played
OD G - Other Division Goals
OD A - Other Division Assists
OD P - Other Division Points
OD+/- - Other Division Plus/Minus
Faceoff Percentages (fpc)
ESFOW - Even Strength Faceoffs Won
ESFOL - Even Strength Faceoffs Lost
PPFOW - Power Play Faceoffs Won
PPFOL - Power Play Faceoffs Lost
SHFOW - Shorthanded Faceoffs Won
SHFOL - Shorthanded Faceoffs Lost
HFOW - Home Faceoffs Won
HFOL - Home Faceoffs Lost
HFO% - Home Faceoff Win Percentage
RFOW - Road Faceoffs Won
RFOL - Road Faceoffs Lost
RFO% - Road Faceoff Win Percentage
FOW - Total Faceoffs Won
FOL - Total Faceoffs Lost
Tot - Total Faceoffs Taken
FO% - Faceoff Win Percentage
Goals (gls)
ESG - Even Strength Goals
PPG - Power Play Goals
SHG - Short Handed Goals
1G - First Goals
OTG - Overtime Goals
GWG - Game Winning Goals
GTG - Game Tying Goals
HmG - Home Goals
RdG - Road Goals
DvG - Own Division Goals
ODvG - Other Division Goals
ENG - Empty Net Goals
PSG - Penalty Shot Goals
PST - Penalty Shots Taken
G/G - Average Goals Per Game
Home Scoring (hsc)
GP - Home Games Played
G - Home Goals
A - Home Assists
P - Home Points
+/- - Home Plus/Minus
PIM - Home Penalty Minutes
Pen - Home Penalties
EV TOI/g - Home Avg Even TOI/game
PP TOI/g - Home Avg PP TOI/game
SH TOI/g - Home Avg SH TOI/game
TOI/g - Home Avg TOI/game
FO - Home Faceoffs Taken
FOW - Home Faceoffs Won
FOL - Home Faceoffs Lost
FO% - Home Faceoff Win %
Penalties (pen)
Minor - Minor Penalties
Major - Major Penalties
Misc - Misconduct Penalties
G Misc - Game Misconduct Penalties
Match - Match Penalties
Plus Or Minus (pom)
Other Div +/- - Other Division Plus/Minus
Team GF - Team Goals For
Team PPGF - Team Powerplay Goals For
Team GA - Team Goals Against
Team PPGA - Team Powerplay Goals Against
Points (pnt)
ESP - Even Strength Points
SHP - Short Handed Points
ODvP - Other Division Points
P/G - Average Points Per Game
Road Scoring (rsc)
GP - Road Games Played
G - Road Goals
A - Road Assists
P - Road Points
+/- - Road Plus/Minus
PIM - Road Penalty Minutes
Pen - Road Penalties
EV TOI/g - Road Avg Even TOI/game
PP TOI/g - Road Avg PP TOI/game
SH TOI/g - Road Avg SH TOI/game
TOI/g - Road Avg TOI/game
FO - Road Faceoffs Taken
FOW - Road Faceoffs Won
FOL - Road Faceoffs Lost
FO% - Road Faceoff Win %
Real Time Stats (rts)
Hits - Hits
BkS - Blocked Shots
MsS - Missed Shots
GvA - Giveaways
TkA - Takeaways
%Tm - % of Team Faceoffs Taken
Shots - Shots
% - Shooting Percentage
Special Teams (spc)
G - Power Play Goals
A - Power Play Assists
P - Power Play Points
Shooting (sho)
S/G - Average Shots Per Game
PS Goals - Penalty Shot Goals
PS Taken - Penalty Shots Taken
Shootouts (sou)
Player - Player Name
Team - Team
Pos - Position
Home:
  S - Home Shots
  G - Home Goals
  S% - Home Shooting Percentage
  GDG - Home Game-Deciding Goals
Road:
  S - Road Shots
  G - Road Goals
  S% - Road Shooting Percentage
  GDG - Road Game-Deciding Goals
GDG - Total Game-Deciding Goals
Goalies
Bios (gbio or bio, when common)
# - Sweater Number
Player - Player Name
Team - Team
DOB - Date of Birth
Birth City - Birth City
S/P - State/Province
Ctry - Country
Ht - Height in Inches
Wt - Weight in Pounds
C - Catches
Rk - Rookie
GP - Games Played
W - Wins
L - Losses
T - Ties
OT - OT and/or S/O Losses
GAA - Goals Against Average
Sv% - Save Percentage
SO - Shutouts
Summary (gsum or bio/sum, when common)
GS - Games Started
SA - Shots Against
GA - Goals Against
GAA - Goals Against Average
Sv - Saves
G - Goals
A - Assists
PIM - Penalty Minutes
TOI - Time On Ice
Special Teams (sav)
Even:
  SA - Even Strength Shots Against
  GA - Even Strength Goals Against
  Sv - Even Strength Saves
  Sv% - Even Strength Save Percentage
Power Play:
  SA - Power Play Shots Against
  GA - Power Play Goals Against
  Sv - Power Play Saves
  Sv% - Power Play Save Percentage
Shorthanded:
  SA - Short Handed Shots Against
  GA - Short Handed Goals Against
  Sv - Short Handed Saves
  Sv% - Short Handed Save Percentage
Shootouts (gsou)
Home:
  W - Home Wins
  L - Losses
  SA - Shots Against
  GA - Goals Against
  Sv % - Save Percentage
Road:
  W - Wins
  L - Losses
  SA - Shots Against
  GA - Goals Allowed
  Sv % - Save Percentage
Other Data
Arbitration data
All I have so far is:
- http://www.cbssports.com/nhl/story/10879613 which covers 2008
- http://www.nhlpa.com/news/salary-arbitration which doesn't give details, but appears to be the official source.
- http://www.mynhltraderumors.com/2011/07/06/2011-nhl-salary-arbitration-list/
- http://www.tsn.ca/nhl/feature/?id=46931
- http://www.mynhltraderumors.com/2010/07/06/31-players-file-for-nhlsalary-arbitration/
- http://www.tsn.ca/nhl/feature/?id=11919
And some articles/blog posts:
- http://www.cbc.ca/sports/hockey/nhl/story/2012/07/06/sp-nhl-hockey-salary-arbitration.html
- http://bleacherreport.com/articles/416350-hes-filing-for-arbitration-right
- http://espn.go.com/blog/nhl/post/_/id/18227/16-players-file-for-salary-arbitration
- http://scholarship.law.marquette.edu/cgi/viewcontent.cgi?article=1090&context=sportslaw