This page provides notes for the Winner's Curse Project with Jim Brander. Access to this page is restricted.

The two primary datasets are:

  • The CapGeek data
  • The NHL data

CapGeek Data

All contract data was taken from CapGeek by a custom web crawler, including all buyouts known to CapGeek. Data was retrieved using player IDs, which are unique, start at 11, and end at 2352 (assigned to Biggs, Taylor, who signed a new entry-level contract on 8/9 at 19:46; look for the highest entry for a new signing). Not all IDs exist.
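
The ID sweep described above can be sketched as follows. This is a minimal illustration, not the crawler that was actually used: `fetch_player` is a hypothetical callable that returns a parsed record, or `None` when an ID does not exist. No real URLs or page parsing are shown.

```python
from typing import Callable, Dict, Optional

FIRST_ID = 11    # lowest player ID, per the notes above
LAST_ID = 2352   # highest player ID at the time of the crawl


def crawl_players(fetch_player: Callable[[int], Optional[dict]],
                  first_id: int = FIRST_ID,
                  last_id: int = LAST_ID) -> Dict[int, dict]:
    """Try every ID in the range and keep only the IDs that exist."""
    players = {}
    for player_id in range(first_id, last_id + 1):
        record = fetch_player(player_id)  # hypothetical: None if the ID is unused
        if record is not None:
            players[player_id] = record
    return players


# Usage with a stand-in fetcher (made-up records, for illustration only).
fake_site = {11: {"name": "Player A"}, 13: {"name": "Player B"}}
found = crawl_players(fake_site.get, first_id=11, last_id=14)
print(sorted(found))
```

Passing the fetcher in as an argument keeps the loop independent of how pages are downloaded and parsed.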

Data retrieved includes:

  • The player name (in the page level 1 header)
  • The player's position and team name (in the page level 2 header)
  • The full contract history. Notes on contracts (i.e. known clauses) are included in the database but not the STATA export.

CapGeek provides contract data on 2241 players who held contracts from 2001 to the present day, of whom 1455 appear in the NHL statistics data. 194 CapGeek players have teams marked as 'Inactive', and these invariably do not appear in the NHL data. The missing CapGeek players presumably play in the minors.

NHL Data

The NHL data was retrieved using a custom web crawler. The data covers the years 1997-present, except for the 2004-2005 season (the lockout season). Data was retrieved separately for skaters and goalies. The final dataset covers all 2865 NHL players that have stats recorded.

Joining to CapGeek

The NHL data does not provide unique player identifiers. These were created using player names and dates of birth. Players were matched to CapGeek using their names, and when names were non-unique, matches were established manually by checking birth dates, teams and positions in both sources. Pairs of birth date and player name were found to be unique. For a small percentage of players (approximately 45/1459), names were recorded differently in each source. Fuzzy-matching software provided potential candidate matches, and all matches were checked by hand.
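
The name-matching step can be illustrated with a short sketch. The fuzzy-matching software used for the project is not named here, so Python's standard `difflib` stands in for it. The function names, the 0.7 cutoff and the sample names are all assumptions made for illustration.

```python
import difflib


def candidate_matches(name, other_names, cutoff=0.7, n=3):
    """Return up to n names from other_names that resemble name."""
    return difflib.get_close_matches(name, other_names, n=n, cutoff=cutoff)


def match_players(nhl_names, capgeek_names):
    """Split NHL names into exact matches and fuzzy candidates for hand review."""
    capgeek_set = set(capgeek_names)
    exact, to_review = {}, {}
    for name in nhl_names:
        if name in capgeek_set:
            exact[name] = name
        else:
            candidates = candidate_matches(name, capgeek_names)
            if candidates:
                to_review[name] = candidates  # still needs a manual check
    return exact, to_review


# Made-up names, for illustration only.
nhl = ["John Smith", "Alexandre Dupont"]
capgeek = ["John Smith", "Alex Dupont"]
exact, to_review = match_players(nhl, capgeek)
print(exact)
print(to_review)
```

The sketch only proposes candidates. Resolving non-unique names by birth date, team and position remains a manual step, as described above.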

Of the 2865 NHL players with statistics, we have CapGeek contract records for 1455 (51%). Part of the discrepancy is due to retirement: 442 players played their last game before 2001 and do not have CapGeek records. This leaves 968 (34%) of the NHL players without contracts in CapGeek for some other reason, most likely that CapGeek's coverage is incomplete.
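
The coverage figures above follow from simple arithmetic, reproduced here as a check. The variable names are illustrative only.

```python
nhl_players = 2865          # NHL players with statistics
with_capgeek = 1455         # of these, players with CapGeek contract records
retired_before_2001 = 442   # last game before 2001, so no CapGeek record expected

without_capgeek = nhl_players - with_capgeek
unexplained = without_capgeek - retired_before_2001

print(without_capgeek)                            # 1410
print(unexplained)                                # 968
print(round(100 * with_capgeek / nhl_players))    # 51
print(round(100 * unexplained / nhl_players))     # 34
```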



Each record has:

  • nhlid - a unique player index based on name and date of birth
  • skater - 1 for skaters, 0 for goalies
  • syear - the season starting year identifier (YYYY), for the 2000-2001 season this is 2000
  • playerage - in years, calculated as the age as of 1st October for the syear
  • capgeek - 1 if the player-season is in CapGeek, 0 otherwise
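
As a sketch of how syear and playerage relate to the raw fields, the snippet below derives the starting year from an eight-digit season code such as 20002001 and computes age as of 1st October. It assumes that playerage is stored in completed whole years, which the notes do not state explicitly. The function names are hypothetical.

```python
from datetime import date


def season_start_year(season: int) -> int:
    """Take the first four digits of an eight-digit season code (20002001 -> 2000)."""
    return season // 10000


def age_on_october_first(dob: date, syear: int) -> int:
    """Completed years of age on 1st October of the season's starting year."""
    ref = date(syear, 10, 1)
    had_birthday = (ref.month, ref.day) >= (dob.month, dob.day)
    return ref.year - dob.year - (0 if had_birthday else 1)


syear = season_start_year(20002001)
print(syear)
print(age_on_october_first(date(1980, 10, 2), syear))  # birthday is one day after the cutoff
```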

From the NHL data, each record has:

  • season - the NHL season YYYYXXXX (e.g., 20002001)
  • jerseyno
  • player - the player name
  • teamstring - the team(s) the NHL records the player as coming from (for the season)
  • teamcount - the number of teams the player played for (in the season)
  • team1 - the 1st team the NHL records the player as coming from
  • team2 - the 2nd team if applicable
  • team3 - the 3rd team if applicable
  • team4 - the 4th team if applicable
  • pos - the player's listed position (C, L, R, D, G)
  • dob - the player's Date of Birth
  • biobirth_city
  • biosorp - state or province
  • bioctry - country
  • bioht - height in inches
  • biowt - weight in lbs
  • biohanded - L or R handed
  • biorhanded - 1 if R, 0 if L
  • biork - Rookie (1 or 0)

Common NHL variables that are repeated for the Playoffs as separate variables:

  • biogp - games played (int)
  • biog - goals (int)
  • bioa - assists (int)
  • biopim - penalty minutes (int)
  • biotoiperg - time on ice per game (secs), calculated for goalies

Note that Playoff data is included by repeating variables with a PO_ prefix. The unit of observation is a player-season during the regular season.

Calculated NHL and CapGeek (combined) variables:

  • CTeam - the contract team, deduced iff a player played for only one team during the contract
  • NoTeamsDuringContract - the number of teams a player played for during a contract
  • CTeamCommonFirst - the contract team, deduced iff a player always played primarily for the same team every season of the contract
  • FirstTeamServedPerContract - the team that a player primarily played for at the start of contract
  • ChangedCTeam - 1 if a player changed CTeam from this contract to the last, 0 otherwise.
  • ChangedCTeamCommonFirst - 1 if a player changed CTeamCommonFirst from this contract to the last, 0 otherwise.
  • SeasonDataButNoContractData - 1 if there is season data but no contract data, 0 otherwise
  • playedfinalcteam - 1 if the player was playing for the final contract team this season, 0 otherwise
  • playedfinalcteamonly - 1 if the player only ever played for his final contract team, 0 otherwise.
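
The team-deduction rules in the list above can be sketched as below. This is not the code used to build the dataset. It assumes that the first team listed for a season is the team the player primarily played for, and that Atlanta and Winnipeg are folded into one team (as noted further down this page). The team codes "ATL" and "WPG" and the function names are illustrative.

```python
# Assumed alias table: Atlanta and Winnipeg count as the same team.
# The codes are illustrative and not taken from the dataset.
FRANCHISE_ALIASES = {"ATL": "WPG"}


def normalize(team):
    """Map a team code to its canonical franchise code."""
    return FRANCHISE_ALIASES.get(team, team)


def contract_team_summary(seasons):
    """seasons: one list of team codes per season of the contract, primary team first."""
    normalized = [[normalize(t) for t in teams] for teams in seasons]
    all_teams = {t for teams in normalized for t in teams}
    first_teams = {teams[0] for teams in normalized if teams}
    return {
        # Deduced only if the player played for exactly one team during the contract.
        "CTeam": next(iter(all_teams)) if len(all_teams) == 1 else None,
        "NoTeamsDuringContract": len(all_teams),
        # Deduced only if the primary team was the same in every season.
        "CTeamCommonFirst": next(iter(first_teams)) if len(first_teams) == 1 else None,
        "FirstTeamServedPerContract": (
            normalized[0][0] if normalized and normalized[0] else None
        ),
    }


print(contract_team_summary([["ATL"], ["WPG"]]))
print(contract_team_summary([["TOR", "MTL"], ["TOR"]]))
```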

Contract Data

When a player has a record in CapGeek, there is contract data. This is joined to the NHL data using unique player identifiers (based on name and dob) and the season year. For these records the 'capgeek' variable is set to 1.

  • id - CapGeek ID number
  • length - contract length (years)
  • value - contract total value
  • type - contract type code: 1=ENTRY LEVEL, 2=STANDARD, 3=35-PLUS
  • typetext - the contract type
  • expstatus - The contract's expiry status code: 0=RFA, 1=UFA
  • expstatustext - the contract's expiry status
  • ahlsalary
  • nhlsalary
  • pbonuses
  • sbonus
  • caphit
  • cyear - contract year signed (i.e., first contract year)
  • cage - contract age for a given year (i.e. 1 for first year, 2 for second year, etc).
  • cgteam - the team code of the team that holds the player's contract (according to CapGeek)
  • playedcteam - 1 if the player played for the contract holder in the season, 0 otherwise
  • playedcteaonly - 1 if the player ONLY played for the contract holder in the season, 0 otherwise
  • buyout - 1 if the player had a buyout in that season year, 0 otherwise
  • byear - the buyout year
  • blength - the number of contract years bought out
  • bamount - the amount of the buyout
  • status - the type of the contract as a code: 1=ENTry, 2=RFA, 4=UFA, 9=UNKnown, 0=NONE (Because it is the first contract and previous type can't be deduced).
  • statustext - the text version of the above: ENT, RFA, UFA, UNK, NONE
  • prevstatus - the type of the previous contract as a code
  • contractid - unique contract ids
  • relcontractid - unique relative contract ids (i.e., within player sequential ids).
  • firstcontract - 1 if this is the first contract listed, 0 otherwise
  • typechanged - 1 if the status (RFA/UFA/etc) of this contract is different from the type of the previous contract
  • rfa2ufa - 1 if the transition was from RFA on the last contract to UFA on this one, 0 otherwise
  • ent2rfa
  • ent2ufa
  • unk2rfa
  • unk2ufa
  • transitioncode - 0=No Transition, 1=ENT->RFA, 2=ENT->UFA, 3=RFA->UFA, 4=UNK->RFA, 5=UNK->UFA, 9=UFA->RFA, -1=The previous contract status can't be deduced, because the current contract is the first listed.
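
The transitioncode values can be expressed as a lookup on the previous and current status text. The sketch below is one way to do this, not the original derivation. It assumes that any pair not listed above (for example RFA to RFA) maps to 0, and the function name and arguments are hypothetical.

```python
# (previous status, current status) -> transitioncode, as listed above.
TRANSITIONS = {
    ("ENT", "RFA"): 1,
    ("ENT", "UFA"): 2,
    ("RFA", "UFA"): 3,
    ("UNK", "RFA"): 4,
    ("UNK", "UFA"): 5,
    ("UFA", "RFA"): 9,
}


def transition_code(prev_status, status, first_contract=False):
    """Return the transition code for a contract.

    A first listed contract has no deducible previous status, so it gets -1.
    Assumption: any pair not in TRANSITIONS counts as 'No Transition' (0).
    """
    if first_contract or prev_status is None:
        return -1
    return TRANSITIONS.get((prev_status, status), 0)


print(transition_code("RFA", "UFA"))
print(transition_code(None, "UFA", first_contract=True))
```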

Note that if a player played for the Atlanta Thrashers and the Winnipeg Jets, this counts as the same team.

The count of distinct NHLIDs per transition code is:

 transitioncode | count
             -1 |  1405
              0 |   348
              1 |   302
              2 |     5
              3 |   117
              5 |   383
              9 |    13

Note that -1 indicates NHLIDs that had no previous contracts in CapGeek (such that I couldn't deduce the status, because the player didn't come in as ENTRY-LEVEL type). There are just 117 NHLIDs that changed from RFA->UFA.
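
A tabulation like the one above can be produced by collecting the set of player IDs seen under each code. The sketch below assumes the input is a sequence of (nhlid, transitioncode) pairs. The function name and the sample records are made up for illustration.

```python
from collections import defaultdict


def distinct_players_per_code(records):
    """records: iterable of (nhlid, transitioncode) pairs."""
    ids_by_code = defaultdict(set)
    for nhlid, code in records:
        ids_by_code[code].add(nhlid)  # a set ignores repeated player-seasons
    return {code: len(ids) for code, ids in sorted(ids_by_code.items())}


# Made-up records: player 1 appears under two codes, player 2 twice under one.
sample = [(1, -1), (1, 3), (2, -1), (2, -1), (3, 0)]
print(distinct_players_per_code(sample))
```

Because one player can appear under several codes over a career, counts produced this way need not sum to the number of players.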

Player's Data

Other Data