Winner's Curse

From edegan.com

Revision as of 20:36, 23 January 2013

This page provides notes for the Winner's Curse Project with Jim Brander. Access is restricted to this page.

See also:

Data

The Postgres database hosting the data is on my Sauder workstation (137.82.145.70) and is named 'NHL' (note the caps). You can log in remotely (provided my VPN is down) from PowerShell or any psql-capable client with:

psql -h 137.82.145.70 -U Ed NHL
bottle

The two primary datasets are:

  • The CapGeek data
  • The NHL.com data

CapGeek Data

All contract data was taken from http://www.CapGeek.com by a custom web crawler. This includes all buyouts known to CapGeek: http://www.capgeek.com/buyouts.php. Data was retrieved using player IDs, which are unique, start at 11, and end at 2352 (assigned to Biggs, Taylor, who signed a new entry-level contract on 8/9 at 19:46; see http://www.capgeek.com/latest_contracts.php and look for the highest entry for a new signing). Not all IDs exist.
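The ID sweep described above can be sketched as a loop over the ID range that keeps only the IDs that resolve to a page. This is a minimal sketch, not the actual crawler: `fetch` is a hypothetical stand-in for the download step (the page URL pattern isn't recorded here), assumed to return the page HTML or None for an unused ID.

```python
from typing import Callable, Dict, Optional

# Observed CapGeek player ID range (per the notes above).
FIRST_ID = 11
LAST_ID = 2352


def crawl_player_ids(
    fetch: Callable[[int], Optional[str]],
    first: int = FIRST_ID,
    last: int = LAST_ID,
) -> Dict[int, str]:
    """Call fetch(player_id) for every ID in [first, last], keeping pages that exist.

    `fetch` is hypothetical: it should return the page HTML, or None when
    the ID is not assigned to any player.
    """
    pages: Dict[int, str] = {}
    for player_id in range(first, last + 1):
        page = fetch(player_id)
        if page is not None:  # not every ID in the range exists
            pages[player_id] = page
    return pages
```

Injecting `fetch` keeps the sweep logic testable without network access; the parsing of the name, position/team headers and contract history would happen on the returned pages.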

Data retrieved includes:

  • The player name (in the page level 1 header)
  • The player's position and team name (in the page level 2 header)
  • The full contract history. Notes on contracts (i.e. known clauses) are included in the dbase but not the STATA export.

CapGeek provides contract data on 2241 players who held contracts from 2001 to the present day; 1455 of these appear in the NHL statistics data. 194 CapGeek players have teams marked as 'Inactive', and these invariably do not appear in the NHL data. The missing CapGeek players presumably play in the minors.

NHL Data

The NHL data was retrieved from NHL.com using a custom web crawler. The data covers the years 1997-present, except for the 2004-2005 season (which was a lockout). Data was retrieved separately for skaters and goalies. The final dataset includes all 2865 NHL players that have stats recorded on NHL.com.

Joining to CapGeek

NHL.com does not provide unique player identifiers, so these were created using player names and dates of birth; birth-date/player-name dyads were found to be unique. Players were matched to CapGeek using their names, and when names were non-unique, matches were established manually by checking birth dates, teams and positions in both sources. For a small percentage of players (approximately 45 of 1459), names were recorded differently in each source. Fuzzy-matching software provided potential candidate matches, and all matches were checked by hand.
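The identifier construction and candidate matching above might look like the following sketch. The notes don't say which fuzzy-matching software was used, so Python's standard `difflib` is swapped in here as a stand-in; the exact key format (`name|YYYY-MM-DD`) and the normalisation rules are assumptions for illustration.

```python
import difflib
import unicodedata
from typing import Dict, List


def normalize_name(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    kept = "".join(ch if ch.isalnum() else " " for ch in ascii_only.lower())
    return " ".join(kept.split())


def make_nhlid(name: str, dob: str) -> str:
    """Hypothetical player key: normalised name plus ISO birth date."""
    return f"{normalize_name(name)}|{dob}"


def candidate_matches(
    nhl_name: str, capgeek_names: List[str], n: int = 3, cutoff: float = 0.8
) -> List[str]:
    """Up to n CapGeek names similar to nhl_name, to be checked by hand."""
    lookup: Dict[str, str] = {normalize_name(c): c for c in capgeek_names}
    hits = difflib.get_close_matches(
        normalize_name(nhl_name), list(lookup), n=n, cutoff=cutoff
    )
    return [lookup[h] for h in hits]
```

For example, `candidate_matches("Nikolai Khabibulin", ["Nikolay Khabibulin", "Wayne Gretzky"])` proposes only the first name; as in the notes, every proposed match would still be confirmed manually against birth date, team and position.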

Of the 2865 NHL players with statistics, we have CapGeek contract records for 1455 (51%). Part of the discrepancy is due to retirement. 442 players played their last game before 2001 and do not have CapGeek records. This leaves 968 (34%) of the NHL players that do not have contracts in CapGeek for some other reason - most likely that CapGeek's coverage is incomplete.
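The coverage figures above follow from simple arithmetic on the three counts; this snippet just reproduces them.

```python
total_players = 2865     # NHL players with statistics on NHL.com
with_capgeek = 1455      # of those, players with CapGeek contract records
retired_pre_2001 = 442   # last game before 2001, so no CapGeek record expected

unexplained = total_players - with_capgeek - retired_pre_2001

print(unexplained)                                # 968
print(round(100 * with_capgeek / total_players))  # 51
print(round(100 * unexplained / total_players))   # 34
```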

Variables

General

Each record has:

  • nhlid - a unique player index based on name and date of birth
  • skater - 1 for skaters, 0 for goalies
  • syear - the season starting year identifier (YYYY), for the 2000-2001 season this is 2000
  • playerage - in years, calculated as the age on 1 October of syear
  • capgeek - 1 if the player-season is in CapGeek, 0 otherwise
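A sketch of the playerage calculation, assuming it is the number of completed years on 1 October of syear (the notes say "in years" but not whether fractional ages were used).

```python
from datetime import date


def player_age(dob: date, syear: int) -> int:
    """Completed years of age on 1 October of the season's starting year."""
    ref = date(syear, 10, 1)
    # Subtract one if the birthday has not yet occurred by 1 October.
    had_birthday = (dob.month, dob.day) <= (ref.month, ref.day)
    return ref.year - dob.year - (0 if had_birthday else 1)
```

With this convention a player born on 1 October 1980 is 20 for syear 2000, while one born a day later is 19.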

From the NHL data, each record has:

  • season - the NHL season YYYYXXXX (e.g., 20002001)
  • jerseyno
  • player - the player name
  • teamstring - the team(s) the NHL records the player as coming from (for the season)
  • teamcount - the number of teams the player played for (in the season)
  • team1 - the 1st team the NHL records the player as coming from
  • team2 - the 2nd team if applicable
  • team3 - the 3rd team if applicable
  • team4 - the 4th team if applicable
  • pos - the player's listed position (C, L, R, D, G)
  • dob - the player's Date of Birth
  • biobirth_city
  • biosorp - state or province
  • bioctry - country
  • bioht - height in inches
  • biowt - weight in lbs
  • biohanded - L or R handed
  • biorhanded - 1 if R, 0 if L
  • biork - Rookie (1 or 0)

Common NHL variables that are repeated for the playoffs as separate variables:

  • biogp - games played (int)
  • biog - goals (int)
  • bioa - assists (int)
  • biopim - penalty minutes (int)
  • biotoiperg - time on ice per game (secs), calculated for goalies

Note that playoff data is included by repeating these variables with a PO_ prefix; the unit of observation is a player-season during the regular season.
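A sketch of the two steps described above: deriving goalie time on ice per game, and attaching playoff stats to the regular-season row under PO_-prefixed names. It assumes goalie TOI per game is total TOI (already converted to seconds) divided by games played, and that the playoff dictionary holds only the repeated stat variables, not identifiers such as nhlid.

```python
from typing import Dict, Optional


def goalie_toi_per_game(toi_secs: int, gp: int) -> Optional[float]:
    """Assumed calculation: total TOI in seconds divided by games played."""
    return toi_secs / gp if gp > 0 else None


def add_playoff_columns(
    regular: Dict[str, object], playoffs: Optional[Dict[str, object]]
) -> Dict[str, object]:
    """Copy the regular-season row and add playoff stats as PO_<name>."""
    row = dict(regular)
    for key, value in (playoffs or {}).items():
        row[f"PO_{key}"] = value
    return row
```

Keeping one row per regular-season player-season and widening it with PO_ columns preserves the unit of observation; players who missed the playoffs simply get no PO_ values.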

Calculated NHL and CapGeek (combined) variables:

  • CTeam - the contract team, deduced iff a player played for only one team during the contract
  • NoTeamsDuringContract - the number of teams a player played for during a contract
  • CTeamCommonFirst - the contract team, deduced iff a player always played primarily for the same team every season of the contract
  • FirstTeamServedPerContract - the team that a player primarily played for at the start of contract
  • ChangedCTeam - 1 if a player changed CTeam from this contract to the last, 0 otherwise.
  • ChangedCTeamCommonFirst - 1 if a player changed CTeamCommonFirst from this contract to the last, 0 otherwise.
  • SeasonDataButNoContractData - 1 if there is season data but no contract data, 0 otherwise
  • playedfinalcteam - 1 if the player was playing for his final contract team this season, 0 otherwise
  • playedfinalcteamonly - 1 if the player only ever played for his final contract team, 0 otherwise.
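The team-deduction variables above can be sketched from the per-season team lists (team1..team4) for the seasons a contract covers. Assumptions, not confirmed by the notes: "primarily played for" is taken to mean the first-listed team, and the team codes ATL and WPG are guessed NHL.com abbreviations used to apply the rule (noted under Contract Data) that the Atlanta Thrashers and Winnipeg Jets count as the same team.

```python
from typing import List, Optional

# Assumed codes: Atlanta Thrashers and Winnipeg Jets are treated as one team.
TEAM_ALIASES = {"ATL": "WPG"}


def canon(team: str) -> str:
    return TEAM_ALIASES.get(team, team)


def no_teams_during_contract(seasons: List[List[str]]) -> int:
    """seasons: for each contract season, the team codes played for."""
    return len({canon(t) for season in seasons for t in season})


def cteam(seasons: List[List[str]]) -> Optional[str]:
    """The contract team iff only one team was played for during the contract."""
    teams = {canon(t) for season in seasons for t in season}
    return teams.pop() if len(teams) == 1 else None


def cteam_common_first(seasons: List[List[str]]) -> Optional[str]:
    """The team iff the same (first-listed) team was primary every season."""
    firsts = {canon(season[0]) for season in seasons if season}
    return firsts.pop() if len(firsts) == 1 else None
```

For a player listed as TOR and MTL in one season and TOR alone in the next, CTeam is undefined (two teams), but CTeamCommonFirst is TOR.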

Contract Data

When a player has a record in CapGeek, there is contract data. This is joined to the NHL data using unique player identifiers (based on name and DOB) and the season year. For these records the 'capgeek' variable is set to 1.

  • id - Capgeek ID number
  • length - contract length (years)
  • value - contract total value
  • type - contract type code: 1=ENTRY LEVEL, 2=STANDARD, 3=35-PLUS
  • typetext - the contract type
  • expstatus - The contract's expiry status code: 0=RFA, 1=UFA
  • expstatustext - the contract's expiry status
  • ahlsalary
  • nhlsalary
  • pbonuses
  • sbonus
  • caphit
  • cyear - contract year signed (i.e., first contract year)
  • cage - contract age for a given year (i.e., 1 for the first year, 2 for the second year, etc.).
  • firstcyearserved - the first year of the contract that the player actually played (cf. cage).
  • cgteam - the team code of the team that holds the player's contract (according to CapGeek)
  • playedcteam - 1 if the player played for the contract holder in the season, 0 otherwise
  • playedcteaonly - 1 if the player ONLY played for the contract holder in the season, 0 otherwise
  • buyout - 1 if the player had a buyout in that season year, 0 otherwise
  • byear - the buyout year
  • blength - the number of contract years bought out
  • bamount - the amount of the buyout
  • status - the type of the contract as a code: 1=ENTry, 2=RFA, 4=UFA, 9=UNKnown, 0=NONE (because it is the first contract and the previous type can't be deduced).
  • statustext - the text version of the above: ENT, RFA, UFA, UNK, NONE
  • prevstatus - the type of the previous contract as a code
  • contractid - unique contract ids
  • relcontractid - unique relative contract ids (i.e., within player sequential ids).
  • firstcontract - 1 if this is the first contract listed, 0 otherwise
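  • typechanged - 1 if the status (RFA/UFA/etc.) of this contract is different from the status of the previous contract
  • rfa2ufa - 1 if the transition was from RFA last contract to UFA this period, 0 otherwise
  • ent2rfa - defined analogously (ENT -> RFA)
  • ent2ufa - defined analogously (ENT -> UFA)
  • unk2rfa - defined analogously (UNK -> RFA)
  • unk2ufa - defined analogously (UNK -> UFA)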
  • transitioncode - 0=No Transition, 1=ENT->RFA, 2=ENT->UFA, 3=RFA->UFA, 4=UNK->RFA, 5=UNK->UFA, 9=UFA->RFA, -1=The previous contract status can't be deduced, because the current contract is the first listed.
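A sketch of how cage and transitioncode could be derived from the definitions above. Assumptions: cyear and syear use the same season-starting-year convention, "No Transition" means the status is unchanged, and status pairs not in the listed codes (e.g. RFA -> ENT) are treated as errors since the notes don't define them.

```python
from typing import Optional

# Listed transition codes, keyed by (previous status, current status).
TRANSITION_CODES = {
    ("ENT", "RFA"): 1,
    ("ENT", "UFA"): 2,
    ("RFA", "UFA"): 3,
    ("UNK", "RFA"): 4,
    ("UNK", "UFA"): 5,
    ("UFA", "RFA"): 9,
}


def contract_age(syear: int, cyear: int) -> int:
    """cage: 1 in the signing year, 2 the following year, and so on."""
    return syear - cyear + 1


def transition_code(prev_status: Optional[str], status: str) -> int:
    """-1 if there is no previous contract, 0 if unchanged, else the listed code."""
    if prev_status is None:
        return -1
    if prev_status == status:
        return 0
    try:
        return TRANSITION_CODES[(prev_status, status)]
    except KeyError:
        raise ValueError(f"no code listed for {prev_status}->{status}") from None
```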

Note that if a player played for the Atlanta Thrashers and the Winnipeg Jets, this counts as the same team.

The count of distinct NHLIDs per transition code is:

 transitioncode | count
----------------+-------
             -1 |  1405
              0 |   348
              1 |   302
              2 |     5
              3 |   117
              5 |   383
              9 |    13

Note that -1 indicates NHLIDs that had no previous contract in CapGeek, so the previous status couldn't be deduced (the player didn't come in on an ENTRY-LEVEL contract). Just 117 NHLIDs changed from RFA->UFA.
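The table above has the shape of psql output from a query like SELECT transitioncode, COUNT(DISTINCT nhlid) ... GROUP BY transitioncode (the source table isn't named here). The same tally in Python, as a sketch over hypothetical (nhlid, transitioncode) pairs:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def distinct_ids_per_code(rows: Iterable[Tuple[str, int]]) -> Dict[int, int]:
    """Count distinct nhlids per transitioncode, like COUNT(DISTINCT ...)."""
    ids: Dict[int, Set[str]] = defaultdict(set)
    for nhlid, code in rows:
        ids[code].add(nhlid)
    return {code: len(members) for code, members in sorted(ids.items())}
```

Because a player usually has several contracts, one NHLID can be counted under more than one code, so the column does not sum to the number of players.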

Player's Data

Variables are named with a view-name prefix followed by the transformed NHL.com code. The view name prefixes are short (mostly three-letter) codes, listed in parentheses next to the view names themselves. The untransformed NHL.com codes used are provided for each view in the display box. The transformation rules are replacements as follows: % -> pc, / -> or (as in S/P -> sorp) or per (as in TOI/G -> toiperg) depending on context, + -> plus, - -> minus, " " -> "", 1 -> one, and so forth.
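The naming rule above can be sketched as a chain of string replacements. Assumptions: the result is lower-cased (as in biogp, biork), every digit is spelled out, and the context-dependent "/" choice is passed in explicitly. Some names in these notes (biohanded, biobirth_city) don't follow the rule exactly, so they were presumably renamed by hand.

```python
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def transform_code(prefix: str, code: str, slash: str = "per") -> str:
    """Build a variable name from a view prefix and an NHL.com column code.

    slash is "per" (TOI/G) or "or" (S/P); the right choice depends on context.
    """
    s = (
        code.replace("%", "pc")
        .replace("/", slash)
        .replace("+", "plus")
        .replace("-", "minus")
        .replace(" ", "")
    )
    s = "".join(DIGIT_WORDS.get(ch, ch) for ch in s)
    return (prefix + s).lower()
```

For example, `transform_code("bio", "TOI/G")` gives biotoiperg and `transform_code("bio", "S/P", slash="or")` gives biosorp, matching the variable list above.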

Skaters

Bios (bio)

# - Sweater Number 
Player - Player Name 
Team - Team 
Pos - Position 
DOB - Date of Birth 
Birth City - Birth City 
S/P - State/Province 
Ctry - Country 
Ht - Height in Inches 
Wt - Weight in Pounds 
S - Shoots 
Rk - Rookie 
GP - Games Played 
G - Goals 
A - Assists 
Pts - Points 
+/- - Plus/Minus 
PIM - Penalty Minutes 
TOI/G - Time On Ice Per Game 

Summary (sum)

PP - Power Play Goals 
SH - Short Handed Goals 
GW - Game Winning Goals 
GT - Game Tying Goals 
OT - Overtime Goals 
S - Shots 
S% - Shooting Percentage 
Sft/G - Average Shifts Per Game 
FO% - Faceoff Win Percentage 

Assists (ast)

ESA - Even Strength Assists 
SHA - Short Handed Assists 
PPA - Power Play Assists 
HmA - Home Assists 
RdA - Road Assists 
DvA - Own Division Assists 
ODvA - Other Division Assists 
A/G - Average Assists Per Game 

Division Scoring (dsc)

Dv GP - Division Games Played 
Dv G - Division Goals 
Dv A - Division Assists 
Dv P - Division Points 
Dv +/- - Division Plus/Minus 
ODGP - Other Division Games Played 
OD G - Other Division Goals 
OD A - Other Division Assists 
OD P - Other Division Points 
OD+/- - Other Division Plus/Minus 

Faceoff Percentages (fpc)

ESFOW - Even Strength Faceoffs Won 
ESFOL - Even Strength Faceoffs Lost 
PPFOW - Power Play Faceoffs Won 
PPFOL - Power Play Faceoffs Lost 
SHFOW - Shorthanded Faceoffs Won 
SHFOL - Shorthanded Faceoffs Lost 
HFOW - Home Faceoffs Won 
HFOL - Home Faceoffs Lost 
HFO% - Home Faceoff Win Percentage 
RFOW - Road Faceoffs Won 
RFOL - Road Faceoffs Lost 
RFO% - Road Faceoff Win Percentage 
FOW - Total Faceoffs Won 
FOL - Total Faceoffs Lost 
Tot - Total Faceoffs Taken 
FO% - Faceoff Win Percentage 
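The faceoff columns in this view are linked: Tot should equal FOW + FOL and FO% should be FOW as a share of Tot. A sketch that recomputes FO% to sanity-check the scraped values, assuming NHL.com reports it as a percentage rounded to one decimal (hence the 0.05 tolerance).

```python
from typing import Optional


def faceoff_pct(won: int, lost: int) -> Optional[float]:
    """Faceoff win percentage (0-100); None when no faceoffs were taken."""
    taken = won + lost  # corresponds to Tot
    return 100.0 * won / taken if taken else None


def check_reported(reported: float, won: int, lost: int, tol: float = 0.05) -> bool:
    """True if a scraped FO% agrees with FOW and FOL up to rounding."""
    computed = faceoff_pct(won, lost)
    return computed is not None and abs(computed - reported) <= tol
```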

Goals (gls)

ESG - Even Strength Goals 
PPG - Power Play Goals 
SHG - Short Handed Goals 
1G - First Goals 
OTG - Overtime Goals 
GWG - Game Winning Goals 
GTG - Game Tying Goals 
HmG - Home Goals 
RdG - Road Goals 
DvG - Own Division Goals 
ODvG - Other Division Goals 
ENG - Empty Net Goals 
PSG - Penalty Shot Goals 
PST - Penalty Shots Taken 
G/G - Average Goals Per Game 

Home Scoring (hsc)

GP - Home Games Played 
G - Home Goals 
A - Home Assists 
P - Home Points 
+/- - Home Plus/Minus 
PIM - Home Penalty Minutes 
Pen - Home Penalties 
EV TOI/g - Home Avg Even TOI/game 
PP TOI/g - Home Avg PP TOI/game 
SH TOI/g - Home Avg SH TOI/game 
TOI/g - Home Avg TOI/game 
FO - Home Faceoffs Taken 
FOW - Home Faceoffs Won 
FOL - Home Faceoffs Lost 
FO% - Home Faceoff Win % 

Penalties (pen)

Minor - Minor Penalties 
Major - Major Penalties 
Misc - Misconduct Penalties 
G Misc - Game Misconduct Penalties 
Match - Match Penalties 

Plus Or Minus (pom)

Other Div +/- - Other Division Plus/Minus 
Team GF - Team Goals For 
Team PPGF - Team Powerplay Goals For 
Team GA - Team Goals Against 
Team PPGA - Team Powerplay Goals Against 

Points (pnt)

ESP - Even Strength Points 
SHP - Short Handed Points 
ODvP - Other Division Points 
P/G - Average Points Per Game 

Road Scoring (rsc)

GP - Road Games Played 
G - Road Goals 
A - Road Assists 
P - Road Points 
+/- - Road Plus/Minus 
PIM - Road Penalty Minutes 
Pen - Road Penalties 
EV TOI/g - Road Avg Even TOI/game 
PP TOI/g - Road Avg PP TOI/game 
SH TOI/g - Road Avg SH TOI/game 
TOI/g - Road Avg TOI/game 
FO - Road Faceoffs Taken 
FOW - Road Faceoffs Won 
FOL - Road Faceoffs Lost 
FO% - Road Faceoff Win % 

Real Time Stats (rts)

Hits - Hits 
BkS - Blocked Shots 
MsS - Missed Shots 
GvA - Giveaways 
TkA - Takeaways 
%Tm - % of Team Faceoffs Taken 
Shots - Shots 
% - Shooting Percentage 

Special Teams (spc)

G - Power Play Goals 
A - Power Play Assists 
P - Power Play Points 

Shooting (sho)

S/G - Average Shots Per Game 
PS Goals - Penalty Shot Goals 
PS Taken - Penalty Shots Taken 

Shootouts (sou)

Player - Player Name 
Team - Team 
Pos - Position 
Home: S - Home Shots 
G - Home Goals 
S% - Home Shooting Percentage 
GDG - Home Game-Deciding Goals 
Road: S - Road Shots 
G - Road Goals 
S% - Road Shooting Percentage 
GDG - Road Game-Deciding Goals 
GDG - Total Game-Deciding Goals

Goalies

Bios (gbio or bio, when common)

# - Sweater Number
Player - Player Name
Team - Team
DOB - Date of Birth
Birth City - Birth City
S/P - State/Province
Ctry - Country
Ht - Height in Inches
Wt - Weight in Pounds
C - Catches
Rk - Rookie
GP - Games Played
W - Wins
L - Losses
T - Ties
OT - OT and/or S/O Losses
GAA - Goals Against Average
Sv% - Save Percentage
SO - Shutouts

Summary (gsum or bio/sum, when common)

GS - Games Started
SA - Shots Against
GA - Goals Against
GAA - Goals Against Average
Sv - Saves
G - Goals
A - Assists
PIM - Penalty Minutes
TOI - Time On Ice
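The goalie summary columns are related by standard definitions: Sv = SA - GA, Sv% = Sv / SA, and GAA is goals against per 60 minutes played. A sketch for cross-checking the scraped values, assuming TOI has been converted to seconds and Sv% is kept as a fraction (as NHL.com displays it, e.g. .915).

```python
from typing import Optional


def saves(shots_against: int, goals_against: int) -> int:
    return shots_against - goals_against


def save_pct(shots_against: int, goals_against: int) -> Optional[float]:
    """Sv% as a fraction of shots against."""
    if not shots_against:
        return None
    return saves(shots_against, goals_against) / shots_against


def goals_against_average(goals_against: int, toi_secs: int) -> Optional[float]:
    """GAA: goals against per 60 minutes (3600 seconds) of ice time."""
    return goals_against * 3600 / toi_secs if toi_secs else None
```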

Special Teams (sav)

Even: SA - Even Strength Shots Against
GA - Even Strength Goals Against
Sv - Even Strength Saves
Sv% - Even Strength Save Percentage
Power Play: SA - Power Play Shots Against
GA - Power Play Goals Against
Sv - Power Play Saves
Sv% - Power Play Save Percentage
Shorthanded: SA - Short Handed Shots Against
GA - Short Handed Goals Against
Sv - Short Handed Saves
Sv% - Short Handed Save Percentage

Shootouts (gsou)

Home: W - Home Wins
L - Losses
SA - Shots Against
GA - Goals Against
Sv % - Save Percentage
Road: W - Wins
L - Losses
SA - Shots Against
GA - Goals Allowed
Sv % - Save Percentage

Other Data

Arbitration data

All I have so far is:

And some articles/blog posts: