NHL
Old Material
Downloading PostgreSQL on Mac
Download package from:
http://www.enterprisedb.com/products-services-training/pgdownload#osx
Follow the instructions given on the website. Macs already come with Perl. Using the Stack Builder application, which is downloaded through the same link, download the PL/Perl package.
Variables
List of necessary variables and where to find them in the Dropbox.
For all skaters we need:
NHLIDDetails.txt (likely a file we generate): ID (int), Playername from NHL, Playername from CapGeek, Playername from GeneralFanager, DOB (transform to ISO 8601)
NHLHistoric_Player_summary.txt & NHLPlayer_summary.txt (historic data set includes NHL Player summary except for two games of 2013-2014 season): Playername, Current Team (string), Position (F, D), season (YYYY), goals (int), TOI (float)
NHLPlayer_points.txt: Playername, DOB, PPG (float)
NHLPlayer_bios.txt: playername, dob, game type (overtime or no overtime), weight (int), height (int), age (int, calculated from DOB)
NHLPlayer_faceOffPercentageAll.txt: playername, face-off wins (int)
Capgeek_10_processed-notepad.txt: playername, dob, salary (int), length (int), contract start date (MM/DD/YYYY), contract type (EL, RFA, UFA, TFP), caphit (int)
In a separate table: Year and CPI (2010 Base Year)
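The date fields listed above need to end up in ISO 8601 form. The following is a minimal sketch of that transformation in Perl, assuming the source dates arrive as MM/DD/YYYY (the format given for the contract start date; the DOB source format is an assumption). The function name to_iso8601 is hypothetical and not part of the project.

```perl
use strict;
use warnings;

# Convert a MM/DD/YYYY date string to ISO 8601 (YYYY-MM-DD).
# Returns undef (or an empty list) when the input does not match.
# Assumption: input dates use MM/DD/YYYY; adjust the pattern otherwise.
sub to_iso8601 {
    my ($date) = @_;
    return unless defined $date;

    # Capture month, day, and year; allow one- or two-digit month and day.
    my ($month, $day, $year) = $date =~ m{^\s*(\d{1,2})/(\d{1,2})/(\d{4})\s*$}
        or return;

    # Coarse range check only; it does not validate days per month.
    return if $month < 1 || $month > 12 || $day < 1 || $day > 31;

    return sprintf('%04d-%02d-%02d', $year, $month, $day);
}

print to_iso8601('7/4/2013'), "\n";    # prints 2013-07-04
```

Returning nothing for unrecognized input lets the caller decide whether to skip the row or flag it for manual cleanup.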
Next Tasks
Spec General Fanager!
General Fanager Webcrawler
The Perl libraries I used to create this web crawler are:
use strict; use LWP::Simple; use HTML::Tree;
The LWP::Simple library makes it easy to pull the HTML from the website:
my $content = get('your URL as a string here');  # get returns undef if the request fails
The URL used to access the General Fanager page containing data from all the players is http://www.generalfanager.com/players.
The HTML::Tree library then lets us parse the HTML into a more accessible tree structure. Call eof after parse so the parser knows the input is complete:
my $tree = HTML::Tree->new(); $tree->parse($content); $tree->eof();
With the HTML parsed, we can look down the tree to find what we are searching for.
$tree->look_down('_tag', 'tag of what you are looking for here');
In list context, this returns an array in which each element is the HTML subtree rooted at a matching tag. I used the table tag because it was the most specific tag above the player stats, and put the results into the @tables variable. To access the data of each individual player, you must then look inside the @tables variable
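As a rough illustration of looking inside @tables, the sketch below walks each table's rows and collects the text of its cells. It uses HTML::TreeBuilder, the parser class from the HTML-Tree distribution, and runs against a small inline sample instead of the live page. The sample markup, the player names and values, and the one-player-per-row layout are all assumptions; the real General Fanager page may be structured differently.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for the downloaded page.
my $content = <<'HTML';
<html><body>
<table>
  <tr><th>Player</th><th>Cap Hit</th></tr>
  <tr><td>Example Skater</td><td>1000000</td></tr>
  <tr><td>Another Skater</td><td>2500000</td></tr>
</table>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new();
$tree->parse($content);
$tree->eof();    # signal end of input so the tree is finalized

# Every <table> element found anywhere in the document.
my @tables = $tree->look_down('_tag', 'table');

my @players;
for my $table (@tables) {
    for my $row ($table->look_down('_tag', 'tr')) {
        # Collect the trimmed text of each data cell in this row.
        my @cells = map { $_->as_trimmed_text } $row->look_down('_tag', 'td');
        next unless @cells;    # skip header rows, which hold only <th> cells
        push @players, \@cells;
    }
}

# One tab-separated line per player row.
print join("\t", @$_), "\n" for @players;

$tree->delete;    # release the tree's memory when finished
```

Storing each row as an array reference keeps the cell order intact, so the columns can later be mapped onto the variables listed earlier once the actual column layout is known.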