
4,687 bytes added, 17:34, 18 March 2016
to store all the data from each player. For players without their own page, I found their data at the following locations:
foreach my $player (@{$tables[0]->{_content}[1]->{_content}}){
    $name = $player->{_content}[0]->{_content}[0];
    $position = $player->{_content}[1]->{_content}[0];
    $age = $player->{_content}[2]->{_content}[0];
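To make the `{_content}` indexing concrete, here is a minimal, self-contained sketch. It does not fetch anything: it builds two hypothetical table rows by hand, mimicking the internal hash layout of HTML::Element (a `_tag`, attributes such as `href` as plain keys, and a `_content` array of child elements or text strings). The player names and numbers are made up, and the `[position, age, link]` slot layout for `%playerdict` is an assumption; only "slot 2 holds the link" matches how the tutorial later reads `%playerdict`.

```perl
use strict;
use warnings;

# Hypothetical mock rows that mimic HTML::Element's internal layout.
# Made-up data; no page is fetched.
my @rows = (
    { _tag => 'tr', _content => [
        { _tag => 'td', _content => ['John Doe'] },          # name as plain text
        { _tag => 'td', _content => ['D'] },
        { _tag => 'td', _content => ['24'] },
    ] },
    { _tag => 'tr', _content => [
        { _tag => 'td', _content => [
            { _tag => 'a', href => '/players/1234', _content => ['Jane Roe'] },
        ] },                                                  # name inside a link
        { _tag => 'td', _content => ['C'] },
        { _tag => 'td', _content => ['27'] },
    ] },
);

my %playerdict;
foreach my $player (@rows) {
    my $first    = $player->{_content}[0]->{_content}[0];
    my $position = $player->{_content}[1]->{_content}[0];
    my $age      = $player->{_content}[2]->{_content}[0];
    my ($name, $link);
    if (ref $first) {
        # First cell holds an <a> element: the player has a page.
        $name = $first->{_content}[0];
        $link = $first->{href};
    }
    else {
        # First cell is plain text: no player page.
        $name = $first;
    }
    # Assumed slot layout [position, age, link]; the link stays undef without a page.
    $playerdict{$name} = [$position, $age, $link];
}

for my $name (sort keys %playerdict) {
    my ($pos, $age, $link) = @{ $playerdict{$name} };
    print "$name: $pos, $age, ", (defined $link ? $link : 'no page'), "\n";
}
```

This prints `Jane Roe: C, 27, /players/1234` and `John Doe: D, 24, no page`. Branching on `ref` is just one way to tell the two row shapes apart. HTML::Element also provides public accessors such as `content_list` and `attr('href')`, which avoid depending on the `_content` internals; the hash walk above simply mirrors the approach used in this tutorial.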
For players with their own page, the position and age are in the same place, but the name and the link to their page are found elsewhere:
    $name = $player->{_content}[0]->{_content}[0]->{_content}[0];
    $link = $player->{_content}[0]->{_content}[0]->{href};

The link should be of the form /players/playerid#, to which you can prepend http://www.generalfanager.com to get <nowiki>http://www.generalfanager.com/players/playerid#</nowiki>, the link to that player's page. With that link you can use the same method described above to pull the HTML from the player's page and parse it into a tree structure. To do this, I looped through all the players in %playerdict, skipping any player without their own page:

foreach my $loopplayer (keys %playerdict){
    if ($playerdict{$loopplayer}[2]) {

I constructed the URL for each player with the following line; it should produce a URL of the form shown above:

        my $playerurl = "http://www.generalfanager.com" . $playerdict{$loopplayer}[2];

In the same way, I fetched the HTML from that URL and parsed it into the variable $playertree. I found the player's team and birth date at the following locations:

        my $teamstring = $playertree->{_content}[1]->{_content}[4]->{_content}[1]->{_content}[0]->{_content}[1]->{_content}[0]->{_content}[0]->{href};
        my $birthstring = $playertree->{_content}[1]->{_content}[4]->{_content}[1]->{_content}[0]->{_content}[1]->{_content}[1]->{_content}[0];

I then cleaned up the strings using regexes, removing unnecessary text and the spaces before and after the values:

        $teamstring =~ s/\/teams\///;      # drop the /teams/ prefix
        $teamstring =~ s/-/ /g;            # hyphens become spaces
        $teamstring =~ s/^\s+|\s+$//g;     # trim leading/trailing spaces
        $birthstring =~ s/Birthdate:\s//;  # drop the label
        $birthstring =~ s/^\s+|\s+$//g;    # trim leading/trailing spaces

Now, by looking down $playertree for tables, we should find each contract as its own table. I placed them into the array @playertables.
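The URL building and string clean-up can be tried offline with a small sketch like the one below. Everything here is hypothetical: the `%playerdict` entries (with the assumed `[position, age, link]` layout), the team slug `/teams/toronto-maple-leafs`, and the birth date text are made-up samples chosen to fit the `/teams/` prefix and `Birthdate:` label that the regexes above strip. The helper names `clean_team` and `clean_birthdate` are illustrative only.

```perl
use strict;
use warnings;

my $base = 'http://www.generalfanager.com';

# Hypothetical entries; slot 2 is the link, undef when there is no page.
my %playerdict = (
    'Jane Roe' => ['C', 27, '/players/1234'],
    'John Doe' => ['D', 24, undef],
);

# Build a full URL only for players that have their own page.
my %playerurl;
foreach my $loopplayer (keys %playerdict) {
    my $link = $playerdict{$loopplayer}[2];
    next unless $link;
    $playerurl{$loopplayer} = $base . $link;
}

# Hypothetical helper: turn a team href into a readable team name.
sub clean_team {
    my ($teamstring) = @_;
    $teamstring =~ s{/teams/}{};      # drop the path prefix
    $teamstring =~ s/-/ /g;           # hyphens between words become spaces
    $teamstring =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace
    return $teamstring;
}

# Hypothetical helper: strip the label and surrounding spaces from a birth date.
sub clean_birthdate {
    my ($birthstring) = @_;
    $birthstring =~ s/Birthdate:\s//;
    $birthstring =~ s/^\s+|\s+$//g;
    return $birthstring;
}

print "$_ => $playerurl{$_}\n" for sort keys %playerurl;
print clean_team('/teams/toronto-maple-leafs'), "\n";
print clean_birthdate('  Birthdate: March 4, 1990  '), "\n";
```

Replacing hyphens and trimming in separate substitutions keeps each regex doing one job, so a stray space in the input is removed rather than turned into another space.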
Due to the irregular structure of the webpage, I also had to look down the tree for the "contract_source" elements:

        my @contract_sources = $playertree->look_down('class', 'contract_source');

I then matched each source with its contract using an index and began to loop through the contracts:

        my $contidx = 0;
        foreach my $contract (@playertables){

Inside the loop, I found the cap hit, AAV, total value, contract length and expiry status, and cleaned up the data with more regexes:

            my $caphit = $contract->{_content}[1]->{_content}[0]->{_content}[0];
            $caphit =~ s/Cap Hit:\s\$|,|^\s+|\s+$//g;
            my $aav = $contract->{_content}[1]->{_content}[0]->{_content}[2];
            $aav =~ s/AAV:\s\$|,|^\s+|\s+$//g;
            my $totalvalue = $contract->{_content}[1]->{_content}[0]->{_content}[4];
            $totalvalue =~ s/Total Value:\s\$|,|^\s+|\s+$//g;
            my $contlength = $contract->{_content}[1]->{_content}[1]->{_content}[0];
            $contlength =~ s/Length:\s|\syears|^\s+|\s+$//g;
            my $expirystatus = $contract->{_content}[1]->{_content}[1]->{_content}[4];
            $expirystatus =~ s/Expiry Status:\s|^\s+|\s+$//g;

To get the source, I used the conditional statements below and then cleaned up the source with a regex:

            my $source;
            if ((ref $contract_sources[$contidx]->{_content}[0] eq "HTML::Element")
                or ($contract_sources[$contidx]->{_content}[0] eq " ")) {
                $contidx++;
            }
            if (not $contract_sources[$contidx]->{_content}[1]) {
                $source = $contract_sources[$contidx]->{_content}[0];
                unless (ref $source eq "") {
                    $source = $source->{_content}[0];
                }
            }
            elsif ($contract_sources[$contidx]->{_content}[1]->{_content}) {
                $source = $contract_sources[$contidx]->{_content}[1]->{_content}[0];
            }
            else {
                $source = $contract_sources[$contidx]->{_content}[0];
            }
            $source =~ s/\s+Source:\s+|^\s+//;
            $contidx++;

These conditionals ensured that each contract was paired with its correct source.

Finally, I got the year, salary and bonuses for each year of the contract, skipped any table rows that did not hold useful information, and cleaned up the numbers using regexes:

            for (my $row = 3; $row < scalar(@{$contract->{_content}}) - 1; $row++){
                unless ((ref $contract->{_content}[$row]->{_content}[0]->{_content}[0] eq "HTML::Element")
                        or not (ref $contract->{_content}[$row]->{_content}[1] eq "HTML::Element")){
                    my $year = $contract->{_content}[$row]->{_content}[0]->{_content}[0];
                    $year =~ s/-\d+|^\s+|\s+$//g;
                    my $nhlsalary = $contract->{_content}[$row]->{_content}[1]->{_content}[0];
                    $nhlsalary =~ s/[^\d]//g;
                    my $perfbonus = $contract->{_content}[$row]->{_content}[3]->{_content}[0];
                    $perfbonus =~ s/[^\d]//g;
                    my $signbonus = $contract->{_content}[$row]->{_content}[4]->{_content}[0];
                    $signbonus =~ s/[^\d]//g;

With all that, each piece of data I was looking for is now in its own variable, ready to use however you want.
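The clean-up regexes for the contract summary and the yearly rows can be checked on their own with the sketch below. It skips the tree navigation entirely and feeds the same substitution patterns a set of hypothetical cell texts: the dollar amounts, the `5 years` length, the `UFA` status and the `2015-16` season are made-up samples shaped after the labels the regexes strip, not data taken from the site.

```perl
use strict;
use warnings;

# Hypothetical raw cell texts (made-up values).
my %raw = (
    caphit       => ' Cap Hit: $5,250,000 ',
    aav          => ' AAV: $5,250,000 ',
    totalvalue   => ' Total Value: $26,250,000 ',
    contlength   => ' Length: 5 years ',
    expirystatus => ' Expiry Status: UFA ',
);

# The same substitution patterns used in the tutorial, one per field.
my %pattern = (
    caphit       => qr/Cap Hit:\s\$|,|^\s+|\s+$/,
    aav          => qr/AAV:\s\$|,|^\s+|\s+$/,
    totalvalue   => qr/Total Value:\s\$|,|^\s+|\s+$/,
    contlength   => qr/Length:\s|\syears|^\s+|\s+$/,
    expirystatus => qr/Expiry Status:\s|^\s+|\s+$/,
);

my %clean;
for my $field (sort keys %raw) {
    my $re    = $pattern{$field};
    my $value = $raw{$field};
    $value =~ s/$re//g;    # strip label, dollar sign, commas and outer spaces
    $clean{$field} = $value;
}

# Hypothetical cells from one yearly row: season, NHL salary,
# performance bonus, signing bonus.
my ($year, $nhlsalary, $perfbonus, $signbonus) =
    (' 2015-16 ', ' $5,000,000 ', ' $0 ', ' $250,000 ');
$year =~ s/-\d+|^\s+|\s+$//g;                        # '2015-16' becomes '2015'
s/[^\d]//g for $nhlsalary, $perfbonus, $signbonus;   # keep digits only

print "$_: $clean{$_}\n" for sort keys %clean;
print "$year: salary $nhlsalary, performance bonus $perfbonus, signing bonus $signbonus\n";
```

Keeping the patterns in a hash is only a convenience for testing them side by side; in the scraper itself each pattern is applied right after its value is read from the tree.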
