2,005 bytes added, 16:45, 25 January 2016
The goal of the Govtrack Webcrawler is to create an automated system in Perl that pulls bills relevant to a certain topic from the Govtrack API, which can be found [https://www.govtrack.us/api/v2/bill?congress=114&order_by=-current_status_date here].

==Process==

To perform this task, several libraries are used. Most of them come with [http://www.activestate.com/activeperl ActivePerl]: [http://search.cpan.org/~ether/libwww-perl-6.15/lib/LWP/UserAgent.pm LWP::UserAgent] and [http://search.cpan.org/~ether/HTTP-Message-6.11/lib/HTTP/Request.pm HTTP::Request] are used to pull data from the API. We also use [http://search.cpan.org/~mlehmann/JSON-XS-3.01/XS.pm JSON::XS] to make parsing the JSON data simpler; the JSON module loaded below uses JSON::XS as its backend when it is installed.
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use JSON;    # picks JSON::XS as the backend when it is available
Next, the user agent object is created.
my $ua = LWP::UserAgent->new;
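The constructor also accepts options. A minimal sketch, assuming we want the crawler to identify itself and to give up on a stalled request: the agent string "GovtrackWebcrawler/0.1" and the 30-second timeout are hypothetical values, not part of the original script.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical configuration: an identifying agent string and a 30-second
# timeout so a slow API response cannot hang the crawler indefinitely.
# The trailing space makes LWP append its default "libwww-perl/x.y" token.
my $ua = LWP::UserAgent->new(
    agent   => 'GovtrackWebcrawler/0.1 ',
    timeout => 30,
);

print $ua->agent, "\n";
```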
Now the parameters used to search the API are chosen. Currently we are searching for the 107 bills related to entrepreneurship in the 114th Congress, so the result limit is set to 107.
my $queryName = "Entrepreneurship";
my $congressNo = "114";
my $limit = "107";
Using these parameters, the URL can be constructed.
my $genUrl = "https://www.govtrack.us/api/v2/bill?order_by=-current_status_date&congress=". $congressNo."&q=".$queryName."&limit=".$limit;
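Plain string concatenation works for a one-word query such as "Entrepreneurship", but a topic containing spaces or "&" would produce a broken URL. A minimal sketch, assuming the URI module (installed as a dependency of LWP::UserAgent) is available, builds the same URL with every value percent-encoded; build_search_url is a hypothetical helper name.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: build the Govtrack bill-search URL with URI so that
# each query value is escaped (spaces become "+", "&" becomes "%26").
sub build_search_url {
    my ($queryName, $congressNo, $limit) = @_;
    my $uri = URI->new('https://www.govtrack.us/api/v2/bill');
    # query_form keeps the parameters in the order given here.
    $uri->query_form(
        order_by => '-current_status_date',
        congress => $congressNo,
        q        => $queryName,
        limit    => $limit,
    );
    return $uri->as_string;
}

print build_search_url('Entrepreneurship', '114', '107'), "\n";
```

For a single-word query this yields the same URL as the concatenated version; the escaping only matters once multi-word topics are searched.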
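The user agent object can now retrieve the page. decoded_content returns the response body as text (undoing any compression and character-set encoding), and the JSON module then parses that text into a Perl data structure.
my $genResponse = $ua->get($genUrl);
die "Request failed: ", $genResponse->status_line, "\n" unless $genResponse->is_success;
my $genContent = $genResponse->decoded_content;
my $genData = JSON->new->decode($genContent);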
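A minimal sketch of walking the decoded search results, assuming the response stores its bills in an "objects" array whose entries carry an "id" field, and that each bill's specific page lives at /api/v2/bill/ID. Both are assumptions about the Govtrack API layout, and bill_ids, bill_detail_url and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Assumption: the decoded search response holds its results in an "objects"
# array, one hash per bill, each with an "id" field.
sub bill_ids {
    my ($data) = @_;
    return map { $_->{id} } @{ $data->{objects} || [] };
}

# Assumption: each bill's specific page lives at /api/v2/bill/<id>.
sub bill_detail_url {
    my ($id) = @_;
    return "https://www.govtrack.us/api/v2/bill/$id";
}

# Hypothetical, trimmed-down stand-in for the decoded search response.
my $sampleData = {
    meta    => { total_count => 2 },
    objects => [
        { id => 101, title => 'Bill A' },
        { id => 202, title => 'Bill B' },
    ],
};

# Print the bill-specific URL for every bill found in the search results.
print bill_detail_url($_), "\n" for bill_ids($sampleData);
```

Each URL printed this way can be fetched with the same $ua->get call used for the search page.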
Once the JSON data has been retrieved, the bill IDs can be located. Each ID leads to another page with more specific information about that bill. The tags on this bill-specific page are used to determine whether the bill is relevant and should be reviewed by McNair Center staff. The tags currently considered relevant are:

*Commerce: ID 5914
*Business Investment and Capital: ID 5918
*Small Business: ID 5935
*Small Business Administration: ID 6769
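The relevance check can be sketched as a lookup against the tag IDs listed above. This assumes a decoded bill-specific page lists its tags in a "terms" array of hashes with an "id" field; that field name is an assumption, and relevant_tags_for, is_relevant and the sample bills are hypothetical.

```perl
use strict;
use warnings;

# Tag IDs currently treated as relevant (from the list above).
my %relevant_tags = (
    5914 => 'Commerce',
    5918 => 'Business Investment and Capital',
    5935 => 'Small Business',
    6769 => 'Small Business Administration',
);

# Assumption: a decoded bill page lists its tags in a "terms" array of
# hashes with an "id" field. Returns the names of any relevant tags found.
sub relevant_tags_for {
    my ($bill) = @_;
    return map  { $relevant_tags{ $_->{id} } }
           grep { exists $relevant_tags{ $_->{id} } }
           @{ $bill->{terms} || [] };
}

# A bill should be reviewed if it carries at least one relevant tag.
sub is_relevant {
    my ($bill) = @_;
    my @found = relevant_tags_for($bill);
    return @found ? 1 : 0;
}

# Hypothetical decoded bill pages, trimmed to the fields used here.
my @sampleBills = (
    { id => 101, terms => [ { id => 5935, name => 'Small Business' } ] },
    { id => 202, terms => [ { id => 1234, name => 'Other topic' } ] },
);

for my $bill (@sampleBills) {
    my @found = relevant_tags_for($bill);
    printf "%d: %s\n", $bill->{id}, @found ? join(', ', @found) : 'not relevant';
}
```

Keeping the IDs in a hash makes adding or dropping a relevant tag a one-line change, and each lookup is constant time.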