Difference between revisions of "USITC"
Revision as of 15:11, 14 September 2017
USITC | |
---|---|
Project Information | |
Project Title | USITC Data |
Owner | Harrison Brown |
Start Date | 9/11/2017 |
Deadline | |
Primary Billing | |
Notes | In Progress |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved. |
Files
This is where the files will go.
The files are in:
E:\McNair\Projects\USITC
The results file is a CSV of the data that I have been able to scrape from the HTML of https://www.usitc.gov/secretary/fed_reg_notices/337.htm
For every notice, there is a line in the CSV file that contains the Investigation Title, Investigation No., link to the PDF on the website, Notice description, and the date the notice was issued.
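As a rough illustration of that row layout, the sketch below writes one CSV line per notice using Python's standard csv module. The column names, the write_notices helper, and the sample record are all assumptions made up for this example; the actual results file may use different headers and the real scraper may be structured differently.

```python
import csv
import io

# Hypothetical column names; the real results file may use different headers.
FIELDS = [
    "investigation_title",
    "investigation_no",
    "pdf_link",
    "description",
    "date_issued",
]


def write_notices(notices, out):
    """Write a header row, then one CSV line per notice dict."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for notice in notices:
        writer.writerow(notice)


# Made-up placeholder record for illustration only, not real USITC data.
sample = [
    {
        "investigation_title": "Certain Example Widgets",
        "investigation_no": "337-TA-0000",
        "pdf_link": "https://example.com/notice.pdf",
        "description": "Notice of institution",
        "date_issued": "2017-09-14",
    }
]

# Write to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
write_notices(sample, buffer)
csv_text = buffer.getvalue()
```

Using csv.DictWriter keeps the column order fixed and quotes any titles or descriptions that contain commas.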
Status
Check my work log to see what I have done on a day-to-day basis.
Currently, the web scraper gathers all of the data that is available in the HTML. There are a few cases where the Investigation Number is not listed; I need to test for those and handle them in the code.
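One possible way to test for the missing Investigation Numbers is to scan the results CSV for rows where that field is blank. This is only a sketch: the rows_missing_number helper, the investigation_no column name, and the sample rows are hypothetical and would need to be adapted to the real file.

```python
import csv
import io


def rows_missing_number(csv_file, column="investigation_no"):
    """Return the rows whose investigation number is absent or blank."""
    reader = csv.DictReader(csv_file)
    # row.get() may return None for short rows, so fall back to "".
    return [row for row in reader if not (row.get(column) or "").strip()]


# Made-up sample data for illustration only, not real USITC records.
sample_csv = io.StringIO(
    "investigation_title,investigation_no,pdf_link\n"
    "Certain Example Widgets,337-TA-0000,https://example.com/a.pdf\n"
    "Certain Example Gadgets,,https://example.com/b.pdf\n"
)

missing = rows_missing_number(sample_csv)
for row in missing:
    print("No investigation number:", row["investigation_title"])
```

Rows flagged this way could then be reviewed by hand or passed to a fallback step in the scraper.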
The next step will be to parse the PDFs.