Difference between revisions of "Collecting SBIR Data"
Revision as of 12:21, 20 June 2017
Collecting SBIR Data | |
---|---|
Project Information | |
Project Title | Collecting SBIR Data |
Owner | Adrian Smart |
Start Date | June 6, 2017 |
Deadline | |
Keywords | Data, Tool |
Primary Billing | |
Notes | |
Has project status | Active |
Copyright © 2016 edegan.com. All Rights Reserved. |
Manual Collection
Files are in:
E:\McNair\Projects\SBIR
Each file is a group of 1000 companies. Each group of 1000 is numbered sequentially.
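The grouping described above can be sketched as follows. This is only an illustration: the function name `group_companies`, the placeholder company names, and the choice to start numbering at 1 are assumptions, not details taken from the files in the project folder.

```python
def group_companies(companies, group_size=1000):
    """Yield (group_number, group) pairs for consecutive slices of companies."""
    for start in range(0, len(companies), group_size):
        # Assumption: groups are numbered from 1; the real files may start at 0.
        group_number = start // group_size + 1
        yield group_number, companies[start:start + group_size]


# Hypothetical input: 2,500 placeholder company names.
companies = [f"company_{i}" for i in range(2500)]
groups = list(group_companies(companies))
```

With 2,500 companies this produces three groups, the last of which is only partly full.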
Rough notes
- Get the data from https://www.sbir.gov/sbirsearch/award/all
- Built a Selenium WebDriver script, which is stored in E:\McNair\Software\Scripts\Selenium Web Drivers
- The script does not work because a CAPTCHA must be entered after selecting the XLS download
Notes on building a Selenium Web Driver:
In your Python script:
- Make sure that you set the chromedriver path explicitly if chromedriver is not under root. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")
- Use driver.find_element_by_xpath to select an element on the HTML page (Selenium 4 replaces this call with driver.find_element(By.XPATH, ...)). The function takes the element's XPath as its argument, so first load the website in a browser.
- Next, right-click the page element whose XPath you want and select Inspect. This launches the HTML inspector and highlights the relevant lines of code.
- Right-click the line that looks like the right piece of code and select "Copy XPath".
- Paste the copied XPath into your Python script where a path is expected. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")
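The steps above can be put together in a short script. This is a minimal sketch, assuming Selenium 4, where `driver.find_element(By.XPATH, ...)` and `Service` take the place of the older `find_element_by_xpath` call and the positional chromedriver path used in these notes. The helper names `xpath_for_id` and `click_element` are hypothetical, and the sketch does nothing about the CAPTCHA mentioned earlier.

```python
def xpath_for_id(element_id):
    """Build an XPath that matches any element with the given id attribute."""
    return f"//*[@id='{element_id}']"


def click_element(chromedriver_path, url, xpath):
    """Open url in Chrome, click the element at xpath, then close the browser."""
    # Imported here so xpath_for_id can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Point Selenium at the chromedriver binary explicitly.
    driver = webdriver.Chrome(service=Service(chromedriver_path))
    try:
        driver.get(url)
        # Selenium 4 spelling of driver.find_element_by_xpath(xpath).
        driver.find_element(By.XPATH, xpath).click()
    finally:
        # Always close the browser, even if the element is not found.
        driver.quit()


# The id below is the one copied from the inspector in the notes above.
button_xpath = xpath_for_id("solr-print-dropdown-button")
```

A call might then look like `click_element("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver", "https://www.sbir.gov/sbirsearch/award/all", button_xpath)`. It needs Chrome and a matching chromedriver installed, so it is left out of the block itself.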