Difference between revisions of "Collecting SBIR Data"
Latest revision as of 12:41, 21 September 2020
Project Information
Property | Value
---|---
Has title | Collecting SBIR Data
Has owner | Adrian Smart
Has start date | June 6, 2017
Has deadline date |
Has keywords | Data, Tool
Has project status | Complete
Does subsume | SBIR Evaluation
Has sponsor | McNair Center
Has project output | Data
Manual Collection
Files are in:
E:\McNair\Projects\SBIR
Each file holds a group of 1000 companies, and the files are numbered sequentially.
Rough notes
- Get the data from https://www.sbir.gov/sbirsearch/award/all
- Built a Selenium WebDriver script, which is stored in E:\McNair\Software\Scripts\Selenium Web Drivers
- The script does not work, because a captcha must be entered after selecting the xls download
Notes on building a Selenium Web Driver:
In your python script:
- Make sure that you properly set the chromedriver path if you don't have it under root. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")
- Use driver.find_element_by_xpath to select an element on the HTML page. This function takes the element's XPath as its argument, so first load the website in a browser.
- Next, right-click the page element whose XPath you want and select Inspect. This launches the HTML inspector and highlights the relevant lines of code.
- Right-click the highlighted line that matches the element and select "Copy xpath data".
- Paste the copied XPath into your Python script wherever a path is required. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")
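The steps above can be sketched as a small script. This is only an illustration, not the script stored in the Selenium Web Drivers folder: the helper names start_chrome and click_by_xpath are hypothetical, and it assumes Selenium 4, where the find_element_by_xpath shortcut has been removed in favor of driver.find_element(By.XPATH, ...).

```python
# A minimal sketch, assuming Selenium 4 is installed (pip install selenium).
# start_chrome and click_by_xpath are hypothetical helper names.

XPATH = "xpath"  # the same string as selenium.webdriver.common.by.By.XPATH


def start_chrome(chromedriver_path=None):
    """Start Chrome; pass a path only if chromedriver is not found automatically."""
    # Imported here so click_by_xpath can be used without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if chromedriver_path is None:
        return webdriver.Chrome()
    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def click_by_xpath(driver, xpath):
    """Locate a single element by its XPath, click it, and return it."""
    element = driver.find_element(XPATH, xpath)
    element.click()
    return element
```

A typical session would call driver = start_chrome(), then driver.get("https://www.sbir.gov/sbirsearch/award/all"), then click_by_xpath(driver, "//*[@id='solr-print-dropdown-button']"), and finally driver.quit(). The captcha noted above would still block the xls download itself.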
SBIR Concatenation
Objective
The objective of this project was to concatenate 162 xlsx files into one large tab-delimited text file.
Script
The Python script can be found here:
E:\McNair\Projects\SBIR\concat_excel.py
The resulting file is located here:
E:\McNair\Projects\SBIR\SBIR.txt
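The contents of concat_excel.py are not reproduced on this page, so the following is only a sketch of how such a concatenation might be done, not the actual script. It assumes the third-party openpyxl package, that the data sits on the first worksheet of each file, and that every file starts with the same header row. The function names read_xlsx_rows and concat_to_tsv are hypothetical.

```python
import csv


def read_xlsx_rows(path):
    """Yield each row of the workbook's active (first) sheet as a tuple of values."""
    # Third-party dependency (pip install openpyxl), imported here so that
    # concat_to_tsv can also be used with a different row reader.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True)
    try:
        yield from workbook.active.iter_rows(values_only=True)
    finally:
        workbook.close()


def concat_to_tsv(paths, out_path, read_rows=read_xlsx_rows):
    """Write the rows of every input file to one tab-delimited text file.

    Assumes each file begins with the same header row; the header is written
    once and skipped in later files. Returns the number of data rows written.
    """
    header_written = False
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        for path in paths:
            rows = iter(read_rows(path))
            first = next(rows, None)
            if first is None:
                continue  # empty file, nothing to copy
            if not header_written:
                writer.writerow(first)
                header_written = True
            for row in rows:
                writer.writerow(row)  # empty cells (None) become empty fields
                written += 1
    return written
```

With the paths given above, a call might look like concat_to_tsv(sorted(Path(r"E:\McNair\Projects\SBIR").glob("*.xlsx")), r"E:\McNair\Projects\SBIR\SBIR.txt"), with Path imported from pathlib. Note that sorted orders file names as text, so the rows only follow the numeric file order if the numbers are zero-padded.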