667 bytes added , 13:41, 21 September 2020

{{Project

|Has project output=Data

|Has sponsor=McNair Center

|Has title=Collecting SBIR Data

|Has owner=Adrian Smart,

|Has start date=June 6, 2017

|Has keywords=Data, Tool

|Has project status=Complete

|Does subsume=SBIR Evaluation,

}}

==Manual Collection==

Files are in:

E:\McNair\Projects\SBIR

Each file is a group of 1000 companies. Each group of 1000 is numbered sequentially.

==Rough notes==

*Get the data from https://www.sbir.gov/sbirsearch/award/all

*Built a Selenium Web Driver which is stored in E:\McNair\Software\Scripts\Selenium Web Drivers

*Does not work because there is a captcha that must be entered after selecting xls download

==Notes on building a Selenium Web Driver==

In your Python script:

*Make sure that you properly set the chromedriver path if you don't have it under root. For example: webdriver.Chrome("/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver")

*Use driver.find_element_by_xpath to select an element on the HTML page. You will need to pass the element's XPath to this function, so first load the website in a browser.

*Next, right click on the page element whose XPath you want and select Inspect. This will launch the HTML inspector and highlight the relevant lines of code.

*Right click on what looks like the right piece of code and select "Copy xpath data"

*Paste the copied XPath into your Python script where the function expects a path. For example: driver.find_element_by_xpath("//*[@id='solr-print-dropdown-button']")
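The steps above can be sketched as a short script. This is a minimal sketch, not the driver stored on the project drive. It assumes Selenium 4, where the chromedriver path is passed through a Service object and elements are located with find_element plus a locator type (the find_element_by_xpath helper used in these notes belongs to older Selenium releases). The helper names make_chrome_driver, click_by_xpath and main are hypothetical, and clicking the located element is an assumption about what the script should do with it.

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def make_chrome_driver(chromedriver_path):
    """Start Chrome with an explicit chromedriver location (Selenium 4 style)."""
    # Imported lazily so click_by_xpath can be exercised without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


def click_by_xpath(driver, xpath):
    """Locate one element by XPath, click it, and return it."""
    element = driver.find_element(XPATH, xpath)
    element.click()  # assumption: the element is a button that should be clicked
    return element


def main():
    # Path and XPath are the examples given in the notes above.
    driver = make_chrome_driver(
        "/Users/adriansmart/PycharmProjects/SeleniumTest/chromedriver"
    )
    try:
        driver.get("https://www.sbir.gov/sbirsearch/award/all")
        click_by_xpath(driver, "//*[@id='solr-print-dropdown-button']")
    finally:
        driver.quit()  # always close the browser, even if a step fails
```

Calling main() opens the search page and clicks the dropdown button. As noted under Rough notes, the xls download itself is still blocked by the captcha, so this only illustrates the element selection.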

= SBIR Concatenation =

==Objective==

The objective of this project was to concatenate 162 xlsx files into one large tab-delimited text file.

==Script==

The python script can be found here:

E:\McNair\Projects\SBIR\concat_excel.py

The resulting file is located here:

E:\McNair\Projects\SBIR\SBIR.txt
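The contents of concat_excel.py are not reproduced on this page, so the following is only a sketch of one way such a concatenation could work, not the script itself. It assumes every xlsx file keeps its data on the first worksheet with an identical header row, that the files match *.xlsx in the project folder, and that the third-party openpyxl package is installed. The function names read_xlsx_rows, concat_to_tsv and main are hypothetical.

```python
import csv
from pathlib import Path


def read_xlsx_rows(path):
    """Yield each row of the workbook's active worksheet as a list of values."""
    # openpyxl is third-party (pip install openpyxl); imported lazily so that
    # concat_to_tsv can be used on its own.
    from openpyxl import load_workbook

    workbook = load_workbook(path, read_only=True, data_only=True)
    try:
        for row in workbook.active.iter_rows(values_only=True):
            yield list(row)
    finally:
        workbook.close()


def concat_to_tsv(tables, out_path):
    """Write all tables to one tab-delimited file, keeping a single header.

    Each table is an iterable of rows whose first row is the header.
    Returns the number of data rows written.
    """
    header = None
    written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for table in tables:
            rows = iter(table)
            first = next(rows, None)
            if first is None:
                continue  # skip empty tables
            if header is None:
                header = list(first)
                writer.writerow(header)
            elif list(first) != header:
                raise ValueError("header row differs from the first file")
            for row in rows:
                # Empty spreadsheet cells come back as None; write them as blanks.
                writer.writerow(["" if cell is None else cell for cell in row])
                written += 1
    return written


def main():
    folder = Path(r"E:\McNair\Projects\SBIR")
    # Assumption: the input files match *.xlsx; sorting is lexicographic.
    files = sorted(folder.glob("*.xlsx"))
    total = concat_to_tsv((read_xlsx_rows(f) for f in files), folder / "SBIR.txt")
    print(f"Wrote {total} rows from {len(files)} files")
```

Calling main() would read the files one at a time and stream rows to the output, so the full data set never has to sit in memory. Raising an error on a header mismatch is a design choice here: it stops the run rather than silently misaligning columns.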
