Changes

10 bytes removed, 15:23, 29 May 2019
* output: tab-separated text file (AngelList_companyTypeIncubator.txt)
* description: Uses Selenium to search AngelList for companies of type "incubator". It builds each AngelList search URL from a list of URL endings for the states (and Washington, DC) and clicks the "more" button at the bottom of the page whenever it is present. For each result it stores the state, company name, short description, and the URL of the company's AngelList page in a tab-separated text file.
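
The "click the more button when necessary" step can be sketched as a loop that keeps clicking until the button no longer appears. This is a minimal sketch under stated assumptions, not the script's actual code: the function name <code>click_more_until_done</code>, the CSS selector <code>a.more</code>, the pause length, and the click limit are all hypothetical, and <code>driver</code> is expected to be a Selenium WebDriver.

```python
import time


def click_more_until_done(driver, selector="a.more", pause=1.0, max_clicks=200):
    """Click the 'more' button until it no longer appears; return the click count.

    `selector` is a hypothetical CSS selector; inspect the page for the real one.
    `max_clicks` is a safety limit so a stuck page cannot loop forever.
    """
    clicks = 0
    while clicks < max_clicks:
        # "css selector" is the string value of Selenium's By.CSS_SELECTOR.
        buttons = driver.find_elements("css selector", selector)
        visible = [b for b in buttons if b.is_displayed()]
        if not visible:
            break  # nothing left to load
        visible[0].click()
        clicks += 1
        time.sleep(pause)  # give the page time to append the next batch
    return clicks
```

With a real browser this would be called once per state URL, after <code>driver.get(url)</code> and before the results are read from the page.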
 
=== angelList_keywordIncubator.py ===
* input: text file with URL endings for states
* output: tab-separated text file (AngelList_keywordIncubator.txt)
* description: Uses Selenium to search AngelList for companies that match the keyword "incubator". It builds each AngelList search URL from a list of URL endings for the states (and Washington, DC) and clicks the "more" button at the bottom of the page whenever it is present. For each result it stores the state, company name, short description, and the URL of the company's AngelList page in a tab-separated text file.
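
The input and output handling described above (one URL ending per state in, one tab-separated row per company out) might look roughly like the following. This is only a sketch: the function names, the one-ending-per-line file layout, and the <code>{ending}</code> placeholder in the URL template are assumptions, and the real AngelList URL pattern is not given here, so the template must be supplied by the caller.

```python
import csv


def build_search_urls(endings_path, url_template):
    """Return (ending, url) pairs, one per non-empty line of the endings file.

    `url_template` must contain "{ending}"; the real pattern is not documented
    here, so it is left to the caller.
    """
    with open(endings_path, encoding="utf-8") as fh:
        endings = [line.strip() for line in fh if line.strip()]
    return [(ending, url_template.format(ending=ending)) for ending in endings]


def write_results(rows, out_path):
    """Write (state, company name, short description, AngelList URL) rows as TSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)
```

Using the <code>csv</code> module rather than joining strings by hand means a description that itself contains a tab is quoted instead of silently shifting the columns.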
 
=== masterFile.py ===
* inputs: two tab-separated files (AngelList_companyTypeIncubator.txt, AngelList_keywordIncubator.txt)
* outputs: one tab-separated file (angelList_masterFile.txt)
* description: masterFile.py compares the two tab-separated files of AngelList data and creates a master file containing only the unique entries, for use in save_angelList_pages.py.
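
One way to produce a file of unique entries from the two inputs is to read both and keep the first row seen for each key. The sketch below assumes the duplicate key is the AngelList URL in the fourth column; that column choice and the function name <code>merge_unique</code> are assumptions, not taken from masterFile.py.

```python
import csv


def merge_unique(paths, out_path, key_column=3):
    """Merge TSV files, keeping the first row seen for each key value.

    `key_column` is the index of the column used to detect duplicates
    (assumed here to be the AngelList URL in the fourth column).
    Returns the number of rows written.
    """
    seen = set()
    kept = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        for path in paths:
            with open(path, newline="", encoding="utf-8") as fh:
                for row in csv.reader(fh, delimiter="\t"):
                    if len(row) <= key_column:
                        continue  # skip blank or malformed lines
                    key = row[key_column].strip()
                    if key not in seen:
                        seen.add(key)
                        writer.writerow(row)
                        kept += 1
    return kept
```

For example, <code>merge_unique(["AngelList_companyTypeIncubator.txt", "AngelList_keywordIncubator.txt"], "angelList_masterFile.txt")</code> would write each company once even if both searches returned it.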
 
 
=== save_angelList_pages.py ===
* input: one tab-separated file (angelList_masterFile.txt)
* output: data folder containing HTML files
* description: Uses Selenium to open each incubator's AngelList page URL, then saves the page as an HTML file in a specified folder.
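
The open-then-save step can be sketched as below. The column holding the URL, the file-naming scheme, and the function names are assumptions made for illustration; <code>driver</code> is expected to be a Selenium WebDriver, whose <code>get</code> method and <code>page_source</code> attribute are the only parts of the Selenium API used.

```python
import csv
import os
import re


def filename_for(url):
    """Derive a filesystem-safe .html filename from a URL (naming scheme is an assumption)."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    slug = re.sub(r"[^A-Za-z0-9_-]+", "_", slug) or "page"
    return slug + ".html"


def save_pages(driver, master_path, out_dir, url_column=3):
    """Open each URL from the master file and save the rendered HTML into out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(master_path, newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter="\t"):
            if len(row) <= url_column:
                continue  # skip blank or malformed lines
            url = row[url_column]
            driver.get(url)
            path = os.path.join(out_dir, filename_for(url))
            with open(path, "w", encoding="utf-8") as out:
                out.write(driver.page_source)  # HTML as currently rendered by the browser
            saved.append(path)
    return saved
```

Saving the pages once and parsing the local copies afterwards keeps the parsing scripts independent of the network and of any later changes to the site.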
 
=== parse_company_info.py ===
* input: path to data folder containing HTML files
* output: tab-separated file containing company info (angelList_companyInfo.txt)
* description: Iterates through the saved AngelList HTML files and collects information such as the company name, a short description, the location, the company size, the URL of the company's website, and business tags. It saves the information in a tab-separated text file.
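
As an illustration of the iterate-and-extract pattern, the sketch below pulls one text value per field out of each saved page using only the Python standard library. It is not the script's actual parser: the class names passed in <code>field_classes</code> are hypothetical (the real pages must be inspected to find the right ones), the header row is an assumption, and the simple depth counter assumes reasonably well-formed HTML.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path

# Tags that never have a closing tag; they must not affect the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class FieldExtractor(HTMLParser):
    """Capture the text of the first element carrying each configured class."""

    def __init__(self, field_classes):
        super().__init__()
        self.class_to_field = {cls: field for field, cls in field_classes.items()}
        self.fields = {field: "" for field in field_classes}
        self._current = None  # field currently being captured
        self._depth = 0       # nesting depth inside the captured element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        for cls in (dict(attrs).get("class") or "").split():
            field = self.class_to_field.get(cls)
            if field and not self.fields[field]:
                self._current, self._depth = field, 1
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace left over from the page layout.
            self.fields[self._current] = " ".join(self.fields[self._current].split())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def parse_folder(data_dir, out_path, field_classes):
    """Write one TSV row per saved .html file; the header row is an assumption."""
    names = list(field_classes)
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(names)
        for page in sorted(Path(data_dir).glob("*.html")):
            parser = FieldExtractor(field_classes)
            parser.feed(page.read_text(encoding="utf-8", errors="replace"))
            parser.close()
            writer.writerow([parser.fields[name] for name in names])
```

A call might look like <code>parse_folder("data", "angelList_companyInfo.txt", {"company": "name", "location": "loc"})</code>, where the dictionary maps each output column to a class name found in the saved pages.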
 
 
 
=== parse_portfolio.py ===
* input: path to data folder containing HTML files
* output: tab-separated file containing portfolio info (angelList_portfolio.txt)
* description: Iterates through the saved AngelList HTML files and collects each company's portfolio, saving the company name and the name of each portfolio company as a tab-separated text file.
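
Because one company can have many portfolio entries, the output has one row per (company, portfolio company) pair. The following is a minimal standard-library sketch of that one-to-many shape, under loud assumptions: the class name that marks a portfolio entry is hypothetical and supplied by the caller, the company name is taken from the saved file's name rather than from the page, and the depth counter assumes reasonably well-formed HTML.

```python
import csv
from html.parser import HTMLParser
from pathlib import Path

# Tags that never have a closing tag; they must not affect the depth counter.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.items = []
        self._depth = 0    # 0 means "not inside a matching element"
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def company_item_rows(data_dir, item_class):
    """Return (company, item) pairs; the company name is assumed to be the file stem."""
    rows = []
    for page in sorted(Path(data_dir).glob("*.html")):
        collector = ClassTextCollector(item_class)
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        collector.close()
        rows.extend((page.stem, item) for item in collector.items)
    return rows


def write_pairs(rows, out_path):
    """Write the pairs as a tab-separated text file."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        csv.writer(fh, delimiter="\t").writerows(rows)
```

A company with no matching elements simply contributes no rows, so the output lists only companies that have at least one portfolio entry.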
 
 
=== parse_employees.py ===
* input: path to data folder containing HTML files
* output: tab-separated file containing employee/founder info (angelList_employees.txt)
* description: Iterates through the saved AngelList HTML files and collects the people who work at each company, saving the company name and the founder/employee name as a tab-separated text file.