Shrey Agarwal (Work Log)
09/27/2016 14:00 - 17:00:
- Set up personal and work log pages, accessed Remote Desktop.
- Compiled list of accelerators from Wiki
09/29/2016 14:00 - 16:15; 16:45 - 17:30:
- Created new project: Accelerator Seed List (Data) and worked with Dr. Egan to create schematic for data entry.
- Evaluated 3 sources and logged data. Sources were taken from List of Accelerators. Logged each step onto project page and identified categories that would be suitable for web crawling sometime in the future.
10/11/2016 14:00 - 17:30:
- Explored how to use regular expressions in TextPad to aid with data sorting (need to review expressions with Dr. Egan in future)
- Continued evaluating sources from List of Accelerators and recorded steps onto project page, as before. Finished evaluating the six sources from initial list. (All work done in Accelerator Seed List (Data))
10/13/2016 14:00 - 17:00:
- All work done in Accelerator Seed List (Data)
- Talked to Dr. Egan about project going forward. Need to pick out 10-15 accelerators from the sources listed on my project page and identify a reliable method for obtaining cohort information, as well as other variables
- Used Google searches to identify more sources, and evaluated three databases with the help of TextPad.
- Began working on more generic Google searches. Was able to go through "Location+accelerator"-type searches today. Will continue next time.
10/18/2016 14:00 - 17:30:
- Work continued in Accelerator Seed List (Data)
- Took a sample size of 10 accelerators and detailed how to extract cohort information, as well as what other information is readily available from accelerator URLs.
- Brought Matthew up to speed on accelerator project, added summaries to each section so they became easier to follow, and worked with him to finish up extracting cohort information
10/20/16 14:30 - 17:30:
- Work continued in Accelerator Seed List (Data)
- Finished up the list of instructions for finding the cohort. Continued compiling the list of variables for each of the accelerators within the sample size.
- Consulted Peter on prospects of creating a web crawler with the information we currently have compiled. Determined it was possible, although beyond the scope of Peter's knowledge.
10/25/16 14:00 - 17:00
- Consulted Ed about the next step for the project.
- Began laying out the E-R diagram on the accelerator database page; entities were potential categories, and each entity had its associated attributes.
10/27/16 14:00 - 17:00
- Continued working with Matthew to identify elements in the E-R diagram for pulling information on accelerators.
- Found sources to obtain/cross-reference information (e.g., Angel List)
11/08/16 14:00 - 18:00
- Identified possible keywords to filter results through for accelerators
- Began compiling a comprehensive list of accelerators based on the data we have already sifted through.
- Learned how to use regular expressions from Ben to sort names individually and alphabetically.
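The regex-based name sorting described in the entry above was done in TextPad; the same idea can be sketched in Python. This is only an illustration under stated assumptions: the function name `split_and_sort_names` is hypothetical, and the choice of separators (newlines, semicolons, tabs) is a guess at how the pasted lists were delimited, not something recorded in this log. It also drops case-insensitive duplicates, which is the cleanup step picked up in the next entry.

```python
import re


def split_and_sort_names(raw: str) -> list[str]:
    """Split a pasted block of names into one name per entry, sorted A-Z."""
    # Assumed separators: newlines, semicolons, and tabs. Commas are left
    # alone because an organization name may itself contain a comma.
    parts = re.split(r"[\n;\t]+", raw)

    # Collapse internal runs of whitespace and trim each name.
    names = [re.sub(r"\s+", " ", part).strip() for part in parts]
    names = [name for name in names if name]

    # Drop duplicates case-insensitively, keeping the first spelling seen.
    seen = set()
    unique = []
    for name in names:
        key = name.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(name)

    # Sort alphabetically without regard to case.
    return sorted(unique, key=str.casefold)


if __name__ == "__main__":
    sample = "Techstars; Y Combinator\n500 Startups\ntechstars\t AngelPad "
    for name in split_and_sort_names(sample):
        print(name)
```

The sample names are placeholders for illustration, not entries taken from the project's master list.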
11/10/16 14:00 - 18:00
- Began sorting through accelerator list and removing duplicates, as well as identifying more places to pull names from.
- Worked with Peter to create a crawl for f6s because the website does not return only accelerators.
11/15/16 14:00 - 18:00
- Took a break from f6s to locate more lists based on individual Google searches such as "city+accelerator+list"
- Put Seed DB information into an Excel file on the remote desktop
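The "city+accelerator+list" searches in the entry above were run by hand, but the query strings follow a simple pattern that could be generated. A minimal sketch, assuming a hypothetical helper `build_queries` and a made-up list of cities (the log does not record which cities were searched):

```python
from urllib.parse import quote_plus


def build_queries(cities, terms=("accelerator", "list")):
    """Return one '+'-joined search string per city, e.g. 'Houston+accelerator+list'."""
    # quote_plus turns spaces into '+' and escapes characters that are
    # not safe in a URL query, so multi-word cities stay one query.
    return ["+".join([quote_plus(city), *terms]) for city in cities]


if __name__ == "__main__":
    # Hypothetical city names, for illustration only.
    for query in build_queries(["Houston", "New York"]):
        print(query)
```

Each string can be pasted into a search box or appended to a search URL; how the results were then recorded is described in the following entries.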
11/17/16 14:00 - 16:00
- Continued filling out information from the assorted Google searches
- Organized TextPad files on the RDP into coherent Excel spreadsheets with proper headers on each table
- Noticed a problem with f6s: it seems as though all of the HTML was protected by a CAPTCHA, so the crawler did not actually extract any information; everything was blocked.
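A problem like the f6s one above could be caught earlier by checking each fetched page before parsing it. The sketch below is a rough heuristic, not the crawler Peter wrote: the function name `looks_blocked`, the keyword match on "captcha", and the 200-character minimum length are all assumptions chosen for illustration, and a legitimate page that merely mentions a CAPTCHA would be flagged incorrectly.

```python
import re

# Assumption: a blocked response mentions "captcha" somewhere in its markup.
CAPTCHA_PATTERN = re.compile(r"captcha", re.IGNORECASE)


def looks_blocked(html: str, min_length: int = 200) -> bool:
    """Heuristically flag a fetched page that is nearly empty or shows a CAPTCHA."""
    # A near-empty body usually means no real listing content came back.
    if len(html.strip()) < min_length:
        return True
    return bool(CAPTCHA_PATTERN.search(html))


if __name__ == "__main__":
    # Made-up pages standing in for crawler output.
    blocked_page = '<html><div class="g-recaptcha"></div>' + "x" * 300 + "</html>"
    normal_page = "<html>" + "<li>Some Accelerator</li>" * 20 + "</html>"
    print(looks_blocked(blocked_page))
    print(looks_blocked(normal_page))
```

Running a check of this kind on the first few pages of a crawl would show whether the saved files contain listings or only a challenge page.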
11/22/16 14:00 - 17:00
- Worked to fix f6s crawler with Peter
- Finished and compiled master list of accelerators
12/01/16 14:00 - 18:00
- Caught up on project with Ed and Carlin
- Took 20 accelerators (241-260) from the list and filled out text.html files for them; finished the 20
12/05/16 13:00 - 16:00
- After finishing first 20 accelerators, continued working down the list, beginning at 321
- Work noted in Accelerator Seed List (Data), but mostly stored on McNair RDP
12/06/16 14:00 - 18:00
- Continued "Accelerating" down the list in Accelerator Seed List (Data), finished up until 340
12/08/16 14:00 - 17:00
- Continued working on accelerator list on the same page.
01/17/17 14:00 - 16:00
- Finished up "accelerating" from Accelerator Seed List (Data), numbers 341-351
1/18/17 14:00 - 16:00
- Finished accelerating for sure, went back and began an overview of the work done for quality control.
01/20/17 14:00 - 16:00
- Mandatory meeting, then worked through 2 of Ed's unfinished accelerators
1/23/17 14:00 - 16:00
- Worked with Matthew to go over about 70 items in the accelerator list and ensure that they follow a uniform structure and show correct information
1/24/17 14:00 - 16:00
- Worked with Peter to fix the problem with results not coming through on the new spreadsheet by renaming the file and including more symbols in the searches. Spreadsheet should be up to date now.
- Got to number 144 on the list while going through files.
1/25/17 14:00 - 16:00
- Continued looking through the list and fixing wrong entries or reporting them
1/26/17 14:00 - 16:00
- Talked with Ed about project going forward and tried to access the Crunchbase API with Peter to crawl for start-up companies.
- Continued working through the accelerator list, stopped at number 186.
1/27/17 14:00 - 16:00
- Continued looking through accelerator list and fixing any entries with error. Got to number 261.
1/30/17 14:30 - 16:30
- Got through about 425
1/31/17 14:00 - 16:00
- Got to number 502
2/01/17 14:00 - 16:00
- Finished looking through the initial list of accelerators and writing down which ones needed to be modified or completed (through 551)
2/03/17 14:00 - 17:00
- Finished about 30 accelerator entries that still needed to be completed. Worked out of the "NOT DONE" file on the server (which is now blank because everything is finished)
2/06/17 14:00 - 16:00
- Developed a standardized format for the text files with Matthew. Instructions are under "standardized format" in the accelerator seed list portion. I started at number 226 and standardized formats up until 370.
2/07/17 14:00 - 16:00
- Continued work from yesterday, completed up to number 488 from the list. Will likely need one more day to finish.
2/08/17 14:00 - 16:00
- Finished standardizing the txt files for use in the Excel spreadsheet, compiled the data, and examined the resulting tables. Realized we needed to fix some categories in the cohort files.
2/09/17 14:00 - 17:00
- Worked with Ed on a side project gathering information on climate change, prompted by Baker's article in the Wall Street Journal
- Gathered information on climate change in relation to high-growth, high-risk innovation and organizations that deal with things such as carbon credits
2/10/17 14:00 - 17:00
- Realized that the blog post was too ambitious: we could not find a clear purpose in the information we gathered, nor a unique angle. Held off on the idea.
- Went back to organizing the new columns and headers in the text file by identifying areas of error in the Excel spreadsheet
2/15/17 14:00 - 16:00
- Spoke with Ed about free enterprise while he lectured all of us. It took about an hour.
- Looked at plans for the project going forward, including using LinkedIn to search for the founders
2/20/17 14:00 - 16:00
- Found our first source for expanding the project into incubators, from angel.co. Seems similar to f6s in that we can crawl it and obtain a list of incubators and their various counterparts.
2/21/17 14:00 - 16:00
- Found more sources for incubators by reading through Quora discussions and master's theses. Bookmarked these pages so that I could put them into text files afterward.
2/23/17 14:00 - 18:00
- Converted incubator files to TextPad files and saved them (4 total), then cleaned them up with regex
- Took the cohort text file, put it into Excel, and proceeded to clean up all of the mistakes in the Excel document, particularly bad data or mistakes with organizations. Got through Y-Combinator.
2/24/17 14:00 - 16:00
- Finished cleaning the names and descriptions in the cohort data, but work still needs to be done on other fields such as dates and programs
2/28/17 14:00 - 16:00
- Created page Hub-Based Venture Firms and proceeded to research VC in the hubs listed under E:\McNair\Projects\Hubs\summer 2016\Hubs Variables - Ariel.xls
- Looked at details such as whether they have in-house funds, whether they co-invest, focuses, and amounts invested.