Maxine Tao (Work Log)

From edegan.com
Revision as of 16:14, 28 June 2018 by Maxine.tao

Summer 2018

6/21 -- Downloaded Crunchbase data using API version 3.1, loaded 17 files into the crunchbase2 database, checked each table to make sure its specs matched the new data, and updated line counts. Grace and I ran into an issue with blank strings in date columns: date fields containing "" were not being read as null. We fixed this with a one-line command that we've documented on the Crunchbase Data page. Later we used Connor's master list of 166 accelerators and tried to create a table of accelerators and their UUIDs by matching against the 'organizations' table. Some names matched multiple times and some did not match at all, so we ended up with 179 matches, which we will clean up tomorrow.
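The actual one-line fix is on the Crunchbase Data page and is not reproduced here. As an illustration of the same idea only, here is a minimal Python sketch that maps blank date strings to None (which a database driver writes as NULL) before loading. The column names in DATE_COLUMNS are hypothetical placeholders, not the real Crunchbase headers.

```python
import csv
import io
from datetime import date

# Hypothetical date columns; substitute the real column names for each table.
DATE_COLUMNS = {"founded_on", "closed_on"}


def clean_row(row: dict[str, str]) -> dict[str, object]:
    """Return a copy of row in which blank date strings become None (NULL)."""
    cleaned: dict[str, object] = {}
    for key, value in row.items():
        if key in DATE_COLUMNS:
            # "" (or whitespace) means "no date"; anything else is parsed as ISO.
            cleaned[key] = date.fromisoformat(value) if value.strip() else None
        else:
            cleaned[key] = value
    return cleaned


sample = io.StringIO("uuid,name,founded_on,closed_on\nabc,Acme,2012-05-01,\n")
rows = [clean_row(r) for r in csv.DictReader(sample)]
print(rows[0]["closed_on"])  # None
```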

6/22 -- Loaded the Accelerator Master List as a table and matched on accelerator name or accelerator URL. Manually removed bad results: entries with the same name but a different URL, or the same URL but a different name. There were 34 entries from the master accelerator list that could not be matched to anything in the crunchbase table 'organizations'. Grace and I searched for these manually using ILIKE and found a number of matches, which we added back into our spreadsheet of matches. Now we have a clean list of accelerator names, their matches from the crunchbase data, and their UUIDs.
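The matching itself was done in SQL against 'organizations'. The Python sketch below illustrates the same logic under stated assumptions: first an exact, case-insensitive match on name or on a normalized URL, then a fallback case-insensitive substring search in the spirit of `ILIKE '%name%'`. The (uuid, name, homepage_url) tuple layout and the URL normalization (lowercase host, "www." dropped) are hypothetical choices, not the Crunchbase schema.

```python
from typing import Optional
from urllib.parse import urlparse


def normalize_url(url: str) -> str:
    """Reduce a URL to a lowercase host without a leading 'www.'."""
    if "://" not in url:
        url = "http://" + url  # urlparse only fills netloc when a scheme is present
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def match_accelerator(
    name: str, url: Optional[str], orgs: list[tuple[str, str, str]]
) -> list[str]:
    """Return uuids of orgs matching on name or URL; orgs are (uuid, name, homepage_url)."""
    target = normalize_url(url) if url else None
    exact = [
        uuid
        for uuid, org_name, homepage in orgs
        if org_name.lower() == name.lower()
        or (target is not None and homepage and normalize_url(homepage) == target)
    ]
    if exact:
        return exact
    # Fallback similar to SQL `org_name ILIKE '%name%'`: case-insensitive substring.
    needle = name.lower()
    return [uuid for uuid, org_name, _ in orgs if needle in org_name.lower()]


# Illustrative rows only.
ORGS = [
    ("u1", "Y Combinator", "https://www.ycombinator.com"),
    ("u2", "Techstars", "http://techstars.com"),
    ("u3", "Techstars Boulder Accelerator", ""),
]
print(match_accelerator("YC", "http://www.ycombinator.com/", ORGS))  # ['u1']
print(match_accelerator("Boulder", None, ORGS))  # ['u3']
```

Because the function can return several uuids for one accelerator, ambiguous results still need the kind of manual review described above.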


6/25 -- Updated the list of accelerators and their UUIDs with Connor and Grace (we now have 163 matches) and created a table in the crunchbase2 database called 'AccUUIDsFinal'. It has 3 columns: accelerator names from the master list, accelerator names from crunchbase, and accelerator UUIDs from crunchbase. Then we joined this table back to the needed info fields from crunchbase; the resulting table is called 'AccAllInfo'. From this table, joining accelerator UUIDs to company UUIDs does not give what we want: it returns investors that have invested in the accelerators themselves. From this, Connor and I figured out that company_name/company_uuid actually refers to the company being invested in. Joining accelerator names to investor names also returns nothing. However, when I manually searched for Y Combinator as an investor name, I got results back. Not sure what is going on; I think the join of accelerator names to investor names should work.
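To make the join direction concrete, here is a small Python sketch on made-up rows. The column names (company_uuid, company_name, investor_uuid) follow the terms used in this log but are assumptions about the exact schema. Joining on company_uuid finds rounds in which the accelerator itself raised money; joining on the investor side finds the companies the accelerator invested in.

```python
# Illustrative rows only; column names are assumptions, not the exact Crunchbase schema.
ROUNDS = [
    {"company_uuid": "acc-1", "company_name": "Accelerator A", "investor_uuid": "fund-9"},
    {"company_uuid": "co-7", "company_name": "Startup B", "investor_uuid": "acc-1"},
    {"company_uuid": "co-8", "company_name": "Startup C", "investor_uuid": "fund-9"},
]


def rounds_raised_by(uuids: set[str], rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Join on company_uuid: rounds in which the accelerator itself was invested in."""
    return [row for row in rows if row["company_uuid"] in uuids]


def rounds_invested_by(uuids: set[str], rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Join on investor_uuid: rounds in which the accelerator was an investor."""
    return [row for row in rows if row["investor_uuid"] in uuids]


accelerators = {"acc-1"}
print([r["company_name"] for r in rounds_raised_by(accelerators, ROUNDS)])    # ['Accelerator A']
print([r["company_name"] for r in rounds_invested_by(accelerators, ROUNDS)])  # ['Startup B']
```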

6/26 -- Fixed yesterday's issue of no matches. The problem was that the values in the investor_names field were wrapped in curly braces. I removed these, and a clean version is saved in 'funding_rounds-no brackets.txt'. I found that matching accelerator UUIDs to investor UUIDs gives more matches than matching accelerator names to investor names. There are 631 matches, most of which are labeled as seed investments.
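The log does not say how the braces were removed. The sketch below shows one possible way to turn a braced value into a list of names so that an exact comparison like "Y Combinator" succeeds. It assumes the items are comma-separated and optionally double-quoted (as in a PostgreSQL array literal); backslash escapes inside quotes are not handled.

```python
import csv


def parse_brace_list(field: str) -> list[str]:
    """Split a braced list such as '{A,B,"C, Inc."}' into its items (assumed format)."""
    inner = field.strip()
    if inner.startswith("{") and inner.endswith("}"):
        inner = inner[1:-1]
    if not inner:
        return []
    # csv.reader copes with double-quoted items that themselves contain commas.
    items = next(csv.reader([inner], skipinitialspace=True))
    return [item.strip() for item in items]


raw = '{Y Combinator,"Acme Ventures, LLC"}'
print(raw == "Y Combinator")     # False: the braces break an exact name join
print(parse_brace_list(raw))     # ['Y Combinator', 'Acme Ventures, LLC']
```

Matching on UUIDs instead of names avoids this kind of formatting problem as well as spelling variations, which fits the observation that UUID matching returned more results.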

6/27 -- Filled in a spreadsheet of the unique accelerators from yesterday's matches, with flags indicating whether or not they take equity and notes about specifics. This is incomplete; there are some that I'm not sure about or couldn't find information for. Also helped Connor manually filter out duplicated company names. Helped Grace with the LinkedIn crawler; it seems to work for founders whose URLs we have, but crashes otherwise.

6/28 -- Worked with Minh and Grace to debug the LinkedIn crawler. We had an issue with the XPath of the LinkedIn search box. Also helped Connor fill in accelerator terms on the master variable list. I filtered the list of accelerators and the companies they've invested in by investment amount. Where an amount matched what is given on the website, I put that entry into a separate sheet under 'Accelerators and Investments'.
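The filtering step was done in a spreadsheet. As a hedged illustration of the same check, the Python sketch below keeps investments whose amount equals an accelerator's published amount. The optional `tolerance` parameter (a fraction of the published amount) is a hypothetical addition, and all names and figures are made up.

```python
from typing import NamedTuple, Optional


class Investment(NamedTuple):
    accelerator: str
    company: str
    amount_usd: Optional[float]


def matches_terms(
    investments: list[Investment],
    published_terms: dict[str, float],
    tolerance: float = 0.0,
) -> list[Investment]:
    """Keep investments whose amount matches the accelerator's published amount.

    tolerance is a fraction of the published amount (0.05 allows +/- 5%).
    Rows without an amount, or accelerators without published terms, are dropped.
    """
    kept = []
    for inv in investments:
        expected = published_terms.get(inv.accelerator)
        if expected is None or inv.amount_usd is None:
            continue
        if abs(inv.amount_usd - expected) <= tolerance * expected:
            kept.append(inv)
    return kept


# Illustrative values only, not real accelerator terms.
TERMS = {"Accelerator A": 100_000.0, "Accelerator B": 50_000.0}
INVESTMENTS = [
    Investment("Accelerator A", "Startup 1", 100_000.0),
    Investment("Accelerator A", "Startup 2", 2_000_000.0),
    Investment("Accelerator B", "Startup 3", None),
    Investment("Accelerator B", "Startup 4", 52_000.0),
    Investment("Accelerator C", "Startup 5", 50_000.0),
]
print([i.company for i in matches_terms(INVESTMENTS, TERMS)])        # ['Startup 1']
print([i.company for i in matches_terms(INVESTMENTS, TERMS, 0.05)])  # ['Startup 1', 'Startup 4']
```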