Work Logs

From edegan.com
Revision as of 15:14, 7 September 2017 by Hbrown512


Work Logs are divided between the two divisions of the McNair Center: the long-term deliverables of academic papers and the short-term deliverables of general content. Individuals are listed under the division in which they work; anyone who works in both divisions is listed in both locations.


Academic Papers

This division of the McNair Center pursues longer term projects, such as peer-reviewed academic papers.

Jake Silberman

Jake Silberman Work Logs (log page)


Will Cleland

Will Cleland Work Logs (log page)

Todd Rachowin

Todd Rachowin Work Logs (log page)


Amir Kazempour

Amir Kazempour Work Logs (log page)


Content

This division of the McNair Center focuses on shorter term projects, including blog posts, tweets, and issue briefs.

Dylan Dickens

2018-03-06: Troubleshot the Key Terms program with Christy; continued to read articles.

2018-03-05: Tested the Key Terms program and found that it was not working. Troubleshot the problem and alerted Christy.

2018-03-01: Started to read articles for key-terms testing.

2018-02-28: Adjusted some wiki pages, started testing the revamped tools.

2018-02-27: Drafted an email to Ed outlining concerns, then met with Ed to resolve them. Created an action plan: test the revamped tools and codify a subset of known papers.

2018-02-26: Reviewed Christy's new documentation, prepared to meet with Ed.

2018-02-22: Tested RegEx-Excel Filter process, flagged some additional questions that need guidance from Ed. Met with Christy and worked to resolve coding issues.

2018-02-21: Finished RegEx-Excel Filter process, spoke with Ed about long-term goals of project.

2018-02-20: Continued working on the RegEx-Excel filter.

2018-02-19: Continued working on the RegEx-Excel filter.

2018-02-15: Started developing a RegEx and Excel filter for processing and cross-referencing sources.

2018-02-14: Identified the status of all codes. Drafted an email to Christy about returning temporarily to help with the codes.

2018-02-13: Ran the KeyTerms and PDF Converter Python Codes.

2018-02-12: Finished troubleshooting crawler, reached out to Ed for guidance. Was redirected to testing Key Terms code.

2018-02-07: Troubleshot the crawler with Christy.

2018-02-06: Troubleshot the crawler with Christy.

2018-02-05: Reached out to Ed for guidance, was redirected to testing the scholar crawler.

2018-02-01: Continued PDF-BibTeX filtering.

2018-01-31: Started the PDF-BibTeX filtering process, as discussed in meetings with Christy and Lauren.

2018-01-30: Met with both Christy and Lauren.

2018-01-29: Reviewed the current state of PTLR project in order to prepare for meetings on Tuesday.

2018-01-26: Assisted with the McNair Center Event.

2018-01-25: Reached out to previous project owners to gather information about next steps. Was on standby to assist with the Lyceum Research Page.

2018-01-24: Searched for tools to accomplish the strategies outlined in Patent Thicket Strategic Planning. Had a hard time locating any, or getting a good grasp of where exactly the project stands and what it needs. Gathered contact information for previous owners to reach out to later this week. Also continued to prep the Lyceum Research Page for Ed.

2018-01-23: Finished Patent Thicket Strategic Planning and sent to Ed. Ed approved.

2018-01-22: Read Patent Thicket literature. Met with Ed to discuss broad strategy and began planning next steps (see Patent Thicket Strategic Planning).

2018-01-18: Met with Ed to discuss Patent Thicket Project. Helped complete his research for the Amazon HQ2 Report.

2018-01-11: Finalized sourcing for Venture Capital Gap for Women.

2018-01-10: Sourced all of Venture Capital Gap for Women; downloaded PDFs for about 3/4 of the sources.

2018-01-09: Found additional sources on the Venture Capital Gap for Women, as well as Fondren availability for a portion of the sources.


9/23/2016 2:00-4:30 Introductory explanation/exploration, helped Catherine find source for Ed

9/26/2016 2:00-4:00 Began research for blog post about largely unknown entrepreneurial hubs, checked links on McNair Center blog

9/27/2016 4:00-6:00 Set up personal and work log pages, researched for blog post




Eliza Martin

Eliza Martin Work Logs (log page)

Meghana Gaur

Meghana Gaur Work Logs (log page)

2017-12-01: worked with Ed to build tables with firm/portco data on distance and fund/portco data on performance

2017-11-16: finished calculating great circle distances between firms, portcos, and branch offices (see the roundlinewithgcd table)
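The great circle distance (GCD) calculations in these entries can be illustrated with the haversine formula. This is a minimal Python sketch, not the SQL actually used to build roundlinewithgcd. It assumes coordinates in decimal degrees and a spherical Earth with a mean radius of 6371 km, and the function name is hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance in km between two (lat, lon) points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

On a spherical model, one degree of longitude along the equator comes out to about 111.2 km. A database would more likely use a built-in spatial function, but the formula is the same idea.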

2017-11-14: worked on getting all roundline tables down to the firm level, instead of fund; running into small problems with calculating GCD between firms and portcos (will discuss with Ed)

2017-11-14: worked on joining IPO information to roundline; aggregated IPO information to the firm level (rather than fund)

2017-11-09: reloaded firm coords and fund coords; re-building roundlinewithgcd (the code is written, but the fund coords weren't loaded correctly, so it will be re-run); wrote code for fundtofirms and portcotofirms, which will also be re-run once the firm coords are loaded correctly; working on joining portcoexitmaster to roundlinejoinerlean

2017-11-08: loaded roundlinewithgcd table (calculating gcd between portcos and funds), created GCD example with notes in datawork folder in MatchingEntrepsToVCs, worked on building portcostofirms

2017-11-07: loaded portcocoords table, joined portcocoords to roundlinejoinerlean, calculated GCD between funds and portcos, worked on joining funds to firms

2017-11-03: loaded table/SQL script for firm office locations into vcdb2 with latitude and longitude coordinates; joined coordinates to all clean base tables for firms, funds, and branch offices; joined co and fund coordinates to roundlinejoinerlean in a new table: roundlinecoords

2017-11-02: met with Ed; loaded tables/sql script for branch office and fund office locations into vcdb2 with latitude and longitude coordinates

2017-10-27: came up with next steps for matching firms to funds and for geocoding branch offices

2017-10-26: updated VC Database Rebuild wiki; identified key for bocore table; verified that fundbasecore table was correctly cleaned after being rebuilt by Ed

2017-10-24: met with Ed to discuss firmbase and branch office tables; find key for firmbasecore table; remove undisclosed firms from both firmbasecore and bocore

2017-10-12: peer edited Shelby's blog post and put it into WordPress; looked at what needs to be done on VC project; continued literature review for matching models

2017-10-11: finished loading tables (firmbase and branchoffice)

2017-10-06: loaded data into tables using the SQL code on Retrieving US VC Data From SDC

2017-09-29: completed pulling/normalizing data, still need to load data using SQL code into tables, which is on Retrieving US VC Data From SDC

2017-09-28: met with Ed, worked on pulling firm and branch office data from SDC

2017-09-22: joined portcos and funds; began literature review of matching games/venture capital (located in the "Matching Entreps to VC's" project folder on the E drive)

2017-09-21: work with Ed on research project

2017-09-19: continue to work on joining portcoexits and roundlinejoiner tables in vcdb2, in MatchingEntrepsToVC folder under project management

2017-09-15: work on joining portcoexits and roundlinejoiner; create txt file called "Notes on Matching Funds to portcos" in the "Matching Entreps to VC's project folder" on E drive.

2017-09-14: built table roundlinejoinerapprop (apportioning amounts between funds); worked on joining portcoexits and roundlinejoiner

2017-09-27: rebuild portcoexits and work on apportioning amounts in roundlinejoiner

2017-09-07: work with Ed to familiarize with SQL script for VC project/vcdb2 database

2017-09-05: receive project from Ed; reacquaint with wiki, RDP, etc.


Marcela Interiano

Marcela Interiano Work Logs (log page)

Veeral Shah

Veeral Shah Work Logs (log page)

Ariel Sun

Ariel Sun Work Logs (log page)

Gunny Liu

Gunny Liu Work Logs (Work Log)

General Information

Ben Baldazo

Ben Baldazo Work Logs (Work Log)

Contributing Projects

Crunchbase Data / Accelerator Seed List (Data): Combined this data in a table discussed on Crunchbase Data

Houston Innovation District

Augusta Startup Ecosystem

Houston Entrepreneurship Ecosystem Project

Houston Entrepreneurship

Start-Up Guide (Issue Brief)

Houston Accelerators and Incubators (Report)

Cofounding in Exchange for Equity

Start-Ups of Houston (Map)

Work Log

2017-11-21: Worked with Ed to lay the groundwork for joining tables for the purpose stated in the 2017-11-20 entry. Should be able to finish it upon returning next week; until then, all notes are in "Z:\bulk\crunchbase\AccFunding.psql", with the important parts under the header "From Ed on 21st of Nov. To finish on Nov 27".

2017-11-20: Attempted to link 3 tables from the crunchbasebulk database in psql to find accelerators that have invested in companies. Likely succeeded with the table "Acc_Funded_Cos", but its investor column is dirty, so tried to do it more cleanly with the 3-table link.

  • This is all noted in "Z:\bulk\crunchbase\AccFunding.psql" and the code for "Acc_Funded_Cos" is emphasized

2017-09-25: Followed Talk:Ben Baldazo (Work Log) to create documentation infrastructure for Augusta Startup Ecosystem


Shoeb Mohammed

Shoeb Mohammed Work Logs (Work Log page)

James Chen

James Chen Work Logs (log page)



Albert Nabiullin

Albert Nabiullin Work Logs (log page)


Carlin Cherry

Carlin Cherry Work Logs (log page)


Julia Wang

Julia Wang Work Logs (log page)

12/4-12/8 finalizing University Patents report

  • 12/4 9-12 edits, sent to Ed, confirming catering for party
  • 12/5 9-12 final edits, sent to Ed
  • 12/6 1-4 making City Agglomeration graphics
  • 12/7 1-3 wrapping up everything

11/27-12/1

  • 11/27 10-12 edits, sent to Ed
  • 11/29 10-12 catering order for lunch party, wiki page organization
  • 11/30 2:30-4:30 edits
  • 12/1 10-12 met with Ed, edits

11/20-11/22

  • 11/20 10-12 edits
  • 11/21 10-12 met with Ed, edits
  • 11/22 10-12 met with Ed, edits

11/13-11/17 deadline 11/16 final draft

  • 11/13 10-12 redoing reg table
  • 11/14 10-12 edits
  • 11/15 10-12:30 edits
  • 11/16 2:30-4 met with Ed, edits
  • 11/17 10-12 edits

11/6-11/10 deadline 11/16 final draft

  • 11/6 10-12 4th draft
  • 11/8 10-12 met with Ed, editing
  • 11/9 2:30-5:30 redoing graphs, restructuring introduction
  • 11/10 10-12, 3-4:30 redoing charts, rewriting body

10/30-11/3 deadline 11/16 final draft

  • 10/30 10-12 revisions
  • 10/31 10-12 revisions
  • 11/1 10-12 revisions, new data for basic funding
  • 11/2 10-12 revisions

10/23-10/27

  • 10/23 10-12 editing University Patents
  • 10/24 10-12 reran regressions, fixed problem with Cornell!
  • 10/25 10-12 sent 2nd draft to Anne
  • 10/26 2:30-4:30 revisions
  • 10/27 10-12 sent 3rd draft

10/16-10/20

  • 10/16 10-12 pulled Houston patent addresses
  • 10/17 10-12 pulled Houston patent addresses
  • 10/18 10-12 edited University Patents
  • 10/19 2:30-4:30 edited University Patents, tabled patent database reorganization until it is cleaned by Oliver/Shelby/Ed
  • 10/20 10-12 edited University Patents

10/11-10/13

  • 10/11 10-12 pulling Houston patents
  • 10/12 10-12 University patents, sent draft to Anne

10/2-10/6

  • 10/2 10-12 work on University Patents draft, close to sending
  • 10/3 10-12 Distracted by Augusta project; reorganizing patent database
  • 10/4 10-12 Reorganizing whole patent database by city, state, pulling Crunchbase data for Augusta
  • 10/5 2:15-2:45 Augusta patents
  • 10/6 10-12 Reorganizing patents, figure out misspellings

9/25-9/29 finish draft

  • 9/25 10-12 remaking charts
  • 9/26 10-12 data pull for Augusta University
  • 9/27 10-12 data pull for Augusta University
  • 9/29 10-12 reran log regressions

9/18-9/22/2017

  • 9/18 10-12 cleaning data
  • 9/19 10-12 cleaning data
  • 9/20 10-12 created time-series data set
  • 9/21 2:30-4 reran regressions
  • 9/22 10-12, 2-3 remade charts

9/11-9/15/2017 Deadline: 9/15 - convert data to time-series, new charts

  • 9/11 10-12 Converting to time-series
  • 9/12 10-12 Check accuracy, converting to time-series, talked to Jeemin about next project
  • 9/13 10-12 Fix R&D data, previous SQL code
  • 9/14 2:30pm-4pm, 10:30pm-12am Fix R&D data
  • 9/15 10-12, 2-3 join data

9/5-9/8/2017 Putting together University Patents report

  • 9/5 10-12 Looked at report, created artifacts, cleaned University Patents folder
  • 9/6 10-12 Spoke with Ed about project organization
  • 9/7 2:30-4:30 Writing report
  • 9/8 10-12 Making data into time series: gyear, make tables and charts



Ramee Saleh

Ramee Saleh Work Logs (log page)

Avesh Krishna

Avesh Krishna Work Logs (Log Page)

Shrey Agarwal

Shrey Agarwal Work Logs (log page)

1/23/18 15:00 - 17:00

  • Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester

1/25/18 15:00 - 17:00

  • Began examining the data on pulled webpages relating to demo days

1/26/18 13:00 - 17:00

  • Began categorizing demo day pages based on: 1) relevance to accelerators, 2) relevance to the particular accelerator (got to 200)

1/30/18 15:00 - 17:00

  • Continued working through the demo day pages, spoke with Ed about using the data to work a better set (got to 450)

2/01/18 15:00 - 17:00

  • Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)
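The pivot-table count of "repetitions" (companies that went through more than one accelerator) can be sketched in Python. This is a minimal sketch, assuming the matched data is a list of (company, accelerator) pairs. The function name and sample rows are hypothetical.

```python
from collections import defaultdict


def repeat_companies(matches):
    """Map each company matched to two or more distinct accelerators to its sorted accelerator list."""
    accelerators = defaultdict(set)
    for company, accelerator in matches:
        # A set ignores duplicate rows for the same company/accelerator pair.
        accelerators[company].add(accelerator)
    return {company: sorted(accs) for company, accs in accelerators.items() if len(accs) > 1}


# Hypothetical sample rows.
sample = [
    ("Company 1", "Accelerator A"),
    ("Company 1", "Accelerator B"),
    ("Company 1", "Accelerator B"),
    ("Company 2", "Accelerator A"),
]
print(repeat_companies(sample))
```

Counting distinct accelerators per company, rather than raw rows, keeps duplicate matches from being mistaken for repetitions.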

2/06/18 15:00 - 17:00

  • Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail

2/08/18 15:00 - 18:00

  • Continued attempting to match the different columns against SDC. It didn't work without separating the data into individual files, a very tedious process.

2/13/18 15:00 - 17:00

  • Spoke with Ed about the incubators project, which will begin as soon as we can time the accelerator startup investments. Ed expects us to begin sometime in the next two months, using a process similar to the one we used for accelerators. The process should be handled by a new worker.

2/15/18 15:00 - 17:00

  • Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.

2/16/18 13:00 - 17:00

  • Sifted through the database for Crunchbase investment information.

2/20/18 15:00 - 17:00

  • Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.

2/22/18 15:00 - 18:00

  • Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.

2/27/18 15:00 - 17:00

  • Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.


9/19/17 15:00 - 17:00

  • Became reacclimatized with the project, spoke with Ed about the direction for the rest of the semester

9/20/17 15:00 - 17:00

  • Worked on setting up a new pull for the updated SDC data

9/21/17 15:00 - 17:00

  • Finished the pull and sorted the data from the updated accelerator list

9/22/17 15:00 - 17:00

  • Tried to set up the Matcher with Matthew; ran into some difficulties in PowerShell, which returned a blank output file

9/26/17 15:00 - 17:00

  • Finished the match and created pivot tables to count the number of repetitions (companies going through more than one accelerator)

9/27/17 15:00 - 17:00

  • Discussed with Matthew the best way to collect the VC data from the repetitions. We tried different matches through our SDC data to no avail

9/28/17 16:00 - 17:00

  • Continued attempting to match the different columns against SDC. It didn't work without separating the data into individual files, a very tedious process.

9/29/17 15:00 - 17:00

  • Spoke with Ed about the incubators project, which will begin as soon as we can time the accelerator startup investments. Ed expects us to begin sometime in the next two months, using a process similar to the one we used for accelerators. The process should be handled by a new worker.

10/02/17 15:00 - 17:00

  • Talked to Ed about next steps for the project. Practiced accessing the CrunchBase database on SQL and brushed up on SQL code.

10/03/17 15:00 - 17:00

  • Sifted through the database for Crunchbase investment information.

10/04/17 15:00 - 17:00

  • Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.

10/06/17 15:00 - 17:00

  • Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.

10/11/17 15:00 - 17:00

  • Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.

10/12/17 15:00 - 17:00

  • Discovered that the Wayback Machine will not be a good option for identifying the time when a company went through the accelerator. Created a list of VC Companies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.

10/16/17 15:00 - 17:00

  • Continued working on sorting VCCompanies by their earliest round date.

10/17/17 15:00 - 17:00

  • Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.

10/18/17 15:00 - 17:00

  • Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.

10/19/17 15:00 - 17:00

  • Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.

10/20/17 15:00 - 17:00

  • Generated the new list of VCCompanies as well as their earliest round dates.

10/23/17 15:00 - 17:00

  • Worked on sorting out the discrepancies in our matched data.

10/24/17 15:00 - 17:00

  • Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.

10/25/17 15:00 - 17:00

  • Continued going through list of VCCompanies and adding accelerators.

10/26/17 15:00 - 17:00

  • Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.

10/30/17 15:00 - 17:00

  • Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.

10/31/17 15:00 - 17:00

  • Began compiling data in the column for the dates that a specific company went through an Accelerator.

11/01/17 15:00 - 17:00

  • Finalized entering dates for Y Combinator cohort companies.

11/02/17 15:00 - 17:00

  • Continued entering cohort company dates into Excel file.

11/06/17 15:00 - 17:00

  • Began looking at keywords for identifying the cohort class dates for each company

11/07/17 15:00 - 17:00

  • Received list from Peter with the accelerator founders matched from the Crunchbase LinkedIn URLs and proceeded to find the links for those founders without a match on Crunchbase. Data found in "Unfound Founders List" in the Fall 2017 folder


Tay Jacobe

Taylor Jacobe Work Logs (log page)

2017-12-01: Finished up the California post, ready to publish.

2017-11-30: Finished and published the Augusta post. Worked on California post in Wordpress, adding a bit more content suggested by Ed.

2017-11-29: Cleaned up the Augusta Findings post and cleaned out spam comments on WordPress; there were almost 2000 spam comments within the last 3 weeks, which is concerning. Maybe there is a reason it has increased so quickly?

2017-11-17: Worked on Augusta Findings post.

2017-11-16: Finished first draft of California post, California Growth (Blog Post). Continued looking into a "Future of Communication" post and what that would look like. Anne also suggested I write a post about Augusta Findings (Blog Post), so I began that!

2017-11-15: Peer edited Yunnie and Dianna's blog post drafts. Anne suggested another post, based on McNair projects>Agglomeration>PeterHarrison: research the growth of high-tech, high-growth enterprises in California from 1986 to 2016 using the file of maps. Started working on the post.

2017-11-10: Spent the morning cleaning out the spam comments on the blog. More than 1000 of them! Kept investigating blockchain; I think I've determined that it might not be worth a post, because many sources have already published pieces that explain blockchain in simple terms. Continued looking for future blog post ideas: new social media (https://www.techworld.com/social-media/bumble-founder-whitney-wolfe-herd-talks-harvey-weinstein-linkedin-future-of-social-network-3666350/), 3D printing, security in a time of increasing automation and digitalization, the future of communication (smartphones, etc.: what's next?)

2017-11-09: Reorganized work log. Continued researching blockchain and began a draft of a post that will explain the concept in simpler terms and discuss potential impacts of this new technology! Created graphs for the Fund of Funds post. Finished the post and put everything into WordPress.

2017-11-08: Compiled a list of cities in Greater Cincinnati to use for data for a blog post. Tried to educate myself on blockchain to eventually write a post about it.

2017-11-01: Tried to gather research to improve the VC FOF post. Edited and redrafted. Investigated other potential blog posts.

2017-10-26: Worked on Fund of Funds in VC Blog post

2017-10-25: Looked over summary and edited. Started working on a blog post on the role of fund of funds in venture capital by direction of Anne and Ed

2017-10-23: Worked on a summary document for the Houston Innovation District project, verbal summaries of data analysis

2017-10-19: Worked on Houston Innovation District more. All work is documented on the wiki page

2017-10-18: Worked on the Houston Innovation District project, figuring out what we've done and what still needs to be done. The McNair Center servers went down while I was working, so I lost a decent amount of the summary work I had been doing and had to start over once they were rebooted. Cleaned up the wiki page and summarized where we are so far: what data we have and where it is, what data we are currently collecting, and what data we still want or need. Began collecting information about tax codes and development incentives offered in Houston.

2017-10-11: Worked on prep for Houston Innovation District project

2017-10-06: Added more slides and edited. Updated wiki page with info.

2017-10-05: Spent quite a while trying to find the source of the patent data in the Augusta slides, then cleaned up and added to the Augusta slides that were unfinished or weak, and created a cybersecurity slide

2017-10-04: Continued working on Augusta project

2017-09-27: Worked on data analysis and research for Augusta Project, looked into Augusta business news (there isn't very much of it!)

2017-09-21: Continued preparation for Augusta Startup Ecosystem and Houston Innovation District Projects

2017-09-20: Preliminary preparations for Augusta and Houston Projects


Matthew Ringheanu

Matthew Ringheanu Work Logs (log page)

9/11/2017 2:00-5:00 pm

  • Spoke to Ed about the project going forward. Organized the current updated data for our project.

9/12/2017 3:00-5:00 pm

  • Began going through the Cleaned Cohort Data Excel file and found a few problems with it. Will continue the cleaning process for the rest of the week.

9/13/2017 2:00-5:00 pm

  • Sorted through Cleaned Cohort Data and finalized our List of Accelerators. We can begin the process of creating our PercentVC table.

9/14/2017 3:00-5:00 pm

  • Completely finalized our dataset of accelerators and startups. Met with Michelle Passo to discuss objectives of the research for credit course.

9/18/2017 2:00-4:00 pm

  • Talked with Peter about the LinkedIn crawler data. Went through VC page that Meghana sent me.

9/19/2017 3:00-5:00 pm

  • Completed SDC pull of updated VC Data.

9/20/2017 2:00-5:00 pm

  • Attempted several times to run the Matcher. Cleaned our pulled data.

9/21/2017 3:00-5:00 pm

  • Came extremely close to running the Matcher correctly. Reviewed the final LinkedIn data from Peter.

9/25/2017 2:00-5:00 pm

  • Finalized the matched file of accelerator companies with VC portfolio companies. Gave Ben the data on Georgia accelerators.

9/26/2017 3:00-5:00 pm

  • Worked on finding the duplicates in our Matched file in order to have the most accurate data.

9/27/2017 2:00-5:00 pm

  • Attempted to find a way to organize the duplicate matches.

9/28/2017 4:00-5:00 pm

  • Continued running through matched data in order to organize it effectively.

10/2/2017 2:00-5:00 pm

  • Talked to Ed about next steps for the project. Practiced accessing the Crunchbase database on SQL. Brushed up on SQL code.

10/3/2017 3:00-5:00 pm

  • Searched the database for crunchbase investment information.

10/4/2017 2:00-5:00 pm

  • Pulled the funding rounds table from SQL and matched it with the companies that have received VC funding in order to gather round dates.

10/6/2017 3:00-5:00 pm

  • Went through the matched data. Brainstormed ways to get the dates for cohort companies going through accelerators.

10/11/2017 2:00-3:30 pm

  • Looked into using the WhoIs Parser in order to find when the companies went through their accelerators.

10/12/2017 3:00-5:00 pm

  • Discovered that the Wayback Machine will not be a good option for finding when companies went through their accelerators. Created a list of VCCompanies and their earliest round date. Included a column for the date they went through their accelerators and will fill it in when we find a good method of finding this date.

10/16/2017 2:00-3:30 pm

  • Continued working on sorting VCCompanies by their earliest round date.

10/17/2017 3:00-5:00 pm

  • Worked with Ben to find a solution to our problem of data acquisition. Finalized earliest round date for VCCompanies.

10/18/2017 2:00-5:00 pm

  • Updated our VC data with Ed's help in order to increase the accuracy and completeness of our data.

10/19/2017 3:00-5:00 pm

  • Organized all of our matched data and updated it in order to reflect the most recent SDC pull with Ed. Matched Crunchbase data with our cohort companies.

10/20/2017 2:00-3:30 pm

  • Generated the new list of VCCompanies as well as their earliest round dates.

10/23/2017 2:00-3:30 pm

  • Worked on sorting out the discrepancies in our matched data.

10/24/2017 3:00-5:00 pm

  • Went through list of VCCompanies and began adding respective accelerators in order to proceed with VCPercentage table.

10/25/2017 2:00-5:00 pm

  • Continued going through list of VCCompanies and adding accelerators.

10/26/2017 3:30-5:30 pm

  • Continued going through list of VCCompanies and adding accelerators. Will have this completed on Monday.

10/30/2017 2:00-3:30 pm

  • Finished adding all of the accelerators to the list of VCCompanies. Added a column indicating whether or not the company went through two or more accelerators.

10/31/2017 3:00-5:00 pm

  • Began compiling data in the column for Date Company went through Accelerator.

11/1/2017 2:00-4:00 pm

  • Finalized entering dates for Y Combinator cohort companies.

11/2/2017 4:00-5:30 pm

  • Continued entering cohort company dates into Excel file.

11/6/2017 2:00-4:00 pm

  • Continued entering cohort company dates into Excel file. Began compiling a list of keywords for demo day press releases.

11/7/2017 3:00-5:00 pm

  • Finished coming up with keywords for demo day crawler. Sent the final list to Peter.

11/8/2017 2:00-3:30 pm

  • Spoke to Ed and organized all of our current data.

11/9/2017 3:00-5:00 pm

  • Created a new project page called Accelerator Data and listed all relevant files as well as descriptions.

11/14/2017 3:00-5:00 pm

  • Looked up URLs and decided whether or not each website was relevant.

11/15/2017 2:00-5:00 pm

  • Created SQL database entitled "acceleratordata" and began creating tables from folder of All Relevant Files.

11/16/2017 3:00-5:00 pm

  • Continued to input tables into SQL database.

11/20/2017 2:00-5:00 pm

  • Cleaned text files in order to import tables into SQL database.

11/27/2017 2:00-5:00 pm

  • Worked with Peter to find and exclude irrelevant keywords on HTML pages. Began categorizing relevant demo day pages.

11/28/2017 3:00-5:00 pm

  • Finished inputting tables of relevant files into SQL database.

11/29/2017 2:00-5:00 pm

  • Went through accelerator HTML URLs. Spoke with Ed about going through HTMLs and classifying based on overall and specific relevance.

12/1/2017 3:00-5:00 pm

  • Worked through accelerator links and classified pages based on whether or not they provided relevant information about startup timing.

12/4/2017 10:00 am-12:00 pm

  • Continued running through demo day crawl URLs and scoring them based on relevance.

12/7/2017 1:00-4:30 pm

  • Finalized scoring of demo day URLs for the original crawl. Last day of work for this semester.


Meghana Pannala

Meghana Pannala Work Logs (log page)

Technical

Harsh

Harsh Upadhyay Work Logs (log page)

Peter Jalbert

Peter Jalbert Work Logs (log page)

2017-12-21: Last minute adjustments to the Moroccan Data. Continued working on Selenium Documentation.

2017-12-20: Working on Selenium Documentation. Wrote 2 demo files. The wiki page is available here. Created 3 spreadsheets for the Moroccan data.

2017-12-19: Finished fixing the Demo Day Crawler. Changed and installed files as needed to make the LinkedIn crawler compatible with the RDP. Removed some of the bells and whistles.

2017-12-18: Continued finding errors with the Demo Day Crawler analysis. Rewrote the parser to remove any search terms that were in the top 10000 most common English words according to Google. Finished uploading and submitting Moroccan data.
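The parser change above (dropping any search term found in a list of the most common English words) amounts to a set-membership filter. This is a minimal sketch, assuming the word list is available as an iterable of lines (for example, an open text file with one word per line) and that matching is case-insensitive. The function name is hypothetical and not the parser's actual code.

```python
def drop_common_terms(terms, common_words):
    """Return the search terms that do not appear in the common-word list (case-insensitive)."""
    # Build a set once so each lookup is O(1); blank lines are ignored.
    common = {word.strip().lower() for word in common_words if word.strip()}
    return [term for term in terms if term.strip().lower() not in common]


print(drop_common_terms(["the", "Demo Day", "cohort", "About"], ["the\n", "about\n", "\n"]))
```

In practice the second argument could be an open file handle for the word list. Any file name used for it is an assumption.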

2017-12-15: Found errors with the Demo Day Crawler. Fixed scripts to download Moroccan Law Data.

2017-12-14: Uploading Morocco Parliament Written Questions. Creating a script for the next Morocco Parliament download. Began writing Selenium documentation. Continuing to download TIGER data.

2017-12-06: Running Morocco Parliament Written Questions script. Analyzing Demo Day Crawler results. Continued downloading for TIGER geocoder.

2017-11-28: Debugging Morocco Parliament Crawler. Running Demo Day Crawler for all accelerators and 10 pages per accelerator. TIGER geocoder is back to Forbidden Error.

2017-11-27: Rerunning Morocco Parliament Crawler. Fixed KeyTerms.py and running it again. Continued downloading for TIGER geocoder.

2017-11-20: Continued running Demo Day Page Parser. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.

2017-11-16: Continued running Demo Day Page Parser. Fixed KeyTerms.py and trying to run it again. Forbidden Error continues with the TIGER Geocoder. Began Image download for Image Classification on cohort pages. Clarifying specs for Morocco Parliament crawler.

2017-11-15: Continued running Demo Day Page Parser. Wrote a script to extract counts that were greater than 2 from Keyword Matcher. Continued downloading for TIGER Geocoder. Finished re-formatting work logs.
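Extracting counts greater than 2 from the Keyword Matcher output can be sketched as a simple threshold filter. This is a minimal sketch under a loud assumption: the Keyword Matcher's output format is not documented here, so the code supposes a CSV with hypothetical `keyword` and `count` columns.

```python
import csv
import io


def keywords_above(rows, threshold=2):
    """Keep (keyword, count) pairs whose count is strictly greater than threshold."""
    result = []
    for row in rows:
        count = int(row["count"])  # assumed column name
        if count > threshold:
            result.append((row["keyword"], count))  # assumed column name
    return result


# Hypothetical Keyword Matcher output with a header row.
sample = io.StringIO("keyword,count\ndemo day,5\ncohort,2\npitch,3\n")
print(keywords_above(csv.DictReader(sample)))
```

The comparison is strict (`>`), matching "greater than 2", so a keyword with a count of exactly 2 is dropped.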

2017-11-14: Continued running Demo Day Page Parser. Wrote an HTML to Text parser. See Parser Demo Day Page for file location. Continued downloading for TIGER Geocoder.
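An HTML-to-text parser like the one mentioned above can be built on Python's standard-library `html.parser`. This is a minimal sketch, not the file referenced on Parser Demo Day Page. It assumes the goal is visible text only, so it skips the contents of `script` and `style` elements and joins the remaining text fragments with single spaces. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._parts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


print(html_to_text("<html><body><p>Demo Day</p><script>x = 1</script><p>Cohort</p></body></html>"))
```

Joining fragments with spaces discards the original layout. That seems acceptable for keyword matching but would not preserve paragraphs.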

2017-11-13: Built Demo Day Page Parser.

2017-11-09: Running demo version of Demo Day crawler (Accelerator Google Crawler). Fixing work log format.

2017-11-07: Created file with 0s and 1s detailing whether crunchbase has the founder information for an accelerator. Details posted as a TODO on Accelerator Seed List page. Still waiting for feedback on the PostGIS installation from Tiger Geocoder. Continued working on Accelerator Google Crawler.

2017-11-06: Contacted the US Census Bureau's Geography Center and began an email exchange about the PostGIS installation problems. Began working on the Selenium Documentation. Also began working on an Accelerator Google Crawler that will be used with Yang and ML to find Demo Days for cohort companies.

2017-11-01: Attempted to continue downloading but ran into HTTP Forbidden errors. Listed the errors on the Tiger Geocoder Page.

2017-10-31: Began downloading blocks of data for individual states for the Tiger Geocoder project. Wrote the new wiki page for installation and began writing documentation on usage.

2017-10-30: With Ed's help, got the national data from Tiger installed onto a database server. The process required a lot of switching between users; everything we learned is outlined in the database server documentation under "Editing Users".

2017-10-25: Continued working on the Tiger Geocoder installation.

2017-10-24: Task: load some addresses into a database and run the address normalizer and geocoder on them. This may require installing additional software; details on the installation process can be found on the PostGIS Installation page.
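
The log does not show the queries used. A minimal sketch, assuming the PostGIS `postgis_tiger_geocoder` extension is installed with TIGER data loaded, of normalizing and geocoding a batch of addresses from Python. `normalize_address`, `geocode`, `pprint_addy`, `ST_X`, and `ST_Y` are PostGIS functions; the database name, user, and sample address in the `__main__` block are placeholders.

```python
from typing import Iterable, List, Optional, Tuple

# Assumes the postgis_tiger_geocoder extension is installed and TIGER data is loaded.
NORMALIZE_SQL = "SELECT pprint_addy(normalize_address(%s));"
# geocode(address, max_results) returns candidates; max_results=1 keeps only the best
# (lowest rating) match.
GEOCODE_SQL = (
    "SELECT g.rating, ST_X(g.geomout) AS lon, ST_Y(g.geomout) AS lat, "
    "pprint_addy(g.addy) AS matched "
    "FROM geocode(%s, 1) AS g;"
)


def geocode_addresses(cur, addresses: Iterable[str]) -> List[Tuple[str, str, Optional[tuple]]]:
    """For each address return (input, normalized address, best match row or None).

    `cur` is any DB-API cursor (for example from psycopg2) on the geocoder database.
    """
    results = []
    for address in addresses:
        cur.execute(NORMALIZE_SQL, (address,))
        normalized = cur.fetchone()[0]
        cur.execute(GEOCODE_SQL, (address,))
        best = cur.fetchone()  # None when the geocoder finds no match
        results.append((address, normalized, best))
    return results


if __name__ == "__main__":
    import psycopg2  # assumed driver; connection settings below are placeholders

    conn = psycopg2.connect(dbname="geocoder", user="postgres")  # hypothetical settings
    with conn, conn.cursor() as cur:
        for row in geocode_addresses(cur, ["6100 Main St, Houston, TX 77005"]):
            print(row)
```

Passing the cursor in rather than opening a connection inside the function keeps the geocoding logic separate from connection details, which differ between the database server and a local machine.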

2017-10-23: Finished Yelp crawler for Houston Innovation District Project.

2017-10-19: Continued work on Yelp crawler for Houston Innovation District Project.

2017-10-18: Continued work on Yelp crawler for Innovation District Project.

2017-10-17: Constructed ArcGIS maps for the agglomeration project. Finished maps of points for every year in the state of California. Finished maps of Route 128. Began working on a Selenium Yelp crawler to get cafe locations inside the 610 Loop.

2017-10-16: Assisted Harrison on the USITC project. Looked for natural language processing tools to extract complainants and defendants, along with their locations, from case files. Experimented with extracting entities based on part-of-speech tags, as well as using geotext or geograpy to pull locations from a case segment.

2017-10-13: Updated various project wiki pages.

2017-10-12: Continued work on Patent Thicket project, awaiting further project specs.

2017-10-05: Emergency ArcGIS creation for Agglomeration project.

2017-10-04: Emergency ArcGIS creation for Agglomeration project.

2017-10-02: Worked on ArcGIS data. See Harrison's Work Log for the details.

2017-09-28: Added collaborative editing feature to PyCharm.

2017-09-27: Worked on big database file.

2017-09-25: New task: create a text file with company, description, and company type.

  1. VC Database Rebuild
  2. psql vcdb2
  3. Table name: sdccompanybasecore2
  4. Combine with Crunchbasebulk

  TODO: Write wiki on LinkedIn crawler, write wiki on creating accounts.
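
Only the outline of this task is recorded. A minimal sketch of the combine-and-write step, assuming rows have already been pulled from `sdccompanybasecore2` (for example with psql) and from Crunchbasebulk as dictionaries. Every column name below is a hypothetical placeholder, and matching on a normalized company name is one simple choice, not necessarily the one used.

```python
import csv
from typing import Dict, Iterable, List


def normalize_name(name: str) -> str:
    """Crude matching key: lowercase, punctuation turned into spaces, spaces collapsed."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def combine(vc_rows: Iterable[Dict[str, str]], cb_rows: Iterable[Dict[str, str]]) -> List[List[str]]:
    """Join VC companies to Crunchbase rows by name; VC rows without a match are kept.

    Column names ('coname', 'description', 'company_type', 'name',
    'short_description') are hypothetical placeholders.
    """
    cb_by_name = {normalize_name(r["name"]): r for r in cb_rows}
    combined = []
    for row in vc_rows:
        match = cb_by_name.get(normalize_name(row["coname"]), {})
        # Prefer the VC description; fall back to Crunchbase when it is empty.
        description = row.get("description") or match.get("short_description", "")
        combined.append([row["coname"], description, row.get("company_type", "")])
    return combined


def write_tsv(path: str, rows: List[List[str]]) -> None:
    """Write company, description, and company type as a tab-delimited text file."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["company", "description", "company_type"])
        writer.writerows(rows)
```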

2017-09-21: Wrote wiki on LinkedIn crawler, met with Laura about patents project.

2017-09-20: Finished running LinkedIn crawler. Transferred data to RDP. Will write wikis next.

2017-09-19: Began running LinkedIn crawler. Helped Yang create an RDP account, get permissions, and set up the wiki.

2017-09-18: Finished implementation of Experience Crawler, continued working on Education Crawler for LinkedIn.

2017-09-14: Continued implementing LinkedIn Crawler for profiles.

2017-09-13: Implemented LinkedIn Crawler for main portion of profiles. Began working on crawling Experience section of profiles.

2017-09-12: Continued working on the LinkedIn Crawler for Accelerator Founders Data. Added to the wiki on this topic.

2017-09-11: Continued working on the LinkedIn Crawler for Accelerator Founders Data.

2017-09-06: Combined the founder data retrieved from the Crunchbase API with the crunchbasebulk data to get LinkedIn URLs for accelerator founders.
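
A minimal sketch of the matching step, assuming founder names come back from the Crunchbase API as (accelerator, first name, last name) tuples and the crunchbasebulk people rows carry `first_name`, `last_name`, and `linkedin_url` fields (assumed names, not confirmed by the log):

```python
from typing import Dict, Iterable, List, Tuple


def founder_linkedin_urls(
    founders: Iterable[Tuple[str, str, str]],
    people_rows: Iterable[Dict[str, str]],
) -> Tuple[List[Tuple[str, str, str]], List[Tuple[str, str]]]:
    """Match (accelerator, first, last) founders to LinkedIn URLs.

    people_rows stands in for rows from a crunchbasebulk people table; the keys
    'first_name', 'last_name', and 'linkedin_url' are assumptions.
    Returns (matched rows, unmatched founders).
    """
    lookup = {}
    for person in people_rows:
        url = (person.get("linkedin_url") or "").strip()
        if url:
            key = (person["first_name"].strip().lower(), person["last_name"].strip().lower())
            lookup.setdefault(key, url)  # keep the first URL seen for a duplicate name
    matched, unmatched = [], []
    for accelerator, first, last in founders:
        url = lookup.get((first.strip().lower(), last.strip().lower()))
        if url:
            matched.append((accelerator, f"{first} {last}", url))
        else:
            unmatched.append((accelerator, f"{first} {last}"))
    return matched, unmatched
```

Matching on name alone can attach the wrong URL when two people share a name, so the unmatched list and any common names are worth checking by hand.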

2017-09-05: Post-Harvey. Finished retrieving founder names from the Crunchbase API. The next step is to query the crunchbasebulk database to get LinkedIn URLs.

2017-08-24: Began using the Crunchbase API to retrieve founder information for accelerators. Halfway through compiling a dictionary that maps accelerator names to proper Crunchbase API URLs.
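
A minimal sketch of such a dictionary, assuming Crunchbase organization permalinks are usually the lowercased name with runs of non-alphanumeric characters replaced by hyphens. `API_BASE` is an assumed v3.1 endpoint pattern that should be checked against the current Crunchbase API documentation; names whose permalink differs go in the hand-maintained `overrides` map.

```python
import re
from typing import Dict, Iterable, Optional

# Assumed URL pattern for organization lookups; verify against the Crunchbase API docs.
API_BASE = "https://api.crunchbase.com/v3.1/organizations/"


def to_permalink(name: str) -> str:
    """Guess a Crunchbase-style permalink: lowercase words joined by hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")


def build_url_map(names: Iterable[str], overrides: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Map accelerator names to API URLs; overrides fix names whose permalink differs."""
    overrides = overrides or {}
    return {n: API_BASE + overrides.get(n, to_permalink(n)) for n in names}
```

Generating the guesses automatically and correcting only the misses in `overrides` keeps the hand-edited part of the dictionary small.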

2017-08-23: Decided with Ed to abandon LinkedIn crawling for accelerator founder data and use Crunchbase instead. Spent the day navigating the crunchbasebulk database to see what useful information it contains.

2017-08-22: Discovered that LinkedIn profiles cannot be viewed through LinkedIn if the target is a 3rd-degree connection or further. However, when a profile is reached through a Google search, it can still be viewed if the user has previously logged into LinkedIn. Devising a workaround crawler that uses Google search. Continued the blog post under Section 4.
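
A minimal sketch of how the search-based workaround could build its queries, using Google's `site:` operator to restrict results to public profile pages under `linkedin.com/in`. The function name and query shape are illustrative; the crawler would still need a logged-in browser session to open the resulting profiles.

```python
from urllib.parse import urlencode


def linkedin_search_url(full_name: str, company: str = "") -> str:
    """Build a Google search URL restricted to LinkedIn profile pages."""
    terms = f'site:linkedin.com/in "{full_name}"'
    if company:
        terms += f' "{company}"'  # narrow results to people tied to this organization
    return "https://www.google.com/search?" + urlencode({"q": terms})
```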

2017-08-21: Began work on extracting founders for accelerators through a LinkedIn Crawler. Discovered that Python3 is not installed on RDP, so the virtual environment for the project cannot be started. Continued working on the Ubuntu machine.


Harrison Brown

Harrison Brown Work Logs (log page)

2017-11-29:

  • Got the tab-delimited text files written for USITC data. Added detail to project page.

2017-11-29:

  • Finishing up converting JSON to tab-delimited text; see USITC/JSON_scraping_python. Worked on creating images with ArcGIS.
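
A minimal sketch of the JSON-to-tab-delimited step, assuming the input is a JSON array of flat objects (the actual layout of the USITC JSON is not recorded here); the code in USITC/JSON_scraping_python may differ.

```python
import csv
import io
import json
from typing import List, Optional


def json_to_tsv(json_text: str, fields: Optional[List[str]] = None) -> str:
    """Convert a JSON array of objects into tab-delimited text with a header row."""
    records = json.loads(json_text)
    if fields is None:
        # Union of keys in first-seen order, so records with extra keys still fit.
        fields = []
        for rec in records:
            for key in rec:
                if key not in fields:
                    fields.append(key)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t", restval="",
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for rec in records:
        # Tabs or newlines inside a value would break the row layout, so collapse
        # whitespace runs to single spaces; JSON null becomes an empty cell.
        writer.writerow({k: "" if v is None else " ".join(str(v).split())
                         for k, v in rec.items()})
    return out.getvalue()
```

For a file, pass `open(path, encoding="utf-8").read()`. Nested objects would be written as their Python representation and may need flattening first.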

2017-11-13:

  • Worked on getting JSON to tab-delimited text

2017-11-01:

  • Looked at Oliver's code. Got a git repository set up for the project on Bonobo. Started experimenting with reading the XML documents in Java.

2017-10-30:

  • Worked on seeing what data can be gathered from the CSV and XML files. Started a page for the project.

2017-10-26:

  • Met with Ed to talk about the direction of the project. Starting to work on extracting information from the XML files. Working on adding documentation to wiki and work log. Looking into work from other projects that may use XML.
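
Harrison's experiments were in Java; for consistency with the other examples on this page, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The tag names are hypothetical, since the USITC XML schema is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def extract_records(xml_text: str, record_tag: str, fields: List[str]) -> List[Dict[str, str]]:
    """Pull the text of selected child elements from every <record_tag> element.

    The tag names passed in are placeholders; the real USITC XML schema may differ.
    Missing children come back as empty strings.
    """
    root = ET.fromstring(xml_text)
    records = []
    for elem in root.iter(record_tag):
        records.append({f: (elem.findtext(f) or "").strip() for f in fields})
    return records
```

If the files declare an XML namespace, tags must be written as `{namespace-uri}tag` for `iter` and `findtext` to find them.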

2017-10-25:

  • Found information about a USITC database that we could use. Added this information to the wiki, and updated information on USITC wiki page.

2017-10-19:

  • Continued to look into NLTK. Talked with Ed about looking into alternative approaches to gathering this data.

2017-10-18:

  • Trying to figure out the best way to extract respondents from the documents. Right now, using NLTK exclusively will not get us any more accuracy than using regular expressions. Currently neither lets us match every entity correctly, so trying to figure out alternate approaches.

2017-10-16:

  • NLTK
    • NLTK Information
      • Need to convert text to ASCII; my PDF texts caused issues until I converted them.
      • The sent_tokenize() function splits a document into sentences, which is easier than using regular expressions.
      • Use pos_tag() to tag the sentences; the tags can be used to extract proper nouns.
        • Trying to figure out how to use this to grab location data from these documents.
      • Worked with Peter to try to extract geographic information from the documents. We looked into the tools Geograpy and GeoText. Geograpy does not have the functionality we want. GeoText looks better, but we have dependency issues; will try to resolve these next time.
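
A minimal sketch of two of the steps above: folding text to ASCII with `unicodedata`, and grouping consecutive proper-noun tokens from `pos_tag()` output (NLTK's default tagger uses Penn Treebank tags, where `NNP`/`NNPS` mark proper nouns). The helper names are illustrative, and the `__main__` block assumes the NLTK tokenizer and tagger models have already been downloaded.

```python
import unicodedata
from typing import List, Tuple

PROPER_NOUN_TAGS = {"NNP", "NNPS"}  # Penn Treebank proper-noun tags


def to_ascii(text: str) -> str:
    """Fold accented characters to ASCII and drop anything that cannot be folded."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


def proper_noun_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Join runs of consecutive proper-noun tokens into phrases."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in PROPER_NOUN_TAGS:
            current.append(word)
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


if __name__ == "__main__":
    import nltk  # requires the tokenizer and tagger models from nltk.download()

    text = to_ascii("The respondent Acme Widgets Inc. is located in São Paulo, Brazil.")
    for sentence in nltk.sent_tokenize(text):
        print(proper_noun_phrases(nltk.pos_tag(nltk.word_tokenize(sentence))))
```

Grouping adjacent proper nouns turns multi-word company or city names into single candidates, which can then be checked against a location list such as the one GeoText provides.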

2017-10-11:

  • Started to use NLTK library for gathering information to extract respondents. See code in Projects/USITC/ProcessingTexts

2017-10-05:

  • Made images of the requested maps in ArcGIS with Peter and Jeemin.
    • To access: go to E:\McNair\Projects\Agglomeration\HarrisonPeterWorkArcGIS; the images are in that folder.
    • To generate the images, open ArcMap with the beginMapArc file.
    • To export a PNG, click File, then Export.
    • To adjust the data, right-click the table name in the Layers tab, choose Properties, then open the Query Builder.

2017-10-04:

  • Worked with Peter on connecting ArcGIS to the database and displaying different points in ArcGIS

2017-10-02:

  • Started work with ArcGIS. Got the data with startups from Houston into the ArcGIS application. For notes, see McNair/Projects/Agglomeration.

2017-09-28:

  • Helped Christy with set up on Postgres server. Looked through text documents to see what information I could gather. Looked at Stanford NLTK library for extracting the respondents from the documents.

2017-09-28:

  • Got the PDFs parsed to text. Some of the formatting is off; will need to determine whether the data can still be gathered.

2017-09-25:

  • Got 3000 PDFs downloaded; the script works. Completed a task to get emails for people who had written papers about economics and entrepreneurship. Started work on parsing the PDFs to text.

2017-09-20:

  • The shell program did not work. Created a Python program that catches the expected exceptions (URL does not exist, lost connection, and improperly formatted URL); hopefully it will complete with no problems. This program is in the USITC folder on the database server.
  • Connected to the database server and mounted the drive on my computer. Got the list of all the PDFs on the website and started a shell script on the database server to download all of them. Will leave it running overnight; hopefully it completes by tomorrow.
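
A minimal sketch of a download function that reports the three failure modes listed above instead of crashing, using only the standard library. The function name, timeout, and return format are assumptions, not the program on the database server.

```python
import socket
import urllib.error
import urllib.request
from typing import Tuple


def download_pdf(url: str, dest: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Download url to dest; return (ok, message) instead of raising.

    Covers missing pages (HTTP errors such as 404 or 403), dropped connections,
    and malformed URLs, so one bad link does not stop a long batch run.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
            out.write(resp.read())
        return True, "ok"
    except urllib.error.HTTPError as err:  # server answered with an error status
        return False, f"HTTP {err.code}"
    except urllib.error.URLError as err:  # DNS failure, refused connection, etc.
        return False, f"connection problem: {err.reason}"
    except (socket.timeout, ConnectionError) as err:  # timeout or reset mid-transfer
        return False, f"connection lost: {err}"
    except ValueError as err:  # e.g. "unknown url type" for a malformed URL
        return False, f"malformed URL: {err}"
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first; logging the returned message for each URL gives a list of failures to retry later.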

2017-09-17:

  • Added features to the Python program to pull the dates in numerical form. Worked on pulling the PDFs from the website, currently in Python. The program can pull PDFs on my local machine but does not work on the Remote Desktop; will work on this next time.
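
A minimal sketch of converting textual dates to numeric form with `datetime.strptime`. Which date formats the USITC pages actually use is an assumption, so `DATE_FORMATS` lists a few plausible ones.

```python
from datetime import datetime
from typing import Optional

# Formats tried in order; which ones the USITC pages actually use is an assumption.
DATE_FORMATS = ("%B %d, %Y", "%b %d, %Y", "%m/%d/%Y")


def to_numeric_date(text: str) -> Optional[str]:
    """Convert a date like 'September 14, 2017' to '2017-09-14'; None if unrecognized."""
    cleaned = " ".join(text.split())  # normalize stray spaces and line breaks
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None
```

ISO-style `YYYY-MM-DD` strings sort chronologically as plain text, which is convenient in a CSV.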

2017-09-14:

  • Have a Python program that can scrape the entire webpage and navigate through all of the pages that contain Section 337 documents; see the USITC project page for these files and more information. It pulls all of the information in the HTML that can be gathered for each case. The PDFs still need to be scraped; will start work on that next time. Generated a CSV file with more than 4000 entries from the webpage. There is a small edge case to fix where an entry does not contain the Investigation No.
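
A minimal sketch of the table-to-CSV step using the standard-library `html.parser`, with a missing Investigation No. written as an empty cell. The class and function names are illustrative, and the real USITC page structure may need more handling (for example, pagination or nested tables).

```python
import csv
import io
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None
        self._cell: Optional[List[str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is one clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html: str) -> str:
    """Convert an HTML table (first row = headers) to CSV text.

    Empty cells, or cells missing from the end of a short row (for example a
    notice with no Investigation No.), are written as ''.
    """
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return ""
    header, *body = parser.rows
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    for cells in body:
        # zip pairs cells with headers left to right; a row that skips a middle
        # column would still be misaligned and needs site-specific handling.
        writer.writerow(dict(zip(header, cells)))
    return out.getvalue()
```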

2017-09-13:

  • Worked on parsing the USITC website Section 337 Notices. Nearly have all of the data I can scrape. The scraper works, but there are a few edge cases where information in the tables is part of a Notice but does not have an Investigation Number. Will hopefully finish this next time.
  • Also added my USITC project to the projects page; I did not have it linked.

2017-09-11: Met with Dr. Egan and was assigned a project. Set up the USITC project page and started coding the web crawler in Python. See McNair/Projects/UISTC for project notes and code.

2017-09-07: Set up work log pages, Slack, and Microsoft Remote Desktop.


Administrative

Su Chen Teh

Su Chen Teh Work Logs (log page)

Juliette Richert

Juliette Richert Work Logs (log page)

Archive

This is the work log for archived members.