Research Plans
Ravali Kruthiventi
Project - USPTO Assignees, Patent and Citation Data
Assignees Data
- Data source: patent database (merged data from patent_2015 and patentdata databases)
- Issues: the citation data contains non-numeric patent numbers (likely application numbers, etc.)
- Solution:
- Segregate into smaller tables so that Amir and Marcela can identify patterns
- Link these back to the appropriate patent numbers in the patent table
- Time to implement: 1 day
- Priority:
- Teams waiting for it:
- Marcela and Amir
- Project : Patent data analysis
- Jake and James, potentially could need this down the line
- Project : LBO data
- Marcela and Amir
- Deadline:
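As a rough illustration of the "segregate into smaller tables" step above, the sketch below buckets citation numbers by their apparent format. The pattern names and regular expressions are assumptions made for illustration; the real categories would come from inspecting the citation data.

```python
import re
from collections import defaultdict

# Hypothetical format patterns (assumptions, not derived from the actual data).
PATTERNS = [
    ("numeric", re.compile(r"^\d+$")),                      # plain patent number
    ("letter_prefix", re.compile(r"^[A-Z]{1,2}\d+$")),      # letters then digits, e.g. D123456
    ("slash_format", re.compile(r"^\d{2}/\d{3},?\d{3}$")),  # looks like an application serial number
]

def classify(number):
    """Return the label of the first pattern that matches, else 'other'."""
    cleaned = number.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.match(cleaned):
            return label
    return "other"

def segregate(numbers):
    """Group the raw citation numbers into one bucket per detected format."""
    buckets = defaultdict(list)
    for number in numbers:
        buckets[classify(number)].append(number)
    return dict(buckets)
```

Each bucket could then be written to its own table for review, and only the buckets with a recognizable format linked back to the patent table.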
- Data Source: USPTO Bulk Data repository
- Issues:
- The script inserts copies of data into the tables.
- Analysis required on the data to make sure the data was inserted correctly from the XML files.
- Analysis is also required to determine whether this data is better than the data we have in the patent database right now.
- Action owners : Amir and Marcela
- Solution:
- Amir and Marcela and/or I need to look at the data to determine quality
- If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate this data into our data model for patent data.
- Amir and Marcela and/or I will need to delete the copies
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
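One possible way to find and delete the copies mentioned above is sketched here with Python's built-in sqlite3 module. The table name `assignment` and its columns are hypothetical, and the production database is probably not SQLite, so the `rowid` trick would need an equivalent (for example a primary key column) there.

```python
import sqlite3

def find_and_remove_copies(conn, table, columns):
    """Report rows that appear more than once, then keep one row per group.

    `table` and `columns` are trusted identifiers supplied by the analyst;
    they are interpolated directly into the SQL text.
    """
    cols = ", ".join(columns)
    dupes = conn.execute(
        f"SELECT {cols}, COUNT(*) FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    ).fetchall()
    # Keep the earliest-inserted row of each group and delete the rest.
    conn.execute(
        f"DELETE FROM {table} WHERE rowid NOT IN "
        f"(SELECT MIN(rowid) FROM {table} GROUP BY {cols})"
    )
    conn.commit()
    return dupes

# Small in-memory demonstration with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assignment (patent_number TEXT, assignee TEXT)")
conn.executemany(
    "INSERT INTO assignment VALUES (?, ?)",
    [("7654321", "Acme"), ("7654321", "Acme"), ("8000000", "Globex")],
)
dupes = find_and_remove_copies(conn, "assignment", ["patent_number", "assignee"])
remaining = conn.execute("SELECT COUNT(*) FROM assignment").fetchone()[0]
```

Reviewing the returned duplicate groups before running the delete on real data would help confirm that the copies really are exact repeats.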
Project - Lex Machina Data
- Data Source:
- Issues:
- Solution:
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
Project - Pattern Recognition on Patent Data through Machine Learning
- Data Source: The patent database.
- Plan:
- Technique
- Determine research question to be asked
- Scrub data
- Determine the 3-4 data mining/machine learning techniques that best extract patterns
- Train the algorithms
- Run the algos on sample dataset
- Determine the algo with best results
- Implement the
- Known Issues:
- Dataset to be cleaned, quality analyzed as specified above.
- Deliverables
- Set of patterns to base further research on
- Research paper (?)
- Documentation - Wiki page
- Time to implement:
- Priority:
- Teams waiting for it: None
- Deadline:
Dylan Dickens
Ben Baldazo
Ben Baldazo Research Plans (Plan Page)
Startups of Houston Interactive Maps - The Whole Process
Use Google Maps to find Longitude and Latitude
- Document how to work Geocode.py and what might go wrong
Put through R code to make an interactive map
- Finding and Documenting the processes required to run the R code may be necessary
- Works on Carto and looks really cool
- We will eventually need a plug-in and a Carto account so that we can post this on the Wiki
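For the geocoding step above, a minimal sketch of how an address might be turned into latitude and longitude with the Google Geocoding web service follows. This is not the contents of Geocode.py; the function names are hypothetical, and the API key is a placeholder the user would have to supply.

```python
import json
from urllib.parse import urlencode

# Public endpoint of the Google Geocoding web service (JSON output).
GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def build_geocode_url(address, api_key):
    """Build the request URL for one address (api_key is a placeholder)."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})

def extract_lat_lng(response_text):
    """Return (lat, lng) from a geocoding response, or None on failure.

    A status other than "OK" (e.g. "ZERO_RESULTS") is one of the things
    that might go wrong and is worth logging in practice.
    """
    payload = json.loads(response_text)
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return (location["lat"], location["lng"])
```

The URL would be fetched with any HTTP client (for example `urllib.request.urlopen`), and the resulting coordinates saved alongside each startup for the mapping step.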
Accelerator Quality Issue Brief
Houston Accelerators (issue brief)
Factors to look at
- Value Added
- How to look at this though?
- Market vs Non-Market
- Philanthropic funding?
- If Non-Profit: Propublica will document contributions
- If for profit: call?
- Founded Bottom up? or Top down?
- See if it was founded by a group or individual with actual industry connections (online or phone)
- Location
- Proximity to resources
- We may have to update the startups in Houston map for this
- Available Resources (we should generally be able to call or find this on website, these should be things they brag about)
- Flex Space
- Events
- Co-Working
- Connections/Mentorship
- This can also have a judged value based on the perceived experience of the mentors/connections
- Funding (This may be hard but if they offer their own VC we can check through SDC)
- Userbase
- Leadership/Experience qualifications (Linkedin, profiles on their own websites, other bios)
- Have they driven a startup before, or only been in the back seat?
- What other qualifications do they have
- Criteria from the Acc Rankings (as long as we have portfolios then we can use SDC for this)
- VC funding history
- IPO
- Acquired
- Any other reviews possible
- Other articles (Xconomy, Houston Chronicle, etc.)
- Info from actually calling the accelerators (putting a list of questions up on the discussion page for Houston Accelerators (issue brief))
- Perhaps reviews from the startups themselves
- Could look specifically into startups that have gone through multiple accelerators; hopefully we have phone numbers in File:HSM10.xlsx
Jake Silberman
Jake Silberman Research Plans (Plan Page)
Leveraged Buyout Innovation (Academic Paper)
- Finalize Hazard Model
- Determine best regression model (Cox or something else that makes more assumptions)
- Determine finalized variable set
- Predict based on model
- Match LBO and non LBO companies based on hazard model predictions
- Generate buckets, i.e. break down by industry, decade, etc...
- Determine metric for matching
- Integrate new patent data
- Create stocks of patents
- Add in patent assignment data
- Analysis of control group and study group for first results
- Refine matching if necessary
- Test for endogeneity/other issues
- Lit review for variables
- Revise preexisting regression variable write-up and reformat it to appropriate academic paper form
- Do final, correct pull of SDC data (just include IPOs)
- Clean data: throw out duplicate names, keeping only the most recently invested one
- Rank cities by venture capital on different metrics, either in SQL or Excel
- Write up issue brief
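The cleaning and ranking steps above could be prototyped in Python before committing to SQL or Excel. The sketch below is illustrative only: the field names (`company`, `city`, `date`, `amount`) are assumptions, not the actual SDC column names, and total dollars is just one of the possible ranking metrics.

```python
from collections import defaultdict
from datetime import date

def keep_most_recent(rows):
    """For each company name, keep only the row with the latest date."""
    latest = {}
    for row in rows:
        key = row["company"].strip().lower()  # treat case/whitespace variants as one name
        if key not in latest or row["date"] > latest[key]["date"]:
            latest[key] = row
    return list(latest.values())

def rank_cities(rows):
    """Rank cities by total invested amount, largest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Other metrics (deal counts, per-capita amounts) would only require changing what `rank_cities` accumulates.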
Shoeb Mohammed
Shoeb Mohammed Research Plans Research Plan page
Short Term
- Create a listing on the wiki for all software developed at McNair center. - Completed
- Build a Linux box to run the crawler. - Completed
Long Term
- Optimize/re-design the 'Matcher' software. In particular, speed-up fuzzy matching and possibly re-structure the code to make usage easier.
- Develop the crawler. Try to begin with code that Dan has. - Completed
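One common way to speed up fuzzy matching, as called for in the 'Matcher' item above, is blocking: only names that share a cheap key are compared with the expensive similarity function. The sketch below uses the standard library's `difflib.SequenceMatcher` and a first-letter key; both are assumptions chosen for illustration and are not how the existing Matcher software works.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_key(name):
    """Cheap blocking key: the first character of the normalized name."""
    return name.strip().lower()[:1]

def fuzzy_match(left, right, threshold=0.85):
    """Map each name in `left` to its best match in `right`, if any.

    Only candidates in the same block are scored, which avoids the
    full left-by-right comparison.
    """
    blocks = defaultdict(list)
    for name in right:
        blocks[block_key(name)].append(name)

    matches = {}
    for name in left:
        best, best_score = None, threshold
        for candidate in blocks.get(block_key(name), []):
            score = SequenceMatcher(None, name.lower(), candidate.lower()).ratio()
            if score >= best_score:
                best, best_score = candidate, score
        if best is not None:
            matches[name] = best
    return matches
```

The trade-off is that names whose first characters differ (for example because of a leading "The") are never compared, so the blocking key would need tuning against real data.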
Side Tasks
- If possible, redo the patent parser (previously coded by Kranti) to also pull in Patent Citation data.
Veeral Shah
James Chen
James Chen Research Plans (Plan Page)
- Short term:
- Refine variables to include in hazard rate model
- Industrygroup
- Log or ratio for tax, ebitda, etc.
- Long term:
- Finish hazard model
- Complete hazard rate matching
- Test for endogeneity, variables list
- Incorporate new patent data (stock, transfers, etc.)
- Complete literature review using final variable list
Ariel Sun
Ariel Sun Research Plans (Plan Page)
- Hubs
- Get scorecard system completed Hubs: Mechanical Turk
- Mechanical Turk for potential hubs Mechanical Turk (Tool)
- Matched identified hubs to CMSAs
- VC table
- Waiting for patent data to be fixed to join to VC table
- Import VC data to STATA
- Hazard rate model
- Diff-in-diff
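For reference, the textbook two-group, two-period difference-in-differences specification is written out below. It is given only as a generic starting point; the outcome, treatment definition, and controls for this project are not specified here, so every symbol is an assumption.

```latex
% Generic difference-in-differences regression (symbols are illustrative):
%   Y_{it}      outcome for unit i in period t
%   Treat_i     1 if unit i is in the treated group, 0 otherwise
%   Post_t      1 if period t is after the treatment, 0 otherwise
%   \delta      the difference-in-differences estimate
Y_{it} = \alpha + \beta \, \mathrm{Treat}_i + \gamma \, \mathrm{Post}_t
       + \delta \, (\mathrm{Treat}_i \times \mathrm{Post}_t) + \varepsilon_{it}
```

Under the parallel-trends assumption, the coefficient on the interaction term captures the treatment effect.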
Todd Rachowin
Todd Rachowin Research Plans (Plan Page)
- Short-Term = Hubs List (Hubs: Mechanical Turk)
- Creating a comprehensive list of potential hubs
- Determining the best variables for the scorecard
- Building "filters" for automating the collection
- Running and auditing of the automation
- Collecting the remaining manual data
- Long-Term = Everything Else
- Hazard Rate Model (determine proper one, run it, etc.)
- Diff-in-diff
Gunny Liu
Gunny Liu Research Plans (Plan Page)
Week VII
7/11 thru 7/15
- Finalize Twitter Webcrawler version Alpha, discuss roadmap ahead with research fellows
- Expand semantic mediawiki capabilities on our wiki and provide documentation for existing data structures
- Configure the transfer of startup data from local storage to the wiki, with Ben
Week VIII
7/18 thru 7/22
- Alpha Exploration & development of existing Google Maps API script
- Advanced development of Twitter Webcrawler to populate McNair databases
- Input: previously documented mothernodes and entrepreneurship buzzwords
- Advance development of Eventbrite Webcrawler to populate McNair databases
- To integrate with Google Maps API to provide updated mapping of active entrepreneurship events in Houston
Week IX
7/25 thru 7/29
- Alpha Exploration & development of Techcrunch API
- Alpha Exploration & development of Facebook API
Week X
8/1 thru 8/5
- Advanced development of all API scripts to populate McNair databases
Week XI
8/8 thru 8/12
- Last day of summer internship: 8/8