Research Plans
Revision as of 12:20, 15 July 2016
==Ravali Kruthiventi==
Project - USPTO Assignees, Patent and Citation Data
Assignees Data
- Data source: patent database (merged data from patent_2015 and patentdata databases)
- Issues: citations data contains non-numeric patent numbers (likely application numbers, etc.)
- Solution:
- Segregate into smaller tables so that Amir and Marcela can identify patterns
- link back to appropriate patent numbers from the patent table
- Time to implement: 1 day
- Priority:
- Teams waiting for it:
- Marcela and Amir
- Project: Patent data analysis
- Jake and James, who could potentially need this down the line
- Project: LBO data
- Deadline:
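A minimal sketch of the segregation step above, assuming the citations arrive as raw strings and that clean utility patent numbers are purely numeric (the function name and regex are illustrative, not existing code):

```python
import re

# Utility patent numbers are purely numeric; anything else (design numbers
# like "D345678", or application numbers like "29/123456") is segregated
# into a suspect pile for Amir and Marcela to review for patterns.
NUMERIC = re.compile(r"^\d+$")

def segregate_citations(cited_numbers):
    """Split citation strings into clean patent numbers and suspect entries."""
    clean, suspect = [], []
    for raw in cited_numbers:
        token = raw.strip()
        (clean if NUMERIC.match(token) else suspect).append(token)
    return clean, suspect
```

The suspect table can then be linked back to the patent table once the patterns are identified.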
- Data Source: USPTO Bulk Data repository
- Issues:
- The script inserts duplicate copies of data into the tables.
- Analysis is required to confirm the data was inserted correctly from the XML files.
- Analysis is also required to determine whether this data is better than the data currently in the patent database.
- Action owners: Amir and Marcela
- Solution:
- Amir and Marcela and/or I need to look at the data to determine quality
- If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate this data into our data model for patent data.
- Amir and Marcela and/or I will need to delete the copies
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
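The duplicate-copy cleanup could start with a sketch like the following, which deduplicates rows in memory while preserving order; in practice the delete would run server-side in SQL (everything here is a hypothetical illustration, not the existing insertion script):

```python
def drop_duplicate_rows(rows):
    """Return rows with exact duplicate copies removed, keeping the first occurrence.

    `rows` is an iterable of tuples as pulled from the patent tables.
    """
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```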
Project - Lex Machina Data
- Data Source:
- Issues:
- Solution:
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
Project - Pattern Recognition on Patent Data through Machine Learning
- Data Source: The patent database.
- Plan:
- Technique:
- Determine research question to be asked
- Scrub data
- Determine 3-4 mining/machine learning techniques best suited to extracting patterns
- Train the algorithms
- Run the algorithms on a sample dataset
- Determine the algorithm with the best results
- Implement the chosen technique
- Known Issues:
- Dataset must be cleaned and its quality analyzed, as specified above.
- Deliverables
- Set of patterns to base further research on
- Research paper (?)
- Documentation - Wiki page
- Time to implement:
- Priority:
- Teams waiting for it: None
- Deadline:
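The train/run/compare steps in the plan above can be sketched as a simple model-selection loop. The helper below is a stdlib-only illustration, assuming each candidate technique exposes fit/predict; the actual 3-4 techniques are still to be chosen:

```python
import random

def best_technique(models, data, labels, holdout=0.25, seed=0):
    """Train each candidate model on a sample split and keep the most accurate.

    `models` maps a name to an object with fit(X, y) and predict(X); these
    stand in for the mining/machine-learning techniques to be compared.
    """
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    cut = int(len(idx) * (1 - holdout))
    train, test = idx[:cut], idx[cut:]
    scores = {}
    for name, model in models.items():
        model.fit([data[i] for i in train], [labels[i] for i in train])
        preds = model.predict([data[i] for i in test])
        scores[name] = sum(p == labels[i] for p, i in zip(preds, test)) / len(test)
    return max(scores, key=scores.get), scores
```

The winner is then the technique to implement on the full patent dataset.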
==Dylan Dickens==
==Ben Baldazo==
Ben Baldazo Research Plans (Plan Page)
Startups of Houston Interactive Maps - The Whole Process
Use Google Maps to find longitude and latitude
- Document how to work Geocode.py and what might go wrong
Put through R code to make an interactive map
- Finding and Documenting the processes required to run the R code may be necessary
- Works on Carto and looks really cool
- We will eventually need a plug-in and a Carto account so that we can post this on the Wiki
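For the geocoding step, here is a sketch of the response-parsing half of the work (the HTTP request itself needs an API key and quota handling; this assumes Google's Geocoding API JSON layout and is not the actual Geocode.py code):

```python
import json

def extract_lat_lng(response_text):
    """Pull (lat, lng) pairs out of a Google Geocoding API JSON response."""
    payload = json.loads(response_text)
    if payload.get("status") != "OK":
        return []  # e.g. ZERO_RESULTS or OVER_QUERY_LIMIT
    return [
        (r["geometry"]["location"]["lat"], r["geometry"]["location"]["lng"])
        for r in payload["results"]
    ]
```

Checking the `status` field before parsing is one of the "what might go wrong" cases worth documenting.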
Accelerator Quality Issue Brief
Houston Accelerators (issue brief)
Factors to look at
- Value Added
- How to look at this though?
- Market vs Non-Market
- Philanthropic funding?
- If non-profit: ProPublica will document contributions
- If for-profit: call?
- Founded bottom-up or top-down?
- See if it was founded by a group or individual with actual industry connections (online or phone)
- Location
- Proximity to resources
- We may have to update the startups in Houston map for this
- Available Resources (we should generally be able to call or find this on website, these should be things they brag about)
- Flex Space
- Events
- Co-Working
- Connections/Mentorship
- This can also be given a judged value based on the perceived experience of mentors/connections
- Funding (This may be hard but if they offer their own VC we can check through SDC)
- Userbase
- Leadership/Experience qualifications (Linkedin, profiles on their own websites, other bios)
- Have they driven a startup before, or been in the backseat?
- What other qualifications do they have
- Criteria from the Acc Rankings (as long as we have portfolios then we can use SDC for this)
- VC funding history
- IPO
- Acquired
- Any other reviews possible
- Other articles (Xconomy, Houston Chronicle, etc.)
- Info from actually calling the accelerators (putting a list of questions up on the discussion page for Houston Accelerators (issue brief))
- Perhaps reviews from startups themselves
- Could look specifically into startups that have gone through multiple accelerators; hopefully we have phone numbers on File:HSM10.xlsx
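Once the factors are settled, they could be rolled into a single score along these lines (the factor names and weights below are placeholders, not decided values):

```python
def accelerator_score(ratings, weights=None):
    """Combine per-factor ratings (0-5) into one weighted quality score.

    `ratings` maps factor name to a 0-5 rating for one accelerator.
    """
    if weights is None:
        # Illustrative weights only; the actual factors and their relative
        # importance are still being decided above.
        weights = {"value_added": 0.3, "location": 0.15,
                   "resources": 0.25, "leadership": 0.3}
    total = sum(weights.values())
    return sum(ratings.get(f, 0) * w for f, w in weights.items()) / total
```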
==Jake Silberman==
Jake Silberman Research Plans Plan Page
Leveraged Buyout Innovation (Academic Paper)
- Finalize Hazard Model
- Determine best regression model (Cox or something else that makes more assumptions)
- Determine finalized variable set
- Predict based on model
- Match LBO and non LBO companies based on hazard model predictions
- Generate buckets, i.e. break down by industry, decade, etc.
- Determine metric for matching
- Integrate new patent data
- Create stocks of patents
- Add in patent assignment data
- Analysis of control group and study group for first results
- Refine matching if necessary
- Test for endogeneity/other issues
- Lit review for variables
- Revise preexisting regression variable write-up and reformat it to appropriate academic paper form
- Do final correct pull of SDC data (just include IPOs)
- Clean the data, throwing out duplicate names and keeping only the most recently invested one
- Rank cities by venture capital on different metrics, either in SQL or Excel
- Write up issue brief
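The hazard-model matching step above could be sketched as a greedy nearest-prediction match without replacement (a simplified illustration; the actual metric for matching is still to be determined, and bucketing by industry/decade would happen first):

```python
def match_by_hazard(lbo, non_lbo):
    """Match each LBO firm to the non-LBO firm with the closest predicted hazard.

    Both arguments map firm name -> predicted hazard rate from the Cox (or
    alternative) model. Matching is 1-to-1 without replacement.
    """
    pool = dict(non_lbo)
    matches = {}
    for firm, hazard in sorted(lbo.items(), key=lambda kv: kv[1]):
        if not pool:
            break
        partner = min(pool, key=lambda c: abs(pool[c] - hazard))
        matches[firm] = partner
        del pool[partner]
    return matches
```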
==Shoeb Mohammed==
Shoeb Mohammed Research Plans Research Plan page
Short Term
- Create a listing on the wiki for all software developed at McNair center. - Completed
- Build a Linux box to run the crawler. - Completed
Long Term
- Optimize/re-design the 'Matcher' software. In particular, speed-up fuzzy matching and possibly re-structure the code to make usage easier.
- Develop the crawler. Try to begin with code that Dan has. - Completed
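One direction for the Matcher speed-up is blocking: only compare names that share a cheap key, such as the first character, so each query is scored against a small block rather than every candidate. A sketch using stdlib difflib (the real Matcher's normalization and scoring rules are more involved):

```python
from collections import defaultdict
from difflib import SequenceMatcher

def fuzzy_match(queries, candidates, threshold=0.85):
    """Fuzzy-match each query to its best candidate, with first-letter blocking."""
    blocks = defaultdict(list)
    for cand in candidates:
        if cand:
            blocks[cand[0].lower()].append(cand)
    results = {}
    for query in queries:
        best, best_score = None, threshold
        # Only scan the block sharing the query's first letter.
        for cand in blocks.get(query[:1].lower(), []):
            score = SequenceMatcher(None, query.lower(), cand.lower()).ratio()
            if score >= best_score:
                best, best_score = cand, score
        results[query] = best
    return results
```

The trade-off is that blocking can miss matches whose first characters differ; a production version might block on several keys.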
Side Tasks
- If possible, redo the patent parser (previously coded by Kranti) to also pull in Patent Citation data.
==Veeral Shah==
==James Chen==
James Chen Research Plans (Plan Page)
- Short term:
- Refine variables to include in hazard rate model
- Industry group
- Log or ratio for tax, ebitda, etc.
- Long term:
- Finish hazard model
- Complete hazard rate matching
- Test for endogeneity, variables list
- Incorporate new patent data (stock, transfers, etc.)
- Complete literature review using final variable list
==Ariel Sun==
Ariel Sun Research Plans (Plan Page)
- Hubs
- Get scorecard system completed (Hubs: Mechanical Turk)
- Mechanical Turk for potential hubs (Mechanical Turk (Tool))
- Match identified hubs to CMSAs
- VC table
- Waiting for patent data to be fixed to join to VC table
- Import VC data into Stata
- Hazard rate model
- Diff-in-diff
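In its simplest two-period form, the diff-in-diff step reduces to a difference of group-mean changes; a regression with controls would replace this once the VC table join is fixed. A minimal sketch:

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Two-period difference-in-differences estimate from group means.

    Each argument is a list of outcomes (e.g. VC activity for hub vs
    non-hub regions, before and after).
    """
    def mean(xs):
        return sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - (mean(control_post) - mean(control_pre))
```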
==Todd Rachowin==
Todd Rachowin Research Plans (Plan Page)
- Short-Term = Hubs List (Hubs: Mechanical Turk)
- Creating a comprehensive list of potential hubs
- Determining the best variables for the scorecard
- Building "filters" for automating the collection
- Running and auditing of the automation
- Collecting the remaining manual data
- Long-Term = Everything Else
- Hazard Rate Model (determine proper one, run it, etc.)
- Diff-in-diff
==Gunny Liu==
Gunny Liu Research Plans (Plan Page)
Week VII
7/11 thru 7/15
- Finalize Twitter Webcrawler version Alpha, discuss roadmap ahead with research fellows
- Expand semantic mediawiki capabilities on our wiki and provide documentation for existing data structures
- Configure transfer of startup data from local storage to the wiki, with Ben
Week VIII
7/18 thru 7/22
- Alpha Exploration & development of existing Google Maps API script
- Advanced development of Twitter Webcrawler to populate McNair databases
- Input: previously documented mothernodes and entrepreneurship buzzwords
- Advanced development of Eventbrite Webcrawler to populate McNair databases
- To integrate with Google Maps API to provide updated mapping of active entrepreneurship events in Houston
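The Webcrawler's buzzword input could feed a filter like the following sketch (function and argument names are illustrative, not the crawler's actual code):

```python
def filter_tweets(tweets, buzzwords):
    """Keep tweets mentioning any entrepreneurship buzzword or mothernode.

    `tweets` are plain text strings as collected by the Twitter Webcrawler;
    the buzzword list comes from the previously documented inputs above.
    """
    wanted = [w.lower() for w in buzzwords]
    return [t for t in tweets if any(w in t.lower() for w in wanted)]
```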
Week IX
7/25 thru 7/29
- Alpha Exploration & development of Techcrunch API
- Alpha Exploration & development of Facebook API
Week X
8/1 thru 8/5
- Advanced development of all API scripts to populate McNair databases
Week XI
8/8 thru 8/12
- Last day of summer internship: 8/8