Difference between revisions of "Ravali Kruthiventi (Research Plan)"
Latest revision as of 16:12, 21 March 2017
Project - USPTO Assignees, Patent and Citation Data
Assignees Data
- Data source: patent database (merged data from the patent_2015 and patentdata databases)
  - Issues: citations data contains non-numeric patent numbers (likely application numbers, etc.)
  - Solution:
    - Segregate into smaller tables so that Amir and Marcela can identify patterns
    - Link back to the appropriate patent numbers from the patent table
  - Time to implement: 1 day
  - Priority:
  - Teams waiting for it:
    - Marcela and Amir
      - Project: Patent data analysis
    - Jake and James, who could potentially need this down the line
      - Project: LBO data
  - Deadline:
- Data source: USPTO Bulk Data repository
  - Issues:
    - The script inserts copies of data into the tables.
    - Analysis is required to make sure the data was inserted correctly from the XML files.
    - Analysis is also required to determine whether this data is better than the data currently in the patent database.
      - Action owners: Amir and Marcela
  - Solution:
    - Amir and Marcela and/or I need to look at the data to determine its quality.
      - If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate it into our data model for patent data.
    - Amir and Marcela and/or I will need to delete the copies.
  - Time to implement:
  - Priority:
  - Teams waiting for it:
  - Deadline:
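The two clean-up tasks above (separating the non-numeric citation numbers into smaller groups, and removing the copies the load script inserted) could be prototyped roughly as follows. This is a minimal sketch, not the project's actual script: the row layout (dicts with `citing` and `cited` keys), the function names `split_citations` and `drop_copies`, and the rule that a usable patent number consists only of digits are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a "numeric" patent number is one made up only of ASCII digits.
NUMERIC = re.compile(r"[0-9]+")


def split_citations(rows):
    """Partition citation rows by whether the cited number is purely numeric.

    Non-numeric rows are grouped by a crude "shape" key (runs of letters
    become "A", runs of digits become "9") so recurring patterns stand out.
    """
    numeric, other = [], defaultdict(list)
    for row in rows:
        cited = row["cited"].strip()
        if NUMERIC.fullmatch(cited):
            numeric.append(row)
        else:
            shape = re.sub(r"[A-Za-z]+", "A", cited)
            shape = re.sub(r"[0-9]+", "9", shape)
            other[shape].append(row)
    return numeric, dict(other)


def drop_copies(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


if __name__ == "__main__":
    # Made-up sample rows, not real citation data.
    sample = [
        {"citing": "7000001", "cited": "6000001"},
        {"citing": "7000001", "cited": "D123456"},
        {"citing": "7000002", "cited": "2004/0123456"},
        {"citing": "7000001", "cited": "6000001"},  # exact copy of the first row
    ]
    numeric, other = split_citations(drop_copies(sample))
    print(len(numeric), "numeric rows")
    for shape, group in other.items():
        print(shape, len(group))
```

Each shape group corresponds to one of the "smaller tables" to hand over for pattern identification; linking the rows back to the patent table would be a separate join on whatever key that table actually uses.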
Project - Lex Machina Data
- Data Source:
- Issues:
- Solution:
- Time to implement:
- Priority:
- Teams waiting for it:
- Deadline:
Project - Pattern Recognition on Patent Data through Machine Learning
- Data Source: The patent database.
  - Plan:
    - Technique
      - Determine the research question to be asked
      - Scrub data
      - Determine 3-4 mining/machine learning techniques to best extract patterns
      - Train the algorithms
      - Run the algorithms on a sample dataset
      - Determine the algorithm with the best results
      - Implement the
  - Known Issues:
    - Dataset to be cleaned and its quality analyzed, as specified above.
  - Deliverables
    - Set of patterns to base further research on
    - Research paper (?)
      - Documentation - Wiki page
  - Time to implement:
  - Priority:
  - Teams waiting for it: None
  - Deadline:
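The later steps of the plan (train several techniques, run them on a sample dataset, keep the one with the best results) amount to a small comparison loop. A minimal sketch follows. It assumes labeled examples and accuracy on a held-out split as the measure of "best results"; the plan specifies neither. The two toy techniques and the name `pick_best` are placeholders for illustration, not the methods the project will use.

```python
import random
from collections import Counter


def majority_baseline(train):
    """Placeholder technique: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour(train):
    """Placeholder technique: 1-nearest neighbour on a single numeric feature."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(model, test):
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)


def pick_best(techniques, data, test_fraction=0.25, seed=0):
    """Fit every technique on one training split and score it on the rest."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train, test = rows[:cut], rows[cut:]
    scores = {name: accuracy(fit(train), test) for name, fit in techniques.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Made-up sample: one numeric feature with a threshold label.
    sample = [(x, x >= 10) for x in range(20)]
    best, scores = pick_best(
        {"majority": majority_baseline, "1-nn": nearest_neighbour}, sample
    )
    print(best, scores)
```

Scoring every candidate on the same held-out split keeps the comparison fair; a real run would swap in the 3-4 chosen techniques and a metric suited to the research question once that is settled.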