Difference between revisions of "Ravali Kruthiventi (Research Plan)"

 
(9 intermediate revisions by 3 users not shown)
Line 2: Line 2:
  ===Project - USPTO Assignees, Patent and Citation Data===
  ==== Assignees Data ====
- *Data source: patent database (merged data from patent_2015 and patentdata databases)
+ *Data source: [[Patent Data Processing - SQL Steps | patent database]] (merged data from patent_2015 and patentdata databases)
  **Issues: citations data contains non numeric patent numbers (likely application numbers, etc)
  **Solution:
Line 11: Line 11:
  **Teams waiting for it:
  *** Marcela and Amir
- ****Project : Patent data analysis (?)
+ ****Project : [[ Patent Data Wiki Page | Patent data analysis ]]
  ***Jake and James, potentially could need this down the line
- ****Project : LBO data
+ ****Project :[[Leveraged Buyout Innovation (Academic Paper)| LBO data]]
  ** Deadline:
- *Data Source: USPTO Bulk Data repository
+ *Data Source: [[USPTO Assignees Data | USPTO Bulk Data repository]]
  ** Issues:
  *** The script inserts copies of data into the tables.
Line 24: Line 24:
  ** Solution:
  *** Amir and Marcela and/or I need to look at the data to determine quality
+ **** If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate this data into our data model for patent data.
  *** Amir and Marcela and/or I will need to delete the copies
  ** Time to implement:
Line 42: Line 43:
  === Project - Pattern Recognition on Patent Data through Machine Learning ===
- *Data Source:
+ *Data Source: The patent database.
  ** Plan:
  ***Technique
+ ****Determine research question to be asked
+ ****Scrub data
+ ****Determine 3-4 mining\machine learning techniques to best extract patterns
+ ****Train the algorithms
+ ****Run the algos on sample dataset
+ ****Determine the algo with best results
+ ****Implement the
  ** Known Issues:
- ** Solution:
+ ***Dataset to be cleaned, quality analyzed as specified above.
+ **Deliverables
+ ***Set of patterns to base further research on
+ ***Research paper (?)
+ ****Documentation - Wiki page
  ** Time to implement:
  ** Priority:
- ** Teams waiting for it:
+ ** Teams waiting for it: None
  ** Deadline:
+
+ [[Internal Classification: Research Plans| ]]
+ [[Category:Work Log]]

Latest revision as of 16:12, 21 March 2017

Project - USPTO Assignees, Patent and Citation Data

Assignees Data

  • Data Source: patent database (merged data from the patent_2015 and patentdata databases)
    • Issues: the citations data contains non-numeric patent numbers (likely application numbers, etc.)
    • Solution:
      • Segregate the non-numeric records into smaller tables so that Amir and Marcela can identify patterns
      • Link the records back to the appropriate patent numbers in the patent table
    • Time to implement: 1 day
    • Priority:
    • Teams waiting for it:
    • Deadline:
  • Data Source: USPTO Bulk Data repository
    • Issues:
      • The script inserts copies of data into the tables.
      • Analysis required on the data to make sure the data was inserted correctly from the XML files.
      • Analysis is also required to determine whether this data is better than the data we have in the patent database right now.
        • Action owners: Amir and Marcela
    • Solution:
      • Amir and Marcela and/or I need to look at the data to determine quality
        • If they find that any of this data is better than the data we currently have, I will have to figure out a way to integrate this data into our data model for patent data.
      • Amir and Marcela and/or I will need to delete the copies
    • Time to implement:
    • Priority:
    • Teams waiting for it:
    • Deadline:
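
The clean-up steps listed above (segregating non-numeric citation records by pattern, linking numeric ones back to the patent table, and deleting inserted copies) could be prototyped roughly as follows. This is only a sketch in plain Python: the sample rows, the function names, and the idea of describing each reference by a letter/digit "shape" are assumptions made for illustration, not the project's actual tables or scripts.

```python
import re
from collections import defaultdict

# Hypothetical sample rows: (citing_patent, cited_reference). The values are
# invented for illustration and do not come from the patent database.
SAMPLE_CITATIONS = [
    ("7654321", "6543210"),
    ("7654321", "D456789"),
    ("7654321", "RE38123"),
    ("7654322", "2003/0012345"),
    ("7654322", "6543210"),
    ("7654322", "6543210"),  # an exact copy, like those the load script inserts
]


def shape_of(value):
    """Reduce a reference to a pattern: letter runs become 'A', digit runs '9'."""
    collapsed = re.sub(r"[A-Za-z]+", "A", value.strip())
    return re.sub(r"[0-9]+", "9", collapsed)


def drop_copies(rows):
    """Remove exact duplicate rows while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def segregate(rows):
    """Split rows into one bucket ("smaller table") per cited-reference pattern."""
    buckets = defaultdict(list)
    for citing, cited in rows:
        buckets[shape_of(cited)].append((citing, cited))
    return dict(buckets)


def link_to_patents(rows, known_patents):
    """Keep only rows whose cited reference matches a known patent number."""
    return [(citing, cited) for citing, cited in rows if cited in known_patents]


unique_rows = drop_copies(SAMPLE_CITATIONS)
buckets = segregate(unique_rows)
# The "9" bucket holds purely numeric references; only those are linked here.
linked = link_to_patents(buckets.get("9", []), {"6543210", "7654321"})

for pattern, rows in buckets.items():
    print(pattern, len(rows))
```

Each bucket key is a candidate pattern to review by hand. In a real database the same idea would more likely be expressed as queries against the citation and patent tables.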

Project - Lex Machina Data

  • Data Source:
    • Issues:
    • Solution:
    • Time to implement:
    • Priority:
    • Teams waiting for it:
    • Deadline:


Project - Pattern Recognition on Patent Data through Machine Learning

  • Data Source: The patent database.
    • Plan:
      • Technique
        • Determine research question to be asked
        • Scrub data
        • Determine 3-4 data mining/machine learning techniques to best extract patterns
        • Train the algorithms
        • Run the algorithms on a sample dataset
        • Determine the algorithm with the best results
        • Implement the
    • Known Issues:
      • The dataset must be cleaned and its quality analyzed, as specified above.
    • Deliverables
      • Set of patterns to base further research on
      • Research paper (?)
        • Documentation - Wiki page
    • Time to implement:
    • Priority:
    • Teams waiting for it: None
    • Deadline:
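
The "train several techniques, run them on a sample, keep the best" part of the plan above can be illustrated with a small comparison harness. Everything here is an assumption for illustration: the data is synthetic, the three toy classifiers (majority class, nearest centroid, nearest neighbour) merely stand in for whichever techniques are eventually chosen, and accuracy on a held-out sample is only one possible selection criterion.

```python
import random
from collections import Counter


def make_sample(n, seed):
    """Synthetic two-class sample; a stand-in for a scrubbed data extract."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_majority(train):
    """Baseline: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda point: most_common


def train_centroid(train):
    """Predict the label whose training centroid is closest."""
    sums = {}
    for point, label in train:
        total, count = sums.get(label, ((0.0, 0.0), 0))
        sums[label] = ((total[0] + point[0], total[1] + point[1]), count + 1)
    centroids = {lab: (t[0] / c, t[1] / c) for lab, (t, c) in sums.items()}
    return lambda point: min(centroids, key=lambda lab: sq_dist(point, centroids[lab]))


def train_nearest_neighbour(train):
    """Predict the label of the single closest training point."""
    return lambda point: min(train, key=lambda row: sq_dist(point, row[0]))[1]


def accuracy(model, rows):
    """Fraction of rows the model labels correctly."""
    return sum(model(point) == label for point, label in rows) / len(rows)


TECHNIQUES = {
    "majority_class": train_majority,
    "nearest_centroid": train_centroid,
    "nearest_neighbour": train_nearest_neighbour,
}


def pick_best(train, test):
    """Train every technique, score it on the held-out rows, return the winner."""
    scores = {name: accuracy(fit(train), test) for name, fit in TECHNIQUES.items()}
    return max(scores, key=scores.get), scores


data = make_sample(200, seed=0)
train, test = data[:150], data[150:]
best_name, scores = pick_best(train, test)
print(best_name, scores)
```

Keeping the techniques in one dictionary makes it easy to add or swap candidates without touching the comparison logic.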