Oliver Chang (Work Log)

To-do List:

Expand XPath use in the patent data
Edit to include Application data
Finish ID joining
Look into NIH document similarity algorithm
Sysadminy stuff

Projects:

Uploads:

File:PADX-File-Description-v2 Hague.pdf
- Describes patent kind codes (notably, what X0 represents)
File:PatentFullTextAPSDoc GreenBook pgs13-22.pdf
- Describes the fields in APS, their supposed character lengths, and whether each is required or optional
File:Aps-wku-modulus11.pdf
- Describes the layout of the check digit on magnetic tape
File:Mod-11-algorithm.pdf
- Describes the algorithm used to calculate the check digit
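The check-digit uploads above are linked rather than summarized. As a rough illustration only, here is a generic mod-11 check digit in Python, assuming weights 2, 3, 4, … applied from the rightmost digit and a check value of 10 written as `X` (the same weighting ISBN-10 uses, which gives known test values). Whether the APS WKU scheme in Mod-11-algorithm.pdf uses these weights, and how it handles a check value of 10, is not stated in this log; the function names are hypothetical.

```python
def mod11_check_digit(digits: str) -> str:
    """Compute a generic mod-11 check character for a string of digits.

    ASSUMPTION: weights 2, 3, 4, ... start at the rightmost digit and a
    check value of 10 is rendered as "X" (ISBN-10 style). The APS WKU
    variant may differ; verify against Mod-11-algorithm.pdf.
    """
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of decimal digits")
    # Weighted sum: rightmost digit gets weight 2, the next gets 3, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=2))
    # Pick the check value (weight 1) that makes the full sum divisible by 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid(number_with_check: str) -> bool:
    """Return True if the last character is the correct check for the rest."""
    body, check = number_with_check[:-1], number_with_check[-1:]
    return body.isdigit() and mod11_check_digit(body) == check.upper()
```

For example, `mod11_check_digit("030640615")` returns `"2"` (ISBN 0-306-40615-2) and `is_valid("080442957X")` returns `True`.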

Day-by-Day (in reverse chronological order)

October 2017

Oct 3: troubleshoot vc_circles.py and make the command-line interface a little nicer
Oct 2: discuss mapping strategies & investigate missing eca data

September 2017

Sept 23: make Project/OliverLovesCircles usable and add initial splitting ability
Sept 22: goal setting; server debugging; meet with Yang

August 2017

Aug 4: set up parallel-instance Python framework for job reporting; begin test run
Aug 2: finish documenting the code and updating the wiki
Aug 1: discuss with Abhi & Ed alternatives to a Java port, since some algorithmic constants would be hard to port; run test batches in Python with added equality operators and early stopping on convergence

July 2017

July 31: sketch out parallel enclosing circle algorithm
July 28: field questions, including data cleanup questions, from Kerda, Joe & Adrian
travelling
July 19: try to remove duplicate records (especially those with empty titles) that prevent adding a unique constraint
July 18: run correspondent join on the properties and correspondents tables to match previous project; sync with Adrian and Abhi
July 17: redo db operations after cleaning up granted patent number bugs
July 13: powwow about parallelizing Enclosing Circle Algorithm; sketch out what to do for the rest of the summer; work more on joins
July 12: generate some example data illustrating the difficulty of joining different tables
July 11: track down rare bugs that were missed in the initial QA phase
July 7: catch up on documentation
July 6: try (unsuccessfully) to understand docid mapping; create exploration scripts
July 5: add invention title to proper grouping of assignment properties; optimize XML parsing

June 2017

June 30: powwow with James, Abhi & Ed about optimization issues; discuss document IDs, X0, etc. with Ed; pinpoint issues with APS doc numbers (see Repro Pat Dat#Gotchas for more info)
June 29: add logging of copy commands and more verbose output to scripts; debug assignment data failure
June 28: create examples for expanding collection to plant, reissue, and design patents; start optimizing XML
June 27: write SQL to replicate assignees, extract postcodes for ongoing projects
June 26: speed up code; abstract in-memory file splitters to avoid repetition and some weird edge cases
June 25: create mappings for APS, assignment properties, XML 2.5 for data import; run data imports for granted data
June 23: clean up hacky models with a better set of abstractions; clean up IDE warnings; redefine patent-address mapping
June 22: create postcode<->patent table
June 21: document granted patent queries and equivalencies
June 20: sketch out APS driver; discuss patent ID problem; further document, with evidence, the validity of the zipcode data
June 19: skim address regular expressions; cursory investigation of patent table
June 16: create method of getting all data into the database, whether it likes it or not; copy over assignments, granted data using new scheme
June 15: add more robust error reporting, fix race conditions; build out assignment driver; build out fee event driver; add error logging
June 14: migrate bulk inserts to copy command; refresh on address data and start in on that; convert processor to multi-threaded application
June 13: spot-check SQL tables; fix a broken final case that looped endlessly; investigate smarter insert methods
June 12: add XML printer and use it to inspect applications; extend BaseScraper to fetch patent application data; add applications documentation to my project page; add CREATE statements for other tables
June 8: add foreign key inserts; create pretty printer for XML analysis
June 7: finalize DB abstraction layer; migrate code to bulk inserts; upgrade webserver software and optimize RDP Postgres with Ed
June 6: add JDBC; create basic schema; add DB interaction; schedule meeting for later in the week
June 5: look into PostgreSQL; refresh on PostGIS; add some notes to the Enclosing Circle Algorithm page
June 1: add RDP git remote; add more documentation to wiki page; refactor downloader scripts; start creation of tooling for interacting with data
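The June 14 entry moves bulk inserts to PostgreSQL's COPY command. Below is a minimal Python sketch of serializing rows into COPY's default text format (tab-separated fields, `\N` for NULL, backslash escapes for backslash, tab, newline and carriage return), assuming that format rather than CSV. The function names are hypothetical and not taken from the project code, which this log indicates was written in Java.

```python
from typing import Iterable, Optional, Sequence


def copy_escape(value: Optional[object]) -> str:
    """Render one field in PostgreSQL COPY text format (default options)."""
    if value is None:
        return r"\N"  # default NULL marker in text format
    s = str(value)
    # Escape backslashes first so the escapes added below are not doubled.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def to_copy_text(rows: Iterable[Sequence[Optional[object]]]) -> str:
    """Join rows into a tab-delimited, newline-terminated COPY payload."""
    return "".join("\t".join(copy_escape(v) for v in row) + "\n" for row in rows)
```

With psycopg2 the payload could be streamed via `cursor.copy_expert("COPY some_table FROM STDIN", io.StringIO(payload))`, where `some_table` is a placeholder; from Java, the PostgreSQL JDBC driver's `CopyManager.copyIn(String sql, Reader from)` accepts the same text. One COPY per batch avoids the per-row round trips of individual INSERTs.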

May 2017

May 31: finish copy-pasting attributes into the wiki page; retroactively fill out work log; meet with Ed to discuss next steps
May 30: update documentation on wiki, restructure large binary files to have more hierarchy instead of a flat listing at the root
May 29: expand to APS; expand to raw assignment data
May 27: expand to maintenance fee data
May 26: create models, translate xmlparser*.pl file into Java; start using builder pattern
May 25: sketch out OO design of project; download bulk data
May 24: move wiki pages around; start git repository for project
May 21: discuss technical details of previous work with Ed
May 8: clean up dead links on wiki and start reading about previous work; discuss current project status with Ed
May 4: set up wiki account and RDP account; database training
