
4,129 bytes added ,  18:53, 29 September 2020
{{McNair Staff
|status=Active
}}
===Summer 2018===
[[VentureXpert Data]]
[[Geocode.py]]
<onlyinclude>
[[Augi Liebster]] [[Work Logs]]
 
2018-08-06 to 2018-08-17: For the last two weeks I have continued to work on the construction of vcdb3. I built on Marcos Lee's code and ran my data through the clustering process.
 
2018-08-01: Found issues cleaning the geocoded data. Had difficulty extracting good data for Puerto Rico. The issue was that it was marked as an exclude row but actually needs to be included. Will continue working on this tomorrow. I think I have an idea for a solution but am having trouble manipulating some of the tables.
 
2018-07-31: Moved on to geocoding. First had to learn about the process. Figured out I would need to include primary keys in the output table, so I changed the script a bit to reflect that.
 
2018-07-30: Worked on graphs all day for Ed. Called him twice to make sure that everything was in line with his expectations.
 
2018-07-27: Talked to Ed about fixes on the ranking tables including fixing numalive by including portcos that didn't have deaddates listed. Then finished cleaning fundbase tables. Called Ed and talked about formatting for the tables. Will work on the tables over the weekend.
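The numalive fix described above can be sketched roughly as follows. This is a minimal illustration, not the actual query used: the field names `first_investment` and `deaddate` are hypothetical, and it assumes a portco counts as alive at the end of a year if it has received investment by then and either has no deaddate listed or has a deaddate after that year.

```python
from datetime import date

def num_alive(portcos, year):
    """Count portcos alive at the end of `year` (assumed definition)."""
    cutoff = date(year, 12, 31)
    return sum(
        1
        for p in portcos
        if p["first_investment"] <= cutoff
        # A missing deaddate is treated as "still alive", which is the fix:
        # comparing against a missing value directly would drop these rows.
        and (p["deaddate"] is None or p["deaddate"] > cutoff)
    )

# Hypothetical sample rows for illustration only.
sample_portcos = [
    {"first_investment": date(2010, 1, 1), "deaddate": None},
    {"first_investment": date(2011, 5, 1), "deaddate": date(2013, 6, 1)},
    {"first_investment": date(2016, 1, 1), "deaddate": None},
]
```

In SQL the same idea would typically be an explicit `deaddate IS NULL OR deaddate > cutoff` condition, since a plain comparison against NULL filters the row out.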
 
2018-07-26: Finished ranking data and formatted the tables for Ed to review. Started cleaning firmbase and fundbase tables using firmname and fundname as primary keys.
 
2018-07-25: Worked on ranking data. Finished ranking data for cities and moved on to states. Created the tables based on variables Ed told me to include in Excel. Didn't style them yet but sent them to Ed for approval.
 
2018-07-24: Spent the majority of the day talking to Ed. Worked through the ExitKeysClean and PortCoExit with Ed, and then started to work on Roundbase Ranking.
 
2018-07-23: Came in Sunday and finished PortCoExit and ExitKeysClean tables. Was not getting desired results in ExitKeysClean as MaNoDups was not joining correctly. Figured out this was because the initial MA pull pulled state names instead of state codes, so the matching based on primary keys of portcos didn't work.
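The state name versus state code mismatch above can be illustrated with a small normalization step applied before joining. This is only a sketch: the lookup table is a hypothetical four-entry subset, and the key layout (company name plus state) is an assumption about how the portco primary key is built.

```python
# Hypothetical subset of a full state-name -> postal-code lookup.
STATE_CODES = {
    "texas": "TX",
    "california": "CA",
    "new york": "NY",
    "puerto rico": "PR",
}

def to_state_code(value):
    """Return the two-letter code for a state given as a name or a code.

    Returns None when the value is not in the lookup.
    """
    cleaned = value.strip()
    if len(cleaned) == 2 and cleaned.upper() in STATE_CODES.values():
        return cleaned.upper()
    return STATE_CODES.get(cleaned.lower())

def join_key(company_name, state):
    """Build a comparable (name, state code) key for matching rows."""
    return (company_name.strip().lower(), to_state_code(state))
```

With both sides passed through `join_key`, a row pulled with "California" and a row pulled with "CA" produce the same key, so the join no longer silently drops them.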
 
2018-07-20: Continued to construct the ExitKeysClean and PortCoExit tables. Spent a lot of time removing duplicates from data and making sure everything was clean. Had to deal with a few bugs in the data as I was getting strange numbers with selects when I performed joins.
 
2018-07-19: Loaded the data into the database and began to construct the ExitKeysClean table. Learning SQL on the fly so had some beginner difficulties.
 
2018-07-18: Cleaned up MA data. Made a mistake yesterday because I didn't clean the data before running matches and thus was getting massive documents. Cleaned MA data based on state codes and date of first investment vs date the MA was announced. Removed multiples after cleaning and was able to salvage around 70 percent of data as good matches. Began the process for IPOs.
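The cleaning rule above can be sketched as follows. The field names are hypothetical, and two points are assumptions rather than the exact rule used: a match is kept only if the MA was announced on or after the first investment date, and "removing multiples" is read here as dropping any portco that still has more than one candidate match after filtering.

```python
from collections import Counter
from datetime import date

def clean_matches(rows):
    """Filter candidate portco-to-MA matches, then drop ambiguous portcos."""
    plausible = [
        r
        for r in rows
        if r["portco_state"] == r["ma_state"]
        and r["announced"] >= r["first_investment"]
    ]
    # Count remaining candidates per portco and keep only unique matches.
    counts = Counter(r["portco"] for r in plausible)
    return [r for r in plausible if counts[r["portco"]] == 1]

# Hypothetical candidate matches for illustration only.
sample_rows = [
    {"portco": "acme", "portco_state": "TX", "ma_state": "TX",
     "first_investment": date(2005, 3, 1), "announced": date(2009, 7, 1)},
    {"portco": "acme", "portco_state": "TX", "ma_state": "CA",
     "first_investment": date(2005, 3, 1), "announced": date(2010, 1, 1)},
    {"portco": "beta", "portco_state": "NY", "ma_state": "NY",
     "first_investment": date(2012, 1, 1), "announced": date(2008, 1, 1)},
    {"portco": "gamma", "portco_state": "CA", "ma_state": "CA",
     "first_investment": date(2001, 1, 1), "announced": date(2004, 1, 1)},
    {"portco": "gamma", "portco_state": "CA", "ma_state": "CA",
     "first_investment": date(2001, 1, 1), "announced": date(2006, 1, 1)},
]
```

Filtering before deduplicating matters: running the match on uncleaned data is what produced the very large outputs mentioned in the entry.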
 
2018-07-17: Waiting on Ed to respond so that I can finish cleaning up my IPO and MA data and begin to build stacks in the database. Currently struggling to find a way to ensure that a company is matched to itself when we match portcos against MAs and IPOs. Could not find a way to link these two using the data given in the MA database. IPOs seem to be doing fine. Spent the majority of my day helping Minh classify his Demo Day pages.
 
2018-07-16: Spent the day matching portcos with IPOs and MAs. Then cleaned the data using an Excel file. Almost finished IPOs. Made a mistake in filtering MAs but will go back and finish cleaning both MAs and IPOs by tomorrow. Slightly confused about cleaning the MA table, since I do not see a way other than equivalence of state to determine whether a company is matched to itself or to a company with a similar name. Will either accept same state as an indicator or wait for Ed's response.
 
2018-07-13: Worked to standardize the company names using the matcher. Also uploaded the rest of the data that I could into the db.
 
2018-07-12: Spent the day struggling with the MA pull. Dylan figured out that the data will pull when pulled in text form rather than columnar form. Tomorrow will try to learn RegEx so that I can manage this file. Still stuck on USLongDescription, as I have tried different ways of normalization and nothing has worked.
 
2018-07-11: Uploaded the remaining tables that I could into the database. I am struggling with normalizing USLongDescription and have tried the various ways given to solve the problem. I am stuck here and not sure how to proceed. I am similarly stuck with the MA table, as I have still not been able to retrieve this data from SDC. I did update the Venture Xpert DataBase wiki page with information on loading the tables and the possible errors that could arise. For now, I am waiting on a response from Ed to see how I can continue to be productive.
