===Fall 2017===
<onlyinclude>[[Christy Warden]] [[Work Logs]] [[Christy Warden (Work Log)|(log page)]]
2017-12-12: [[Scholar Crawler Main Program]] [[Accelerator Website Images]]
2017-11-28: [[PTLR Webcrawler]] [[Internal Link Parser]]
2017-11-21: [[PTLR Webcrawler]]
2017-09-21: [[PTLR Webcrawler]]
2017-09-14: Ran into some problems with the scholar crawler. I cannot download PDFs easily because many of the links point to paid websites rather than to PDFs. Trying to adjust the crawler to pick up as many PDFs as it can without having to do anything manually. Adjusted the code so that it outputs tab-delimited text rather than CSV and practiced on several articles.
2017-09-12: Got started on Google Scholar Crawling. Found Harsh's code from last year and figured out how to run it on scholar queries. Adjusted provided code to save the results of the query in a tab-delimited text file named after the query itself so that it can be found again in the future.
2017-09-11: Barely started [[Ideas for CS Mentorship]] before getting introduced to my new project for the semester. Began by finding old code for pdf ripping, implementing it and trying it out on a file.
2017-09-07: Reoriented myself with the Wiki and my previous projects. Met new team members. Began tracking down my former Wikis (they all seem pretty clear to me thus far about where to get my code for everything). Looking through my C drive to figure out where the pieces of code I have in my personal directory belong in the real world (luckily I am a third degree offender only).</onlyinclude>
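The 2017-09-12 and 2017-09-14 entries describe saving query results as tab-delimited text in a file named after the query. A minimal sketch of that idea follows, assuming each result is a (title, url) pair. The function names, the column names, and the file-naming rule are hypothetical illustrations, not the crawler's actual code.

```python
import csv
import re
from pathlib import Path


def query_to_filename(query):
    """Turn a search query into a safe file name (hypothetical naming rule)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_")
    return slug + ".txt"


def save_results(query, rows, out_dir="."):
    """Write (title, url) rows as tab-delimited text in a file named after the query."""
    path = Path(out_dir) / query_to_filename(query)
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["title", "url"])  # assumed column names
        writer.writerows(rows)
    return path
```

Using the `csv` module with `delimiter="\t"` rather than joining strings by hand means that titles containing tabs or quotes are still escaped consistently.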
===Spring 2017===
'''1/18/17'''
''10-12:45'' Started running old twitter programs and reviewing how they work. Automate.py is currently running and AutoFollower is in the process of being fixed.
'''1/20/17'''
''10-11'' Worked on twitter programs. Added error handling for Automate.py and it appears to be working but I will check on Monday.
''11-11:15'' Talked with Ed about projects that will be done this semester and what I'll be working on.
''11:15 - 12'' Went through our code repository and made a second Wiki page documenting the changes since it was last completed. http://mcnair.bakerinstitute.org/wiki/Software_Repository_Listing_2
''12-12:45'' Worked on the smallest enclosing circle problem for location of startups.
'''1/23/17'''
''10-12:45'' Worked on the enclosing circle problem. Wrote and completed a program which guarantees a perfect outcome but takes forever to run because it checks all possible outcomes. I would like to maybe rewrite it or improve it so that it outputs a good solution, but not necessarily a perfect one so that we can run the program on larger quantities of data. Also today I discussed the cohort data breakdown with Peter and checked through the twitter code. Automate.py seems to be working perfectly now, and I would like someone to go through the content with me so that I can filter it more effectively. Autofollower appears to be failing but not returning any sort of error code? I've run it a few different times and it always bottlenecks somewhere new, so I suspect some sort of data limiting on twitter is preventing this algorithm from working. Need to think of a new one.
'''1/25/17'''
''10-12:45'' Simultaneously worked twitter and enclosing circle because they both have a long run time. I realized there was an error in my enclosing circle code which I have corrected and tested on several practice examples. I have some idea for how to speed up the algorithm when we run it on a really large input, but I need more info about what the actual data will look like. Also, the program runs much more quickly now that I corrected the error.
For twitter, I discovered that the issue I am having lies somewhere in the follow API so for now, I've commented it out and am running the program minus the follow component to assure that everything else is working. So far, I have not seen any unusual behavior, but the program has a long wait period so it is taking a while to test.
'''1/27/17'''
''10-12:45'' So much twitter. Finally found the bug that has plagued the program (sleep_on_rate_limit should have been False). Program is now running on my dummy account, and I am going to check its progress on Monday YAY.
'''2/3/17'''
# Patent Data (more people) and VC Data (build dataset for paper classifier)
# US Universities patenting and entrepreneurship programs (help w code for identifying Universities and assigning to patents)
# Matching tool in Perl (fix, run??)
# Collect details on Universities (look on wikipedia, download xml and process)
# Maps issue
(note - this was moved here by Ed from a page called "New Projects" that was deleted)
'''2/6/17'''
Worked on the classification based on description algorithm the whole time I was here. I was able to break down the new data so that the key words are all found and accounted for on a given set of data and so that I can go through a description and tag the words and output a matrix. Now I am trying to develop a way to generate the output I anticipate from the input matrix of tagged words. Tried MATLAB but I would have to buy a neural network package and I didn't realize until the end of the day. Now I am looking into writing my own neural network or finding a good python library to run.
http://scikit-learn.org/stable/modules/svm.html#svm
going to try this on Wednesday
'''2/17/17'''
Comment section of Industry Classifier wiki page.
'''2/20/17'''
Worked on building a data table of long descriptions rather than short ones and started using this as the input to industry classifier.
'''2/22/17'''
Finished code from above, ran numerous times with mild changes to data types (which takes forever), talked to Ed and built an aggregation model.
'''2/24/17'''
About to be done with industry classifier. Got 76% accuracy now, working on a file that can be used by non-comp sci people where you just type in the name of a file with a Company [tab] description format and it will output Company [tab] Industry. Worked on allowing this program to run without needing to rebuild the classification matrix every single time since I already know exactly what I'm training it on. Will be done today or Monday I anticipate.
'''2/27/17'''
Classifier is done whooo! It runs much more quickly than anticipated due to the use of the python Pickle library (discovered by Peter) and I will document its use on the industry classifier page. (Done: http://mcnair.bakerinstitute.org/wiki/Industry_Classifier). I also looked through changes to Enclosing Circle and realized a stupid mistake which I corrected and debugged and now a circle run that used to take ten minutes takes seven seconds. It is ready to run as soon as Peter is done collecting data, although I'd like to think of a better way to test to make sure that these really are the optimal circles.
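The 2/24/17 and 2/27/17 entries mention avoiding a rebuild of the classification matrix on every run by using Python's pickle library. A minimal sketch of that caching pattern follows, assuming the matrix is any picklable Python object. The helper name `load_or_build` and its arguments are hypothetical and are not taken from the Industry Classifier code.

```python
import os
import pickle


def load_or_build(cache_path, build):
    """Return the object cached at cache_path, building and caching it on first use."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as handle:
            return pickle.load(handle)
    obj = build()  # the expensive step, e.g. building the classification matrix
    with open(cache_path, "wb") as handle:
        pickle.dump(obj, handle)
    return obj
```

The cache only stays valid while the training data is unchanged; deleting the cache file forces a rebuild. Pickle files should only be loaded when they come from a trusted source.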
'''3/01/17'''
Plotted some of the geocoded data with Peter and troubleshot remaining bugs. Met with Ed and discussed errors in the geodata, which I need to go through and figure out how to fix. Worked on updating documentation of enclosing circles and related projects.
'''3/06/17'''
Worked on Enclosing Circle data and started the geocoder which is running and should continue to run through Wednesday.
'''3/20/17'''
Tried to debug Enclosing Circle with Peter. Talked through a Brute force algorithm with Ed, wrote explanation of Enclosing circle on Enclosing Circle wiki page and also wrote an English language explanation of a brute force algorithm.
'''3/27/17'''
More debugging with Peter. Wrote code to remove subsumed circles and tested it. Discovered that we were including many duplicate points which was throwing off our results.
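The 3/27/17 entry mentions removing subsumed circles and discovering duplicate points. A minimal sketch of both steps follows, assuming points are (x, y) tuples and circles are (x, y, radius) tuples. These representations and function names are assumptions made for illustration, not the project's actual data structures.

```python
import math


def dedupe_points(points):
    """Drop exact duplicate (x, y) points while keeping first-seen order."""
    return list(dict.fromkeys(points))


def is_subsumed(inner, outer):
    """True if circle inner=(x, y, r) lies entirely inside circle outer."""
    (x1, y1, r1), (x2, y2, r2) = inner, outer
    return math.hypot(x1 - x2, y1 - y2) + r1 <= r2


def remove_subsumed(circles):
    """Keep only circles that are not contained in some other, different circle."""
    unique = list(dict.fromkeys(circles))  # identical circles would subsume each other
    return [
        c for c in unique
        if not any(c != other and is_subsumed(c, other) for other in unique)
    ]
```

The containment test relies on the fact that one circle lies inside another exactly when the distance between the centers plus the inner radius does not exceed the outer radius. With floating-point coordinates a small tolerance may be needed.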
'''3/29/17'''
Tried to set up an IDE for rewriting enclosing circle in C.
'''3/31/17'''
Finally got the IDE set up after many youtube tutorials and sacrifices to the computer gods. It is a 30 day trial so I need to check with Ed about if a student license is a thing we can use or not for after that. Spent time familiarizing myself with the IDE and writing some toy programs. Tried to start writing my circle algorithm in C and realized that this is an overwhelming endeavor because I used many data structures that are not supported by C at all. I think that I could eventually get it working if given a ton of time but the odds are slim on it happening in the near future. Because of this, I started reading about some programs that take in python code and optimize parts of it using C which might be helpful (Psyco is the one I was looking at). Will talk to Ed and Peter on Monday.
'''04/03/17'''
[[Matching Entrepreneurs to VCs]]
'''04/10/17'''
Same as above
'''04/12/17'''
Same as above
'''04/17/17'''
Same as above + back to Enclosing circle algorithm. I am trying to make it so that the next point chosen for any given circle is the point closest to its center, not to the original point that we cast the circle from. I am running into some issues with debugging that I will be able to solve soon.
'''04/26/17'''
Debugged new enclosing circle algorithm. I think that it works but I will be testing and plotting with it tomorrow. Took notes in the enclosing circle page.
'''04/27/17'''
PROBLEM! In fixing the enclosing circle algorithm, I discovered a problem in one of the ways Peter and I had sped up the program, which led the algorithm to the wrong computations and a completely false runtime. The new algorithm runs for an extremely long time and does not seem feasible to use for our previous application. I am looking into ways to speed it up, but it does not look good.
'''04/28/17'''
Posted thoughts and updates on the enclosing circle page.
'''05/01/17'''
Implemented concurrent enclosing circle EnclosingCircleRemake2.py. Documented in enclosing circle page.
===Fall 2016===
'''09/15/16''': Was introduced to the Wiki, built my page and was added to the RDP and Slack. Practiced basic Linux with Harsh and was introduced to the researchers.
'''09/20/16''': Was introduced to the DB server and how to access it/mount bulk drive in the RDP. 2:30-3 Tried (and failed) to help Will upload his file to his database. Learned from Harsh how to transfer Will's file between machines so that he could access it for his table (FileZilla/ Putty, but really we should've just put it in the RDP mounted bulk drive we built at the beginning.)
'''09/22/16''': Labeled new supplies (USB ports). Looked online for a solution to labeling the black ports, sent link with potentially useful supplies to Dr. Dayton. Went through all of the new supplies (plus monitors, desktops and mice) and created Excel sheet to keep track of them (Name, Quantity, SN, Link etc.). Added my hours to the wiki Work Hours page, updated my Work Log.
'''09/27/16''': Read through the wiki page for the existing twitter crawler/example. Worked on adjusting our feeds for HootSuite and making the content on it relevant to the people writing the tweets/blogs.
[[Christy Warden (Social Media)]]<!-- null edit dummy -->[[Category:McNair Staff]]
This is a link to all of the things I did to the HootSuite and brainstorming about how to up our twitter/social media/blog presence.
'''09/29/16'''
Everything I did is inside of my social media research page http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Social_Media) I got the twitter crawler running and have created a plan for how to generate a list of potential followers/ people worth following to increase our twitter interactions and improve our feed to find stuff to retweet.
'''10/4/16'''
''11-12:30:'' Directed people to the ambassador event.
''12:30-3:'' Worked on my crawler (can be read about on my social media page)
''3-4:45:'' Donald Trump twitter data crawl.
'''10/6/16'''
''12:15-4:45:'' Worked on the Twitter Crawler. It currently takes as input a name of a twitter user and returns the active twitter followers on their page most likely to engage with our content. I think my metric for what constitutes a potential follower needs adjusting and the code needs to be made cleaner and more helpful. Project is in Documents/Projects/Twitter Crawler in the RDP. More information and a link to the page about the current project is on my social media page [[Christy Warden (Social Media)]]
'''10/18/16'''
''1-2:30:'' Updated the information we have for the Donald Trump tweets. The data is in the Trump Tweets project in the bulk folder and should have his tweets up until this afternoon when I started working.
''2:30-5:'' Continued (and completed a version of) the twitter crawler. I have run numerous example users through the crawler and checked the outputs to see if the people I return are users that would be relevant to @BakerMcNair and generally they are. [[Christy Warden (Social Media)]] for more information
''5 - 5:30:'' Started reading about the existing eventbrite crawler and am brainstorming ideas for how we could use it. (Maybe incorporate both twitter and eventbrite into one application?)
'''10/25/16'''
''12:15-4:45:'' Worked on the Twitter Crawler. I am currently collecting data by following around 70-80 people while I am at work and measuring the success of the follow so that I can adjust my program to make optimal following decisions based on historical follow response. More info at [[Christy Warden (Social Media)]]
'''10/27/16'''
''12:15-3:'' First I ran a program that unfollowed all of the non-responders from my last follow spree and then I updated my data about who followed us back. I cannot seem to see a pattern yet in the probability of someone following us back based on the parameters I am keeping track of, but hopefully we will be able to see something with more data. Last week we had 151 followers, at the beginning of today we had 175 followers and by the time that I am leaving (4:45) we have 190 followers. I think the program is working, but I hope the rate of growth increases.
''3-4'' SQL Learning with Ed
''4-4:45'' Found a starter list of people to crawl for Tuesday, checked our stats and ran one more starting position through the crawler. Updated data sheets and worklog. The log of who I've followed (and if they've followed back) are all on the twitter crawler page.
'''11/1/16'''
''12:15 - 2:'' Unfollowed the non responders, followed about 100 people using the crawler. Updated my data sheets about how people have responded and added all the new followers to the log on [[Christy Warden (Social Media)]] twitter crawler page.
''2-4:45'' Prepped the next application of my twitter crawling abilities, which is going to be a constantly running program on a dummy account which follows a bunch of new sources and dms the McNair account when something related to us shows up.
'''11/3/16'''
''12:15-12:30:'' I made a mistake today! I intended to fix a bug that occurred in my DM program, but accidentally started running a program before copying the program's report about what went wrong so I could no longer access the error report. I am running the program again between now and Thursday and hoping to run into the same error so I can actually address it. (I believe it was something to do with a bad link). I did some research about catching and fixing exceptions in a program while still allowing it to continue, but I can't really fix the program until I have a good example of what is going wrong.
''12:30 - 2:30:'' Unfollowed the non responders, followed about 100 people using the crawler. Updated my data sheets about how people have responded and added all the new followers to the log on [[Christy Warden (Social Media)]] twitter crawler page. I've noticed that our ratios of successful returns of our follow are improving, I am unsure whether I am getting better at picking node accounts or whether our account is gaining legitimacy because our ratio is improving.
''2-4:15'' I had the idea after my DM program which runs constantly had (some) success, that I could make the follow crawler run constantly too? I started implementing a way to do this, but haven't had a chance to run or test it yet. This will present serious difficulties because I don't want to do anything that could potentially get us kicked off twitter/ lose my developer rights on our real account. It is hard to use a dummy acct for this purpose though, because nobody will follow back an empty account so it'll be hard to see if the program succeeds in that base case. I will contemplate tonight and work on it Thursday.
''4:15-4:30'' Started adding comments and print statements and some level of organization in my code in case other/future interns use it and I am not at work to explain how it functions. The code could definitely do with some cleanup, but I think that should probably come later after everything is functional and all of our twitter needs are met.
''4:30-4:45'' Updated work log and put my thoughts on my social media project page.
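The 11/3/16 entry describes losing an error report and wanting to catch exceptions while letting the program continue. A minimal sketch of one way to do that follows: each failure's traceback is appended to a log file, so restarting the program does not erase it. The function `process_all`, its arguments, and the log format are hypothetical and are not the DM program's real code.

```python
import traceback
from datetime import datetime


def process_all(items, handler, log_path):
    """Run handler on each item; on failure, append the traceback to log_path and keep going."""
    failures = 0
    for item in items:
        try:
            handler(item)
        except Exception:
            failures += 1
            # Append mode keeps earlier error reports across runs.
            with open(log_path, "a", encoding="utf-8") as log:
                log.write(f"{datetime.now().isoformat()} failed on {item!r}\n")
                log.write(traceback.format_exc() + "\n")
    return failures
```

Catching `Exception` broadly is acceptable in this sketch because the full traceback is recorded. Once the real cause (for example a bad link) is known, a narrower `except` clause is usually preferable.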
'''11/8/16'''
''12:15-1'' Talked to Ed about my project and worked out a plan for the future of the twitter crawler. I will explain all of it on the social media page.
''1- 4:45'' Worked on updating the crawler. It is going to take a while but I made a lot of progress today and expect that it should be working (iffily) by next Thursday.
'''11/10/16'''
''12:15 - 4:45'' Tried to fix bug in my retweeting crawler, but still haven't found it. I am going to keep running the program until the error comes up and then log into the RDP as soon as I notice and copy down the error. Worked on changes to the crawler which will allow for automation.
'''11/15/16'''
'''2/24/17'12:15 - 1:30''Changing twitter crawler.
About to be done ''1:30 - 4:45'' Worked on pulling all the data for the executive orders and bills with industry classifier. Got 76% accuracy now, working on Peter (we built a file that can be used by non-comp sci people where you just type script in anticipation of Harsh gathering the name data from GovTrack which will build a tsv of a file with a Company [tab] description format and it will output Company [tab] Industry. Worked on allowing this program to run without needing to rebuild the classification matrix every single time since I already know exactly what I'm training it on. Will be done today or Monday I anticipate.data)
'''2/27/17'''
Classifier is done whooo! It runs much more quickly than anticipated due to the use of the python Pickle library (discovered by Peter) and I will document its use on the industry classifier page. (Done: http://mcnair.bakerinstitute.org/wiki/Industry_Classifier). I also looked through changes to Enclosing Circle and realized a stupid mistake which I corrected and debugged and now a circle run that used to take ten minutes takes seven seconds. It is ready to run as soon as Peter is done collecting data, although I'd like to think of a better way to test to make sure that these really are the optimal circles.
'''11/17/16'''
''12:15 - 1:30'' Changing twitter crawler
''1:30 - 5:30'' Fixed the script Peter and I wrote because the data Harsh gathered ended up being in a slightly different form than what we anticipated. Peter built and debugged a crawler to pull all of the executive orders and I debugged the tsv output. I stayed late while the program ran on Harsh's data to ensure no bugs and discovered at the very very end of the run that there was a minor bug. Fixed it and then left.
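The Pickle speedup noted above (saving the trained classifier so the classification matrix is not rebuilt on every run) can be sketched roughly as follows. This is only a minimal illustration, not the actual classifier code: <code>load_or_build</code>, <code>build_matrix</code> and the cache file name are hypothetical names, and the dictionary stands in for whatever the real training step produces.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build_fn):
    """Return the cached object if cache_path exists, else build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    obj = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(obj, f)
    return obj


# Hypothetical stand-in for the expensive training step.
build_calls = []


def build_matrix():
    build_calls.append(1)  # count how often the slow path actually runs
    return {"software": [1, 0], "biotech": [0, 1]}


cache_file = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
first = load_or_build(cache_file, build_matrix)   # builds and writes the cache
second = load_or_build(cache_file, build_matrix)  # loads from disk, no rebuild
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.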
'''3/01/17'''
Plotted some of the geocoded data with Peter and troubleshot the remaining bugs. Met with Ed and discussed errors in the geodata, which I need to go through and figure out how to fix. Worked on updating documentation of enclosing circles and related projects.
'''11/22/16'''
''12:15- 2'' Worked on updating the crawler so that it runs automatically. Ran into some issues because we changed from python 2.7 to anaconda, but got those running again. Started the retweeter crawler, seems to be working well.
''2-2:30'' Redid the Bill.txt data for the adjusted regexes. Met with Harsh, Ed and Peter about being better at communicating our projects and code.
''2:30-4:30'' Back to the twitter crawler. I am now officially testing it before we use it on our main account and have found some bugs with collection that have been adjusted. I realized at the very end of the day that I have a logical flaw in my code that needs to be adjusted because only 1 person at a time goes into the people we followed list. Basically, because of this, we will only be following one person in every 24 hour period. When I get back from Thanksgiving, I need to change the unfollow someone function. The new idea is that I will follow everyone that comes out of a source node, and then call the unfollow function for as long as it will run for while maintaining the condition that the top person on the list was followed for more than one day. I will likely need only one more day to finish this program before it can start running on our account.
''4:30 - 4:45'' In response to the "start communicating with the comp people" talk, I updated my wiki pages and work log on which I have been heavily slacking.
'''3/06/17'''
Worked on Enclosing Circle data and started the geocoder which is running and should continue to run through Wednesday.
'''3/20/17'''
Tried to debug Enclosing Circle with Peter. Talked through a Brute force algorithm with Ed, wrote explanation of Enclosing circle on Enclosing Circle wiki page and also wrote an English language explanation of a brute force algorithm.
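The follow/unfollow plan described above for the twitter crawler (follow everyone that comes out of a source node, then keep unfollowing while the person at the top of the list has been followed for more than one day) could look something like the sketch below. It makes no Twitter API calls; the queue of <code>(user, followed_at)</code> pairs, the function names and the sample user names are all assumptions for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

FOLLOW_PERIOD = timedelta(days=1)


def follow_all(followed, users, now):
    """Record every user coming out of a source node with the time we followed them."""
    for user in users:
        followed.append((user, now))


def unfollow_expired(followed, now):
    """Pop from the front while the oldest follow is more than one day old."""
    unfollowed = []
    while followed and now - followed[0][1] > FOLLOW_PERIOD:
        user, _ = followed.popleft()
        unfollowed.append(user)
    return unfollowed


followed = deque()
t0 = datetime(2016, 11, 22, 12, 0)
follow_all(followed, ["user_a", "user_b"], t0)
follow_all(followed, ["user_c"], t0 + timedelta(hours=20))

# 25 hours after t0: the first two follows have expired, the third has not.
dropped = unfollow_expired(followed, t0 + timedelta(hours=25))
```

Because follows are appended in time order, the front of the queue is always the oldest follow, so the loop can stop at the first entry that is still inside the one-day window.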
'''11/29/16'''
''12:15- 1:45'' Fixed code and reran it for gov track project, documented on E&I governance
''1:45- 2'' Had accelerator project explained to me
''2 - 2:30'' Built histograms of govtrack data with Ed and Albert, reran data for Albert.
''2:30-4:45'' Completed first 5 reports (40-45) on accelerators (accidentally did number 20 as well)
'''3/27/17'''
More debugging with Peter. Wrote code to remove subsumed circles and tested it. Discovered that we were including many duplicate points which was throwing off our results.
'''3/29/17'''
Tried to set up an IDE for rewriting enclosing circle in C.
'''3/31/17'''
Finally got the IDE set up after many youtube tutorials and sacrifices to the computer gods. It is a 30 day trial so I need to check with Ed about if a student license is a thing we can use or not for after that. Spent time familiarizing myself with the IDE and writing some toy programs. Tried to start writing my circle algorithm in C and realized that this is an overwhelming endeavor because I used many data structures that are not supported by C at all. I think that I could eventually get it working if given a ton of time but the odds are slim on it happening in the near future. Because of this, I started reading about some programs that take in python code and optimize parts of it using C which might be helpful (Psyco is the one I was looking at). Will talk to Ed and Peter on Monday.
'''12/1/16'''
''12:15- 3'' Fixed perl code that gets a list of all Bills that have been passed, then composed new data of Bills with relevant buzzword info as well as whether or not they were enacted.
''3 - 4:45'' Worked on Accelerators data collection.
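The Enclosing Circle cleanup mentioned above (removing subsumed circles and duplicate points) might be sketched as below. This is not the project's code: it assumes "subsumed" means one circle lies entirely inside another, treats coordinates as planar <code>(x, y)</code> pairs and circles as <code>(x, y, r)</code> tuples, and uses hypothetical function names.

```python
import math


def dedupe_points(points):
    """Drop exact duplicate (x, y) points, keeping first-seen order."""
    return list(dict.fromkeys(points))


def remove_subsumed(circles):
    """Keep only circles (x, y, r) that are not fully inside another circle."""
    circles = list(dict.fromkeys(circles))  # identical circles count once
    kept = []
    for i, (xi, yi, ri) in enumerate(circles):
        # Circle i is inside circle j when center distance + ri <= rj.
        inside_other = any(
            j != i and math.hypot(xi - xj, yi - yj) + ri <= rj
            for j, (xj, yj, rj) in enumerate(circles)
        )
        if not inside_other:
            kept.append((xi, yi, ri))
    return kept


points = dedupe_points([(0, 0), (1, 1), (0, 0)])
circles = remove_subsumed([(0, 0, 5), (1, 0, 2), (10, 0, 1), (0, 0, 5)])
```

Here the circle <code>(1, 0, 2)</code> is dropped because it sits inside <code>(0, 0, 5)</code>, while <code>(10, 0, 1)</code> is kept.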
'''04/03/17'''
[[Matching Entrepreneurs to VCs]]:
'''Notes from Ed'''
I moved all of the Congress files from your documents directory to E:\McNair\Projects\E&I Governance Policy Report\ChristyW
