Software Repository Listing
Revision as of 11:22, 13 March 2017
This page lists all software and tools available in our Software Repository. Documentation for each tool is on its own wiki page.
For information and a tutorial on how to access the McNair git server, see Software Repository. Read the tutorial and instructions before pushing anything to the git server.
Repositories on the McNair git server
Center IT Sysadmin
This repository contains all tools and scripts for system administration, such as backup scripts.
- See the Center IT page for current documentation.
Harvard Dataverse
This repository contains all tools and scripts related to Harvard Dataverse.
- The Harvard Dataverse page provides instructions on how to access the data.
Contents
cleaning db.sql
copytable.sql
createtables.sql
droptables.sql
ExecutiveOrderCrawler
This repository contains software that downloads and parses Executive Orders from Archives.gov using Scrapy and Selenium.
This project is documented on the McNair Wiki under the Code section/Executive Orders at: http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report#Code
Contents
Order_links.txt This text file contains a list of links to the executive orders that need to be downloaded.
Executive Spider The executive folder contains all the necessary components to run a successful Scrapy crawl.
Extractor.py This script runs through the text files of the executive orders and outputs a CSV with a 1 if the order contains a buzzword and a 0 if it does not.
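As a rough illustration of the buzzword-flagging step, here is a minimal sketch; the buzzword list, file layout, and function name are assumptions for illustration, not the actual contents of Extractor.py.

```python
import csv
import os

# Illustrative buzzword list -- the real list used by Extractor.py may differ.
BUZZWORDS = ["entrepreneur", "startup", "small business", "innovation"]

def flag_orders(order_dir, out_csv):
    """Write one row per executive order text file: filename,
    then a 1/0 flag for each buzzword found in the order's text."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["order"] + BUZZWORDS)
        for name in sorted(os.listdir(order_dir)):
            if not name.endswith(".txt"):
                continue
            with open(os.path.join(order_dir, name), encoding="utf-8") as fh:
                text = fh.read().lower()
            writer.writerow([name] + [1 if w in text else 0 for w in BUZZWORDS])
```
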
Geocoding Inventor Locations
This repository holds software for matching inventor addresses to known locations. There are two scripts that do the same job: one implemented in Perl (old) and one in Python (new). You should probably use the newer tool.
- See Geocoding Inventor Locations (Tool) for documentation on the older version implemented in Perl.
- See Geocode.py for the newer version in Python.
Contents
MatchLocations.pl Perl version
Geocode.py Python version
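The matching step these scripts perform can be sketched as a normalized lookup against a table of known locations. Everything below is a hypothetical illustration, not the actual logic of MatchLocations.pl or Geocode.py.

```python
def normalize(s):
    """Crude normalization: uppercase and collapse internal whitespace."""
    return " ".join(s.upper().split())

def match_locations(inventors, known):
    """Match each (city, state) inventor address against a dict of known
    locations keyed by 'CITY, STATE' -> (lat, lon). Returns a dict mapping
    each address to its coordinates, or None when no match is found."""
    results = {}
    for city, state in inventors:
        key = f"{normalize(city)}, {normalize(state)}"
        results[(city, state)] = known.get(key)
    return results
```
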
GovTrack
The code in this repository scrapes the GovTrack website and runs analytics on the retrieved data. This could prove helpful in the ongoing entrepreneurship research at the McNair Center.
Contents
GovPassed.py Python code. This filters Bills by the ones that passed.
GovTrack.py Python code. Takes all bills from a directory and creates a tab-delimited text file of their stats, such as number of words and number of entrepreneurship buzzwords.
Govtrack_webcrawler.pl Perl code.
Govtrack_webcrawler_AllEnactedBills.pl Perl code.
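The per-bill statistics step that GovTrack.py is described as performing can be sketched as follows; the buzzword list and output columns here are illustrative assumptions, not the script's actual implementation.

```python
BUZZWORDS = ["entrepreneur", "startup", "venture"]  # illustrative list

def bill_stats(name, text):
    """Return one tab-delimited line for a bill: its name, total word
    count, and the number of buzzword occurrences in its text."""
    lower = text.lower()
    n_words = len(lower.split())
    n_buzz = sum(lower.count(w) for w in BUZZWORDS)
    return f"{name}\t{n_words}\t{n_buzz}"
```
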
Matcher
This repository contains the Matcher tool, which matches firm names across two lists.
- See The Matcher (Tool) for documentation.
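As a sketch of what matching firm names across two lists involves, here is a minimal fuzzy matcher built on the standard library's difflib; the actual Matcher tool's algorithm and threshold may differ.

```python
from difflib import SequenceMatcher

def match_firms(list_a, list_b, threshold=0.7):
    """For each name in list_a, find the most similar name in list_b.
    A match is reported only if its similarity ratio clears the
    threshold; otherwise the name maps to None."""
    matches = {}
    for a in list_a:
        best, best_score = None, 0.0
        for b in list_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > best_score:
                best, best_score = b, score
        matches[a] = best if best_score >= threshold else None
    return matches
```
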
Patent Data Parser
This repository contains all tools developed for patent data parsing.
- Patent Data (Tool) and Patent Data Extraction Scripts (Tool) pages on the wiki describe our Patent Database schema and corresponding XML parsing tools.
- Also, see USPTO Assignees Data, which explains the Patent Assignee Database schema and the relevant XML parsing tools.
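Parsing patent XML typically means walking a grant record and pulling out a few fields. The sketch below uses xml.etree from the standard library; the tag names are simplified illustrations, since real USPTO schemas vary by year and differ from this example.

```python
import xml.etree.ElementTree as ET

def parse_patent(xml_text):
    """Extract a few fields from a simplified patent-grant XML record.
    Tag names here are illustrative, not the actual USPTO DTD."""
    root = ET.fromstring(xml_text)
    return {
        "doc_number": root.findtext(".//doc-number"),
        "title": root.findtext(".//invention-title"),
        "assignee": root.findtext(".//assignee/orgname"),
    }
```
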
TwitterAPI
Python code for interacting with Twitter.
This project is documented on the McNair Wiki:
http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1)
http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_2)
Contents
AutoFollower.py Twitter crawler application 1 above. An incomplete file attempting to automate the process.
Automate.py Twitter crawler application 2 above. Complete python code.
InfoGrabber.py Gets information about an input twitter user.
Twitter_Follower_Finder.py The first instantiation of the Twitter Crawler application 1 described above.
Utilities
This repository contains various utilities developed for text processing and other generally useful tools. See the wiki pages for each tool's documentation.
- Fuzzy match names (Tool)
- Godo (Tool)
- Normalizer. The git server hosts several variants, such as normalize fixed width and normalize surnames.
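To illustrate the kind of cleanup a surname normalizer performs, here is a minimal sketch; the suffix list and rules are assumptions for illustration, not the actual Normalizer code.

```python
import re

SUFFIXES = {"JR", "SR", "II", "III", "IV"}  # illustrative suffix list

def normalize_surname(name):
    """Uppercase, replace punctuation with spaces, drop generational
    suffixes, and collapse whitespace."""
    cleaned = re.sub(r"[^A-Za-z\s]", " ", name).upper()
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)
```
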
Web Crawler
This repository contains all software for web crawlers.
- Whois Parser pulls the Whois information given a list of URLs.
- PhD Masterclass - How to Build a Web Crawler: Ed's class on building a web crawler.
- LinkedIn Crawler (Tool): a web scraper for LinkedIn.
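At its core, a web crawler fetches a page, extracts its links, and enqueues them for further fetching. The link-extraction step can be sketched with the standard library's html.parser; this is a generic illustration, not code from any of the crawlers listed above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from every <a href=...> tag on a page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's base URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```
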