Software Repository Listing


This page lists all software and tools available in our Software Repository. Documentation for each tool is on that tool's own wiki page.

For information and a tutorial on how to access the McNair git server, see Software Repository.
Read the tutorial and instructions before pushing anything to the git server.

Tools not currently in the repository

Tools in the repository

Center IT Sysadmin

This repository contains all tools and scripts meant for system administration, such as backup scripts.

  • See the Center IT page for current documentation.

Harvard Dataverse

This repository contains all tools and scripts related to Harvard Dataverse.


Contents

cleaning db.sql

copytable.sql

createtables.sql

droptables.sql

ExecutiveOrderCrawler

This repository downloads and parses Executive Orders from Archives.gov using Scrapy and Selenium.

This project is documented on the McNair Wiki under the Code section/Executive Orders at: http://mcnair.bakerinstitute.org/wiki/E%26I_Governance_Policy_Report#Code


Contents

Order_links.txt This text file contains a list of links to the executive orders that need to be downloaded.

Executive Spider The executive folder contains all the necessary components to run a successful Scrapy crawl.

Extractor.py This script runs through the text files of the executive orders, and outputs a CSV with a 1 if the order hit a buzzword, and a 0 if it did not.
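The kind of output Extractor.py is described as producing can be illustrated with a minimal sketch. This is not the actual script: the buzzword list, the function names, and the CSV column names below are all assumptions made for illustration, and the real matching rules may differ.

```python
import csv
import io
from pathlib import Path

# Hypothetical buzzword list; the list used by Extractor.py is not given here.
BUZZWORDS = ["entrepreneur", "small business", "innovation"]


def hits_buzzword(text, buzzwords=BUZZWORDS):
    """Return 1 if any buzzword occurs in the text (case-insensitive), else 0."""
    lowered = text.lower()
    return int(any(word in lowered for word in buzzwords))


def flags_to_csv(orders, buzzwords=BUZZWORDS):
    """Build CSV text with one row per order: its name and a 0/1 flag.

    `orders` maps an order name to the full text of that order.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["order", "hit"])  # assumed column names
    for name in sorted(orders):
        writer.writerow([name, hits_buzzword(orders[name], buzzwords)])
    return buf.getvalue()


def load_orders(directory):
    """Read every .txt file in a directory into a {filename: text} dict."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in Path(directory).glob("*.txt")
    }
```

With a folder of downloaded order texts, `flags_to_csv(load_orders("orders/"))` would return the CSV as a string; the folder name is a placeholder.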

Geocoding Inventor Locations

This repository holds software for matching inventor addresses to known locations. Two scripts do the same job: an older one implemented in Perl and a newer one implemented in Python. You should probably use the newer Python tool.


Contents

MatchLocations.pl Perl version

Geocode.py Python version

GovTrack

The code in this repository scrapes the GovTrack website and runs analytics on the retrieved data. This could prove helpful in the ongoing entrepreneurship research at the McNair Center.


Contents

GovPassed.py Python code. Filters the bills down to those that passed.

GovTrack.py Python code. Takes all bills from a directory and creates a tab-delimited text file of their stats, such as number of words and number of entrepreneurship buzzwords.

Govtrack_webcrawler.pl Perl code.

Govtrack_webcrawler_AllEnactedBills.pl Perl code.
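The per-bill statistics described for GovTrack.py (number of words and number of entrepreneurship buzzwords, written as tab-delimited text) could be computed roughly as follows. This is a sketch under stated assumptions, not the repository's code: the buzzword list, the function names, and the column headers are hypothetical.

```python
import re

# Hypothetical buzzword list; the real list used by GovTrack.py is not shown here.
BUZZWORDS = ["entrepreneur", "startup", "small business"]


def bill_stats(text, buzzwords=BUZZWORDS):
    """Return (word_count, buzzword_count) for one bill's text.

    Buzzwords are counted as case-insensitive substrings, so "entrepreneur"
    also counts occurrences inside "entrepreneurship".
    """
    lowered = text.lower()
    word_count = len(re.findall(r"\b\w+\b", lowered))
    buzz_count = sum(lowered.count(word) for word in buzzwords)
    return word_count, buzz_count


def stats_table(bills, buzzwords=BUZZWORDS):
    """Build a tab-delimited table, one line per bill, from a {name: text} dict."""
    lines = ["bill\twords\tbuzzwords"]  # assumed header names
    for name in sorted(bills):
        words, buzz = bill_stats(bills[name], buzzwords)
        lines.append(f"{name}\t{words}\t{buzz}")
    return "\n".join(lines) + "\n"
```

The returned string can be written straight to a `.txt` file. Reading the bills from a directory is left out because the on-disk format of the bills is not described on this page.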

Matcher

This repository contains the Matcher tool, which matches firm names between two lists.
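To give a rough idea of what matching firm names across two lists involves, here is a minimal sketch that pairs names whose normalized forms are identical. It is an illustration only and does not reflect how the Matcher tool itself works; the suffix list and function names are assumptions.

```python
import re

# Hypothetical set of legal suffixes to ignore when comparing firm names.
SUFFIXES = {"inc", "corp", "corporation", "co", "llc", "ltd"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def match_names(left, right):
    """Return (left_name, right_name) pairs whose normalized forms are equal."""
    index = {}
    for name in right:
        key = normalize(name)
        if key:  # skip names that normalize to nothing, e.g. "Inc."
            index.setdefault(key, []).append(name)
    return [
        (left_name, right_name)
        for left_name in left
        for right_name in index.get(normalize(left_name), [])
    ]
```

Exact matching on normalized names is the simplest possible approach; it misses misspellings and abbreviations that a dedicated matching tool would be expected to handle.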

Patent Data Parser

This repository contains all tools developed for patent data parsing.

TwitterAPI

Python code for interacting with Twitter.

This project is documented on the McNair Wiki at:

http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_1)

http://mcnair.bakerinstitute.org/wiki/Christy_Warden_(Twitter_Crawler_Application_2)


Contents

AutoFollower.py Twitter crawler application 1 above. Incomplete file attempting to automate the process.

Automate.py Twitter crawler application 2 above. Complete Python code.

InfoGrabber.py Gets information about an input twitter user.

Twitter_Follower_Finder.py The first instantiation of the Twitter Crawler application 1 described above.

Utilities

This repository contains utilities for text processing, along with other generally useful tools. See the wiki pages for each tool's documentation.

Web Crawler

This repository contains all software for web crawlers.