Difference between revisions of "Python Libraries"

Revision as of 15:47, 19 October 2017


McNair Project
Python Libraries
Project Information
Project Title Python Libraries
Owner Peter Jalbert, Harrison Brown, Christy Warden, Jeemin Sim
Start Date
Deadline Never
Keywords Python, Libraries
Primary Billing
Notes
Has project status
Copyright © 2016 edegan.com. All Rights Reserved.


This page is dedicated to documenting all Python libraries, working or not. Please include a description of what the library is for, whether or not it is functional, and how to import and use it.

Geocoding Libraries

NLP Libraries

NLTK

NLTK is the Natural Language Toolkit, a Python library for natural language processing.

  • NLTK Information
    • Text needs to be converted to ASCII first. I had issues with my PDF texts and had to convert them.
    • Use the sent_tokenize() function to split a document into sentences; this is easier than using regular expressions.
    • Use pos_tag() to tag the tokenized sentences with parts of speech. The tags can be used to extract proper nouns.
    • Several NLTK data packages need to be downloaded before these functions work. To do this:
      • Open Python in the shell.
        • Run import nltk, then nltk.download().
        • Download all packages.
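
The steps above can be sketched roughly as follows. This is a minimal illustration, not code from the project: the helper names to_ascii, tag_document, and proper_nouns are hypothetical, and the ASCII conversion shown (Unicode normalization, then dropping whatever is left) is only one assumed way to do it. It also assumes the NLTK data packages have already been downloaded.

```python
import unicodedata


def to_ascii(text):
    """Drop non-ASCII characters, keeping the base letter of accented ones."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def tag_document(text):
    """Split text into sentences and POS-tag each one (needs NLTK data)."""
    import nltk  # imported here so the other helpers work without NLTK

    # One-time setup: nltk.download() opens the downloader;
    # nltk.download("all") fetches every package without prompting.
    sentences = nltk.sent_tokenize(to_ascii(text))
    return [nltk.pos_tag(nltk.word_tokenize(s)) for s in sentences]


def proper_nouns(tagged_sentence):
    """Return words tagged NNP or NNPS (Penn Treebank proper-noun tags)."""
    return [word for word, tag in tagged_sentence if tag in ("NNP", "NNPS")]
```

tag_document() returns one list of (word, tag) pairs per sentence, so proper_nouns() can be applied to each sentence in turn. The names of the individual data packages differ between NLTK versions, which is one reason downloading all of them is the simplest route.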