NOTICE: This page has a naming issue. Alternative versions are available at WhoIsParser.pl, WhoIs Parser, WhoisParser, and Whois Parser.
Latest revision as of 19:03, 12 November 2020
| Whois Parser | |
|---|---|
| Project Information | |
| Has title | |
| Has owner | Kunal Shah |
| Has start date | |
| Has deadline date | |
| Has keywords | Tool |
| Has project status | Complete |
| Has sponsor | McNair Center |
| Has project output | Tool |
Current Notes
Note: WHOIS is not an acronym, but it should be capitalized. It is not capitalized on this page for legacy reasons.
The latest version of the script (based on v2 from Kunal) is in:
E:\tools\WhoisParser\WhoisParser.pl
Packages were updated on Father using PPM (as admin):
- Net::WhoisNG
- Net::Whois::Parser
Packages were installed on Mother:
- cpanm Date::Manip
- cpanm Net::WhoisNG --force
- cpanm Net::Whois --force
- cpanm Net::Whois::Parser
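Before running the script on a new machine, it can help to confirm that the modules listed above actually load. The following is a minimal sketch, not part of WhoisParser.pl; the helper name `module_available` is hypothetical, and the module list is simply copied from the notes above.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded by this Perl, 0 otherwise.
# (Hypothetical helper for illustration; not part of WhoisParser.pl.)
sub module_available {
    my ($module) = @_;
    # require needs a path such as Net/Whois/Parser.pm when given a string.
    (my $path = "$module.pm") =~ s{::}{/}g;
    return eval { require $path; 1 } ? 1 : 0;
}

# Module names taken from the installation notes above.
my @needed = qw(Date::Manip Net::WhoisNG Net::Whois Net::Whois::Parser);

for my $module (@needed) {
    printf "%-20s %s\n", $module, module_available($module) ? 'ok' : 'MISSING';
}
```

Any module reported as MISSING would need to be installed with PPM or cpanm, as described above.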
It was run on Father as:
perl WhoisParser.pl -file="DistinctIncubatorDomains.txt" -outfile="IncubatorWhois.txt"
Note that the Date::Manip functions were commented out in the version on Father, and that the join on line 174 had a map to '' added, because most records have nulls for most fields.
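The map-inside-join change mentioned above can be illustrated as follows. This is a sketch of the general technique only, not the actual line 174 of the script; the function name `join_row`, the tab delimiter, and the sample record are assumptions.

```perl
use strict;
use warnings;

# Join fields into one line, replacing undefined values with ''.
# Without the map, join() would warn about uninitialized values.
# (Hypothetical helper; the tab delimiter is an assumption.)
sub join_row {
    my @fields = @_;
    return join("\t", map { defined $_ ? $_ : '' } @fields);
}

# A record where only the first three fields were found.
my @record = ('http://2nd.md/', '2010-11-17', '2017-11-17', undef, undef);
print join_row(@record), "\n";
```

Mapping nulls to empty strings keeps every row at the same number of columns, so the output still lines up with the header.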
2016 Version
This wiki page is under Additional Links/WhoisParser
The whoisParser was written by Kunal Shah on March 20, 2016. It is located in:
- repository: Web_Crawler
- branch: shoeb_patch/whoisParser
- directory: /WhoIsParser
- file: whoisParser.pl
Location:
E:\McNair\Projects\Houston\WhoIsParser
To use this parser, copy the Perl program above into a directory, make that directory the current working directory (use the 'cd' command if needed), and run the following command. The directory should also contain the input file (see below).
perl WhoIsParser.pl -file=listofurls.txt -outfile=listofurls_processed.txt
NAME
WhoIs Parser - Retrieves and parses Whois information.
Specifically, takes a file with a column of domain names and populates the corresponding columns with information from the WhoIs API.
SYNOPSIS
perl whoisParser -file=<file> [-outfile=<file>]
OPTIONS
-file=<file>: Name of file of domain names.
-outfile=<file>: The name of the outfile.
-h: Display help.
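Options of this shape can be parsed with the core Getopt::Long module. The sketch below shows one way to do it; whether the actual script uses Getopt::Long is an assumption, and the function name `parse_args` is hypothetical.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse -file, -outfile and -h from a list of arguments.
# Returns a hash reference of options, or undef if parsing fails.
# (Hypothetical helper; the real script may parse options differently.)
sub parse_args {
    my @args = @_;
    my %opt = (file => undef, outfile => undef, h => 0);
    GetOptionsFromArray(
        \@args,
        'file=s'    => \$opt{file},
        'outfile=s' => \$opt{outfile},
        'h'         => \$opt{h},
    ) or return;
    return \%opt;
}

my $opt = parse_args(@ARGV);
if (!$opt || $opt->{h} || !defined $opt->{file}) {
    print "Usage: perl whoisParser -file=<file> [-outfile=<file>]\n";
}
```

Getopt::Long accepts a single leading dash for long option names by default, which matches the `-file=...` form used in the commands on this page.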
USAGE & FEATURES
Arguments:
A text file with a column of domain names
Returns:
A text file of the domain names with the next 12 columns populated with information pulled from the Whois API. A header naming each column is inserted as the first row of the file. The output columns are:
1. Domain Name
2. Creation Date
3. Expiration Date
4. Update Date
5. Registrant Name
6. Registrant Street
7. Registrant City
8. Registrant Postal Code
9. Registrant Country
10. Admin Street
11. Admin City
12. Admin Postal Code
13. Admin Country
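The column layout above can be sketched as a fixed list that drives both the header row and each data row. This is an illustration under assumptions, not the script's code: the names `@COLUMNS`, `header_line`, and `row_line` are hypothetical, and the tab delimiter is assumed.

```perl
use strict;
use warnings;

# Column names in the order listed above (domain name plus 12 fields).
my @COLUMNS = (
    'Domain Name', 'Creation Date', 'Expiration Date', 'Update Date',
    'Registrant Name', 'Registrant Street', 'Registrant City',
    'Registrant Postal Code', 'Registrant Country',
    'Admin Street', 'Admin City', 'Admin Postal Code', 'Admin Country',
);

# Build the header line. The tab delimiter is an assumption.
sub header_line {
    return join("\t", @COLUMNS);
}

# Build one data line from a hash reference keyed by column name.
# Missing or undefined fields become empty strings.
sub row_line {
    my ($record) = @_;
    return join("\t", map { $record->{$_} // '' } @COLUMNS);
}

print header_line(), "\n";
print row_line({
    'Domain Name'     => 'http://2nd.md/',
    'Creation Date'   => '2010-11-17',
    'Expiration Date' => '2017-11-17',
}), "\n";
```

Driving every row from one column list keeps the header and the data in the same order even when most fields are empty.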
BUGS & FEEDBACK
Worked as expected on all example files. Please report any discovered bugs to Kunal.
Tested files:
Input: example_file.txt
Output: example_outfile.txt
Input Text:
http://hotmailpasswordsupportnumber.info/
http://www.MidtownDelivery.com
http://www.actionfigurelabs.com
https://m.facebook.com/AddictivePerformance99
http://adknowledgents.wix.com/adknowledgents
http://www.advancedcardiodr.com/
http://www.advancedseismic.com
https://www.alignedsigns.com/ppcregistration6.htm
https://www.alliedwarranty.com/
http://none yet
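The input lines are full URLs, some with paths and one that is not a valid address, while a Whois lookup needs a host name. The sketch below shows one possible way to pull a host out of each line; how the actual script handles this is not documented here, and the function name `host_from_url` is hypothetical.

```perl
use strict;
use warnings;

# Extract a lower-cased host name from a URL-like string.
# Returns undef when no plausible host is found.
# (Hypothetical helper; not taken from WhoisParser.pl.)
sub host_from_url {
    my ($url) = @_;
    return undef unless defined $url;
    $url =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    # Optional scheme, then the host up to the first /, :, ?, # or end of string.
    my ($host) = $url =~ m{^(?:[a-z][a-z0-9+.-]*://)?([^/:?#\s]+)(?:[/:?#]|$)}i;
    return undef unless defined $host;
    $host = lc $host;
    $host =~ s/^www\.//;       # drop a leading "www."
    # Require at least two dot-separated labels of letters, digits, or hyphens.
    return $host =~ /^[a-z0-9-]+(?:\.[a-z0-9-]+)+$/ ? $host : undef;
}

for my $line ('http://www.MidtownDelivery.com',
              'https://m.facebook.com/AddictivePerformance99',
              'http://none yet') {
    my $host = host_from_url($line);
    print "$line => ", (defined $host ? $host : '(no host)'), "\n";
}
```

This only strips the scheme, path, and a leading "www."; it does not reduce a host such as m.facebook.com to its registered domain, which would need a public suffix list.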
Output Text:
Domain Name Creation Date Expiration Date Update Date Registrant Name Registrant Street Registrant City Registrant Postal Code Registrant Country Admin Street Admin City Admin Postal Code Admin Country
http://1986ventures.com 2013-09-12T09:25:51Z 12-sep-2016 Domain Admin C/O ID#10760, PO Box 16 Note - Visit PrivacyProtect.org to contact the domain owner/operator Note - Visit PrivacyProtect.org to contact the domain owner/operator Nobby Beach QLD 4218 AU C/O ID#10760, PO Box 16 Note - Visit PrivacyProtect.org to contact the domain owner/operator Note - Visit PrivacyProtect.org to contact the domain owner/operator Nobby Beach QLD 4218 AU
http://2nd.md/ 2010-11-17 2017-11-17
http://www.2ndsquare.com 2013-10-16T04:01:29Z 16-oct-2016 2015-10-16T20:38:12Z Sameer Khan 22215 Tower Terr San Antonio 78259 US 22215 Tower Terr San Antonio 78259 US
http://www.32nddegree.com/ 2008-02-18T18:45:15Z 18-feb-2020 Cutshall, Wes 1321 Upland Dr. Houston 77043 US 1321 Upland Dr. Houston 77043 US
http://www.80legs.com 2008-07-17T21:09:48Z 17-jul-2016 Shion Deysarkar 904 West Avenue Austin 78701 US 904 West Avenue Austin 78701 US
http://hotmailpasswordsupportnumber.info/
http://www.MidtownDelivery.com 2012-01-23T05:01:21Z 23-jan-2017 2015-01-05T05:24:56Z Jim Wiseheart 7655 S. Braeswood#21 Houston 77071 US 7655 S. Braeswood#21 Houston 77071 US
http://accreu.com 2011-05-05T00:11:53.000Z 05-may-2016 Oneandone Private Registration 701 Lee Road Suite 300ATTN Chesterbrook 19087 US 701 Lee Road Suite 300ATTN Chesterbrook 19087 US
http://www.actionfigurelabs.com 2011-02-18T17:40:24Z 18-feb-2017 Phillip Leech 2223 Willowby Dr Houston 77008 US 2223 Willowby Dr Houston 77008 US
https://m.facebook.com/AddictivePerformance99
http://www.additech.com/ 1997-01-24T05:00:00Z 25-jan-2018 Additech, Inc. 10925 Kinghurst Houston 77099 US 10925 Kinghurst Houston 77099 US
http://adknowledgents.wix.com/adknowledgents
http://www.rmudata.com 2000-04-13T17:09:54Z 13-apr-2017 PERFECT PRIVACY, LLC 12808 Gran Bay Parkway West Jacksonville 32258 US 12808 Gran Bay Parkway West Jacksonville 32258 US
http://www.advancedcardiodr.com/ 2012-04-17T14:12:09Z 17-apr-2022 2015-01-08T22:09:14Z Sharafat Hussain Advanced Cardiovascular Care Center800 Peakwood Drive, Suite 8C Houston 77090 US Advanced Cardiovascular Care Center800 Peakwood Drive, Suite 8C Houston 77090 US
http://alwii.org 2011-05-31T21:48:05Z Chi Mao 1917 Ashland St, 2nd FloorIn Select Specialty Hospital Houston 77008 US 1917 Ashland St, 2nd FloorIn Select Specialty Hospital Houston 77008 US
http://www.advancedseismic.com 2009-10-30T19:00:47Z 30-oct-2016 2015-10-31T11:28:22Z na na na na 88888 US na na 88888 US
http://www.AdvoWire.com 2013-07-13T08:43:39Z 13-jul-2018 2013-07-13T08:43:39Z Jason Pampell 6516 North Gessner Houston 77040 US 6516 North Gessner Houston 77040 US
http://www.aggredyne.com 2011-04-01T21:03:52Z 01-apr-2018 Robert C. Hux 10530 Rockley Rd.,Suite 150 Houston 77099 US 10530 Rockley Rd.,Suite 150 Houston 77099 US
http://www.akrostechlabs.com/ 2008-03-24T17:34:07Z 24-mar-2017 2015-03-24T01:54:15Z Registration Private DomainsByProxy.com14747 N Northsight Blvd Suite 111, PMB 309 Scottsdale 85260 US DomainsByProxy.com14747 N Northsight Blvd Suite 111, PMB 309 Scottsdale 85260 US
http://www.aleedex.com 2012-12-27T20:15:55Z 10-jun-2019 2013-06-14T09:54:17Z Farid Premani 10500 Reserve at Fountain Lake Stafford 77477 US 10500 Reserve at Fountain Lake Stafford 77477 US
http://www.alertlogic.com/ 2003-10-10T21:24:13Z 10-oct-2019 PERFECT PRIVACY, LLC 12808 Gran Bay Pkwy West Jacksonville 32258 US 12808 Gran Bay Pkwy West Jacksonville 32258 US
http://www.aliceandlove.com 2014-08-07T01:42:29Z 07-aug-2016 c/o WHOIStrustee.com Limited Riverside View Thornes Lane WF1 5QW GB Riverside View Thornes Lane WF1 5QW GB
https://www.alignedsigns.com/ppcregistration6.htm
https://www.alliedwarranty.com/ 2004-03-31T20:07:28Z 31-mar-2018 2014-03-16T04:17:39Z Registration Private DomainsByProxy.com14747 N Northsight Blvd Suite 111, PMB 309 Scottsdale 85260 US DomainsByProxy.com14747 N Northsight Blvd Suite 111, PMB 309 Scottsdale 85260 US
http://none yet
http://www.alpheus.net 2003-03-27T23:14:33Z 27-mar-2018 2016-03-28T11:22:05Z Alpheus Firstcall 1301 Fannin St.20th Floor Houston 77002 US 1301 Fannin St.20th Floor Houston 77002 US
Summer 2018 Work
I used this parser after running my Google URL finder as detailed on http://mcnair.bakerinstitute.org/wiki/U.S._Seed_Accelerators#Finding_Company_URLs.
Type this in the command line:
perl whoisParser_v2.pl -file="inputfile" -outfile="outputfile"
Associated files can be found in:
E:\McNair\Projects\Accelerators\Summer 2018\url finder
The input file is allURLS.txt and the output file is whoisresults.txt.