
We wrote a couple of simple scripts together to get to grips with Perl.
===Running a Perl Script===
The first was (save it in a file called in the root of your R drive):
Or we can shell on to Bear and run it there:
Use PuTTY to connect to (see [[Research Computing At Haas| here]]).
===Processing Text Data===
Next we went to:,73,222,html?CaseID=2
And we created a file called Data.txt (saved next to the script) that contained the following:
Potential Foreign Corrupt Practices Act Violation
Date: 07/01/2003 (Date of Incident Report)
Misconduct Type: Ethics
Enforcement Agency: SEC
Contracting Party: None
Court Type: Administrative
Amount: $0
Disposition: Pending
Synopsis: "As previously reported in July 2003, we became aware of an incident..."
•1. SEC 10-K (p. 34 of 137)
We then wrote the following script to process the data:
#!/usr/bin/perl -w
#Lines that start with a # are comments that aren't read by the interpreter
use strict;
#The strict module forces us to declare variables before we use them
my @Textfile;
#Declare an array called Textfile
open (DATA,"Data.txt") or die "Cannot open Data.txt: $!";
#Open a filehandle on our file, stopping with an error message if it cannot be opened
while (<DATA>) {
#Read the data from the filehandle, line by line
chomp $_;
#$_ is a special variable - it captures the line being read from the filehandle here
if ($_ eq "") {next;}
#If the line is empty (i.e. blank) move to the next loop iteration
my $line = $_;
#Copy $_ into a lexical (my) variable called line
push (@Textfile, $line);
#Push the line onto the Textfile array
}
#Close the while loop
my $Doccell;
#Declare the Doccell variable
for (my $i=0; $i<=$#Textfile; $i++) {
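#Do a for loop, starting from i=0, going while i is less than or equal to the
#last index of the Textfile array, and incrementing by one each time
if ($Textfile[$i]=~/^Document\(s\):/) {$Doccell=$i;}
#Test to see if the entry matches a regular expression, if it does record the index
}
#Close the for loop
defined $Doccell or die "No Document(s): line found in Data.txt";
#Stop here if no line matched, because the splice below needs a valid index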
my @docs = splice(@Textfile,$Doccell);
#Create a new array by splicing out everything from the index we just found onwards
shift @docs;
#Remove the first element of the docs array
my $Firm = shift @Textfile;
#Set Firm equal to the first element of Textfile (which we just removed)
my $Violation =shift(@Textfile);
#Set Violation equal to the (new) first element of Textfile (which we just removed)
my $Offense={};
#Create an anonymous hash
foreach my $cell (@Textfile) {
#Iterate over Textfile, setting the current element to cell
my ($name,@value)=split(":",$cell);
#Split the cell on :
my $value=join(":",@value);
#Join the Value array on :
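$Offense->{$name}=$value;
#Set an entry in the Offense hash
}
#Close the foreach loop
$Offense->{doclist}=\@docs;
#Set the doclist entry in the Offense hash to a reference to the docs array
my $Master=[];
#Define an anonymous array
$Master->[0]={};
#Define an anonymous hash in the zeroth cell of the anonymous array
$Master->[0]->{FirmName}=$Firm;
#Set a hash entry
$Master->[0]->{Violation}=$Violation;
#Set a hash entry
$Master->[0]->{Offense}=$Offense;
#Set a hash entry
open (OUTPUT,">Output.txt") or die "Cannot open Output.txt: $!";
#Open a filehandle for writing (overwrite the file if it exists); Output.txt is an assumed name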
print OUTPUT $Master->[0]->{FirmName};
#Print to the output file an entry from the anonymous hash in the anonymous array
print OUTPUT "\t";
#Print a tab
print OUTPUT $Master->[0]->{Violation}."\t";
#Print another entry with another tab on the end
foreach my $key ( sort {$a cmp $b } (keys %{ $Master->[0]->{Offense} } )) {
#Iterate through the hash's keys, in alphabetical order, setting the current key to $key
print OUTPUT $Master->[0]->{Offense}->{$key}."\t";
#Print an entry, with a tab
}
#Close the foreach loop
print OUTPUT "\n";
#Print a new line
close OUTPUT;
#Close the output filehandle - this will flush the write buffer
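
As a rough sketch of the same steps (find the Document(s): marker, splice off the document list, shift off the firm and violation, then split each remaining line on its first colon), the version below works on an in-memory list so it can be run without Data.txt. It is not the script we wrote above: the sub name parse_case, the firm name "Example Firm", and keeping doclist at the top level of the hash rather than inside Offense are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one case record (a list of non-blank lines) into a hash reference.
sub parse_case {
    my @lines = @_;

    # Index of the first line that starts with "Document(s):".
    my ($doc_index) = grep { $lines[$_] =~ /^Document\(s\):/ } 0 .. $#lines;
    die "No Document(s): line found\n" unless defined $doc_index;

    # splice removes the marker line and everything after it from @lines.
    my @docs = splice(@lines, $doc_index);
    shift @docs;    # drop the marker line itself

    my $firm      = shift @lines;
    my $violation = shift @lines;

    my %offense;
    foreach my $cell (@lines) {
        # A limit of 2 splits on the first colon only, so colons inside
        # the value are kept and no join is needed afterwards.
        my ($name, $value) = split(/:\s*/, $cell, 2);
        $offense{$name} = $value;
    }

    return {
        FirmName  => $firm,
        Violation => $violation,
        Offense   => \%offense,
        doclist   => \@docs,
    };
}

# Sample input: "Example Firm" is a hypothetical placeholder, and the
# "Document(s):" marker is taken from the regular expression in the script.
my @Textfile = (
    'Example Firm',
    'Potential Foreign Corrupt Practices Act Violation',
    'Date: 07/01/2003 (Date of Incident Report)',
    'Misconduct Type: Ethics',
    'Amount: $0',
    'Document(s):',
    '1. SEC 10-K (p. 34 of 137)',
);

my $case = parse_case(@Textfile);

# One tab-separated row: firm, violation, then the offense values by sorted key.
my @fields = (
    $case->{FirmName},
    $case->{Violation},
    map { $case->{Offense}->{$_} } sort keys %{ $case->{Offense} },
);
print join("\t", @fields), "\n";
print "Documents: ", join("; ", @{ $case->{doclist} }), "\n";
```

Using join("\t", ...) on a list avoids the trailing tab that the print-in-a-loop approach leaves at the end of the row.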