END AS excludeflag
FROM round;
==Fixing erroneous geo-coordinates==
Some of the geocoordinates in the db are dirty and point to locations in India, Eastern Europe. However, the company addresses exist. Isolate the dirty geo-coordinates and do a lookup using Geocode.py script. To isolate place a box around the continental US and flag all points that fall outside the box. Add back the points that are located in Hawaii and Puerto Rico. Then import back into db.
I used longitude boundaries of -66 to -125 and latitude boundaries of 24 to 50.
--identify bad geo coords
DROP TABLE badgeodata;
CREATE TABLE badgeodata (
city varchar(100),
companyname varchar(100),
startyear real,
endyear real,
latitude real,
longitude real,
noaddress int
);
\COPY badgeodata FROM 'badgeodata.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--43724
DROP TABLE geodirtydata;
CREATE TABLE geodirtydata AS
SELECT g.*
FROM geoimport AS g
INNER JOIN badgeodata AS bg ON g.coname = bg.companyname;
--30498
\COPY geodirtydata TO 'geodirtydata.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
DROP TABLE geodirtydatawithflags;
CREATE TABLE geodirtydatawithflags (
coname varchar(100),
statecode varchar(2),
datefirstinv date,
latitude real,
longitude real,
longdirtyflag int,
latdirtyflag int,
hawaiiflag int,
prflag int,
latlongflag int,
masterflag int
);
\COPY geodirtydatawithflags FROM 'geodirtydataflags.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--30498
--import coordinates back into db
DROP TABLE geodirtyfix;
CREATE TABLE geodirtyfix (
coname varchar(100),
statecode varchar(2),
datefirstinv date,
latitude real,
longitude real
);
\COPY geodirtyfix FROM 'geodirtyfix.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV
--2300
DROP TABLE geoimportclean;
CREATE TABLE geoimportclean AS
SELECT g.*
FROM geoimport AS g
WHERE g.coname NOT IN (SELECT coname FROM geodirtyfix);
--40378
--42678
DROP TABLE geoimportfix;
CREATE TABLE geoimportfix AS
SELECT * FROM geoimportclean
UNION ALL
SELECT * FROM geodirtyfix
WHERE latitude IS NOT NULL;
--41718
The redo coleveloutput and colevelsimple using the geoimportfix as your geo table instead of geoimport.
==Cleaning firmbase==
The firmbase table contains undisclosed firms. Add a flag and remove them. Then use firmname, statecode, foundingdate as the key for this table. Check that is valid and make your core table.