Changes

Jump to navigation Jump to search
195 bytes added ,  15:59, 25 September 2017
There are still geo errors in the db. Addresses within the US have incorrect geo-coordinates. To fix this problem we will just lookup all the addresses in the DB using the Geocode.py script. Also we need to pull a company level file from SDC because the addresses will be copied down or be null by the normalizer. Modify your round ssh sdc script to remove the round dates. Therefore only one line will be assigned to one company. There will be no normalization errors this way. Then copy into the db and copy out all the distinct coname, statecode, datefirstinv that have a value in addr1 or addr2. Then run this through the geocode script. Copy the result back into the db and redo the colevel output tables.
NOTE '''NOTES FROM ED: *The lat/long data has to be stored as decimal or numeric to prevent precision problems. *The join was to the old (wrong) base table, that had addresses copied down. *The code below has been fixed up and rerun.The fixed up code is in ProcessingCoLevelSimple.sql
DROP TABLE geoallcoords;
DROP TABLE coleveloutput;
CREATE TABLE coleveloutput AS
SELECT companybasecore2A.coname, companybasecore2A.statecode, companybasecore2A.datefirstinv, companybasecore2A.city, companybasecore2A.addr1, companybasecore2A.addr2, companybasecore2A.zip, g.latitude, g.longitude, d.deaddate, d.aliveyear, d.deadyear FROM companybasecore2 sdccompanybasecore2 AS A LEFT JOIN deadalive1 AS d ON d.coname=companybasecore2A.coname AND d.statecode=companybasecore2A.statecode AND d.datefirstinv=companybasecore2A.datefirstinv LEFT JOIN geoallcoords AS g ON g.coname = companybasecore2A.coname AND g.statecode = companybasecore2A.statecode AND g.datefirstinv = companybasecore2 A.datefirstinv
WHERE hadgrowthvc=1;
--3257532380
DROP TABLE colevelsimple;
CREATE TABLE colevelsimple AS
SELECT coname, statecode, datefirstinv, city, addr1, addr2, zip, aliveyear, deadyear, latitude, longitude
FROM coleveloutput WHERE aliveyear IS NOT NULL and deadyear IS NOT NULL AND latitude IS NOT NULL;
--31523
<nowiki>\COPY colevelsimple TO 'colevelsimple.txt' WITH DELIMITER AS E'\t' HEADER NULL AS '' CSV</nowiki> '''NOTE FROM ED:'''*The query below is now redundant. *The same cleaning is done in ProcessingCoLevelSimple.sql*The clean table is CoLevelSimpleClean*This clean table is used as the basis for CoLevelBlowOut
I also added flags on the geodata table to filter points outside the US. You can use the geoallcoords1 table instead of geoallcoords and set excludeflag = 1 to filter out 292 erroneous points when you create your colevel tables.

Navigation menu