A request to the CIA to use a web-bot to scrape data from the HTML version of the [https://www.cia.gov/library/publications/the-world-factbook CIA World Factbook] received no response. World leader information was instead downloaded in PDF format from the [https://www.cia.gov/library/publications/world-leaders-1/pdf-version/pdf-version.html CIA World Leaders PDF site] for April 2008 and converted into a plain-text file ([http://www.edegan.com/repository/WorldLeaders-Raw.txt WorldLeaders-Raw.txt]).
The raw file was then reprocessed by a one-off script to produce [http://www.edegan.com/repository/WorldLeaders-Extracted.txt WorldLeaders-Extracted.txt]. The country codes were corrected using a look-up table ([http://www.edegan.com/repository/WorldLeaders-UNCountryLookup.txt WorldLeaders-UNCountryLookup.txt]) to produce the basic dataset ([http://www.edegan.com/repository/WorldLeaders-ExtractedUNCountry.txt WorldLeaders-ExtractedUNCountry.txt]). Users should note that a very small number of individuals had invalidly coded countries (mostly countries recognized by the CIA but not by the UN) and were excluded in this process. Furthermore, some leaders held multiple positions in their governments and so had multiple listings; these were collapsed to a single record.
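The look-up and collapsing steps described above can be illustrated with a minimal Python sketch. It is not the one-off script that was actually used, and the layouts of the extracted file and the look-up table are not documented here, so the record layout (country code, leader name, position), the function name <code>correct_and_collapse</code>, and the sample values are all assumptions made for illustration.

```python
def correct_and_collapse(records, lookup):
    """Map country codes through a look-up table and merge duplicate leaders.

    records: iterable of (cia_code, leader_name, position) tuples
             (hypothetical layout, not the real file format).
    lookup:  dict mapping a CIA country code to a UN country code.
    Returns (kept, excluded), where kept has one entry per (country, leader).
    """
    merged = {}    # (un_code, leader_name) -> list of positions, in input order
    excluded = []  # records whose country code has no entry in the look-up table

    for cia_code, name, position in records:
        un_code = lookup.get(cia_code)
        if un_code is None:
            # Invalidly coded country: drop the record from the dataset.
            excluded.append((cia_code, name, position))
            continue
        # Multiple listings for the same leader collapse into one record.
        merged.setdefault((un_code, name), []).append(position)

    kept = [
        {"country": country, "name": name, "positions": positions}
        for (country, name), positions in merged.items()
    ]
    return kept, excluded


if __name__ == "__main__":
    # Made-up sample data; codes and names are placeholders.
    sample = [
        ("AA", "Leader One", "President"),
        ("AA", "Leader One", "Minister of Defense"),
        ("ZZ", "Leader Two", "Prime Minister"),
    ]
    kept, excluded = correct_and_collapse(sample, {"AA": "001"})
    print(kept)
    print(excluded)
```

Keeping the excluded records in a separate list, rather than discarding them silently, makes it easy to check how many individuals were dropped because of unrecognized country codes.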
The NormalizeSurnames.pl script was run with the following options (and defaults):