The obvious approach is to export the pages from the old wiki via [[Special:Export]] (with or without revision history) and then import them into the new one. As the page count is likely around 500-600 (see below) this should be feasible, though importing through [[Special:Import]] isn't recommended for page counts greater than 100, due to timeout issues. See https://www.mediawiki.org/wiki/Manual:Importing_XML_dumps. Instead, one should use importDump.php [https://www.mediawiki.org/wiki/Manual:ImportDump.php] for larger imports. Note that mwdumper apparently doesn't work for imports into MW 1.31+, presumably because of database structure changes. Another option is to use pagefromfile.py [https://www.mediawiki.org/wiki/Manual:Pywikibot/pagefromfile.py], which is part of [https://www.mediawiki.org/wiki/Manual:Pywikibot Pywikibot].
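
For reference, a minimal sketch of the importDump.php route, assuming shell access to the new server and that the wiki lives under /var/www/wiki (the path and dump filename are placeholders):

<pre>
cd /var/www/wiki
# Load an XML dump produced by Special:Export on the old wiki
php maintenance/importDump.php dump.xml
# Recommended follow-up: rebuild recent changes and site statistics
php maintenance/rebuildrecentchanges.php
php maintenance/initSiteStats.php
</pre>
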
Looking at [[Special:AllPages]], it does seem that some of the pages from the old wiki were previously 'imported' to the new wiki. I suspect this was done page-by-page by the students, copying over the content. Nevertheless, there are pages with the same name but differing content, which is going to cause collisions. See https://meta.wikimedia.org/wiki/Help:Import#Merging_histories_and_other_complications. pagefromfile.py has the option of adding the old content to the bottom of the new page (or the top), which would be helpful in many cases. However, a careful approach might be best: I could build lists of pages with distinct names, pages with common names, pages with common names but differing content, and so on, and apply different import criteria to each group.
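
If the append route turns out to be useful, the Pywikibot invocation would look roughly like the sketch below (the exported pages would first have to be converted into pagefromfile's text format; pages.txt is a placeholder):

<pre>
# Create each page listed in pages.txt; where a page already exists,
# append the old content to the bottom instead of skipping it
python pwb.py pagefromfile -file:pages.txt -appendbottom
</pre>
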
Trying to copy data out of the database and then bringing it back in almost surely isn't an option, as this installation uses Postgres and the new one uses MySQL. Also, the database structure has changed beyond all recognition in the last decade or so. Likewise, I likely can't install any new extensions on the old wiki -- she's just too old.
The job queue length is 1,270.
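
For reference, the queue length can be checked from the shell with the showJobs.php maintenance script:

<pre>
# Print the number of jobs currently waiting in the queue
php maintenance/showJobs.php
</pre>
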
===Analyzing Pages===
I created a db called fixwiki and a script fixwiki.sql to load and analyze the page lists from the wikis on Mother and Astarte. The results of the analysis:
*Astarte Only: 675
*Both: 144
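
The SQL itself isn't reproduced here, but the same split can be reproduced from flat lists of page titles (one per line, as pulled from [[Special:AllPages]] on each wiki) with standard shell tools; the file names are placeholders:

<pre>
sort astarte-titles.txt > a.txt
sort mother-titles.txt > m.txt
comm -23 a.txt m.txt > astarte-only.txt   # titles only on Astarte
comm -12 a.txt m.txt > both.txt           # titles present on both wikis
wc -l astarte-only.txt both.txt
</pre>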
 
===The Choice===
 
Given the analysis, I think I'm going to pull both page sets for posterity, and then only load the "Astarte Only" pages for now. A quick look suggests that the duplicate pages won't be adding a huge amount of content that I care about. I'll do the load with importDump.php, which is the most standard method available. If/when my wiki gets bigger I'll install Pywikibot, but for now it seems like another potential vulnerability.
 
I'll then move the repository and pdf directories over by hand, and put some access control on them, and use
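
The copy itself is straightforward; a minimal sketch, assuming SSH access to Astarte and that both directories sit under the web root (all paths are placeholders):

<pre>
# Copy the repository and pdf directories from the old server to the new one
rsync -av astarte:/var/www/wiki/repository/ /var/www/wiki/repository/
rsync -av astarte:/var/www/wiki/pdf/ /var/www/wiki/pdf/
</pre>
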
 
==Moving Data==
 
I dumped the following from Astarte:
*AstarteOnly pages with full histories to Wikiedia-20200923190317.xml (33MB)
*AstarteOnly pages without histories to Wikiedia-20200923190722 (3MB)
*AstarteAndMother pages with full histories to Wikiedia-20200923190929.xml (22MB)
*AstarteAndMother pages without histories to Wikiedia-20200923192123.xml (705K)
