Just enough minimally invasive order has now been brought to the Geocities files that it is possible to serve them via a proxy server that even imitates the original URLs completely. Using this proxy, you will be able to click on any historic Geocities URL and experience it in your browser, which ideally is a historic browser as well.[1]

All technical measures applied to the data are presented on GitHub in the form of annotated programs written in bash, Perl, SQL and Python. The comments inside the scripts explain problems, considerations, compromises, decisions and technical solutions, trying to match the ideal of software being executable documentation of itself.

You are welcome to evaluate each step and use this information to make your own Geocities proxy server, or use the developed techniques to revive other dead web sites.
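
To give a rough idea of what "imitating the original URLs" involves, here is a minimal Python sketch of the lookup step such a proxy needs: turning a requested historic URL into a file path on disk. This is not the code published on GitHub. The archive root, the `root/host/path` layout, the `index.html` default document and the function name `local_path_for` are all assumptions made for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit, unquote

# Hypothetical location of the unpacked files; not taken from the project.
ARCHIVE_ROOT = PurePosixPath("/srv/geocities")


def local_path_for(url: str, root: PurePosixPath = ARCHIVE_ROOT) -> PurePosixPath:
    """Map a historic URL to a path below root (assumed layout: root/host/path)."""
    parts = urlsplit(url)
    # Host names are case-insensitive, so they can safely be lowercased.
    host = (parts.hostname or "").lower()
    if not host:
        raise ValueError(f"no host in URL: {url!r}")
    # The path keeps its case exactly as requested.
    path = unquote(parts.path)
    if path == "" or path.endswith("/"):
        path += "index.html"  # assumed default document name
    segments = [s for s in path.split("/") if s]
    # Refuse anything that could climb out of the archive root.
    if ".." in (host, *segments):
        raise ValueError(f"path traversal rejected: {url!r}")
    return root.joinpath(host, *segments)


if __name__ == "__main__":
    # Example URL chosen for illustration only.
    print(local_path_for("http://www.geocities.com/Area51/Vault/1234/"))
```

Only the host name is lowercased; the path is passed through untouched, so two URLs that differ only in the case of a file name resolve to two different files. A real proxy would additionally have to answer the request over HTTP and deal with redirects, missing files and content types.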

[Image: treemap of the Geocities files]

There is a small discussion over at reddit about this graphic. You’re welcome to compare this treemap with the first published treemap, with problematic areas encircled.

The overall number of files has come down from 36 million to 28 million.

Next up is re-packaging the Geocities files and database contents for a cleaned-up distribution on the Internet Archive.

  1. Get your ancient web surfing gear at []

3 Responses to A City Rebuilt

  • # Nathan 2013-04-07 23:41

    I hope that “Cleaned up” doesn’t mean “Divided into hundreds of 7zip archives”. Maybe a new torrent? By the way, one request: Please don’t make the files require a case-sensitive file system. If you do this, it dramatically reduces the number of users that can use the archive.

  • # despens 2013-04-10 20:06

    Dear Nathan,

    dividing a multi-gigabyte download into 7zip archives has its merits, especially when
    – you need to finish getting a complete file before your provider resets your connection,
    – you want to verify checksums, or
    – you need to carry it as a whole or partly on a temporary file system that doesn’t allow files larger than 2GB
    – etc …

    The new distribution will be available at the Internet Archive once it is done; a torrent is too unstable.

    Without a case-sensitive file system you will still be lost with Geocities. Removing case sensitivity would mean rewriting most of the source files and compromising too much fidelity, since the choice of file names was made by the users. Some used case to distinguish thumbnail images from full size images, for example.

    If you really want to play with the files on, say, a native Macintosh, you might want to format a disk with HFSX. This generates loads of other problems tho. A tolerable solution is using a virtual machine running Linux.

    Anyway, some directories contain so many files and sub-directories that you can’t access them with a graphical interface.

  • # powerKitten 2016-10-24 20:43

    Hey, has the cleaned up version been uploaded to the Internet Archive? If it has, can you please give me the link.
