Recently the torrent download seems to have picked up again. So here are some hints for the other 20 users who are trying to get the full Geocities Torrent as well (all assuming you are running some sort of Unix and want to serve the Geocities copy through a web server):

Decrunching

The torrent directories UPPERCASE, LOWERCASE and NUMBERS contain a huge number of archives. Look into these directories to find out which archives don’t have files that end in “.part”: those are already completely downloaded and can be decrunched. Use the following command to create a list of these archives, one line per file ending in 001:

$ find . -iregex ".*geocities-[a-j].*001" > list.txt

The regular expression matches archives from the letter a to the letter j, and because of -iregex the match is case-insensitive. Adjust the range to the archives that are complete. Then run this Perl script in the same directory:

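open FILE, "< list.txt" or die "Cannot read list.txt: $!";
open LOG, "> log.txt" or die "Cannot write log.txt: $!";
# Write log lines immediately so that tailing log.txt shows progress.
select((select(LOG), $| = 1)[0]);
while (<FILE>) {
    chomp;    # remove the trailing newline only
    # Drop the last 7 characters (such as ".7z.001") to get the tar file name.
    my $tarname = substr($_, 0, -7);
    print `7z -y x $_`;
    print `tar -xf $tarname`;
    print `rm $tarname`;
    print LOG $_."\n";
}
close FILE;
close LOG;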

Monitor the progress by tailing log.txt, for example with tail -f log.txt.
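The find command above lists every file ending in 001, whether or not the rest of that archive has finished downloading. Here is a minimal sketch that skips unfinished archives. It rests on two assumptions that are not confirmed by the torrent itself: all parts of one archive share the same name up to the final "001", and the torrent client appends ".part" to files it is still downloading. The function name complete_archives is made up for this example.

```shell
#!/bin/sh
# List first parts (*001) of archives that have no unfinished ".part" files.
# Assumption: parts of one archive share the name up to the final "001",
# and incomplete files carry a ".part" suffix added by the torrent client.
complete_archives() {
    dir=${1:-.}
    find "$dir" -iregex '.*geocities-[a-j].*001' | while IFS= read -r first; do
        base=${first%001}    # common prefix of all parts of this archive
        # Skip the archive if any sibling part still ends in ".part".
        if ls "$base"*.part >/dev/null 2>&1; then
            continue
        fi
        printf '%s\n' "$first"
    done
}
```

Called as complete_archives . > list.txt, it should produce a list in the same format as the find command, so the Perl script can read it unchanged.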

Serving

The torrent’s directory www.geocities.com contains a lot of symbolic links into the YAHOOIDS directory decrunched before. Unfortunately, these links are absolute and so make assumptions about where you put the torrent data in your file system. For example

1969bronco -> /geocities/YAHOOIDS/1/9/1969bronco

means that there should be a folder named “geocities” at the root of your file system. LOL WUT! Relative links would have been a better option. To fix them, first cd into www.geocities.com and save the absolute links in a list:

$ ls -l > list.txt

Then run the following Perl script:

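open LIST, "< list.txt" or die "Cannot read list.txt: $!";
while (<LIST>) {
    # Match "name -> /geocities/rest" in the ls -l output.
    if (/(\S+) -> \/geocities(\S+)/) {
        print `rm -v $1`;
        print `ln -sv ..$2 $1`;
    }
}
close LIST;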
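As an alternative to parsing the output of ls -l, the same rewrite can be sketched in plain shell by reading each link target with readlink. This is only a sketch under the assumption that every link to fix starts with "/geocities", as in the example above. The function name relink_relative is made up for this example.

```shell
#!/bin/sh
# Rewrite absolute /geocities/... symlinks in a directory to relative ../... ones.
relink_relative() {
    dir=${1:-.}
    for link in "$dir"/*; do
        [ -L "$link" ] || continue    # only touch symbolic links
        target=$(readlink "$link")
        case $target in
            /geocities/*)
                # Strip the "/geocities" prefix and prepend ".." instead.
                rm "$link"
                ln -s "..${target#/geocities}" "$link"
                ;;
        esac
    done
}
```

Run it as relink_relative . from inside www.geocities.com. Quoting the variables keeps it working for names that contain spaces, which the ls -l approach cannot handle.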

The resulting file links look like this:

1969bronco -> ../YAHOOIDS/1/9/1969bronco
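To check that the relative links actually resolve, it can help to list the ones whose target is missing, for instance because the matching archive has not been decrunched yet. Here is a minimal sketch, assuming a find that supports -maxdepth (GNU and BSD find do). The function name broken_links is made up for this example.

```shell
#!/bin/sh
# Print symbolic links in a directory whose target does not exist.
broken_links() {
    dir=${1:-.}
    # -type l selects symlinks; "test -e" follows the link and fails
    # when the target is missing, so only dangling links are printed.
    find "$dir" -maxdepth 1 -type l ! -exec test -e {} \; -print
}
```

Running broken_links . inside www.geocities.com should print nothing once all links resolve.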

A last hint: You can speed up the loading of the www.geocities.com index in your web browser by navigating to its address on your web server, waiting once for the index to be generated, and then saving the generated HTML as “index.html” in the www.geocities.com directory.

Browsing

Firefox is recommended for browsing the contents of the Geocities Torrent, because it supports userscripts. Many URLs on Geocities pages point to absolute locations that no longer exist. The Geocities Torrent Link Fixer changes URLs in links, framesets, images and background images to point to your local copy. Inside the script, change the base URL “localhost/geocities/” according to your hosting setup.

??¿

If you have questions please use the comments.


9 Responses to Tips for Torrenters

  • # Thetoadfromstarfox 2011-03-26 02:13

    I am an absolute beginner and i installed linux just to be able to unpack this stuff correctly cause i’ve gotten some issues in windows when trying to unpack it.

    Let’s say I have the downloaded torrent on an external 1tb usb hd called c: and I want to unpack this whole torrent to another 1tb usb hd called e: , how exactly do I write this in the perl script for it to do this?

    Thanks.

  • # drx 2011-03-26 07:22

    In Linux the drives are mounted somewhere in your file system as directories. If you want to unpack to a directory other than the one where the packed archive is stored, you have to specify it in the unpacking command.

    For example tar:

    tar -C /path/you/want/it/in -xf archive.tar

    In the Perl scripts published here you will find the corresponding commands. You can change them inside there. (Most of our Perl scripts just call system commands and keep some stats.)

    In this case the Perl line should read:

    print `tar -C /path/you/want/it/in -xf $tarname`;

  • # Thetoadfromstarfox 2011-03-26 07:35

    open FILE, "< list.txt";
    open LOG, "> log.txt";
    while (<FILE>) {
    chop; chomp;
    my $tarname = substr($_, 0, -7) . "\n";
    print `7z -y x $_`;
    print `tar -xf $tarname -C /media/myotherdd/`;
    print `rm $tarname`;
    print LOG $_."\n";
    };
    close FILE;
    close LOG;

    I tried executing it like this and it still unpacks it in the folder where the archive is instead of on my other hdd , what am i missing here?.

  • # drx 2011-03-26 07:41

    Arguments in the wrong order.

    print `tar -C /media/myotherdd/ -xf $tarname`;

    That “-C” thing has to go first: $tarname ends in a newline, so the shell cuts off everything that comes after it in the command.

    Alternatively, you could use the graphical desktop to unpack all these. I just use scripts because I am lazy.

  • # Thetoadfromstarfox 2011-03-26 07:53

    hmm odd , it refuses to unpack it on my other hdd so i guess ill have to go the gui way.

  • # Thetoadfromstarfox 2011-03-27 02:24

    How large do you estimate the Uppercase dir is unpacked (the second unpacking), over 600gig?

  • # drx 2011-03-28 20:17

    Sorry Toad, I cannot tell how large the archives will be once decrunched. I have had my computer counting the bytes for two days now and it still hasn’t finished. (This is where UNIX filesystems SUCK.) To avoid running out of disk space with two 1TB HDDs, you have to decrunch one archive at a time.

  • # Thetoadfromstarfox 2011-03-29 14:38

    Ok thanks for the effort though! :) I’ve been having some issues unpacking it , giving me errors on some rars so i think the torrent got fubared when i downloaded it with windows. I read on the Archive Team’s homepage somewhere in the comments that there’s a file system error with the file layout in the torrent itself so i guess that’s what has happened to me. Now to figure out what to delete of the Uppercase dir and fix with redownloading the parts in linux *blerugh*. Guess i’ll spend some time digging through Lowercase in the meantime since it seems healthy at least.

  • # Thetoadfromstarfox 2011-03-30 15:21

    I am getting a bit confused by how they have packed this, maybe i should just redownload the whole thing with linux heh.

    My uppercase dir is 379gig and i have 482 geocities files ending with .001 to unpack and they all run fine through 7z but they only unpack 45gig of the 370gig that’s there, and to make things even more confusing i got a bunch of files ending in “001.alt” which aren’t even listed in the .torrent.

    *headache*

