1. Make a list of the files that need inspection with convmv, like this:
    (convmv --lowmem -r --nfc -f latin1 -t utf8 *) 2>&1 >> ~/Desktop/encoding_errors.txt
    This log shows which files probably have names that are not encoded in UTF-8.
  2. Check out each directory containing bad filenames. Usually, inside one profile you will only encounter one encoding. To find out which one exactly, grep for parts of the bad file name in surrounding HTML files. For example:
    $ ls
    A?onet.jpg
    milenio.html
    $ grep 'onet\.jpg' *.html
    milenio.html:    <td colspan=3 rowspan=1 width=314 align="center" [rest of output line omitted]

    Now you know that milenio.html links to the file that has a mysterious file name.

  3. Check out the HTML file in Firefox:
    $ firefox milenio.html
    By looking at the source code (Control+U) or “page information” (Control+I) you can usually guess which encoding was used:

    • It may be declared explicitly in the <head> part of the HTML, like
      <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
    • The characters may be written as entities or escapes inside the HTML, like &Agrave;, &#192; or %C0. The browser will then usually display them correctly, and you can try different encodings for the file name.
    • If no encoding is specified anywhere, at least on Geocities the likelihood that it is iso-8859-1 is very high. (This used to be Netscape’s default encoding if no other was specified.)
  4. Use convmv to convert the file names:
    $ convmv -f iso-8859-1 -t utf8 -r --notest *
    The --notest option makes convmv actually do the renaming; if it is omitted, convmv only displays what it would do to the files.
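
If a directory holds many HTML files, the explicit charset check from step 3 can be roughed out on the command line before opening anything in Firefox. The following is only a sketch: declared_charset and report_charsets are made-up helper names, not part of convmv, and the code assumes a grep that supports -o (GNU and BSD grep do). It only finds explicit charset=... declarations and does not look at entities or guess anything.

```shell
# Hypothetical helper: print the first charset=... value declared in an
# HTML file, lowercased. Prints nothing if the file declares no charset.
declared_charset() {
    grep -i -o 'charset=["'\'']\{0,1\}[A-Za-z0-9_-]*' "$1" \
        | head -n 1 \
        | sed -e 's/^[^=]*=//' -e 's/^["'\'']//' \
        | tr '[:upper:]' '[:lower:]'
}

# Hypothetical helper: print "file: charset" for every .html file in a
# directory (default: the current one).
report_charsets() {
    for f in "${1:-.}"/*.html; do
        [ -e "$f" ] || continue          # no .html files matched the glob
        cs=$(declared_charset "$f")
        printf '%s: %s\n' "$f" "${cs:-none declared}"
    done
}
```

Running report_charsets in the directory with the bad file names gives one line per HTML file. Whatever value shows up most often is a reasonable first candidate for convmv's -f option; for files reported as "none declared", the iso-8859-1 guess from step 3 still applies.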
