
rsync problems with diacritics in file names (e.g. "å", "ä")

Posted: 01 Jun 2010, 16:32
by RobV
Dear all,

I have a large MP3 collection with many diacritics in the file names.
When I use rsync to copy the entire archive to a mounted hard disk, these file names get distorted: each diacritic character is replaced by two characters, the first of which is something like à or Â.

I tried rsync's --iconv parameter, but without success. All of the following attempts gave a similarly wrong result:

rsync -av --iconv=.
rsync -av --no-iconv
rsync -av --iconv=-
rsync -av --iconv=ISO-8859-1,UTF8
rsync -av --iconv=UTF8,UTF8
rsync -av --iconv=ISO-8859-1,ISO-8859-1
My locale is as follows:

user@bubba:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
Could anyone help me out or provide some hint?
Thanks in advance.

Kind regards, RobV

Re: rsync problems with diacritics in file names (e.g. "å", "ä")

Posted: 07 Jun 2010, 04:25
by Kiff
In UTF-8, ä is stored as the two bytes "c3 a4". If these are read as ISO-8859-1, they are taken to be two different characters, the first one (c3) being Ã and the second one (a4) being ¤.

Most likely either rsync or your shell reads the UTF-8 filenames as ISO-8859-1.
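
To make the byte-level picture concrete, here is a minimal Python sketch of that misreading. It only illustrates the encoding arithmetic; it says nothing about where in the rsync or shell chain the misreading actually happens, and the variable names are just for illustration.

```python
# "ä" is U+00E4; written as an escape so the script does not depend on its own encoding.
utf8_bytes = "\u00e4".encode("utf-8")

# Misreading those two bytes as ISO-8859-1 yields two separate characters.
misread = utf8_bytes.decode("iso-8859-1")

# Writing the misread text back out as UTF-8 doubles the damage (four bytes).
double_encoded = misread.encode("utf-8")

print(utf8_bytes.hex(" "))                      # bytes of the correct name
print([f"U+{ord(c):04X}" for c in misread])     # code points seen after the misreading
print(double_encoded.hex(" "))                  # bytes of the distorted name
```

The two code points after the misreading are U+00C3 (Ã) and U+00A4 (¤), which matches the "two characters, the first one something like Ã" symptom described in the first post.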

Re: rsync problems with diacritics in file names (e.g. "å", "ä")

Posted: 07 Jun 2010, 05:08
by RobV
Hi Kiff,

You are right, and the rsync parameter --iconv=UTF8,UTF8 is there to force rsync to treat both read and write operations as UTF-8 based.
Unfortunately the rsync on Bubba 2 doesn't react as I expected.
I think that either my expectations are wrong or the rsync implementation should be improved.
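
One way to narrow this down might be to look at the raw bytes of the names on the destination disk, to see whether they really were written wrongly or are only displayed wrongly. The following is a rough Python sketch under that assumption; `classify_name` and `scan` are hypothetical helper names, and the check only recognises misreadings as ISO-8859-1 (a Windows-1252 misreading would need a different codec).

```python
import os


def classify_name(raw: bytes) -> str:
    """Guess how a raw file name is encoded (heuristic, not proof)."""
    if raw.isascii():
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8 at all, e.g. a plain ISO-8859-1 name.
        return "not-utf8"
    try:
        # If the text maps back to bytes that are again valid UTF-8,
        # the name was probably UTF-8 misread as ISO-8859-1 and re-encoded.
        text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8"
    return "maybe-double-encoded"


def scan(directory: str) -> None:
    """Print a classification for every entry in one directory."""
    # Passing a bytes path makes os.listdir return raw bytes names.
    for raw in os.listdir(os.fsencode(directory)):
        print(classify_name(raw), raw)


# Sample names as raw bytes: correct UTF-8, double-encoded, ISO-8859-1, ASCII.
for sample in (b"\xc3\xa4.mp3", b"\xc3\x83\xc2\xa4.mp3", b"\xe4.mp3", b"song.mp3"):
    print(classify_name(sample), sample)
```

Calling `scan` on the source directory and on the mounted disk and comparing the results should show on which side the names stop being plain UTF-8.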

RobV

Re: rsync problems with diacritics in file names (e.g. "å",

Posted: 07 Jun 2010, 06:16
by asparak
rsync isn't Excito's; it's an app with its own support process. Encoding transition issues have plagued Unix/Linux on all platforms for years, so rsync isn't unique in this.
http://samba.anu.edu.au/rsync/ - You might be able to get support for your specific issue on their mailing list. rsync on Bubba (Debian) is 3.0.3, which is fairly current (latest is 3.0.7).
Have you tried using UTF16?

Re: rsync problems with diacritics in file names (e.g. "å",

Posted: 07 Jun 2010, 06:31
by Kiff
Some bugs related to --iconv, symlinks and character sets have been fixed since rsync v3.0.3, in case you have some symlinks in there.

Re: rsync problems with diacritics in file names (e.g. "å",

Posted: 07 Jun 2010, 08:23
by RobV
I'll have a look at the UTF-16 suggestion and I'll visit the web-site of rsync for further assistance.
Thanks for your help guys.
Rob

Re: rsync problems with diacritics in file names (e.g. "å",

Posted: 23 Jun 2010, 18:44
by RobV
Dears,

As the Linux tool set is obviously not capable of handling the UTF-8 character set properly (too modern, maybe?), I found the solution in running a great tool under good old Microsoft Windows XP.
The freeware tool is called "Diacritics Remover" and it does exactly what its users expect.
This is in contrast to the Linux tool rsync, which has several well-described options that simply don't work at all.

RobV :cry: