rsync problems with diacritics in file names (e.g. "å", "ä")

RobV
Posts: 34
Joined: 05 Oct 2008, 05:48

Post by RobV » 01 Jun 2010, 16:32

Dear all,

I have a large MP3 collection with many diacritics in the file names.
When I use rsync to copy the entire archive to a mounted hard disk, these file names get distorted: each character with a diacritic is replaced by two characters, the first of which is something like Ã or Â.

I tried rsync's --iconv option, but every attempt failed. All of the following give a similarly wrong result:

Code:

rsync -av --iconv=.
rsync -av --no-iconv
rsync -av --iconv=-
rsync -av --iconv=ISO-8859-1,UTF8
rsync -av --iconv=UTF8,UTF8
rsync -av --iconv=ISO-8859-1,ISO-8859-1

My locale is as follows:

Code:

user@bubba:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Could anyone help me out or provide some hint?
Thanks in advance.

Kind regards, RobV

Kiff
Posts: 48
Joined: 08 Feb 2010, 04:09
Location: Norway

Re: rsync problems with diacritics in file names (e.g. "å", "ä")

Post by Kiff » 07 Jun 2010, 04:25

In UTF-8 ä is stored as the two bytes "c3 a4". If this is read as ISO-8859-1, it will be treated as two separate characters, the first one (c3) being Ã and the second one (a4) being ¤.
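
To make the byte-level picture concrete, here is a minimal Python sketch (Python 3.8 or newer, not something rsync itself runs). It encodes "å" (U+00E5) and "ä" (U+00E4) as UTF-8, deliberately misreads the bytes as ISO-8859-1, and then reverses the mistake. The function names are made up for illustration.

```python
# Sketch: how a UTF-8 file name turns into two odd characters when its
# bytes are misread as ISO-8859-1 (Latin-1), and how to reverse that.

def misread_as_latin1(name: str) -> str:
    """Encode as UTF-8, then wrongly decode the bytes as ISO-8859-1."""
    return name.encode("utf-8").decode("iso-8859-1")


def repair(garbled: str) -> str:
    """Undo the misreading: recover the raw bytes, then decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


if __name__ == "__main__":
    for ch in ("\u00e5", "\u00e4"):  # å and ä
        raw = ch.encode("utf-8")
        garbled = misread_as_latin1(ch)
        # Show the character, its UTF-8 bytes, the misread form,
        # and whether the repair gives the original back.
        print(ch, raw.hex(" "), repr(garbled), repair(garbled) == ch)
```

Both characters start with the byte c3, which is why the first garbled character is Ã in each case. The repair step only works if the misreading really was ISO-8859-1; a different wrong charset would need a different reversal.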

Most likely either rsync or your shell reads the UTF-8 filenames as ISO-8859-1.
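
One way to tell those two cases apart is to look at the raw bytes of the names on the destination disk, independent of how the terminal displays them. The following is only a heuristic sketch in Python (3.8 or newer); `classify` is a hypothetical helper, and a name that legitimately contains "Ã¤" would be misreported as double-encoded.

```python
import os
import sys


def classify(raw: bytes) -> str:
    """Classify file name bytes as 'ascii', 'utf-8', 'double-encoded' or 'other'."""
    if raw.isascii():
        return "ascii"
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "other"  # not valid UTF-8, e.g. a genuine ISO-8859-1 name
    try:
        # If the decoded text fits in Latin-1 and those bytes are again
        # valid UTF-8, the name was probably converted to UTF-8 twice.
        text.encode("iso-8859-1").decode("utf-8")
        return "double-encoded"
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf-8"


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory.
    path = sys.argv[1] if len(sys.argv) > 1 else "."
    # Passing bytes to os.listdir returns the names as raw bytes.
    for raw in sorted(os.listdir(os.fsencode(path))):
        print(classify(raw), raw.hex(" "))
```

If the names on disk come out as "utf-8" (for example c3 a4 for ä), the files are fine and only the display is wrong. If they come out as "double-encoded" (c3 83 c2 a4), the names were converted on the way to the disk.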

RobV
Posts: 34
Joined: 05 Oct 2008, 05:48

Re: rsync problems with diacritics in file names (e.g. "å", "ä")

Post by RobV » 07 Jun 2010, 05:08

Hi Kiff,

You are right, and the rsync parameter --iconv=UTF8,UTF8 is meant to force rsync to treat both read and write operations as UTF-8 based.
Unfortunately the rsync on Bubba 2 doesn't behave as I expected.
I think that either my expectations are wrong or the rsync implementation needs improving.

RobV

asparak
Posts: 173
Joined: 08 Jun 2009, 07:38

Re: rsync problems with diacritics in file names (e.g. "å",

Post by asparak » 07 Jun 2010, 06:16

rsync isn't Excito's; it's an app with its own support process. Encoding transition issues have plagued Unix/Linux on all platforms for years, so rsync isn't unique in this.
http://samba.anu.edu.au/rsync/ - You might be able to get support for your specific issue on their mailing list. rsync on bubba (debian) is 3.0.3, which is fairly current (latest is 3.0.7)
Have you tried using UTF16?

Kiff
Posts: 48
Joined: 08 Feb 2010, 04:09
Location: Norway

Re: rsync problems with diacritics in file names (e.g. "å",

Post by Kiff » 07 Jun 2010, 06:31

Some bugs related to --iconv, symlinks and character sets have been fixed since rsync 3.0.3, which may matter if you have symlinks in there.

RobV
Posts: 34
Joined: 05 Oct 2008, 05:48

Re: rsync problems with diacritics in file names (e.g. "å",

Post by RobV » 07 Jun 2010, 08:23

I'll have a look at the UTF-16 suggestion and I'll visit the rsync website for further assistance.
Thanks for your help guys.
Rob

RobV
Posts: 34
Joined: 05 Oct 2008, 05:48

Re: rsync problems with diacritics in file names (e.g. "å",

Post by RobV » 23 Jun 2010, 18:44

Dears,

As the Linux tool set is obviously not capable of handling the UTF-8 character set properly (too modern maybe?), I found the solution in running a great tool under good old Microsoft Windows XP.
The freeware tool is called "Diacritics Remover" and it does exactly what its users expect.
This is in contrast with the Linux tool rsync, which has several well-described options that simply don't work at all.

RobV :cry:
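
For anyone who wants the same workaround without leaving Linux, a rough equivalent can be sketched in Python. This is not how "Diacritics Remover" works internally (that is unknown here); it is an assumption-laden sketch using Unicode decomposition, and `strip_diacritics` is a made-up name. It only returns the new name, it does not rename anything on disk.

```python
import unicodedata


def strip_diacritics(name: str) -> str:
    """Return name with combining marks removed, e.g. 'å' becomes 'a'."""
    # NFD splits a letter such as 'ä' into the base letter 'a' plus a
    # separate combining diaeresis, which is then filtered out.
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


if __name__ == "__main__":
    for name in ("Bj\u00f6rk.mp3", "\u00e5\u00e4.mp3"):  # Björk.mp3, åä.mp3
        print(name, "->", strip_diacritics(name))
```

Letters that have no decomposition, such as "ø", pass through unchanged, and two different names can collapse into the same stripped name, so check for collisions before renaming a real collection.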