codepages - encoding of filename characters

dsp76
Posts: 76
Joined: 15 Apr 2007, 14:18

Post by dsp76 » 05 May 2007, 09:46

Hi,
this seems to be a problem always coming up. I bet others are also interested in the solution.

I don't know what codepage Bubba uses for filenames, but I guess it's UTF-8?

I use WinSCP4, which always expects UTF-8 both locally and remotely. When I copy a file with special characters (umlauts) to Bubba, WinSCP shows it correctly. But an SSH session with PuTTY shows broken characters, and Midnight Commander on Bubba shows them incorrectly as well. So does MediaTomb when I open its web interface.

Windows Explorer, however, shows the filenames correctly when mapping Bubba via Samba.

So what is wrong and what is right?

Looking forward to solving this problem ;-)

dsp

Per
Posts: 12
Joined: 03 Apr 2007, 14:03

Post by Per » 05 May 2007, 10:41

Fix so that international characters are displayed correctly in file names:

International characters, e.g. åäö, appear as garbage when you copy files to Bubba
from Windows and then list them over SSH. In order for Bubba/Samba to handle international
characters, change /etc/samba/smb.conf:

Under the [global] section add:

Code:

unix charset = ISO-8859-1
dos charset = UTF8

The last line is only needed if you access your Windows
shares from Bubba (with smbclient). Restart Samba for the change to take effect:

Code:

/etc/init.d/samba restart
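
To double-check which charset lines actually ended up in the file, something like the following could be used. This is only a sketch: `show_charsets` is a made-up helper name, not a Samba tool, and it simply greps the file rather than asking Samba what it parsed.

```shell
# show_charsets FILE
# Hypothetical helper: print the charset-related lines of an smb.conf-style
# file. It does not validate the values, it only shows what is written there.
show_charsets() {
    grep -i -E '^[[:space:]]*(unix|dos|display)[[:space:]]+charset[[:space:]]*=' "$1"
}

# Example (the path is the one used in this thread):
# show_charsets /etc/samba/smb.conf
```

If the function prints nothing, the file has no explicit charset lines and Samba falls back to its built-in defaults.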

You must now set the correct locale for the shell in order to see the correct characters in the
shell. Install the 'localeconf' package and add the locales you want.

Code:

apt-get update
apt-get install localeconf 

You will be prompted with a list of locales. Select those you want and select one as
default when prompted.

You can now set a locale for bash in ~/.bashrc (Swedish as an example, replace with your
locale):

Code:

export LANG=sv_SE
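
A quick way to see which character set a locale setting maps to is `locale charmap`. The sketch below wraps it in a small function. `show_charmap` is just an illustrative name, and the result depends on which locales were generated on the machine.

```shell
# show_charmap LOCALE
# Print the character map that LOCALE selects. LC_ALL and LC_CTYPE are
# cleared for this one call so that LANG alone decides.
show_charmap() {
    LC_ALL= LC_CTYPE= LANG="$1" locale charmap
}

# Example: on a system where sv_SE was generated as an ISO-8859-1 locale
# this should report ISO-8859-1.
# show_charmap sv_SE
```

If the locale has not been generated, `locale` usually prints a warning and reports a fallback character map instead, which is a hint to go back to the localeconf step.
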
/Per

dsp76
Posts: 76
Joined: 15 Apr 2007, 14:18

Post by dsp76 » 05 May 2007, 14:29

... perfect!

Seems like I need to retransfer a lot of data to avoid having to rename the files.

I'm sure I set the codepage of my bubba to UTF-8 initially, now I chose the ISO version.
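
Renaming in place might be an alternative to retransferring. The following is a rough sketch under the assumption that the existing names are stored as valid UTF-8 and only use characters that ISO-8859-1 has. `utf8_name_to_latin1` is a hypothetical helper name and the directory in the usage comment is a placeholder.

```shell
# utf8_name_to_latin1 PATH
# Hypothetical helper: rename one file whose name is stored as UTF-8 bytes
# so that the same name is stored as ISO-8859-1 bytes. File contents are
# not touched. Returns non-zero if the name cannot be converted or the
# target name already exists.
utf8_name_to_latin1() (
    dir=$(dirname -- "$1")
    old=$(basename -- "$1")
    # iconv fails if the name is not valid UTF-8 or uses characters that
    # ISO-8859-1 cannot represent.
    new=$(printf '%s' "$old" | iconv -f UTF-8 -t ISO-8859-1) || exit 1
    [ "$old" = "$new" ] && exit 0      # plain ASCII name, nothing to do
    [ -e "$dir/$new" ] && exit 1       # never overwrite an existing file
    mv -- "$dir/$old" "$dir/$new"
)

# Example: convert every entry in one directory (path is a placeholder):
# for f in /path/to/files/*; do utf8_name_to_latin1 "$f"; done
```

Names that are already ISO-8859-1 are normally not valid UTF-8, so the function leaves them alone and returns an error for them. Where it is installed, the `convmv` tool does the same job recursively and has a dry-run mode.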

Many thanks!

dsp

dsp76
Posts: 76
Joined: 15 Apr 2007, 14:18

Post by dsp76 » 06 May 2007, 18:42

Hi,
I fear we need to discuss this a bit further...

1) I also had a look at the Bubba web interface file manager. As Firefox is set to UTF-8 by default, special characters are scrambled again; switching manually to ISO fixed the view.

2) Accessing Bubba from my Ubuntu machine, which has UTF-8 as default encoding, again shows scrambled special characters...

Does it mean:
- if I want to use the Bubba filesystem correctly with Windows, Bubba needs to be set to ISO?
- if I want to use the Bubba filesystem correctly with Ubuntu, Bubba needs to be set to UTF-8?

Both Bubba and Ubuntu can handle UTF-8 but Windows can't?
Everywhere else I read suggestions to use UTF-8 consistently...



:shock: I'm confused...

dsp

Per
Posts: 12
Joined: 03 Apr 2007, 14:03

Post by Per » 07 May 2007, 12:10

Hi!

Bubba uses Debian sarge, which uses ISO-8859-1. Newer distributions like Ubuntu or Debian etch use UTF-8 by default. Windows also uses UTF-8.

So what to do to handle international characters? You basically have two options:

1) Change Debian sarge to use UTF-8. I guess that it could be done, but you could run into problems with some software. This is the case with Etch.

2a) Translate the code page when needed. This is what Samba does. It is aware that sarge uses ISO-8859-1 and Windows UTF-8 (we told it so with the settings in smb.conf).

So accessing Bubba through Samba will show things properly.

2b) Change the settings on the application or machine that accesses Bubba to assume ISO-8859-1, like your Firefox.
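
To make the difference concrete, here is a small sketch that looks at the raw bytes of the letter "ä" in both encodings. It assumes `iconv` and `od` are available, and `hex` is just a throwaway helper defined for this example.

```shell
# Hex dump of stdin on one line, for comparing byte sequences.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# "ä" as stored by a UTF-8 client: two bytes.
printf '\303\244' | hex                                   # c3a4

# The same letter converted to ISO-8859-1: one byte.
printf '\303\244' | iconv -f UTF-8 -t ISO-8859-1 | hex     # e4

# Reading the UTF-8 bytes as if they were ISO-8859-1 yields two
# characters ("Ã" and "¤"), which is the garbage seen in listings.
printf '\303\244' | iconv -f ISO-8859-1 -t UTF-8 | hex     # c383c2a4
```

The file name on disk is just the bytes. Whether they show up as "ä" or as garbage depends on which encoding the program reading them assumes.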

Hope this clarifies things a bit.

/Per

Cheeseboy
Posts: 789
Joined: 08 Apr 2007, 12:16

Post by Cheeseboy » 12 May 2007, 06:58

Hi guys,

Just a quick note on this...
Please be aware that ISO-8859-1 is NOT the same as Windows ANSI.
MS has extended iso-1, and the name of windows ANSI is cp1252...
Not sure if this is relevant to your problem, but just be aware.
When it comes to understanding charset/codepage problems, I find this resource very helpful:
http://czyborra.com/charsets/iso8859.html
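
A small sketch of where the two differ, assuming `iconv` knows the name CP1252 (glibc's does). `hex` is a throwaway helper for this example only.

```shell
# Hex dump of stdin on one line, for comparing byte sequences.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# Byte 0x80 read as cp1252 is the euro sign (U+20AC):
printf '\200' | iconv -f CP1252 -t UTF-8 | hex               # e282ac

# The same byte read as ISO-8859-1 is a C1 control code (U+0080):
printf '\200' | iconv -f ISO-8859-1 -t UTF-8 | hex           # c280

# Letters such as å ä ö sit at the same positions in both:
printf '\345\344\366' | iconv -f CP1252 -t UTF-8 | hex       # c3a5c3a4c3b6
printf '\345\344\366' | iconv -f ISO-8859-1 -t UTF-8 | hex   # c3a5c3a4c3b6
```

So for file names with åäö the two behave the same. The difference only shows up for bytes in the 0x80 to 0x9F range, where cp1252 puts things like the euro sign and curly quotes.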

Cheers

/Niklas

Cheeseboy
Posts: 789
Joined: 08 Apr 2007, 12:16

Post by Cheeseboy » 12 May 2007, 07:09

Also, as far as I'm aware, when Windows uses Unicode, it is always UTF-16 Little Endian, not UTF-8. That is a two-byte encoding for most characters (characters outside the Basic Multilingual Plane take four bytes).
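
For a byte-level view of UTF-16 Little Endian, here is a short sketch using `iconv`. `hex` is a throwaway helper for this example, and the second character (U+1D11E, a musical clef) is only picked as an arbitrary one from outside the Basic Multilingual Plane.

```shell
# Hex dump of stdin on one line, for comparing byte sequences.
hex() { od -An -tx1 | tr -d ' \n'; echo; }

# "ä" (U+00E4) in UTF-16LE: one 16-bit unit, low byte first.
printf '\303\244' | iconv -f UTF-8 -t UTF-16LE | hex           # e400

# U+1D11E needs a surrogate pair (D834 DD1E), so four bytes.
printf '\360\235\204\236' | iconv -f UTF-8 -t UTF-16LE | hex   # 34d81edd
```
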

juicer
Posts: 23
Joined: 23 Jan 2007, 10:16

Post by juicer » 24 May 2007, 07:13

Per, thank you for compiling my posts from the "International chars in filenames when using samba" thread...

Excito, wouldn't it be nice to have a couple of sticky threads or a FAQ regarding this and other issues that pop up every now and then?

/juicer

Per
Posts: 12
Joined: 03 Apr 2007, 14:03

Post by Per » 24 May 2007, 14:44

That would be a good idea. I usually end up collecting bits and pieces and saving them in a file for myself (and forget where I got them from, sorry juicer ... )

A note on the locale info. If you set the LANG environment variable as described above, all locale issues will be handled. If you only want file names etc. to show up properly, and not have man pages etc. in the locale, you can use:

Code:

export LC_CTYPE=sv_SE

More info on this here: http://www.debian.org/doc/manuals/intro ... le.en.html

Also note that there may be software that doesn't check the locale. The only really safe way is to stick to file names with a-z, A-Z, 0-9, - or _ only.
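
A name can be checked against that safe set with a few lines of shell. This is a sketch only: `is_safe_name` is a made-up helper, and allowing the dot (for extensions like .jpg) is an assumption added here, it is not in the list above.

```shell
# is_safe_name NAME
# Hypothetical helper: succeed only if NAME is non-empty and consists of
# a-z, A-Z, 0-9, '-', '_' and (assumption) '.' and nothing else.
# LC_ALL=C makes the ranges plain ASCII so accented letters never match.
is_safe_name() (
    LC_ALL=C
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) exit 1 ;;
        *) exit 0 ;;
    esac
)

# Example: list entries in the current directory that are not "safe".
# for f in *; do is_safe_name "$f" || printf '%s\n' "$f"; done
```
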

/Per

tor
Posts: 703
Joined: 06 Dec 2006, 12:24

Post by tor » 25 May 2007, 02:59

Hi,

Character encoding can be a real problem, we are painfully aware of that, and I think you have done a great job on it in these threads.

We have recently added a new forum with howtos, so a summary might belong there? Anyone is welcome to contribute there ;)

/Tor
Co-founder OpenProducts and Ex Excito Developer
