Problems with encoding UTF-8 / ANSI

Post your questions and problem reports here

Moderator: kfury77

Flashbaer
Posts: 5
Joined: Thu Apr 24, 2008 9:02 am

Problems with encoding UTF-8 / ANSI

Post by Flashbaer »

Hi there,

I think I have a problem with saving files in the encoding "UTF-8 without BOM":
Some files in my project are identified as "ANSI" every time I open them. Other files are shown as "UTF-8 *". The problem is that even if I save an ANSI file as "UTF-8 without BOM", it is identified as "ANSI" again the next time I open it.
If I choose "UTF-8" or "UTF-16", the files are identified correctly; the problem occurs only with UTF-8 without BOM...
My guess is that the files are saved correctly as "UTF-8 *" when I choose this encoding, but when I reopen them they are identified as "ANSI" and are then saved as "ANSI" again.
Can anybody help me understand this behaviour?

I'm using WeBuilder v. 9.2.0.100 with Windows XP Home.

Greetz
Flashbaer
Cary
Posts: 82
Joined: Mon May 28, 2007 10:41 pm

Re: Problems with encoding UTF-8 / ANSI

Post by Cary »

The BOM is the byte order mark, a short byte sequence at the very beginning of the file that tells applications opening it that the file is UTF-8 or UTF-16. Without the BOM, an application can't know for certain that it's opening a file encoded in either of these and has to guess from the content. Some applications refer to the BOM as the UTF-8 signature or Unicode signature.
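
To make the BOM idea concrete, here is a minimal Python sketch of a BOM check. It only illustrates the principle and is not WeBuilder's actual code; the function name detect_bom is made up for this example, and UTF-32 BOMs are deliberately ignored.

```python
import codecs

# Known byte order marks and the encoding each one signals.
# The BOM constants come from Python's standard codecs module.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # bytes EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # bytes FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # bytes FE FF
]


def detect_bom(data: bytes):
    """Return the encoding signalled by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom(codecs.BOM_UTF8 + b"<html>"))  # utf-8-sig
print(detect_bom(b"<html>"))                    # None
```

A file saved as "UTF-8 without BOM" always falls into the None case, so the application needs some other hint.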

I believe WeBuilder, like some other applications, looks for this in the head of the html when it opens a page without a BOM:

<meta http-equiv="content-type" content="text/html; charset=utf-8">
If it finds this it will use the UTF-8 without BOM encoding. If this meta tag isn't there, then WeBuilder will use ANSI, because that's what it looks like without the BOM. You also have to double-check what encoding other applications use when they open the page to make sure they are also using UTF-8.

So make sure your page has the above meta tag in its head when you use utf-8 without BOM, and with some applications you will just always have to manually switch them to the correct encoding when you open a page without the BOM.
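
For illustration, a rough Python sketch of how an editor might look for such a meta tag. This is an assumption about the general technique, not WeBuilder's real implementation; the name declared_charset and the 1024-byte limit are invented for this example.

```python
import re

# Matches charset=... inside a <meta ...> tag, case-insensitively.
# Works on raw bytes because the encoding is not known yet.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_\-]+)",
    re.IGNORECASE,
)


def declared_charset(data: bytes, limit: int = 1024):
    """Return the charset named in a meta tag near the start of the file, or None."""
    match = META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


page = b'<head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head>'
print(declared_charset(page))                 # utf-8
print(declared_charset(b"<head></head>"))     # None
```

Because the search runs on the raw bytes, it also finds the tag when it sits inside a comment in a non-HTML file.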
Flashbaer
Posts: 5
Joined: Thu Apr 24, 2008 9:02 am

Re: Problems with encoding UTF-8 / ANSI

Post by Flashbaer »

Hi Cary,

thanks for this helpful reply!
It seems that you're right: all files that are opened as "UTF-8 *" have this meta tag in their content. I tried saving one of these "mysterious" ANSI files with this tag as a comment in a PHP file, and that worked well, too.
But is this really the only solution to the problem? I don't want to put this meta tag into CSS files or other file types, and the @charset "utf-8"; rule in CSS files does not work either...

And just now I discovered that there are some PHP files that don't have anything in their content that could be related to UTF-8, but are still opened correctly as "UTF-8 *".

This is driving me crazy... :-)

Thanks again and best regards
Flashbaer
Karlis
Site Admin
Posts: 3605
Joined: Mon Jul 15, 2002 5:24 pm
Location: Riga, Latvia, Europe

Re: Problems with encoding UTF-8 / ANSI

Post by Karlis »

See, UTF-8 is backward compatible with ASCII, the part that all ANSI code pages share. Plain characters are stored as exactly the same single bytes, while each special character is stored as a sequence of two to four bytes that would look like a run of rare ANSI characters. So if the program finds a UTF-8 BOM, a UTF-8 meta tag, or even at least one special character (also when no BOM or meta tag is present), it detects UTF-8 and displays the file correctly. However, a file that has no BOM, no UTF-8 meta tag and no special characters is byte for byte identical to a simple ANSI file, so there is no way of telling that you wanted it to be treated as UTF-8. The computer has no way of detecting that it is UTF-8. So another workaround, if you do not want to use a BOM, is to add some non-ASCII (Unicode) text.
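
The detection order described above can be sketched in Python roughly like this. It is a simplified illustration under assumptions, not the program's real code: the function name guess_encoding and the returned labels are invented, and the meta tag check is left out.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess how an editor might label a file, based only on its bytes."""
    # 1. An explicit UTF-8 BOM settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    # 2. Pure ASCII is identical in ANSI and UTF-8, so nothing can be detected.
    if data.isascii():
        return "ANSI (plain ASCII, indistinguishable from UTF-8)"
    # 3. Non-ASCII bytes are present: check whether they form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "ANSI"
    return "UTF-8 without BOM"


print(guess_encoding("Flashbaer".encode("utf-8")))  # ANSI (plain ASCII, indistinguishable from UTF-8)
print(guess_encoding("Bär".encode("utf-8")))        # UTF-8 without BOM
print(guess_encoding("Bär".encode("cp1252")))       # ANSI
```

The first call shows the case from this thread: a file with only plain characters yields the same bytes whether it is saved as ANSI or as UTF-8 without BOM.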
Karlis Blumentals
Blumentals Software
www.blumentals.net
Flashbaer
Posts: 5
Joined: Thu Apr 24, 2008 9:02 am

Re: Problems with encoding UTF-8 / ANSI

Post by Flashbaer »

Hi Karlis,
thanks a lot, learning is a never-ending story... :-) I thought UTF-8 characters were different from ANSI...
So I'll try to put some Unicode characters into the files and hope it will work!
Karlis
Site Admin
Posts: 3605
Joined: Mon Jul 15, 2002 5:24 pm
Location: Riga, Latvia, Europe

Re: Problems with encoding UTF-8 / ANSI

Post by Karlis »

You're welcome.
Karlis Blumentals
Blumentals Software
www.blumentals.net