Vietnamese diacritics not displayed correctly in HTML code


(Chris Walker) #1

Hi Forum,

Question: what do I need to change in the code so that Vietnamese characters show up correctly in HTML code? For example, add some code to the Header or CSS section or Console.

Web page in question: www.web-ranking.vn. View source and look for jumbled characters like: ào các. You will see them first in the meta description tag, for example. The jumbled characters are what the browser shows instead of the Vietnamese diacritics.

Background

I am building a website with content written in Vietnamese. The Vietnamese characters render perfectly in all browsers. However, when I view source, they show up as jumbled characters like Xin chào các.

I use Text Stacks for text.

Is the problem caused at a theme level or is the problem caused at a Stack level?

How to troubleshoot?

My fear is that when Googlebot crawls my page for SEO, it will see the jumbled characters and misinterpret my content, jeopardising my search engine ranking.

Thanks for your help in advance,

Chris


(Jannis from inStacks Software) #2

Chris
You need a web font for your website that includes the Vietnamese character subset in order to display the text correctly.
Jannis


(Chris Walker) #3

Jannis,

Thank you. You gave me a moment of hope. I changed the font to Arial and the problem is still there.

I am using Sytten_2 Theme by SeyDesign.

He’s got this link ref in the header related to fonts.

Could it be causing the problem?

Here is what I am seeing that I want to fix. Note the red font with jumbled characters.

+++++++++++++++++++++++++++
Do you think adding this in the header will make a difference?

I got it from news.zing.vn. They have somehow overcome the problem I am having.

Thanks for your help


(Chris Walker) #4

The link did not display in my post. I was referring to the fonts.googleapis link you can see in the screenshot.


(Jannis from inStacks Software) #5

Just adding a link to a font will not apply the font to your website.
Using the styled text stack might not be the solution. Maybe try an HTML stack and check http://unicode-table.com

It would help if you published a site with your text.


(Mathew Mitchell) #6

@tophu4u2 I have published a website before, partially in Vietnamese, and did not have problems. The website no longer exists (it was a short term 1-2 year project) but I definitely did not use the Text stack. At the time I used the HTML stack. In reality all the content was written in Markdown, then converted to HTML. Now that there is a direct Markdown stack that would probably also work.

I believe I was using a very common font: Helvetica, or Helvetica-Neue, or Helvetica-Neue-Light.

… okay I just tested with a couple of lines of Vietnamese and it all seemed to work fine with a Markdown stack as well. I looked at both the published page and the source of that published page.

I highly doubt it’s a theme issue.

And there’s a general expression, “Friends don’t let friends use the text stack.” :slight_smile: The text stack is meant to be a helpful starter stack/page for newcomers, but my guess is there are several potential downsides to using it. At the very least it’s worth considering using the Markdown stack.


(Christi Carew) #7

Hi,

I have your email in our support box. I just need to check with our developer, as I am not sure. I will get back to you via email as soon as I can. Thanks for your patience.

Christi Carew
YourHead Software
http://www.yourhead.com


(Christi Carew) #8

Hi,

I’m posting here as well as via email. It is not a Stacks thing. It is probably something to do with the encoding. If you look at the General Settings, there are three encoding options, none of which are sufficient for Vietnamese characters, which probably need 16-bit.

So at this point, it’s a RW question. I’m not sure if they’ll see this. You might want to email them directly.

Christi Carew
YourHead Software
http://www.yourhead.com


(Isaiah Carew) #9

OK, I’ve done a little reading on this topic. I’m afraid I probably sent @girlcarew off in the wrong direction. Vietnamese works fine in UTF-8; it does not need a 16-bit encoding. Considering the number of times I’ve blogged about my passion for phở and cà phê đá, I can’t help but be completely embarrassed about getting this wrong.

So… let’s fix this.

Considering UTF-8 is the default for RapidWeaver and your screenshot shows that you have that working fine – it seems like the problem is something more insidious: the font or some RapidWeaver character mangling.
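As a quick sanity check on the UTF-8 point, here is a minimal Python sketch, separate from RapidWeaver itself. The sample text and the helper names `utf8_width` and `round_trips` are made up for illustration. It shows that Vietnamese letters simply take two or three bytes each in UTF-8 and survive a round trip unchanged:

```python
def utf8_width(ch: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


def round_trips(text: str) -> bool:
    """True if text survives encoding to UTF-8 bytes and decoding back."""
    return text.encode("utf-8").decode("utf-8") == text


# Illustrative sample text (an assumption, not taken from the site in question).
sample = "Xin ch\u00e0o c\u00e1c b\u1ea1n, c\u00e0 ph\u00ea \u0111\u00e1"

# a, à (U+00E0), ạ (U+1EA1), đ (U+0111)
for ch in ("a", "\u00e0", "\u1ea1", "\u0111"):
    print(f"U+{ord(ch):04X} {ch}: {utf8_width(ch)} byte(s)")

print("round trip ok:", round_trips(sample))
```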

I did some quick tests of my own to see what I could find out. Here are two pages I made:

https://dl.dropboxusercontent.com/u/433436/viet/stacks/index.html
and
https://dl.dropboxusercontent.com/u/433436/viet/styled/index.html

The first is a Stacks page and the second is a Styled Text page. I listed all the A-related characters. They all show fine on both pages in both Safari and Chrome. So :thumbsup: !!!

To do this I created the pages in RapidWeaver and pasted in the text. Then I selected it and chose “Clear Formatting” from the Format menu. That stripped away any non-standard styles.

I’m publishing with the Alpha theme with all the default settings.

So… with the knowledge that it can work, here’s a couple things for you to try:

  1. Select the text and choose “Clear Formatting” from the format menu. If the problem has to do with non-standard styles that RapidWeaver is mangling, then perhaps that will help clear it up.

  2. Try a different theme – try Alpha like I did – not as a permanent choice, but just to try to identify the source of the problem. If choosing a default theme does fix the problem, then the font is probably to blame. If your theme allows you to change the font – try choosing one of the normal web-fonts: Helvetica or Times. If your theme doesn’t have a font selection you’ll need to manually override its CSS or choose a different theme that does allow it.

Let us know what you find out,
Isaiah


(Chris Walker) #10

Hi Jannis,

I am using Arial with charset UTF-8.

The font displays fine in all browsers. The problem occurs when viewing the HTML source: the diacritics are not displayed correctly there.

For example here is the meta description tag:

I have tried adding a

No change.

I tried saving the index.html file as UTF-8 without BOM, but no change.

I know it can be done. View this page’s source code and you will see all diacritics displayed correctly in the source code.
http://news.zing.vn

Thanks for your help in advance,
Chris


(Chris Walker) #11

Hi Isaiah,

I am using Arial with charset UTF-8.

The font displays fine in all browsers. The problem occurs when viewing the HTML source: the diacritics are not displayed correctly there.

For example here is the meta description tag:

I have tried adding a

No change.

I tried saving the index.html file as UTF-8 without BOM, but no change.

Do you know if all RapidWeaver index.html files are saved with or without a BOM?
Here is where I heard about the BOM:
http://www.recycledelectrons.net.au/Random-Info/correcting-why-foreign-characters-are-not-displaying-correctly
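
For anyone wanting to check the BOM question directly, a small Python sketch along these lines would do it. The file name `index.html` and the helper names are assumptions for illustration; `codecs.BOM_UTF8` is the standard three-byte marker EF BB BF.

```python
import codecs
from pathlib import Path


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


if __name__ == "__main__":
    path = Path("index.html")  # hypothetical path to the exported page
    if path.exists():
        print("BOM present:", has_utf8_bom(path.read_bytes()))
```

A BOM is only three bytes at the very start of the file, so its presence or absence would not by itself change how an individual character such as à appears further down in the source.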

I know it can be done. View this page’s source code and you will see all diacritics displayed correctly in the source code.
http://news.zing.vn

Thanks for your help in advance,

Chris


(Chris Walker) #12

Mathew,

Thanks for your input. I think it is theme related because even when I use the Markdown stack or HTML stack instead of the Text stack, the meta tags are wrong. Here is the meta description, for example:

You can see some diacritics are showing correctly when I view the HTML source. The à is an example of the problem.

I know it can be done: diacritics can show correctly in “view HTML source code.” Here is a website that shows them perfectly:

zing.vn

Thanks for your help,

Chris


(Mathew Mitchell) #13

Chris: I have no idea about the source of the problem then. It might be worthwhile starting a new project with a new theme (from a different developer). Publish to some “safe” area. If nothing else you could use Will Woodgate’s blank theme. By going through this process at least you could see if the problem really was theme related.

Blank theme here: https://themeflood.com

(Will also has some other free themes you could use for testing purposes.)


(Chris Walker) #14

I reached out to Jukka at http://www.cs.tut.fi/~jkorpela/personal.html

He explained the following:
The “View Source” function of a web browser shows what the HTML document contains, and in this case it actually contains the 7 characters “à”.

Indeed, when I view the index.html file with my HTML editor, the jumbled characters I was complaining about show up.
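
Jukka’s point can be illustrated with a short Python sketch. It assumes, purely for illustration, that the exported file contains named character references such as `&agrave;` (the exact reference in the page is not shown in this thread). An HTML parser decodes such references back into the real characters, which would explain why the page renders correctly while View Source shows the raw reference.

```python
from html import unescape

# Assumed example of what the exported file might contain; the exact
# reference written into the page is not recoverable from this thread.
source_fragment = "Xin ch&agrave;o c&aacute;c"

# What an HTML parser produces after decoding the character references.
rendered = unescape(source_fragment)
print(rendered)

# Named, decimal, and hexadecimal references all decode to the same character.
same_letter = {unescape("&agrave;"), unescape("&#224;"), unescape("&#xE0;")}
print(len(same_letter) == 1)
```

Anything that parses the HTML rather than reading the raw bytes would be expected to end up with the decoded text.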

Is it then logical to point the finger at the theme? Or maybe it is RapidWeaver.

Jukka goes on to suggest:
It is possible that RapidWeaver has an option like “Do not escape Latin 1 characters” (possibly with a stranger name) that would prevent it from turning an input of “à” to the reference “à”.

Does anybody have ideas on how to troubleshoot a theme or RapidWeaver to solve this problem?

The problem is: why does RapidWeaver (or the theme) use “à” for “à”, whereas other accented characters are shown correctly?
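
One plausible explanation, offered as an assumption rather than anything confirmed in this thread, is that the exporting tool escapes only characters that have a named entity in an older table such as HTML 4’s. That table covers Latin-1 letters like à and á but not Vietnamese-specific letters like ạ. The Python sketch below uses the standard library’s HTML 4 table (`html.entities.codepoint2name`); the helper names are hypothetical.

```python
from html.entities import codepoint2name
from typing import Optional


def html4_entity_name(ch: str) -> Optional[str]:
    """Return the HTML 4 entity name for ch, or None if it has none."""
    return codepoint2name.get(ord(ch))


def escape_like_html4(text: str) -> str:
    """Escape non-ASCII characters that have an HTML 4 named entity.

    Characters without such an entity are left as literal text.
    """
    out = []
    for ch in text:
        name = html4_entity_name(ch)
        out.append(f"&{name};" if name and ord(ch) > 127 else ch)
    return "".join(out)


# "Xin chào các bạn": à and á get escaped, ạ (U+1EA1) stays literal.
print(escape_like_html4("Xin ch\u00e0o c\u00e1c b\u1ea1n"))
```

Under that assumption, a mix of escaped and literal diacritics in the same sentence is what one would expect to see in the source.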


(Isaiah Carew) #15

When I wrote before I recommended two specific things:

  1. Clear formatting
  2. Try another theme as an experiment.

With the further clue that the diacritics are mangled in the View Source (in the HTML itself) we can eliminate the theme from our search. The theme does not have the ability to change the HTML.

But I stand firm on my first recommendation: Clear Formatting.

While there are some other long-shot things to consider (server configs, doctype, etc.), the most likely source of error is that you have copied and pasted this text from another app, and the way that app has encoded the diacritics is one that doesn’t translate well to the web.
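
One concrete way pasted text can differ from typed text is Unicode normalization: the same visible letter can be stored as one precomposed code point or as a base letter plus a combining accent. Whether that is what happened here is an assumption, not something established in this thread. A minimal Python sketch of the difference, with a hypothetical helper `to_nfc`:

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Normalize text to NFC, the precomposed form."""
    return unicodedata.normalize("NFC", text)


composed = "\u00e0"      # à as a single precomposed code point
decomposed = "a\u0300"   # a followed by COMBINING GRAVE ACCENT

# The two strings look the same on screen but are different sequences.
print(composed == decomposed)
print(len(composed), len(decomposed))

# After normalization they compare equal.
print(to_nfc(decomposed) == composed)
```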

Clear Formatting removes all extra styles/encoding/etc. and brings things back to their defaults. And as I showed with my published examples, the RapidWeaver (and Stacks) defaults display the Vietnamese diacritic character set just fine.

Select the text, choose Clear Formatting from the Format menu (Cmd-Opt-Period), and see if that improves things. If it does, we’re in business; if it doesn’t, we can move further down the list to the more esoteric things like DOCTYPE and server config.

Whatever the case, it would be good to report back specifically:

  1. if the clear formatting changed anything at all (as in, does the HTML now display correctly in View Source)
  2. if you typed in the characters within RapidWeaver or if you pasted the text in from elsewhere (with that info I can try to duplicate the experiment here and perhaps learn some more details in my debugger)
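
To make the first report-back item easier to check, the exported HTML could be scanned for character references before and after Clear Formatting. This is only a rough Python sketch: the regular expression and the function name `find_char_refs` are assumptions for illustration, and the sample string is invented rather than taken from the real page.

```python
import re

# Matches named (&agrave;), decimal (&#224;) and hexadecimal (&#xE0;) references.
CHAR_REF = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_char_refs(html_source: str) -> list:
    """Return every character reference found in the HTML source, in order."""
    return CHAR_REF.findall(html_source)


# Invented sample line; in practice, read the exported index.html instead.
sample = '<meta name="description" content="Xin ch&agrave;o c&#225;c b\u1ea1n">'
print(find_char_refs(sample))
```

An empty list after Clear Formatting would suggest the literal characters are now being written to the file.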

Isaiah