


The situation with Thai text on a client site is that we can't control exactly where particular words or sentences will break across lines (that is, how the web browser will wrap them). Local reviewers often flag the resulting layout as incorrect.


Research so far has led to the workaround mentioned, and I'm looking for a better way to handle this. Even the W3C doesn't have a solution yet and is still discussing whether it should be part of the CSS3 specification.


Thai uses spaces very rarely, mostly to separate sentences and the like. A typical Thai sentence therefore looks like one long string. Where to break such a string when it wraps across several lines is determined by identifying the individual words. Word identification relies on local dictionaries, which are most probably part of the operating system or the web browser; I'm not entirely sure about this.


Apparently, the more web browsers and operating systems you test, the more different results you get! Moreover, there's not much you can do about it, because it's system driven and there are no "where to break Thai" settings available.

Using <wbr/>, &#8203; or &shy; to indicate where the breakpoints really are won't stop the web browser from thinking (wrongly) that breaks are also possible in places where you haven't defined them, e.g. in the middle of a word, which might be grammatically incorrect.

If such a word lands at the end of a line (which depends on screen resolution, copy length and the CSS rules in effect) and the browser applies its wrong line-breaking rule to it, you end up with a Thai line breaking issue, no matter whether you have defined other breakpoints before, after or elsewhere in the word. The browser will always use the breakpoint it thinks is closest to EOL, not just the ones you have gently suggested by inserting one of the characters mentioned above into your markup.
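
To make that reasoning concrete, here is a toy model in Python. It is only a sketch under loud assumptions: real browsers are far more involved, every character is treated as one column wide, Latin letters stand in for a Thai run, and the name first_line_end is made up for this illustration. It shows why an author hint does not win once the browser adds its own guesses.

```python
def first_line_end(text, break_opportunities, width):
    """Index at which a greedy wrapper ends the first line.

    Toy model (an assumption, not real browser behaviour): every character
    is one column wide, and the wrapper takes the last break opportunity
    that still fits within `width`.
    """
    if len(text) <= width:
        return len(text)
    fitting = [i for i in break_opportunities if 0 < i <= width]
    # With no usable opportunity, fall back to a hard break at the last column.
    return max(fitting) if fitting else width


text = "AAAABBBBBBCC"         # stand-in for a Thai run: words AAAA, BBBBBB, CC
author_hints = {4, 10}        # breaks the author marked, e.g. with &#8203;
browser_guesses = {4, 7, 10}  # the browser also "sees" a break inside BBBBBB

# The browser considers the union of both sets, so the hint at 4 does not win.
end = first_line_end(text, author_hints | browser_guesses, width=8)
print(text[:end], "|", text[end:])
```

With the author hints alone the first line would end at index 4, after AAAA. With the browser's extra guess it ends at index 7, in the middle of BBBBBB, because that opportunity is closer to the end of the line.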


That's why you actually need to focus on where not to break your text (non-breaking zero-width space), not on where breaking is allowed. And that leads us back to the ugly and long markup example in the "Workaround" section above. That way a line break can only occur where you have allowed it, but it's messy.
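
For what it's worth, that kind of markup doesn't have to be written by hand. The Python sketch below generates it from a list of already-segmented words. Several things here are assumptions: I use U+2060 WORD JOINER as the non-breaking zero-width character (the workaround markup may well use a different one), the function names are hypothetical, the word list has to come from a human or a segmenter, and I make no claim about how any particular browser honours these characters.

```python
import unicodedata

WORD_JOINER = "\u2060"       # zero-width, asks for no line break here
ZERO_WIDTH_SPACE = "\u200b"  # zero-width, allows a line break here


def glue_word(word):
    """Put a no-break character between the characters of one word."""
    out = []
    for ch in word:
        # Never separate a combining mark (tone mark, upper or lower vowel)
        # from its base character; that could corrupt rendering.
        if out and not unicodedata.category(ch).startswith("M"):
            out.append(WORD_JOINER)
        out.append(ch)
    return "".join(out)


def protect_breaks(words):
    """Join glued words so that breaks are allowed only between words."""
    return ZERO_WIDTH_SPACE.join(glue_word(w) for w in words)


# Hypothetical usage: the caller supplies the correct word boundaries.
protected = protect_breaks(["ตา", "กลม"])
print(ascii(protected))
```

The output is still long and ugly, as said above, but at least it is produced mechanically and can be regenerated whenever the copy changes.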


Any other solution for handling this more effectively would be appreciated ... and who knows, it might even help the W3C with their implementation?




I know this thread is quite old, but I have something to say as a native Thai. I read lots of Thai web pages every day, and I feel the quality of Thai line breaking in modern web browsers is perfectly acceptable.

As far as I know, Google Chrome uses ICU4C, Internet Explorer uses the Uniscribe API, and Firefox uses libthai to break Thai sentences into words. For the Thai people I know, the way these web browsers handle line breaks in Thai is perfectly acceptable. (Actually, we used to have this problem with very early versions of Firefox (1.x), but that has been resolved.)

Thai line breaking and word breaking, unlike in Western languages, is still considered an unsolved problem and is actively being tackled by many linguistics researchers. Currently no implementation can break a sentence into Thai words perfectly. The IBM ICU Boundary Analysis page contains some analysis of this problem.


Often it depends on context. For example, the phrase "ตากลม" can be correctly broken into "ตา", "กลม" or "ตาก", "ลม". Each reading means something totally different, but Thai readers can still perfectly understand the intended meaning, given the context.
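
A small Python sketch shows why a dictionary alone cannot settle this. It is not how ICU, Uniscribe or libthai actually work; the four-word dictionary and the function name segmentations are made up for the illustration. It simply lists every way to cut the phrase into dictionary words.

```python
def segmentations(text, dictionary):
    """Return every way to split `text` into words found in `dictionary`."""
    if not text:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in dictionary:
            # Keep this word and split whatever remains in every possible way.
            for rest in segmentations(text[end:], dictionary):
                results.append([prefix] + rest)
    return results


# Toy dictionary (an assumption) holding only the four words from the example.
TOY_DICTIONARY = {"ตา", "กลม", "ตาก", "ลม"}

for words in segmentations("ตากลม", TOY_DICTIONARY):
    print(" | ".join(words))
```

Both splits come out as valid, so a word breaker has to pick one by some heuristic or by context, and it can pick the wrong one.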

Given that your local reviewers are already used to reading Thai websites, I think they may be pushing you too hard to resolve this problem. This is a common, unsolvable problem for all Thai websites, web browsers, and even Microsoft Word.


It is best to wait (or contribute to IBM ICU) until Thai sentence-breaking implementations get better. Let the web browsers handle this. I don't think working around this problem is worth your valuable time. As far as I know, even Thai website publishers here just don't bother to get this one right.

Should you need to publish a document with perfect line/word breaking, you may consider another medium, such as a PDF document, in which you should have more control over the line breaks.



