I have a table in a PostgreSQL 9.1 database with locale "en_US.UTF-8":
CREATE TABLE branch_language (
    id serial NOT NULL,
    name_language character varying(128) NOT NULL,
    branch_id integer NOT NULL,
    language_id integer NOT NULL
);
The attribute name_language contains names in various languages. The language is specified by the foreign key language_id.
/* us english */
CREATE INDEX idx_branch_language_2
ON branch_language
USING btree
(name_language COLLATE pg_catalog."en_US" );
/* catalan */
CREATE INDEX idx_branch_language_5
ON branch_language
USING btree
(name_language COLLATE pg_catalog."ca_ES" );
/* portuguese */
CREATE INDEX idx_branch_language_6
ON branch_language
USING btree
(name_language COLLATE pg_catalog."pt_PT" );
Now when I do a select I am not getting the results I am expecting.
select name_language from branch_language
where language_id=42 -- id of catalan language
order by name_language collate "ca_ES" -- use ca_ES collation
This generates a list of names but not in the order I expected:
Aficions i Joguines
Agència de viatges
Aliments i Subministraments
Aparells elèctrics i il·luminació
Art i Antiguitats
Articles de la llar
Bars i Restaurants
Àudio, Vídeo, CD i DVD
I expected the last two entries to appear in different positions in the list.
Creating the indexes works. I don't think they are really necessary unless you want to optimize for performance.
The SELECT statement, however, seems to ignore the collate "ca_ES" part.
This problem also exists when I select other collations. I have tried "es_ES" and "pt_PT" but the results are similar.
I can't find a flaw in your design. I have tried.
I revisited this question. Consider this test case on sqlfiddle. It seems to work just fine. I even created the locale ca_ES.utf8 on my local test server (PostgreSQL 9.1.6 on Debian Squeeze) and added the locale to my DB cluster (in 9.1 that is done with CREATE COLLATION).
I get the same results as can be seen in the sqlfiddle above.
Note that collation names are identifiers and need to be double-quoted to preserve mixed-case spelling like "ca_ES". Maybe there has been some confusion with other locales in your system? Check your available collations:
SELECT * FROM pg_collation;
Generally, collation rules are derived from system locales. Read about the details in the manual. If you still get incorrect results, I would try to update your system and regenerate the locale for "ca_ES". In Debian (and related Linux distributions) this can be done with:
dpkg-reconfigure locales
Could it be that your 'Àudio' is in fact '̀ ' || 'Audio'? That would be this character:
SELECT U&'\0300A';
SELECT ascii(U&'\0300A');
SELECT chr(768);
Read more about the combining grave accent (U+0300) on Wikipedia.
You have to SET standard_conforming_strings = TRUE to use Unicode strings like in the first line.
Note that some browsers cannot display unnormalized Unicode characters correctly and many fonts have no proper glyph for the special characters, so you may see nothing here or gibberish. But UNICODE allows for that nonsense. Test to see what you got:
SELECT octet_length('̀A') -- returns 3 (!)
SELECT octet_length('À') -- returns 2
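The same check can be made outside the database, for instance on a value copied out of the table. A minimal sketch in Python, standard library only; the helper name describe is made up for this example, and the two sample strings are assumptions standing in for your real data:

```python
import unicodedata

def describe(text):
    """Return (UTF-8 byte length, [(code point, Unicode name), ...]) for text."""
    points = [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]
    return len(text.encode("utf-8")), points

precomposed = "\u00C0"    # a single precomposed code point
mark_then_a = "\u0300A"   # combining mark followed by A, like U&'\0300A'

print(describe(precomposed))
print(describe(mark_then_a))
```

If the byte length is larger than expected and a COMBINING code point shows up in the list, the string is not in composed form.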
If that's what your database has contracted, you need to get rid of it or suffer the consequences. The cure is to normalize your strings to NFC. Perl has superior UNICODE-foo skills, and you can make use of its libraries in a plperlu function to do it in PostgreSQL. I have done that to save me from madness.
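To illustrate what NFC normalization does, here is a minimal sketch in Python rather than Perl, using only the standard unicodedata module. It is not the plperlu function mentioned above; the helper name to_nfc and the sample strings are assumptions for the example:

```python
import unicodedata

def to_nfc(text):
    """Normalize text to Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)

decomposed = "A\u0300udio"     # base letter followed by combining grave accent
leading_mark = "\u0300Audio"   # combining mark placed before the letter

# Base letter + mark is composed into one code point: 6 code points become 5.
print(len(decomposed), len(to_nfc(decomposed)))
print(to_nfc(decomposed) == "\u00C0udio")

# A mark with no base letter in front of it is left unchanged.
print(to_nfc(leading_mark) == leading_mark)
```

NFC only composes a combining mark with the base letter that precedes it. A stray mark in front of the letter survives normalization and has to be removed or repaired separately.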
