sort ignores apostrophes, sometimes (except when it is the only column used); why?

Question

This happens to me both on Linux and on cygwin, so I suspect it is not a bug. Still, I don't understand it. Can anyone explain?

Consider the following file (tab-delimited, and that's a regular apostrophe). I created it with cat to rule out non-printing characters as the source of the problem.

$cat > temp
cat     1389
cat'    1747
ca't    3175
cat     46848484
ca't    720

$sort temp
<gives the exact same output as cat temp>

$sort -k1,1 temp
cat     1389
cat     46848484
cat'    1747
ca't    3175
ca't    720

Why do I have to ignore the second column in order to sort correctly?

Recommended answer

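I pulled up the manual for sort and noticed the following:

*** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values.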

As it turns out, locales actually specify how lexicographic ordering works for a given locale. This makes a lot of sense, but for some reason it trips over multi-field files...

(see also:)
Unusual behaviour of linux's sort command
Why does the sort command sort differently if there are trailing fields?
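
One way to see why the whole-line sort leaves the file unchanged is to emulate a comparison that skips apostrophes and tabs. This is only a sketch under an assumption the post does not confirm: that the active locale treats punctuation and whitespace as ignorable on its first comparison pass. The temporary file and the variable names are made up for illustration.

```shell
# Recreate the sample file with real tab characters (printf expands \t).
sample=$(mktemp)
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > "$sample"

# ASSUMPTION: emulate a first collation pass that ignores apostrophes and
# tabs by deleting them, then compare what is left byte by byte.
stripped=$(tr -d "'\t" < "$sample")
stripped_sorted=$(printf '%s\n' "$stripped" | LC_ALL=C sort)

printf '%s\n' "$stripped_sorted"
rm -f "$sample"
```

With the apostrophes and tabs removed, the lines read cat1389, cat1747, cat3175, cat46848484 and cat720, which is already in order. If the locale really does compare this way, that would explain why sort temp returns the input unchanged, while -k1,1 only ever compares cat, cat' and ca't.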

There are a few things you can do:

You can use

LC_ALL="C" sort temp

This will give a more logical result, but it might not be the one you actually want.
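
As a rough illustration of what the C locale does with the sample data, the sketch below rebuilds the file with printf (so the tabs are guaranteed to be real) and sorts it by raw byte values. The temporary file and the variable name c_sorted are hypothetical.

```shell
# Recreate the sample file with real tab characters (printf expands \t).
sample=$(mktemp)
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > "$sample"

# In the C locale, sort compares lines byte by byte.
c_sorted=$(LC_ALL=C sort "$sample")

printf '%s\n' "$c_sorted"
rm -f "$sample"
```

In byte order the apostrophe (0x27) sorts before every letter, so both ca't lines come first; the tab (0x09) sorts before the apostrophe, so the cat lines come before cat'. That is consistent, but probably not the order a human reader expects.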

You could try to get sort to do a more basic lexicographical ordering by setting the locale to C and telling it you want dictionary ordering:

LC_ALL="C" sort -d temp
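
To make the effect of -d concrete, here is a small sketch run against the same sample data. It assumes a sort implementation where -d considers only blanks and alphanumeric characters, as documented for GNU sort; the temporary file and variable names are illustrative only.

```shell
# Recreate the sample file with real tab characters (printf expands \t).
sample=$(mktemp)
printf "cat\t1389\ncat'\t1747\nca't\t3175\ncat\t46848484\nca't\t720\n" > "$sample"
original=$(cat "$sample")

# -d (dictionary order): only blanks and alphanumerics take part in the
# comparison, so the apostrophes are skipped even in the C locale.
dict_sorted=$(LC_ALL=C sort -d "$sample")

printf '%s\n' "$dict_sorted"
rm -f "$sample"
```

Because the apostrophes are skipped, every line compares as cat, a tab, and the number, so under these assumptions the output should come out in the same order as the original file. In other words, -d reproduces the "ignored apostrophe" behaviour rather than removing it.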

To have sort print your locale information and highlight the sort key, you can use

sort --debug temp




Personally I'm really curious to know what rule is being specified that makes sort behave unintuitively across multiple fields.

They're supposed to specify correct lexicographic order in the given language and dialect. Do the locales' functions simply not handle the multiple field case at all, or are they taking some kind of different interpretation on the "meaning" of the line?
