string.casefold和string.lower 区别

python 3.3 引入了string.casefold 方法,其效果和 string.lower 非常类似,都可以把字符串变成小写,那么它们之间有什么区别?他们各自的应用场景?

对 Unicode 的时候用 casefold

string.casefold官方说明:

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter 'ß' is equivalent to "ss". Since it is already lowercase, lower() would do nothing to 'ß'casefold()converts it to "ss".

The casefolding algorithm is described in section 3.13 of the Unicode Standard

lower() 只对 ASCII 也就是 'A-Z'有效,但是其它一些语言里面存在小写的情况就没办法了。文档里面举得例子是德语中'ß'的小写是'ss'

s = 'ß'
s.lower() # 'ß'
s.casefold() # 'ss'

string.lower官方说明:

Return a copy of the string with all the cased characters [4] converted to lowercase.

The lowercasing algorithm used is described in section 3.13 of the Unicode Standard

参考

https://docs.python.org/3/library/stdtypes.html#str.casefold

https://segmentfault.com/q/1010000004586740/a-1020000004586838

总结

汉语 & 英语环境下面,继续用 lower()没问题;要处理其它语言且存在大小写情况的时候再用casefold()

04-14 02:40