Removing control characters from a UTF-8 string


Problem description

I am removing control characters (tab, CR, LF, \v and all other invisible characters) on the client side (after input), but since the client cannot be trusted, I have to remove them on the server too.

According to this link, the control characters run from \x00 to \x1F and from \x7F to \x9F. Thus my client-side (JavaScript) control character removal function is:

return s.replace(/[\x00-\x1F\x7F-\x9F]/g, "");

And my PHP (server-side) control character removal function is:

$s = preg_replace('/[\x00-\x1F\x7F-\x9F]/', '', $s);

Now this seems to create problems with international UTF-8 characters such as ς (\xCF \x82), but only in PHP (because \x82 falls inside the second range). The JavaScript equivalent does not cause any problems.
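The difference can be reproduced in JavaScript alone. The sketch below is an illustration, not PHP's actual implementation: it assumes that a PHP pattern without the `u` modifier is matched byte by byte against the UTF-8 encoding, while the JavaScript regex is matched against UTF-16 code units. The helper names `utf8Hex`, `stripByCodeUnit` and `stripByByte` are hypothetical.

```javascript
// TextEncoder is a global in browsers and in current Node.js versions.
const encoder = new TextEncoder();

// UTF-8 bytes of a string as uppercase hex pairs.
function utf8Hex(s) {
  return Array.from(encoder.encode(s), (b) =>
    b.toString(16).toUpperCase().padStart(2, "0")
  );
}

// Strip control characters per UTF-16 code unit,
// exactly as the JavaScript regex in the question does.
function stripByCodeUnit(s) {
  return s.replace(/[\x00-\x1F\x7F-\x9F]/g, "");
}

// Strip the same ranges byte by byte from the UTF-8 encoding
// (a simulation of a byte-oriented regex, assumed here for illustration).
function stripByByte(s) {
  const kept = encoder
    .encode(s)
    .filter((b) => !(b <= 0x1f || (b >= 0x7f && b <= 0x9f)));
  return Array.from(kept);
}

const sigma = "\u03C2"; // ς

console.log(utf8Hex(sigma));                   // the two bytes CF and 82
console.log(stripByCodeUnit(sigma) === sigma); // true: U+03C2 is outside both ranges
console.log(stripByByte(sigma));               // only byte 0xCF (207) survives
```

The byte-wise version drops \x82 and leaves a lone \xCF, which is no longer valid UTF-8. On the PHP side, adding the `u` pattern modifier so that the pattern and subject are treated as UTF-8 is the usual way to make the character class match code points instead of bytes.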

Now my question is: should I remove the control characters from 7F to 9F? As I understand it, bytes from 127 to 159 (7F to 9F) can obviously be part of a valid UTF-8 string?

Also, maybe I shouldn't even filter the 00 to 31 control characters, because some of those bytes might also appear in some unusual (Japanese? Chinese?) but valid UTF-8 characters?
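Both worries can be checked empirically. In UTF-8, every byte of a multi-byte sequence is 0x80 or higher (lead bytes are 0xC2 to 0xF4, continuation bytes 0x80 to 0xBF), so bytes 0x00 to 0x1F only ever appear as the control characters themselves, whereas 0x80 to 0x9F overlaps the continuation-byte range. The following is a minimal sketch; `bytesInRange` is a hypothetical helper and the sample strings are arbitrary.

```javascript
// TextEncoder is a global in browsers and in current Node.js versions.
const encoder = new TextEncoder();

// Return the UTF-8 bytes of `s` that fall within [lo, hi], inclusive.
function bytesInRange(s, lo, hi) {
  return Array.from(encoder.encode(s)).filter((b) => b >= lo && b <= hi);
}

const cjkAndGreek = "\u65E5\u672C\u8A9E \u03C2 \u4E2D\u6587"; // 日本語 ς 中文

// No byte below 0x20 appears in the encoding of non-ASCII text.
console.log(bytesInRange(cjkAndGreek, 0x00, 0x1f).length); // 0

// A real tab is the only way to get byte 0x09.
console.log(bytesInRange("a\tb", 0x00, 0x1f)); // one byte: 9

// The second byte of ς (0x82 = 130) lands inside 0x7F-0x9F.
console.log(bytesInRange("\u03C2", 0x7f, 0x9f)); // one byte: 130
```

So stripping 0x00 to 0x1F (and 0x7F) is safe even byte by byte, while the 0x80 to 0x9F part of the range must be matched against code points rather than raw bytes.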


Recommended answer