Question: Is str_replace() on multibyte strings dangerous?
Given certain multibyte character sets, am I correct in assuming that the following doesn't do what it was intended to do?
$string = str_replace('"', '\\"', $string);
In particular, suppose the input is in a character set where 0xbf5c is a valid character. An attacker can then inject 0xbf22, which the replacement turns into 0xbf5c22: a valid character followed by an unescaped double quote (").
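You can check the byte arithmetic directly. Below is a minimal sketch. It assumes Big5 as the example charset, where 0xbf is a lead byte and 0x5c is a valid trail byte, so 0xbf5c decodes as one character. The decoding step needs the mbstring extension. The name escape_quotes_bytewise is hypothetical.

```php
<?php
// Hypothetical helper reproducing the question's byte-oriented escaping.
function escape_quotes_bytewise(string $s): string {
    return str_replace('"', '\\"', $s);
}

// Attacker input: a Big5 lead byte (0xbf) followed directly by a double quote.
$input = "\xbf\x22";
$escaped = escape_quotes_bytewise($input);

// str_replace() works on bytes, so the backslash lands between 0xbf and 0x22.
echo bin2hex($escaped), "\n"; // bf5c22

// Read as Big5, 0xbf5c is a single character: the backslash has been absorbed
// into it, and the double quote that follows is unescaped again.
$decoded = mb_convert_encoding($escaped, 'UTF-8', 'BIG5');
echo mb_strlen($decoded, 'UTF-8'), "\n"; // 2
echo substr($decoded, -1), "\n";
```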
Is there an easy way to mitigate this problem, or am I misunderstanding the issue in the first place?
(In my case, the string is going into the value attribute of an HTML input tag: echo '<input type="text" value="' . $string . '">';)
就此而言,像preg_quote()这样的函数呢?它没有charset参数,因此在这种情况下似乎完全没有用.当您没有选择将字符集限制为UTF-8的选项时(是的,那很好),看来您真的很残障.在这种情况下可以使用哪些替换和报价功能?
For that matter, what about a function like preg_quote()? It has no charset argument, so it seems useless in this scenario. When you DON'T have the option of limiting the charset to UTF-8 (yes, that'd be nice), you seem really handicapped. What replace and quoting functions are available in that case?
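For the HTML-attribute case, htmlspecialchars() is one quoting function that does take a charset argument. HTML also does not treat a backslash as an escape character, so a `\"` inside value="..." would still close the attribute. Below is a minimal sketch. It assumes the page is served as Big5; that is only an example, and the charset must match the one the page declares. The helper name html_attr is hypothetical.

```php
<?php
// Hypothetical helper: escape a string for an HTML attribute value,
// telling PHP which charset the bytes are in ('BIG5' is an assumed example).
function html_attr(string $s, string $charset = 'BIG5'): string {
    // ENT_QUOTES escapes both " and '; ENT_SUBSTITUTE replaces invalid
    // byte sequences instead of returning an empty string.
    return htmlspecialchars($s, ENT_QUOTES | ENT_SUBSTITUTE, $charset);
}

// Valid Big5: the two-byte character 0xbf5c is copied through untouched
// and only the real double quote becomes &quot;
$ok = html_attr("\xbf\x5c\x22");

// Attacker input: the lone lead byte 0xbf is invalid and gets substituted,
// and the quote is still escaped, so no bare " reaches the attribute.
$attack = html_attr("\xbf\x22");

echo '<input type="text" value="' . $attack . '">', "\n";
```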
Answer
No, you're right: using a single-byte string function on a multibyte string can produce unexpected results. Use the multibyte string functions instead, for example mb_ereg_replace or mb_split:
$string = mb_ereg_replace('"', '\\"', $string);
$string = implode('\\"', mb_split('"', $string));
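mb_ereg_replace() and mb_split() read the pattern and subject in the encoding set by mb_regex_encoding(), not in whatever charset the input happens to use. That setting therefore has to match the input. The attack relies on a lead byte that is not followed by a valid trail byte, so rejecting input that fails mb_check_encoding() is a cheap extra guard. Below is a minimal sketch of both points. It uses a hypothetical helper name and again assumes Big5 as the input charset.

```php
<?php
// Hypothetical helper: escape double quotes in a string known to be in $charset.
// Returns null if the input is not valid in that charset (e.g. a lone lead byte).
function escape_quotes_mb(string $s, string $charset): ?string {
    // Reject malformed input first: "\xbf\x22" is not valid Big5.
    if (!mb_check_encoding($s, $charset)) {
        return null;
    }
    // mb_ereg_replace() uses mb_regex_encoding(), so switch it temporarily.
    $previous = mb_regex_encoding();
    mb_regex_encoding($charset);
    try {
        $out = mb_ereg_replace('"', '\\"', $s);
    } finally {
        mb_regex_encoding($previous); // restore the caller's setting
    }
    return is_string($out) ? $out : null;
}

var_dump(escape_quotes_mb("\xbf\x22", 'BIG5'));             // NULL
var_dump(bin2hex(escape_quotes_mb("\xbf\x5c\x22", 'BIG5')));
```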
Edit: Here's an mb_replace implementation using the split-join variant:
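function mb_replace($search, $replace, $subject, &$count = 0) {
    $count = 0; // like str_replace(), report only this call's replacements
    // an array $replace only makes sense together with an array $search
    if (!is_array($search) && is_array($replace)) {
        return false;
    }
    if (is_array($subject)) {
        // call mb_replace for each single string in $subject
        foreach ($subject as &$string) {
            $string = mb_replace($search, $replace, $string, $c);
            $count += $c;
        }
        unset($string); // drop the reference left over from foreach
    } elseif (is_array($search)) {
        if (!is_array($replace)) {
            // every search string gets the same replacement
            foreach ($search as $s) {
                $subject = mb_replace($s, $replace, $subject, $c);
                $count += $c;
            }
        } else {
            // pair the n-th search with the n-th replacement; as in str_replace(),
            // a missing replacement counts as '' and extra replacements are ignored
            $replace = array_values($replace);
            foreach (array_values($search) as $i => $s) {
                $subject = mb_replace($s, $replace[$i] ?? '', $subject, $c);
                $count += $c;
            }
        }
    } elseif ($search !== '') {
        // split on the search string and re-join with the replacement;
        // mb_split() reads $subject in the encoding set by mb_regex_encoding().
        // Caveat: preg_quote() is byte-oriented, so a multibyte $search whose
        // trail byte is a regex metacharacter (e.g. 0x5c) would be mangled.
        $parts = mb_split(preg_quote($search), $subject);
        $count = count($parts) - 1;
        $subject = implode($replace, $parts);
    }
    return $subject;
}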
As regards the combination of parameters, this function should behave like the single-byte str_replace().
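For reference, here is the str_replace() behavior that mb_replace aims to reproduce for each parameter combination. The sketch uses plain ASCII strings, where the byte-oriented function is safe, so the results are unambiguous.

```php
<?php
// Array search, string replace: every search string maps to the same replacement.
$a = str_replace(['a', 'b'], 'x', 'abc', $n1);   // "xxc", $n1 === 2

// Array search, array replace: pairs are matched by position; a missing
// replacement is treated as an empty string.
$b = str_replace(['a', 'b'], ['1'], 'abc', $n2); // "1c", $n2 === 2

// Array subject: each element is processed and the count is the total.
$c = str_replace('a', 'x', ['ab', 'ca'], $n3);   // ["xb", "cx"], $n3 === 2

var_dump($a, $b, $c, $n1 + $n2 + $n3);
```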