Problem description

I'm enhancing our video search page to highlight the search term(s) in the results. Because a user can enter judas priest while a video has Judas Priest in its text, I have to use regular expressions to preserve the case of the original text.

My code works, but I have problems with special characters like š, č and ž: it seems that Preg_Replace() will only match if the case is the same (despite the /ui modifier). My code:

$Content = Preg_Replace ( '/\b(' . $term . '?)\b/iu', '<span class="HighlightTerm">$1</span>', $Content );

I also tried:

$Content = Mb_Eregi_Replace ( '\b(' . $term . '?)\b', '<span class="HighlightTerm">\\1</span>', $Content );

But that doesn't work either. It will match "SREČA" if the search term is "SREČA", but if the search term is "sreča" it will not match (and vice versa).

So how should I do this?

Update: I set the locale and internal encoding:

Mb_Internal_Encoding ( 'UTF-8' );
$loc = "UTF-8";
putenv("LANG=$loc");
$loc = setlocale(LC_ALL, $loc);

Recommended answer

I feel really stupid right about now, but the problem wasn't with the Preg_* functions at all. I don't know why, but I first checked whether the given term was even in the string with StriPos, and since that function is not multi-byte safe, it returned false whenever the case of the text differed from the search term, so Preg_Replace was never even called.
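To illustrate the diagnosis, here is a minimal sketch comparing the byte-oriented and multi-byte lookups. The sample text and variable names are assumptions made for illustration, not taken from the original code, and the script assumes the mbstring extension is loaded and the file is saved as UTF-8.

```php
<?php
// Illustrative values only; the original post does not show its input data.
$text = 'Pesem SREČA';
$term = 'sreča';

// stripos() compares bytes. In UTF-8, Č and č are different two-byte
// sequences, so the case-insensitive comparison does not treat them as equal.
$byteResult = stripos($text, $term);

// mb_stripos() compares characters; the encoding is passed explicitly
// instead of relying on the configured internal encoding.
$mbResult = mb_stripos($text, $term, 0, 'UTF-8');

var_dump($byteResult);
var_dump($mbResult);
```

If a guard like the first call sits in front of the replacement, the highlighting step is skipped for such terms regardless of how the regular expression is written.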

So the lesson to be learned here: if you have UTF-8 strings, always use the multi-byte versions of functions.
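As a rough sketch of how that lesson could be applied to the highlighting code, the function below uses a multi-byte pre-check before the replacement. The name highlightTerm is hypothetical, and two details differ from the code in the question: the term is escaped with preg_quote(), and the trailing `?` is left out. Both are assumptions about what was intended, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper, not the original author's function.
function highlightTerm(string $content, string $term): string
{
    // Multi-byte, case-insensitive pre-check (replaces the stripos() guard).
    if (mb_stripos($content, $term, 0, 'UTF-8') === false) {
        return $content;
    }

    // Escape regex metacharacters in the user-supplied term.
    $pattern = '/\b(' . preg_quote($term, '/') . ')\b/iu';

    // preg_replace() returns null on failure (for example, invalid UTF-8),
    // in which case the content is returned unchanged.
    return preg_replace($pattern, '<span class="HighlightTerm">$1</span>', $content) ?? $content;
}

echo highlightTerm('Pesem SREČA', 'sreča'), "\n";
```

The `$1` backreference inserts the matched text as it appears in the content, so the original casing is preserved in the highlighted output.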
