Question
We need to generate a unique URL from the title of a book, where the title can contain any character. How can we search and replace all the "invalid" characters so that a valid, neat-looking URL is generated?
For example:
"The Great Book of PHP"
www.mysite.com/book/12345/the-great-book-of-php
"The Greatest !@#$ Book of PHP"
www.mysite.com/book/12345/the-greatest-book-of-php
"Funny title "
www.mysite.com/book/12345/funny-title
Recommended answer
Ah, slugification.
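// This function expects the input to be UTF-8 encoded.
function slugify($text)
{
    // Replace every run of non-letters/non-digits (Unicode-aware) with a single -
    $text = preg_replace('/[^\\pL\d]+/u', '-', $text);
    // Trim leading and trailing -'s
    $text = trim($text, '-');
    // Transliterate the remaining letters to their closest ASCII representation.
    // With glibc, this only works when LC_CTYPE is a UTF-8 locale, e.g.
    // setlocale(LC_CTYPE, 'en_US.UTF-8'); otherwise accented letters become '?'.
    $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    // Make text lowercase
    $text = strtolower($text);
    // Strip out anything we haven't been able to convert (e.g. '?' or quote marks)
    $text = preg_replace('/[^-\w]+/', '', $text);
    // Stripping can leave doubled or leading/trailing -'s, so clean them up again
    $text = trim(preg_replace('/-+/', '-', $text), '-');
    return $text;
}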
This works fairly well. It first uses the Unicode properties of each character to decide whether it is a letter (\pL) or a digit (\d), and converts every run of characters that are neither into a single -. It then transliterates the text to ASCII, lowercases it, does another replacement to remove anything that could not be converted, and finally cleans up after itself. (Fabrik's test returns "arvizturo-tukorfurogep".)
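The first two steps are the ones that rely on Unicode properties, and they can be checked in isolation because, unlike the iconv step, they do not depend on the system locale. A minimal sketch, using a hypothetical helper name dashify() that is not part of the answer's code:

```php
<?php
// Hypothetical helper reproducing steps 1-2 of slugify():
// every run of characters that are not Unicode letters (\pL) or digits (\d)
// becomes a single '-', then leading/trailing '-' are trimmed.
// Expects UTF-8 input (the /u modifier makes PCRE treat the pattern and subject as UTF-8).
function dashify(string $text): string
{
    return trim(preg_replace('/[^\pL\d]+/u', '-', $text), '-');
}

echo dashify('The Greatest !@#$ Book of PHP'), "\n"; // The-Greatest-Book-of-PHP
echo dashify('Funny title '), "\n";                  // Funny-title
echo dashify('Árvíztűrő tükörfúrógép'), "\n";        // Árvíztűrő-tükörfúrógép
```

Accented letters pass through this stage unchanged; only the later iconv(..., 'us-ascii//TRANSLIT', ...) call turns them into plain ASCII.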
I also tend to add a list of stop words, such as "the", "of", "or" and "a", so that those are removed from the slug. Use an explicit list rather than filtering by word length, or you will strip out words like "php".
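A minimal sketch of the stop-word filtering, assuming it runs on a slug that slugify() has already lowercased and dash-separated. The function name removeStopWords() and its default word list are hypothetical:

```php
<?php
// Hypothetical helper: remove stop words from an already-generated slug.
// Whole dash-separated words are compared against an explicit list
// (not filtered by length), so short but meaningful words like "php" survive.
function removeStopWords(string $slug, array $stopWords = ['the', 'of', 'or', 'a']): string
{
    $words = explode('-', $slug);
    // Keep only the words that are not in the stop-word list (strict comparison).
    $kept = array_filter($words, fn (string $w): bool => !in_array($w, $stopWords, true));
    // If every word was a stop word, fall back to the original slug
    // rather than returning an empty string.
    return $kept ? implode('-', $kept) : $slug;
}

echo removeStopWords('the-great-book-of-php'), "\n"; // great-book-php
echo removeStopWords('the-of'), "\n";                // the-of
```

Note that this changes the URL shape from the question: "the-great-book-of-php" becomes "great-book-php". The arrow function requires PHP 7.4 or later.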