问题描述
我有一个字符串:
$string = 'Five People';
我想将所有 number 个单词替换为数字.结果是:
I want to replace all number-words into numbers. So results are:
$string = '5 People';
我具有此功能,可以将单个单词转换为int:
I have this function to convert single words to int:
function words_to_number($data) {
$data = strtr(
$data,
array(
'zero' => '0',
'a' => '1',
'one' => '1',
'two' => '2',
'three' => '3',
'four' => '4',
'five' => '5',
'six' => '6',
'seven' => '7',
'eight' => '8',
'nine' => '9',
'ten' => '10',
'eleven' => '11',
'twelve' => '12',
'thirteen' => '13',
'fourteen' => '14',
'fifteen' => '15',
'sixteen' => '16',
'seventeen' => '17',
'eighteen' => '18',
'nineteen' => '19',
'twenty' => '20',
'thirty' => '30',
'forty' => '40',
'fourty' => '40', // common misspelling
'fifty' => '50',
'sixty' => '60',
'seventy' => '70',
'eighty' => '80',
'ninety' => '90',
'hundred' => '100',
'thousand' => '1000',
'million' => '1000000',
'billion' => '1000000000',
'and' => '',
)
);
// Coerce all tokens to numbers
$parts = array_map(
function ($val) {
return floatval($val);
},
preg_split('/[\s-]+/', $data)
);
$stack = new SplStack; // Current work stack
$sum = 0; // Running total
$last = null;
foreach ($parts as $part) {
if (!$stack->isEmpty()) {
// We're part way through a phrase
if ($stack->top() > $part) {
// Decreasing step, e.g. from hundreds to ones
if ($last >= 1000) {
// If we drop from more than 1000 then we've finished the phrase
$sum += $stack->pop();
// This is the first element of a new phrase
$stack->push($part);
} else {
// Drop down from less than 1000, just addition
// e.g. "seventy one" -> "70 1" -> "70 + 1"
$stack->push($stack->pop() + $part);
}
} else {
// Increasing step, e.g ones to hundreds
$stack->push($stack->pop() * $part);
}
} else {
// This is the first element of a new phrase
$stack->push($part);
}
// Store the last processed part
$last = $part;
}
return $sum + $stack->pop();
}
// test
$words = 'five';
echo words_to_number($words);
效果很好(尝试 ideone ).我需要找到一种方法来确定字符串中的哪些单词是单词数字,然后替换所有这些匹配的单词并将其转换为数字.
Works great (try it ideone). I need to find a way to determine which words within a string is a word-number and then do a replace of all these matching words and convert them into numbers.
这怎么办?也许是正则表达式方法?
How can this be done? Maybe a regex approach?
推荐答案
我尝试移植 text2num
Python库到PHP,将其与用于匹配英语拼写数字的正则表达式,增强它达到了十亿分之一,结果如下:
I have tried to port a text2num
Python library to PHP, mix it with a regex for matching English spelled out numbers, enhanced it to the decillion, and here is a result:
function text2num($s) {
// Enhanced the regex at http://www.rexegg.com/regex-trick-numbers-in-english.html#english-number-regex
$reg = <<<REGEX
(?x) # free-spacing mode
(?(DEFINE)
# Within this DEFINE block, we'll define many subroutines
# They build on each other like lego until we can define
# a "big number"
(?<one_to_9>
# The basic regex:
# one|two|three|four|five|six|seven|eight|nine
# We'll use an optimized version:
# Option 1: four|eight|(?:fiv|(?:ni|o)n)e|t(?:wo|hree)|
# s(?:ix|even)
# Option 2:
(?:f(?:ive|our)|s(?:even|ix)|t(?:hree|wo)|(?:ni|o)ne|eight)
) # end one_to_9 definition
(?<ten_to_19>
# The basic regex:
# ten|eleven|twelve|thirteen|fourteen|fifteen|sixteen|seventeen|
# eighteen|nineteen
# We'll use an optimized version:
# Option 1: twelve|(?:(?:elev|t)e|(?:fif|eigh|nine|(?:thi|fou)r|
# s(?:ix|even))tee)n
# Option 2:
(?:(?:(?:s(?:even|ix)|f(?:our|if)|nine)te|e(?:ighte|lev))en|
t(?:(?:hirte)?en|welve))
) # end ten_to_19 definition
(?<two_digit_prefix>
# The basic regex:
# twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety
# We'll use an optimized version:
# Option 1: (?:fif|six|eigh|nine|(?:tw|sev)en|(?:thi|fo)r)ty
# Option 2:
(?:s(?:even|ix)|t(?:hir|wen)|f(?:if|or)|eigh|nine)ty
) # end two_digit_prefix definition
(?<one_to_99>
(?&two_digit_prefix)(?:[- ](?&one_to_9))?|(?&ten_to_19)|
(?&one_to_9)
) # end one_to_99 definition
(?<one_to_999>
(?&one_to_9)[ ]hundred(?:[ ](?:and[ ])?(?&one_to_99))?|
(?&one_to_99)
) # end one_to_999 definition
(?<one_to_999_999>
(?&one_to_999)[ ]thousand(?:[ ](?&one_to_999))?|
(?&one_to_999)
) # end one_to_999_999 definition
(?<one_to_999_999_999>
(?&one_to_999)[ ]million(?:[ ](?&one_to_999_999))?|
(?&one_to_999_999)
) # end one_to_999_999_999 definition
(?<one_to_999_999_999_999>
(?&one_to_999)[ ]billion(?:[ ](?&one_to_999_999_999))?|
(?&one_to_999_999_999)
) # end one_to_999_999_999_999 definition
(?<one_to_999_999_999_999_999>
(?&one_to_999)[ ]trillion(?:[ ](?&one_to_999_999_999_999))?|
(?&one_to_999_999_999_999)
) # end one_to_999_999_999_999_999 definition
# ==== MORE ====
(?<one_to_quadrillion>
(?&one_to_999)[ ]quadrillion(?:[ ](?&one_to_999_999_999_999_999))?|
(?&one_to_999_999_999_999_999)
) # end one_to_quadrillion definition
(?<one_to_quintillion>
(?&one_to_999)[ ]quintillion(?:[ ](?&one_to_quadrillion))?|
(?&one_to_quadrillion)
) # end one_to_quintillion definition
(?<one_to_sextillion>
(?&one_to_999)[ ]sextillion(?:[ ](?&one_to_quintillion))?|
(?&one_to_quintillion)
) # end one_to_sextillion definition
(?<one_to_septillion>
(?&one_to_999)[ ]septillion(?:[ ](?&one_to_sextillion))?|
(?&one_to_sextillion)
) # end one_to_septillion definition
(?<one_to_octillion>
(?&one_to_999)[ ]octillion(?:[ ](?&one_to_septillion))?|
(?&one_to_septillion)
) # end one_to_octillion definition
(?<one_to_nonillion>
(?&one_to_999)[ ]nonillion(?:[ ](?&one_to_octillion))?|
(?&one_to_octillion)
) # end one_to_nonillion definition
(?<one_to_decillion>
(?&one_to_999)[ ]decillion(?:[ ](?&one_to_nonillion))?|
(?&one_to_nonillion)
) # end one_to_decillion definition
(?<bignumber>
zero|(?&one_to_decillion)
) # end bignumber definition
(?<zero_to_9>
(?&one_to_9)|zero
) # end zero to 9 definition
# (?<decimals>
# point(?:[ ](?&zero_to_9))+
# ) # end decimals definition
) # End DEFINE
####### The Regex Matching Starts Here ########
\b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b
REGEX;
return preg_replace_callback('~' . trim($reg) . '~i', function ($x) {
return text2num_internal($x[0]);
}, $s);
}
function text2num_internal($s) {
// Port of https://github.com/ghewgill/text2num/blob/master/text2num.py
$Small = [
'zero'=> 0,
'one'=> 1,
'two'=> 2,
'three'=> 3,
'four'=> 4,
'five'=> 5,
'six'=> 6,
'seven'=> 7,
'eight'=> 8,
'nine'=> 9,
'ten'=> 10,
'eleven'=> 11,
'twelve'=> 12,
'thirteen'=> 13,
'fourteen'=> 14,
'fifteen'=> 15,
'sixteen'=> 16,
'seventeen'=> 17,
'eighteen'=> 18,
'nineteen'=> 19,
'twenty'=> 20,
'thirty'=> 30,
'forty'=> 40,
'fifty'=> 50,
'sixty'=> 60,
'seventy'=> 70,
'eighty'=> 80,
'ninety'=> 90
];
$Magnitude = [
'thousand'=> 1000,
'million'=> 1000000,
'billion'=> 1000000000,
'trillion'=> 1000000000000,
'quadrillion'=> 1000000000000000,
'quintillion'=> 1000000000000000000,
'sextillion'=> 1000000000000000000000,
'septillion'=> 1000000000000000000000000,
'octillion'=> 1000000000000000000000000000,
'nonillion'=> 1000000000000000000000000000000,
'decillion'=> 1000000000000000000000000000000000,
];
$a = preg_split("~[\s-]+(?:and[\s-]+)?~u", $s);
$a = array_map('strtolower', $a);
$n = 0;
$g = 0;
foreach ($a as $w) {
if (isset($Small[$w])) {
$g = $g + $Small[$w];
}
else if ($w == "hundred" && $g != 0) {
$g = $g * 100;
}
else {
$x = $Magnitude[$w];
if (strlen($x) > 0) {
$n =$n + $g * $x;
$g = 0;
}
else{
throw new Exception("Unknown number: " . $w);
}
}
}
return $n + $g;
}
echo text2num("one") . "\n"; // 1
echo text2num("twelve") . "\n"; // 12
echo text2num("seventy two") . "\n"; // 72
echo text2num("three hundred") . "\n"; // 300
echo text2num("twelve hundred") . "\n"; // 1200
echo text2num("twelve thousand three hundred four") . "\n"; // 12304
echo text2num("six million") . "\n"; // 6000000
echo text2num("six million four hundred thousand five") . "\n"; // 6400005
echo text2num("one hundred twenty three billion four hundred fifty six million seven hundred eighty nine thousand twelve") . "\n"; # // 123456789012
echo text2num("four decillion") . "\n"; // 4000000000000000000000000000000000
echo text2num("five hundred and thirty-seven") . "\n"; // 537
echo text2num("five hundred and thirty seven") . "\n"; // 537
请参见 PHP演示.
regex实际上可以匹配大数字或"1100"之类的数字,请参见\b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b
.可以进一步增强.例如.可以用其他边界类型(例如(?<!\S)
和(?!\S)
来在空白之间进行匹配等)替换单词边界.
The regex can actually match either just big numbers or numbers like "eleven hundred", see \b(?:(?&ten_to_19)\s+hundred|(?&bignumber))\b
. It can be further enhanced. E.g. word boundaries may be replaced with other boundary types (like (?<!\S)
and (?!\S)
to match in between whitespaces, etc.).
正则表达式中的小数部分被注释掉,因为即使我们匹配它,num2text
也将无法处理它们.
Decimal part in the regex is commented out since even if we match it, the num2text
won't handle them.
这篇关于将字符串中的数字单词转换成数字的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!