问题描述
我有一个表示数学表达式的结构良好的树.例如,给定字符串:"1+2-3*4/5"
,它被解析为:
I have a well-formed tree that represents a mathematical expression. For example, given the string: "1+2-3*4/5"
, this gets parsed into:
subtract(add(1,2),divide(multiply(3,4),5))
表示为这棵树:
我希望能够做的是尽可能减少这棵树.在上面的例子中,这很简单,因为所有的数字都是常数.然而,一旦我允许未知数(用 $
后跟一个标识符表示),事情就会变得更加棘手:
What I'd like to be able to do is take this tree and reduce it as much as possible. In the case above, this is pretty simple, because all of the numbers are constants. However, things start to get trickier once I allow for unknowns (denoted with a $
followed by an identifier):
"3*$a/$a"
变成 divide(multiply(3,$a), $a)
这应该简化为 3
,因为 $a
术语应该相互抵消.问题是,我如何以通用方式识别这一点?"我如何识别 min(3, sin($x))
总是 sin($x)
?我如何识别 sqrt(pow($a, 2))
是 abs($a)
?我如何识别 nthroot(pow(42, $a), $a)
(42 的 a 根到 apower)是42
?
This should simplify to 3
, since the $a
terms should cancel each other out. The question is, "how do I go about recognizing this in a generic manner?" How do I recognize that min(3, sin($x))
is always going to be sin($x)
? How do I recognize that sqrt(pow($a, 2))
is abs($a)
? How do I recognize that nthroot(pow(42, $a), $a)
(the a root of 42 to the a power) is 42
?
我意识到这个问题非常广泛,但我一直在反对这个问题,但没有想出任何令人满意的东西.
I realize this question is pretty broad, but I've been beating my head against this for a while and haven't come up with anything satisfactory enough.
推荐答案
你可能想要实现一个 术语重写系统.关于基础数学,请查看维基百科.
You probably want to implement a term rewriting system. Regarding the underlying math, have a look at WikiPedia.
术语重写模块的结构
由于我最近实施了一个解决方案...
Since I implemented a solution recently...
首先,准备一个 CExpression 类,它对表达式的结构进行建模.
First, prepare a class CExpression, which models the structure of your expression.
实现 CRule
,它包含一个模式和一个替换.使用特殊符号作为模式变量,需要在模式匹配时绑定并在替换表达式中替换.
Implement CRule
, which contains a pattern and a replacement. Use special symbols as pattern variables, which need to get bound during pattern matching and replaced in the replacement expression.
然后,实现一个类CRule
.applyRule(CExpression, CRule)
的主要方法尝试将规则与表达式的任何适用子表达式进行匹配.如果匹配,则返回结果.
Then, implement a class CRule
. It's main method applyRule(CExpression, CRule)
tries to match the rule against any applicable subexpression of expression. In case it matches, return the result.
最后,实现一个 CRuleSet
类,它只是一组 CRule 对象.主要方法 reduce(CExpression)
应用规则集,只要不能应用更多规则,然后返回简化的表达式.
Finaly, implement a class CRuleSet
, which is simply a set of CRule objects. The main method reduce(CExpression)
applies the set of rules as long as no more rules can be applied and then returns the reduced expression.
此外,您需要一个 CBindingEnvironment
类,它将已经匹配的符号映射到匹配的值.
Additionally, you need a class CBindingEnvironment
, which maps already matched symbols to the matched values.
尝试将表达式重写为正常形式
不要忘记,这种方法在一定程度上有效,但可能不完整.这是因为以下所有规则都执行本地术语重写.
Don't forget, that this approach works to a certain point, but is likely to be non complete. This is due to the fact, that all of the following rules perform local term rewrites.
为了使这种本地重写逻辑更强大,应该尝试将表达式转换为我称之为正常形式的东西.这是我的方法:
To make this local rewrite logic stronger, one should try to transform expressions into something I'd call a normal form. This is my approach:
如果某个术语包含文字值,请尝试将该术语尽可能向右移动.
If a term contains literal values, try to move the term as far to the right as possible.
最终,这个文字值可能出现在最右边,并且可以作为完全文字表达式的一部分进行计算.
Eventually, this literal value may appear rightmost and can be evaluated as part of a fully literal expression.
何时计算完全文字表达式
一个有趣的问题是何时评估完全文字表达式.假设你有一个表达式
An interesting question is when to evaluate fully literal expression. Suppose you have an expression
x * ( 1 / 3 )
这将减少到
x * 0.333333333333333333
现在假设 x 被 3 替换.这将产生类似
Now suppose x gets replaced by 3. This would yield something like
0.999999999999999999999999
因此急切求值返回了一个稍微不正确的值.
Thus eager evaluation returns a slightly incorrect value.
另一方面,如果你保留 ( 1/3 ) 并先用 3 替换 x
At the other side, if you keep ( 1 / 3 ) and first replace x by 3
3 * ( 1 / 3 )
重写规则会给出
1
因此,延迟评估完全文字表达式可能很有用.
Thus, it might be useful to evaluate fully literal expression late.
重写规则示例
以下是我的规则在应用程序中的显示方式:_1、_2、...符号匹配任何子表达式:
Here is how my rules appear inside the application: The _1, _2, ... symbols match any subexpression:
addRule( new TARuleFromString( '0+_1', // left hand side :: pattern
'_1' // right hand side :: replacement
)
);
或者更复杂一些
addRule( new TARuleFromString( '_1+_2*_1',
'(1+_2)*_1'
)
);
某些特殊符号只匹配特殊的子表达式.例如._Literal1, _Literal2, ... 只匹配文字值:
Certain special symbols only match special subexpressions. E.g. _Literal1, _Literal2, ... match only literal values:
addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )',
'exp( _Literal1 + _Literal2 )'
)
);
此规则将非文字表达式向左移动:
This rule moves non-literal expression to the left:
addRule( new TARuleFromString( '_Literal*_NonLiteral',
'_NonLiteral*_Literal'
)
);
任何以_"开头的名称都是模式变量.当系统匹配一条规则时,它会保留一堆已匹配符号的分配.
Any name, that begins with a '_', is a pattern variable. While the system matches a rule, it keeps a stack of assignments of already matched symbols.
最后,不要忘记规则可能会产生非终止替换序列.因此在减少表达式的同时,让过程记住,之前已经到达了哪些中间表达式.
Finally, don't forget that rules may yield non terminating replacement sequences.Thus while reducing expression, make the process remember, which intermediate expressions have already been reached before.
在我的实现中,我不直接保存中间表达式.我保留了一个中间表达式的 MD5() 哈希数组.
In my implementation, I don't save intermediate expressions directly. I keep an array of MD5() hashes of intermediate expression.
一组规则作为起点
这是一组入门规则:
addRule( new TARuleFromString( '0+_1', '_1' ) );
addRule( new TARuleFromString( '_Literal2=0-_1', '_1=0-_Literal2' ) );
addRule( new TARuleFromString( '_1+0', '_1' ) );
addRule( new TARuleFromString( '1*_1', '_1' ) );
addRule( new TARuleFromString( '_1*1', '_1' ) );
addRule( new TARuleFromString( '_1+_1', '2*_1' ) );
addRule( new TARuleFromString( '_1-_1', '0' ) );
addRule( new TARuleFromString( '_1/_1', '1' ) );
// Rate = (pow((EndValue / BeginValue), (1 / (EndYear - BeginYear)))-1) * 100
addRule( new TARuleFromString( 'exp(_Literal1) * exp(_Literal2 )', 'exp( _Literal1 + _Literal2 )' ) );
addRule( new TARuleFromString( 'exp( 0 )', '1' ) );
addRule( new TARuleFromString( 'pow(_Literal1,_1) * pow(_Literal2,_1)', 'pow(_Literal1 * _Literal2,_1)' ) );
addRule( new TARuleFromString( 'pow( _1, 0 )', '1' ) );
addRule( new TARuleFromString( 'pow( _1, 1 )', '_1' ) );
addRule( new TARuleFromString( 'pow( _1, -1 )', '1/_1' ) );
addRule( new TARuleFromString( 'pow( pow( _1, _Literal1 ), _Literal2 )', 'pow( _1, _Literal1 * _Literal2 )' ) );
// addRule( new TARuleFromString( 'pow( _Literal1, _1 )', 'ln(_1) / ln(_Literal1)' ) );
addRule( new TARuleFromString( '_literal1 = pow( _Literal2, _1 )', '_1 = ln(_literal1) / ln(_Literal2)' ) );
addRule( new TARuleFromString( 'pow( _Literal2, _1 ) = _literal1 ', '_1 = ln(_literal1) / ln(_Literal2)' ) );
addRule( new TARuleFromString( 'pow( _1, _Literal2 ) = _literal1 ', 'pow( _literal1, 1 / _Literal2 ) = _1' ) );
addRule( new TARuleFromString( 'pow( 1, _1 )', '1' ) );
addRule( new TARuleFromString( '_1 * _1 = _literal', '_1 = sqrt( _literal )' ) );
addRule( new TARuleFromString( 'sqrt( _literal * _1 )', 'sqrt( _literal ) * sqrt( _1 )' ) );
addRule( new TARuleFromString( 'ln( _Literal1 * _2 )', 'ln( _Literal1 ) + ln( _2 )' ) );
addRule( new TARuleFromString( 'ln( _1 * _Literal2 )', 'ln( _Literal2 ) + ln( _1 )' ) );
addRule( new TARuleFromString( 'log2( _Literal1 * _2 )', 'log2( _Literal1 ) + log2( _2 )' ) );
addRule( new TARuleFromString( 'log2( _1 * _Literal2 )', 'log2( _Literal2 ) + log2( _1 )' ) );
addRule( new TARuleFromString( 'log10( _Literal1 * _2 )', 'log10( _Literal1 ) + log10( _2 )' ) );
addRule( new TARuleFromString( 'log10( _1 * _Literal2 )', 'log10( _Literal2 ) + log10( _1 )' ) );
addRule( new TARuleFromString( 'ln( _Literal1 / _2 )', 'ln( _Literal1 ) - ln( _2 )' ) );
addRule( new TARuleFromString( 'ln( _1 / _Literal2 )', 'ln( _Literal2 ) - ln( _1 )' ) );
addRule( new TARuleFromString( 'log2( _Literal1 / _2 )', 'log2( _Literal1 ) - log2( _2 )' ) );
addRule( new TARuleFromString( 'log2( _1 / _Literal2 )', 'log2( _Literal2 ) - log2( _1 )' ) );
addRule( new TARuleFromString( 'log10( _Literal1 / _2 )', 'log10( _Literal1 ) - log10( _2 )' ) );
addRule( new TARuleFromString( 'log10( _1 / _Literal2 )', 'log10( _Literal2 ) - log10( _1 )' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral + _Literal2', '_Literal1 - _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral * _Literal2', '_Literal1 / _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral / _Literal2', '_Literal1 * _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 =_NonLiteral - _Literal2', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral + _Literal2 = _Literal1 ', '_Literal1 - _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral * _Literal2 = _Literal1 ', '_Literal1 / _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral / _Literal2 = _Literal1 ', '_Literal1 * _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_NonLiteral - _Literal2 = _Literal1 ', '_Literal1 + _Literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal2 - _NonLiteral = _Literal1 ', '_Literal2 - _Literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = sin( _NonLiteral )', 'asin( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = cos( _NonLiteral )', 'acos( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = tan( _NonLiteral )', 'atan( _Literal1 ) = _NonLiteral' ) );
addRule( new TARuleFromString( '_Literal1 = ln( _1 )', 'exp( _Literal1 ) = _1' ) );
addRule( new TARuleFromString( 'ln( _1 ) = _Literal1', 'exp( _Literal1 ) = _1' ) );
addRule( new TARuleFromString( '_Literal1 = _NonLiteral', '_NonLiteral = _Literal1' ) );
addRule( new TARuleFromString( '( _Literal1 / _2 ) = _Literal2', '_Literal1 / _Literal2 = _2 ' ) );
addRule( new TARuleFromString( '_Literal*_NonLiteral', '_NonLiteral*_Literal' ) );
addRule( new TARuleFromString( '_Literal+_NonLiteral', '_NonLiteral+_Literal' ) );
addRule( new TARuleFromString( '_Literal1+(_Literal2+_NonLiteral)', '_NonLiteral+(_Literal1+_Literal2)' ) );
addRule( new TARuleFromString( '_Literal1+(_Literal2+_1)', '_1+(_Literal1+_Literal2)' ) );
addRule( new TARuleFromString( '(_1*_2)+(_3*_2)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_2*_1)+(_2*_3)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_2*_1)+(_3*_2)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_1*_2)+(_2*_3)', '(_1+_3)*_2' ) );
addRule( new TARuleFromString( '(_Literal * _1 ) / _Literal', '_1' ) );
addRule( new TARuleFromString( '(_Literal1 * _1 ) / _Literal2', '(_Literal1 * _Literal2 ) / _1' ) );
addRule( new TARuleFromString( '(_1+_2)+_3', '_1+(_2+_3)' ) );
addRule( new TARuleFromString( '(_1*_2)*_3', '_1*(_2*_3)' ) );
addRule( new TARuleFromString( '_1+(_1+_2)', '(2*_1)+_2' ) );
addRule( new TARuleFromString( '_1+_2*_1', '(1+_2)*_1' ) );
addRule( new TARuleFromString( '_literal1 * _NonLiteral = _literal2', '_literal2 / _literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 + _NonLiteral = _literal2', '_literal2 - _literal1 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 - _NonLiteral = _literal2', '_literal1 - _literal2 = _NonLiteral' ) );
addRule( new TARuleFromString( '_literal1 / _NonLiteral = _literal2', '_literal1 * _literal2 = _NonLiteral' ) );
让规则成为一流的表达
有趣的一点:由于上述规则是特殊的表达式,表达式解析器会对其进行正确评估,用户甚至可以添加新规则,从而增强应用程序的重写能力.
An interesting point: Since the above rules are special expression, which get correctly evaluate by the expression parser, users can even add new rules and thus enhance the application's rewrite capabilities.
解析表达式(或更一般的:语言)
对于Cocoa/OBjC 应用,Dave DeLong 的 DDMathParser 是一个完美的选择语法分析数学表达式的候选人.
For Cocoa/OBjC applications, Dave DeLong's DDMathParser is a perfect candidate to syntactically analyse mathematical expressions.
对于其他语言,我们的老朋友Lex &Yacc 或更新的 GNU Bison 可能会有所帮助.
For other languages, our old friends Lex & Yacc or the newer GNU Bison might be of help.
年轻得多,并且拥有 大量准备使用的语法文件,ANTLR 是一个基于 Java 的现代解析器生成器.除了纯粹的命令行使用,ANTLRWorks 还提供了一个 GUI 前端 构建和调试基于 ANTLR 的解析器.ANTLR 为各种宿主语言生成语法,比如JAVA、C、Python、PHP 或 C#.ActionScript 运行时当前是 破碎.
Far younger and with an enourmous set of ready to use syntax-files, ANTLR is a modern parser generator based on Java. Besides purely command-line use, ANTLRWorks provides a GUI frontend to construct and debug ANTLR based parsers. ANTLR generates grammars for various host language, like JAVA, C, Python, PHP or C#. The ActionScript runtime is currently broken.
如果您想从下而上学习如何解析表达式(或一般的语言),我建议您使用 来自 Niklaus Wirth 的免费书籍文本(或 德文版),著名的 Pascal 和 Modula-2 发明者.
In case you'd like to learn how to parse expressions (or languages in general) from the bottom-up, I'd propose this free book's text from Niklaus Wirth (or the german book edition), the famous inventor of Pascal and Modula-2.
这篇关于简化数学表达式的策略的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!