本文介绍了Code Golf:数学表达式评估器(尊重 PEMDAS)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我挑战您编写一个数学表达式求值器,该求值器遵守 PEMDAS(运算顺序:括号、求幂、乘法、除法、加法、减法),而不使用正则表达式,类似于预先存在的Eval()"函数、解析库等

I challenge you to write a mathematical expression evaluator that respects PEMDAS (order of operations: parentheses, exponentiation, multiplication, division, addition, subtraction) without using regular expressions, a pre-existing "Eval()"-like function, a parsing library, etc.

我在 SO(此处)上看到了一个预先存在的评估者挑战,但是那个特别需要从左到右的评估.

I saw one pre-existing evaluator challenge on SO (here), but that one specifically required left-to-right evaluation.

示例输入和输出:

"-1^(-3*4/-6)" -> "1"

"-2^(2^(4-1))" -> "256"

"2*6/4^2*4/3" -> "1"

我用 C# 编写了一个评估器,但想看看它与选择语言的更聪明的程序员相比有多糟糕.

I wrote an evaluator in C#, but would like to see how badly it compares to those of smarter programmers in their languages of choice.

Code Golf:评估数学表达式

说明:

  1. 让我们把它变成一个接受字符串参数并返回字符串结果的函数.

  1. Let's make this a function that accepts a string argument and returns a string result.

至于为什么没有正则表达式,嗯,这是为了公平竞争.我认为最紧凑的正则表达式"应该有一个单独的挑战.

As for why no regexes, well, that's to level the playing field. I think there ought to be a separate challenge for "the most compact regex".

使用 StrToFloat() 是可以接受的.通过解析库"我的意思是排除通用语法解析器之类的东西,也是为了公平竞争.

Using StrToFloat() is acceptable. By "parsing library" I meant to exclude such things as general-purpose grammar parsers, also to level the playing-field.

支持浮动.

支持括号、取幂和四个算术运算符.

Support paretheses, exponentiation, and the four arithmetic operators.

给予乘法和除法同等的优先权.

Give multiplication and division equal precedence.

赋予加减相同的优先级.

Give addition and subtraction equal precedence.

为简单起见,您可以假设所有输入都是格式良好的.

For simplicity, you may assume all inputs are well-formed.

对于您的函数是否接受诸如.1"之类的东西,我没有偏好.或1e3"作为有效数字,但接受它们会为您赢得布朗尼积分.;)

I don't have a preference as to whether your function accepts such things as ".1" or "1e3" as valid numbers, but accepting them would earn you brownie points. ;)

对于被零除的情况,您或许可以返回NaN";(假设您希望实现错误处理).

For divide-by-zero cases, you could perhaps return "NaN" (assuming you wish to implement error handling).

推荐答案

C(465 个字符)

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){for(*++t=4;*t-8;*++t=V])*++t=V];}M(double*t){int i,p,b;
F if(!P)for(p=1,b=i;i+=2,p;)P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
F P-6||(L=pow(L,S;F P-2&&P-7||(L*=(P-7?V+2]:1/S;F P-4&&(L+=(P-5?V+2]:-S;
F L=V];}E(char*s,char*r){double t[99];char*e,i=2,z=0;for(;*s;i+=2)V]=
strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;P=8;M(t+2);sprintf(r,"%g",*t);}

前五个换行符是必需的,其余的只是为了便于阅读.我把前五个换行符算作一个字符.如果你想用行来衡量它,在我删除所有空格之前它是 28 行,但这是一个非常无意义的数字.它可以是 6 行到 100 万行,这取决于我如何格式化.

The first five newlines are required, the rest are there just for readability. I've counted the first five newlines as one character each. If you want to measure it in lines, it was 28 lines before I removed all the whitespace, but that's a pretty meaningless number. It could have been anything from 6 lines to a million, depending on how I formatted it.

入口点是 E()(用于评估").第一个参数是输入字符串,第二个参数指向输出字符串,必须由调用者分配(按照通常的 C 标准).它最多可以处理 47 个标记,其中标记是运算符(+-*/^()"之一)或浮点数.一元符号运算符不算作单独的标记.

The entry point is E() (for "evaluate"). The first parameter is the input string, and the second parameter points to the output string, and must be allocated by the caller (as per usual C standards). It can handle up to 47 tokens, where a token is either an operator (one of "+-*/^()"), or a floating point number. Unary sign operators do not count as a separate token.

这段代码松散地基于我多年前做的一个项目作为练习.我去掉了所有的错误处理和空格跳过,并使用高尔夫技术对其进行了重组.下面是 28 行,具有足够的格式,我可以它,但可能不足以阅读它.您需要 #include (或见底部注释).

This code is loosely based on a project I did many years ago as an exercise. I took out all the error handling and whitespace skipping and retooled it using golf techniques. Below are the 28 lines, with enough formatting that I was able to write it, but probably not enough to read it. You'll want to #include <stdlib.h>, <stdio.h>, and <math.h> (or see note at the bottom).

查看代码后了解其工作原理.

See after the code for an explanation of how it works.

#define F for(i=0;P-8;i+=2)
#define V t[i
#define P V+1]
#define S V+2]),K(&L,4),i-=2)
#define L V-2]
K(double*t,int i){
    for(*++t=4;*t-8;*++t=V])
        *++t=V];
}
M(double*t){
    int i,p,b;
    F if(!P)
        for(p=1,b=i;i+=2,p;)
            P?P-1||--p||(P=8,M(t+b+2),K(t+b,i-b),i=b):++p;
    F P-6||(L=pow(L,S;
    F P-2&&P-7||(L*=(P-7?V+2]:1/S;
    F P-4&&(L+=(P-5?V+2]:-S;
    F L=V];
}
E(char*s,char*r){
    double t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    sprintf(r,"%g",*t);
}

第一步是标记化.双精度数组包含每个标记的两个值,一个运算符(P,因为O 看起来太像零了)和一个值(V>).这种标记化是在 E() 中的 for 循环中完成的.它还处理任何一元 +- 运算符,将它们合并到常量中.

The first step is to tokenize. The array of doubles contains two values for each token, an operator (P, because O looks too much like zero), and a value (V). This tokenizing is what is done in the for loop in E(). It also deals with any unary + and - operators, incorporating them into the constant.

操作员"令牌数组的字段可以具有以下值之一:

The "operator" field of the token array can have one of the following values:

0:(
1:)
2:*
3:+
4:浮点常量值
5:-
6:^
7:/
8:标记字符串结束

这个方案主要是由 Daniel Martin 推导出来的,他注意到最后 3 位在此挑战中每个运算符的 ASCII 表示中是唯一的.

This scheme was largely derived by Daniel Martin, who noticed that the last 3 bits were unique in the ASCII representation of each of the operators in this challenge.

E() 的未压缩版本看起来像这样:

An uncompressed version of E() would look something like this:

void Evaluate(char *expression, char *result){
    double tokenList[99];
    char *parseEnd;
    int i = 2, prevOperator = 0;
    /* i must start at 2, because the EvalTokens will write before the
     * beginning of the array.  This is to allow overwriting an opening
     * parenthesis with the value of the subexpression. */
    for(; *expression != 0; i += 2){
        /* try to parse a constant floating-point value */
        tokenList[i] = strtod(expression, &parseEnd);

        /* explanation below code */
        if(parseEnd != expression && prevOperator != 4/*constant*/ &&
           prevOperator != 1/*close paren*/){
            expression = parseEnd;
            prevOperator = tokenList[i + 1] = 4/*constant*/;
        }else{
            /* it's an operator */
            prevOperator = tokenList[i + 1] = *expression & 7;
            expression++;
        }
    }

    /* done parsing, add end-of-token-string operator */
    tokenList[i + 1] = 8/*end*/

    /* Evaluate the expression in the token list */
    EvalTokens(tokenList + 2); /* remember the offset by 2 above? */

    sprintf(result, "%g", tokenList[0]/* result ends up in first value */);
}

由于我们保证输入有效,解析失败的唯一原因是因为下一个标记是运算符.如果发生这种情况,parseEnd 指针将与 tokenStart 相同.我们还必须处理解析成功的情况,但我们真正想要的是一个运算符.加法和减法运算符会发生这种情况,除非直接跟在符号运算符之后.换句话说,给定表达式4-6",我们希望将其解析为{4, -, 6},而不是{4、-6}.另一方面,给定4+-6",我们应该将其解析为{4, +, -6}.解决方法很简单.如果解析失败OR前面的记号是一个常数或一个右括号(实际上是一个将计算为常数的子表达式),则当前记号是一个运算符,否则它是一个常数.

Since we're guaranteed valid input, the only reason the parsing would fail would be because the next token is an operator. If this happens, the parseEnd pointer will be the same as, tokenStart. We must also handle the case where parsing succeeded, but what we really wanted was an operator. This would occur for the addition and subtraction operators, unless a sign operator directly followed. In other words, given the expression "4-6", we want to parse it as {4, -, 6}, and not as {4, -6}. On the other hand, given "4+-6", we should parse it as {4, +, -6}. The solution is quite simple. If parsing fails OR the preceding token was a constant or a closing parenthesis (effectively a subexpression which will evaluate to a constant), then the current token is an operator, otherwise it's a constant.

标记化完成后,通过调用M()完成计算和折叠,它首先查找任何匹配的括号对并通过调用处理其中包含的子表达式自己递归.然后它处理运算符,先求幂,然后一起乘除,最后一起加减.因为需要格式良好的输入(如挑战中所指定),所以它不会明确检查加法运算符,因为它是处理所有其他运算符后的最后一个合法运算符.

After tokenizing is done, calculating and folding are done by calling M(), which first looks for any matched pairs of parentheses and processes the subexpressions contained within by calling itself recursively. Then it processes operators, first exponentiation, then multiplication and division together, and finally addition and subtraction together. Because well-formed input is expected (as specified in the challenge), it doesn't check for the addition operator explicitly, since it's the last legal operator after all the others are processed.

缺少高尔夫压缩的计算函数将如下所示:

The calculation function, lacking golf compression, would look something like this:

void EvalTokens(double *tokenList){
    int i, parenLevel, parenStart;

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 0/*open paren*/)
            for(parenLevel = 1, parenStart = i; i += 2, parenLevel > 0){
                if(tokenList[i + 1] == 0/*another open paren*/)
                    parenLevel++;
                else if(tokenList[i + 1] == 1/*close paren*/)
                    if(--parenLevel == 0){
                        /* make this a temporary end of list */
                        tokenList[i + 1] = 8;
                        /* recursively handle the subexpression */
                        EvalTokens(tokenList + parenStart + 2);
                        /* fold the subexpression out */
                        FoldTokens(tokenList + parenStart, i - parenStart);
                        /* bring i back to where the folded value of the
                         * subexpression is now */
                        i = parenStart;
                    }
            }

    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 6/*exponentiation operator (^)*/){
            tokenList[i - 2] = pow(tokenList[i - 2], tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] == 2/*multiplication operator (*)*/ ||
           tokenList[i + 1] == 7/*division operator (/)*/){
            tokenList[i - 2] *=
                (tokenList[i + 1] == 2 ?
                    tokenList[i + 2] :
                    1 / tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    for(i = 0; tokenList[i + 1] != 8/*end*/; i+= 2)
        if(tokenList[i + 1] != 4/*constant*/){
            tokenList[i - 2] +=
                (tokenList[i + 1] == 3 ?
                    tokenList[i + 2] :
                    -tokenList[i + 2]);
            FoldTokens(tokenList + i - 2, 4);
            i -= 2;
        }
    tokenList[-2] = tokenList[0];
    /* the compressed code does the above in a loop, equivalent to:
     *
     * for(i = 0; tokenList[i + 1] != 8; i+= 2)
     *     tokenList[i - 2] = tokenList[i];
     *
     * This loop will actually only iterate once, and thanks to the
     * liberal use of macros, is shorter. */
}

一旦执行了一个操作,操作数和操作符就会被K()(通过宏S调用)从token列表中折叠出来>).运算结果保留为常量,代替折叠表达式.因此,最终结果保留在标记数组的开头,因此当控制返回到 E() 时,它只是将其打印到字符串中,利用事实上,数组中的第一个值是令牌的值字段.

Once an operation is performed, the operands and operator are folded out of the token list by K() (called through the macro S). The result of the operation is left as a constant in place of the folded expression. Consequently, the final result is left at the beginning of the token array, so when control returns to E(), it simply prints that to a string, taking advantage of the fact that the first value in the array is the value field of the token.

此对 FoldTokens() 的调用发生在操作 (^, *, /,+-) 已执行,或在子表达式(用括号括起来)已处理之后.FoldTokens() 例程确保结果值具有正确的运算符类型 (4),然后复制子表达式的较大表达式的其余部分.例如,当表达式2+6*4+1"处理后,EvalTokens()首先计算6*4,结果代替6(2+24*4+1).FoldTokens() 然后删除子表达式24*4"的其余部分,留下 2+24+1.

This call to FoldTokens() takes place either after an operation (^, *, /, +, or -) has been performed, or after a subexpression (surrounded by parentheses) has been processed. The FoldTokens() routine ensures that the result value has the correct operator type (4), and then copies the rest of the larger expression of the subexpression. For instance, when the expression "2+6*4+1" is processed, EvalTokens() first calculates 6*4, leaving the result in place of the 6 (2+24*4+1). FoldTokens() then removes the rest of the sub expression "24*4", leaving 2+24+1.

void FoldTokens(double *tokenList, int offset){
    tokenList++;
    tokenList[0] = 4; // force value to constant

    while(tokenList[0] != 8/*end of token string*/){
        tokenList[0] = tokenList[offset];
        tokenList[1] = tokenList[offset + 1];
        tokenList += 2;
    }
}

就是这样.宏只是用来代替普通操作,其他的只是上面的高尔夫压缩.

That's it. The macros are just there to replace common operations, and everything else is just golf-compression of the above.

strager 坚持认为代码应该包括#include 语句,因为如果没有 strtodpow 和函数的正确前向声明,它将无法正常运行.由于挑战只要求一个功能,而不是一个完整的程序,我认为这不应该是必需的.但是,可以通过添加以下代码以最低成本添加前向声明:

strager insists that the code should include #include statements, as it will not function correctly without a proper forward declation of the strtod and pow and functions. Since the challenge asks for just a function, and not a complete program, I hold that this should not be required. However, forward declarations could be added at minimal cost by adding the following code:

#define D double
D strtod(),pow();

然后我会替换所有double"的实例;在带有D"的代码中.这将在代码中添加 19 个字符,使总数达到 484.另一方面,我也可以将我的函数转换为返回双精度而不是字符串,就像他一样,这将修剪 15 个字符,改变 E() 函数:

I would then replace all instances of "double" in the code with "D". This would add 19 characters to the code, bringing the total up to 484. On the other hand, I could also convert my function to return a double instead of a string, as did he, which would trim 15 characters, changing the E() function to this:

D E(char*s){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V]=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    return*t;
}

这将使总代码大小为 469 个字符(或 452 个没有 strtodpow 的前向声明,但使用 D 宏).甚至可以通过要求调用者传入一个指向 double 的指针作为返回值来再修剪 1 个字符:

This would make the total code size 469 characters (or 452 without the forward declarations of strtod and pow, but with the D macro). It would even be possible to trim 1 more characters by requiring the caller to pass in a pointer to a double for the return value:

E(char*s,D*r){
    D t[99];
    char*e,i=2,z=0;
    for(;*s;i+=2)
        V=strtod(s,&e),P=z=e-s&&z-4&&z-1?s=e,4:*s++&7;
    P=8;
    M(t+2);
    *r=*t;
}

我会留给读者来决定哪个版本合适.

I'll leave it to the reader to decide which version is appropriate.

这篇关于Code Golf:数学表达式评估器(尊重 PEMDAS)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持!

08-21 12:11