我有一个已排序的名称列表,从中创建了一个子范围数组:
名单:AAA,AAAA,BA,BB,BBCC,BBD,CCC,DDD,EE,FFF,FFG,
GG,GGA,GGGB。。。
子范围:[AAA-BBCC],[BBD-EE],[FFF-GGA],[GGGB-…],…
然后我编写了一个函数,将子范围转换为一组最小的a到Z索引对,如您可能在文件抽屉标签上找到的那样。
[A-B],[B-E],[F-G],[G-…],…
但是,我的函数错误地“计算”了错误的对值,在这种情况下应该是:
[英国广播公司],[英国广播公司]。[F-GGA],[GGG-…],…
我在用php,以防语言功能有帮助。
虽然这是错误的,但这是我的代码:

$indexCount = count( $tempData );

for( $i = 0; $i < $indexCount; ) {

    $lowValue  = $tempData[ $i ][ 'lowValue' ];
    $highValue = $tempData[ $i ][ 'highValue' ];

    // Find the minimum non-matching lowValue compared to the highValue ...

    if( !empty( $highValue ) ) {

        for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) {

            if( $lowValue !== $highValue ) {

                $lowValue = substr( $lowValue, 0, $j );

            }
            else {

                break;

            } // End of if( $lowValue !== $highValue ) ... else ...

        } // End of for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) ...

    }
    else {

        $lowValue = substr( $lowValue, 0, 1 );

    } // End of if( !empty( $highValue ) ) ...

    // Find the minimum non-matching highValue compared to the minimized lowValue ...

    if( !empty( $highValue ) ) {

        $highValue = substr( $highValue, 0, $j );

        // If after minimizing both values and the values are the same, then ...

        if( $lowValue === $highValue ) {

            //
            // Add back the next character from the original values to each
            // minimized value ...
            //

            $lowValue  = substr( $tempData[ $i ][ 'lowValue' ],  0, $j + 1 );
            $highValue = substr( $tempData[ $i ][ 'highValue' ], 0, $j + 1 );

        } // End of if( $lowValue === $highValue ) ...

    } // End of if( !empty( $highValue ) ) ...

    // Save the minimized values ...

    $tempData[ $i ][ 'lowShortValue'  ] = $lowValue;
    $tempData[ $i ][ 'highShortValue' ] = $highValue;

    //
    // Originally, my code stopped here, but then I realized that I also
    // needed to minimize the high value and next low value to bridge the
    // sub-ranges ...
    //

    $k = $i;

    if( ++$i < $indexCount ) {

        $lowValue  = $tempData[ $k ][ 'highValue' ];
        $highValue = $tempData[ $i ][ 'lowValue' ];

        // Find the minimum non-matching lowValue compared to the highValue ...

        for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) {

            if( $lowValue !== $highValue ) {

                $lowValue = substr( $lowValue, 0, $j );

            }
            else {

                break;

            } // End of if( $lowValue !== $highValue ) ... else ...

        } // End of for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) ...

        //
        // Find the minimum non-matching highValue compared to the
        // minimized lowValue ...
        //

        if( !empty( $highValue ) ) {

            $highValue = substr( $highValue, 0, $j );

            // If after minimizing both values and the values are the same, then ...

            if( $lowValue === $highValue ) {

                //
                // Add back the next character from the original values to each
                // minimized value ...
                //

                $lowValue  = substr( $tempData[ $k ][ 'highValue' ], 0, $j + 1 );
                $highValue = substr( $tempData[ $i ][ 'lowValue' ],  0, $j + 1 );

            } // End of if( $lowValue === $highValue ) ...

        } // End of if( !empty( $highValue ) ) ...

        // Save the minimized values ...

        $tempData[ $k ][ 'highValue' ] = $lowValue;
        $tempData[ $i ][ 'lowValue' ]  = $highValue;

    } // End of if( ++$i < $indexCount ) ...

} // End of for( $i = 0; $i < $indexCount; ++$i ).

此外,由于无法工作,我对该代码还有两个问题:
1)我将lowShortValue和highShortValues存储到$tempdata数组中两次,覆盖为该索引“计算”的值。
2)是否应该对子范围数组进行反向扫描,以便我可以将最小值从一个子范围“携带”到之前的子范围,并跳过每对子范围中最大值的最小化?

最佳答案

我不知道这是否是一个“好”的算法,但它确实有效,并且有一些优化的元素可以减少值的重新量化。

//
// Truncate the lowValue and highValue index values to produce a range of minimal index values.
//
// If the original values were:
//
//   ant - apple, bear - cab, cat - chimp, dog - fry, teller - total
//
// Then the range values would be reduced to:
//
//  an - ap, b - cat, cab - ch, d - f, te - to
//

if( ( $indexCount = count( $tempData ) ) > 0 ) {

    //
    // $indexCount = 5, meaning that there are five sub-ranges.
    //
    // Loop 1a: $i = 4, $j = 4, %k = 0, compare: teller & total           => te  - to
    // Loop 1b: $i = 4, $j = 3, %k = 1, compare: fry    & te     (bridge) => f   - te
    // Loop 2a: $i = 3, $j = 3, %k = 0, compare: dog    & f               => d   - f
    // Loop 2b: $i = 3, $j = 2, %k = 1, compare: chimp  & dog    (bridge) => ch  - d
    // Loop 3a: $i = 2, $j = 2, %k = 0, compare: ca     & ch              => ca  - ch <─────┐
    // Loop 3b: $i = 2, $j = 1, %k = 1, compare: cab    & cat    (bridge) => cab - cat, cat - ch
    // Loop 4a: $i = 1, $j = 1, %k = 0, compare: bear   & cab             => b   - cab
    // Loop 4b: $i = 1, $j = 0, %k = 1, compare: apple  & b      (bridge) => ap  - b
    // Loop 5a: $i = 0, $j = 0, %k = 0, compare: ant    & apple           => an  - ap
    //

    $retrim = true;

    // Process the sub-ranges from right (larger) to left (smaller) ...

    for( $i = $j = $indexCount - 1; $i >= 0; --$i ) {

        //
        // Loop part A - $i === $j
        //

        if( empty( $newLowValue ) ) {

            // Get the low value from sub-range's low value ...

            $newLowValue = $lowValue = $tempData[ $i ][ 'lowValue' ];

        } // End of if( empty( $newLowValue ) ) ...

        if( $retrim === true ) {

            // Get high value from the sub-range's high value ...

            $newHighValue = $highValue = $tempData[ $i ][ 'highValue' ];

        }
        else {

            // Get high value from the previous right (larger) sub-range's low value ...

            $newHighValue = $highValue = $newLowValue;

            // Get the low value from sub-range's low value ...

            $newLowValue = $lowValue = $tempData[ $i ][ 'lowValue' ];

        } // End of if( $retrim === true ) ... else ...

        // With in the sub-range, look for the first difference between the two strings ...

        $k = 1;
        $l = max( strlen( $newLowValue ), strlen( $newHighValue ) ) ;

        while( $k < $l ) {

            $newLowValue = mb_substr( $lowValue,  0, $k );

            if( $retrim === true ) {

                $newHighValue = mb_substr( $highValue, 0, $k );

            } // End of if( $retrim === true ) ...

            if( $newLowValue !== $newHighValue ) {

                break;

            } // End of if( $newLowValue !== $newHighValue ) ...

            ++$k;

        } // End of while( $k < $l ) ...

        // Save the trimmed high value ...

        $tempData[ $i ][ 'highShortValue' ] = $newHighValue;

        //
        // Loop part B - Bridge - $i === $j + 1
        //

        if( --$j < 0 ) {

            // No more sub-ranges, can't bridge, save the trimmed low value ...

            $tempData[ $i ][ 'lowShortValue' ] = $newLowValue;

        }
        else {

            // Bridge.

            $newHighValue = $highValue = $newLowValue;
            $newLowValue  = $lowValue  = $tempData[ $j ][ 'highValue' ];

            if( mb_substr( $newLowValue, 0, strlen( $newHighValue ) ) === $newHighValue ) {

                //
                // Too many characters were trimmed from the high value, re-trim the
                // previous-sub-range's un-trimmed lowValue ...
                //

                $retrim = true;

                $newHighValue = $highValue = $tempData[ $i ][ 'lowValue' ];

            }
            else {

                $retrim = false;

            } // End of if( mb_substr( $newLowValue, 0, strlen( $newHighValue ) ) ===
              //              $newHighValue ) ... else ...

            //
            // In the bridge between the $j and $i sub-ranges, look for the first difference
            // between the two strings ...
            //

            $k = 1;
            $l = max( strlen( $lowValue ), strlen( $highValue ) );

            while( $k < $l ) {

                $newLowValue = mb_substr( $lowValue, 0, $k );

                if( $retrim === true ) {

                    $newHighValue = mb_substr( $highValue, 0, $k );

                } // End of if( $retrim === true ) ...

                if( $newLowValue !== $newHighValue ) {

                    break;

                } // End of if( $newLowValue !== $newHighValue ) ...

                ++$k;

            } // End of while( $k < $l ) ...

            // Bridge, save the trimmed low value ...

            $tempData[ $i ][ 'lowShortValue' ] = $newHighValue;

            $retrim                            = false;

        } // End of if( --$j >= 0 ) ...

    } // End of for( $i = $indexCount - 1; $i >= 0; --$i ).

} // End of if( ( $indexCount = count( $tempData ) ) > 0 ) ...

正如我在问题中所建议的,我改变了扫描子范围的方向,从左到右(升序)到右到左(降序)。虽然没有必要,但它确实使流程对我来说更加清晰,并允许我在两个循环部分之间携带最小值,并且有助于消除在相同子范围内存储两次值的情况。
我发现的另一件事是我需要从substr更改为mb_substr。从我展示的数据中不清楚,但使用substr的副作用是,有时函数返回原始的未截断字符串,而不管length参数的值如何。此外,在原始问题中的版本之后添加到代码中的if(retrm)测试(分配了true和false布尔值)将失败,if-then语句中的代码块将执行,即使它们不应该执行当我切换到mb_substr函数并且副作用停止发生时,我将这些更改为if(retrim===true)。

关于php - 如何编写将子范围转换为索引对的A到Z单词索引算法?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/57153817/

10-15 21:31