我有一个已排序的名称列表,从中创建了一个子范围数组:
名单:AAA,AAAA,BA,BB,BBCC,BBD,CCC,DDD,EE,FFF,FFG,
GG,GGA,GGGB。。。
子范围:[AAA-BBCC],[BBD-EE],[FFF-GGA],[GGGB-…],…
然后我编写了一个函数,将子范围转换为一组最小的a到Z索引对,如您可能在文件抽屉标签上找到的那样。
[A-B],[B-E],[F-G],[G-…],…
但是,我的函数错误地“计算”了错误的对值,在这种情况下应该是:
[英国广播公司],[英国广播公司]。[F-GGA],[GGG-…],…
我在用php,以防语言功能有帮助。
虽然这是错误的,但这是我的代码:
$indexCount = count( $tempData );
for( $i = 0; $i < $indexCount; ) {
$lowValue = $tempData[ $i ][ 'lowValue' ];
$highValue = $tempData[ $i ][ 'highValue' ];
// Find the minimum non-matching lowValue compared to the highValue ...
if( !empty( $highValue ) ) {
for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) {
if( $lowValue !== $highValue ) {
$lowValue = substr( $lowValue, 0, $j );
}
else {
break;
} // End of if( $lowValue !== $highValue ) ... else ...
} // End of for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) ...
}
else {
$lowValue = substr( $lowValue, 0, 1 );
} // End of if( !empty( $highValue ) ) ...
// Find the minimum non-matching highValue compared to the minimized lowValue ...
if( !empty( $highValue ) ) {
$highValue = substr( $highValue, 0, $j );
// If after minimizing both values and the values are the same, then ...
if( $lowValue === $highValue ) {
//
// Add back the next character from the original values to each
// minimized value ...
//
$lowValue = substr( $tempData[ $i ][ 'lowValue' ], 0, $j + 1 );
$highValue = substr( $tempData[ $i ][ 'highValue' ], 0, $j + 1 );
} // End of if( $lowValue === $highValue ) ...
} // End of if( !empty( $highValue ) ) ...
// Save the minimized values ...
$tempData[ $i ][ 'lowShortValue' ] = $lowValue;
$tempData[ $i ][ 'highShortValue' ] = $highValue;
//
// Originally, my code stopped here, but then I realized that I also
// needed to minimize the high value and next low value to bridge the
// sub-ranges ...
//
$k = $i;
if( ++$i < $indexCount ) {
$lowValue = $tempData[ $k ][ 'highValue' ];
$highValue = $tempData[ $i ][ 'lowValue' ];
// Find the minimum non-matching lowValue compared to the highValue ...
for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) {
if( $lowValue !== $highValue ) {
$lowValue = substr( $lowValue, 0, $j );
}
else {
break;
} // End of if( $lowValue !== $highValue ) ... else ...
} // End of for( $j = strlen( $lowValue ) - 1; $j > 0; --$j ) ...
//
// Find the minimum non-matching highValue compared to the
// minimized lowValue ...
//
if( !empty( $highValue ) ) {
$highValue = substr( $highValue, 0, $j );
// If after minimizing both values and the values are the same, then ...
if( $lowValue === $highValue ) {
//
// Add back the next character from the original values to each
// minimized value ...
//
$lowValue = substr( $tempData[ $k ][ 'highValue' ], 0, $j + 1 );
$highValue = substr( $tempData[ $i ][ 'lowValue' ], 0, $j + 1 );
} // End of if( $lowValue === $highValue ) ...
} // End of if( !empty( $highValue ) ) ...
// Save the minimized values ...
$tempData[ $k ][ 'highValue' ] = $lowValue;
$tempData[ $i ][ 'lowValue' ] = $highValue;
} // End of if( ++$i < $indexCount ) ...
} // End of for( $i = 0; $i < $indexCount; ++$i ).
此外,由于无法工作,我对该代码还有两个问题:
1)我将lowShortValue和highShortValues存储到$tempdata数组中两次,覆盖为该索引“计算”的值。
2)是否应该对子范围数组进行反向扫描,以便我可以将最小值从一个子范围“携带”到之前的子范围,并跳过每对子范围中最大值的最小化?
最佳答案
我不知道这是否是一个“好”的算法,但它确实有效,并且有一些优化的元素可以减少值的重新量化。
//
// Truncate the lowValue and highValue index values to produce a range of minimal index values.
//
// If the original values were:
//
// ant - apple, bear - cab, cat - chimp, dog - fry, teller - total
//
// Then the range values would be reduced to:
//
// an - ap, b - cat, cab - ch, d - f, te - to
//
if( ( $indexCount = count( $tempData ) ) > 0 ) {
//
// $indexCount = 5, meaning that there are five sub-ranges.
//
// Loop 1a: $i = 4, $j = 4, %k = 0, compare: teller & total => te - to
// Loop 1b: $i = 4, $j = 3, %k = 1, compare: fry & te (bridge) => f - te
// Loop 2a: $i = 3, $j = 3, %k = 0, compare: dog & f => d - f
// Loop 2b: $i = 3, $j = 2, %k = 1, compare: chimp & dog (bridge) => ch - d
// Loop 3a: $i = 2, $j = 2, %k = 0, compare: ca & ch => ca - ch <─────┐
// Loop 3b: $i = 2, $j = 1, %k = 1, compare: cab & cat (bridge) => cab - cat, cat - ch
// Loop 4a: $i = 1, $j = 1, %k = 0, compare: bear & cab => b - cab
// Loop 4b: $i = 1, $j = 0, %k = 1, compare: apple & b (bridge) => ap - b
// Loop 5a: $i = 0, $j = 0, %k = 0, compare: ant & apple => an - ap
//
$retrim = true;
// Process the sub-ranges from right (larger) to left (smaller) ...
for( $i = $j = $indexCount - 1; $i >= 0; --$i ) {
//
// Loop part A - $i === $j
//
if( empty( $newLowValue ) ) {
// Get the low value from sub-range's low value ...
$newLowValue = $lowValue = $tempData[ $i ][ 'lowValue' ];
} // End of if( empty( $newLowValue ) ) ...
if( $retrim === true ) {
// Get high value from the sub-range's high value ...
$newHighValue = $highValue = $tempData[ $i ][ 'highValue' ];
}
else {
// Get high value from the previous right (larger) sub-range's low value ...
$newHighValue = $highValue = $newLowValue;
// Get the low value from sub-range's low value ...
$newLowValue = $lowValue = $tempData[ $i ][ 'lowValue' ];
} // End of if( $retrim === true ) ... else ...
// With in the sub-range, look for the first difference between the two strings ...
$k = 1;
$l = max( strlen( $newLowValue ), strlen( $newHighValue ) ) ;
while( $k < $l ) {
$newLowValue = mb_substr( $lowValue, 0, $k );
if( $retrim === true ) {
$newHighValue = mb_substr( $highValue, 0, $k );
} // End of if( $retrim === true ) ...
if( $newLowValue !== $newHighValue ) {
break;
} // End of if( $newLowValue !== $newHighValue ) ...
++$k;
} // End of while( $k < $l ) ...
// Save the trimmed high value ...
$tempData[ $i ][ 'highShortValue' ] = $newHighValue;
//
// Loop part B - Bridge - $i === $j + 1
//
if( --$j < 0 ) {
// No more sub-ranges, can't bridge, save the trimmed low value ...
$tempData[ $i ][ 'lowShortValue' ] = $newLowValue;
}
else {
// Bridge.
$newHighValue = $highValue = $newLowValue;
$newLowValue = $lowValue = $tempData[ $j ][ 'highValue' ];
if( mb_substr( $newLowValue, 0, strlen( $newHighValue ) ) === $newHighValue ) {
//
// Too many characters were trimmed from the high value, re-trim the
// previous-sub-range's un-trimmed lowValue ...
//
$retrim = true;
$newHighValue = $highValue = $tempData[ $i ][ 'lowValue' ];
}
else {
$retrim = false;
} // End of if( mb_substr( $newLowValue, 0, strlen( $newHighValue ) ) ===
// $newHighValue ) ... else ...
//
// In the bridge between the $j and $i sub-ranges, look for the first difference
// between the two strings ...
//
$k = 1;
$l = max( strlen( $lowValue ), strlen( $highValue ) );
while( $k < $l ) {
$newLowValue = mb_substr( $lowValue, 0, $k );
if( $retrim === true ) {
$newHighValue = mb_substr( $highValue, 0, $k );
} // End of if( $retrim === true ) ...
if( $newLowValue !== $newHighValue ) {
break;
} // End of if( $newLowValue !== $newHighValue ) ...
++$k;
} // End of while( $k < $l ) ...
// Bridge, save the trimmed low value ...
$tempData[ $i ][ 'lowShortValue' ] = $newHighValue;
$retrim = false;
} // End of if( --$j >= 0 ) ...
} // End of for( $i = $indexCount - 1; $i >= 0; --$i ).
} // End of if( ( $indexCount = count( $tempData ) ) > 0 ) ...
正如我在问题中所建议的,我改变了扫描子范围的方向,从左到右(升序)到右到左(降序)。虽然没有必要,但它确实使流程对我来说更加清晰,并允许我在两个循环部分之间携带最小值,并且有助于消除在相同子范围内存储两次值的情况。
我发现的另一件事是我需要从substr更改为mb_substr。从我展示的数据中不清楚,但使用substr的副作用是,有时函数返回原始的未截断字符串,而不管length参数的值如何。此外,在原始问题中的版本之后添加到代码中的if(retrm)测试(分配了true和false布尔值)将失败,if-then语句中的代码块将执行,即使它们不应该执行当我切换到mb_substr函数并且副作用停止发生时,我将这些更改为if(retrim===true)。
关于php - 如何编写将子范围转换为索引对的A到Z单词索引算法?,我们在Stack Overflow上找到一个类似的问题:https://stackoverflow.com/questions/57153817/