I'm doing a simple line count in an InputStream (counting the number of newline bytes, value 10):
for (int i = 0; i < readBytes; i++) {
    if (b[i + off] == 10) { // New Line (10)
        rowCount++;
    }
}
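Can I make this faster, without iterating byte by byte?
Perhaps I'm looking for some class that is able to use CPU-specific instructions (SIMD / SSE).
Full code: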
@Override
public int read(byte[] b, int off, int len) throws IOException {
    int readBytes = in.read(b, off, len);
    for (int i = 0; i < readBytes; i++) {
        hadBytes = true; // at least once we read something
        lastByteIsNewLine = false;
        if (b[i + off] == 10) { // New Line (10)
            rowCount++;
            lastByteIsNewLine = (i == readBytes - 1); // last byte in buffer was the newline
        }
    }
    if (hadBytes && readBytes == -1 && !lastByteIsNewLine) { // file is not empty + EOF + last byte was not NewLine
        rowCount++;
    }
    return readBytes;
}
Best answer
On my system, just moving the lastByteIsNewLine and hadBytes parts out of the loop gives a ~10% improvement*:
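public int read(byte[] b, int off, int len) throws IOException {
    int readBytes = in.read(b, off, len);
    for (int i = 0; i < readBytes; i++) {
        if (b[i + off] == 10) { // New Line (10)
            rowCount++;
        }
    }
    // Update the flags once per call instead of once per byte. They are left
    // untouched at EOF (readBytes == -1) so that the check below still sees
    // the state of the last byte that was actually read.
    if (readBytes > 0) {
        hadBytes = true;
        lastByteIsNewLine = (b[readBytes + off - 1] == 10);
    }
    if (hadBytes && readBytes == -1 && !lastByteIsNewLine) { // file is not empty + EOF + last byte was not NewLine
        rowCount++;
    }
    return readBytes;
}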
*6000 ms vs. 6700 ms for 1,000 iterations over a 10 MB buffer read from a ByteArrayInputStream filled with arbitrary text.
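On the "without iterating over every byte" part of the question: one portable option that needs no special class is SWAR ("SIMD within a register"), which tests eight bytes at a time inside a single long. The sketch below is an illustration only and was not part of the timings above, so whether it actually beats the plain loop (which the JIT may already vectorize) would have to be measured. The names NewlineCounter, countNewlines and countNaive are made up for this example; it should only be called with the positive byte count returned by in.read, although non-positive lengths are guarded.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the original answer.
final class NewlineCounter {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;     // low 7 bits of every byte lane
    private static final long NEWLINES = 0x0A0A0A0A0A0A0A0AL; // byte 10 in every byte lane

    /** Counts bytes equal to 10 in b[off .. off+len), eight bytes per step. */
    static int countNewlines(byte[] b, int off, int len) {
        if (len <= 0) {
            return 0; // covers EOF (-1) and empty reads
        }
        ByteBuffer buf = ByteBuffer.wrap(b, off, len);
        int count = 0;
        while (buf.remaining() >= Long.BYTES) {
            long x = buf.getLong() ^ NEWLINES; // lanes that held 10 are now 0x00
            long t = (x & LOW7) + LOW7;        // lane high bit set if its low 7 bits are nonzero
            t = ~(t | x | LOW7);               // 0x80 in lanes that were zero, 0x00 elsewhere
            count += Long.bitCount(t);         // one set bit per newline
        }
        while (buf.hasRemaining()) {           // fewer than 8 bytes left: plain loop
            if (buf.get() == 10) {
                count++;
            }
        }
        return count;
    }

    /** Reference implementation: the byte-by-byte loop from the question. */
    static int countNaive(byte[] b, int off, int len) {
        int count = 0;
        for (int i = 0; i < len; i++) {
            if (b[i + off] == 10) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = "a\nb\n\nlonger line here\nlast".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countNewlines(data, 0, data.length)); // prints 4
        System.out.println(countNaive(data, 0, data.length));    // prints 4
    }
}
```

Inside read, the counting loop would then become rowCount += NewlineCounter.countNewlines(b, off, readBytes); while the hadBytes / lastByteIsNewLine handling stays as it is.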