Problem description
I have the following code inside a loop.
On each iteration, a string is appended to sb (a StringBuilder), and the code checks whether the size of sb has reached 5 MB.
if (sb.toString().getBytes("UTF-8").length >= 5242880) {
// Do something
}
This works, but the size check is very slow.
What would be the fastest way to do this?
Recommended answer
You can calculate the UTF-8 length quickly using
public static int utf8Length(CharSequence cs) {
return cs.codePoints()
.map(cp -> cp<=0x7ff? cp<=0x7f? 1: 2: cp<=0xffff? 3: 4)
.sum();
}
If ASCII characters dominate the contents, it might be slightly faster to use
public static int utf8Length(CharSequence cs) {
return cs.length()
+ cs.codePoints().filter(cp -> cp>0x7f).map(cp -> cp<=0x7ff? 1: 2).sum();
}
instead.
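As a quick sanity check, the two variants above can be compared against the byte count produced by `getBytes`. This is only a sketch: the class name `Utf8LengthCheck`, the method name `utf8LengthAsciiHeavy`, and the sample strings are illustrative choices, not part of the original answer.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LengthCheck {
    // General variant: 1 to 4 bytes per code point, depending on its range.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // ASCII-heavy variant: start from the char count and add only the extra
    // bytes needed by non-ASCII code points (hypothetical name).
    static int utf8LengthAsciiHeavy(CharSequence cs) {
        return cs.length()
             + cs.codePoints().filter(cp -> cp > 0x7f).map(cp -> cp <= 0x7ff ? 1 : 2).sum();
    }

    public static void main(String[] args) {
        // Sample inputs: empty, ASCII, 2-byte, 3-byte, and a surrogate pair (U+1F600).
        String[] samples = { "", "hello", "h\u00E9llo", "\u65E5\u672C\u8A9E", "a\uD83D\uDE00b" };
        for (String s : samples) {
            int expected = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.println(expected + " " + utf8Length(s) + " " + utf8LengthAsciiHeavy(s));
        }
    }
}
```

For well-formed strings, all three numbers on each printed line should agree. Strings containing unpaired surrogates are a different matter, because `getBytes` substitutes a replacement byte for them.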
But you may also consider the optimization potential of not recalculating the entire size, but only the size of the new fragment you are appending to the StringBuilder, something like
StringBuilder sb = new StringBuilder();
int length = 0;
for(…; …; …) {
String s = … //calculateNextString();
sb.append(s);
length += utf8Length(s);
if(length >= 5242880) {
// Do something
// in case you're flushing the data:
sb.setLength(0);
length = 0;
}
}
This assumes that if you’re appending fragments containing surrogate pairs, they are always complete and not split into their halves. For ordinary applications, this should always be the case.
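To make the running-total idea concrete, here is a minimal runnable sketch. The class `IncrementalSizeDemo`, the method `countFlushes`, the fragment list, and the tiny 8-byte threshold are all assumptions made for illustration. A real loop would use 5242880 and its own source of strings.

```java
import java.util.Arrays;
import java.util.List;

final class IncrementalSizeDemo {
    // Same UTF-8 length computation as in the answer.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // Appends each fragment, keeps a running byte count, and returns how many
    // times the buffer reached thresholdBytes and was flushed (hypothetical helper).
    static int countFlushes(Iterable<String> fragments, int thresholdBytes) {
        StringBuilder sb = new StringBuilder();
        int length = 0;
        int flushes = 0;
        for (String s : fragments) {
            sb.append(s);
            length += utf8Length(s);   // only the new fragment is measured
            if (length >= thresholdBytes) {
                flushes++;             // stand-in for "Do something"
                sb.setLength(0);       // flush the buffer
                length = 0;            // and reset the running total
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Illustrative fragments: ASCII, two 3-byte characters, ASCII, two 4-byte characters.
        List<String> fragments =
            Arrays.asList("abcd", "\u65E5\u672C", "xy", "\uD83D\uDE00\uD83D\uDE00");
        System.out.println(countFlushes(fragments, 8));
    }
}
```

Each fragment is scanned once, so the per-iteration cost is proportional to the fragment length rather than to the whole buffer.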
An additional possibility, suggested by Didier-L, is to postpone the calculation until your StringBuilder reaches a length of the threshold divided by three, since before that point the UTF-8 length cannot exceed the threshold. However, this is only beneficial if some executions never reach threshold / 3.
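The deferral can be sketched as a cheap guard in front of the full computation. It relies on the fact that every Java char encodes to at most 3 UTF-8 bytes (a surrogate pair is 2 chars and 4 bytes). The class `DeferredSizeCheck` and the method `reachedThreshold` are hypothetical names introduced here.

```java
final class DeferredSizeCheck {
    // Same UTF-8 length computation as in the answer.
    static int utf8Length(CharSequence cs) {
        return cs.codePoints()
                 .map(cp -> cp <= 0x7ff ? (cp <= 0x7f ? 1 : 2) : (cp <= 0xffff ? 3 : 4))
                 .sum();
    }

    // Returns true if the UTF-8 size of cs is at least thresholdBytes.
    static boolean reachedThreshold(CharSequence cs, int thresholdBytes) {
        // With fewer than thresholdBytes / 3 chars, the encoded size is
        // strictly below the threshold, so the full scan can be skipped.
        if (cs.length() < thresholdBytes / 3) {
            return false;
        }
        return utf8Length(cs) >= thresholdBytes;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder("ab");
        System.out.println(reachedThreshold(sb, 9));   // 2 chars: skipped by the guard
        sb.setLength(0);
        sb.append("\u65E5\u672C\u8A9E");               // 3 chars, 9 bytes
        System.out.println(reachedThreshold(sb, 9));
    }
}
```

The guard costs one integer comparison, so it only pays off when the buffer often stays below a third of the threshold.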