Problem description
Java is said to be about 10x faster than Python in terms of performance, and that matches what I see in benchmarks too. But what really drags Java down is the JVM startup time.
Here is the test I ran:
$time xlsx2csv.py Types\ of\ ESI\ v2.doc-emb-Package-9
...
<output skipped>
real 0m0.085s
user 0m0.072s
sys 0m0.013s
$time java -jar -client /usr/local/bin/tika-app-0.7.jar -m Types\ of\ ESI\ v2.doc-emb-Package-9
real 0m2.055s
user 0m2.433s
sys 0m0.078s
Same file, a 12 KB MS XLSX file embedded inside a Docx, and Python is about 25x faster!! WTH!!
It takes 2.055 sec for Java.
I know it is all due to startup time, but I need to call it from a script to parse some documents, because I do not want to reinvent the wheel in Python.
But for parsing 10k+ files, it is just not practical.
Is there any way to speed it up? (I already tried the -client option, and it only sped things up a little, about 20%.)
Another idea of mine: run it as a long-running daemon and communicate with it locally over UDP or Linux IPC sockets?
Recommended answer
Try.
Note: I don't use it personally.
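For what it's worth, the long-running daemon idea from the question can be sketched roughly as below. This is only an illustration under assumptions, not something tika-app is known to provide: the line-based protocol (one file path per request line, one reply line back), the names `ParserClient`, `StandInHandler` and `start_stand_in_server`, and the choice of TCP on the loopback interface instead of UDP or a Unix socket are all made up for the example. The stand-in server is written in Python purely so the sketch runs on its own; in practice that role would be played by a JVM process that stays up and does the actual parsing.

```python
import socket
import socketserver
import threading


class StandInHandler(socketserver.StreamRequestHandler):
    """Plays the role of the long-running JVM daemon (hypothetical protocol)."""

    def handle(self):
        # One request per line (a file path), one reply per line.
        for raw in self.rfile:
            path = raw.decode("utf-8").rstrip("\n")
            # A real daemon would parse the file here; this only echoes a marker.
            self.wfile.write(f"parsed:{path}\n".encode("utf-8"))
            self.wfile.flush()


def start_stand_in_server():
    """Start the stand-in daemon on a free localhost port and return it."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), StandInHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


class ParserClient:
    """Keeps one connection open so each request skips process startup."""

    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
        self.reader = self.sock.makefile("r", encoding="utf-8", newline="\n")

    def parse(self, path):
        # Send the path, then block until the daemon answers with one line.
        self.sock.sendall((path + "\n").encode("utf-8"))
        return self.reader.readline().rstrip("\n")

    def close(self):
        self.reader.close()
        self.sock.close()


if __name__ == "__main__":
    server = start_stand_in_server()
    host, port = server.server_address
    client = ParserClient(host, port)
    for name in ["a.xlsx", "b.xlsx"]:
        print(client.parse(name))
    client.close()
    server.shutdown()
    server.server_close()
```

The point of the design is that the startup cost is paid once by the daemon rather than once per file, so for 10k+ files the per-file overhead is just a local round trip.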