Problem description
I'm using Tika, and I noticed that the server jar file is downloaded again on every run and placed in the Temp folder:
Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar to C:\Users\asus\AppData\Local\Temp\tika-server.jar.
Retrieving http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-server/1.19/tika-server-1.19.jar.md5 to C:\Users\asus\AppData\Local\Temp\tika-server.jar.md5.
The problem is that the jar file is around 60 MB, which takes some time to download.
This is the code I'm using:
from tika import parser

def get_pdf_text(path):
    # parser.from_file returns a dict; the extracted text is under 'content'
    parsed = parser.from_file(path)
    return parsed['content']
The only workaround I found is this:
1- Run java -jar tika-server-x.x.jar --port xxxx
2- Set tika.TikaClientOnly = True
3- Call parser.from_file(path, '/path/to/server')
But I don't want to run the jar file manually. It would be better if I could use Python to start the jar file automatically and set up tika with it, without downloading it again.
Recommended answer
To resolve this, set the TIKA_SERVER_JAR environment variable to the path of the folder that contains the Tika server jar file:
TIKA_SERVER_JAR = 'PATH_OF_FOLDER_CONTAINING_TIKA_SERVER_JAR'
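If you would rather not touch the system settings, the variable can also be set from Python itself. The following is only a minimal sketch: the folder C:\tika and the helper configure_tika are hypothetical names made up for illustration, and it assumes that tika-python reads TIKA_SERVER_JAR when the tika module is first imported, so the variable has to be set before that import. Depending on the tika-python version, the expected value may be a folder, a full path to the jar, or a file:// URL, so check the README of the version you have installed.

```python
import os

# Hypothetical folder that already contains the Tika server jar.
TIKA_JAR_FOLDER = r"C:\tika"

def configure_tika(folder, env=None):
    """Store the jar location in TIKA_SERVER_JAR (defaults to os.environ)."""
    env = os.environ if env is None else env
    env["TIKA_SERVER_JAR"] = folder
    return env

def get_pdf_text(path):
    # Set the variable first, then import tika, on the assumption that
    # tika reads its environment variables at import time.
    configure_tika(TIKA_JAR_FOLDER)
    from tika import parser
    parsed = parser.from_file(path)
    return parsed["content"]
```

If tika has already been imported elsewhere in the process, the late import inside get_pdf_text will not re-read the variable, so in that case call configure_tika at the very top of the program instead.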