Question
I am using a standalone cluster on my local Windows machine and trying to load data from one of our servers using the following code:
from pyspark.sql import SQLContext

# `sc` is the SparkContext created automatically by the pyspark shell
sqlContext = SQLContext(sc)
# Spark 1.3-era API for reading a database table over JDBC
df = sqlContext.load(source="jdbc", url="jdbc:postgresql://host/dbname", dbtable="schema.tablename")
I have set SPARK_CLASSPATH as:
import os

# Raw string avoids Windows backslashes being treated as escape sequences
os.environ['SPARK_CLASSPATH'] = r"C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar"
While executing sqlContext.load, it throws an error: "No suitable driver found for jdbc:postgresql". I have searched the web but have not been able to find a solution.
Recommended answer
I had the same problem with MySQL and was never able to get it to work with the SPARK_CLASSPATH approach. However, I did get it to work with extra command-line arguments; see the answer to this question.
To avoid having to click through, here is what you have to do:
pyspark --conf spark.executor.extraClassPath=<jdbc.jar> --driver-class-path <jdbc.jar> --jars <jdbc.jar> --master <master-URL>
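For example, on Windows with the jar from the question, the launch command might look like the following minimal sketch (the jar path is the asker's; the master URL and connection details are placeholders you should adjust to your setup):

pyspark --conf spark.executor.extraClassPath=C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --driver-class-path C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --jars C:\Users\ACERNEW3\Desktop\Spark\spark-1.3.0-bin-hadoop2.4\postgresql-9.2-1002.jdbc3.jar --master local[*]

Once the shell starts with the driver jar on both the driver and executor classpaths, the original load call should be able to find the PostgreSQL driver:

# Inside the pyspark shell launched above; `sc` is provided by the shell.
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)
# Spark 1.3-era API, matching the question; on Spark 1.4+ the equivalent is
# sqlContext.read.format("jdbc").options(url=..., dbtable=...).load()
df = sqlContext.load(source="jdbc", url="jdbc:postgresql://host/dbname", dbtable="schema.tablename")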