Question
I would like to capture the output of show() in PySpark as a string, similar to what has been done in other posts. I could only find solutions for Scala, not for PySpark.
df.show()
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+
My ultimate goal is to capture this table as a string so that I can pass it to logger.info. I tried logger.info(df.show()), but that only prints the table to the console.
Recommended answer
You can build a helper function using the same approach as in the post you linked, Capturing the result of explain() in pyspark. If you examine the source code for show(), you will see that it calls self._jdf.showString() and prints the returned string.
The answer depends on which version of Spark you are using, because the number of arguments to show() has changed over time.
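If you would rather not call the private _jdf attribute at all, one alternative is to redirect Python's standard output while show() runs. The sketch below assumes that show() writes its table through Python's print(), which is how the PySpark source referenced above behaves. capture_show and FakeDataFrame are hypothetical names used only for illustration; FakeDataFrame stands in for a real DataFrame so the snippet runs without a Spark session.

```python
import io
from contextlib import redirect_stdout


def capture_show(df, *args, **kwargs):
    """Return whatever df.show(*args, **kwargs) prints, as a string."""
    buffer = io.StringIO()
    # Anything printed to sys.stdout inside this block lands in the buffer.
    with redirect_stdout(buffer):
        df.show(*args, **kwargs)
    return buffer.getvalue()


class FakeDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame (illustration only)."""

    def show(self, n=20):
        print("+----+\n| age|\n+----+")


text = capture_show(FakeDataFrame())
```

Because the arguments are forwarded unchanged to show(), this variant does not need to know which parameters a given Spark version accepts. The trade-off is that it relies on show() printing from Python rather than from the JVM.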
Spark Version 2.3 and above
In version 2.3, the vertical argument was added.
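def getShowString(df, n=20, truncate=True, vertical=False):
    # truncate=True means "use the default column width of 20".
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20, vertical)
    else:
        # An integer sets the width; False becomes 0, which disables truncation.
        return df._jdf.showString(n, int(truncate), vertical)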
Spark Versions 1.5 through 2.2
Starting with version 1.5, the truncate argument was added.
def getShowString(df, n=20, truncate=True):
    if isinstance(truncate, bool) and truncate:
        return df._jdf.showString(n, 20)
    else:
        return df._jdf.showString(n, int(truncate))
Spark Versions 1.3 through 1.4
The show function was first introduced in version 1.3.
def getShowString(df, n=20):
    return df._jdf.showString(n)
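If the same code has to run against several Spark versions, the three variants above could be folded into one helper that chooses the argument list from a version string. This is only a sketch: show_string_args and get_show_string are hypothetical names, the version boundaries are simply the ones listed in this answer, and the version string is assumed to come from something like spark.version.

```python
def show_string_args(spark_version, n=20, truncate=True, vertical=False):
    """Build the positional arguments for df._jdf.showString()."""
    # Only major.minor matter here, e.g. "2.4.0" -> (2, 4).
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    # True means the default width of 20; False becomes 0 (no truncation).
    width = 20 if isinstance(truncate, bool) and truncate else int(truncate)
    if (major, minor) >= (2, 3):
        return (n, width, vertical)
    if (major, minor) >= (1, 5):
        return (n, width)
    return (n,)


def get_show_string(df, spark_version, **kwargs):
    """Call the private showString() with arguments suited to spark_version."""
    return df._jdf.showString(*show_string_args(spark_version, **kwargs))
```

Keeping the argument selection in a separate pure function makes the version logic easy to check without a running Spark session.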
Now use the helper function as follows:
x = getShowString(df) # default arguments
print(x)
#+----+-------+
#| age|   name|
#+----+-------+
#|null|Michael|
#|  30|   Andy|
#|  19| Justin|
#+----+-------+
Or, in your case:
logger.info(getShowString(df))
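One thing to keep in mind is that building the string makes Spark compute the rows even when the INFO level is switched off, because the argument is evaluated before logger.info is called. A possible guard, sketched with the standard logging module, is shown below. log_show, its render parameter, and the logger name are hypothetical; render would be a function such as getShowString.

```python
import logging

logger = logging.getLogger("show_demo")  # hypothetical logger name


def log_show(df, render, level=logging.INFO):
    """Log render(df) at the given level, skipping the work if it is disabled."""
    if logger.isEnabledFor(level):
        # render(df) is only evaluated when the message will actually be handled.
        logger.log(level, "\n%s", render(df))
        return True
    return False
```

A call would then look like log_show(df, getShowString). The leading newline in the format string keeps the table's first row from sharing a line with the log prefix.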