Question
I would like to add a string to an existing column. For example, df['col1'] has the values '1', '2', '3', etc., and I would like to concatenate the string '000' on the left of col1 so that I get a column (a new one or a replacement for the old one, it doesn't matter) with the values '0001', '0002', '0003'.
I thought I should use df.withColumn('col1', '000'+df['col1']), but of course it does not work. Is that because PySpark DataFrames are immutable?
This should be an easy task, but I didn't find anything online. I hope someone can help!
Thanks!
Recommended answer
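from pyspark.sql.functions import concat, col, lit
# Join two string columns with a space between them; lit(" ") wraps the
# Python string as a literal Column. Assumes df has firstname and lastname columns.
df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)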
+------------------------------+
|concat(firstname,  , lastname)|
|                Emanuel Panton|
|              Eloisa Cayouette|
|                   Cathi Prins|
|             Mitchel Mozdzierz|
|               Angla Hartzheim|
+------------------------------+
only showing top 5 rows
http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions