Question
I would like to add a string to an existing column. For example, df['col1'] has values '1', '2', '3', etc., and I would like to concatenate the string '000' on the left of col1, so I get a column (new or replacing the old one, it doesn't matter) with values '0001', '0002', '0003'.
I thought I should use df.withColumn('col1', '000' + df['col1']), but of course it does not work, since PySpark DataFrames are immutable?
This should be an easy task, but I didn't find anything online. Hope someone can give me some help!
Thanks!
Answer
from pyspark.sql.functions import concat, col, lit
df.select(concat(col("firstname"), lit(" "), col("lastname"))).show(5)
+------------------------------+
|concat(firstname, , lastname)|
+------------------------------+
| Emanuel Panton|
| Eloisa Cayouette|
| Cathi Prins|
| Mitchel Mozdzierz|
| Angla Hartzheim|
+------------------------------+
only showing top 5 rows
http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#module-pyspark.sql.functions