Problem description
I am new to PySpark. I have received a CSV file with around 1000 columns, and I am using Databricks. Most of the column names contain spaces, e.g. "Total Revenue", "Total Age", etc. I need to rename all of the columns so that each space is replaced with an underscore ('_').
I have tried:
foreach(DataColumn c in cloned.Columns) c.ColumnName = String.Join("_", c.ColumnName.Split());
but it didn't work in PySpark on Databricks.
Recommended answer
I would use select in conjunction with a list comprehension:
from pyspark.sql import functions as F
renamed_df = df.select([F.col(col).alias(col.replace(' ', '_')) for col in df.columns])
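As a possible variation on the same idea, the new names can be computed first and then applied in one call with DataFrame.toDF, which accepts the full list of new column names. The sketch below is only an illustration under a few assumptions: the helper names underscore_names and rename_columns are hypothetical, runs of whitespace (not just single spaces) are collapsed into one underscore, and an error is raised if two columns would end up with the same name. The first helper is plain Python, so it can be checked without a Spark session.

```python
import re


def underscore_names(columns):
    """Return the names with each run of whitespace replaced by one underscore.

    Hypothetical helper: collapsing runs of whitespace is an assumption,
    the original answer only replaces single space characters.
    """
    new_names = [re.sub(r"\s+", "_", name) for name in columns]
    # Guard against two different originals collapsing to the same name,
    # e.g. "a b" and "a_b" would both become "a_b".
    if len(set(new_names)) != len(new_names):
        raise ValueError("renaming would produce duplicate column names")
    return new_names


def rename_columns(df):
    """Rename every column of a Spark DataFrame in a single toDF call.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame.
    """
    return df.toDF(*underscore_names(df.columns))
```

With the DataFrame from the question this would be called as renamed_df = rename_columns(df). Either approach builds one projection over all columns, which is usually preferable to renaming 1000 columns one at a time in a loop.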