I have a DataFrame, hash_file, with two columns: VARIABLE and concept_id.

import pandas as pd

hash_file = pd.DataFrame({'VARIABLE':['Tes ','Exam ','Evaluation '],'concept_id': [1,2,3]})


To strip whitespace from the values in both columns, I use the following code:

hash_file['VARIABLE']=hash_file['VARIABLE'].astype(str).str.strip()
hash_file['concept_id']=hash_file['concept_id'].astype(str).str.strip()


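This works, but I can't use this approach because my real DataFrame has more than 150 columns.

Is there a way to strip whitespace from all columns and their values at once, ideally in one line?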


Best answer

Select only the string columns with DataFrame.select_dtypes, then apply Series.str.strip to each of them with DataFrame.apply:

cols = hash_file.select_dtypes(object).columns
hash_file[cols] = hash_file[cols].apply(lambda x: x.str.strip())
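
As a minimal sketch of what the snippet above does, the example below uses a small hypothetical frame (the name df and its values are made up for illustration) that contains a missing value. The string column is created with an explicit object dtype, which is an assumption made so that select_dtypes picks it up regardless of the pandas version's default string dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: an object-dtype string column with a missing
# value, plus an integer column that should be left untouched.
df = pd.DataFrame({
    "VARIABLE": pd.Series(["Tes ", " Exam", np.nan], dtype=object),
    "concept_id": [1, 2, 3],
})

# Select only the object-dtype columns and strip each one.
# Series.str.strip passes missing values through as missing.
cols = df.select_dtypes(include="object").columns
df[cols] = df[cols].apply(lambda s: s.str.strip())

print(df)
print(df.dtypes)
```

Because only the selected columns are reassigned, concept_id keeps its integer dtype instead of being converted to strings.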


If the string columns contain no missing values:

cols = hash_file.select_dtypes(object).columns
hash_file[cols] = hash_file[cols].applymap(lambda x: x.strip())
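
The "no missing values" condition matters because a missing value is normally stored as NaN, which is a float and has no strip method. The sketch below illustrates this and shows one possible guarded variant. The helper strip_if_str and the sample frame are hypothetical, and Series.map per column is used here instead of DataFrame.applymap, which recent pandas releases deprecate in favor of DataFrame.map.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a missing value in the string column.
df = pd.DataFrame({
    "VARIABLE": pd.Series(["Tes ", np.nan, "Evaluation "], dtype=object),
    "concept_id": [1, 2, 3],
})

def strip_if_str(value):
    """Strip strings; return anything else (for example NaN) unchanged."""
    return value.strip() if isinstance(value, str) else value

# Calling .strip() directly on NaN fails, because NaN is a float.
try:
    np.nan.strip()
    nan_strip_failed = False
except AttributeError:
    nan_strip_failed = True

cols = df.select_dtypes(include="object").columns
# Series.map applies the helper element by element within each column.
df[cols] = df[cols].apply(lambda s: s.map(strip_if_str))

print(nan_strip_failed)
print(df)
```

The isinstance check costs a little per element, so when the data is known to have no missing values the plain x.strip() lambda stays the simpler choice.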


Performance:

[9000 rows x 150 columns] (50% string columns)




hash_file = pd.DataFrame({'VARIABLE':['Tes ','Exam ','Evaluation '],'concept_id': [1,2,3]})
hash_file = pd.concat([hash_file] * 3000, ignore_index=True)
hash_file = pd.concat([hash_file] * 75, ignore_index=True, axis=1)




In [14]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].applymap(lambda x: x.strip())
    ...:
338 ms ± 14 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [15]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].apply(lambda x: x.str.strip())
    ...:
368 ms ± 7.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [16]: %%timeit
    ...: cols = hash_file.select_dtypes(object).columns
    ...: hash_file[cols] = hash_file[cols].stack().str.strip().unstack()
    ...:
818 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [17]: %%timeit
    ...: hash_file.astype(str).applymap(lambda x: x.strip())
    ...:
1.09 s ± 21.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [18]: %%timeit
    ...: hash_file.astype(str).apply(lambda x: x.str.strip())
    ...:
1.2 s ± 32.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [19]: %%timeit
    ...: hash_file.astype(str).stack().str.strip().unstack()
    ...:
2 s ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
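
The timings above come from IPython's %%timeit. Outside IPython, a rough comparison could be sketched with the standard-library timeit module, as below. The function names are hypothetical, the frame is deliberately much smaller than the benchmark so the sketch runs quickly, and the numbers it prints depend on the machine and pandas version, so they will not match the figures above.

```python
import timeit

import pandas as pd

# Smaller frame than the benchmark above: 300 rows x 20 columns,
# half of them string columns. The string column is cast to object so
# that select_dtypes finds it (an assumption about dtype defaults).
base = pd.DataFrame({"VARIABLE": ["Tes ", "Exam ", "Evaluation "],
                     "concept_id": [1, 2, 3]}).astype({"VARIABLE": object})
frame = pd.concat([base] * 100, ignore_index=True)
frame = pd.concat([frame] * 10, ignore_index=True, axis=1)

def strip_string_columns(df):
    """Strip only the object-dtype columns; other columns keep their dtype."""
    out = df.copy()
    cols = out.select_dtypes(include="object").columns
    out[cols] = out[cols].apply(lambda s: s.str.strip())
    return out

def strip_everything_as_str(df):
    """Cast every column to str first, then strip all of them."""
    return df.astype(str).apply(lambda s: s.str.strip())

# Take the best of three single runs for each variant.
t_selected = min(timeit.repeat(lambda: strip_string_columns(frame), number=1, repeat=3))
t_all = min(timeit.repeat(lambda: strip_everything_as_str(frame), number=1, repeat=3))

print(f"string columns only: {t_selected:.4f} s")
print(f"astype(str) on all:  {t_all:.4f} s")
```

Note that the astype(str) variants also turn numeric columns such as concept_id into strings, which is a behavioral difference in addition to the extra work.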

10-07 13:25