
Question

I'm reading large CSV files into pandas, and some of them have string columns whose values run to thousands of characters. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?

Recommended answer

If you can read the whole file into memory, you can use the vectorized string methods on the str accessor:

>>> import pandas as pd
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]
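A self-contained version of the same idea, as a minimal sketch: it builds a small in-memory CSV with io.StringIO in place of the hypothetical toolong.csv, and truncates column b to 100 characters. Passing dtype={"b": str} is an added assumption, not part of the original answer: without it, a column of short digit strings can be parsed as integers, and the .str accessor then raises an AttributeError.

```python
import io

import pandas as pd

MAX_CHARS = 100  # keep only the first 100 characters, as in the question

# Hypothetical CSV standing in for "toolong.csv": row 0 of column b is very long.
csv_text = "a,b,c\n1," + "x" * 5000 + ",2\n3,short,4\n"

# dtype={"b": str} forces b to be read as text, so .str is always available.
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

# Vectorized slice: every string in b is cut to at most MAX_CHARS characters.
df["b"] = df["b"].str[:MAX_CHARS]

print(df["b"].str.len().tolist())  # [100, 5]
```

Strings already shorter than the limit, such as "short", pass through unchanged, because slicing past the end of a string is not an error.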

Also note that you can get a Series of string lengths using

>>> df["b"].str.len()
0    10
Name: b, dtype: int64
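Combining str.len() with a comparison gives a quick check of how many values actually exceed the limit, for example before deciding whether truncation is needed at all. A minimal sketch with made-up in-memory data; the 100-character limit comes from the question:

```python
import io

import pandas as pd

# Made-up data: one value longer than the limit, one shorter.
csv_text = "a,b,c\n1," + "y" * 250 + ",2\n3,short,4\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"b": str})

lengths = df["b"].str.len()  # per-row string lengths
too_long = lengths > 100     # boolean mask of rows over the limit

print(lengths.tolist())      # [250, 5]
print(int(too_long.sum()))   # 1
```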

I was originally wondering whether

>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]

would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
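One way to settle the row-by-row question empirically is to pass a converter that records every value it receives. This is a sketch with a hypothetical truncate function and in-memory data. As far as we can tell, pandas calls the converter once per cell with that cell's text as a str, so the long strings are still read and tokenized, but only the truncated values end up in the DataFrame.

```python
import io

import pandas as pd

calls = []  # every value the converter receives, in order


def truncate(value, limit=5):
    """Hypothetical converter: keep the first `limit` characters of one cell."""
    calls.append(value)
    return value[:limit]


csv_text = "a,b,c\n1,1256378916212378918293,2\n3,9876543210,4\n"
df = pd.read_csv(io.StringIO(csv_text), converters={"b": truncate})

print(df["b"].tolist())  # ['12563', '98765']
print(len(calls))        # 2: one call per data row, each with a single string
```

If len(calls) matched the number of rows and each recorded value was a single string, the converter was applied cell by cell rather than once to the whole column.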


09-05 19:14