Truncating column width in pandas
Question
I'm reading large CSV files into pandas, and some of them have string columns that are thousands of characters long. Is there a quick way to limit the width of a column, i.e. keep only the first 100 characters?
Recommended answer
If you can read the whole file into memory, you can use the str accessor to do the truncation as a vectorized operation:
>>> import pandas as pd
>>> df = pd.read_csv("toolong.csv")
>>> df
   a                       b  c
0  1  1256378916212378918293  2

[1 rows x 3 columns]
>>> df["b"] = df["b"].str[:10]
>>> df
   a           b  c
0  1  1256378916  2

[1 rows x 3 columns]
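If the file is too large to load in one go, the same str slicing can be applied chunk by chunk. The following is only a minimal sketch, not part of the original answer: the helper name read_truncated, its parameters, and the sample data are all made up for illustration, and it assumes the columns to truncate are known in advance.

```python
import io

import pandas as pd


def read_truncated(source, columns, width=100, chunksize=1000):
    """Hypothetical helper: read a CSV in chunks and keep only the
    first `width` characters of each column named in `columns`."""
    pieces = []
    # dtype=str keeps the listed columns as text so the .str accessor applies.
    reader = pd.read_csv(source, chunksize=chunksize,
                         dtype={col: str for col in columns})
    for chunk in reader:
        for col in columns:
            chunk[col] = chunk[col].str[:width]
        pieces.append(chunk)
    return pd.concat(pieces, ignore_index=True)


# Made-up sample data standing in for a large file on disk.
sample = io.StringIO(
    "a,b,c\n"
    "1,1256378916212378918293,2\n"
    "3,short,4\n"
    "5,abcdefghijklmnopqrstuvwxyz,6\n"
)
df = read_truncated(sample, columns=["b"], width=10, chunksize=2)
print(df)
```

With this approach only one chunk of untruncated strings is held at a time, although the final concatenated frame still has to fit in memory.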
Also note that you can get a Series containing the length of each string using str.len():
>>> df["b"].str.len()
0    10
Name: b, dtype: int64
I was originally wondering whether
>>> pd.read_csv("toolong.csv", converters={"b": lambda x: x[:5]})
   a      b  c
0  1  12563  2

[1 rows x 3 columns]
would be better, but I don't actually know whether the converters are called row by row or after the fact on the whole column.
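One way to check how a converter is invoked is to record what it receives. This is a rough sketch rather than a definitive answer: the truncate function, the calls list, and the in-memory sample data are invented for illustration, and the behaviour observed may depend on the pandas version and parser engine in use.

```python
import io

import pandas as pd

calls = []  # records every argument the converter is given


def truncate(value, width=5):
    """Hypothetical converter: log the raw value, then keep its first `width` characters."""
    calls.append(value)
    return value[:width]


# Made-up two-row sample standing in for toolong.csv.
sample = io.StringIO(
    "a,b,c\n"
    "1,1256378916212378918293,2\n"
    "3,abcdefghijklmnop,4\n"
)
df = pd.read_csv(sample, converters={"b": truncate})

print(df)
print(len(calls), "converter calls:", calls)
```

If the converter is called once per cell, calls will hold one raw string per data row; a single call receiving the whole column would show up as one entry instead.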