Problem description
I have a pandas DataFrame in Python 3.x in which certain columns hold strings that are expressed as bytes (as in Python 2.x):
import pandas as pd
df = pd.DataFrame(...)
df
COLUMN1 ....
0 b'abcde' ....
1 b'dog' ....
2 b'cat1' ....
3 b'bird1' ....
4 b'elephant1' ....
When I access the column with df.COLUMN1, I see Name: COLUMN1, dtype: object.
However, if I access a single element, it is a "bytes" object:
df.COLUMN1.iloc[0].dtype
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'bytes' object has no attribute 'dtype'
How do I convert these into "regular" strings? That is, how can I get rid of the b'' prefix?
Recommended answer
You can use the vectorised str.decode to decode byte strings into ordinary strings:
df['COLUMN1'].str.decode("utf-8")
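Note that str.decode returns a new Series rather than modifying the column in place, so the result has to be assigned back. A minimal sketch of the single-column case follows; the three-row frame is made up for illustration and only mirrors the COLUMN1 values from the question:

```python
import pandas as pd

# Hypothetical sample data, mirroring the bytes values shown in the question.
df = pd.DataFrame({"COLUMN1": [b"abcde", b"dog", b"cat1"]})

# Decode every bytes element to str and assign the result back to the column.
df["COLUMN1"] = df["COLUMN1"].str.decode("utf-8")

print(df["COLUMN1"].tolist())
```

After the assignment, each element is an ordinary str, so the b'' prefix no longer appears when the frame is printed.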
To do this for multiple columns, you can select just the object-dtype columns (the ones that can hold bytes values):
str_df = df.select_dtypes([object])
Convert them all at once:
str_df = str_df.stack().str.decode('utf-8').unstack()
You can then replace the original df columns with the converted ones:
for col in str_df:
    df[col] = str_df[col]
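The approach above assumes that every object-dtype column contains only bytes; str.decode is not designed for values that are already str. As an alternative sketch for mixed frames, the helper below (decode_bytes_columns is a hypothetical name, not part of pandas) decodes element by element and leaves non-bytes values untouched. The sample frame is made up for illustration.

```python
import pandas as pd


def decode_bytes_columns(frame, encoding="utf-8"):
    """Return a copy of `frame` with bytes values in object columns decoded to str.

    Hypothetical helper for illustration; values that are not bytes pass through.
    """
    out = frame.copy()
    for col in out.columns:
        # Only object-dtype columns can hold Python bytes objects.
        if pd.api.types.is_object_dtype(out[col]):
            out[col] = out[col].map(
                lambda v: v.decode(encoding) if isinstance(v, bytes) else v
            )
    return out


# Made-up sample: two bytes columns, one integer column, one column of ordinary strings.
df = pd.DataFrame(
    {
        "COLUMN1": [b"abcde", b"dog"],
        "COLUMN2": [b"cat1", b"bird1"],
        "N": [1, 2],
        "NAME": ["x", "y"],
    }
)

result = decode_bytes_columns(df)
print(result)
```

This trades the vectorised stack/unstack round trip for a plain Python loop over elements, which is slower on large frames but does not depend on every selected column holding bytes.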