python - Converting a pandas df with a mixed-type column to an R data.frame using rpy2

If I have a pandas DataFrame in which one column contains both character and numeric data, for example:

import pandas as pd

d = {'one' : pd.Series(['cat', 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

%%R -i df
str(df)

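then, when it is converted to an R data.frame with rpy2, every value in the mixed column gets its own column in the R data.frame, filled with that same value repeated. The code above produces a data.frame with 5 columns instead of 2: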

'data.frame':   4 obs. of  5 variables:
 $ one.a: chr  "cat" "cat" "cat" "cat"
 $ one.b: num  2 2 2 2
 $ one.c: num  3 3 3 3
 $ one.d: num  NaN NaN NaN NaN
 $ two  : num  1 2 3 4

Is this the expected behavior? If so, why?

(I am using Jupyter Notebook 5.0.0 running on Python 3.5.4 | Anaconda custom (64-bit) | Windows 10, with rpy2 2.9.1.)

Thanks.

Best answer

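It looks like rpy2 is producing an R ListVector rather than an R StrVector for that column. I ran into a similar problem when converting a data frame in which some columns contained both strings and NaN. I am using pandas2ri.py2ri() rather than the R magic, but I believe the two behave similarly.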
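The pandas side of this can be checked without rpy2 at all. The sketch below only inspects the question's frame: it shows that the mixed column has dtype `object` and holds more than one Python type. The name `element_types` is made up for illustration, and the link between an `object` column and the ListVector result is my assumption, not something confirmed from the rpy2 source.

```python
import pandas as pd

# Same frame as in the question.
d = {'one': pd.Series(['cat', 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Column dtypes as pandas sees them.
print(df.dtypes)

# Hypothetical helper mapping: which Python types occur in each column.
# The missing value in 'one' (row 'd') is a float NaN, so it counts as float.
element_types = {name: sorted({type(v).__name__ for v in df[name]})
                 for name in df.columns}
print(element_types)
```

A column that reports more than one element type here is a candidate for the behavior described in the question.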

The following code seems to work.

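# Imports needed by the code below
import pandas
import rpy2.robjects as robjects
import rpy2.rlike.container as rlc
from rpy2.robjects import pandas2ri

# Your code with pandas2ri instead of rmagic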
d = {'one' : pandas.Series(['cat', 2., 3.], index=['a', 'b', 'c']),
     'two' : pandas.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pandas.DataFrame(d)

print(df)

pandas2ri.activate()
r_df = pandas2ri.py2ri(df)
print(r_df)

# convert each column individually to see the vector type in R
r_df1 = pandas2ri.py2ri(df['one'])
print(type(r_df1))

r_df2 = pandas2ri.py2ri(df['two'])
print(type(r_df2))

# Make an rpy2 OrdDict, be sure to include the None to avoid indexing problems
od = rlc.OrdDict([('one', robjects.StrVector(['cat', 2., 3., None])),
                  ('two', robjects.FloatVector([1., 2., 3., 4.]))])

# make a vector for rownames
od_rownames = robjects.StrVector(['a', 'b', 'c', 'd'])

# convert the OrdDict to an r dataframe and assign rownames
od_df = robjects.DataFrame(od)
print(od_df)

od_df.rownames = od_rownames
print(od_df)
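
As an alternative to building the OrdDict by hand, the mixed column could be turned into strings on the pandas side before the frame is handed to rpy2. This is only a sketch: `stringify_mixed_columns` is a hypothetical helper, not part of pandas or rpy2, and I have not verified how every rpy2 version converts the result, so check the output of `str(df)` in R afterwards.

```python
import pandas as pd

def stringify_mixed_columns(frame):
    """Return a copy in which every object-dtype column holds only str or None.

    Hypothetical helper for illustration; not part of pandas or rpy2.
    """
    out = frame.copy()
    for name in out.columns:
        if out[name].dtype == object:
            # Keep missing values missing, cast everything else to str.
            out[name] = out[name].map(
                lambda v: None if pd.isna(v) else str(v))
    return out

d = {'one': pd.Series(['cat', 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

clean = stringify_mixed_columns(df)
print(clean)
```

The cleaned frame can then be passed to `pandas2ri.py2ri(clean)` or to `%%R -i clean` in place of the original. Note that the numbers in the mixed column become the strings '2.0' and '3.0', which matches what a character column in R would hold anyway.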

A similar question about converting a pandas df with a mixed-type column to an R data.frame using rpy2 can be found on Stack Overflow: https://stackoverflow.com/questions/51394678/