Inserting a list into a cell: why does loc actually work here?

Problem description

We are aware that the standard method of setting a single cell is to use at or iat. However, I noticed some interesting behaviour, and I was wondering if anyone could rationalise it.

While solving this question, I came across some weird behaviour of loc.

# Setup.
import pandas as pd

pd.__version__
# '0.24.0rc1'

df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df
    A       B
0  12  [a, b]
1  23  [c, d]

To set cell (1, 'B'), it suffices to use at, as in df.at[1, 'B'] = .... But with loc, I initially tried this, which did not work:

df.loc[1, 'B'] = ['m', 'n', 'o', 'p'] 
# ValueError: Must have equal len keys and value when setting with an iterable

So I tried this (which also failed):

df.loc[1, 'B'] = [['m', 'n', 'o', 'p']]
# ValueError: Must have equal len keys and value when setting with an ndarray

I thought loc would also somehow be able to take nested lists here. In a bizarre turn of events, this code worked:

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df

    A             B
0  12        [a, b]
1  23  [m, n, o, p]

Why does loc work this way? Additionally, if you add another element to any of the lists, it fails:

df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p', 'q']]
# ValueError: Must have equal len keys and value when setting with an iterable

Empty lists don't work either. It seems pointless to have to nest each element in its own list.

Why does loc do this? Is this documented behaviour, or a bug?

Recommended answer

This occurs because loc does a lot of checking for the many use cases it supports. (Note: historically, loc and iloc were created to remove the ambiguity of ix, back in 2013 with v0.11, but even today there is still a lot of ambiguity in loc.)

In this case, df.loc[1, 'B'] can return any of:

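  • a single element (as is the case here, where 1 and 'B' each appear only once in the index and columns).
  • a Series (if one of 1 or 'B' appears multiple times in the index or columns).
  • a DataFrame (if both 1 and 'B' appear multiple times in the index and columns).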
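
The three cases above can be sketched as follows. The frames with duplicated labels (dup_index, dup_both) are made up for illustration and are not part of the original question:

```python
import pandas as pd

# Unique index and columns: loc returns the single cell (here, a list).
unique = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
single = unique.loc[1, 'B']

# Hypothetical frame where the index label 1 appears twice: loc returns a Series.
dup_index = pd.DataFrame({'B': [10, 20]}, index=[1, 1])
as_series = dup_index.loc[1, 'B']

# Hypothetical frame where both the index label and the column label repeat:
# loc returns a DataFrame.
dup_both = pd.DataFrame([[1, 2], [3, 4]], index=[1, 1], columns=['B', 'B'])
as_frame = dup_both.loc[1, 'B']

print(type(single).__name__, type(as_series).__name__, type(as_frame).__name__)
```

The same df.loc[1, 'B'] expression selects a differently shaped target in each frame, which is the ambiguity the assignment code has to handle.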

Aside: iloc suffers from the same issue in this case, even though it is always going to be the first case, but that may be because loc and iloc share this assignment code.

So pandas needs to support all of those cases for assignment!

An early part of the assignment logic converts the list (of lists) into a numpy array:

In [11]: np.array(['m', 'n', 'o', 'p']).shape
Out[11]: (4,)

In [12]: np.array([['m', 'n', 'o', 'p']]).shape
Out[12]: (1, 4)
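
For comparison, here is a small sketch that applies the same conversion to each input tried in the question, plus a variant that forces dtype=object. The variable names are illustrative, and this only shows what NumPy does with the values; it does not reproduce the internal pandas code path:

```python
import numpy as np

flat = np.array(['m', 'n', 'o', 'p'])                    # shape (4,)
wrapped = np.array([['m', 'n', 'o', 'p']])               # shape (1, 4)
per_item = np.array([['m'], ['n'], ['o'], ['p']])        # shape (4, 1)

# Forcing dtype=object does not help: NumPy still unpacks the inner list
# into a second dimension rather than storing it as one element.
forced = np.array([['m', 'n', 'o', 'p']], dtype=object)  # shape (1, 4)

for name, arr in [('flat', flat), ('wrapped', wrapped),
                  ('per_item', per_item), ('forced', forced)]:
    print(name, arr.shape, arr.dtype)
```

None of these is a one-element array whose single element is the whole list.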

So you can't just pass the list of lists and expect to get the right array. Instead, you could explicitly set it into an object array:

In [13]: a = np.empty(1, dtype=object)

In [14]: a[0] = ['m', 'n', 'o', 'p']

In [15]: a
Out[15]: array([list(['m', 'n', 'o', 'p'])], dtype=object)

Now you can use this in the assignment:

In [16]: df.loc[0, 'B'] = a

In [17]: df
Out[17]:
    A             B
0  12  [m, n, o, p]
1  23        [c, d]

It's still not ideal, but to reiterate, there are so many edge cases in loc and iloc that the solution is to be as explicit as possible to avoid them (use at here). And more generally, as you know, avoid using lists inside a DataFrame!
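
As a minimal sketch of the explicit route, here is at used on the frame from the question. It assumes column 'B' has object dtype, which is the case when it holds lists:

```python
import pandas as pd

df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})

# at addresses exactly one cell, so the list is stored as a single value
# instead of being treated as several values to spread over a selection.
df.at[1, 'B'] = ['m', 'n', 'o', 'p']

print(df)
```

Because at only ever targets a single cell, there is no scalar, Series, or DataFrame ambiguity for the assignment to resolve.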
