Question
We are aware that the standard method of setting a single cell is using at or iat. However, I noticed some interesting behaviour that I was hoping someone could rationalise.
While solving this question, I came across some weird behaviour of loc.
# Setup.
import pandas as pd

pd.__version__
# '0.24.0rc1'
df = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
df
    A       B
0  12  [a, b]
1  23  [c, d]
To set cell (1, 'B'), it suffices to use at, like df.at[1, 'B'] = ... But with loc, I initially tried this, which did not work:
df.loc[1, 'B'] = ['m', 'n', 'o', 'p']
# ValueError: Must have equal len keys and value when setting with an iterable
So I tried this, which also failed:
df.loc[1, 'B'] = [['m', 'n', 'o', 'p']]
# ValueError: Must have equal len keys and value when setting with an ndarray
I thought loc would also somehow be able to take nested lists here. In a bizarre turn of events, this code worked:
df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p']]
df
    A             B
0  12        [a, b]
1  23  [m, n, o, p]
Why does loc work this way? Additionally, if you add another element to any of the lists, it flops:
df.loc[1, 'B'] = [['m'], ['n'], ['o'], ['p', 'q']]
# ValueError: Must have equal len keys and value when setting with an iterable
Empty lists don't work either. It seems pointless to have to nest each element in its own list.
Why does loc do this? Is this documented behaviour, or a bug?
Recommended answer
This occurs because loc does a bunch of checking for all the many use cases it supports. (Note: historically, loc and iloc were created to remove the ambiguity of ix, way back in 2013 in v0.11, but even today there is still a lot of ambiguity in loc.)
In this case df.loc[1, 'B'] can return either:
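- a single element (as in this case, when the index/column labels 1/'B' are unique);
- a Series (if one of 1/'B' appears in the index/columns multiple times);
- a DataFrame (if both 1 and 'B' appear in the index/columns multiple times).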
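The three cases above can be reproduced with a small sketch. The frames dup_rows and dup_both are made up for illustration and are not part of the original question; only the first frame matches the question's setup.

```python
import pandas as pd

# Unique index and columns: loc returns the single cell value.
unique = pd.DataFrame({'A': [12, 23], 'B': [['a', 'b'], ['c', 'd']]})
cell = unique.loc[1, 'B']

# Row label 1 appears twice (illustrative frame): loc returns a Series.
dup_rows = pd.DataFrame({'A': [12, 23, 34], 'B': ['x', 'y', 'z']},
                        index=[0, 1, 1])
rows = dup_rows.loc[1, 'B']

# Row label 1 and column label 'B' both appear twice (illustrative frame):
# loc returns a DataFrame.
dup_both = pd.DataFrame([[1, 2], [3, 4], [5, 6]],
                        index=[0, 1, 1], columns=['B', 'B'])
block = dup_both.loc[1, 'B']

print(type(cell).__name__, type(rows).__name__, type(block).__name__)
```

The same indexing expression yields three different kinds of result depending only on label uniqueness, so the setter cannot tell from the key alone whether a list value is meant as one cell or as several.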
Aside: iloc suffers from the same issue in this case, even though it is always going to be the first case, but that may be because loc and iloc share this assignment code.
So pandas needs to support all of those cases for assignment!
An early part of the assignment logic converts the list (of lists) into a numpy array:
In [11]: np.array(['m', 'n', 'o', 'p']).shape
Out[11]: (4,)
In [12]: np.array([['m', 'n', 'o', 'p']]).shape
Out[12]: (1, 4)
So you can't just pass the list of lists and expect to get the right array. Instead, you could explicitly set it into an object array:
In [13]: a = np.empty(1, dtype=object)
In [14]: a[0] = ['m', 'n', 'o', 'p']
In [15]: a
Out[15]: array([list(['m', 'n', 'o', 'p'])], dtype=object)
Now you can use this in the assignment:
In [16]: df.loc[0, 'B'] = a
In [17]: df
Out[17]:
    A             B
0  12  [m, n, o, p]
1  23        [c, d]
It's still not ideal, but to reiterate, there are so many edge cases in loc and iloc that the solution is to be as explicit as possible to avoid them (use at here). And more generally, as you know, avoid using lists inside a DataFrame!