I have this CSV file:
index, empno, ename, job, mgr, hiredate, sal, comm, deptno
0, 7839, KING, PRESIDENT, 0, 1981-11-17, 5000, 0, 10
1, 7698, BLAKE, MANAGER, 7839, 1981-05-01, 2850, 0, 30
2, 7782, CLARK, MANAGER, 7839, 1981-05-09, 2450, 0, 10
3, 7566, JONES, MANAGER, 7839, 1981-04-01, 2975, 0, 20
4, 7654, MARTIN, SALESMAN, 7698, 1981-09-10, 1250, 1400, 30
5, 7499, ALLEN, SALESMAN, 7698, 1981-02-11, 1600, 300, 30
6, 7844, TURNER, SALESMAN, 7698, 1981-08-21, 1500, 0, 30
7, 7900, JAMES, CLERK, 7698, 1981-12-11, 950, 0, 30
8, 7521, WARD, SALESMAN, 7698, 1981-02-23, 1250, 500, 30
9, 7902, FORD, ANALYST, 7566, 1981-12-11, 3000, 0, 20
10, 7369, SMITH, CLERK, 7902, 1980-12-09, 800, 0, 20
11, 7788, SCOTT, ANALYST, 7566, 1982-12-22, 3000, 0, 20
12, 7876, ADAMS, CLERK, 7788, 1983-01-15, 1100, 0, 20
13, 7934, MILLER, CLERK, 7782, 1982-01-11, 1300, 0, 10
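To reproduce the problem without the file on D:, the data can be loaded from a string. This is a minimal sketch, assuming the header row is comma-separated like the data rows and that every row has nine fields; csv_text is a hypothetical name. skipinitialspace=True strips the blank after each comma, so a value such as JONES compares equal to 'JONES' instead of ' JONES'.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of emp.csv (header assumed comma-separated)
csv_text = """index, empno, ename, job, mgr, hiredate, sal, comm, deptno
0, 7839, KING, PRESIDENT, 0, 1981-11-17, 5000, 0, 10
1, 7698, BLAKE, MANAGER, 7839, 1981-05-01, 2850, 0, 30
2, 7782, CLARK, MANAGER, 7839, 1981-05-09, 2450, 0, 10
3, 7566, JONES, MANAGER, 7839, 1981-04-01, 2975, 0, 20
4, 7654, MARTIN, SALESMAN, 7698, 1981-09-10, 1250, 1400, 30
5, 7499, ALLEN, SALESMAN, 7698, 1981-02-11, 1600, 300, 30
6, 7844, TURNER, SALESMAN, 7698, 1981-08-21, 1500, 0, 30
7, 7900, JAMES, CLERK, 7698, 1981-12-11, 950, 0, 30
8, 7521, WARD, SALESMAN, 7698, 1981-02-23, 1250, 500, 30
9, 7902, FORD, ANALYST, 7566, 1981-12-11, 3000, 0, 20
10, 7369, SMITH, CLERK, 7902, 1980-12-09, 800, 0, 20
11, 7788, SCOTT, ANALYST, 7566, 1982-12-22, 3000, 0, 20
12, 7876, ADAMS, CLERK, 7788, 1983-01-15, 1100, 0, 20
13, 7934, MILLER, CLERK, 7782, 1982-01-11, 1300, 0, 10
"""

# index_col=0 uses the "index" column as the row labels,
# skipinitialspace=True drops the space after each comma,
# parse_dates turns hiredate into datetime64 values.
emp = pd.read_csv(
    io.StringIO(csv_text),
    index_col=0,
    skipinitialspace=True,
    parse_dates=["hiredate"],
)
print(emp.dtypes)
```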
Using the code below, I load all of emp.csv:
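import pandas as pd

# Load data from the csv file; the first column ("index") becomes the row index
# (DataFrame.from_csv did this by default, but it has been removed from pandas).
# skipinitialspace=True drops the blank after each comma in the file.
emp = pd.read_csv(r"D:\R data\emp.csv", index_col=0, skipinitialspace=True)
# Convert hiredate from string to datetime; the dates are year-month-day
emp['hiredate'] = pd.to_datetime(emp['hiredate'], format='%Y-%m-%d')
jonessal = emp[['sal']][emp['ename']=='JONES']
empename = emp[['ename','sal']][emp['sal'] > jonessal]
print(empename)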
This is the output of the code:
index
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 NaN NaN
9 NaN NaN
10 NaN NaN
11 NaN NaN
12 NaN NaN
13 NaN NaN
The output I want is:
index
0 KING 5000
9 FORD 3000
11 SCOTT 3000
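I expected the variable to be 2975, but the result is NaN. If I hard-code the salary instead of using jonessal, it works, but when I use the variable, everything comes back as NaN.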
Best answer
jonessal is a DataFrame.
emp[['ename','sal']][emp['sal'] > jonessal]
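Here the comparison emp['sal'] > jonessal is not against a scalar. Because one side is a DataFrame, pandas broadcasts the comparison and returns an odd boolean DataFrame instead of a boolean Series. Indexing emp[['ename','sal']] with a boolean DataFrame masks values rather than selecting rows, and since its index and shape do not match, the final result consists entirely of NaN. You are also assuming there is only one employee named JONES. Under the same assumption, you can get the scalar with: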
jonessal = emp.loc[emp['ename']=='JONES', 'sal'].values[0]
(.values returns an array; the [0] relies on the single-employee assumption.) Now it returns the same result as the hard-coded salary:
emp[['ename','sal']][emp['sal'] > jonessal]
Out[81]:
ename sal
0 KING 5000
9 FORD 3000
11 SCOTT 3000
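A self-contained sketch of the diagnosis and the fix, using a hypothetical five-row subset of the data. It shows that emp[['sal']][mask] is a one-row DataFrame rather than a number, and it swaps the answer's .values[0] for Series.item(), which raises a ValueError unless exactly one JONES row exists, so the single-employee assumption is checked instead of silently relied on. .iloc[0] would behave like .values[0].

```python
import pandas as pd

# Hypothetical subset of emp.csv, enough to show the behaviour
emp = pd.DataFrame(
    {
        "ename": ["KING", "JONES", "FORD", "SMITH", "SCOTT"],
        "sal": [5000, 2975, 3000, 800, 3000],
    },
    index=[0, 3, 9, 10, 11],
)

is_jones = emp["ename"] == "JONES"

# Selecting with a list of columns keeps a DataFrame, not a scalar
jonessal_frame = emp[["sal"]][is_jones]
print(type(jonessal_frame).__name__, jonessal_frame.shape)  # DataFrame (1, 1)

# Select a single column as a Series, then extract its only value;
# item() raises ValueError if there is not exactly one JONES
jonessal = emp.loc[is_jones, "sal"].item()
print(jonessal)  # 2975

# Comparing a Series with a scalar gives a boolean Series aligned with emp's rows
result = emp.loc[emp["sal"] > jonessal, ["ename", "sal"]]
print(result)
```

Using .loc with a boolean Series and a column list selects rows and columns in one step, which avoids chained indexing.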
Regarding "python - Why does boolean indexing return all NaN", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38291386/