python - Indexing with iloc

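I'm working through the Kaggle tutorial right now. I understand the basic idea of what it does, but after looking at the output and reading the documentation I think I need to confirm what is happening here: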

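# Old scikit-learn API (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.cross_validation import KFold

# titanic is the DataFrame loaded earlier in the tutorial
predictors = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare", "Embarked"]
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

predictions = []

for train, test in kf:
    train_predictors = titanic[predictors].iloc[train, :]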

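My main question is about the last line, the one that calls iloc. The rest is only there for context. Is it just splitting off the training data?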

Best answer

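.iloc[] is the main way to select rows and columns of a pandas DataFrame by integer position (for a Series, which has only one axis, it selects by position along the index). This is explained well in the Indexing docs.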
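To make the "by position" part concrete, here is a small sketch on made-up toy data (not the Titanic set). The index labels are deliberately different from the row positions, so you can see that .iloc ignores the labels and only looks at positions, while .loc needs the labels to pick the same rows.

```python
import pandas as pd

# Toy data (made up); index labels deliberately differ from row positions
df = pd.DataFrame(
    {"Pclass": [3, 1, 3, 1], "Fare": [7.25, 71.28, 7.92, 53.10]},
    index=[10, 20, 30, 40],
)

# .iloc selects by integer position: rows 0 and 1, all columns
first_two = df.iloc[[0, 1], :]
print(first_two.index.tolist())  # [10, 20]

# .loc selects by label, so it needs 10 and 20 to get the same rows
same_rows = df.loc[[10, 20], :]
print(first_two.equals(same_rows))  # True

# A Series has a single axis, so .iloc takes a single positional indexer
fares = df["Fare"]
print(fares.iloc[[2, 3]].tolist())
```

The ":" in .iloc[rows, :] simply means "all columns", which is why the tutorial line keeps every predictor column.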

As for KFold, from the scikit-learn docs:

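  KFold divides all the samples into k groups of samples, called folds
  (if k = n, this is equivalent to the Leave One Out strategy), of equal
  sizes (if possible). The prediction function is learned using k - 1
  folds, and the fold left out is used for test. Example of 2-fold
  cross-validation on a dataset with 4 samples: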

import numpy as np
from sklearn.cross_validation import KFold  # old API, removed in scikit-learn 0.20

kf = KFold(4, n_folds=2)
for train, test in kf:
    print("%s %s" % (train, test))

# Output (train indices, then test indices):
# [2 3] [0 1]
# [0 1] [2 3]
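If it helps to see what KFold is doing under the hood, here is a minimal NumPy-only sketch of an unshuffled k-fold split. The helper kfold_indices is hypothetical (it is not a scikit-learn function); it assumes no shuffling and, like scikit-learn's unshuffled KFold, gives the first n_samples % n_splits folds one extra sample when the data don't divide evenly. In current scikit-learn the real class lives in sklearn.model_selection and is used as KFold(n_splits=2).split(X).

```python
import numpy as np

def kfold_indices(n_samples, n_splits):
    """Hypothetical helper: yield (train, test) position arrays for an unshuffled k-fold split."""
    indices = np.arange(n_samples)
    # Fold sizes differ by at most one; the first n_samples % n_splits folds get the extra sample
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    fold_sizes[: n_samples % n_splits] += 1
    current = 0
    for size in fold_sizes:
        # The test fold is a contiguous block of positions
        test = indices[current : current + size]
        # Training positions are everything before and after that block
        train = np.concatenate([indices[:current], indices[current + size :]])
        yield train, test
        current += size

for train, test in kfold_indices(4, 2):
    print("%s %s" % (train, test))
```

For 4 samples and 2 folds this prints the same two lines as the scikit-learn example: [2 3] [0 1], then [0 1] [2 3].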

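In other words, KFold produces arrays of row positions. Inside the for loop over kf, the train array is passed to .iloc, which picks out those rows (and all columns) of the titanic[predictors] DataFrame to form the training set.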
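Putting the two pieces together, here is a sketch of the loop from the question on a tiny made-up stand-in for the Titanic DataFrame (the values and index labels are invented for illustration). To keep it free of scikit-learn, it assumes three contiguous folds built with np.array_split, which produces the same fold sizes as an unshuffled KFold; the training positions for each fold are everything not in the test fold.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the tutorial's titanic DataFrame; labels differ from positions
titanic = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 1, 3, 2],
        "Age": [22, 38, 26, 35, 35, 27],
        "Survived": [0, 1, 1, 1, 0, 0],
    },
    index=[101, 102, 103, 104, 105, 106],
)
predictors = ["Pclass", "Age"]

positions = np.arange(len(titanic))
splits = []
for test in np.array_split(positions, 3):  # 3 contiguous folds, like unshuffled KFold
    train = np.setdiff1d(positions, test)  # every position not in the test fold
    # Same pattern as the question: rows chosen by position, all predictor columns
    train_predictors = titanic[predictors].iloc[train, :]
    train_target = titanic["Survived"].iloc[train]
    splits.append((train.tolist(), train_predictors.index.tolist()))
    print(train.tolist(), train_predictors.index.tolist())
```

Each printed line pairs the positions KFold-style splitting would hand to .iloc with the index labels of the rows that .iloc actually returns, so you can see that it really is just slicing out the training rows for that fold.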

A similar question about python - Indexing with iloc can be found on Stack Overflow: https://stackoverflow.com/questions/34200874/