I am trying to fit a TensorForestEstimator model with numerical floating-point data representing 7 features and 7 labels. That is, both features and labels have shape (484876, 7). I set num_features=7 and num_classes=7 accordingly in ForestHParams. The data looks like this:
f1 f2 f3 f4 f5 f6 f7 l1 l2 l3 l4 l5 l6 l7
39000.0 120.0 65.0 1000.0 25.0 0.69 3.94 39000.0 39959.0 42099.0 46153.0 49969.0 54127.0 55911.0
32000.0 185.0 65.0 1000.0 75.0 0.46 2.19 32000.0 37813.0 43074.0 48528.0 54273.0 60885.0 63810.0
30000.0 185.0 65.0 1000.0 25.0 0.41 1.80 30000.0 32481.0 35409.0 39145.0 42750.0 46678.0 48595.0
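For reference, here is a minimal sketch of loading data in this layout into an (N, 7) feature array and an (N, 7) label array with pandas; the file name data.txt and the column names f1..f7 / l1..l7 are assumptions taken from the sample rows above (the reproduction code further down uses its own column names):

import numpy as np
import pandas as pd

def load_arrays(path="data.txt"):
    # Tab-separated file with 7 feature columns followed by 7 label columns.
    data = pd.read_csv(path, sep='\t')
    X = data[["f%d" % i for i in range(1, 8)]].values.astype(np.float32)
    y = data[["l%d" % i for i in range(1, 8)]].values.astype(np.float32)
    return X, y  # both of shape (N, 7)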
When fit() is called, Python crashes with the message below. This is the output with tf.logging.set_verbosity('INFO') enabled:
INFO:tensorflow:training graph for tree: 0
INFO:tensorflow:training graph for tree: 1
...
INFO:tensorflow:training graph for tree: 9998
INFO:tensorflow:training graph for tree: 9999
INFO:tensorflow:Create CheckpointSaverHook.
2017-07-26 10:25:30.908894: F tensorflow/contrib/tensor_forest/kernels/count_extremely_random_stats_op.cc:404] Check failed: column < num_classes_ (39001 vs. 8)
Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
I am not sure what this error means. num_classes=7, not 8, and since both features and labels have shape (484876, 7), I don't see where 39001 comes from. Here is the code to reproduce it:
import numpy as np
import pandas as pd
import os
import tensorflow as tf
# The original snippet omits these imports; the TF 1.x contrib paths below are assumed.
from tensorflow.contrib.tensor_forest.python.tensor_forest import ForestHParams, RandomForestGraphs
from tensorflow.contrib.tensor_forest.client.random_forest import TensorForestEstimator, TensorForestLossHook

def get_training_data():
    training_file = "data.txt"
    data = pd.read_csv(training_file, sep='\t')

    # Features: every column except 'Result', as float32.
    X = np.array(data.drop('Result', axis=1), dtype=np.float32)

    # Labels: parse the bracketed list in 'ResultStr' into a (N, 7) float32 array.
    y = []
    for e in data.ResultStr:
        y.append(list(np.array(str(e).replace('[', '').replace(']', '').split(','))))
    y = np.array(y, dtype=np.float32)

    features = tf.constant(X)
    labels = tf.constant(y)
    return features, labels
hyperparameters = ForestHParams(
    num_trees=100,
    max_nodes=10000,
    bagging_fraction=1.0,
    num_splits_to_consider=0,
    feature_bagging_fraction=1.0,
    max_fertile_nodes=0,
    split_after_samples=250,
    min_split_samples=5,
    valid_leaf_threshold=1,
    dominate_method='bootstrap',
    dominate_fraction=0.99,
    # All parameters above are default
    num_classes=7,
    num_features=7
)

estimator = TensorForestEstimator(
    params=hyperparameters,
    # All parameters below are default
    device_assigner=None,
    model_dir=None,
    graph_builder_class=RandomForestGraphs,
    config=None,
    weights_name=None,
    keys_name=None,
    feature_engineering_fn=None,
    early_stopping_rounds=100,
    num_trainers=1,
    trainer_id=0,
    report_feature_importances=False,
    local_eval=False
)

estimator.fit(
    input_fn=lambda: get_training_data(),
    max_steps=100,
    monitors=[
        TensorForestLossHook(
            early_stopping_rounds=30
        )
    ]
)
Wrapping it in SKCompat does not help either; the same error occurs. What is causing the crash?

Best answer
You need to specify regression=True in ForestHParams, because by default TensorForestEstimator assumes it is being used for a classification problem, which can only output a single value. At estimator initialization an implicit num_outputs variable is created; it is set to 1 if regression is not specified. When regression is specified, num_outputs = num_classes and the checkpoints are saved normally.
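For example, a minimal sketch of the corrected setup, reusing get_training_data() and the assumed TF 1.x contrib imports from the question, with all other hyperparameters left at their defaults:

# Same data and estimator as before, but with regression=True so the seven
# float labels are treated as regression targets rather than class indices.
hyperparameters = ForestHParams(
    num_trees=100,
    max_nodes=10000,
    num_classes=7,   # with regression=True this is the number of outputs
    num_features=7,
    regression=True  # the fix
)

estimator = TensorForestEstimator(params=hyperparameters)
estimator.fit(input_fn=lambda: get_training_data(), max_steps=100)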