python - How to write a TensorFlow KMeans Estimator script for SageMaker

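I'm trying to use TensorFlow's tf.contrib.factorization.KMeansClustering estimator with SageMaker, but I'm running into some trouble. The output of my SageMaker predictor.predict() call doesn't look right. The cluster values are too large, since they should be integers between 0 and 7. (I set the number of clusters to 8.)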

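I get similar output on every run (the second half of the array is filled with 4L, 0L, or some other single number). There are 40 values in the array because that is the number of rows (users and their ratings) that I pass to the predict() function.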

Example:
{'outputs': {u'output': {'int64_val': [6L, 0L, 6L, 1L, 2L, 4L, 5L, 7L, 7L, 7L, 7L, 5L, 0L, 1L, 7L, 3L, 3L, 6L, 7L, 3L, 7L, 2L, 6L, 2L, 3L, 7L, 6L, 3L, 3L, 6L, 1L, 2L, 1L, 3L, 7L, 7L, 7L, 3L, 5L, 7L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L], 'dtype': 9, 'tensor_shape': {'dim': [{'size': 100L}]}}}, 'model_spec': {'signature_name': u'serving_default', 'version': {'value': 1534392971L}, 'name': u'generic_model'}}
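One way to make the mismatch concrete is to compare the response with what was sent: the response shown above reports a tensor of size 100, while 40 rows were passed in. The helper below is only a sketch. `check_cluster_response` is a hypothetical name, and it assumes the response is a dict laid out exactly like the example above.

```python
def check_cluster_response(response, num_rows, num_clusters):
    """Return a list of human-readable problems found in a predict() response."""
    # Assumed layout: response['outputs']['output'] holds the cluster ids.
    output = response['outputs']['output']
    cluster_ids = [int(v) for v in output['int64_val']]
    declared_sizes = [int(d['size']) for d in output['tensor_shape']['dim']]

    problems = []
    # One cluster id is expected per input row.
    if len(cluster_ids) != num_rows:
        problems.append('sent %d rows but received %d cluster ids'
                        % (num_rows, len(cluster_ids)))
    # The declared tensor shape should agree with the number of values.
    if declared_sizes != [len(cluster_ids)]:
        problems.append('tensor_shape %r does not match %d values'
                        % (declared_sizes, len(cluster_ids)))
    # Every id should be a valid cluster index.
    bad = sorted(set(v for v in cluster_ids if not 0 <= v < num_clusters))
    if bad:
        problems.append('cluster ids outside 0..%d: %r' % (num_clusters - 1, bad))
    return problems


# Hypothetical, shortened response with the same layout as the one above.
response = {'outputs': {'output': {'int64_val': [6, 0, 6, 1, 4, 4],
                                   'dtype': 9,
                                   'tensor_shape': {'dim': [{'size': 6}]}}}}
print(check_cluster_response(response, num_rows=4, num_clusters=8))
```

An empty list means the response has one in-range cluster id per input row.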

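The data I'm using is a sparse matrix of item ratings, where rows = users, cols = items, and each cell holds a float between 0.0 and 10. So my input data is a matrix rather than the typical array of features.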
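To illustrate that layout, here is a minimal sketch of a dense users-by-items matrix built with NumPy. The `(user, item, rating)` triple format, the name `build_ratings_matrix`, and the choice of 0.0 for unrated cells are all assumptions made for the example, not details taken from the actual data set.

```python
import numpy as np


def build_ratings_matrix(triples, num_users, num_items):
    """Build a dense users x items float32 matrix from (user, item, rating) triples."""
    # Assumption: cells without a rating are left at 0.0.
    matrix = np.zeros((num_users, num_items), dtype=np.float32)
    for user, item, rating in triples:
        matrix[user, item] = rating
    return matrix


# Hypothetical ratings: user 0 rated item 1, user 2 rated item 0.
triples = [(0, 1, 7.5), (2, 0, 10.0)]
ratings_matrix = build_ratings_matrix(triples, num_users=3, num_items=2)
print(ratings_matrix.shape)  # one row per user, one column per item
```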

I think the problem may be in the serving_input_fn function. Here is my SageMaker entry_point script:

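import os

import numpy as np
import tensorflow as tf

NUM_CLUSTERS = 8  # number of clusters stated in the question

def estimator_fn(run_config, params):
    #feature_columns = [tf.feature_column.numeric_column('inputs', shape=list(params['input_shape']))]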
    return tf.contrib.factorization.KMeansClustering(num_clusters=NUM_CLUSTERS,
                            distance_metric=tf.contrib.factorization.KMeansClustering.COSINE_DISTANCE,
                            use_mini_batch=False,
                            feature_columns=None,
                            config=run_config)

def serving_input_fn(params):
    tensor = tf.placeholder(tf.float32, shape=[None, None])
    return tf.estimator.export.build_raw_serving_input_receiver_fn({'inputs': tensor})()

def train_input_fn(training_dir, params):
    """ Returns input function that would feed the model during training """
    return generate_input_fn(training_dir, 'train.csv')


def eval_input_fn(training_dir, params):
    """ Returns input function that would feed the model during evaluation """
    return generate_input_fn(training_dir, 'test.csv')


def generate_input_fn(training_dir, training_filename):
    """ Generate all the input data needed to train and evaluate the model. """
    # Load train/test data from s3 bucket
    train = np.loadtxt(os.path.join(training_dir, training_filename), delimiter=",")
    return tf.estimator.inputs.numpy_input_fn(
        x={'inputs': np.array(train, dtype=np.float32)},
        y=None,
        num_epochs=1,
        shuffle=False)()

In generate_input_fn(), train is the NumPy ratings matrix.

In case it helps, here is my call to the predict() function (ratings_matrix is a 40 x num_items NumPy array):

mtx = tf.make_tensor_proto(values=ratings_matrix,
                           shape=list(ratings_matrix.shape), dtype=tf.float32)
result = predictor.predict(mtx)

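I feel like the problem is something simple that I'm missing. This is the first ML algorithm I've written, so any help would be greatly appreciated.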

Best answer

Thanks to javadba for the answer!

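I'm not very well versed in machine learning or TensorFlow, so please correct me if I'm wrong. It looks like you got the integration with SageMaker working, but the predictions aren't what you expected.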

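Ultimately, SageMaker runs train_and_evaluate with your EstimatorSpec for training and uses TensorFlow Serving for predictions. It has no other hidden functionality, so the results you get from KMeans predictions with the TensorFlow estimator should be independent of SageMaker. They may, however, be affected by how you define your serving_input_fn and output_fn.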

When you run the same estimator with the same setup outside the SageMaker ecosystem, do you get predictions in the format you expect?
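For a rough idea of what a well-formed result looks like, the comparison can start without TensorFlow at all. The sketch below is a plain NumPy stand-in for k-means with cosine distance, not the tf.contrib implementation, so its cluster ids will not match the estimator's. `cosine_kmeans`, its parameters, and the random ratings are hypothetical. What it shows is the expected format: one integer in the range 0 to num_clusters - 1 per input row.

```python
import numpy as np


def cosine_kmeans(x, num_clusters, num_iters=10, seed=0):
    """Cluster the rows of x by cosine distance; returns (assignments, centers)."""
    rng = np.random.RandomState(seed)
    # Scale rows to unit length so a dot product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)
    # Initialise the centers with distinct, randomly chosen rows.
    centers = unit[rng.choice(len(unit), size=num_clusters, replace=False)]
    for _ in range(num_iters):
        # Smallest cosine distance is the same as largest cosine similarity.
        assignments = np.argmax(unit.dot(centers.T), axis=1)
        for k in range(num_clusters):
            members = unit[assignments == k]
            if len(members):
                mean = members.mean(axis=0)
                centers[k] = mean / max(np.linalg.norm(mean), 1e-12)
    # Final assignment against the updated centers.
    assignments = np.argmax(unit.dot(centers.T), axis=1)
    return assignments, centers


# Hypothetical data: 40 users, 25 items, ratings between 0 and 10.
ratings = np.random.RandomState(1).uniform(0.0, 10.0, size=(40, 25))
cluster_ids, _ = cosine_kmeans(ratings, num_clusters=8)
print(cluster_ids.shape, cluster_ids.min(), cluster_ids.max())
```

If the estimator run locally returns one id per row in this way but the SageMaker endpoint does not, the difference most likely lies in how the request reaches the model rather than in the clustering itself.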

The SageMaker TensorFlow experience is open source here, and it shows what is and isn't currently possible:
https://github.com/aws/sagemaker-tensorflow-container

Source: the original question on Stack Overflow, https://stackoverflow.com/questions/51888996/