This article covers how to handle "TensorFlow Object Detection training getting killed, resource starvation?". It should be a useful reference for anyone running into the same problem.

Problem Description

This question has partially been asked here and here with no follow-ups, so maybe this is not the venue to ask it, but I've figured out a little more information that I'm hoping might get these questions answered.

I've been attempting to train object_detection on my own library of roughly 1k photos, using the provided pipeline config file "ssd_inception_v2_pets.config". I believe I've set up the training data properly. The program appears to start training just fine; when it couldn't read the data, it alerted with an error, and I fixed that.

My train_config settings are as follows, though I've changed a few of the numbers in order to try to get it to run with fewer resources.

train_config: {
  batch_size: 1000  # also tried 1, 10, and 100
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.04  # also tried .004
          decay_steps: 800 # also tried 800720, 80072
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "~/Downloads/ssd_inception_v2_coco_11_06_2017/model.ckpt" #using inception checkpoint
  from_detection_checkpoint: true
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

Basically, what I think is happening is that the computer is getting resource-starved very quickly, and I'm wondering whether anyone has an optimization that takes more time to build but uses fewer resources?

Or am I wrong about why the process is getting killed, and is there a way for me to get more information about that from the kernel?

This is the dmesg output I get after the process is killed.

[711708.975215] Out of memory: Kill process 22087 (python) score 517 or sacrifice child
[711708.975221] Killed process 22087 (python) total-vm:9086536kB, anon-rss:6114136kB, file-rss:24kB, shmem-rss:0kB

Recommended Answer

I ran into the same problem as you. The memory exhaustion is actually caused by the data_augmentation_options ssd_random_crop, so you can remove that option and set the batch size to 8 or smaller, i.e. 2 or 4. When I set the batch size to 1, I also hit some problems caused by NaN loss.
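For concreteness, here is a minimal sketch of what the train_config above might look like after applying that advice: ssd_random_crop removed and batch_size lowered to 8. The batch_queue_capacity and prefetch_queue_capacity lines are an optional extra memory-saving tweak; I'm assuming your version of the Object Detection API's train.proto exposes those input-queue fields.

train_config: {
  batch_size: 8                # try 4 or 2 if 8 still gets killed; batch_size 1 can lead to NaN loss
  batch_queue_capacity: 2      # optional: shrink the input queues to cut memory further
  prefetch_queue_capacity: 2   # (assumes these fields exist in your train.proto version)
  # ... optimizer, fine_tune_checkpoint, from_detection_checkpoint unchanged ...
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  # the ssd_random_crop block has been removed entirely
}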

Another thing is that the parameter epsilon should be a very small number, such as 1e, according to the "Deep Learning" book. Epsilon is only there to avoid a zero denominator, but the default value here is 1, and I don't think it's correct to set it to 1.
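As a minimal sketch, the rms_prop_optimizer block above could be adjusted along these lines. The value 1e-8 is purely illustrative (the answer does not spell out the exact exponent); the point is simply that epsilon should stay tiny, since it only guards against a zero denominator.

  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1e-8  # illustrative small value; leaving the default of 1.0 defeats its purpose
    }
  }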

That concludes this article on "TensorFlow Object Detection training getting killed, resource starvation?". Hopefully the recommended answer above helps.
