Problem Description
I am reading TensorFlow code and came across this answer: tensorflow-using-parameter-servers-in-distributed-training. I'd like to know the details.
Recommended Answer
See https://www.tensorflow.org/deploy/distributed.
It appears that there's little distinction between workers and "parameter servers" other than that the `ps` nodes have no code beyond `server.join()`, which allows other nodes to place ops on them. In the example in the documentation above, the code run on the workers establishes variables on the `ps` devices, computes models using them, and optimizes them as if they were local resources, using essentially the same `with` device-placement mechanism as if one were assigning an op to a GPU or CPU.
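To make that concrete, here is a minimal sketch using the classic `tf.train` API described in the guide above. The cluster addresses and the single-ps, single-worker layout are illustrative assumptions, not taken from the original question; in a real deployment `job_name` and `task_index` would come from command-line flags on each node.

```python
import tensorflow as tf

# Illustrative cluster layout; every node runs this same script.
cluster = tf.train.ClusterSpec({
    "ps":     ["localhost:2222"],
    "worker": ["localhost:2223"],
})

# These would normally be parsed from flags on each node.
job_name = "worker"
task_index = 0

server = tf.train.Server(cluster, job_name=job_name, task_index=task_index)

if job_name == "ps":
    # The parameter server runs no model code of its own; it only
    # serves the variables that other nodes place on it.
    server.join()
else:
    # replica_device_setter pins variables to /job:ps and the remaining
    # ops to this worker -- the same with/tf.device mechanism used to
    # assign an op to a GPU or CPU.
    with tf.device(tf.train.replica_device_setter(
            worker_device="/job:worker/task:%d" % task_index,
            cluster=cluster)):
        w = tf.Variable(tf.zeros([10]))            # stored on the ps node
        loss = tf.reduce_sum(tf.square(w - 1.0))   # computed on the worker
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    with tf.train.MonitoredTrainingSession(master=server.target) as sess:
        sess.run(train_op)
```

From the worker's point of view, the variable `w` behaves like a local resource even though it physically lives on the `ps` node; the runtime transparently routes reads and updates across the cluster.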
Is there something more specific you're interested in knowing?