Problem description
I am finding it incredibly difficult to follow Ray's guidelines for running a Docker image on a Ray cluster in order to execute a Python script, and I cannot find any simple working examples.
So I have the simplest possible Dockerfile:
FROM rayproject/ray
WORKDIR /usr/src/app
COPY . .
CMD ["step_1.py"]
ENTRYPOINT ["python3"]
I use this to build an image and push it to Docker Hub ("myimage" is just an example name):
docker build -t myimage .
docker push myimage
"step_1.py" just prints "hello" once a second for 200 seconds:
import time
for i in range(200):
    time.sleep(1)
    print("hello")
This is my config.yaml, again very simple:
cluster_name: simple-1
min_workers: 0
max_workers: 2
docker:
    image: "myimage"
    container_name: "my_simple_docker_container"
    pull_before_run: True
idle_timeout_minutes: 5
provider:
    type: aws
    region: eu-west-2
    availability_zone: eu-west-2a
file_mounts_sync_continuously: False
auth:
    ssh_user: ubuntu
    ssh_private_key: /home/user/.ssh/aws_ubuntu_test.pem
head_node:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx826a6b31fd2c
    KeyName: aws_ubuntu_test
    BlockDeviceMappings:
        - DeviceName: /dev/sda1
          Ebs:
              VolumeSize: 200
worker_nodes:
    InstanceType: c5.2xlarge
    ImageId: ami-xxxxx826a6b31fd2c
    KeyName: aws_ubuntu_test
    InstanceMarketOptions:
        MarketType: spot
head_setup_commands:
    - pip install boto3==1.4.8
worker_setup_commands: []
head_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --head --port=6379 --object-manager-port=8076 --autoscaling-config=~/ray_bootstrap_config.yaml
worker_start_ray_commands:
    - ray stop
    - ulimit -n 65536; ray start --address=$RAY_HEAD_IP:6379 --object-manager-port=8076
I run the following in my terminal:
ray up simple1.yaml
and I get this error every time:
shared connection to x.x.xx.119 closed.
"docker cp" requires exactly 2 arguments.
See 'docker cp --help'.
Usage: docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH
Copy files/folders between a container and the local filesystem
Shared connection to x.x.xx.119 closed.
Just to add: the Docker image runs fine on any other remote machine, just not on the Ray cluster.
If someone could please help me, I would be eternally grateful, and I even promise to write a tutorial on Medium once I am past these struggles.
Recommended answer
I think the issue might be the use of ENTRYPOINT. The Ray Cluster Launcher starts Docker using a command roughly like:
docker run --rm --name <NAME> -d -it --net=host <image_name> bash
When I ran "docker build -t myimage ." and then "docker run --rm -it myimage bash", Docker errored with:
python3: can't open file 'bash': [Errno 2] No such file or directory
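The error follows from how Docker combines the exec-form ENTRYPOINT and CMD with the arguments given to "docker run": the arguments replace CMD, and the result is appended to ENTRYPOINT. The sketch below is only a small Python model of that rule, not anything from Docker or Ray; the function name container_argv is made up for illustration.

```python
def container_argv(entrypoint, cmd, run_args):
    """Model of Docker's exec-form rule (illustrative helper, not a real API).

    Arguments passed after the image name in `docker run` replace CMD;
    whatever remains is appended to ENTRYPOINT to form the final command.
    """
    return list(entrypoint) + list(run_args if run_args else cmd)


# Dockerfile from the question: ENTRYPOINT ["python3"], CMD ["step_1.py"]
print(container_argv(["python3"], ["step_1.py"], []))        # plain `docker run myimage`
print(container_argv(["python3"], ["step_1.py"], ["bash"]))  # launcher-style `docker run ... myimage bash`

# Assumed alternative: no ENTRYPOINT, CMD ["python3", "step_1.py"]
print(container_argv([], ["python3", "step_1.py"], ["bash"]))
```

With the ENTRYPOINT in place, the trailing "bash" is handed to python3 as a script name, which matches the error above. One possible fix, under that reading, is to drop the ENTRYPOINT line and use CMD ["python3", "step_1.py"] instead, so that a trailing "bash" starts a shell as the launcher seems to expect.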