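I am trying to bring up an ECS task with 2 containers, the app and X-Ray as a sidecar. After working through several blog articles and examples, I am still facing the problem that I cannot see any X-Ray segments in the AWS X-Ray console.
Apparently the app cannot connect to the X-Ray sidecar to send its segments.
A line in the app container logs shows this:
[Error] write udp 127.0.0.1:35963->127.0.0.1:2000: write: connection refused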

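Overall ECS service/task setup:

  • ecsTaskRole has the policy AWSXRayDaemonWriteAccess attached
  • network mode in the task definition is bridge => according to the instructions, I have to set the env var AWS_XRAY_DAEMON_ADDRESS in my app container and link it to the xray-daemon container
  • using dynamic port mapping: 2000/tcp and 2000/udp in the xray container, 9000/tcp in the app container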

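    Below you'll find the task definition. I have also tried to avoid dynamic port mappings, but the error is the same.

    What am I missing so that my X-Ray segments become visible in the AWS X-Ray console?
  • Is the IAM policy AWSXRayDaemonWriteAccess on ecsTaskRole not enough?
  • Is the app container config missing something so that it can connect to the xray-daemon container through the env property AWS_XRAY_DAEMON_ADDRESS? I have tried setting it to different values, such as 0.0.0.0:2000 and 127.0.0.1:2000 .... Which one would work?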


    === app container logs ===
    2020-06-06 14:36:36 rate: 0.050000
    2020-06-06 14:36:36 1591446996579912116 [Trace] SamplingStrategy decided: true
    2020-06-06 14:36:36 1591446996579934272 [Trace] Closing segment named jukebox-front
    2020-06-06 14:36:36 1591446996579987354 [Error] write udp 127.0.0.1:35963->127.0.0.1:2000: write: connection refused
    2020-06-06 14:36:36 2020/06/06 12:36:36 ping requested, reponding with HTTP 200
    2020-06-06 14:36:36 2020/06/06 12:36:36 ping requested, reponding with HTTP 200
    2020-06-06 14:36:36 1591446996564675990 [Trace] Beginning segment named jukebox-front
    2020-06-06 14:36:36 1591446996564705526 [Trace] Determining ShouldTrace decision for:
    

    === xray container logs ===
    2020-06-06 13:27:21 2020-06-06T11:27:21Z [Info] HTTP Proxy server using X-Ray Endpoint : https://xray.eu-central-1.amazonaws.com
    2020-06-06 13:27:21 2020-06-06T11:27:21Z [Info] Starting proxy http server on 0.0.0.0:2000
    2020-06-06 13:27:01 2020-06-06T11:27:01Z [Info] Using region: eu-central-1
    2020-06-06 13:26:41 2020-06-06T11:26:41Z [Info] Using buffer memory limit of 19 MB
    2020-06-06 13:26:41 2020-06-06T11:26:41Z [Info] 304 segment buffers allocated
    2020-06-06 13:26:41 2020-06-06T11:26:41Z [Info] Initializing AWS X-Ray daemon 3.2.0
    

    The ECS task definition is as follows:
    {
      "executionRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskExecutionRole",
      "taskRoleArn": "arn:aws:iam::xxxxxxxx:role/ecsTaskRole",
      "containerDefinitions": [
        {
          "logConfiguration": {
            "logDriver": "awslogs",
            "secretOptions": null,
            "options": {
              "awslogs-group": "/ecs/td-jukebox",
              "awslogs-region": "eu-central-1",
              "awslogs-stream-prefix": "ecs"
            }
          },
          "portMappings": [
            {
              "hostPort": 0,
              "protocol": "tcp",
              "containerPort": 9000
            }
          ],
          "environment": [
            {
              "name": "AWS_XRAY_DAEMON_ADDRESS",
              "value": "0.0.0.0:2000"
            },
            {
              "name": "PORT",
              "value": "9000"
            }
          ],
          "memoryReservation": 128,
          "image": "<<app-image>>:latest",
          "essential": true,
          "links": [
            "xray-daemon"
          ],
          "name": "jukebox"
        },
        {
          "dnsSearchDomains": null,
          "environmentFiles": null,
          "logConfiguration": {
            "logDriver": "awslogs",
            "secretOptions": null,
            "options": {
              "awslogs-group": "/ecs/td-jukebox",
              "awslogs-region": "eu-central-1",
              "awslogs-stream-prefix": "ecs"
            }
          },
          "portMappings": [
            {
              "hostPort": 0,
              "protocol": "udp",
              "containerPort": 2000
            },
            {
              "hostPort": 0,
              "protocol": "tcp",
              "containerPort": 2000
            }
          ],
          "command": [
            "\"-t\"",
            "\"0.0.0.0:2000\""
          ],
          "cpu": 32,
          "memoryReservation": 256,
          "image": "amazon/aws-xray-daemon",
          "essential": true,
          "name": "xray-daemon"
        }
      ],
      "compatibilities": [
        "EC2"
      ],
      "family": "td-jukebox",
      "requiresCompatibilities": [
        "EC2"
      ],
      "networkMode": "bridge"
     }
    

    Any help to resolve the "write: connection refused" error is very much appreciated ;)

    ===========

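    Update

    I edited the task definition and added further parameters for starting the X-Ray daemon to increase the level of detail in the log output (as suggested by @shariqmaws):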
    -l,dev,-f,/var/log/xray.log,-o,-n,eu-central-1
    

    After redeploying the service and exec-ing into the xray-daemon container, /var/log/xray.log contains:
    tail -f /var/log/xray.log
    2020-06-06T16:33:07Z [Debug] ARN of the AWS resource running the daemon:
    2020-06-06T16:33:07Z [Debug] No Metadata set for telemetry records
    2020-06-06T16:33:07Z [Debug] Using Endpoint: https://xray.eu-central-1.amazonaws.com
    2020-06-06T16:33:07Z [Debug] Telemetry initiated
    2020-06-06T16:33:07Z [Info] HTTP Proxy server using X-Ray Endpoint : https://xray.eu-central-1.amazonaws.com
    2020-06-06T16:33:07Z [Debug] Using Endpoint: https://xray.eu-central-1.amazonaws.com
    2020-06-06T16:33:07Z [Debug] Batch size: 50
    2020-06-06T16:33:07Z [Info] Starting proxy http server on 0.0.0.0:2000
    2020-06-06T16:34:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:35:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:36:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:37:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:38:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:39:07Z [Debug] Skipped telemetry data as no segments found
    2020-06-06T16:40:07Z [Debug] Skipped telemetry data as no segments found
    

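    Hmm, unfortunately no real error shows up here ... which makes me think that maybe the app container is doing something weird when connecting to the xray daemon?!

    Inside the app container, the XRAY-related env variables look good: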
     env | grep XRAY
    XRAY_DAEMON_PORT_2000_UDP_PROTO=udp
    XRAY_DAEMON_PORT_2000_TCP=tcp://172.17.0.7:2000
    XRAY_DAEMON_PORT_2000_UDP_ADDR=172.17.0.7
    XRAY_DAEMON_ENV_AWS_CONTAINER_CREDENTIALS_RELATIVE_URI=/v2/credentials/0e9e6ca8-c6d1-4f72-975e-aa3d99fdb64e
    XRAY_DAEMON_ENV_ECS_CONTAINER_METADATA_URI_V4=http://169.254.170.2/v4/8a6cd6e9-bda7-4579-8415-471a435d14b4
    XRAY_DAEMON_PORT_2000_UDP_PORT=2000
    AWS_XRAY_DAEMON_ADDRESS=xray-daemon:2000
    XRAY_DAEMON_ENV_ECS_CONTAINER_METADATA_URI=http://169.254.170.2/v3/8a6cd6e9-bda7-4579-8415-471a435d14b4
    XRAY_DAEMON_PORT_2000_TCP_ADDR=172.17.0.7
    XRAY_DAEMON_PORT_2000_UDP=udp://172.17.0.7:2000
    XRAY_DAEMON_PORT_2000_TCP_PORT=2000
    XRAY_DAEMON_PORT=tcp://172.17.0.7:2000
    XRAY_DAEMON_ENV_AWS_EXECUTION_ENV=AWS_ECS_EC2
    XRAY_DAEMON_NAME=/ecs-td-jukebox-13-jukebox-d0ddb483ebf899aee801/xray-daemon
    XRAY_DAEMON_PORT_2000_TCP_PROTO=tcp
    

    The IP 172.17.0.7 is correct and matches the IP of the xray-daemon container.

    ===========

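    Update 2

    I tested sending data to the xray daemon over UDP port 2000 with a plain echo command from inside the app container, by running echo "test data" > /dev/udp/172.17.0.7/2000 ... and that works ... at least the sending succeeds, although the payload is of course not a valid X-Ray segment.
    This is what I then found in the xray-daemon log file: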
    2020-06-06T18:53:45Z [Warn] Missing header or segment: test data
    2020-06-06T18:54:07Z [Debug] Send 1 telemetry record(s)
    2020-06-06T18:55:07Z [Debug] Send 1 telemetry record(s)
    

    Although it says Send 1 telemetry record(s), nothing shows up in the AWS X-Ray console ... still no data. After waiting a few minutes, this appears in the xray daemon log file:
    2020-06-06T19:03:07Z [Debug] Failed to send telemetry 1 record(s). Re-queue records. SerializationError: failed to unmarshal response error
            status code: 400, request id:
    caused by: UnmarshalError: failed decoding error message
            00000000  3c 68 74 6d 6c 3e 0d 0a  3c 68 65 61 64 3e 3c 74  |<html>..<head><t|
    00000010  69 74 6c 65 3e 34 30 30  20 42 61 64 20 52 65 71  |itle>400 Bad Req|
    00000020  75 65 73 74 3c 2f 74 69  74 6c 65 3e 3c 2f 68 65  |uest</title></he|
    00000030  61 64 3e 0d 0a 3c 62 6f  64 79 20 62 67 63 6f 6c  |ad>..<body bgcol|
    00000040  6f 72 3d 22 77 68 69 74  65 22 3e 0d 0a 3c 63 65  |or="white">..<ce|
    00000050  6e 74 65 72 3e 3c 68 31  3e 34 30 30 20 42 61 64  |nter><h1>400 Bad|
    00000060  20 52 65 71 75 65 73 74  3c 2f 68 31 3e 3c 2f 63  | Request</h1></c|
    00000070  65 6e 74 65 72 3e 0d 0a  3c 2f 62 6f 64 79 3e 0d  |enter>..</body>.|
    00000080  0a 3c 2f 68 74 6d 6c 3e  0d 0a                    |.</html>..|
    
    caused by: invalid character '<' looking for beginning of value
    2020-06-06T19:04:07Z [Debug] Send 2 telemetry record(s)
    2020-06-06T19:05:07Z [Debug] Send 1 telemetry record(s)
    2020-06-06T19:06:07Z [Debug] Send 1 telemetry record(s)
    2020-06-06T19:07:07Z [Debug] Send 1 telemetry record(s)
    
    

    I guess this is caused by the string I sent, which of course is not a valid X-Ray segment.

    ===========

    Update 3

    Code excerpt, to verify the instrumentation:
    package main
    
    import (
        "fmt"
        "io/ioutil"
        "log"
        "net/http"
        "os"
        "strings"
    
        "github.com/aws/aws-xray-sdk-go/xray"
        "github.com/pkg/errors"
    )
    
    ...
    
    func getXRAYAppName() string {
        appName := os.Getenv("XRAY_APP_NAME")
        if appName != "" {
            return appName
        }
        return "front"
    }
    
    
    type pingHandler struct{}
    
    func (h *pingHandler) ServeHTTP(writer http.ResponseWriter, request *http.Request) {
        log.Println("ping requested, responding with HTTP 200")
        writer.WriteHeader(http.StatusOK)
    }
    
    func main() {
        xraySegmentNamer := xray.NewFixedSegmentNamer(getXRAYAppName())
        http.Handle("/ping", xray.Handler(xraySegmentNamer, &pingHandler{}))
        log.Fatal(http.ListenAndServe(":"+getServerPort(), nil))
    }
    

    Any other hints on why my app cannot connect to the X-Ray sidecar?!

    Best answer

    Can you try changing:

      "environment": [
        {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "0.0.0.0:2000"
        },
    

    to:

      "environment": [
        {
          "name": "AWS_XRAY_DAEMON_ADDRESS",
          "value": "xray-daemon:2000"
        },
    
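One way to read this change: in bridge network mode each container has its own loopback interface, so 127.0.0.1 inside the app container points back at the app container itself, and 0.0.0.0 is a listen address, not a destination. The linked container name (or its bridge IP, 172.17.0.7 in the question) is what actually reaches the daemon. The Go sketch below is a hypothetical startup check built on that reasoning. `checkDaemonAddr` is an illustrative name and not part of the X-Ray SDK, and the fallback to 127.0.0.1:2000 assumes the SDK default that matches the address in the error log.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// checkDaemonAddr (hypothetical helper) returns an error when addr cannot
// reach a daemon that runs in a separate container in bridge network mode.
func checkDaemonAddr(addr string) error {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return fmt.Errorf("invalid address %q: %w", addr, err)
	}
	if host == "" || port == "" {
		return fmt.Errorf("address %q needs both a host and a port", addr)
	}
	if host == "localhost" {
		return fmt.Errorf("%q points at the app container's own loopback", addr)
	}
	if ip := net.ParseIP(host); ip != nil {
		if ip.IsUnspecified() {
			return fmt.Errorf("%q is a listen address, not a destination", addr)
		}
		if ip.IsLoopback() {
			return fmt.Errorf("%q points at the app container's own loopback", addr)
		}
	}
	return nil
}

func main() {
	addr := os.Getenv("AWS_XRAY_DAEMON_ADDRESS")
	if addr == "" {
		// Assumption: the SDK falls back to this address when the variable is unset.
		addr = "127.0.0.1:2000"
	}
	if err := checkDaemonAddr(addr); err != nil {
		fmt.Println("warning:", err)
		return
	}
	fmt.Println("daemon address looks usable:", addr)
}
```

With the values from the question, 0.0.0.0:2000 and 127.0.0.1:2000 would both be flagged, while xray-daemon:2000 and 172.17.0.7:2000 would pass. The check does not apply to host or awsvpc network modes, where the containers of a task share a network namespace.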

    This question, amazon-web-services - AWS ECS Xray sidecar "write: connection refused", comes from Stack Overflow: https://stackoverflow.com/questions/62232386/
