I am trying to implement a monitoring solution for an OpenShift cluster based on Prometheus + node-exporter + Grafana + cAdvisor.

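I have a big problem with the cAdvisor component. I have tried many configurations (the changes were always in the volumes), but none of them works properly: either the container restarts every 2 minutes, or it does not collect all of the metrics (the process metrics are missing).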

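Example configuration (with this configuration the container does not restart every 2 minutes, but it does not collect process metrics). I know that /rootfs is missing from my volumes, but with it the container runs for about 5 s and then shuts down: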

containers:
    - image: >-
        google/cadvisor@sha256:fce642268068eba88c27c666e92ed4144be6188447a23825015884741cf0e352
      imagePullPolicy: IfNotPresent
      name: cadvisor-new-version
      ports:
        - containerPort: 8080
          protocol: TCP
      resources: {}
      securityContext:
        privileged: true
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
        - mountPath: '/sys/fs/cgroup/cpuacct,cpu'
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /var/run/containerd/containerd.sock
          name: docker-socketd
          readOnly: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: cadvisor-sa
  serviceAccountName: cadvisor-sa
  terminationGracePeriodSeconds: 300
  volumes:
    - hostPath:
        path: '/sys/fs/cgroup/cpu,cpuacct'
      name: sys
    - hostPath:
        path: /var/lib/docker
      name: docker
    - hostPath:
        path: /var/run/containerd/containerd.sock
      name: docker-socketd

I am using a service account with the privileged SCC in my OpenShift project.
  • OpenShift version: 3.6
  • Docker version: 1.12
  • cAdvisor version: I tried every version from v0.26.3 to the latest

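I found a post suggesting that the problem may be the old Docker version. Can anyone confirm that?

Maybe someone has a working configuration and has successfully deployed cAdvisor on OpenShift?

Example logs: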
    I0409 08:41:46.661453       1 manager.go:231] Version: {KernelVersion:3.10.0-693.17.1.el7.x86_64 ContainerOsVersion:Alpine Linux v3.4 DockerVersion:1.12.6 DockerAPIVersion:1.24 CadvisorVersion:v0.28.3 CadvisorRevision:1e567c2}
    E0409 08:41:50.823560       1 factory.go:340] devicemapper filesystem stats will not be reported: usage of thin_ls is disabled to preserve iops
    I0409 08:41:50.825280       1 factory.go:356] Registering Docker factory
    I0409 08:41:50.826394       1 factory.go:54] Registering systemd factory
    I0409 08:41:50.826949       1 factory.go:86] Registering Raw factory
    I0409 08:41:50.827388       1 manager.go:1178] Started watching for new ooms in manager
    I0409 08:41:50.838169       1 manager.go:329] Starting recovery of all containers
    W0409 08:41:56.853821       1 container.go:393] Failed to create summary reader for "/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podc323db44_39a9_11e8_accd_005056800e7b.slice/docker-26db795af0fa28047f04194d8169cf0249edf2c918c583422a1404d35ed5b62c.scope": none of the resources are being tracked.
    I0409 08:42:03.953261       1 manager.go:334] Recovery completed
    I0409 08:42:37.874062       1 cadvisor.go:162] Starting cAdvisor version: v0.28.3-1e567c2 on port 8080
    I0409 08:42:56.353574       1 fsHandler.go:135] du and find on following dirs took 1.20076874s: [ /rootfs/var/lib/docker/containers/2afa2c457a9c1769feb6ab542102521d8ad51bdeeb89581e4b7166c1c93e7522]; will not log again for this container unless duration exceeds 2s
    I0409 08:42:56.453602       1 fsHandler.go:135] du and find on following dirs took 1.098795382s: [ /rootfs/var/lib/docker/containers/65e4ad3536788b289e2b9a29e8f19c66772b6f38ec10d34a2922e4ef4d67337f]; will not log again for this container unless duration exceeds 2s
    I0409 08:42:56.753070       1 fsHandler.go:135] du and find on following dirs took 1.400184357s: [ /rootfs/var/lib/docker/containers/2b0aa12a43800974298a7d0353c6b142075d70776222196c92881cc7c7c1a804]; will not log again for this container unless duration exceeds 2s
    I0409 08:43:00.352908       1 fsHandler.go:135] du and find on following dirs took 1.199079344s: [ /rootfs/var/lib/docker/containers/aa977c2cc6105e633369f48e2341a6363ce836cfbe8e7821af955cb0cf4d5f26]; will not log again for this container unless duration exceeds 2s
    

Best answer

OpenShift has a cAdvisor process embedded in the kubelet. It may be that some kind of race condition is causing the pod to crash.
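
Since the kubelet already embeds cAdvisor, one option is to scrape the container metrics through the kubelet instead of running a separate cAdvisor pod. Below is a minimal sketch of a Prometheus scrape job, modeled on the Kubernetes example configuration that ships with Prometheus; it is not taken from the original post and has not been tested on this cluster. It assumes that Prometheus runs inside the cluster, that its service account is allowed to access `nodes/proxy`, and that the kubelet serves the `container_*` metrics on its `/metrics` endpoint. That should hold for Kubernetes 1.6, on which OpenShift 3.6 is based; from Kubernetes 1.7.3 onward those metrics moved to `/metrics/cadvisor`. The job name is arbitrary.

```yaml
scrape_configs:
  # Hypothetical job name; any unique name works.
  - job_name: kubelet-cadvisor
    scheme: https
    # Assumes in-cluster Prometheus using its pod's service account credentials.
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    # Discover one target per cluster node.
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Copy node labels onto the scraped series.
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      # Send every scrape to the API server, which proxies to the kubelet.
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      # Assumption: Kubernetes 1.6 kubelet, cAdvisor metrics live under /metrics.
      # On Kubernetes 1.7.3+ the path would end in /proxy/metrics/cadvisor.
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics
```

If the `container_*` series show up under this job, the standalone cAdvisor deployment, and the restarts that come with it, may not be needed at all.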

    08-19 09:51