I realise how unlikely the title of this issue seems, but if there is an obvious error in my setup I can't spot it. I want to run Fabio as a Nomad-managed service using the Nomad system scheduler (type = "system"). When I do, any subsequent pull from our private Docker registry fails with the error:
failed to register layer: open /dev/mapper/docker-202:32-786433-35e363b33db58a87d6a55b19f3297715b9978052e70edec86f03b51af3e44455: no such file or directory
From that point on I am not able to recover Docker.
Some details about our setup:
Ubuntu 14.04
Kernel = 3.13.0-53-generic
Docker = 1.12.2
Nomad = 0.5.0
Fabio = 1.3.4
I have 3 servers and 2 clients. I am trying to run Fabio using the exec driver and the system scheduler. I am running Nomad as the root user, which I believe is required for the exec driver.
I do not see the issue if I run Fabio using the service scheduler.
I do not see the issue if I run a Docker container using the system scheduler.
I do not see the issue if I run another job (sleep binary) using the system scheduler.
I do not see the issue if I run Fabio using the system scheduler but using the raw_exec driver.
Docker is using the LVM storage option but I see the same issue if I drop back to the devicemapper storage option.
Below is a repeatable test case. After that are copies of the job specs used in the test case.
- Go to Nomad user
ubuntu@ip-10-75-70-27:~$ sudo su - nomad
- Software versions
$ uname -a
Linux ip-10-75-70-27 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ docker --version
Docker version 1.12.2, build bb80604
$ nomad version
Nomad v0.5.0
- Nomad running as root with no running jobs
$ ps -ef | grep nomad
root 17416 1 0 12:57 ? 00:00:00 /usr/local/bin/nomad agent -config /etc/nomad.d/config.json -rejoin -node=nomad_client_poc1
consul 17540 1 0 12:57 ? 00:00:00 /usr/local/bin/consul agent -config-file /etc/consul.d/config.json -rejoin -node nomad_client_poc1
root 17617 2602 0 12:58 pts/1 00:00:00 sudo su - nomad
- Demonstrate Docker pull
$ docker pull dockerregistry.adm.myprivatecloud.net/bti1003:latest
latest: Pulling from bti1003
5a132a7e7af1: Pull complete
fd2731e4c50c: Pull complete
28a2f68d1120: Pull complete
a3ed95caeb02: Pull complete
87f9029820c8: Pull complete
7582f6d126ab: Pull complete
Digest: sha256:6d7379af49cc17cc8a0055e06c4cb8374e5be73fe42ce2e8f1abca013c50a62a
Status: Downloaded newer image for dockerregistry.adm.myprivatecloud.net/bti1003:latest
- Remove pulled image
$ docker rmi dockerregistry.adm.myprivatecloud.net/bti1003:latest
Untagged: dockerregistry.adm.myprivatecloud.net/bti1003:latest
Untagged: dockerregistry.adm.myprivatecloud.net/bti1003@sha256:6d7379af49cc17cc8a0055e06c4cb8374e5be73fe42ce2e8f1abca013c50a62a
Deleted: sha256:3fee2600d434e469b6d4ac0e468bd988ebc105536886d6624dc9566577fcafbe
Deleted: sha256:e5fcd939dd4a2a9b9543dea61ca90d2def7c92cd983108916895a39f239799b8
Deleted: sha256:57bd7c9432ae86d63f2342e442eebd0f4dfc340ca61c6a4c7d702b17a315865f
Deleted: sha256:0aaccda2aadfc70ab2248437568fd17f4e8860cf612cc4b7e154b97222dccf91
Deleted: sha256:9dcfe19e941956c63860afee1bec2e2318f6fbd336bc523094ed609a9c437a01
Deleted: sha256:6ff1ee6fc8a0358aeb92f947fb7125cd9e3d68c05be45f5375cb59b98c850b4d
Deleted: sha256:56abdd66ba312859b30b5629268c30d44a6bbef6e2f0ebe923655092855106e8
- Run 'sleep' test job
$ ps -ef | grep nomad.*executor
root 17897 17416 6 13:00 ? 00:00:02 /usr/local/bin/nomad executor /var/nomad/alloc/87a48081-5d8e-b1b1-1538-a1e79a3f4152/sleep-task/sleep-task-executor.out
nomad 18299 17619 0 13:07 pts/1 00:00:00 grep nomad.*executor
- Pull Docker image
$ docker pull dockerregistry.adm.myprivatecloud.net/bti1003:latest
latest: Pulling from bti1003
5a132a7e7af1: Pull complete
fd2731e4c50c: Pull complete
28a2f68d1120: Pull complete
a3ed95caeb02: Pull complete
87f9029820c8: Pull complete
7582f6d126ab: Pull complete
Digest: sha256:6d7379af49cc17cc8a0055e06c4cb8374e5be73fe42ce2e8f1abca013c50a62a
Status: Downloaded newer image for dockerregistry.adm.myprivatecloud.net/bti1003:latest
- Remove pulled image
$ docker rmi dockerregistry.adm.myprivatecloud.net/bti1003:latest
Untagged: dockerregistry.adm.myprivatecloud.net/bti1003:latest
Untagged: dockerregistry.adm.myprivatecloud.net/bti1003@sha256:6d7379af49cc17cc8a0055e06c4cb8374e5be73fe42ce2e8f1abca013c50a62a
Deleted: sha256:3fee2600d434e469b6d4ac0e468bd988ebc105536886d6624dc9566577fcafbe
Deleted: sha256:e5fcd939dd4a2a9b9543dea61ca90d2def7c92cd983108916895a39f239799b8
Deleted: sha256:57bd7c9432ae86d63f2342e442eebd0f4dfc340ca61c6a4c7d702b17a315865f
Deleted: sha256:0aaccda2aadfc70ab2248437568fd17f4e8860cf612cc4b7e154b97222dccf91
Deleted: sha256:9dcfe19e941956c63860afee1bec2e2318f6fbd336bc523094ed609a9c437a01
Deleted: sha256:6ff1ee6fc8a0358aeb92f947fb7125cd9e3d68c05be45f5375cb59b98c850b4d
Deleted: sha256:56abdd66ba312859b30b5629268c30d44a6bbef6e2f0ebe923655092855106e8
- Stop 'sleep' job
$ ps -ef | grep nomad.*executor
nomad 18239 17619 0 13:06 pts/1 00:00:00 grep nomad.*executor
- Start Fabio job
$ ps -ef | grep nomad.*executor
root 18262 17416 33 13:07 ? 00:00:04 /usr/local/bin/nomad executor /var/nomad/alloc/5729f45b-185c-fa7b-7b05-866a774b8c73/fabio-task/fabio-task-executor.out
nomad 18299 17619 0 13:07 pts/1 00:00:00 grep nomad.*executor
- Pull Docker image
$ docker pull dockerregistry.adm.myprivatecloud.net/bti1003:latest
latest: Pulling from bti1003
5a132a7e7af1: Extracting [==================================================>] 65.69 MB/65.69 MB
fd2731e4c50c: Download complete
28a2f68d1120: Download complete
a3ed95caeb02: Download complete
87f9029820c8: Download complete
7582f6d126ab: Download complete
failed to register layer: open /dev/mapper/docker-202:32-786433-35e363b33db58a87d6a55b19f3297715b9978052e70edec86f03b51af3e44455: no such file or directory
- Fabio job dies (10 minutes later), from syslog
Nov 23 13:17:56 ip-10-75-70-27 nomad[17416]: driver.exec: error destroying executor: 1 error(s) occurred:#012#012* 1 error(s) occurred:#012#012* failed to unmount shared alloc dir "/var/nomad/alloc/5729f45b-185c-fa7b-7b05-866a774b8c73/fabio-task/alloc": invalid argument
Nov 23 13:17:57 ip-10-75-70-27 nomad[17416]: client: failed to destroy context for alloc '5729f45b-185c-fa7b-7b05-866a774b8c73': 2 error(s) occurred:#012#012* 1 error(s) occurred:#012#012* failed to remove the secret dir "/var/nomad/alloc/5729f45b-185c-fa7b-7b05-866a774b8c73/fabio-task/secrets": unmount: invalid argument#012* remove /var/nomad/alloc/5729f45b-185c-fa7b-7b05-866a774b8c73/fabio-task: directory not empty
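Since both syslog errors are about unmounting, it may help to check which mount points are still present under the allocation directory. The following is only a sketch: the helper name `alloc_mounts` and its optional second argument are illustrative assumptions, not part of the original session.

```shell
#!/usr/bin/env bash
# List mount points that live under a given directory.
# Assumption: the second argument defaults to /proc/mounts; it can be
# overridden only so the helper can be tried against a saved copy.
alloc_mounts() {
  local alloc_dir="$1" mounts_file="${2:-/proc/mounts}"
  # Field 2 of each /proc/mounts line is the mount point.
  awk -v dir="$alloc_dir" 'index($2, dir) == 1 { print $2 }' "$mounts_file"
}

# Print anything still mounted under the Nomad allocation directory.
if [ -r /proc/mounts ]; then
  alloc_mounts /var/nomad/alloc
fi
```

If this prints paths for an allocation that has already been stopped, those mounts were left behind.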
- From Docker log
time="2016-11-23T13:07:59.287783575Z" level=error msg="Error trying v2 registry: failed to register layer: open /dev/mapper/docker-202:32-786433-35e363b33db58a87d6a55b19f3297715b9978052e70edec86f03b51af3e44455: no such file or directory"
time="2016-11-23T13:07:59.287830271Z" level=error msg="Attempting next endpoint for pull after error: failed to register layer: open /dev/mapper/docker-202:32-786433-35e363b33db58a87d6a55b19f3297715b9978052e70edec86f03b51af3e44455: no such file or directory"
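For convenience, the manual steps above can be condensed into a script along these lines. This is a sketch under assumptions: the file names `sleep.nomad` and `fabio.nomad` and the `DRY_RUN` switch are made up for illustration, and the script does not wait for a job to be running before it pulls, which the manual session did by checking `ps`.

```shell
#!/usr/bin/env bash
# Condensed version of the manual test case.
# Assumptions (not from the original session): the job specs are saved as
# sleep.nomad and fabio.nomad in the current directory, and DRY_RUN=1
# (the default) only prints the commands instead of running them.
set -u

IMAGE="dockerregistry.adm.myprivatecloud.net/bti1003:latest"
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command; execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

pull_and_remove() {
  # Pull the image, then remove it so the next pull starts clean.
  run docker pull "$IMAGE" && run docker rmi "$IMAGE"
}

repro() {
  pull_and_remove            # baseline with no jobs running
  run nomad run sleep.nomad  # exec driver + system scheduler, sleep binary
  pull_and_remove            # pull while the sleep job is running
  run nomad stop sleep-job
  run nomad run fabio.nomad  # exec driver + system scheduler, Fabio
  run docker pull "$IMAGE"   # the pull that failed in the session above
}

repro
```

Running it with `DRY_RUN=0` executes the commands for real; the default only prints them.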
Fabio Job Spec
job "fabio-job" {
  region = "eu"
  datacenters = ["vpc-poc"]
  type = "system"
  update {
    stagger = "5s"
    max_parallel = 1
  }
  group "fabio-group" {
    ephemeral_disk {
      size = "500"
    }
    task "fabio-task" {
      driver = "exec"
      config {
        command = "fabio-1.3.4-go1.7.3-linux_amd64"
      }
      artifact {
        source = "https://github.com/eBay/fabio/releases/download/v1.3.4/fabio-1.3.4-go1.7.3-linux_amd64"
      }
      logs {
        max_files = 2
        max_file_size = 5
      }
      resources {
        cpu = 500
        memory = 64
        network {
          mbits = 1
          port "http" {
            static = 9999
          }
          port "ui" {
            static = 9998
          }
        }
      }
    }
  }
}
Sleep Job Spec
job "sleep-job" {
  region = "eu"
  datacenters = ["vpc-poc"]
  type = "system"
  update {
    stagger = "5s"
    max_parallel = 1
  }
  group "sleep-group" {
    ephemeral_disk {
      size = "500"
    }
    task "sleep-task" {
      driver = "exec"
      config {
        command = "/bin/sleep"
        args = ["1000"]
      }
      logs {
        max_files = 2
        max_file_size = 5
      }
      resources {
        cpu = 500
        memory = 64
        network {
          mbits = 1
        }
      }
    }
  }
}