In ETAP we are running the VRE, including REANA.
We are using NFS storage, which is one of the advised options.
At some point we noticed that it was no longer possible to run REANA jobs, even though all REANA pods were live and ready.
Looking into the logs showed that many of the REANA containers' logs had "Stale file handle" entries.
The triggering problem was that the NFS provisioner had restarted. However, we would have expected the REANA pods to detect the problem and restart.
What do you think? Should there be extra health checks?
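As an illustration of what such an extra health check could look like, here is a minimal sketch, assuming a Kubernetes exec liveness probe that runs a small Python script inside each REANA container. "Stale file handle" is the message for the `ESTALE` errno, so the script touches the shared NFS mount and exits non-zero when that errno comes back. The default path `/var/reana` and the function names are assumptions for illustration, not REANA's actual code.

```python
import errno
import os
import sys


def shared_volume_is_healthy(path: str) -> bool:
    """Return False if accessing `path` fails with ESTALE ("Stale file handle")."""
    try:
        # Both calls go through the NFS client; a file handle invalidated by an
        # NFS server/provisioner restart surfaces here as OSError(ESTALE).
        os.stat(path)
        os.listdir(path)
    except OSError as exc:
        if exc.errno == errno.ESTALE:
            return False
        # Any other error is re-raised; an uncaught exception also makes the
        # interpreter exit non-zero, so the probe still reports failure.
        raise
    return True


def main(argv) -> int:
    # Hypothetical default: assumed mount point of the shared NFS volume.
    path = argv[1] if len(argv) > 1 else "/var/reana"
    return 0 if shared_volume_is_healthy(path) else 1


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Wired in as a `livenessProbe` with an `exec.command` such as `["python", "/path/to/check_nfs.py", "/var/reana"]` (the script path is hypothetical), a non-zero exit would make the kubelet restart the container instead of leaving it "live and ready" with a broken mount.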