--userns=keep-id storage-chown-by-maps kills machine with large images

/kind bug

**Description**

While migrating a CI setup from Docker to Podman, I occasionally run into Podman commands freezing. A single command may hang for dozens *(!!!)* of minutes, with Podman seemingly not doing anything at all.

The hangs aren't specific to any particular command. For example, right as I'm writing this, I see two jobs frozen, one running `podman run …` and another running `podman inspect`. So I connected to the server over ssh and tried running `time podman inspect foobar` *(literally a request for a non-existent `foobar` image)*, and it hung as well. `podman ps` hangs, and even `podman version` hangs!!

Basically, to be able to create this report I had to kill a podman process. There were two `podman run` processes and two `podman inspect` processes. I killed one of the `podman inspect` processes, and a little later the CI finally proceeded and podman commands started working again.

**Steps to reproduce the issue:**

~~I'm afraid I couldn't find any. It seems to happen when multiple podman processes are run, but my attempts to simulate that in different ways didn't succeed. It just happens from time to time as part of CI, and when it does, CI basically breaks completely.~~

Steps to reproduce were found [as part of this duplicate issue](https://github.com/containers/podman/issues/16830) and are copied below:

> 1. This is the "fairly large" image:
>     ```
>     podman pull ghcr.io/martinpitt/swaypod:latest
>     time podman create --userns=keep-id ghcr.io/martinpitt/swaypod:latest
>     ```
> 2. This is the image that adds TeXlive (which makes it a few hundred MB larger):
>
>     ```
>     podman pull ghcr.io/martinpitt/swaypod:allpkgs
>     time podman create --userns=keep-id ghcr.io/martinpitt/swaypod:allpkgs
>     ```
>
> **Describe the results you received:**
>
> Step 1 takes 4 s on a Fedora 37 cloud VM (2 CPUs, 4 GiB RAM) with the default btrfs. On a standard RHEL 9.2 VM with XFS and on my laptop's Fedora 37 VM with /home being on ext4, it takes about 20 seconds. In `top` I see a process called "exe" which is taking 100% CPU:
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>     972 admin     20   0 1351936  65172  28028 S  96.0   1.7   0:12.33 exe
>
> That is really this:
>
>     admin       1972 95.0  1.3 1351680 49344 pts/0   Sl+  04:04   0:01 storage-chown-by-maps /home/admin/.local/share/containers/storage/overlay/3cc2d72c07248c18a9185b6a5bba0e7932b0ce5c26dbc763e476eb50c2a7ea94/merged
>
> With the larger image in step 2, the Fedora 37 btrfs VM takes merely 6 s. However, both on the RHEL 9.2 XFS VM and on my ext4 real-iron Fedora 37 laptop, the `storage-chown-by-maps` process never ends. After maybe half a minute it kills the VM (ssh is dead, and I cannot log into the virsh console either), and my laptop becomes so sluggish that I cannot even start `top` any more. Trying to `kill -9` that `storage-chown-by-maps` process, even with `sudo` (!), does not work either; it's just unkillable.
>
> **Describe the results you expected:**
>
> The `storage-chown-by-maps` process should finish eventually, and ideally reasonably fast. This is more or less a glorified `chown -R`, no? That shouldn't take more than a few seconds.
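
To make the "glorified `chown -R`" comparison concrete, here is a rough Python model of what a chown-by-maps pass has to do: visit every entry under a layer directory and translate its owner through a uid/gid map like the `idMappings` in the `podman info` output below. This is a hypothetical sketch only; `IDMap`, `map_id` and `plan_chown_by_maps` are illustrative names, not the containers/storage API (the real `storage-chown-by-maps` helper is Go code in containers/storage and may differ), and the sketch only plans the changes instead of calling `os.lchown`.

```python
import os
from typing import Iterator, NamedTuple, Optional


class IDMap(NamedTuple):
    """One uidmap/gidmap entry, shaped like `idMappings` in `podman info`."""
    container_id: int
    host_id: int
    size: int


def map_id(cid: int, mappings: list[IDMap]) -> Optional[int]:
    """Translate a container-side ID to a host-side ID, or None if unmapped."""
    for m in mappings:
        if m.container_id <= cid < m.container_id + m.size:
            return m.host_id + (cid - m.container_id)
    return None


def plan_chown_by_maps(
    root: str, uid_map: list[IDMap], gid_map: list[IDMap]
) -> Iterator[tuple[str, Optional[int], Optional[int]]]:
    """Hypothetical dry run of a recursive "chown by maps".

    Yields (path, new_uid, new_gid) for root and every entry below it.
    Nothing is modified; a real pass would call os.lchown(path, uid, gid).
    """
    def plan(path: str) -> tuple[str, Optional[int], Optional[int]]:
        st = os.lstat(path)  # lstat: inspect symlinks themselves, not targets
        return path, map_id(st.st_uid, uid_map), map_id(st.st_gid, gid_map)

    yield plan(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield plan(os.path.join(dirpath, name))


if __name__ == "__main__":
    # uidmap copied from the `podman info` output in this report.
    uid_map = [IDMap(0, 998, 1), IDMap(1, 10000, 65536)]
    print(map_id(0, uid_map))     # 998
    print(map_id(1000, uid_map))  # 10999
    print(sum(1 for _ in plan_chown_by_maps(".", uid_map, uid_map)))
```

With the `uidmap` from this report (container 0 → host 998, size 1; container 1 → host 10000, size 65536), container UID 1000 lands on host UID 10999. The point of the model is that the work is one `lstat` plus, in a real run, one `lchown` per file, so the cost grows with the number of files in the image. That would be consistent with the TeXlive image being much slower, though it does not by itself explain the machine becoming unresponsive.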


**Output of `podman version`:**

```
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.18.1
Built:        Thu Jan  1 00:00:00 1970
OS/Arch:      linux/amd64
```




<details>
    <summary> <b>Output of <code>podman info</code>:</b> </summary>

```
host:
  arch: amd64
  buildahVersion: 1.28.0
  cgroupControllers:
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon_2:2.1.5-0ubuntu22.04+obs14.3_amd64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.5, commit: '
  cpuUtilization:
    idlePercent: 96.22
    systemPercent: 0.76
    userPercent: 3.03
  cpus: 4
  distribution:
    codename: jammy
    distribution: ubuntu
    version: "22.04"
  eventLogger: file
  hostname: node29
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 998
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 998
      size: 1
    - container_id: 1
      host_id: 10000
      size: 65536
  kernel: 5.15.0-52-generic
  linkmode: dynamic
  logDriver: k8s-file
  memFree: 1288302592
  memTotal: 67404197888
  networkBackend: netavark
  ociRuntime:
    name: crun
    package: crun_1.7-0ubuntu22.04+obs47.1_amd64
    path: /usr/bin/crun
    version: |-
      crun version 1.7
      commit: 40d996ea8a827981895ce22886a9bac367f87264
      rundir: /run/user/998/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/998/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns_1.2.0-0ubuntu22.04+obs10.15_amd64
    version: |-
      slirp4netns version 1.2.0
      commit: 656041d45cfca7a4176f6b7eed9e4fe6c11e8383
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.3
  swapFree: 8575172608
  swapTotal: 8589930496
  uptime: 223h 19m 11.00s (Approximately 9.29 days)
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 1
    stopped: 0
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/gitlab-runner/.local/share/containers/storage
  graphRootAllocated: 983350071296
  graphRootUsed: 645746360320
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 202
  runRoot: /tmp/podman-run-998/containers
  volumePath: /home/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 4.3.1
  Built: 0
  BuiltTime: Thu Jan  1 00:00:00 1970
  GitCommit: ""
  GoVersion: go1.18.1
  Os: linux
  OsArch: linux/amd64
  Version: 4.3.1
```
</details>


**Package info:**

```
$ apt list podman
Listing... Done
podman/unknown,now 4:4.3.1-0ubuntu22.04+obs64.3 amd64 [installed]
podman/unknown 4:4.3.1-0ubuntu22.04+obs64.3 arm64
podman/unknown 4:4.3.1-0ubuntu22.04+obs64.3 armhf
podman/unknown 4:4.3.1-0ubuntu22.04+obs64.3 s390x
```

**Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/main/troubleshooting.md)**


Yes


--userns=keep-id storage-chown-by-maps kills machine with large images #16541