
PyTorch hangs in multiprocess test #25753

@puneet336

Description

We are trying to build PyTorch on RHEL 9.5 using PyTorch-2.3.0-foss-2023b.eb, and the build hangs during the test step:
This is EasyBuild 5.1.0 (framework: 5.1.0, easyblocks: 5.1.0)

[user@server pytorch]$ eb PyTorch-2.3.0-foss-2023b.eb --robot
== Temporary log file in case of crash /scratch/tmp/eb-lwx7vib8/easybuild-9ttse133.log
== found valid index for /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs, so using it...
== found valid index for /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs, so using it...
== resolving dependencies ...
== processing EasyBuild easyconfig /home/singhpuv/MySoftware/busdev/pytorch/PyTorch-2.3.0-foss-2023b.eb
== building and installing PyTorch/2.3.0-foss-2023b...
  >> installation prefix: /CHBS/apps/busdev_apps/eb/software/PyTorch/2.3.0-foss-2023b
== fetching files and verifying checksums...
  >> download succeeded: https://github.com/pytorch/PyTorch/releases/download/v2.3.0/pytorch-v2.3.0.tar.gz
  >> sources:
  >> /CHBS/apps/busdev_apps/eb/sources/p/PyTorch/pytorch-v2.3.0.tar.gz [SHA256: 69579513b26261bbab32e13b7efc99ad287fcf3103087f2d4fdf1adacd25316f]
  >> patches:
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.7.0_disable-dev-shm-test.patch [SHA256: 622cb1eaeadc06e13128a862d9946bcc1f1edd3d02b259c56a9aecc4d5406b8a]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.12.1_add-hypothesis-suppression.patch [SHA256: e71ffb94ebe69f580fa70e0de84017058325fdff944866d6bd03463626edc32c]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch [SHA256: 1efc9850c431d702e9117d4766277d3f88c5c8b3870997c9974971bce7f2ab83]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.12.1_fix-TestTorch.test_to.patch [SHA256: 75f27987c3f25c501e719bd2b1c70a029ae0ee28514a97fe447516aee02b1535]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.12.1_skip-test_round_robin.patch [SHA256: 63d4849b78605aa088fdff695637d9473ea60dee603a3ff7f788690d70c55349]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch [SHA256: 5c7be91a6096083a0b1315efe0001537499c600f1f569953c6a2c7f4cc1d0910]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_fix-protobuf-dependency.patch [SHA256: 8bd755a0cab7233a243bc65ca57c9630dfccdc9bf8c9792f0de4e07a644fcb00]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch [SHA256: bdde0f2105215c95a54de64ec4b1a4520528510663174fef6d5b900eb1db3937]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_skip-failing-singular-grad-test.patch [SHA256: 72688a57b2bb617665ad1a1d5e362c5111ae912c10936bb38a089c0204729f48]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-1.13.1_skip-tests-without-fbgemm.patch [SHA256: 481e595f673baf8ae58b41697a6792b83048b0264aa79b422f48cd8c22948bb7]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_avoid-test_quantization-failures.patch [SHA256: 02e3f47e4ed1d7d6077e26f1ae50073dc2b20426269930b505f4aefe5d2f33cd]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_fix-skip-decorators.patch [SHA256: 2039012cef45446065e1a2097839fe20bb29fe3c1dcc926c3695ebf29832e920]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_fix-vsx-loadu.patch [SHA256: a0ffa61da2d47c6acd09aaf6d4791e527d8919a6f4f1aa7ed38454cdcadb1f72]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_skip-failing-gradtest.patch [SHA256: 8030bdec6ba49b057ab232d19a7f1a5e542e47e2ec340653a246ec9ed59f8bc1]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch [SHA256: 7047862abc1abaff62954da59700f36d4f39fcf83167a638183b1b7f8fec78ae]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch [SHA256: 166c134573a95230e39b9ea09ece3ad8072f39d370c9a88fb2a1e24f6aaac2b5]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch [SHA256: 3793b4b878be1abe7791efcbd534774b87862cfe7dc4774ca8729b6cabb39e7e]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch [SHA256: aef38adf1210d0c5455e91d7c7a9d9e5caad3ae568301e0ba9fc204309438e7b]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_remove-test-requiring-online-access.patch [SHA256: 35184b8c5a1b10f79e511cc25db3b8a5585a5d58b5d1aa25dd3d250200b14fd7]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_skip-diff-test-on-ppc.patch [SHA256: 394157dbe565ffcbc1821cd63d05930957412156cc01e949ef3d3524176a1dda]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_skip-dynamo-test_predispatch.patch [SHA256: 6298daf9ddaa8542850eee9ea005f28594ab65b1f87af43d8aeca1579a8c4354]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch [SHA256: 5229ca88a71db7667a90ddc0b809b2c817698bd6e9c5aaabd73d3173cf9b99fe]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch [SHA256: 7ace835af60c58d9e0754a34c19d4b9a0c3a531f19e5d0eba8e2e49206eaa7eb]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch [SHA256: fb96eefabf394617bbb3fbd3a7a7c1aa5991b3836edc2e5d2a30e708bfe49ba1]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch [SHA256: 23416f2d9d5226695ec3fbea0671e3650c655c19deefd3f0f8ddab5afa50f485]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch [SHA256: 0dcbdfde6752c3ff54c5376f521b4a742167669feb7f0f1d4e1d4d55f72b664f]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch [SHA256: 29fb95d1dba070133b513de050febd328ed36905a73f1ca135dc633f16beafa4]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_skip-test_init_from_local_shards.patch [SHA256: 90ed9c2870f57ee6dc032d00873a37e2217a2b92a13035ded1c25ad5306455f2]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_no-cuda-stubs-rpath.patch [SHA256: 7ba26824b5def7379cff02ae821a080698e6affea0da45bc846e9ecb89939cb1]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_disable-gcc12-warning.patch [SHA256: a8a624e1a2a5f4c82610173e50bd0f853e49bd5621b432f5aac689f9f6eb1514]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch [SHA256: 36aa2d5ba175be17f4e996f4fb2d544fe477d4a0bd0644cd59a85063779afc8e]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_disable_tests_which_need_network_download.patch [SHA256: b7fd1a5135dfd4098cdc054182f7bf84a23ac98462a00477712182b5442da855]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch [SHA256: 041adcd91d994b8c2ab57d227f081cd57e572c157117b37171e1eb8eb576f8fc]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch [SHA256: aa6ff764f3f7bf84372a8a257fe1b4ae6dc4b9744ad35f0f9015f2696c62a41e]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_skip_test_var_mean_differentiable.patch [SHA256: 9703fd0f1fca8916f6d79d83e9a7efe8e3f717362a5fdaa8f5d9da90d0c75018]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch [SHA256: 7955f2655db3da18606574fdcbc5990be24098f49ad1db5e86ea756ea1cc506f]
  >> /EB_TOP/software/EasyBuild/5.1.0/easybuild/easyconfigs/p/PyTorch/PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch [SHA256: ee07d21c3ac7aeb0bd0e39507b18a417b9125284a529102929c4b5c6727c2976]
== ... (took 13 secs)
== creating build dir, resetting environment...
  >> build dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b
== ... (took < 1 sec)
== unpacking...
  >> running shell command:
        tar xzf /CHBS/apps/busdev_apps/eb/sources/p/PyTorch/pytorch-v2.3.0.tar.gz
        [started at: 2026-04-12 03:41:29]
        [working dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b]
        [output and state saved to /scratch/tmp/eb-lwx7vib8/run-shell-cmd-output/tar-knj5u09r]
  >> command completed: exit 0, ran in 00h01m05s
== ... (took 1 min 5 secs)
== patching...
  >> applying patch PyTorch-1.7.0_disable-dev-shm-test.patch
  >> applying patch PyTorch-1.12.1_add-hypothesis-suppression.patch
  >> applying patch PyTorch-1.12.1_fix-test_cpp_extensions_jit.patch
  >> applying patch PyTorch-1.12.1_fix-TestTorch.test_to.patch
  >> applying patch PyTorch-1.12.1_skip-test_round_robin.patch
  >> applying patch PyTorch-1.13.1_fix-gcc-12-warning-in-fbgemm.patch
  >> applying patch PyTorch-1.13.1_fix-protobuf-dependency.patch
  >> applying patch PyTorch-1.13.1_fix-warning-in-test-cpp-api.patch
  >> applying patch PyTorch-1.13.1_skip-failing-singular-grad-test.patch
  >> applying patch PyTorch-1.13.1_skip-tests-without-fbgemm.patch
  >> applying patch PyTorch-2.0.1_avoid-test_quantization-failures.patch
  >> applying patch PyTorch-2.0.1_fix-skip-decorators.patch
  >> applying patch PyTorch-2.0.1_fix-vsx-loadu.patch
  >> applying patch PyTorch-2.0.1_skip-failing-gradtest.patch
  >> applying patch PyTorch-2.0.1_skip-test_shuffle_reproducibility.patch
  >> applying patch PyTorch-2.0.1_skip-tests-skipped-in-subprocess.patch
  >> applying patch PyTorch-2.1.0_fix-vsx-vector-shift-functions.patch
  >> applying patch PyTorch-2.1.0_increase-tolerance-functorch-test_vmapvjpvjp.patch
  >> applying patch PyTorch-2.1.0_remove-test-requiring-online-access.patch
  >> applying patch PyTorch-2.1.0_skip-diff-test-on-ppc.patch
  >> applying patch PyTorch-2.1.0_skip-dynamo-test_predispatch.patch
  >> applying patch PyTorch-2.1.0_skip-test_jvp_linalg_det_singular.patch
  >> applying patch PyTorch-2.1.2_skip-cpu_repro-test-without-vectorization.patch
  >> applying patch PyTorch-2.1.2_workaround_dynamo_failure_without_nnpack.patch
  >> applying patch PyTorch-2.3.0_disable_test_linear_package_if_no_half_types_are_available.patch
  >> applying patch PyTorch-2.3.0_disable_DataType_dependent_test_if_tensorboard_is_not_available.patch
  >> applying patch PyTorch-2.3.0_fix-cpuinfo-bug-with-smt.patch
  >> applying patch PyTorch-2.3.0_skip-test_init_from_local_shards.patch
  >> applying patch PyTorch-2.3.0_no-cuda-stubs-rpath.patch
  >> applying patch PyTorch-2.3.0_disable-gcc12-warning.patch
  >> applying patch PyTorch-2.3.0_fix-test_extension_backend-without-vectorization.patch
  >> applying patch PyTorch-2.3.0_disable_tests_which_need_network_download.patch
  >> applying patch PyTorch-2.3.0_avoid_caffe2_test_cpp_jit.patch
  >> applying patch PyTorch-2.3.0_fix_missing_masked_load_for_int_type.patch
  >> applying patch PyTorch-2.3.0_skip_test_var_mean_differentiable.patch
  >> applying patch PyTorch-2.3.0_skip_test_sdpa_nn_functional_scaled_dot_product_attention_cpu.patch
  >> applying patch PyTorch-2.3.0_fix-mkldnn-avx512-f32-bias.patch
== ... (took < 1 sec)
== preparing...
  >> loading toolchain module: foss/2023b
  >> loading modules for build dependencies:
  >>  * CMake/3.27.6-GCCcore-13.2.0
  >>  * hypothesis/6.90.0-GCCcore-13.2.0
  >>  * pytest-flakefinder/1.1.0-GCCcore-13.2.0
  >>  * pytest-rerunfailures/14.0-GCCcore-13.2.0
  >>  * pytest-shard/0.1.2-GCCcore-13.2.0
  >>  * tlparse/0.3.5-GCCcore-13.2.0
  >>  * optree/0.13.0-GCCcore-13.2.0
  >>  * unittest-xml-reporting/3.1.0-GCCcore-13.2.0
  >> loading modules for (runtime) dependencies:
  >>  * Ninja/1.11.1-GCCcore-13.2.0
  >>  * Python/3.11.5-GCCcore-13.2.0
  >>  * Python-bundle-PyPI/2023.10-GCCcore-13.2.0
  >>  * protobuf/25.3-GCCcore-13.2.0
  >>  * protobuf-python/4.25.3-GCCcore-13.2.0
  >>  * pybind11/2.11.1-GCCcore-13.2.0
  >>  * SciPy-bundle/2023.11-gfbf-2023b
  >>  * PyYAML/6.0.1-GCCcore-13.2.0
  >>  * MPFR/4.2.1-GCCcore-13.2.0
  >>  * GMP/6.3.0-GCCcore-13.2.0
  >>  * numactl/2.0.16-GCCcore-13.2.0
  >>  * FFmpeg/6.0-GCCcore-13.2.0
  >>  * Pillow/10.2.0-GCCcore-13.2.0
  >>  * expecttest/0.2.1-GCCcore-13.2.0
  >>  * networkx/3.2.1-gfbf-2023b
  >>  * sympy/1.12-gfbf-2023b
  >>  * Z3/4.13.0-GCCcore-13.2.0
  >> defining build environment for foss/2023b toolchain
== ... (took 4 secs)
== configuring...
  >> running shell command:
        /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -c 'import xmlrunner'
        [started at: 2026-04-12 03:42:39]
        [working dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b/pytorch-v2.3.0]
        [output and state saved to /scratch/tmp/eb-lwx7vib8/run-shell-cmd-output/python-sdo3qd7w]
  >> command completed: exit 0, ran in < 1s
== ... (took < 1 sec)
== building...
  >> running shell command:
        PYTORCH_BUILD_VERSION=2.3.0 PYTORCH_BUILD_NUMBER=1 VERBOSE=0 MAX_JOBS=16 BLAS=FlexiBLAS WITH_BLAS=flexi USE_FFMPEG=1 BUILD_CUSTOM_PROTOBUF=0 USE_SYSTEM_PYBIND11=1 USE_IBVERBS=1 USE_CUDA=0 USE_ROCM=0 USE_METAL=0 CMAKE_BUILD_TYPE=Release   /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python setup.py build
        [started at: 2026-04-12 03:42:39]
        [working dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b/pytorch-v2.3.0]
        [output and state saved to /scratch/tmp/eb-lwx7vib8/run-shell-cmd-output/PYTORCH_BUILD_VERSION230-myiqw9h_]
  >> command completed: exit 0, ran in 00h26m39s
== ... (took 26 mins 39 secs)
== testing...
  >> running shell command:
        export PYTHONPATH=/scratch/tmp/eb-lwx7vib8/tmpofidlzws/lib/python3.11/site-packages:$PYTHONPATH &&  PYTORCH_BUILD_VERSION=2.3.0 PYTORCH_BUILD_NUMBER=1 VERBOSE=0 MAX_JOBS=16 BLAS=FlexiBLAS WITH_BLAS=flexi USE_FFMPEG=1 BUILD_CUSTOM_PROTOBUF=0 USE_SYSTEM_PYBIND11=1 USE_IBVERBS=1 USE_CUDA=0 USE_ROCM=0 USE_METAL=0 CMAKE_BUILD_TYPE=Release   /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -m pip install --prefix=/scratch/tmp/eb-lwx7vib8/tmpofidlzws  --verbose --no-deps --ignore-installed --no-index --no-build-isolation .
        [started at: 2026-04-12 04:09:19]
        [working dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b/pytorch-v2.3.0]
        [output and state saved to /scratch/tmp/eb-lwx7vib8/run-shell-cmd-output/export-x6tmlcnb]
  >> command completed: exit 0, ran in 00h02m23s
  >> running shell command:
        export PYTHONPATH=/scratch/tmp/eb-lwx7vib8/tmpofidlzws/lib/python3.11/site-packages:$PYTHONPATH &&  cd test && PYTHONUNBUFFERED=1 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python run_test.py --continue-through-error --pipe-logs --verbose -x distributed/test_distributed_spawn distributions/test_constraints doctests test_native_mha distributed/rpc/test_tensorpipe_agent test_ci_sanity_check_fail test_cpp_extensions_open_device_registration
        [started at: 2026-04-12 04:11:43]
        [working dir: /CHBS/apps/busdev_apps/eb/buildpath/PyTorch/2.3.0/foss-2023b/pytorch-v2.3.0]
        [output and state saved to /scratch/tmp/eb-lwx7vib8/run-shell-cmd-output/export-7ylab5c_]

^C
WARNING: signal received (2), cleaning up locks (_CHBS_apps_busdev_apps_eb_software_PyTorch_2.3.0-foss-2023b)...

== ... (took 9 hours 15 mins 27 secs)

Here are the running processes just before I terminated the build:


[user@server ~]$ top -b -n1 -u singhpuv -c
top - 13:18:55 up 23 days, 16:52,  2 users,  load average: 81.43, 89.50, 83.07
Tasks: 925 total,   3 running, 912 sleeping,   0 stopped,  10 zombie
%Cpu(s): 91.8 us,  7.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st
MiB Mem : 515508.0 total,  95909.3 free,  38914.3 used, 385549.9 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used. 476593.7 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
3836884 singhpuv  20   0   16.5g   3.0g 118632 R  4175   0.6 535:18.05 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -bb test_ops.py --shard-id=1 --num-shards=1 -v -vv -rfEX -p no:xdist --use-pytest -x --reruns=2 --sc=test_ops_1_ad92143599fc738b --print-items
3841652 singhpuv  20   0   12.8g 568656  94028 R  65.0   0.1 577:36.42 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -bb test_ops_fwd_gradients.py --shard-id=1 --num-shards=1 -v -vv -rfEX -p no:xdist --use-pytest -x --reruns=2 --sc=test_ops_fwd_gradients_1_99b6d2f82ad8adf1 --print-items
3870669 singhpuv  20   0   13884   5376   3840 R  10.0   0.0   0:00.05 top -b -n1 -u singhpuv -c
 239494 singhpuv  20   0    9420   3608   2304 S   0.0   0.0   0:04.52 tmux new -s PARAVIEW
 239495 singhpuv  20   0   11600   6144   3840 S   0.0   0.0   0:00.05 -bash
 240154 singhpuv  20   0    5628   1920   1920 S   0.0   0.0   0:00.50 script
 240156 singhpuv  20   0   13532   7680   3840 S   0.0   0.0   0:00.84 bash -i
1437764 singhpuv  20   0   36540   7528   5760 S   0.0   0.0   0:02.35 sshd: singhpuv@pts/4
1437765 singhpuv  20   0   11560   5376   3456 S   0.0   0.0   0:00.07 -bash
1440458 singhpuv  20   0  170348 143552  12672 S   0.0   0.0   0:02.38 python3 -m easybuild.main PyTorch-2.3.0-foss-2023b.eb --robot
1468834 singhpuv  20   0  215836   8200   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1562411 singhpuv  20   0   36536   7140   5376 S   0.0   0.0   0:00.11 sshd: singhpuv@pts/7
1562475 singhpuv  20   0   11428   4992   3072 S   0.0   0.0   0:00.06 -bash
1567828 singhpuv  20   0  215836   8196   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1572432 singhpuv  20   0  215836   8200   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1584639 singhpuv  20   0 6992628 226220  54144 S   0.0   0.0   0:03.01 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python run_test.py --continue-through-error --pipe-logs --verbose -x distributed/test_distributed_spawn distributions/test_constraints doctests test_native_mha distributed/rpc/test_tensorpipe_agent test_ci_sanity_check_fail test_cpp_extensions_open_device_registration
1584766 singhpuv  20   0   16472  12288   6144 S   0.0   0.0   0:00.04 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -s -c from multiprocessing.resource_tracker import main;main(5)
1622116 singhpuv  20   0   33632  23336  10752 S   0.0   0.0   5:46.35 /usr/lib/systemd/systemd --user
1622118 singhpuv  20   0  202176  18172   1536 S   0.0   0.0   0:00.00 (sd-pam)
1622817 singhpuv  20   0   13040   4608   4224 S   0.0   0.0   0:00.00 /usr/bin/dbus-broker-launch --scope user
1622830 singhpuv  20   0  215836   7816   5376 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1622831 singhpuv  20   0    4888   2304   2304 S   0.0   0.0   0:01.23 dbus-broker --log 4 --controller 10 --machine-id cb3948eb3517494da85173d7993e15d4 --max-bytes 100000000000000 --max-fds 25000000000000 --max-matches 5000000000
1630566 singhpuv  20   0 4728492 245332 176256 S   0.0   0.0   2:59.51 /home/singhpuv/.MathWorks/ServiceHost/glchbs-sp200321/v2026.4.0.6/bin/glnxa64/MathWorksServiceHost service --realm-id companion@prod@production
1630876 singhpuv  20   0 1736496  64896  54144 S   0.0   0.0   2:21.96 /home/singhpuv/.MathWorks/ServiceHost/-mw_shared_installs/v2026.4.0.6/bin/glnxa64/MathWorksServiceHost-Monitor --client-id 1630566 --realm-id companion@prod@production --lifetime-token 45 -1
1635982 singhpuv  20   0    5500   3840   3456 S   0.0   0.0   0:00.00 tmux a -t pytorch
1642511 singhpuv  20   0  215836   8240   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1679232 singhpuv  20   0   16472  13056   6144 S   0.0   0.0   0:00.03 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -s -c from multiprocessing.resource_tracker import main;main(7)
1752818 singhpuv  20   0 7070944 284900  61056 S   0.0   0.1   0:16.03 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=8, pipe_handle=14) --multiprocessing-fork
1755540 singhpuv  20   0   98020  11752   7628 S   0.0   0.0   0:00.10 orted --hnp --set-sid --report-uri 20 --singleton-died-pipe 21 -mca state_novm_select 1 -mca ess hnp -mca pmix ^s1,s2,cray,isolated
1841247 singhpuv  20   0  215836   7816   5376 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
1871345 singhpuv  20   0  215836   8196   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
2891909 singhpuv  20   0   11732   5760   3840 S   0.0   0.0   0:00.06 -bash
2893492 singhpuv  20   0    5628   1920   1920 S   0.0   0.0   0:00.26 script
2893494 singhpuv  20   0   11736   6144   3840 S   0.0   0.0   0:00.27 bash -i
3099987 singhpuv  20   0  215836   6144   5760 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
3475735 singhpuv  20   0  215836   7812   5376 S   0.0   0.0   0:00.00 /usr/libexec/geoclue-2.0/demos/agent
3836515 singhpuv  20   0   16788   9600   6528 S   0.0   0.0   0:00.02 vim /scratch/tmp/eb-vnnbqzjz/run-shell-cmd-output/cd-0b78yaha/out.txt
3836829 singhpuv  20   0 6762956 196152  53376 S   0.0   0.0   0:01.45 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=13) --multiprocessing-fork
3841650 singhpuv  20   0 6763088 193840  54144 S   0.0   0.0   0:01.14 /CHBS/apps/busdev_apps/eb/software/Python/3.11.5-GCCcore-13.2.0/bin/python -s -c from multiprocessing.spawn import spawn_main; spawn_main(tracker_fd=6, pipe_handle=13) --multiprocessing-fork
[user@server ~]$
