Im am currently trying to run a kfold trining loop. At the end of each iteration I free memory using gc.collect() and torch.cuda.empty_cache() but seems not to do the job completely. I leave the code here:
dataset = load_from_disk(os.path.join("data", cfg.kfold_kwargs.kfold_dataset_name))
folds = StratifiedKFold(n_splits=cfg.kfold_kwargs.n_splits)
splits = list(folds.split(np.zeros(dataset.num_rows), dataset[cfg.label_column]))
args = setfit.TrainingArguments(**cfg.train_kwargs)
all_metrics = []
for train_idxs, test_idxs in splits:
fold_dataset = DatasetDict(
{
"train": dataset.select(train_idxs),
"test": dataset.select(test_idxs),
}
)
trainer = setfit.Trainer(
model_init=model_init_fn(cfg.model_kwargs),
args=args,
train_dataset=fold_dataset["train"],
eval_dataset=fold_dataset["test"],
metric=custom_metrics_fn(fold_dataset, cfg.label_column),
column_mapping={"text": "text", cfg.label_column: "label"},
)
trainer.train()
# metrics = trainer.evaluate(fold_dataset["test"])
# all_metrics.append(metrics)
# print(metrics)
def memory_stats():
print(f"Memory allocated: {torch.cuda.memory_allocated()/1024**2}\nMemory reserved: {torch.cuda.memory_reserved()/1024**2}")
memory_stats()
del trainer.model.model_head, trainer.model.model_body
del fold_dataset, trainer
# torch.cuda.synchronize()
gc.collect()
torch.cuda.empty_cache()
memory_stats()
print('\n')
and my setup:
accelerate==1.0.1
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
alembic==1.13.3
antlr4-python3-runtime==4.9.3
async-timeout==4.0.3
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.4.0
colorlog==6.8.2
datasets==3.0.1
dill==0.3.8
evaluate==0.4.3
filelock==3.16.1
frozenlist==1.4.1
fsspec==2024.6.1
greenlet==3.1.1
huggingface-hub==0.26.1
hydra-core==1.3.2
idna==3.10
Jinja2==3.1.4
joblib==1.4.2
Mako==1.3.6
MarkupSafe==3.0.2
mpmath==1.3.0
multidict==6.1.0
multiprocess==0.70.16
networkx==3.4.2
numpy==2.1.2
nvidia-cublas-cu12==12.4.5.8
nvidia-cuda-cupti-cu12==12.4.127
nvidia-cuda-nvrtc-cu12==12.4.127
nvidia-cuda-runtime-cu12==12.4.127
nvidia-cudnn-cu12==9.1.0.70
nvidia-cufft-cu12==11.2.1.3
nvidia-curand-cu12==10.3.5.147
nvidia-cusolver-cu12==11.6.1.9
nvidia-cusparse-cu12==12.3.1.170
nvidia-nccl-cu12==2.21.5
nvidia-nvjitlink-cu12==12.4.127
nvidia-nvtx-cu12==12.4.127
omegaconf==2.3.0
optuna==4.0.0
packaging==24.1
pandas==2.2.3
pillow==11.0.0
propcache==0.2.0
psutil==6.1.0
pyarrow==17.0.0
python-dateutil==2.9.0.post0
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
safetensors==0.4.5
scikit-learn==1.5.2
scipy==1.14.1
sentence-transformers==3.2.1
setfit==1.1.0
six==1.16.0
SQLAlchemy==2.0.36
sympy==1.13.1
threadpoolctl==3.5.0
tokenizers==0.20.1
torch==2.5.0
tqdm==4.66.5
transformers==4.45.2
triton==3.1.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
xxhash==3.5.0
yarl==1.16.0
I also leave the memory printed at each iteration:
Memory allocated: 279.685546875
Memory reserved: 596.0
Memory allocated: 279.685546875
Memory reserved: 342.0
Memory allocated: 411.4501953125
Memory reserved: 738.0
Memory allocated: 411.4501953125
Memory reserved: 484.0
Memory allocated: 542.93359375
Memory reserved: 876.0
Memory allocated: 542.93359375
Memory reserved: 626.0
Memory allocated: 674.4638671875
Memory reserved: 1052.0
Memory allocated: 674.4638671875
Memory reserved: 780.0
Does anyone could suggest the reason?
Im am currently trying to run a kfold trining loop. At the end of each iteration I free memory using
gc.collect()andtorch.cuda.empty_cache()but seems not to do the job completely. I leave the code here:and my setup:
I also leave the memory printed at each iteration:
Memory allocated: 279.685546875
Memory reserved: 596.0
Memory allocated: 279.685546875
Memory reserved: 342.0
Memory allocated: 411.4501953125
Memory reserved: 738.0
Memory allocated: 411.4501953125
Memory reserved: 484.0
Memory allocated: 542.93359375
Memory reserved: 876.0
Memory allocated: 542.93359375
Memory reserved: 626.0
Memory allocated: 674.4638671875
Memory reserved: 1052.0
Memory allocated: 674.4638671875
Memory reserved: 780.0
Does anyone could suggest the reason?