Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
Any Environment.
Reproduces the problem - code/configuration sample
config.py
from mmengine.config import read_base

with read_base():
    from opencompass.configs.datasets.SmolInstruct.smolinstruct_nc_0shot_instruct import mini_nc_0shot_instruct_datasets
    from opencompass.configs.datasets.SmolInstruct.smolinstruct_pp_acc_0_shot_instruct import mini_pp_acc_datasets_0shot_instruct
    from opencompass.configs.datasets.SmolInstruct.smolinstruct_rmse_0shot_instruct import mini_pp_rmse_0shot_instruct_datasets
    from opencompass.configs.datasets.SmolInstruct.smolinstruct_fts_0shot_instruct import mini_fts_0shot_instruct_datasets
    from opencompass.configs.datasets.SmolInstruct.smolinstruct_meteor_0shot_instruct import mini_meteor_0shot_instruct_datasets
    from opencompass.configs.models.internlm.internlm_7b import models

# Combine all mini SmolInstruct subsets into one dataset list.
mini_smolinstruct_datasets_0shot_instruct = (
    mini_nc_0shot_instruct_datasets
    + mini_pp_rmse_0shot_instruct_datasets
    + mini_pp_acc_datasets_0shot_instruct
    + mini_meteor_0shot_instruct_datasets
    + mini_fts_0shot_instruct_datasets
)

from opencompass.partitioners import NaivePartitioner, NumWorkerPartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLEvalTask, OpenICLInferTask

infer = dict(
    partitioner=dict(type=NumWorkerPartitioner, num_worker=8),
    runner=dict(type=LocalRunner,
                max_num_workers=8,
                task=dict(type=OpenICLInferTask)),
)

eval = dict(
    partitioner=dict(type=NaivePartitioner, n=10),
    runner=dict(type=LocalRunner,
                max_num_workers=256,
                task=dict(type=OpenICLEvalTask)),
)

work_dir = 'outputs/test/SmolInstruct'
Reproduces the problem - command or script
opencompass /path/to/config.py --reuse
Reproduces the problem - error message
I redirected the dataset path at line 63 of opencompass/configs/datasets/SmolInstruct/smolinstruct_fts_0shot_instruct.py, and likewise in the other mini subset configs, to an already downloaded and decompressed copy. When the evaluation started loading the data, this error was raised:
TypeError: Couldn't cast array of type struct<Hepatobiliary disorders: string, Metabolism and nutrition disorders: string, Eye disorders: string, Musculoskeletal and connective tissue disorders: string, Gastrointestinal disorders: string, Immune system disorders: string, Reproductive system and breast disorders: string, Neoplasms benign, malignant and unspecified (incl cysts and polyps): string, Endocrine disorders: string, Vascular disorders: string, Blood and lymphatic system disorders: string, Skin and subcutaneous tissue disorders: string, Congenital, familial and genetic disorders: string, Respiratory, thoracic and mediastinal disorders: string, Psychiatric disorders: string, Renal and urinary disorders: string, Pregnancy, puerperium and perinatal conditions: string, Ear and labyrinth disorders: string, Cardiac disorders: string, Nervous system disorders: string> to string
Other information
Unlike the other SmolInstruct files, where output is a plain string, property_prediction-sider.jsonl stores output as a dictionary. Reading all files under the pp directory with a uniform parser therefore fails on this specific file: the dictionary is read as a struct, which cannot be cast to the string type expected for output.
Sample entry from property_prediction-sider.jsonl:
{
"input": "NC1=CC=C(N=NC2=CC=CC=C2)C(N)=N1",
"output": {
"Hepatobiliary disorders": "Yes",
"Metabolism and nutrition disorders": "No",
"Eye disorders": "Yes",
"Musculoskeletal and connective tissue disorders": "No",
"Gastrointestinal disorders": "Yes",
"Immune system disorders": "Yes",
"Reproductive system and breast disorders": "No",
"Neoplasms benign, malignant and unspecified (incl cysts and polyps)": "No",
"Endocrine disorders": "No",
"Vascular disorders": "No",
"Blood and lymphatic system disorders": "Yes",
"Skin and subcutaneous tissue disorders": "Yes",
"Congenital, familial and genetic disorders": "Yes",
"Respiratory, thoracic and mediastinal disorders": "No",
"Psychiatric disorders": "No",
"Renal and urinary disorders": "Yes",
"Pregnancy, puerperium and perinatal conditions": "No",
"Ear and labyrinth disorders": "No",
"Cardiac disorders": "No",
"Nervous system disorders": "Yes"
},
"task": "property_prediction-sider",
"split": "test"
}
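One possible workaround, sketched here under assumptions, is to serialize dictionary-valued output fields to JSON strings before the records reach the uniform parser, so that every file presents output as a string. The helper name normalize_output and the second sample record are hypothetical and are not part of OpenCompass or SmolInstruct; the first record is a shortened copy of the entry above.

```python
import json


def normalize_output(record: dict) -> dict:
    """Return a copy of record whose "output" field is always a string."""
    fixed = dict(record)
    output = fixed.get("output")
    if isinstance(output, dict):
        # Serialize the label dictionary; sort_keys keeps the result deterministic.
        fixed["output"] = json.dumps(output, sort_keys=True)
    return fixed


# Shortened copy of the sider entry shown above (dictionary output).
sider_like = {
    "input": "NC1=CC=C(N=NC2=CC=CC=C2)C(N)=N1",
    "output": {"Hepatobiliary disorders": "Yes", "Eye disorders": "Yes"},
    "task": "property_prediction-sider",
    "split": "test",
}
# Hypothetical record in the plain-string shape used by the other files.
plain = {"input": "CCO", "output": "Yes", "task": "hypothetical-task", "split": "test"}

normalized = [normalize_output(r) for r in (sider_like, plain)]
```

With this approach the evaluator for the sider task would need to call json.loads on the reference to recover the per-label answers, so it is a sketch of the data-side fix only.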
To ensure reproducibility across the community, we propose one of the following:
- Provide a unified, revised version of the dataset; or
- Update the evaluation pipeline's data-loading configuration.
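Either option starts with knowing which files deviate from the string convention. The following is a minimal sketch, assuming the decompressed dataset is a directory of .jsonl files with one JSON object per line; the function name files_with_non_string_output is hypothetical and the directory path is left to the caller.

```python
import json
from pathlib import Path


def files_with_non_string_output(directory):
    """List names of .jsonl files containing a record whose "output" is not a str."""
    offenders = []
    for path in sorted(Path(directory).glob("*.jsonl")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                if not isinstance(json.loads(line).get("output"), str):
                    offenders.append(path.name)
                    break  # one offending record is enough to flag the file
    return offenders
```

Running it on the pp directory would be expected to flag property_prediction-sider.jsonl, based on the sample entry above; any other file it reports would need the same treatment.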