This is the official repository of the article "Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?".
To run the code, you can either set up a conda environment and install the packages listed in requirements.txt (this setup does not cover the open-source models) or build the Docker container, which lets you launch the other models on your machine.
We use the dataset provided by Depeweg et al. [1], which contains the 100 original Bongard Problems (BPs) in high resolution. To download and unpack it, run:
cd data
wget --content-disposition 'https://files.de-1.osf.io/v1/resources/95dks/providers/osfstorage/65c674103280d80d5da3aa33/?zip='
unzip bpimgs.zip -d bpimgs
For the perception-focussed evaluation we considered the single diagrams of the BPs, which can be generated by executing:
python utils/crop_images.py
[1] Depeweg, S., Rothkopf, C.A., Jäkel, F. (2024). Solving Bongard Problems with a Visual Language and Pragmatic Constraints. Cognitive Science, 48(5), e13432.
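To give a rough idea of what cropping a BP into its single diagrams involves, here is a minimal sketch that computes one bounding box per diagram. It is not the code in utils/crop_images.py. It assumes the classic BP layout of six diagrams on the left and six on the right (three rows by two columns per side), and it also assumes, for simplicity, an evenly spaced grid with no borders or central gap, which the real images do not have. The function name `diagram_boxes` and its parameters are hypothetical.

```python
from typing import List, Tuple

# (left, upper, right, lower): the box order that Pillow's Image.crop expects
Box = Tuple[int, int, int, int]


def diagram_boxes(width: int, height: int,
                  rows: int = 3, cols_per_side: int = 2) -> List[Box]:
    """Return one crop box per diagram, left side first, then right side.

    ASSUMPTION: the image is an evenly spaced grid without margins.
    Real BP scans have borders and a gap between the two sides, so the
    offsets would need to be adjusted for actual use.
    """
    total_cols = 2 * cols_per_side
    cell_w = width // total_cols
    cell_h = height // rows

    boxes: List[Box] = []
    for side in range(2):                  # 0 = left side, 1 = right side
        for row in range(rows):
            for col in range(cols_per_side):
                left = (side * cols_per_side + col) * cell_w
                upper = row * cell_h
                boxes.append((left, upper, left + cell_w, upper + cell_h))
    return boxes
```

Each returned tuple could then be passed to an image library's crop call to save the diagram as its own file.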
The experimental scripts can be found in experiments/. You can execute them from the command line, e.g.,
python experiments/zero_shot_bp.py --model "gpt-4o"
Make sure to include your API access keys in the respective folders of the model, e.g., gpt-4o/open-ai-key.
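If you want to check that a key file is in place before starting a longer run, a small helper along these lines may be useful. This is only a sketch: the function `read_api_key` is hypothetical and not part of the repository, and it assumes the key is stored as plain text in the file, which the repository's own loading code may or may not expect.

```python
from pathlib import Path


def read_api_key(key_file: str) -> str:
    """Read an API key from a plain-text file and strip surrounding whitespace.

    ASSUMPTION: the file holds nothing but the key itself.
    """
    path = Path(key_file)
    if not path.is_file():
        raise FileNotFoundError(f"No API key file found at {path}")

    key = path.read_text(encoding="utf-8").strip()
    if not key:
        raise ValueError(f"API key file {path} is empty")
    return key
```

For example, `read_api_key("gpt-4o/open-ai-key")` would raise a clear error if the file is missing or empty instead of failing later during an API call.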
The results of the evaluations are stored in results/. The evaluation scripts, including the LLM judge, can be found in experiments/evaluate. You can run them from the command line as well, e.g.,
python experiments/zero_shot_bp.py --model "gpt-4o" --mode "zero_shot"
You can also use the demo provided in demo.ipynb to run a model on individual BPs and inspect the results.
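To run the command-line calls shown above for several models in a row, a thin wrapper such as the following could be used. It is a sketch, not part of the repository: `build_command` and `run_all` are hypothetical helpers, and only the `--model` and `--mode` flags that appear in the commands above are used. It assumes you start it from the repository root.

```python
import subprocess
import sys
from typing import List, Optional

SCRIPT = "experiments/zero_shot_bp.py"  # script path as used in the commands above


def build_command(model: str, mode: Optional[str] = None) -> List[str]:
    """Assemble the argument list for one run of the experiment script."""
    cmd = [sys.executable, SCRIPT, "--model", model]
    if mode is not None:
        cmd += ["--mode", mode]
    return cmd


def run_all(models: List[str], mode: Optional[str] = None,
            dry_run: bool = True) -> List[List[str]]:
    """Build one command per model; execute them only if dry_run is False."""
    commands = [build_command(m, mode) for m in models]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))             # show what would be executed
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing run
    return commands
```

Calling `run_all(["gpt-4o"])` prints the command without executing it; pass `dry_run=False` to actually launch the runs.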
If you find the code of this repository helpful, consider citing us.
@inproceedings{wust2bongard,
title={Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?},
author={W{\"u}st, Antonia and Tobiasch, Tim and Helff, Lukas and Ibs, Inga and Stammer, Wolfgang and Dhami, Devendra Singh and Rothkopf, Constantin A and Kersting, Kristian},
booktitle={Forty-second International Conference on Machine Learning},
year={2025}
}