These 2 presses have a __post_init_from_model__ method that loads data from the internet. When running an evaluation, the method is called for every sample, even if the model has not changed. The post init should check whether the data has already been loaded for the current model and skip the reload in that case.
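
One possible shape for that check, as a minimal sketch rather than the actual press code: remember which model the data was loaded for and return early when the same model comes in again. The class name `CachedPress`, the `loader` callable, the `_loaded_for` / `_data` attributes, and the use of a plain model name string as the cache key are all assumptions made for illustration; the real presses may need to key on something else (for example the model config).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class CachedPress:
    # Hypothetical stand-in for the function that downloads the data.
    loader: Callable[[str], Any]
    # Name of the model the data was last loaded for (None = nothing loaded yet).
    _loaded_for: Optional[str] = field(default=None, init=False, repr=False)
    _data: Any = field(default=None, init=False, repr=False)

    def __post_init_from_model__(self, model_name: str) -> None:
        # Skip the expensive load when the data for this model is already cached.
        if self._loaded_for == model_name:
            return
        self._data = self.loader(model_name)
        self._loaded_for = model_name


# Fake download that records every call, so the number of loads can be checked.
download_calls = []


def fake_download(model_name: str) -> dict:
    download_calls.append(model_name)
    return {"model": model_name}


press = CachedPress(loader=fake_download)
for _ in range(3):  # simulates three evaluation samples with the same model
    press.__post_init_from_model__("model-a")
press.__post_init_from_model__("model-b")  # a different model triggers a reload
```

With this guard the loader runs once per distinct model instead of once per sample. The cache only holds the most recent model, which keeps memory use flat when evaluations switch models.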