
feat: add support for single-scale ViT models via adapter #1265

Open
ha405 wants to merge 1 commit into qubvel-org:main from ha405:feature/vit-adapter-support

ha405 commented Feb 9, 2026

This PR adds support for timm Vision Transformer (ViT) models by implementing a ViTFeatureAdapter (#1244) that generates the multi-scale features required by SMP decoders.

Key Changes:

ViTFeatureAdapter: Converts single-scale features (e.g., 1/16 resolution) into a feature hierarchy at 1/4, 1/8, 1/16, and 1/32 resolution using learnable up- and down-sampling.
Auto-Detection: TimmUniversalEncoder now automatically detects and adapts ViT-style models.
Testing: Added tests/encoders/test_vit_adapter.py with full coverage for ViT architectures.
All existing and new encoder tests passed.
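
For reviewers who want a concrete picture of the adapter idea, here is a minimal sketch of how a single 1/16 feature map could be expanded into the four scales listed above. It is not the code in this PR: the class name `ViTFeatureAdapterSketch`, the `out_channels=256` default, and the choice of transposed convolutions for upsampling and a strided convolution for downsampling are all assumptions made for illustration (the layout resembles a simple feature pyramid).

```python
import torch
import torch.nn as nn


class ViTFeatureAdapterSketch(nn.Module):
    """Hypothetical stand-in for the PR's ViTFeatureAdapter (not its actual code)."""

    def __init__(self, in_channels: int, out_channels: int = 256):
        super().__init__()
        # 1/16 -> 1/4: two stride-2 transposed convolutions (4x upsampling).
        self.to_4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2),
            nn.GELU(),
            nn.ConvTranspose2d(out_channels, out_channels, kernel_size=2, stride=2),
        )
        # 1/16 -> 1/8: one stride-2 transposed convolution (2x upsampling).
        self.to_8 = nn.ConvTranspose2d(in_channels, out_channels, kernel_size=2, stride=2)
        # 1/16 -> 1/16: 1x1 projection, spatial size unchanged.
        self.to_16 = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        # 1/16 -> 1/32: stride-2 convolution (2x downsampling).
        self.to_32 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> list[torch.Tensor]:
        # x is the ViT token grid reshaped to (batch, channels, H/16, W/16).
        return [self.to_4(x), self.to_8(x), self.to_16(x), self.to_32(x)]


# A 224x224 image through a patch-16 ViT yields a 14x14 token grid.
tokens = torch.randn(2, 768, 14, 14)
adapter = ViTFeatureAdapterSketch(in_channels=768, out_channels=256)
features = adapter(tokens)
shapes = [tuple(f.shape) for f in features]
```

With the 14x14 input above, the four outputs have spatial sizes 56, 28, 14, and 7, which correspond to 1/4, 1/8, 1/16, and 1/32 of a 224x224 image.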
