Ker-VLJEPA-3B

Details

Architecture: Llama 3.2 3B + LeJEPA + Z-Zoned Perceiver + Flamingo cross-attention

Parameters: ~3.2B + 1.7 GB LoRA + 320 MB bridge

Base Model: meta-llama/Llama-3.2-3B

Relation: finetune

License: cc-by-nc-sa-4.0

A multimodal vision-language model that generates free-text radiology reports from CT slice embeddings. It sets a new state of the art on the CT-RATE benchmark, with a macro F1 of 0.429 against the previous best of 0.414 (U-VLM).

Architecture

Language model: Llama 3.2 3B with LoRA (rank 64, alpha 128)
Visual encoder: Guided-Chest-CT-LeJEPA
Bridge: Z-Zoned Perceiver
Cross-attention: Flamingo-style gated
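
The "Flamingo-style gated" cross-attention listed above can be illustrated with a small sketch. It assumes the gate follows the usual Flamingo pattern, a learnable scalar passed through tanh and initialised at zero, which this card does not spell out. The learned query/key/value projections are omitted for brevity, and every name here (`cross_attend`, `gated_cross_attention`, the toy vectors) is hypothetical rather than taken from the model's code.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, visual_tokens):
    """Single-head scaled dot-product attention of one text vector
    over a list of visual tokens (no learned projections in this sketch)."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * v for q, v in zip(query, tok))
              for tok in visual_tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, visual_tokens))
            for i in range(len(query))]

def gated_cross_attention(text_vec, visual_tokens, gate):
    """Residual update: text + tanh(gate) * attention(text, visual).
    With gate == 0 the visual branch contributes nothing, so the
    language model's original activations pass through unchanged."""
    attended = cross_attend(text_vec, visual_tokens)
    g = math.tanh(gate)
    return [t + g * a for t, a in zip(text_vec, attended)]

# Toy inputs (assumed values, for illustration only).
text_vec = [0.5, -1.0, 0.25]
visual_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

closed = gated_cross_attention(text_vec, visual_tokens, gate=0.0)
opened = gated_cross_attention(text_vec, visual_tokens, gate=1.0)
```

With the gate at zero, `closed` equals `text_vec`. This is why the design suits adding vision to a pretrained language model: training starts from the unmodified text behaviour, and visual information is mixed in only as the gate moves away from zero.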

Results

Metric	Value	Previous SOTA
Macro F1	0.429	0.414 (U-VLM)
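
For readers unfamiliar with the metric, here is a minimal sketch of how a macro F1 score is typically computed. It assumes the unweighted mean of per-label F1 scores, with labels extracted from the generated reports. The card does not describe the exact evaluation pipeline, and the label names and counts below are made up for illustration, not CT-RATE data.

```python
def f1_score(tp, fp, fn):
    """F1 for one label from true-positive, false-positive and
    false-negative counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def macro_f1(counts):
    """Unweighted mean of per-label F1, so rare findings count
    as much as common ones."""
    scores = [f1_score(tp, fp, fn) for tp, fp, fn in counts.values()]
    return sum(scores) / len(scores)

# Hypothetical per-label (tp, fp, fn) counts.
example_counts = {
    "finding_a": (8, 2, 2),
    "finding_b": (1, 1, 3),
    "finding_c": (0, 0, 4),
}

score = macro_f1(example_counts)
```

Because every label carries equal weight, a label that is never predicted correctly (like `finding_c` here) pulls the average down sharply, which is part of why macro F1 values on multi-label radiology benchmarks tend to be modest.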
