Model released text-generation cc-by-nc-sa-4.0

Ker-VLJEPA-3B

Details

Architecture: Llama 3.2 3B + LeJEPA + Z-Zoned Perceiver + Flamingo cross-attention
Parameters: ~3.2B + 1.7 GB LoRA + 320 MB bridge
Base Model: meta-llama/Llama-3.2-3B
Relation: finetune
License: cc-by-nc-sa-4.0

Multimodal vision-language model that generates free-text radiology reports from CT slice embeddings. It achieves a new state of the art on the CT-RATE benchmark, with a macro F1 of 0.429 against the previous best of 0.414 (U-VLM).

Architecture

  • Language model: Llama 3.2 3B with LoRA (rank 64, alpha 128)
  • Visual encoder: Guided-Chest-CT-LeJEPA
  • Bridge: Z-Zoned Perceiver
  • Cross-attention: Flamingo-style gated
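The gating in the last bullet can be illustrated with a minimal sketch. It assumes the usual Flamingo formulation, in which the cross-attention output is scaled by tanh of a learnable scalar before being added to the text hidden states, so a gate of zero leaves the language model's activations unchanged. The function names, the single attention head, and the absence of learned projections are simplifications for illustration and are not taken from this model's code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(text_states, visual_latents):
    """Single-head dot-product cross-attention (hypothetical, no learned projections).

    Queries come from the text states; keys and values are the visual latents.
    """
    dim = len(text_states[0])
    attended = []
    for query in text_states:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in visual_latents
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, visual_latents)) for i in range(dim)]
        )
    return attended


def gated_cross_attention(text_states, visual_latents, gate):
    """Residual update x + tanh(gate) * cross_attend(x, visual)."""
    attended = cross_attend(text_states, visual_latents)
    g = math.tanh(gate)
    return [
        [x + g * a for x, a in zip(row, att)]
        for row, att in zip(text_states, attended)
    ]


# Toy inputs: two text positions and three visual latents, all of width 2.
text = [[1.0, 0.0], [0.0, 1.0]]
visual = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

closed = gated_cross_attention(text, visual, gate=0.0)  # gate closed: identity
opened = gated_cross_attention(text, visual, gate=1.0)  # gate open: visual signal mixed in
```

With the gate at zero the layer is an identity mapping, which is why this design lets visual conditioning be introduced gradually on top of a pretrained language model.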

Results

Metric      Value    Previous SOTA
Macro F1    0.429    0.414 (U-VLM)
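For readers unfamiliar with the metric, the sketch below shows how a macro F1 score is computed, assuming the standard definition: the unweighted mean of per-label F1 scores over binary labels. The label names and predictions are made up for illustration and are unrelated to the CT-RATE evaluation data or to this model's evaluation code.

```python
def f1_score(y_true, y_pred):
    """F1 for one binary label: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_by_label, pred_by_label):
    """Unweighted mean of per-label F1, so rare labels count as much as common ones."""
    scores = [
        f1_score(true_by_label[label], pred_by_label[label])
        for label in true_by_label
    ]
    return sum(scores) / len(scores)


# Hypothetical labels over four reports (1 = finding present, 0 = absent).
true_by_label = {"finding_a": [1, 1, 0, 0], "finding_b": [1, 0, 0, 0]}
pred_by_label = {"finding_a": [1, 0, 0, 0], "finding_b": [0, 1, 0, 0]}

score = macro_f1(true_by_label, pred_by_label)
```

Because each label contributes equally to the average, a model that misses rare findings is penalised as heavily as one that misses common ones.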