Vision Foundry is a self-service platform for AI-powered image analysis. It is designed for researchers working with unlabeled or hard-to-label image datasets, especially in the biomedical domain, and helps them extract high-dimensional feature representations from that data.
At the core of Vision Foundry is DinoMX, a modular PyTorch-based training framework that facilitates self-supervised representation learning using Vision Transformers (ViTs). The pipeline builds upon the DINO (self-distillation with no labels) and DINOv2 frameworks, introduced by Meta in 2021 and 2023, respectively.
The initial use case focused on neuropathology. As part of the Federated Brain Digital Slide Archive project, NP-TEST-0 was developed — a Vision Transformer pretrained using DinoMX on real-world neuropathology data. It supports transfer learning, tissue segmentation, patch-level classification, and similarity search.
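As one illustration of similarity search, once patch embeddings have been extracted with a pretrained ViT, the search itself reduces to a nearest-neighbor lookup in feature space. The sketch below is a minimal, hypothetical example (the function name and toy data are ours, not NP-TEST-0's actual API):

```python
import torch
import torch.nn.functional as F

def nearest_patches(query, bank, k=3):
    """Return indices of the k embeddings in `bank` most similar to `query`.

    query: (d,) embedding of one image patch.
    bank:  (n, d) embeddings of candidate patches (e.g., from a ViT).
    """
    sims = F.cosine_similarity(bank, query.unsqueeze(0), dim=1)  # (n,)
    return torch.topk(sims, k).indices

# Toy embedding bank standing in for real ViT features.
bank = F.normalize(torch.randn(100, 16), dim=1)
idx = nearest_patches(bank[7], bank)
print(idx[0].item())  # the query is its own nearest neighbor -> 7
```

In practice the bank would hold embeddings for every tile of a digital slide, so a pathologist can retrieve tissue regions that look like a selected patch.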
DinoMX also draws on iBOT, a self-supervised approach that combines self-distillation with masked image modeling and which DINOv2 itself builds upon.
DinoMX replaces traditional convolutional segmentation architectures such as U-Net with an attention-map-based segmentation strategy. Instead of relying on decoder-specific layers, the model uses its native transformer attention maps to localize and interpret image regions.
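To make the idea concrete, here is a minimal sketch of attention-based localization, assuming access to the last block's attention tensor (e.g., captured via a forward hook); the function name and threshold are illustrative, not DinoMX's actual implementation:

```python
import torch

def cls_attention_mask(attn, grid=14, keep=0.6):
    """Binary foreground mask from a ViT's [CLS]-to-patch attention.

    attn: (heads, tokens, tokens) attention weights from the last block,
    where token 0 is [CLS] and the remaining grid*grid tokens are patches.
    """
    cls_to_patches = attn[:, 0, 1:].mean(dim=0)   # average over heads
    k = int(keep * cls_to_patches.numel())        # keep the top 60% of patches
    thresh = torch.topk(cls_to_patches, k).values.min()
    return (cls_to_patches >= thresh).reshape(grid, grid)

# Toy attention standing in for a real forward-hook capture
# (6 heads, 196 patch tokens + 1 [CLS] token: a 224x224 input, patch size 16).
attn = torch.rand(6, 197, 197).softmax(dim=-1)
mask = cls_attention_mask(attn)
print(mask.shape)  # torch.Size([14, 14])
```

Because the mask comes directly from the attention weights, no extra decoder has to be trained before the model can highlight salient tissue regions.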
The training pipeline is optimized for distributed training. DinoMX uses two types of configuration files; one of them specifies accelerator attributes, such as the choice between the FSDP and DDP distributed training strategies. All experiments are tracked via ClearML.
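Assuming the accelerator configuration follows the Hugging Face Accelerate schema (an assumption on our part; DinoMX's actual file format may differ), a single-node DDP setup might look like:

```yaml
# Illustrative Accelerate-style accelerator config; field names follow
# the Hugging Face Accelerate schema, not DinoMX's actual files.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU   # DDP; use FSDP for fully sharded training
num_machines: 1
num_processes: 8              # one process per GPU
mixed_precision: bf16
```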
Outputs follow the standard Hugging Face model format for streamlined sharing.
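A checkpoint in this format round-trips through `save_pretrained` / `from_pretrained`. The tiny randomly initialized ViT below is only a stand-in for demonstration (its dimensions are arbitrary, not those of a DinoMX checkpoint):

```python
import tempfile
from transformers import ViTConfig, ViTModel

# Tiny ViT used only to demonstrate the save/load roundtrip;
# a real DinoMX export would carry trained weights.
config = ViTConfig(hidden_size=32, num_hidden_layers=2,
                   num_attention_heads=2, intermediate_size=64,
                   image_size=32, patch_size=8)
model = ViTModel(config)

with tempfile.TemporaryDirectory() as outdir:
    model.save_pretrained(outdir)             # writes config.json + weights
    reloaded = ViTModel.from_pretrained(outdir)

print(reloaded.config.hidden_size)  # 32
```

Sharing a directory in this layout lets collaborators load the model with the standard `transformers` tooling, with no DinoMX-specific loader required.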
Although the DinoMX tool is available, the self-service platform Vision Foundry is still under development.
DinoMX expands on Meta's DINO; for background, see Meta's DINOv2 blog post.
DinoMX is being leveraged in several ongoing efforts.
Resources include the NVIDIA DGX computing cluster (5x DGX H100, 40 GPUs, 3.2 TB VRAM).
Read the paper: Vision Foundry: A System for Training Foundational Vision AI Models (arXiv)