# DINOv2 patch
Source code on GitHub.
This is a patch for the original repository to make it work with the latest version of PyTorch (>2.1).
Install the dependencies using the following command:
```bash
conda env create -f conda.yaml
conda activate dinov2
```
Then run the following command to install the package:
```bash
pip install -e .
```
Then add the training images to `data/train`.
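For example, assuming a flat directory of image files (the exact layout expected by the custom dataset loader may differ):

```shell
# create the training data directory
mkdir -p data/train
# copy your images into it, e.g.:
#   cp /path/to/images/*.jpg data/train/
ls data/train | wc -l   # number of files currently in data/train
```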
You can now start the training using torchrun instead of submitit. The following command will start the training on 2 GPUs:
```bash
torchrun --nproc_per_node 2 train.py --config-file dinov2/configs/train/vitl16_short_custom.yaml --output-dir output
```
Thanks a lot to https://github.com/csaroff/dinov2 for an example of a custom dataset.
:new: [2023-10-26] Added DINOv2 backbones with registers, following Vision Transformers Need Registers.
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Patrick Labatut, Armand Joulin, Piotr Bojanowski
[Paper #1] [Paper #2] [Blog] [Demo] [BibTeX]
PyTorch implementation and pretrained models for DINOv2. For details, see the papers: DINOv2: Learning Robust Visual Features without Supervision and Vision Transformers Need Registers.
DINOv2 models produce high-performance visual features that can be directly employed with classifiers as simple as linear layers on a variety of computer vision tasks; these visual features are robust and perform well across domains without any requirement for fine-tuning. The models were pretrained on a dataset of 142M images without using any labels or annotations.
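The linear-probe setup described above can be sketched as follows. This is a minimal offline sketch: the random-weight stand-in backbone, the feature dimension of 384 (that of ViT-S/14), and the class count are placeholders; in practice you would load a pretrained backbone via `torch.hub` as shown in the comment.

```python
import torch
import torch.nn as nn

# In practice, load a pretrained DINOv2 backbone (downloads weights), e.g.:
#   backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
# Here a frozen stand-in module with the same 384-dim output is used so the
# sketch runs offline; it is NOT the real model.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 384))
for p in backbone.parameters():
    p.requires_grad = False  # features stay frozen; no fine-tuning

# Linear probe: a single linear layer trained on top of the frozen features.
num_classes = 10  # hypothetical number of classes
head = nn.Linear(384, num_classes)

images = torch.randn(4, 3, 224, 224)  # dummy batch of 4 images
with torch.no_grad():
    feats = backbone(images)          # shape (4, 384)
logits = head(feats)                  # shape (4, num_classes)
```

Only `head` has trainable parameters here, so a standard classification loop (cross-entropy on `logits`) trains just the linear classifier.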
https://github.com/facebookresearch/dinov2/assets/60359573/f168823e-7922-415a-b429-578badf5c356