Training state-of-the-art pathology foundation models with orders of magnitude less data

Abstract

The field of computational pathology has recently seen rapid advances driven by the development of modern vision foundation models (FMs), typically trained on vast collections of pathology images. It has been shown that scaling up the training dataset and model size and using domain-specific image processing operations often leads to substantially higher performance on downstream tasks. In this work, we adopt several changes to the standard DINOv2 framework that were recently proposed in the literature for training pathology FMs. We also apply a post-training procedure that fine-tunes the models on higher-resolution images to further enrich the information encoded in the embeddings. We present three novel pathology FMs trained on up to two orders of magnitude fewer whole-slide images (WSIs) than those used to train other state-of-the-art FMs, while demonstrating comparable or superior performance on downstream tasks. Even the model trained on TCGA alone (12k WSIs) outperforms most existing FMs and, on average, matches Virchow2, the second-best FM published to date. This suggests that significant potential remains for further improving the models and algorithms used to train pathology FMs in order to take full advantage of the vast data collections.
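The post-training step mentioned above fine-tunes the models on higher-resolution images. A common ingredient of such a step for ViT-style encoders is resizing the positional embeddings to match the larger patch grid before continuing training. The sketch below illustrates that operation in isolation; the function name, grid sizes, and embedding dimension are illustrative assumptions, not details taken from the work described here.

```python
# Minimal sketch (not the authors' code): adapting a ViT-style encoder to a
# higher input resolution by bicubically interpolating its patch positional
# embeddings, a standard ingredient of high-resolution post-training.
import torch
import torch.nn.functional as F


def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """Interpolate patch positional embeddings from an old_grid x old_grid
    layout to a new_grid x new_grid layout, keeping the class token fixed.

    pos_embed: (1, 1 + old_grid**2, dim) -- class token first, then patches.
    """
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]
    dim = patch_pos.shape[-1]
    # (1, N, dim) -> (1, dim, old_grid, old_grid) for 2D interpolation
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)


# Example (assumed sizes): a ViT trained on 224x224 tiles with 16x16 patches
# (14x14 grid) adapted to 384x384 tiles (24x24 grid), embedding dim 1024.
pos = torch.randn(1, 1 + 14 * 14, 1024)
pos_hires = resize_pos_embed(pos, old_grid=14, new_grid=24)
print(pos_hires.shape)  # torch.Size([1, 577, 1024])
```

After resizing the positional embeddings, the rest of the encoder weights can be reused unchanged, and fine-tuning proceeds on the larger tiles.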

Date
Sep 23, 2025 — Sep 27, 2025
Mikhail Karasikov
ML Engineer, PhD

Machine learning researcher/engineer at kaiko.ai with a background in mathematics and computer science.