Open Science Initiative

Kentucky
Open Science

Discover AI models, datasets, platforms, and computational tools advancing research across the Commonwealth.

Projects

Applied research projects across healthcare, agriculture, public health, and education.

View all 69

LLM-powered robotic health assistant with smell sensing, fall detection, and conversational AI.

multimodal beta ★ 9

AI-powered patient-to-clinical-trial matching using reasoning models for eligibility decisions.

text-generation in-development ★ 1

AI-powered synthetic personas for medical and social work education using interactive patient simulations.

text-generation in-development

Automated texting service for clinical TRE studies. 93.53% comprehension accuracy, 93% daily adherence.

active

One Good Choice (1GC)

Project

Nutritional guidance platform predicting food health scores from text descriptions and suggesting healthier alternatives.

text-generation in-development

SpeakEZ

Project

Oral history transcription and analysis system for thousands of hours of Kentucky recordings.

automatic-speech-recognition prototype

AI Models

Pre-trained and fine-tuned models for computer vision, NLP, and medical AI.

View all 14

DINOv2 ViT-Large finetuned on CT-RATE for chest CT feature extraction with anatomically-aware cropping.

feature-extraction cc-by-nc-sa-4.0 released ViT-Large with Registers (1024-dim,

Lightweight medical LLM. 13.76% improvement over base TinyLlama across 3 medical benchmarks.

text-generation apache-2.0 released TinyLlama (1.1B params)

MAD-NP

Model

Magnification-aware self-supervised neuropathology model. Linear F1: 0.9307, KNN F1: 0.9286.

image-feature-extraction apache-2.0 released ViT-Giant with Registers (~1B

Medical LLM based on Mistral, fine-tuned on medical datasets.

text-generation apache-2.0 released Mistral 3x7B

Medical LLM based on LLaMA 2 70B, fine-tuned on medical datasets.

text-generation apache-2.0 released LLaMA-2 3x70B

Medical LLM based on LLaMA 2 7B, fine-tuned on medical datasets.

text-generation apache-2.0 released LLaMA-2 7B

Datasets

Curated datasets for training and evaluation.

View all 7

Smell/VOC sensor data collected by a mobile robot over ~5 months. 64 smell channels, temperature, humidity. 4.35 GB.

apache-2.0 public

728 pathological image ROIs with 1536-dim Prov-Gigapath features across 3 classes.

restricted-access

Statewide EMS opioid response data collected weekly since January 2018.

restricted-access

19,000+ audio and video recordings spanning Kentucky oral history.

restricted-access

Aggregated pathology slides from Kentucky healthcare sources. Over 500,000 slides.

restricted-access

114 annotated syringe deposit images for computer vision classification and counting.

restricted-access

Recent Publications

Peer-reviewed papers and preprints from Kentucky researchers.

View all 55

Coronary artery calcium (CAC) scoring is a key predictor of cardiovascular risk, but it relies on ECG-gated CT scans, restricting its use to specialized cardiac imaging settings. We introduce an…

2026 arXiv preprint

Automated radiology report generation from 3D computed tomography (CT) volumes is challenging due to extreme sequence lengths, severe class imbalance, and the tendency of large language models (LLMs) to ignore…

2026 arXiv preprint

Medical & Biological Engineering & Computing (2026)

2026 Medical & biological...

Scientific Reports (2025)

2025 Scientific reports

Grants & Funding

Federal and institutional grants supporting Kentucky research.

View all 24

College of Medicine

inactive College of...

NSF

inactive NSF

NSF

active NSF (National...

Platforms & Services

Self-service tools for researchers — no programming expertise required.

View all 6

CAT-Talk

Platform

Secure, web-based AI transcription platform with speaker diarization, timestamping, and LLM-powered analysis.

automatic-speech-recognition active

CLASSify

Platform

No-code, web-based machine learning platform for training and evaluating classification models on tabular data.

tabular-classification active

Forecaster

Platform

User-friendly web platform for time series forecasting with multiple models and LLM-assisted interpretation.

time-series-forecasting active

LLM Factory

Platform

Self-service platform for interacting with open-source large language models via web chat interface or OpenAI-compatible API.

natural-language-processing active

SmartState

Platform

Open-source automated protocol adherence platform using finite state machines and conversational AI for clinical research.

conversational active

Vision Foundry

Platform

Platform for training and deploying foundational vision AI models using self-supervised learning on Vision Transformers.

image-classification in-development