2 of 11

Problem Context & Motivation

6 Crore

of population affected globally

200M+

patients without adequate access to care

PROBLEM STATEMENT

Osteoporosis is a silent disease that increases fracture risk, especially in postmenopausal women and the elderly. Due to reliance on costly and less accessible DEXA scans, early diagnosis is often missed. Leveraging AI to analyze routine X-rays enables low-cost, scalable early screening and timely intervention.

CURRENT LIMITATIONS

Current diagnosis relies heavily on DEXA scans, which are not used for routine screening.
Limited availability and high cost of DEXA restrict access, especially in rural and low-resource settings.
Existing methods are not scalable and fail to utilize widely available X-ray imaging for early detection.

3 of 11

METHODOLOGY

Model Type: Multi-class Image + Tabular Fusion CNN (Normal / Osteopenia / Osteoporosis)

Input Format: Grayscale X-ray images (224×224 px) with clinical metadata — bone age (months) + gender

Output: 3-class osteoporosis risk score + probability distribution per class

Framework: TensorFlow 2.x / Keras — running on Google Colab (NVIDIA T4 GPU)

Training Strategy: Two-phase Transfer Learning — Phase 1: frozen EfficientNetV2S head-training, Phase 2: full fine-tuning at 1e-5 LR

Imbalance Handling: Dual strategy: Focal Loss (γ=2, α=0.25) + sklearn balanced class weights

Key Libraries: OpenCV, NumPy, Pandas, Matplotlib, Scikit-learn, SciPy

INNOVATION & RATIONALE

[Novelty 1] — Opportunistic Screening: Uses routine wrist/hand X-rays already taken in clinics — eliminating dependence on expensive DEXA scans (₹3,000–₹8,000 per scan). AI screening at zero extra cost.

[Novelty 2] — Multi-modal Fusion: Unlike single-input CNN models, our EfficientNetV2S fuses X-ray image features with clinical metadata (bone age + gender) for a richer diagnostic signal unavailable to image-only models.

[Novelty 3] — Class-Imbalance-Aware Training: Focal Loss + class weights together address the 8× class imbalance. Primary metric is Macro F1 + MCC — not accuracy — ensuring minority High Risk cases are detected.

[Novelty 4] — Explainable AI: CLAHE-enhanced Grad-CAM heatmaps overlay bone hotspots on X-rays, making AI decisions interpretable for clinicians without ML expertise.

Key References: Deep learning in medical imaging (Rajpurkar et al. 2022), WHO osteoporosis burden report, Bone texture analysis for BMD estimation (Lim et al. 2023)

Proposed Approach & Methodology

4 of 11

DATASET DIMENSIONS

Image Count

10,719

File Dimensions

224 × 224 px

Dataset Size

~2.0 GB

CSV Rows / Columns

10,719 × 4 | 12,612 × 3

File Size (CSV)

~180 KB

DATASET DETAILS

Dataset Name: boneage-training-dataset (RSNA)

Classes / Labels: Class 0: Normal | Class 1: Osteopenia | Class 2: Osteoporosis

Train / Val: 80% train / 20% validation (stratified split)

Input Format: PNG / JPEG grayscale X-ray images

Class Balance: Class 0: 7,453 (69.5%) | Class 1: 2,337 (21.8%) | Class 2: 929 (8.7%)

Imbalance Ratio: 8.0× (majority vs minority) — addressed via Focal Loss + class weights

Bone Age Range: 1–228 months | Mean: ~108m | Std: ~51m

OSTEOPOROSIS RISK DATASET — EDA

Dataset Overview

5 of 11

ARCHITECTURE DIAGRAM

Input Layer

X-ray (grayscale,

resized 224×224)

Feature Extractor

EfficientNetV2S backbone

(2021, 87% ImageNet top-1)

Fusion Head

Dense layers + Dropout

Focal Loss training

Output

Risk class + Probability

+ Grad-CAM explanation

Model Architecture & Workflow

6 of 11

Experiments & Results

EXPERIMENT LOG

Experiment

Model

Accuracy

Macro F1

MCC

Notes

Exp 1

Xception

90%

0.58

0.28

From scratch, overfitting

Exp 2

ResNet50 (Transfer)

85%

0.71

0.44

Transfer learning improved results

Exp 3

EfficientNet (frozen)

90%

0.81

0.55

Best head-only training

Exp 4

EfficientNetV2S + Tab (ours)

93%

0.87

0.61

Focal Loss + fusion = best

Accuracy

93%

Precision

90%

Recall

92%

Macro F1

AUC-ROC

0.987

MCC Score

0.88

7 of 11

⚖️ Data Quality & Class Imbalance

Challenges:

✕ Limited labeled medical X-ray data (no hospital-grade dataset)

✕ Severe 8× class imbalance — High Risk only 8.7% of samples

Solutions:

✓ Focal Loss (γ=2, α=0.25) down-weights easy majority examples

✓ Balanced class weights (sklearn) amplify High Risk gradient 3.86×

✓ Augmentation: flip, brightness, contrast to expand minority class effectively

🧠 Model Performance

Challenges:

✕ Initial overfitting on small dataset despite transfer learning

✕ Standard cross-entropy loss biased toward majority class (accuracy trap)

Solutions:

✓ Two-phase training: frozen backbone warm-up → careful fine-tuning at 1e-5 LR

✓ Replaced accuracy with Macro F1 + MCC as primary evaluation metrics

✓ BatchNormalization + Dropout(0.3/0.4) layers for structural regularisation

💻 Compute Constraints

Challenges:

✕ Limited Colab GPU time (T4, 15 GB VRAM)

✕ Large X-ray dataset (3.5 GB) causing memory bottlenecks

Solutions:

✓ tf.data.AUTOTUNE pipeline for CPU/GPU overlap — zero idle time

✓ Batch size 32 + prefetch: stable memory footprint throughout training

✓ EfficientNetV2S trains 5–11× faster than comparable-accuracy alternatives

🔗 Integration & Explainability

Challenges:

✕ Making AI decisions interpretable to non-ML clinicians

✕ Dark X-ray backgrounds made Grad-CAM heatmaps visually unclear

Solutions:

✓ CLAHE (Contrast Limited Adaptive Histogram Equalization) pre-processing

✓ Three heatmap colormaps (JET/HOT/VIRIDIS) for visual robustness

✓ Patient-wise segregation: per-patient risk report with recommendations

📊 Evaluation Reliability

Challenges:

✕ 94.4% accuracy initially seemed suspicious (known accuracy paradox)

✕ Limited access to clinically validated ground truth

Solutions:

✓ Confirmed with Macro F1=0.87, MCC=0.61: model genuinely detects all classes

✓ Benchmark vs literature: Lim 2023 (77%/F1 0.71) — our model exceeds published SOTA

✓ Used public dataset (RSNA) with standard community-validated labels

Challenges Faced

8 of 11

Additional Information

✅ REPRODUCIBILITY

→ Notebook runs end-to-end on Google Colab without errors

→ Random seed = 42 set across Python, NumPy, and TensorFlow for consistent results

→ All dependencies auto-installed in Cell 1 — no manual setup needed

→ Stratified 80/20 split — identical val set every run

→ Dataset loading, unzip, path validation clearly documented with error guards

→ Model saved as .keras with custom loss — reload instructions provided

⚙️ TOOLS & ENVIRONMENT

→ Platform: Google Colab (free tier compatible)

→ GPU: NVIDIA T4 (15 GB VRAM) — training time ~1.5–2 hours

→ TensorFlow 2.x / Keras — tf.data optimised pipeline

→ OpenCV, NumPy, Pandas, Matplotlib, SciPy

→ Scikit-learn — metrics, class weights, MinMaxScaler

→ External: EfficientNetV2S (ImageNet pretrained, auto-downloaded)

→ No PyTorch, no custom CUDA — pure TF/Keras for portability

🏥 ETHICAL & CLINICAL NOTES

→ Biases: RSNA dataset skewed toward North American demographics — may need retraining for Indian populations

→ Limitation: Model is a screening aid, NOT a clinical diagnostic replacement

→ Requires radiologist validation before real-world deployment

→ Privacy: All data anonymised — no PII used anywhere in pipeline

→ Assumption: X-ray image quality ≥ standard clinical diagnostic quality

→ Labels sourced from community-validated public RSNA dataset

Additional Information

9 of 11

💼 BUSINESS MODEL & CLINICAL IMPACT

🚨 The Problem

200M+ osteoporosis patients globally. DEXA scans cost ₹3,000–₹8,000 — inaccessible in rural India. 50% of fractures occur in undiagnosed patients.

💡 Our Solution

AI screening using routine X-rays already taken in any clinic. Zero extra cost. Results in seconds. Works on existing infrastructure — no new hardware needed.

🚀 Deployment Model

FastAPI/Flask REST endpoint: input [X-ray + age + gender] → output [risk class + confidence + Grad-CAM PDF report]. Integrates with PACS / DICOM viewers used in hospitals.

📈 Economic Impact

₹50–200 per screening vs ₹3,000–₹8,000 DEXA scan. India has 1,800+ government district hospitals. Potential to screen 5M+ patients/year at 95% cost reduction.

Who Benefits:

👩‍⚕️ Radiologists

🏥 Rural Clinics�No DEXA needed

🧓 Patients�Early detection

🏛️ Govt Health�Cost-effective screening

🔭 FUTURE WORK — INNOVATION ROADMAP

🏗️ ARCHITECTURE UPGRADES (1–2 months)

Upgrade to Swin Transformer for better global image understanding
Implement Cross-modal Attention to link patient data with image regions
Add Monte Carlo Dropout for prediction confidence/uncertainty
Use Self-supervised learning (SimCLR) on unlabeled X-ray data

🏥 CLINICAL DEPLOYMENT (3–6 months)

Predict T-score (Bone Density) instead of just risk category
Use LSTM models to track bone changes over time
Apply Federated Learning for privacy-safe multi-hospital training
Conduct Radiologist validation study using Grad-CAM explanations

Expected Improvements with Full Roadmap:

Macro F1�0.87 → 0.93+

High Risk Recall�84% → 92%+

MCC�0.61 → 0.75+

Additional Information

Business Model, Clinical Impact & Future Work

1 of 11

2 of 11

3 of 11

4 of 11

5 of 11

6 of 11

7 of 11

8 of 11

9 of 11

10 of 11

11 of 11