Data Pre-processing
Data Extraction & Turnaround-Time (TAT) Feature Engineering
IQR Based
*target variable
Prediction Models for IQR-Based Features with the ANN Model
ADASYN (Adaptive Synthetic Sampling) + ANN

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| ANN - Train | 0.60 | 0.59 | 0.58 | 0.73 | 0.58 | 0.83 |
| ANN - Test | 0.54 | 0.54 | 0.54 | 0.70 | 0.54 | 0.83 |
Although the accuracy (~54%) suggests only moderate classification ability, the ROC-AUC of 0.83 indicates strong ranking and separability: the ANN learns the relative order between TAT classes but struggles at precise boundary classification because of class overlap in the IQR-based method.
My rule of thumb is 70% training- 15% validation- 15% testing
ADASYN generates more synthetic samples near decision boundaries, which reduces overlap between the classes.
It improves class separation without overfitting to dense areas.
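A minimal sketch of how ADASYN could be wired in with imbalanced-learn; the synthetic `make_classification` data is a stand-in for the real IQR-based features and labels:

```python
from collections import Counter

from imblearn.over_sampling import ADASYN
from sklearn.datasets import make_classification

# Stand-in for the IQR-based feature matrix and TAT labels; the real inputs
# come from the pre-processing step above (an assumption for this sketch).
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=6,
                           weights=[0.7, 0.2, 0.1], random_state=42)

# ADASYN synthesizes extra minority-class samples near decision boundaries,
# where class overlap hurts the ANN the most.
X_res, y_res = ADASYN(random_state=42).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))   # class counts before and after
```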
Data Pre-processing
Data Extraction & Turnaround-Time (TAT) Feature Engineering (Hour-Based)
*target variable
Is_TAT_Anomaly (Anomaly Flag)
Prediction Models for Hour-Based Features with the ANN Model
At the time of entry (with RandomOverSampler)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| ANN - Train | 0.755 | 0.765 | 0.755 | 0.820 | 0.752 | 0.942 |
| ANN - Test | 0.775 | 0.786 | 0.775 | 0.880 | 0.773 | 0.945 |
The ANN model achieved 77.5% test accuracy with a balanced F1-score of 0.77, indicating robust learning across all TAT categories. The High-TAT category remains challenging (F1 ≈ 0.42) and could improve with additional long-tail samples or engineered time-series features.
My rule of thumb is 70% training- 15% validation- 15% testing
RandomOverSampler was used to balance the minority classes, such as High TAT and Incomplete Workflow.
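A minimal sketch of this oversampling step with imbalanced-learn; the synthetic data stands in for the hour-based features:

```python
from collections import Counter

from imblearn.over_sampling import RandomOverSampler
from sklearn.datasets import make_classification

# Stand-in for the hour-based features and TAT labels (assumption for the sketch).
X, y = make_classification(n_samples=2000, n_classes=3, n_informative=5,
                           weights=[0.75, 0.15, 0.10], random_state=0)

# RandomOverSampler duplicates minority-class rows (e.g. High TAT, Incomplete
# Workflow) until every class matches the majority-class count.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))
```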
SHAP Diagram
Y-axis:
Features are sorted by their overall importance (top = most influential).
X-axis:
Negative SHAP → pushed prediction toward lower TAT / faster workflow
Positive SHAP → pushed prediction toward higher TAT / possible delay or anomaly
The farther from zero, the stronger the feature's effect.
Color:
> Blue: low value of feature
> Red: high value of feature
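A hedged sketch of how such a summary plot could be produced with the `shap` library; `model` (the trained ANN) and `X_test` are assumed names:

```python
import shap

# `model` is the trained ANN and `X_test` the held-out features (assumed names).
# A model-agnostic explainer keeps the sketch framework-independent.
explainer = shap.Explainer(model.predict, X_test)
shap_values = explainer(X_test)

# Beeswarm summary: features sorted by importance (top = most influential);
# x-axis = SHAP value, color = feature value (blue = low, red = high).
# For multi-class outputs, slice one class first, e.g. shap_values[:, :, 1].
shap.plots.beeswarm(shap_values)
```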
Prediction Models for the Binary Class with Missing Verification Time (ANN Model, Class B)
At the time of entry (with RandomOverSampler)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| ANN - Test | 0.67 | 0.6476 | 0.4644 | 0.6100 | 0.5249 | 0.5653 |
The ANN model achieved 67% test accuracy with a weighted F1-score of 0.52, indicating below-average predictive performance.
Precision (0.6476) and Recall (0.46) suggest classification that is not highly confident across the classes.
The ROC AUC of 0.56 indicates weak discrimination between normal and anomalous entries, only slightly better than chance.
Specificity (0.61) shows that the model correctly identifies around 61% of non-anomalous workflows.
My rule of thumb is 70% training- 15% validation- 15% testing
RandomOverSampler was used to balance the minority classes, such as High TAT and Incomplete Workflow.
SHAP Diagram
Y-axis:
Features are sorted by their overall importance (top = most influential).
X-axis:
Negative SHAP → pushed prediction toward lower TAT / faster workflow
Positive SHAP → pushed prediction toward higher TAT / possible delay or anomaly
The farther from zero, the stronger the feature's effect.
Color:
> Blue: low value of feature
> Red: high value of feature
Adding New_Features
Backlog_SameDay_Before
Batch_IsLarge_5m
Removing features with low correlation between the feature and the target (|r| < 0.05):
PatientVisitFrequency (|r|≈0.0486)
RxTotalRxAmount (|r|≈0.0444)
RxRefillsRemaining (|r|≈0.0276)
RxTotalPrice (|r|≈0.0123)
Price_vs_Total_Ratio (|r|≈0.002)
DAW_Required (|r|≈0.0002)
Positive r: as the feature goes up, label tends to go up.
Negative r: as the feature goes up, label tends to go down.
Remove a feature if its correlation is near zero (|r| < 0.05).
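A minimal sketch of this filter in pandas, assuming `df` holds the engineered features plus a numeric target column:

```python
import pandas as pd

# Sketch: drop features whose Pearson |r| with the target is below the threshold.
# `df` is assumed to hold the engineered features plus a numeric target column.
def drop_low_corr(df: pd.DataFrame, target: str, thresh: float = 0.05) -> pd.DataFrame:
    r = df.corr(numeric_only=True)[target].drop(target)   # r of each feature vs. target
    weak = r[r.abs() < thresh].index                       # e.g. DAW_Required (|r| ~ 0.0002)
    return df.drop(columns=weak)
```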
Prediction Models after Removing the Low-Correlation Features (|r| < 0.05)
The ANN model achieved 58% test accuracy with a weighted F1-score of 0.55, indicating below-average predictive performance.
Precision (0.565) and Recall (0.564) suggest classification that is not highly confident across the three classes.
The ROC AUC of 0.813 indicates good discrimination between normal and anomalous entries.
Specificity (0.87) shows that the model correctly identifies around 87% of non-anomalous workflows.
My rule of thumb is 70% training- 15% validation- 15% testing
RandomOverSampler was used to balance the minority classes, such as High TAT and Incomplete Workflow.
At the time of entry (with RandomOverSampler)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| ANN - Train | 0.578 | 0.565 | 0.565 | 0.866 | 0.551 | 0.813 |
| ANN - Test | 0.578 | 0.564 | 0.564 | 0.866 | 0.551 | 0.814 |
SHAP Diagram
Y-axis:
Features are sorted by their overall importance (top = most influential).
X-axis:
Negative SHAP → pushed prediction toward lower TAT / faster workflow
Positive SHAP → pushed prediction toward higher TAT / possible delay or anomaly
The farther from zero, the stronger the feature's effect.
Color:
> Blue: low value of feature
> Red: high value of feature
Adding New_Features
Backlog_SameDay_Before
Batch_IsLarge_8h (working hours)
Batch_IsLarge_24h (last day)
Batch_IsLarge_38h (average workflow)
Adding New_Features
Workload_Ratio_24h
Correlation between features and the target (numeric)
Correlation between features and the target (categorical)
Correlation between features and the target with the association (< 0.05)
I decided to remove the categorical features manually, since the numerical features have some correlation with other columns that correlate well with the target variable.
I also removed Insurance_Rejection_Count, Pharmacy_Rejection_Count, Peak_Hours_Rejection, and Weekend_Rejection, since all of their values are zero.
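A small sketch of the same clean-up step; the tiny frame below is a stand-in for the real feature table:

```python
import pandas as pd

# Sketch: drop columns that are identically zero, like the rejection counters
# listed above. This frame is a stand-in for the real feature table.
df = pd.DataFrame({"Insurance_Rejection_Count": [0, 0, 0],
                   "TAT_Hours": [12.5, 40.0, 51.2]})
zero_cols = [c for c in df.columns if (df[c] == 0).all()]
df = df.drop(columns=zero_cols)   # only informative columns remain
```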
Prediction Models after Removing the Features with Association < 0.05
The model is consistently around 62–63% accurate on both train and test (no overfitting).
Recall & Precision (macro ~0.62) show balanced performance across classes, while Specificity ~0.88 means it reliably avoids false alarms.
ROC-AUC ~0.88 indicates strong overall ranking/separation of classes. These results are better than the previous ones, but still not good enough.
My rule of thumb is 70% training- 15% validation- 15% testing
Focal Loss with class weights is used to balance the batches, which improves minority-class detection without creating invalid synthetic samples.
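A minimal sketch of class-weighted focal loss, assuming a PyTorch implementation of the ANN (the class weights below are hypothetical):

```python
import torch
import torch.nn.functional as F

# Sketch of multi-class focal loss with per-class weights, assuming a PyTorch
# ANN/MLP; gamma down-weights easy examples so the rare TAT classes get focus.
def focal_loss(logits, targets, alpha, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-sample CE
    pt = torch.exp(-ce)                                      # prob. of the true class
    return (alpha[targets] * (1 - pt) ** gamma * ce).mean()  # class-weighted focal term

# Hypothetical weights for the three TAT classes (Low / High / Incomplete).
alpha = torch.tensor([0.5, 1.5, 2.0])
loss = focal_loss(torch.randn(8, 3), torch.randint(0, 3, (8,)), alpha)
```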
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| ANN - Train | 0.6267 | 0.6231 | 0.6328 | 0.8841 | 0.5905 | 0.8785 |
| ANN - Test | 0.6255 | 0.6235 | 0.6322 | 0.8838 | 0.5905 | 0.8776 |
Sorting by Rx RxEntered Date
2020 – 2024
I sorted all records by Rx RxEntered Date instead of keeping the existing mixed (unsorted) order.
Reason: pharmacy workflows change over days/weeks—so training in time order lets the model see natural patterns (weekday vs. weekend, month-start rush, batch entries, shift changes).
It also prevents future→past leakage that can happen with random splits.
This gives somewhat better results than training on the mixed data.
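A minimal sketch of the sort plus a leakage-free chronological split in pandas; `df` is assumed loaded from the extract, and day-first parsing of the sample timestamps below is an assumption:

```python
import pandas as pd

# Sketch: parse and sort by the entry timestamp so any split runs past -> future.
# `df` is assumed loaded from the extract; day-first parsing matches the sample
# dates below (an assumption).
df["Rx RxEntered Date"] = pd.to_datetime(df["Rx RxEntered Date"], dayfirst=True)
df = df.sort_values("Rx RxEntered Date").reset_index(drop=True)

# A chronological split (first 70% as train) then rules out future->past leakage.
cut = int(len(df) * 0.70)
train, holdout = df.iloc[:cut], df.iloc[cut:]
```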
| Rx RxEntered Date |
|---|
| 02/01/20 8:48 |
| 04/01/20 9:49 |
| 04/01/20 :49 |
| 04/01/20 11:13 |
| 04/01/20 11:42 |
| 04/01/20 12:01 |
| 04/01/20 12:09 |
| 04/01/20 12:12 |
| 04/01/20 12:15 |
| 04/01/20 12:16 |
| 04/01/20 12:16 |
| 04/01/20 12:19 |
| 04/01/20 13:09 |
| 04/01/20 14:22 |
| 05/01/20 8:48 |
| 05/01/20 9:05 |
| 05/01/20 9:08 |
| 05/01/20 10:32 |
| 05/01/20 10:35 |
MLP with Class Weights: 5-Fold Cross-Validation (Random)
Good separation (high ROC-AUC), but the threshold metrics are middling (F1 ≈ 0.59).
For operations this is useful but not final: we can tune the threshold to trade off recall vs. precision based on cost.
ROC-AUC ~0.87 indicates strong overall ranking/separation of classes. These results are better than the previous ones, but still not good enough.
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights to handle imbalance
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.6220 | 0.6222 | 0.6290 | 0.8829 | 0.5900 | 0.8756 |
| MLP - Test | 0.6226 | 0.6189 | 0.6286 | 0.8831 | 0.5907 | 0.8747 |
MLP with Class Weights: 10-Fold Cross-Validation (Non-Random)
Low accuracy (0.46–0.48) and only moderate F1 (0.52–0.54), even though Precision/Recall are ~0.60, mean many mistakes at the default threshold.
Strength: ROC-AUC (0.83–0.85) and Specificity ~0.82 show the model separates classes reasonably well and avoids too many false alarms.
Cross-validation design: time-blocked folds (no shuffling). Each fold uses 3 months train → 3 months validation → 3 months test.
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights to handle imbalance
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.4601 | 0.5894 | 0.5936 | 0.8190 | 0.5210 | 0.8346 |
| MLP - Test | 0.4784 | 0.6065 | 0.6096 | 0.8196 | 0.5407 | 0.8473 |
Model 1 and Model 4
| Area | Model1 (Code 1) | Model4 (Code 2) | Why Model4 helped |
|---|---|---|---|
| Data split | Likely single train/val/test (possibly random) | Time-blocked folds (train past → predict future) | Matches the real use case, reduces leakage, more reliable generalization. |
| Imbalance handling | Class weights only | Focal Loss + class weights | Focal focuses learning on hard/rare cases → better recall/precision at the same threshold. |
| Batching | Standard batches | WeightedRandomSampler (or balanced mini-batches) | Keeps the minority class visible every step → steadier gradients, less bias toward the majority. |
| Thresholds | Default 0.5 | Validated threshold (see the sketch below) | Converts good ranking into better F1/ops metrics by picking a smarter cut-off. |
| Regularization | Dropout/L2 (basic) | Dropout + AdamW + tuned early stopping | Lowers overfitting; val and test get closer → more stable metrics. |
| Feature pipeline | Basic scaling/encodings | Cleaner timestamps + consistent TZ + better encodings | Less noise → the model sees clearer patterns (weekday/month, backlog effects). |
| Architecture details | MLP with ReLU | Same MLP but better tuned (lr, layers, hidden size, patience) | Small tuning often lifts F1/AUC without making the net bigger. |
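A minimal sketch of the validated-threshold idea from the table above, assuming `val_probs` (positive-class probabilities) and `val_y` come from a held-out validation set:

```python
import numpy as np
from sklearn.metrics import f1_score

# Sketch: scan cut-offs on the validation set and keep the one with the best F1,
# instead of defaulting to 0.5. `val_probs` and `val_y` are assumed inputs.
def best_threshold(val_probs, val_y):
    grid = np.linspace(0.05, 0.95, 91)
    scores = [f1_score(val_y, (val_probs >= t).astype(int)) for t in grid]
    return float(grid[int(np.argmax(scores))])
```

The tuned cut-off is then frozen and applied unchanged to the test window.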
MLP with Class Weights: Latest Model
Consistent and balanced: Accuracy ~0.673 with Precision ≈ Recall ≈ 0.64 and F1 ≈ 0.638 across folds, so there is no obvious overfitting.
Strong separation, conservative alerts: ROC-AUC ~0.884 and Specificity ~0.894 show solid ranking and a low false-positive rate.
Limitation: performance plateaus around F1 ≈ 0.64.
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights to handle imbalance
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.6724 | 0.6385 | 0.6455 | 0.8936 | 0.6381 | 0.8851 |
| MLP - Test | 0.6729 | 0.6381 | 0.6446 | 0.8938 | 0.6378 | 0.8840 |
MLP with Class Weights: Latest Model
Consistent and balanced: Accuracy ~0.60 with Precision ≈ Recall ≈ 0.55 and F1 ≈ 0.55 across folds, so there is no obvious overfitting, though overall performance is not good.
Strong separation, conservative alerts: ROC-AUC ~0.82 and Specificity ~0.87 show solid ranking and a low false-positive rate.
Limitation: performance plateaus around F1 ≈ 0.55 with a low accuracy of ~0.60.
10-Fold Rolling-Window Validation method
Stability Period: 2022 to 2024
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights to handle imbalance
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Val | 0.6002 | 0.5521 | 0.5495 | 0.8669 | 0.5504 | 0.8200 |
| MLP - Test | 0.5109 | 0.4910 | 0.4675 | 0.8485 | 0.4570 | 0.7870 |
Fold 1:
Fold 2: (Slide the window forward by 1 month)
Fold 3: (Slide the window forward by 1 month)
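A minimal sketch of generating these sliding monthly windows with pandas Periods; the window lengths are assumptions:

```python
import pandas as pd

# Sketch of the rolling-window folds above: fixed-length train/val/test month
# blocks that slide forward one month per fold (window lengths are assumptions).
def rolling_folds(start: str, n_folds: int, train_m: int = 3,
                  val_m: int = 3, test_m: int = 3):
    s = pd.Period(start, freq="M")
    for k in range(n_folds):                 # slide 1 month per fold
        t0 = s + k
        train = (t0, t0 + train_m - 1)
        val = (train[1] + 1, train[1] + val_m)
        test = (val[1] + 1, val[1] + test_m)
        yield train, val, test

for tr, va, te in rolling_folds("2022-01", n_folds=3):
    print("train", tr, "| val", va, "| test", te)
```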
Task 1: Train the model over a specific time period, before the pipe or after.
Data Extraction & Turnaround-Time (TAT) Feature Engineering
In the stability period 2022 to 2023
*target variable
The data consisted of 29,367 rows, around 1.2 years of data (2022 to 2023).
Derived two analytical columns to characterize workflow efficiency and anomalies:
0 – Low: under 38 hrs
1 – High: more than 38 hrs
2 – Incomplete Workflow: missing Ph_verification

Task 2: Recalculate the features after dividing the dataset, to prevent data leakage between training and testing (using only data ≤ t).
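A minimal sketch of Task 2's leakage rule, using a scaler as the derived statistic; `train`, `test`, and `feature_cols` are assumed to come from the chronological split:

```python
from sklearn.preprocessing import StandardScaler

# Sketch of Task 2: any derived statistic is recomputed from the training
# window only (data <= t) and then applied to the test window, so nothing
# from the future leaks backward. `train`, `test`, `feature_cols` are assumed.
scaler = StandardScaler().fit(train[feature_cols])   # stats from train rows only
train_X = scaler.transform(train[feature_cols])
test_X = scaler.transform(test[feature_cols])        # reuses the train statistics
```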
MLP-Model
Stability Period: 2022 to 2023 (before outlier removal)
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights
Task 3: 3 months for training and 1 month (fixed) for testing
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.7499 | 0.7351 | 0.7324 | 0.8677 | 0.7290 | 0.8843 |
| MLP - Test | 0.5952 | 0.5129 | 0.5868 | 0.7908 | 0.5208 | 0.6943 |
MLP-Model
Stability Period: 2022 to 2023 (before outlier removal)
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights
Task 3: 4 months for training and 1 month (fixed) for testing
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.7355 | 0.7265 | 0.7411 | 0.8654 | 0.7319 | 0.8803 |
| MLP - Test | 0.6384 | 0.6725 | 0.6467 | 0.8112 | 0.6543 | 0.8214 |
MLP-Model
Stability Period: 2022 to 2023 (before outlier removal)
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights
Task 3: 6 months for training and 1 month (fixed) for testing
At the time of entry (Focal Loss with class weights)

| Model | Accuracy | Precision | Recall | Specificity | F1 Score | ROC AUC |
|---|---|---|---|---|---|---|
| MLP - Train | 0.7248 | 0.7470 | 0.6999 | 0.8474 | 0.7118 | 0.8832 |
| MLP - Test | 0.6608 | 0.6997 | 0.6509 | 0.7932 | 0.6711 | 0.8120 |
SHAP Diagram
Added Features:
1. Daily_Avg_RxEntered
Added Features: staff-based trend feature
2. Trend-Based Staff Efficiency Lag Features (1d, 7d, 30d)
Added Features: time-based trend features
4. Delay from Entry to Verification (7d vs Overall)
5. Daily Volume Trend (7d vs Overall)
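A minimal sketch of how these trend features could be derived in pandas; the column names (`entered_at`, `rx_id`, `verify_minutes`) are assumptions, and `df` is the time-sorted prescription table:

```python
import pandas as pd

# Sketch of the trend features above; column names are assumptions.
daily = df.set_index("entered_at")   # datetime index for resampling

# Daily prescription volume, and its 7-day trend vs. the overall average.
volume = daily["rx_id"].resample("D").count()
volume_trend = volume.rolling(7).mean() / volume.expanding().mean()

# Entry-to-verification delay: 7-day rolling mean vs. the running overall mean.
delay = daily["verify_minutes"].resample("D").mean()
delay_trend = delay.rolling(7).mean() / delay.expanding().mean()

# 1/7/30-day lags of daily staff efficiency, using past values only.
lags = {f"eff_lag_{d}d": delay.shift(d) for d in (1, 7, 30)}
```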
MLP-Model (Class weight + Focal loss)
My rule of thumb is 70% training- 15% validation- 15% testing
Loss: Focal Loss + class weights
Task 3: 3 months for training and 1 month (fixed) for testing
From the model's performance there is minor overfitting between the train and val/test sets; if that is resolved, the overall performance might be better than it is at present.
Per-class performance:
SHAP Diagram
Why do staff-based features appear in the SHAP plot?
Approach:
SHAP Diagram
After removing feature pairs with correlation > 0.8.
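A minimal sketch of this pruning step, dropping one feature from every pair with |r| > 0.8 (`X` is the assumed feature frame):

```python
import numpy as np
import pandas as pd

# Sketch: drop one feature from every pair with absolute pairwise correlation
# above the threshold, keeping the first seen. `X` is the feature frame (assumed).
def drop_collinear(X: pd.DataFrame, thresh: float = 0.8) -> pd.DataFrame:
    corr = X.corr(numeric_only=True).abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > thresh).any()]
    return X.drop(columns=to_drop)
```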
After:
Before:
Update
Window 1: 2022-01-03 to 2023-02-27
Window 2: 2023-06-01 to 2024-12-31
Total number of prescriptions: 64,551
The model trained for all 100 epochs, as the validation metrics showed continuous small improvements; the final test accuracy of 68% exceeded validation performance, confirming good generalization to unseen data.
Per-class results: