Model Name: Screening Model #
Version: 1.0 #
Overview #
The Screening Model is a machine learning system designed to assist systematic review teams in prioritizing and screening literature records based on their likelihood of inclusion in a review.
The model learns from screening decisions made within a specific nest, identifying patterns associated with included, excluded, or advanced records. It then generates inclusion or advancement probabilities for unscreened records.
These probabilities can be used to:
- Assist manual screening workflows by prioritizing records based on predicted relevance, or
- Power the Robot Screener, an automated reviewer that can act as a second reviewer in dual screening workflows.
The system is designed to accelerate the screening stage of evidence synthesis while maintaining a human-in-the-loop workflow.
Intended Use #
Primary Purpose #
To support systematic review workflows by predicting the probability that a study should be included or advanced during screening, enabling reviewers to prioritize high-relevance records and reduce manual screening workload.
Intended Users #
- Systematic review teams
- Clinical researchers
- Evidence synthesis professionals
- Health technology assessment groups
- Enterprise research organizations conducting literature reviews
Limitations #
- Model performance depends on the quantity and quality of prior screening decisions in the nest.
- Early in a review, limited training data may reduce model reliability.
- Screening decisions rely on human-defined inclusion criteria, which may be complex or difficult for the model to learn.
- The model should not replace expert review for final screening decisions unless used within controlled workflows (e.g., Robot Screener in dual-review mode).
Training Data #
Dataset #
The Screening Model is trained dynamically using screening decisions within a specific nest, including:
- Included records
- Excluded records
- Advanced records (in two-pass screening)
Training data is generated from reviewer decisions made during the screening process.
Input Features #
The model uses multiple features derived from bibliographic metadata and textual information, including:
- Bibliographic metadata
- Publication age
- Page count
- Keywords and descriptors
- Abstract content
- Text n-grams
- Text embeddings (OpenAI embedding model)
- Citation metrics from Scite
- Number of citing publications
- Supporting and contrasting citation statements
Missing features are imputed using distributions derived from other records within the nest.
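As an illustrative sketch only (not the production implementation), distribution-based imputation can be as simple as filling a record's missing numeric features from the nest-wide values of the same feature; the median below is one such distribution-derived choice.

```python
import numpy as np
import pandas as pd

# Hypothetical nest with two numeric features and scattered missing values.
nest = pd.DataFrame({
    "publication_age": [2.0, 5.0, np.nan, 8.0],
    "page_count":      [12.0, np.nan, 7.0, 9.0],
})

# Replace each missing value with the nest-wide median of that feature,
# so imputed values reflect the distribution of the other records.
imputed = nest.fillna(nest.median(numeric_only=True))
```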
Language #
Primarily English.
Evaluation #
Performance Metrics #
Internal testing across several hundred systematic review projects produced the following representative performance metrics.
Standard Screening #
| Metric | Value |
|---|---|
| AUC | 0.88 |
| Classification Accuracy | 0.92 |
| Recall | 0.76 |
| Precision | 0.40 |
| F1 Score | 0.51 |
Two-Pass Screening #
| Metric | Value |
|---|---|
| AUC | 0.88 |
| Classification Accuracy | 0.93 |
| Recall | 0.81 |
| Precision | 0.44 |
| F1 Score | 0.56 |
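The tabled metrics are related through standard confusion-matrix definitions. The counts below are hypothetical, chosen only to show how recall, precision, accuracy, and F1 are derived; they are not the actual evaluation data behind the tables.

```python
# Hypothetical screening outcomes: true/false positives and negatives.
tp, fp, fn, tn = 76, 114, 24, 786

recall    = tp / (tp + fn)                       # share of relevant records found
precision = tp / (tp + fp)                       # share of flagged records that are relevant
accuracy  = (tp + tn) / (tp + fp + fn + tn)      # overall agreement with reviewer labels
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
```

Note that with heavily imbalanced screening data (many true negatives), accuracy stays high even when precision is modest.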
Performance Characteristics #
- High recall by design, minimizing the risk of excluding relevant studies.
- Precision is lower due to class imbalance and the model’s deliberate bias against excluding records: uncertain records are kept in rather than screened out.
- Accuracy is typically high due to the large proportion of excluded records in screening datasets.
Known Issues #
- Performance varies based on the size of the screened dataset used for training.
- Class imbalance (many more exclusions than inclusions) can reduce precision.
- Some records may lack sufficient metadata or abstract text for reliable prediction.
- Predictions early in the screening process may be unstable due to limited training examples.
Ethical Considerations #
Human-in-the-Loop Limitations #
The Screening Model is intended to augment human screening workflows, not replace expert judgment.
When used solely to generate inclusion probabilities, the model provides decision support that reviewers may use to prioritize records.
When used as the Robot Screener, the model may act as a second reviewer in dual-review workflows. While this can significantly accelerate screening, it increases reliance on automated predictions.
Best practice is to:
- Monitor model performance metrics
- Validate screening outcomes
- Use human adjudication where disagreements occur
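The adjudication rule above can be sketched as a simple decision function, assuming the Robot Screener acts as the second reviewer; the function and status names here are hypothetical.

```python
def resolve(human_decision: str, robot_decision: str) -> str:
    """Dual-review sketch: agreement finalizes the decision,
    disagreement routes the record to a human adjudicator."""
    if human_decision == robot_decision:
        return human_decision      # both reviewers agree
    return "adjudicate"            # conflict: escalate to human review

# Example outcomes:
assert resolve("include", "include") == "include"
assert resolve("include", "exclude") == "adjudicate"
```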
Limitations #
- Model accuracy depends on the number of screened records available for training.
- Early-stage reviews may not provide sufficient training data for strong performance.
- Abstract-only information may not fully capture eligibility criteria.
- Missing metadata may reduce prediction reliability.
- Precision is intentionally lower than recall, meaning the model may suggest inclusion of some irrelevant records.
Contact Information #
For questions, feedback, or support, please contact support@nested-knowledge.com.
PALISADE Compliance #
Purpose #
To assist literature screening during systematic reviews by estimating the probability that a record should be included or advanced based on patterns learned from prior screening decisions.
Appropriateness #
The Screening Model is appropriate for:
- Evidence synthesis workflows
- Systematic literature reviews
- Research prioritization tasks
It is not intended for clinical decision-making or diagnostic use.
Limitations #
- Predictions depend on patterns learned from reviewer behavior within a specific nest.
- Performance improves as more records are screened.
- Automated screening workflows (e.g., Robot Screener) may introduce unreviewed errors if not monitored.
Implementation #
The Screening Model uses a gradient-boosted decision tree ensemble trained on screening decisions within each nest.
At a high level, the model evaluates records by asking a series of binary questions about their characteristics (e.g., metadata, textual features, citation signals). These decisions collectively produce an estimated probability of inclusion or advancement.
Model characteristics include:
- Gradient-boosted decision tree ensemble
- Logistic loss optimization
- Cross-validation–based hyperparameter tuning
- SMOTE oversampling to address class imbalance
- Per-nest model training
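A minimal sketch of per-nest training, assuming scikit-learn. The real pipeline applies SMOTE and cross-validated hyperparameter tuning; here those are approximated by naive minority oversampling and default settings, and the feature matrix is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier  # logistic (log-loss) objective by default

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                             # stand-in features for one nest
y = (X[:, 0] + rng.normal(size=200) > 1.2).astype(int)    # imbalanced include/exclude labels

# Naive duplication-based oversampling as a stand-in for SMOTE:
# replicate minority examples until classes are balanced.
minority = np.flatnonzero(y == 1)
extra = rng.choice(minority, size=max(0, (y == 0).sum() - minority.size), replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

model = GradientBoostingClassifier().fit(X_bal, y_bal)
proba = model.predict_proba(X)[:, 1]                      # inclusion/advancement probabilities
```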
Training begins once the following thresholds are met:
- 50 screened records
- 10 included or advanced records
After training, the model may update automatically as additional screening decisions are made.
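The activation thresholds stated above amount to a simple gate; the function name is hypothetical.

```python
def ready_to_train(n_screened: int, n_included_or_advanced: int) -> bool:
    """Training begins only once a nest has at least 50 screened
    records, at least 10 of them included or advanced."""
    return n_screened >= 50 and n_included_or_advanced >= 10
```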
Sensitivity and Specificity #
The Screening Model is designed to prioritize high recall (sensitivity) over precision.
This design reflects the asymmetric cost of screening errors:
- False exclusions (missing a relevant study) are highly costly.
- False inclusions can be corrected later during downstream review stages.
Typical performance characteristics include:
- Recall typically between 0.75 and 0.80
- Precision typically between 0.40 and 0.45
- AUC around 0.88
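One way such a recall-first operating point can be chosen is to lower the probability cutoff until a target recall is met on held-out labels. This sketch and its target value are illustrative, not the product's actual configuration.

```python
def threshold_for_recall(probs, labels, target_recall):
    """Scan candidate cutoffs from high to low and return the highest
    probability cutoff whose recall meets the target."""
    for cut in sorted(set(probs), reverse=True):
        preds = [p >= cut for p in probs]
        tp = sum(p and l for p, l in zip(preds, labels))
        fn = sum((not p) and l for p, l in zip(preds, labels))
        if tp / (tp + fn) >= target_recall:
            return cut
    return 0.0  # no cutoff reaches the target: include everything
```

Lowering the cutoff trades precision for recall, which is exactly the asymmetry the screening design accepts.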
Algorithm Characteristics #
- Gradient-boosted decision tree ensemble
- Probabilistic prediction of inclusion or advancement
- Trained on reviewer decisions within each nest
- Cross-validation–based performance evaluation
- Class imbalance correction via SMOTE
The model is deterministic once trained but continuously updated as new screening data becomes available.
Data Characteristics #
The model processes data derived from records within the nest, including:
- Bibliographic metadata
- Abstract text
- Keyword and descriptor fields
- Citation metrics
- Text embeddings
- Derived linguistic features (n-grams)
Records may contain incomplete metadata; missing values are imputed during training.
Explainability #
The Screening Model provides transparency through:
- Probability scores representing likelihood of inclusion or advancement
- Cross-validation performance metrics
These outputs allow reviewers to:
- Assess model reliability
- Monitor screening progress
- Identify high-relevance records
Model predictions should be interpreted as decision-support signals rather than definitive screening decisions.
Additional Notes on Compliance #
The Screening Model operates only on user-provided records within a nest and does not retain content beyond the review workflow.
Because the model learns from reviewer behavior and dataset characteristics, performance may vary between reviews. Human oversight remains essential to ensure that relevant studies are not inadvertently excluded.