Validation Studies of AI Tools in Nested Knowledge

At Nested Knowledge, we are committed to transparency and rigorous evaluation of our AI-powered evidence synthesis tools. This page compiles internal and external validation studies across the core stages of the review workflow: Search, Screening, Data Extraction, Critical Appraisal, and Reporting. These studies demonstrate the reliability, accuracy, and real-world performance of the AI tools embedded in Nested Knowledge (NK).

⭐ Featured Study: Independent End-to-End Validation

External Validation Assessment and Publication of Nested Knowledge AI Tools

(ISPOR EU, Nov 2025)

This independent evaluation assessed the performance of AI tools in AutoLit across the full literature review workflow, including Smart Search, Robot Screener, Adaptive Smart Tags for data extraction, and automated critical appraisal. Across multiple real-world projects, each AI capability achieved ~85% time savings and 90%+ recall, with some functions surpassing human reviewers in accuracy. View accompanying poster presentation.

Quick Links

AI in Nested Knowledge – how AI powers our platform
NICE Compliance Guide – how Nested Knowledge aligns with UK standards for clinical evidence
Use Cases and Case Studies – real-world applications across the evidence synthesis life-cycle

1. Literature Search

Smart Search is a human-in-the-loop reasoning agent in Nested Knowledge that uses LLM-driven chain-of-thought logic to build Boolean search strings from a user-supplied research question.

AI Tool	Study Description	Key Findings	Source
Smart Search	LLM-driven chain-of-thought reasoning agent for Boolean search string generation, validated against Cochrane and in-system SLRs.	75%+ recall, significantly outperforming black-box LLM approaches.	ISPOR USA 2025 Poster
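As a rough illustration of how a recall figure like the one above can be computed, the sketch below compares the records a search retrieves against a reference set of studies included in a published review. It assumes recall is defined as the share of reference-standard records retrieved; the function name and identifiers are hypothetical and are not part of Nested Knowledge.

```python
def search_recall(retrieved_ids, reference_ids):
    """Return the fraction of reference-standard records found by the search."""
    reference = set(reference_ids)
    if not reference:
        raise ValueError("reference set must not be empty")
    found = reference & set(retrieved_ids)
    return len(found) / len(reference)


# Hypothetical record identifiers, for illustration only.
reference = ["rec-1", "rec-2", "rec-3", "rec-4"]          # included in the published SLR
retrieved = ["rec-1", "rec-2", "rec-3", "rec-9", "rec-10"]  # returned by the search string

recall = search_recall(retrieved, reference)
print(f"recall = {recall:.2f}")  # recall = 0.75
```

Records retrieved but absent from the reference set (here `rec-9` and `rec-10`) do not lower recall; they affect precision and downstream screening workload instead.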

2. Screening

Robot Screener is a machine learning-based AI tool in Nested Knowledge that replaces the second reviewer in a Dual Screening workflow. See blog on internal performance.

Criteria-Based Screening (CBS) uses an LLM-driven reasoning agent to convert a research question into a set of affirmative eligibility criteria. Smart Screener then evaluates each study against those criteria, returning a Yes/No decision per criterion.
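One way to picture per-criterion Yes/No decisions is sketched below. This is not the Smart Screener implementation: the aggregation rule (include only when every criterion is answered Yes) is an assumption made for illustration, and the class, function, and criterion names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriterionDecision:
    criterion: str  # affirmative eligibility criterion, phrased as a statement
    met: bool       # True corresponds to "Yes", False to "No"


def screen(decisions):
    """Combine per-criterion decisions into one record-level decision.

    Assumed rule: a record advances only if every criterion is met.
    Failed criteria are returned so the decision stays traceable.
    """
    if not decisions:
        raise ValueError("at least one criterion decision is required")
    failed = [d.criterion for d in decisions if not d.met]
    return {"include": not failed, "failed_criteria": failed}


# Hypothetical decisions for a single abstract.
decisions = [
    CriterionDecision("Population is adult patients", True),
    CriterionDecision("Study design is a randomised controlled trial", False),
    CriterionDecision("Reports at least one safety outcome", True),
]

result = screen(decisions)
print(result["include"])          # False
print(result["failed_criteria"])  # ['Study design is a randomised controlled trial']
```

Keeping the list of failed criteria alongside the decision lets a human reviewer see why a record was excluded and override it if needed.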

AI Tool	Study Description	Key Findings	Source
Robot Screener	Internal validation: ML-based tool replacing second reviewer in Dual Screening workflow.	Up to 97% recall in internal studies; comparable to human reviewers.	ISPOR USA 2024 Poster
Robot Screener	External validation: Comparison with dual human screening across clinical and economic SLRs.	Comparable recall (AI: 0.82 vs human: 0.75; p=0.59). No significant difference in error rates.	ISPOR USA 2024 Poster
Robot Screener	AI-assisted screening of abstracts in a review with multiple population subgroups and outcomes.	Accuracy increased from 0.71 (225 abstracts) to 0.72 (450 abstracts), demonstrating that Robot Screener performance can be assessed early with relatively few training records.	ISPOR EU 2024 Poster; Value in Health Publication
Robot Screener	AI Time and Motion: Accuracy and efficiency of AI for HTA-standard SLRs (234 studies).	Hybrid-AI screening: 97% accuracy, 51% time reduction. Fully-AI: 81% time savings.	ISPOR EU 2025
Robot Screener	Old Data, New Tricks: AI-driven SLR updates comparing ML-based screening to human screening across clinical, HRQoL, and economic SLRs.	All relevant studies captured by ML model. Substantial efficiency gains: Clinical 68h→1h (fully-AI), HRQoL 100h→0.6h, Economic 124h→0.6h.	ISPOR US 2026
Criteria-Based Screening (CBS) / Smart Screener	Assessment of autonomous CBS for umbrella review screening using PICOS-based Yes/No questions, tested on a published umbrella review of proton pump inhibitor safety.	97.7% recall at both abstract and full-text stages. 86.2% accuracy at abstract level, 81.9% at full text. Fully traceable, human-in-the-loop workflow.	ISPOR USA 2026
Robot Screener & Smart Screener	Comparative external validation of two AI screening approaches in NK: criteria-based screening (Smart Screener) versus advancement-probability-based screening (Robot Screener) at the title/abstract stage.	Criteria-based screening: 0.99 sensitivity, 0.21 specificity (471 references flagged). Advancement probability-based: 0.71 sensitivity, 0.46 specificity (327 advanced). Approach selection shifts the balance between completeness and reviewer workload.	ISPOR USA 2026
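The recall (sensitivity), specificity, and accuracy figures in the table follow from a confusion matrix of AI decisions against a reference standard. The sketch below uses the standard definitions and assumes the reference is the final human include/exclude decision; the function name and the ten example decisions are hypothetical and do not come from any of the studies listed.

```python
def screening_metrics(ai_decisions, reference_decisions):
    """Compute sensitivity, specificity, and accuracy for include/exclude calls."""
    if len(ai_decisions) != len(reference_decisions):
        raise ValueError("decision lists must have the same length")
    tp = fp = tn = fn = 0
    for ai, ref in zip(ai_decisions, reference_decisions):
        if ai and ref:
            tp += 1   # AI included a relevant record
        elif ai and not ref:
            fp += 1   # AI included an irrelevant record
        elif not ai and ref:
            fn += 1   # AI excluded a relevant record
        else:
            tn += 1   # AI excluded an irrelevant record
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "accuracy": (tp + tn) / total if total else float("nan"),
    }


# Hypothetical decisions for ten abstracts (True = include).
reference = [True, True, True, True, False, False, False, False, False, False]
ai = [True, True, True, False, True, True, False, False, False, False]

metrics = screening_metrics(ai, reference)
print(f"sensitivity = {metrics['sensitivity']:.2f}")  # sensitivity = 0.75
print(f"specificity = {metrics['specificity']:.2f}")  # specificity = 0.67
print(f"accuracy    = {metrics['accuracy']:.2f}")     # accuracy    = 0.70
```

High sensitivity with low specificity means few relevant records are missed but more records are passed on for human review, which is the completeness versus workload trade-off noted in the last row of the table.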

Additional Screening Studies

AI Tool	Study Description	Source
Robot Screener	Methodological paper on AI-assisted screening across 5 diverse Targeted Literature Reviews.	ISPOR EU 2025
Robot Screener	Large-scale deployment of NK screening across 28 oncology TLRs, demonstrating platform scalability.	ISPOR EU 2025

3. Tagging (Data Extraction)

Core Smart Tags (CSTs) is a specialised AI tool in Nested Knowledge that combines machine learning, NLP, and heuristics to extract and hierarchically structure key clinical data from a research question. It extracts PICOs, study type, location, and size with validated accuracy and human-in-the-loop oversight, enabling faster, more reliable evidence extraction for clinical SLRs.

AI Tool	Study Description	Key Findings	Source
Core Smart Tags	Repeatable Auto-Extraction Frameworks: Validates multi-model system for extracting PICOs, study type, location, and size.	Validated accuracy for PICO extraction, study type, location, and size with human-in-the-loop oversight.	ISPOR USA 2025 Poster
Core Smart Tags	PICO Prediction for Joint Clinical Assessments (JCA): Mapping and consolidation of PICOs for Soft Tissue Sarcoma ahead of JCA.	Revealed extensive heterogeneity; demonstrated how AI-supported searching streamlines PICO assessment for complex conditions.	ISPOR EU 2025

Adaptive Smart Tags (ASTs) leverages LLMs to automatically highlight and extract user-defined variables across abstracts and full texts. In validation studies it achieved up to 80% match to manual extraction, enabling faster, intuitive, and audit-tracked data structuring for complex clinical reviews.

AI Tool	Study Description	Key Findings	Source
ASTs (V1)	Efficiency of NK to facilitate SLR: Validated at GES Prague 2024.	Demonstrated up to 80% match to manual extraction.	GES Prague 2024 Poster
ASTs (V1)	Proof-of-concept study: AI-assisted SLR of economic burden in metastatic pancreatic adenocarcinoma, evaluating screening and extraction.	ML screening: 87% accuracy, 82% recall. Data extraction: 72.9% mean accuracy. No AI hallucinations. 44–59% time savings across tasks.	ISPOR USA 2025
ASTs (V2)	Assessing Accuracy of Data Extraction of GLP-1 Agonists for Weight Loss using unsupervised ASTs.	Exceptionally high accuracy (F1 ≈ 0.98) in identifying key concepts across GLP-1 RA RCT abstracts.	ISPOR EU 2025
ASTs (V2)	Performance of ASTs for automated extraction of study characteristics in an SLR of refractory chronic cough treatments (37 studies, 18 characteristics).	95% accuracy for publication type and trial registration. 78–92% for core design elements. Human-in-the-loop essential for complex/variable elements.	ISPOR US 2026
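For readers less familiar with the F1 score reported for extraction, the sketch below computes precision, recall, and F1 by comparing AI-extracted items with a manual extraction. It uses the standard formula F1 = 2PR / (P + R) and assumes exact-match comparison of items; the function name and the example items are hypothetical, and the studies above may have scored matches differently.

```python
def precision_recall_f1(extracted, manual):
    """Score AI-extracted items against a manual (reference) extraction."""
    extracted, manual = set(extracted), set(manual)
    true_positives = len(extracted & manual)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(manual) if manual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical concept labels, for illustration only.
manual = {"concept A", "concept B", "concept C", "concept D"}
extracted = {"concept A", "concept B", "concept C", "concept E"}

precision, recall, f1 = precision_recall_f1(extracted, manual)
print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1 = {f1:.2f}")
# precision = 0.75, recall = 0.75, F1 = 0.75
```

Because F1 is the harmonic mean of precision and recall, it only approaches 1 when the tool both finds nearly all manually extracted items and adds very few spurious ones.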

Additional AST Studies

AI Tool	Study Description	Source
ASTs (V2)	Internal validation testing of abstract Adaptive Smart Tags; live online statistics, performance still evolving.	Internal Testing
ASTs (V2)	Case study: ML-assisted data extraction in a movement disorders SLR (by RTI).	ISPOR US 2026
ASTs (V2)	AI-assisted qualitative data extraction and evidence mapping in an umbrella review of safety outcomes in solid tumors.	ISPOR US 2026

4. Critical Appraisal

AI Tool	Study Description	Key Findings	Source
ASTs for Critical Appraisal	Evaluation of ML-assisted risk of bias assessments for supporting SLRs: Case study in movement disorders (by RTI).	Validated AST performance for automated risk of bias / critical appraisal in a movement disorders SLR.	ISPOR US 2026

5. Meta-Analytical Extraction

Smart Meta-Analytical Extraction (SMAE) is an AI tool that generates rapid meta-analytical outputs, such as forest plots, from chosen studies and their accompanying full texts. Released in May 2025, it is currently in beta.

AI Tool	Study Description	Key Findings	Source
Smart MA Extraction	AI-assisted quantitative extraction of Acute Lymphoblastic Leukemia (ALL) studies.	~95% time savings. Accurate AI-generated summaries. Human review needed for heterogeneously reported outcomes.	ISPOR EU 2025

6. Insights

Smart Insights is an AI-assisted feature in Nested Knowledge that helps you generate clear, evidence-backed written insights directly from your extracted data.

AI Tool	Study Description	Key Findings	Source
Smart Insights	From Data to Insights: First validation of AI-generated writing in evidence synthesis. Evaluates Smart Insights for automated narrative generation from structured review data.	First validation of the Smart Insights tool for AI-generated evidence synthesis writing.	ISPOR US 2026

7. Full Workflow & Platform Reviews

End-to-End Workflow Evaluations

Scope	Study Description	Key Findings	Source
Entire NK Workflow	First comprehensive methodological framework and validation summary for the AutoLit AI tool suite across the full SLR workflow — search strategy generation, dual title/abstract and full-text screening, qualitative and quantitative extraction, critical appraisal, and insight drafting, with fully automated network meta-analysis — under human-in-the-loop curation.	Strongest evidence for search, screening, structured tagging, and extraction efficiency. Smart Search: 76.8% / 79.6% recall across two sets. Screening: 82–97.1% recall (Inclusion Prediction Model); ~95% accuracy and ~80% time savings (Smart Screener). Core Smart Tags: F1 = 0.74 (PICO); 74% / 78% / 91% accuracy for study type, location, and size. Smart MA Extraction: >95% time savings. Smart Insights: 4.9/5 citation integrity, 4.4/5 synthesis quality. Smart Critical Appraisal supported but with more limited validation to date.	JMIR Preprints (2026) — preprint, under open peer review; DOI: 10.2196/preprints.103265
Entire NK Workflow	High-level review evaluating the performance and optimal use strategy of an AI-assisted tool for oncology literature review.	Comprehensive assessment of the NK platform across the full SLR workflow in oncology.	ISPOR US 2026
Entire NK Workflow	Scoping review mapping current and developing guidelines for the use of AI in evidence synthesis across regulatory bodies, methods organisations, and journals. Awarded a JMCP Gold medal at AMCP Nexus 2025.	Purpose-built SLR tools achieved 99% accuracy and 96.1% sensitivity on average, outperforming foundational models (81.7% accuracy, 0.77 F1) and custom models (81.2% accuracy, 0.50 F1).	AMCP Nexus 2025 (Abstract 119)

Feature Comparisons & External Reviews

Type	Description	Key Findings	Source
Feature Comparisons	Function-by-function match-up across evidence synthesis systems.	First published in 2022, updated in 2023.	JMIR (2022)
NK vs 3 systems	Independent comparative benchmarking in which NK (referred to as “T1”) was judged the most comprehensive and effective tool.	NK outperformed three other automation platforms across features.	ISPOR USA 2025 Poster
Academic Review	External review of NK for academic research.	Positive assessment of NK’s capabilities for SLR workflows.	Effortless Academic
Regulatory Review	Review of digital tools for clinical evaluation of medical devices, empowering regulatory writers.	NK positioned as a key tool for clinical reviews in regulatory research.	Medical Writing Journal
Ecosystem Map	Living Evidence Map: Position of NK in the automation ecosystem for systematic reviews.	NK mapped within the broader landscape of SLR automation tools.	Medium (FarhadInfo)
External Overview	Independent overview cataloguing 38 ML/automation approaches for systematic review and meta-analysis tasks across four categories (Framework/Tool, Package/Software, Model/Method, Web Application).	Nested Knowledge included as a “Web Application and Integrated System,” recognised for offering a comprehensive platform spanning search, screening, tagging, extraction, and synthesis.	MDPI BioMedInformatics (2023)
Epidemiological Review	External assessment of NK’s suitability for epidemiological systematic literature reviews.	Smart Search retrieved ~90% of articles from the published SLRs. Pooled estimates closely matched published values (TAK incidence: 1.10 vs 1.11 per 1,000,000; NCFBE prevalence: 680 vs 570 per 100,000). Workflow required ~15% of the time of a traditional SLR.	ISPOR USA 2025 Poster

Do you have further questions about how our AI tools work? Let us know!
