Validation Studies of AI Tools in Nested Knowledge

At Nested Knowledge, we are committed to transparency and rigorous evaluation of our AI-powered evidence synthesis tools. This page compiles internal and external validation studies across the core stages of the review workflow: Search, Screening, Data Extraction, Critical Appraisal, and Reporting. These studies demonstrate the reliability, accuracy, and real-world performance of the AI tools embedded in Nested Knowledge (NK).


⭐ Featured Study: Independent End-to-End Validation

External Validation Assessment and Publication of Nested Knowledge AI Tools

(ISPOR EU, Nov 2025)

This independent evaluation assessed the performance of the AI tools in AutoLit across the full literature review workflow: Smart Search, Robot Screener, Adaptive Smart Tags for data extraction, and automated critical appraisal. Across multiple real-world projects, each AI capability achieved approximately 85% time savings and recall above 90%, and some functions surpassed human reviewers in accuracy. An accompanying poster presentation is available.



1. Literature Search

Smart Search is a human-in-the-loop reasoning agent in Nested Knowledge that uses LLM-driven chain-of-thought logic to build Boolean search strings from a user-supplied research question.

AI Tool | Study Description | Key Findings | Source
Smart Search | LLM-driven chain-of-thought reasoning agent for Boolean search string generation, validated against Cochrane and in-system SLRs. | Recall of 75% or higher, significantly outperforming black-box LLM approaches. | ISPOR USA 2025 (poster available)
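Boolean search strings for SLRs are conventionally built by ORing synonyms within each concept (for example population, intervention, outcome) and ANDing the concept blocks together. The sketch below shows only that final assembly step, under the assumption that the concept lists have already been chosen; it does not reproduce Smart Search's reasoning agent, and the function names and example terms are hypothetical.

```python
def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def build_boolean_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms inside each concept block, then AND the blocks together.

    Hypothetical helper for illustration; not part of any Nested Knowledge API.
    """
    blocks = []
    for synonyms in concepts.values():
        if not synonyms:
            continue  # skip concept blocks with no terms
        blocks.append("(" + " OR ".join(quote(t) for t in synonyms) + ")")
    return " AND ".join(blocks)


if __name__ == "__main__":
    # Made-up concept lists, not Smart Search output.
    concepts = {
        "population": ["type 2 diabetes", "T2DM"],
        "intervention": ["semaglutide", "GLP-1 receptor agonist"],
        "outcome": ["weight loss", "body weight"],
    }
    print(build_boolean_query(concepts))
```

Database-specific syntax (field tags, truncation, MeSH terms) is deliberately left out, since it differs between PubMed, Embase, and other sources.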

2. Screening

Robot Screener is a machine learning-based AI tool in Nested Knowledge that replaces the second reviewer in a Dual Screening workflow.

Criteria-Based Screening (CBS) uses an LLM-driven reasoning agent to convert a research question into a set of affirmative eligibility criteria. Smart Screener then evaluates each study against those criteria, returning a Yes/No decision per criterion.
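The page states that Smart Screener returns a Yes/No decision per criterion but does not say how those answers are combined into an include/exclude call. The sketch below assumes the simplest rule, that a record is included only when every criterion is answered Yes, and reports the failed criteria so the decision stays traceable. The class, function, and criterion wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CriterionDecision:
    criterion: str  # affirmative eligibility criterion (hypothetical wording)
    answer: bool    # True = "Yes", False = "No"


def screening_decision(decisions: list[CriterionDecision]) -> tuple[str, list[str]]:
    """Assumed rule: include only when every criterion is answered Yes.

    Returns the decision plus the criteria that failed, so a human reviewer
    can see exactly why a record was excluded.
    """
    if not decisions:
        raise ValueError("at least one criterion decision is required")
    failed = [d.criterion for d in decisions if not d.answer]
    return ("include" if not failed else "exclude"), failed


if __name__ == "__main__":
    decisions = [
        CriterionDecision("The study enrolls adults with chronic cough.", True),
        CriterionDecision("The study is a randomized controlled trial.", False),
    ]
    print(screening_decision(decisions))
```

Keeping the per-criterion answers alongside the final decision is what makes this kind of workflow auditable: a reviewer can overturn one criterion without re-reading the whole record.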

AI Tool | Study Description | Key Findings | Source
Robot Screener | Internal validation: ML-based tool replacing the second reviewer in a Dual Screening workflow. | Up to 97% recall in internal studies; comparable to human reviewers. | ISPOR USA 2024 (Internal; poster available)
Robot Screener | External validation: comparison with dual human screening across clinical and economic SLRs. | Comparable recall (AI: 0.82 vs human: 0.75; p=0.59); no significant difference in error rates. | ISPOR USA 2024 (External; poster available)
Robot Screener | AI Time and Motion: accuracy and efficiency of AI for HTA-standard SLRs (234 studies). | Hybrid-AI screening: 97% accuracy, 51% time reduction. Fully-AI screening: 81% time savings. | ISPOR EU 2025
Robot Screener | AI-Assisted Screening in Targeted Literature Reviews (TLRs): evaluated across 5 diverse TLRs. | ~90% workload reduction; recall 0.88–0.97; fewer than 10% of abstracts required human review. | ISPOR EU 2025
Robot Screener | Use in Oncology-Focused TLRs: large-scale deployment of NK screening across 28 oncology targeted literature reviews. | NK AI screening was used successfully across 28 reviews, demonstrating platform scalability and reliability for oncology-focused evidence synthesis. | ISPOR EU 2025
Robot Screener | Old Data, New Tricks: AI-driven SLR updates comparing ML-based screening to human screening across clinical, HRQoL, and economic SLRs. | All relevant studies were captured by the ML model. Substantial efficiency gains: Clinical 68 h → 1 h (fully-AI), HRQoL 100 h → 0.6 h, Economic 124 h → 0.6 h. | ISPOR USA 2026
Criteria-Based Screening (CBS) | Assessment of autonomous CBS for umbrella review screening using PICOS-based Yes/No questions, tested on a published umbrella review of proton pump inhibitor safety. | 97.7% recall at both abstract and full-text stages; 86.2% accuracy at abstract level and 81.9% at full text. Fully traceable, human-in-the-loop workflow. | ISPOR USA 2026
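The screening studies above report recall, accuracy, and time savings. A minimal sketch of the standard definitions follows, assuming the human reference decisions are treated as ground truth; individual studies may define their metrics slightly differently. The confusion counts are hypothetical, while the hours come from the "Old Data, New Tricks" row.

```python
def recall(tp: int, fn: int) -> float:
    """Share of truly relevant records that the screener kept: TP / (TP + FN)."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all include/exclude decisions that match the reference standard."""
    return (tp + tn) / (tp + tn + fp + fn)


def time_reduction(hours_before: float, hours_after: float) -> float:
    """Fractional reduction in hands-on time relative to the manual baseline."""
    return (hours_before - hours_after) / hours_before


if __name__ == "__main__":
    # Hypothetical confusion counts for 1,000 records (not from any study above).
    print(f"recall   = {recall(tp=88, fn=12):.2f}")                   # recall   = 0.88
    print(f"accuracy = {accuracy(tp=88, tn=850, fp=50, fn=12):.3f}")  # accuracy = 0.938
    # Hours reported for the 'Old Data, New Tricks' SLR updates.
    for label, before, after in [("Clinical", 68, 1), ("HRQoL", 100, 0.6), ("Economic", 124, 0.6)]:
        print(f"{label}: {time_reduction(before, after):.1%} less time")
```

Applied to the reported hours, the loop gives reductions of about 98.5% (clinical), 99.4% (HRQoL), and 99.5% (economic). Recall is usually the headline screening metric because a relevant study excluded at screening is not seen again later in the review.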

3. Tagging (Data Extraction)

Core Smart Tags (CSTs) is a specialised AI tool in Nested Knowledge that combines machine learning, NLP, and heuristics to extract key clinical data relevant to a research question and structure it hierarchically. This includes extracting PICOs, study type, location, and size with validated accuracy and human-in-the-loop oversight, enabling faster, reliable evidence extraction for clinical SLRs.

AI Tool | Study Description | Key Findings | Source
Core Smart Tags | Repeatable Auto-Extraction Frameworks: validates a multi-model system for extracting PICOs, study type, location, and size. | Validated accuracy for PICO extraction, study type, location, and size with human-in-the-loop oversight. | ISPOR USA 2025 (poster available)
Core Smart Tags | PICO Prediction for Joint Clinical Assessments (JCA): mapping and consolidation of PICOs for Soft Tissue Sarcoma ahead of JCA. | Revealed extensive heterogeneity; demonstrated how AI-supported searching streamlines PICO assessment for complex conditions. | ISPOR EU 2025

Adaptive Smart Tags (ASTs) use LLMs to automatically highlight and extract user-defined variables across abstracts and full texts. In validation studies, ASTs achieved up to an 80% match to manual extraction, enabling faster, more intuitive, and audit-tracked data structuring for complex clinical reviews.

AI Tool | Study Description | Key Findings | Source
ASTs (V1) | Efficiency of NK in facilitating SLRs. | Demonstrated up to 80% match to manual extraction. | GES Prague 2024 (poster available)
ASTs (V1) | Proof-of-concept study: AI-assisted SLR of economic burden in metastatic pancreatic adenocarcinoma, evaluating screening and extraction. | ML screening: 87% accuracy, 82% recall. Data extraction: 72.9% mean accuracy. No AI hallucinations. 44–59% time savings across tasks. | ISPOR USA 2025
ASTs (V2) | Internal validation testing of abstract-level Adaptive Smart Tags. | Internal online performance statistics show improved performance over V1. | Internal Testing
ASTs (V2) | Assessing the accuracy of data extraction for GLP-1 agonists for weight loss using unsupervised ASTs. | Exceptionally high accuracy (F1 ≈ 0.98) in identifying key concepts across GLP-1 RA RCT abstracts. | ISPOR EU 2025
ASTs (V2) | Performance of ASTs for automated extraction of study characteristics in an SLR of refractory chronic cough treatments (37 studies, 18 characteristics). | 95% accuracy for publication type and trial registration; 78–92% for core design elements. Human-in-the-loop review is essential for complex or variable elements. | ISPOR USA 2026
ASTs (V2) | AI-assisted qualitative data extraction and evidence mapping in an umbrella review of safety outcomes in solid tumors. | AI extractions via GPT-4 with 100% human QC. Demonstrated streamlined evidence mapping without compromising accuracy. | ISPOR USA 2026
ASTs (V2) | Evaluation of ML-assisted data extraction for supporting SLRs: case study in movement disorders. | Validated AST performance for extracting data in a movement disorders SLR, supporting the reliability of AI-assisted extraction workflows. | ISPOR USA 2026
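The GLP-1 study above reports an F1 score for extraction. F1 is the harmonic mean of precision (how many AI-extracted items are correct) and recall (how many reference items the AI found). The sketch below assumes extracted items are compared with a manual reference as exact-match sets; the cited study may have used a different matching scheme, and the example items are hypothetical.

```python
def extraction_f1(predicted: set[str], reference: set[str]) -> float:
    """F1 between AI-extracted items and a manual reference, using exact matches."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0  # also covers empty inputs, avoiding division by zero
    precision = true_positives / len(predicted)  # share of AI items that are correct
    recall = true_positives / len(reference)     # share of reference items the AI found
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical extractions for one abstract.
    ai = {"semaglutide", "placebo", "68 weeks", "adults with obesity"}
    manual = {"semaglutide", "placebo", "68 weeks", "body weight change"}
    print(f"F1 = {extraction_f1(ai, manual):.2f}")  # F1 = 0.75
```

Unlike a simple percentage match, F1 penalises both spurious extractions and missed items, which is why it is a common summary for concept-extraction tasks.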

4. Critical Appraisal

AI Tool | Study Description | Key Findings | Source
ASTs for Critical Appraisal | Evaluation of ML-assisted risk of bias assessments for supporting SLRs: case study in movement disorders (by RTI). | Validated AST performance for automated risk of bias / critical appraisal in a movement disorders SLR. | ISPOR USA 2026

5. Meta-Analytical Extraction

Smart Meta-Analytical Extraction (SMAE) is an AI tool that generates rapid meta-analytical outputs, such as forest plots, from chosen studies and their accompanying full texts. Released in May 2025, it is currently in beta.

AI Tool | Study Description | Key Findings | Source
Smart MA Extraction | AI-assisted quantitative extraction of Acute Lymphoblastic Leukemia (ALL) studies. | ~95% time savings. Accurate AI-generated summaries. Human review needed for heterogeneously reported outcomes. | ISPOR EU 2025
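A forest plot shows each study's effect estimate with its 95% confidence interval, plus a pooled estimate. The page does not describe SMAE's statistical model, so the sketch below uses one standard choice, fixed-effect inverse-variance pooling of log odds ratios, where each study is weighted by 1/SE². A random-effects model (for example DerSimonian–Laird) is a common alternative when studies are heterogeneous. All function names and input values are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def study_ci(estimate: float, se: float) -> tuple[float, float]:
    """95% confidence interval for one row of a forest plot (log scale)."""
    return estimate - Z_95 * se, estimate + Z_95 * se


def fixed_effect_pool(estimates: list[float], std_errors: list[float]) -> tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling; returns (pooled, lower, upper)."""
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    lower, upper = study_ci(pooled, pooled_se)
    return pooled, lower, upper


if __name__ == "__main__":
    # Hypothetical log odds ratios and standard errors for three studies.
    log_ors = [-0.20, -0.40, -0.10]
    ses = [0.10, 0.20, 0.15]
    for i, (y, se) in enumerate(zip(log_ors, ses), start=1):
        lo, hi = study_ci(y, se)
        print(f"Study {i}: OR {math.exp(y):.2f} [{math.exp(lo):.2f}, {math.exp(hi):.2f}]")
    pooled, lo, hi = fixed_effect_pool(log_ors, ses)
    print(f"Pooled:  OR {math.exp(pooled):.2f} [{math.exp(lo):.2f}, {math.exp(hi):.2f}]")
```

Pooling happens on the log scale because log odds ratios are approximately normally distributed; the results are exponentiated back to odds ratios only for display.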

6. Insights

Smart Insights is an AI-assisted feature in Nested Knowledge that helps you generate clear, evidence-backed written insights directly from your extracted data.

AI Tool | Study Description | Key Findings | Source
Smart Insights | From Data to Insights: first validation of AI-generated writing in evidence synthesis, evaluating Smart Insights for automated narrative generation from structured review data. | First validation of the Smart Insights tool for AI-generated evidence synthesis writing. | ISPOR USA 2026

7. Full Workflow & Platform Reviews

End-to-End Workflow Evaluations

Scope | Study Description | Key Findings | Source
Entire NK Workflow | High-level review evaluating the performance and optimal use strategy of an AI-assisted tool for oncology literature review. | Comprehensive assessment of the NK platform across the full SLR workflow in oncology. | ISPOR USA 2026

Feature Comparisons & External Reviews

Type | Description | Key Findings | Source
Feature Comparison | Function-by-function match-up across evidence synthesis systems. | First published in 2022, updated in 2023. | JMIR (2022) / SciRes (2023)
NK vs 3 systems | Independent comparative benchmarking in which NK (referred to as “T1”) was judged the most comprehensive and effective tool. | NK outperformed three other automation platforms across features. | ISPOR USA 2025
Academic Review | External review of NK for academic research. | Positive assessment of NK’s capabilities for SLR workflows. | Effortless Academic
Regulatory Review | Review of digital tools that support regulatory writers in the clinical evaluation of medical devices. | NK positioned as a key tool for clinical reviews in regulatory research. | Medical Writing Journal
Ecosystem Map | Living Evidence Map: position of NK in the automation ecosystem for systematic reviews. | NK mapped within the broader landscape of SLR automation tools. | Medium (FarhadInfo)

Do you have further questions about how our AI tools work? Let us know!

Updated on April 28, 2026