
If you’ve ever uploaded a set of articles into Nested Knowledge and noticed that the numbers shown in your Literature Search, PRISMA diagram, or uploaded files don’t quite match, you’re not alone. Deduplication is a nuanced process, and while Nested Knowledge handles it automatically, a few implementation details keep it sound in the living systematic review paradigm, where searches are updated and re-run over time. Understanding how and where deduplication happens will help you interpret your record counts with confidence and avoid panic when things don’t add up at first glance.
This guide explains how deduplication works in Nested Knowledge, where each number comes from, and how to set up your project for clean, auditable reporting.
Deduplication is central to producing a trustworthy foundation for your review. But it can also be one of the most confusing aspects of systematic review software. Nested Knowledge automatically deduplicates records so that you screen and extract from unique evidence, but different types of duplication are handled differently depending on their relevance for audit and reporting.
This guide explains:
The two types of duplicates and how each is handled
Where record counts appear across the platform
How the PRISMA flowchart counts duplicates
How to reconcile your counts when they don’t seem to match
Not all “duplicates” are created equal. While any repeated record must be collapsed to enable screening of unique evidence, only some duplicates are meaningful for audit and reporting purposes.
Nested Knowledge distinguishes between two categories by design.
Between-Database Duplicates
A between-database duplicate occurs when the same record appears in two or more distinct databases that were searched, for example, a study indexed in both PubMed and Embase.
How Nested Knowledge treats them
Between-database duplicates represent overlap between independent sources. They are collapsed automatically during import and counted in the PRISMA “Duplicates removed” line. Reporting this overlap is essential for demonstrating the breadth of a search strategy and for tracing how records flowed from multiple databases into a single, deduplicated evidence set.
Within-Search Matches
A within-search match occurs when the same record appears more than once within a single database search, such as when multiple queries in one search return the same record, or when two files uploaded under the same database label both contain it.
These are not independent records. They are effectively repeated copies of the same metadata originating from a single source.
How Nested Knowledge treats them
Within-search matches are collapsed automatically during import and are not counted as duplicates in PRISMA. Counting them as “duplicates” would artificially inflate duplicate totals and misrepresent search overlap. Conceptually, these matches are equivalent to a database internally deduplicating results when multiple queries are ORed together.
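To make the distinction concrete, here is a minimal Python sketch of the classification step, assuming a duplicate pair has already been detected. The record fields and labels are illustrative assumptions, not Nested Knowledge’s internal data model, and real matching compares fuller bibliographic metadata than a single key.

```python
# Minimal sketch of the two duplicate categories described above. The field
# names ("source_label", "doi", "title") are illustrative, not Nested
# Knowledge's internal schema, and real matching compares fuller metadata.

def duplicate_category(record_a: dict, record_b: dict) -> str:
    """Classify an already-matched pair of records by where the duplication arose."""
    if record_a["source_label"] != record_b["source_label"]:
        # Same study returned by two distinct databases (e.g. PubMed and Embase):
        # this is the kind of duplicate PRISMA's "duplicates removed" line counts.
        return "between-database"
    # Same study repeated inside one labeled search (multiple queries, or
    # multiple files under one label): collapsed silently, never counted.
    return "within-search"

pubmed = {"source_label": "PubMed", "doi": "10.1000/xyz", "title": "Example trial"}
embase = {"source_label": "Embase", "doi": "10.1000/xyz", "title": "Example trial"}
pubmed_again = {"source_label": "PubMed", "doi": "10.1000/xyz", "title": "Example trial"}

print(duplicate_category(pubmed, embase))        # between-database
print(duplicate_category(pubmed, pubmed_again))  # within-search
```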
There are several locations across the platform where record counts appear. Each reflects a different stage of import and deduplication, and each aligns differently with the between-database vs. within-search distinction described above.
Execution History shows the raw number of records imported from each uploaded file. This reflects the total rows present in the file, regardless of duplication.
Execution History represents input volume, not usable evidence.
The “Results” column shows the number of unique studies attributed to a search after within-search deduplication.
This number matches the Study Inspector when filtered by the same search.
The Intersections view displays a Venn diagram of overlap between searches.
This view aligns with within-search deduplicated counts, not raw imports.
The Duplicate Queue is a manual review and auditing interface used to inspect or override deduplication decisions.
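To see how the counts in this section relate arithmetically, the sketch below derives Execution History, Results, and Intersections-style overlap figures from one illustrative set of uploads. The data structures, labels, and record identifiers are assumptions made for illustration, not the platform’s data model or API.

```python
# Illustrative relationship between the counts in this section. The uploads,
# labels, and record identifiers are assumptions, not the platform's data model.
from itertools import combinations

# Each upload: (database/search label, record identifiers found in the file).
uploads = [
    ("PubMed", ["r1", "r2", "r3", "r2"]),  # 4 raw rows, one repeated
    ("Embase", ["r2", "r4", "r5"]),        # 3 raw rows
]

# Execution History: raw rows per file, regardless of duplication.
execution_history = {label: len(rows) for label, rows in uploads}

# Results column: unique records per search after within-search deduplication.
results = {label: len(set(rows)) for label, rows in uploads}

# Intersections view: overlap between the deduplicated record sets.
overlaps = {
    (a, b): len(set(rows_a) & set(rows_b))
    for (a, rows_a), (b, rows_b) in combinations(uploads, 2)
}

print(execution_history)  # {'PubMed': 4, 'Embase': 3}
print(results)            # {'PubMed': 3, 'Embase': 3}
print(overlaps)           # {('PubMed', 'Embase'): 1}
```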
The PRISMA flowchart in Nested Knowledge follows the PRISMA 2020 guidelines and therefore includes only between-database duplicate records that appear across distinct sources such as PubMed and Embase.
This is intentional. PRISMA is designed to show how many records were uniquely identified across sources, not how many times a single database repeated the same record.
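In arithmetic terms, the PRISMA figures follow directly from the per-source deduplicated sets. The sketch below is a hedged illustration with made-up record identifiers, not output from a real project.

```python
# Hedged sketch of the PRISMA 2020 arithmetic described above, using made-up
# record sets. Each set is a database's records *after* within-search dedup.
per_source_unique = {
    "PubMed": {"r1", "r2", "r3"},
    "Embase": {"r2", "r4", "r5"},
}

# "Records identified": sum of the per-source deduplicated counts.
records_identified = sum(len(recs) for recs in per_source_unique.values())

# Unique evidence pooled across all sources.
all_unique = set().union(*per_source_unique.values())

# "Duplicates removed": between-database overlap only.
duplicates_removed = records_identified - len(all_unique)

print(records_identified)  # 6
print(duplicates_removed)  # 1
print(len(all_unique))     # 5 -> "records after deduplication"
```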
Duplicate records and related reports are often confused, but they represent fundamentally different concepts: duplicate records are repeated bibliographic entries for the same publication, while related reports are distinct publications, such as a conference abstract and a later full article, that describe the same underlying study.
Related reports are not a deduplication issue. They should be handled using the Related Reports feature, which ensures data are extracted once per underlying study.
Example A: One Database, One File (Within-Search Deduplication)
File: PubMed_A.nbib, contains 1,000 records
Within-search duplicates: 50
Execution History: 1,000
Results: 950
PRISMA:
Records identified: 950
Duplicates removed: 0
What happened: The 50 duplicates were within a single file, so they were removed silently and don’t appear in PRISMA.
Example B: Same Database, Two Files
Files: PubMed_A.nbib (800 records), PubMed_B.nbib (300 records)
Overlap between the files: 100
Execution History: 1,100
Results: 1,000
PRISMA:
Records identified: 1,000
Duplicates removed: 0
What happened: Since both files are from the same database, the overlap is treated as within-database duplication and excluded from PRISMA’s duplicate count.
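Examples A and B follow the same within-source arithmetic: files that share a database label are pooled and deduplicated before PRISMA sees them. The sketch below reproduces the numbers quoted above; the helper function is purely illustrative.

```python
# Within-source arithmetic for Examples A and B: files sharing a database
# label are pooled and deduplicated, and only the unique count reaches PRISMA.
def within_source_summary(raw_rows: int, within_source_duplicates: int) -> dict:
    unique = raw_rows - within_source_duplicates
    return {
        "execution_history": raw_rows,        # raw rows across the label's files
        "results": unique,                    # unique records after pooling
        "prisma_records_identified": unique,  # PRISMA starts from the unique count
        "prisma_duplicates_removed": 0,       # within-source matches are never counted
    }

print(within_source_summary(1000, 50))        # Example A: one PubMed file
print(within_source_summary(800 + 300, 100))  # Example B: two PubMed files
```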
Example C: Two Distinct Databases (Between-Database Deduplication)
Files: PubMed.nbib (1,000 records), Embase.xml (900 records)
Between-database duplicates: 200
Execution History: 1,900
Results: 1,700
PRISMA:
Records identified: 1,900
Duplicates removed: 200
Records after deduplication: 1,700
What happened: 200 studies appeared in both databases and were correctly reported in PRISMA.
Example D: Mislabeling Databases
Files: PubMed export (1,000 records), Embase export (900 records)
Uploaded under the label: “Other” (for both)
Actual overlap: 200
Execution History: 1,900
Results: 1,700
PRISMA:
Records identified: 1,700
Duplicates removed: 0
What happened: The system treated both files as one source (since they were labeled as the same database), so the between-database duplicates collapsed silently. This setup prevents PRISMA from correctly showing the duplicate count.
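The difference between Examples C and D comes down entirely to labeling. A short sketch, using the figures quoted above and a hypothetical prisma_counts helper, reproduces both outcomes:

```python
# The figures from Examples C and D, reproduced with one illustrative helper.
# The only difference between the two runs is how the files were labeled.
def prisma_counts(per_source_unique: dict, cross_source_overlap: int) -> dict:
    identified = sum(per_source_unique.values())
    return {
        "records_identified": identified,
        "duplicates_removed": cross_source_overlap,
        "after_deduplication": identified - cross_source_overlap,
    }

# Example C: PubMed and Embase labeled correctly, so the 200 overlapping
# records are counted as between-database duplicates.
print(prisma_counts({"PubMed": 1000, "Embase": 900}, cross_source_overlap=200))
# {'records_identified': 1900, 'duplicates_removed': 200, 'after_deduplication': 1700}

# Example D: both files labeled "Other", so the same 200 records are collapsed
# within one source before PRISMA sees them and never appear in its counts.
print(prisma_counts({"Other": 1700}, cross_source_overlap=0))
# {'records_identified': 1700, 'duplicates_removed': 0, 'after_deduplication': 1700}
```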
Avoiding Panic Points
If you see that Execution History = 1,900, but PRISMA only shows 1,700 records identified and no duplicates removed, that doesn’t mean records disappeared. Most likely, those duplicates occurred within a single labeled source and were removed during import, but not counted in the PRISMA duplicate line because they were not cross-database.
Recommended Practices
Label each database accurately (e.g., “PubMed,” “Embase,” not “Other”).
→ Ensures that cross-database duplicates are reflected in PRISMA.
Name files clearly with query type and date (e.g., PubMed_2025-07-28_main-query.nbib).
→ Makes Execution History traceable and search updates auditable.
For updates, upload a new file under the same database label.
→ Prevents unintended inflation of record counts.
Keep grey literature and hand-searches separate, labeled distinctly.
→ PRISMA displays these separately, maintaining transparency.
Use Study Inspector or the Intersections view to verify attribution and clustering.
→ Helpful for reconciliation and record traceability.
Common Mistakes to Avoid
Uploading multiple databases under a generic label like “Other”
→ Hides true between-database duplication and skews PRISMA output.
Re-uploading the same export repeatedly under a new name
→ Silently creates within-database duplicates that won’t appear in PRISMA.
Mixing grey literature and database records under the same source
→ Breaks the reporting structure and makes origin tracking harder.
Reconciliation Checklist
Use this workflow when you’re comparing counts or validating your setup:
Check Execution History to sum the raw number of imported records.
Confirm correct source labels for each upload (PubMed, Embase, etc.).
Compare to PRISMA:
“Records identified” = the sum of per-source counts after within-search deduplication
“Duplicates removed” = between-database overlap only
Review the Results column to see unique studies per search.
Use Study Inspector or cluster view to trace which files contributed to each unique record.
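If it helps, the same checklist can be expressed as a few quick consistency checks. The figures below are the hypothetical ones from Example C, entered by hand; nothing here reads from Nested Knowledge itself.

```python
# Reconciliation sketch using Example C's figures; in practice each value is
# read off the screens listed in the checklist above, not from any API.
execution_history_total = 1000 + 900                 # raw rows across uploaded files
results_by_search = {"PubMed": 1000, "Embase": 900}  # Results column per search
prisma_identified = 1900
prisma_duplicates_removed = 200
prisma_after_dedup = 1700

# "Records identified" should equal the sum of per-search deduplicated counts.
assert prisma_identified == sum(results_by_search.values())

# "Duplicates removed" should account exactly for the drop to unique records.
assert prisma_after_dedup == prisma_identified - prisma_duplicates_removed

# Any gap between Execution History and the Results total is within-search
# duplication, which PRISMA will never report as removed duplicates.
within_search_duplicates = execution_history_total - sum(results_by_search.values())
print(within_search_duplicates)  # 0 here; a positive number explains "missing" records
```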
We know how important accurate record counts are in your review. If you’re unsure about what you’re seeing or how to resolve a discrepancy, see our Duplicate Review documentation, reach out to our support team, or request a walkthrough. We’re happy to help!
