Crafting the Perfect Prompts for Nested Knowledge’s AI Tools

Rules of thumb: #

  1. Although the user interface is designed around question-based prompts, treat the Question box as you would any other large language model (LLM) prompt – tell it exactly what you want! (A sketch of this kind of prompt assembly follows this list.)
  • A simple question will get you going, but then add more context — include instruction statements, examples, synonyms, definitions, formatting, etc.
  • Details can and should relate to: 1) the information that you want the AI to retrieve and 2) the format in which you want that information.
  2. Above all, understand your research question and your expected data outputs. A protocol-driven review with pre-specified criteria and elements for data extraction will be your best tool. An uncertain reviewer leads to inconsistent AI results (i.e., garbage in, garbage out).
  • Identify a few papers of interest and see how your potential eligibility criteria are reported in the abstract and how content/data elements of interest are reported in the full text.
  • When in doubt, give extra instructions to the LLM and refine further afterwards. Prompting can be an iterative process.
  3. Prompt development is an iterative process of draft, test, revise, run.
  4. All prompt-based questions can be run with or without associated annotations; with annotations, the LLM highlights a direct excerpt from the text to support its response. Running the LLM without requiring annotations will produce a more generative response, which in some cases may be more accurate.
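For illustration only, here is a minimal Python sketch of how a detailed Question prompt might be assembled from a core question, definitions, synonyms, and formatting instructions before being pasted into the Question box. The function and example values are hypothetical and are not part of Nested Knowledge.

```python
# Illustrative only: assembling a detailed Question prompt before pasting it
# into the Question box. All names and values here are hypothetical.

def build_prompt(question, definitions=None, synonyms=None, output_format=None):
    """Combine a core question with context and formatting instructions."""
    parts = [question]
    if definitions:
        parts.append("Definitions: " + "; ".join(definitions))
    if synonyms:
        parts.append("Relevant terms include: " + ", ".join(synonyms))
    if output_format:
        parts.append("Format the answer as: " + output_format)
    return " ".join(parts)

prompt = build_prompt(
    question="Does the study report health-related quality of life outcomes?",
    definitions=["Quality of life means results from a validated instrument"],
    synonyms=["EQ-5D", "SF-36", "FACT-G", "HRQoL"],
    output_format="'Yes' or 'No', followed by the instrument name if one is reported",
)
print(prompt)  # draft, test on a few records, revise, and run again
```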

Criteria-based Screening (CBS) with Smart Screener: #

Start by defining a set of inclusion criteria for your review (based on your protocol) such that a record that meets each criterion is answered ‘Yes’ and marked for inclusion. For example, “Is the study conducted in adults?” means that pediatric studies fail this criterion; Smart Screener will mark it as a ‘No’ and exclude the record. Currently, Smart Screener is unable to mark an answer as ‘Unanswered’ if the criterion is not reported, so to ensure that AI-based decisions are not overly inclusive, build instructions for handling uncertainty or missing information into your prompt.

For PICOS-based reviews, outline each component as its own criterion. Criteria should be ordered hierarchically, starting broad and narrowing in on more specific concepts such that the first criterion that the record fails to meet will serve as the reason for exclusion in the PRISMA. An easy way to think about how to set your hierarchy is to put the most obvious exclusion first that does not require full review of the record (e.g., can you tell from the title that the study type is relevant?).
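As a rough illustration of why order matters, the sketch below (plain Python, not Smart Screener code; criterion names are examples only) walks a record through hierarchically ordered criteria and reports the first failed criterion as the exclusion reason.

```python
# Rough sketch of hierarchical screening logic: the first criterion a record
# fails becomes its PRISMA exclusion reason. Criteria names are examples only.

CRITERIA = ["Study type", "Geography", "Broad population", "Interventions", "Outcomes"]

def screen(answers):
    """answers maps each criterion name to 'Yes' or 'No'."""
    for criterion in CRITERIA:
        if answers.get(criterion) == "No":
            return ("Exclude", criterion)  # reason for exclusion in the PRISMA
    return ("Include", None)

# An editorial about a pediatric population fails 'Study type' first, so that
# (not the population) is recorded as the exclusion reason.
print(screen({"Study type": "No", "Broad population": "No", "Interventions": "Yes"}))
```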

Special consideration will be needed for your abstract-level criteria vs. your full-text-level criteria, particularly with respect to eligibility criteria that may be unclear or not reported in the abstract and need to be confirmed in the full-text article (see Table 1). Note that any records without an abstract (at the abstract level) or an attached PDF (at the full-text level) will not be reviewed by Smart Screener.

Smart Screener can be run on all records (up to 5,000) or limited to only unscreened records. For nests with more than 5,000 records, do some preliminary screening to weed out obvious exclusions (e.g., via bulk actions based on CORE smart tags or other terms/keywords) and run Smart Screener on the remaining unscreened records. This approach can also be used to test your criteria by temporarily excluding all records other than a small test set. Once you are happy with Smart Screener’s performance, return the excluded records to unscreened and run on all records.

Table 1. PICOS-based criteria

| Criterion | Abstract Criteria | Full-text Criteria |
| --- | --- | --- |
| Population (broad) | Is a broad disease state being studied? | If you are confident that the correct population will be captured at the abstract level, there is no need to rescreen full texts for population. |
| Population (specific) | | Specific population details should be reserved for full-text criteria (e.g., mutation status, line of therapy, etc.). |
| Intervention | Are interventions (class or category) of interest being evaluated? You may consider listing treatment names for any pre-specified lists. | Use full-text criteria to drill down on particular doses or regimens of interest, procedure details, etc. |
| Comparator | Confirm that the study also assesses a comparator of interest (in addition to relevant interventions). | Note: intervention, comparator, and outcomes can be combined into a single criterion to specifically review for outcomes reported for therapies of interest (to ensure the data are tied to interventions). |
| Outcomes | Use caution at the abstract level: ask for broad outcome categories (e.g., efficacy or quality of life) without listing individual outcomes unless your approach requires relevant data to be reported in the abstract. Consider not including an outcomes criterion at the abstract level. | Provide a list of specific outcomes for inclusion. You can also use the prompt to specify the types of data or summary statistics that are of interest. |
| Study Type / Study Design | Primary research articles, not including protocols, commentaries, editorials, etc. You can also combine this with study design and state that clinical trials are relevant. Do not include observational studies, reviews or meta-analyses, guidelines, statistical models, etc. | |
| Other criteria to consider | | |
| Geography | Use caution at the abstract level. While Smart Screener can infer location based on names of registries/databases or institutions, most abstracts do not indicate geographic location. Instead, try using Study Location from CORE smart tags. | Confirm geographic location based on the full-text paper. |
| Study Size | Use caution at the abstract level. The number of participants may not be explicitly reported or may not be inferable from the abstract alone. Instead, try using Study Size from CORE smart tags. | Confirm the number of participants based on the full-text paper. |
| Age | Use caution at the abstract level. Age may not be explicitly reported or may not be inferable from the abstract alone. | Confirm the age of study participants based on the full-text paper. |
| Subgroups | | Use caution at the full-text level, as subgroups of interest may not be reported in the main paper. Smart Screener does not review supplemental materials. |

Example of Hierarchically Ordered Criteria for CBS:

  1. Study type
  2. Geography
  3. Broad population
  4. Age
  5. Specific population
  6. Interventions
  7. Comparators
  8. Outcomes
  9. Subgroups

Sample Criteria with Example Prompts:

Study Type: “Is the study an original report (i.e., not a systematic review/meta-analysis or secondary report) of a clinical experiment (randomized, observational/cohort, or case report/series design)? Also, not a protocol.”

Study Size: “Does the study involve 5 or more patients? If the study size is not reported, it is safe to assume yes unless it’s a case report.”

Population: “Does the study population consist of primarily adults? (i.e., not a pediatric or animal population). If not explicitly mentioned to the contrary, it is acceptable to assume yes.”

Shoulder Arthroplasty as an Intervention: “Does the study involve shoulder arthroplasty as an intervention?”
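Purely as an illustration, the sample criteria above could be drafted in their hierarchical order like this before being entered in Smart Screener. The structure and field names are hypothetical, not an actual Nested Knowledge interface.

```python
# Hypothetical worksheet for drafting Smart Screener criteria in hierarchical
# order before entering them in the UI; the structure is illustrative only.

criteria = [
    {"order": 1, "name": "Study Type",
     "prompt": "Is the study an original report (i.e., not a systematic review/"
               "meta-analysis or secondary report) of a clinical experiment "
               "(randomized, observational/cohort, or case report/series design)? "
               "Also, not a protocol."},
    {"order": 2, "name": "Study Size",
     "prompt": "Does the study involve 5 or more patients? If the study size is "
               "not reported, it is safe to assume yes unless it's a case report."},
    {"order": 3, "name": "Population",
     "prompt": "Does the study population consist of primarily adults? (i.e., not "
               "a pediatric or animal population). If not explicitly mentioned to "
               "the contrary, it is acceptable to assume yes."},
    {"order": 4, "name": "Intervention",
     "prompt": "Does the study involve shoulder arthroplasty as an intervention?"},
]

for c in sorted(criteria, key=lambda item: item["order"]):
    print(f"{c['order']}. {c['name']}: {c['prompt']}")
```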

Adaptive Smart Tags (ASTs): #

Text Tags:

Tags that are configured as “text” are qualitative in nature and will extract one text segment per study. The name of the tag should be informative for the LLM – not just an abbreviation that would be known by a subject matter expert – and should be followed by a descriptive question/prompt with instructions that define the concept, provide examples, and tell the LLM how you want the output formatted (e.g., detailed, concise, specific summary statistics or units, etc.) or even what you don’t want it to extract.

  • Geography: “Extract the country or region in which the study was conducted. Only extract the name of the country.” [this will avoid the LLM responding with a full sentence]
  • Age: “What was the age of participants in the study? Extract only as mean (SD) years for the overall study population. If age is only available by study arm, extract per arm.”
  • Study Objective: “What was the main objective of the study? Be as concise as possible. Do not extract objectives unrelated to assessing treatment efficacy.”
  • Stratification Factors for Randomization: “List only the factors used to stratify the randomization process (e.g., age, disease stage, prior treatment).”

Text tags can also be configured with “Text Options” to help standardize the outputs even further. With text options, you provide the exact word or phrase that the LLM should respond with (similar to a dropdown menu). This can be used for specific study designs, locations, treatment names, subgroup names, etc.

For content that is expected to be extracted as counts (e.g., number of patients enrolled, number of study centers, number of lines of prior therapy), tags can also be configured as “Numeric” to ensure the LLM only retrieves the number corresponding to your tag description. Remember to use caution when configuring numeric tags if the data may be reported heterogeneously (e.g., mean or median age), as the response will only provide a number and not the type of summary statistic, measure of variance, etc. In such cases, it is recommended to configure separate tags for different summary statistics. 
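As a sketch only (hypothetical field names, not a Nested Knowledge API; tags are configured through the UI), the tag setups described above might be drafted like this: a plain text tag, a text tag constrained by Text Options, and separate numeric tags for heterogeneous summary statistics.

```python
# Illustrative draft of tag configurations; field names are hypothetical and
# these settings are entered through the UI, not via code.

tags = [
    # Plain text tag: informative name plus a prompt that constrains the output
    {"name": "Geography", "type": "text",
     "prompt": "Extract the country or region in which the study was conducted. "
               "Only extract the name of the country."},
    # Text tag with Text Options: the LLM must answer with one of these phrases
    {"name": "Study Design", "type": "text",
     "options": ["Randomized controlled trial", "Prospective cohort",
                 "Retrospective cohort", "Case series"],
     "prompt": "What was the design of the study?"},
    # Numeric tags return a number only, so mean and median age are kept separate
    {"name": "Mean age, years", "type": "numeric",
     "prompt": "Extract the mean age of the overall study population in years."},
    {"name": "Median age, years", "type": "numeric",
     "prompt": "Extract the median age of the overall study population in years."},
]
```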

Tag Tables:

For more structured or quantitative data, tag tables are a great option to extract multiple rows of data per concept per study (e.g., arms, subgroups, timepoints). While they require a bit more configuration effort – see our Guide on Tag Tables – they’re highly flexible and optimized for structured data. 

When setting up your tag table, the tag name and question should provide sufficient information as instructions for the LLM across the entire table. Each column header should also be easily understandable as a standalone data element or concept. 

For example, a tag table looking to extract outcomes related to treatment response would have a clear tag name of “Response Outcomes” and question/prompt: “Provide data associated with response to treatment by treatment arm. Extract objective response rate (ORR), complete response (CR), partial response (PR), and stable disease (SD) as number of patients and percentage for each outcome.” The table setup for this outcome might look like the following to extract arm-level data for response outcomes:

Table 2. Example Tag Table Setup

| Treatment Arm | Number of Patients Analyzed | Timepoint | ORR, n (%) | CR, n (%) | PR, n (%) | SD, n (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Arm 1 | | | | | | |
| Arm 2 | | | | | | |
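For clarity, here is a sketch of the arm-level rows this tag table setup is designed to produce, with one entry per row and keys matching the column headers in Table 2. All values are made up for demonstration only.

```python
# Illustrative example of the row-per-arm output the "Response Outcomes" tag
# table is designed to produce; all values below are fabricated placeholders.

response_outcomes_rows = [
    {"Treatment Arm": "Arm 1", "Number of Patients Analyzed": 120,
     "Timepoint": "12 months", "ORR, n (%)": "54 (45.0)", "CR, n (%)": "12 (10.0)",
     "PR, n (%)": "42 (35.0)", "SD, n (%)": "30 (25.0)"},
    {"Treatment Arm": "Arm 2", "Number of Patients Analyzed": 118,
     "Timepoint": "12 months", "ORR, n (%)": "38 (32.2)", "CR, n (%)": "7 (5.9)",
     "PR, n (%)": "31 (26.3)", "SD, n (%)": "35 (29.7)"},
]
```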

Running ASTs: #

Once your configuration is set and all of your tags have AI prompts, carefully select AI options: 

  • Do you want to run on abstracts or full texts? 
  • What about allowing answers that are more generative/summative (rather than requiring annotations)? 
  • Have you considered whether you can use “Apply”, or are you under constraints that mean you should only “Recommend”?

Review the retrieved responses and assess whether revisions should be made to any of your tag configurations or prompts. If changes are made (edits to prompts, changes to tag setup, new tags added), be sure to refresh your ASTs!

How (and When) to Refresh: #

Rerunning the AI without Refreshing will generate answers to any new tag Questions you’ve built while leaving pre-existing tags untouched. So, if you want to save your earlier work, do not refresh!

Running the AI and Refreshing will remove AI-applied tags or AI recommendations (whichever are applicable), and will re-run on all Questions. It will not remove or replace manually-applied tags!

Clearing is also an option, which will remove AI-applied or recommended tags without replacement. 
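To make the distinction concrete, here is a minimal sketch (plain Python modeling the behavior described above, not product code) of how Rerun, Refresh, and Clear treat AI-applied versus manually applied tags.

```python
# Minimal sketch of Rerun vs. Refresh vs. Clear; 'source' marks whether a tag
# was applied manually or by the AI. This models described behavior only.

def handle(tags, action):
    if action == "rerun":
        # Answers new Questions only; existing tags are left untouched
        return tags
    if action in ("refresh", "clear"):
        # AI-applied tags/recommendations are removed; manual tags are kept.
        # "refresh" then re-runs the AI on all Questions; "clear" does not.
        return [t for t in tags if t["source"] == "manual"]
    raise ValueError(f"unknown action: {action}")

existing = [{"name": "Geography", "source": "ai"}, {"name": "Age", "source": "manual"}]
print(handle(existing, "refresh"))  # only the manually applied 'Age' tag remains
```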

Tricks and Tips: #

  • The first tip is not about Configuration, but what you do after: Sanity-check a few studies! Sometimes, configuration issues or AI misinterpretations are very easily discoverable by checking just a few studies. If there is an obvious error type, reconfigure and refresh. Work with the LLM to improve how it helps you!
  • Write literal and precise Questions: You’re instructing an LLM with every Question you create! Don’t ask “Was the number of patients reported?” if what you want is “What was the number of patients reported?”
  • Learn Question Types: Single Apply means “find this tag”; Select questions can only be answered with sub-tags! If a tag is not either a Single Apply Question or configured as a child of a Select Question tag, it will not be assessed by the AI. Question type is highly impactful on LLM extractions.
  • Build only what you need: Extraneous tags and ‘over-gathering’ from underlying studies can slow your review and may indicate that your Research Question or Protocol would benefit from greater focus. Tagging for secondary topics is useful, but balance complexity and noise against each additional topic! For tangential questions, especially if different studies may provide the answers, a second nest may be appropriate.
  • Watch our videos! Especially the AI Tagging Configuration advice. 5 minutes could save hours on future reviews.

Updated on September 25, 2025