Smart Study Location

Model Name: Smart Study Location

Version: 1.0


Overview

The Smart Study Location tool identifies the geographical location of a study from textual information in bibliographic data, including country and city mentions in titles and abstracts, and author affiliation details. It is designed to extract the study’s most relevant location.

Key features:

  • Extracts mentions of geopolitical entities, cities, and nationalities in various bibliographic fields using general-purpose entity recognition models
  • Selects the most likely country of the study using context and population statistics

For non-original research, or research conducted in multiple locations, the model aims to provide the most contextually appropriate result. For a narrative review, for example, this may be the country of the author affiliations.
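The selection step described above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the field weights, the tiny gazetteer, the round population figures, and the function name `pick_country` are all hypothetical, chosen only to show how context (which field a mention came from) and population statistics (which city a name most likely refers to) might be combined.

```python
from collections import defaultdict

# Hypothetical gazetteer: city name -> candidate (country, population) pairs.
# Populations are round placeholder values for illustration only.
CITY_GAZETTEER = {
    "london": [("United Kingdom", 9_000_000), ("Canada", 400_000)],
    "osaka": [("Japan", 2_700_000)],
}

# Hypothetical weights: mentions in the title or abstract are assumed to say
# more about the study setting than author affiliations do.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "affiliation": 1.0}


def pick_country(mentions):
    """Return the highest-scoring country, or None if nothing was mentioned.

    mentions: list of (field, kind, text), where kind is "country" or "city".
    """
    scores = defaultdict(float)
    for field, kind, text in mentions:
        weight = FIELD_WEIGHTS.get(field, 0.0)
        if kind == "country":
            scores[text] += weight
        elif kind == "city":
            candidates = CITY_GAZETTEER.get(text.lower(), [])
            total = sum(pop for _, pop in candidates)
            # Split the mention's weight across candidate countries in
            # proportion to the population of the matching city.
            for country, pop in candidates:
                scores[country] += weight * pop / total
    if not scores:
        return None
    return max(scores, key=scores.get)


# An ambiguous city in the abstract outweighs a country in the affiliation.
example = [
    ("affiliation", "country", "United States"),
    ("abstract", "city", "London"),
]
print(pick_country(example))
```

With only affiliation mentions available, as might happen for a narrative review, the same function falls back to the affiliation country.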


Intended Use

  • Primary Purpose: Automate the extraction of study location data from research abstracts to support systematic reviews, meta-analyses, and research landscape analyses.
  • Intended Users: Researchers, healthcare analysts, and systematic review teams.

Evaluation

  • Performance Metrics:
    • Accuracy: 78% on a random sample of PubMed records with NCT ID linkage.
    • Recall: 0.79
    • Precision: 0.90
    • Most Reliable Use Case: Clinical study types, such as RCTs and cohorts.
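For readers who want to reproduce similar metrics on their own labeled sample, the sketch below shows one plausible way to compute precision and recall for a single-country prediction. It assumes the model may return no location for some records (represented as `None`); the function name `evaluate` and the sample data are hypothetical, and the exact evaluation protocol behind the figures above is not specified here.

```python
def evaluate(gold, predicted):
    """Compute (precision, recall) for single-label country predictions.

    gold: list of true country names, one per record.
    predicted: list of predicted country names; None means no prediction.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = sum(1 for g, p in zip(gold, predicted) if p is not None and p == g)
    answered = sum(1 for p in predicted if p is not None)
    # Precision: share of returned locations that are correct.
    precision = correct / answered if answered else 0.0
    # Recall: share of all labeled records given the correct location.
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical labeled sample, not real evaluation data.
gold = ["Japan", "France", "Brazil", "India", "Kenya"]
predicted = ["Japan", "France", None, "India", "Uganda"]
precision, recall = evaluate(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Under this definition, abstaining on a record lowers recall but not precision, which is one way precision can exceed recall.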

Ethical Considerations

  • Human Oversight: Users should validate extracted locations, especially in cases where the study type may influence the definition of “study location.”
  • Bias: The model may favor certain study types (e.g., RCTs) due to training data distribution, and may misinterpret mentions that do not indicate the study setting, such as nationalities or ethnicities (e.g., “Japanese population in America”).

Limitations

  1. Study Type Sensitivity: The definition of “study location” varies across research designs, leading to potential inaccuracies for less-structured study types (e.g., reviews).
  2. Ambiguity Handling: In cases of unclear location information, the model provides the most sensible result based on context, but this may not always align with user expectations.
  3. Scope: Outputs are restricted to country-level data, which may oversimplify studies conducted across multiple locations.

Planned Improvements

  1. Study Type Awareness: Improve the model’s contextual understanding of study designs to refine its definition of “study location”.
  2. Multilingual Support: Extend capabilities to non-English abstracts to increase global applicability.

Contact Information

For questions, feedback, or support, please contact support@nested-knowledge.com.


PALISADE Compliance

Purpose

The Smart Study Location model is designed to identify the geographical context of studies, primarily for systematic reviews and research mapping. 

Appropriateness

The Smart Study Location tool is appropriate for identifying the geographical location of studies because it combines general-purpose entity recognition models with domain-specific logic tailored to bibliographic data. The tool extracts mentions of geopolitical entities, cities, and nationalities from multiple textual fields, such as titles, abstracts, and author affiliations, ensuring comprehensive coverage of location information.

The method is particularly well-suited for this task because it integrates statistical and contextual reasoning to identify the most likely country of the study, even in cases where multiple locations or ambiguous references are present.

Limitations

  • The model’s performance is constrained by the variability of study type definitions and the clarity of location information in abstracts.
  • In some cases, the reported location may default to author affiliations (e.g., in narrative reviews) rather than the study setting.
  • Limitations of the data: Restricted to English-language abstracts; performance may vary for ambiguous or poorly structured abstracts.

Implementation

The model is available through cloud software and can run on standard hardware, or on GPUs for faster computation.

Sensitivity and Specificity

  • Recall: 0.79
  • Precision: 0.90
  • The model performs best with well-defined study types like RCTs and cohorts.

Algorithm Characteristics

  • Design: Geopolitical entity recognition, followed by country selection based on context and source field, and annotation of the detected mentions back to the source text.

Data Characteristics

  • Training and Testing Data: Derived from PubMed records with NCT ID linkage, which makes the data most relevant to clinical and cohort studies.

Explainability

The model outputs a single country-level location based on detected mentions and contextual relevance, along with the location of these mentions in the text.
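As an illustration of what such an output could look like, the sketch below pairs a selected country with the character offsets of its mentions in each field. The `Mention` structure, the `annotate` function, and the simple regular-expression matching are assumptions made for this example; the tool's real output format and its entity recognition are not documented here.

```python
import re
from dataclasses import dataclass


@dataclass
class Mention:
    field: str  # bibliographic field the mention was found in
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention
    text: str   # the matched text as it appears in the source


def annotate(record, country):
    """Locate every mention of `country` in each field of `record`.

    record: dict mapping field name -> field text.
    Returns a dict with the country and a list of Mention objects.
    """
    pattern = re.compile(r"\b" + re.escape(country) + r"\b", re.IGNORECASE)
    mentions = []
    for field, text in record.items():
        for match in pattern.finditer(text):
            mentions.append(Mention(field, match.start(), match.end(), match.group()))
    return {"country": country, "mentions": mentions}


# Hypothetical record used only to demonstrate the output shape.
record = {
    "title": "Stroke outcomes in Japan",
    "abstract": "We enrolled patients across Japan over two years.",
}
result = annotate(record, "Japan")
for m in result["mentions"]:
    print(m.field, m.start, m.end, record[m.field][m.start:m.end])
```

Returning offsets rather than only the country name lets a reviewer jump to the supporting text and check the prediction.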


Additional Notes on Compliance

This algorithm is not trained on the input data. It does not learn from, store, or transmit input data, which protects privacy and avoids the biases and ethical concerns that training on user data could introduce.

Updated on January 17, 2025