Literature search: Manual vs Data Driven

Parvathy Hariharan

Nested Knowledge is a medical analytics platform that translates scientific data from vast numbers of studies into interactive visuals updated in real time. In effect, we build best meta-analytic practices into software, which lets us use data-driven analytics to assess the coverage and accuracy of each step in the review process. Below, Nick Mikoff, assistant project manager at Nested Knowledge, and Karl Holub, its chief technical officer, explain how manual and data-driven literature searches differ and why the latter is superior to the former.

The key difference between manual and data-driven searches: with each search, humans gain personal experience and expertise, whereas data-driven systems improve future searches through statistical optimization. Previous searches reveal traits of the “included” articles (citation, text, and metadata patterns) that help refine search terms and “rank” future search terms and studies. Software systems are also easy to reuse and update, so novel analytics can be added to our ever-growing database. “For each new search we run, we use the results of that search to train new models with more data with increased statistical power,” Holub states. “In manual mode, maybe the literature reviewer learned a few new tricks.”
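To make the ranking idea concrete, here is a deliberately simple, hypothetical sketch: it scores each new candidate study by how closely its wording resembles articles already marked “included.” This is not Nested Knowledge’s model. It swaps in plain bag-of-words cosine similarity, ignores the citation and metadata signals mentioned above, and uses illustrative names (`rank_candidates`, `cosine`) that are assumptions for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters; a crude, illustrative tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(included: list[str], candidates: list[str]) -> list[tuple[float, str]]:
    # Hypothetical helper: pool the terms of every previously included abstract
    # into one profile, then score each new candidate against that profile.
    profile: Counter = Counter()
    for text in included:
        profile.update(tokenize(text))
    scored = [(cosine(profile, Counter(tokenize(c))), c) for c in candidates]
    # Highest-scoring candidates first, so reviewers see likely inclusions early.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    included = [
        "thrombectomy for acute ischemic stroke randomized trial",
        "stent retriever thrombectomy outcomes in ischemic stroke",
    ]
    candidates = [
        "dietary fiber and colon health",
        "aspiration thrombectomy in acute stroke",
    ]
    for score, text in rank_candidates(included, candidates):
        print(f"{score:.2f}  {text}")
```

A production system would learn from many more signals and retrain as each review adds labeled examples; the point here is only that past inclusion decisions can be turned into a score that orders new search results.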

What are some flaws in manual searching that we need to overcome? “A big challenge that we face is balancing the amount of studies to filter and finding the relevant studies for the search,” Mikoff says. “We don’t want to under-search for a topic, but at the same time we don’t want to over-search and spend too much time going through irrelevant studies. Therefore, creating ways to improve our methods of creating and refining search terms is of high importance. For some of the larger topics, if we can cut our search down by 40% or so that would improve our manual search greatly.” Only a human expert with extensive experience in the research area can make the final decision in difficult cases, Holub adds, but data-driven methods can take the more routine, less judgment-dependent work off experts’ hands.

Developing software to replicate and improve a human process brings its own challenges. Mikoff points out that some fields gather data more consistently, while others are less uniform. “This inconsistency makes figuring out what is important to gather challenging for us, and makes it hard to analyze data,” he says. Holub says that the sheer variety of study designs, treatment methods, and metrics in medicine is difficult to assimilate into structured data. He explains that studies can be retrospective, prospective, or randomized; they can analyze a single therapy or contrast many therapies; they can provide a continuous treatment effect or have repeated measures; and they can report continuous, ordinal, and categorical data. “These characteristics are combinatorial, leading to immense complexity in any data representation,” he says.
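A quick way to see why these characteristics are “combinatorial” is to enumerate them. The sketch below is hypothetical: the category names paraphrase the dimensions Holub lists, and the grouping into four independent axes (`Design`, `Comparison`, `Timing`, `DataType`) is an assumption made for illustration, not Nested Knowledge’s actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical axes paraphrasing the study characteristics described above.
class Design(Enum):
    RETROSPECTIVE = "retrospective"
    PROSPECTIVE = "prospective"
    RANDOMIZED = "randomized"


class Comparison(Enum):
    SINGLE_THERAPY = "single therapy"
    MULTIPLE_THERAPIES = "multiple therapies"


class Timing(Enum):
    CONTINUOUS_EFFECT = "continuous treatment effect"
    REPEATED_MEASURES = "repeated measures"


class DataType(Enum):
    CONTINUOUS = "continuous"
    ORDINAL = "ordinal"
    CATEGORICAL = "categorical"


@dataclass(frozen=True)
class StudyShape:
    # One combination of characteristics a data model would have to support.
    design: Design
    comparison: Comparison
    timing: Timing
    data_type: DataType


def all_shapes() -> list[StudyShape]:
    # Cartesian product of every axis: each tuple is one possible study "shape".
    return [StudyShape(*combo) for combo in product(Design, Comparison, Timing, DataType)]


if __name__ == "__main__":
    print(len(all_shapes()))  # 3 * 2 * 2 * 3 = 36
```

Even this coarse grid yields 36 distinct shapes before any specific outcome, metric, or therapy is considered, and each new axis multiplies the count rather than adding to it.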

Both data science experts say they feel deeply fulfilled by their work. “Knowing that we are making it easier for the scientific field to understand data is enjoyable and rewarding,” Mikoff says. Holub agrees. “I have a progress bar in my head – I want to extract data from every published clinical trial and study. Each study gathered, tool built, and insight gained ticks that progress bar forward,” he says. “When I find a pattern in data — for example, how citation networks say a lot about what studies are likely to be included in a review — I feel like I’ve broken through a cloud of noise.”