Lesson 4: The Study Lifecycle

  • How should you read a study when Screening? When tagging? When extracting quantitative data?
  • PRISMA Flow Diagrams and auditing your review
  • Scrutinizing study design and findings from underlying studies
  • How to collect data from secondary reports
  • Introduction to using the Nested Knowledge software (AutoLit)



  • Create a nest and add your Search (instructions); then, run it on PubMed
  • Add your Exclusion Criteria to your nest (instructions)
  • Build your Tagging Hierarchy to contain your study characteristics, P, I, and O (instructions)
  • Screen and tag at least 3 studies
  • (Optional): Configure your Extraction (instructions) and extract all Data Elements from at least 3 studies

Kevin Kallmes: Hi, this is Kevin Kallmes with Nested Knowledge, here to bring you Lesson Four in our course on how to systematically review the medical literature. Today, we’re going to be going over the study lifecycle: the flow of a study from when you pull in its metadata off PubMed through to when you’ve extracted every bit of data for your own analysis. Before we get started, let’s go over what we’ve learned in the first three lessons. If you recall, we built out our research question, reflecting the patient population, the interventions and comparators, and our outcomes of interest. We drafted a systematic review protocol reflecting our study design, our search strategy, our inclusion and exclusion criteria, our study characteristics, patient characteristics, and then the interventions and outcomes of interest.


KK: We took our preliminary search, which was effectively composed of terms of interest, added in Boolean operators, MeSH terms, and other synonyms, and effectively built out those terms into a structured Boolean search query. And all of this, if you recall, is on the topic of basilar artery stroke. And today, we’re going to be drilling down further on the actual studies within the basilar artery review. But let’s take a second and ask ourselves, “If I’ve built up my full protocol, I’ve searched, and I’ve pulled in all my records, what’s left for me to do in a study, sorry, in a review, in order to complete it?” Really, it is composed of three steps that are completed on underlying studies. So first, you’ll screen, and then for every included study, you’ll tag qualitative content and then extract quantitative data. Once you’ve done so, you will be set to analyze your data, write up your findings, and then you are done with your systematic review. So I will cover those two steps of analysis and write-up in the next lesson. But for now, let’s hop in on the study lifecycle.


KK: The basic study flow should already be pretty familiar to you from drafting your protocol. You have already put together your inclusion and exclusion criteria, and so now you’re going to put every study through a process of identifying, “Does this meet my exclusion criteria?” If so, it does not move forward in this process. Then you’re going to find the underlying qualitative key concepts in these studies, and you’re going to apply tags directly in the software. And then, for those tags that are also quantitative concepts that have data associated with them, you’re going to extract those data from the underlying studies as well. And we’re actually gonna do that in a PDF in the software in a second. But before we do so, I wanted to go through a couple of wrinkles. The first is, it’s actually really important to track the excluded studies, so you don’t just throw them out; you need to track their flow. And in doing so, we’re going to go through the PRISMA diagram a bit, which reflects where studies came in, and then also where studies exited from your flow.


KK: In the PRISMA diagram, there are a couple of key definitions that you’ll be confused without. So I’ve been using the term “study” throughout this course. But there are actually a couple of distinctions that we need to make between a record, which is the bibliographic data; a report, which is the publication itself; and the study, which is the work done by an investigator to treat patients and collect their data. And there’s not always a one-to-one relationship among those. So again, the record is the bibliographic data that you’re pulling in from PubMed, which usually represents a single report, or a full text describing a given study, but the study itself represents the actual treatment of patients and collection of data. And there can be multiple reports on the same study. And so part of your job in the systematic review process is figuring out how you’re going to treat secondary reports.


KK: Those are reports on the same study, containing data on the same patients, but giving a different slice of the data or a different interpretation of that same underlying study. And then lastly, with every study that you’re taking on, you’re going to need to ask yourself about bias and potential quality issues. This can be through a risk of bias assessment. But in effect, you should be examining the underlying studies. Do their designs drill down, for instance, on a specific subset of the population that would bias the results? Do they have methods that are different from the standard of care and therefore shouldn’t be combined with other studies? These questions should always be asked about every study you’re including, because you’re about to combine data from across different studies. And you need to make sure that there’s going to be an apples-to-apples comparison among all of them. But with those wrinkles out of the way, I think it’s time for us to jump into our basilar artery nest, and I’ve already got it open so I can pull it up.


KK: If you recall, we have the protocol on the homepage of the nest. And generally, once you’ve completed your search strategy, you would complete screening, tagging, and extraction simply by clicking on each of these main links. However, Nested Knowledge also offers Study Inspector, which is a search of all studies in the nest. So here you can see that I can search for included studies. And once I filter it to included, I can click in on the Langezaal study, which is a randomized controlled trial called the BASICS study, published in the New England Journal of Medicine, which will be the study that we’re tracking through this study lifecycle. But really quickly, before we even screen the Langezaal et al. article, we also need to make sure that we have configured our screening mode. And for that, I’m gonna hop over to Admin and show the different screening modes available in Nested Knowledge, which I think are also the standard screening modes you should be using or finding elsewhere for your review. Standard screening means that one user goes through a given record, determines whether it should be in or out, and then included studies are sent forward.


KK: That process doesn’t have any quality control on it, though. So we’d effectively be trusting that I as the user, I’m properly determining with each study whether it meets the exclusion criteria of interest. You can also configure dual screening, where two independent users will screen each record. And then a third party will adjudicate any disagreements. This quality controls my decisions by having another person do so independently. And then a third person check our work. So effectively a lot of checks to prevent studies that should be in from flowing out, and vice versa. And then thirdly, you can also set up two pass screening. This is where a single user screens at the abstract level. And then screening is again performed on all studies that are considered candidates by examining their full texts.
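As a rough illustration of how the three screening modes described here differ, the sketch below reduces each mode to a small decision function. All names (`Decision`, `standard_screen`, `dual_screen`, `two_pass_screen`) are hypothetical and chosen only for this example; they are not part of the Nested Knowledge software.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    INCLUDE = "include"
    EXCLUDE = "exclude"


def standard_screen(reviewer: Decision) -> Decision:
    """Standard mode: a single reviewer's decision is final."""
    return reviewer


def dual_screen(first: Decision, second: Decision,
                adjudicator: Optional[Decision] = None) -> Decision:
    """Dual mode: two independent decisions; a third party settles disagreements."""
    if first == second:
        return first
    if adjudicator is None:
        raise ValueError("reviewers disagree; adjudication is required")
    return adjudicator


def two_pass_screen(abstract_pass: Decision,
                    full_text_pass: Optional[Decision] = None) -> Decision:
    """Two-pass mode: only abstract-level candidates get a full-text decision."""
    if abstract_pass is Decision.EXCLUDE:
        return Decision.EXCLUDE
    if full_text_pass is None:
        raise ValueError("abstract-level candidate still needs full-text screening")
    return full_text_pass
```

In this sketch the adjudicator is consulted only when the two reviewers disagree, and a record excluded at the abstract level never reaches full-text screening, which mirrors the trade-offs between the modes.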


KK: So rather than having independent people complete screening, you do two independent steps of screening, first examining just abstracts and then drilling down on the full text of every article that is advanced past abstract screening. So before you screen, make sure that you know which mode you wanna use. Generally, the upside of standard screening is that it’s the fastest, whereas dual screening has the most external, independent quality control, and two-pass screening gives you multiple chances to exclude every article, the second with much more information, since it comes after you’ve already uploaded full texts. So pick your screening mode and then let’s hop back in on the Langezaal article. Generally when you’re screening, you will be doing this on the abstract. So we would read that endovascular therapy, or thrombectomy, was applied to patients who have basilar artery occlusions. And then that is compared against standard medical care.


KK: So endovascular therapy, we know, is a synonym for thrombectomy in stroke. And we know standard medical care includes thrombolysis. So this looks like it is a candidate. And the only other thing we need to do is check to see whether it meets any of our other exclusion criteria. So this is a randomized trial, so it’s not a case report. It was published in 2021, so it was obviously after our date exclusion. But we can go through every one of these exclusion reasons and make sure it does not meet them before we hit Include and move on to tagging. In this case, I happen to know that Langezaal was in fact includable. And we did upload a full text. So we can proceed to the next step, where we take that full text and apply those underlying tags to it. Now I can exit really quickly and show you guys where these tags were configured. We did have a brief lesson on tag configuration. But you can see here that we have configured tags for study type.


KK: In this study, we included all prospective evidence, so we broke it out into RCTs, prospective cohort, and registry studies. We also collected population characteristics, including the timing of care, so basically, how long was it since you had a stroke; then medication, so any existing anticoagulant therapy that a patient’s on; and then demographics. For interventions, we kept it very simple. We just compared endovascular therapy to standard medical therapy. And for outcomes, we tracked angiographic outcomes, which means clot clearance on imaging; angiography is the imaging, and the outcome is reflected by scales of clot clearance. And then we collected those clinical outcomes that were our primary outcomes when we were doing our study design. So that’s Modified Rankin Scale score and mortality. And we also collected some other variables, like, did the patient have a hemorrhage, a symptomatic hemorrhage, during the procedure? And lastly, we also collected the number of times the physicians had to try to treat the patients.


KK: I strongly recommend that before you tag any underlying article, you go to this Configure Tagging page. Even if you’re not the person who built the hierarchy, it helps you understand the structure of information that is going to be pulled. Then, for the actual tagging process, we go into our Langezaal study. And we read it with the hierarchy in mind, and we’re helped out by the fact that the tag drop-down is hierarchically nested in the same way that the Configure Tagging page is. So we can start by just noting that this is in fact a randomized controlled trial. So we can select the tag “randomized controlled trial,” and then highlight the excerpt that proves out that information. So I should consider the excerpt the minimum viable information to prove to my reader that I am correct on this being a randomized controlled trial. Once we’ve highlighted that excerpt, we can apply the tag, and we can see that we’ve already applied the same tag previously. But I can now move forward with my tagging, and add any other tags that I think reflect the underlying content of this study.


KK: As I’m going, I will always be able to edit or go back to these other tags. So as I’ve just stated, I’ve duplicated this tag. So I can go and find it by clicking here, and it will auto-jump me to the excerpt that I’ve just highlighted. And then I can edit, update, or even delete that tag as I’m going. So I should treat this as an exercise in very carefully reflecting every tag from the underlying hierarchy that is contained within the study, and finding excerpts that prove that out. Once I’ve tagged for study characteristics, I should then proceed to population characteristics. So scrolling down to the Methods section, I should be able to see what demographics they collected. I can also tag this in tables. So if I think that the best way to reflect this information is to go to the results and capture it either in the text or the tables there, that could prove out not just that the underlying study collected demographic information, but actually give the data itself.


KK: So I could find the patient characteristic table and highlight the age here. And I actually have several methods of doing so. So I see that they’re reporting a mean and a standard deviation around age. So I can say age mean. And then I can highlight the age information here, which would be extracted as an excerpt. If I want, I can also take an image. So instead of highlighting, I can take a box covering all this information, showing the context with a bit more detail. So this selection will be saved if I apply the tag this way. And so if I scroll away, I’ll always be able to jump back to that table and see the exact excerpt associated with it. Then I proceed down my hierarchy and continue to add all the tags related to my patient characteristics of interest, my interventions of interest and my outcomes of interest. But in interest of time, I’m going to assume that you follow that exact same process with each of these underlying tags.


KK: So find the text or table that contains the information of interest, and then use either the text annotation, so highlighting, or the area annotation, so the image selection, to reflect in an excerpt the proof of that concept’s presence in the study. Then we can hop over to extraction. Extraction was configured in the same page as tagging, so we can hop out and look at that quickly. So in the tagging page, we configured extraction by selecting the part of the hierarchy that represents our intervention. So you can see me doing it again here. So I’ve cleared it out. And if I come in here, I can identify this intervention hierarchy as the two groups that I want to compare in this study. So, as simple as it gets. And then for data elements, we also configured any tag, such as mean age, that is both a tag and a quantitative piece of evidence to collect as a data element. To do so, we clicked in on each of these nodes and identified the data type. And if it’s dichotomous, that just means that it’s going to be an event rate out of a total number. So mortality is a good example of a dichotomous variable.


KK: And then for continuous variables, we also identified whether the continuous variable in question was a mean with a standard deviation, or a median with an interquartile range or a total range. So once those are configured, we can proceed and look at what that looks like in the extraction. So we open the Langezaal study, and you can see that we’ve identified endovascular therapy and standard medical therapy as the two arms. We also collected the total arm size; arm size means the total patient population treated with endovascular therapy or standard of care. We found that here. So endovascular therapy had 154 patients randomized into it, and 146 were randomized into the medical care group, so the thrombolysis group. In extraction, rather than highlighting excerpts representing the underlying information, we’re actually going to directly extract it into this panel here. So the way that we would extract age is we go to that same table that we looked at before. And a really helpful piece of this is that if I go to this little tag icon next to where it says “age mean” and click on it, it should jump me automatically to the table that contains age.


KK: So rather than scrolling through these data elements, I can use this jump to, to find the data element of interest, and then extract its information. For every data element, you need to identify whether it was collected as baseline, or outcome. Or, if it was both, you can add another data element. Here you can see, I could add age at outcome, that wouldn’t make sense in our case, because we’re assuming the patient’s age at the same rate. But I could add another time point, so long as I identify the period of follow up, that that information is being collected at. Once I do so, I can then reflect the underlying studies, in this case, mean age and standard deviation. And if I need to, I can change the total patient population. This is to reflect any patients that are lost to follow up. And it’s usually more relevant to outcome variables. So if say, nine patients didn’t follow up, at 90 days, for 90-day mortality, we would put in 145 here rather than 154.


KK: Generally, in my experience, the best practice when extracting is to use the filtering and go through what the study reports and then review the evidence. So rather than just going down this list of tags, I can go through the table itself and filter to see, obviously I already have age here. But if I wanted to find sex, I could just type in sex, if I don’t see it, I can type in gender, and see if it’s extractable. For outcomes of interest, if I find mortality, I can just start typing mortality and it will pop up, I can extract it at the time point of interest, and note the event rate out of the total patient population. And here you can see that there wasn’t actually any loss to follow up, but mortality was 59 out of the 154 and 63 out of 146 patients in the medical therapy arm.
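The dichotomous numbers quoted in this part of the walkthrough can be checked with a few lines of Python. This is only a sketch of the arithmetic: the class `DichotomousResult` and the helper `total_after_loss` are hypothetical names made up for this example, and the "nine patients lost" case is the speaker's hypothetical, not a reported result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DichotomousResult:
    """An event count out of a total, as extracted for one arm."""
    arm: str
    events: int
    total: int

    def __post_init__(self) -> None:
        if not 0 <= self.events <= self.total:
            raise ValueError("events must be between 0 and total")

    @property
    def rate(self) -> float:
        return self.events / self.total


def total_after_loss(arm_size: int, lost_to_follow_up: int) -> int:
    """Denominator to enter when some patients are lost to follow-up."""
    if not 0 <= lost_to_follow_up <= arm_size:
        raise ValueError("lost_to_follow_up must be between 0 and arm_size")
    return arm_size - lost_to_follow_up


# Mortality counts as read out in the walkthrough (no loss to follow-up).
endovascular = DichotomousResult("endovascular therapy", events=59, total=154)
medical = DichotomousResult("standard medical therapy", events=63, total=146)

for result in (endovascular, medical):
    print(f"{result.arm}: {result.events}/{result.total} = {result.rate:.1%}")

# Hypothetical from the walkthrough: nine of 154 patients lost by 90 days.
adjusted_total = total_after_loss(154, 9)
```

Keeping the denominator separate from the arm size is what lets the event rate stay correct when follow-up is incomplete.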


KK: So in brief review, I can use the jump to, to find any information that was previously tagged, that should then be extracted, I need to identify a time point and the exact time period for that time point collected. And if I want to I can add multiple time points per data element, then I can search for the next data element, and then collect the information for it, including again, the time point, the period and then for continuous variables, the mean with standard deviation or the median with an IQR or interquartile range. And then for dichotomous variables, just the event rate out of the total patient population. We also have the option to collect categorical variables. So if you’re collecting for instance race, you can configure that and then have configurable categories where you collect how many patients go into each.


KK: Once I’m done with that process, this study has finished its flow through the systematic review, which seems very quick, right? All we did was say the study should be in, we tagged the information that we thought was important for this study, and we extracted the quantitative data. This gives us all the information that we need to move forward with our analysis, because any interpretation that’s based on, say, study design will be reflected in the tags. And any information that’s going to be statistically analyzed to see the actual performance of endovascular therapy against medical care will be reflected in our quantitative synthesis. Once you have extracted all the data from the study, it is done going through the study lifecycle of screen, tag, and extract, and it’s ready to be interpreted. However, there are a couple of other pieces that I wanted to highlight that are assisted by the Nested Knowledge software. And then I also wanna go through PRISMA diagrams. So first, the assistive pieces. If you recall, in screening you can add any exclusion reason we’ve previously configured, but you can also configure one on the fly by just typing any reason that doesn’t exist yet and adding it. That will then join the other exclusion reasons in your study design and be applicable to any other subsequent study.


KK: The thing to be careful of there is make sure that any study that has been previously included doesn’t meet that criterion, because you don’t want to be applying inconsistent criteria between your included studies. You can also configure tags on the fly, so if there was a tag that I wanted to add, but that didn’t exist yet. So I can hit Add Option, it will name that tag and all I need to do is identify where in the hierarchy it should live and then press Create and that will create the tag on the fly and be applicable to all other studies in the review. And yet again, the thing to be careful of here is that you should go back and add that tag to any study that contained that concept that you’d previously tagged before this tag existed.


KK: The second thing that I wanted to show was Inspector. As I said earlier, when you’re generally going through this review, you’re going to start in Search and then when you screen, you can just hit Screening and it will give you the studies in order. It actually gives them to you in order of an AI’s prediction of its inclusion rate, but you can also use Study Inspector which searches all studies that were ever brought into this nest, so any excluded, included, or as of yet unscreened study, which can be filtered by any range of filters, including bibliographic information, the inclusion probability based on our AI, the module status, so whether it’s completed tagging, whether it’s completed extraction, screening status, so whether it’s been included, excluded, or is still unscreened; and then a bunch of other pieces of information. So you can screen your… Sorry, you can filter your review set to find a study that you want to do something on. So, Inspector’s where you can make edits, changes, and really drill down on anything that you are not screening, tagging, or extracting sequentially.


KK: So, if I wanted to search among included studies for the Langezaal study, I could start typing and say, filter to included. Or, I could do it manually by saying, “Final screening status is equivalent to included,” and that would get me to the included studies. And then I could also just start typing in the author’s name and do an author search for it, and it should bring us only that one study that we wanted to examine here. So effectively, you can treat Inspector as your extraordinarily customizable, filterable library to go to when you want to edit or drill down on a study in non-sequential mode. Inspector also has a further cheat code, let’s call it: the Bulk Action. So if you are completing screening, tagging, and extraction and notice that you need to change something about multiple studies at once, you can do so from Inspector. So if we needed to change, for instance, all included studies to unscreened because we want them to be reviewed again, we can filter first. So the first step of Bulk Action is to filter to the set of studies that you wanna do the action to.


KK: In this case, if I want to unscreen all included studies, I filter to included, I come over to Bulk Action, and under final screening status, I update the screening status to unscreened and apply that. That would push all three of those studies to the beginning of the screening process. It will not lose the tags or the extracted data, but they will still need to be marked complete in the tagging and in the extraction module for them to have gone through the study lifecycle again, all the way from unscreened through to fully extracted. I can also add or subtract tags in bulk, so if there’s a tag that’s missing or a tag that is extraneous on the set of studies that I’ve filtered to, I can fix that. I can also mark modules complete. So, if after pushing these included studies back to the beginning, I then turn around and want them to be marked as fully completed tagging and fully completed extraction, I can do so by just saying, “Mark module complete.” I can import any full texts that are freely available in bulk just by checking that box and hitting Apply.
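The filter-then-act pattern behind a bulk action can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the software's implementation: `StudyRecord`, `filter_studies`, and `bulk_unscreen` are hypothetical names, the three studies are placeholders, and resetting the completion flags is one reading of the statement that studies must be marked complete again.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class StudyRecord:
    label: str
    screening_status: str  # "included", "excluded", or "unscreened"
    tags: Set[str] = field(default_factory=set)
    tagging_complete: bool = False
    extraction_complete: bool = False


def filter_studies(studies: List[StudyRecord],
                   predicate: Callable[[StudyRecord], bool]) -> List[StudyRecord]:
    """Step 1 of a bulk action: narrow down to the studies to act on."""
    return [study for study in studies if predicate(study)]


def bulk_unscreen(studies: List[StudyRecord]) -> int:
    """Step 2: reset screening status; tags are deliberately left untouched."""
    for study in studies:
        study.screening_status = "unscreened"
        # Assumption: completion flags reset, so modules must be re-marked.
        study.tagging_complete = False
        study.extraction_complete = False
    return len(studies)


# Placeholder studies for illustration only.
nest = [
    StudyRecord("Study A", "included", {"Randomized controlled trial"}, True, True),
    StudyRecord("Study B", "included", {"Registry"}, True, False),
    StudyRecord("Study C", "excluded"),
]

included = filter_studies(nest, lambda s: s.screening_status == "included")
changed = bulk_unscreen(included)
```

Because the filter returns references to the same records, the bulk update changes only the filtered subset and leaves the excluded study as it was.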


KK: And then we also have some assistance that helps you with quality control, where if you have extracted a data element but you neglected to add the tag associated with that data element, you can Bulk Apply all tags associated with a study where data were extracted. So, if I extracted mean age from the Langezaal study, then I can add that tag automatically by hitting Apply here.


KK: So in effect, there are five different ways you can cheat by doing actions in bulk across all studies in your review, so another assistive piece there. And then lastly, I wanted to actually go through PRISMA charts, because I think that they are not only a good representation of the flow of a study through a review, but also are very difficult to understand without some walk-through. So, let’s zoom in a bit for you guys. A PRISMA diagram reflects a study’s flow from when you searched for it through to what you extracted from the underlying study. And the flow, now, with the 2020 PRISMA update, is separated out into searched on a database and identified by any other method. And the last bit of background I’ll give you on PRISMA is that PRISMA diagrams are one of the key elements in the PRISMA checklist, which is the checklist released by the EQUATOR Network, which is a network of biostatisticians and epidemiologists and other research experts on systematic reviews. So PRISMA are the guidelines on how to complete a systematic review, and I strongly recommend reading through those guidelines in full if you wanna take your review seriously. But the most important piece of the PRISMA guidelines is that you must include a diagram reflecting the flow of records from your databases and from your other methods of intake all the way through to what you did with those studies.


KK: Let’s read this one together very quickly. So, you can see the number of records that were identified by databases, and we only used one database in this study, that was PubMed. And then you can also see the number that were removed as duplicates; we didn’t have any duplicates ’cause we were only searching on the one database. And in our Nested Knowledge Living Paradigm, you can also see the number of records that are awaiting screening. So, while we’ve published our basilar artery review, there have been 26 publications that might be relevant that have been published in the wake of our previous completion of this review. You can also see, on the other side of the flow, all the records identified by other methods, and usually this will mean expert recommendations. The only other way that we take studies in Nested Knowledge is by bibliomining. So here you can reflect expert recommendations, which are studies that you added individually, and then bibliomining, reflecting studies that you took from existing reviews’ citations. And actually, I will quickly go and show you how that’s completed. We can open the same review.


KK: And under Other Sources, you can see that we can add studies by PubMed ID, by DOI, or manually by adding their reference information. That is going to be reflected as the expert recommendations in our PRISMA chart. And we can also bibliomine, where you upload an existing systematic review and we’ll automatically pull out all citations from that review and add them to your screening queue. Those will be reflected in your PRISMA chart as bibliomined. So, those are the two other sources that you can have other than databases. Then you can see that every record that’s pulled in via a database is screened, and every exclusion that you make is reflected in your chart.


KK: So, not only do we have the total number of records excluded, but we also have the number excluded for every reason that you’ve configured. Then, if you recall, we talked about two-pass screening. Two-pass screening is where you first screen the records, and then you go out and seek those full-text reports. So the full-text reports are imported, and any that are unavailable can be set aside or excluded. But in this case, we found all of the full-text reports for studies that we wanted to screen at that level, and exclusions at that level are reflected as well. If you are on standard or dual mode, as you’re doing your screening, what will be reflected here is whether there was a full text present when you made your final decision. So in our case, we had uploaded 28 full texts, but in most of those cases, those studies, based on full-text review, were not included. At that stage, you can see that the expert-recommended studies are also reviewed. So they are not added automatically; they are still screened, so they are assessed for eligibility and then pushed forward for inclusion in a very similar process. And in fact, you won’t be able to tell the difference in the Nested Knowledge flow between a database-sourced study and a study from other sources.
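To make the bookkeeping behind the exclusion boxes concrete, here is a small Python sketch that tallies screening decisions into the counts such a flow diagram reports. The function `summarize_screening` and the five example decisions are invented for illustration; they are not the review's actual records or the software's internal logic.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def summarize_screening(
    decisions: Iterable[Tuple[str, Optional[str]]]
) -> Dict[str, object]:
    """Tally decisions given as (record_id, exclusion_reason or None if included)."""
    reasons: Counter = Counter()
    screened = 0
    included = 0
    for _record_id, exclusion_reason in decisions:
        screened += 1
        if exclusion_reason is None:
            included += 1
        else:
            reasons[exclusion_reason] += 1
    return {
        "screened": screened,
        "included": included,
        "excluded": sum(reasons.values()),
        "excluded_by_reason": dict(reasons),
    }


# Made-up decisions, purely to show the shape of the tally.
example_decisions = [
    ("record-1", None),
    ("record-2", "Case report"),
    ("record-3", "Published before date cutoff"),
    ("record-4", "Case report"),
    ("record-5", None),
]
summary = summarize_screening(example_decisions)
```

The useful invariant is that included plus excluded always equals screened, which is the kind of consistency check a flow diagram lets a reader audit.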


KK: Then those reports that are assessed for eligibility are either excluded at the full-text, or full-report, level or included. And you can see in this last box the number of studies that are included, but also the number of reports that are included. And this is where that vocabulary lesson that we had earlier can help out a lot. As you can see, what starts out as a bibliographic record is transformed into a full-text report when you reach the full-text section of the PRISMA diagram. And then finally, you can see that PRISMA has an allowance for secondary reports. So in our case, there were three studies reflected in three reports, but if there were secondary analyses that had further data that we wanted to extract, we could have included them and noted that there were three studies reported in four reports. So, a PRISMA diagram depends on that distinction between a record, a report, and a study to reflect the flow of bibliographic data, full-text report retrieval, and then distinguishing any secondary analyses that report the same study.
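The "three studies in four reports" scenario can be expressed as a tiny data model in which each report points at the study it describes. The `Report` class, the identifiers, and `count_studies_and_reports` are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Report:
    """A publication; several reports may describe the same underlying study."""
    report_id: str
    study_id: str
    secondary: bool = False


def count_studies_and_reports(reports: List[Report]) -> Tuple[int, int]:
    """Return (number of distinct studies, number of reports)."""
    distinct_studies = {report.study_id for report in reports}
    return len(distinct_studies), len(reports)


# Hypothetical set: three primary reports plus one secondary analysis of study A.
included_reports = [
    Report("report-1", "study-A"),
    Report("report-2", "study-B"),
    Report("report-3", "study-C"),
    Report("report-4", "study-A", secondary=True),
]

n_studies, n_reports = count_studies_and_reports(included_reports)
```

Counting distinct study identifiers rather than reports is what keeps the same patients from being counted twice when a secondary report is included.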


KK: Okay, I think we’ve covered PRISMA charts in as much depth as we need, and so we can hop back to our slides for some summarization. So if you recall from just now, our study lifecycle is effectively taking every study and either including it or excluding it with a reason, then tagging the underlying key concepts and extracting the underlying quantitative data. We also went over the PRISMA diagram, which reflects that flow or, at least, reflects the portion of import from a database or expert recommendation, inclusion at the abstract or full-text level, and then whether there are any secondary reports reflecting the same study. And then we also learned about Study Inspector, and along with Study Inspector, we also learned about some cheat codes to a review, like Bulk Actions, which enable you to screen, tag, or extract in bulk on a filtered set of records. We also learned that you can configure both tags and screening on the fly, and that if you tag properly, you are setting up your data elements in a way. If you recall, once I had tagged mean age, I could automatically jump in the PDF to the exact spot where it was reported when I’m extracting my data, which saves me a lot of time in searching the PDF as part of my interaction with the study.


KK: And then lastly, I’ll just note that next time we’re going to go over the last two major steps of a review, which are analyzing your evidence and writing up your evidence. And so with that, thank you so much for your attention. I hope that you liked the jump into our software, I will note that if you want to learn more about the software, we have a full set of videos explaining how to use the software rather than just a general course on how to complete systematic reviews, which I will include in the sources and links below. Thanks so much.