Last month, we told you about cTAKES, which can read notes from clinical records and turn them into structured data that can be used for research on drug interactions, risk factors, clinical phenotyping and much more.
One of the key challenges with cTAKES, though, is getting access to the data in the first place. Electronic medical records (EMRs) generally run on proprietary platforms built for record keeping, and it can be difficult to extract data for research purposes. In addition, hospitals’ processes and controls around patient privacy usually don’t readily lend themselves to data mining.
Now mind you, when we talk about EMR data, we’re not just talking about notes, but also about the structured data gathered with every clinical visit and inpatient procedure, such as diagnoses, lab values and prescriptions. Those data could open up the taps for all kinds of clinical innovation, if researchers could get to them.
So what’s the solution? How do we make clinical data locked in EMRs work for research while keeping confidential information confidential?
Vector sat down with Jonathan Bickel, MD, Boston Children’s Hospital’s senior director of Clinical Research Information Technology (CRIT) and director of Business Intelligence, to learn what he thinks should be done.
Q. Are we using EMRs to their fullest potential?
A. We use EMRs in a hospital setting to keep track of what we’re doing, to document and record the results of the care we provide. It’s what we should be doing, and there are lots of benefits to doing things electronically.
One thing that’s not often done is to take the information that’s captured in every clinical record and learn from it.
Think about it this way: Our EMR system at Boston Children’s contains records on 1.9 million patients, with more than 20 million diagnosis codes, 100 million lab values and 32 million clinical notes. Within those records are demographics, labs, notes, pathology, pharmacy, vital signs and more. It’s a gold mine of information that we could be learning from to improve patient care for a wide range of diseases.
Q. But we need to be able to strike a balance between privacy and research utility, right?
A. That’s right. We do all of this wonderful clinical work, but most of those data get checked into the EMR and rarely get checked out. When I came to Boston Children’s, one of my primary goals was to guide implementation of the tools and processes that would let researchers get those data out, and do it better, faster and cheaper.
More than anything, we needed something flexible enough that researchers could query the clinical records to find out how many patients fit the criteria for a particular research question, without needing a lot of clinical or research IT support.
Q. What’s your solution?
A. We’re developing a tool based on i2b2, which stands for “informatics for integrating biology and the bedside.” It’s a data extraction and warehousing tool that sits on top of our EMR system, pulling and serving up data in a way such that clinical researchers can make use of them.
It’s a way for anyone to be able to ask questions about our patient population and get back answers in near real time without breaking patient confidentiality.
Q. How does i2b2 support records-based research?
A. The point of i2b2 is to take the information that we’ve been collecting as part of routine care and learn from it. It allows users to build queries by stringing a series of search terms together to create a clinical profile and build out a virtual study cohort based on EMR data: labs, medications, vital signs, notes, pathology, everything. All of the data in the EMR are available.
For instance, you might be interested in seeing how many patients with condition W between the ages of 8 and 10 with a lab value of X have been treated with drug Y and who experienced adverse event Z. With i2b2, you can run that query and get the number of unique patients who fit those criteria.
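To make the idea of a count-only cohort query concrete, here is a minimal Python sketch. It is not i2b2's actual interface: the fact tuples, the concept labels such as "dx:W" and the helper names are all hypothetical, chosen only to mirror the W/X/Y/Z example. Each criterion selects a set of patient IDs, and the cohort is the intersection of those sets, so a patient is counted once no matter how many matching records they have.

```python
# Illustrative only: a toy model of a count-only cohort query.
# The data layout and function names are assumptions, not i2b2's API.

def patients_with(facts, concept, predicate=lambda value: True):
    """Return the set of patient IDs with at least one matching fact."""
    return {pid for pid, c, value in facts if c == concept and predicate(value)}


def count_cohort(criteria_sets):
    """Intersect the per-criterion patient sets and count unique patients."""
    return len(set.intersection(*criteria_sets))


# Hypothetical toy records: (patient_id, concept, value)
FACTS = [
    (1, "age", 9), (1, "dx:W", None), (1, "lab:X", 4.2),
    (1, "rx:Y", None), (1, "event:Z", None),
    (1, "lab:X", 4.4),  # repeat lab for patient 1; still counted once
    (2, "age", 9), (2, "dx:W", None), (2, "lab:X", 4.0), (2, "rx:Y", None),
    (3, "age", 12), (3, "dx:W", None), (3, "lab:X", 4.5),
    (3, "rx:Y", None), (3, "event:Z", None),
]

cohort_size = count_cohort([
    patients_with(FACTS, "dx:W"),                         # condition W
    patients_with(FACTS, "age", lambda v: 8 <= v <= 10),  # ages 8 to 10
    patients_with(FACTS, "lab:X"),                        # has lab X
    patients_with(FACTS, "rx:Y"),                         # treated with drug Y
    patients_with(FACTS, "event:Z"),                      # adverse event Z
])
print(cohort_size)
```

Only the aggregate number leaves the function; the patient IDs themselves stay inside it, which is the property that lets a count be shared before any study-specific approval.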
So here’s an example. We’re working with Cincinnati Children’s Hospital Medical Center to use i2b2 to develop new algorithms for treating acute appendicitis. We’re probing ICD-9 codes, pathology reports and EMR notes for text strings related to acute inflammatory appendicitis. I received an email from them this morning with the criteria they want to search against. In two minutes I was able to tell them that we have records on 5,000 children who meet their inclusion criteria.
Q. What can you learn beyond just the number of patients?
A. Once you’ve defined your cohort, that’s just the tip of the iceberg. Remember, for those children, we have their entire medical record in the system. With the cohort defined, a researcher can then make a request to the institutional review board (IRB) to access the detailed data on the patients captured in the query.
Once the IRB has approved, we have a process for allowing the researcher to access the full data on those patients, either de-identified or identified. And we can help researchers extract the particular information that they need from the charts.
Q. How does the system address patient privacy?
A. In CRIT we have a blanket IRB approval that allows us to continue to support and maintain i2b2 and make these patient data available for researchers to go data spelunking. That approval includes a proviso stating that before we give out an ounce of information beyond aggregate numbers, the requestor has to have an IRB approval specific to their work.
Q. How much further can one go with i2b2?
A. Finding out that we have the data is just the beginning. We’re sitting atop a pile of data in which there are buried nuggets of information. Usually you would need some kind of fancy processing to mine those nuggets. As a platform, i2b2 allows developers to create and attach plugins that help unlock the data in different ways.
We don’t have the functionality running at the moment, but there are ways of plugging i2b2 into things like SMART platform apps such that you can run analyses without ever having to export the data outside of our system.
Q. What are i2b2’s limitations?
A. The biggest limitation is that this version of i2b2 doesn’t handle time series. You can’t tell it “I want to look at events that happen within a certain time period before and after this clinical event or that lab result.”
The real power of i2b2 will come about when you can start to do chronological analyses on the fly in real time and start linking causes and effects. We’re working on that, and it should be coming with the next version of i2b2, which already promises to be much more powerful than what’s available today.
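As a rough illustration of what a time-window question looks like, the Python sketch below finds patients who had one event within a set period around another. It says nothing about how any version of i2b2 does or will implement this; the event names, the tuple layout and the function are hypothetical.

```python
# Illustrative only: a toy time-window query over dated events.
# Event names and the data layout are assumptions for this sketch.
from datetime import date, timedelta


def patients_with_event_in_window(events, anchor, target, before, after):
    """Return patient IDs with a `target` event inside the window
    [anchor - before, anchor + after] around any of their `anchor` events."""
    anchors = {}
    for pid, name, when in events:
        if name == anchor:
            anchors.setdefault(pid, []).append(when)

    matched = set()
    for pid, name, when in events:
        if name != target:
            continue
        if any(a - before <= when <= a + after for a in anchors.get(pid, [])):
            matched.add(pid)
    return matched


# Hypothetical toy records: (patient_id, event_name, event_date)
EVENTS = [
    (1, "drug_Y_started", date(2012, 3, 1)),
    (1, "adverse_event_Z", date(2012, 3, 10)),  # 9 days after start
    (2, "drug_Y_started", date(2012, 3, 1)),
    (2, "adverse_event_Z", date(2012, 9, 1)),   # months later, outside window
]

window_hits = patients_with_event_in_window(
    EVENTS,
    anchor="drug_Y_started",
    target="adverse_event_Z",
    before=timedelta(0),
    after=timedelta(days=30),
)
print(len(window_hits))
```

Here only patient 1 qualifies, because the adverse event falls within 30 days of starting the drug; a query without the time constraint would have returned both patients.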
Q. How do you see technology like i2b2 evolving over the next five or 10 years?
A. I see i2b2—and data warehousing in general—going from being a repository that just supports one-off queries to one where data mining occurs on a routine and automated basis. One in which algorithms constantly run against huge datasets to find small patterns of meaningful information and feed those discoveries back to our clinicians.
Also, i2b2 is currently running at more than 80 hospitals, universities, companies and HMOs worldwide. I see us joining many of these individual i2b2 installations together under a large collaborative. This will increase the total number of patients whose data can be fed into learning algorithms and greatly increase not just the amount of knowledge we can gain, but the speed at which we gain it.