We left off having identified the spread of healthcare data across various types of healthcare organizations and teams, and ended with the question “why is collaboration so hard that companies will spend $150B+ in acquisitions to make it easier?” Spoiler: it all comes down to security and privacy. Despite all the waivers and HIPAA consent forms we sign as patients, these organizations have a tough time collaborating, even when the benefits are clear. To be clear, privacy and security requirements are crucially important, and this evaluation is in no way a suggestion to remove or minimize them, but rather to view them as the framework within which collaboration must occur. While the value and priority of sound security continue to grow, privacy practices lag behind. Unlike security risks, privacy risks are hard to predict until it’s too late, which calls for an approach wherein everything personal is treated as private. In both cases, we must architect workflows and services with security and privacy by design rather than as afterthoughts.
When it comes to data collaboration in healthcare, one of the first challenges is simply the number of teams involved, not all of which share the same priorities – security, privacy, legal, ethics, data, and business teams. That coverage means a lot of time and coordination before we even get to the heavy lift required of the technical and security teams to implement solutions or strategies. In conversations and interviews with Chief Digital/Data Officers (CDOs), Chief Data Scientists, and Chief Information Security Officers (CISOs) across the healthcare industry, there’s a common thread of frustration in navigating regulatory requirements while achieving business goals. The two are often perceived to be in perpetual opposition. Typically, business and data teams view the custodians (security, privacy, legal, ethics) as blockers – after all, someone must enforce the rules. There’s much to unpack, but we’ll start our journey down the regulatory rabbit hole with the privacy requirements that govern collaboration.
While cross-border data collaboration in healthcare (or in any industry) has its own set of unique challenges, we’ll focus on the US. The Health Insurance Portability and Accountability Act (HIPAA) is a major piece of US legislation that defines the responsibilities of individuals and organizations collecting, storing, using, and sharing protected health information (PHI), sensitive personally identifiable information (SPII) such as mental health records, and personally identifiable information (PII) about patients. Since we’re talking about data collaboration in healthcare, let’s start with data-sharing agreements.
According to the HIPAA Privacy Rule (45 CFR 164.504), data-sharing agreements require:
Requirements 1, 2, 3, and 7 are the relevant pieces for our discussion. Structured data (i.e., databases with labeled fields) is relatively straightforward to inventory and protect, whereas doctors’ notes and the vast data lakes full of unstructured and semi-structured data are far more complex – like trying to find a needle in a haystack, in a stack of haystacks. For simplicity, we’ll stick to structured data for now.
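To make the contrast concrete, here is a minimal sketch of why structured data is the easier case: labeled columns can be flagged as likely identifiers with simple heuristics. The patterns and schema below are invented for illustration; real inventories rely on data-catalog and scanning tools plus human review, and free-text notes offer no such labels to match against.

```python
import re

# Hypothetical column-name patterns suggesting direct identifiers.
IDENTIFIER_PATTERNS = [
    r"name", r"ssn|social_security", r"address|street|zip",
    r"phone", r"email", r"dob|birth", r"mrn|medical_record",
]

def flag_identifier_columns(schema: list[str]) -> list[str]:
    """Return the columns whose names look like personal identifiers."""
    return [
        col for col in schema
        if any(re.search(p, col, re.IGNORECASE) for p in IDENTIFIER_PATTERNS)
    ]

schema = ["patient_name", "dob", "zip_code", "diagnosis_code", "lab_result"]
print(flag_identifier_columns(schema))  # ['patient_name', 'dob', 'zip_code']
```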
As you can see, requirement #7 is loaded. Individual states differ in their consent and privacy laws, which makes it tricky to operationalize a demonstration of compliance. Consent requirements also make it difficult for teams to maintain the necessary monitoring and data over time.
On top of those state-by-state privacy laws sits the HIPAA Privacy Rule, a federal regulation that sets out standards for the protection of individuals’ PHI. The Privacy Rule establishes national standards for the use and disclosure of PHI, including restrictions on when and how it may be shared. It also gives individuals certain rights regarding the use and disclosure of their PHI. The Privacy Rule applies to health plans, healthcare clearinghouses, and providers that conduct certain electronic healthcare transactions.
The time and effort required to define, and then gain approval and consent for, specific uses of PHI pose immediate, cumbersome limitations. We simply do not know how some data points may become useful in the future. Discoveries mid-study may even require new consent, suspending progress until the requirements have been met. There’s a famous story about Henrietta Lacks, whose cells and health data were used without consent in ground-breaking cancer research. The ethical concerns exposed by her case led to today’s consent and authorization requirements, and are why it seems every visit to the doctor requires yet another consent form for the use of collected data. There is no “blanket” box to check. Every unique use of data requires new consent, as the sketch below illustrates.
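Here is a toy model of what purpose-scoped consent means in practice: a proposed use is permitted only if a consent record exists for that exact purpose. The record structure and purpose strings are invented for illustration; real consent-management systems also handle revocation, expiry, and jurisdictional rules.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Consent:
    patient_id: str
    purpose: str      # each consent covers one specific purpose
    granted_on: date

def is_use_permitted(consents: list[Consent], patient_id: str, purpose: str) -> bool:
    """A proposed use is allowed only if consent exists for that exact purpose."""
    return any(c.patient_id == patient_id and c.purpose == purpose for c in consents)

consents = [Consent("p-001", "treatment", date(2023, 1, 5))]
print(is_use_permitted(consents, "p-001", "treatment"))            # True
print(is_use_permitted(consents, "p-001", "outcomes_study_2023"))  # False: new use, new consent
```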
To this point, data-sharing agreements take time, but generally, the path seems clear: figure out what data you need and why it’s important, put the agreement together, gain consent, and go. Right? Enter deidentification.
Deidentification is the process of removing personal identifiers, such as name, address, and Social Security number, from PHI. It is required when PHI is used for a purpose other than providing healthcare services, such as research or marketing. To be truly deidentified, data must not be reasonably linkable to an individual, and must be verified as such through a process of expert determination. That process is often a lengthy negotiation among all stakeholders until a balance is found between the data that can be shared and the usefulness of that data. Remember how we identified the 7 major types of healthcare players and the different types of data they collect in Part Two of this series? Good. That’s the challenge. Many of today’s collaboration needs are not directly related to delivering healthcare, which means this process must be followed – and it proves especially challenging when joining datasets from different sources. Even when data is directly related to delivering healthcare and a simple business associate agreement (BAA) does the trick, the increased attack surface and the risk of data breaches and leaks cause apprehension among the risk owners forced to accept such data flows. (For more information on specific healthcare use cases like real-world evidence and others, click here.)
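To ground the idea, here is a minimal sketch of deidentifying one structured record in the spirit of HIPAA’s Safe Harbor method, which enumerates 18 identifier types. This toy handles only a few of them (it drops direct identifiers, reduces dates to years, and truncates ZIP codes) and skips rules such as the small-population ZIP exception, so it is illustrative rather than compliant.

```python
def deidentify(record: dict) -> dict:
    out = dict(record)
    for field in ("name", "ssn", "phone", "email", "address"):
        out.pop(field, None)                           # drop direct identifiers
    if "birth_date" in out:
        out["birth_year"] = out.pop("birth_date")[:4]  # dates -> year only
    if "zip" in out:
        out["zip3"] = out.pop("zip")[:3]               # ZIP -> first three digits
    return out

record = {"name": "Jane Doe", "ssn": "000-00-0000",
          "birth_date": "1961-07-14", "zip": "02115", "diagnosis": "E11.9"}
print(deidentify(record))
# {'diagnosis': 'E11.9', 'birth_year': '1961', 'zip3': '021'}
```

Notice what is lost: with the name, birth date, and full ZIP gone, there is no longer a reliable key for joining this record with the same patient’s data held elsewhere.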
Deidentification also presents a technical challenge: it requires specialized people and technology, which costs time and money. Additionally, beyond field names (schemas) differing between sources, removing personal identifiers makes it difficult to link datasets and build longitudinal datasets. Finally, deidentification reduces the accuracy and precision of generated insights because important data points have been removed, which undermines the ROI of such efforts. In sum, deidentification is not only burdensome and costly, but it also reduces data utility (insights become less actionable).
Deidentification is a rather broad term with yet more layers to uncover. One of the leading methods, differential privacy, still carries the challenges we noted above. There are also differences when it comes to HIPAA and the Safe Harbor rules governing cross-border collaboration, and there are limited options available to satisfy them within the timeframes and for the uses that would be valuable. This brings up two additional terms, anonymization and pseudonymization, that are important to understand and are likely the most challenging, especially in cross-border collaboration.
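Since differential privacy comes up so often, here is a minimal sketch of its textbook building block, the Laplace mechanism: answer an aggregate query, then add calibrated noise so that no single patient’s presence or absence meaningfully changes the output. The query and epsilon value below are arbitrary examples; real deployments involve much more (privacy budgets, composition, post-processing).

```python
import random

def laplace_noise(scale: float) -> float:
    # Laplace(0, scale) is the difference of two i.i.d. Exponential(mean=scale) draws.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def dp_count(true_count: int, epsilon: float) -> float:
    # A counting query changes by at most 1 when one person is added or
    # removed (sensitivity = 1), so Laplace(1/epsilon) noise yields
    # epsilon-differential privacy for this single query.
    return true_count + laplace_noise(1.0 / epsilon)

# e.g. "how many patients in the cohort have diagnosis E11.9?"
print(dp_count(true_count=412, epsilon=0.5))  # a noisy answer near 412
```

The utility tradeoff noted above shows up directly here: a smaller epsilon means stronger privacy but noisier, less actionable answers.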
These two terms are quite problematic for organizations because (as of January 2023) there is little guidance as to which technologies can be used, and in which ways, to satisfy the definitions put forth. HIPAA requires that PHI be anonymized when it is used for research, public health activities, or other activities that do not involve providing healthcare services. To satisfy anonymization, it must be impossible to link the data back to an individual. Pseudonymization, by contrast, means steps can be taken – by request and with authorization – to link data back to an individual, an ability typically limited to the original data controller.
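One common way to implement pseudonymization is keyed tokenization, sketched below: identifiers are replaced with HMAC tokens, so datasets remain joinable on the token, while only the holder of the secret key (the original data controller) can re-link a token to a known identifier. The key and identifiers are placeholders; real systems add key management, rotation, and governance.

```python
import hashlib
import hmac

SECRET_KEY = b"held-only-by-the-data-controller"  # illustrative placeholder

def pseudonymize(identifier: str) -> str:
    # Keyed hash: infeasible to reverse without SECRET_KEY, but the
    # controller can re-tokenize a known identifier to re-link records.
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()

# The same patient appears in two datasets under the same token, so the
# datasets remain joinable -- unlike full deidentification, where the
# join key is simply gone.
clinic_row = {"pid": pseudonymize("MRN-12345"), "diagnosis": "E11.9"}
pharmacy_row = {"pid": pseudonymize("MRN-12345"), "fill": "metformin"}
assert clinic_row["pid"] == pharmacy_row["pid"]
```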
Some of these requirements may be difficult or vague, but they exist. Are they prohibitively difficult, given the number of lives that could be saved or vastly improved? Let’s revisit some of the use cases we discussed in Part One and map out the typical number of data owners (controllers) and analyzers that would all need to coordinate through the requirements above.
Data Controllers: 3-10
Data Analyzers: 1-3
Personalized and preventative care means having a complete picture of a population’s or individual’s health. As we discussed above, that means aggregating data typically held by multiple care providers (mental health, general practitioner, specialist, dentist, etc.), pharmacies, the patient, public sources, and the insurer. While all data controllers could benefit from analyzing such shared data, most efforts focus on analysis by care providers and insurers.
Data Controllers: 3+
Data Analyzers: 1
Clinical trial teams need coordination across patients, care providers, pharmacies, and insurers, as these are all the data controllers involved. Again, that’s a lot of coordinated effort by separate teams to deliver less-than-complete data context for generating actionable insights.
Data Controllers: 1-3
Data Analyzers: 1+
Genomic research teams need to gather insights from: (1) insurers, to more specifically define population target groups and the costs of treatments; (2) care providers recording patient care data; (3) patients, who can describe symptoms, use smart devices or other medical devices, and provide consent for the use of their data; and (4) the owners of the genomic data being used as the basis of the research. Fortunately or unfortunately, having most genomic data owned by just two organizations certainly simplifies the process, as opposed to having hundreds of such organizations. But limitations remain when it comes to linking that data with care providers, insurers, and across borders. This is likely why many genomic studies default to using historical data, pre-aggregated by other research teams or organizations. While such studies still prove useful, there is little comparison to the value that can be extracted from full-context, production data. For example, watch a summary of our work with the Dana-Farber Cancer Institute in supporting genomic studies in collaboration with Harvard Medical School.
While this has been a very high-level description of the regulatory and technical challenges, it does shed light on the reasoning behind the M&A activity in healthcare. Centralizing ownership simplifies the use of sensitive data, but it is not a silver bullet. While acquisition may streamline the front end of the data collaboration journey, technical and regulatory challenges remain. In our next post, we’ll dig into the remaining risks and challenges facing healthcare innovation teams.