
Collaborative Privacy-Preserving Analysis of Oncological Data

FDA Commissioner Robert M. Califf, in his remarks to the 2022 NORD Breakthrough Summit, emphasized the importance of Real World Data and Real World Evidence in clinical research and the need to collaborate, especially in the areas of rare diseases and oncology.

Real World Data is patient-centric and is collected by the organizations that administer care to patients, usually over prolonged periods of time. Such data raises privacy challenges because it has to be aggregated, linked, and shared for analysis before insights can be derived from it.

Collaborating on Real World Data is extremely important for constructing the larger, broader-based clinical data sets required to improve clinical decision-making, research, and outcomes. Even so, stakeholders are frequently reluctant to share their data without guaranteed patient privacy, proper protection of their data sets, and control over how their data is used. This is even more challenging when cross-border data collaboration is required.

Fully homomorphic encryption (FHE) is a cryptographic capability that can address these issues by enabling computation on encrypted data without intermediate decryptions, so the analytics results are obtained without revealing the raw data. Duality Technologies has developed a toolset for collaborative privacy-preserving analysis of oncological data using multiparty FHE. 
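To make "computation on encrypted data" concrete, here is a minimal sketch using the Paillier cryptosystem. Note the assumptions: Paillier is only additively homomorphic, so it is a stand-in for illustration and not FHE, and it is not the scheme or API used in the Duality toolset. The tiny primes, the function names, and the sample ages are all hypothetical, and the parameters are far too small to be secure.

```python
import math
import secrets

# Toy Paillier cryptosystem: additively homomorphic only (NOT FHE),
# with insecure demo-sized primes. Requires Python 3.8+ for pow(x, -1, n).

def keygen(p, q):
    """Return (public modulus n, private key (lam, mu)) for primes p, q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) under public modulus n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)

def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

# Hypothetical ages of onset held by a data owner.
ages = [61, 58, 72, 49]
n, priv = keygen(1009, 1013)

# The analyst only ever sees ciphertexts and combines them blindly.
ciphertexts = [encrypt(n, a) for a in ages]
total_ct = ciphertexts[0]
for c in ciphertexts[1:]:
    total_ct = add_encrypted(n, total_ct, c)

# Only the key holder decrypts the aggregate, never the individual values.
total = decrypt(n, priv, total_ct)
print(total, total / len(ages))  # 240 60.0
```

The analyst in this sketch computes a sum without seeing any individual age. An FHE scheme extends the same idea to both additions and multiplications, which is what makes statistics beyond simple sums possible.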

Duality Technologies and Tel Aviv Sourasky Medical Center (TASMC) researchers and clinical oncologists recently collaborated to perform privacy-preserving analysis of real-world oncological data. We applied the toolset to a real-world data set of colorectal cancer patients’ survival data, which includes 623 patients and 24 variables, amounting to 14,952 items of data. The goal of the study was to examine the effect of oxaliplatin treatment with and without cannabis for patients with colorectal cancer. Statistical analysis of key oncological endpoints was performed blindly on both the raw data and the FHE-encrypted data using descriptive statistics and survival analysis with Kaplan-Meier curves and log-rank tests. The results were then compared against an accuracy goal of two decimal digits. Early results of this study (for the single-key FHE setting) have been reported separately.

The study included the following statistical analyses: mean, median, and standard deviation for the age of cancer onset; frequency analysis for sex; a chi-square test between the cannabis indicator (with or without cannabis) and diagnosis; a chi-square test between the cannabis indicator and sex; and a t-test for the cannabis indicator by age of onset. Kaplan-Meier and log-rank survival analysis was performed to examine the effect of treatment with cannabis on the overall survival of patients.
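For readers unfamiliar with the survival analysis named above, the following is a minimal plaintext sketch of a Kaplan-Meier estimator and a two-group log-rank test. It shows the kind of computation that would be run "in the clear" as a baseline; it is not the encrypted implementation from the study. The function names and the follow-up data are hypothetical and are not taken from the TASMC data set.

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time per patient
    events -- 1 if the event (death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

def logrank_chi2(times_a, events_a, times_b, events_b):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    pooled = zip(times_a + times_b, events_a + events_b)
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for u in times_a if u >= t)   # at risk in group A
        n_b = sum(1 for u in times_b if u >= t)   # at risk in group B
        d_a = sum(1 for u, e in zip(times_a, events_a) if e and u == t)
        d_b = sum(1 for u, e in zip(times_b, events_b) if e and u == t)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n        # observed minus expected
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var

def chi2_pvalue_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Hypothetical follow-up times (months) and event flags for two arms.
with_cannabis = ([6, 9, 14, 20, 25], [1, 0, 1, 1, 0])
without_cannabis = ([3, 5, 8, 11, 16], [1, 1, 1, 0, 1])

print(kaplan_meier(*with_cannabis))
chi2 = logrank_chi2(*with_cannabis, *without_cannabis)
print(chi2, chi2_pvalue_1df(chi2))
```

Both routines reduce to counts, sums, products, and one division, which suggests why this analysis can be expressed over encrypted data, although the encrypted version would need scheme-specific handling of comparisons and division.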

All accuracy metrics were within the predetermined accuracy goal of two decimal digits. The runtime was less than half a minute for the descriptive statistics and about three minutes for the survival analysis. By comparison, anonymization and statistical analysis of the raw data set by a statistician, the method commonly used in clinical oncology, is estimated to take about 10 hours, which is significantly longer than the FHE computations.

In collaboration with Professor Alexander Gusev of Dana-Farber Cancer Institute and Harvard Medical School, we also applied our toolset to a previously published data set based on two clinical trials of immunotherapy in renal cell carcinoma.

This joint work extends and significantly improves on the prior multiparty FHE framework in several ways. First, we added a private join collaboration model in which multiple parties can contribute data for the same records (e.g., individuals) without the data owners learning which records match; the joined data is then used for further analysis using multiparty FHE. Second, we introduced a novel method enabling deeper cryptographic computations. Third, we extended the list of computations to provide a more general toolset for the privacy-preserving analysis of oncological data. The computations implemented in our toolset include mean, median, standard deviation, frequency, chi-square test, t-test, survival analysis (Kaplan-Meier plots and log-rank test), and logistic regression training over encrypted data.

For all computations, an accuracy of more than 5 decimal digits (compared with the same computations in the clear) was achieved. Apart from the more complex survival analysis, all computations took less than half a minute; the survival analysis took up to a minute and a half. These results imply that privacy-preserving descriptive statistics and survival analysis using multiparty FHE are already practical for privacy-enhanced collaborations on typical oncological data sets.
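One way to read "accuracy of d decimal digits" is that the decrypted result and the result computed in the clear round to the same value at d decimal places. The sketch below encodes that reading; it is an assumption, since the post does not spell out the exact criterion, and the function name and sample values are hypothetical.

```python
def agrees_to_decimals(in_the_clear, decrypted, decimals):
    """True if the two values differ by less than half a unit in the
    last of `decimals` decimal places (an assumed accuracy criterion)."""
    return abs(in_the_clear - decrypted) < 0.5 * 10 ** (-decimals)

# Hypothetical values: a statistic computed in the clear vs. after decryption.
print(agrees_to_decimals(0.4137215, 0.4137216, 5))  # True
print(agrees_to_decimals(0.12, 0.13, 2))            # False
```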

Duality Technologies, TASMC, and Dana-Farber Cancer Institute researchers have submitted a paper describing the results of this work to a prestigious multidisciplinary journal. The work will be publicly available later this year.
