
Cryptography Enabled Data Science: Meeting the Demands of Data Driven Enterprises

How can cryptography and data science work together for business value? Duality’s VP of Data Science, Dr. Marcelo Blatt, and VP of Strategy, Ronen Cohen, discuss how encryption can protect data-in-use. They present real-world use cases for such technologies and describe the mathematics supporting privacy enhancing technologies (PETs).

Ronen Cohen discusses Duality’s role in the data privacy industry and a real-world use case for this technology. Duality is a PET company that enables organizations to collaborate in a privacy protected manner. Our technology combines data science with security, allowing organizations to join, link, and enrich sensitive datasets from multiple sources while protecting data privacy throughout the entire process.

We’ll first look at a case study focused on healthcare and life sciences, which was conducted by Dr. Alexander Gusev of Harvard Medical School and the Dana Farber Cancer Institute. It is extremely challenging to get good real-world data in this space because there are many concerns about patient privacy. Dr. Gusev was interested in finding a way to collaborate and discover correlations between certain genes and cancer outcomes. HIPAA laws and other privacy concerns make research like this very difficult and time-consuming. Dr. Gusev utilized the Duality privacy preserving data collaboration platform to encrypt disparate datasets from multiple healthcare providers, then link them together and analyze them holistically, without ever decrypting them. He was able to do this 30 times faster than with other privacy protecting technologies, all while ensuring that none of the underlying data was exposed. This means that researchers and other stakeholders in the healthcare, life sciences, and pharmaceutical industries can make decisions and develop treatments faster, and provide better overall care to their patients, while preserving patient privacy.

This specific project was enabled by homomorphic encryption, a privacy enhancing technology that enables “encryption in use”. It allows researchers and data analysts to encrypt sensitive datasets and analyze them without ever decrypting the data. 

Next, Marcelo Blatt presents the mathematics behind homomorphic encryption, which is based on lattice-based cryptography. The specific type that Blatt presents is the lattice-based encryption scheme developed by Goldreich, Goldwasser, and Halevi (1996), hence the name GGH encryption.

To begin, there are a few definitions that are helpful to understand:

  • A lattice is a set of regularly spaced points on a grid. The lattice below has 2 dimensions.
  • A vector is a single point on the lattice. This point is represented by a tuple (a finite ordered list) of numbers. For example, a 2-dimensional vector could be (4, 1).
  • A basis is a set of vectors that enables you to generate an entire lattice. Every lattice point can be written as a linear combination of the basis vectors with integer coefficients.

Let’s take the following example:

In this animation, the red vectors, e1 and e2, represent the basis. When we take a linear combination of these vectors in which each vector is multiplied by an integer, we get a new vector, like the green vector m, that lands on a lattice point. If the vectors are multiplied by non-integers in the linear combination, we get a new vector, like the yellow vector c, that does not land on a lattice point.
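To make the integer versus non-integer distinction concrete, here is a minimal Python sketch. The basis vectors (2, 1) and (1, 3) are illustrative values assumed for this example, not the ones shown in the animation, and the helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative basis (an assumption for this sketch, not taken from the animation).
E1 = (2, 1)
E2 = (1, 3)

def combine(a, b):
    """Return the linear combination a*E1 + b*E2."""
    return (a * E1[0] + b * E2[0], a * E1[1] + b * E2[1])

def coordinates(point):
    """Solve a*E1 + b*E2 = point exactly using Cramer's rule."""
    det = E1[0] * E2[1] - E2[0] * E1[1]
    a = Fraction(point[0] * E2[1] - E2[0] * point[1], det)
    b = Fraction(E1[0] * point[1] - point[0] * E1[1], det)
    return (a, b)

def is_lattice_point(point):
    """A point is on the lattice exactly when both coefficients are integers."""
    a, b = coordinates(point)
    return a.denominator == 1 and b.denominator == 1

m = combine(3, -1)                            # integer coefficients
c = combine(Fraction(1, 2), Fraction(1, 2))   # non-integer coefficients

print(m, is_lattice_point(m))   # m is (5, 0), which is on the lattice
print(c, is_lattice_point(c))   # c falls between lattice points
```

Exact fractions are used instead of floating point so that the integer test is not affected by rounding error.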

In the GGH cryptosystem, security relies on the Closest Vector Problem (CVP): given a point that does not belong to the lattice, find the lattice point closest to it. With a “bad basis”, one whose vectors are far from orthogonal (at a right angle), this is difficult. Thus, to encrypt numbers we use them as integer coefficients of the bad basis vectors, which gives a lattice point, and add a small amount of noise. Because CVP is hard with only the bad basis, it is difficult for someone to go back and figure out where the original point was.
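The encryption step can be sketched in a few lines of Python. This is a toy illustration only: the basis values, the fixed noise, and the function names are assumptions made for this example, and a 2-dimensional lattice offers no real security (practical schemes work in hundreds of dimensions or more). The last lines show that simply rounding the ciphertext's coefficients in the bad basis recovers the wrong message.

```python
from fractions import Fraction

# Hypothetical "bad" basis: the two vectors are long and nearly parallel.
BAD_BASIS = ((17, 20), (26, 33))

def combine(coeffs, basis):
    """Return coeffs[0]*basis[0] + coeffs[1]*basis[1]."""
    (a, b), (v1, v2) = coeffs, basis
    return (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])

def coordinates(point, basis):
    """Exact coefficients of point in the given basis (Cramer's rule)."""
    v1, v2 = basis
    det = v1[0] * v2[1] - v2[0] * v1[1]
    a = Fraction(point[0] * v2[1] - v2[0] * point[1], det)
    b = Fraction(v1[0] * point[1] - point[0] * v1[1], det)
    return (a, b)

def encrypt(message, noise, basis=BAD_BASIS):
    """Map the message to a lattice point, then nudge it off the lattice."""
    point = combine(message, basis)
    return (point[0] + noise[0], point[1] + noise[1])

message = (3, -2)
# Fixed noise keeps the sketch reproducible; a real scheme would draw it at random.
ciphertext = encrypt(message, noise=(1, -1))

# Naive attack: round the ciphertext's coefficients in the bad basis.
guess = tuple(round(x) for x in coordinates(ciphertext, BAD_BASIS))

print(ciphertext)   # (0, -7)
print(guess)        # (4, -3), not the original (3, -2)
```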

Now the question is: how do we decrypt the encrypted messages when we need the data back? We’ll reference the vectors in the following animation:

Let c, the yellow vector, be the ciphertext and e1 and e2 the “bad basis”. Every encryption system also needs a secret key; in this case, the secret is a good basis, e’1 and e’2, the blue vectors, which generates the same lattice but is close to orthogonal. Given the good basis, finding the closest point is easy: express c in the good basis and round each coefficient to the nearest integer. Thus, the good basis must be kept safe and not given to everyone who receives the encrypted data.
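Decryption by rounding in the good basis can be sketched as follows. The two bases are hypothetical values chosen so that they generate the same lattice, and the ciphertext (0, -7) is assumed to be the lattice point 3*e1 - 2*e2 = (-1, -6) plus noise (1, -1). This is a toy illustration, not a secure or complete implementation.

```python
from fractions import Fraction

BAD_BASIS = ((17, 20), (26, 33))   # hypothetical bad basis e1, e2
GOOD_BASIS = ((7, 1), (1, 6))      # hypothetical secret basis e'1, e'2
# Same lattice: e1 = 2*e'1 + 3*e'2 and e2 = 3*e'1 + 5*e'2.

def combine(coeffs, basis):
    """Return coeffs[0]*basis[0] + coeffs[1]*basis[1]."""
    (a, b), (v1, v2) = coeffs, basis
    return (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])

def coordinates(point, basis):
    """Exact coefficients of point in the given basis (Cramer's rule)."""
    v1, v2 = basis
    det = v1[0] * v2[1] - v2[0] * v1[1]
    a = Fraction(point[0] * v2[1] - v2[0] * point[1], det)
    b = Fraction(v1[0] * point[1] - point[0] * v1[1], det)
    return (a, b)

def decrypt(ciphertext):
    """Round in the good basis, then read the message off in the bad basis."""
    rounded = tuple(round(x) for x in coordinates(ciphertext, GOOD_BASIS))
    closest = combine(rounded, GOOD_BASIS)      # nearest lattice point
    a, b = coordinates(closest, BAD_BASIS)      # integer coefficients = message
    return (int(a), int(b))

ciphertext = (0, -7)
print(decrypt(ciphertext))   # (3, -2)
```

Rounding works here because the good basis vectors are short and nearly orthogonal, so small noise never pushes a coefficient past the halfway point between two integers.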

The final piece we must discuss is how to do computations on encrypted data. We’ll use the following diagram to explain:

Given two plaintext messages, m1 and m2, their corresponding ciphertexts, c1 and c2, and a “bad basis” consisting of e1 and e2, we can do computations on these vectors. If we add c1 and c2, we get the vector c. This vector is an encryption of the sum of m1 and m2. Thus, computing on the encrypted data and then decrypting gives the same result as computing on the plaintext data. The ability to compute on encrypted data is the fundamental concept of homomorphic encryption. We can do the same process with other operations, including multiplication, rotations, inner product, matrix arithmetic, etc.
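The additive property can be checked with a small, self-contained Python sketch. All basis values, messages, and noise vectors are assumptions chosen for illustration, and the function names are hypothetical. Note that the noise of the two ciphertexts adds up as well, so in this toy setting decryption only succeeds while the accumulated noise stays small.

```python
from fractions import Fraction

BAD_BASIS = ((17, 20), (26, 33))   # hypothetical bad basis e1, e2
GOOD_BASIS = ((7, 1), (1, 6))      # hypothetical secret basis for the same lattice

def combine(coeffs, basis):
    """Return coeffs[0]*basis[0] + coeffs[1]*basis[1]."""
    (a, b), (v1, v2) = coeffs, basis
    return (a * v1[0] + b * v2[0], a * v1[1] + b * v2[1])

def coordinates(point, basis):
    """Exact coefficients of point in the given basis (Cramer's rule)."""
    v1, v2 = basis
    det = v1[0] * v2[1] - v2[0] * v1[1]
    a = Fraction(point[0] * v2[1] - v2[0] * point[1], det)
    b = Fraction(v1[0] * point[1] - point[0] * v1[1], det)
    return (a, b)

def encrypt(message, noise):
    """Lattice point for the message in the bad basis, plus small noise."""
    point = combine(message, BAD_BASIS)
    return (point[0] + noise[0], point[1] + noise[1])

def decrypt(ciphertext):
    """Round in the good basis, then read the message off in the bad basis."""
    rounded = tuple(round(x) for x in coordinates(ciphertext, GOOD_BASIS))
    closest = combine(rounded, GOOD_BASIS)
    a, b = coordinates(closest, BAD_BASIS)
    return (int(a), int(b))

m1, m2 = (3, -2), (1, 4)
c1 = encrypt(m1, noise=(1, -1))
c2 = encrypt(m2, noise=(0, 1))

# Add the ciphertexts without ever decrypting c1 or c2.
c = (c1[0] + c2[0], c1[1] + c2[1])

print(decrypt(c))   # (4, 2), which equals m1 + m2
```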

Finally, Ronen and Marcelo discuss how Machine Learning (ML) can leverage encrypted data. There are two main ways of designing an ML algorithm to work with homomorphic encryption. The first is the Transpiler approach, in which C++ code gets converted into Boolean circuits. This approach is very flexible and general. The second approach is crafting algorithms in a way that is friendly to homomorphic encryption. Here is an example of an algorithm design that works with homomorphic encryption:

Decision trees rely on a chain of if-else statements. In plaintext, implementing a decision tree is very simple, but under homomorphic encryption the branching becomes a problem: the program cannot see the encrypted values, so it cannot choose which branch to follow. Instead of building extremely complex algorithms around this, we can use a workaround: a decision tree can be written as a polynomial. Polynomials work well with homomorphic encryption because they consist only of multiplications and additions, both of which are viable and performant on homomorphically encrypted data. Using a recurrence relation, the process of turning a decision tree into a polynomial can be automated, making this even easier. By reworking the implementation of decision trees in this way, we were able to increase their usability with homomorphically encrypted data.
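As a rough illustration of the idea, the sketch below turns a small tree into a polynomial with one common recurrence: a leaf evaluates to its value, and an internal node evaluates to b * P(true branch) + (1 - b) * P(false branch), where b is a 0/1 decision bit. This recurrence, the class names, and the example tree are assumptions for this example and are not necessarily the formulation used in the webinar. The code runs on plaintext integers and assumes the decision bits are already available; computing those comparisons under encryption is a separate step that this sketch does not cover.

```python
from dataclasses import dataclass
from typing import Sequence, Union

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    feature: int          # index of the 0/1 decision bit tested at this node
    if_true: "Tree"
    if_false: "Tree"

Tree = Union[Leaf, Node]

def evaluate_branching(tree: Tree, bits: Sequence[int]) -> int:
    """Ordinary evaluation: follow if-else branches down to a leaf."""
    while isinstance(tree, Node):
        tree = tree.if_true if bits[tree.feature] else tree.if_false
    return tree.value

def evaluate_polynomial(tree: Tree, bits: Sequence[int]) -> int:
    """Branch-free evaluation using only additions and multiplications."""
    if isinstance(tree, Leaf):
        return tree.value
    b = bits[tree.feature]
    return (b * evaluate_polynomial(tree.if_true, bits)
            + (1 - b) * evaluate_polynomial(tree.if_false, bits))

# Hypothetical depth-2 tree with three decision bits.
tree = Node(0,
            Node(1, Leaf(10), Leaf(20)),
            Node(2, Leaf(30), Leaf(40)))

bits = (1, 0, 1)
print(evaluate_branching(tree, bits), evaluate_polynomial(tree, bits))   # 20 20
```

The polynomial version visits every branch rather than just one, which costs more work but never needs to inspect the value of a decision bit. With a homomorphic encryption library, the same additions and multiplications would be applied to ciphertexts.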

Recent developments in homomorphic encryption are fueling a conceptual shift from thinking of encryption solely as a defensive measure to seeing it as a business enabler. In this post and the accompanying webinar, we dive into one example of how cryptography is unlocking the full potential of data and vastly improving our ability to gain insights quickly. While this research is specific to genomic oncology, the technology can be leveraged to deliver highly valuable results for all data driven endeavors and enterprises.

