Federated Learning is a machine learning technique that trains an algorithm across multiple decentralized edge devices without bringing the data to a central location. The data remains on the local devices, and the model is updated iteratively on those same devices. Federated Learning can be useful in scenarios where the data is distributed across many devices or is too sensitive to transfer, and it is commonly used in areas such as healthcare, where patient data privacy is a top priority.
Federated Learning allows access to a wide variety of data sets without needing to directly share sensitive data, whether it’s between a user and a server or between organizations.
To begin, each edge device trains an initial model on its local data and sends the resulting model update, not the data itself, to the server. The server then averages the various user-specific models to produce an updated global model, completing what is known as a Federated Learning Round. This process can be repeated as required to produce improved versions of the model.
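As a rough sketch of what the aggregation step in one Federated Learning Round can look like, the snippet below applies the classic weighted-averaging rule (often called FedAvg) to client updates. The function name, the NumPy representation of model weights, and the sample values are illustrative assumptions, not any particular product's API.

```python
import numpy as np

def federated_averaging(client_weights, client_sizes):
    """One Federated Learning Round: combine client model weights by a
    weighted average, weighting each client by its local dataset size."""
    total = sum(client_sizes)
    # Start from zeroed copies of the first client's parameter arrays.
    global_weights = [np.zeros_like(layer) for layer in client_weights[0]]
    for weights, size in zip(client_weights, client_sizes):
        for i, layer in enumerate(weights):
            global_weights[i] += layer * (size / total)
    return global_weights

# Example: three devices, each holding a tiny two-layer model trained locally.
clients = [
    [np.array([1.0, 2.0]), np.array([0.5])],
    [np.array([3.0, 0.0]), np.array([1.5])],
    [np.array([2.0, 1.0]), np.array([1.0])],
]
sizes = [100, 300, 600]  # number of local training samples per device
print(federated_averaging(clients, sizes))  # the updated global model
```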
Federated Learning offers several benefits, including enhanced privacy, greatly reduced learning time, reduced training cost, and improved regulatory compliance.
Unfortunately, there are also some drawbacks, including debate over whether it provides privacy benefits in the first place. With Federated Learning, it can be possible to reverse engineer the underlying data sets from metadata revealed by the completed model, and the model itself is known to all collaborating parties. That said, platforms such as the Duality platform address these concerns by providing secure federated learning.
Multiparty computation (MPC) is a technique in cryptography that enables multiple parties to jointly compute a function over their private inputs without revealing those inputs to each other or to any external parties. In other words, MPC allows multiple parties to perform a computation over their confidential data without revealing any information about that data to the others involved in the computation. This technique is particularly useful in scenarios where multiple parties need to collaborate on a computation but none of them want to share their data with the others, such as in financial transactions, healthcare data analysis, or voting protocols.
In an MPC protocol, two or more parties each hold a secret input, and they want to compute a function of their inputs without revealing their inputs to each other. The primary goal of an MPC protocol is to enable the participants to compute the desired computation results while preserving the privacy of their data.
For example, suppose two hospitals want to collaborate to identify patients with a rare health condition without revealing patient identities to each other. They can use MPC to jointly compute the required function (for example, a private set intersection) over their patient datasets, maintaining privacy while still obtaining valuable insights into the rare condition.
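A minimal sketch of one common MPC building block, additive secret sharing over a finite field, is shown below: each hospital splits its private count into random shares, the parties combine the shares they hold, and only the joint total is ever reconstructed. The function names, the modulus, and the patient counts are illustrative assumptions; this is the core idea, not a production MPC protocol.

```python
import secrets

P = 2**61 - 1  # a public prime modulus; all arithmetic is done mod P

def share(secret, n_parties):
    """Split a private integer into n additive shares that sum to it mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

# Two hospitals each hold a private count of patients with a rare condition.
hospital_a_count, hospital_b_count = 17, 25

# Each hospital shares its input; no single share reveals anything on its own.
shares_a = share(hospital_a_count, 2)
shares_b = share(hospital_b_count, 2)

# Each party locally adds the shares it holds; only the partial sums are combined.
party_sums = [(shares_a[i] + shares_b[i]) % P for i in range(2)]
print(reconstruct(party_sums))  # 42: the joint total, without exposing either input
```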
In summary, MPC has numerous benefits in terms of privacy, security, and trustless collaboration. However, implementing MPC can be challenging, and it may not be suitable for all use cases due to complexity, latency, and threshold limitations.
FHE stands for Fully Homomorphic Encryption, which is a type of encryption scheme that enables computation on ciphertexts directly, without the need for decryption. This means that the encrypted data remains encrypted throughout the entire computation process, and the result of the computation is also encrypted, without any party having access to the plaintext data at any point.
This property of FHE is particularly useful in scenarios where privacy is a concern, such as in cloud computing, where the data is stored on a remote server and processed by third-party service providers. With FHE, the data can be encrypted and stored on the server, and computation can be performed on the ciphertexts without the server or the service provider ever knowing the plaintext data.
Due to its unique ability to secure data end to end in all three states (at rest, in transit, and in use), HE has long been dubbed the “Holy Grail of Data Privacy” or the “Holy Grail of Cryptography.”
The idea of HE is not new; cryptographers first proposed it in 1978. However, they didn’t know at the time whether it was possible to achieve. It wasn’t until 2009 that Craig Gentry, then at Stanford, described the first plausible construction for a fully homomorphic encryption scheme, showing that FHE could be realized in principle. Since then, it has been adopted in a variety of areas, including the private and public sectors and academia, where it has been shown to perform at scale.
FHE is perhaps the most important breakthrough in theoretical computer science of the 21st century. Since Gentry’s paper was published, research and implementation efforts throughout academia, government, and industry have brought FHE from theory to reality.
HE enables computations, including machine learning and AI analysis, on encrypted data, allowing data scientists, researchers, and data-driven enterprises to gain valuable insights without decrypting or exposing the underlying data or models. This lets organizations extract value from data while maintaining privacy and complying with applicable regulations. In addition, HE provides a functional and dependable privacy layer, eliminating the trade-off between data privacy and utility. This is particularly useful for collaboration between parties on sensitive data, such as privacy-preserving work with patient data across multiple healthcare and research centers, or inter-bank cooperation in financial crime investigations, where different parties can analyze sensitive information without exposing the underlying data to one another.
Because homomorphically encrypted data remains encrypted end to end in all three states, no trusted third parties are ever required. This allows computations to be outsourced while keeping both the data and the analytical models that operate on it safe, secured, and concealed. A cloud host can run a computation on the encrypted data, obtain an encrypted result, and return that result to the data owner. The data owner can then decrypt it, and the decrypted result is the same as if the computation had been run on the original, unencrypted data.
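As a hedged sketch of that outsourcing pattern, the snippet below uses the open-source TenSEAL library (a Python wrapper around Microsoft SEAL). The encryption parameters and the salary values are illustrative assumptions; other FHE libraries expose a similar encrypt-compute-decrypt flow.

```python
import tenseal as ts

# Data owner: set up a CKKS context and encrypt the data.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2**40
context.generate_galois_keys()

salaries = ts.ckks_vector(context, [52000.0, 61500.0, 47250.0])

# Cloud host: compute directly on ciphertexts (e.g., apply a 3% raise)
# without ever seeing the plaintext values.
raised = salaries * 1.03

# Data owner: decrypt the returned result with the secret key.
print(raised.decrypt())  # approximately [53560.0, 63345.0, 48667.5]
```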
Fully Homomorphic Encryption (FHE) has many potential benefits, but it also has drawbacks that must be taken into consideration.
Benefits:
- Data remains encrypted at rest, in transit, and in use, so plaintext is never exposed during computation.
- No trusted third party is required; computation can be outsourced to untrusted environments such as the public cloud.
- It enables privacy-preserving collaboration and analytics on sensitive data while supporting regulatory compliance.
Drawbacks:
- Computation on ciphertexts is significantly slower and more resource-intensive than computation on plaintext data.
- Implementing and parameterizing FHE schemes correctly requires specialized expertise.
In short, FHE offers substantial potential benefits, but its drawbacks need to be weighed when deciding whether or not to use it.
A Trusted Execution Environment (TEE) is a secure area within a computer system or mobile device that ensures the confidentiality and integrity of data and processes that are executed inside it. The TEE is isolated and protected from the main operating system and other software applications, which prevents them from accessing or interfering with the data and processes within the TEE. The TEE is typically used for security-sensitive operations, such as secure storage of cryptographic keys, biometric authentication, and secure mobile payments. The TEE provides a high level of assurance that sensitive data and processes remain secure and tamper-proof, even if the main operating system or other software components are compromised.
Trusted Execution Environments are established at the hardware level, meaning they are partitioned and isolated, complete with their own buses, peripherals, interrupts, memory regions, and so on. TEEs run their own instance of an operating system, known as the Trusted OS, and the applications allowed to run in this isolated environment are referred to as Trusted Applications (TAs). Untrusted apps run in the open part of the larger operating system, referred to as the Rich Execution Environment (REE).
A trusted application has access to the full performance of the device despite operating in an isolated environment, and it is protected from all other applications. Data is usually encrypted in storage and transit and is only decrypted when it’s in the TEE for processing. The CPU blocks access to the TEE by all untrusted apps, regardless of the privileges of the entities requesting access.
To enhance security, two trusted applications running in the TEE also do not have access to each other’s data as they are separated through software and cryptographic functions.
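That data-handling pattern, though not the hardware isolation itself, can be illustrated with a small simulation: data is encrypted before storage and transit, and only a function standing in for the trusted boundary ever holds the decryption key. The sketch below uses the Python cryptography package purely for illustration and omits key provisioning and attestation; a real TEE would rely on vendor SDKs such as those for Arm TrustZone or Intel SGX.

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()), algorithm=hashes.SHA256(), label=None)

# Stand-in for a key pair whose private half lives only inside the TEE.
_tee_private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
tee_public_key = _tee_private_key.public_key()  # shared with the REE after attestation

def trusted_app(ciphertext: bytes) -> bytes:
    """Stand-in for a Trusted Application: the only place plaintext exists."""
    plaintext = _tee_private_key.decrypt(ciphertext, OAEP)
    return plaintext.upper()  # the sensitive processing step

# Rich Execution Environment side: data is encrypted in storage and transit,
# and can only be opened inside trusted_app.
sealed = tee_public_key.encrypt(b"biometric template", OAEP)
print(trusted_app(sealed))
```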
TEE offers several benefits, including:
- Hardware-enforced isolation from the main operating system and other applications.
- Sensitive data is decrypted only inside the TEE; it remains encrypted in storage and in transit.
- Trusted applications retain access to the full performance of the device.
- Keys, biometrics, and other secrets stay protected even if the main operating system is compromised.
TEEs have several major limitations compared with software-based privacy technologies, chiefly the financial burden of acquiring and deploying the hardware, the effort of retrofitting existing solutions to use TEEs, and the risk of vendor lock-in. TEEs are inherently a hardware solution: they must be purchased, physically delivered, installed, and maintained, and special software must be written to run on them. This is a far higher adoption burden than for software-only privacy technologies. There is also little commonality between the various TEE vendors’ solutions, which compounds the vendor lock-in. If a major vendor were to stop supporting a specific architecture, or worse, if a hardware design flaw were found in a specific vendor’s solution, a completely new solution stack would need to be designed, installed, and integrated at great cost to the users of the technology.
Beyond these lifecycle costs, TEE technology is not foolproof: it has its own attack vectors, both in the TEE operating system and in the Trusted Applications themselves, which still comprise many lines of code. This has been demonstrated in lab research, with Quarkslab successfully exploiting a vulnerability in Kinibi, a TrustZone-based TEE used on some Samsung devices, to obtain code execution in monitor mode.
Differential Privacy is a privacy-enhancing technique that allows organizations to collect and analyze data while preserving the privacy of the individuals in the dataset. It adds noise to the data, which makes it harder for attackers to identify individual records while still maintaining accurate aggregate results.
With Differential Privacy, the goal is to provide accurate results without identifying individual records. To achieve this, a randomized function is used to add noise to the data. The amount of noise added to the data is controlled by a value called the privacy budget, which limits the amount of information that can be revealed about individuals in the data set.
Differential Privacy can be used in a variety of contexts, such as collecting and analyzing medical records, conducting surveys, or tracking usage patterns of mobile phones. It provides a way to share aggregate information without compromising the privacy of individuals in the dataset.
The use of Differential Privacy ensures that individuals can safely share their data without risk of their personal information being compromised. It has become an important tool for organizations dealing with sensitive data and striving to maintain a high level of privacy for their users.
Differential Privacy is implemented by applying a randomized mechanism, ℳ[D], to any information exposed from a dataset D to an outside observer. The mechanism introduces controlled randomness, or “noise,” into the exposed data to protect privacy, and it can employ a range of techniques such as randomized response, shuffling, or additive noise. The choice of mechanism is tailored to the nature and quality of the information sought by the observer. The mechanism is designed to provide an information-theoretic privacy guarantee: the output of a particular analysis remains essentially the same whether or not data about any particular individual is included.
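A minimal sketch of one such mechanism, additive Laplace noise for a counting query, is shown below: the noise scale is set by the query’s sensitivity divided by the privacy budget epsilon. The function name, the example data, and the chosen epsilon are illustrative assumptions.

```python
import numpy as np

def laplace_count(data, predicate, epsilon):
    """Differentially private count: add Laplace noise scaled to sensitivity / epsilon.
    A counting query changes by at most 1 when one individual's record is
    added or removed, so its sensitivity is 1."""
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: a private count of patients over 65, with privacy budget epsilon = 0.5.
ages = [34, 71, 68, 52, 80, 45, 66]
print(laplace_count(ages, lambda age: age > 65, epsilon=0.5))
```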
Differential Privacy provides many benefits to organizations, including greater control and governance over data, plausible deniability that makes people more willing to share sensitive data, resistance to linkage attacks, and regulatory compliance. Its limitations include being useful only for large data sets, the risk of privacy leaks, the lack of end-to-end encryption, and no built-in ability to collaborate across multiple data sets.
Privacy Enhancing Technologies (PETs) are a set of tools, methodologies, and techniques designed to protect the privacy of individuals and their personal data. These technologies help people maintain control over their personal information and protect it against unauthorized access or misuse by others. Examples of PETs include encryption, anonymous communication tools, digital signatures, and privacy-focused search engines.
PETs can be used in a variety of contexts, such as online transactions, data sharing, and communication systems, to ensure the confidentiality, integrity, and authenticity of data. They are particularly useful when deploying systems that collect personal data, such as medical records, online shopping histories, or credit scores. By using PETs in these contexts, individuals can maintain control over their personal information and reduce the risks associated with data loss, identity theft, or other privacy violations.
Here are several well-known PETs:
| Name | Definition |
| --- | --- |
| Homomorphic Encryption | Data and/or models remain encrypted at rest, in transit, and in use (sensitive data never needs to be decrypted), while still enabling analysis of that data. |
| Multiparty Computation | Allows multiple parties to perform joint computations on their individual inputs without revealing the underlying data to one another. |
| Differential Privacy | A data aggregation method that adds randomized “noise” to the data so it cannot be reverse engineered to reveal the original inputs. |
| Federated Learning | Statistical analysis or model training on decentralized data sets; a traveling algorithm where the model gets “smarter” with every analysis of the data. |
| Secure Enclave / Trusted Execution Environment | A physically isolated execution environment, usually a secure area of a main processor, that guarantees code and data loaded inside it are protected. |
| Zero-Knowledge Proofs | A cryptographic method by which one party can prove to another that a given statement is true without conveying any information beyond the fact that the statement is true. |