Cross-Border AI Data Collaboration: GDPR Compliance Checklist

Michal Wachstock

June 23, 2026 18 min read


Every compliance team has had this conversation: legal wants to know exactly where the data is going, the data science team wants to train the model on the richest dataset available, and the DPO is somewhere in between trying to figure out whether any of it is actually permissible under GDPR.

Cross-border AI data collaboration is genuinely hard to get right. The rules are specific, the penalties are real (up to €20 million or 4% of global annual turnover, whichever is higher, for the most serious infringements), and the “just anonymize it” shortcut does not hold up the way people think it does.

We work with organizations that process sensitive EU personal data at scale: in healthcare, financial services, government and defense, life sciences, insurance, and marketing.

This guide is the checklist we wish existed when we started: practical, technically grounded, and honest about where the gaps are.

It covers the legal transfer mechanisms you actually need, the GDPR articles that apply to AI specifically, and the Privacy-Enhancing Technologies (PETs) that satisfy Article 25 requirements without forcing you to choose between compliance and useful AI.

Before jumping into mechanics, it helps to be clear on when GDPR applies to AI. This matters because a lot of organizations incorrectly assume that if they are based outside the EU, the regulation does not reach them.

It does. Article 3 GDPR has extraterritorial scope: any organization that offers goods or services to EU data subjects, or that monitors their behavior, is subject to GDPR regardless of where it is established. For AI specifically, this means any model trained on EU personal data, or deployed to make decisions about EU residents, falls within scope.

What counts as personal data in an AI context?

This is where most teams underestimate their exposure. Personal data under GDPR includes any information that can directly or indirectly identify a natural person. In practice, that covers:

Training data containing names, email addresses, IP addresses, or location data
Health records, financial records, or behavioral data used to train or fine-tune models
Outputs that can be linked back to individuals, including re-identified data
Pseudonymized data, which is still personal data under GDPR (only true anonymization removes it from scope)
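The last point is easy to see in code. Below is a minimal sketch, assuming a keyed hash (HMAC-SHA256) is the pseudonymization method; the key, field names, and sample record are hypothetical and only there for illustration.

```python
import hashlib
import hmac
from typing import List, Optional

# Hypothetical secret held by the controller; a real deployment would keep it in a key vault.
SECRET_KEY = b"example-key-held-by-the-controller"


def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def relink(token: str, candidates: List[str], key: bytes = SECRET_KEY) -> Optional[str]:
    """Map a token back to a known identifier, which any key holder can do."""
    for candidate in candidates:
        if hmac.compare_digest(pseudonymize(candidate, key), token):
            return candidate
    return None


record = {"email": "anna@example.com", "diagnosis_code": "E11"}
training_row = {
    "subject": pseudonymize(record["email"]),
    "diagnosis_code": record["diagnosis_code"],
}

# The token looks opaque, but the key holder can still single out the person.
reidentified = relink(training_row["subject"], ["bob@example.com", "anna@example.com"])
```

Because the mapping can be reversed by whoever holds the key, the training row stays personal data even though no email address appears in it.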

Lawful Bases for AI Processing Under GDPR

Every processing operation involving personal data requires a lawful basis. For AI, the two most commonly invoked are legitimate interests and consent. Legitimate interests can support many commercial AI applications, but it requires a genuine balancing test.

The controller’s interest must outweigh the impact on data subjects. Consent, meanwhile, must be specific, informed, freely given, and withdrawable, which makes it fragile as a basis for large-scale model training.

For special category data (health, genetic, biometric, racial or ethnic origin), additional conditions apply. This is critical for healthcare AI and any model trained on data that touches these categories, even indirectly.

Chapter V of GDPR governs transfers of personal data to third countries. The core principle is that a transfer to a country outside the EU/EEA can only happen if an adequate level of protection is ensured. There are three primary mechanisms to achieve this.

Adequacy decisions

The simplest route is an adequacy decision. The European Commission has determined that certain countries, territories, sectors, and organisations provide a level of data protection that is essentially equivalent to that of the EU.

As of 2026, adequacy decisions cover destinations including the UK, Japan, Israel, South Korea, Canada (for certain commercial organisations), Brazil, and U.S. organisations certified under the EU-U.S. Data Privacy Framework (DPF).

For transfers to the United States, the DPF, adopted in July 2023, allows transfers to participating U.S. organisations without additional contractual safeguards. However, many organisations maintain Standard Contractual Clauses (SCCs) as a contingency measure given the possibility of future legal challenges to the framework.

Standard Contractual Clauses (SCCs)

Think of SCCs as a standard legal contract, pre-approved by the European Commission, that the receiving party must sign before you can send them EU personal data. By signing, they commit to protecting that data to GDPR standards regardless of where they are based.

Since the Schrems II ruling in 2020, however, signing the contract is no longer enough on its own. You also need a Transfer Impact Assessment (TIA), a documented check that the laws in the recipient country won’t override the protections in the contract. If a government there can legally demand access to the data, the contract alone won’t protect you.

The current SCCs come in four versions depending on who is sending data to whom: controller to controller, controller to processor, processor to controller, and processor to processor.

In AI collaborations involving multiple organizations in different roles, you will often need more than one version running simultaneously.

For restricted transfers from the UK to non-adequate jurisdictions, organisations generally need the UK IDTA or the UK Addendum to the EU SCCs, plus a Transfer Risk Assessment (TRA) where required.

Binding Corporate Rules (BCRs)

BCRs are an option for large multinationals that need to transfer data regularly between their own group entities across borders. Instead of signing contracts every time, the group gets a single set of rules approved by a supervisory authority that covers all internal transfers.

They offer stronger protection than SCCs for complex corporate structures, but they take significant time and resources to obtain, typically 12 to 18 months from application to approval.
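As a rough illustration of how the three mechanisms relate, the choice can be sketched as a simple decision function. This is a sketch under stated assumptions, not legal logic: the `Recipient` fields, the function name, and the country set are hypothetical, and the set only mirrors some of the destinations named above.

```python
from dataclasses import dataclass

# Illustrative subset of the destinations this article lists as covered by an
# adequacy decision. Canada is left out because its decision is partial.
# Check the European Commission's current list before relying on any entry.
ADEQUATE_DESTINATIONS = {"UK", "Japan", "Israel", "South Korea", "Brazil"}


@dataclass
class Recipient:
    country: str
    dpf_certified: bool = False          # only relevant for U.S. recipients
    covered_by_group_bcrs: bool = False  # intra-group recipient under approved BCRs


def transfer_mechanism(recipient: Recipient) -> str:
    """Return the Chapter V mechanism this sketch would propose for a recipient."""
    if recipient.country in ADEQUATE_DESTINATIONS:
        return "adequacy decision"
    if recipient.country == "US" and recipient.dpf_certified:
        return "adequacy decision (DPF)"
    if recipient.covered_by_group_bcrs:
        return "BCRs"
    # Default for everything else: contract plus documented assessment.
    return "SCCs + TIA"


mechanisms = {
    "japan_lab": transfer_mechanism(Recipient("Japan")),
    "us_vendor": transfer_mechanism(Recipient("US", dpf_certified=True)),
    "us_startup": transfer_mechanism(Recipient("US")),
    "india_affiliate": transfer_mechanism(Recipient("India", covered_by_group_bcrs=True)),
}
```

The ordering reflects the text above: adequacy is the simplest route, BCRs apply only inside a corporate group, and SCCs with a TIA are the fallback for everything else.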

Data Processing Agreements and the multi-party AI challenge

In a typical data relationship, one party controls the data and another processes it on their behalf. In cross-border AI collaboration, it is rarely that simple.

When two organizations jointly decide what data to use and why, they become joint controllers, and both are legally responsible for compliance, not just the one who originally held the data.

This happens more often than people realize. AI joint ventures, research consortia, and multi-institutional model training projects all tend to create this shared responsibility without anyone explicitly agreeing to it.

The fix is straightforward: map the data flows early, agree in writing on who controls what, and formalize the arrangement before any processing begins.

Most compliance resources list GDPR’s principles without explaining which ones create the most practical risk for AI. Here are the four that matter most.

Purpose limitation and data minimization

These two principles catch out a lot of AI projects. Purpose limitation means you cannot take data collected for one reason and use it for something else.

A model trained on customer service data cannot be quietly repurposed for credit scoring or profiling without a fresh legal basis.

Data minimization means collecting only what you actually need, and for AI, “what you need” should be defined at the model design stage, not just at the point of data collection.

Automated decision-making

If your AI system makes solely automated decisions that significantly affect individuals (credit approvals, insurance pricing, or recruitment screening, for example), those individuals have the right to request a human review of the decision and to receive a genuine explanation of how it was reached.

This does not mean you cannot use AI for these decisions. It means your system must be built in a way that makes human oversight actually possible, not just theoretically available.
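One possible way to make oversight real rather than theoretical is to keep significant decisions from becoming final until a person has looked at them. The sketch below assumes that design; the `Decision` fields, status values, and `finalize` function are hypothetical names, not a prescribed pattern.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Decision:
    subject_id: str
    outcome: str              # e.g. "approve" or "decline"
    significant_effect: bool  # legal or similarly significant effect
    reasons: List[str] = field(default_factory=list)  # factors shown to reviewer and subject
    status: str = "draft"


def finalize(decision: Decision, review_queue: List[Decision]) -> Decision:
    """Queue significant decisions for a human instead of auto-finalizing them."""
    if decision.significant_effect:
        decision.status = "pending_human_review"
        review_queue.append(decision)
    else:
        decision.status = "final"
    return decision


queue: List[Decision] = []
loan = finalize(Decision("subj-001", "decline", True, ["debt-to-income ratio"]), queue)
tag = finalize(Decision("subj-002", "approve", False), queue)
```

Storing the reasons alongside the outcome is what lets the reviewer, and later the data subject, see how the decision was reached.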

Privacy by design

This is the one that creates the most work for technical teams. Privacy protections must be built into the system architecture from the start, not added later.

That means the choice of model, training methodology, and data infrastructure all need to incorporate privacy controls from day one.

Pseudonymization and data minimization must be the default behaviour of the system, not an optional layer on top.

Data Protection Impact Assessments (DPIAs)

Before deploying any AI system that is likely to create significant risks for individuals, you are required to carry out a DPIA. AI systems that handle sensitive data, perform large-scale profiling, or involve systematic monitoring almost always meet this threshold.

A DPIA is not a form to fill in. It is a structured risk analysis that must honestly assess whether the processing is necessary, what the risks are, and what you are doing to address them.

Understanding the rules is one thing. Understanding where organizations actually fail is more useful.

Federated Learning is not automatically safe

Federated Learning is often presented as a privacy-preserving solution because the raw data never leaves each organization’s own environment. That part is true.

The risk that often gets missed is that the model updates themselves, the weights and gradients shared between parties, can leak information.

A sophisticated attacker can use techniques like model inversion or membership inference to reconstruct training data from those updates. Federated Learning alone is not enough in high-risk data environments. It needs additional controls layered on top.

“We anonymized it” is not a safe assumption

A lot of AI projects move forward on the belief that de-identified data is outside GDPR scope. In most cases, it is not. GDPR sets a very high bar for true anonymization: re-identification must be “no longer reasonably possible,” taking into account the tools and data available to a potential attacker.

Healthcare records, genomic data, and behavioral data are particularly hard to anonymize to that standard. If there is any realistic path to re-identification, the data is still personal data and full GDPR obligations apply.
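A quick way to sanity-check a "de-identified" dataset is to count how many records share each combination of quasi-identifiers. The sketch below computes that smallest group size (the k in k-anonymity); the column names and rows are made up, and a passing check would not by itself prove anonymization under GDPR.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Return k: the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


# Hypothetical records with names already stripped out.
rows = [
    {"zip": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "1011", "birth_year": 1980, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "1012", "birth_year": 1975, "sex": "M", "diagnosis": "rare cancer"},
]

k = smallest_group(rows, ["zip", "birth_year", "sex"])  # k == 1
```

Here k is 1: the third record is unique on postcode, birth year, and sex, so anyone who knows those three facts about a person can find their diagnosis.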

Nobody agreed on who is responsible

In cross-border AI collaborations, the question of who controls the data and who merely processes it is frequently left undefined. This is a real compliance risk. Every party in the collaboration is responsible for their part of the processing chain.

If the organization receiving the data fails to meet its obligations, the organization that sent it can still face regulatory action. Roles and responsibilities need to be agreed and documented in writing before any processing begins.

Cloud infrastructure creates hidden transfer risks

Most AI workloads run on cloud platforms, and those platforms routinely store or process data across multiple countries. If your legal basis for the transfer is SCCs, your Transfer Impact Assessment needs to cover every country where the cloud provider might process the data, not just where the provider’s headquarters is located.

This is a step many organizations skip, and it leaves a significant gap in their compliance position.
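The gap is simple to detect once the processing locations are written down. A minimal sketch, assuming you can get the list of locations from the provider's sub-processor documentation; the function name and all inputs are hypothetical.

```python
def locations_missing_from_tia(processing_locations, tia_covered, no_tia_needed):
    """Countries where data may be processed but no TIA has been documented."""
    return sorted(set(processing_locations) - set(tia_covered) - set(no_tia_needed))


# Hypothetical inputs: locations taken from a provider's sub-processor list.
processing_locations = ["Ireland", "US", "India", "Singapore"]
tia_covered = ["US"]
no_tia_needed = ["Ireland"]  # EEA location, so no Chapter V transfer takes place

gaps = locations_missing_from_tia(processing_locations, tia_covered, no_tia_needed)
```

In this example the TIA only looked at the United States, so India and Singapore come back as undocumented transfer destinations.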

This is the section that most compliance guides skip entirely: the specific technical controls that satisfy data protection by design requirements for AI.

Here is how the primary PETs map to GDPR obligations.

Fully Homomorphic Encryption (FHE)

Fully Homomorphic Encryption allows computation to be performed on encrypted data without ever decrypting it. The data owner retains the encryption key; the processing party sees only ciphertext throughout the entire computation.

This supports Article 25 data protection by design, and data minimization in particular, because the processor never has access to the underlying personal data in plaintext.

For cross-border AI specifically, FHE means that EU personal data can be analyzed by a third-country processor without the data ever being transferred in a usable, identifiable form.

This is not a theoretical capability. FHE is already being explored in regulated healthcare environments for encrypted computation on patient data without decryption.

NHS England and the U.S. National Cancer Institute used PETs, including federated computation and secure processing environments, to conduct joint rare cancer research across borders without moving sensitive patient data.
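To make "computing on ciphertext" concrete, here is a toy sketch. Note the substitution: it uses the Paillier scheme, which is only additively homomorphic, not fully homomorphic, because it fits in a few lines of standard-library Python. The primes are far too small to be secure and all function names are our own; production systems would rely on a vetted FHE library.

```python
import math
import secrets


def keygen(p: int = 1789, q: int = 1861):
    """Toy key generation; these primes are far too small to be secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, needs Python 3.8+
    return n, (lam, mu)


def encrypt(n: int, message: int) -> int:
    """Encrypt an integer in [0, n) under the public key n."""
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # With generator g = n + 1, g^m mod n^2 equals 1 + m*n.
    return ((1 + message * n) * pow(r, n, n_sq)) % n_sq


def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, private_key, ciphertext: int) -> int:
    lam, mu = private_key
    x = pow(ciphertext, lam, n * n)
    return ((x - 1) // n * mu) % n


# The data owner encrypts; the processor only ever handles ciphertexts.
n, private_key = keygen()
readings = [120, 135, 150]
ciphertexts = [encrypt(n, m) for m in readings]

encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, private_key, encrypted_total)  # 405, computed without decrypting inputs
```

The processor runs the loop in the middle without the private key, which is the property the section above describes: the key stays with the data owner and only the final result is decrypted.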

Federated Learning with supplementary controls

Federated Learning lets multiple organizations train a shared AI model without anyone’s raw data ever leaving their own environment.

Each party trains on their own data locally and only shares the model updates. No raw data is pooled, no raw data is transferred. This directly supports GDPR’s data minimization requirement at the architecture level.

The catch is that Federated Learning alone is not enough. The European Data Protection Supervisor has explicitly stated that no single privacy technology is a silver bullet, and that classic security controls must sit alongside any FL deployment.

The specific risk is that model updates can still leak sensitive information if left unprotected.

The solution is to combine FL with additional safeguards. Secure aggregation, Differential Privacy, or Homomorphic Encryption applied to the model updates all significantly reduce that risk and produce a much stronger compliance position than Federated Learning on its own.
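Secure aggregation is the easiest of these safeguards to show in miniature. The sketch below uses pairwise additive masks over integers, assuming model updates have already been quantized to non-negative integers; one function simulates all parties, whereas a real protocol would have each pair derive its mask from a shared secret and handle dropouts.

```python
import secrets

MODULUS = 2**32  # all arithmetic is done modulo this value


def mask_updates(updates):
    """Add pairwise masks that cancel out when all masked updates are summed."""
    n, dim = len(updates), len(updates[0])
    masked = [list(u) for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # Parties i and j agree on one random mask: i adds it, j subtracts it.
            mask = [secrets.randbelow(MODULUS) for _ in range(dim)]
            for d in range(dim):
                masked[i][d] = (masked[i][d] + mask[d]) % MODULUS
                masked[j][d] = (masked[j][d] - mask[d]) % MODULUS
    return masked


def aggregate(masked):
    """What the server computes: it only ever sees masked vectors."""
    dim = len(masked[0])
    return [sum(m[d] for m in masked) % MODULUS for d in range(dim)]


# Hypothetical quantized updates from three hospitals.
updates = [[3, 10], [5, 20], [7, 30]]
masked = mask_updates(updates)
combined = aggregate(masked)  # [15, 60]
```

Each masked vector looks random on its own, so the server learns the aggregate update without seeing any single party's contribution.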

Secure Multi-Party Computation (MPC)

Secure Multi-Party Computation allows multiple parties to jointly compute a function over their combined data without any party learning the other parties’ inputs.

For collaborative AI between competing organizations (anti-money laundering consortia, cross-hospital research networks), MPC enables joint model training where no party exposes its proprietary dataset to the others.

MPC supports confidentiality, data minimization, and controlled-use architectures by allowing parties to compute approved functions without revealing their underlying inputs.
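The simplest building block behind this is additive secret sharing. The following is a minimal sketch of a joint sum, assuming honest parties and integer inputs; the modulus, function names, and input values are illustrative, and real MPC frameworks add protections this toy leaves out.

```python
import secrets

PRIME = 2**61 - 1  # shares live in the integers modulo this prime


def share(value, n_parties):
    """Split a value into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares):
    return sum(shares) % PRIME


# Hypothetical inputs: suspicious-transaction counts held by three banks.
inputs = [12, 7, 30]

# Each bank splits its input and sends one share to every party.
all_shares = [share(v, 3) for v in inputs]

# Party j adds up the shares it received; no party sees another bank's input.
partial_sums = [sum(s[j] for s in all_shares) % PRIME for j in range(3)]

total = reconstruct(partial_sums)  # 49
```

Only the approved function (here, the total) is revealed; any individual share is a uniformly random number that says nothing about the input it came from.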

Trusted Execution Environments (TEEs)

Trusted Execution Environments are hardware-level secure enclaves that protect data and computation from the host system, including from the cloud provider’s own administrators.

TEEs can strengthen the technical safeguards available for sensitive processing by isolating workloads from the host environment and cloud operators. They may support a stronger risk assessment, but they should be evaluated alongside key management, attestation, logging, and the overall threat model.

The multi-PET approach

No single PET addresses every compliance obligation. The most defensible Article 25 implementations combine technologies: FHE or MPC for data-in-use protection, federated architecture to avoid centralizing raw data, TEEs for infrastructure-level trust, and Differential Privacy to bound re-identification risk in model outputs.

The combination of technologies should be selected based on the specific threat model, the sensitivity of the data, and the computational requirements of the AI use case.
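Differential Privacy is the one item in that list not covered above, so here is a minimal sketch of its most basic form: the Laplace mechanism applied to a counting query. The epsilon value and function names are illustrative assumptions, and tuning epsilon for a real release is a policy decision this sketch does not address.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_count(true_count, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one person changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random()
noisy = dp_count(true_count=100, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and a tighter bound on what any output reveals about one individual, at the cost of accuracy.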

Use this checklist before initiating any cross-border AI collaboration involving EU personal data.

Legal basis and governance

Confirm the lawful basis under Article 6 (and Article 9 for special category data) for each processing operation
Identify all data controllers and processors in the collaboration; document joint controller arrangements under Article 26 where applicable
Appoint or confirm your Data Protection Officer (DPO) if required under Article 37
Register the processing activity in your Article 30 Record of Processing Activities (ROPA)

Data transfer mechanisms

Identify all cross-border data flows, including flows to cloud infrastructure and sub-processors
Confirm adequacy status of each destination country using current European Commission decisions
Execute appropriate SCCs (correct module) for transfers to non-adequate countries
Complete a Transfer Impact Assessment for every SCC-based transfer; document supplementary measures
Use the UK IDTA or the UK Addendum to the EU SCCs, plus a Transfer Risk Assessment where required, for restricted transfers from the UK to non-adequate jurisdictions
Review TIAs at least annually and when legal circumstances in the recipient country change

Data protection by design (Article 25)

Document data minimization controls at the system architecture level, not just in policy
Assess which PETs (FHE, FL, MPC, TEE) are technically feasible for the specific AI use case
Implement pseudonymization as a default in training data pipelines; document re-identification risk
Confirm that encryption is applied to data in transit, at rest, and also in use where PETs are deployed
Retain technical evidence of Article 25 compliance (system design documents, data flow diagrams, audit logs)

DPIA and risk management

Conduct a DPIA under Article 35 before deploying high-risk AI processing
If your AI system makes solely automated decisions that produce legal or similarly significant effects, assess whether Article 22 applies. In such cases, ensure appropriate safeguards are in place, including human intervention, the ability to express a view, and the ability to contest the decision
Define and document the human review mechanism for automated decisions
Identify and document the data subject rights process (access, erasure, rectification, portability) for AI-processed data

Ongoing compliance

Schedule periodic review of the lawful basis and data minimization controls as the model evolves
Monitor regulatory developments affecting international transfer frameworks
Conduct vendor due diligence on all sub-processors involved in the AI pipeline
Train all personnel with access to personal data in the AI workflow on relevant GDPR obligations

GDPR compliance for cross-border AI data collaboration is achievable. But contracts and policy documents alone will not get you there. Privacy needs to be built into the technical architecture itself.

Three things matter most. First, get your legal transfer mechanism right: either an adequacy decision you can genuinely rely on, or SCCs backed by a solid Transfer Impact Assessment.

Second, satisfy your privacy by design obligations with real technical controls, not just documentation. Third, be honest about what your de-identification approach actually achieves and whether it meets the GDPR standard.

The organizations doing this well have stopped treating it as purely a legal problem. Privacy-Enhancing Technologies are now mature enough to run at the scale regulated industries require, and regulators are increasingly looking for technical evidence of compliance, not just governance paperwork.

For teams ready to move from framework to implementation, Duality’s platform is worth a serious look.

Being compliant on paper is one thing. Being compliant in production, across organizational boundaries, at regulated-industry scale is a completely different challenge. That is exactly what Duality is built for.

Duality’s platform brings together Fully Homomorphic Encryption, Federated Learning, Confidential Computing, and Secure Multi-Party Computation in one place.

Depending on the architecture, raw data can remain in place, or protected data can be processed inside isolated secure environments. Duality combines PETs such as FHE, federated analytics, confidential computing, and secure aggregation to reduce exposure during computation.

In practice, this means NHS England and the U.S. National Cancer Institute could run joint cancer research across borders without moving a single patient record.

It means financial institutions can collaborate on fraud detection and anti-money laundering models without exposing proprietary transaction data. And it means your team can do all of this up to 30x faster than with alternative methods.

Compliance and useful AI are not a trade-off. With Duality, you get both.

Frequently Asked Questions

How Do You Achieve GDPR Compliance for Cross-Border AI Data Collaboration?

Achieving GDPR compliance for cross-border AI data collaboration requires three layers of work. First, establish a valid legal transfer mechanism for any data leaving the EU or EEA. This typically means Standard Contractual Clauses combined with a Transfer Impact Assessment for transfers to countries without an adequacy decision. Second, satisfy Article 25 (data protection by design) by implementing technical controls that minimize data exposure at the processing layer, such as Privacy-Enhancing Technologies, rather than relying on contractual safeguards alone. Third, complete a Data Protection Impact Assessment before initiating high-risk AI processing, and document the roles and responsibilities of each party in the collaboration through appropriate data processing or joint controller agreements.

What Is the Difference Between Anonymization and Pseudonymization Under GDPR?

Anonymization under GDPR means the data has been irreversibly processed so that re-identification of the individual is no longer reasonably possible. This removes the data entirely from GDPR scope. Pseudonymization replaces directly identifying information with artificial identifiers, but since the original data can be recovered using additional information, the data remains personal data under GDPR and all compliance obligations continue to apply. The practical implication is that most de-identification methods used in AI, including tokenization, k-anonymity, and data masking, produce pseudonymized rather than anonymized data and do not remove GDPR obligations.

Does the EU-U.S. Data Privacy Framework Remove the Need for Standard Contractual Clauses?

For certified U.S. organizations, the EU-U.S. Data Privacy Framework (DPF) adopted in July 2023 allows transfers of EU personal data without implementing additional safeguards such as SCCs. However, the DPF is subject to legal challenge, and best practice in 2026 is to maintain SCCs and TIA documentation in parallel as a fallback, even for DPF-certified recipients. For U.S. organizations that have not self-certified under the DPF, SCCs plus a Transfer Impact Assessment and supplementary technical measures remain mandatory for lawful data transfers from the EU.

What Is a Transfer Impact Assessment and When Is It Required?

A Transfer Impact Assessment (TIA) is a documented analysis of whether the legal environment in the data-receiving country allows the SCCs to work in practice. Specifically, it examines whether local surveillance laws, legal obligations to disclose data to authorities, or lack of effective data subject remedies would undermine the protections the SCCs are meant to provide. A TIA is required for every SCC-based transfer and should be reviewed whenever the legal or factual circumstances in the recipient country change. The EDPB’s guidance provides a six-step methodology for conducting TIAs, and the assessment must document any supplementary technical measures (such as encryption or pseudonymization) that address the residual risks identified.

Can Personal Data Be Fully Removed From GDPR Scope by Using Privacy-Enhancing Technologies?

Using PETs does not automatically remove data from GDPR scope, but it can significantly reduce the level of risk and change the nature of the processing. Fully Homomorphic Encryption allows computation on data that remains encrypted throughout. The processor never accesses plaintext personal data, which reduces the transfer risk substantially and supports Article 25 compliance. However, the encrypted data still derives from personal data and the overall processing activity remains subject to GDPR. The practical effect of strong PET implementation is a more defensible Article 25 compliance position, a stronger TIA conclusion for cross-border transfers, and reduced DPIA risk scores, not full removal from GDPR scope.