Organizations have spent years trying to unlock value from data that sits outside their direct control.
Healthcare providers want to compare outcomes across institutions. Researchers need access to broader patient populations. Public sector organizations need to analyze information distributed across agencies and jurisdictions. AI teams want to build better models using data that cannot be centralized.
The opportunity is enormous.
The challenge is that most of this data lives in different places, follows different structures, and remains subject to privacy, security, governance, and regulatory requirements that make traditional approaches difficult to scale.
Even when organizations agree to collaborate, analysis often stalls before it begins.
Datasets rarely look alike. The same information is represented using different field names, formats, categories, and schemas. Analysts can spend months aligning data structures before meaningful work starts. At the same time, organizations are reluctant to move sensitive information into centralized environments simply to make collaboration possible.
Duality 4.6 was built to address both challenges.
This release introduces Data Harmonization and expanded support for Custom Federated Workloads, enabling organizations to analyze distributed data without restructuring source systems, moving sensitive records, or rebuilding existing analytical workflows.
Data Harmonization for Real-World Data
Federated analysis has traditionally depended on a shared schema.
Every participating organization needed to organize data in a similar way before computations could run successfully. In theory, this sounds straightforward. In practice, it creates one of the largest barriers to collaboration.
Healthcare provides a familiar example.
One hospital records a field called “gender.” Another uses “sex.” One system stores values as “Male” and “Female.” Another uses numerical codes. Similar inconsistencies appear across claims systems, EHRs, registries, research databases, and operational platforms.
The challenge isn’t unique to healthcare. Any organization working across business units, partners, agencies, or jurisdictions encounters the same problem.
Duality 4.6 introduces a comprehensive Data Harmonization capability that enables federated computations across heterogeneous datasets.
Using the new Schema Harmonization Tool, teams can discover remote schemas, define a common analytical model, and configure transformations through a no-code interface.
Users can:
- Map source columns to a common schema
- Normalize values and formats across datasets
- Rename, transform, cast, and group fields
- Apply harmonization rules at execution time
The result is a unified analytical view that does not require participants to redesign source systems or create new copies of data.
Data remains under the control of the organization that owns it. Analysis runs against harmonized representations rather than forcing every participant into a single predefined structure.
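To make the idea concrete, here is a minimal sketch in plain Python of what a column mapping plus value normalization could look like for the "gender" and "sex" example above. It is not the Schema Harmonization Tool's rule format, which is configured through the no-code interface. The rule structure, the helper names, the column names, and the numeric coding (1 and 2) are all assumptions made for illustration.

```python
from typing import Any, Callable

# Hypothetical rule format: source column -> (common column, value transform).
# This is an illustration only, not the product's configuration format.
Rule = tuple[str, Callable[[Any], Any]]

SEX_LABELS = {"male": "M", "female": "F", "m": "M", "f": "F"}
SEX_CODES = {1: "M", 2: "F"}  # assumed coding; real systems differ


def normalize_label(value: Any) -> str:
    """Map free-text labels such as "Male" to a common category."""
    return SEX_LABELS.get(str(value).strip().lower(), "UNKNOWN")


def normalize_code(value: Any) -> str:
    """Map numeric codes to the same common category."""
    try:
        return SEX_CODES.get(int(value), "UNKNOWN")
    except (TypeError, ValueError):
        return "UNKNOWN"


# One rule set per source; both target the same common schema (sex, age).
HOSPITAL_A_RULES: dict[str, Rule] = {
    "gender": ("sex", normalize_label),
    "age_years": ("age", int),
}
HOSPITAL_B_RULES: dict[str, Rule] = {
    "sex": ("sex", normalize_code),
    "age": ("age", int),
}


def harmonize(rows: list[dict], rules: dict[str, Rule]) -> list[dict]:
    """Build rows in the common schema; the source rows are left untouched."""
    return [
        {target: transform(row[source]) for source, (target, transform) in rules.items()}
        for row in rows
    ]


hospital_a = [{"gender": "Male", "age_years": "54"}, {"gender": "Female", "age_years": "61"}]
hospital_b = [{"sex": 2, "age": 47}, {"sex": 1, "age": 70}]

view_a = harmonize(hospital_a, HOSPITAL_A_RULES)
view_b = harmonize(hospital_b, HOSPITAL_B_RULES)
```

Both views now share the same column names and categories, while the original records keep their source structure. Because the rules are applied when the views are built, this mirrors the idea of applying harmonization at execution time rather than rewriting stored data.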
Bring Your Own Workload
Data alignment is only one part of the equation.
Organizations also need the flexibility to run the analyses, models, and workflows that matter to their business.
Duality 4.6 significantly expands support for Custom Federated Workloads, building on the foundation introduced in version 4.5.
Data scientists and analysts can now independently develop, register, configure, and deploy federated workloads without relying on the Duality support team.
Most importantly, users can bring existing Python-based workloads into the platform without repackaging code into containers or rebuilding analytical pipelines from scratch.
Through a self-service registration process, users can define workload parameters, configure execution settings, and publish workloads for federated execution across participating organizations.
This approach allows teams to focus on the analysis itself rather than platform integration.
Existing models can continue to be used. Proven analytical methods can be reused. New workloads can be deployed more quickly and shared across collaborators with significantly less operational overhead.
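As a rough illustration of the self-service idea, the sketch below wraps an existing Python function in a small descriptor that declares its parameters and validates them before execution. The `WorkloadSpec` class and its fields are hypothetical and do not represent Duality's registration format or API. The point is only that the analysis function itself stays unchanged.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class WorkloadSpec:
    """Hypothetical workload descriptor; not Duality's registration format."""

    name: str
    entrypoint: Callable[..., Any]
    parameters: dict[str, type] = field(default_factory=dict)

    def run(self, rows: list[dict], **params: Any) -> Any:
        # Check declared parameters before handing off to the user's code.
        missing = set(self.parameters) - set(params)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        for key, expected in self.parameters.items():
            if not isinstance(params[key], expected):
                raise TypeError(f"{key} must be {expected.__name__}")
        return self.entrypoint(rows, **params)


def count_over_age(rows: list[dict], min_age: int) -> int:
    """Existing analysis code, unchanged: count rows at or above a threshold."""
    return sum(1 for row in rows if row["age"] >= min_age)


spec = WorkloadSpec("count_over_age", count_over_age, {"min_age": int})
result = spec.run([{"age": 54}, {"age": 47}, {"age": 70}], min_age=50)
```

Here `result` is 2. Declaring parameters up front lets a misconfigured run fail before any data is touched.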
How It Works
A typical workflow begins with schema discovery.
Users inspect participating datasets and retrieve structural information without exposing underlying records. Once schemas are available, a common analytical model is defined and mappings are configured across participating sources.
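A simplified way to picture schema discovery is a function that reports column names and observed types but never returns a value from any record. The sketch below assumes rows are available as Python dictionaries and uses a hypothetical `describe_schema` helper. A real implementation would more likely read metadata from the data source than scan rows.

```python
def describe_schema(rows: list[dict]) -> dict[str, list[str]]:
    """Return column names mapped to the type names observed, never the values."""
    schema: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            schema.setdefault(column, set()).add(type(value).__name__)
    return {column: sorted(types) for column, types in schema.items()}


# Illustrative local records; only the structural summary leaves this function.
site_rows = [
    {"gender": "Male", "age_years": 54},
    {"gender": "Female", "age_years": None},
]
schema = describe_schema(site_rows)
```

The returned summary shows that `age_years` sometimes holds missing values, which is the kind of structural detail needed to plan a mapping, without revealing any individual record.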
Fields are aligned. Formats are normalized. Harmonization rules are established.
The analyst then registers a federated workload using their own code and defines the parameters required for execution.
When a session is launched, the workload is distributed across participating organizations and executed against the harmonized datasets.
The outcome is a federated analysis that reflects the meaning of the underlying data rather than the way individual systems happen to store it.
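The execution pattern can be sketched as a local step that runs where the data lives and a combine step that only sees per-site aggregates. This is an in-process simulation under stated assumptions, not the platform's execution engine. The site names, the `local_step` and `combine` functions, and the sample rows are all hypothetical. In a real deployment each local step would run inside the owning organization's environment.

```python
def local_step(rows: list[dict]) -> dict:
    """Runs at each site on harmonized rows; returns aggregates only."""
    ages = [row["age"] for row in rows]
    return {"count": len(ages), "age_sum": sum(ages)}


def combine(partials: list[dict]) -> dict:
    """Runs at the coordinator; sees per-site aggregates, not records."""
    total = sum(p["count"] for p in partials)
    age_sum = sum(p["age_sum"] for p in partials)
    return {"count": total, "mean_age": age_sum / total if total else None}


# Harmonized views held separately at two hypothetical sites.
site_views = {
    "hospital_a": [{"sex": "M", "age": 54}, {"sex": "F", "age": 61}],
    "hospital_b": [{"sex": "F", "age": 47}, {"sex": "M", "age": 70}],
}

partials = [local_step(rows) for rows in site_views.values()]
summary = combine(partials)
```

Because both sites expose the same `age` column after harmonization, the same `local_step` code runs unchanged at each one, and the combined mean matches what a pooled calculation over all four rows would give.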
Why This Matters
Fragmentation has become one of the defining characteristics of modern data environments.
Healthcare organizations operate across multiple EHR platforms. Financial institutions work across jurisdictions and regulatory boundaries. Public sector agencies maintain separate systems with different governance requirements. Enterprises acquire new systems faster than they can standardize existing ones.
The ability to collaborate across these environments increasingly determines the quality of analytics and AI initiatives.
Duality 4.6 reduces the operational work required to make that collaboration possible.
Organizations can continue to govern data locally, preserve existing workflows, and work with heterogeneous datasets while still participating in shared analytics and federated AI initiatives.
The result is a more practical path to extracting value from sensitive, distributed data.
What Else Is New in 4.6
Alongside Data Harmonization and Custom Federated Workloads, Duality 4.6 introduces several additional enhancements:
- Parquet File Support for more efficient data asset registration and access
- Edge Filtering to reduce unnecessary data transfer and improve execution efficiency
- Index Query Optimization to lower compute and memory requirements during federated execution
Together, these capabilities make federated analytics easier to deploy, easier to scale, and easier to integrate into real-world data environments.
As organizations continue to expand their use of AI and advanced analytics, success will increasingly depend on the ability to work across data that is distributed, governed by multiple stakeholders, and structured in different ways.
Duality 4.6 brings that reality closer to everyday practice by making both data alignment and federated execution significantly easier to achieve.