Large enterprises increasingly needed to collaborate on analytics and machine learning using data owned by multiple parties. Banks, telecom operators, insurers, retailers, and platforms each held valuable datasets, but many high-value use cases required working with joint data: risk modeling, fraud detection, forecasting, personalization, and advanced analytics.
However, this collaboration was blocked by a lack of trust and by regulation. Raw datasets could not be shared or centralized, customer data could not be exposed to partners, and no party was willing to rely on a platform operator with access to sensitive information. Traditional data warehouses, data lakes, and third-party analytics platforms were incompatible with these constraints.
To enable joint analytics without data sharing, we built a decentralized DataLab platform that replaced shared datasets with shared computation. Instead of moving data into a central environment, participants connected their data through a virtual data warehouse abstraction, keeping it inside their own infrastructure or controlled environments.
The platform allowed partners to register datasets in a shared data catalog and make them discoverable under explicit rules. Access to data was requested through a marketplace, where permissions, usage restrictions, and collaboration scenarios were enforced by a decentralized protocol rather than manual agreements.
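The catalog and marketplace flow can be illustrated with a minimal in-memory sketch. It is not the platform's actual protocol: the `AccessRule` and `Catalog` classes, the dataset id, and the partner and purpose names are all hypothetical, and a plain Python object stands in for what the text describes as a decentralized protocol.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical rule a data owner attaches to a registered dataset."""
    allowed_partners: frozenset[str]
    allowed_purposes: frozenset[str]


@dataclass
class Catalog:
    """Toy catalog: stores metadata and rules only, never the data itself."""
    entries: dict[str, tuple[str, AccessRule]] = field(default_factory=dict)

    def register(self, dataset_id: str, owner: str, rule: AccessRule) -> None:
        self.entries[dataset_id] = (owner, rule)

    def request_access(self, dataset_id: str, requester: str, purpose: str) -> bool:
        """Grant only if both the requester and the stated purpose are permitted."""
        entry = self.entries.get(dataset_id)
        if entry is None:
            return False
        _owner, rule = entry
        return requester in rule.allowed_partners and purpose in rule.allowed_purposes


catalog = Catalog()
catalog.register(
    "bank.transactions",  # assumed dataset id, for illustration only
    owner="bank",
    rule=AccessRule(frozenset({"telecom"}), frozenset({"fraud_detection"})),
)

print(catalog.request_access("bank.transactions", "telecom", "fraud_detection"))
print(catalog.request_access("bank.transactions", "telecom", "marketing"))
```

The point of the sketch is that a request is evaluated against rules the owner declared up front, so the decision does not depend on a manual agreement negotiated for each request.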
Model development took place in protected DataLab environments using depersonalized data. Analysts worked with familiar tools and languages while the system automatically applied depersonalization and distortion policies. Once models or scripts were ready, they were reviewed and approved before being executed on full-scale data.
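The text does not say which depersonalization or distortion techniques were used. As one possible reading, the sketch below assumes keyed hashing (HMAC-SHA256) for identifiers and bounded multiplicative noise for numeric fields. The policy format, the field names, and the 5% noise level are illustrative assumptions.

```python
import hashlib
import hmac
import random


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash so it cannot be read back directly."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def distort(value: float, relative_noise: float, rng: random.Random) -> float:
    """Scale a numeric value by a uniform factor in [1 - noise, 1 + noise]."""
    return value * (1.0 + rng.uniform(-relative_noise, relative_noise))


def apply_policy(record: dict, policy: dict, secret_key: bytes, rng: random.Random) -> dict:
    """Apply a per-field policy; fields with no policy entry are dropped."""
    out = {}
    for name, value in record.items():
        action = policy.get(name)
        if action == "pseudonymize":
            out[name] = pseudonymize(str(value), secret_key)
        elif action == "distort":
            out[name] = distort(float(value), 0.05, rng)  # assumed 5% noise
        elif action == "keep":
            out[name] = value
    return out


# Hypothetical policy and record, for illustration only.
policy = {"customer_id": "pseudonymize", "balance": "distort", "segment": "keep"}
record = {"customer_id": "C-1001", "balance": 2500.0, "segment": "retail", "phone": "+100000000"}
safe = apply_policy(record, policy, secret_key=b"demo-key", rng=random.Random(0))
print(sorted(safe))
```

Dropping fields that have no policy entry is a deliberate default in this sketch: a column added to the source later stays hidden from analysts until someone classifies it.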
Production execution happened in CleanRooms using trusted execution environments. Calculations ran across multiple parties’ datasets without exposing inputs, intermediate results, or models. Permissions, script approvals, execution logs, and billing events were managed by a decentralized protocol and recorded in an immutable ledger. The platform was delivered as software and protocol, not operated as a centralized service.
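The ledger technology is not specified in the text. To show what immutability means for execution logs, the sketch below uses a simple hash chain, in which every entry commits to the hash of the previous one, so that editing any past event invalidates the chain. The `Ledger` class and the event fields are hypothetical, and a real decentralized ledger would also replicate entries across parties and reach agreement on them.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_hash.encode("ascii") + payload).hexdigest()


class Ledger:
    """Append-only log in which each entry is chained to its predecessor."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain from the start; any edited entry breaks it."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = Ledger()
ledger.append({"type": "script_approved", "script": "fraud_model_v1"})
ledger.append({"type": "execution", "script": "fraud_model_v1", "status": "ok"})
print(ledger.verify())
```

Changing any recorded event after the fact, for example rewriting which script was approved, makes `verify()` return `False`, which is the property an audit of approvals and billing events relies on.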
The platform provided enterprises with a single workspace to discover partner data, develop models, and run joint calculations without sharing raw data.
Participants could register and describe their datasets in a shared data catalog, define access rules, and request partners’ data through a marketplace. Data remained local or in controlled environments, accessed virtually rather than copied.
Teams could develop and train models in protected DataLab environments using depersonalized data and familiar programming languages. Approved models and scripts could then be executed automatically in CleanRooms on full-scale data, with full auditability and governance.
The platform supported activation of approved results only, such as aggregates, scores, reports, or derived datasets, which could be integrated into analytics systems, business processes, or communication channels via API. All calculations were secured by hardware-based trusted execution, decentralized permission management, and immutable execution logs.
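The "approved results only" rule can be pictured as a gate on the way out of the CleanRoom. In the sketch below, the `Result` type, the `release` function, and the set of result kinds are assumptions derived from the examples in the text (aggregates, scores, reports, derived datasets). It does not describe the platform's real API.

```python
from dataclasses import dataclass

# Result kinds named in the text as eligible for activation.
APPROVED_KINDS = {"aggregate", "score", "report", "derived_dataset"}


@dataclass(frozen=True)
class Result:
    """Hypothetical output of a CleanRoom execution."""
    script_id: str
    kind: str
    payload: object


def release(result: Result, approved_scripts: set) -> object:
    """Return the payload only if the script was approved and the kind is releasable."""
    if result.script_id not in approved_scripts:
        raise PermissionError(f"script {result.script_id!r} is not approved")
    if result.kind not in APPROVED_KINDS:
        raise PermissionError(f"result kind {result.kind!r} may not be released")
    return result.payload


approved = {"fraud_model_v1"}  # assumed to come from the review step
print(release(Result("fraud_model_v1", "score", 0.87), approved))
```

Anything that is not an approved result kind, such as raw rows, raises `PermissionError` and never reaches the downstream API.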