Data Paradox: Theory vs. Practice
In modern medicine, there is a fundamental contradiction that experts call the "Data Paradox." On one hand, Artificial Intelligence has the potential to save thousands of lives by learning from real patient histories and clinical cases. On the other hand, this data – Protected Health Information (PHI) – is among the most sensitive and strictly guarded information in the world.
Because of vital regulations like HIPAA and GDPR, most of today’s medical AI models are trained on "sterile" data: textbooks, research papers, or synthetic datasets. They know the theory perfectly, but they lack access to the reality of clinical practice.
Today, we are sharing a case study on a convergence of technologies from Google Research, NVIDIA, Super Protocol, and Yma Health that solves this paradox. By fine-tuning MedGemma 27B within a Verifiable Confidential Computing environment, we have demonstrated that it is possible to bridge the gap between theory and practice – safely training on the most sensitive patient data to achieve a 9.4/10 clinician recommendation score.
Barrier: Security as a Bottleneck for Innovation
For decades, the healthcare industry has sat on mountains of data that could revolutionize chronic disease diagnosis and therapy. However, the risk of data exposure during processing has remained an insurmountable barrier for the entire ecosystem – from agile HealthTech startups to established clinical providers.
Historically, stakeholders were forced to choose between two restrictive paths, neither of which fully solved the "Data Paradox":
- On-premise Silo: Building costly and slow-to-scale infrastructure. These silos lack the specialized compute power (e.g., NVIDIA H200s, B200s) required to handle the extreme VRAM demands of a 27B-parameter model like MedGemma, limiting innovation to the constraints of legacy hardware.
- Traditional Public Cloud: Compromising security for scalability. Traditional clouds require a high degree of subjective trust, as data must be decrypted in memory during active computation – a fundamental deal-breaker for institutional PHI compliance.
The industry reached a stalemate: the data is too sensitive for the traditional cloud, but the models are too large for localized infrastructure. To break this bottleneck, Super Protocol shifted the paradigm from subjective trust to hardware-enforced, verifiable proof of the entire computation environment.
Verification Gap: From Promised Privacy to Provable Trust
In healthcare, privacy must be verifiable, not just promised. While Trusted Execution Environments (TEEs) provide hardware-encrypted isolation for data in use, the TEE alone remains a "black box" – it secures the computation but cannot guarantee the provenance of the code or the integrity of the data pipeline.
Today, the primary barrier to adoption is the TEE orchestration challenge. Despite the availability of advanced TEEs, most clinical and development teams lack the specialized low-level expertise required even to initiate Remote Attestation. Without the ability to cryptographically prove that execution is occurring on a genuine, untampered TEE (such as AMD SEV-SNP or Intel TDX), the trust chain is broken before it even begins.
When deploying large-scale models like MedGemma 27B, this complexity scales further, requiring a unified verification of the entire stack:
- Heterogeneous Attestation: Synchronizing CPU and NVIDIA GPU TEEs to ensure the entire execution environment is secure and the trust boundary remains unbroken as data moves across the PCIe bus.
- Workload Integrity (Code and Data Measurement): Cryptographic verification of unique hashes for the model, execution scripts, and PHI datasets to guarantee that only authorized logic runs on the intended data.
Establishing this trust chain manually is a major hurdle. This "verification gap" – the inability to generate low-level hardware proofs and bind them to high-level AI workloads – means the environment remains unproven. Without such verification, processing PHI relies on subjective trust rather than objective proof, effectively halting the project's viability for clinical use.
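To make the manual burden concrete, the sketch below shows just one link in that chain: measuring the workload artifacts and comparing them against an agreed manifest. This is a minimal illustration in Python with hypothetical file names and placeholder digests; real attestation additionally requires verifying the hardware-signed reports from the CPU and GPU TEEs, which is precisely the part most teams cannot build in-house.

```python
import hashlib
import sys

def sha256_of_file(path: str) -> str:
    """Stream the file through SHA-256 so multi-GB model weights fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical manifest agreed on by all parties before execution;
# the digests below are placeholders, not real hashes.
EXPECTED_MEASUREMENTS = {
    "medgemma-27b.safetensors": "<expected model digest>",
    "finetune.py": "<expected script digest>",
    "phi_dataset.enc": "<expected dataset digest>",
}

def verify_workload(manifest: dict) -> None:
    """Fail closed: refuse to start if any artifact deviates from the manifest."""
    for path, expected in manifest.items():
        if sha256_of_file(path) != expected:
            sys.exit(f"Measurement mismatch for {path}: execution blocked")
    print("All workload measurements match: execution may proceed")

# verify_workload(EXPECTED_MEASUREMENTS)  # would run inside the TEE before training
```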
Enabler: Automating the Trust Chain
To bridge the verification gap, Super Protocol provides a decentralized confidential compute layer that abstracts the complexity of TEE hardware into a universal, ready-to-use infrastructure. It removes the need for manual setup, providing a provider-agnostic environment where security is enforced by the protocol’s architecture rather than a central authority.
By automating the low-level handshake between hardware and software, Super Protocol establishes a seamless Trust Chain that extends from the physical silicon (NVIDIA/AMD/Intel) to the specific AI model and clinical data. This chain delivers:
- Zero-Touch Remote Attestation: The integrity of the TEE is cryptographically verified for each execution, removing the barrier of deep technical expertise and ensuring a pristine environment.
- Workload Integrity & Fail-Closed Enforcement: Precise measurement of the model, execution scripts, and PHI datasets ensures only "Authorized Logic" is maintained. If any component fails validation, execution is blocked before it starts.
- Autonomous Environment Provisioning: Super Protocol dynamically builds the entire secure environment. It handles the secure delivery of encrypted models and datasets directly into the TEE, ensuring that the data and the compute never meet outside the protected environment.
- Decentralized Policy Enforcement: Security boundaries are enforced by the protocol itself. No party – not even the cloud provider or Super Protocol – can bypass these policies or access the execution environment.
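Conceptually, fail-closed enforcement works like a key-release gate: the data decryption key is handed to the workload only after every piece of attestation evidence checks out. The sketch below is a simplified illustration of that pattern, not Super Protocol's actual policy engine or interfaces:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    cpu_report_valid: bool      # e.g., AMD SEV-SNP / Intel TDX report signature verified
    gpu_report_valid: bool      # e.g., NVIDIA GPU attestation report verified
    measurements_match: bool    # model, script, and dataset hashes equal the manifest

def release_key(evidence: Evidence, sealed_key: bytes) -> bytes:
    """Fail closed: any missing proof blocks the job before data is touched."""
    if not (evidence.cpu_report_valid
            and evidence.gpu_report_valid
            and evidence.measurements_match):
        raise PermissionError("Attestation incomplete: key withheld, execution blocked")
    return sealed_key  # only now can the encrypted PHI be decrypted inside the TEE

# A complete evidence bundle releases the key; anything less raises.
key = release_key(Evidence(True, True, True), sealed_key=b"\x00" * 32)
```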
Clinical AI in Practice: Powering Yma Health
Orchestration Impact: Bypassing the TEE Barrier
The collaboration with Super Protocol allowed Yma Health to bypass the TEE orchestration barrier by automating the entire execution lifecycle. By providing what effectively functions as Confidential DevOps-as-a-Service, Super Protocol managed the automated provisioning of TEE-enabled hardware and secure environments.
This transformed the complex MedGemma 27B pipeline into a secure, production-ready solution. The AI team could treat highly sensitive, confidential infrastructure as easily as a public cloud and focus entirely on clinical fine-tuning:
- Instant Compliance Readiness: By ensuring PHI was only decrypted within validated environments, Yma Health achieved a level of data isolation that meets strict institutional requirements without building a custom security stack.
- Verifiable Evidence: Every inference task generates an immutable report that can be independently verified. This provides the Yma team with the "Technical Certainty" required for clinical audits, peer-reviewed validation, and regulatory compliance.
- Operational Agility: By utilizing standard Docker containers, the team deployed the 27B-parameter model on high-performance TEE-enabled GPUs in a fraction of the time typically required for confidential computing setups.
Phase 1: Confidential Fine-Tuning on Real Clinical Data
To address the structural limitations of "sterile" datasets, Yma Health fine-tuned MedGemma 27B using real, protected clinical dialogues.
- Hardware: Training was performed inside TEEs using NVIDIA H200 GPUs paired with AMD Genoa CPUs with SEV-SNP enabled.
- Data Lifecycle: PHI remained encrypted at rest and in transit, with plaintext accessible only within the secure boundary (the envelope-encryption pattern is sketched after this list). Encryption keys were managed within the TEE, ensuring that neither the cloud provider nor Super Protocol could intercept the data during processing. All data, intermediate training states, and the runtime environment itself were automatically wiped upon completion. The resulting MedGemma model was delivered exclusively to Yma Health in an encrypted format.
- Zero-Access Execution: The entire process was cryptographically isolated, ensuring that raw PHI was never exposed to human operators or external systems.
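The lifecycle above follows a standard envelope-encryption pattern. Below is a minimal sketch of the client-side half using the widely available Python `cryptography` package; the locally generated key stands in for the TEE-managed key, which in production never leaves the attested environment:

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Illustration only: in production the key is generated and held inside the TEE.
key = AESGCM.generate_key(bit_length=256)

def seal(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt PHI before it leaves the data owner's perimeter."""
    nonce = os.urandom(12)  # unique per message, prepended to the ciphertext
    return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

def unseal(blob: bytes, key: bytes) -> bytes:
    """Runs only inside the attested TEE, where the key is released."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

record = b'{"patient_id": "redacted", "note": "clinical dialogue ..."}'
assert unseal(seal(record, key), key) == record
```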
Phase 2: Training Methodology & Tooling
To align MedGemma with real-world clinical reasoning, Yma Health implemented a specialized training stack where the methodology was optimized for medical accuracy and the engine for high-performance execution.
Two-Stage Alignment:
- Supervised Fine-Tuning (SFT): Focused on factual grounding – teaching the model how clinicians structure explanations, handle uncertainty, and use patient-facing language.
- Direct Preference Optimization (DPO): Aligned responses with physician preferences to ensure practical usefulness and establish safe communication boundaries.
Engine:
- Unsloth (Training Framework): Unsloth is a modern training framework with significant optimizations for LLM fine-tuning, reporting up to 2x faster training with roughly 70% less memory usage – savings that make confidential fine-tuning of large models economically feasible (a condensed sketch of the two-stage pipeline follows).
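For orientation, here is a condensed sketch of how the two stages compose on top of Unsloth and TRL. The checkpoint name, datasets, and hyperparameters are illustrative placeholders (exact argument names vary across library versions), not Yma Health's actual configuration:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig, DPOTrainer, DPOConfig
from datasets import Dataset

# Load the base model in 4-bit and attach LoRA adapters (Unsloth's memory savings).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/medgemma-27b-text-it",  # illustrative checkpoint name
    max_seq_length=4096,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)

# Toy stand-ins for the curated clinical data described in Phase 3.
sft_dataset = Dataset.from_list([{"text": "Patient: ...\nClinician: ..."}])
pref_dataset = Dataset.from_list(
    [{"prompt": "...", "chosen": "preferred answer", "rejected": "weaker answer"}]
)

# Stage 1 - SFT: factual grounding on clinician-structured dialogues.
SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=sft_dataset,
    args=SFTConfig(per_device_train_batch_size=2, num_train_epochs=1),
).train()

# Stage 2 - DPO: align outputs with physician preferences.
DPOTrainer(
    model=model,
    ref_model=None,  # with LoRA adapters, the frozen base model acts as reference
    train_dataset=pref_dataset,
    tokenizer=tokenizer,
    args=DPOConfig(beta=0.1),
).train()
```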
Phase 3: Data Curation & Domain Focus
The curation process prioritized clinical relevance and evidence quality over sheer volume, focusing on a high-signal 120,000-record dataset. To solve the "Data Paradox," the training centered on Metabolic Health, specifically GLP-1 receptor agonist (GLP-1RA) therapies and related cardiometabolic disorders.
The domain was structured into two subdomains:
- Approved Indications: Efficacy, safety profiles, and patient outcomes for each GLP-1RA medication, including the most recently approved drugs.
- Related Cardiometabolic Disorders: Risk factors, prevention, and lifestyle influences for conditions treated with GLP-1RAs.
This focused approach demonstrates Yma’s repeatable methodology: the same framework can be applied to any medical domain.
The training dataset was organized across multiple functional layers (a hypothetical blending sketch follows the list):
- Medical Guidelines & Protocols: ~100 documents from leading medical associations (published within the last 2-3 years) to ensure current treatment approaches.
- Scientific Studies & Reviews: ~200 research publications covering efficacy, side effects, drug interactions, and outcomes across different patient populations. Emphasis on large-scale meta-analyses for data reliability.
- General Medical Knowledge: Medical textbooks and open-source datasets establishing a foundational understanding of terminology and pathophysiology.
- Real-World Medical Conversations: Transcripts of actual medical consultations (with patient consent) enabling natural, empathetic communication patterns.
- Safety & Awareness Datasets: Specialized datasets teaching appropriate caution with sensitive inquiries and appropriate redirection to healthcare professionals when AI confidence is low.
Phase 4: Confidential Inference
Confidential fine-tuning alone is not enough for medical AI. For a model to be useful in practice, clinicians must be able to run inference on real patient data under the same privacy guarantees as during training.
- Deployment: The adapted MedGemma 27B model was deployed for inference using NVIDIA H200 GPUs paired with AMD Genoa CPUs (SEV-SNP) in TEE mode.
- Serving Engine (vLLM): Inference was served using vLLM, a production-grade engine featuring PagedAttention and support for high-concurrency workloads. It was selected as the optimal framework for serving the Yma-fine-tuned MedGemma 27B model in confidential mode, handling realistic clinical workloads through an OpenAI-compatible API (client sketch after this list).
- Confidential Access: External access to the inference service was provided through Super Protocol’s confidential tunnels. This architecture ensured that all clinical queries remained encrypted and strictly confined to the TEE, preventing any exposure to the public network.
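Because vLLM exposes an OpenAI-compatible API, client code stays entirely conventional; only the endpoint changes. A minimal sketch follows, with a hypothetical tunnel URL and served-model name standing in for the actual Yma deployment:

```python
from openai import OpenAI

# The base_url points at the confidential tunnel (hypothetical endpoint);
# traffic stays encrypted end-to-end into the TEE where vLLM runs.
client = OpenAI(
    base_url="https://tunnel.example.com/v1",  # placeholder, not a real endpoint
    api_key="EMPTY",  # vLLM accepts any token unless auth is configured
)

response = client.chat.completions.create(
    model="yma-medgemma-27b",  # illustrative served-model name
    messages=[
        {"role": "system", "content": "You are a clinical assistant."},
        {"role": "user", "content": "A patient on a GLP-1RA reports severe, "
                                    "persistent abdominal pain. What should they do?"},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
```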
Result: Clinical Validation and Expert Feedback
The core proof of this project’s success is the high rating given by medical professionals. More than ten independent practicing endocrinologists from UAE hospital networks validated the adapted MedGemma 27B model using a 100-question set in both English and Arabic.
Key Performance Metrics
- 9.4/10 Overall Recommendation Score: Clinicians overwhelmingly favored the fine-tuned model for professional use, citing its reliability and accuracy.
- 4.6/5 Average Rating: The model consistently scored high across three critical dimensions: Correctness, Safety, and Practical Usefulness.
- Clinical Superiority: Specialists noted significant improvements in handling symptom progression and treatment-related reasoning compared to the baseline model.
Independent Benchmarking
In blind evaluations comparing Yma's MedGemma model against ChatGPT and human doctors, clinicians used a 5-point Likert scale to assess performance (an illustrative score-aggregation sketch follows the chart below).
- Safety (79% Agreement): Physicians rated Yma’s safety levels significantly higher than ChatGPT (70%) and remarkably close to the human doctor baseline (82%).
- Conciseness (82% Agreement): The model provided precise medical answers without "filler" text, far outperforming ChatGPT's 64%.

Image 1: Stacked barplots showing Safety and Conciseness benchmarks
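The write-up does not specify how "% Agreement" was derived from the 5-point scale; a common convention is the "top-two-box" share, i.e. the proportion of ratings of 4 or 5. Purely as a hypothetical illustration of that convention:

```python
def agreement(ratings: list, threshold: int = 4) -> float:
    """Share of 5-point Likert ratings at or above the threshold ('top-two-box')."""
    return 100 * sum(r >= threshold for r in ratings) / len(ratings)

# Illustrative ratings only - not the study's raw data.
sample = [5, 4, 4, 3, 5, 4, 5, 2, 4, 4]
print(f"Agreement: {agreement(sample):.0f}%")  # -> 80%
```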
Demonstration: Clinical Reasoning in Action
The validation sessions demonstrated that the model identifies life-threatening complications that general-purpose models often miss.
- Catching "Red Flags": As shown in Image 2, the model correctly identified potential pancreatitis in a patient on Mounjaro and immediately advised urgent medical attention.
- Foundation vs. Specialized AI: This leap in utility confirms that while foundation medical models provide a powerful baseline, they require fine-tuning on real clinical data to be truly practical.

Image 2: Smartphone screens showing Ozempic/Mounjaro side effect dialogues
Yma Health Quote:
Instruction tuning made LLMs responsive. Confidential inference made them respect user privacy. Confidential fine-tuning makes them stable and practical while caring about overall data security. Just imagine: you can replace tons of verbose instructions with only a few compact adapters that focus the model on what matters most. Thanks to open-source enthusiasts worldwide – and with Super Protocol’s confidential computing infrastructure, this is what we’ve just achieved!
– Daniil Pimanov, Head of AI, Yma Health
Conclusion: Breaking the Paradox
This project demonstrates a new paradigm of trust where Super’s verifiable Zero-Trust architecture and confidential computing transform sensitive medical data into life-saving innovations. It provides a clear framework for how foundation models like MedGemma 27B can reach their true potential and be safely applied to real-world clinical cases.
By shifting the foundation of security from contracts to architecture, we have shown that hospitals and AI developers no longer need to export sensitive data or rely on trust-based controls. This is the definitive path for bringing powerful AI into healthcare practice, while maintaining absolute privacy and regulatory compliance.