Synthetic Data Foundry — GANs · Diffusion · Tabular · Differential Privacy
Real customer data is a liability and a bottleneck. The Foundry generates statistically faithful, privacy-preserving synthetic data — tabular, text, image, time-series — so teams can train, test and audit models without ever touching production records.
What the lab is testing right now
Benchmarking on 18 enterprise schemas for utility, fidelity and privacy.
DP fine-tuning + retrieval-conditioned generation for support transcripts and clinical notes.
Synthetic transaction streams that pass downstream fraud-model evals within 2 pts of real.
Synthetic populations for stress-testing models on edge cases and protected classes.
Everything the lab ships
- Generator library: Tabular, text, image and time-series generators with utility + privacy reports per release.
- DP toolkit: ε / δ accounting, privacy-budget tracking, formal disclosure-risk certificates.
- Utility evals: TSTR, downstream-task and population-statistics test suites, pass/fail in CI.
- Audit datasets: Curated synthetic populations for fairness, robustness and edge-case stress.
- Schema-aware connectors: Snowflake, BigQuery, Databricks, Postgres; discover schema, generate, write back.
- Synthetic Data Principal
- Generative Modelling Researcher
- Privacy & DP Specialist
- Eval / Utility Engineer
POST /v1/foundry/generate
{
"schema": "snowflake://prod.crm.accounts",
"rows": 500000,
"method": "tabular_diffusion_v3",
"privacy": { "dp_epsilon": 1.0, "delta": 1e-6 },
"utility_evals": ["tstr_classifier", "population_stats"],
"egress": "synth.warehouse.eu-west"
}
→ 200 {
"job_id": "f_4421",
"rows_generated": 500000,
"tstr_score": 0.93,
"stats_drift": 0.04,
"dp_certificate": "axp/dp/cert/8819.pdf"
}
Weeks 1–8 · first synthetic corpus shipped by week 3
- 1 · Weeks 1–2 · Schema + utility baseline
Discover schemas, agree utility tests, set DP budget.
- 2 · Weeks 2–5 · First synthetic corpus
Generate, evaluate, sign DP certificate, ship to dev / test environments.
- 3 · Weeks 5–8 · Production loop
CI integration, audit datasets, automated regeneration on schema drift.
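As a rough illustration of what the CI integration step could look like, the sketch below gates a release on the fields returned by the generate call shown earlier (`tstr_score`, `stats_drift`, `dp_certificate`). The function name `gate_release` and both threshold values are assumptions made for this example; the source does not state the actual pass/fail limits.

```python
from dataclasses import dataclass, field

# Hypothetical thresholds, chosen only for illustration.
MIN_TSTR_SCORE = 0.90
MAX_STATS_DRIFT = 0.05


@dataclass
class GateResult:
    passed: bool
    failures: list = field(default_factory=list)


def gate_release(job: dict,
                 min_tstr: float = MIN_TSTR_SCORE,
                 max_drift: float = MAX_STATS_DRIFT) -> GateResult:
    """Check a generate-job response against utility and privacy gates."""
    failures = []

    tstr = job.get("tstr_score")
    if tstr is None or tstr < min_tstr:
        failures.append(f"tstr_score {tstr} is below {min_tstr}")

    drift = job.get("stats_drift")
    if drift is None or drift > max_drift:
        failures.append(f"stats_drift {drift} is above {max_drift}")

    # A corpus without a DP certificate reference is treated as a failure.
    if not job.get("dp_certificate"):
        failures.append("missing dp_certificate")

    return GateResult(passed=not failures, failures=failures)


# Response body copied from the API example in this page.
example_job = {
    "job_id": "f_4421",
    "rows_generated": 500000,
    "tstr_score": 0.93,
    "stats_drift": 0.04,
    "dp_certificate": "axp/dp/cert/8819.pdf",
}

if __name__ == "__main__":
    result = gate_release(example_job)
    print("PASS" if result.passed else "FAIL", result.failures)
```

In a pipeline, a non-empty `failures` list would typically map to a non-zero exit code so the corpus never reaches dev / test environments.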
Productionised by these squads
Receipts, not just thesis
- Tabular diffusion outperforms CTGAN on enterprise schemas · ICML Workshop on Synthetic Data · 2025
- Audit-by-replay: stress-testing fairness with curated synthetic populations · AXP Internal Whitepaper · 2026
What partners actually ask
Under formal differential privacy with ε≤1.0 and signed certificates — yes. We publish the disclosure-risk number, every time.
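To make the ε ≤ 1.0 budget concrete, here is a minimal sketch of privacy-budget tracking under basic sequential composition, where the ε and δ of successive releases simply add up. The class `PrivacyBudget` is hypothetical and is not the Foundry's DP toolkit; the source does not say which accounting method the toolkit uses, and tighter accountants (advanced composition, Rényi DP) exist.

```python
class PrivacyBudget:
    """Tracks cumulative (epsilon, delta) spend under basic composition."""

    def __init__(self, epsilon_cap: float, delta_cap: float):
        self.epsilon_cap = epsilon_cap
        self.delta_cap = delta_cap
        self.epsilon_spent = 0.0
        self.delta_spent = 0.0

    def can_spend(self, epsilon: float, delta: float) -> bool:
        # Small relative tolerance so float rounding does not reject
        # a spend that lands exactly on the cap.
        tol = 1 + 1e-9
        return (self.epsilon_spent + epsilon <= self.epsilon_cap * tol
                and self.delta_spent + delta <= self.delta_cap * tol)

    def spend(self, epsilon: float, delta: float) -> None:
        if epsilon < 0 or delta < 0:
            raise ValueError("epsilon and delta must be non-negative")
        if not self.can_spend(epsilon, delta):
            raise RuntimeError("privacy budget exceeded")
        self.epsilon_spent += epsilon
        self.delta_spent += delta

    def remaining(self) -> tuple:
        return (max(0.0, self.epsilon_cap - self.epsilon_spent),
                max(0.0, self.delta_cap - self.delta_spent))


if __name__ == "__main__":
    # Caps taken from the API example: dp_epsilon 1.0, delta 1e-6.
    budget = PrivacyBudget(epsilon_cap=1.0, delta_cap=1e-6)
    budget.spend(0.5, 5e-7)   # first release
    budget.spend(0.5, 5e-7)   # second release uses up the rest
    print(budget.remaining())
```

Any further `spend` call after the two releases above raises `RuntimeError`, which is the behaviour a budget tracker needs: refuse the release instead of silently exceeding the agreed ε.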
Utility evals (TSTR, population stats) gate every release. We don't ship corpora that fail.
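For readers unfamiliar with TSTR (train on synthetic, test on real), the sketch below shows the idea with a tiny nearest-centroid classifier in plain Python. The model choice, the toy rows and the function names are all assumptions for illustration; the source does not specify which classifier or metric sits behind its `tstr_classifier` eval.

```python
def fit_centroids(rows, labels):
    """Compute the mean feature vector for each class label."""
    sums, counts = {}, {}
    for x, y in zip(rows, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        for i, value in enumerate(x):
            sums[y][i] += value
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def tstr_accuracy(synth_rows, synth_labels, real_rows, real_labels):
    """Train on synthetic rows only, then score accuracy on real rows."""
    centroids = fit_centroids(synth_rows, synth_labels)
    hits = sum(predict(centroids, x) == y
               for x, y in zip(real_rows, real_labels))
    return hits / len(real_rows)


# Toy data, invented for this example.
synth_X = [[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.1, 0.8]]
synth_y = [0, 0, 1, 1]
real_X = [[0.2, 0.1], [0.05, 0.3], [1.0, 0.9], [0.8, 1.2]]
real_y = [0, 0, 1, 1]

if __name__ == "__main__":
    print(tstr_accuracy(synth_X, synth_y, real_X, real_y))
```

A release gate would compare this score with the same model trained on real data; a small gap suggests the synthetic corpus preserves the signal the downstream task needs.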
Tabular (Snowflake / BQ / Databricks / Postgres), text, image, time-series. Schema-aware connectors discover and write back.
Yes — that's a primary use case. Curated synthetic populations stress-test on protected classes without touching production.
Co-build Synthetic Data Foundry with us in Weeks 1–8.
We'll respond within one business day with a scoping note, a fixed-price outcome contract, and a named principal cleared for your domain. Design partners get first-look access, joint publication rights and roadmap influence.
- Outcome-priced, no T&M.
- Sovereign by default: your data, your region, your keys.
- Refund-backed if the contracted KPI isn't hit.
- Joint publication rights and conference slots.