Lab L5 · Live · Data generation & privacy

Synthetic Data Foundry — GANs · Diffusion · Tabular · Differential Privacy

Real customer data is a liability and a bottleneck. The Foundry generates statistically faithful, privacy-preserving synthetic data — tabular, text, image, time-series — so teams can train, test and audit models without ever touching production records.

Research thesis

If your team needs prod data to ship, you've built a privacy debt machine. Synthetic-first is faster, safer, cheaper.

Cold-start accuracy

+22 pts

Time-to-test-data

6w → 2h

Differential-privacy budget

ε ≤ 1.0


Active experiments

What the lab is testing right now

Tabular diffusion vs CTGAN

Benchmarking on 18 enterprise schemas for utility, fidelity and privacy.

Privacy-preserving text

DP fine-tuning + retrieval-conditioned generation for support transcripts and clinical notes.

Time-series fidelity

Synthetic transaction streams that pass downstream fraud-model evals within 2 pts of real data.

Audit-by-replay

Synthetic populations for stress-testing models on edge cases and protected classes.
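The DP fine-tuning experiment builds on the core step of DP-SGD: clip each example's gradient to a fixed L2 norm, sum the clipped gradients, add Gaussian noise scaled to that norm, then average. A minimal pure-Python sketch of that aggregation step follows. The function names, the flat-list gradient representation and the example values are illustrative assumptions rather than the lab's training code, and turning `noise_multiplier` into an (ε, δ) figure needs a separate privacy accountant.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(v: Vector, max_norm: float) -> Vector:
    """Scale v down so its L2 norm is at most max_norm (per-example clipping)."""
    norm = l2_norm(v)
    if norm <= max_norm:
        return list(v)
    scale = max_norm / norm
    return [x * scale for x in v]


def dp_sgd_aggregate(per_example_grads: List[Vector], max_norm: float,
                     noise_multiplier: float,
                     rng: Optional[random.Random] = None) -> Vector:
    """Clip, sum, add N(0, (noise_multiplier * max_norm)^2) noise per coordinate, average."""
    rng = rng or random.Random()
    total = [0.0] * len(per_example_grads[0])
    for g in per_example_grads:
        for i, x in enumerate(clip(g, max_norm)):
            total[i] += x
    # Noise std is tied to the clipping norm, which bounds any one example's influence.
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [x / len(per_example_grads) for x in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
    print(dp_sgd_aggregate(grads, max_norm=1.0, noise_multiplier=1.1, rng=random.Random(0)))
```

Clipping is what makes the noise meaningful: without it, a single outlier record could dominate the update and no finite noise level would hide it.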

Shippable artefacts

Everything the lab ships

Generator library
Tabular, text, image and time-series generators with utility + privacy reports per release.
DP toolkit
ε / δ accounting, privacy-budget tracking, formal disclosure-risk certificates.
Utility evals
TSTR (train on synthetic, test on real), downstream-task and population-statistics test suites, with pass/fail gates in CI.
Audit datasets
Curated synthetic populations for fairness, robustness and edge-case stress.
Schema-aware connectors
Snowflake, BigQuery, Databricks, Postgres — discover schema, generate, write back.
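To make "ε / δ accounting" and "privacy-budget tracking" concrete: under basic sequential composition, releasing results from mechanisms with parameters (ε1, δ1) through (εk, δk) on the same data is (ε1 + … + εk, δ1 + … + δk)-DP. The sketch below tracks spend against a total budget using that rule. `PrivacyAccountant` and its methods are hypothetical names, and production accountants (for example Rényi-DP based ones) give tighter bounds than simple summation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

_TOL = 1e-12  # tolerance for floating-point rounding when comparing spend to budget


@dataclass
class PrivacyAccountant:
    """Hypothetical per-dataset budget tracker using basic sequential composition."""
    epsilon_budget: float  # e.g. 1.0, matching an epsilon <= 1.0 target
    delta_budget: float    # e.g. 1e-6
    spent: List[Tuple[str, float, float]] = field(default_factory=list)

    def total_spent(self) -> Tuple[float, float]:
        # Basic composition: epsilons and deltas of successive releases add up.
        return (sum(e for _, e, _ in self.spent), sum(d for _, _, d in self.spent))

    def remaining(self) -> Tuple[float, float]:
        eps, delta = self.total_spent()
        return self.epsilon_budget - eps, self.delta_budget - delta

    def can_afford(self, epsilon: float, delta: float = 0.0) -> bool:
        eps, d = self.total_spent()
        return (eps + epsilon <= self.epsilon_budget + _TOL
                and d + delta <= self.delta_budget * (1 + _TOL))

    def charge(self, label: str, epsilon: float, delta: float = 0.0) -> None:
        """Record a DP release, refusing it if it would exceed the budget."""
        if not self.can_afford(epsilon, delta):
            raise ValueError(f"{label}: release would exceed the privacy budget")
        self.spent.append((label, epsilon, delta))
```

For example, two corpora charged at (0.6, 5e-7) and (0.4, 5e-7) use up an (ε = 1.0, δ = 1e-6) budget exactly, and any further release on the same source data is refused.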

Lab team

Synthetic Data Principal
Generative Modelling Researcher
Privacy & DP Specialist
Eval / Utility Engineer

Partners we collaborate with

Snowflake · Databricks · Microsoft Fabric · OpenAI · Hugging Face · MOSTLY AI

Example output · Job · foundry.generate (JSON)

POST /v1/foundry/generate
{
  "schema":       "snowflake://prod.crm.accounts",
  "rows":         500000,
  "method":       "tabular_diffusion_v3",
  "privacy":      { "dp_epsilon": 1.0, "delta": 1e-6 },
  "utility_evals":["tstr_classifier", "population_stats"],
  "egress":       "synth.warehouse.eu-west"
}
→ 200 {
  "job_id":         "f_4421",
  "rows_generated": 500000,
  "tstr_score":     0.93,
  "stats_drift":    0.04,
  "dp_certificate": "axp/dp/cert/8819.pdf"
}
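A minimal sketch of submitting the job above with only the Python standard library. The host in `BASE_URL`, the bearer-token header and the helper names are assumptions for illustration; only the `/v1/foundry/generate` path and the request fields come from the example.

```python
import json
import urllib.request

BASE_URL = "https://foundry.example.com"  # hypothetical host; the page shows only the path

EXAMPLE_PAYLOAD = {
    "schema": "snowflake://prod.crm.accounts",
    "rows": 500_000,
    "method": "tabular_diffusion_v3",
    "privacy": {"dp_epsilon": 1.0, "delta": 1e-6},
    "utility_evals": ["tstr_classifier", "population_stats"],
    "egress": "synth.warehouse.eu-west",
}


def build_generate_request(payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a generation job."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/foundry/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # auth scheme is an assumption
        },
        method="POST",
    )


def submit_generation_job(payload: dict, token: str) -> dict:
    """Send the request and decode the JSON job summary (job_id, tstr_score, ...)."""
    with urllib.request.urlopen(build_generate_request(payload, token), timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the payload easy to check in CI without network access.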

Engagement timeline

Weeks 1–8 · first synthetic corpus shipped by week 3

1 · Weeks 1–2 · Schema + utility baseline
Discover schemas, agree utility tests, set DP budget.
2 · Weeks 2–5 · First synthetic corpus
Generate, evaluate, sign DP certificate, ship to dev / test environments.
3 · Weeks 5–8 · Production loop
CI integration, audit datasets, automated regeneration on schema drift.
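Automated regeneration on schema drift needs a way to notice that a source table has changed. One simple approach, sketched here under the assumption that a schema is a mapping of column names to type strings, is to diff the snapshot a corpus was generated from against the current schema and trigger regeneration on any non-empty diff. `SchemaDiff` and `diff_schemas` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Schema = Dict[str, str]  # column name -> declared type, e.g. {"id": "NUMBER"}


@dataclass
class SchemaDiff:
    added: List[str]
    removed: List[str]
    retyped: List[Tuple[str, str, str]]  # (column, old type, new type)

    @property
    def drifted(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def diff_schemas(old: Schema, new: Schema) -> SchemaDiff:
    """Compare a generation-time schema snapshot with the current source schema."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(
        (col, old[col], new[col])
        for col in set(old) & set(new)
        if old[col] != new[col]
    )
    return SchemaDiff(added, removed, retyped)


if __name__ == "__main__":
    snapshot = {"id": "NUMBER", "region": "VARCHAR", "arr": "NUMBER"}
    current = {"id": "NUMBER", "region": "VARCHAR", "arr": "FLOAT", "churned": "BOOLEAN"}
    diff = diff_schemas(snapshot, current)
    if diff.drifted:
        print("regenerate:", diff)
```

A real pipeline would likely also compare constraints and value distributions, but a structural diff is the cheapest first trigger.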

Flagship pods

Productionised by these squads

Cold-Start Model Pod

Privacy & DP Pod

Audit Population Pod

Synthetic Time-Series Pod

Selected publications

Receipts, not just a thesis

Tabular diffusion outperforms CTGAN on enterprise schemas
ICML Workshop on Synthetic Data · 2025
Audit-by-replay: stress-testing fairness with curated synthetic populations
AXP Internal Whitepaper · 2026

FAQs

What partners actually ask

Is synthetic data really safe?

Yes, under formal differential privacy with ε ≤ 1.0 and a signed certificate for every release. We publish the disclosure-risk number every time.
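For reference, the ε figure comes from the standard definition of (ε, δ)-differential privacy: a randomised mechanism M satisfies it if, for every pair of datasets D and D′ that differ in a single record and every set of outputs S,

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```

At ε = 1.0 the multiplicative factor is e ≈ 2.72, so changing any one record can shift the probability of any outcome by at most about 2.7×, plus the additive slack δ (1e-6 in the example job above).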

Do downstream models suffer?

Utility evals (TSTR, population stats) gate every release; we don't ship corpora that fail them.
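A release gate of this kind might be sketched as follows. Two assumptions, neither stated on this page: the TSTR check compares a model trained on synthetic data with one trained on real data (TRTR), both scored on the same held-out real test set; and "population stats" drift is taken as the largest total variation distance between real and synthetic categorical marginals. The default thresholds (a 2-point gap, echoing the fraud-model target above, and 0.05 drift) are hypothetical placeholders to be agreed during the utility baseline.

```python
from collections import Counter
from typing import Dict, List, Sequence

Rows = List[Dict[str, str]]


def total_variation(real: Sequence[str], synth: Sequence[str]) -> float:
    """TVD between two empirical categorical distributions: 0.5 * sum |p - q|."""
    p, q = Counter(real), Counter(synth)
    n_p, n_q = len(real), len(synth)
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in set(p) | set(q))


def max_marginal_drift(real_rows: Rows, synth_rows: Rows, columns: List[str]) -> float:
    """Worst per-column TVD across the listed categorical columns."""
    return max(
        total_variation([r[c] for r in real_rows], [s[c] for s in synth_rows])
        for c in columns
    )


def release_gate(trtr_score: float, tstr_score: float, drift: float,
                 max_gap: float = 0.02, max_drift: float = 0.05) -> bool:
    """Pass only if TSTR is within max_gap (0.02 = 2 pts on a 0-1 scale) of TRTR
    and the worst marginal drift is at most max_drift."""
    return (trtr_score - tstr_score) <= max_gap and drift <= max_drift
```

Wired into CI, `release_gate` returning `False` would block the corpus from being written to the egress target.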

What schemas are supported?

Tabular (Snowflake / BigQuery / Databricks / Postgres), text, image and time-series. Schema-aware connectors discover the schema and write results back.

Can we audit fairness with this?

Yes, that's a primary use case. Curated synthetic populations stress-test models across protected classes without touching production data.
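One way such an audit might be scored, assuming each synthetic record carries a protected-group label alongside the true and predicted outcome: compute a metric per group and report the largest gap between groups. The sketch uses per-group accuracy; the function names and the choice of metric are illustrative, and real audits usually examine several metrics (for example false-positive and false-negative rates) rather than one.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple

# One audit record: (protected group label, true label, predicted label)
Record = Tuple[Hashable, int, int]


def accuracy_by_group(records: Iterable[Record]) -> Dict[Hashable, float]:
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}


def max_group_gap(records: Iterable[Record]) -> float:
    """Largest accuracy difference between any two groups (0.0 means parity)."""
    acc = accuracy_by_group(records)
    return max(acc.values()) - min(acc.values())
```

Because the populations are synthetic, small or rare groups can be oversampled deliberately so each group's metric rests on enough records to be meaningful.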

Design-partner programme · L5 Synthetic Data Foundry

Co-build the Synthetic Data Foundry with us over an eight-week engagement.

We'll respond within one business day with a scoping note, a fixed-price outcome contract, and a named principal cleared for your domain. Design partners get first-look access, joint publication rights and roadmap influence.

• Outcome-priced — no T&M.
• Sovereign by default — your data, your region, your keys.
• Refund-backed if the contracted KPI isn't hit.
• Joint publication rights and conference slots.