Lab L5 · Live · Data generation & privacy

Synthetic Data Foundry
GANs · Diffusion · Tabular · Differential Privacy

Real customer data is a liability and a bottleneck. The Foundry generates statistically faithful, privacy-preserving synthetic data — tabular, text, image, time-series — so teams can train, test and audit models without ever touching production records.

Research thesis
If your team needs prod data to ship, you've built a privacy debt machine. Synthetic-first is faster, safer, cheaper.
Cold-start accuracy: +22 pts
Time-to-test-data: 6 weeks → 2 hours
Privacy budget (disclosure risk): ε ≤ 1.0
Active experiments

What the lab is testing right now

Tabular diffusion vs CTGAN

Benchmarking on 18 enterprise schemas for utility, fidelity and privacy.
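Tabular diffusion models learn to reverse a fixed noising process. The sketch below shows only that forward (noising) half for the numeric columns of a single record, assuming a DDPM-style Gaussian process with a linear β schedule. T = 1000 and the β range are common defaults, not the lab's settings; categorical columns usually need a separate discrete process, and neither the denoising network nor CTGAN is shown.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T spaced linearly (common DDPM defaults, assumed here)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 1..t; alpha_bar(betas, 0) == 1."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    a_bar = alpha_bar(betas, t)
    return [
        math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * rng.gauss(0.0, 1.0)
        for x in x0
    ]


betas = linear_beta_schedule(T=1000)
rng = random.Random(0)
row = [0.5, -1.2, 2.0]  # one record's numeric columns, already standardised
slightly_noisy = forward_noise(row, t=10, betas=betas, rng=rng)  # still close to row
almost_pure_noise = forward_noise(row, t=1000, betas=betas, rng=rng)  # signal nearly gone
```

Because the closed form jumps straight to any step t, training can sample random timesteps per record instead of iterating the chain.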

Privacy-preserving text

DP fine-tuning + retrieval-conditioned generation for support transcripts and clinical notes.
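DP fine-tuning is typically done with DP-SGD: clip each example's gradient to a fixed L2 norm, then add Gaussian noise to the summed batch gradient before the update. The sketch below is one such step on plain Python lists, assuming per-example gradients are already computed; the function names are hypothetical. It omits the privacy accountant that converts the noise multiplier, sampling rate and step count into an ε, which libraries such as Opacus and TensorFlow Privacy provide.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each gradient, sum, add Gaussian noise, average, descend."""
    batch_size = len(per_example_grads)
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(column) for column in zip(*clipped)]
    sigma = noise_multiplier * max_norm  # noise std scales with the clipping bound
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(42)
params = [0.0, 0.0]
per_example_grads = [[3.0, 4.0], [0.3, 0.4], [-1.0, 0.0]]  # toy values
params = dp_sgd_step(params, per_example_grads, lr=0.1, max_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
```

Clipping bounds any single record's influence on the update; the noise, sized relative to that bound, is what makes the bound a formal privacy guarantee.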

Time-series fidelity

Synthetic transaction streams that pass downstream fraud-model evals within 2 pts of models trained on real data.
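A "within 2 pts" criterion compares two models scored on the same held-out real test set: one trained on real data, one trained on synthetic data. A minimal sketch, assuming 0 to 1 metrics such as AUC; the function names and scores are hypothetical.

```python
def fidelity_gap_pts(real_trained_score, synth_trained_score):
    """Points by which the synthetic-trained model trails the real-trained one.

    Both scores are assumed to be 0-1 metrics (e.g. AUC) measured on the same
    held-out *real* test set; rounding suppresses floating-point noise.
    """
    return round((real_trained_score - synth_trained_score) * 100.0, 6)


def passes_fidelity_gate(real_trained_score, synth_trained_score, max_gap_pts=2.0):
    """True if the synthetic-trained model is within max_gap_pts of the real-trained one."""
    return fidelity_gap_pts(real_trained_score, synth_trained_score) <= max_gap_pts


# Hypothetical fraud-model AUCs on the same real hold-out set.
print(passes_fidelity_gate(0.912, 0.897))  # True (1.5-pt gap)
print(passes_fidelity_gate(0.912, 0.880))  # False (3.2-pt gap)
```

A negative gap (synthetic-trained model scoring higher) also passes, which keeps the gate one-sided.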

Audit-by-replay

Synthetic populations for stress-testing models on edge cases and protected classes.
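At its core, an audit-by-replay run scores a model group by group on a synthetic population and reports the spread. A minimal sketch, assuming records are dicts and the model is any callable; the field names, the toy population and the `approve` rule are hypothetical.

```python
from collections import defaultdict


def per_group_accuracy(records, predict, group_key, label_key):
    """Accuracy of `predict` within each value of `group_key`."""
    totals, correct = defaultdict(int), defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if predict(record) == record[label_key]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


def max_accuracy_gap(accuracy_by_group):
    """Largest accuracy difference between any two groups."""
    return max(accuracy_by_group.values()) - min(accuracy_by_group.values())


# Toy synthetic audit population and a toy model (both hypothetical).
population = [
    {"group": "A", "income": 60, "label": True},
    {"group": "A", "income": 40, "label": False},
    {"group": "B", "income": 55, "label": False},
    {"group": "B", "income": 45, "label": False},
]


def approve(record):
    return record["income"] > 50


accuracy = per_group_accuracy(population, approve, "group", "label")
gap = max_accuracy_gap(accuracy)
```

Because the population is synthetic, rare group and edge-case combinations can be oversampled until each group has enough rows for a stable estimate.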

Shippable artefacts

Everything the lab ships

  • Generator library
    Tabular, text, image and time-series generators with utility + privacy reports per release.
  • DP toolkit
    ε / δ accounting, privacy-budget tracking, formal disclosure-risk certificates.
  • Utility evals
    TSTR (train on synthetic, test on real), downstream-task and population-statistics test suites, with pass/fail gates in CI.
  • Audit datasets
    Curated synthetic populations for fairness, robustness and edge-case stress.
  • Schema-aware connectors
    Snowflake, BigQuery, Databricks, Postgres — discover schema, generate, write back.
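The ε / δ accounting and privacy-budget tracking in the DP toolkit can be pictured as a ledger that refuses any release that would overspend. A minimal sketch, assuming basic sequential composition (per-release ε and δ simply add up); the `PrivacyBudget` class and release names are hypothetical, and tighter accountants such as Rényi-DP would report smaller totals for the same releases.

```python
class PrivacyBudget:
    """Tracks cumulative (epsilon, delta) spend under basic sequential composition."""

    def __init__(self, epsilon_limit, delta_limit):
        self.epsilon_limit = epsilon_limit
        self.delta_limit = delta_limit
        self.spent_epsilon = 0.0
        self.spent_delta = 0.0
        self.ledger = []  # (release_name, epsilon, delta) in spend order

    def can_spend(self, epsilon, delta, tol=1e-12):
        """True if both totals stay within their limits after this release."""
        return (self.spent_epsilon + epsilon <= self.epsilon_limit + tol
                and self.spent_delta + delta <= self.delta_limit + tol)

    def spend(self, release_name, epsilon, delta):
        """Record a release, or raise if it would exceed the budget."""
        if not self.can_spend(epsilon, delta):
            raise ValueError(f"{release_name!r} would exceed the privacy budget")
        self.spent_epsilon += epsilon
        self.spent_delta += delta
        self.ledger.append((release_name, epsilon, delta))

    def remaining(self):
        return (self.epsilon_limit - self.spent_epsilon,
                self.delta_limit - self.spent_delta)


budget = PrivacyBudget(epsilon_limit=1.0, delta_limit=1e-6)
budget.spend("crm_accounts_v1", epsilon=0.5, delta=5e-7)
budget.spend("audit_population_v1", epsilon=0.4, delta=4e-7)
try:
    budget.spend("crm_accounts_v2", epsilon=0.2, delta=1e-7)  # 0.9 + 0.2 > 1.0
except ValueError as err:
    print(err)
```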
Lab team
  • Synthetic Data Principal
  • Generative Modelling Researcher
  • Privacy & DP Specialist
  • Eval / Utility Engineer
Partners we collaborate with
Snowflake · Databricks · Microsoft Fabric · OpenAI · Hugging Face · MOSTLY AI
Example output · Job · foundry.generate
POST /v1/foundry/generate
{
  "schema":       "snowflake://prod.crm.accounts",
  "rows":         500000,
  "method":       "tabular_diffusion_v3",
  "privacy":      { "dp_epsilon": 1.0, "delta": 1e-6 },
  "utility_evals":["tstr_classifier", "population_stats"],
  "egress":       "synth.warehouse.eu-west"
}
→ 200 {
  "job_id":         "f_4421",
  "rows_generated": 500000,
  "tstr_score":     0.93,
  "stats_drift":    0.04,
  "dp_certificate": "axp/dp/cert/8819.pdf"
}
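The response reports `stats_drift` without defining it here. One common way to summarise marginal drift, assumed purely for illustration, is the mean total variation distance (TVD) between real and synthetic one-way marginals of categorical columns; 0 means identical marginals and 1 means disjoint ones. The function names and rows are hypothetical.

```python
from collections import Counter


def marginal(values):
    """Empirical distribution of one column."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in counts.items()}


def total_variation(p, q):
    """TVD = 0.5 * sum |p(k) - q(k)| over the union of categories."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def stats_drift(real_rows, synth_rows, columns):
    """Mean TVD over the one-way marginals of the given categorical columns."""
    distances = [
        total_variation(marginal([r[c] for r in real_rows]),
                        marginal([s[c] for s in synth_rows]))
        for c in columns
    ]
    return sum(distances) / len(distances)


real_rows = [
    {"tier": "gold", "region": "eu"},
    {"tier": "gold", "region": "eu"},
    {"tier": "silver", "region": "us"},
    {"tier": "bronze", "region": "us"},
]
synth_rows = [
    {"tier": "gold", "region": "eu"},
    {"tier": "silver", "region": "eu"},
    {"tier": "silver", "region": "us"},
    {"tier": "bronze", "region": "us"},
]
drift = stats_drift(real_rows, synth_rows, ["tier", "region"])
```

One-way marginals miss broken correlations between columns, so a fuller suite would likely also compare pairwise statistics.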
Engagement timeline

Weeks 1–8 · first synthetic corpus shipped by week 3

  1. Weeks 1–2
    Schema + utility baseline

    Discover schemas, agree utility tests, set DP budget.

  2. Weeks 2–5
    First synthetic corpus

    Generate, evaluate, sign DP certificate, ship to dev / test environments.

  3. Weeks 5–8
    Production loop

    CI integration, audit datasets, automated regeneration on schema drift.
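Automated regeneration on schema drift first needs a way to notice drift. A minimal sketch, assuming each schema snapshot is captured as a {column: type} mapping (for example from the warehouse's information schema); the function names and snapshots are hypothetical.

```python
def schema_diff(old_schema, new_schema):
    """Columns added, removed or retyped between two {column: type} snapshots."""
    shared = set(old_schema) & set(new_schema)
    return {
        "added": sorted(set(new_schema) - set(old_schema)),
        "removed": sorted(set(old_schema) - set(new_schema)),
        "retyped": sorted(c for c in shared if old_schema[c] != new_schema[c]),
    }


def needs_regeneration(old_schema, new_schema):
    """Regenerate whenever any column was added, removed or retyped."""
    return any(schema_diff(old_schema, new_schema).values())


# Hypothetical snapshots of a CRM accounts table.
before = {"account_id": "NUMBER", "email": "VARCHAR", "created_at": "DATE"}
after = {
    "account_id": "NUMBER",
    "email": "VARCHAR",
    "created_at": "TIMESTAMP_NTZ",
    "segment": "VARCHAR",
}
```

A production loop would plausibly also watch constraints and value distributions, not just column names and types.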

Flagship pods

Productionised by these squads

Cold-Start Model Pod
Privacy & DP Pod
Audit Population Pod
Synthetic Time-Series Pod
Selected publications

Receipts, not just a thesis

  • Tabular diffusion outperforms CTGAN on enterprise schemas
    ICML Workshop on Synthetic Data · 2025
  • Audit-by-replay: stress-testing fairness with curated synthetic populations
    AXP Internal Whitepaper · 2026
FAQs

What partners actually ask

Is synthetic data really safe?

Under formal differential privacy with ε≤1.0 and signed certificates — yes. We publish the disclosure-risk number, every time.
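For reference, this is the standard (ε, δ)-differential-privacy condition the ε refers to. A randomised mechanism M (for example, training the generator) satisfies it if, for every pair of datasets D and D′ differing in one record and every set of outputs S:

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

With ε = 1.0 and δ = 10⁻⁶, the values in the example job above, e^1.0 ≈ 2.718: adding or removing any single customer changes the probability of any output by at most a factor of about 2.72, plus an additive slack of one in a million. Smaller ε means a tighter bound.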

Do downstream models suffer?

Utility evals (TSTR, population stats) gate every release. We don't ship corpora that fail.
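TSTR (train on synthetic, test on real) fits a model only on synthetic rows and scores it only on held-out real rows. A minimal sketch, using a nearest-centroid classifier purely as a stand-in for whatever downstream model a real suite would use; all data and names are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Mean feature vector per label."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_label(centroids, x):
    """Label of the closest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


def tstr_accuracy(X_synth, y_synth, X_real_test, y_real_test):
    """Train on synthetic rows only, then score on real rows only."""
    centroids = fit_centroids(X_synth, y_synth)
    hits = sum(predict_label(centroids, x) == y for x, y in zip(X_real_test, y_real_test))
    return hits / len(y_real_test)


X_synth = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
y_synth = ["legit", "legit", "fraud", "fraud"]
X_real_test = [[0.2, 0.3], [4.8, 5.1], [1.0, 0.0]]
y_real_test = ["legit", "fraud", "fraud"]
score = tstr_accuracy(X_synth, y_synth, X_real_test, y_real_test)
```

A release gate would then compare `score` with a threshold, or with the same model trained on real data.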

What schemas are supported?

Tabular (Snowflake / BigQuery / Databricks / Postgres), text, image and time-series. Schema-aware connectors discover the schema and write generated data back.

Can we audit fairness with this?

Yes, that's a primary use case. Curated synthetic populations let you stress-test models on protected classes without touching production data.

Design-partner programme · L5 Synthetic Data Foundry

Co-build the Synthetic Data Foundry with us over weeks 1–8.

We'll respond within one business day with a scoping note, a fixed-price outcome contract, and a named principal cleared for your domain. Design partners get first-look access, joint publication rights and roadmap influence.

  • Outcome-priced: no T&M.
  • Sovereign by default: your data, your region, your keys.
  • Refund-backed if the contracted KPI isn't hit.
  • Joint publication rights and conference slots.