
📦 Distribution Manifest (Hardened)

This manifest defines the distribution policy for the 16-sector dataset fleet. To ensure Apache 2.0 compatibility, restricted and NC-SA (NonCommercial-ShareAlike) data are never bundled; instead, the engine provides Logic-Only synthesis paths.


๐Ÿ—๏ธ Bundled Datasets (Embedded)

These datasets are safe for immediate redistribution: they contain no raw PII or restricted records, and their statistical parity has been verified.

| Sector | Dataset Label | License | File Path |
|---|---|---|---|
| Finance | SEC Fundamentals | Public Domain | industries/finance/datasets/synthetic_parity.jsonl |
| Demographics | Population Trends | CC BY 4.0 | industries/demographics/datasets/synthetic_parity.jsonl |
| Housing | HUD Rental Trends | Public Domain¹ | industries/housing/datasets/housing_kb.jsonl |
| Environment | NOAA Climatology | Public Domain | industries/environment/datasets/noaa_climatology.jsonl |
| Environment | Copernicus Climate | CC BY 4.0² | industries/environment/datasets/copernicus_climate.jsonl |

¹ HUD Fair Market Rents are U.S. Government works in the public domain. Attribution headers are included as a best practice.
² Mandatory attribution is embedded: "Generated using Copernicus Climate Change Service information 2026."
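The attribution requirement in footnote ² can be spot-checked on a bundled file. The sketch below assumes the attribution sentence is stored verbatim as plain text somewhere in the file (for example in a header record or a metadata field); the function names are hypothetical helpers, not part of the harness.

```python
from pathlib import Path

# Attribution sentence quoted from footnote ² of this manifest.
COPERNICUS_ATTRIBUTION = (
    "Generated using Copernicus Climate Change Service information 2026."
)


def contains_attribution(text: str, attribution: str = COPERNICUS_ATTRIBUTION) -> bool:
    # Plain substring match; assumes the sentence is stored verbatim (hypothetical format).
    return attribution in text


def file_has_attribution(path: Path, attribution: str = COPERNICUS_ATTRIBUTION) -> bool:
    # Read the whole file as UTF-8 text and look for the attribution sentence.
    return contains_attribution(path.read_text(encoding="utf-8"), attribution)


if __name__ == "__main__":
    target = Path("industries/environment/datasets/copernicus_climate.jsonl")
    if target.exists():
        print(target, "attribution present:", file_has_attribution(target))
    else:
        print(f"{target} not found; run from the repository root")
```

Because the sentence contains no quotes or backslashes, a JSON-encoded copy of it still matches the plain substring check.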


🔒 Restricted Datasets (Generator-Only)

Redistribution is prohibited. Run the local generator to produce your own "Statistical Twin". Its statistical parameters are validated for first- and second-moment parity against the benchmark sources.

| Sector | Benchmark Description | Status | Restriction Type | Generator |
|---|---|---|---|---|
| Healthcare | Standard Clinical Repository | 🔒 Restricted | DUA / Credentialed | clinical_generator.py |
| Energy | Global Energy Standard | 🔒 Restricted | DUA / Restricted | energy_generator.py |
| Manufacturing | Industrial Benchmarking | 🔒 Restricted | DUA / Restricted | industrial_generator.py |
| Retail | E-Commerce Parity | 🔒 Restricted | NC-SA 4.0³ | olist_generator.py |
| Agriculture | Global Agri-Stats | 🔒 Restricted | NC-SA 3.0³ | faostat_generator.py |
| Media | Creative Metadata (IMDb) | 🔒 Restricted | Non-Commercial⁴ | imdb_generator.py |
| Telecom | Network Performance | 🔒 Restricted | NC-SA 4.0³ | ookla_generator.py |

³ NC-SA distinction: These datasets are restricted by copyright license terms (NonCommercial and ShareAlike clauses). Synthetic outputs may inherit the NonCommercial restriction.
⁴ IMDb policy: Non-Commercial under IMDb's terms of use. There is no ShareAlike clause, so the analysis of derivative restrictions differs from that for CC BY-NC-SA sources.
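The manifest does not say how moment parity is measured or what tolerance applies. A minimal sketch, assuming "first moment" means the mean and "second moment" the population variance of a numeric column, with a 5% relative tolerance chosen purely for illustration; moment_parity is a hypothetical helper, not the engine's validator.

```python
import math
import statistics
from typing import Sequence


def moment_parity(
    synthetic: Sequence[float],
    benchmark_mean: float,
    benchmark_var: float,
    rel_tol: float = 0.05,
    abs_tol: float = 1e-9,
) -> bool:
    """Return True if the sample mean and variance match the benchmark within tolerance."""
    # First moment: arithmetic mean of the synthetic column.
    mean = statistics.fmean(synthetic)
    # Second (central) moment: population variance around that mean.
    var = statistics.pvariance(synthetic, mu=mean)
    # math.isclose applies rel_tol against the larger magnitude;
    # abs_tol guards benchmarks that are at or near zero.
    mean_ok = math.isclose(mean, benchmark_mean, rel_tol=rel_tol, abs_tol=abs_tol)
    var_ok = math.isclose(var, benchmark_var, rel_tol=rel_tol, abs_tol=abs_tol)
    return mean_ok and var_ok


if __name__ == "__main__":
    synthetic_column = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy values, not real benchmark data
    print(moment_parity(synthetic_column, benchmark_mean=3.0, benchmark_var=2.0))
```

Matching only the first two moments leaves higher-order shape (skew, tails) and cross-column correlations unchecked, so a real validator would likely test more than this.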


๐ŸŒ Live Datasets (API / URL)

| Sector | Data Provider | Method | License |
|---|---|---|---|
| Finance | FRED | REST API | Upstream-dependent |
| Agri-Tech | USDA NASS | Quick Stats API | Public Domain |
| Housing | Zillow Research | Download | Non-Commercial⁵ |
| Telecom | FCC | Geospatial | Public Domain |
| Labor | ILOSTAT | REST API | CC BY / Restricted |
| Labor | Bureau of Labor Statistics (BLS) | Series API | Public Domain |

โต Zillow Research data is for personal/academic use only. Commercial redistribution of bulk data is prohibited.


๐Ÿ›๏ธ Comprehensive Registry (Citations)

| Industry | Primary Source (Citation) | Format | License |
|---|---|---|---|
| Finance | SEC EDGAR (Fundamentals) | XBRL/CSV | Public Domain |
| Finance | FRED (Federal Reserve Economic Data) | API/CSV | Upstream-dependent |
| Environment | NOAA Climate Data Online | API/CSV | Public Domain |
| Environment | Copernicus Climate Change Service | GRIB/NetCDF | CC BY 4.0² |
| Healthcare | CMS Hospital General Information | CSV/API | Public Domain |
| Healthcare | WHO Global Health Observatory | API/CSV | CC BY 4.0 |
| Labor | U.S. Bureau of Labor Statistics (BLS) | API/CSV | Public Domain |
| Labor | ILOSTAT (International Labour Organization) | REST API | CC BY 4.0 |
| Agriculture | FAOSTAT (Food and Agriculture Organization) | CSV/API | CC BY-NC-SA 3.0 |
| Agriculture | USDA NASS (Quick Stats) | CSV/API | Public Domain |
| Commerce | Marketplace Parity Repository (Olist) | CSV | CC BY-NC-SA 4.0 |
| Housing | HUD User (Fair Market Rents) | CSV | Public Domain¹ |
| Housing | Zillow Research (Economic Data) | CSV | Non-Commercial |
| Media | IMDb Dataset Interface | TSV | Non-Commercial |
| Telecom | Ookla Open Data (Speedtest Intelligence) | Parquet | CC BY-NC-SA 4.0 |
| Telecom | FCC Fixed Broadband Deployment | CSV/API | Public Domain |
| Manufacturing | U.S. Census Bureau (Annual Survey of Manufactures, ASM) | CSV/API | Public Domain |

The full registry and veracity diagnostics are available in the Data Veracity & Provenance Report.
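The bundling rule at the top of this manifest (Public Domain and CC BY sources embedded; restricted, DUA, NC-SA, and other Non-Commercial sources generator-only) can be expressed as a small classifier over the License column. This is a hypothetical helper written against the label strings used in these tables, not the engine's actual policy code; anything it does not recognize falls through to manual review.

```python
import re

# License labels this manifest treats as safe to bundle (see the Bundled table).
BUNDLE_OK = {"Public Domain", "CC BY 4.0"}

# Markers that push a source onto the generator-only path.
GENERATOR_ONLY = re.compile(r"\bNC\b|Non-Commercial|\bDUA\b|Restricted")

# Footnote superscripts that follow some labels in the tables.
FOOTNOTE_MARKS = "¹²³⁴⁵"


def distribution_path(license_label: str) -> str:
    """Classify a License-column label as 'bundled', 'generator-only', or 'review'."""
    label = license_label.strip().rstrip(FOOTNOTE_MARKS).strip()
    if GENERATOR_ONLY.search(label):
        return "generator-only"
    if label in BUNDLE_OK:
        return "bundled"
    # Unknown or upstream-dependent licenses need a human decision.
    return "review"


if __name__ == "__main__":
    for label in ("Public Domain¹", "CC BY-NC-SA 4.0", "Upstream-dependent"):
        print(f"{label}: {distribution_path(label)}")
```

Checking restrictive markers before the allow-list means a mixed label such as "CC BY / Restricted" is never bundled by accident.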


🧪 Local Synthesis Guide

To generate a compliant parity dataset, run the dedicated script in the relevant sector's generators/ directory, as listed below. These scripts enforce the required compliance wrappers and inject legally defensible provenance metadata.

  1. Clinical: python industries/healthcare/generators/clinical_generator.py
  2. Energy: python industries/energy/generators/energy_generator.py
  3. Industrial: python industries/manufacturing/generators/industrial_generator.py
  4. Olist: python industries/retail/generators/olist_generator.py
  5. FAOSTAT: python industries/agriculture/generators/faostat_generator.py
  6. IMDb: python industries/media_entertainment/generators/imdb_generator.py
  7. Ookla: python industries/telecom/generators/ookla_generator.py
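The seven commands above can also be driven from one script. A minimal sketch, assuming each generator runs without arguments from the repository root (as the commands suggest); run_generator is a hypothetical convenience wrapper, and the generators' own options are not documented here.

```python
import subprocess
import sys
from pathlib import Path

# Script paths copied from the Local Synthesis Guide.
GENERATORS = {
    "clinical": "industries/healthcare/generators/clinical_generator.py",
    "energy": "industries/energy/generators/energy_generator.py",
    "industrial": "industries/manufacturing/generators/industrial_generator.py",
    "olist": "industries/retail/generators/olist_generator.py",
    "faostat": "industries/agriculture/generators/faostat_generator.py",
    "imdb": "industries/media_entertainment/generators/imdb_generator.py",
    "ookla": "industries/telecom/generators/ookla_generator.py",
}


def generator_command(name: str) -> list[str]:
    # Use the current interpreter so the generator sees the same environment.
    return [sys.executable, GENERATORS[name]]


def run_generator(name: str, repo_root: Path = Path(".")) -> int:
    # Run from the repository root because the paths above are relative to it.
    completed = subprocess.run(generator_command(name), cwd=repo_root, check=False)
    return completed.returncode


if __name__ == "__main__":
    root = Path(".")
    for name, rel_path in GENERATORS.items():
        script = root / rel_path
        if not script.exists():
            print(f"skip {name}: {script} not found")
            continue
        print(f"{name}: exit code {run_generator(name, root)}")
```

Using sys.executable rather than a bare "python" avoids picking up a different interpreter or virtual environment from PATH.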

โš–๏ธ Terms of Use - Data & APIs

By using this harness, you agree to adhere to the terms of the respective data providers:

  - RESTRICTED sources: You must have a valid Data Use Agreement (DUA) or credentialing agreement with the relevant data provider to use raw restricted data locally.
  - NC-SA and Non-Commercial sources: You agree not to use synthetic outputs for commercial competition or redistribution where prohibited (e.g., Olist, Zillow).
  - Attribution: You will preserve all embedded source headers in output artifacts.


Last Updated: 2026-03-24 (Hardened) ⚖️🛡️