Show HN: I built an AI dataset generator

Original link: https://github.com/metabase/dataset-generator

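This dataset generator simplifies creating realistic data for demos, learning, and dashboards by combining AI with local generation. Users define the dataset they want in a conversational prompt builder, specifying the business type, the schema (One Big Table or star schema), and the row count. The "Preview Data" feature uses OpenAI (GPT-4o) to generate a detailed data spec, including the schema, business rules, and event logic. This incurs a small cost. Once the spec exists, the tool generates the data rows locally with Faker, keeping them consistent with the AI-defined schema. Downloading a CSV or SQL dataset is free regardless of size and reuses the same LLM-generated spec. The tool also offers one-click Metabase integration through Docker for instant data exploration, so users can visualize and analyze the data they generated. The app is built with Next.js, TypeScript, Tailwind CSS, and ShadCN UI, and has a modern, dark-themed interface. You can customize business types and schema logic by editing `lib/spec-prompts.ts`.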

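MatthewHefferon created an open-source AI dataset generator to simplify creating realistic fake data for dashboards and demos, a common pain point he had run into. The tool uses GPT-4o to generate a detailed database schema and business rules from user-selected criteria such as business type and row count. Faker then populates the database with data that follows those rules. Users can preview the data, export it as CSV or SQL, or launch a Metabase instance for instant exploration. Mritchie712 uses a similar workflow with Cursor and Claude Code, drawing on web research to generate database schemas and data for Definite.app demos, which highlights the potential of automating database creation and population to showcase a company's operations. Margotli found the tool useful for learning analytics or generating sample data.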

Original text

Generate realistic datasets for demos, learning, and dashboards. Instantly preview data, export as CSV or SQL, and explore with Metabase.

Features:

  • Conversational prompt builder: choose business type, schema, row count, and more
  • Real-time data preview in the browser
  • Export as CSV (single file or multi-table ZIP) or as SQL inserts
  • One-click Metabase launch for data exploration

Tech Stack:

  • Next.js (App Router, TypeScript)
  • Tailwind CSS + ShadCN UI (modern, dark-themed UI)
  • OpenAI API (GPT-4o for data generation)
  • Metabase (Dockerized, launched on demand)

Getting Started:

  1. Clone the repo:

    git clone <your-repo-url>
    cd dataset-generator
  2. Create your .env.local file:

    Copy the example file and fill in your OpenAI API key:

    cp .env.example .env.local

    Then edit .env.local and add your OpenAI API key after the = sign.

  3. Start the Next.js app:

  4. Generate a dataset:

    • Use the prompt builder to define your dataset.
    • Click "Preview Data" to see a sample.
  5. Export or Explore:

    • Download your dataset as CSV or SQL Inserts.
    • Click "Start Metabase" to spin up Metabase in Docker.
    • Once Metabase is ready, click "Open Metabase" to explore your data.
    • When done, click "Stop Metabase" to shut down and clean up Docker containers.

Project Structure:

  • /app/page.tsx – Main UI and prompt builder
  • /app/api/generate/route.ts – Synthetic data generator (OpenAI)
  • /app/api/metabase/start|stop|status/route.ts – Docker orchestration for Metabase
  • /lib/export/ – CSV/SQL export logic
  • /docker-compose.yml – Used only for Metabase, not for the app itself
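
As a rough illustration of what CSV and SQL export logic like that in /lib/export/ involves, here is a minimal TypeScript sketch. The function names (`toCsv`, `toSqlInserts`), the row type, and the escaping rules are assumptions made for illustration, not the project's actual code.

```typescript
// Hypothetical row shape; the real types under /lib/export/ are not shown in this README.
type Cell = string | number | boolean | null;
type DataRow = Record<string, Cell>;

// Quote a CSV field only when it contains a comma, a double quote, or a line break.
function csvField(value: Cell): string {
  if (value === null) return "";
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Serialize rows to CSV, with a header line taken from `columns`.
function toCsv(columns: string[], rows: DataRow[]): string {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c] ?? null)).join(","));
  }
  return lines.join("\n");
}

// Render one value as a SQL literal; single quotes are escaped by doubling them.
function sqlLiteral(value: Cell): string {
  if (value === null) return "NULL";
  if (typeof value === "number") return String(value);
  if (typeof value === "boolean") return value ? "TRUE" : "FALSE";
  return `'${value.replace(/'/g, "''")}'`;
}

// Emit one INSERT statement per row.
function toSqlInserts(table: string, columns: string[], rows: DataRow[]): string {
  const columnList = columns.join(", ");
  return rows
    .map((row) => {
      const values = columns.map((c) => sqlLiteral(row[c] ?? null)).join(", ");
      return `INSERT INTO ${table} (${columnList}) VALUES (${values});`;
    })
    .join("\n");
}

// Example rows, including a value that needs escaping in both formats.
const sampleRows: DataRow[] = [
  { id: 1, customer: "O'Brien, Ltd", active: true },
  { id: 2, customer: "Acme", active: false },
];
const csvText = toCsv(["id", "customer", "active"], sampleRows);
const sqlText = toSqlInserts("orders", ["id", "customer", "active"], sampleRows);
```

The sketch trusts the table and column names as given; a real exporter would also need to validate or quote identifiers.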

When you click "Start Metabase", it will launch Metabase in a Docker container. Once ready:

  1. Click "Open Metabase" to access the Metabase interface
  2. Follow Metabase's setup process
  3. To analyze your generated data:

Cost & Data Generation Summary

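Action        | Calls OpenAI? | Cost?  | Uses LLM? | Uses Faker? | Row Count
--------------|---------------|--------|-----------|-------------|----------
Preview       | Yes           | ~$0.05 | Yes       | Yes         | 10
Download CSV  | No            | $0     | No        | Yes         | 100+
Download SQL  | No            | $0     | No        | Yes         | 100+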
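
To make the table concrete, the arithmetic can be sketched as follows. The function name and the flat $0.05 figure are illustrative assumptions: the table only gives an approximate per-preview cost, and actual OpenAI charges will vary.

```typescript
// Approximate per-preview cost from the table above (an estimate, not a fixed price).
const APPROX_COST_PER_PREVIEW_USD = 0.05;

// Only previews call OpenAI; downloads are generated locally, so they add nothing.
function estimateCostUsd(previews: number, _downloads: number): number {
  const total = previews * APPROX_COST_PER_PREVIEW_USD;
  return Math.round(total * 100) / 100; // round to whole cents
}

// Three previews followed by twenty downloads of any size.
const sessionCost = estimateCostUsd(3, 20);
```

Under these assumptions the session costs about $0.15, however many rows are downloaded.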

Key Points:

  • You only pay for the preview/spec generation (~$0.05 per preview)
  • All downloads use the same columns/spec, just with more rows, and are free
  • When you preview a dataset, the app uses OpenAI to generate a detailed data spec (schema, business rules, event logic) for your chosen business type and parameters.
  • All actual data rows are generated locally using Faker, based on the LLM-generated spec.
  • Downloading or exporting data never calls OpenAI again—it's instant and free.
  1. Select your business type, schema, and other parameters.
  2. Click "Preview Data" to generate a 10-row sample (incurs a small OpenAI cost).
  3. Download CSV/SQL for as many rows as you want—no extra cost, always uses the same schema/columns as the preview.
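
The workflow above separates a one-time spec from repeatable local row generation. A minimal sketch of that split follows. The spec shape, the column kinds, and the function names are hypothetical, and a small seeded random function stands in for Faker so the example runs without dependencies. The real app's spec and generator may look quite different.

```typescript
// Hypothetical spec shape: the README only says the spec covers schema,
// business rules, and event logic, so these field names are assumptions.
type ColumnSpec =
  | { name: string; kind: "id" }
  | { name: string; kind: "choice"; options: string[] }
  | { name: string; kind: "amount"; min: number; max: number };

interface DatasetSpec {
  table: string;
  columns: ColumnSpec[];
}

type GeneratedRow = Record<string, string | number>;

// Small seeded PRNG (mulberry32) used here as a stand-in for Faker.
function makeRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Generate `rowCount` rows locally from the spec; no network or LLM call happens here.
function generateRows(spec: DatasetSpec, rowCount: number, seed = 1): GeneratedRow[] {
  const random = makeRandom(seed);
  const rows: GeneratedRow[] = [];
  for (let i = 0; i < rowCount; i++) {
    const row: GeneratedRow = {};
    for (const column of spec.columns) {
      if (column.kind === "id") {
        row[column.name] = i + 1; // sequential primary key
      } else if (column.kind === "choice") {
        const index = Math.floor(random() * column.options.length);
        row[column.name] = column.options[index] ?? "";
      } else {
        const value = column.min + random() * (column.max - column.min);
        row[column.name] = Math.round(value * 100) / 100; // two decimal places
      }
    }
    rows.push(row);
  }
  return rows;
}

// Stand-in for a spec the LLM would return during "Preview Data".
const exampleSpec: DatasetSpec = {
  table: "subscriptions",
  columns: [
    { name: "subscription_id", kind: "id" },
    { name: "plan", kind: "choice", options: ["free", "pro", "enterprise"] },
    { name: "monthly_revenue", kind: "amount", min: 0, max: 500 },
  ],
};

const previewRows = generateRows(exampleSpec, 10); // the 10-row preview
const exportRows = generateRows(exampleSpec, 1000); // a larger download from the same spec
```

Because both calls reuse the same spec, the larger export has the same columns as the preview, which mirrors the behavior described in step 3.
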
  • One Big Table (OBT): A single, denormalized table with all relevant columns.
  • Star Schema: Multiple tables (fact + dimension) for more advanced analytics. The LLM spec guides the structure, and the generator outputs all tables locally.
  • To add new business types, rules, or schema logic, edit lib/spec-prompts.ts
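
The difference between the two schema types listed above can be illustrated with a small sketch. The tables and column names here are made up for illustration and are not taken from the project. The point is only that a One Big Table is roughly what you get by joining a star schema's fact table to its dimension tables.

```typescript
// Hypothetical example tables; the names are illustrative only.
interface CustomerDim {
  customer_id: number;
  customer_name: string;
  country: string;
}

interface OrderFact {
  order_id: number;
  customer_id: number; // foreign key into the customer dimension
  amount: number;
}

// Star schema: one dimension table and one fact table linked by customer_id.
const customerDim: CustomerDim[] = [
  { customer_id: 1, customer_name: "Acme", country: "US" },
  { customer_id: 2, customer_name: "Globex", country: "DE" },
];

const orderFact: OrderFact[] = [
  { order_id: 100, customer_id: 1, amount: 25 },
  { order_id: 101, customer_id: 2, amount: 40 },
  { order_id: 102, customer_id: 1, amount: 15 },
];

// One Big Table row: a fact row with the dimension attributes copied in.
type OrderObtRow = OrderFact & Omit<CustomerDim, "customer_id">;

// Join facts to their dimension rows to produce the denormalized table.
function denormalize(facts: OrderFact[], customers: CustomerDim[]): OrderObtRow[] {
  const byId = new Map<number, CustomerDim>();
  for (const customer of customers) byId.set(customer.customer_id, customer);
  return facts.map((fact) => {
    const customer = byId.get(fact.customer_id);
    if (!customer) throw new Error(`Unknown customer_id ${fact.customer_id}`);
    return { ...fact, customer_name: customer.customer_name, country: customer.country };
  });
}

const orderObt = denormalize(orderFact, customerDim);
```

The star schema stores each customer once and suits join-based analytics, while the One Big Table repeats customer attributes on every order row and is simpler to query in a single pass.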