Average Is All You Need

Original link: https://rawquery.dev/blog/average-is-all-you-need

## LLMs and the Rise of "Good Enough" Data Analysis

Large language models (LLMs) are rapidly democratizing previously specialized tasks, and data analysis is next. While LLMs first excelled at generating average *content*, they can now deliver average *data insights* — a powerful shift.

Extracting meaningful data used to require specialized skills (such as SQL) and considerable effort. Now, platforms like rawquery let users *describe* the analysis they want in plain language. An LLM agent then handles the technical complexity: writing the queries, generating the charts, and delivering the results.

The example in the post shows that simply asking a question is enough to determine whether a marketing campaign affected revenue, with no complex attribution modeling or data engineering required. The LLM handles the "average" work, freeing the user to *think* about the data and its implications.

This is not about replacing data professionals; it is about giving everyone the ability to leverage their data, turning intuitive knowledge into actionable insight quickly and efficiently, and recognizing that an "average" result delivered fast can be genuinely magical.


6 min read · Editorial

LLMs will make more of your average stuff. And that's OK.


This is not going to be much of a hot take, but whether we like it or not, whether we want to admit it or not: LLMs have eaten the world.

They first had a go at creative fields, where they essentially made everyone capable of publishing an average text, with some average ideas, for an average audience, incredibly fast and easily. Where average used to be expensive in both time and effort, it became cheap.

Software is now getting the same treatment, and other fields are very likely to get the same average treatment next; bets are high on everything textual, or built on descriptive textual semantics (IP, lawyers, translators, Marvel movies...).

Now, there is nothing inherently bad about average stuff; it sits, by definition, in the middle of the normal distribution of stuff. It is in fact amazing that anyone can now create average things, whereas before they had to fight hard even for sub-par; they can now settle for average, or do better and actually think about it.

I can't draw. But now, I still can't draw, but better.


Data is the same problem, but better.

Here is the thing about data: your intuitive knowledge of what you want from it is much higher than average. You know what is in your organisation's data and you can likely "feel" what is hidden in it. You just do not necessarily know how to get the information out of it effectively; most people do not write SQL that well, do not understand syncing strategies that intuitively, do not know how to generate charts that nicely.

You know who does incredibly, amazingly, averagely well at all of that? Any LLM.

So that is what we built rawquery for. A data platform that is designed to be operated by LLM agents. You connect your sources, and then you talk to Claude Code, Cursor, or whatever agent you use. You describe what you want in plain language. The agent writes the SQL, runs the queries, creates the charts, publishes the results.

It deals with the average. You deal with the thinking.

## The scenario

I have transactional data: literally transactional from Stripe. And I have an email campaign that I launched from HubSpot. And I want to know, simply, if the fact that I did a mailing campaign increased my average basket or the number of customers.

In the good old days I would do what is called an attribution model, which is some sort of wankery to say "hey is one linked to another or not, here are 5 different ways to prove to your manager you did something useful."

But this is a pain, first because, unless you sell a product online that people can buy the moment they click a button, it is a drag to build those attribution models effectively: is it last click, first click, weighted attribution... who knows. Nobody knows. Everybody gives up, adds it to a dashboard, and pretends it makes sense.
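For the curious, the attribution flavours named here differ only in how they split credit for a sale across a customer's touchpoints. A toy Python sketch (journey data and function name made up, nothing to do with rawquery):

```python
# Hypothetical sketch of common attribution schemes: given one customer's
# ordered touchpoints before a purchase, each model splits the revenue
# credit differently. All names and data here are invented for illustration.

def attribute(touchpoints, revenue, model="last_click"):
    """Return {channel: credited_revenue} for a single conversion."""
    if model == "first_click":
        return {touchpoints[0]: revenue}
    if model == "last_click":
        return {touchpoints[-1]: revenue}
    if model == "linear":  # equal weight to every touchpoint
        share = revenue / len(touchpoints)
        credit = {}
        for channel in touchpoints:
            credit[channel] = credit.get(channel, 0) + share
        return credit
    raise ValueError(f"unknown model: {model}")

journey = ["email", "organic", "email", "paid_search"]
print(attribute(journey, 100, "first_click"))  # {'email': 100}
print(attribute(journey, 100, "last_click"))   # {'paid_search': 100}
print(attribute(journey, 100, "linear"))       # {'email': 50.0, 'organic': 25.0, 'paid_search': 25.0}
```

Same data, three different "truths" about which channel earned the money, which is exactly why those meetings never end.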

Now, what if you could just describe the joins and correlations you want, in plain English, and get the chart at the end?

Here is how it would look.

## Step 1. Connect your sources

You tell Claude Code: "connect my Stripe and my HubSpot to rawquery." Claude runs:

```bash
rq connections create stripe-prod --type stripe \
  -p api_key=sk_live_xxx
rq connections create hubspot-crm --type hubspot \
  -p access_token=pat-xxx
```

Then it syncs both:

```bash
rq connections sync stripe-prod
rq connections sync hubspot-crm
```

Two minutes later, your data is in. Claude checks what landed:

```
+-------------+-------------+
|   SCHEMA    | TABLE COUNT |
+-------------+-------------+
| stripe_prod |           6 |
| hubspot_crm |           4 |
+-------------+-------------+
+-------------+-------------------+-----------+
|   SCHEMA    |       TABLE       | ROW COUNT |
+-------------+-------------------+-----------+
| stripe_prod | customers         |     2,841 |
| stripe_prod | charges           |    11,207 |
| stripe_prod | invoices          |     8,934 |
| stripe_prod | subscriptions     |       612 |
| stripe_prod | products          |        18 |
| stripe_prod | prices            |        42 |
| hubspot_crm | contacts          |     3,200 |
| hubspot_crm | deals             |       891 |
| hubspot_crm | email_events      |    24,500 |
| hubspot_crm | campaigns         |        47 |
+-------------+-------------------+-----------+
```
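This "check what landed" step is itself just a catalog query. A generic sketch, with Python's sqlite3 standing in for the warehouse (the post does not say what rawquery's backend is, and the toy tables below are made up to mirror the schema names above):

```python
# Sketch only: sqlite3 as a stand-in warehouse, with a couple of toy
# tables named after the synced schemas above. The real rq CLI does
# this for you; this just shows what "listing what landed" amounts to.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE stripe_prod_customers (id TEXT, email TEXT);
    CREATE TABLE hubspot_crm_contacts  (id TEXT, email TEXT);
    INSERT INTO stripe_prod_customers VALUES ('cus_1', 'a@x.com'), ('cus_2', 'b@x.com');
    INSERT INTO hubspot_crm_contacts  VALUES ('1', 'a@x.com');
""")

# List every table with its row count, like the summary table above.
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for (name,) in tables:
    (rows,) = con.execute(f"SELECT COUNT(*) FROM {name}").fetchone()
    print(f"{name}: {rows} rows")
```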

## Step 2. You talk. The agent queries.

You say to Claude: "I ran an email campaign called 'spring-sale-2026' in March. Did it actually move the needle on revenue? Compare people who received the email vs. those who didn't."

Claude doesn't ask you to define an attribution model. It doesn't open a whiteboard. It runs:

```sql
SELECT
  CASE WHEN e.recipient IS NOT NULL THEN 'Received email' ELSE 'No email' END AS cohort,
  COUNT(DISTINCT c.id) AS customers,
  COUNT(ch.id) AS transactions,
  ROUND(AVG(ch.amount / 100.0), 2) AS avg_basket,
  ROUND(SUM(ch.amount / 100.0), 2) AS total_revenue
FROM stripe_prod.customers c
LEFT JOIN hubspot_crm.email_events e
  ON LOWER(c.email) = LOWER(e.recipient)
  AND e.type = 'DELIVERED'
  AND e.campaign_id IN (
    SELECT id FROM hubspot_crm.campaigns WHERE name = 'spring-sale-2026'
  )
LEFT JOIN stripe_prod.charges ch
  ON ch.customer = c.id
  AND ch.created >= '2026-03-01'
  AND ch.status = 'succeeded'
WHERE c.created < '2026-03-01'
GROUP BY 1
ORDER BY avg_basket DESC
```

```
+----------------+-----------+--------------+------------+---------------+
|     COHORT     | CUSTOMERS | TRANSACTIONS | AVG_BASKET | TOTAL_REVENUE |
+----------------+-----------+--------------+------------+---------------+
| Received email |       847 |        1,204 |      89.41 |    107,629.64 |
| No email       |     1,994 |        2,118 |      61.27 |    129,728.86 |
+----------------+-----------+--------------+------------+---------------+
```

The email cohort has a 46% higher average basket. You did not write a single line of SQL. You did not set up an attribution model. You asked a question, in English, and got a table.
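If you want to convince yourself the cohort logic is sound, the core of that query — a LEFT JOIN on lowercased email deciding the cohort, then an average basket per cohort — can be replayed on toy data with Python's sqlite3. Every row below is made up, and the schema is simplified (no campaign or status columns):

```python
# Minimal replay of the cohort logic on invented data: customer c1
# received the email, c2 and c3 did not. Amounts are in cents, as in
# Stripe's charge objects.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id TEXT, email TEXT);
    CREATE TABLE email_events (recipient TEXT);           -- delivered campaign emails
    CREATE TABLE charges (customer TEXT, amount INTEGER); -- succeeded charges, in cents
    INSERT INTO customers VALUES ('c1', 'a@x.com'), ('c2', 'b@x.com'), ('c3', 'c@x.com');
    INSERT INTO email_events VALUES ('A@X.COM');          -- note the case mismatch
    INSERT INTO charges VALUES ('c1', 9000), ('c1', 11000), ('c2', 6000), ('c3', 6500);
""")

# Same shape as the query above: the LEFT JOIN on lowercased email
# decides the cohort, then we average the basket per cohort.
rows = con.execute("""
    SELECT
      CASE WHEN e.recipient IS NOT NULL THEN 'Received email' ELSE 'No email' END AS cohort,
      ROUND(AVG(ch.amount / 100.0), 2) AS avg_basket
    FROM customers c
    LEFT JOIN email_events e ON LOWER(c.email) = LOWER(e.recipient)
    LEFT JOIN charges ch ON ch.customer = c.id
    GROUP BY 1
    ORDER BY avg_basket DESC
""").fetchall()
print(rows)  # [('Received email', 100.0), ('No email', 62.5)]
```

The `LOWER(...)` on both sides matters: email casing rarely matches across systems, and without it the whole email cohort would silently land in "No email".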

## Step 3. Go deeper, still in English

You follow up: "Break it down by week. I want to see if the effect wore off."

```sql
SELECT
  DATE_TRUNC('week', ch.created)::DATE AS week,
  CASE WHEN e.recipient IS NOT NULL THEN 'Email' ELSE 'Control' END AS cohort,
  COUNT(ch.id) AS transactions,
  ROUND(AVG(ch.amount / 100.0), 2) AS avg_basket
FROM stripe_prod.charges ch
JOIN stripe_prod.customers c ON ch.customer = c.id
LEFT JOIN hubspot_crm.email_events e
  ON LOWER(c.email) = LOWER(e.recipient)
  AND e.type = 'DELIVERED'
  AND e.campaign_id IN (
    SELECT id FROM hubspot_crm.campaigns WHERE name = 'spring-sale-2026'
  )
WHERE ch.created >= '2026-03-01' AND ch.created < '2026-04-01'
  AND ch.status = 'succeeded'
  AND c.created < '2026-03-01'
GROUP BY 1, 2
ORDER BY 1, 2
```

```
+------------+---------+--------------+------------+
|    WEEK    | COHORT  | TRANSACTIONS | AVG_BASKET |
+------------+---------+--------------+------------+
| 2026-03-02 | Control |          502 |      63.18 |
| 2026-03-02 | Email   |          410 |      94.72 |
| 2026-03-09 | Control |          488 |      60.44 |
| 2026-03-09 | Email   |          379 |      91.05 |
| 2026-03-16 | Control |          571 |      62.91 |
| 2026-03-16 | Email   |          238 |      78.33 |
| 2026-03-23 | Control |          557 |      58.07 |
| 2026-03-23 | Email   |          177 |      67.12 |
+------------+---------+--------------+------------+
```

The effect decays over 3 weeks. Week 1: +50% avg basket. Week 4: +15%. The campaign worked, then faded. You know this now. You did not need a data team to tell you.
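Those lift numbers fall straight out of the weekly table; a quick arithmetic check in Python (values copied from the table above):

```python
# Per-week lift of the email cohort's average basket over the control,
# computed directly from the avg_basket columns in the weekly table.
weeks = {
    "2026-03-02": (94.72, 63.18),
    "2026-03-09": (91.05, 60.44),
    "2026-03-16": (78.33, 62.91),
    "2026-03-23": (67.12, 58.07),
}
for week, (email, control) in weeks.items():
    lift = (email / control - 1) * 100
    print(f"{week}: +{lift:.0f}%")
```

The printed lifts fall from about +50% in the first week to the mid-teens in the last, which is the decay curve the chart in the next step makes visible.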

## Step 4. Save it. Chart it. Share it.

You tell Claude: "Save that weekly breakdown, make a chart, and give me a link I can send to my manager."

```bash
# Save the query
rq queries create campaign-weekly-impact \
  --sql '...' \
  --description 'Weekly avg basket: email cohort vs control, spring-sale-2026'

# Create a chart from it
rq charts create campaign-impact \
  --query campaign-weekly-impact \
  --type line \
  --x week \
  --y avg_basket \
  --series cohort

# Publish it
rq charts publish campaign-impact
```

That's a public URL. Your manager clicks it, sees the chart, sees the data. No login required. No dashboard tool. No "can you give me access to Looker."

## Average is actually magic

This is not only average. This is actual magic.

So let's be real: the SQL is average. The joins are average. The chart is average. And it took less than five minutes, which is amazing — that is the entire point.

You did not need a data engineer to model your HubSpot data, or a meeting to agree on whether it should be last-click or first-click or linear or time-decay or whatever.

You needed a query, written fast, on data you already own. Your LLM wrote it. You confirmed it made sense. Your manager got a link.

Honestly, average is clearly magic; prove me wrong.


rawquery is a data platform with a CLI that LLM agents can use directly. Connect your data, let your agent query it, share the results. Try it free.
