Journey to Optimize Cloudflare D1 Database Queries

Original link: https://gist.github.com/rxliuli/be31cbded41ef7eac6ae0da9070c8ef8

Using Cloudflare Workers together with D1, I ran into database performance challenges. Troubleshooting involved monitoring the D1 dashboard, analyzing query statements and row counts, and inspecting the metadata returned by queries. D1 is not inherently faster just because it is a Cloudflare service; even simple queries can be slow. Key optimizations include: using D1 batch operations for multiple writes (especially important since writes are slower than reads), avoiding unnecessary updates to ID fields (which trigger excess row reads), implementing cursor-based pagination to avoid full table scans from count queries, splitting complex multi-table joins into smaller queries and aggregating in the application layer to avoid Cartesian product explosion, and using chunked bulk inserts to stay within SQLite's parameter limit. Remember to unit test thoroughly, since server-side issues directly affect costs. A continuous cycle of discovering, investigating, attempting fixes, and monitoring is essential for iterative improvement and timely detection of regressions.


Original Article

Recently, I've been working on server-side projects using Cloudflare Workers with the D1 database. During this process, I encountered several database-related challenges. Since databases are fairly unfamiliar territory for frontend developers, I decided to document my experiences.

The image below shows the request records for the past 30 days, revealing dramatic fluctuations in database queries.

Database query fluctuations over 30 days

Solving problems begins with identifying them. Here are several methods that helped me spot issues:

  1. Monitoring the D1 dashboard to detect anomalies in database operations
  2. Examining query statements and row read/write counts, with special attention to queries with high counts or rows read/written
  3. Using (await c.env.DB.prepare('<sql>').run()).meta to check the returned metadata, which reveals how many rows were actually read/written for each query (see the sketch after this list)
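
A minimal sketch of the third method, assuming a Hono app whose D1 binding is named DB; the route path and logged fields are illustrative, not from the original post:

import { Hono } from 'hono'

const app = new Hono<{ Bindings: { DB: D1Database } }>()

app.get('/debug/meta', async (c) => {
  // meta reports how many rows the statement actually read and wrote
  const { meta } = await c.env.DB.prepare('SELECT * FROM "User" LIMIT 1').run()
  console.log(meta.rows_read, meta.rows_written, meta.duration)
  return c.json(meta)
})

export default app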

It's important to understand that while Workers and D1 are both Cloudflare services, using them together doesn't make D1 faster. For example, a simple query like this has an average response time exceeding 200ms:

await db.select().from(user).limit(1)

When an endpoint includes multiple database operations, it's best to use D1 batch operations, especially for write operations, which are even slower than reads due to the lack of read-only replicas. For instance, instead of:

await db.insert(user).values({...})
await db.insert(tweet).values({...})

Use batch operations:

await db.batch([
  db.insert(user).values({...}),
  db.insert(tweet).values({...})
])

This approach requires only a single REST request to D1 to complete multiple database write operations.

Note 1: Prisma doesn't support D1 batch operations, which led me to switch to Drizzle.

Note 2: Be cautious when using batch for multiple queries, particularly when tables have columns with identical names.

Excluding IDs from Update Operations

When updating records, it's important to exclude the ID field (even if it remains unchanged). Consider this code:

await db.update(user).set(userParam).where(eq(user.id, userParam.id))

The actual SQL executed:

update "User" set "id" = ?, "screenName" = ?, "updatedAt" = ? where "User"."id" = ?

If this ID is referenced by foreign keys in other tables, it can lead to a massive number of row reads. For example, if a "tweet" table has a userId field referencing this ID and contains 1,000 records, updating the user with the ID included will cause 2,005 rows to be read instead of just 1.

The solution is to explicitly exclude the ID field from updates:

// omit() is assumed to come from a utility such as lodash-es; it returns a
// copy of userParam without the listed keys, so "id" never appears in SET.
const r = await db
  .update(user)
  .set(omit(userParam, ['id']))
  .where(eq(user.id, userParam.id))

This properly limits rows read to 1, regardless of how many related records exist.

Avoiding Full Table Scans for Count Queries

I noticed an SQL statement ranking high in rows read on the D1 dashboard:

SELECT count(id) as num_rows FROM "User";


This query accounted for a significant share of rows read, so I switched from offset-based to cursor-based pagination and stopped providing total counts, since counting records scans every row even when the ID column is indexed. This is a known limitation of D1. A sketch of the cursor-based approach follows.
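
A minimal sketch of cursor-based pagination with Drizzle, reusing the db and user objects from the earlier examples and assuming the id column is indexed and usable as a sort key; the page size and cursor handling are illustrative assumptions:

import { asc, gt } from 'drizzle-orm'

async function getUsersPage(cursor?: string, pageSize = 20) {
  // Filter by "id greater than the last id seen" instead of using OFFSET,
  // so D1 only reads the rows that end up on this page.
  const rows = await db
    .select()
    .from(user)
    .where(cursor ? gt(user.id, cursor) : undefined)
    .orderBy(asc(user.id))
    .limit(pageSize)
  // The last id of the page becomes the cursor for the next request.
  const nextCursor = rows.length === pageSize ? rows[rows.length - 1].id : null
  return { rows, nextCursor }
}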

Avoiding Multi-table Left Joins

I discovered that a specific SQL query was causing hundreds of thousands of row reads:

SELECT "modlist"."id",
       "modlist"."updatedat",
       "modlistsubscription"."action",
       Json_group_array(DISTINCT "modlistuser"."twitteruserid"),
       Json_group_array(DISTINCT "modlistrule"."rule")
FROM   "modlist"
       LEFT JOIN "modlistsubscription"
              ON "modlist"."id" = "modlistsubscription"."modlistid"
       LEFT JOIN "modlistuser"
              ON "modlist"."id" = "modlistuser"."modlistid"
       LEFT JOIN "modlistrule"
              ON "modlist"."id" = "modlistrule"."modlistid"
WHERE  "modlist"."id" IN ( ?, ? )
GROUP  BY "modlist"."id",
          "modlistsubscription"."action";

This query joined four tables, potentially causing a "Cartesian product explosion." If the modListUser and modListRule tables each contain 100 records, a simple join could yield 10,000 intermediate rows, which is not the expected behavior.

The solution was to split the query into smaller ones and handle grouping and transformation in the application logic (a grouping sketch follows the batch below):

await db.batch([
  db
    .select({
      modListId: modListUser.modListId,
      twitterUserId: modListUser.twitterUserId,
    })
    .from(modListUser)
    .where(eq(modListUser.modListId, 'modlist-1')),
  db
    .select({
      modListId: modListRule.modListId,
      rule: modListRule.rule,
    })
    .from(modListRule)
    .where(eq(modListRule.modListId, 'modlist-1')),
]) // 200 rows read instead of 10,101
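
The grouping then happens in application code. A minimal sketch, assuming the two result sets returned by the batch above; the record shapes mirror the selects and the function name is illustrative:

function groupByModList(
  userRows: { modListId: string; twitterUserId: string }[],
  ruleRows: { modListId: string; rule: string }[],
) {
  // Aggregate per modListId in memory instead of JSON_GROUP_ARRAY over a join.
  const grouped = new Map<string, { twitterUserIds: string[]; rules: string[] }>()
  const entry = (id: string) => {
    if (!grouped.has(id)) grouped.set(id, { twitterUserIds: [], rules: [] })
    return grouped.get(id)!
  }
  for (const row of userRows) entry(row.modListId).twitterUserIds.push(row.twitterUserId)
  for (const row of ruleRows) entry(row.modListId).rules.push(row.rule)
  return grouped
}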

Optimizing Multi-Record Inserts

For bulk inserts, instead of using:

await Promise.all(users.map((it) => db.insert(user).values(it)) as any)

Or even:

await db.batch(users.map((it) => db.insert(user).values(it)) as any)

I found it more efficient to insert multiple records in a single statement:

await db.insert(user).values(users)

However, D1 limits the number of bound parameters to 100 per query (a stricter limit than stock SQLite's default). With 10 columns, we can insert at most 10 rows per SQL statement. For larger batches, we need to chunk the data:

await db.batch(
  safeChunkInsertValues(user, users).map((it) =>
    db.insert(user).values(it),
  ) as any,
)
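
The post doesn't show the safeChunkInsertValues implementation; a hedged, illustrative sketch of such a helper might look like the following, where the 100-parameter constant and the per-row estimate are assumptions based on the limit described above:

const D1_MAX_BOUND_PARAMS = 100

function chunkForInsert<T extends Record<string, unknown>>(rows: T[]): T[][] {
  if (rows.length === 0) return []
  // Each column of a row becomes one bound parameter in the generated INSERT.
  const paramsPerRow = Object.keys(rows[0]).length
  const rowsPerChunk = Math.max(1, Math.floor(D1_MAX_BOUND_PARAMS / paramsPerRow))
  const chunks: T[][] = []
  for (let i = 0; i < rows.length; i += rowsPerChunk) {
    chunks.push(rows.slice(i, i + rowsPerChunk))
  }
  return chunks
}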

Testing this approach with 5,000 records showed a performance improvement from 78ms to 14ms, making this optimization worthwhile.

Server-side issues differ significantly from client-side problems. While client-side errors only affect users, server-side errors can directly impact monthly bills and may take time to manifest. Therefore, caution and thorough unit testing are essential.

When addressing database query issues, I found it helpful to follow this process: discover, investigate, attempt to solve, monitor, try again if needed, continue monitoring, and then finalize. The first solution attempt may not succeed—it might even make things worse—but continuous monitoring is crucial for timely detection and resolution of issues.
