Modeling what makes paper-folding puzzles hard

Original link: https://www.dailyunfold.com/blog/spatial-difficulty

Summary: Daily Unfold and building a stable difficulty curve

Mert Aslan built Daily Unfold, a daily paper-folding puzzle game inspired by the standardized VZ-2 "Paper Folding Test" used in cognitive assessments of spatial visualization. Simulating the folds was simple; producing a consistent difficulty curve was not. An initial approach based on grid size and fold count fell short, with some puzzles feeling unexpectedly easy or hard.

The game generates deterministic puzzles from the date, so everyone receives the same challenges: easy (4x4 grid, 1 fold), medium (6x6, 2 folds), and hard (6x6, 3 folds, 2 punch holes). Difficulty is now scored by a hand-tuned function that weights six factors: off-center folds (which break symmetry), hole spread, mixed fold axes (which require 2D reasoning), punch count, fold count, and grid size. Factors that defeat mental shortcuts get higher weights. A validation step regenerates puzzles whenever the difficulty ordering is wrong, guaranteeing that hard puzzles really are harder. The whole system runs client-side from a single date-seeded PRNG, with no server. The model is not perfect and does not account for player experience, but it delivers a smooth difficulty curve for most players.

Engineering

I built a daily paper-folding puzzle game and needed a way to generate puzzles with a consistent difficulty curve. Simulating the folds was straightforward. Figuring out why some puzzles feel impossible while others click instantly was not.

Mert Aslan·March 2026

You've probably done this as a kid. Take a piece of paper, fold it a couple of times, punch a hole through the layers, then unfold it and see where all the holes ended up. It's fun. It's also, as it turns out, a standardized cognitive test.

The paper folding test shows up in real psychometric testing. The most direct version is Ekstrom et al.'s VZ-2 “Paper Folding Test” from their 1976 Kit of Factor-Referenced Cognitive Tests. It measures spatial visualization, your ability to mentally manipulate 2D objects through sequences of transformations. When I started building, I assumed difficulty would scale with grid size and fold count. After a lot of playtesting, that turned out to be only part of the story.

I turned this into Daily Unfold, a daily puzzle game. Three difficulties each day: easy is a 4x4 grid with one fold, medium is 6x6 with two folds, hard is 6x6 with three folds and two punch holes. The game generates puzzles deterministically from the date, so everyone in the world gets the same three puzzles. But making that difficulty curve feel right? That was the real problem.

Simulating the folds

The core engine runs a forward simulation. For every cell in the original unfolded grid, it traces that cell through each fold one by one. If a cell is on the folding side, its coordinate gets mirrored across the fold line:

// For a horizontal fold at position p:
if (row <= p) {
  row = 2 * p + 1 - row;  // mirror
}
row = row - (p + 1);        // shift to new origin

The smaller side always folds onto the larger side. After mirroring, coordinates get shifted so the remaining visible portion starts at zero. Each fold shrinks the effective grid, and the next fold operates on whatever is left.

Once all folds are applied, the engine checks if the cell's final position matches any punch location. If so, that cell has a hole in the unfolded paper. With three folds, a single punch can go through up to 2³ = 8 layers, producing up to 8 holes.
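The forward simulation above can be sketched as a single tracing function. This is a minimal sketch assuming horizontal folds only, matching the earlier snippet (fold at position p means the fold line sits between row p and p+1, with the top side folding down); the function name `traceCell` is illustrative, not taken from the game's source.

```javascript
// Trace one cell of the original unfolded grid through a sequence of
// horizontal folds, returning its final row in the folded paper.
function traceCell(row, folds) {
  for (const p of folds) {
    if (row <= p) {
      row = 2 * p + 1 - row; // mirror across the fold line
    }
    row = row - (p + 1);     // shift so the visible part starts at 0
  }
  return row;
}

// With one fold at p = 1 on a 4-row strip, rows 0..3 collapse onto
// rows 0..1: original rows 1 and 2 land on row 0, rows 0 and 3 on row 1.
```

A punch at final row r then puts a hole in every original cell whose trace ends at r, which is exactly how one punch can produce up to 2^folds holes.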

[Figure: fold at center → punch through layers → unfolded: 2 holes]

A center fold creates a clean mirror. The punch at (2,1) maps to (1,1) on the other side.

How I model difficulty

My first approach was obvious: bigger grid, more folds, harder puzzle. That produced a lot of puzzles that felt wrong. A 6x6 with two center folds could feel trivial, while a 4x4 with one weird fold stumped people. I needed something better.

After reading some spatial reasoning research and doing a lot of playtesting, I landed on a scoring function that weights six factors. The weights are hand-tuned, not learned from data. Here they are, ordered by weight:

1. Off-center folds (weight: 4x)

When a fold runs through the center of the grid, the result is perfectly symmetric. You can just see the mirror and you're done. In playtesting, center-fold puzzles got solved fast and with few errors.

Shift the fold line off-center and that shortcut disappears. Now the two sides are different sizes. One side wraps around the other unevenly. Some cells end up stacking, others don't. You can't just “mirror it” anymore. You have to trace each cell through the fold individually. Humans are very good at symmetry detection, and off-center folds take that tool away.

Center fold: holes mirror evenly (easy to reason about).
Off-center fold: asymmetric (breaks the symmetry shortcut).

2. Hole spread (weight: 4x)

When all the holes cluster in one corner, you can kind of focus your attention there and ignore the rest. When they scatter across every row and every column, you can't chunk the problem anymore. Every single cell in the grid needs its own evaluation.

The engine measures this as the fraction of rows and columns that contain at least one hole. A spread of 1.0 means holes touch every row and every column. High spread forces you to think about the entire grid instead of just a region.
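The spread metric described above can be computed in a few lines. This is a hedged sketch of one plausible implementation; the names (`holeSpread`, the `{row, col}` hole shape) are assumptions, not the game's actual code.

```javascript
// Fraction of rows and columns containing at least one hole, averaged.
// 1.0 means holes touch every row and every column of the grid.
function holeSpread(holes, size) {
  const rows = new Set(holes.map(h => h.row));
  const cols = new Set(holes.map(h => h.col));
  return (rows.size / size + cols.size / size) / 2;
}

// Example: two holes on the diagonal of a 4x4 grid touch 2 of 4 rows
// and 2 of 4 columns, giving a spread of 0.5.
const spread = holeSpread([{row: 0, col: 0}, {row: 3, col: 3}], 4);
```

Clustered holes score low (you can focus on one region); scattered holes push the value toward 1.0 and force whole-grid reasoning.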

3. Mixed axes (weight: 3x)

If every fold is horizontal, you're basically doing 1D reasoning. Rows mirror up and down, columns stay fixed. Easy enough. Throw in a vertical fold and suddenly you're tracking transformations in two directions at once. Working memory fills up fast.

For hard puzzles, the engine actually forces mixed axes during generation. If all the randomly generated folds happen to land on the same axis, it flips one of them so you're guaranteed to need 2D spatial reasoning.

4. Punch count (weight: 2.5x)

Each punch creates its own independent set of holes. Two punches means two separate tracing tasks through the same fold sequence. Easy and medium puzzles get one punch. Hard gets two, which roughly doubles the work without adding any new mechanical complexity.

5. Fold count (weight: 2x)

More folds means more transformations to simulate in your head. In playtesting, adding a second center fold felt less disruptive than making one fold off-center. That said, at three folds the chaining alone gets taxing regardless of position.

6. Hole count & grid size (weight: 1x / 2x)

More holes means more to find, and a 6x6 grid gets a flat 2-point bonus over 4x4 for the larger search space. These are the obvious difficulty knobs, but in practice a puzzle with many holes from a center fold still felt easier than one with few holes from an off-center fold.

The general pattern: factors that disrupt mental shortcuts (off-center folds, scattered holes) got higher weights than factors that just add more stuff to process (hole count, grid size). The scoring function cares more about how a puzzle is structured than how much stuff is in it.

The scoring formula

score = holes      x 1.0
      + folds      x 2.0
      + offCenter  x 4.0
      + mixedAxes  x 3.0
      + punches    x 2.5
      + spread     x 4.0
      + gridBonus  x 2.0

Easy puzzles typically score 8-12. Medium lands around 13-22. Hard puzzles score 22+.
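The formula transcribes directly into a scoring function. The factor names below mirror the formula; how each factor is extracted from a puzzle object is an assumption for illustration, not the game's actual code.

```javascript
// Weighted sum of the six difficulty factors from the formula above.
function cognitiveScore(f) {
  return f.holes     * 1.0
       + f.folds     * 2.0
       + f.offCenter * 4.0   // count of off-center folds
       + f.mixedAxes * 3.0   // 1 if folds use both axes, else 0
       + f.punches   * 2.5
       + f.spread    * 4.0   // row/column coverage in [0, 1]
       + f.gridBonus * 2.0;  // 1 for a 6x6 grid, 0 for 4x4
}

// Example: a hard-style puzzle with 6 holes, 3 folds, 1 off-center
// fold, mixed axes, 2 punches, spread 0.8, on a 6x6 grid:
// 6 + 6 + 4 + 3 + 5 + 3.2 + 2 = 29.2, comfortably in the hard range.
const s = cognitiveScore({holes: 6, folds: 3, offCenter: 1,
  mixedAxes: 1, punches: 2, spread: 0.8, gridBonus: 1});
```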

Generating puzzles that feel right

Every puzzle is generated deterministically from the date. Today's hard puzzle comes from hashing the string UNFOLD_HARD_2026-03-25. Same date, same seed, same puzzle, no matter what device you're on.

But random generation doesn't always produce good puzzles. Sometimes a “hard” puzzle rolls lucky folds and ends up feeling easier than the medium. Sometimes the hole count is boring. So there's a validation step after generation.

After generating all three puzzles for the day, the engine checks that the cognitive scores are properly ordered: hard > medium > easy. If something is out of order, it regenerates the offending puzzle with an offset seed: UNFOLD_HARD_2026-03-25_v1, then _v2, up to 10 retries per difficulty.

The rerolling itself is deterministic. The version suffix is part of the seed, so the search for a good puzzle follows the exact same path every time. No server, no database. Just a hash function and a PRNG.

// The full generation pipeline
const seed = hash("UNFOLD_HARD_2026-03-25");
const rng  = new SeededRandom(seed);

let puzzle = generate(rng, "hard");
let score  = cognitiveScore(puzzle);

// Reroll while the curve is wrong: hard must score strictly above
// medium (mediumScore comes from the day's already-generated medium)
for (let v = 1; score <= mediumScore && v <= 10; v++) {
  const offsetSeed = hash("UNFOLD_HARD_2026-03-25_v" + v);
  puzzle = generate(new SeededRandom(offsetSeed), "hard");
  score  = cognitiveScore(puzzle);
}

Hard puzzles also have structural constraints baked in. The engine guarantees at least one off-center fold (to break symmetry) and folds on both axes (to require 2D reasoning). If the random generation doesn't satisfy these, it mutates the folds, flipping an axis or shifting a position, then revalidates.
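The mixed-axes constraint is the simplest of these mutations to illustrate. Below is a hedged sketch under assumed data shapes: folds as `{axis, position}` objects and the function name `forceMixedAxes` are inventions for this example, not the engine's real API.

```javascript
// If every randomly generated fold landed on the same axis, flip the
// last one so the puzzle is guaranteed to need 2D spatial reasoning.
// On a square grid the fold position stays valid after flipping.
function forceMixedAxes(folds) {
  const axes = new Set(folds.map(f => f.axis));
  if (folds.length > 1 && axes.size === 1) {
    const last = folds[folds.length - 1];
    last.axis = last.axis === "horizontal" ? "vertical" : "horizontal";
  }
  return folds;
}

// Three horizontal folds come back with the last one flipped vertical.
const folds = forceMixedAxes([
  {axis: "horizontal", position: 2},
  {axis: "horizontal", position: 1},
  {axis: "horizontal", position: 0},
]);
```

Because the mutation is a pure function of the generated folds, it preserves determinism: the same seed always produces the same mutated puzzle.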

Level    Grid   Folds   Punches   Off-center   Mixed axes   Score range
Easy     4x4    1       1         no           n/a          ≤ 13
Medium   6x6    2       1         possible     possible     13 - 22
Hard     6x6    3       2         forced       forced       ≥ 22

Where this falls apart

The weights are hand-tuned. There's no regression, no training data, no formal validation. The spatial cognition literature gave me a starting direction (asymmetry and mixed-axis transformations seem disproportionately hard), and playtesting gave me the specific numbers.

There are definitely puzzles the model scores as “medium” that players find brutal. This usually happens when the fold sequence produces a hole pattern that doesn't match any simple mental template. Template-matching (when you recognize a pattern and skip the reasoning entirely) is a cognitive strategy the model doesn't capture at all.

It also ignores practice effects. Someone who has played 50 days builds fold-specific intuitions that a new player doesn't have. What feels hard on day 1 might be routine by day 30. You could build an adaptive system that adjusts to individual skill, but that breaks the “everyone gets the same puzzle” property that makes daily games fun to share.

With enough data (thousands of solves with timing), you could train a real model. Maybe a regression on solve time, or something that predicts error patterns per fold configuration. For now the hand-tuned heuristic gets the job done. The difficulty curve feels smooth for most players, and that's what matters.

One PRNG, no server

The whole thing runs client-side. No server generates puzzles, no database stores them. A date string gets hashed into a 32-bit integer, which seeds a mulberry32 PRNG. It's tiny (8 lines of JavaScript), fast, and has good distribution. Every random decision during generation (axis choice, fold position, punch location) comes from this one sequence.
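The seeding scheme fits in a few lines. mulberry32 below is the well-known public-domain PRNG the post names; the string hash is an FNV-1a-style mix included to make the sketch self-contained and is an assumption, not necessarily the game's exact hash function.

```javascript
// Hash a date string down to a 32-bit unsigned integer seed.
function hashString(str) {
  let h = 2166136261 >>> 0;        // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619);    // FNV prime
  }
  return h >>> 0;
}

// mulberry32: tiny, fast PRNG with good distribution. Each call
// advances the 32-bit state and returns a float in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Same date string → same seed → identical random sequence everywhere.
const rng = mulberry32(hashString("UNFOLD_HARD_2026-03-25"));
const first = rng();
```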

The result is a fully deterministic system. Same date, same puzzles, every device, every browser, no network request. The rerolling mechanism works because appending _v1 to the seed string doesn't retry randomly. It follows a specific, reproducible alternative path through the generation space.

Further reading

If spatial cognition interests you:

  • Ekstrom et al. (1976) Kit of Factor-Referenced Cognitive Tests. The VZ-2 “Paper Folding Test” is essentially this game. It's the direct ancestor and the reason I knew the task was a real measure of spatial ability.
  • Shepard & Metzler (1971) Mental Rotation of Three-Dimensional Objects. Not about paper folding specifically, but the foundational work on mental spatial transformations. Their finding that rotation time scales linearly with angle is what got me thinking about fold count as a linear cost factor.

★★★

See if the difficulty curve matches your experience

Three new puzzles every day. Easy, medium, hard.

Play Daily Unfold