In Praise of Stupid Questions

Original link: https://mathenchant.wordpress.com/2026/03/12/in-praise-of-stupid-questions/

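This essay explores the author’s habit of asking “stupid” questions (questions that are poorly worded, based on false assumptions, or simply unproductive) and how one such question unexpectedly led to a new mathematical insight. Although the author tells students that there are no stupid questions in order to encourage open-ended inquiry, he admits that some questions are better than others, recalling past instances of overthinking and of bothering others with ill-considered questions. Bored during physical therapy, the author used ChatGPT to explore probability, specifically the expected time until the number of heads exceeds the number of tails when tossing a coin. A flawed initial question prompted ChatGPT to offer a clarifying explanation, which eventually led to a productive line of inquiry. This resulted in the discovery of a new way to estimate π (via π/4) by tossing coins, which was confirmed to be previously unpublished. Although the method is inefficient, it underscores the value of pursuing even seemingly absurd questions. The author stresses that, with the help of tools like ChatGPT, embracing potential stupidity can open new avenues for learning and discovery, echoing Epictetus’s advice to be content to appear foolish in the pursuit of improvement. Ultimately, the essay celebrates curiosity and the power of digging into bad questions to uncover better ones.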

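This Hacker News discussion highlights the value of AI, particularly ChatGPT and Claude, in answering “stupid questions”: basic questions that people hesitate to ask in public. User ibobev praised AI’s impartiality and unlimited patience, contrasting it with the increasingly strict moderation on platforms such as Reddit’s Ask Philosophy and Ask Historians, where such questions are often removed. Another user shared a tricky probability puzzle (“toss a coin until the number of heads exceeds the number of tails”) and showed Claude’s accurate and surprisingly fast solution (the probability that the stopping time is even is zero). This demonstrates that AI can not only provide simple answers but also handle subtle, complex problems. The author now uses AI regularly, even for automated Q&A summaries for learning purposes. Ultimately, the piece praises AI as a major improvement for exploratory learning and uninhibited questioning.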

Original text

Ask a silly question, get a silly answer. — Tom Lehrer, “New Math”

I ask too many questions. A case in point is the time I lost out on a place I wanted to rent when I asked my potential future landlord one question too many (“Does the pond have mosquitos in the summer?”). Another example is the time I ticked off a car salesman by asking him, after a long string of similar requests that he had gamely complied with, whether I could try changing the tires of a car I was considering buying from his dealership before I bought it. (I mean, shouldn’t every responsible consumer go through the entire owner’s manual when contemplating such a major purchase?) As a member of a local singing group, I developed such a reputation for asking the music director questions that when we went on a tour, one of my fellow singers got a laugh by interrupting the tour-guide with my signature line: “I have a question!”

But my topic today isn’t questions in general. I want to focus on the species of question that I often tell students doesn’t exist: the Stupid Question. And I want to talk about how one stupid question led me to an interesting and new (albeit a bit stupid) way to estimate the mathematical constant pi.

I tell my students “There are no stupid questions” because I want them to feel free to ask me things in the classroom without fear of ridicule from their peers, and without the kind of internalized shame that can disconnect students from their math abilities. But I’m lying, or at least over-simplifying, when I tell my students that stupid questions don’t exist; the students (and you) all know that some questions are better than others. Some questions are based on incorrect assumptions, or are ambiguous, or are even meaningless. Back when I was in high school, one of my teachers complained I asked too many meaningless questions. You may be inclined to give my younger self the benefit of the doubt and to guess that the teacher wasn’t equipped to understand my questions, but I don’t think so; the class in question was part of an advanced six-week summer program called the Hampshire College Summer Studies in Mathematics program, and the teacher in question had a Ph.D. I don’t remember what questions I asked in that class, but I’m sure some of them were obscure, confused, or yes, meaningless.

I still sometimes ask questions that don’t make literal sense. And that’s okay, because I find that in my research, murky questions can be stepping stones on the path of learning. Sometimes it’s even where new math comes from. There’s an old saying “Ask a silly question, get a silly answer,” but I say: Scratch a silly question and you might find a better one struggling to get out.

One issue for me is how much scratching I have to do before sharing a question with others, and sometimes I annoy people by not doing enough scratching in advance. Often it’s because I don’t take enough time to think about my audience, and that’s my bad, but sometimes it’s the classic problem in communicating ideas: you don’t quite know how to share an idea with people-who-aren’t-you because you aren’t a person-who-isn’t-you.

Fortunately these days I have a very patient interlocutor named ChatGPT blessed with an infinite tolerance for half-baked questions and a soothing lack of judgmentality. The Greek philosopher Epictetus said “If you want to improve, be content to be thought foolish and stupid,” but the problem with putting this into action has always been that, while most people want to improve, nobody wants to reveal their ignorance. How lucky we are, twenty centuries after Epictetus, that we can hide our ignorance from our fellow humans and reveal it only to our creations!

Over the past few years, more and more of my research has benefited from conversations with ChatGPT, despite the occasional stupidity of my questions, and there’s no better example of this than my recent discovery of a new way to think about the number π/4.

PT, CHATGPT, AND PROBABILITY

One day last November I was at my local gym doing physical therapy, an activity I find tiresome because my regimen requires just enough of my mind to make it impossible for me to do any prolonged thinking. But a boring exercise routine is well-suited to holding hour-long conversations with an interlocutor that doesn’t usually respond right away. So for instance I can do a set of leg-lifts, do some mathematical day-dreaming during my pause between sets, do another set, dictate some mathematical thoughts to ChatGPT, do another set, and then see what ChatGPT came up with. “PT plus AI” takes more time than plain old PT, but it makes the time pass in a more interesting way, with regular infusions of suspense.

I like random processes, so I thought I’d learn something new in that area by picking a topic in the theory of probability, coming up with the simplest question on that topic that I didn’t know the answer to, and then asking it. The topic I chose was the tension between two facts well-known to probabilists: (1) if you repeatedly toss a coin, you can be certain that after some finite amount of time, the total number of tosses that came up heads will exceed the total number of tosses that came up tails, but (2) the amount of time it takes for this to happen, while always finite, is infinite on average.[1]

If that sounds like nonsense, it’s because you’re used to the world of random variables with thin tails and finite expected value, and unfamiliar with the strange world of random variables with fat tails and infinite expected value.[2] Maybe someday I’ll write a Mathematical Enchantments essay about the paradoxes of fat-tailed random variables, but today I’m writing about questions, and “What is the expected amount of time it takes until the number of heads exceeds the number of tails?” wasn’t the question that I asked on that November day, because I already knew the answer to that one: infinity. I wanted to learn something new about this story, so I asked ChatGPT:

Toss a fair coin until the number of heads exceeds the number of tails. This determines a stopping time. What is the probability that this stopping time is even?

Speaking of stopping times, this is a good time for you to stop reading and do something I should’ve done but failed to do: play around with the question on your own for a minute or so to get a feeling for what’s being asked.

Do you see what’s wrong with my question?

.

.

.

It’s not hard to show that the number of tosses required until the number of heads first exceeds the number of tails is always odd, so the probability of the stopping time being even is zero! You might say my question is an impossibility question masquerading as a probability question.[3]

To see why it’s always an odd number, let $H_n$ and $T_n$ represent the number of heads and the number of tails respectively in the first $n$ tosses, so that $H_n + T_n = n$. The rule “Stop when $H_n > T_n$ for the first time” is equivalent to the rule “Stop when $H_n - T_n$ is positive for the first time”, but $H_n - T_n$ is always a whole number, and it always changes by $\pm 1$ when you toss ($+1$ each time you toss heads, $-1$ each time you toss tails), so an equivalent rule is “Stop when $H_n - T_n = 1$ for the first time.” Adding the equations $H_n + T_n = n$ and $H_n - T_n = 1$ gives $2H_n = n+1$. Since $2H_n$ is even, $n+1$ must be even, so $n$ must be odd when $H_n > T_n$ for the first time.

Even without doing the algebra, I could’ve caught my mistake if I’d just done a few examples. Then I would’ve seen that the stopping time can be 1, 3, 5, 7, etc. but never 2, 4, 6, 8, etc.
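The “few examples” check can also be done by brute force. Here is a minimal Python sketch (the function names are made up for illustration) that runs through every possible sequence of nine tosses and records each stopping time that actually occurs:

```python
from itertools import product

def stopping_time(tosses):
    """Return the toss count at which heads first exceed tails, or None if never."""
    lead = 0  # heads minus tails so far
    for n, toss in enumerate(tosses, start=1):
        lead += 1 if toss == "H" else -1
        if lead > 0:
            return n
    return None

def reachable_stopping_times(max_tosses):
    """Collect every stopping time seen among all sequences of max_tosses tosses."""
    times = set()
    for tosses in product("HT", repeat=max_tosses):
        t = stopping_time(tosses)
        if t is not None:
            times.add(t)
    return sorted(times)

print(reachable_stopping_times(9))  # prints [1, 3, 5, 7, 9]
```

Only odd numbers show up, as the parity argument predicts. This is an exhaustive check for short sequences only, not a proof.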

A human interlocutor pointing out my oversight, if they were in a bad mood, might have said “Why are you bothering me with questions you clearly don’t really care about, because if you did care, you would’ve spent TEN LOUSY SECONDS thinking about the meaning of what you’re asking? Is my time really worth less than yours? You may enjoy asking questions for its own sake, but I like thinking about GOOD questions, so: That’s it. I’m not answering any more of your questions. You and I are DONE.”

Luckily for me, ChatGPT is always eager—perhaps too eager—to keep conversations going. In this case it said:

There’s an ambiguity in the statement that we should resolve before computing anything:

and offered two interpretations it called A and B, asking me to pick one. Nary a word of reproach for asking a question that had two or three different interpretations and was trivial under all of them.

NOT STOPPING

How do you follow up on a blunder like that? I could’ve decided to stop conversing with ChatGPT and listen to a podcast instead, or maybe go into a different line of work, but I thought that if I asked a better question, I’d learn something new and non-trivial.

I suppose I could’ve asked “What is the probability that the stopping time is a multiple of 3?” and I would’ve learned something new to me; according to ChatGPT (when I asked it last week, not last November), the answer is exactly (1+√3−√2)/6. That’s kind of fun, and it leads someplace interesting, but it isn’t what I asked back in late 2025.

What I actually did was confirm that I want the coin-tossing process to stop when the number of heads first strictly exceeds the number of tails, and then I asked: if we stop tossing as soon as the proportion of heads becomes bigger than 1/2, what is the expected value of that proportion?

ChatGPT thought for just under 3 minutes and answered:

Nice follow-up. 🙂 Under interpretation B (stop the first time $H_n > T_n$), the expected proportion of heads at the stopping time turns out to be a very pretty constant: $E[H_\tau/\tau] = \pi/4$. Here’s a brisk derivation.

If you’d like, we can also look at higher moments (e.g. the variance of that proportion) or generalize to a biased coin.

Three features of ChatGPT’s response deserve attention. One is its use of friendliness and flattery; another is its tendency to anticipate a human’s next question and answer it; and a third is its habit of suggesting ways to keep the conversation going. Since November I’ve admonished ChatGPT to keep its responses businesslike and not to go down avenues I haven’t asked it to. I recognize the addictive potential of parasocial agents like ChatGPT and I’m determined to avoid that hazard.

But the main thing that struck me about its response on that day was that the answer π/4 is, just as ChatGPT said, very pretty—indeed, suspiciously pretty. (Compare π/4 with the (1 + √3− √2)/6 I mentioned above.) I wondered: if the answer is so pretty, wouldn’t I have already heard about it?

I asked ChatGPT to check the published literature on random walk theory[4] (a branch of math that features many problems like this), and it said that, while related formulas existed in books and journal articles, nobody had actually asked my question before, as far as it could tell.

I went ahead and read ChatGPT’s derivation of the answer π/4, and I couldn’t find any mistakes; it was a solid by-the-book argument that employed a method I’ve used myself, and have even taught to students in the past. It was the kind of thing that I could’ve done in an afternoon, but not in three minutes.
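For readers who would like a numerical sanity check rather than a derivation, here is a hedged Python sketch. It is not ChatGPT’s argument or the proof in my write-up. It assumes the standard fact that the stopping time equals 2k+1 with probability C_k/2^(2k+1), where C_k is the k-th Catalan number (this matches the probabilities 1/2, 1/8, 2/32, 5/128 in endnote #1), and it uses the fact that the proportion of heads at that moment is (k+1)/(2k+1). The function name is invented for illustration.

```python
import math

def bracket_expected_proportion(num_terms):
    """Return (lower, upper) bounds on the expected proportion of heads at stopping."""
    p = 0.5        # P(stopping time = 2k+1), starting at k = 0
    partial = 0.0  # sum of p * (k+1)/(2k+1) over the terms handled so far
    mass = 0.0     # total probability handled so far
    for k in range(num_terms):
        partial += p * (k + 1) / (2 * k + 1)
        mass += p
        p *= (2 * k + 1) / (2 * (k + 2))  # ratio of consecutive probabilities
    remaining = 1.0 - mass  # probability that the stopping time is later still
    # For k >= num_terms the proportion lies above 1/2 and is at most
    # (num_terms + 1) / (2 * num_terms + 1), which brackets the omitted tail.
    lower = partial + remaining * 0.5
    upper = partial + remaining * (num_terms + 1) / (2 * num_terms + 1)
    return lower, upper

lower, upper = bracket_expected_proportion(2000)
print(lower, math.pi / 4, upper)
```

The bracket is used because the plain partial sums converge slowly; the leftover probability mass shrinks only like one over the square root of the number of terms.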

So I dug deeper and started showing the result to people, sometimes revealing the formula and sometimes asking “What’s the expected value?”, and while many people solved it or had ChatGPT solve it for them, nobody could recall having seen it before.

That’s when I realized I had something worth sharing, even if it was just a morsel and not a mathematical meal, and I wrote it up for publication and sent it to the American Mathematical Monthly.[5]

YOU CAN’T SPELL “STUPID” WITHOUT “PI”

Approximating pi to a few decimal places is a pointless thing to do, since we humans already know pi to gazillions of digits, and since only the first dozen or so are meaningful in the real world. But I say: if you’re going to do something pointless, you might as well do it in a fun way.

The usual way to waste one’s time approximating pi is called the Buffon needle experiment, invented by the 18th century French scientist Georges-Louis Leclerc, Comte de Buffon, who founded the branch of mathematics called geometric probability theory. Leclerc showed that if you drop a needle of length L on a floor that’s divided into slats of width L, then the probability that the needle lies across a line separating two slats is 2/π. Later the Swiss astronomer Rudolph Wolf did an empirical test of Leclerc’s theorem by performing 5000 trials with a needle four-fifths as long as the slats were wide (which lowers the crossing probability to four-fifths of 2/π), 2532 of which resulted in the needle crossing a line, yielding the decent estimate π≈3.1596.

The occurrence of pi in Leclerc’s formula is not mysterious; it has its roots in the fact that the underlying probability distribution on the orientation of the needle must be rotationally symmetric, and once you start rotating things, pi has a natural tendency to pop up. The occurrence of pi in my coin-tossing formula has more obscure roots, and if you’re hoping I’ll provide an intuitive explanation, you’re out of luck. The baffling way pi creeps into statistics is the basis of an anecdote that appears at the start of Eugene Wigner’s famous essay The Unreasonable Effectiveness of Mathematics in the Natural Sciences:

There is a story about two friends, who were classmates in high school, talking about their jobs. One of them became a statistician and was working on population trends. He showed a reprint to his former classmate. The reprint started, as usual, with the Gaussian distribution and the statistician explained to his former classmate the meaning of the symbols for the actual population, for the average population, and so on. His classmate was a bit incredulous and was not quite sure whether the statistician was pulling his leg. “How can you know that?” was his query. “And what is this symbol here?” “Oh,” said the statistician, “this is pi.” “What is that?” “The ratio of the circumference of the circle to its diameter.” “Well, now you are pushing your joke too far,” said the classmate, “surely the population has nothing to do with the circumference of the circle.”

But even if the reason pi occurs is hard to explain, occur it does, and that means one could use the coin-toss procedure to estimate π by way of π/4, just as Leclerc’s experiment estimates π by way of 2/π. Hand out ten coins to ten students, have each student toss their coin until the number of heads they’ve seen is bigger than the number of tails, and have them record the fraction of their tosses that showed heads. If you average all the students’ fractions, you should get a rough approximation to π/4, right?

Mathematically, yes; practically, not really. The problem is that there’s a fifty percent chance that one of the ten students will have to do more than a hundred tosses and might give up in frustration. Increasing the number of students makes this problem only worse: if you tried this activity with a hundred students, say, there’s a good chance that one of them would have to do over ten thousand tosses. And who’d be willing to toss a coin that many times?
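These odds can be checked directly. The sketch below again assumes the standard distribution of the stopping time (probability C_k/2^(2k+1) of stopping at toss 2k+1, with C_k the k-th Catalan number) and assumes the students toss independently. The function names are hypothetical.

```python
def prob_exceeds(max_tosses):
    """Probability that one student needs more than max_tosses tosses."""
    p = 0.5        # P(stopping time = 2k+1), starting at k = 0
    stopped = 0.0  # probability of having stopped within max_tosses tosses
    k = 0
    while 2 * k + 1 <= max_tosses:
        stopped += p
        p *= (2 * k + 1) / (2 * (k + 2))  # ratio of consecutive probabilities
        k += 1
    return 1.0 - stopped

def prob_someone_exceeds(num_students, max_tosses):
    """Probability that at least one of num_students independent students does."""
    return 1.0 - (1.0 - prob_exceeds(max_tosses)) ** num_students

print(prob_someone_exceeds(10, 100))
print(prob_someone_exceeds(100, 10_000))
```

Both values come out a little above one half, in line with the rough figures quoted above.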

Mathematician and YouTuber Matt Parker would, and he did. And then he wondered what to do with all those coin-flips.

One of Parker’s passions is computing pi, as he discussed in his recent Gathering 4 Gardner talk “Update on Ridiculous Calculations of Pi” (I’ll post a link to the video of his talk when it becomes available). So when he heard about my new way of estimating pi, he had the idea of using his 10,000 coin flips to simulate a classroom in which the first student does the experiment using the first segment of Parker’s sequence, the second student uses the coin flips that come right after the flips the first student used, the third student uses the coin flips that come right after the flips the second student used, and so forth, until some unlucky student runs out of flips from Parker’s sequence. As things turned out, this unlucky student was the 63rd in the imaginary class, so Parker’s experimental estimate of pi/4 via coin-tossing ended up averaging just 62 fractions.
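The bookkeeping of that imaginary classroom is easy to sketch in Python. This is only an illustration of the procedure as described above, not Parker’s actual code, and it uses pseudo-random flips with an arbitrary seed in place of his recorded sequence. The function name `classroom_fractions` is made up.

```python
import random

def classroom_fractions(flips):
    """Split one long flip sequence among successive students.

    Each student uses flips until their heads exceed their tails, then records
    heads/tosses. A final student who runs out of flips is dropped.
    """
    fractions = []
    heads = tosses = 0
    for is_heads in flips:
        tosses += 1
        if is_heads:
            heads += 1
        if heads > tosses - heads:  # heads now exceed tails: this student is done
            fractions.append(heads / tosses)
            heads = tosses = 0      # the next student starts fresh
    return fractions

rng = random.Random(2026)  # arbitrary seed, only to make the run repeatable
flips = [rng.random() < 0.5 for _ in range(10_000)]
fractions = classroom_fractions(flips)
if fractions:
    estimate = 4 * sum(fractions) / len(fractions)
    print(len(fractions), "students finished; estimate of pi:", estimate)
```

How many students finish, and how close the estimate lands, varies a great deal from one sequence to the next, since a single unlucky student can swallow most of the flips.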

Here’s Matt Parker’s new video in which he tells his story:

The resulting estimate of pi (around 3.2) gives us pi to only one decimal place, and hence may set a record for minimal bang per maximal buck, where “bang” means precision and “buck” means effort. But this pathetic performance is pretty much what the theory of probability predicts, namely, that if you want the first $N$ digits of pi you’ll need to perform $10^{4N}$ coin-tosses. Parker’s experiment shows us this performance-level in the case $N=1$.

So, my method of estimating pi is a really bad way to get anything better than π ≈ 3. Leclerc’s needles are a bit better (10,000 needles should give you two digits of pi), but leaving aside the accuracy issue, I’d rather spend ten minutes tossing a coin than spend ten minutes dropping needles, especially since someone is going to have to pick up all those needles, and I guess it’s going to have to be me. And I might cut myself on one of them! Perhaps I should ballyhoo my approach to estimating pi as a contribution to the cause of Pi Day safety, and point out that my approach is free from the well-known hazards of shared needles.

THE MORAL(S)

I suppose that, on a practical level, a take-home for the practicing mathematician is that if you use ChatGPT, don’t trust it to generate valid proofs, and even when it finds a valid proof, don’t be so sure it’s a good proof. And whatever you do, don’t have ChatGPT create a bibliography for you.

But I think a deeper lesson is about the value of stupid questions. The way to find things out is to ask a lot of questions. Ask enough questions, and you’re likely to find a new answer: new to you, and once in a while, new to others. On the other hand, if you ask a whole lot of questions, some of them will be stupid. And that’s okay! We teachers need to be patient with our students, even if no teacher can ever hope to be as unfailingly patient as a Large Language Model.

The relationship between bad questions and good questions reminds me of an old story about brainstorming told by David Black in his essay Being Creative With a Bear and Honey. A team was trying to figure out a good way to get snow off power lines in winter, and someone facetiously suggested that they put pots of honey at the tops of the posts so that the local bears, climbing up the posts to get at the honey, would shake the posts and cause the power lines between them to shed their snow. Instead of abandoning the absurd idea, the brainstorming team discussed how you’d need to use helicopters when placing the honey pots atop the posts, and only later did someone point out that the downwash from the helicopter blades would do the job of getting the snow off the power lines, so the bears could stay home. This is one of my favorite examples of how a bad idea can lead to a good idea, as long as you don’t stop.

Likewise, if you’ve got some sort of vague itch that causes you to ask a stupid question, don’t neglect the itch just because its initial expression was stupid. Follow that itch, and scratch that question! You may end up with a much better question.[6]

To join the Hacker News discussion of this article, visit
https://news.ycombinator.com/item?id=47356740

ENDNOTES

#1: It’s instructive to compare a protocol that has infinite expected stopping time with a protocol that has finite expected stopping time. An example of a protocol that has finite expected stopping time is “Toss until the first time the coin comes up heads”; you can write the expected stopping time as

(1/2) (1 toss) + (1/4) (2 tosses) + (1/8) (3 tosses) + (1/16) (4 tosses) + …,

because half of the time you stop after one toss, a quarter of the time you stop after two tosses, an eighth of the time you stop after three tosses, and so on; and

(1/2)(1) + (1/4)(2) + (1/8)(3) + (1/16)(4) + … = 2,

so on average the stopping time is 2. In contrast, we’ve been dealing with the protocol “Toss until the number of heads exceeds the number of tails”; in this case you can write the expected stopping time as

(1/2) (1 toss) + (1/8) (3 tosses) + (2/32) (5 tosses) + (5/128) (7 tosses) + …

Although the terms are getting smaller, they don’t get small very quickly, and the infinite sum diverges.
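The contrast between the two sums is easy to see numerically. The sketch below is only an illustration: it assumes that the pattern of coefficients in the second sum continues as C_k/2^(2k+1) for the stopping time 2k+1, with C_k the k-th Catalan number (the terms 1/2, 1/8, 2/32, 5/128 shown above fit this pattern). The function names are invented.

```python
def first_heads_partial_sum(num_terms):
    """Partial sum of (1/2)(1) + (1/4)(2) + (1/8)(3) + ..."""
    return sum(n / 2**n for n in range(1, num_terms + 1))

def first_lead_partial_sum(num_terms):
    """Partial sum of (1/2)(1) + (1/8)(3) + (2/32)(5) + (5/128)(7) + ..."""
    p = 0.5      # probability of stopping at toss 2k+1, starting at k = 0
    total = 0.0
    for k in range(num_terms):
        total += (2 * k + 1) * p
        p *= (2 * k + 1) / (2 * (k + 2))  # ratio of consecutive probabilities
    return total

for num_terms in (10, 100, 10_000, 40_000):
    print(num_terms, first_heads_partial_sum(min(num_terms, 60)),
          first_lead_partial_sum(num_terms))
```

The first column of sums settles at 2 almost immediately, while the second keeps climbing, roughly doubling each time the number of terms is quadrupled.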

#2: This is the world of the St. Petersburg paradox (described in Jordan Ellenberg’s book How Not to Be Wrong) and the peculiar properties of the double-down-until-you-win gambling strategy (also called martingale betting).

#3: What I actually asked ChatGPT (though without the added emphasis) was: “Toss a fair coin until the number of heads equals or exceeds the number of tails. This determines a stopping time. What is the probability that this stopping time is even?” The insertion of “equals or” doesn’t impact the triviality of the question, but it changes the answer: now, instead of being always odd, the stopping time is always even! What’s more, the question is ambiguous: am I allowed to stop before I toss the coin at all, since at that moment the number of heads (zero) equals-or-exceeds the number of tails (also zero)? ChatGPT pointed out the ambiguity, and also pointed out that under both interpretations, the probability that I stop after an even number of tosses is 100%, aka 1.

#4: To see the connection between coin-tossing and random walk, imagine a drunkard walking along an east-west residential street who, whenever he arrives in front of a house, either proceeds to the next house to the east or the next house to the west, apparently choosing at random. The mathematics of the drunkard’s walk is identical to the mathematics of tossing coins, where the position of the drunkard at time $n$ (assuming he starts at “house 0” at time 0) corresponds to the difference $H_n - T_n$. Every time you toss a coin, the cumulative number of heads minus the cumulative number of tails either goes up by 1 (when the coin comes up heads) or goes down by 1 (when the coin comes up tails), and there’s no way to predict which way it will go, just as there’s no way to predict, when the drunkard is at a particular house, whether his next stop will be the house to its east or the house to its west.

#5: Here’s the part of the story I’m embarrassed about: instead of trying to find my own derivation, I went ahead and used ChatGPT’s derivation, lightly edited by me. This kept me from noticing that ChatGPT’s proof, while correct, was needlessly complicated. Fortunately other people found the proof that I suspect is the sweetest possible proof, or what Paul Erdős would have called the “proof from The Book” (for more about The Book, see my essays What Proof is Best? and Chess with the Devil); this short and sweet proof is the one that I give in the revised version of my write-up. I also trusted, and initially included, ChatGPT’s list of purportedly relevant references, most of which turned out to be either irrelevant or nonexistent. I won’t make that mistake again.

#6: An example of a truly excellent question (not mine, I hasten to say) is the question one of the referees for my submission to the Monthly proposed: “Why not also a short comment at least on the effect of a surplus of 2, for instance?” That is, what if we toss the coin even longer, and only stop when the number of heads is equal to 2 more than the number of tails? It turns out that for this modified version of my question, the expected proportion of heads at the stopping time is a different nice number: the natural logarithm of 2, aka ln 2! Better yet, if you stop tossing coins when the number of heads is equal to m more than the number of tails, for some arbitrary positive integer m, then it appears that whenever m is odd, the expected proportion of heads at the stopping time is of the form a + b π with a,b rational, and that whenever m is even, the expected proportion of heads at the stopping time is of the form a + b ln 2 with a,b rational. Or so ChatGPT tells me. (To be fair, it gives proofs; I just haven’t had time to read them.) ChatGPT also says that, if instead of looking at the ratio of heads-to-tosses, we look at the ratio of tails-to-heads, we get expected value 1 – ln 2. If instead we look at the ratio of heads-to-tails, we encounter the troublesome ratio 1/0 in the case where our first toss is heads, but: if we condition on the event that the first toss is tails, then the conditional expected value of the ratio of heads to tails is ln 2. Saith ChatGPT.
