(comments)

Original link: https://news.ycombinator.com/item?id=40961163

This discussion concerns the complexity of physically modeling reality, with particular focus on the problem of parameters in theoretical models. Researchers aim to create accurate representations with as few parameters as possible, rather than tuning parameters until they agree with experimental results. A common criticism is that when a model contains many parameters, the validity of the resulting representation becomes uncertain. Dark matter theory is cited as an example, invoking as-yet-undetected particles to explain observed galactic rotation rates. However, commenters argue that such theories remain falsifiable through tests of their predictions, particularly via gravitational lensing. Modified gravity theories are presented as an alternative explanation, proposing changes to the equations of gravity rather than introducing additional particles. While such theories are attractive for their simplicity, their ability to accurately predict phenomena remains contested.

The thread also touches on machine learning (ML) and artificial intelligence (AI), suggesting parallels between the fields: both rely heavily on optimization processes, seeking the best solution while minimizing parameters. Commenters question the utility of such methods, arguing that the assumption of "real-world applicability" is often overlooked by researchers. Criticism of ML algorithms centers on their ability to fit experimental data effectively without providing insight into the underlying mechanisms. These concerns mirror those about dark matter theory, underscoring the importance of building robust models grounded in fundamental physical principles before drawing conclusions.

Finally, commenters discuss the challenge of defining intelligence and the danger of imposing human-centric constraints on it. Some assert that intelligence arises from patterns of collective experience, citing examples such as the behavior of a screw to illustrate the point. But if everything, including inanimate objects, counts as inherently intelligent, the concept loses its meaning, leaving us in need of a precise definition of intelligence that captures the meaningful distinction between human intelligence and sophisticated machine learning applications.



I love the ironic side of the article. Perhaps they should add the reason for it, from Fermi and von Neumann. When you are building a model of reality in physics, if something doesn't fit the experiments, you can't just add a parameter (or more), vary it, and fit the data. Ideally the model should have zero parameters, or the fewest possible; or, at an even deeper level, the parameters should emerge naturally from some simple assumptions. With four parameters you don't know whether you are really capturing a true aspect of reality or just fitting the data of some experiment.
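Von Neumann's point can be made concrete with a toy fit. A minimal sketch (illustrative numbers only, not from the paper): a degree-7 polynomial through eight noisy samples fits them essentially exactly, yet that says nothing about whether the model captures the underlying law.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, x.size)  # a noisy "experiment"

residuals = {}
for degree in (1, 4, 7):
    coeffs = np.polyfit(x, y, degree)
    residuals[degree] = np.sum((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: {degree + 1} parameters, residual {residuals[degree]:.2e}")
```

With eight data points, the degree-7 fit interpolates them exactly (residual at machine precision), which is precisely why a near-zero residual proves nothing by itself.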



This was mentioned in the first paragraph of the paper. The paper is mostly humorous.

That said, the wisdom of the quip has been widely lost: in many fields, data is "modeled" with huge regression models with dozens of parameters, or even neural networks with billions of parameters.

> In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”[1]. This quote is intended to tell Dyson that while his model may appear complex and precise, merely increasing the number of parameters to fit the data does not necessarily imply that the model has real physical significance.



> > In 1953, Enrico Fermi criticized Dyson’s model by quoting Johnny von Neumann: “With four parameters I can fit an elephant, and with five I can make him wiggle his trunk.”

For those who are interested, you can watch Freeman Dyson recount this conversation in his own words in an interview: https://youtu.be/hV41QEKiMlM



That's how I feel about dark matter. Oh, this galaxy is slower than this other similar one? The first one must have less dark matter, then.

What can't be fit by declaring that whatever amount of dark matter must be present fits the data? It's unfalsifiable: just because we haven't found it doesn't mean it doesn't exist. Even worse than string/M-theory, which at least has math.



The dark matter theory is falsifiable. Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects, and it has to follow the laws of physics as we understand them today.

It is actually a satisfying theory with regard to Occam's razor. We don't have to change our laws of physics to explain the anomalous rotation of galaxies; we just need "stuff" that we can't see and yet interacts gravitationally. When we already have stuff like neutrinos, it is not that far-fetched. In fact, though unlikely given our current understanding of physics, dark matter could be neutrinos.

If, as it turns out, the invisible stuff we call dark matter doesn't follow the laws of physics as we know them, then the dark matter theory is falsified and we need a new one (or at least some tweaks). And that may actually be the case, as a recent paper claims that gravitational lensing doesn't match the predictions of the dark matter theory.

The main competitor to dark matter is modified gravity, which calls for no new stuff but changes the equations for gravity. By Occam's razor, adding some random term to an equation is not really better than adding some invisible but well-characterized stuff, especially when we consider that the equation in question is extremely well tested. It is, of course, also falsifiable.

The problem right now is not that these theories are unfalsifiable, it is that they are already pretty much falsified in their current form (dark matter less than modified gravity), and some rework is needed.



'dark matter' is not a theory, it is the name of an observational problem.

There are many theories to explain dark matter observations. MOND is not a competitor with 'dark matter', because MOND is a theory and it tries to explain some aspects (spiral galaxy rotation) of what is observed as the dark matter problem, which consists of many more observations. There is no competition here. There are other theories to explain dark matter, like dark matter particle theories involving neutrinos or whatever, and these may be called competitors, but dark matter itself is not a theory, but a problem statement.



Yes and no...MOND's core proposition is that dark matter doesn't exist, and instead modified gravity does.

Whereas you can have many proposals for what dark matter is, provided it is capable of being almost entirely only gravitationally interacting, and there's enough of it.

MOND has had the problem that depending which MOND you're talking about, it still doesn't explain all the dark matter (so now you're pulling free parameters on top of free parameters).



If we add an arbitrary amount of dark matter everywhere to match the observed motions of the celestial bodies, that adds an infinity of parameters, and not even a countable one.

This obviously can match almost anything and it has extremely low predictive power (many future observations may differ from predictions, which can be accounted by some dark matter whose distribution was previously unknown), so it is a much worse explanation than a modified theory of gravity that would have only a finite number of additional parameters.



> to match the observed motions of the celestial bodies

The point is that even with current observational data there's no reasonable distribution of dark matter that correctly explains all evidence that we have.

Your intuition that "if I have an infinite number of degrees of freedom anything at all can be fit" is leading you astray here.



The reason this isn't true is that, by the hypothesis of dark matter, it follows gravity but not electromagnetism. As such it only fits distributions recoverable by evolving gravity. E.g., if we require a certain distribution today, that fixes the distribution at all other points in time, and we can use light-speed delay to look into the past to verify whether the distributions have evolved according to gravity.



All observations of individual galaxies occur at a specific point in time. We can't use light-speed delay to see the evolution of individual galaxies, only completely different galaxies at some other point in time. As such, each galaxy gets its own value for the amount of dark matter.

At minimum this is a ~200 billion parameter model, and more if you’re looking at smaller structures.



> Sure we can't see dark matter (it doesn't interact electromagnetically), but we can see its effects

Even this is granting too much: "seeing it" and "seeing its effects" are the same thing. No one has ever "directly seen", in the sense that internet DM skepticism demands, anything other than a photon.



"Seeing" is indeed a poorly chosen word.

The problem with dark matter is that there does not exist any second relationship from which to verify its existence, like in the case of normal matter, which takes part in a variety of interactions that lead to measurable effects, which can be compared.

The amount and the location of dark matter is computed from the gravitational forces that explain the observed movements of the bodies, but there are no additional relationships with any other data, which could corroborate the computed distribution of dark matter. That is what some people mean by "seeing".



> The problem with dark matter is that there does not exist any second relationship from which to verify its existence.

This is exactly it! Dark matter is strictly defined by its effects. The only 'theory' part is a belief that it's caused by yet to be found particle that's distributed to fit observations. Take all the gravitational anomalies that we can't explain with ordinary matter, then arbitrarily distribute an imaginary 'particle' that solves them: that's DM.

The problem is that the language used to talk about DM is wrong. It's not that DM doesn't interact with EM, or that the presence of DM is causing the galaxies to rotate faster than their observed mass would allow. These are all putting the cart before the horse. What we have is unexplained gravitational effects being attributed to a hypothetical particle. If we discovered a new unexplained gravitational property, we would merely add that to the list of DM's attributes rather than say "oh, then it can't be DM".



All major DM candidates also have multiple interactions: that's the WI in WIMP, for instance. In fact I don't know that anyone is seriously proposing that dark matter is just bare mass with no other properties - aside from the practical problems, that would be a pretty radical departure from the last century of particle physics.



No interactions have been found, despite a lot of resources put into the search. So currently all dark matter particle theories apart from "non-interacting" have been falsified. And non-interacting theories are probably unfalsifiable.

Radical departure may well be needed, for other reasons too.



> By Occam's razor, adding some random term to an equation is not really better than adding some invisible but well-characterized stuff...

You're being too kind. It's worse. Especially when (in my understanding anyway) that added term doesn't even explain all the things dark matter does.



Adding any finite number of parameters is strictly better than adding an infinity of parameters (i.e. an arbitrary distribution of dark matter chosen to match the observations).



The distribution has to be consistent forward and backwards in time. It's a lot less arbitrary than you're implying, and adding a hundred parameters (or similar finite number) to gravity is not better.



I used to think this, but dark matter does make useful predictions, that are hard to explain otherwise.

This is partially because there are two ways to detect dark matter. The first is gravitational lensing. The second is the rotational speed of galaxies. There are some galaxies that need less dark matter to explain their rotational speed. We can then cross-check whether those galaxies cause less gravitational lensing.

Besides that, the gravitational lensing of galaxies being stronger than the bright matter in the galaxies can justify is hard to explain without dark matter.



The problem with dark matter is that there's no (working) theory on how the dark matter is distributed. It's really easy to "explain" gravitational effects if you can postulate extra mass ad-hoc to fit the observations.



I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?). So essentially a galaxy would be full of these particles zipping around never colliding with anything. And over time, some/most of these particles would have stable orbits (as the ones in unstable orbits would have flown off by now) around the galactic core. And to an observer, it would look like a gravitational tractor ahead of the rest of the physical mass of the galaxy (which is slower because it is affected by things like friction and collisions?). And so you'd see galaxies where the arms are spinning faster than they should be?



> I dunno if this is the correct way of thinking about it, but I just imagine it as a particle that has mass but does not interact with other particles (except at big-bang like energy levels?).

Not even anything that extreme. What's ruled out is interaction via electromagnetism (or if you want to get really nit-picky, electromagnetic interaction with a strength above some extremely low threshold).



If there are two different types of observations, and one parameter can explain both, that is pretty strong evidence. Put differently, dark matter is falsifiable, and experiments have tried to falsify it without success.

Besides, the idea "not all mass can be seen optically" is not that surprising. The many theories on what that mass might be are all speculation, but they are treated as such.



It's worth noting that one dark matter explanation is just: it's cold matter we just can't see through telescopes. Or black holes without accretion disks.

Both of these are pretty much ruled out though: you can't plausibly add enough brown dwarfs, and if it's black holes then you should see more lensing events towards nearby stars given how many you'd need.

But they're both concrete predictions which are falsifiable (or boundable such that they can't be the dominant contributors).



> What can't be fit by declaring the amount of dark matter that must be present fits the data?

Tons of things - just like there are tons of things that can't be fit by declaring the amount of electromagnetically-interacting matter that must be present fits the data.

You can fit anything you like by positing new and more complicated laws of physics, but that's not what's going on here. Dark matter is ordinary mass gravitating in an ordinary way: the observed gravitational lensing needs to match up with the rotation curves needs to match up with the velocity distributions of galaxies in clusters; you don't strictly need large scale homogeneity and isotropy but you really really want it, etc. Lambda-CDM doesn't handle everything perfectly (which in itself demonstrates that it's not mindless overfitting) but neither does anything else.



You also have to do other things like not break General Relativity.

Which MOND does: it creates huge problems fitting into GR.

Whereas dark matter as just regular mass that interacts poorly by other means does not.



There are modified gravity theories that are compatible/extensions to GR, e.g the f(R) gravity theories.

Nobody probably believes MOND as such is some fundamental theory, rather as a "theory" it's sort of a stepping stone. Also MOND is used often interchangeably (and confusingly) with modified gravity theories in general.



Well, the funny thing is Copernicus posits just about as many epicycles in his theory as previous geocentric theories. Only Kepler’s discovery of the equal area law and elliptical orbits successfully banishes epicycles.



Hmm..

Hodgkin and Huxley did ground-breaking work on the squid giant axon and modelled neural activity. They had multiple parameters extracted from "curve fitting" of recorded potentials and injected currents, which were much later mapped to sodium channels; similarly, another process was mapped to potassium channels.

I wouldn't worry too much about having multiple parameters -- even four, when three just can't explain the model.



Neuron anatomy is the product of hundreds of millions of years of brute contingency. There are reasons why it can't be certain ways (organisms that were that way [would have] died or failed to reproduce) but no reason whatsoever why it had to be exactly this way. It didn't, there are plenty of other ways that nerves could have worked, this is just the way they actually do.

The physics equivalent is something like eternal inflation as an explanation for apparent fine-tuning - except that even if it's correct it's still absolutely nowhere near as complex or as contingent as biology.



This is why I think that modeling elementary physics is nothing other than fitting data. We might end up with something that we perceive as "simple", or not. But in any case, all the fitting has been hidden in the process of ruling out models. It's just that a lot of the fitting process is (implicitly) done by theorists; we come up with new models that are then falsified.

For example, how many parameters does the Standard Model have? It's not clear what you count as a parameter. Do you count the group structure, and the other mathematical structure that has been "fitted" through decades of comparisons with experiments?



It tends to be a parameter that can be derived from reasoning and assumptions. This contrasts with free parameters, where you say "we have no idea what this value should be, so we'll measure it."



This is humorous (and well-written), but I think it's more than that.

I'm always making the joke (observation) that ML (AI) is just curve-fitting. Whether "just curve-fitting" is enough to produce something "intelligent" is, IMO, currently unanswered, largely due to differing viewpoints on the meaning of "intelligent".

In this case they're demonstrating some very clean, easy-to-understand curve-fitting, but it's really the same process -- come up with a target, optimize over a loss function, and hope that it generalizes, (this one, obviously, does not. But the elephant is cute.)
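That loop (pick a target, define a loss, optimize) can be sketched in a few lines, using a hypothetical one-parameter model rather than any particular ML system:

```python
def fit(xs, ys, lr=0.01, steps=2000):
    """Fit y ~ w * x by gradient descent on the mean squared error."""
    w = 0.0  # the single free parameter
    for _ in range(steps):
        # gradient of mean((w*x - y)^2) with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # step down the loss surface
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by the target relation y = 2x
print(fit(xs, ys))    # converges toward 2.0
```

The billion-parameter case is the same loop with a bigger `w` and a fancier gradient; whether the result generalizes is exactly the open question.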

This raises the question Neumann was asking -- why have so many parameters? Ironically (or maybe just interestingly), we've done a lot with a ton of parameters recently, answering it with "well, with a lot of parameters you can do cool things".



> Whether "just curve fitting" is enough to produce something "intelligent" is, IMO, currently unanswered

Continual "curve fitting" to the real world can create intelligence. What is missing is not something inside the model. It's missing a mechanism to explore, search and expand its experience.

Our current crop of LLMs ride on human experience, they have not largely participated in creating their own experiences. That's why people call it imitation learning or parroting. But once models become more agentic they can start creating useful experiences on their own. AlphaZero did it.



There are a whole bunch of assumptions here. But sure, if you view the world as a closed system, then you have a decision as a function of inputs:

1. The world around you
2. The experiences within you (really, your past view of the world around you)
3. The innateness of you (sure, this could be part of 2, but I think it's also something else)
4. The experiences you find, plus the way you change yourself to impact (1), (2), and (3)

If you think of intelligence as all of these, then you're making the assumption that all that's required for (2), (3), and (4) is "agentic systems", which I think skips a few steps (as the author of an agent framework myself...). All this is to say that "what makes intelligence" is largely unsolved, and nobody really knows, because we actually don't understand this ourselves.



> Continual "curve fitting" to the real world can create intelligence.

I'm going to need a citation on this bold claim. And by that I mean in the same vein as what Carl Sagan would say

  Extraordinary claims require extraordinary evidence


>It's missing a mechanism to explore, search and expand its experience.

Can't we create an agent system which can search the internet and choose what data to train itself with?



You need to define what the utility function of the agent is, so it can know what to actually use to train itself. If we knew that, this whole debate about human intelligence in computers would either be solved already or well on its way to being solved.



Are you launching into a semantic argument about the word 'experience'? If so, it might help to state what essential properties alphago was missing that makes it 'not having an experience'.

Otherwise this can quickly devolve into the common useless semantic discussion.



Just making sure no one is confused by common computationalist sophistry and how they attribute personal characteristics to computers and software. People can have and can create experiences, computers can only execute their programmed instructions.



And I am saying they are confused because they are attributing personal characteristics to computers and software. By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations. If you can explain which sequence of arithmetic operations corresponds to "experiences" in computers then you might be less confused than all the people who keep claiming computers can think and feel.



> By spelling out what computers are doing it becomes very obvious that there is nothing that can be aware of any experiences in computers as it is all simply a sequence of arithmetic operations.

By spelling out what brains are doing it becomes very obvious that it's all simply a sequence of chemical reactions - and yet here we are, having experiences. Software will never have a human experience - but neither will a chimp, or an octopus, or a Zeta-Reticulan.

Mammalian neurons are not the only possible substrate for intelligence; if they're the only possible substrate for consciousness, then the fact that we're conscious is an inexplicable miracle.



If an algorithmic process is an experience and a collection of experiences is intelligence then we get some pretty wild conclusions that I don't think most people would be attempting to claim as it'd make them sound like a lunatic (or a hippy).

Consider the (algorithmic) mechanical process of screwing in a screw into a board. This screw has an "experience" and therefore intelligence. So... The screw is intelligent? Very low intelligence, but intelligent according to this definition.

But we have an even bigger problem. There's the metaset of experiences, that's the collection of several screws (or the screw, board, and screwdriver together). So we now have a meta intelligence! And we have several because there's the different operations on these sets to perform.

You might be okay with this, or maybe you're saying it needs memory. If the latter, you hopefully quickly realize this means a classic computer is intelligent, but due to the many ways information can be stored, it does not solve our above conundrum.

So we must then come to the conclusion that all things AND any set of things have intelligence. Which kinda makes the whole discussion meaningless. Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.



> If an algorithmic process is an experience and a collection of experiences is intelligence

Neither, what I'm saying is that the observable correlates of experience are the observable correlates of intelligence - saying that "humans are X therefore humans are Y, software is X but software is not Y" is special pleading. The most defensible positions here are illusionism about consciousness altogether (humans aren't Y) or a sort of soft panpsychism (X really does imply Y). Personally I favor the latter. Some sort of threshold model where the lights turn on at a certain point seems pretty sketchy to me, but I guess isn't ruled out. But GP, as I understand them, is claiming that biology doesn't even supervene on physics, which is a wild claim.

> Or, we must need a more refined definition of intelligence which more closely reflects what people actually are trying to convey when they use this word.

Well, that's the thing: I don't think people are trying to convey any particular thing. I think they're trying to find some line - any line - which allows them to write off non-animal complex systems as philosophically uninteresting. Same deal as people a hundred years ago trying to find a way to strictly separate humans from nonhuman animals.



This is a common retort. You can read my other comments if you want to understand why you're not really addressing my points because I have already addressed how reductionism does not apply to living organisms but it does apply to computers.



The comments where you demand an instruction set for the brain, or else you'll dismiss any argument saying its actions can be computed? Even after people explained that lots of computers don't even have instruction sets?

And where you decide to assume that non-computable physics happens in the brain based on no evidence?

What a waste of time. You "addressed" it in a completely meaningless way.



One of the hardest parts of training models is avoiding overfitting, so "more parameters are better" should be more like "more parameters are better given you're using those parameters in the right way, which can get hard and complicated".

Also LLMs just straight up do overfit, which makes them function as a database, but a really bad one. So while more parameters might just be better, that feels like a cop-out to the real problem. TBD what scaling issues we hit in the future.



I mean the devil is in the details. In Reinforcement Learning, the target moves! In deep learning, you often do things like early stopping to prevent too much optimization.



There is no such thing as too much optimization. Early stopping is to prevent overfitting to the training set. It's a trick just like most advances in deep learning because the underlying mathematics is fundamentally not suited for creating intelligent agents.



Is over fitting different from 'too much optimization'? Optimization still needs a value that is optimized. Over fitting is the result of too much optimization for not quite the right value (i.e. training error when you want to reduce prediction error)



I think the miscommunication is due to the proxy nature of our modeling. From one perspective, yes, you're right, because it all comes down to your optimization function and objectives. But if we're in a context where we recognize that the practical usage of our model relies on it being an inexact representation (a proxy), then certainly there is such a thing as too much optimization. I mean, most of what we try to model in ML is intractable.

In fact, the entire notion of early stopping is due to this. We use a validation set as a pseudo test set to inject information into our optimization process without leaking information from the test set (which is why you shouldn't choose parameters based on test results; that is spoilage, and it doesn't matter if it's the status quo).

But we also need to consider that a lack of divergence between train/val does not mean there isn't overfitting. Divergence implies overfitting, but the inverse statement is not true. I state this because it's both relevant here and an extremely common mistake.
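For illustration, the early-stopping rule discussed above can be sketched as follows; the validation losses are made-up numbers, and `patience` is a hypothetical knob:

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return the epoch with the best validation loss, stopping once the
    loss has failed to improve for `patience` consecutive epochs.
    (val_losses are precomputed here purely for illustration.)"""
    best, best_epoch, bad = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, bad = loss, epoch, 0
        else:
            bad += 1
            if bad >= patience:
                break  # stop: validation loss has started rising
    return best_epoch

# Validation loss falls, then rises as the model starts to overfit:
print(train_with_early_stopping([1.0, 0.6, 0.4, 0.45, 0.5, 0.7]))  # → 2
```

Note that this only catches the divergence case; per the comment above, a flat train/val gap does not by itself rule out overfitting.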



Most practitioners seem to understand that what they are doing is creating executable models, and they don't confuse a model based on numeric observations with the actual reality. This is why I very much dislike all the AI hype and the rebranding of statistical models as artificial "intelligence": people who are not aware of what the words mean get very confused and start thinking these systems are something more than computers executing algorithms to fit numerical data to some unspecified cognitive model.



> Most practitioners seem to understand that what they are doing is creating executable models and they don't confuse the model based on numeric observations with the actual reality.

I think you're being too optimistic, and I'm a pretty optimistic person. Maybe it is because I work in ML, but I've had to explain to a large number of people this concept. This doesn't matter if it is academia or industry. It is true for both management and coworkers. As far as I can tell, people seem very happy to operate under the assumption that benchmark results are strong indicators of real world performance __without__ the need to consider assumptions of your metrics or data. I've even proven this to a team at a trillion dollar company where I showed a model with lower test set performance had more than double the performance on actual customer data. Response was "cool, but we're training a much larger model on more data, so we're going to use that because it is a bit better than yours." My point was that the problem still exists in that bigger model with more data, but that increased params and data do a better job at hiding the underlying (and solvable!) issues.

In other words, in my experience people are happy to be Freeman Dyson in the conversation Calavar linked[0] and very upset to hear Fermi's critique: being able to fit data doesn't mean shit without either a clear model or a rigorous mathematical basis. Much of data science is happy to just curve fit. But why shouldn't they? You advance your career in the same way, by bureaucrats who understand the context of metrics even less.

I've just experienced too many people who cannot distinguish empirical results from causal models. And a lot of people who passionately insist there is no difference.

[0] https://news.ycombinator.com/item?id=40964328



Thanks for linking these. I was not very familiar with these works and discussions from the past, but they really helped establish the context. Very grateful that these videos are readily available.



The number of parameters is just the wrong metric; it should be the amount of information contained in the parameter values: their entropy, Kolmogorov complexity, or something along those lines.
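One crude way to see the distinction: use compressed size as a stand-in for information content (a rough proxy for Kolmogorov complexity, not a real measure of it). A hypothetical sketch:

```python
import zlib
import numpy as np

def compressed_bits(a: np.ndarray) -> int:
    """Crude stand-in for information content: compressed size in bits."""
    return 8 * len(zlib.compress(a.astype(np.float32).tobytes()))

rng = np.random.default_rng(0)
dense = rng.normal(size=1_000)      # 1,000 parameters, all carrying information
sparse = np.zeros(100_000)          # 100x the parameter count...
sparse[:10] = rng.normal(size=10)   # ...but only 10 values are informative

print(compressed_bits(dense), compressed_bits(sparse))
```

The 100,000-parameter array compresses to far fewer bits than the 1,000-parameter one, illustrating why raw parameter count is a poor measure of how much a model has actually "memorized."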



Lol. Loved it.

This was a lovely passage from Dyson’s Web of Stories interview, and it struck a chord with me, like it clearly did with the authors too.

It happened when Dyson took the preliminary results of his work on the Pseudoscalar theory of Pions to Fermi and Fermi very quickly dismissed the whole thing. It was a shock to Dyson but freed him from wasting more time on it.

Fermi: When one does a theoretical calculation, either you have a clear physical model in mind or a rigorous mathematical basis. You have neither. How many free parameters did you use for your fitting?

Dyson: 4

Fermi: You know, Johnny Von Neumann always used to say ‘with four parameters I can fit an elephant; and with five I can make him wiggle his trunk’.



Ya know, in academic writing I tend to struggle with making it sound nice and formal. I try not to use the super-stilted academic style, but it is still always a struggle to walk the line between too loose and too jargony.

Maybe this sort of thing would be a really good tradition. Everyone must write a very silly article with some mathematical arguments in it. Then, we can all go forward with the comfort of knowing that we aren’t really at risk of breaking new grounds in appearing unserious.

It is well written and very understandable!



> It only satisfies a weaker condition, i.e., using four non-zero parameters instead of four parameters.

Why would that be a harder problem? In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.



> In the case that you get a zero parameter, you could inflate it by some epsilon and the solution would basically be the same.

Not everything is continuous. Add an epsilon worth of torsion to GR and you don't get almost-GR, you get a qualitatively different theory in which potentially arbitrarily large violations of the equivalence principle are possible.



They also, effectively, fit information in the indexes of the parameters. I.e., _which_ of the parameters are nonzero carries real information.

In a sense, they have done their fitting using nine parameters, of which five are zero.



Another take away (not directly stated in the article but implied): Counting the information content of a model is more than just the parameters; the structure of the model itself conveys information.



I wish there was more humor on arXiv.

If I could make a discovery in my own time without using company resources I would absolutely publish it in the most humorous way possible.



Joke titles and/or author lists are also quite popular, e.g. the Greenberg, Greenberger, Greenbergest paper[1], a paper with a cat coauthor whose title I can’t seem to recall (but I’m sure there’s more than one I’ve encountered), or even the venerable, unfortunate in its joke but foundational in its substance Alpher, Bethe, Gamow paper[2]. Somewhat closer to home, I think computer scientist Conor McBride[3] is the champion of paper titles (entries include “Elimination with a motive”, “The gentle art of levitation”, “I am not a number: I am a free variable”, “Clowns to the left of me, jokers to the right”, and “Doo bee doo bee doo”) and sometimes code in papers:
  letmeB this (F you) | you == me = B this
                      | otherwise = F you
  letmeB this (B that)            = B that
  letmeB this (App fun arg)       = letmeB this fun `App` letmeB this arg
(Yes, this is working code; yes, it’s crystal clear in the context of the paper.)

[1] https://arxiv.org/abs/hep-ph/9306225

[2] https://en.wikipedia.org/wiki/Alpher%E2%80%93Bethe%E2%80%93G...

[3] http://strictlypositive.org/



Sadly, the constant term (the average r_0) is never specified in the paper (it seems to be something in the neighborhood of 180?): getting that right is necessary to produce the image, and I can't see any way not to consider it a fifth necessary parameter. So I don't think they've genuinely accomplished their goal.

(Seriously, though, this was a lot of fun!)



They say in the text that it’s the average value of the data points they fit to. I think whether to count it as a parameter depends on whether you consider standardization to be part of the model or not



I see your point, that it's really just an overall normalization for the size rather than anything to do with the shape. I can accept that, and I'll grant them the "four non-zero parameters" claim.

Though in that case, I would have liked for them to make it explicit. Maybe normalize it to "1", and scale the other parameters appropriately. (Because as it stands, I don't think you can reproduce their figure from their paper.)



IIUC:

A real-parameter (r(theta) = sum(r_k cos(k theta))) Fourier series can only draw a "wiggly circle" figure with one point on each radial ray from the origin.

A complex-parameter series (z(theta) = sum(c_k e^(i k theta))) can draw more squiggly figures (epicycles): the pen can backtrack as the drawing arm rotates, since each term moves the point around a small circle centered on the point computed from the previous terms (and recursively).

Obligatory 3B1B https://m.youtube.com/watch?v=r6sGWTCMz2k

Since a complex parameter is 2 real parameters, we should compare the best 4-cosine curve to the best 2-complex-exponential curve.
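A sketch of the two parameterizations, with placeholder coefficients (not the paper's elephant values):

```python
import numpy as np

theta = np.linspace(0, 2 * np.pi, 512)

# Four real cosine parameters: one radius per angle, a "wiggly circle"
# with a single point on each radial ray from the origin.
radii = sum(a * np.cos(k * theta)
            for k, a in enumerate([1.0, 0.3, 0.1, 0.05]))
wiggly = radii * np.exp(1j * theta)

# Two complex parameters (also four real numbers): epicycles whose pen
# can backtrack, tracing curves a single-valued radius cannot.
coeffs = {1: 0.8 + 0.2j, -2: 0.3 - 0.1j}
epicycles = sum(c * np.exp(1j * k * theta) for k, c in coeffs.items())
```

Both curves are closed (they return to their starting point at theta = 2*pi), but only the second family can self-intersect, which is what makes four real numbers packaged as two complex coefficients strictly more expressive here.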
