(comments)

Original link: https://news.ycombinator.com/item?id=43160779

Analytical solutions to stochastic problems are limited to simple cases; complicated problems require numerical methods, which are not always Monte Carlo simulation. The Fokker-Planck equation can be solved numerically, but this becomes expensive in high dimensions, making Monte Carlo sampling more efficient there. In statistics, estimating variance differs between a population and a sample: the population variance uses the population mean and divides by N, while the sample variance estimates the population variance using the sample mean and divides by (n-1) to account for degrees of freedom and obtain an unbiased estimate. Stochastic calculus extends calculus to noisy processes, where functions become non-differentiable; it modifies the rules of calculus to accommodate noisy curves, connecting smooth and non-smooth behavior through a noise input parameter.

Related Articles
  • An Introduction to Stochastic Calculus 2025-02-26
  • (comments) 2024-08-16
  • (comments) 2024-08-28
  • (comments) 2025-02-23
  • Notes on Distributed Systems for Young Bloods 2024-09-04

  • Original article


    Yes, by advanced undergraduate, I meant very advanced undergraduate. But when I was in undergrad I always heard about some students like this who were off in the graduate classes. And then in grad school, there was even a high school student in my Algebra course who managed to correct the professor on some technical issue of group theory. So I don't assume you have to be a PhD to work through this material.



    Can perhaps someone suggest some resources that are, uh, less advanced undergraduate? Is this possible? Or perhaps just the resources for the prerequisites themselves? Like, what's the route from "not advanced undergraduate"?



    The links above are for studying this as a pure mathematician would. If you want to study it that way, you would take most of the core classes in the undergrad curriculum:

    Calculus (without proofs), Linear Algebra, Real Analysis (proofs of calculus), Measure Theory

    There are also higher level courses that are worth taking, because they motivated a lot of this theory. Imo they would be Functional Analysis (real analysis applied to spaces of functions) and Partial Differential Equations.

    If you've knocked off some of the undergrad prereqs and feel good about proofs, this could be the right book for you: https://www.amazon.com/Probability-Martingales-Cambridge-Mat.... Another gem of a book.



    A further step is Langevin Dynamics, where the system has damped momentum, and the noise is inserted into the momentum. This can be used in molecular dynamics simulations, and it can also be used for Bayesian MCMC sampling.

    Oddly, most mentions of Langevin Dynamics in relation to AI that I've seen omit the use of momentum, even though gradient descent with momentum is widely used in AI. To confuse matters further, "stochastic" is used to refer to approximating the gradient using a sub-sample of the data at each step. You can apply both forms of stochasticity at once if you want to!
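
    (A toy illustration of the overdamped, momentum-free version of this idea - the unadjusted Langevin algorithm used as an MCMC sampler. The target, step size, and chain length below are made up for the sketch; a real sampler would add a Metropolis correction (MALA) to remove the O(h) discretization bias.)

```python
import math
import random

# Unadjusted Langevin algorithm targeting a standard normal:
# log-density -x^2/2, so grad log p(x) = -x, and the update is
#   x_{k+1} = x_k + h * grad_log_p(x_k) + sqrt(2h) * xi_k
rng = random.Random(0)
h, n_steps, burn_in = 0.01, 200000, 1000

x = 0.0
samples = []
for k in range(n_steps):
    x += -x * h + math.sqrt(2 * h) * rng.gauss(0.0, 1.0)
    if k >= burn_in:
        samples.append(x)

sample_mean = sum(samples) / len(samples)
sample_var = sum(s * s for s in samples) / len(samples)
print(sample_mean, sample_var)  # roughly 0 and 1, up to O(h) bias and MC error
```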



    The momentum analogue for Langevin is known as underdamped Langevin, which if you optimize the discretization scheme hard enough, converges faster than ordinary Langevin. As for your question, your guess is as good as mine, but I would guess that the nonconvexity of AI applications causes problems. Sampling is a hard enough problem already in the log-concave setting…



    Is stochastic calculus something that requires a computer to simulate many possible unfoldings of events, or is there a more elegant mathematical way to solve for some of the important final outputs and probability distributions if you know the distribution of dW? This is an awesome article. I've seen stochastic calculus before, but this is the first time I really felt like I started to grok it.



    In case the other responses to your question are a little difficult to parse, and to answer your question a little more directly:

    - Usually, you will only get analytic answers for simple questions about simple distributions.

    - For more complicated problems (either because the question is complicated, or the distribution is complicated, or both), you will need to use numerical methods.

    - This doesn't necessarily mean you'll need to do many simulations, as in a Monte Carlo method, although that can be a very reasonable (albeit expensive) approach.

    More direct questions about certain probabilities can be answered without using a Monte Carlo method. The Fokker-Planck equation is a partial differential equation which can be solved using a variety of non-Monte Carlo approaches. The quasipotential and committor functions are interesting objects which come up in the simulation of rare events that can also be computed "directly" (i.e., without using a Monte Carlo approach). The crux of the problem is that applying standard numerical methods to the computation of these objects faces the curse of dimensionality. Finding good ways to compute these things in the high-dimensional case (or even the infinite-dimensional case) is a very hot area of research in applied mathematics. Personally, I think unless you have a very clear physical application where the mathematics map cleanly onto what you're doing, all this stuff is probably a bit of a waste of time...



    Thanks for the explanation, this was very helpful. You've given me a whole new list of stuff to Google. The quasipotential/committor functions especially seem quite interesting, although I'm having a bit of trouble finding good resources on them.



    They are pretty advanced and pretty esoteric. They will be very difficult to get into without a solid graduate background in some of this stuff, or unless you're willing to roll up your sleeves and do some serious learning. The book "Applied Stochastic Analysis" by Weinan E, Tiejun Li, and Eric Vanden-Eijnden is probably a decent place to start. I took a look at this book a while ago, and it's probably decent enough to get a foothold on the literature in order to figure out if this stuff will be useful for you. These guys are all monsters in the field.



    It depends a bit on exactly what you want to calculate, but in general things like the probability density function of the solution of a stochastic differential equation (SDE) at time t satisfies a partial differential equation (PDE) that is first order in time and second order in space [0]. (This PDE is known to physicists as the Fokker-Planck equation and to mathematicians as the Kolmogorov forward equation.) Except in special examples, the PDE will not have exact analytical solutions, and a numerical solution is needed. Such a numerical solution will be very expensive in high dimensions, however, so in high-dimensional problems it is cheaper to solve the SDE and do Monte Carlo sampling, rather than try to solve the PDE.

    Edit: sometimes people are interested in other types of questions, for example the solution when certain random events occur. Analogous comments apply. Also, while stochastic calculus is very useful for working with SDEs, if your interest is other types of Markov (or even non-Markov) processes you may need other tools.

    Edit again: as another commenter mentioned, in special cases the SDE itself may also have exact solutions, but in general not.

    [0] This statement is specific to stochastic differential equations, i.e., a differential equation with (gaussian) white noise forcing. For other types of stochastic processes, e.g., Markov jump processes, the evolution equation for distributions have a different form (but some general principles apply to both, e.g., forms of the Chapman-Kolmogorov equation, etc).
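
    (To make the trade-off concrete, here is a hedged sketch: an Euler-Maruyama Monte Carlo estimate for an Ornstein-Uhlenbeck SDE, dX = -theta X dt + sigma dW, checked against the exact time-t variance that solving the corresponding Fokker-Planck equation would give. All parameter values are illustrative.)

```python
import math
import random

# Euler-Maruyama simulation of dX = -theta * X dt + sigma dW, X_0 = 0.
# The time-t density is Gaussian with mean 0 and a known variance, so we
# can compare the Monte Carlo estimate against the exact answer.
theta, sigma, t_final, dt, n_paths = 1.0, 0.5, 2.0, 0.01, 5000
n_steps = int(t_final / dt)
rng = random.Random(0)

finals = []
for _ in range(n_paths):
    x = 0.0
    for _ in range(n_steps):
        # Euler-Maruyama step: dX ≈ -theta*X*dt + sigma*sqrt(dt)*N(0,1)
        x += -theta * x * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    finals.append(x)

mc_var = sum(x * x for x in finals) / n_paths  # mean stays 0, so E[X^2] = Var
exact_var = sigma**2 / (2 * theta) * (1 - math.exp(-2 * theta * t_final))
print(mc_var, exact_var)  # should agree to a few percent
```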



    Certain simple stochastic differential equations can be solved explicitly analytically (like some integrals and simple ordinary differential equations can be solved explicitly), for example the classic Black Scholes equation. More complicated ones typically can't be solved in that way.

    What one often wishes to have is the expectation of a function of a stochastic process at some point, and what can be shown is that this expectation obeys a certain (deterministic) partial differential equation. This then can be solved using numerical PDE solvers.

    In higher dimensions, though, or if the process is highly path-dependent (not Markovian), one resorts to Monte Carlo simulation, which does indeed simulate "many possible unfoldings of events".



    It has been a while since I studied along these lines (stochastic chemical reaction simulations in my case), but I think the answer is often yes, though not always. A random walk, for example, will give a normal distribution (you know the mean, and you know the variance grows without bound over time), so in that case you do end up with an elegant analytical solution, if I'm understanding correctly, since the inputs determine the function the variance follows through time.

    But often no, you need to run a stochastic algorithm (e.g. Gillespie's algorithm in the case of simple stochastic chemical kinetics) as there will be no analytical solution.

    Again it has been a while though.



    For normal distributions I think you do - Black-Scholes is an analytical solution to option pricing. It's been a while since I studied stochastic calculus.

    I question why this is the second highest article on Hacker News currently; I can't imagine many people reading this website are REALLY in this field or a related one. Or is it just signaling, like saying you have a copy of Knuth's books or that famous Lisp one?



    This is one of those archetypal submissions on HN: mathematics (preferably pure, using the word "calculus" outside of integrals/derivatives gives additional points), moderately high number of upvotes, very few comments. Pretty much the opposite of political posts, where everyone can "contribute" to the discussion.



    I upvote so it sticks around longer, so it has a better chance of generating interesting comments.

    I also upvote because I find it interesting to learn about stuff I didn't know about. I might not understand it, but I do like the exposure regardless.



    Depends on what you want to know. If you want to get some trajectories then simulation of the stochastic differential equation is required. But if you just want to know the statistics of the paths, then in many cases you can write and try to solve the Fokker-Planck equation, which is a partial differential equation, to get the path density.



    I remember studying stochastic calculus

    And I remember noting that “quadratic variation” was slightly different from how variance is calculated in regular statistics. Off by one, or squared, or whatever. I made a note to eventually investigate why. Probably due to some stochastic volatility.



    There is the fact that the variance of the entire population is defined [0] as
      sum i=1..N (x_i - mu)^2 / N
    
    while, given a sample of n iid [1] samples from a distribution, the best [2] estimate of the distribution variance is
      sum i=1..n (x_i - a )^2 / (n-1)
    
    Note that we replaced the mean mu by the sample average a, [3] and divided by (n-1) instead of N.

    [0] with the mean mu := sum x_i / N being the actual mean of the population

    [1] independent and identically distributed

    [2] best in the sense of being unbiased. It's a tedious, but not very difficult calculation to confirm that the expectation of that second expression (with n-1) is the population variance.

    [3] with the sample average a := sum x_i / n being an estimate of the population mean
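
    (A quick empirical check of the footnotes above, with made-up simulation sizes: dividing by n underestimates the population variance by a factor of (n-1)/n on average, while dividing by (n-1) is unbiased.)

```python
import random

# Sample repeatedly from a standard normal (true variance 1.0) and average
# the two estimators over many independent trials.
rng = random.Random(42)
n, trials = 5, 200000

biased_sum = unbiased_sum = 0.0
for _ in range(trials):
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a = sum(xs) / n  # sample average
    ss = sum((x - a) ** 2 for x in xs)
    biased_sum += ss / n          # divide by n: biased low
    unbiased_sum += ss / (n - 1)  # divide by (n-1): unbiased

biased_mean = biased_sum / trials
unbiased_mean = unbiased_sum / trials
print(biased_mean)    # ~ (n-1)/n * 1.0 = 0.8
print(unbiased_mean)  # ~ 1.0
```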



    The other guy gives a solid explanation so don't use mine as a replacement or to assume the other is wrong.

    To me there are two ways to approach the problem I think you are thinking of (sample variance I think).

    (1) The sample variance depends on the sample mean which is sum(x_i) / n. Given the first n-1 of n samples, you would then know the final value (x_n = n * sample_mean - sum(x_i)_(n-1)) so at the very least n-1 could be understood as a "degrees of freedom". There are only n-1 degrees of freedom. Other higher sample moments can be roughly understood with the same degrees of freedom argument. This could be wrong though, it was just something I remember from somewhere.

    (2) The more mathematically inclined way is that biased_sample_variance = sum((x_i - sum(x_i) / n)^2) / n. The mean of the biased_sample_variance (across many iterations of a set of samples N), is not the population variance, but (n - 1) / n * population_variance (i.e. it is biased). So you multiply the biased_sample_variance by (n / (n - 1)) which gives the unbiased sample_variance equation: sum((x_i - sum(x_i) / n)^2) / (n - 1). The math is rather fun in my opinion, once you get into the swing of things.

    I sure do hope I understood your question correctly.



    Own favorite source on stochastic calculus:

    Eugene Wong, Stochastic Processes in Information and Dynamical Systems, McGraw-Hill, New York, 1971.


    This is such a good model for how to write a beginner friendly introduction. Especially the motivation for the Ito lemma, with the dW^2 term remaining important even though it disappears in regular calculus, and the conversion to Stratonovich is really nice.



    Here’s an example where I ran into this recently.

    Let’s say we play a “game”. Draw a random number A between 0 and 1 (uniform distribution). Now draw a second number B from the same distribution. If A > B, draw B again (A remains). What is the average number of draws required? (In other words, what is the average “win streak” for A?)

    The answer is infinity. The reason is, some portion of the time A will be extremely high and take millions of draws to beat.



    Showing the calculation you described:

    If p is the value drawn for A, then each time B is drawn, the probability that B>A is (1-p). So, the chance that B is drawn n times before exceeding A is p^(n-1) (1-p) (a geometric distribution). The expected number of draws is then 1/(1-p). Then, E[draws] = E[E[draws|A=p]] = \int_0^1 E[draws|A=p] dp = \int_0^1 1/(1-p) dp, which diverges to infinity (as you said).

    (I wasn’t doubting you, I just wanted to see the calculation.)
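
    (A hedged simulation of the same thing: conditional on A = p, the number of draws is geometric with success probability (1-p), so its mean should be 1/(1-p). The trial count and seed are arbitrary.)

```python
import random

def mean_draws(p, trials=100000, seed=1):
    """Average number of B-draws until B beats a fixed A = p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        draws = 1
        while rng.random() < p:  # B < A, so B draws again
            draws += 1
        total += draws
    return total / trials

print(mean_draws(0.5))  # ~ 1/(1-0.5) = 2
print(mean_draws(0.9))  # ~ 1/(1-0.9) = 10
```

The unconditional expectation diverges because 1/(1-p) is not integrable near p=1, so a simulation of the full game never settles down: its running mean is dominated by the rare trials where A is drawn very close to 1.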



    The way the question was framed, it was ambiguous whether "draw again" only applied to B, or whether A would draw again as well. I'm assuming the 'infinity' answer applies only to the former case?



    Does this really require stochastic calculus to prove? This should just be a standard integration, based on the fact that the expected number of samples required for fixed A is 1/(1-A).



    Question for HN readers: We have defined about 50 spots (loci) in the mouse genome that contain DNA differences that modulate mortality rates. Most of them have complex age-dependent “actuarial” effects. We would like to predict age at death.

    Would stochastic calculus be a useful approach in actuarial prediction of life expectancies of mice?

    (And this is why I am pleased to see this high on HN.)



    Stochastic calculus is like ordinary calculus in that it is most useful when one time is like another except for a few variables that describe a state, and least useful when one time is unlike another.

    Because you have as many questions (loci) as you have segments that you can reasonably expect to divide time into (changing the time of death by 1/50th of a mouse lifespan would be impossible to detect unless I am wrong?), and because the time intervals are not that numerous, and also because you wouldn't really have a model for the interaction of the state variables and would be using model-free statistical methods, I think you would get all of the value there is to get out of noncontinuous methods.



    I'm not prepared to say "no", and as has been noted already, it depends on the application, but from your description it seems to me more like a task for Bayesian statistics organized on graphs (the nodes & vertices kind).



    (Just spitballing)

    I think stochastic calculus looks at a system whose output value is a smooth/real value. Basically, it is for modeling systems like random walks where there is a little bit of random up-and-down jumping in each interval. However, if you are basically looking time versus dead-or-alive, your output is binary and time-of-death is really all the info you get and you wouldn't need/want a random walk model, just a more ordinary statistical model. Maybe if there was some other variable besides dead-or-alive you were measuring or aware of a stochastic model could help then (which is a bit like saying "if we had bacon, we could have bacon-and-eggs, if we had eggs").

    Also, if what you're saying is you have 50*X bytes of information that all influence life expectancy, it sounds like a challenging problem. But it's also kind of tailor-made for neural networks: many discrete inputs versus a single smooth output. You might try a neural network and a linear model and see how much better the neural network is - then you could determine whether more complex-than-linear interactions were occurring.



    Just in case you missed it, https://en.m.wikipedia.org/wiki/Survival_analysis exists to answer specifically this question.

    In more practical terms, if I were to approach this problem, I'd discretize it in time and apply classical ml to predict "chance to die during month X assuming you survived that long" and fit it to data - that'd be much easier to spot errors and potential issues with your data.

    I'd go for the stochastic calculus or actual survival analysis only if you wanted to prove/draw a connection between some pre-existing mathematical property such as memorylessness and a physical/biological property of a system, such as the behavior of certain proteins (that'd be insanely cool, but rather hard, esp if data is limited). In my (very vague) understanding, that's what finance papers that use stochastic analysis do - they make an assumption about some universal mathematical property of a system (if markets were always near optimal with probability of deviation decaying as XYZ, the world economy would react this way to these things), and then prove that it actually fits the data.

    Happy to chat more, sounds like a fun project :)
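
    (A hedged sketch of the discretize-and-fit idea: a minimal Kaplan-Meier survival estimator in plain Python. The death times and censoring flags are made-up toy values, not real mouse data.)

```python
def kaplan_meier(times, events):
    """times[i]: observation time; events[i]: True if death, False if censored.
    Returns (time, survival probability) pairs at each observed death time."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = n_at_t = 0
        # group all subjects whose observation ends at time t
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # hazard at t = deaths / at-risk
            curve.append((t, surv))
        at_risk -= n_at_t
    return curve

# Toy cohort: 5 mice, one censored (still alive when observation ended) at t=20.
times = [10, 12, 12, 20, 25]
events = [True, True, True, False, True]
print(kaplan_meier(times, events))  # [(10, 0.8), (12, 0.4), (25, 0.0)]
```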



    I was coming here to say this is a survival analysis problem, and thus a different branch of probability and statistics. However, you can also frame it as a stochastic process if you have extra epigenetic data that is associated to those 50 DNA loci or some genes they regulate.

    For example, your DNA loci of interest could have a state (methylated or unmethylated). And you could come up with a stochastic process where death occurs when a function of methylation changes at those loci (e.g. a linear model) crosses a threshold (first passage in stochastic process jargon).

    Omer Karin & Uri Alon have published a similar concept to explain how the decreased capacity of immune cells to remove senescent cells leads to a Gompertz-like law of longevity, something that originates from actuarial studies! Their model is simpler as they deal with a univariate problem [1].

    [1] https://www.nature.com/articles/s41467-019-13192-4



    As others have said in various ways, start by fitting a survival model using glmnet.

    That said, here are some folks trying to use SDEs to model cells, they even have a "dW" on their logo. This is a long way from predicting age of death, but it might eventually give insights into the exact mechanism. Also I think they're starting with bacteria and yeast, so mice might be a way off.

    https://macsys.org/



    Your link doesn't demonstrate the use of stochastic calculus by life insurance companies or for life insurance. It's just an undergraduate curriculum for actuarial students (that they learn all this stuff doesn't imply that's what life insurance companies use).



    Here's my understanding of Ito calculus if it helps anyone:

    1. The only random process we understand initially is Brownian motion.

    2. Luckily, we can change coordinates.



    Ito's formula/lemma is like the chain rule from calculus. It is a generalization, in that it uses a second order Taylor series expansion, whereas the chain rule only needs a first order expansion. Anyway, I think (2) is a reflection of this fact, and how the chain rule lets us compute dynamics of a derived process.

    I sort of disagree with (1), since Ito's lemma is most naturally applied to ~martingales, of which Brownian Motion is an important special case.
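
    (A hedged numerical check of that second-order term: for f(x) = x^2 applied to Brownian motion, Ito's lemma gives d(W^2) = 2W dW + dt, so E[W_t^2] = t; the ordinary chain rule would drop the dt and wrongly predict that W_t^2 has zero mean. Path counts and step sizes are arbitrary.)

```python
import math
import random

# Simulate Brownian paths by summing independent Gaussian increments and
# estimate E[W_t^2], which Ito's correction term says equals t.
rng = random.Random(7)
t, dt, n_paths = 1.0, 0.01, 20000
n_steps = int(t / dt)

mean_sq = 0.0
for _ in range(n_paths):
    w = 0.0
    for _ in range(n_steps):
        w += rng.gauss(0.0, 1.0) * math.sqrt(dt)
    mean_sq += w * w
mean_sq /= n_paths
print(mean_sq)  # ~ t = 1.0
```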



    Can someone please help me parse this sentence?

    > Brownian motion and Itô calculare a notable example of fairly high-level mathematics that are applied to model the real world

    What is “Itô calculare” supposed to have been? I am stumped. “Its calculation”?



    That makes so much more sense! Although the pedant in me wants to argue that calculus plural is “calculi”/“calculuses” (the dictionary gives me the latter, although I’ve never seen it in the wild myself—-but I won’t pursue that because it’s beside the point!) Thanks for the help!



    Day to day not so much, unless you are in structured products/exotics as a structurer, at which point yeah, it's pretty important.

    That said, already at masters level internships you could get asked much harder questions than what this article touches on. I got asked to prove the Cameron-Martin theorem once, I found that to be extremely difficult in a job interview setting.



    It depends.

    In a linear rates shop (i.e. not trading options), almost all of the effort goes into tuning the deterministic bit of this equation. Thousands upon thousands of lines of code for a problem that most books don't even mention beyond giving the term a symbolic name!

    And then if you do trade an option it's probably good enough to use an off the shelf model to work out your delta and so on.

    If you're making markets or flogging exotics and structured products then you may indeed be wrangling this stuff all the time.



    I had to study quantum stochastic calculus for my PhD. Really crazy because you get totally different results for the same mathematical expression compared to normal calculus



    No, I think one of the fundamental insights of stochastic calculus is that the addition of noise to a process changes the trajectory in a non-trivial way.

    In finance, for instance, it leads to the concept of a "volatility tax." Naively, you might think that adding noise to the process shouldn't change the expected return, it would just add some noise to the overall return. But in fact adding volatility to the process has the effect of reducing the expected return compared to what you would have in the absence of volatility. (This is one of the applications of the result that the original article talks about in the Geometric Brownian Motion section.)
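
    (A hedged sketch of that volatility drag, using the exact solution of geometric Brownian motion, S_t = S_0 exp((mu - sigma^2/2) t + sigma W_t): the mean grows at rate mu, but the median - the "typical" path - grows at only mu - sigma^2/2. Parameter values are made up.)

```python
import math
import random
import statistics

rng = random.Random(3)
s0, mu, sigma, t = 1.0, 0.05, 0.5, 1.0
n_paths = 200000

# Sample terminal values directly from the exact GBM solution;
# W_t is Gaussian with mean 0 and variance t.
finals = [
    s0 * math.exp((mu - sigma**2 / 2) * t + sigma * math.sqrt(t) * rng.gauss(0.0, 1.0))
    for _ in range(n_paths)
]
mean_s = sum(finals) / n_paths
median_s = statistics.median(finals)
print(mean_s)    # ~ exp(mu * t), about 1.051
print(median_s)  # ~ exp((mu - sigma^2/2) * t), about 0.928
```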



    Just to add to this, the reason the two things are different is that stochastics as a subject is trying to do calculus in the presence of noise, and what noise does is make your function nondifferentiable. You would think that you cannot do calculus without smooth curves! But you can - we just have to modify the chain rule and define exactly what we mean by integration, etc.

    So the idea is: smooth curves do X, but non-smooth noisy curves do Y(x), where x in some sense is the noise input into the system, and they aren't contradictory because Y(0) = X. (At least usually... I think chaos theory has some counterexamples where the time t that you can predict a system's results for is t=∞ in the presence of exactly 0 noise, but in the limit of nonzero noise going to zero it's some finite t=T.)



    Kinda. The differential operator in quantum Ito calculus can be applied to mathematical objects that the normal differentials aren’t properly defined on, such as stochastic variables.



    still wild to me that diffusion models are fast becoming the secret sauce behind ai image generation, but their roots are buried deep in stochastic calculus

    who knew brownian motion would eventually help create cat memes?



    Seems like a great article. Having some prior experience with stochastic calculus, I think I understand almost everything here. Any other good introductory materials?



    I’ve been planning to study this in a bit although I have some background to cover first so haven’t got on to it. From what I’ve found, the youtube channel “Mathematical Toolbox” has some videos which are quite introductory but seem good. Some people also recommend the book “An Informal Introduction to Stochastic Calculus with Applications” by Calin as a good place to start. Then Klebaner “Introduction to Stochastic Calculus with Applications” and also Evans “An Introduction to Stochastic Differential Equations” are apparently very good but harder and more formal texts, but you need some analysis and measure theoretic probability background first. The Evans is the same Evans who wrote the definitive book about PDEs fwiw. Klebaner and Evans are apparently a lot harder than Calin though even though they are all called introductions.
