(comments)

Original link: https://news.ycombinator.com/item?id=41241942

The text discusses the limitations of Big-O notation when applied to real-world algorithms. It argues that the Big-O model ignores factors such as machine register size, memory capacity, storage access speed, and transfer rates, which can lead to unrealistic conclusions. The example given is an O(1) algorithm that takes hours to complete versus an O(n) competitor that processes 4 GB per second; the O(1) algorithm only wins at data sizes that exceed the available system resources. The author also mentions that while Big-O complexities provide insight, they lack specificity, much like knowing only the average price of a shop's items without knowing how many items it stocks or of what quality. Elaborate space-partitioning schemes or tree subdivision can add unnecessary complexity, particularly when dealing with large numbers of dynamic objects. Historically, sophisticated partitioning systems were essential for games such as DOOM or Quake, which depended on raw computing performance. Modern CPUs, however, excel at processing sorted arrays of items and, thanks to pipelining, can run through them faster than linked lists or trees, so less CPU time goes to managing entity lists and more to AI, rendering, and so on. For fine collision between capsules, the capsules are treated as 3D segments with thickness, and whether they collide is determined from the distance between the segments. A helpful site demonstrating the computation between segments can be found here: [link]. Another way to compute the distance is to view it as a minimization problem: pick a point inside each object and minimize the squared distance between them while keeping the points constrained to their objects. For capsules this yields a "convex" optimization problem that can be solved with fewer than ten Newton steps, projecting back onto the corresponding convex object after each step. Geometric code often decomposes objects into unions of simple convex shapes for efficient computation; one example is a sphere tree that covers an object with overlapping spheres, enabling fast coarse collision detection between capsules.


Original text


One interesting note on this approach is that the author suggests using a "fast" sorting algorithm like mergesort/quicksort as the sorting algorithm for best performance. But in practice, you may get better performance from a "worse" sorting algorithm: insertion sort.

The reason is that objects in collision detection systems tend to move relatively small steps between frames, meaning you can keep lists from the previous frame that are already mostly sorted. For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!
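For illustration, here is a minimal sketch (TypeScript, not the article's code) of the insertion sort pass over a list of bounding intervals; when only a few neighbours are out of order, the inner loop exits almost immediately and the whole pass does roughly O(n) work:

```typescript
// Sort intervals by their left edge. On an almost-sorted array this does
// about one comparison per element, because the inner while loop exits
// immediately for items that are already in place.
interface Interval {
  min: number; // left edge of the object's bounding interval on the sweep axis
  max: number; // right edge
}

function insertionSortByMin(items: Interval[]): void {
  for (let i = 1; i < items.length; i++) {
    const current = items[i];
    let j = i - 1;
    // Shift larger elements right until `current` fits.
    while (j >= 0 && items[j].min > current.min) {
      items[j + 1] = items[j];
      j--;
    }
    items[j + 1] = current;
  }
}
```

Because objects only move a little each frame, the previous frame's ordering is a near-perfect starting point, which is exactly what makes that inner loop cheap.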



The author covers this pretty much exactly as you describe it. From Part 2 of TFA,

> Let’s look at the sort step, which is the bottleneck of the algorithm according to the analysis.

> You can see that most of the time, the sort does nothing at all! The list is almost always already sorted from the previous frame.

> Even when it becomes unsorted, it usually just takes a couple of swaps to be sorted again.

> Here’s insertion sort in action:



If you use a modern sort, it'll just go "Oh that's sorted, we're done". Pattern Defeating Quicksort will do that and so would Glidesort, Driftsort, IPNsort and so on. Most of these algorithms will also correctly shortcut trivial "almost sorted" cases like 123457689 (only longer, if you really only have nine elements you want a custom sort for that, there will be a correct algorithm and it's probably written down somewhere already)

Rust's old sorts, both of them, get this right. There's a fair chance the current version of your C++ stdlib even gets this right in its unstable sort. In 1965 it's understandable that you reach for an insertion sort for this case, in 2024 it's embarrassing if your language doesn't provide a built-in sort where this just works.



You don’t just want to sort the data, you actually want to collect a list of overlapping pairs (or, more likely, the list of pairs whose overlap status has changed since last time).

Using a built-in sort will give you the same complexity, but iterating over 10000 elements again to gather this list is an overhead that you can avoid with a DIY approach.
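For context, the gather step being described might look roughly like this sketch, assuming the list is already sorted by left edge (the names are illustrative, not from the article):

```typescript
interface Interval { id: number; min: number; max: number; }

// Sweep a list sorted by `min` and collect every overlapping pair.
// Each inner scan stops as soon as the next interval starts past the
// current one's right edge, so in practice this is close to linear.
function collectOverlappingPairs(sorted: Interval[]): Array<[number, number]> {
  const pairs: Array<[number, number]> = [];
  for (let i = 0; i < sorted.length; i++) {
    for (let j = i + 1; j < sorted.length; j++) {
      if (sorted[j].min > sorted[i].max) break; // nothing further can overlap i
      pairs.push([sorted[i].id, sorted[j].id]);
    }
  }
  return pairs;
}
```

A DIY sort can emit these pairs (or only the pairs whose overlap status flipped) while it performs its swaps, which is the second pass the comment is suggesting you avoid.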



For what it's worth, Haskell's built-in sort does this, and I think Python, too. (But I'm not as confident about Python. My memory is a bit hazy.)



There is an alternative to sorting every step: build your indexing structure a little looser, so that you still catch the candidate collisions when objects have moved less than epsilon.

For example, that can be done by increasing the radius of the spheres by epsilon. As long as the spheres have not moved by more than epsilon, you don't need to recompute the index.

To avoid latency peaks when you do need to recompute the index, you can start building a lagging index by sorting 10% each frame (aka amortizing the computation cost). After 10 frames you have an index that is valid as long as the position is within epsilon of the position 10 frames ago.
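A minimal sketch of the invalidation test this describes (the class and field names are made up for illustration): rebuild only once something has drifted more than epsilon from where it was when the padded index was built.

```typescript
interface Body { x: number; y: number; }

class LooseIndex {
  private anchor: { x: number; y: number }[] = [];

  rebuild(bodies: Body[]): void {
    // ... sort/build the broad-phase structure here, with radii padded by epsilon ...
    // Remember where every body was at build time.
    this.anchor = bodies.map(b => ({ x: b.x, y: b.y }));
  }

  // The padded index stays valid while every body is within epsilon of the
  // position it had when the index was last rebuilt.
  isStillValid(bodies: Body[], epsilon: number): boolean {
    return bodies.every((b, i) => {
      const dx = b.x - this.anchor[i].x;
      const dy = b.y - this.anchor[i].y;
      return dx * dx + dy * dy <= epsilon * epsilon;
    });
  }
}
```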



> For sorting these mostly sorted lists, insertion sort tends towards O(n) while Quicksort tends towards O(n^2)!

Only if you pick your pivot really, really badly. You can randomise your pivot to get O(n log n) in expectation, or, in the case of already almost-sorted lists, you can pick a pivot by just taking the element in the middle of the list.

But you are right that even with optimal pivots, quicksort is still O(n log n) even in the best case. There are simple variants of merge sort that give you O(n log k) behaviour, where k is the number of runs (both ascending and descending) in the data. The 'sort' you get by default in e.g. Haskell's standard library uses such an algorithm. I think Python does, too.
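For what it's worth, the k in that O(n log k) bound counts the monotonic runs already present in the input; a small illustrative sketch of measuring it (this is just a diagnostic, not the sort itself):

```typescript
// Count maximal monotonic runs (ascending or descending) in the input.
// An already-sorted or reverse-sorted array has k = 1, so a run-adaptive
// sort only does O(n) work on it.
function countRuns(a: number[]): number {
  if (a.length < 2) return Math.min(a.length, 1);
  let runs = 1;
  let direction = 0; // 0 = undecided, 1 = ascending, -1 = descending
  for (let i = 1; i < a.length; i++) {
    const step = Math.sign(a[i] - a[i - 1]);
    if (step === 0) continue;          // equal elements extend the current run
    if (direction === 0) {
      direction = step;                // first non-flat step fixes the direction
    } else if (step !== direction) {
      runs++;                          // direction flipped: a new run starts here
      direction = step;
    }
  }
  return runs;
}
```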



This was really well put together.

What’s funny is that I’ve been doing some form of gamedev since late 90s and most of this is abstracted by engines at this point, but this is essential to understanding how complex system simulations work.

Thanks to the author for making a very accessible article



A long time ago I did something similar, but rather than sorting I just kept index lists for each direction and the objects sorted themselves. Meaning there are 4 lists, `objectIndicesSortedByLeftEdge/RightEdge/TopEdge/BottomEdge`. If an object moves horizontally, it updates its own indices in the leftEdge and rightEdge arrays. As it moves, it likely only has to swap 1 or 2 indices at most to keep itself in place.
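A hedged sketch of that incremental update (the function and names here are illustrative, not the original code): after an object moves, bubble its entry left or right in the sorted index until order is restored.

```typescript
// `index` holds object ids sorted by some key (e.g. left edge);
// `key(id)` returns the current value of that edge for the object.
// After one object moves a little, restoring order usually takes 0-2 swaps.
function repairIndexAt(index: number[], pos: number, key: (id: number) => number): void {
  // Bubble left while the entry is smaller than its left neighbour.
  while (pos > 0 && key(index[pos]) < key(index[pos - 1])) {
    [index[pos], index[pos - 1]] = [index[pos - 1], index[pos]];
    pos--;
  }
  // Bubble right while the entry is larger than its right neighbour.
  while (pos + 1 < index.length && key(index[pos]) > key(index[pos + 1])) {
    [index[pos], index[pos + 1]] = [index[pos + 1], index[pos]];
    pos++;
  }
}
```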



I think your method is useful for scenes that are mostly static. The "recreate the graph" approach seems better the more dynamic things are.



Objects are unlikely to jump from one side of the screen to the other, so comparing them to stuff on the other side of the screen seems like a waste (which is what a generic sort will do).



> This naive algorithm runs in O(n²) time in Big O terms.

Is this true? The naive algorithm's outer loop (i) counts n - 1, while the inner loop (j) begins at i + 1 (so counts progressively less than n - 1).

Not a CS major here, but is this roughly equivalent (for large values of n) to O(n²), or is it, as it appears, something less?



You're right that it's not exactly n^2. For the i-th element we perform (n - i - 1) comparisons (indexing from 0). This adds up to a total of (n - 1) * n / 2 comparisons. (see https://en.wikipedia.org/wiki/Triangular_number)

In the end it doesn't make a difference for big-O analysis, because big O describes the behavior as n approaches infinity, where the quadratic term dominates.
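Written out in symbols, the comparison count from the comment above works out to:

```latex
\sum_{i=0}^{n-1} (n - i - 1)
  = (n-1) + (n-2) + \dots + 1 + 0
  = \frac{n(n-1)}{2}
  = \frac{1}{2}n^{2} - \frac{1}{2}n
  = O(n^{2})
```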



The "optimisation" of starting the inner loop at `j = i + 1` is done to avoid testing every pair of objects twice.

[Ed: I should add that it also prevents testing an object against itself.]

The algorithm is O(n^2) because every pair will be checked once.
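Concretely, the loop shape under discussion looks something like this sketch (not the article's exact code):

```typescript
// Check every unordered pair exactly once: the inner loop starts at i + 1,
// so we never test (a, b) and (b, a) separately, and never test (a, a).
function checkAllPairs<T>(objects: T[], collide: (a: T, b: T) => void): void {
  for (let i = 0; i < objects.length; i++) {
    for (let j = i + 1; j < objects.length; j++) {
      collide(objects[i], objects[j]);
    }
  }
}
```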



Big O is just complexity classes, describing how the number of abstract computations scales (that's the key word) with input size (here, the input list length). Generally speaking, if you could analytically express the number of computations as a function of input size, Big O takes the largest term and drops all constant factors.

It does not necessarily describe the performance of the algorithm: `20n^2 + 5n` and `2n^2 + 9001n` are both O(n^2).



> It does not necessarily describe performance of the algorithm.

Not necessarily true. It does indeed describe the performance of the algorithm; it just compares scenarios at a coarser granularity. You can tell from the very start that an O(1) algorithm is expected to outperform an O(N²) alternative.



My algorithms class taught to think of it not as "describing performance" in an absolute sense, but as "describing how performance changes as the size of the input data increases".

It is not necessarily true that an O(1) algorithm will outperform an O(n^2) alternative on a particular set of data. But it is true that an O(1) algorithm will eventually outperform an O(n^2) alternative as the size of the input data increases.



This sometimes doesn't work out in practice because the scaling involved runs into a limitation your big-O model didn't account for. Typical examples are the size of the machine registers, physical RAM, addressable storage, or transmission speeds.

If your O(1) algorithm takes an hour for any input, and the competition is O(n), it may seem like there must be cases where you're better, and then you realise n is the size in bytes of some data in RAM and your competitors can do 4 GB per second. You won't be competitive until we're talking about 15 TB of data, and then you remember you only have 64 GB of RAM.

Big-O complexities are not useless, but they're a poor summary alone, about as useful as knowing the mean price of items at supermarkets. I guess this supermarket is cheaper? Or it offers more small items? Or it has no high-end items? Or something?



> (...) but as "describing how performance changes as the size of the input data increases".

Yes, that's the definition of asymptotic computational complexity. That's the whole point of these comparisons. It's pointless to compare algorithms when the input size is in the single digits.



You could have an O(N^2) algorithm outperform an O(N) one at a scale of 10,000 (or whatever scale you want to imagine). Big-O notation compares only asymptotic behaviour, and sometimes the lower-order terms are overwhelming.



It's the summation of the numbers from 1 to n, which is n(n+1)/2. This reduces to quadratic complexity because big-O notation ignores all coefficients and terms that scale more slowly.



I enjoyed the use of illustration. Seems like an appropriate usage.

Sometimes I feel like these articles with interactive illustrations are more like an excuse to put together a bunch of cool demos, like there's a lot of fluff with not much substance (a bit like a TED talk), but this one didn't let the illustrations take over.



I hadn't seen this before, but isn't it similar to using something like a quad-tree to reduce the number of potential colliders?



This is awesome, very beautiful and well written!

Are there other handy 2-d algorithms that are this well explained? I know redblobgames has a treasure trove too, but curious about more.



> I won’t cover other approaches, such as space partitioning or spatial tree subdivision

I'm curious about this comment, anyone know if the algorithm in the article is generally faster than space partitioning/spatial tree subdivision?

A long time ago I used a spatial-tree type of approach, which naively seemed to be pretty good, but I never investigated or compared the algorithms other people were using (this was the '80s, pre-internet).



The complexity involved in maintaining space partitioning, tree subdivision etc can end up being a big hassle, especially if you have huge numbers of moving objects.

It's much easier to write, debug and optimize something that manages a single list of entities, or a grid of, e.g., 256x256 'cells' that each contain a list of entities, than it is to set up a complex partitioning scheme that maintains all your tree invariants every time an object moves.

In the days of e.g. DOOM or Quake, the performance of these underlying systems was much more important than it is now, so it (IMO) made more sense for the authors of those engines to cook up really complex partitioning systems. But these days CPUs are really good at blasting through sorted arrays of items, and (comparatively) less good at chasing linked lists/trees than they used to be, due to pipelining. Your CPU time isn't going to be spent on managing those entity lists; it will instead be spent on things like AI, rendering, etc.
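As a rough sketch of the "grid of cells, each holding a list of entities" idea (cell size, bounds, and names are arbitrary here, not anything from the article):

```typescript
interface Entity { id: number; x: number; y: number; }

// A fixed 2D grid: each cell stores the ids of entities whose position falls
// inside it. Collision queries then only look at an entity's own cell and
// its neighbours instead of the whole world.
class UniformGrid {
  private cells: number[][];

  constructor(private cols: number, private rows: number, private cellSize: number) {
    this.cells = Array.from({ length: cols * rows }, (): number[] => []);
  }

  private cellIndex(x: number, y: number): number {
    const cx = Math.min(this.cols - 1, Math.max(0, Math.floor(x / this.cellSize)));
    const cy = Math.min(this.rows - 1, Math.max(0, Math.floor(y / this.cellSize)));
    return cy * this.cols + cx;
  }

  // Rebuilding every frame is cheap: clear the buckets and reinsert everything.
  rebuild(entities: Entity[]): void {
    for (const cell of this.cells) cell.length = 0;
    for (const e of entities) this.cells[this.cellIndex(e.x, e.y)].push(e.id);
  }

  // Ids in the cell containing (x, y) plus the 8 surrounding cells.
  nearby(x: number, y: number): number[] {
    const result: number[] = [];
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const nx = cx + dx, ny = cy + dy;
        if (nx < 0 || ny < 0 || nx >= this.cols || ny >= this.rows) continue;
        result.push(...this.cells[ny * this.cols + nx]);
      }
    }
    return result;
  }
}
```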



I'm a big fan of spatial grid hashes to simplify situations like this. Nice to see other approaches. Iterative insertion sorting would be easier to port to a compute shader than a spatial grid hash, so maybe this method is better if we get into the millions-of-objects range?
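For anyone unfamiliar with the term, a spatial grid hash keeps the same cell-bucketing idea as a fixed grid but stores only occupied cells in a map, so the world can be unbounded. A sketch (cell size and names are illustrative):

```typescript
// Spatial hash: map a cell coordinate to the list of entity ids inside it.
// Unlike a fixed grid there is no world boundary and empty cells cost nothing.
class SpatialHash {
  private buckets = new Map<string, number[]>();

  constructor(private cellSize: number) {}

  private key(x: number, y: number): string {
    return `${Math.floor(x / this.cellSize)},${Math.floor(y / this.cellSize)}`;
  }

  insert(id: number, x: number, y: number): void {
    const k = this.key(x, y);
    const bucket = this.buckets.get(k);
    if (bucket) bucket.push(id);
    else this.buckets.set(k, [id]);
  }

  clear(): void {
    this.buckets.clear();
  }

  // Candidate ids from the cell containing (x, y) and its 8 neighbours.
  nearby(x: number, y: number): number[] {
    const cx = Math.floor(x / this.cellSize);
    const cy = Math.floor(y / this.cellSize);
    const result: number[] = [];
    for (let dy = -1; dy <= 1; dy++) {
      for (let dx = -1; dx <= 1; dx++) {
        const bucket = this.buckets.get(`${cx + dx},${cy + dy}`);
        if (bucket) result.push(...bucket);
      }
    }
    return result;
  }
}
```

Picking the cell size close to the typical object diameter keeps the buckets small, which is also why the sibling comment asks about objects of very different sizes.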



Isn't spatial hashing most suited to colliding objects of similar sizes, with similar dimensions along each axis (so you can match the grid cell size accordingly)? If you've got a mix of very different sizes and shapes, I think sweep and prune can be a better choice.



Excellent article. Sorting the list is a really simple and neat idea, without the need for clustering or a special data structure.



A tangent, but HNers interested in an article on collision detection might know: are there any similar articles that show how to compute the intersection of two capsules (as in, the space that a sphere moving in a straight line during a time step occupies) in 3D? My own hobby 3D game got stuck on that hurdle and I couldn’t find any examples anywhere :(



Capsule-capsule overlap can be detected by treating the capsules as line segments with radius/thickness. But I think you need something more complicated than capsule-capsule intersection to truly solve the problem (continuous collision detection between dynamic objects).

The Rapier project and its documentation[0] might be of interest to you. Rapier has a more sophisticated CCD implementation than most (popular in gamedev) physics engines.

> Rapier implements nonlinear CCD, meaning that it takes into account both the angular and translational motion of the rigid-body.

[0] https://rapier.rs/docs/user_guides/rust/rigid_bodies#continu...



I haven't worked on collision detection since the 1990s. But convex hull collisions are very fast with GJK. And the space swept by translating a convex hull is itself a convex hull. So continuous collision detection (no fly-through) is possible that way. But that's usually overkill. Objects that move more than their own dimension in one frame time are usually bullets, and those should get special handling.



You make one of them have the combined thickness of both, so that the collision detection becomes a line-vs-capsule test.

Another trick is to rotate both so that one of the move directions is axis-aligned, and then the problem looks more like AABB.



For fine collision between capsules, you consider the capsules as 3D segments with thickness. Two of those collide if the distance between the segments is less than the sum of their radii.

You can check this site, which has nice animations explaining how to compute the distance between segments: https://zalo.github.io/blog/closest-point-between-segments/
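For reference, here is roughly what that test can look like in code. This follows the standard closest-point-between-segments derivation (minimise the squared distance over the two segment parameters, clamping to [0, 1]); the vector helpers and names are illustrative, not from the article:

```typescript
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const clamp01 = (x: number): number => Math.max(0, Math.min(1, x));

// Squared distance between segments p1->q1 and p2->q2: minimise
// |(p1 + s*d1) - (p2 + t*d2)|^2 over s, t in [0, 1], clamping as needed.
function segmentDistanceSq(p1: Vec3, q1: Vec3, p2: Vec3, q2: Vec3): number {
  const d1 = sub(q1, p1), d2 = sub(q2, p2), r = sub(p1, p2);
  const a = dot(d1, d1), e = dot(d2, d2), f = dot(d2, r);
  const EPS = 1e-12;
  let s = 0, t = 0;
  if (a <= EPS && e <= EPS) {
    // Both segments degenerate to points: s = t = 0 already gives p1 vs p2.
  } else if (a <= EPS) {
    t = clamp01(f / e);                 // first segment is a point
  } else {
    const c = dot(d1, r);
    if (e <= EPS) {
      s = clamp01(-c / a);              // second segment is a point
    } else {
      const b = dot(d1, d2);
      const denom = a * e - b * b;      // ~0 when the segments are parallel
      s = denom > EPS ? clamp01((b * f - c * e) / denom) : 0;
      t = (b * s + f) / e;
      if (t < 0) { t = 0; s = clamp01(-c / a); }
      else if (t > 1) { t = 1; s = clamp01((b - c) / a); }
    }
  }
  const diff = [p1[0] + d1[0] * s - (p2[0] + d2[0] * t),
                p1[1] + d1[1] * s - (p2[1] + d2[1] * t),
                p1[2] + d1[2] * s - (p2[2] + d2[2] * t)] as Vec3;
  return dot(diff, diff);
}

interface Capsule { a: Vec3; b: Vec3; radius: number; } // segment a->b with thickness

// Two capsules overlap when their core segments are closer than the sum of radii.
function capsulesOverlap(c1: Capsule, c2: Capsule): boolean {
  const r = c1.radius + c2.radius;
  return segmentDistanceSq(c1.a, c1.b, c2.a, c2.b) <= r * r;
}
```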

A more generic way to compute the distance between shapes is to view it as a minimization problem. You take a point A inside object 1 and a point B inside object 2, and you minimize the squared distance between the points, ||A-B||^2, while constraining point A to object 1 and point B to object 2.

In the case of capsules, this is a "convex" optimization problem. So one way of solving for the points is to take a minimization (Newton) step and then project each point back onto its respective convex object (the projection is well defined and unique because of convexity). It usually needs fewer than 10 iterations.
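To make the "minimize, then project back onto the convex object" idea concrete for segments, here is a rough sketch using projected coordinate descent rather than a full Newton step (same structure: take an unconstrained minimization step in one parameter, then clamp it back into [0, 1]). The small vector helpers are re-declared so the block stands alone:

```typescript
type Vec3 = [number, number, number];
const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const clamp01 = (x: number): number => Math.max(0, Math.min(1, x));

// Minimise |(p1 + s*d1) - (p2 + t*d2)|^2 by alternating over s and t.
// Each sub-problem is a 1D quadratic with a closed-form minimiser, and the
// "projection" onto the segment is just clamping the parameter to [0, 1].
function segmentDistanceSqIterative(p1: Vec3, q1: Vec3, p2: Vec3, q2: Vec3,
                                    iterations = 10): number {
  const d1 = sub(q1, p1), d2 = sub(q2, p2);
  const a = dot(d1, d1), e = dot(d2, d2);
  let s = 0.5, t = 0.5;
  for (let k = 0; k < iterations; k++) {
    // Closest parameter on segment 1 to the current point on segment 2, clamped.
    const toB = sub([p2[0] + d2[0] * t, p2[1] + d2[1] * t, p2[2] + d2[2] * t], p1);
    if (a > 1e-12) s = clamp01(dot(d1, toB) / a);
    // Closest parameter on segment 2 to the current point on segment 1, clamped.
    const toA = sub([p1[0] + d1[0] * s, p1[1] + d1[1] * s, p1[2] + d1[2] * s], p2);
    if (e > 1e-12) t = clamp01(dot(d2, toA) / e);
  }
  const diff = sub(
    [p1[0] + d1[0] * s, p1[1] + d1[1] * s, p1[2] + d1[2] * s],
    [p2[0] + d2[0] * t, p2[1] + d2[1] * t, p2[2] + d2[2] * t],
  );
  return dot(diff, diff);
}
```

Because the objective is convex and the box constraint is separable, this converges quickly in practice, in the spirit of the "fewer than 10 iterations" remark above.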

In geometry, we often decompose objects into unions of convex shapes. One example is sphere trees, where you cover your object with spheres.

This allows you to use a fast coarse collision algorithm for spheres to find collision candidates for your capsules: you cover your capsules with spheres (a little bigger than the thickness) so that each capsule is inside the union of its spheres.



Isn’t it just a matter of checking if the two lines' perigee falls within the radius sum, and if so, checking if the respective segments' closest points to the perigee are within the radius sum?



The title was curious to me because I expected more of a post about the `intersects` function, e.g. the pip algorithm... it turns out it's more about the complexity introduced by the number of objects to test, which in the end is also interesting.



Super interesting. My first thought before reading the article was "why not a Bloom filter?"; I didn't expect it to be about "physical" collisions.



Not mentioned is spatial hashing, which works well without complex data structures.

Also sad that you need all this complexity to test 25 balls in Javascript. 25^2 is a small number in C, especially when your balls fit in cache.

Still a good introduction to various algorithms, though.
