（评论）

（评论）
(comments)

原始链接: https://news.ycombinator.com/item?id=40684901

本文讨论了李群/李代数对应关系，这是一个数学概念，允许处理 3D 旋转等抽象概念，而无需处理特定坐标，同时仍然能够通过派生坐标来表示和操作它们。这种对应关系包括指数图和对数图，允许抽象概念及其关联坐标之间的无缝转换。这些方法可以产生插值和平均技术，从而提高工程师处理此类问题的效率和简便性。作者还谈到了旋转和复数之间的关系，特别是二维旋转和四元数。他们认为，正如复数的乘法等于旋转的组成一样，四元数也是如此，与 3D 旋转的传统矩阵乘法相比，效率更高。此外，四元数与李代数有着密切的联系，进一步增强了它们在此框架内的实用性。关于对多次旋转进行平均的使用，文本指出，由于旋转不交换，因此不存在直接的“平均”含义。然而，它建议根据期望的结果探索各种方法。本文最后以个人偏好四元数代替矩阵来表示和实现 3D 旋转，因为它们能够有效地组合旋转并避免万向节锁。提供了一个比较四元数和矩阵来解决相同问题的示例。作者主张使用四元数和 Arcball 等接口在 3D 软件中实现旋转，理由包括更轻松的用户体验和消除万向节锁等好处。

The lie group/lie algebra correspondence is one of the coolest things i wish i’d been taught in school. It’s the exponential and log map in the article, but in a super reusable setting.

You take some totally abstract thing you want to work with[0], say 3D rotations, but totally avoiding any details of specific coordinates that might get you into trouble. That’s the lie group.

Then you can derive representations of it in coordinates that behave nicely. This is called the corresponding lie algebra. Then for free you get a way to go back and forth between the coordinates and the abstract thing[1], and ways to compose them and stuff. Turns out that this is the exponential map and the log map talked about in the article. What’s even cooler is that you can then compose them on either side, the abstract side or the coordinate side in a way that plays nicely. You get interpolation and averaging and all that in a fairly sensible way too in most cases you’ll deal with as an engineer. And best of all, when you can phrase your problem as a combination of lie groups, you can just google for what their algebras are and get lots of work for free that would be tons of time to do yourself.

[0] the thing has to be something with a notion of smooth change, plus a bit more structure it probably has.

[1] going back and forth has some ‘connected components’ issues sometimes, but that’s just another reason it’s great to piggyback on known results.

I've had a long work week and it's almost over and using the sliders to rotate this cow is exactly what I needed. I feel serene. Thank you.

I don't get the love of quaternions in this context. How are quaternions more intuitive than matrices?

A matrix acts on a vector. Rotations act on vectors. What could be more natural than to look at rotations as matrices?

The matrix exponential then makes all this intuitive to work with, if you connect it to solutions of ODEs: dx/dt = Ax has solution exp(t A). If A is skew symmetric, then the change in x is orthogonal to x at all times. So it's a rotation that doesn't change the length. (Alternative: apply a small orthogonal change over and over: (1 + delta A)^N x)

Lie Groups/Algebras vastly generalize this, and you can call skew symmetric matrices geometric algebra, but that's not important. Important is that you generate rotations by continuously making orthogonal changes, and that the exponential map describes this process. I find this picture far more geometric and intuitive than anything else.

Might be worth pointing out that quaternions themselves are often represented as matrices as well: That’s the Pauli matrices, widely used to model quantum spin.

One advantage of quaternions is that it’s (IMO) easier to calculate things with them using only pen and paper than it is to do equivalent matrix multiplications by hand.

Personally I find them “intuitive” in the same sense as complex numbers are intuitive: They seemed weird on first exposure, but they now feel simpler to use and reason about than the alternatives I know.

The article explained a major advantage of quaternions over matrices. Quaternions interpolate well and matrices don’t. This property is very important for animation and other things in computer graphics, like computing frames along a 3d spline curve.

If you are referring to slerp, the matrix version is trivial using the log and exp maps.

    slerp(A, B, t) = exp(t log(B A^T)) A

Something more complex like splines, I am not familiar with the algorithms used for quaternions. I would bet they use Lie algebraic operations, but do they actually use anything specific to quaternions?

Not disagreeing that quaternions have some computational advantages over matrices, but most of the points people repeat as "advantages of quaternions" are really "advantages of proper Lie group/algebra operations over Euler angles" that are independent of matrix vs. quaternion representation.

The article also explains that the natural way to look at this property is in terms of matrix exponentials/logarithm.

The space of rotation matrices is not a linear space. It's obvious that interpolation is meaningless there. The space of skew symmetric matrices is linear, so you can interpolate the generators of rotations naturally.

Without this perspective, how do you even begin to understand what this interpolation does or what it actually means?

One advantage of quaternians from a computational perspective is that they are 4 numbers vs 9 for a 3x3 matrix (and applying the rotations has a similar reduction in flops)

The main advantage of quaternions is composing rotations.

The analogy would be to 2D rotations with complex numbers. When you multiply two complex numbers, you're composing the rotations (in the 2D case: it ends up just adding the arg, or the phase angle). Likewise, multiplying two quaternions lets you compose the 3D rotations. This is a lot more efficient than multiplying two 3x3 matrices.

For intuition, quaternions are closely related to the axis-angle representation which is the same as the Lie algebra so(3).

As for acting on vectors, you can just think of different rotation parameterizations as implementations of the same abstract Rotation trait. A Rotation acts on vectors, composes, etc, in exactly the same way regardless if the underlying implementation is a matrix, a quaternion, euler vector, euler angles, gibbs vector, etc.

One of the coolest things I learned in university was that you can just put rotation matrices inside the state of a Kalman filter if you just override the + operator to take a matrix and a change in vector space and the — operator to take two matrices. This allows you to estimate rotations without the fear of gimbal lock.

https://openslam-org.github.io/MTK

For simple angles you can use an element of SO(2) as your state. But if the algorithm calls for a difference of two states, the result would have to be an element of a real vector space (in this case a single number between -pi and pi). Analogue to this an addition of a state manifold and a vector also results in a state manifold. You can switch between those representations using the exp and log functions. In the papers these „new“ operators are described as boxplus and boxminus. And the elegant thing of this approach is that in most cases you just would replace plus with boxplus and minus with boxminus and it just works(tm)

Yeah implementing the "+" operator on the tangent space is pretty common not just for Kalman filters but also nonlinear optimization in general. The Ceres library supports LocalParameterization which does that.

Yeah LocalParametrization is exactly what they describe. The main contribution is a pretty slick mathematical model. When you use it everything just seems to work out.

Also ceres is pretty great. LocalParametrization in combination with jets works like a treat for EKFs. You never have to worry about calculating jacobians on manifolds ever again.

It’s pretty hard to explain in short. But estimation algorithms don’t work on angles for example. For an algorithm an angle is just a number. So the difference between 359 and 1 degree looks pretty big. It is also not possible to just put a rotation matrix inside a state, because the algorithm has no concept of orthonormality. But if you define the change of a state in terms of vectors and the state itself as a rotation matrix you can just use off the shelf estimation algorithms like least squares and Kalman filters. You just need to redefine the plus and minus operators. Which is nothing you need a project for. You can just do it yourself.

Awesome blog.

Then I clicked on the author's profile, read another of their posts, and saw:

> I first encountered programming sometime around 2010, when I was 9 years old.

I was 13 in 2010, and trying to stuff secondary school maths and science into my head. Every time I see a cool computer graphics-related blog post, it's by someone younger and way more talented than me.

I feel extremely inadequate; my inner voice immediately went to something I can't really publicly say in case I get a dozen help-lines in response.

It's not over for you, I self-taught some niche skills in the last couple years that some bigco apparently wants to pay for. I'm 35ish. Of course, I don't have any advice about how to do this, because I did it by really liking the thing and doing it all the time. But presumably some sort of structured practice would work too.

This is great. (Not only the cow part.).

What I additionally appreciated is the fact that the methods end up computing a standard rotation matrix. If you need to rotate a million vectors, all you need to do is to apply these interesting calculations once, and then run your highly optimized matrix multiplication pipeline.

I had written a tangentially related comment a few days ago. Its related to rotation in a plane. In that case things become a lot simpler, especially if your programming environment support operations on complex numbers as a first class citizen.

https://news.ycombinator.com/item?id=40333541

The key intuition is that additions are translations and multiplications are rotations. So two model average translation one can use arithmetic mean, for average rotation the geometric mean.

What's your context for averaging multiple rotations?

Averages are something that you can do with sums, where order of composition doesn't matter.

Rotations don't commute, so the concept of average as we understand it doesn't normally apply to them.

If you're holding a phone: rotate the screen away from you 180° (so you'd be seeing the back), then clockwise 90° relative to the ground (around vertical axis).

Your camera will be facing your left.

Now hold the phone normally, and do the same rotations in the opposite order. Your camera will now be facing your right.

Question: where should the camera be facing in the "average" of these two rotations?

There's no single answer — it depends on what you need from the average.

The most widely-used concept of "average" is surely a point that minimizes the sum of squared distances to each of a list of input points. Distances are canonically defined in the space of rotations, and so are averages in this sense (not always uniquely).

Commutativity has nothing to do with this; do not confuse the typical formula for averaging with the reason for doing so! Of course, there are other senses of "average" (which generally do continue to apply to the space of rotations as well).

The application for this given by GreedCtrl's reference is to spline interpolation. Another is in robotics, when combining multiple noisy observations of the orientation of an object.

>Distances are canonically defined in the space of rotations

I am sorry, but this is simply not true.

There are many distance/metric definitions that are applicable to the space of rotations, and the best choice of metric is defined by the application, which is why I asked that question.

None of them is any more "canonical" than the other. See [1][2][3] for an introduction and comparison.

[1] https://www.cs.cmu.edu/~cga/dynopt/readings/Rmetric.pdf

[2] https://rotations.berkeley.edu/geodesics-of-the-rotation-gro...

[3] https://lucaballan.altervista.org/pdfs/IK.pdf

One will find there at least four "canonical" distance definitions, and applications ranging from optometry to computer graphics to IK (which is what you referred to).

>The most widely-used concept of "average" is surely a point that minimizes the sum of squared distances...Of course, there are other senses of "average" (which generally do continue to apply to the space of rotations as well).

I know this, not all of the readers may.

What I don't know is what context the parent is coming from.

Maybe all they need is interpolating between two camera positions - which is a much simpler problem than the paper they found (and what we're discussing) is addressing.

>The application for this given by GreedCtrl's reference is to spline interpolation.

It is not clear that the reference that they have found is actually the best for their application - they only said it was something they found, and that the article we're discussing looks "simpler" for their level of mathematics.

The article we are discussing does not provide any means of "averaging" any more than two rotations, though, which motivated my question.

The bi-invariant metric as pointed out by chombier is what I have in mind. I agree that a non-canonical metric may be the right one for some applications, but those are the exceptional ones. The bi-invariant metric (which has a simple, geometric meaning given by Euler's rotation theorem) is the right starting point for thinking about distances in this space.

(Your reference [2] supports this point of view: read "simplest" as "canonical". Your reference [1] claims five distinct bi-invariant metrics, but this is wrong. The argument given is that any metric related to a bi-invariant metric by a "positive continuous strictly increasing function" is itself bi-invariant, which is not true.)

> >Distances are canonically defined in the space of rotations

> I am sorry, but this is simply not true.

It is true, there is a canonical choice given by the bi-invariant Riemannian metric on compact Lie groups, such as rotations (in this case the shortest angle between rotations)

Whether or not you want this metric in practice is another problem, of course.

> The article we are discussing does not provide any means of "averaging" any more than two rotations,

The Karcher/Fréchet mean described in the original article does average more than two rotations

> Rotations don't commute, so the concept of average as we understand it doesn't normally apply to them.

It does apply, considering that the Euclidean mean minimizes the sum of squared lengths to samples there's a fairly obvious generalization to Riemannian manifolds using geodesic distance.

There are some reasonable assumptions to be made on the manifold and/or sample distribution to ensure convergence but in practice it works pretty well.

I'm making a video game, and I want to map a joystick position to a rotation (of a sword). I have a set of keyframes for certain joystick positions, and I want to interpolate between the three nearest ones to create a continuous function from joystick position to sword rotation.

OK. What kind of positions can the joystick take?

I.e., is it a point on a hemisphere (equivalently, X/Y controller)?

And what kind of rotations are you talking about for the sword? Are they constrained in any way? I'm assuming it's attached to something - is that point moving too?

Finally why do you need to interpolate? Why three? What are you interpolating between?

I.e. why not simply convert the input of the joystick to a rotation - you're literally rotating a stick when you're moving a joystick in the first place.

You don't need interpolation to convert a rotation (of the joystick relative to its base) to the rotation of the sword unless you're doing something that I don't understand.

If you describe what you're doing in detail, I think we might save you a deep dive into deeper math than necessary for this (..or dive into even deeper math :).

If you start with the phone upright and rotate the screen away from you by turning the phone around the vertical axis, then both rotations are around the same axis and of course they do commute.

My guess is that romwell is holding the phone flat, so that the rotation away from you is about a horizontal axis; then you should experience the noncommutativity.

(The resulting orientations are 180 degrees apart, which indeed makes it difficult to say that any one orientation should be the unique average. But this is due to the geodesic structure of the space of rotations, not the noncommutative product that happened to construct these points, see above.)

Draw a line on the screen from top to bottom. You interpreted "rotate away" as turning it around this axis, which is the same axis you used for the 90-degree clockwise turn. You end up with the screen right-side-up, just facing away from you. It's the same thing I intuitively did.

Now draw a line on the screen from left to right. "Rotate away" by turning the phone around this new axis - so the top of the phone moves away from you, and the bottom of the phone moves closer to you. You end up with the screen upside-down, and also facing away from you.

Can confirm. Followed instructions, got the camera facing the same in both cases (left, with phone upside down). I interpreted the 180 part as flipping the phone around the horizontal axis.

The 180 first part was right. Make sure you are rotating 90 degrees in the same direction both times from your frame of reference (clockwise looking from the top).

It took me a long time to realize that, in the field of math, people come up with abstractions in the same way you might think of an abstraction in software engineering.

I was confused as a kid. Why create imaginary numbers? What is the point of matrixes?

It wasn't until later I realized that these representations were designed. If you make up this thing called an imaginary number, these calculations become easier. If you write linear equations like a matrix, it's a lot easier to reason about than writing out the full thing.

It sounds obvious but nobody ever told me this!

Just me?

This continues to be true when you study more abstract math. e.g. "group" is an interface, and "a group" is any type with an operation that implements the 3 required properties/methods. Likewise with vector space, ring, metric space, category, etc. Math is full of interfaces as a "design pattern".

(Interfaces in math are more like type classes and not inheritance in programming. e.g. the same set/type can be a group in more than one way.)

This talks about one of my pet peeves with a lot of 3D software: they don't use Arcball[1] interface for rotation.

Autodesk products do (3DSmax, Maya), but Blender and OpenSCAD don't. And when I worked at Roblox, I couldn't convince the PM to let me implement it, because the users got by with whatever was there — and if they wanted more, they could use something else (...which they did, and Roblox stock reflected that in 2022).

Arcball is based on quaternions[2] (using exp for interpolation).

It's the only interface with these properties:

1. Any rotation can be accomplished with a single drag

2. No gimbal locks

3. Tracing a closed loop with a sequence of drags results in coming back to where you started

Once you formalize this, you can prove it mathematically (ultimately, this comes from unit quaternions giving a 2-cover of SO(3) in the same way unit complex numbers giving exactly the rotations of a circle).

For anyone interested, here is my reference implementation of Quaternions / Arcball that you can play with on the page:

https://romankogan.net/math/arcball_js/index.html

The code is very-commented Java (Processing library, running in JavaScript by the magic of ProcessingJS).

It's one way you can feel the fancy math.

Your hands and body will understand quaternions before your brain does.

If you ever work on 3D software, please use this, and thanks in advance <3

[1] Arcball: http://courses.cms.caltech.edu/cs171/assignments/hw3/hw3-not...

[2] Quaternions for rotations: https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotati...

[3] Arcball in Processing: https://romankogan.net/math/arcball_js/index.html

why does it feel so much worse when you don't drag from the middle and instead from the e.g right side? It feels stuck and sometimes it snaps. If the point from which rotations are calculated changes then I think it should somehow be visually obvious what this means for the rotation and be shown on the screen

Oooh, yeah. This really bothered me too! Its behaviour is really intuitive when you strat with a drag from the middle, but even starting on the edge of the pink cube really makes it freak out.

This essentially forces you to start from the middle, but then you don't seem to be able to do any rotation around the axis pointing out of the screen. To do that rotation your best bet is to click and drag around an inscribed circle - but that in turn means you can't rotate around any other axis in the same movement without running into freak-out behaviour. And anything between those two extremes doesn't really behave intuitively.

Well, FWIW I increased the radius of the gizmo to 1/3 from 1/4, and made it visible.

I'm still not handling drags outside the canvas area - those will cause jumps, but everything else should feel better.

Thanks for the feedback!

Imagine you're spinning a globe with your finger. Obviously not much would happen if you poked outside the globe.

And while your finger is on the globe, you can't point at the sky (unless you finger is, well, stuck to the globe while you're rotating it - pointing to the sky would let go of the globe).

This is what you're doing here, and absolutely, I should add a visual representation for the globe and where you're poking it when you hold the mouse button down.

You feel "stuck" when the cursor attempts to leave the globe. And moving the cursor around that boundary is how you rotate around the axis facing poking from the screen.

This behavior can be changed/customized to an extent, I picked what felt good to me here.

> Imagine you're spinning a globe with your finger. Obviously not much would happen if you poked outside the globe.

And this is why arcball is not a good solution. I'm not putting my finger on a global. I'm spinning a world, terrain, airplane, etc. There's no "ball" so arc"ball" doesn't fit.

It gets worse. What if there is a ball but it's say 4 cm across on my screen. When I rotate with "arcball" am I touch that ball or some imaginary ball much larger than the visible ball. This problem exists all over depending on the thing I'm trying to rotate, arcball won't match.

> I'm spinning a world [...] there's no "ball"

I don't think I understand you, unless you insist that the Earth is flat :)

>It gets worse. What if there is a ball but it's say 4 cm across on my screen.

Then you zoom in / adjust gizmo size / etc. Surely any interface can be made unusable if you shrink it down far enough.

The arcball gizmo is usually visible; I was too lazy to paint a nice one in a bare-bones demo.

See, I don't think it's nice because it's not a proper arcball implementation.

This is what my pet peeve is about!

* There's gimbal lock at 180 degrees: you can only do some rotations with a single drag, and others require several

* Orient the gun so that the barrel is point straight at you, with handle down. Rotate it so that it continues pointing at you, but the handle points up. (It's.. not trivial with this interface).

* Drag the mouse horizontally along the top edge of the screen. I have no idea what's going on with the rotation.

Oh, and here's the thing that only arcball can do:

* Reset the rotation. Mark a rectangle on your screen (or put a pop-up window, e.g. the on-screen keyboard in Windows, over the demo). Start at a corner, and drag the mouse along the four edges, returning to the same position. The gun should return to the same initial orientation.

* Now do the same, but release the mouse at the corners of the rectangle - so you do four drags along the same path. Where's the gun facing now?

>any chance to make this work on mobile?

I wrote it a while ago with ProcessingJS, whose support for touch events was lacking.

Will try to fix it and/or rewrite in P5JS, stay tuned :)

> unit quaternions also don’t form a vector space

Unit quaternions are the lie group. If you want something you can add willy-nilly, you want the lie algebra of all quaternions, which represent rotational velocities, just as axis-angle represents rotational velocities.

Comparing unit quaternions to axis-angle is a bit of a category error - it would be more appropriate to compare unit quaternions to rotation matrices, and all quaternions to axis-angle.

> One advantage of using quaternions is how easy the exponential map is to compute—if you don’t need a rotation matrix, it’s a good option.

You rarely need a rotation matrix at all when using quaternions. As mentioned in the article, you can compute rotations using `pqp^-1`.

I think the easiest way to understand quaternions is just to read about geometric algebra. It took hundreds of years to invent quaternions, but once you understand geometric algebra (which is shockingly simple), you can invent quaternions in just a few minutes. I found this article to be a good intro several years ago: https://crypto.stanford.edu/~blynn/haskell/ga.html

I think the exponential map is still the most robust way to think about rotations, since it gives you tools to deal with the Lie group of SO(3) in the most straightforward way (switching between coordinates, dealing with differentiation and tangent spaces, etc.)

Even when going through all the formulations with geometric algebra, you still land on using rotors (isomorphic to quaternions) and motors (isomorphic to dual quaternions) to represent SO(3) / SE(3) spaces - but I think for that purpose 3x3 rotation matrices / 4x4 transformation matrices with exponential maps are still much more useful. (Quaternions do have an advantage that it stores up less space and also faster to multiply with each other - but when you want to transform points with it then matrices are still faster. In overall in terms of efficiency it really depends on the situation.)

Oops, yes I did.

For curious readers: `2` is a quaternion, but `exp(2)` doesn't generate a rotation, it generates 7.389... just like normal exponentiation on reals. So if you have a real term in your lie algebra, it will generate things that scale the object as well as rotating it.

So instead you want only `ai + bj + ck`, as `exp(ai + bj + ck)` will always have magnitude 1

（评论） (comments)

（评论）
(comments)