Marco Giancotti
Cover image:
Image by vackground.com, Unsplash
A part of me still hasn't recovered from learning that some people believe there is no such thing as an untranslatable word. I've written about why I disagree before, but that explanation didn't satisfy me completely. There was a stronger argument to be made, I thought, but I couldn't put it into words. Now I remember, though: you need to see language as (a little bit) like math. Call me crazy, but I think that language translation is like a change of basis in linear algebra.
Me making weird connections like this might simply be an occupational hazard. Both my PhD research and my first job had to do with controlling the position and orientation of spacecraft and rocks in space, which means that I spent years juggling vectors, matrix multiplications, and reference frames almost daily. Still, I think it is simple enough to be understood by anyone, so hear me out.
(You might remember linear algebra from high school. It's that subfield where you write about stuff like this:
If the mere sight of the above is like a punch in the face for you, don't worry. I'm not going to math you to death in what follows. I will only remind you of a tiny basic part of it that I think relates to languages.)
During those same mathematical days, I was also learning Japanese. The language fascinated me for many reasons, like its beautiful dissociation between written and spoken words and its many unique quirks, but I was also struck early on by something a bit more meta: how hard it is to translate things to and from it.
These two concurrent interests made it hard for me not to see a connection. For almost a decade now, I've held it in a corner of my mind without telling anyone, perhaps because I thought it would be seen as too outrageous, but hey! Now LLMs are popular and they literally handle words and concepts as vectors with linear algebra operations, so maybe my analogy isn't that out there. Let me give it a try.
The Case of Vectors
Contrary to popular belief, a vector is not "a list of numbers" but an abstract object with no predefined way to express it.

But a vector is not very useful in this abstract state. We need a way to write it down so that we can manipulate it with algebra and communicate it to others. We do this by choosing a frame of reference or, more accurately, a set of vectors to use as a "basis" to quantify all others.

This is where numbers come into play. By projecting the vectors onto the basis vectors, you can assign them lists of numbers (two numbers each, in this two-dimensional case).
Those numbers are the coordinates of the vectors in that basis. Each coordinate says how much of the corresponding basis vector you need: a coordinate of one half, for instance, can be read as "half as long as that basis vector, along its direction."
The important thing I want to convey here (and the last mathy thing to remember) is that if you were to choose a different basis, the same vector would have different coordinates.

In this new basis, the two vectors are written as:
In short, change the basis (from one pair of basis vectors to another) and the same abstract object (vector) will be represented with different numbers. You can do all the same operations with them, like changing the vector's length and direction, calculating the angle between the vectors, and so on, and you'll get the same results in both bases, because they are operations on the same objects. The choice of basis is merely cosmetic from this point of view.
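For readers who prefer code to pictures, here is a minimal sketch of the idea in plain Python. The vectors `u` and `w`, the 30-degree rotation, and the helper names are all assumptions chosen for illustration, not anything from the figures above. The second basis is kept orthonormal (a pure rotation) so that the usual length and angle formulas apply directly to the coordinates.

```python
import math

def coords_in_basis(v, b1, b2):
    """Solve v = c1*b1 + c2*b2 for (c1, c2) using Cramer's rule."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("basis vectors must not be parallel")
    c1 = (v[0] * b2[1] - v[1] * b2[0]) / det
    c2 = (b1[0] * v[1] - b1[1] * v[0]) / det
    return (c1, c2)

def length(c):
    """Length computed from coordinates in an orthonormal basis."""
    return math.hypot(c[0], c[1])

def angle(c, d):
    """Angle in radians; this formula assumes an orthonormal basis."""
    cos_angle = (c[0] * d[0] + c[1] * d[1]) / (length(c) * length(d))
    return math.acos(max(-1.0, min(1.0, cos_angle)))

# Two hypothetical vectors, written in the standard basis.
u, w = (2.0, 1.0), (-1.0, 3.0)

# A second basis: the standard one rotated by 30 degrees.
t = math.radians(30)
r1, r2 = (math.cos(t), math.sin(t)), (-math.sin(t), math.cos(t))

u_new = coords_in_basis(u, r1, r2)
w_new = coords_in_basis(w, r1, r2)

print("u:", u, "->", u_new)              # different numbers, same vector
print("w:", w, "->", w_new)
print(length(u), length(u_new))          # equal up to rounding error
print(angle(u, w), angle(u_new, w_new))  # equal up to rounding error
```

The coordinates change, but the length of each vector and the angle between them do not, which is the sense in which the choice of basis is cosmetic.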
The Case of Language
Now let's turn to language and see the parallel.
Contrary to popular belief, a concept is not a word or a group of words but an abstract object in your mind.

But a concept is not very useful in this abstract state. We need a way to write it down so that we can manipulate it with grammar and communicate it to others. We do this by choosing a language that is shared with the receiver of the message.
For the concept in my head right now, and choosing the English language to represent it, you get this:
"Going to Tokyo"
This is where words come into play. By projecting the abstract idea onto the standard English vocabulary and grammar, you can assign it a list of words: (going, to, Tokyo).
Those words are the equivalent of the coordinates of the vectors in linear algebra. The way I just wrote it is not accurate, though, because it makes it look as if English only had 3 dimensions (3 words), just like the 2-element vectors were 2-dimensional. English actually has hundreds of thousands of words, so we would need a vector that long to fully represent it, with blank spaces for all the words that aren't involved in this case.
The English language offers its speakers many other words, like "camel", "frolic", and "or", but in this case none of them was necessary to express the idea that was in my mind, so their slots remain empty and they are absent from my utterance of "going to Tokyo".
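The "one slot per word" picture can be sketched in a few lines of Python. The six-word vocabulary below is a toy assumption (real English would need hundreds of thousands of slots), and marking used words with 1 and unused ones with 0 is just one possible convention. Note that this simple encoding throws away word order, which a real representation of language could not afford to do.

```python
# Toy vocabulary: a hypothetical, tiny stand-in for the whole English lexicon.
ENGLISH = ["camel", "frolic", "going", "or", "to", "Tokyo"]

def word_coordinates(utterance, vocabulary):
    """Return one slot per vocabulary word: 1 if the word is used, else 0."""
    used = set(utterance.split())
    return [1 if word in used else 0 for word in vocabulary]

english_coords = word_coordinates("going to Tokyo", ENGLISH)

for word, value in zip(ENGLISH, english_coords):
    print(f"{word:>7}: {value}")
```

Only three of the six slots are filled; "camel", "frolic", and "or" stay at zero because the concept does not need them.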

Of course, similar to vectors with bases, you can express the same concept in a different language. If I chose Italian as the "basis", you'd get:
"Andare a Tokio"
The concept is the same, but now its representation (its "word coordinates") is different: (andare, a, Tokio).
You can, in theory, say those two different sequences of sounds in those two languages, and obtain the same effect in the mind of the receiver. The only requirement is that all people involved know both languages.
Cosmetics Matter
Alright, the parallel seems plain enough. Sadly, expressing all language in that vector-like format would use up a lot of ink and is not really practical for everyday use. (LLMs kind of achieve that feat, but in a more convoluted and definitely not human-readable way.)
Why do I think it is interesting, then? Because "untranslatable" words exist.
The words that people sometimes call "untranslatable" are terms that have a clear and widely understood meaning in one language, but no equivalent in another one. I gave some examples from Japanese before, and the web is awash with blog posts enumerating curious words like that from many other languages. The key takeaway is that what takes only a single word in Language 1 can be expressed with some accuracy in Language 2 only if you use many words to explain it in all of its facets.
I wrote that the choice of basis in linear algebra is "cosmetic", because the result of an operation on vectors does not change depending on the basis. But that is only the ideal, mathematical way to look at it. We humans are not so ideal. We are weak and fallible and sometimes we even stink.
Which vector representation do you like better between these two?
or
Both represent the same vector under different bases. For the second one, I chose a basis that has the vector itself as one of its basis vectors. Here is what the two cases look like graphically:

In theory, both representations are exactly equivalent to each other. A computer wouldn't have any preference. But the second representation, the clean one with a simple one and zero, is not only easier for a person to remember and grasp, but also easier to handle mathematically. Things simplify easily with it: everything that is multiplied by zero becomes zero, and everything that is multiplied by one stays unchanged. The calculations proceed faster and with fewer errors.
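Here is a small sketch of that effect, assuming a hypothetical vector `v = (3, 4)` and a second basis built by hand so that its first basis vector is `v` itself (the second one, `(-4, 3)`, is an arbitrary perpendicular choice). None of these numbers come from the figures above.

```python
def coords_in_basis(v, b1, b2):
    """Solve v = c1*b1 + c2*b2 for (c1, c2) using Cramer's rule."""
    det = b1[0] * b2[1] - b1[1] * b2[0]
    if det == 0:
        raise ValueError("basis vectors must not be parallel")
    c1 = (v[0] * b2[1] - v[1] * b2[0]) / det
    c2 = (b1[0] * v[1] - b1[1] * v[0]) / det
    return (c1, c2)

v = (3.0, 4.0)  # hypothetical vector, written in the standard basis

standard = ((1.0, 0.0), (0.0, 1.0))
aligned = (v, (-4.0, 3.0))  # first basis vector is v itself

in_standard = coords_in_basis(v, *standard)
in_aligned = coords_in_basis(v, *aligned)

print(in_standard)  # (3.0, 4.0)
print(in_aligned)   # (1.0, 0.0)
```

In the aligned basis the vector is simply "one of the first basis vector and none of the second", which is about as easy to read and compute with as coordinates get.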
This means that, at least for our feeble organic minds, the choice of basis does matter.
The same holds for language. A concept that took several words in English ("going to Tokyo") is a single word in Japanese: joukyou (上京).
Looking at it in the other direction, a native word in one language is like one of its "basis vectors"—simple and straightforward—but the underlying concept might need to be "spread out" onto several words when applying a different language (i.e. "basis").
Arguably, having a compact word makes it easier not only to express the concept, but also to think about it. This is the Sapir-Whorf debate, but I'll leave that for another day. Instead, I want to show you what this means for untranslatable words, because there are people who vehemently deny their existence.
Losing in Translation
I think there are two ways in which this analogy makes it quite obvious that "practical untranslatability" is a thing.
First, communication is costly, and we don't have infinite time and space to put in all the words that are needed. Even if the word could in theory be explained in the other language, usually it's not worth it.
For example, the Japanese term mono no aware (物の哀れ) could be rather accurately translated as
a gentle, poignant sadness or pathos felt in response to the transient nature of all things, a deep awareness of their impermanence that evokes a subtle, bittersweet sorrow and a profound, quiet empathy for their passing.
In a dictionary, perhaps that's okay. But a dictionary is not translation. Translation is about conveying the meaning of full texts, and you can't do that kind of multi-line expansion for every word.
And so the translator simplifies it to the gist, e.g. as the pathos of all things. This conveys the majority of the meaning and it is usually enough, but it does lose a lot in the process.
This is analogous to the data analysis technique called Principal Component Analysis (PCA), where one re-expresses the data in a basis aligned with the directions along which they vary the most, then keeps only the coordinates along the first few of those directions and disregards all others. This means choosing a small set of basis vectors that are closely aligned with the vectors of interest and ignoring the existence of the other basis vectors, effectively reducing the dimensions of the data. Translators use a version of PCA every time they (begrudgingly) agree to leave the finer nuances of a concept unsaid for the sake of space.
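To make the comparison concrete, here is a minimal sketch of PCA in two dimensions using only the Python standard library. The five data points are invented for illustration, and the closed-form angle formula works only for the 2D case; a real analysis would use a numerical library. The sketch finds the direction along which the points vary most, keeps a single coordinate per point along that direction, and measures what is lost.

```python
import math

# Hypothetical 2D data lying close to the diagonal line y = x.
points = [(-2.0, -2.1), (-1.0, -0.9), (0.0, 0.0), (1.0, 0.9), (2.0, 2.1)]

def principal_axis(data):
    """Return the mean and a unit vector along the direction of greatest variance (2D only)."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    cxx = sum((x - mx) ** 2 for x, _ in data) / n
    cyy = sum((y - my) ** 2 for _, y in data) / n
    cxy = sum((x - mx) * (y - my) for x, y in data) / n
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return (mx, my), (math.cos(theta), math.sin(theta))

mean, axis = principal_axis(points)

# Keep only one number per point: its position along the principal axis.
reduced = [(x - mean[0]) * axis[0] + (y - mean[1]) * axis[1] for x, y in points]

# Rebuild approximate 2D points from the single kept coordinate.
rebuilt = [(mean[0] + c * axis[0], mean[1] + c * axis[1]) for c in reduced]
errors = [math.hypot(p[0] - r[0], p[1] - r[1]) for p, r in zip(points, rebuilt)]

print("principal axis:", axis)
print("largest reconstruction error:", max(errors))
```

Each point is now described by one number instead of two, and the small reconstruction errors are the "finer nuances" that got dropped, much like the gist translation of mono no aware.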
But, even assuming you do have time to explicate, using many more words increases the risk of introducing unintended nuances that come bundled with those extra words. This is like doing PCA but selecting inappropriate basis vectors, which introduce lots of small errors in the calculation of the vector's coordinates. Is "sorrow" too emotionally charged in that long translation of mono no aware? Does the use of "passing" unnecessarily remind English speakers of people dying?
You eventually hit diminishing returns: using more words confuses the reader instead of clarifying things further.

The second problem with translation is precision. Even if you can use many words, and even if none of those are misleading, words are still finite in number. Unlike ideal numerical coordinates, which can take any value down to the finest detail, you only have a small selection of words to convey a given bit of meaning.
Suppose you realize that the word "subtle" is not accurate enough in the "...evokes a subtle, bittersweet" part of the translation above. Maybe you do want to convey subtlety, but find that the plain word "subtle" is too strong in this case. You might try to soften it with an adverb, like "somewhat subtle" or "slightly subtle", but there aren't many other options out there. What if none of them is perfect for your current needs?
In this sense, language is "quantized": you can jump from one level of intensity of some meaning to the next, but you can't express anything in between.
This is a problem shared by all computers. Unlike ideal numbers, the numbers in a processor necessarily have a finite number of digits. So when you want to work with the ideal vector
the storage limitations of your computer might mean that you have to content yourself with this truncated version:
(Modern computers can usually handle many more digits than that, but you get the idea.)
Graphically, it looks something like this:

(Incidentally, AI engineers sometimes intentionally quantize large language models to make them take up less memory, but this tends to make them dumber, because they lose nuance.)
With language, like with computers, we're forced to "lower the resolution" of our concepts whenever we put them into words. This happens twice in translated text: once when the author first writes down their thoughts, then once more when the translator transports that already-degraded concept into a different language with different "quantization steps".
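The double loss of resolution can be sketched with a toy quantizer. Everything here is an assumption made up for illustration: the "thought" is a single number, the author's language can only express multiples of 0.4, and the translator's language only multiples of 0.5. The same function also shows the earlier point about storing an ideal vector with limited precision (here by rounding to a grid, one of several ways to lose digits).

```python
import math

def quantize(value, step):
    """Snap a value to the nearest multiple of `step`."""
    return round(value / step) * step

# Storing an "ideal" vector with only two decimal places of resolution.
ideal = (math.sqrt(2), math.pi)
stored = tuple(quantize(c, 0.01) for c in ideal)
print(ideal, "->", stored)

# Two lossy steps in a row: thought -> author's words -> translator's words.
thought = 0.7
written = quantize(thought, 0.4)     # author's language: steps of 0.4
translated = quantize(written, 0.5)  # translator's language: steps of 0.5
direct = quantize(thought, 0.5)      # if the thought could skip the first step

print("error after writing:    ", abs(written - thought))
print("error after translating:", abs(translated - thought))
print("error if expressed directly in the second language:", abs(direct - thought))
```

With these particular numbers, the translated value ends up farther from the original thought than either single step alone would have put it. That will not happen for every input, but it shows how two different sets of "quantization steps" can compound.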
Between the Lines
I hope these rather unorthodox leaps between linguistics and mathematics helped make it almost obvious that some words and ideas are untranslatable in practice. I also hope you don't take the analogy too seriously, because it won't go much further than this. You might be tempted to begin talking about "word matrices" and whatnot, but I doubt it would help clarify things. That kind of advanced linear algebra with concepts might work for LLMs, but it doesn't seem to map to anything intelligible for a human being, not to mention make you any wiser.
Besides, language has something going for it that doesn't seem to have a mathematical equivalent: what does it mean to "read between the lines"?
It's hard to pin down, but I think it has to do with the structure and context of the words communicating something that is not contained in any of the words themselves. Perhaps it is the clever use of those "negligible coordinates"—the fringe nuances—of words scattered around the text to produce a collective effect on the reader.
A good translator might not be able to exactly translate a given word or sentence, but they might be able to "write between the lines" so that it doesn't matter very much. ●