New Scientist (26 October 2024, paywall) has an interesting article on how mathematical languages mimic English word frequencies, titled “The laws of physics appear to follow a mysterious mathematical pattern,” by Alex Wilkins:
A strange pattern running through the equations of physics may reveal something fundamental about the universe or could be a sign that human brains are biased to ignore more complex explanations of reality – or both.
This insight comes from a physicist’s version of Zipf’s law, an observation by linguists that the most common word in a language appears twice as often as the second most common word, three times as often as the third, and so on. In English, for example, the word “the” tends to make up around 7 per cent of any large text, with the next most frequent word, “of”, occurring around 3.5 per cent of the time. What’s more, it turns out that Zipf’s law appears to hold in other situations, such as income distribution or the population of cities.
Now, Andrei Constantin at the University of Oxford and his colleagues have found that a similar law applies to the symbols used to construct the laws of physics. They looked at three sources of equations: those used in The Feynman Lectures on Physics; a list of equations named after people found on Wikipedia; and a set of proposed equations describing the inflation of the early universe. By treating each symbol and mathematical operator in the equations as a word and ranking their frequency, they could analyse the equations in a similar way to Zipf’s law.
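To make the counting step concrete, here is a minimal sketch of that kind of analysis as I understand it from the description above. The sample equations, the tokenizer, and the function names are all my own assumptions for illustration; none of it comes from the study, and five equations are far too few to show any real pattern. It treats each symbol and operator as a "word", ranks the symbols by frequency, and prints each observed count next to what an idealised Zipf law (top count divided by rank) would predict.

```python
import re
from collections import Counter

# Hypothetical sample: a few well-known equations written as plain strings.
# Illustrative only; this is NOT the data set used in the study.
EQUATIONS = [
    "F = m * a",
    "E = m * c ^ 2",
    "p = m * v",
    "F = G * m * M / r ^ 2",
    "K = m * v ^ 2 / 2",
]


def tokenize(equation):
    """Split an equation into symbols: letter names, numbers, single operators."""
    return re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", equation)


def rank_frequencies(equations):
    """Return (symbol, count) pairs sorted from most to least frequent."""
    counts = Counter(tok for eq in equations for tok in tokenize(eq))
    return counts.most_common()


def zipf_prediction(top_count, rank):
    """Count expected at a given rank under the idealised 1/rank law."""
    return top_count / rank


ranked = rank_frequencies(EQUATIONS)
top_count = ranked[0][1]
for rank, (symbol, count) in enumerate(ranked, start=1):
    predicted = zipf_prediction(top_count, rank)
    print(f"{rank:>2} {symbol!r:>5} observed={count} zipf={predicted:.2f}")
```

A real comparison would need a far larger corpus and a more careful definition of what counts as a symbol, which is presumably where the actual work lies.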
The article is a little short on details, and I don’t really have time to track down the underlying work and try to make sense of it. What interests me is the categorization of the symbols. “The,” according to Merriam-Webster, belongs in the categories of definite article, adverb, and preposition (there’s also an entry called combining form, but that appears to be inappropriate to the subject at hand). “Of” is classified as a preposition and an auxiliary verb.
Do the mathematical language symbols sort into similar categories? This, too, is interesting:
“You might expect that this [distribution] would differ quite significantly between the three different sets of equations because they come from different places,” says team member Deaglan Bartlett at Sorbonne University in France, but to their surprise, that wasn’t the case. Instead, all three sets seemed to fit the same pattern. That wasn’t true when applying the same analysis to randomly generated mathematical expressions.
Mathematical languages are used for specialized communication about both theoretical and natural constructs, so from that perspective, and given Zipf’s law, the result is actually unsurprising. Viewed as a tool for reasoning, or for manipulating and applying logic, it may be more surprising.
Whether there’s some mystical or mysterious aspect to it seems doubtful to me, but I’m always interested in being wrong about such things.