Efficient Compression in Color Naming and its Evolution

The world’s languages vary widely in the way they encode colors into words. For example, English has separate terms for “black,” “blue” and “green,” but some other languages have only a single term for these colors. Previous studies suggested that color terms across languages evolve in discrete stages, starting with two basic terms (dark vs. light), with a new term added at each stage in a relatively fixed order (first red, then yellow and green, then blue, followed by other basic color terms). However, this proposal does not explain why this order should occur or how languages transition from one stage to the next.

A new study suggests an answer to these open questions. Noga Zaslavsky and Prof. Naftali Tishby, from the Edmond and Lily Safra Center for Brain Sciences at the Hebrew University, together with their collaborators Prof. Terry Regier of the University of California, Berkeley, and Prof. Charles Kemp of Carnegie Mellon University, developed a computational model that explains why languages categorize colors the way they do. Based on this model, the researchers simulated how color categories may evolve along a continuous trajectory. A video clip (https://youtu.be/hqWkhD8921U) shows the evolutionary process predicted by the model.

The key idea behind this model is that languages efficiently compress colors into words by optimizing the tradeoff between the complexity and the accuracy of the language, an approach known as the Information Bottleneck principle. This general principle is not specific to color and has been applied widely in machine learning. This suggests that the same principle could explain word meanings in other semantic domains, and may help to design artificial intelligence systems with human-like semantics.
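
To make the complexity-accuracy tradeoff concrete, here is a minimal sketch in Python. It rests on assumptions the text above does not spell out: it uses the standard Information Bottleneck formulation, in which complexity is the mutual information between a speaker's intended meaning and the word used, accuracy is the mutual information between the word and the color itself, and a naming system is scored by complexity minus beta times accuracy. The four-chip color domain, the noise levels, the three naming systems and all function names are hypothetical illustrations, not the researchers' data or code.

```python
import math

def mutual_information(joint):
    """Mutual information, in bits, of a joint distribution given as a 2-D list."""
    row_totals = [sum(row) for row in joint]
    col_totals = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (row_totals[i] * col_totals[j]))
    return total

def complexity_and_accuracy(prior, meanings, encoder):
    """Return (complexity, accuracy) of a naming system.

    prior[m]       = p(m), probability that the speaker has meaning m in mind
    meanings[m][u] = p(u | m), the speaker's belief over color chips u
    encoder[m][w]  = q(w | m), probability of using word w for meaning m
    """
    n_m, n_w, n_u = len(prior), len(encoder[0]), len(meanings[0])
    # Joint distribution of meanings and words: p(m) * q(w | m).
    joint_mw = [[prior[m] * encoder[m][w] for w in range(n_w)]
                for m in range(n_m)]
    # Joint distribution of words and chips: sum over m of p(m) q(w|m) p(u|m).
    joint_wu = [[sum(prior[m] * encoder[m][w] * meanings[m][u]
                     for m in range(n_m))
                 for u in range(n_u)]
                for w in range(n_w)]
    return mutual_information(joint_mw), mutual_information(joint_wu)

# Hypothetical toy domain: 4 color chips, one noisy meaning centered on each chip.
N = 4
prior = [1.0 / N] * N
meanings = [[0.7 if u == m else 0.1 for u in range(N)] for m in range(N)]

# Three hypothetical naming systems with 1, 2 and 4 color terms.
systems = {
    "one term": [[1.0] for _ in range(N)],
    "two terms": [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
    "four terms": [[1.0 if w == m else 0.0 for w in range(N)] for m in range(N)],
}
scores = {name: complexity_and_accuracy(prior, meanings, enc)
          for name, enc in systems.items()}

def best_system(beta):
    """Pick the system minimizing complexity - beta * accuracy."""
    return min(scores, key=lambda name: scores[name][0] - beta * scores[name][1])

for name, (complexity, accuracy) in scores.items():
    print(f"{name}: complexity={complexity:.3f} bits, accuracy={accuracy:.3f} bits")
print("preferred at beta=1:", best_system(1.0))
print("preferred at beta=5:", best_system(5.0))
```

In this toy setting, adding terms raises both complexity and accuracy: the one-term system costs nothing but conveys nothing, while the four-term system is the most informative and the most complex. The parameter beta sets how much accuracy is worth relative to complexity, so a small beta favors the coarse system and a larger beta favors the finer one.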