How Do Machines ‘Grok’ Data?

For all their brilliance, artificial neural networks remain as inscrutable as ever. As these networks get bigger, their abilities explode, but deciphering their inner workings has always been near impossible. Researchers are constantly looking for any insights they can find into these models.

A few years ago, they discovered a new one.

In January 2022, researchers at OpenAI, the company behind ChatGPT, reported that these systems, when accidentally allowed to munch on data for much longer than usual, developed unique ways of solving problems. Typically, when engineers build machine learning models out of neural networks — composed of units of computation called artificial neurons — they tend to stop the training at a certain point, called the overfitting regime. This is when the network basically begins memorizing its training data and often won’t generalize to new, unseen information. But when the OpenAI team accidentally trained a small network way beyond this point, it seemed to develop an understanding of the problem that went beyond simply memorizing — it could suddenly ace any test data.

The researchers named the phenomenon “grokking,” a term coined by science-fiction author Robert A. Heinlein to mean understanding something “so thoroughly that the observer becomes a part of the process being observed.” The overtrained neural network, designed to perform certain mathematical operations, had learned the general structure of the numbers and internalized the result. It had grokked and become the solution.

To read more, click here.