wooded path on the campus of Carnegie Mellon University in Pittsburgh. The hulking vehicle, named Navlab, wasn’t notable for its beauty or speed, but for its brain: It was an experimental version of an autonomous vehicle, guided by four powerful computers (for their time) in the cargo area.

At first, the engineers behind Navlab tried to control the vehicle with a navigation algorithm, but like many previous researchers they found it difficult to account for the huge range of driving conditions with a single set of instructions. So they tried again, this time using an approach to artificial intelligence called machine learning: The van would teach itself how to drive. A graduate student named Dean Pomerleau constructed an artificial neural network, made from small logic-processing units meant to work like brain cells, and set out to train it with photographs of roads under different conditions. But taking enough photographs to cover the full range of potential driving situations was too difficult for the small team, so Pomerleau generated 1,200 synthetic road images on a computer and used those to train the system. The self-taught machine drove as well as anything else the researchers came up with.
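The idea Pomerleau used can be illustrated with a toy sketch: render simple synthetic "road" images by rule, then train a small model on them instead of on photographs. Everything below is hypothetical (the image renderer, the 16×16 resolution, and the steer-left/steer-right labels are invented for illustration and bear no relation to Navlab's actual network); it is only meant to show the pattern of training on generated data.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthetic_road(width=16, height=16):
    """Render a crude synthetic 'road': a bright vertical band on a dark field.

    Returns the image and the road's column position, which serves as
    ground truth for free -- a key appeal of synthetic data.
    """
    center = int(rng.integers(3, width - 3))
    img = np.zeros((height, width))
    img[:, max(0, center - 1):center + 2] = 1.0
    img += rng.normal(0.0, 0.05, img.shape)  # simulated sensor noise
    return img, center

def make_dataset(n=1200):
    """Generate n labeled examples: flattened image -> steer left (0) or right (1)."""
    X, y = [], []
    for _ in range(n):
        img, center = synthetic_road()
        X.append(img.ravel())
        y.append(1.0 if center >= 8 else 0.0)  # road right of center -> steer right
    return np.array(X), np.array(y)

def train(X, y, lr=0.1, epochs=200):
    """Fit a tiny logistic 'steering' model by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid prediction
        grad = p - y                            # gradient of the logistic loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

X, y = make_dataset()
w, b = train(X, y)
preds = (1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5
acc = (preds == y).mean()
```

Because the generator produces the label along with the image, no manual annotation is needed, and the model sees arbitrarily many driving situations the team never had to photograph.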

Navlab didn’t directly lead to any major breakthroughs in autonomous driving, but the project did show the power of synthetic data to train AI systems. As machine learning leapt forward in subsequent decades, it developed an insatiable appetite for training data. But data is hard to get: It can be expensive, private or in short supply. As a result, researchers are increasingly turning to synthetic data to supplement or even replace natural data for training neural networks. “Machine learning has long been struggling with the data problem,” said Sergey Nikolenko, the head of AI at Synthesis AI, a company that generates synthetic data to help customers make better AI models. “Synthetic data is one of the most promising ways to solve that problem.”
