How a subfield of physics led to breakthroughs in AI – and from there to this year’s Nobel Prize

Two researchers whose work has led to the AI revolution won the 2024 Nobel Prize in physics. A materials physicist explains statistical mechanics, the physics field behind their discoveries.

Author: Veera Sundararaghavan on Oct 09, 2024
 
Source: The Conversation
Neural networks have their roots in statistical mechanics. BlackJack3D/iStock via Getty Images Plus

John J. Hopfield and Geoffrey E. Hinton received the Nobel Prize in physics on Oct. 8, 2024, for their research on machine learning algorithms and neural networks that help computers learn. Their work has been fundamental in developing neural network theories that underpin generative artificial intelligence.

A neural network is a computational model consisting of layers of interconnected neurons. Like the neurons in your brain, these neurons process and send along a piece of information. Each neural layer receives a piece of data, processes it and passes the result to the next layer. By the end of the sequence, the network has processed and refined the data into something more useful.

While it might seem surprising that Hopfield and Hinton received the physics prize for their contributions to neural networks, used in computer science, their work is deeply rooted in the principles of physics, particularly a subfield called statistical mechanics.

As a computational materials scientist, I was excited to see this area of research recognized with the prize. Hopfield and Hinton’s work has allowed my colleagues and me to study a process called generative learning for materials sciences, a method that is behind many popular technologies like ChatGPT.

What is statistical mechanics?

Statistical mechanics is a branch of physics that uses statistical methods to explain the behavior of systems made up of a large number of particles.

Instead of focusing on individual particles, researchers using statistical mechanics look at the collective behavior of many particles. Seeing how they all act together helps researchers understand the system’s large-scale macroscopic properties like temperature, pressure and magnetization.

For example, physicist Ernst Ising developed a statistical mechanics model for magnetism in the 1920s. Ising imagined magnetism as the collective behavior of atomic spins interacting with their neighbors.

In Ising’s model, there are higher and lower energy states for the system, and the material is more likely to exist in the lowest energy state.

One key idea in statistical mechanics is the Boltzmann distribution, which quantifies how likely a given state is. This distribution describes the probability of a system being in a particular state – like solid, liquid or gas – based on its energy and temperature.

Ising exactly predicted the phase transition of a magnet using the Boltzmann distribution. He figured out the temperature at which the material changed from being magnetic to nonmagnetic.

Phase changes happen at predictable temperatures. Ice melts to water at a specific temperature because the Boltzmann distribution predicts that when it gets warm, the water molecules are more likely to take on a disordered – or liquid – state.

Statistical mechanics tells researchers about the properties of a larger system, and how individual objects in that system act collectively.

In materials, atoms arrange themselves into specific crystal structures that use the lowest amount of energy. When it’s cold, water molecules freeze into ice crystals with low energy states.

Similarly, in biology, proteins fold into low energy shapes, which allow them to function as specific antibodies – like a lock and key – targeting a virus.

Neural networks and statistical mechanics

Fundamentally, all neural networks work on a similar principle – to minimize energy. Neural networks use this principle to solve computing problems.

For example, imagine an image made up of pixels where you only can see a part of the picture. Some pixels are visible, while the rest are hidden. To determine what the image is, you consider all possible ways the hidden pixels could fit together with the visible pieces. From there, you would choose from among what statistical mechanics would say are the most likely states out of all the possible options.

A diagram showing statistical mechanics on the left, with a graph showing three atomic structures, with the one at the lowest energy labeled the most stable. On the right is labeled neural networks, showing two photos of trees, one only half visible.
In statistical mechanics, researchers try to find the most stable physical structure of a material. Neural networks use the same principle to solve complex computing problems. Veera Sundararaghavan

Hopfield and Hinton developed a theory for neural networks based on the idea of statistical mechanics. Just like Ising before them, who modeled the collective interaction of atomic spins to solve the photo problem with a neural network, Hopfield and Hinton imagined collective interactions of pixels. They represented these pixels as neurons.

Just as in statistical physics, the energy of an image refers to how likely a particular configuration of pixels is. A Hopfield network would solve this problem by finding the lowest energy arrangements of hidden pixels.

However, unlike in statistical mechanics – where the energy is determined by known atomic interactions – neural networks learn these energies from data.

Hinton popularized the development of a technique called backpropagation. This technique helps the model figure out the interaction energies between these neurons, and this algorithm underpins much of modern AI learning.

The Boltzmann machine

Building upon Hopfield’s work, Hinton imagined another neural network, called the Boltzmann machine. It consists of visible neurons, which we can observe, and hidden neurons, which help the network learn complex patterns.

In a Boltzmann machine, you can determine the probability that the picture looks a certain way. To figure out this probability, you can sum up all the possible states the hidden pixels could be in. This gives you the total probability of the visible pixels being in a specific arrangement.

My group has worked on implementing Boltzmann machines in quantum computers for generative learning.

In generative learning, the network learns to generate new data samples that resemble the data the researchers fed the network to train it. For example, it might generate new images of handwritten numbers after being trained on similar images. The network can generate these by sampling from the learned probability distribution.

Generative learning underpins modern AI – it’s what allows the generation of AI art, videos and text.

Hopfield and Hinton have significantly influenced AI research by leveraging tools from statistical physics. Their work draws parallels between how nature determines the physical states of a material and how neural networks predict the likelihood of solutions to complex computer science problems.

Veera Sundararaghavan receives external funding for research unrelated to the content of this article.

Read These Next

Recommended for You