Hurricane Lee wasn’t bothering anyone in early September, churning far out at sea somewhere between Africa and North America. A wall of high pressure stood in its westward path, poised to deflect the storm away from Florida and in a grand arc northeast. Heading where, exactly? It was 10 days out from the earliest possible landfall—eons in weather forecasting—but meteorologists at the European Centre for Medium-Range Weather Forecasts, or ECMWF, were watching closely. The tiniest uncertainties could make the difference between a rainy day in Scotland and serious trouble for the US Northeast.
Typically, weather forecasters would rely on models of atmospheric physics to make that call. This time, they had another tool: a new generation of AI-based weather models developed by chipmaker Nvidia, Chinese tech giant Huawei, and Google’s AI unit DeepMind. For Lee, the three tech-company models predicted a path that would strike somewhere between Rhode Island and Nova Scotia—forecasts that generally agreed with the official, physics-based outlook. Land-ho, somewhere. The devil, of course, was in the details.
Weather forecasters describe the arrival of AI models with language that seems out of place in their forward-looking profession: “Sudden.” “Unexpected.” “It seemed to just come out of nowhere,” says Mark DeMaria, an atmospheric scientist at Colorado State University who recently retired from leading a division of the US National Hurricane Center. When he started a project this year with the US National Oceanic and Atmospheric Administration to validate Nvidia’s FourCastNet model against real-time storm data, he was a “skeptic” of the new models, he says. “I thought there was no chance that it could work.”
DeMaria has since changed his stance. In the end, Hurricane Lee struck land on the edge of the range of the AI predictions, reaching Nova Scotia on September 16. Even in an active storm season—just over halfway through, there have been 16 named Atlantic storms—it’s too early to make any final judgments. But so far the performance of AI models has been comparable to conventional models, sometimes better on tropical storm tracking. And the AI models do it fast, spitting out predictions on laptops within minutes, while traditional forecasts take hours of supercomputing time.
Conventional weather models are made up of equations describing the complex dynamics of Earth’s atmosphere. Feed in real-time observations of factors like temperature, wind, and humidity and you receive back predictions of what will happen next. Over the decades, they have gotten more accurate as scientists improve their understanding of atmospheric physics and the data they gather grows more voluminous.
Fundamentally, meteorologists are trying to tame the physics of chaos. In the 1960s, meteorologist and mathematician Edward Lorenz laid the foundations of chaos theory by noticing that small uncertainties in weather data could result in wildly different forecasts—like the proverbial butterfly whose wing flap causes a tornado. He estimated that the state of the atmosphere can be predicted at most two weeks ahead. Anyone who has watched the approach of a distant hurricane or studied the weekly outlook ahead of an outdoor wedding knows that forecasting still falls far short of that theoretical limit.
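The sensitivity Lorenz noticed can be reproduced with a few lines of code. The sketch below is an illustration, not anything the forecasters quoted here use: it integrates the three-variable system Lorenz published in 1963, with his classic parameter values, from two starting points that differ by one part in a hundred million, and records how far apart the two runs drift. The function names, step size, and starting point are assumptions chosen for the example.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations with the classic parameters."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one time step with the classical 4th-order Runge-Kutta method."""

    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_over_time(perturbation=1e-8, dt=0.01, steps=4000, report_every=500):
    """Run two nearly identical starts side by side and record their distance.

    The starting point (1, 1, 1) and the default perturbation are arbitrary
    choices for illustration, not values taken from any forecasting system.
    """
    run_a = (1.0, 1.0, 1.0)
    run_b = (1.0 + perturbation, 1.0, 1.0)
    history = []
    for step in range(1, steps + 1):
        run_a = rk4_step(run_a, dt)
        run_b = rk4_step(run_b, dt)
        if step % report_every == 0:
            distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(run_a, run_b)))
            history.append((step * dt, distance))
    return history


if __name__ == "__main__":
    for time, distance in separation_over_time():
        print(f"t = {time:5.1f}   separation = {distance:.3e}")
```

Run as a script, it prints the gap between the two runs at regular intervals. The gap starts far below anything an instrument could measure and grows until the two runs bear no resemblance to each other, which is the toy version of a forecast losing its grip on reality.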
Some hope that AI can eventually push predictions closer to that limit. The new weather models don’t have any physics built in. They work in a way similar to the text-generation technology at the heart of ChatGPT. In that case, the machine-learning algorithms are not told rules of grammar or syntax, but they become able to mimic them after digesting enough data to learn patterns of usage. Similarly, the new weather forecasting models learn the patterns from decades of physical atmospheric data collected in an ECMWF data set called ERA5.
This was not guaranteed to work, says Matthew Chantry, machine-learning coordinator at the ECMWF, who is spending this storm season evaluating the models’ performance. The algorithms underpinning ChatGPT were trained with trillions of words, largely scraped from the internet, but there’s no sample so comprehensive for Earth’s atmosphere. Hurricanes in particular make up a tiny fraction of the available training data. That the predicted storm tracks for Lee and others have been so good means that the algorithms picked up some fundamentals of atmospheric physics.
That process comes with drawbacks. Because machine-learning algorithms latch onto the most common patterns, they tend to downplay the intensity of outliers like extreme heat waves or tropical storms, Chantry says. And there are gaps in what these models can predict. They aren’t designed to estimate rainfall, for example, which unfolds at a finer resolution than the global weather data used to train them.
Shakir Mohamed, a research director at DeepMind, says that rain and extreme events—the weather events people are arguably most interested in—represent the “most challenging cases” for AI weather models. There are other methods of predicting precipitation, including a localized radar-based approach developed by DeepMind known as NowCasting, but integrating the two is challenging. More fine-grained data, expected in the next version of the ECMWF data set used to train forecasting models, may help AI models start predicting rain. Researchers are also exploring how to tweak the models to be more willing to predict out-of-the-ordinary events.
One comparison that AI models win hands down is efficiency. Meteorologists and disaster management officials increasingly want what are known as probabilistic forecasts of events like hurricanes—a rundown of a range of possible scenarios and how likely they are to occur. So forecasters produce ensemble models that plot different outcomes. In the case of tropical systems they’re known as spaghetti models, because they show skeins of multiple possible storm tracks. But calculating each additional noodle can take hours.
AI models, by contrast, can produce multiple projections in minutes. “If you have a model that's already trained, our FourCastNet model runs in 40 seconds on a junky old graphics card,” says DeMaria. “So you could do like a whole gigantic ensemble that would not be feasible with physically based models.”
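To see why a cheap model makes big ensembles practical, consider a minimal sketch. It does not use FourCastNet or any real weather model: `toy_forecast` is a hypothetical stand-in (a simple chaotic formula called the logistic map) for any model that is fast to run, and the observation value, error size, and threshold are invented for illustration. The idea is to nudge the starting observation many times within its measurement error, run the model once per nudge, and report the share of runs that cross a threshold as a probability.

```python
import random


def toy_forecast(x0, steps=30, r=3.9):
    """Hypothetical stand-in for a fast, already-trained model.

    This is the logistic map, a simple chaotic formula, NOT a weather model.
    It maps a starting value in [0, 1] to a "forecast" value in [0, 1].
    """
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x


def ensemble_probability(observed, obs_error, members, threshold, seed=0):
    """Estimate the chance the forecast exceeds `threshold`.

    Each ensemble member starts from the observation plus Gaussian noise that
    represents measurement uncertainty. Only the starting point is perturbed;
    the model itself is the same for every member.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    outcomes = []
    for _ in range(members):
        start = rng.gauss(observed, obs_error)
        start = min(max(start, 0.0), 1.0)  # keep the toy model's input in range
        outcomes.append(toy_forecast(start))
    hits = sum(1 for value in outcomes if value > threshold)
    return hits / members, outcomes


if __name__ == "__main__":
    probability, outcomes = ensemble_probability(
        observed=0.4, obs_error=0.01, members=500, threshold=0.5
    )
    print(f"{len(outcomes)} members, estimated probability: {probability:.2f}")
```

Each member here costs a handful of multiplications, so 500 of them finish instantly. The same loop wrapped around a physics-based model that needs hours per run is where the cost becomes prohibitive.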
Unfortunately, true ensemble forecasts lay out two forms of uncertainty: both in the initial weather observations and in the model itself. AI systems can’t do the latter. This weakness springs from the “black box” problem common to many machine-learning systems. When you’re trying to predict the weather, knowing how much to doubt your model is crucial. Lingxi Xie, a senior AI researcher at Huawei, says adding explanations to AI forecasts is the number one request from meteorologists. “We cannot provide a satisfying answer,” he says.
Despite those limitations, Xie and others are hopeful AI models can make accurate forecasts more widely available. But the prospect of putting AI-powered meteorology in the hands of anyone is still a ways off, he says. It takes good weather observations to make predictions of any kind—from satellites, buoys, planes, sensors—funneled through the likes of NOAA and the ECMWF, which process the data into machine-readable data sets. AI researchers, startups, and nations with limited data-gathering capacity are hungry to see what they can do with that raw data, but sensitivities abound, including intellectual property and national security.
Those large forecasting centers are expected to continue testing the models before the “experimental” labels are removed. Meteorologists are inherently conservative, DeMaria says, given the lives and property on the line, and physics-based models aren't about to disappear. But he thinks that improvements mean it could only be another hurricane season or two before AI is playing some kind of role in official forecasts. “They certainly see the potential,” he says.