Statistics in the Eye of the Storm

October 28, 2020

in Bhogle the Mind

As I wake up rather early on a Saturday morning in Bangalore, India, the skies are overcast. Could we be heading for another wet weekend?

It is that time of the year when the Bay of Bengal in the east tends to be excessively turbulent. There’s always a depression of some sort, and often these depressions go on to become violent cyclones.

In fact, we had a particularly evil cyclone—Phailin—two weeks ago. Phailin hit India’s eastern coast on October 12, and it was as bad as they get, with wind speeds of over 200 km per hour. Thankfully, and to India’s great collective relief, it wasn’t the devastating killer it was feared to be—a cyclone of similar intensity in 1999 had killed tens of thousands.

So what’s changed? The first change is that communication systems are now vastly improved. Everyone seems to have a phone and TV in India, so weather alerts and updates are now easier to transmit.

But there’s also a second big change: India’s weather prediction machinery, too, has improved significantly in the past decade.

Weather prediction is one of the hardest problems in analytics, for a variety of reasons. First, this is a global phenomenon; everyone has read how the fluttering of a butterfly’s wings in Chile can cause a massive storm in Siberia. Second, the underlying equations are complex and take frightfully long to solve at the desired resolution, which is why parallel processing is essential—as Richardson so evocatively demonstrated with his idea of a forecast factory. Third, the prediction model is highly nonlinear and, over time, the ‘order’ in the prediction quickly degenerates to ‘chaos’. So predicting what’s likely to happen in the next five days is much harder than predicting what’s going to happen after five hours.
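That degeneration from order to chaos can be seen in a toy illustration (my own, not anything from India’s forecasting machinery): the Lorenz system, a classic simplified ‘weather’ model. Two runs that start almost identically stay together at first, then diverge completely—the butterfly effect in miniature.

```python
def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz equations by one explicit Euler time step."""
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

def divergence(x0_a, x0_b, steps):
    """Largest gap in x between two runs differing only in the initial x."""
    a, b = (x0_a, 1.0, 1.0), (x0_b, 1.0, 1.0)
    worst = 0.0
    for _ in range(steps):
        a, b = lorenz_step(*a), lorenz_step(*b)
        worst = max(worst, abs(a[0] - b[0]))
    return worst

# Start two 'atmospheres' whose initial states differ by one millionth.
short = divergence(1.0, 1.0 + 1e-6, 200)    # short range: gap is still tiny
long_ = divergence(1.0, 1.0 + 1e-6, 3000)   # long range: the gap is enormous
```

The five-hour versus five-day contrast in the paragraph above is exactly this: over a short horizon the initial measurement error barely grows, while over a long horizon it swamps the forecast entirely.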


So, how do we go about weather prediction? In layman’s terms, we kick off the computation by entering all the input data, say at 6 a.m., and asking the model to start ‘time-marching’ forward. So at 6:05 a.m. the model might obtain its prediction for 7 a.m., and it proceeds in this vein. [This of course needs good number-crunching capability; if the 7 a.m. prediction were to show up only at 7:15 a.m., the forecast would be useless.]
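A minimal sketch of this time-marching idea, with a toy one-dimensional diffusion model standing in for the real atmospheric equations (the station readings below are made up for illustration):

```python
def step(temps, alpha=0.1):
    """Advance the temperature field by one time step (explicit scheme,
    periodic boundaries): each cell relaxes toward its neighbours."""
    n = len(temps)
    return [temps[i] + alpha * (temps[(i - 1) % n] - 2 * temps[i] + temps[(i + 1) % n])
            for i in range(n)]

def forecast(initial, n_steps):
    """Time-march the state forward from the initial observations."""
    state = initial
    for _ in range(n_steps):   # e.g. 6:00 -> 6:05 -> ... -> 7:00
        state = step(state)
    return state

observations_6am = [30.0, 31.5, 29.0, 28.0, 32.0, 30.5]  # invented station data
prediction_7am = forecast(observations_6am, 12)          # twelve 5-minute steps
```

The real models march three-dimensional fields of pressure, humidity and wind rather than a handful of temperatures, but the loop structure—ingest observations, then step repeatedly until forecast time—is the same.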

To avoid such ridiculous situations, we need parallel processing, which means partitioning the globe—say, by latitude. So, if we have four parallel processors, we could launch our 6 a.m. run by asking processor 1 to handle the North Pole to the Tropic of Cancer, processor 2 the Tropic of Cancer to the Equator, and so on.
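As a simplified sketch (using equal-width bands rather than the tropics, for illustration), the partitioning might look like this:

```python
def latitude_bands(n_processors):
    """Split the globe into equal latitude bands, one (south, north)
    pair of limits in degrees for each processor."""
    width = 180 / n_processors
    return [(-90 + i * width, -90 + (i + 1) * width)
            for i in range(n_processors)]

bands = latitude_bands(4)
# -> [(-90.0, -45.0), (-45.0, 0.0), (0.0, 45.0), (45.0, 90.0)]
```

Each processor then time-marches only the grid points inside its own band.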

This approach seems simple enough, but therein lies the rub! Because weather is a global phenomenon, the parallel processors have to continually compare notes “at the border” and rejig values before starting the next time march. This complicates things enormously because we have both a processing and communication problem and we must therefore also manage the necessary synchronization. It is also easy to see that the complexity will grow enormously if we have 16 or 256 parallel processors instead of just four.
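This ‘comparing notes at the border’ is often implemented as a halo (or ghost-cell) exchange. A toy sketch of the idea, with each worker owning one slice of a periodic grid—the slicing and cell values here are invented for illustration:

```python
def exchange_halos(chunks):
    """Before each time step, give every chunk copies of its neighbours'
    edge cells (periodic, since the globe wraps around), so the border
    computation sees up-to-date values from the adjacent processor."""
    n = len(chunks)
    padded = []
    for i, chunk in enumerate(chunks):
        left = chunks[(i - 1) % n][-1]   # last cell of the left neighbour
        right = chunks[(i + 1) % n][0]   # first cell of the right neighbour
        padded.append([left] + chunk + [right])
    return padded

chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # three workers, three cells each
padded = exchange_halos(chunks)
# padded[0] -> [9, 1, 2, 3, 4]
```

In a real system each worker runs on its own processor and the exchange is a message-passing step, which is where the synchronization burden comes from: no worker can begin its next time march until all its neighbours have sent their borders.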

If all this isn’t bad enough, we must further grapple with the nonlinearity of the prediction model. Over time we reach the stage where the ‘noise’ in the model completely dominates the ‘signal’. We are then groping in the dark unless we strengthen the signal in some way. One way to bolster the signal is to couple a ‘regional’ model to the ‘global’ model, although this coupling comes with its own formidable set of challenges.

Why then are our predictions getting better? First, much more data is now available via a variety of devices, and much more quickly. In particular, this has made local predictions almost completely reliable. After all, if you wire up your patient to every diagnostic device and monitor his readings every minute, you know almost everything you need to know about the state of his health. Second, all this data enables us to come up with smarter heuristic models, especially to predict the likely trajectory of a cyclone and pinpoint the location where the storm is likely to strike. And, third, better visualization is helping us spot trends faster, and better communication is enabling us to transmit alerts faster.

As I end this post, I see that the bleak morning has metamorphosed into a sunny and glorious early winter day. We may not have a wet weekend after all, and there may not be another fearsome cyclone threatening the east coast of India. But it is reassuring that we are now better equipped to cope with the challenge because we have more data, more analytics and better models. To me there can’t be a more stirring reaffirmation of the need to celebrate this International Year of Statistics.