In 2012, three researchers from the University of Toronto submitted an image classifier to a contest and opened a much larger argument. Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton's model scored a 15.3 per cent top-five test error in the ILSVRC-2012 competition, while the second-best entry managed 26.2 per cent. The gap, set out in their original NeurIPS paper, wasn't a marginal win that needed a persuasive press release. It was a result other computer-vision researchers had to explain.

What holds my attention isn't just the winning number. The paper says the network trained for five to six days on two NVIDIA GTX 580 GPUs, each with 3GB of memory. Graphics cards were no longer merely drawing a synthetic world for a screen; here they were helping a network sort the visible world into categories. That shift now seems inevitable because it succeeded, but few changes in computing are inevitable before someone makes them run.

AlexNet didn't arrive through a single brilliant trick. It joined a deep convolutional network to a sufficiently large labelled image set and a GPU implementation quick enough to train it at useful scale. The authors also used non-saturating units (rectified linear units, or ReLUs) and dropout, choices that mattered, but the awkward physical detail remains decisive: the experiment had to fit across two graphics cards. Software theory met memory limits, heat and a week of waiting.

IEEE Spectrum's history of AlexNet gets this balance right. ImageNet, CUDA and neural networks had each been developing before the contest, without producing this particular shock on their own. Krizhevsky had already written GPU convolutional-network code and extended it for ImageNet and multiple GPUs. A dataset, an unfashionable method and hardware built for another market met at exactly the right moment.

There is a version of AI history that makes progress sound like an orderly succession of better ideas. I don't trust it. Plenty of ideas sit in papers for years because they are too expensive, too slow or too fiddly to win an ordinary comparison. Then a machine built for videogames makes a previously impractical calculation bearable, and the field suddenly discovers that its taste has changed. After a result like 15.3 against 26.2, scepticism starts to look less like rigour and more like a backlog.

It is tempting to treat this as the direct origin story of every generative model now taking up server halls and headlines. That would flatten too much: an image classifier is not a language model, and today's systems contain many later inventions. Still, AlexNet exposed a habit that remains with us. We talk about intelligence as though it lives solely in algorithms, while the winning idea often depends on which computation can be bought, powered and repeated enough times.

Two GTX 580 cards are modest hardware by current standards. Their place in this story is useful precisely because they look modest now. A research field can pivot on a benchmark result, but it can also pivot on an engineer finding that hardware from one culture of computing is suddenly good enough to remake another.

Sources: