Artificial intelligence might be top of mind for everyone, personally and professionally, right now, but it’s far from a new phenomenon.
The new crop of generative AI tools, and the transformer architecture behind them, are built on decades’ worth of domain-specific research and development. For every person confronting deep questions about the nature of intelligence today, there have been ethicists and philosophers grappling with them for decades.
And every enterprise wrestling with decisions around what can be automated, what really constitutes creativity, what’s happening with copyright, and a whole lot more, is joining a long-running conga line of companies that have been weighing up the balance of human / machine labour.
So, ahead of The Interline’s first deep-dive report on AI this spring, I dipped into the archives to pull out some AI research I wrote (as part of what was then called The WhichPLM Annual Review) in 2017 – a full seven years ago. I wanted to see what I thought, what we predicted, what the experts I interviewed said, where those experts are today… and how much of it stands up to scrutiny the best part of a decade later.
First up: a long-form piece I researched, conducted interviews for, and wrote, which asks what we’re really talking about when we talk about intelligence.
I’ll be adding my thoughts and annotating as we go, to see what I got right in 2017, what I got wrong, and where circumstances might have completely changed.
Ask someone to name a science fiction concept, and the odds are good that they will come back with either domestic and industrial robots that are indistinguishable from humans, or a benevolent (or malevolent) supercomputer that comes to rule the world. These themes have recurred in future-set tales for more than a hundred years. They are popular because, as with all good speculative storytelling, they ask us to consider what our place, as thinking, breathing humans, is likely to be in a world where either our physical or mental capacities have been outmatched by machines.
The upshot of having such a body of sci-fi dedicated to these subjects is that some of the smartest people in the world have spent a great deal of time considering their implications. Some putting pen to paper; others toiling away in laboratories and research centres to bring the ideas to life; each spurring the other on with fresh ideas.
Science fiction has cast a very long shadow over the tech conversation over the last few years in particular. In 2022 you couldn’t walk a virtual block without someone shouting at you about Neal Stephenson’s Snow Crash as the proto-manual for the metaverse. What’s interesting right now, though, is the distance between how otherworldly the idea of AI seems in fiction, and how mundane it’s starting to feel in reality.
By which I don’t mean that AI is boring, but that its effects are creeping up on us so incrementally – as part of the tools we use every day – that we don’t tend to notice the impact until it’s already happened. Which is precisely why people talk about it being too late to put the genie back in the bottle where generative image and video models are concerned; by the time you have a consumer-facing application with millions of users, it’s too late to ask whether that app should have been made in the first place.
Given how well the workings of our bodies are understood, compared to the almost complete opacity that shrouds the way we think, you might expect science to have made the most progress on those physical robots. In practice, the opposite is true.
There is a name for this subversion of expectations: Moravec’s Paradox. Outlined by roboticist and futurist Hans Moravec in the 1980s, the paradox is expressed like this: “it is comparatively easy to make computers exhibit adult-level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility.”
In brief, what Moravec meant is that our mental processes – the products of a consciousness, individuality, and autonomy that generations of scientists and faith scholars have been unable to adequately define – have proved far easier to replicate than the coordinated motions of our bones, tendons and muscles, which entry-level biology has long since explained.
Photograph – Justin Sullivan/Getty Images
The mobility side of Moravec’s paradox is still a challenge. Even the best contemporary attempts at having robots mimic human face and body movements come off as unsettling, and companies have, by and large, paused – if not abandoned – their attempts to make robots that look and move the way we do. While people might make good models for thinking and problem-solving, it transpires that there are far more practical forms to follow when it comes to picking things up and moving them, navigating through warzones, or conducting fine-grained assembly tasks.
This look at robotics would have been largely accurate until very recently. For the most part, the last seven years have been characterised by a lot of progress in non-humanoid robotics (Boston Dynamics’ Spot being the easiest example) and some very halting steps in the direction of mimicking humans in robotic form. But the race towards robots that look like us – and that are powered by dedicated AI chips and models – is seemingly back on, with NVIDIA just this week showcasing consolidated “superchips” and AI models custom-designed to control and govern humanoid robots.
We have, however, made considerable progress on having machines pass common (and even complex) intelligence tests, as well as on introducing techniques that allow for computers to perceive and recognise the world around them. These advances all fall under the broader umbrella of Artificial Intelligence (AI). And that’s an umbrella you have more than likely seen pop up in the news a lot lately.
“Japanese company replaces office workers with artificial intelligence,” reads a typical AI headline from The Guardian at the beginning of 2017. In isolation, it may sound alarmist or unbelievable, but it is accurate: Fukoku Mutual Life Insurance in Japan made 30 experienced staff redundant because their jobs, calculating policy payouts, could be done cheaper and better by an AI. Fukoku expects this to increase productivity by 30%, and to become a profitable investment in less than two years.
Maybe it’s because automation and the resulting mass layoffs are happening on a much grander and more pressing scale in manufacturing, but I find it remarkable how unremarkable these kinds of stories have become. They hit the headlines and fade quickly, inspiring fear in other insurance workers, perhaps, but not registering on the tectonic scale you might expect given that AI programs are now putting even white-collar workers out of work. To put this into context, Moravec’s Paradox holds that it is easier to match an adult at intelligence tests than to match a one-year-old at perception and mobility; I have a baby in the house as I write this, and I’m not expecting him to be performing any insurance calculations or underwriting policies for a while yet. Seems as though we’ve moved rather rapidly from crawl to walk to run there, doesn’t it?
Obviously I no longer have a baby in the house (he’ll be seven this summer!) but the rest of this holds very true. From a cultural perspective, we’ve undergone a very clear case of “boiling the frog” – becoming steadily inured to the idea that AI was going to replace ‘some’ jobs, but taking comfort in the idea that AI wasn’t coming for ‘our’ jobs. And while we’re not exactly seeing mass layoffs that are solely attributable to AI just yet, I think everyone would agree that the conversation has shifted into a different gear.
It’s remarkable, though, to think that it seemed logical, seven years ago, that the knowledge jobs we automated would be in the realms that computers have historically specialised in: maths, science, statistics and data analysis. Instead, those roles are in greater demand than ever, and where AI has affected those sectors, it’s done so by uncovering novel classes of potential antibiotics that now need human testing. With generative models being remarkably poor at maths, the roles we’ve wound up actually trying to automate (or at least augment) are now in the creative sectors – in music, art, video, and writing.
Perhaps that aforementioned wealth of sci-fi literature has made these kinds of things feel mundane. Or perhaps it’s because we all already interact with AI of one form or another almost every day – whether we realise it or not.
Self-driving cars are, of course, the most obvious example. Hugely complex arrays of hardware sensors and interpretation and analysis algorithms, autonomous vehicles are a sci-fi trope writ large, and will soon be hitting streets near you if they haven’t already. Less obvious but no less valid as applications of AI are chatbots (that initial gateway of interaction you sometimes encounter on a company’s support or sales channels before reaching a human advisor) and visual recognition platforms like Google Photos, which uses algorithms to tag your photographs with their locations, subjects, settings, and a host of other information so you can later search for a sunset or a smile.
Each of these examples, however, has something in common. All, to my mind and the minds of many actual experts, fall short of the definition of intelligence. In fact, these and other examples like them are only really categorised as AI because, for the 99% of us who are not intelligence programmers or philosophers of mind, a catchy acronym is needed to capture the potential of a raft of different technologies.
This was a very charitable timeline for self-driving cars! Even now, we’re only moving towards level three autonomy for passenger vehicles in very narrow circumstances. Chatbots and computer vision for photo work, though? Those have definitely come a long way.
“When I’m talking to a business audience, I use AI as an umbrella term, and under that umbrella are specific applications of AI like cognitive services, machine learning, deep learning, neural networks and so on,” said ShiSh Shridhar, [who was at the time of this interview a Director of Business Development, Data, Analytics and IoT for Microsoft, and who now leads Retail at Microsoft For Startups]. “For a business audience, that’s a confusing set of terms, and to complicate things even further, a lot of people use them interchangeably. As a result, I stick to the top-level term rather than having a brand or retailer worrying about the small nuances that separate individual components like neural networks and deep learning. I realise it’s a big umbrella, but it’s an appropriate one.”
While ShiSh’s advice is sound, we are, however, going to have to talk about these components individually at least to begin with, as foundations for the visible outcomes of AI. It is only by understanding the essential building blocks of machine intelligence that we can assess where and why it exceeds, equals, or falls short of human intelligence – and why, in many cases, a machine not being truly smart does not necessarily matter in the open market.
This is a very timely talking point, with a lot of predictions from AI figureheads as to when artificial general intelligence (AGI) will be realised. In practice, how will we distinguish that from the systems we have today? As we’ll see next, though, I think calling the current class of generative AI models “machine learning” is doing them a disservice, since the companies behind them are actively trying to build general intelligence.
“I think machine learning is the preferable term for what we’re talking about today,” said Julian McAuley [who was an Assistant Professor at the University of California San Diego in 2017, and who was elevated to a full Professor there in 2021], whose research into behavioural models and visual recognition caught my attention as I was gathering material for these features. “The research I do would probably be called AI by the public, and even by our department here at the University, but I think that term is overused. We are not specifically trying to build up a general intelligence; we’re taking a data-driven approach to solving a specific problem – nothing more. When people shop online and are shown product recommendations or adverts that they feel are uncanny, they tend to assume that an AI recommendation system is truly intelligent and knows something intimate about them. The truth is less exciting: all the typical algorithm needs to know to do its job is that one person performed an action that was similar to another person’s action, like buying a product, and it can recommend them both similar other products as a consequence.”
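To make that “less exciting truth” concrete, here is a minimal sketch of the kind of item-to-item logic McAuley is describing. It is not his research code, nor any retailer’s production system; the purchase data and function names are invented purely to show how little a recommender needs to “know” about you.

```python
from collections import defaultdict
from itertools import combinations

# Toy purchase history: who bought what. Entirely invented data.
purchases = {
    "alice": {"running shoes", "water bottle", "gym shorts"},
    "bob":   {"running shoes", "water bottle", "headphones"},
    "carol": {"gym shorts", "yoga mat"},
}

# Count how often each pair of products turns up in the same person's basket.
co_occurrence = defaultdict(int)
for basket in purchases.values():
    for a, b in combinations(sorted(basket), 2):
        co_occurrence[(a, b)] += 1
        co_occurrence[(b, a)] += 1

def recommend(user, top_n=3):
    """Suggest items that frequently co-occur with things the user already owns."""
    owned = purchases[user]
    scores = defaultdict(int)
    for item in owned:
        for (a, b), count in co_occurrence.items():
            if a == item and b not in owned:
                scores[b] += count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(recommend("alice"))  # ['headphones', 'yoga mat']
```

Nothing in that loop understands what running shoes are for; it simply counts overlaps between baskets, which is usually all an “uncanny” recommendation amounts to.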
McAuley’s recommender, like the others I referred to before – photo categorisation, self-driving vehicles, and chatbots – is a good example of the state of commercial AI today. All of these systems seem intelligent at first glance, but their smarts are actually confined to very small, very specific areas. To choose the right pre-canned responses, a chatbot need only parse the customer’s immediate request. To recognise a mountain range in a photograph, Google Photos will look for indicators that a mountain is present, but it cannot be said to actually know what a mountain is, and neither does it need this additional context to achieve its purpose. Generally intelligent these things are not.
It would have been difficult to predict, in 2017, the course that commercial AI would take. A huge amount of effort and capital continues to be invested in domain-specific, hyper-focused, narrow AI models, but contrary to expectations the AI systems that people interact with the most today, in 2024, are broadly very capable at the expense of depth and accuracy. This is why, largely speaking, fashion has focused on making use of off-the-shelf, industry-agnostic tools in the last 18 months, and is only now turning to more industry-specific applications.
Let us shed a little more light on why this is the case, and why the distinction matters. To recognise that mountain, Google Photos (or an equivalent, although Google leads the pack in this area) uses neural networks: algorithms arranged in a rough approximation of the way we believe our brains work, designed to allow machines to look at things, and then to remember and learn from them. These are arranged in a way that permits what is called deep learning, which means that the result the end user receives is built up from the sequential outputs of several – or hundreds – of hyper-focused layers, each feeding into the next.
One such network layer might detect naturally-occurring edges and nothing else; one might look for the texture of a rock face; another might scan the top portion of every image for sky, and the portions below that for peaks and snow. At the top level, yet another neural network will take the composite output of the nets below it and conclude, with 90% certainty, that there is a mountain in the photograph at hand. If the algorithm has a library of other mountain images to compare this one to, perhaps it will also go away and try to match its peaks and valleys to other photographs, before inferring that this picture was taken in Yosemite.
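Google’s production pipeline is proprietary and vastly more sophisticated, but a minimal sketch in PyTorch shows the shape of the idea: a stack of layers that, once trained, respond to progressively more abstract features, ending in a score for each candidate label. The layer sizes and the three-class setup below are illustrative assumptions on my part, not a description of Google Photos.

```python
import torch
import torch.nn as nn

# A deliberately small convolutional network: each stage roughly corresponds to the
# "layers" described above - early ones respond to edges and textures, later ones to
# larger structures, and the final layer turns all of that into class scores.
class TinyClassifier(nn.Module):
    def __init__(self, num_classes=3):  # e.g. mountain / beach / city (illustrative)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # edge-like detectors
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),  # texture-like detectors
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, num_classes),  # combine everything into class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = TinyClassifier()
image = torch.rand(1, 3, 64, 64)            # a stand-in for a 64x64 photo
probs = torch.softmax(model(image), dim=1)  # one probability per candidate label
print(probs)
```

Only after training on a large library of labelled photographs would those scores come to mean “mountain, with 90% certainty” rather than noise.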
Make no mistake, this is an exciting, road-tested application of a class of technologies with massive potential to change the way we live our lives – particularly when we realise that its use need not be confined to static photographs. Recently, Microsoft unveiled a research project it calls “Seeing AI,” which uses the same principles as other photo recognition software, but applies them to both pre-recorded and real-time moving video as well as still images. In use, a deep learning solution evaluates the real world through a smartphone camera, and turns this hugely complex data feed into synthesised voice narration designed to help blind and partially-sighted people to navigate their environment, recognise people, read mail, and discover products.
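Microsoft hasn’t published Seeing AI’s internals, so the snippet below is only a hedged sketch of the general camera-to-narration loop the article describes: sample a frame, ask a captioning model to describe it, and hand the text to a speech synthesiser. The describe_frame and speak functions are hypothetical stand-ins rather than Microsoft APIs; only the OpenCV capture calls are real.

```python
import cv2  # OpenCV, used here only for camera capture

def describe_frame(frame) -> str:
    """Hypothetical stand-in for an image-captioning model.
    A real system would return something like 'a person holding a red mug'."""
    height, width = frame.shape[:2]
    return f"a {width}x{height} scene (captioning model goes here)"

def speak(text: str) -> None:
    """Hypothetical stand-in for a text-to-speech engine."""
    print(f"[narration] {text}")

def narrate_camera(every_n_frames: int = 30) -> None:
    """Continuously sample the default camera and narrate what the model 'sees'."""
    camera = cv2.VideoCapture(0)
    frame_count = 0
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                break
            frame_count += 1
            if frame_count % every_n_frames == 0:  # narrate periodically, not per frame
                speak(describe_frame(frame))
    finally:
        camera.release()

if __name__ == "__main__":
    narrate_camera()
```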
Let me be unambiguous about this: I find the Seeing AI application incredibly impressive. I am only 35 years old, but this is one of several occasions where I have been reminded of just how much has been achieved in my lifetime. Using a handheld computer more powerful than the mainframe that used to occupy almost the entirety of my father’s business premises, and a camera sensor the size of a fingernail, a partially-sighted person can now have an AI in the cloud on the other side of the planet describe the world to them in real-time.
But while machine learning systems like these are terrific at the tasks they are assigned – looking for recognisable things in pictures or moving video – a significant amount of work has gone into achieving this level of recognition. As the name suggests, machine learning systems are not created with intelligence and understanding built in, but rather acquire their capabilities through a process loosely analogous to the way human beings learn: by being exposed to examples.
There’s a lot to unpick here, making this a good place to end part one of this look-back – even if it does land us in the middle of what was originally a full story. There are early hints in the “Seeing AI” use case, which felt pioneering at the time, of what would go on to become the “multi-modality” of the new crop of large language models, and even the seeds of AI agents and agentic devices like the Rabbit R1, Humane AI Pin and so on. And it goes without saying that understanding and labelling images is a very different proposition to generating them.
In the next instalment, we’ll see how I – and all the experts I interviewed – focused heavily on the mechanics of machine learning and training data, without really considering the legality of it. And as we now know, this has quickly become one of the most pressing issues facing AI.
