This article was originally published in The Interline’s AI Report 2024. To read other opinion pieces, exclusive editorials, and detailed profiles and interviews with key vendors, download the full AI Report 2024 completely free of charge and ungated.


Key Takeaways:

  • After the initial explosion of apps and services built on top of transformer-based text and image models, the first wave of pure-AI solutions is facing commercial realities, and the balancing act between margin and market fit is becoming harder to strike.
  • There are viable AI-native and AI-supported solutions today, but it remains unclear how far the current architectures will scale, what properties might emerge – and whether a different approach might eventually be needed.
  • Despite a huge influx of investment, ongoing overheads and uncertainty are already starting to separate the AI applications that have a pathway to the future from the more experimental and potentially short-lived services.

In late 2023, about a year after the initial launch of ChatGPT, investors and analysts started to draw a line between generative AI as a novelty and AI as a serious business proposition. And that process of demarcation was not particularly kind to the first crop of solutions.

Writing last September, the team at Sequoia Capital – who you would expect to be more optimistic than most about the general commercial viability of AI, given that they have backed close to 70 visible AI and machine learning companies, and 24 more in “stealth mode” – had this to say:

…early signs of success don’t change the reality that a lot of AI companies simply do not have product-market fit or a sustainable competitive advantage, and that the overall ebullience of the AI ecosystem is unsustainable.

Really, though, this should be a surprise to no-one. And it should certainly surprise nobody in a role where AI-native solutions (or solutions that suddenly have new AI capabilities) are being pitched to them constantly. There are already far more AI products than the market can possibly sustain. There are more launching every week. And a shake-out that will leave many of them on the floor is inevitable.

From the perspective of a venture capital firm, though, what looks at first glance like doublethink is also pretty predictable. If you believe there’s a huge amount of money in the AI banana stand in the long run, it makes sense to place as many bets as possible. Even if you recognise that most AI plays will fail, having a stake in the one or two that really take flight will counterbalance your losses, and you, the party with the money, the connections, and the foresight, will emerge with a net positive position.

And make no mistake, the consensus is that AI is going to generate a huge amount of money, and both financial backers and tech giants are moving accordingly. At the time I’m writing this, OpenAI is building to the release of the low-latency, natively multi-modal voice mode of GPT-4o, quietly sitting on Sora, its video generation model, and potentially getting ready to unveil a partnership with Apple. Microsoft’s Build conference has recently made it clear that AI is the future of that company’s software and hardware businesses. And just last week Google flipped the switch to make its generative AI “overviews” the default mode for web search, upending the internet in the process.

As big and bold as these moves might be, I don’t think they meaningfully change the way we should be thinking about what AI is for, what it means to use it, and what it means to package and sell it.

Neither do I think that any of these new products and partnerships is going to do anything to reverse or slow the overall trajectory of AI, or reframe Sequoia’s aggregate assessment of the first salvo of generative AI applications.

So everyone still believes that AI is going to change the world, even though concrete evidence of that transformation is still in short supply, and there have been a lot of halting attempts towards it.

The line that the investors at Sequoia have drawn, then, is between “Act One” and “Act Two” of generative AI, which I think is a useful framework for understanding where we stand.

The former was characterised by novel, fun, experimental use cases for AI; the latter will be defined by deeper applications where generative models are perhaps part of a wider solution that brings in context, data, reinforcement and more from other business systems, and potentially leans on different specialised sub-models as part of what’s called a “Mixture of Experts” approach to both training and inference.
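
To make the shape of that idea a little more concrete, here is a deliberately minimal sketch of the routing step at the heart of a Mixture of Experts model. Everything in it – the number of experts, the dimensions, the toy linear “experts” themselves – is a hypothetical stand-in rather than the internals of any real production system, but it shows the basic mechanic: a small gating network scores the available sub-models for a given input, and only the top-scoring few are actually run.

```python
# A minimal, illustrative sketch of Mixture of Experts routing.
# All sizes and weights here are toy stand-ins, not any real model's internals.
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

rng = np.random.default_rng(0)
experts = [rng.normal(size=(8, 8)) for _ in range(3)]  # three toy "expert" sub-models
gate_weights = rng.normal(size=(8, 3))                 # toy gating network

def moe_forward(token_embedding, top_k=2):
    # The gating network scores every expert for this particular input...
    scores = softmax(token_embedding @ gate_weights)
    # ...but only the top-k experts are actually run (sparse inference).
    chosen = np.argsort(scores)[-top_k:]
    output = np.zeros_like(token_embedding)
    for i in chosen:
        output += scores[i] * (token_embedding @ experts[i])
    # Re-normalise by the weight of the experts we actually used.
    return output / scores[chosen].sum()

print(moe_forward(rng.normal(size=8)))
```

That sparsity is a large part of the appeal: total capacity can keep growing without every single query paying the compute cost of every parameter.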

Now, I don’t believe that these analysts – or anyone else for that matter – are expecting the move towards these deeper applications of AI to make the lighter, more novel and innovative use cases go away. If you page forward in this document to the technology vendor profiles and executive interviews that make up the next section, you’ll see that both coexist now and will continue to do so – even if the next step in AI’s journey is set to be more serious than the one that went before.

And without wishing to steal a march on our first AI market analysis (which you’ll find at the end of this report, and which is focused on interpreting the information we found and were provided with this year), I want to look at what I see as the two tracks that will need to be navigated as part of that journey: one focused on commercial and market forces, and the other on architecture and capability.

Both will be winding, uncertain roads, so join me as I do some divination – thinking about where AI goes from here, and which commercial and technical barriers might make act two a harder sell than the first.

Commercial and market track

We’ve written plenty, here and elsewhere, about how AI quickly went from being primarily a scientific, research, and academic concern to exploding as a consumer and enterprise software category. That change from narrow, domain-specific models that can outcompete human capability in very focused areas, to the big, unwieldy, inscrutable general purpose models of today is the defining AI story of our time.

But implicit in that transition from AI that plays Go and AI that predicts the three-dimensional structure of proteins, to AI that can sing, write poems, code, and make art is another shift: from judging AI purely on academic terms, to judging it by the same criteria as other personal and professional software.

Beyond research orgs and peer-reviewed publications, price, retention, acquisition, active user counts, IPO potential and more – these are the forces that are now steering the direction of AI.

Whatever your feelings about the rise of VCs as the controlling force in technology, there’s little doubt that the relatively closed Silicon Valley tech investment circle has heavily influenced technology development and adoption – and these firms are now able to set the rules for what constitutes success.

And working from that rulebook, the existing batch of generative AI applications have set relatively poor benchmarks.

Looping back to Sequoia’s market analysis again, for example, we can see that consumer-facing AI applications do not fare well on the fabled “stickiness” metric, with one-month retention for ChatGPT mobile users (i.e. the share of total downloaders who continue to use the app a month later) sitting at 56%, compared to TikTok, Instagram, and YouTube, which range from 69% to 85%. And ChatGPT is the outlier here; the median one-month retention figure for top, AI-first companies is just over 40%.

This is not brilliant news from a product usage point of view, and it’s underlined by comparing active users between AI apps and services, and the other platforms that compete for attention on people’s smartphones. According to Sequoia again, only 14% of ChatGPT’s heavily-promoted userbase is considered active, compared to between 64% and 85% for YouTube, Instagram, and WhatsApp.

So by the Valley’s favourite metrics, generative AI tools – or at least the ones people keep on their phones – are seemingly fun to play with, but not compelling enough to come back to regularly.

Now, I think you can mount a fair argument that generative AI tools like image creators and chatbots should not really be judged this way. There are occasions, every day, for people to watch videos, send messages, and post photos, but comparatively few reasons to chat to an AI (outside of AI services that specifically offer everyday virtual companions) or have it create visuals for you day in and day out. And unlike social platforms, which prioritise engagement and advertising above all else, the ideal customer for a generative service is actually one that pays the monthly fee but doesn’t query the model all that often.

Because this is where the heaviest commercial and market concern sits: the economics of generative AI are proving hard to balance. For AI to change the world, everyone has to use it, but having everyone use it might make it cost- and resource-prohibitive to run.

Some parts of the AI cost / usage puzzle are solvable with mundane subscription licensing maths. If GPT-3, for instance, cost X to train and commercialise – taking into account talent, compute, time, and marketing – then that model needs to be run for Y months, at a calibrated monthly price, across a userbase of Z size to recoup its costs and start making money.

This is the way profitability and growth are calculated for most SaaS and subscription applications, since – beyond hardware, energy, and the development of new features – they are static deployables with predictable ongoing overheads.

This is less applicable to generative AI, since the cost of inference is both significant and spiky. Every query I make to ChatGPT or Midjourney incurs an inference cost that’s measured in cloud compute and fractions of a cent, and that cost varies depending on the efficiency of the model, the type of query made, and the different modalities it might span. If I ask ChatGPT to answer a text query and then follow up with an image generation request, or a web search (both part of the multi-modal setup that model now has), the cost of inference will be slightly different each time, making predicting and influencing user behaviour an important consideration.
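
To illustrate why that variability matters, here is a rough, back-of-the-envelope sketch of the subscription-versus-inference maths. Every figure in it – the subscription price, the training cost, the amortisation window, the per-token cost – is invented purely for illustration, not drawn from OpenAI’s or anyone else’s real numbers.

```python
# Back-of-the-envelope subscription economics with variable inference cost.
# Every number below is hypothetical and chosen only to illustrate the shape
# of the problem -- none are real prices or costs.

MONTHLY_PRICE = 20.00          # hypothetical subscription fee per user (USD)
TRAINING_COST = 100_000_000.00 # hypothetical one-off training and launch cost
AMORTISE_MONTHS = 24           # months over which that fixed cost is recouped
COST_PER_1K_TOKENS = 0.002     # hypothetical blended inference cost (USD)

def monthly_margin_per_user(queries_per_month, avg_tokens_per_query):
    """Revenue left over per subscriber after variable inference costs."""
    tokens = queries_per_month * avg_tokens_per_query
    inference_cost = (tokens / 1000) * COST_PER_1K_TOKENS
    return MONTHLY_PRICE - inference_cost

def breakeven_subscribers(queries_per_month, avg_tokens_per_query):
    """How many subscribers are needed to cover the amortised fixed cost."""
    margin = monthly_margin_per_user(queries_per_month, avg_tokens_per_query)
    fixed_per_month = TRAINING_COST / AMORTISE_MONTHS
    return float('inf') if margin <= 0 else fixed_per_month / margin

# A light user leaves a healthy margin; a heavy multi-modal user erases it.
print(monthly_margin_per_user(queries_per_month=100, avg_tokens_per_query=1_000))
print(monthly_margin_per_user(queries_per_month=5_000, avg_tokens_per_query=4_000))
print(breakeven_subscribers(queries_per_month=100, avg_tokens_per_query=1_000))
```

Run with those made-up numbers, the light user leaves a comfortable margin while the heavy user wipes it out entirely – which is exactly why usage caps exist.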

And while API calls to OpenAI’s models are quite affordable – an analysis from Andreessen Horowitz (again, hardly an unbiased actor in this whole affair) suggests margins of between 30% and 90% for OpenAI, depending on the composition of 1,000 tokens served through GPT-3 – OpenAI’s own direct, paying customers still face usage limits, because beyond those limits the cost of letting a user keep interacting with the model no longer makes sense.

This, remember, also only considers one already-deployed model. Since GPT-3 was deployed, OpenAI has had to train GPT-3.5, GPT-4, and GPT-4o (the ‘o’ stands for ‘omni’), and is now in the process of training GPT-5.

Each of the point releases of GPT-3 and GPT-4 brought with it a mixture of efficiency improvements and new capabilities, making their amortised costs difficult to parse, but according to Kevin Scott, CTO of Microsoft, the compute requirement and supercomputer infrastructure built for training GPT-5 constitute a giant “whale” compared to the “small shark” of GPT-3 – making it clear that pushing the capability window of generative AI forward is going to be a progressively more expensive task every time it needs to be done. And it needs to be done often.

Kevin Scott, chief technology officer and EVP of AI, Microsoft, on stage with Sal Khan, founder and CEO, Khan Academy, May 21 at Microsoft Build 2024 in Redmond, Washington. (Photo by Dan DeLong for Microsoft)

So in brief, the cost of making AI better could dwarf all the revenue that frontier model companies like OpenAI, Anthropic, Mistral, and Google are making from enterprise and consumer customers. Which puts this entire software category into a familiar but uncomfortable market bracket: big ideas that don’t make money today, but that have the potential to make avalanches of it in the future.

But there’s also more to consider when we look at the cost side of the generative AI balance sheet. Companies like OpenAI are, right this minute, signing content licensing deals with online publishers like Vox Media, Axel Springer, and The Atlantic, with image services like Shutterstock, and with social platforms and communities like Stack Overflow and Reddit. And the only reason OpenAI isn’t also pursuing similar deals with platforms like YouTube and Instagram is that Google and Meta, respectively, want to train their own models on that user-generated content.

Individually, these licensing and technology agreements do not represent huge costs (Reddit signed a similar deal with Google for AI training purposes for “just” $60 million USD) but they are also definitely not free – and the volume of them will rise as major publishers and online businesses recognise that their competitors are being paid for content that the giant AI companies are largely scraping from them for free anyway.

Crucially, these content licensing agreements – which I fully expect to see the first fashion brand or retailer signing in the near future – are also more of a stopgap than a solution. With large AI companies being sued by The New York Times and Getty Images, and at least one of those cases working its way to court, there is still a largely unanswered question hanging over the basic legality of generative AI. And the assumption is that the publishers, creators, and communities signing content agreements with OpenAI and others are doing so with the understanding that the precedent – if any – set in those copyright cases will either provide them with a much stronger legal footing to litigate or renegotiate in a couple of years’ time, or a good collective bargaining position to renew their deals.

In essence, then, an ongoing stream of new content is not something AI companies can draw a line under as a static cost or a one-time expense.

And there is recent analysis predicting that the market for AI training data may be worth $30 billion USD annually. This is a neat new revenue stream for platforms that can sustain the user backlash, or that have fallen out of favour with users but that still hold treasure troves of photos or text, but also yet another cost for the creators of the frontier models – and the apps and services built directly on top of them.

(This is all obviously less of a concern for the start-ups and scale-ups that are building new applications on top of those foundations, but it remains important to be aware of, since you needn’t actually run ChatGPT yourself to be affected by its fortunes.)

All of this, too, will be further influenced (directly and indirectly) by how the cultural backlash around AI develops. Our contributor Aasia D’Vaz-Sterling wrote elsewhere in this report that AI is, in its simplest form, a technological revolution that’s dragging a cultural evolution with it. And inherent in that realisation is the understanding that society is incredibly unpredictable.

We need only look at how long it has taken society to reckon with the harms and the benefits of social media to see how slowly and uncertainly etiquette, legislation, and culture develop. And I don’t think it’s hyperbolic to say that the change being ushered in by AI is potentially on another scale entirely to even the transformation brought about by user generated content and the algorithmic feed.

The timeline for sorting out what AI really means for work, relationships, creativity, fashion and other industries is going to be long – and while the world finds some equilibrium there, we will continue to hear (rightly, in my opinion) from people decrying the fact that what’s being sold in the interim owes a large amount of its value to having been trained, without permission, on essentially the aggregate creative output of the human race.

I also think it’s easy, with a business hat on, to assume that this vocal crowd will go away, and that everyone will simply decide that training AI on public data was necessary to the building of something bigger. This, in my mind, is the wrong way for fashion organisations or other businesses to be thinking: we only need to look at Twitter (now X) losing a reported half of its users to see that social factors can heavily influence the success of software and services that are otherwise unchanged in terms of capabilities.

And as Slack and the overall re-tooling of enterprise software to look and behave more like consumer software has demonstrated, the trends that shape the apps and services we use in our personal lives go on to shape the ones we’ll engage with willingly at work.

To attempt to tie a bow around a very complicated set of circumstances, then, AI is currently walking a precarious commercial and market tightrope. And no matter what new or existing models wind up being capable of over the next 12 to 24 months, ethics, legality, and pure money in versus money out are likely to be the more important considerations.

Now, do I, personally, think the AI industry is going to be able to successfully overcome all this? Probably. Individual companies are going to have extremely variable outcomes, simply because so much unavoidable uncertainty is baked into the essential fibres of the AI business, but on balance I believe this category is already simply too big to fail. Some of these market forces will, in effect, be steamrolled over because of scale.

But I also believe that if the future of AI is to be bigger and bolder, it’s also going to need to be leaner, more sensitive, and much more aware of the practicalities of delivering what the market wants.

Architectural and capability track

Near the start of this year, I found myself in a bit of a philosophical debate on LinkedIn. This is not normally my thing, but this was a very fundamental, first-principles disagreement on what constitutes intelligence.

It was sparked by a post from the OpenAI team that quietly followed in the wake of the unveiling of Sora, the still-unreleased video generation model that appears to represent a marked leap forward in the ability of generative AI systems to create believable-looking video outputs. In that post, some of OpenAI’s researchers argued that video generation with this level of fidelity – allowed to scale – could become, in their own words, a “world simulator”.

The post walks a weird line between technical jargon and over-simplification, but the basic contention is this: if a video generation model can consistently put out clips that look close enough to the real world to pass human inspection, then it could, theoretically, be used to simulate parts, or even the whole, of the real world. Because the outputs of Sora exhibit consistency with 3D perception and spatial representation as we know it, and because they can also showcase object permanence and interaction, the feeling amongst those researchers was that the model might have some emergent or inherent understanding of the world.

Despite the post being overwhelmed by the clamour surrounding Sora itself, people still jumped loudly onto this idea as part of a growing sentiment that these large models, in their own lanes, aren’t just a potential pathway to general machine intelligence – they’re evidence that it’s already developing. There were, for example, senior and experienced folks arguing that because Sora could output believable-looking liquids, and could show objects floating on those liquids with behaviour consistent with reality, the model must have an innate ability to simulate water.

This is something I disagree with. I also believe it represents a dangerous kind of magical thinking, where AI is going to become more capable and powerful just because. And this has broader implications than just video generation, since it cuts to the heart of one of the biggest ideas in AI: that the applications and the architectures we have today are a series of stepping stones that will, inevitably, take us towards a more complete and more universal kind of AI – the fabled artificial general intelligence, or AGI.

Here’s my own take on this quandary. I am not a physicist or an astronomer, but I did spend a bit of time at school learning about the solar system. If you asked me to draw you the planets, in the right order and the right orbits, I might be able to do it correctly.

Importantly: I might be able to draw you the solar system correctly more than once. Maybe I’d get it right nine times. The question is: after that ninth time, would you trust me to plot a trajectory for a space shuttle? Because, after all, I don’t really understand orbital mechanics – I’ve just seen a lot of drawings and renders of the solar system, and I’ve learned how to recreate them, or something like them. And as a consequence, on the tenth time I could get it wrong and not even know it.

This might be a glib analogy, but it says something serious about the core of how AI progresses from here. To my mind at least, being able to represent something accurately is not the same as understanding it. And we can apply the same principle to language, images, video and other modalities.

In fact, people much smarter than I am have done exactly that. Writing back in 2020, Emily Bender, of the University of Washington’s Department of Linguistics in the US, and Alexander Koller, of the Department of Language Science and Technology at Saarland University in Germany, concocted the “intelligent octopus” analogy for language models – one which has gained renewed attention, for obvious reasons, in 2023 and 2024.

In that paper, the authors compare language models to a smart octopus that has one tentacle attached to an undersea cable carrying inbound text communication to a country, and one carrying outbound. The octopus is able to detect the electrical signals moving the data (which represents words) through those cables, but having never left the ocean and, you know, being an octopus, it has no frame of reference for what those electrical signals actually mean, or if they carry meaning at all.

Nevertheless, the octopus keeps observing the signals, and over time it builds up a statistical model that allows it to predict to a very high degree of accuracy what signals follow others.
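
The kind of model the octopus is building can be sketched in a few lines of code. The toy “traffic” below is made up, and a real language model is vastly more sophisticated than a bigram counter, but the underlying move is the same: learn which token tends to follow which, with no access to what any of the tokens mean.

```python
# A toy version of the octopus's statistical model: a bigram "next word"
# predictor. The observed traffic is invented for illustration.
from collections import Counter, defaultdict

observed_traffic = (
    "the ship sails at dawn . the ship sails at dusk . the ship docks at noon ."
).split()

# Count how often each word follows each other word -- that is all it learns.
follows = defaultdict(Counter)
for current, nxt in zip(observed_traffic, observed_traffic[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word seen in the traffic."""
    return follows[word].most_common(1)[0][0] if follows[word] else None

print(predict_next("ship"))  # 'sails' -- only because it appeared more often than 'docks'
print(predict_next("at"))    # 'dawn' -- ties broken by what it saw first; frequency is all it has
```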

Now let’s say that one day one of the cables gets cut (it doesn’t matter which), and the octopus starts replying in place of the person at the severed end – so the human on the other side carries on the conversation without knowing anything has changed. This leads to a scenario where a human being firmly believes they are conversing with another human being, even though the entity doing the replying is not human, has no innate understanding of language, and is consequently destined, at some point, to return text that no intelligent human would send.

This, in a nutshell, is the difference between an AI model that can output convincing text, images, or video, and that can be extremely useful as a tool in these contexts, and a genuine machine intelligence. And, to be clear, we do not have any examples of the latter. Our brightest minds also do not agree whether it’s even possible to create one.

As Bender and Koller put it:

“Without access to a means of [hypothesising] and testing the underlying communicative intents, reconstructing them from the forms alone is hopeless, and [the octopus’s] language use will eventually diverge from the language use of an agent who can ground their language in coherent communicative intents.”

As exciting as generative AI is today, we should take care to think about language and image models the same way. Simply because they can speak, draw, and write does not mean they grasp language, physics, or anything else.

This also has deep practical implications, rather than just being a fun philosophical exercise. The gulf between output and understanding is why transformer and diffusion transformer models make mistakes. What we commonly call “hallucinations” are not ephemeral dreams or any kind of “ghost in the machine”. They are simple artefacts of a system that does a fantastic job of approximating intelligence but that does not actually possess it.

A more accurate name for hallucinations would be “incorrect inference”. A model starts from one place and ends in another that, to an intelligent human observer or a much simpler database populated with facts, is unambiguously wrong. But to the model itself, running that inference, everything looks fine and logical, because it has predicted the next most likely token without any external frame of reference against which to judge whether that token was “correct”.
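
That mechanic can be made concrete with a toy example. The probability table below is invented to illustrate the failure mode rather than taken from any real model, but it shows how a purely statistical “most likely next token” can be fluent, confident, and flatly wrong.

```python
# Invented next-token probabilities -- not drawn from any real model.
next_token_probs = {
    ("the", "largest", "city", "in", "australia", "is"): {
        "sydney": 0.62,     # statistically likely AND factually correct
        "melbourne": 0.30,
        "canberra": 0.08,
    },
    ("the", "capital", "of", "australia", "is"): {
        "sydney": 0.55,     # statistically likely but WRONG -- Canberra is the capital
        "canberra": 0.35,
        "melbourne": 0.10,
    },
}

def complete(prompt):
    """Greedy decoding: always emit the single most likely next token."""
    probs = next_token_probs[tuple(prompt.split())]
    return max(probs, key=probs.get)

print(complete("the largest city in australia is"))  # 'sydney' -- fine
print(complete("the capital of australia is"))       # 'sydney' -- a confident
# incorrect inference; nothing inside the model flags it, because there is no
# external frame of reference against which to check the prediction.
```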

This same issue is also the root cause of the concern that pervasive adoption of AI in creative work has a chance of eroding or eliminating newness and invention. A model can synthesise novel outputs, but people – especially in the arts – hold deep reservations over whether it can create new works that are not based on permutations of its pre-existing training data.

For a lot of people in the AI community, though, these are cans that we can kick pretty far down the road. And I would actually tend to agree that answering the question of “how do we build intelligent systems, and what happens when we do?” is a tomorrow problem.

But I think we have a today problem when we consider the alternative: what happens if we don’t or can’t build intelligent systems? Because proceeding on the assumption that AI will continue to get progressively, or even exponentially, better on an endless curve towards genuine intelligence is destined to lead to AI initiatives that fall far short of their vision.

So when we think about what we, as fashion professionals, want to achieve with AI, we also need to be asking what will happen if current architectures end up becoming a long-term plateau (long enough, at least, that those commercial and market forces I’ve already mentioned begin to drag the sector down) or, worse, a dead-end?

I personally do not believe we need genuine and general artificial intelligence (we have come a long way already with non-intelligent systems) to accomplish a lot of what we want to achieve in fashion. And, to be clear, I do believe that generative AI represents a significant leap forward in the way we interact with computers and what those computers are capable of. I am not someone who sees the faults with current AI models as making the entire pursuit pointless.

I do, though, believe that we as an industry need to reckon with the downsides of what we have – capability wise – instead of assuming that the future will automatically solve them for us.

And for all the sci-fi talk about AGI, the real challenge of the current architectures remains inaccuracy and unreliability. The headline-grabbing stories about Google’s “AI Overviews” recommending that users eat rocks, or put glue on their pizzas and petrol in their pasta, are funny, but they underline the fundamental mistrust that people still have when it comes to interacting with AI models.

For fashion’s purposes, it’s easy to conceive of some low-stakes scenarios where these kinds of hallucinations only manage to create a PR problem for brands, such as inept style advice given by an eCommerce chatbot. But it’s equally easy to imagine how hallucinations in sustainability data, for example, can create real liability.

I don’t believe this to be an intractable problem. We only need to look at how quickly generative image models moved past “person with seven fingers” and stilled those easy counter-arguments. But it is one that, under the current AI architecture umbrella, is never going to be entirely solved.

Nevertheless, as you move into the next section of this report you’re also going to find compelling answers to this and a lot of other questions – and you’re going to see measurable progress from companies that have a clear and compelling ambition for what they believe AI is capable of doing in the here and now. And that includes both genuine departures from existing enterprise platforms, and natural evolutions of the same.

This is, ultimately, what I find the most compelling about the state of AI in fashion today. For all the commercial and technical friction, it really does feel as though we are currently in the “selling books on the internet” stage of AI’s evolution. And just like the Cambrian explosion of new ideas that eventually came from that – Uber and Netflix and Slack and so on – I do genuinely believe we are going to see transformative apps and services in the future that we can’t conceive of today, without needing to completely rearchitect the foundations we have to achieve them, or to bank on emergent properties that might never come.

It’s a tired analogy, but back in the mid-to-late 90s, nobody really knew what the internet was going to be for. Then a lot of the people who thought they did wound up folding. And the eventual winners were the disrupters who came in and changed things, and the smart companies that played the middle of the field and saw the opportunity to add entirely new elements to their existing offers.

After spending seven years charting the progress of AI (see our 2017 AI Report for a blast from the past!) I don’t see the AI era being any different. Just as I’ve spent several pages now analysing it all from the same commercial and technical vantage point as we would take on any enterprise software segment, the time is now upon us for software developers and customers to treat it the same way: as power and potential to be translated, through effort, into practicality.

It’s an exciting time to be in technology for fashion. Just make sure you keep a keen eye on where it’s headed.