[Featured images provided by Photoroom.]

Hey, and welcome back to The Interline Podcast.

I don’t know if you’ve ever thought about it this way, because I don’t think it had quite crystallised in my head in this form either, but shopping online means buying a fiction. What we’re actually exchanging money for isn’t product, but images, text, and maybe video. And then we’re hoping that those things accurately depicted the thing that arrives on our doorsteps the following day. A lot of the time they do, sometimes they don’t.

Actually, depending on the market segment, there’s something missing in between the story and the actual article a third or maybe even half of the time, hence the industry’s sky-high average return rates. And because those e-commerce images and attributes are all we have to buy from, demand for those is constant. And the bar is also extremely high for accuracy. If the image is the fiction of the product, how closely it corresponds to the real thing is a very real predictor of customer satisfaction, the return rates we talked about, markdowns and plenty more.

For a long time now, this has been putting a real squeeze on image supply. How often have you looked at a product that comes in a few different SKUs, different colours, different materials, but the same platform or silhouette? And then you come away annoyed because you could see video of it in one colour, but none of the others. Or maybe you could see it from some angles, with some materials, and some with others, but you didn’t get total lateral coverage across every colourway. I know it’s annoyed me over the years. That happens because photographing and videoing final production samples is cost and time intensive. And because in a lot of cases, brands are wrangling different calendars, and that means that some samples are shot at different times and some are not shot at all. 

If this all sounds like I’m making the business case for replacing photography with AI, then you caught me. I mean, kind of. From a pure economics point of view, with a cold and calculating commercial hat on, being able to just not do any of the time or cost intensive stuff and to actually end up with potentially improved image and video coverage as a result, that seems like an easy sell. It seems like something that’s hard to argue against. But a fair amount of people don’t trust AI images, whether they’re labeled or not. And there’s also a lot of uncertainty over who should be creating them, or enhancing existing photos in some cases, and what standard they should be creating them to. 

If you buy a hoodie from a marketplace or a multi-brand retailer based on generative photography or videography, and then the physical thing doesn’t match the images, whose responsibility is that? Who’s liable? The retailer or the brand they stocked? The marketplace platform or the seller who sold through it? And as we fast-walk into the world that computational photography and image gen models build, where literally anyone can generate a believable looking photo of anything at little or no cost, what does that do to the fiction of buying something online? 

These are all big, heady society level questions, but they’re also the foundations for the roadmaps of a big cohort of generative photography products and applications. It’s a hotly contested space in technology and there seem to be a few notable players emerging there. Photoroom is one of those players. It launched in 2020, literally a couple of weeks before we launched The Interline. And while it was originally centred around homegrown models for quickly removing backgrounds from product shots and other kind of common Shopify seller workflows, the company got ahead of the generative AI explosion and they managed to ride that to a $500 million valuation by the start of 2024, by which point they were processing around five billion images a year. 

Today, Photoroom has clients like Netflix, DoorDash and Decathlon, as well as a big stable of much smaller users who sell through Shopify, Amazon and other marketplaces. And it finds itself at or at least near the forefront of not just building a product to edit and generate photography, but also understanding where that product fits into the world and wrestling with what trust means in an e-commerce landscape that I think is about to get rocked by some pretty fundamental questions.

Photoroom recently produced a survey report called The State of Gen AI in Marketplaces 2026. So I brought on their CEO, Matt Rouif, to run down some of those findings, but also to build out your and my understanding of a space that’s changing way faster than I think a lot of people realise.

Let’s see what he had to say. 

NB. The transcript below has been lightly edited.


Matt Rouif, welcome to The Interline Podcast.

Thank you.

Pleasure to have you here. Thanks for joining. 

Now, we always start these shows with two things with every guest. We do a question about their day-to-day lives and we ask them to define something. So first up, what does your typical working day look like? So I talk to a pretty broad spectrum of people for these shows and some people have really well-defined roles and this is a very short question to answer. For CEOs, there’s typically a bit more to wrestle with and you’re the CEO of a company that’s now I think 100+ people. 

So, walk me through what your day-to-day is like.

Yeah, Photoroom is a bit more than 120 people. The peculiar thing about a CEO job is like your job is to get yourself out of your job. So the routine kind of changes every other month, every six months. I’m trying to organise my week into weekly times. I have a good and quite senior management team now, and the CEO job is to build and organise the team. The team is the product. So, I would concentrate on meeting with the leadership, marketing, product, sales, finance, two days of the week. There is one day of the week repeating the vision, mission, updating the team to make sure everyone has clarity about where we’re going. 

And then my passion at the end of the day is AI and photography. So, talking to users and talking to agents makes the rest of it. I’m always playing with the latest AI and trying to ask users, why are you using Photoroom? What would you improve? I would go in the street, every time I see someone using it and having a problem, I try to get deeper and deeper. Product interviews, at least 10-15 minutes every day. Overall, talking to the users – I love that.

Cool, and I did read an article recently that said that the best kind of AI strategies start from the top, i.e. they start from people who are hands-on with the latest models and things like that. So I think it’s interesting to note that you’re still in that playing around mindset, even though you’re managing 100+ people.

Yeah, I mean, I’m working with tons of agents both on the text and image side. Every morning I actually start with reviewing emails that my agent is writing for me, for customers, for leadership. So, if you want to build the future, you have to live in the future. And you have to be using AI all the time. 

That’s a good attitude to have. It’s kind of similar to the one that I have, although I spend more time experimenting with it from a writing and understanding the model’s point of view so that we can then confidently help analyse and talk about them afterwards. 

Right, now let’s do our definition. It’s a question that sounds simple, but it’s probably a bit to unpack, so I’ll do the unpacking. And the question is, what is a seller? So you bill Photoroom as being designed for everybody from side hustlers, people who sell just as a hobby or as a side project, to Amazon. And there’s a kernel in there that I think is really important context for everything else we’re going to talk about. The codification of e-commerce and transactional platforms, things like Shopify, full service partners who do inventory, warehousing, and distribution for you, that’s all really opened up the floodgates. 

So historically, you needed to be a good sized company with a lot of infrastructure, insurance, everything else to sell online in a way that looked professional. Now, basically anybody can do the same thing all the way from product detail page (PDP) to fulfillment. Selling is a very democratised activity.

Paint that picture for me: what do you define as a seller? And tell me if you feel any tension when you’re building a product for an audience across a spectrum as big as that.

Yeah, that’s a question and a definition we always have at Photoroom. I’ll tell you what’s a bit unique about it. For us, the distinguishing thing is that a seller is someone selling physical goods online. And so it goes really from the side hustler, and we call them aspiring sellers or aspiring entrepreneurs or side hustlers. Really, they’re testing the waters, they have this idea of a business and are just getting started, or they just want to have a good impact on the environment by giving products a second life. And it goes all the way to Amazon or DoorDash, which are also customers of ours, and which are sellers but also selling platforms.

And so really the primitive of the seller is someone selling a physical good online.

And the common thread is that, today, people buy pictures and they receive a product. And so for all these people, 95% of conversion is seeing a picture online of what you’re buying and then receiving something afterwards. Some might have a physical moment, a shop, a brick and mortar, but the common ground is they’re selling on the internet and they’re selling physical goods.

I think that’s important framing. I think the idea that the kernel that you got out there was that we buy on the basis of images and then we judge satisfaction and quality and everything else on how closely the thing that lands on our doorstep corresponds to the image that we saw.

Exactly. You want to shed the best light on the product. I mean, you kind of want to put on some, I don’t know if it’s makeup, but you want it really looking good. The nightmare of a seller is the cost of physically handling all the returns, and the trust that you might break with your buyer if you lie. Commerce is about trust. So if there is a huge gap between this picture and the product, it has a huge cost. It’s all about trust here.

Absolutely, and that’s something we’re going to get into, I think, as we go. You want to showcase the product in the best light, but not an unrealistic light. Because all that does is get you a sale, and then that sale translates into a return, as it turns out that the product doesn’t match the original kind of visual depiction.

So staying within that context then, understanding what a seller is, let’s talk about why images matter so much. We’ve just said people buy on image, they receive something physical. You worked on a report recently, The State of GenAI in Marketplaces, and as part of the survey portion of that you found that 90% of shoppers put product visuals at the top of the decision-making pyramid. So images go way before text, brand loyalty, everything else. That seems right, seems logical. 

We’re a visually-geared species, and it seems pretty clear that, pre-AI at least, the demand for images outstripped the supply. Because they were such a vital decision-making criterion for people, they wanted more images. And in a lot of cases, I imagine sellers were not able to provide those images for logistical reasons.

So, you know, I’m going off personal evidence here, but it’s been a pretty common experience for me to check out a product that comes in multiple colourways or materials. Let’s pick a sneaker, for example, right? And you find that there’s photography and videography of some of the colourways from some angles and not others. Or there’s videography, but only for one version. You have somebody doing a walk on and a walk off with the shoes, but not in the colour that you’re interested in. It’s very clear that sellers can’t or couldn’t pre-AI produce images in the volume and the accuracy that the buyers demanded. 

So give me some grounding in the scope and the scale of the problem. People clearly want images and they’re the most important decision-making criterion. Give me some grounding in that.

Yeah, of course. As you say, 87% of shoppers say visuals are the most important, and the complexity of it is that it costs money. One day of studio time for minimal, cheap photoshooting is between 5K and 10K for companies. That’s kind of how low you can get in traditional photography.

Not only is it expensive – we see a lot of brands that have the money to do it – but it’s also slow. You need to get the apparel or the product that you want to photograph. You can’t compress the time that it’s going to take. And in terms of numbers, we do see 40-50% more sales of an item in a catalogue if you do have the right images. Not having the right colour, for instance, is a huge, huge problem. It’s really money you leave on the table if you’re selling something.

So that’s a huge problem and then it’s also limiting it. You mentioned the colour side and sometimes, you know, you have a logistics issue. You don’t have this colour anymore, the fashion has changed. So there’s so many things that can get in there, and then you have this incompressible time of taking the traditional photography path to get the image out there. And so teams just drop, they don’t have the resources, it’s too much work for the organisation and they just don’t make it happen. 

And as you say, a missing colourway is even worse in fashion. So, we’re covering all the verticals but we have a specific focus on fashion and we just released, like three or four months ago, colour change, a colour specific tool and we saw huge engagement on that. Like having a way to have the different variations of your product has been a huge, huge unlock for our users. So that’s kind of the problem of the supply side. It’s very, very difficult. 

Why it matters on the demand side is that your brain is processing images 10x faster than text. Our brain is hard-coded overall to process images faster. So, you make decisions in a snap – really almost a non-rational decision on images – whereas you involve more of the thinking brain when you go over a description. So all of that, if you can connect faster decisions with higher conversion, and creating images is like 10-100x cheaper, then it’s a huge unlock for all the sellers.

So this is kind of setting the pace and setting the environment of what’s happening right now in the fashion industry.

Perfect. That’s a good summary. 

Getting a bit more specific, what does the marketplace demand for visual content look like? And how does that differ from own channel demand? And where should that content come from? 

So I went back to your report again. I think you found that about 60% of people lose trust in a marketplace if they see inconsistent images. And more than 70% of people believe it’s actually the marketplace’s responsibility, not the seller’s, to enforce content parity and quality. 

Weirdly to me, that’s the opposite of what I think I would have heard if I’d walked into the boardroom at a multi-brand retailer or marketplace a year or so ago. I’m not doing any selling myself. We’ve done a bunch of work over the years with marketplaces, with companies that sell through them. And I know that those companies, the marketplace owners, have set really high bars for the brands they’re willing to stock. And those bars have transferred basically all the asset and data provision responsibility, i.e. images, written information, clients information and so on, onto the sellers and the providers. Because otherwise, if you’re a marketplace, brand intake is just an infinitely scaling challenge. Every new brand you bring on board, you’re shouldering a huge burden.

It sounds like from a customer point of view at least, that script has flipped and people believe it’s the marketplace’s ownership of that. Do you think marketplaces and multi-brand retailers see it the same way that customers do? Do they see it as their responsibility? And is it the case that AI has basically obliterated a pre-existing bottleneck so that marketplaces can then create that content and have infinite headroom to do it? Because historically they would have said, no, you supply me with the images because otherwise this is a huge burden and a huge job for me. Now it feels like they could change that framing.

Yeah, I totally see talking to marketplaces directly that most marketplaces do believe that the work should be on the brand themselves, on the sellers to provide the images. And they actually sometimes don’t feel like they have the right to kind of change some of the assets for bigger brands.

Well, the stats are clear from the consumer side. If 70% believe that it’s the marketplace’s responsibility, it’s this idea of breaking trust. If a seller is lying with the images on the marketplace, you’re not just breaking trust in this specific seller. You’re much more damaging the trust that people have in this specific marketplace. So that’s where the marketplace owners and the people running them should be really conscious about that.

Yeah, I mean if you buy from Amazon, your transactional relationship is with Amazon, not with the company selling through Amazon.

Yeah, exactly, the guarantee. And so that’s why, in the past, there was this kind of guarantee and you have a high bar on the images to remove the sellers that were not playing the game. 

The challenge, and what everyone is saying, especially the marketplaces because they want to increase their GMV and their inventory, is that images are often the bottleneck. So now with AI, you can really get a lot more sellers on your platform to play by your guidelines.

What we see is a huge uplift. Rappi, in the food space, has seen increases of 20-30% by getting more involved in the images. So you just get more business. So what I do see is the marketplaces are a bit more dynamic. They are closer sometimes to the customer. And so they do see the customer going to other places, and the risk of becoming a bit irrelevant if you don’t act on what’s possible. Like if the inventory gets bigger, you have more liquidity and then the best marketplace is your competitor. So there’s this huge challenge of staying in the game and really doing what’s the best and what’s possible to increase the inventory.

What I do think is, with AI, you can have customised images for everyone. And so what does make sense for me is that the marketplace, as much as the seller, can brand the imagery to give the best user experience (UX) to the buyer. And we are doing that with multiple marketplaces: you have the seller providing some images, some guidelines on how they want to communicate, and other information, and the marketplace is also customising them. You don’t want to see the same images on one marketplace and its competitor.

So actually, it’s more like as the cost of imagery is going down, the marketplace and the sellers will build a combination of what’s going to sell. Because the marketplace knows the seller better, they know what they’re looking for. They want something that looks specialised for the vertical. In sports, it could be someone specialised in skiing, for instance. So you do want to have this customisation that comes to the need and the segment that you have as a marketplace. So it’s on both sides.

I think as a marketplace, then, I would be asking myself a question which would be: do I want to do that image enhancement generation iteration myself as a job or do I want to offer it as a service to my sellers? Kind of white label or something along those lines. 

Do you see that kind of dual pronged approach or do you think most marketplaces are just saying, well, this is representative of me. My transactional relationship is with the consumer, therefore, this is my job. I’ll take the seller’s recommendations and stuff as intake, but it’s my taste, my curation, and everything that puts it out. Or do you think there’s a place for the platform approach as well?

I do think it can be an additional service that some sellers are willing to pay for, and at the end of the day what matters is: are you converting, are you increasing your GMV? The numbers speak and, today, I think the problem with the premium services we’ve seen sold is that the numbers are smaller and it’s more difficult to adapt. So the most successful marketplaces we’ve seen are rolling it out as an option to all their sellers, as a strong brand approach and a strong innovative approach, to say: let’s do the kind of easy minimum and have 100% coverage of our images. We know that if we have five images for every SKU, we’re going to sell more. So let’s do that kind of baseline for everyone. 100% coverage, five, six images for every SKU. We know it is going to increase your GMV and it’s a win for everyone, so let’s do that now.

Now we might have some premium sellers that we can reward or help with a better service and communicate some tips so they can sell even more. It’s kind of a mix of a reward and help to boost yourself or your best sellers. And this is applied sometimes as a second step. Like, I can give this service, I can build some retention for these top selling providers, sellers on my marketplace by giving them extra service for that. Maybe they have retail media in the platform and so I can also do ads as a different visual and I can provide that. Sometimes it’s part of a premium subscription, sometimes it’s not. It’s kind of what we’re seeing.

It makes sense because the marketplace, again as the owner of the relationship with the customer, if it fits their business model and if it’s in their commercial interests, can do clienteling. They can start to build what they know about the consumer into a premium service, into ways to give their sellers an edge selling through them as opposed to selling through a competitor.

Yeah, and the more you’re branded and unique to the marketplace, the less you can actually advertise. You create good images and if your sellers are starting to use them and you manage to get your brand guidelines into a marketplace, it’s a way to advertise in case the seller is using you outside of the marketplace. The marketplace can be very smart about how to use that as a marketing service too.

It makes sense. And you have Photoroom users on both sides of the equation there. Marketplace users and individual seller users.

Back to your question, what’s interesting is that initially the marketplaces came to Photoroom because the best sellers on the marketplace were using Photoroom, and that gave the marketplace reason to say: it’s working for my best sellers, so I might as well get the benefit for everyone, and let’s look at the economics. And often the economics work super well, so they roll out to everyone. Then the innovation comes from the sellers that are using the new AI features in Photoroom, like virtual try-on, and so there’s this loop of always elevating the bar for visuals for sellers.

Yeah, and I think it’s very clear across this conversation that the unit economics of generative imagery versus traditional photography will continue to make sense if you look at this through a commercial lens. If that’s your sole lens, then that’s definitely why you would make that decision. 

Aside from visual content, what about AI when it comes to product attributes, tagging and catalogue building and so on? For fashion, when I talk about attributes, I mean material, trims, fit, and so on, maybe sustainability profiles, certifications. That’s another place where I think marketplaces and multi-brand retailers, department stores have shifted most of the burden onto their sellers. But it’s also an area where a lot of them struggle, which is where multi-brand retailers are struggling right now. Because a lot of the data that they get given from a brand level is a mess to begin with, or it’s all kind of consolidated into one blob as opposed to being granular and separated. 

And a lot of what marketplaces and retailers want to build, whether that’s semantic, natural language search, or personalisation like you just talked about, requires you to have a really granular data foundation and really impeccable governance. How do you see that side of things playing out?

Yeah, I think we’ve seen a strong impact. In the report, we’ve seen about a 90% reduction in onboarding time for a new platform where a seller wants to sell, and a lot more attributes being filled, like 37 additional attributes per product filled automatically. That’s what we’ve seen with Miracle; we are working with them.

I think if you take a step back, it’s important to go back to why cataloguing is so important. We know it’s so important in e-commerce overall, but I think it’s important to state that what’s happening with agentic e-commerce, what’s new and what’s the opportunity for sellers, is that AI for the buyer is able to match a need from the user with a product that has attributes. And that is how ChatGPT or Gemini solve your need: a quarter of buyers were already using ChatGPT or Gemini for a purchase six months ago. So it’s huge and it’s growing very fast.

And for that to happen, you need to have the AI being able to find the attributes. And that was not so important before because it was mostly ads that were solving that or you knew this product or brand. Now, the AI can match a need that is expressed by a user. And they can match this need with all the attributes that are made. So this is very, very important now. And I do believe actually on the visual side that some of the attributes are going to be actually provided to the AI with the imagery too. An image is worth a thousand words. So if you want the exact dimension of your product on a fashion marketplace, actually you just get five dimensions on your text. But if you want all the dimensions, all the lines you want to sketch, these are in the image and the AI will be able to match that. So it’s very important that the information provided both by the attributes and the image are there. It’s even more true with fashion where you want to have virtual try-on and see how this piece of apparel would fit you. 

There is another topic of localisation, of how do you match the attributes between …say, a French 38 is not the same as an Italian 38. The AI is also helping you when the attributes are not the same for different marketplaces or for cross-country selling.

Yeah, I do have some concerns, I have to say, around generated images, synthetic data, and things when it comes to extrapolating sizing information and points of measure and things like that. I think it’s potentially a solvable problem, but it’s a solvable technical problem that relies on you having, potentially, a complementary and slightly different data input than just the image. But I think maybe time will tell on that one. 

I know people definitely want virtual try-on. I know that virtual try-on is proceeding in parallel, sort of trajectory to generative imagery as well. I feel like, I don’t know, I feel like there’s an opportunity there but virtual try-on has been tried a lot of times before and hasn’t seen mass consumer adoption for a bunch of reasons. I’ll be keen to see how it plays out this time.

Yeah, I think it’s a good point. I love your… We had some small marketplaces trying virtual try-on and seeing, like, a 3x conversion, right? So it’s interesting. What matters is what’s your return there. I agree with you that image alone is not enough to tell you the right size, but a multimodal AI is really where the industry is going. And so it’s a mix: you have the imagery, you have some depth data that can come from stereoscopy from your phone, and the video also includes a lot of data. That’s kind of the idea of the world model. That’s where all the AI labs are going, where you want to understand the physics of the world from the capture. We’re not there yet, but this is where the research is heading and making huge progress.

I agree with you that it’s not been there, but there are huge scientific leaps that are being made right now in labs that might have a big impact in the coming six months, 18 months, I think, that are interesting.

I would agree. I would agree. I think one of these days we should have a separate conversation about world models and the difference between visual representation and simulation. But I think if you cast your mind back even a year ago, the idea that you could automatically, through multimodal AI, recognise material types, recognise patterns, and take a good guess at construction techniques and things like that would have seemed out of reach. That all feels viable now. That all feels like something you could go after now.

You mentioned this earlier. So the marketplace kind of feeling that maybe it was not their place to generate imagery on behalf of sellers. That feels like a risk to both parties to some extent. If we think about every product detail page, and let’s imagine a hypothetical PDP where maybe half the images are either enhanced or generated by the retailer or the marketplace. Maybe some of the text is as well. The attributes are automatically tagged. Does that mean that the brand and seller is ceding a historic amount of control to the intermediary? Because smaller brands have always had to accept a bit of a devil’s bargain when it comes to getting stocked. But even bigger brands, Nike and so on, have had quite back and forth relationships with marketplaces because they get distribution, but that distribution comes at the expense of ownership of the channel and it comes at the expense of brand integrity in some cases. 

This, to me, the whole space feels like it represents a new risk vector for returns as we talked about. Like, if you generate imagery or you enhance imagery in a way that is not going to be substantiated by the product that actually arrives, who has that liability? As you know, that’s an open question to me. And how much of the control over the route to the consumer does the brand and the seller want to give away to the marketplace? There’s two questions being baked into that to me.

Yes, trust is the most important factor. If returns are higher because the generated images are not the same as the product (people buy pictures and receive products), then it impacts both the brand and the marketplace. So that’s the tricky part, I think.

The tricky part that I would add as extra context on that when I talk to marketplaces is some of them wish the brands were doing it. I do think marketplaces didn’t invest as much historically in the brand and they’re closer to the customers, they are moving faster. They wish the brands were doing it. 

We work with Palm Angels as a brand, for instance, and they initiated it in this case. So you see a few brands that are very proactive. Zara is also working on that. So there are some brands that are taking advantage of it and they’re taking control of the storytelling of the brand and their visuals. And these are getting the best result. 

What I do see is the marketplaces are acting because they know they’re going to sell more, and it’s how you bring growth and stay relevant in the AI world. And they wish the brands were acting, and the brands are a bit slower to move. We worked mostly with marketplaces historically, and we’re working with more and more brands like Palm Angels, but my advice to brands is to start seeing what’s happening in this case, because it is a big opportunity. Visual media has always been very innovative from camera, like 200 years ago, to movie, 100 years ago. So this is just another step. It’s not like these were very innovative technology by definition.

I think being proactive is a key part of it.

Okay, so I want to talk a bit about Photoroom’s product now because I think you operate at a pretty interesting juncture of the market and I think the fact that you just talked about the journey from working with marketplaces to working with brands is part of this. 

What does your user base look like at the individual role level and do you see it changing? You know, I don’t want to describe what you’re doing as being like the Canva for AI, because Canva wants to be that. It looks to me like the ambition is the same. You want to make generative product photography accessible to everyone. And we just talked about the importance of being proactive from a brand point of view. I’m keen to know what that means for the people who would normally have held the reins on that side of things. When I think about who your user base is, that would be purely marketing and e-comm and so on. 

But you’re also leaning into something that I’d class as maybe closer to product iteration with recolour, which allows users to change garment colours, maybe adjust components and so on. And those are product development activities, not marketing or comms or e-commerce. 

Do you see the user base staying as marketing and sales and e-commerce, or do you think there is a push for you there into the sort of product creation user base as well?

I love this question. So I love Canva, and I love the comparison that Photoroom is the Canva for AI. I think we really decided to double down on being the visual solution for every e-commerce business in the world. So really doubling down on e-commerce specific solutions and workflows and models is very important to us. 

I think it’s three layers and you see more and more of these AI companies like Cursor on the developing side. But basically, we’re the best at leveraging the labs’ models for the e-commerce specific use case. We do some pipeline pre-processing, post-processing that is the best for e-commerce. We develop our own AI models like background removal, AI shadows, which give generalised shadows to your product. It’s very important for PDP. Recolouring that you mentioned. And then we package that into an e-commerce-specific solution product.

Product fidelity is very important for e-commerce. How do you make that central in the user experience and make it easier for your user to check it? How do you update your Shopify visuals? We have a Shopify connector. So really these three aspects, making the most of the AI labs’ models, some specific models that we train internally, and then building the product around that, are how we specialise, and our vertical specialisation is e-commerce.

Our users range from side hustlers, small business owners, resellers and marketplaces to some more established brands like Palm Angels that I mentioned – and Amazon is using us, DoorDash is using us. So the biggest e-commerce platforms in the world are using us for building their tools.

And I think you mentioned it there, being the best environment for the models because generative photography is a pretty contested space. And the generation is the easy part. It’s a weird thing to say, like a couple of years into the whole generative AI revolution, but anyone can generate an image for free or for some cheap tokens. I’m not going to call it a solved problem, but it’s not far off. 

Yes.

It feels like the differentiation happens in building the application stuff around it. And a big part of the application stuff for this is organisation, labeling, repeatable workflows, batching, and things like that. Maybe API-based programmatic content creation. Do you see that as the space that you want to spend the most attention on? I’m keen to know how big a factor that is in your product building.

Yeah, so we have a batch editor where you can edit 250 images at once, in 4K. And then we have an API for bigger volumes that is also quite used. We have users editing millions of images. What I say is that in e-commerce, if you only focus on the primitive, the single photo, that is not what the buyer is seeing. The buyer is seeing five photos, probably a video and a description. So you really want to have some consistency between them. That’s something, for instance, we invest heavily in, to be able to see and edit your PDP visuals all at once and understand how they fit together. So we do invest a lot in that. 

I wanted to come back to your question on recolour and product iteration. I think it’s very interesting. And I do think what Photoroom and other AI providers in other verticals do is really make some businesses and some products that couldn’t exist before possible to exist. You can recolour, put online, and sell even if you haven’t printed this item yet. And so this gives life to product that maybe in your catalogue you didn’t know if there was demand for. So that’s very important and you’ll see a big impact on fashion for that.

Yeah, I think there’s a very, very weird thing happening at the moment where design and marketing are basically the same activity. Historically, you could not have sketched something and then instantly seen what it would look like photographed. And the fact that those things can happen in the same environment and the same workflow, it changes design. And I think it’s going to take us a while to unpack what that actually means for design and product development.

There’s a couple of final stats I want to pick up from the report. The first is that a majority, small majority, 55% still, of people say that poorly executed AI images decrease their trust in whatever platform’s trying to sell them something. And you also describe Photoroom as helping to deliver visuals that convert better without looking like AI. 

What does AI look like now when it’s detectable? I don’t think we’re talking about having six fingers on anyone’s hand anymore. There’s still some tells that you can spot if you spend enough time on these things. What does AI photography look like? And on the flip side, how often is AI imagery put in front of the average shopper right now and they don’t notice it?

Yeah, I mean, every image that you take on your smartphone is AI edited. So it’s kind of like everything will be AI generated at every pixel. Today, it’s not six fingers. It’s not the realism of the photo that matters, although you still have some challenges. It’s more about the product fidelity that matters and the model. Like if you’re talking about humans, how much realism you get on the skin. So it’s still a bit glowy, the texture of the skin is not represented as well. For product fidelity, you see that in the fine details, the logo that will be on the piece of apparel, some part of the texture might have some details that are changing, and that’s where you can break trust with the user, and that’s why Photoroom is developing specialised machine learning models to really understand and check the specific product fidelity items that are very important. 

And so we’re helping our user to do that. We pick the best model based on that, not based on creativity. And this is where the ‘AI slop’ is, that if you just generate one key image with low context and don’t take the time to check, you might see these AI images. 

Particularly if you then upscale them afterwards. I think upscaling is where a lot of the artifacts that I’ve noticed get introduced.

Exactly, and you want to pay attention to the context window, how many images, how do you process the source images, how do you tell the models what is a source image, what is an inspiration image. All of these are very important in the pipeline to get the highest fidelity. Upscaling is not a good solution.

Then there’s the cultural side of things, right? So your report showed that around 40% of people are uneasy about AI images in general. And even if you label them as being touched or created by AI, that doesn’t really move the needle. That doesn’t sort of get people to say, OK, it’s labeled. I’m fine with it. 

How much cultural resistance to generative photography still is there? And if transparency, if labeling, isn’t the answer, what do you think it’s going to take to wear that resistance down?

You know, when you talk to customers, one thing that is happening is you tell them IKEA images have been 3D rendered for the past 15 years, and then they’re surprised about it. But people are really happy to buy at IKEA. So I think the end goal is people understanding the information that is coming from the image and seeing if it’s a good product for them. So as long as you focus on trust and you make it easy for every business to do it, I think people are super happy to buy from it. 

They’re uneasy because they see things, like what can you do on the political side, how images can be used for the media and manipulation. On the e-commerce side, if you really focus on trust and it really reflects the professionalism of the seller, that is important. I had a marketplace owner tell me, you know, the problem is you do want the images to reflect a bit of the care and the craft that was put in by the seller. If every image is perfect, then it looks like all my sellers are five stars. And so what’s the point of five stars if everyone is five stars? But we do see people buying from that. 

And there is the opportunity that everyone gets a white glove image that really understands their pain point. Like you want to see this chair in your living room. Virtual try-on if it’s possible, it’s fantastic.

The question is, is it building trust? So it will happen. There are some technical limitations, but I think people will see the benefit, at least on the e-commerce side, very fast. And we’re seeing that uplift.

Yeah, and I think, to your point, IKEA’s also been my reference point for a lot of this. 3D CG was not wrapped up in cultural tension the same way that AI is. Nobody, at least at mass scale, was using 3D modeling and texturing and rendering for mass deception and everything else that AI is being rolled up in.

And I think there comes a point, particularly in the enterprise, particularly in work context, where you have to be willing and able to separate that wider cultural challenge from, as we said, the pure unit economics, the pure value side of things here.

Yeah, it’s a very powerful tool. So if you use it with trust, and for a brand like IKEA that is important, IKEA cannot lie to users in their imagery. So they really try to have the 3D rendered image reflect, as well as it can, how the product is. And AI is just a better tool than 3D for that. And it will help a lot of businesses. But they need to build the trust with the user. And we’re helping them to. 

I think it’s a better tool than 3D in almost all respects when it comes to visualisation, but not necessarily the accuracy side of things, particularly if you think about virtual fit and so on. I feel like there’s a complementary role for the two to play there. 

Sorry, go ahead.

Yeah, it’s true. Yeah, IKEA is interesting because you need to design in 3D, so you do have the 3D model anyway. I think most businesses can’t take on the cost of 3D, and AI can get as good, and will get as good as 3D in the future.

So, finally, we’ve been talking about all this as though humans are going to be the ones looking at product listings, placing orders, and so on. But I think there’s reason to suspect that maybe AI agents are going to be on the receiving end of buying if we look a little ways out. 

Provocative question – are we optimising a dead end with this? Are we putting a lot of time and effort and models and tokens and things into creating images designed for people to see them when, in fact, what AI agents want is an image-free, pure data, back-end access? 

Do you think there’s a dual-pane future here? Do you think there’s a future where every marketplace has one human-designed front-end, filled with great quality images, generated or otherwise, and a machine-readable back-end or an MCP server that just deals in raw data? Or am I thinking about this wrong, and there are kinds of AI-native marketplaces that I’m just not understanding yet?

Yeah, well, I think it’s not a dead end in the sense that clean catalogues really serve humans and AI agents equally. So you need structured data. We know that if you have multiple visuals, even if it’s an agent at the end, if you have visuals, you’ll buy more. Because at the end of the day, it’s still a human clicking ‘buy today’, and the buy button just makes it faster and reduces some of the selection.

I would add that more and more, when we talk to marketplaces, there are people who love shopping. So depending on your vertical, people want to see the images and they enjoy spending time on it. So it’s not at all a dead end. And then I think the most important part is ‘an image is worth a thousand words’ is true both for humans and agents. Images contain so much information that you can’t write in text. I mean, there’s no words. We’re just not a text species. There is a lot more information that you can put in an image. It’s very difficult to write down. Like, dimension, size, all of this is connected to the image side. And so at the end of the day, even the agent will need the images to be able to serve that. And then they will be able to serve this image to the end buyer. 

If you’re, let’s say, looking for outdoor gear, but you know the need from the human is coming from skiing in a specific place, well, you want to serve an image that is specific to skiing. So the agent will be able to render the base image into a specific use case, and you’ll have one image for each transaction. I think this is the future. This is where we optimise, and people can buy at first sight because this is the first image that matches exactly their need with all the information.

And so I don’t think it’s a dead end at all. It’s going to be even more images that are going to be needed by agents to provide the information to the buyer.

I like that framing of the agent doing the shortlisting, if you like, and then providing the user with a shortlist and the user is then making a determination based on the full spectrum of written information, visual information, and so on. 

I also think you’re going to have more of a technical viewpoint on this than I do. It feels like real-time personalised imagery or near-time personalised imagery is feasible-ish or feasible fairly soon, by reducing the amount of time spent on that. So while you’re providing a written answer, you could be generating a customised, personalised visual representation of the product in the back end so that the two can be served at the same time. That’s then a completely different surface for generated imagery and a completely different conversation. 

But I buy your point, the idea that the image matters right now. And I think the image matters as we go into the future as well.

Exactly.

Perfect. Matt, thank you so much for joining. I’ve enjoyed this conversation. I hope we get to have it again soon.

Me too. Thank you so much for your time.


And that’s the end of my chat with Matt. 

You can probably tell there’s some real uncharted territory here. And I think this is one of those occasions where the underlying tech and the applications being built on top of it are running way out in front of fashion’s ongoing attempts to restructure to capitalise on them. 

It’s clear there are some pretty interesting elastic products and platforms being built here and being used by everyone from the biggest companies to the smallest sellers.

It’s also clear that the unit economics are super compelling in isolation. But I don’t know if sellers or buyers have really come close to reckoning with just how embedded generative images are about to become in the process of selling things online. I do buy, to some extent, Matt’s argument about the IKEA catalogue being staged and rendered, and indeed most homewares and furniture photography and automotive advertising and product photography being CG as well. I buy that that’s kind of equivalent and that it demonstrates that people will trust synthetic images. But the world has also had, since the original Toy Story, so 30 plus years, to understand what CG was and to become familiar with it. And when it comes to AI, we’ve had a tenth of that time to adjust to generative or enhanced photography. Maybe less time than that if we consider that image generation has only been properly mature and accurate since Nano Banana launched last August. 

Needless to say, this is a topic we need to revisit and I enjoyed having the chance to talk to Matt about it at what feels like a critical juncture. I think he was the right person. 

Next week we’ll have something very different though so maybe just keep all of this pinned in the back of your head for now and I’ll talk to you again really soon.