The next wave of beauty technology is promising a lot when it comes to personalisation: smart mirrors that diagnose your skin; apps that know your tone before you do; AI models that recommend what’s going to work for you, specifically, not the demographic average. 

But the current reality is a sparser version of that promise, because some of these systems are trained on narrow, incomplete data – about both the consumer and the products being recommended to them. And incomplete data is anathema to what beauty says it wants to accomplish with AI.

Just as it has in a lot of other industries, artificial intelligence has become the preferred shorthand for general-purpose modernisation in beauty. It appears in product launches, research collaborations, and investor briefings as evidence that the sector is keeping pace with digital transformation.

The theory is clear enough: algorithmic analysis of faces and skin will allow more accurate diagnosis, better product matching, and ultimately more inclusive formulations. People seemingly prefer interacting with vast catalogues of complex claims and formulations using natural language, and a growing number of shoppers are also using AI apps as a mediation layer between themselves and the products they want to buy.

The practical reality is… present, but patchy. There’s no question that image analysis, manipulation, and generation have come a long way. And beyond the glossier stuff, large brands report using AI for forecasting, marketing, R&D and packaging. In labs, computer-vision models are tracking pigmentation, texture and elasticity via high-resolution photos.

But as quickly as the roll-out of AI and AI-adjacent tools is proceeding, so too is the recognition that some of these systems have been trained in extremely selective ways. 

In dermatology research, a study using the Diverse Dermatology Images (DDI) dataset found that state-of-the-art models performed up to 30 percent worse on darker skin tones. 

Similar reviews in JAMA Dermatology noted that most published AI dermatology papers failed to report ethnicity or tone, suggesting that the training data behind them remains incomplete. Some research into AI-generated dermatology imagery has found persistent under-representation of darker tones, highlighting the limits of synthetic datasets for medical and cosmetic use. 

The evidence points to a fundamental problem: the data still doesn’t cover enough real people. And this is by no means a new problem either: The Interline conducted interviews and ran an analysis of racial bias in AI systems back in September 2021.

Since then, though, the gap in consumer beauty has only widened: the industry’s uses for AI – and, by extension, its appetite for granular user data – have outpaced its ability to gather that data within the rules. High-resolution facial images can be classed as biometric data under privacy laws, depending on how they’re processed and stored, and collecting them at scale poses consent, storage and regulatory hurdles. Many brands rely instead on academic datasets, open repositories, or pre-trained models.

Those may deliver technical capability but not always relevance, which means historical imbalances carry forward into new tools. How many companies rolling out AI skin analysis, for example, are auditing the DEI policies and the inclusivity of the datasets used by their third-party suppliers?

For a case in point, consider the journey beauty has been on with measurement systems. The six-point Fitzpatrick scale, still widely used to classify skin tone, was developed in the 1970s for assessing sun sensitivity rather than imaging. Systems that rely on it can misclassify yellow or olive undertones, creating distinctions that are difficult to correct downstream.

To address these gaps, Google introduced the Monk Skin Tone Scale in 2022, offering ten gradations designed to improve representational balance. Adoption beyond the tech sector has been gradual. While platforms like Google and Pinterest have integrated Monk, many beauty brands still default to legacy scales due to technical integration costs and legacy data.
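
To see why a legacy scale is so hard to correct downstream, consider what converting between the two systems would actually involve. The crosswalk below is a hypothetical illustration in Python, not an official mapping – no sanctioned Fitzpatrick-to-Monk conversion exists – but it shows how each six-point label collapses several ten-point tones, meaning data captured on the older scale cannot simply be relabelled after the fact.

```python
# Hypothetical crosswalk from the six-point Fitzpatrick scale to the
# ten-point Monk Skin Tone (MST) scale. The band boundaries below are
# illustrative assumptions; no official conversion exists, which is the
# point: each coarse label hides a range of finer tones.

FITZPATRICK_TO_MONK = {
    1: [1, 2],       # very light
    2: [2, 3],
    3: [3, 4, 5],    # olive and yellow undertones collapse into one bucket
    4: [5, 6, 7],
    5: [7, 8, 9],
    6: [9, 10],      # very dark
}

def possible_monk_tones(fitzpatrick_type: int) -> list[int]:
    """Return the range of Monk tones a single Fitzpatrick label could hide."""
    return FITZPATRICK_TO_MONK[fitzpatrick_type]

# A dataset labelled only as Fitzpatrick type III cannot distinguish
# between three different Monk tones after the fact:
print(possible_monk_tones(3))  # [3, 4, 5]
```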

These shortcomings are not only technical; they affect commercial reliability. When a virtual try-on tool repeatedly recommends lighter shades, the consumer perceives exclusion, not optimisation. When a diagnostic mirror identifies dryness or acne inconsistently across tone groups, the issue becomes visible in customer feedback and social media rather than internal testing. The industry’s pursuit of “hyper-personalisation” depends on the same conditions it still struggles to secure: abundant, representative, and well-labelled data.

Some start-ups are now building their own datasets, with clear consent and the requisite level of inclusivity and representation baked in. Proven Skincare, for example, markets a proprietary “Skin Genome Project” database used to personalise product recommendations. Perfect Corp continues to expand its AI portfolio for skin and face analysis and simulation. L’Oréal highlights long-running research into skin- and hair-type diversity, including patented colour-measurement instruments designed to improve accuracy in tone assessment. These are practical attempts to strengthen the data pipeline rather than the algorithms themselves. 

Where real data is scarce, some researchers are turning to synthetic augmentation to fill the gap. Generative models can produce diverse facial images without collecting new personal data, though validation remains limited. Others are developing bias-testing frameworks to measure performance across tone groups before commercial release. These approaches show progress but also confirm how little is standardised. While several standardisation groups, including ISO and IEEE, are developing frameworks for bias testing and diversity reporting, no single protocol has yet been widely adopted.
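
To make that concrete, here is a minimal sketch of what a pre-release bias audit might look like, assuming an evaluation set where each record carries a tone-group annotation (a Monk tone, say), a model prediction, and a ground-truth label. The group names, record layout, and parity threshold are illustrative assumptions rather than part of any published framework.

```python
# Minimal sketch of a pre-release bias audit. Assumes evaluation records of
# (tone_group, prediction, ground_truth); the field names and the parity
# threshold are illustrative, not drawn from any standard protocol.

from collections import defaultdict

def accuracy_by_tone_group(records):
    """Compute per-group accuracy from (group, prediction, truth) triples."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, prediction, truth in records:
        totals[group] += 1
        hits[group] += int(prediction == truth)
    return {group: hits[group] / totals[group] for group in totals}

def passes_parity_gate(records, max_gap=0.05):
    """Fail the release gate if best- and worst-served groups diverge too far."""
    scores = accuracy_by_tone_group(records)
    gap = max(scores.values()) - min(scores.values())
    return gap <= max_gap, scores, gap

# Example: a model that is 30 points less accurate on a darker tone group,
# echoing the DDI finding above, fails a 5-point parity gate.
eval_set = (
    [("monk_2", "acne", "acne")] * 90 + [("monk_2", "clear", "acne")] * 10 +
    [("monk_9", "acne", "acne")] * 60 + [("monk_9", "clear", "acne")] * 40
)
ok, scores, gap = passes_parity_gate(eval_set)
print(ok, scores, round(gap, 2))  # False {'monk_2': 0.9, 'monk_9': 0.6} 0.3
```

Gates like this only work, of course, if the evaluation data itself is representative – which brings the problem straight back to collection.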

The challenge of transparency, though, runs deeper than data collection itself. Many beauty brands promote “AI-powered” tools without explaining what system they use, how it’s tested, or what data it runs on; often those claims rest on vendor partnerships whose underlying systems and training data are not publicly disclosed. Without disclosure, neither consumers nor regulators can distinguish between systems that are genuinely adaptive and those that simply automate fixed rules. That kind of opacity is typical in early digital adoption, but it slows the whole industry’s ability to learn what actually works, and the lack of consistent baselines makes it harder to spot bias, fix mistakes, or compare results across markets.

This is, to be clear, a common problem that beauty shares with essentially every other sector that is blithely rolling out AI. But this does not make it any less of a concern.

Despite these constraints, the overall direction remains toward deeper integration of AI into the beauty value chain. The incentives are strong: personalisation promises higher conversion rates, lower return costs, and more targeted marketing. Data-driven formulation supports inventory planning and compliance. The question is whether these benefits can be sustained without a corresponding investment in data integrity. 

Optimism about AI in beauty isn’t misplaced, but the deployment is still pulling ahead of what the data foundations can support. The systems being built today will shape how consumers understand personalisation for years to come, and unlike fashion – where consumers demonstrate some willingness to come back to the well of virtual try-on after an unsatisfactory experience – beauty is sufficiently personal, tactile, and intimate that a bad interaction could sour a consumer for a long time. 

So until the data itself becomes broader, deeper, more diverse, and better defined, every advance risks carrying forward the same limitations.