
Improving Commerce Search with AI: From Typos and Synonyms to Natural Language
Hello, I'm Park Hyo-yeon, a PM on Dalpha's commerce team.
Since last year, our team has been trying various approaches to solve the 'search failure' problem that frequently occurs in commerce.
Today, I'd like to briefly introduce the search issues that are common across many commerce platforms, and the technical methods we used to improve them with AI.
1. Search Failure, a Silent Revenue Loss
You've probably had a moment where you searched but the product you wanted didn't show up, so you just closed the app.
Even if it looks like it's working fine on the surface, the losses caused by search are bigger than you'd think.
Customers leave without a word, and that moment rarely gets captured in conversion rates or statistics.
That's why many planners don't even realize there's a search problem,
or, even when they know, they deprioritize it behind average order value or ad revenue.
But recently, more and more teams are using AI
to try various improvements, such as typo handling, keyword expansion, and meaning-based search.
If it feels overwhelming, start by gauging the size of the problem.
We recommend measuring at least one of the metrics below.
| Metric | Definition | Ideal Value |
|---|---|---|
| No Result Rate | The percentage of search terms that return 0 results | 5% or lower |
| Search-to-click conversion rate | The percentage of searches that lead to a product click | 20–30% or higher |
| Search-to-cart conversion rate | The percentage of searches that lead to an add-to-cart | 10% or higher |
| Search-to-purchase conversion rate | The percentage of searches that lead to an actual purchase | 2–5% or higher |
| Search NDCG | A quality metric for the ranking of search results. The higher meaningful results are surfaced, the higher the score | 0.8 or higher (the closer to 1, the better) |
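If you want to see how these numbers could be computed, here is a minimal sketch. The log format and field names (`num_results`, `clicked`, `added_to_cart`, `purchased`) are assumptions made for illustration, not a real logging schema. The rates are counted per logged search, so deduplicate by query first if you want a per-search-term figure. The NDCG function uses the standard log2 discount over graded relevance labels that you would have to supply yourself.

```python
import math

# Hypothetical search-log records, one dict per search event.
# All field names here are assumptions for illustration.
search_log = [
    {"query": "low-calorie ice cream", "num_results": 12,
     "clicked": True, "added_to_cart": True, "purchased": False},
    {"query": "ppaski", "num_results": 0,
     "clicked": False, "added_to_cart": False, "purchased": False},
    {"query": "cushion foundation", "num_results": 30,
     "clicked": True, "added_to_cart": False, "purchased": False},
    {"query": "wedding guest look", "num_results": 0,
     "clicked": False, "added_to_cart": False, "purchased": False},
]


def rate(events, predicate):
    """Share of search events for which the predicate is true."""
    if not events:
        return 0.0
    return sum(1 for e in events if predicate(e)) / len(events)


def ndcg(relevances, k=None):
    """NDCG for one ranked result list.

    relevances[i] is the graded relevance of the result shown at rank i + 1.
    """
    ranked = relevances[:k] if k else relevances

    def dcg(scores):
        # Lower-ranked results are discounted by log2(rank + 1).
        return sum(s / math.log2(i + 2) for i, s in enumerate(scores))

    ideal = dcg(sorted(relevances, reverse=True)[: len(ranked)])
    return dcg(ranked) / ideal if ideal > 0 else 0.0


no_result_rate = rate(search_log, lambda e: e["num_results"] == 0)
click_rate = rate(search_log, lambda e: e["clicked"])
cart_rate = rate(search_log, lambda e: e["added_to_cart"])
purchase_rate = rate(search_log, lambda e: e["purchased"])

print(f"No Result Rate:     {no_result_rate:.0%}")
print(f"Search-to-click:    {click_rate:.0%}")
print(f"Search-to-cart:     {cart_rate:.0%}")
print(f"Search-to-purchase: {purchase_rate:.0%}")
print(f"NDCG of a ranking with relevances [1, 2, 3]: {ndcg([1, 2, 3]):.2f}")
```

Even a rough version of this, run once a week over raw logs, is usually enough to tell whether search deserves a place on the roadmap.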
2. Types of Search Failure
Once you've grasped the size of the search failure problem, the next step is to identify specifically what types of search failures are occurring.
Most commerce platforms use a keyword-matching search system built on the widely used Elasticsearch. In such cases, the representative types of search failure are as follows:
Typos and spacing errors
Cases where results are missed due to spelling errors or run-together compound words
e.g.,
ppaski, adidasslippers, cat moisturizer (mistyped)
Different words, same meaning
Cases where the user's phrasing doesn't connect with the product's registered keywords
e.g.,
sneakers ↔ trainers, shirt ↔ button-up
Natural-language search terms (long-tail queries)
Search terms that don't directly map to product names, like themed keywords or situation-based phrasing
e.g.,
housewarming gift recommendations, linen shirt for summer vacation, low-calorie snacks
Searches for products or brands that don't exist
Cases where no results can be returned because the item isn't actually sold
e.g., discontinued products, brands not carried, nonexistent options, etc.
Compared to typical search terms, search terms like the above have relatively lower search volume, but
they often carry specific purchase intent or context, so their conversion rates tend to be high.
However, because the ways of phrasing them are extremely varied and not standardized,
it's realistically impossible to handle them all with conventional keyword-based search.
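To make the failure modes concrete, here is a deliberately simplified sketch of exact keyword matching. It is a toy stand-in, not how Elasticsearch actually analyzes text, and the `catalog` list and `keyword_search` function are hypothetical names invented for this example.

```python
# A tiny hypothetical catalog of product names.
catalog = ["adidas slippers", "cat moisturizer", "white sneakers", "linen shirt"]


def keyword_search(query, products):
    """Return products whose name contains every whitespace-separated query token."""
    tokens = query.lower().split()
    return [
        p for p in products
        if all(t in p.lower().split() for t in tokens)
    ]


# A well-formed query matches.
print(keyword_search("adidas slippers", catalog))

# Spacing error: the run-together token matches nothing.
print(keyword_search("adidasslippers", catalog))

# Different word, same meaning: "trainers" never appears in a product name.
print(keyword_search("trainers", catalog))

# Natural-language query: the extra words rule out the linen shirt.
print(keyword_search("linen shirt for summer vacation", catalog))
```

Each failing case can be patched individually with a spacing rule, a synonym entry, or a stop-word list, but the number of patches grows with every new way customers phrase a query.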
3. Vector Search That Understands 'Search Intent'
To solve this kind of problem, we use semantic search.
The key is to connect search terms and products not by 'whether the keywords match' but by 'similarity of meaning', and at the center of this is the embedding model.
🧠 What is an embedding model?
It's a model that converts human language or product information into a chunk of numbers called a 'vector'.
Because these vectors carry 'meaning', even if the words are different, expressions with similar meaning can be mapped to nearby positions.
e.g., "baby snacks" and "infant snacks" use different words, but in vector space they end up close to each other.
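"Close to each other" is usually measured with cosine similarity. The sketch below shows the idea with hand-written three-dimensional toy vectors. These numbers are made up for illustration and are not the output of any real embedding model, which would typically produce hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors written by hand, NOT produced by a real model.
toy_embeddings = {
    "baby snacks": [0.90, 0.10, 0.20],
    "infant snacks": [0.85, 0.15, 0.25],
    "espresso capsule": [0.10, 0.90, 0.30],
}


def nearest(query, embeddings):
    """Return the other entry whose vector is most similar to the query's vector."""
    query_vec = embeddings[query]
    others = [name for name in embeddings if name != query]
    return max(others, key=lambda name: cosine_similarity(query_vec, embeddings[name]))


print(nearest("baby snacks", toy_embeddings))
```

In a real system the query and every product are embedded by the same model, and the search step becomes a nearest-neighbor lookup over the product vectors.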
🧩 Which model should you use?
General commerce: the most general-purpose natural-language model
Specialized commerce like fashion or food: domain-specific models that capture characteristics like color, seasonality, and material well
However, in many cases this 'base model' alone isn't enough. To enable more precise, commerce-appropriate search, you need to fine-tune the model to match how your service's customers search and your product catalog.
4. Building AI Search Tailored to Your Service
🔧 What is fine-tuning?
It refers to optimizing the embedding model to fit your service. For this, you need behavioral data that tells the model 'what product a search term intends to find'.
💾 What does training data look like?
Below is an example of training data built from actual search behavior data. It can be built relatively easily from behavioral data logs such as post-search clicks, cart additions, and purchases.
| query | positive product | negative product |
|---|---|---|
| low-calorie ice cream | Lala Sweet Chocolate Ice Milk, Binggrae Dewisanyang Zero | 1 espresso capsule, Usefulmall cutting board set |
| cushion foundation | Laneige Neo Cushion 21N, Innisfree My Cushion SPF50+ PA+++ | tint lip balm, hand cream set |
| wedding guest look | monotone flare dress, linen blended jacket two-piece set | graphic T-shirt, training pants, swimsuit top-and-bottom set |
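As a rough illustration of how rows like these could be derived from logs, here is a minimal sketch. The log schema, the `POSITIVE_ACTIONS` set, and the `build_triples` helper are all hypothetical. Picking negatives at random from the rest of the catalog is an assumption made for this example, not a description of how the table above was built.

```python
import random

# Hypothetical behavioral log: which product a user acted on after a search.
behavior_log = [
    {"query": "low-calorie ice cream", "product": "Lala Sweet Chocolate Ice Milk", "action": "purchase"},
    {"query": "low-calorie ice cream", "product": "Binggrae Dewisanyang Zero", "action": "click"},
    {"query": "cushion foundation", "product": "Laneige Neo Cushion 21N", "action": "add_to_cart"},
    {"query": "cushion foundation", "product": "tint lip balm", "action": "impression"},
]

catalog = [
    "Lala Sweet Chocolate Ice Milk",
    "Binggrae Dewisanyang Zero",
    "1 espresso capsule",
    "Usefulmall cutting board set",
    "Laneige Neo Cushion 21N",
    "tint lip balm",
]

# Assumption: these actions count as a signal that the product matched the query.
POSITIVE_ACTIONS = {"click", "add_to_cart", "purchase"}


def build_triples(log, products, negatives_per_positive=1, seed=0):
    """Turn log events into (query, positive, negative) training triples."""
    rng = random.Random(seed)

    # Group the products users engaged with by query.
    positives = {}
    for event in log:
        if event["action"] in POSITIVE_ACTIONS:
            positives.setdefault(event["query"], set()).add(event["product"])

    triples = []
    for query, pos_set in positives.items():
        # Assumption: anything never engaged with for this query is a negative candidate.
        candidates = [p for p in products if p not in pos_set]
        for pos in sorted(pos_set):
            count = min(negatives_per_positive, len(candidates))
            for neg in rng.sample(candidates, count):
                triples.append((query, pos, neg))
    return triples


for triple in build_triples(behavior_log, catalog):
    print(triple)
```

Random negatives are the easiest starting point. Products that were shown but skipped can also serve as negatives, though they need more care because a skipped product is not always a wrong one.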
By accumulating thousands to hundreds of thousands of these (query, positive, negative) examples and training on them, the model learns to "bring the correct products closer and push the incorrect ones farther away". Over time, it becomes an embedding model tailored to your service.
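One common way to express "closer" and "farther away" as a training objective is a triplet margin loss. The sketch below uses it as an illustrative choice and is not a statement of which loss our model uses. The vectors and the `margin` value are made up, and a real training loop would compute this over batches inside a deep learning framework.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_loss(query_vec, pos_vec, neg_vec, margin=0.2):
    """Zero once the positive beats the negative by at least `margin` in similarity."""
    sim_pos = cosine_similarity(query_vec, pos_vec)
    sim_neg = cosine_similarity(query_vec, neg_vec)
    return max(0.0, margin - sim_pos + sim_neg)


# Toy 2-D vectors, chosen by hand for illustration.
query = [1.0, 0.0]
matching_product = [1.0, 0.0]
unrelated_product = [0.0, 1.0]

# The positive is already much closer than the negative: nothing left to learn.
print(triplet_loss(query, matching_product, unrelated_product))

# Roles swapped: the loss is large, so training would pull the vectors apart.
print(triplet_loss(query, unrelated_product, matching_product))
```

Minimizing this value over many triples is what moves a query toward the products customers actually chose.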
🤖 What this makes possible
Typos:
low-cawlorie ice cream → Lala Sweet Chocolate Ice Milk
Synonyms:
low-sugar ice cream, diet ice cream → Lala Sweet Chocolate Ice Milk
Natural language/long-tail:
clothes to wear to a wedding → monotone flare dress, linen blended jacket two-piece set
Brand recognition:
Laneige cushion → Laneige Neo Cushion 21N
Of course, beyond the methods introduced so far, many more technical modules work together in practice. Examples include Korean typo-correction models, commerce-specific tokenization, automatic synonym expansion, and automatic search-tag augmentation.
Many teams tell us, "But we don't have any usable data."
In that case, you need to start by accumulating small behavioral logs, organizing the parts you can start with, and optimizing them one by one.
Search doesn't improve overnight. But if you gradually build a structure that leverages data and complements it with AI, you can create your own search model that responds flexibly to a variety of search intents, without people having to attach tags one by one.

Hyoyeon Park

