Multimodal Search: Smart Search That Thinks Like a Human

We recommend this for
- Anyone who wants to learn about 'multimodal search,' the trend in e-commerce search services
- Anyone who wants to offer customers a new search experience
- Anyone who wants to try out Dalpha's text-search and image-search AI

Estimated reading time: about 3–5 minutes

If you're busy, at least read this!

- The latest search trend, recently adopted by Naver as well, is 'multimodal search.' Much as a person draws on several senses at once, it learns from combined information such as text, images, and voice to return smarter results.

- Dalpha's text-search AI finds the products best suited to the keywords a customer searches for. Unlike conventional search, it leverages product images, so 1) you can find the product you want from images alone even if the keyword isn't in the product name, and 2) you can search using just the feel or occasion of the product you want.

- Dalpha's image-search AI finds similar products from an image alone. A customer who has a picture of the clothing they want to buy can use it to find similar items.

- Multimodal search is a smart search feature that reads the customer's mind, making their shopping experience more convenient.

How Naver Recently Strengthened Its Search Service

Recently, many companies have been winning over customers by offering a variety of search services. They're moving beyond conventional plain-text search to give consumers a more convenient search experience and drive purchases. A prime example is the 'multimodal document search' service Naver recently launched.

Example of Naver's multimodal search

Naver's 'multimodal document search' lets customers find a product from an image alone, even if they don't know its name. For example, if you search with a photo of sneakers using Naver's 'Smart Lens,' it identifies the sneakers' product name, and you can go on to purchase them from there. Naver is also applying this multimodal technology to shopping search, with a focus on improving the search experience within Naver Shopping.

So What Exactly Is Multimodal?

In a word, multimodal technology is technology that 'thinks intelligently, like a human.' Just as people combine several senses (sight, hearing, smell, and more) to reach a conclusion, a multimodal model learns from text, images, voice, and other kinds of information together to deliver the results users want. This 'multimodal' technology was recently introduced in OpenAI's GPT-4 and is one of the technologies drawing the most attention.

Customers can get tired of hunting for what they want among countless products. Multimodal search is an effective way to ease that burden: instead of repeating text searches over and over, customers can find the products they want through image-based search.

You Can Bring Multimodal Search to Your Platform Too

You see the benefits of multimodal search, but are you still unsure how to adopt it? Even without developing the technology yourself, there's a way to bring multimodal search to your platform. Try Dalpha's multimodal search AI.


Dalpha's Text-Search & Image-Search AI That Reads Customer Intent

What Kinds of Multimodal Search Does Dalpha Offer?

Naver, a portal search service, focused on using multimodal search to find the exact product. Dalpha's multimodal search, by contrast, is built specifically for e-commerce: it 'reads customer intent to suggest similar products.'

It can find the exact name of a searched product, but it can also offer a kind of 'similar-product suggestion,' identifying the intent behind the customer's text or image search and presenting all the most fitting products.

Curious how Dalpha's multimodal search AI grasps customer intent? Let us introduce 1) the text-search AI and 2) the image-search AI.

1) Text-Search AI

Dalpha's text-search AI finds 'the products best suited to the keywords a customer searches for.'

How is this kind of search different from conventional text search? It leverages multimodal information—namely, new data in the form of product images—in the search process.

Conventional text search delivered results based on the product names and categories a client registered, or on specific manually entered keywords, so the available data was limited. But with Dalpha's text search, the target of text search expands beyond data like product names and categories to include 'product images,' making it easier to deliver results that match the customer's search intent.

  • Search for specific product keywords: Because conventional text search uses only limited information, it tends to work poorly when the keywords are very specific. With Dalpha's text search, when you search for 'white square-neck puff blouse,' the AI analyzes the product images themselves and returns matching results even if the product name doesn't contain those keywords.

  • Search using just the feel or occasion of the product you want: Customers can also search more loosely, by the feel or occasion they have in mind. When you search 'a dress for spring' or 'festival look,' it finds products that fit. Because the results come from analyzing images, it can find matching clothing even if the product name doesn't include those words.
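The article doesn't describe how Dalpha's text search works internally. One common way to build this kind of search is to place text queries and product images in a shared embedding space and rank products by similarity, and the toy sketch below illustrates that idea only. The product names, the four hand-made vector dimensions, and the `search_by_text` helper are all hypothetical placeholders; in a real system the vectors would come from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical image embeddings. For readability, each dimension is a
# hand-labelled visual trait: (white, square neck, puff sleeve, floral).
# Note that none of the product names mention these traits.
PRODUCT_IMAGE_VECTORS = {
    "Daily blouse A": [0.9, 0.8, 0.9, 0.0],
    "Summer top B": [0.1, 0.0, 0.2, 0.9],
    "Basic shirt C": [0.8, 0.1, 0.0, 0.0],
}


def search_by_text(query_vector, catalog, top_k=2):
    """Rank products by how close their image vector is to the query vector."""
    ranked = sorted(
        catalog,
        key=lambda name: cosine(query_vector, catalog[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed output of a text encoder for 'white square-neck puff blouse'.
query = [1.0, 1.0, 1.0, 0.0]
results = search_by_text(query, PRODUCT_IMAGE_VECTORS)
print(results)
```

Here "Daily blouse A" ranks first even though its name contains none of the searched words, because the match is made against what the image shows rather than against the product name.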

Product search results from Eosa Market, a Chinese clothing wholesale platform and an actual Dalpha client (example 1)
Product search results from Eosa Market, a Chinese clothing wholesale platform and an actual Dalpha client (example 2)

2) Image-Search AI

Image-search AI takes an image a customer searches with and suggests the products in your own catalog that are most similar to it. It's similar to the feature recently adopted by various e-commerce players like Naver and Musinsa.

Because it analyzes 'a product's features' and suggests all similar products even if they aren't exactly identical, customers can find similar items just by having an image of the clothing they want to buy.

For example, if you search an image of 'cargo pants,' it provides results from the most similar cargo-pants product group.
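As a rough illustration of 'similar, not necessarily identical,' the sketch below scores every catalog item against a query image and keeps everything above a similarity threshold. It is a simplified assumption about how such a feature could work, not a description of Dalpha's implementation: the vectors, the product names, the `find_similar` helper, and the `min_score` cutoff are all made up for the example.

```python
import math
from typing import NamedTuple


class Match(NamedTuple):
    product: str
    score: float


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_similar(query_image_vector, catalog, min_score=0.8):
    """Return every product whose image is similar enough, best match first."""
    matches = [
        Match(name, cosine_similarity(query_image_vector, vector))
        for name, vector in catalog.items()
    ]
    kept = [m for m in matches if m.score >= min_score]
    return sorted(kept, key=lambda m: m.score, reverse=True)


# Hypothetical image embeddings; dimensions stand for
# (cargo pockets, wide leg, denim, skirt silhouette).
CATALOG = {
    "Khaki cargo pants": [0.9, 0.7, 0.0, 0.0],
    "Black cargo joggers": [0.8, 0.3, 0.0, 0.0],
    "Slim jeans": [0.0, 0.1, 0.9, 0.0],
    "Pleated skirt": [0.0, 0.0, 0.0, 1.0],
}

# Assumed embedding of the customer's cargo-pants photo.
customer_photo = [1.0, 0.6, 0.0, 0.0]
similar = find_similar(customer_photo, CATALOG)
for match in similar:
    print(match.product, round(match.score, 3))
```

Both cargo items come back, while the jeans and the skirt fall below the threshold. Neither result is an exact copy of the photo, which is the point: the whole product group that shares its features is returned.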

Product search results from Eosa Market, a Chinese clothing wholesale platform and an actual Dalpha client (example 3)

Dalpha's image-search AI also comes with two extra features: a 'color palette' that lets you browse the search results by color, and a 'secondary search' that lets you narrow the results with additional keywords.
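To make these two features concrete, here is a minimal sketch that treats them as post-processing steps on a list of search results. The result records, their `color` and `tags` fields, and both helper functions are hypothetical and exist only for this example. The article does not say how Dalpha stores results or matches the additional keywords.

```python
from collections import defaultdict

# Hypothetical results of an image search for cargo pants.
RESULTS = [
    {"name": "Cargo pants A", "color": "khaki", "tags": {"wide", "cotton"}},
    {"name": "Cargo pants B", "color": "black", "tags": {"jogger", "nylon"}},
    {"name": "Cargo pants C", "color": "khaki", "tags": {"jogger", "cotton"}},
]


def color_palette(results):
    """Group result names by color so they can be browsed per color."""
    palette = defaultdict(list)
    for item in results:
        palette[item["color"]].append(item["name"])
    return dict(palette)


def secondary_search(results, keyword):
    """Narrow an existing result list to items tagged with an extra keyword."""
    return [item for item in results if keyword.lower() in item["tags"]]


palette = color_palette(RESULTS)
narrowed = secondary_search(RESULTS, "Jogger")
print(palette)
print([item["name"] for item in narrowed])
```

Matching the extra keyword against fixed tags is a simplification. A multimodal system could just as well compare the keyword against the product images, in the same way as the text search described earlier.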


Try Multimodal Search That Reads Your Customers' Minds Right Now

Multimodal search makes the customer's shopping experience even more convenient. By combining all available information—images, text, and more—it reads the customer's mind.

You can let customers easily find and buy trending products, or items they saw on social media and wanted, on your commerce platform. Or even when they're not sure exactly what they want to buy, they can find products simply by describing the situation in which they're needed, such as 'summer festival' or 'sports day.'

If you'd like to adopt this kind of smart search, try the text-search and image-search demos in Dalpha's AI Store right now.

Try the text-search AI demo

Try the image-search AI demo

If you're curious about detailed usage or want to build an AI other than the ones we introduced today, please reach out to Dalpha.

Jinsu Choi
