
5 Web Crawling Use Cases: Companies That Became Worth Hundreds of Billions of Won Through Crawling
Did you know there are many companies worth hundreds of billions of won built on the back of crawling? We've put together the stories of companies that grew enormously by using web crawling smartly.
Summary
Clay AI - LinkedIn crawling
BuiltWith - website URL crawling
OpenAI (ChatGPT) - crawling websites, books, social media, code, and papers
Harvey AI - legal material crawling
CB Insights - crawling company information worldwide
What Is Web Crawling?
Web crawling is the automated collection of data from websites: a program visits web pages and extracts the information it needs. Put simply, it gathers the information you want from the internet without anyone copying it by hand.
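To make the "visit a page and pull out what you need" idea concrete, here is a minimal sketch of the extraction step using only Python's standard library. The page content, the `PageExtractor` class, and the `extract_page` helper are made up for illustration. A real crawler would also download the page, follow the links it finds, and throttle its requests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageExtractor(HTMLParser):
    """Collects the page title and every link found in an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def extract_page(html, base_url):
    """Parse one HTML document and return its title and outgoing links."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    return {"title": parser.title, "links": parser.links}


# Hypothetical page content; a real crawler would first download it,
# for example with urllib.request.urlopen(url).read().decode().
SAMPLE_HTML = """
<html><head><title>Example Shop</title></head>
<body><a href="/products">Products</a> <a href="https://example.com/jobs">Jobs</a></body></html>
"""

page = extract_page(SAMPLE_HTML, "https://example.com/")
print(page["title"])
print(page["links"])
```

The links collected here are what a crawler would queue up to visit next, which is how it moves from one page to the rest of a site.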
Companies gather this kind of information through web crawling:
Product prices and reviews
Company job postings
News articles
Social media posts
Website technology information
Companies That Used Web Crawling Smartly
Web Crawling Use Case 1. Clay AI
About the Company
Founded in 2020, Clay AI builds a service that helps sales teams find and manage potential customers. It automatically suggests a list of people likely to buy your product and even helps you send them messages. Today the company is valued at 700 billion won.

Features
Because Clay helps salespeople find prospects who might buy their product, it mainly crawls LinkedIn data. Let me show you a few examples.
Just enter a company name, and it crawls all of that company's job postings and then summarizes, in a single line, what problem the company is currently trying to solve. It then helps you use that summarized problem to write a cold email.

Enter a company name and it can also pull out the company's mission.

It can also visit a company's pricing page and scrape its pricing policy.
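As a rough illustration of scraping a pricing page, the sketch below strips the HTML tags and then looks for dollar amounts with a regular expression. This is not Clay's actual implementation. The sample markup, the `extract_prices` helper, and the price pattern are assumptions, and real pricing pages vary enough that a production scraper would need per-site handling.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the visible text fragments of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


# Matches amounts such as $49 or $1,200.00 (assumes US-dollar pricing).
PRICE_PATTERN = re.compile(r"\$\d[\d,]*(?:\.\d{2})?")


def extract_prices(html):
    """Return every dollar amount that appears in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return PRICE_PATTERN.findall(text)


# Hypothetical pricing page markup, used instead of a live request.
SAMPLE_PRICING_HTML = """
<div class="plan"><h3>Starter</h3><p>$49 per month</p></div>
<div class="plan"><h3>Pro</h3><p>$1,200.00 per year</p></div>
"""

prices = extract_prices(SAMPLE_PRICING_HTML)
print(prices)
```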

Crawled Data
It scrapes information made public on social media like LinkedIn and Twitter.
It uses data posted on company websites and in job postings.
It also pulls in information published by media outlets like TechCrunch.
Web Crawling Use Case 2. BuiltWith
About the Company
Founded in 2007, BuiltWith identifies and reports which technologies websites around the world use. It now holds data on more than 700 million sites and reportedly generates more than 2 billion won in annual revenue.
Features
When you enter a website URL, it tells you which services that website uses to run.

You can also find out which companies are using a particular SaaS.

Crawled Data
It crawls the HTML code of countless websites to identify which services they use.
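One common way to identify services from HTML is to search the markup for telltale strings, such as script URLs or asset paths. The sketch below shows that idea. It is not BuiltWith's actual method: the `SIGNATURES` table is a tiny illustrative set, and the `detect_technologies` helper and the sample page are hypothetical.

```python
import re

# Illustrative signatures only; a real detector would use thousands of
# patterns and would also inspect HTTP headers, cookies, and scripts.
SIGNATURES = {
    "WordPress": [r"/wp-content/", r"/wp-includes/"],
    "Shopify": [r"cdn\.shopify\.com"],
    "Google Analytics": [r"googletagmanager\.com/gtag/js"],
    "jQuery": [r"jquery[\w.-]*\.js"],
}


def detect_technologies(html):
    """Return the sorted names of technologies whose signature appears in html."""
    found = []
    for name, patterns in SIGNATURES.items():
        if any(re.search(pattern, html, re.IGNORECASE) for pattern in patterns):
            found.append(name)
    return sorted(found)


# Hypothetical page source, used instead of a live request.
SAMPLE_SITE_HTML = """
<html><head>
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
<script src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>
<script src="/assets/jquery-3.7.1.min.js"></script>
</head></html>
"""

technologies = detect_technologies(SAMPLE_SITE_HTML)
print(technologies)
```

Running the same check across many crawled sites and inverting the result gives the reverse lookup: a list of sites that use a given service.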


Web Crawling Use Case 3. OpenAI
About the Company
Did you know that OpenAI, famous for ChatGPT, runs the most crawling bots in the world? Building a language model like GPT requires training the model on real data. OpenAI uses GPTBot, a bot that roams the internet and crawls the data needed for model training.
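Crawlers like GPTBot identify themselves with a user-agent name, and site owners can use a robots.txt file to tell a named bot which paths it may visit. The sketch below uses Python's standard `urllib.robotparser` to check such rules. The robots.txt content, the example.com URLs, and the `can_crawl` helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps GPTBot out of /private/ and
# leaves everything open to other crawlers.
ROBOTS_TXT = """
User-agent: GPTBot
Disallow: /private/

User-agent: *
Allow: /
"""


def can_crawl(robots_txt, user_agent, url):
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


blocked = can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/private/report.html")
allowed = can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/blog/post")
print(blocked, allowed)
```

A well-behaved crawler runs a check like this before every request and skips any URL the site has disallowed.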



Web Crawling Use Case 4. Harvey AI
About the Company
Harvey is a legal AI startup that became a unicorn just two years after its founding. It builds AI that helps lawyers do their work, assisting with tasks like reading and understanding legal documents or reviewing contracts.
How It Uses Crawling
To build an AI model specialized in law, it appears to crawl the countless legal materials available publicly on the internet.


Web Crawling Use Case 5. CB Insights
About the Company
CB Insights builds a service that collects information on companies worldwide and analyzes "which startups will succeed" and "which technologies are taking off." It's especially popular with investors and large enterprises.


How It Uses Crawling
According to materials released in 2014, it appears to have built crawlers that roam news and company pages worldwide, checking tens of thousands of articles every day.

It collects company news in real time from news outlets around the world
It gathers information on newly registered patents from patent offices
It analyzes job postings to see what talent companies are looking for
It collects companies' investment news and amounts
It finds product launch news on company websites and social media
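Once articles have been collected, they still need to be sorted into the kinds of signals listed above. The sketch below tags a headline by keyword and pulls out a funding amount if one is present. It is only an illustration, not CB Insights' actual pipeline: the keyword lists, the `classify_headline` helper, and the sample headlines are all hypothetical.

```python
import re

# Illustrative keyword lists; a real system would likely use trained
# classifiers rather than simple substring matching.
CATEGORY_KEYWORDS = {
    "funding": ["raises", "funding", "investment"],
    "patent": ["patent"],
    "hiring": ["hiring", "recruit"],
    "launch": ["launches", "unveils", "releases"],
}

# Matches amounts such as "$25 million" or "$1.2 billion".
AMOUNT_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?)\s*(million|billion)", re.IGNORECASE)


def classify_headline(headline):
    """Tag a headline with categories and extract a dollar amount if present."""
    lowered = headline.lower()
    categories = [
        name
        for name, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]
    amount = None
    match = AMOUNT_PATTERN.search(headline)
    if match:
        scale = 1_000_000 if match.group(2).lower() == "million" else 1_000_000_000
        amount = float(match.group(1)) * scale
    return {"categories": categories, "amount_usd": amount}


# Hypothetical headlines standing in for crawled articles.
funding_news = classify_headline("Acme Robotics raises $25 million in Series B funding")
launch_news = classify_headline("Acme Robotics launches warehouse robot")
print(funding_news)
print(launch_news)
```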
In Closing…
All the companies we've looked at so far share one common trait: they collected and used 'information anyone can see' in a 'new way.'
Clay AI uses LinkedIn information to help salespeople, BuiltWith uses website code to report company information, and OpenAI built an AI from the text of the internet. Harvey uses public legal information to help lawyers, and CB Insights uses company news to help investors.
What new service could you build by gathering the information you see every day?
Web crawling is easier than you think. With AI, you can make a big impact.


kyungsuk chon

