5 Web Crawling Use Cases: Companies Worth Hundreds of Billions of Won, Built on Crawling

Did you know that several companies now worth hundreds of billions of won were built on web crawling? Here are the stories of five companies that grew enormously by using web crawling smartly.

Summary

  • Clay AI - LinkedIn crawling

  • BuiltWith - crawling website HTML to detect technologies

  • OpenAI (ChatGPT) - crawling websites, books, social media, code, and papers

  • Harvey AI - legal material crawling

  • CB Insights - crawling company information worldwide

What Is Web Crawling?

Web crawling is a technology that automatically collects data from websites. A program visits web pages, follows their links, and pulls out the information it needs. Put simply, it gathers the information you want from the internet without anyone having to copy it by hand.
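To make the idea concrete, here is a minimal sketch of the "pull out the information" step using only Python's standard library. The sample HTML, the `LinkAndTitleParser` class, and the `extract` helper are made up for illustration. A real crawler would also download pages over HTTP and queue the discovered links for later visits.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every link found in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def extract(html, base_url):
    """Return (title, links) for one page of HTML."""
    parser = LinkAndTitleParser(base_url)
    parser.feed(html)
    return parser.title, parser.links


# Hypothetical page content, used instead of a live download.
SAMPLE_HTML = """
<html><head><title>Example Shop</title></head>
<body>
  <a href="/products">Products</a>
  <a href="https://example.com/jobs">Jobs</a>
</body></html>
"""

title, links = extract(SAMPLE_HTML, "https://example.com/")
print(title)
print(links)
```

The links collected here are what a crawler would visit next, which is how it "roams" from page to page.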

Companies gather this kind of information through web crawling:

  • Product prices and reviews

  • Company job postings

  • News articles

  • Social media posts

  • Website technology information

Companies That Used Web Crawling Smartly

Web Crawling Use Case 1. Clay AI


About the Company

Founded in 2020, Clay AI builds a service that helps sales teams find and manage potential customers. It automatically suggests a list of people likely to buy your product, and it even helps you send them messages. It is now valued at around 700 billion won.

Features

Clay helps salespeople find prospects who might buy their product, so it mainly crawls LinkedIn data. Here are a few examples.

Enter a company name, and it crawls all of that company's job postings and summarizes, in a single line, the problem the company is currently trying to solve. It then helps you write a cold email based on that summary.

Enter a company name and it can also pull out the company's mission.

It can also visit a company's pricing page and scrape its pricing policy.
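As a rough illustration of the pricing-page example, the sketch below pulls dollar amounts out of text that has already been scraped from a page. How Clay actually does this is not public. The regular expression, the `extract_prices` helper, and the sample text are all assumptions made for this example.

```python
import re

# Matches amounts such as $49, $149.50, or $1,200 (an illustrative pattern only).
PRICE_PATTERN = re.compile(r"\$\s?(\d[\d,]*(?:\.\d{1,2})?)")


def extract_prices(text):
    """Return every dollar amount found in the text as a float."""
    return [float(match.replace(",", "")) for match in PRICE_PATTERN.findall(text)]


# Hypothetical text taken from a pricing page.
SAMPLE_TEXT = "Starter $49 per month. Pro $149.50 per month. Enterprise from $1,200."

prices = extract_prices(SAMPLE_TEXT)
print(prices)
```

Real pricing pages vary a lot (other currencies, per-seat pricing, "contact us" tiers), so a production tool would need far more than one pattern.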

Crawled Data

  • It scrapes information made public on social media like LinkedIn and Twitter.

  • It uses data posted on company websites and in job postings.

  • It also pulls in information published by media outlets like TechCrunch.

Web Crawling Use Case 2. BuiltWith


About the Company

Founded in 2007, BuiltWith finds and reports which technologies websites around the world use. It now holds information on over 700 million sites and reportedly generates more than 2 billion won in annual revenue.

Features

When you enter a website URL, it tells you which services that website uses to run.

You can also find out which companies are using a particular SaaS.

Crawled Data

It crawls the HTML code of countless websites to identify which services they use.
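The basic idea of spotting technologies in HTML can be sketched as simple pattern matching. BuiltWith's actual detection rules are not public, so the `SIGNATURES` table, the `detect_technologies` helper, and the sample page below are illustrative assumptions. They rely only on well-known markers, such as WordPress serving assets from `/wp-content/`.

```python
import re

# Hypothetical signature table: technology name -> pattern expected in the HTML.
SIGNATURES = {
    "WordPress": re.compile(r"/wp-content/|/wp-includes/"),
    "Shopify": re.compile(r"cdn\.shopify\.com"),
    "Google Analytics": re.compile(r"googletagmanager\.com/gtag/js|google-analytics\.com"),
    "jQuery": re.compile(r"jquery[\w.-]*\.js", re.IGNORECASE),
}


def detect_technologies(html):
    """Return the sorted names of all technologies whose signature appears in the HTML."""
    return sorted(name for name, pattern in SIGNATURES.items() if pattern.search(html))


# Hypothetical page source, used instead of a live download.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="/wp-content/themes/demo/style.css">
<script src="https://www.googletagmanager.com/gtag/js?id=G-XXXX"></script>
<script src="/assets/jquery-3.7.1.min.js"></script>
</head><body></body></html>
"""

detected = detect_technologies(SAMPLE_HTML)
print(detected)
```

Running the same check across many sites and storing the results is what makes the reverse lookup possible: given one technology, list every site where it was detected.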

Web Crawling Use Case 3. OpenAI


About the Company

Did you know that OpenAI, famous for ChatGPT, is one of the most active crawler operators in the world? Building a language model like GPT requires training the model on real data. OpenAI uses GPTBot, a bot that roams the internet and crawls the data needed for model training.
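Site owners can tell GPTBot which pages it may visit through a robots.txt file, using the user-agent name `GPTBot`. The sketch below uses Python's standard `urllib.robotparser` to check such rules. The robots.txt content, the `is_allowed` helper, and the example URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks GPTBot but allows every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


gptbot_allowed = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/article")
other_allowed = is_allowed(ROBOTS_TXT, "SomeOtherBot", "https://example.com/article")
print(gptbot_allowed, other_allowed)
```

A well-behaved crawler runs a check like this before fetching each page and skips anything the site has disallowed.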

Web Crawling Use Case 4. Harvey AI


About the Company

Harvey is a legal AI startup that became a unicorn just two years after its founding. It builds AI that helps lawyers do their work, assisting with tasks like reading and understanding legal documents or reviewing contracts.

How It Uses Crawling

To build an AI model specialized in law, it appears to crawl the large body of legal material that is publicly available on the internet.

Web Crawling Use Case 5. CB Insights


About the Company

CB Insights builds a service that collects information on companies worldwide and analyzes "which startups will succeed" and "which technologies are taking off." It's especially popular with investors and large enterprises.

How It Uses Crawling

According to materials released in 2014, it appears to have built crawlers that roam news and company pages worldwide, checking tens of thousands of articles every day.

  • It collects company news in real time from news outlets around the world

  • It gathers information on newly registered patents from patent offices

  • It analyzes job postings to see what talent companies are looking for

  • It collects companies' investment news and amounts

  • It finds product launch news on company websites and social media

In Closing…


All the companies we've looked at so far share one common trait: they collected and used 'information anyone can see' in a 'new way.'

Clay AI uses LinkedIn information to help salespeople, BuiltWith uses website code to report which technologies companies use, and OpenAI built an AI from the text of the internet. Harvey uses public legal information to help lawyers, and CB Insights uses company news to help investors.

What new service could you build by gathering the information you see every day?

Web crawling is easier than you think. With AI, you can make a big impact.



kyungsuk chon
