OpenAI is Now Crawling the Internet with GPTBot

OpenAI Expands Its Web Presence through GPTBot’s Internet Crawling

Paramveer Singh, August 7, 2023

Use robots.txt to control whether OpenAI’s crawler can access your website or specific sections of it.

Details about GPTBot, the latest web crawler from OpenAI, have been released.

What’s GPTBot?

GPTBot is OpenAI’s web crawler. It browses the internet and collects information that OpenAI uses to improve its AI products (e.g., ChatGPT), which generate responses to queries and prompts.

User Agent

The user agent token for GPTBot is ‘GPTBot’, and its full user-agent string is: ‘Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)’.
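As a rough illustration, a server-side script could look for the token in the User-Agent header of incoming requests. The sketch below assumes a simple case-insensitive match; the function name `is_gptbot` and the constant names are hypothetical, and a User-Agent header can be spoofed, so a match is only a hint.

```python
import re

# The token comes from the article; the pattern and names are illustrative.
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)

# Full user-agent string as quoted in the article.
FULL_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return bool(GPTBOT_TOKEN.search(user_agent or ""))
```

Calling `is_gptbot(FULL_UA)` returns True, while an ordinary browser string without the token returns False.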

Robots.txt

You can use robots.txt to keep GPTBot away from your whole website or from specific sections. To block it from the entire site, add the following to your site’s robots.txt:

User-agent: GPTBot
Disallow: /

To grant GPTBot access to only some sections of your site, add the GPTBot token to your site’s robots.txt as follows:

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
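Before publishing rules like these, it can help to check how a parser reads them. The sketch below uses Python's standard `urllib.robotparser` to evaluate both rule sets from this section. The helper name `can_fetch`, the constant names, and the `example.com` URLs are assumptions made for illustration, and GPTBot's own parser may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Rule sets copied from the article. Each group is kept on consecutive
# lines because Python's parser treats a blank line as the end of a group.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

For example, `can_fetch(PARTIAL, "GPTBot", "https://example.com/directory-2/a.html")` returns False, while the same call with a `/directory-1/` path returns True. Crawlers that the file does not name, such as Googlebot, are unaffected by either rule set.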

GPTBot Documentation

OpenAI has published documentation for GPTBot; the user-agent string points to https://openai.com/gptbot.

GPTBot’s IP ranges

OpenAI has disclosed the IP ranges that GPTBot uses. At present only one range is listed, but OpenAI may expand the list in the future.
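Published ranges allow a stronger check than the User-Agent header alone, because a request's source address is harder to fake. The sketch below uses Python's standard `ipaddress` module. The range in `EXAMPLE_RANGES` is a placeholder from the reserved documentation block 192.0.2.0/24, not a real GPTBot range, and the function name is hypothetical. Substitute the ranges OpenAI actually publishes.

```python
import ipaddress
from typing import Iterable

# Placeholder only: a reserved documentation range, NOT a GPTBot range.
EXAMPLE_RANGES = ["192.0.2.0/24"]


def ip_in_ranges(ip: str, cidr_ranges: Iterable[str]) -> bool:
    """Return True if `ip` falls inside any of the given CIDR ranges."""
    address = ipaddress.ip_address(ip)
    networks = (ipaddress.ip_network(cidr) for cidr in cidr_ranges)
    return any(address in network for network in networks)
```

Combining this with a User-Agent check gives more confidence that a request claiming to be GPTBot really comes from OpenAI.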

Why It Matters

If you do not want OpenAI to use your content, you can stop GPTBot from crawling your site. The procedure is the same one used to block web crawlers such as Googlebot and Bingbot. The companies behind those crawlers are also exploring alternatives to robots.txt for the same purpose.
