OpenAI Expands Its Web Presence through GPTBot’s Internet Crawling
Use robots.txt to control whether OpenAI’s crawler can access your website or specific sections of it.
Details about GPTBot, the latest web crawler from OpenAI, have been released.
What’s GPTBot?
GPTBot is OpenAI’s web crawler. It crawls the web to gather information that OpenAI uses to improve its AI systems (e.g., ChatGPT), which in turn generate responses to queries and prompts.
User agent
The user agent token for GPTBot is ‘GPTBot’, and its full user-agent string is: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
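As a rough illustration of how a site might spot this crawler in its own request logs, the sketch below checks a User-Agent header for the GPTBot token. The function name `is_gptbot` is made up for this example, and the check is only a hint: any client can send any User-Agent string, so a match does not prove the request came from OpenAI.

```python
def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token.

    Hypothetical helper for illustration; the match is case-insensitive
    and does not verify that the request really comes from OpenAI.
    """
    return "gptbot" in user_agent.lower()


# The full user-agent string quoted in the article.
ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
      "compatible; GPTBot/1.0; +https://openai.com/gptbot)")

print(is_gptbot(ua))
print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```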
Robots.txt
You can use robots.txt to keep GPTBot away from your whole website or from specific sections of it. To block it from the entire site, add the following entry to your site’s robots.txt:
User-agent: GPTBot
Disallow: /
To give GPTBot access to only certain sections of your site, add the GPTBot token to your site’s robots.txt as follows:
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
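Before deploying rules like the two shown above, it can help to test them locally. The following is a minimal sketch using Python’s standard `urllib.robotparser` module. The helper name `can_fetch`, the constants, and the `example.com` URLs are assumptions made for this example, and Python’s parser is only one interpretation of robots.txt; GPTBot’s own handling of the rules may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rule set 1: block GPTBot from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set 2: allow one directory, block another.
PARTIAL = """\
User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/
"""

# example.com is a placeholder domain, not taken from the article.
print(can_fetch(BLOCK_ALL, "GPTBot", "https://example.com/any/page"))
print(can_fetch(PARTIAL, "GPTBot", "https://example.com/directory-1/a.html"))
print(can_fetch(PARTIAL, "GPTBot", "https://example.com/directory-2/a.html"))
```

Because both rule sets name only GPTBot, crawlers with other user agent tokens are unaffected by them.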
GPTBot Documentation
OpenAI has published documentation for GPTBot.
GPTBot’s IP ranges
OpenAI has published the IP ranges that GPTBot uses. At present only one range is listed, though OpenAI may well expand the list in the future.
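Published IP ranges give a second way to check a request beyond the user agent string. The sketch below uses Python’s standard `ipaddress` module to test whether an address falls inside a list of CIDR ranges. The range shown is a placeholder from the reserved documentation block (192.0.2.0/24), not a real GPTBot range, and `ip_in_ranges` is a hypothetical helper; the actual ranges would need to be copied from OpenAI’s published list.

```python
import ipaddress
from typing import Iterable

# Placeholder only: 192.0.2.0/24 is reserved for documentation examples.
# Replace with the ranges OpenAI actually publishes for GPTBot.
EXAMPLE_RANGES = ["192.0.2.0/24"]


def ip_in_ranges(ip: str, ranges: Iterable[str]) -> bool:
    """Return True if `ip` belongs to any CIDR range in `ranges`."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in ranges)


print(ip_in_ranges("192.0.2.44", EXAMPLE_RANGES))
print(ip_in_ranges("198.51.100.7", EXAMPLE_RANGES))
```

Since the list may change, a check like this would need its ranges refreshed from the published source rather than hard-coded for the long term.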
Why It Matters
If you would rather OpenAI not use your content, you can block GPTBot from crawling your site. The procedure is the same one used to block web crawlers such as Googlebot and Bingbot. The companies behind those crawlers are likewise exploring alternatives to robots.txt for similar purposes.