Effective use of the robots.txt file for websites: recommendations from a Google analyst
Gary Illyes, an analyst at Google, recently highlighted the importance of robots.txt for website owners in a LinkedIn post. He suggests using this file to prevent web crawlers from accessing URLs that trigger actions, such as adding items to a cart or wish list. Illyes specifically recommends blocking URLs with "?add_to_cart" or "?add_to_wishlist" parameters via the robots.txt file.
"Looking at what we're crawling from the sites in the complaints, way too often it's action URLs such as 'add to cart' and 'add to wishlist.' These are useless for crawlers, and you likely don't want them crawled." - Gary Illyes
🚀Illyes also pointed out that while the HTTP POST method can prevent such URLs from being crawled, crawlers can still make POST requests, so using robots.txt remains a good idea. For example, if your website has URLs like "https://example.com/product/scented-candle-v1?add_to_cart" and "https://example.com/product/scented-candle-v1?add_to_wishlist", you should add disallow rules for them in your robots.txt file, as in the sketch below.
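A minimal sketch of what those rules might look like, assuming the example URLs above (the exact patterns are illustrative, not Illyes's verbatim rules; Google's parser documents support for the * wildcard):

```
User-agent: *
# Disallow action URLs that add items to the cart or wish list
Disallow: /*?add_to_cart
Disallow: /*?add_to_wishlist
```

If the parameter can appear after other query parameters (for example "?color=red&add_to_cart"), a broader pattern such as "Disallow: /*add_to_cart" would catch it too, at the cost of also matching any URL that merely contains that string.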
- 📌 Using the robots.txt file reduces server load by keeping web crawlers away from unnecessary URLs.
- 📌 Proper use of robots.txt can significantly improve crawl efficiency, so crawlers spend their time on pages that matter.
- 📌 The robots.txt standard was developed back in the 1990s and is still relevant today; the sketch after this list shows how its rules are matched against URLs.
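To illustrate how such rules match URLs, here is a minimal Python sketch of the prefix-plus-wildcard matching that Google documents for robots.txt. The helper name `rule_matches` is hypothetical, and a real parser also handles Allow rules and longest-match precedence, which this sketch omits:

```python
import re

def rule_matches(rule: str, url_path: str) -> bool:
    """Return True if a robots.txt rule matches a URL path plus query string.

    Plain rules match as path prefixes; '*' matches any character sequence
    and a trailing '$' anchors the rule to the end of the URL, per Google's
    documented robots.txt matching.
    """
    pattern = re.escape(rule).replace(r"\*", ".*")
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"  # turn the escaped '$' back into an end anchor
    return re.match(pattern, url_path) is not None

# The action URLs from the article, reduced to path + query string:
print(rule_matches("/*?add_to_cart", "/product/scented-candle-v1?add_to_cart"))          # True
print(rule_matches("/*?add_to_wishlist", "/product/scented-candle-v1?add_to_wishlist"))  # True
print(rule_matches("/*?add_to_cart", "/product/scented-candle-v1"))                      # False
```

Note that Python's standard-library `urllib.robotparser` implements the original 1990s protocol with plain prefix matching, so wildcard rules like these should be verified with a Google-aware tool such as the Search Console robots.txt report.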
🚀Illyes confirms that Google's crawlers fully respect robots.txt rules, with rare exceptions that are well documented for scenarios involving "user-triggered or contractual fetches". He also emphasizes that adherence to the robots.txt protocol is a core principle of Google's crawling policy.
The article was generated with AI based on the material cited below, then manually edited and reviewed by the author for accuracy and usefulness.
https://www.searchenginejournal.com/google-reminds-websites-to-use-robots-txt-to-block-action-urls/519215/