Some pages on a website are not meant to be shown to the end user, such as a login page for an intranet or a content management system (CMS). These pages can often be found listed in a text file called robots.txt, which sits at the root of the web server on most websites.
Website owners use the /robots.txt file to stop robots, or automated crawlers, from reaching restricted pages that they don't want users to see.
For example, a Google search will never show a link to a website's CMS, i.e., Google's bots will not crawl the pages listed in the robots.txt file.
What the robots.txt file looks like:
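For instance, a site's robots.txt might contain entries like these (the exact paths will vary from site to site):

```
User-agent: *
Disallow: /panel/cgi-bin
Disallow: /admin/
```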
User-agent: * means that the rules which follow apply to every robot, not just a particular one.
Disallow: /page-name/ means that bots are not allowed to crawl /page-name/.
You can learn more about robots.txt HERE.
How a hacker can use robots.txt:
A Disallow: /page-name/ line in the file does not mean that a user cannot browse that page. So, once we know the names and locations of those pages from robots.txt, we can easily visit them, and they may contain sensitive information about the web server and the technologies it uses.
So, if the robots.txt file of http://someWebsite.com contains a line like Disallow: /panel/cgi-bin and we visit
http://someWebsite.com/panel/cgi-bin
in our browser, we may see a login interface to the CMS of the website.
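This lookup is easy to automate. Here is a minimal Python sketch that downloads a site's robots.txt and prints out the disallowed paths (the site URL is just a placeholder; only probe sites you have permission to test):

```python
import urllib.request

def parse_disallowed(robots_txt):
    """Extract the path from every Disallow: line in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.strip()
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:  # skip empty "Disallow:" lines, which allow everything
                paths.append(path)
    return paths

def fetch_disallowed(base_url):
    """Download /robots.txt from base_url and return its Disallow paths."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/robots.txt") as resp:
        return parse_disallowed(resp.read().decode("utf-8", errors="replace"))

# Hypothetical usage:
# for path in fetch_disallowed("http://someWebsite.com"):
#     print("http://someWebsite.com" + path)
```

Each printed URL is a candidate page to open in the browser, just like the /panel/cgi-bin example above.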
You may find some really interesting pages while looking through the entries in a robots.txt file.
Share your experiences with me in the comment box.
I hope you enjoyed the article.
Share it with your friends if it was helpful.
Thanks for reading.
Happy Hacking !!