Question about 'robots.txt' file

(Rob D) #1

Who can tell me how to set up RapidBot so that it allows access only to the main index page (HOME page) and disallows access to everything else in the site? The site in question has a lot of folders and pages, so it would be impractical to disallow every single one of them. Surely, there should be a way to lump them up together, somehow?

If not with RapidBot, how would I write a robots.txt file that does the same?

(Doug Bennett) #2

Don’t know a thing about RapidBot.

Might try this:

User-agent: *
# "$" marks the end of the URL, so this line matches only the bare root "/"
Allow: /$
Allow: /index.php
Disallow: /

Assuming index.php is your home page.
Haven’t tried it, so make sure you test it:
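For an offline check before uploading, here is a minimal Python sketch of the matching rule that Google documents and RFC 9309 standardizes: the longest matching path wins, Allow wins a tie, `*` matches any characters and a trailing `$` anchors the end of the URL. It is not Google's own parser. It assumes a single `User-agent: *` group, and its rule set includes `Allow: /$` so the bare domain root counts as the home page. The host example.com and the helper names (`parse_rules`, `pattern_to_regex`, `is_allowed`) are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical rule set: the home-page-only rules, plus "Allow: /$" for the bare root.
ROBOTS_TXT = """\
User-agent: *
Allow: /$
Allow: /index.php
Disallow: /
"""


def parse_rules(text):
    """Return (is_allow, pattern) pairs; assumes one group that applies to every agent."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() in ("allow", "disallow") and value:
            rules.append((field.lower() == "allow", value))
    return rules


def pattern_to_regex(pattern):
    """'*' matches any run of characters; a trailing '$' anchors the end of the URL."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, url):
    """Longest matching rule wins, Allow wins a tie, and no match means allowed."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    best = None  # (length of the rule as written, is_allow)
    for is_allow, pattern in rules:
        if pattern_to_regex(pattern).match(target):  # match() anchors at the start
            candidate = (len(pattern), is_allow)
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    for url in ("https://example.com/",
                "https://example.com/index.php",
                "https://example.com/about/team.html"):
        print(url, is_allowed(rules, url))
    # https://example.com/ True
    # https://example.com/index.php True
    # https://example.com/about/team.html False
```

Because precedence in this model comes from rule length rather than position, reordering the lines gives the same answers.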

(Rob D) #3

Thanks, Doug,
Won’t the Disallow: / line of code override the Allow: /index.php line?

(Doug Bennett) #4

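No, it should not, the way I understand it. Older parsers process the rules top down and use the first match, so an explicit Allow placed first should pass. Google, following RFC 9309, uses the most specific (longest) matching rule instead, so the longer Allow path wins either way.
I have used similar robots.txt files with subdirectories (never tried it in the home directory) like this, and it works: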

User-agent: *
Allow: /folder/subfolder/file.html
Disallow: /folder/subfolder/
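Whether line order matters depends on the parser. Python's standard `urllib.robotparser`, at least in the versions I know, applies the first rule that matches in file order, which is the top-down behaviour described above. This sketch runs the subfolder example both ways round; example.com is a placeholder host and `can_fetch_offline` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# The subfolder example in the order given above (Allow first).
ALLOW_FIRST = """\
User-agent: *
Allow: /folder/subfolder/file.html
Disallow: /folder/subfolder/
"""

# The same two rules with the order swapped.
DISALLOW_FIRST = """\
User-agent: *
Disallow: /folder/subfolder/
Allow: /folder/subfolder/file.html
"""


def can_fetch_offline(robots_txt, url, agent="*"):
    """Parse a robots.txt string without any network access and check one URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/folder/subfolder/file.html"
    print(can_fetch_offline(ALLOW_FIRST, page))     # True: the Allow line matches first
    print(can_fetch_offline(DISALLOW_FIRST, page))  # False: the Disallow line matches first
```

Under Google's longest-match rule the file would be allowed in both orders, so keeping the Allow line first is the safer choice when you do not know which parser a crawler uses.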

As I said, you should test it:

  1. update your robots.txt file
  2. go to the URL above (log on to Google if needed)
  3. select test robots.txt file
  4. you should see your robots.txt file
  5. if not, or if you have updated it, select the Submit button (lower right)
  6. select the Submit button on the pop-up
  7. refresh the page until your changes appear
  8. enter a test URL at the bottom.

(Rob D) #5

Thanks again, Doug. This is very helpful.