I received an Ad Grant from Google, but my site (www.wikibeaks.org) keeps getting disallowed. One of the complaints in the Google console was that the crawler couldn’t find a robots.txt file, so it won’t index the site; apparently that’s the default now, so that the crawler doesn’t expose private links. I thought that checking the “(robots) index this page” box would take care of this, but it seems it doesn’t. So I created a simple robots.txt file, but I’m not sure what to put in it. I want the crawler to see only the published pages, and explicitly allowing each of them seems like too much hassle. I need to get this working so the site can be re-indexed and re-approved.
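(As I understand it, that checkbox just adds a per-page robots meta tag to the page’s <head>, something like the line below — I’m guessing at the exact output, since it depends on the CMS. It only tells crawlers how to treat a page they’ve already fetched, so it doesn’t stand in for a site-level robots.txt.)

    <meta name="robots" content="index, follow">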
What directories/files should I exclude? Here’s what I have now:
    # robots.txt file created for http://www.wikibeaks.com/
    # May 8, 2018

    # Exclude Files From All Robots:

    # End robots.txt file
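Would something along these lines be right? This is just a sketch: the directory names (/admin/, /drafts/, /private/) are placeholders for wherever my unpublished pages actually live, and the Sitemap line assumes a sitemap exists at that URL. The idea is to allow everything by default and disallow only the private areas:

    User-agent: *
    Disallow: /admin/
    Disallow: /drafts/
    Disallow: /private/

    Sitemap: http://www.wikibeaks.com/sitemap.xml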