What do I need to specify in my robots.txt file?

I received an AdGrant from Google, but my site (www.wikibeaks.org) keeps getting disallowed. One of the complaints from the Google console was that the crawler couldn’t find a robots.txt file, so it can’t index the site — apparently that’s the default now, so the crawler doesn’t expose private links. I thought that checking the “(robots) index this page” box would take care of this, but it seems it doesn’t. So I created a simple robots.txt file, but I’m not sure what I need to put in it … I want the crawler to see only the published pages, and explicitly allowing each of them seems like too much hassle. I need to get this working so I can get re-indexed and re-approved.

What directories/files should I exclude? Here’s what I have now:

robots.txt file created for http://www.wikibeaks.com/

May 8, 2018

Exclude Files From All Robots:

User-agent: *
Disallow: /resources
Disallow: /rw_common
Disallow: /blog_files
Disallow: /markdown
Disallow: /Wikibeaks_readme.txt

End robots.txt file

==================

You should allow everything, and then specify the pages you don’t want to be crawled/indexed. It looks like you’re blocking a lot of important stuff, which could cause issues.
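For example, a robots.txt along these lines allows everything except a single directory (the /drafts path here is just a placeholder for whatever you actually want hidden):

User-agent: *
Disallow: /drafts/

Anything not matched by a Disallow line is crawlable by default, so you don’t need Allow rules for the rest of the site.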

If you REALLY don’t want something seen on the internet, you shouldn’t publish it.

Why not go for the whole shebang and enter this in your robots.txt :smile:

User-agent: *
Disallow: /

I didn’t know how much stuff, such as page drafts, would be indexed and searchable. I only want to block the minimum amount of stuff. If I don’t need to block the RW general files, I’ll just block the readme, which is just a note reminding me not to delete the Google authentication file.
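A robots.txt that blocks only the readme would look like this (everything else stays crawlable by default, since anything not listed in a Disallow line is allowed):

User-agent: *
Disallow: /Wikibeaks_readme.txt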

** Just got off the phone with Google AdWords. They had me modify my robots.txt file as below, and it’s now submitted for review. Apparently it needs the specific Google permissions … an asterisk isn’t good enough.

robots.txt file created for http://www.wikibeaks.com/

May 8, 2018

Exclude Files From All Robots:

User-agent: *
Disallow:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:

End robots.txt file
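If you want to sanity-check a robots.txt before submitting for review, Python’s standard library can parse it and tell you what a given crawler is allowed to fetch. A quick sketch (parsing the text locally rather than fetching it from the live site; the URLs are just examples from this thread):

```python
# Check the robots.txt above with Python's standard-library parser.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow:

User-agent: Googlebot
Disallow:

User-agent: Googlebot-Image
Disallow:
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# An empty Disallow value means "allow everything" for that agent.
print(parser.can_fetch("Googlebot", "http://www.wikibeaks.com/"))      # True
print(parser.can_fetch("*", "http://www.wikibeaks.com/blog_files/x"))  # True
```

You could also point RobotFileParser at the live file with set_url() and read() once it’s published, to confirm the server is actually serving what you uploaded.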
