I received an AdGrant from Google, but my site (www.wikibeaks.org) keeps getting disallowed. One of the complaints from the Google console was that the crawler couldn’t find a robots.txt file, so it can’t index the site - apparently that’s the default now, so the crawler doesn’t expose private links. I thought that checking the “(robots) index this page” box would take care of this, but it seems it doesn’t. So I created a simple robots.txt file. I’m not sure what I need to put in it … I want the crawler to only see the published pages, and it seems like too much hassle to explicitly allow each of them. I need to get this working so I can get re-indexed and re-approved.
What directories/files should I exclude? Here’s what I have now:
You should allow everything, and then specify the pages you don’t want crawled/indexed. It looks like you’re blocking a lot of important content, and that’s likely what’s causing the issues.
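For example, something along these lines - the directory names here are just placeholders, so substitute whatever your publishing tool actually generates:

    User-agent: *
    # Anything not listed below is crawlable by default
    Disallow: /drafts/
    Disallow: /admin/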
If you REALLY don’t want something seen on the internet, you shouldn’t publish it.
I didn’t realize how much - page drafts, for example - would be indexed and searchable. I only want to block the minimum. If I don’t need to block the RW general files, I’ll just block the readme, which is just a note reminding me not to delete the Google authentication file.
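If it really comes down to just the readme, the whole file can be this small (assuming the readme sits at the site root and is named readme.txt - adjust to the real filename):

    User-agent: *
    Disallow: /readme.txt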
** Just got off the phone with Google AdWords. They had me modify my robots.txt file as below, and it’s now submitted for review. Apparently it needs the specific Google permissions … an asterisk isn’t good enough.
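Roughly this shape (the paths are placeholders; the important part is the explicit AdsBot-Google group, since Google’s documentation says AdsBot-Google ignores the wildcard User-agent: * group unless it’s addressed by name):

    # Google's ads crawler has to be named explicitly
    User-agent: AdsBot-Google
    Allow: /

    # Everyone else: crawl the whole site except the readme
    User-agent: *
    Disallow: /readme.txt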