Right to Be Forgotten

I received a request to add a robots.txt file to ensure that an individual (who was mentioned in a court case) is not picked up by search engines. Even though this is a public record, he has nothing to do with the reasons for the case, and I do not want to get involved in the extraterritorial application of EU law, so I have agreed to exclude the page from search engines. It will still be on the court’s web site.

My question concerns the syntax necessary to exclude a page.

The URL facing the public is along the lines of:

http://mywebsite.com/documents/courtcase.pdf

But on the server, the structure is:

/html/mywebsite/documents/courtcase.pdf [note absence of “.com”]

In creating the robots.txt file, you are supposed to begin the URL with a “/”, so which is correct?

(a) /mywebsite.com/documents/courtcase.pdf or
(b) /html/mywebsite/documents/courtcase.pdf

The server is a MediaTemple gridserver, if that makes a difference.

A similar question was asked before on the forum but the links are all dead.

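Neither. Paths in robots.txt are relative to the root of the public URL, not to the directory structure on the server, and the robots.txt file itself goes at the root of the site (http://mywebsite.com/robots.txt). It should be: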

User-agent: *
Disallow: /documents/courtcase.pdf

You can leave out the “User-agent” line if your robots.txt file already has a “User-agent: *” section; in that case, just add the “Disallow” line under it.
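If you want to check the rule before uploading it, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The helper name `is_blocked` and the "Googlebot" agent string are just illustrative choices, and real crawlers may interpret edge cases differently from this parser, so treat it as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The rule suggested above: path relative to the public site root.
CORRECT_RULES = """\
User-agent: *
Disallow: /documents/courtcase.pdf
"""

# Option (b) from the question: the server-side filesystem path.
FILESYSTEM_RULES = """\
User-agent: *
Disallow: /html/mywebsite/documents/courtcase.pdf
"""

PUBLIC_URL = "http://mywebsite.com/documents/courtcase.pdf"


def is_blocked(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


blocked_by_correct = is_blocked(CORRECT_RULES, PUBLIC_URL)
blocked_by_filesystem = is_blocked(FILESYSTEM_RULES, PUBLIC_URL)

print("URL-relative rule blocks the PDF:", blocked_by_correct)
print("Filesystem-path rule blocks the PDF:", blocked_by_filesystem)
```

The filesystem-path version never matches, because crawlers only ever see the path portion of the public URL.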


You can get fairly complete robots.txt instructions from Google’s webmasters help section. If you use the Next button at the top or bottom of the instructions page linked below, you’ll find further instructions and a link to Google’s robots.txt tester tool. It works quite well and is easy to use.

https://support.google.com/webmasters/answer/6062596?hl=en&ref_topic=6061961

