@Turtle, I intentionally put this out there as a general tip for users rather than a specific answer to the podcast question, mostly because finding the best possible solution for @bpequine involves a further conversation about her needs: her specific question, her current skill set, the stacks/plugins she already owns, and her budget.
However, since you brought it up…
-
Page Safe comes with 3 stacks:
- Page Safe, protects the entire page
- Stack Safe, controls whether a stack is displayed based on whether a Page Safe stack has been unlocked
- Logout, allows any button to be used as a logout button for Page Safe
- With both of the “Safe” stacks, content is only ever pulled from the server if/when a password has been successfully entered. That means crawlers would be unable to access the information regardless of whether they adhere to the “nofollow”/robots exclusion standards.
-
Browsers do cache information; however, I think it highly unlikely that this information makes its way onto the web, since the cache is intended for local use on a per-device basis. We would have to look up the terms and conditions of the browser to know for certain. In any case, you can prevent browser caching.
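If your host runs Apache (most shared hosting does), a minimal sketch of what that looks like in an .htaccess file for the protected page's directory. Treat this as an illustration, not Page Safe's own mechanism:

```apache
# .htaccess: ask browsers not to cache anything served from this directory
<IfModule mod_headers.c>
  Header set Cache-Control "no-store, no-cache, must-revalidate, max-age=0"
  Header set Pragma "no-cache"
  Header set Expires "0"
</IfModule>
```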
-
You are right that you may not want your entire site blocked from indexing. SEO Helper can be used on a page-by-page basis, or in a partial to be uniform across pages, so the choice is yours.
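Under the hood, per-page de-indexing normally comes down to a robots meta tag in the page's head. I'm assuming SEO Helper emits the standard tag, which looks like this:

```html
<!-- Ask compliant crawlers not to index this page or follow its links -->
<meta name="robots" content="noindex, nofollow">
```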
-
Not all robots may abide by the 4.0.1+ standards, but I’m certain that Google and the other major search engines do. From the sound of it, her primary concern isn’t hackers looking to steal information, but rather Google simply doing what it does best: indexing.
-
Blocking robots from seeing the resources folder should definitely be an option from within RW. I’m actually surprised this hasn’t come up before. I know that @joeworkman’s Total CMS protects the cms-content folder from robots, so if you store the file using Total CMS you should be good to go without any technical setup. @nikf, is this something that would be easy enough to add in a small update?
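In the meantime, anyone comfortable editing files by hand can do this with a two-line robots.txt at the site root. A minimal sketch, assuming the default /resources/ path (adjust it to match your site):

```
# robots.txt: ask compliant crawlers to skip the resources folder
User-agent: *
Disallow: /resources/
```

Keep in mind robots.txt is advisory only; Google and other well-behaved crawlers honour it, but it isn't access control.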
-
Good point about the search engine having already indexed it. I know that Google does have resources available for you to petition to have content removed, but they don’t make it all that easy.
-
The password protection of the file is a clever solution.
-
RapidBot 2 is a great product and was my first intro to robots.txt files.
-
I think the most you could do to protect your resources is to store them in a secure location off the server, such as Amazon S3. Then you could use a file delivery system such as Rapid Cart Pro or Cartloom to deliver a download link to users who have access to a page protected by Page Safe. This would send users an email with a unique link to download the file, preventing anyone from knowing the true location of the file. It would also alert you any time someone downloads that file.
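For the curious, the unique-link trick those delivery systems rely on generally boils down to a time-limited pre-signed URL. A minimal sketch in Python with boto3 (the bucket and key names here are placeholders, and Rapid Cart Pro/Cartloom handle all of this for you):

```python
import boto3

# S3 client; credentials come from your AWS config or environment
s3 = boto3.client("s3")

# Generate a pre-signed URL for a private object. It expires after an hour,
# so the file's true location is never exposed and old links go stale.
url = s3.generate_presigned_url(
    ClientMethod="get_object",
    Params={"Bucket": "my-private-bucket", "Key": "downloads/members-list.pdf"},
    ExpiresIn=3600,
)
print(url)  # email this link to the user who unlocked the page
```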
-
Yup, you can’t stop someone from republishing the list if they have it, but that isn’t the scenario she is experiencing. She is concerned about search engines, i.e. Google.
-
I agree that a PDF may not actually be the best solution, even if I can see some reasons why it would be a better medium. In most cases, though, I think something like @joeworkman’s [Power Grid Stacks + Page Safe/Stack Safe + an optional use of Easy CMS/Total CMS for online editing of the table] would be a better solution.
-
Great thoughts and I’m sure that some or all of this will be useful to someone. Hopefully @bpequine!
Cheers!!
Brandon