How to hide pages using htaccess


(Gabrielle Vickery) #1

Hi all :-).

I have now FINALLY loaded my own brand new website here: http://www.stormdesignprint.com.

Now I’m just doing all the backend stuff. I have lots of old files that are no longer valid so I’ve removed them from my server BUT I would also like to stop Google from indexing these old pages. So when I go here: site:www.stormdesignprint.com I can see a list of pages I’d rather not see, so what code do I add for those pages in my htaccess file? Is it that easy?

Many thanks for your time :-).


(Dave Farrants) #2

It’s a robots.txt file you need - http://www.robotstxt.org/robotstxt.html
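For reference, a minimal robots.txt blocking specific old pages looks like this (the paths below are placeholders - substitute the actual old URLs). It goes in the site root, e.g. http://www.stormdesignprint.com/robots.txt:

```
# robots.txt - lives in the site root
# Paths below are examples only
User-agent: *
Disallow: /old-page-1.html
Disallow: /old-dummy-folder/
```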


(Gabrielle Vickery) #3

Hi Dave, I have Sitemap Plus in my website and I think it generates a robots.txt file, but I’m not sure how to actually use it.


(scott williams) #4

robots.txt will stop the crawl of new and existing pages but it won’t necessarily remove them from Google’s index.

You can do it with Google Webmaster Tools though.
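Since the thread is about .htaccess: on Apache with mod_headers enabled, you can also send a noindex header per file, which (unlike robots.txt) does tell Google to drop the page from its index. A sketch, with placeholder filenames - only useful if the old files still exist on the server:

```apache
# Requires mod_headers; filenames are examples only
<Files "old-page.html">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
```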


(Dave Farrants) #5

Apologies - I should have read the post properly! Full info here: https://support.google.com/webmasters/answer/1663419?hl=en


(Gabrielle Vickery) #6

Ok thanks all, looks like I may as well leave things as they are.


(Joe Workman) #7

The robots.txt file will stop search engines from indexing a page initially. However, to my knowledge, if you are linking to that page from any other page that Google indexes, it will eventually get indexed. The only way to guarantee that a search engine does not index a page is to use meta tags directly on the page. Luckily there are settings inside the page inspector in RW7 that will do exactly that.
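For anyone doing this by hand rather than through RW7’s page inspector, the meta tag Joe describes goes in the page’s head section:

```html
<!-- Placed in the <head> of any page you want kept out of search results -->
<meta name="robots" content="noindex, nofollow">
```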


(Gabrielle Vickery) #8

Thanks Joe, but the pages don’t exist anymore in any project so I can’t use the page inspector. They are really old dummy pages that I used when I was creating a site but didn’t want the public to see them. I just noticed that they are still on Google from years ago. Don’t worry I’m sure it’s only me that will be slightly irritated by their presence, nobody else will care.


(LJ) #9

The most important thing to do is set up 301 redirects from the old pages to relevant new ones. Easy to do in cPanel if that’s what you have, and it should be available in other control panels; more fiddly in .htaccess but no major deal. I just did a site:stormdesignprint.com search and there are pages showing 404 Not Found errors - you definitely do not want these. So long as these old pages permanently redirect, you are fine with Google.
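For the .htaccess route, a per-page 301 can be done with mod_alias one line at a time. A sketch, with placeholder paths - substitute the real old and new URLs:

```apache
# Permanent (301) redirects from removed pages to their nearest new equivalents
# Paths are examples only
Redirect 301 /old-page.html http://www.stormdesignprint.com/new-page.html
Redirect 301 /old-dummy-folder/ http://www.stormdesignprint.com/
```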

Google will relegate these pages after a while and, even though they may be possible to find, they will not come anywhere near the first pages of Google results so in practice won’t be seen.

You can also request removal: so long as Google recognises there is no such page (i.e. make sure it has gone from your server) it will be listed for removal. You can do this in Google Search Console - https://www.google.com/webmasters/tools/removals


(LJ) #10

A further thing you need to do in htaccess is decide whether your priority site is “www” or “not www” and set one to divert to the other.

Google effectively sees https://stormdesignprint.com and https://www.stormdesignprint.com as 2 different sites. As you have it now, Google sees both and treats one as having duplicate content. This will dilute your SEO.

Which you choose is up to you - there is no inherent benefit to one or the other. If you have always been known as www, it’s on your stationery, and the www version has better ranking, use that. If it’s the other way around, use the non-www. If it’s all the same, just choose the one you personally prefer.

Once you have done this, add both versions of the site to your Google Search Console account.

Here’s some htaccess code that I use - choose www or non www:

(if RewriteOptions inherit is present, add the force code below it)

#Force non-www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ http://stormdesignprint.com/$1 [L,R=301]

#Force www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ http://www.stormdesignprint.com/$1 [L,R=301]

Note - it looks like you already have ‘force https’ in your .htaccess, in which case I don’t think you will need https in the above code. However, you may wish to check this or ask your hosting engineers about it.


(Gabrielle Vickery) #11

Manofdogz thanks for the info. So in that case I’ll have 2 sets of Google tracking IDs in my website rather than one, is that right?


(LJ) #12

No - the Google tracking ID will be the same for both. Also, in Google Search Console, select your preferred ‘canonical’ site and only add a sitemap for that one.


(Gabrielle Vickery) #14

Manofdogs, so I am already using the following code in my htaccess:
RewriteEngine on
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

so are you saying that I need to ALSO add in this code then?
#Force www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ https://www.stormdesignprint.com/$1 [L,R=301]

They won’t conflict with each other will they?


(Gabrielle Vickery) #15

Also I took a look at sorting the canonical side of things but in my search console it says:
Use Search Console to specify URLs on one domain as canonical over their counterparts on another domain. For example, example.com rather than www.example.com. Use this only when you have two similar sites that differ only by subdomain. Don’t use this for http/https counterpart sites.

so it’s saying not to follow through with that if you are going from http to https, what do you think of this?


(LJ) #16

Should be no conflict, but don’t have “RewriteEngine on” twice. Rushing around at the moment - will try to take a proper look tomorrow / weekend.
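For what it’s worth, the two rules can also be collapsed into a single rule that forces https and www together, which avoids a double redirect when a visitor hits the plain http non-www address. A sketch only - test it on your own setup before relying on it:

```apache
RewriteEngine on
# Redirect in one hop if the request is either non-https or missing www
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.stormdesignprint.com/$1 [L,R=301]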


(Gabrielle Vickery) #17

Still getting nowhere with the canonical side of things, it’s confusing. Might leave it.


(system) #18

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.