How to hide pages using htaccess


(Gabrielle Vickery) #1

Hi all :-).

I have now FINALLY loaded my own brand new website here: http://www.stormdesignprint.com.

Now I’m just doing all the backend stuff. I have lots of old files that are no longer valid so I’ve removed them from my server BUT I would also like to stop Google from indexing these old pages. So when I go here: site:www.stormdesignprint.com I can see a list of pages I’d rather not see, so what code do I add for those pages in my htaccess file? Is it that easy?

Many thanks for your time :-).


(Dave Farrants) #2

It’s a robots.txt file you need - http://www.robotstxt.org/robotstxt.html
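For reference, a minimal robots.txt blocking specific old pages looks like this (the paths below are placeholders - substitute the actual old URLs). It goes in the site root, e.g. http://www.stormdesignprint.com/robots.txt:

```
# robots.txt - lives in the site root
# Paths below are examples only
User-agent: *
Disallow: /old-page-1.html
Disallow: /old-dummy-folder/
```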


(Gabrielle Vickery) #3

Hi Dave, I have Sitemap Plus in my website and I think it generates a robots.txt file, but I’m not sure how to actually use it.


(scott williams) #4

robots.txt will stop the crawl of new and existing pages but it won’t necessarily remove them from Google’s index.

You can do it with Google Webmaster Tools though.
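Since the thread is about .htaccess: on Apache with mod_headers enabled, you can also send a noindex header per file, which (unlike robots.txt) does tell Google to drop the page from its index. A sketch, with placeholder filenames - only useful if the old files still exist on the server:

```apache
# Requires mod_headers; filenames are examples only
<Files "old-page.html">
  Header set X-Robots-Tag "noindex, nofollow"
</Files>
```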


(Dave Farrants) #5

Apologies - I should have read the post properly! Full info here: https://support.google.com/webmasters/answer/1663419?hl=en


(Gabrielle Vickery) #6

Ok thanks all, looks like I may as well leave things as they are.


(Joe Workman) #7

The robots.txt file will stop search engines from indexing a page initially. However, to my knowledge, if you are linking to that page from any other page that Google indexes, it will eventually get indexed. The only way to guarantee that a search engine does not index a page is to use meta tags directly on the page. Luckily there are settings inside the page inspector in RW7 that will do exactly that.
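For anyone doing this by hand rather than through RW7’s page inspector, the meta tag Joe describes goes in the page’s head section:

```html
<!-- Placed in the <head> of any page you want kept out of search results -->
<meta name="robots" content="noindex, nofollow">
```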


(Gabrielle Vickery) #8

Thanks Joe, but the pages don’t exist anymore in any project so I can’t use the page inspector. They are really old dummy pages that I used when I was creating a site but didn’t want the public to see them. I just noticed that they are still on Google from years ago. Don’t worry I’m sure it’s only me that will be slightly irritated by their presence, nobody else will care.


(LJ) #9

The most important thing to do is set up 301 redirects from the old pages to relevant new ones. Easy to do in cPanel if that’s what you have, and it should be available in other control panels; more fiddly in .htaccess but no major deal. I just did a site:stormdesignprint.com search and there are pages showing 404 Not Found errors - you definitely do not want these. So long as these old pages permanently redirect, you are fine with Google.
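For the .htaccess route, a per-page 301 can be done with mod_alias one line at a time. A sketch, with placeholder paths - substitute the real old and new URLs:

```apache
# Permanent (301) redirects from removed pages to their nearest new equivalents
# Paths are examples only
Redirect 301 /old-page.html http://www.stormdesignprint.com/new-page.html
Redirect 301 /old-dummy-folder/ http://www.stormdesignprint.com/
```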

Google will relegate these pages after a while and, even though they may be possible to find, they will not come anywhere near the first pages of Google results so in practice won’t be seen.

You can also request removal: so long as Google recognises there is no such page (i.e. make sure it has gone from your server) it will be listed for removal. You can do this in Google Search Console - https://www.google.com/webmasters/tools/removals


(LJ) #10

A further thing you need to do in htaccess is decide whether your priority site is “www” or “not www” and set one to divert to the other.

Google effectively sees https://stormdesignprint.com and https://www.stormdesignprint.com as 2 different sites. As you have it now, Google sees both and treats one as having duplicate content. This will dilute your SEO.

Which you choose is up to you - there is no inherent benefit to one or the other. If you have always been known as www, it’s on your stationery, and the www version has better ranking, use that. If it’s the other way around, use the non-www. If it’s all the same, just choose the one you personally prefer.

Once you have done this, add both versions of the site to your Google Search Console account.

Here’s some htaccess code that I use - choose www or non www:

(if RewriteOptions inherit is present, add the force code below it)

#Force non-www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ http://stormdesignprint.com/$1 [L,R=301]

#Force www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ http://www.stormdesignprint.com/$1 [L,R=301]

Note - it looks like you already have ‘force https’ in your .htaccess, in which case I don’t think you will need https in the above code. However, you may wish to check this or ask your hosting engineers about it.


(Gabrielle Vickery) #11

Manofdogz thanks for the info. So in that case I’ll have 2 sets of Google tracking IDs in my website rather than one, is that right?


(LJ) #12

No - the Google tracking ID will be the same for both. Also, in Google Search Console, select your preferred ‘canonical’ site and only add a sitemap for that one.


(Gabrielle Vickery) #14

Manofdogs, so I am already using the following code in my htaccess:
RewriteEngine on
RewriteCond %{HTTPS} !on
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]

so are you saying that I need to ALSO add in this code then?
#Force www:
RewriteEngine on
RewriteCond %{HTTP_HOST} ^stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ https://www.stormdesignprint.com/$1 [L,R=301]

They won’t conflict with each other will they?


(Gabrielle Vickery) #15

Also I took a look at sorting the canonical side of things but in my search console it says:
Use Search Console to specify URLs on one domain as canonical over their counterparts on another domain. For example, example.com rather than www.example.com. Use this only when you have two similar sites that differ only by subdomain. Don’t use this for http/https counterpart sites.

so it’s saying not to follow through with that if you are going from http to https, what do you think of this?


(LJ) #16

Should be no conflict, but don’t have “RewriteEngine on” twice. Rushing around at the moment - will try to take a proper look tomorrow / weekend.
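For what it’s worth, the two rules can also be collapsed into a single rule that forces https and www together, which avoids a double redirect when a visitor hits the plain http non-www address. A sketch only - test it on your own setup before relying on it:

```apache
RewriteEngine on
# Redirect in one hop if the request is either non-https or missing www
RewriteCond %{HTTPS} !on [OR]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ https://www.stormdesignprint.com/$1 [L,R=301]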


(Gabrielle Vickery) #17

Still getting nowhere with the canonical side of things, it’s confusing. Might leave it.


(system) #18

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.