Now I’m just doing all the backend stuff. I have lots of old files that are no longer valid, so I’ve removed them from my server, BUT I’d also like to stop Google from indexing those old pages. When I search site:www.stormdesignprint.com I can see a list of pages I’d rather not see, so what code do I add for those pages in my .htaccess file? Is it that easy?
The robots.txt file stops search engines from crawling a page, but it doesn’t guarantee the page stays out of the index: from my knowledge, if any page Google does index links to it, it can eventually get indexed anyway. The only way to guarantee that a search engine does not index a page is to use a noindex meta tag directly on the page. Luckily there are settings inside the Page Inspector in RW7 that will do exactly that.
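For reference, the tag those settings produce is the standard <meta name="robots" content="noindex">. If you can’t edit a page directly, the same instruction can be sent as an HTTP header from .htaccess instead. A minimal sketch, assuming Apache with mod_headers enabled; the filenames are hypothetical:

<IfModule mod_headers.c>
# Ask crawlers not to index these particular pages (hypothetical filenames)
<FilesMatch "^(old-dummy-1|old-dummy-2)\.html$">
Header set X-Robots-Tag "noindex"
</FilesMatch>
</IfModule>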
Thanks Joe, but the pages don’t exist any more in any project, so I can’t use the Page Inspector. They are really old dummy pages that I used while building a site but didn’t want the public to see. I just noticed that they are still on Google from years ago. Don’t worry, I’m sure it’s only me that will be slightly irritated by their presence; nobody else will care.
The most important thing to do is set up 301 redirects from the old pages to relevant new ones. Easy to do in cPanel if that’s what you have, and it should be available in other control panels too. More fiddly in .htaccess but no major deal - see the sketch below. I just did a site:stormdesignprint.com search and there are pages showing 404 Not Found - you definitely do not want these. So long as these old pages permanently redirect, you are fine with Google.
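In .htaccess the simplest form is one line per old page. A minimal sketch, assuming Apache with mod_alias; the old paths and new targets below are made-up examples:

# Permanently redirect each removed page to the most relevant live one
Redirect 301 /old-dummy-page.html https://www.stormdesignprint.com/
Redirect 301 /old-print-services.html https://www.stormdesignprint.com/services/

Each line maps one old path to a new destination; the 301 tells Google the move is permanent, so it transfers the old page’s standing to the new URL.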
Google will relegate these pages after a while and, even though they may still be findable, they will not come anywhere near the first pages of Google results, so in practice they won’t be seen.
You can also request removal: so long as Google recognises there is no such page (i.e. make sure it has gone from your server), it will be listed for removal. You can do this in Google Search Console - https://www.google.com/webmasters/tools/removals
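One optional extra, purely a suggestion: for old pages that have no sensible new home, you can serve an explicit 410 Gone instead of letting them 404, which tells Google the removal is deliberate and permanent. A sketch, again assuming mod_alias and with hypothetical paths:

# Mark pages that will never return as permanently gone
Redirect gone /old-test-page.html
Redirect gone /dummy-layout.html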
A further thing you need to do in .htaccess is decide whether your primary site is “www” or “non-www” and set one to redirect to the other.
Google effectively sees https://stormdesignprint.com and https://www.stormdesignprint.com as 2 different sites. As you have it now, Google sees these 2 sites and treats one as having duplicate content. This will water down your SEO.
Which you choose is up to you - there is no inherent benefit to one or the other. If you have always been known as www, it’s on your stationery, and the www version has better ranking, use this. If the other way around, use the non-www. If it’s all the same, just choose the one you personally prefer.
Once you have done this, add both versions of the site to your Google Search Console account.
Here’s some .htaccess code that I use - choose www or non-www:
(if RewriteOptions inherit is present, add the force code below it)
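The force code itself will look something like this - a sketch assuming Apache with mod_rewrite, using your domain as the example; keep only one of the two blocks:

# Option A: force www (redirect the bare domain to www over https)
RewriteEngine on
RewriteCond %{HTTP_HOST} ^stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ https://www.stormdesignprint.com/$1 [R=301,L]

# Option B: force non-www (redirect www to the bare domain over https)
RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.stormdesignprint\.com$ [NC]
RewriteRule ^(.*)$ https://stormdesignprint.com/$1 [R=301,L]

The [NC] flag makes the host match case-insensitive, and R=301 makes the redirect permanent so Google updates its index.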
Note - it looks like you already have “force https” in your .htaccess, in which case I don’t think you will need the https in the above code. You may wish to check this or talk to your hosting engineers about it, however.
No - the Google tracking ID will be the same for both. Also, in Google Search Console, select your preferred “canonical” site and only add a sitemap for that one.
Manofdogs, so I am already using the following code in my htaccess:
RewriteEngine on
# If the request did not arrive over https...
RewriteCond %{HTTPS} !on
# ...permanently redirect to the same host and path over https
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
Also I took a look at sorting the canonical side of things, but in my Search Console it says:
Use Search Console to specify URLs on one domain as canonical over their counterparts on another domain. For example, example.com rather than www.example.com. Use this only when you have two similar sites that differ only by subdomain. Don’t use this for http/https counterpart sites.
So it’s saying not to follow through with that if you are going from http to https. What do you think of this?