Google Sitemap Issue - Bot can't Recrawl...?!

Hi, here is the code that the gentleman above gave me to use. I won’t use it if you don’t think I should… I just thought it might help tidy things up, but I’m 100% guessing. My file/folder arrangement is attached.

Redirect 301 /portfolio.htm https://deborahgriceartist.uk/painting/mywork.html
Redirect 301 /paintings.htm https://deborahgriceartist.uk/paintings/oncanvas.html
Redirect 301 /miniatures.htm https://deborahgriceartist.uk/miniaturepaintings/onpanel.html
Redirect 301 /drawings.htm https://deborahgriceartist.uk/drawingsonpaper/mixedmedia.html
Redirect 301 /prints.htm https://deborahgriceartist.uk/limitededition/intaglio.html
Redirect 301 /cv.htm https://deborahgriceartist.uk/about/cv.html
Redirect 301 /biography.htm https://deborahgriceartist.uk/aboutme/history.html
Redirect 301 /contact.htm https://deborahgriceartist.uk/submit/hello.php



<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule .* https://deborahgriceartist.uk/index.html [L]
</IfModule>

<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://deborahgriceartist.uk/$1 [R=301,L]
</IfModule>

@teefers Please can I ask you what is going on here? I’ve not had this warning before… it’s from Google Search Console. I have an Instagram connect plugin, and it looks like it’s creating issues. Not sure about the other problem.

I think (and someone with more knowledge may give you a better answer) that the top picture, about Instagram, shows Google following one of your links to the Instagram servers. At that point Instagram’s robots.txt file tells Google that it cannot crawl the content any further.

The second picture refers to a font on your website that is loaded over http instead of https; Google is telling you that all referenced content should be served over https. How you find where that font reference is will be the part I cannot help you with.


Had a big issue with doing something similar

Okay,

The redirects (301s) you added should be good, assuming that you have tested them and they work okay. It’s always best practice to redirect a specific old page to its equivalent new page. So that is a good thing.

Now, I gave you code above that works great for removing the www and redirecting to https, along with the proper ErrorDocument 404 directive. Why did you remove that and replace it with code that doesn’t remove the www and drops the error document?

This is the “hodgepodge” I was talking about.

Let’s have a look:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule .* https://deborahgriceartist.uk/index.html [L]
</IfModule>

Do you know what this code does? It takes any request that isn’t an existing file (!-f) or directory (!-d) and rewrites it to the home page without an error status code. This is a bad thing for multiple reasons. It creates a poor user experience, it risks an SEO penalty for duplicate content, and you stop sending the proper HTTP status codes for things like “Unauthorized”, “Forbidden” and other errors, which matters if the host has any intrusion detection running.

It’s always best from the user experience side to either redirect someone to an equivalent page (like you are doing with the 301s above) or tell them that there is a problem with the URL (the custom 404 page).

So I’d get rid of that part for sure.
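As a rough sketch of the alternative (not a drop-in for this site), a custom error page can be assigned so that missing URLs still return a real 404 status. The path `/error/404.html` is an assumption here; it should point at wherever the error page actually lives.

```apache
# Sketch only: serve a custom page for missing URLs while still returning HTTP 404.
# /error/404.html is an assumed location for the error page.
# Use a local path, not a full https:// URL: with a full URL Apache sends a
# redirect instead, and the visitor never receives the 404 status code.
ErrorDocument 404 /error/404.html
```

Unlike the catch-all rewrite to index.html, this keeps the address the visitor typed and tells browsers and search engines that the page does not exist.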


Now the next directive at the end:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteCond %{SERVER_PORT} 80
RewriteRule ^(.*)$ https://deborahgriceartist.uk/$1 [R=301,L]
</IfModule> 

This works; it’s an old (early Apache days) way of redirecting by port. Port 80 is the standard port for http, and the rule redirects it to https. The problem is that it doesn’t address removing or adding the www. The code I gave above does the same thing but, in a single directive, also takes care of removing the www. If you don’t either force www or remove it, search engines can treat the www and non-www versions as duplicate content and penalize your rankings.

Okay, I’m giving these details for other folks as well. So here is the htaccess file I’d start with for your site. That’s assuming that you have tested the 301 redirects.

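# Disable directory browsing (security)
Options -Indexes
# Force HTTPS and remove www (lines starting with # are comments)
RewriteEngine On
# Act if the request is not HTTPS, or if the host name starts with www.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
# Capture the host name without any leading www. as %1
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# Redirect to https://<host without www><original path>
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]
# 404 error page assignment
ErrorDocument 404 /Error/404.html
# Set page redirects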
Redirect 301 /portfolio.htm https://deborahgriceartist.uk/painting/mywork.html
Redirect 301 /paintings.htm https://deborahgriceartist.uk/paintings/oncanvas.html
Redirect 301 /miniatures.htm https://deborahgriceartist.uk/miniaturepaintings/onpanel.html
Redirect 301 /drawings.htm https://deborahgriceartist.uk/drawingsonpaper/mixedmedia.html
Redirect 301 /prints.htm https://deborahgriceartist.uk/limitededition/intaglio.html
Redirect 301 /cv.htm https://deborahgriceartist.uk/about/cv.html
Redirect 301 /biography.htm https://deborahgriceartist.uk/aboutme/history.html
Redirect 301 /contact.htm https://deborahgriceartist.uk/submit/hello.php

Thanks for the lesson. Very useful.

My first bit of code was done as at the time I wanted people to get a page of some kind even if they misspelled or got sent to areas of the site that did not exist. I guess it could create a poor user experience. Not really what I was thinking about at the time.

Cheers!


Hi, I’m starting to get it now, thank you for your reply, it is very much appreciated. Today I renamed all my jpgs, titles (apart from drawings page) and alt tagged them all. I have one page left to do then I thought I could do the 301 redirect, with the site as finished as it can be.

But, I’ve come across something that’s really made my head spin: it was when I was checking the 301 redirects on https://www.redirect-checker.org/index.php

These pages worked beautifully: https://deborahgriceartist.uk/painting/mywork.html https://deborahgriceartist.uk/about/cv.html

However the others don’t work. When I went and looked in the address bar at the page URLs they are Miniatures | DEBORAH GRICE
Miniatures | DEBORAH GRICE
Drawings | DEBORAH GRICE
Where is the ‘painting’ element of the URL coming from? My folders in RW look like this:


Also, one thing that I think I’ve guessed is the folder/file name for ‘/drawings.htm’ in the 301 redirect. What folder should it connect to?

Redirect 301 /drawings.htm https://deborahgriceartist.uk/drawingsonpaper/mixedmedia.html

:roll_eyes: SO sorry to be a total nuisance. :roll_eyes:

Thank you for your patience with me. Just thinking aloud, would it be easier to have one total 301 redirect rather than individual pages? Not sure what the code would be (no surprise there…)

PS. I’ve found where the ‘painting’ element has come from: I found it in the IONOS folder. I’m not sure what to do next to make the 301 redirects work… sorry.

I’ve never heard of this tool. I took a quick look at it and don’t know what it is supposed to check.

I mentioned a couple of times that you need to be very careful with 301s. So it could be a cache issue, or it could be the tool, or something else.

I’m assuming that you put the redirects live on the site? I’d recommend removing them right away if they are still set to 301’s. You can dig yourself a really bad hole with bad 301’s.

Change all the 301’s to 302’s until you are positive beyond any doubt that they are right.
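To illustrate the 302-first approach, here is a minimal sketch using one of the redirect pairs quoted earlier in the thread. Whether that target is still the right destination is an assumption; substitute whichever old and new URLs you are actually testing.

```apache
# Testing phase: a temporary (302) redirect, so browsers and search engines
# do not store the result permanently if the target turns out to be wrong.
Redirect 302 /portfolio.htm https://deborahgriceartist.uk/painting/mywork.html

# Only once every old URL is confirmed to land on the right page,
# switch the status to permanent:
# Redirect 301 /portfolio.htm https://deborahgriceartist.uk/painting/mywork.html
```

Browsers tend to cache 301 responses for a long time, which is why a mistake made with a 301 is much harder to undo than one made with a 302.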

I don’t know all the URLs of your site or where you want them redirected to.

I start with a list of every old full URL, every way they can be entered (HTTPS with and without www, HTTP with and without www).

I use the made with love htaccess tester to test redirects before I even think about putting them live.

You just copy and paste the htaccess file into the main area on their page, then one at a time I paste a URL into the top line and hit the test button. It will show the new URL along with what rule got taken. I then copy the new URL into a browser tab and make sure it works.

I do that with 302s, then I retest every URL live, then change them to 301s and retest every URL again.

Hi Doug,
I haven’t added the 301 redirect code to .htaccess as I knew it wasn’t working. The code in there at present is as follows:

Options -Indexes

RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]

ErrorDocument 404 /error/404.html

I think the best way I could be helped here is if you could tell me, please, the code to redirect:

https://deborahgriceartist.uk/painting/OIls.html to Paintings | DEBORAH GRICE and
CV | Deborah Grice to CV | Deborah Grice

I could then see your ‘workings’ and do the rest myself. I think I’ll have to go into my hard drive to find all the old URLs, but this would be a superb starting point for me. None of the 301 code works in the madewithlove site, so I’m doing something very wrong…

Hi @Pipspaws.

I’m going to show you how I do the redirects. It’s different from the single-line Redirect statement. I use mod_rewrite for almost everything.

So for the two redirects that you asked about here is the code (make sure you leave the RewriteEngine On before these) :

RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/painting/paintings/oncanvas.html? [NC,R=302,L]
RewriteRule ^about/biography\.html$ https://deborahgriceartist.uk/about/cv.html? [NC,R=302,L]

Now I have them set for testing with a return code of 302 ([NC,R=302,L]). Once your testing is complete, change that to [NC,R=301,L] and retest. I noticed that in the first one the old URL has mixed case (painting/OIls), so I set the comparison to be case-insensitive (the NC flag). I also noticed that you aren’t using tidy links for the new addresses?
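For anyone reading along, the parts of a rule like the two above can be annotated as follows. This is only a sketch: example.com and the folder and file names are placeholders, not paths from this site.

```apache
RewriteEngine On

# Pattern: in .htaccess the path is matched WITHOUT its leading slash.
#          ^ and $ anchor the whole path; \. matches a literal dot.
# Target:  the full new URL; the trailing ? drops any query string
#          that was on the old URL.
# Flags:   NC    = match the pattern case-insensitively
#          R=302 = send an external temporary redirect (301 once tested)
#          L     = stop processing further rewrite rules for this request
RewriteRule ^old-folder/Old-Page\.html$ https://example.com/new-folder/new-page.html? [NC,R=302,L]
```

A common reason for a tester reporting that a rule was not met is a pattern that does not match the requested path exactly, for example a leading slash in the pattern or a different folder name in the URL being tested.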

Now, how I test redirects: I put together an Excel spreadsheet with the full URLs of each old address, every way they could have been entered.
So the first old address could be entered as:

https://deborahgriceartist.uk/painting/OIls.html
http://deborahgriceartist.uk/painting/OIls.html
https://www.deborahgriceartist.uk/painting/OIls.html
http://www.deborahgriceartist.uk/painting/OIls.html

So I would test each one of these with the 302 set, and then change it to a 301 and retest.


Hi, Thanks, Doug for the above. I’ll have a play and see what I can do.

I do have the tidy link box ticked though, but it doesn’t work for me…? Here’s a screenshot…

When you use tidy links you should leave the file name set to index.html. You only change the folder name:

So each page is in a folder of its own, and so you don’t need the index.html as part of the URL.

With tidy links you would use a URL like this:
https://deborahgriceartist.uk/painting/paintings/
and that would actually open the page:
https://deborahgriceartist.uk/painting/paintings/index.html

The shorter URL is easier for users to read and can help with search engines.
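On the server side, what makes a folder URL open its index.html is Apache’s directory index setting. Most hosts already have this configured, so the line below is a sketch of the mechanism rather than something that necessarily needs adding; treat it as an assumption about a typical Apache setup.

```apache
# When a request names a folder (e.g. /painting/paintings/),
# serve the index.html file inside that folder.
DirectoryIndex index.html
```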

Right thanks, I’ll sort this out now… :grinning:

Hi again, Thank you I’ve done the above and it’s worked :muscle:t3: I’ve also been and tidied the folders in IONOS. I’ve tried to update your code with the new tidy website urls and I can’t get it to work in madewithlove. What have I done wrong? Does it work for you please…

RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/portfolio/paintings/html? 
 [NC,R=302,]L]

I get this warning:

Okay,

Did you or didn’t you do tidy links?

If you did tidy links then something isn’t working right. I think you need to do a republish all files.

When I go to https://deborahgriceartist.uk/painting/paintings/ I get this:

So if you think you got all the folder names squared away, then do a Re-Publish All Files:

As for the htaccess, the RewriteRule has to be a single line, the [NC,R=302,L] needs to be on the same line as the rest of the rule. You also need to end the new URL at the last / (after the folder name). You left the html part of the new file name.

So here is that same one but for a tidy link URL:

RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/painting/paintings/? [NC,R=302,L]
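The single-line requirement can be sketched like this, using placeholder names (example.com, old-page.html and new-folder are hypothetical). The broken form is commented out so the fragment stays valid.

```apache
RewriteEngine On

# Broken: the flags sit on their own line. Apache reads that second line as a
# separate, unknown directive, which typically results in a 500 error.
# RewriteRule ^old-page\.html$ https://example.com/new-folder/?
#  [NC,R=302,L]

# Correct: pattern, target and flags all on ONE line, and the target
# ends at the folder's trailing slash for a tidy link.
RewriteRule ^old-page\.html$ https://example.com/new-folder/? [NC,R=302,L]
```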

Hi, the new tidy URL is actually: https://deborahgriceartist.uk/portfolio/paintings/ and it works, but even after modifying your code above (thank you) to use it, I can’t get the madewithlove site to work, which leads me to think I’m not using it properly. From what I have learnt from you, I would have thought the code needs to be (all on one line):

RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/portfolio/painting/? [NC,R=302,L]

I’ve tried it in the actual .htaccess file and I get an error 500 when it goes live. I’m sorry I must be really testing your patience…

In the post where you included the error screenshot (this rule was not met), the rewrite rule ended in portfolio/paintings/html?

If you look at teefers’ original, the html? bit should be the full html filename. So if you are using tidy links then it would become portfolio/paintings/index.html?

The line of code from above, I tested with made with love before I posted it here.

Again:

RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/painting/paintings/? [NC,R=302,L]

If you are getting a 500 error then it’s more than likely non-plain text in the htaccess file. Make sure that you use a real PLAIN text editor, not a word processor or even TextEdit.

https://deborahgriceartist.uk/painting/OIls.html Gets redirected to https://deborahgriceartist.uk/painting/paintings/

I just retested and it works fine.


@Pipspaws,

I’m still getting errors on https://deborahgriceartist.uk/painting/paintings/

Did you republish all files?

Hi Doug,
Thanks for your response. Yes, I’ve republished all sites. There was an error in the address that I gave you: Paintings | DEBORAH GRICE is correct. My fault, sorry.

I have downloaded CotEditor to help with plain text, as I was using TextEdit, as you predicted.

I’ve just put the correct code into madewithlove and it says ‘this rule was not met’. What am I doing wrong? It must be because I’m not sure how to use it.

I just put this code into .htaccess and this link on google sitemap:
RewriteRule ^painting/OIls\.html$ https://deborahgriceartist.uk/portfolio/paintings/? [NC,R=302,L]

When I clicked this sitelink on google:

it goes to Paintings | DEBORAH GRICE with the 404 error page. Not to the new page…

I have, as you advised, opened an Excel sheet and typed up my old URLs with their new destinations next to them… ready for when I get the 301 redirect bit to work.