Google Sitemap Issue - Bot can't Recrawl...?!

Hello,
My website https://deborahgriceartist.uk/ is showing perfectly in other search engines like Bing, but on Google (even in an Incognito window) it is showing a very old (5-month-old) defunct site. I was advised to do a 301 redirect, which I have. I looked today and my current, correct website doesn’t show at all (up to yesterday it did show, as a mix of the old and new sites). I have been in touch with IONOS at length and they are baffled, as they say I have loaded the website correctly. They sent me this email today; it wasn’t written very clearly (please see below):
The code that they sent is already in the .htaccess file. I’m only a novice and am so frustrated. I asked Google for a recrawl 5 days ago. I have attached some screenshots of the report that arrived today on my Console page. If you can tell me how to fix this mess I’d be so grateful. Thank you very much indeed. Debbie


As per admins, the 301 redirect in deborahgriceartist.uk within .htaccess

RewriteEngine on
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^(.*)$ https://%1%{REQUEST_URI} [R=301,QSA,NC,L]

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Kindly check and confirm in your next reply if the website is now correct.


I’m on my iPad right now.

This is what Google has in their cache from December 31. If you try to select any of the other pages, they can’t be displayed because of too many redirects.

Can’t help much more from the iPad.


Thanks for looking. I’ve been fiddling about with the site this evening. If you get a chance, please would you look again tomorrow? I’ve noticed that the webspace directory on IONOS is a right mess; could that have anything to do with the problem?

Thanks, Debbie

I’m not sure if it would have anything to do with it not crawling, but having three “aboutme” folders would confuse what the current live content is. If you have just changed folder names in RW, then it’s possible the other “about” folder is no longer required. Only you will know what you did with folder names to answer that question.

Remember that RW only uploads files and overwrites files of the same name on the web server; it never deletes folders or files.

Is some of the mess the “old pages” that Google had been showing?

Here’s the thing about search engine indexes: if they find something, they index it. They don’t rely on sitemaps as being gospel as to what’s on a site. They’ll use a sitemap strictly as a starting point.

Heck, you don’t even need a sitemap at all to have a site or pages indexed by search engines. Part of what search engine robots do is look for links on each page, catalog those links, and then “crawl” and index those pages.

So if a search engine knows that a site or page exists, or another page still has a link to that page, it will periodically rescan the page, and as long as it still finds it, it will keep it listed and crawl its links again.

So there are a few ways to remove an old URL (page) from search engine listings.

  • Remove the URL so it doesn’t exist. If anything still links to it, the browser gets a 404 Not Found. You’ll lose any and all search engine “juice” or rankings doing it this way. It also creates an awful user experience.
  • Add a robots noindex meta tag. This leaves the content in place but “requests” that search engines not index it.
  • Add a robots.txt file at the root level of the domain with Disallow directives “requesting” that those URLs not be crawled (strictly speaking this blocks crawling rather than indexing).
  • Do permanent 301 redirects to the new URLs. This will help keep at least some of the search engine “juice” that the old URLs had. It also provides the best user experience. (There’s a sketch of these options just after this list.)
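
To make those concrete, here’s a rough .htaccess-style sketch of options 2 to 4 (every path and URL in it is a made-up example, not one of your actual pages):

#  Option 2 sent as an HTTP header; the meta-tag version is
#  <meta name="robots" content="noindex"> in the page's <head>.
#  Needs mod_headers, hence the IfModule wrapper.
<IfModule mod_headers.c>
  <FilesMatch "retired-page\.html$">
    Header set X-Robots-Tag "noindex"
  </FilesMatch>
</IfModule>

#  Option 3 lives in a separate robots.txt file at the domain root, e.g.:
#    User-agent: *
#    Disallow: /old-section/

#  Option 4: a permanent redirect from an old URL to its replacement.
Redirect 301 /old-page.html https://example.uk/new-page/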

None of these methods is instantaneous in removing old URLs from SERPs. It takes time for these changes to propagate through the massive databases search engines keep.

So here’s the thing: I’m still on my iPad, but I don’t see any redirects in the screenshots above. The only htaccess listed simply does the https and the www stuff, but no old-page-to-new-page redirects. So did you do them some other way? Or is that listing not your htaccess file?

Since the Google-cached version of your site gives me messages about pages not being retrievable because of too many redirects, I’m guessing things are broken in however you’ve done the redirects.

I’d also be very concerned that you said you’ve used 301 permanent redirects and they could be broken. The thing about 301s, which give you at least partial credit with search engines, is that they are considered permanent, meaning they have no expiry date. So they can be cached forever.

So, if you want help figuring out what’s going on here, we’ll need more information on how you did the redirects, plus a copy of the htaccess file and/or screenshots of any control-panel redirects you have done.

If you are going to copy code (the htaccess file) into the forum, don’t forget to mark it as preformatted text: </>.


Hi Teefers,

Thanks for your input. I’m so stuck with this, so I really appreciate your time.
I have tidied the webspace directory and have moved everything into a new folder.
Please see screen grab.
I have also attached a grab of the redirect .htaccess file, which shows that I added a redirect on the 8th.

The current .htaccess file is as above; should the redirect code still be in there? Did I do it correctly?

As far as I can tell from my meagre understanding, the website and files in IONOS are all tidy. It’s just getting Google to see that there have been changes that’s my problem. Please can you tell me what to do next? Thank you, Debbie

Hi Pyrobit,

I spent some time yesterday tidying my IONOS website directory so that there are no duplicate file names. It looks like this now. Is this better? Thanks, Debbie

Hi, what’s this file please? Thanks, Debbie

That folder is where RW places a zipped backup file of your project. If you go into it, you may find several files for different versions of your project. If you are happy that you have backups of your project on your computer, then you can delete the files in this folder and turn off “make a backup” in the publish settings of RW.

Hi @teefers @Pyrobrit, this is my updated .htaccess file. Does it look OK please? Thanks, Debbie

Hi Debbie,

If your website still works correctly with no errors and all links are OK, then you have done a good job. I can’t help you with the answers on Google crawling your website; I’ve never checked to see what Google is doing to my websites. I guess it’s about time I did.

The screenshot of the htaccess alone isn’t enough information.

I do see what I think are issues, but to give you a proper answer I’ll need more information.

For example, you have a redirect to a 404 error document, and then the next line down you set the 404 page to the document that you just redirected away from.
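
Guessing at the screenshot, the circular pattern would be something like this (these paths are my guesses, not your actual lines):

#  Redirect away from the 404 document...
Redirect 301 /Error/404.html https://example.uk/
#  ...while also naming that same document as the 404 page.
ErrorDocument 404 /Error/404.html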

You have a general redirect (everything) to https://deborahgriceartist.uk/; I’m not sure what that rule is doing here (I’m assuming this screenshot is from that website).

It would be a lot easier if you could tell me what you want to redirect from where.

I’m going to be a cheeky thing now, and am happy to pay for any help as I don’t expect you to do anything for free. But I’m not sure what to write in the box - please could you write me some code?
All I want is for my current website to show in google and bypass the old website.

The domain hasn’t changed; just some of the pages and folders have, as I learnt a bit more about optimising them. I’m totally self-taught. I do see that the Google search’s ‘failed’ sitemap links do now point to a 404 page. However, it is a generic one, not the one I have designed, and I’m not sure how to make that happen. (I shall attach a screen grab of the 404 error page info I designed in case I’ve done that wrong.)

The bottom line is that if you click on deborahgriceartist.uk, the home page and the links within that site are perfect and all is well. All I want is to get rid of the old Google sitemap links (which I understand will take time) and, whilst I’m waiting, for those ‘links’ to go to a personalised 404 error page or to link (if possible) to my updated site.

Current .htaccess panel also attached.

That’s it really - it sounds simple, but I’m very stuck.

Thanks so much Teefers.

I recently (about a year ago) rebuilt my website in RW. I only had a small number of pages, so I wrote an individual redirect for each page pointing to its new location, as shown below. I’ve added my https rewrites in as well so you can see what I have.

Redirect 301 /about.htm https://www.cotswoldfireworks.co.uk/about/index.html

Redirect 301 /contact.htm https://www.cotswoldfireworks.co.uk/contact/index.html

Redirect 301 /gallery.htm https://www.cotswoldfireworks.co.uk/gallery/index.html

Redirect 301 /news.htm https://www.cotswoldfireworks.co.uk/index.html

Redirect 301 /testimonial.htm https://www.cotswoldfireworks.co.uk/index.html

Redirect 301 /weddings.htm https://www.cotswoldfireworks.co.uk/weddings/index.html

<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule .* https://www.cotswoldfireworks.co.uk/index.html [L]
</IfModule>

<IfModule mod_rewrite.c>
  RewriteEngine on
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://www.cotswoldfireworks.co.uk/$1 [R=301,L]
</IfModule>

Okay,

it looks like you want to remove www and force https.

Here are some simple generic rules for doing that. I have used this all over the internet; you don’t have to change a thing.

#  Force HTTPS and Remove WWW (this line is a comment)
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]

As for the 404 page, looking at the way you set it up, you have it in a folder called Error, so this should work:

ErrorDocument 404 /Error/404.html

I always use all-lowercase ASCII characters for folder names, so if you want to change that folder name to error, just change the case in the ErrorDocument statement to match.

So a complete replacement htaccess file would look like this:

#  Force HTTPS and Remove WWW (this line is a comment)
RewriteEngine On
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [L,NE,R=301]
#  404 Error page assignment 
ErrorDocument 404 /Error/404.html

Hi again,

Thank you so much for the code - Wow!

I have done exactly what you asked, and the 404 error page works perfectly. Do I need other error pages or just the one? I’ve seen them for 401, 403, 500, etc., but I’m not sure why I might need them.

Also, I haven’t implemented Nick’s advice (above) as it seems yours might be all I need - am I correct? I could have a go at using it (it might need checking though :grimacing:) if that is what is required. I took out all the other code in the .htaccess file and just have yours now. Is that what you intended?

Your help is much appreciated.

Thank you very much for your help and the code. I shall have a go at transferring my details into your code and see what happens… it looks like over the years of ignorantly faffing with my website I have tied myself into a knot… I’ll let you know how I get on. Thanks again. Debbie

Hi, I’ve played with the .htaccess code that Nick sent. I have attached a screenshot alongside a grab of the file/folder names in the RW site. I have put the same information in the same place for each line, so if I have got it wrong in this example, it’ll be wrong in them all (but simple to fix). Google did a recrawl on the 11th and still found the old pages (!); I hope that doing this might solve all that. If you agree to me putting this code in place, would I just put it under the code that you gave me earlier? As I mentioned, I may not need this code after having used yours - I just thought I’d like to show some willing… thanks

Hi,

I’ve had a play: does this look OK to you? I’m not sure about the file and folder names being in the right place. A Google recrawl on the 11th still found the old pages; it’s really aggravating. I just want a simple and clean presence on Google.

I’ve looked at your website - what an interesting line of work you are in. I do hope you had a successful New Year. I’m sure when they get this Covid situation under control you will be inundated with business!

Thanks, Debbie

As a rule, I don’t like to comment on htaccess files unless I can see the entire file. The reason is that something “upstream” in the file can affect what’s happening “downstream”, and you can easily get yourself into a loop.

It’s difficult to read screenshots unless it’s only a couple of lines of code.

So if you would like my input (I’m happy to provide it), please feel free to copy and paste the content here in a post. Just make sure you have a couple of blank lines, paste the code in, then select the code and hit the preformatted text button above where you type: </>.

You can if you like; those aren’t common errors.
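
If you do add them, they follow the same pattern as the 404 line. Assuming you keep the pages in the same Error folder (these file names are just examples), it would be something like:

#  Optional extra error pages; create the matching files first.
ErrorDocument 401 /Error/401.html
ErrorDocument 403 /Error/403.html
ErrorDocument 500 /Error/500.html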

Be careful with htaccess files

The htaccess file is a very powerful thing; more accurately, it’s the Apache web server’s local directives file. You can, and are, reconfiguring the web server on the fly.

Most of what I see in folks’ htaccess files is a hodgepodge of stuff that they found here and there.

For example, in some of what you have in your last screenshots, you have two <IfModule mod_rewrite.c> blocks. The only reason to have an <IfModule mod_rewrite.c> block is to let Apache skip a section of directives if that module isn’t loaded. You don’t usually even need one, but if you want them, those two blocks should have been combined into one.
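
Roughly, a combined version of Nick’s two blocks from above might look like this (a sketch, not a tested drop-in):

<IfModule mod_rewrite.c>
  RewriteEngine on

  #  Force HTTPS first
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://www.cotswoldfireworks.co.uk/$1 [R=301,L]

  #  Then send anything that isn't a real file or folder to the home page
  RewriteCond %{REQUEST_FILENAME} !-f
  RewriteCond %{REQUEST_FILENAME} !-d
  RewriteRule .* https://www.cotswoldfireworks.co.uk/index.html [L]
</IfModule>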

Also, be careful playing with 301s. I always test (and retest) redirects with 302s, and when I’m confident that everything works correctly, I change them to 301s and retest.

301s are permanent.
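
For example, with a made-up page name, the test-then-flip pattern looks like this:

#  While testing, use a temporary redirect:
Redirect 302 /old-page.html https://example.uk/new-page/
#  Once you're sure it behaves correctly, change it to permanent:
#  Redirect 301 /old-page.html https://example.uk/new-page/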
