Feb 5, 2024

Multi-country Magento store setup: robots, sitemaps, canonicals and hreflang


Hi all, I have a new site that I created on Magento with 5 stores:
CA
US
UK
EU
INT (default - no pricing)
(The canonical is on the page; hreflang is in the sitemaps.)

Right now I am getting constant duplicate content issues, and Google is indexing only the /us/ links no matter where in the world you are searching (UK search results show /us/ URLs).

My setup is:

robots.txt points to Sitemap: https://www.psbspeakers.com/media/sitemap/int/sitemap.xml
In that sitemap, each entry is set up like this:

<url>

<loc>https://www.example.com/int/passif-50</loc>
<changefreq>daily</changefreq>
<priority>0.5</priority>
<image:image>
<image:loc>https://www.psbspeakers.com/media/catalog/product/p/s/passif-50-front-3-4-pair-on-stands-left-grill-scaled-e1656525484796.jpg</image:loc>
</image:image>
<lastmod>2024-01-26</lastmod>
<xhtml:link hreflang="en-GB" rel="alternate" href="https://www.psbspeakers.com/uk/passif-50"/>
<xhtml:link hreflang="en-IE" rel="alternate" href="https://www.psbspeakers.com/eu/passif-50"/>
<xhtml:link hreflang="en-CA" rel="alternate" href="https://www.psbspeakers.com/ca/passif-50"/>
<xhtml:link hreflang="en-US" rel="alternate" href="https://www.psbspeakers.com/us/passif-50"/>
<xhtml:link hreflang="x-default" rel="alternate" href="https://www.psbspeakers.com/int/passif-50"/>
</url>

I am IP-redirecting visitors from CA, US, UK and EU to their own store, each of which has its own robots.txt and sitemap.


So my questions are:
Why am I getting duplicate content errors?
Why is Google only indexing /us/ URLs no matter which country the searcher is in?

Last edited Feb 5, 2024
All Replies
Feb 5, 2024
OK, a few points:

1) hreflang does NOT 'avoid' duplicate content issues. Google will typically STILL choose only one version to index. It won't magically index all the variations separately.

2) You need to make sure all the pages have their own sitemap entry, i.e. you will need five <url> blocks, one for each version of the page. Each of those blocks would then in turn have five <xhtml:link> elements.

3) The 'Performance' report in Search Console only reports the 'canonical' URL, NOT the URL shown in search results.
 So even when hreflang is working perfectly, it will report a UK visitor as having visited the /us/ page, even if they DID see the /uk/ URL in their search result.

4) Auto-redirects can make things MUCH worse. I.e. if Google tries to crawl the /uk/ URL and gets blindly redirected to /us/ (because Googlebot is in the US!), then it can't read the /uk/ page. It can't validate that the hreflang is correct, which will almost certainly render hreflang ineffective!


So in short: indexing just /us/ is probably to be expected. hreflang may or may not be working (hard to tell via the console), but likely not if there are auto-redirects.
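
To make point 2 concrete, here is a minimal sketch of generating the five <url> blocks for one product. The store-to-hreflang mapping is copied from the sitemap excerpt in the question; the function names are made up for illustration, and the <urlset> wrapper, images and lastmod are left out for brevity.

```python
from xml.sax.saxutils import escape, quoteattr

# Store path -> hreflang value, copied from the excerpt in the question.
STORES = {
    "uk": "en-GB",
    "eu": "en-IE",
    "ca": "en-CA",
    "us": "en-US",
    "int": "x-default",
}
BASE = "https://www.psbspeakers.com"


def store_url(store, slug, base=BASE):
    """Build the URL of one store's version of a page (hypothetical helper)."""
    return f"{base}/{store}/{slug}"


def url_blocks(slug, stores=STORES):
    """Return one <url> block per store, each listing every alternate."""
    # The same set of alternates (including the page itself) goes in every block.
    links = "".join(
        f"  <xhtml:link rel=\"alternate\" hreflang={quoteattr(lang)} "
        f"href={quoteattr(store_url(store, slug))}/>\n"
        for store, lang in stores.items()
    )
    return [
        f"<url>\n  <loc>{escape(store_url(store, slug))}</loc>\n{links}</url>"
        for store in stores
    ]


blocks = url_blocks("passif-50")
print("\n".join(blocks))
```

Each of the five blocks has a different <loc> but the same five <xhtml:link> lines, which is the shape described in point 2.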

Feb 6, 2024
It doesn't matter whether they are in separate sitemaps or a single combined one. And it doesn't matter if they are submitted in multiple sitemaps.

Just that all 5 URLs (in the case of the "passif-50" example) need to be submitted at least once individually, each with hreflang references.
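
A rough way to check that requirement, assuming you can get the sitemap contents as strings: collect every <loc> and every hreflang href across all the sitemaps, then list the alternates that never appear as a <loc> of their own. The function name and the sample sitemap below are made up for illustration, and this only checks the sitemap files, not what Google actually crawled.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "xhtml": "http://www.w3.org/1999/xhtml",
}


def missing_entries(*sitemaps):
    """Return alternate hrefs that have no <url><loc> entry in any sitemap."""
    locs, hrefs = set(), set()
    for xml_text in sitemaps:
        root = ET.fromstring(xml_text)
        # URLs that are submitted as entries in their own right.
        locs.update(loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS))
        # URLs that are only referenced as hreflang alternates.
        hrefs.update(link.get("href") for link in root.findall("sm:url/xhtml:link", NS))
    return sorted(hrefs - locs)


# Cut-down sample in the same shape as the sitemap in the question:
# only the /int/ page has its own entry.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>https://www.psbspeakers.com/int/passif-50</loc>
    <xhtml:link rel="alternate" hreflang="en-US" href="https://www.psbspeakers.com/us/passif-50"/>
    <xhtml:link rel="alternate" hreflang="x-default" href="https://www.psbspeakers.com/int/passif-50"/>
  </url>
</urlset>"""

print(missing_entries(SAMPLE))
```

Passing several sitemap strings at once treats them as one combined set, which matches the point that separate or combined sitemaps make no difference.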
Feb 6, 2024
If you are seeing the 'user-declared canonical' as /us/, then when it crawled the /ca/ URL it must have followed the redirect.

... i.e. the canonical was extracted from the page AFTER following the redirect. Alas, don't have the 'view crawled page' button to see it.