Jul 1, 2022
What is the process for getting clean URLs (without subdomain and extension) indexed?
I prefer clean URLs, so my internal links omit the .html or .php extension and the www. subdomain. The page on the server might be my-page.html or my-page.php, and .htaccess rules resolve the clean URL so the browser finds the page and shows example.com/my-page. This makes me smile.
For consistency, I entered the clean URLs as canonical links and in the sitemap (as https://example.com/my-page) that I submitted to Google. My hope was to see these clean URLs used everywhere consistently.
Unfortunately, I now see a mess in Search Console. Some indexed pages still have .html. Some still have www. Some pages were not indexed.
I'm attempting to clean up the mess by resubmitting individual URLs, but I have dozens to fix. It seems each link has to be submitted separately. It takes about 30 seconds to submit each one, and I hit my submission cap each day. This will take forever.
There must be a better way to clean up the mess. Any suggestions to make it easier?
Also, did I make the wrong choice with the canonicals and sitemap? If so, what should I have done?
Here's what I have in .htaccess:
Options +FollowSymLinks
RewriteEngine on
# force HTTPS and drop the www. subdomain in a single 301 redirect
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} ^www\. [NC]
RewriteCond %{HTTP_HOST} ^(www\.)?(.+)$
RewriteRule (.*) https://%2%{REQUEST_URI} [R=301,NE,L]
# if x.php is a file, add .php to x
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule !.*\.php$ %{REQUEST_URI}.php [NC,QSA,L]
# if x.html is a file, add .html to x
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule !.*\.html$ %{REQUEST_URI}.html [NC,QSA,L]
# if x/index.html is a file, add index.html to x/
RewriteCond %{REQUEST_FILENAME}index.html -f
RewriteRule !.*index\.html$ %{REQUEST_URI}index.html [NC,QSA,L]
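One thing the rules above do not do is redirect a direct request for my-page.html or my-page.php back to the clean URL, so both versions keep answering with a 200. That may be part of why Search Console still lists the .html variants. Below is a minimal sketch of an extra rule that would close that gap. It is an illustration, not part of the file above, and it assumes the .htaccess file sits in the document root and that the rule is placed right after the HTTPS/www redirect, before the rules that add the extensions.

```apache
# Hypothetical addition, not in the original file.
# THE_REQUEST holds the raw request line (e.g. "GET /my-page.html HTTP/1.1"),
# so this matches only what the client asked for, not the internal rewrites below.
RewriteCond %{THE_REQUEST} \s/+(.+?)\.(?:html|php)[\s?] [NC]
# %1 is the requested path without its extension; the query string is kept by default.
RewriteRule ^ /%1 [R=301,NE,L]
```

With this in place, a request for https://example.com/my-page.html should answer with a 301 to https://example.com/my-page, which the existing rules then map back to the file internally. Trying it with R=302 first avoids browsers caching a wrong redirect. Most browsers resend a redirected POST as a GET, so any form that posts to a .php script should target the clean URL.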
Last edited Jul 2, 2022
All Replies (4)