Should I use a file extension or not?

But this question reminded me of it.
When I have a URL on my website, it can be displayed and accessed in any of the following ways:
http://www.somesite.com/subdirectory
http://www.somesite.com/subdirectory/
http://www.somesite.com/subdirectory/index.htm
http://www.somesite.com/subdirectory/index.html
http://www.somesite.com/subdirectory/index.php
http://www.somesite.com/subdirectory/index.asp
http://www.somesite.com/subdirectory/some-relevant-keywords
http://www.somesite.com/subdirectory/some-relevant-keywords.htm
http://www.somesite.com/subdirectory/index.php?page=some-relevant-keywords
http://www.somesite.com/subdirectory/?page=some-relevant-keywords
http://www.somesite.com/subdirectory/?page=some-relevant-keywords&even=more-keywords
etc...
Now, I can understand the merits of adding keywords in the URL. Even the most basic SEO guide will tell you to do just that. But for the sake of sanity, clarity, ease of reading, ease of use, and so on, including web compliance:
Is it preferred to have a file extension or not?
Really, deep down, my logic tells me: yes, it should. The reason is that this goes back to the days when the internet was mostly USENET, FIDONET, FTP and GOPHER.
See, if a URL has no filename, then it is normally treated as a directory. This is where index.htm came about: the server lists the directory only if no index file is found. Soon enough, web programmers started using index.htm to serve the content of that web directory as a page. The main difference was that a markup language was added, and this was parsed in the browser. With this markup language, the Content-Type: text/html header in the response became the indicator of the file type for any file. HTML seems to be the only "filetype" that just doesn't have consistently named extensions, except when the files are saved.
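To illustrate the point about the Content-Type header, here is a minimal Python sketch using only the standard library. The function name `media_type_from_header` and the `responses` table are made up for illustration; the header values are assumed, not captured from a real server. It shows that a client can work out the type from the header alone, whatever the URL looks like.

```python
from email.message import Message
from typing import Optional, Tuple


def media_type_from_header(content_type_value: str) -> Tuple[str, Optional[str]]:
    """Parse a Content-Type header value into (media type, charset)."""
    msg = Message()
    msg["Content-Type"] = content_type_value
    # Both accessors normalise to lowercase; charset is None when absent.
    return msg.get_content_type(), msg.get_content_charset()


# Hypothetical responses: three URL spellings, one and the same header.
responses = {
    "/subdirectory/": "text/html; charset=utf-8",
    "/subdirectory/index.htm": "text/html; charset=utf-8",
    "/subdirectory/some-relevant-keywords": "text/html; charset=utf-8",
}

for path, header in responses.items():
    media_type, charset = media_type_from_header(header)
    print(f"{path} -> {media_type} ({charset})")
```

In this sketch the extension never enters the decision; only the header does.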
Unfortunately, once web pages became the main thing, displaying the directory contents became a security risk, so everything stayed hidden, with only the content at the actual URL being displayed.
Not to mention the cross-platform file-naming wars: DOS-era Windows systems limited extensions to three characters, while Unix/Mac allow more. So should it be .HTM or .HTML or NONE and let the platform decide?
So in essence, I guess what I am trying to figure out goes beyond SEO and deals more with aesthetics and web compliance.

ANSWER:-

Use a .extension where there is more than one representation, or where the client software is absolutely stupid and refuses to accept the Content-Type alone (QuickTime, RealPlayer, Outlook, etc., I am looking at you):
  • http://www.somesite.com/subdirectory - this can be your auto-negotiation version that uses a canonical LINK tag (rel="canonical") to point to the actual representation
  • http://www.somesite.com/subdirectory/ - it is always worth supporting trailing slashes on any URL, but use a canonical LINK tag (not a redirect, as this is an unnecessary slowdown) to point to the correct URL
  • http://www.somesite.com/subdirectory/index.htm and http://www.somesite.com/subdirectory/some-relevant-keywords.htm - the three-character extension limit doesn't apply to HTTP (only to the underlying file system/OS), so the client can save this as index.html or a.a if they want to, whilst still being able to access it
  • http://www.somesite.com/subdirectory/index.html - if you serve a .atom, a .xml or similar version then it makes sense to also honour the .html version (and canonically link to it via LINK tags on the auto-negotiated version) - use HTTP Content-Location headers to point to the auto-negotiation version though - remember you can also go multi-lingual (.en, .es, etc...) or multi-charset (.utf8, .utf16, etc...)
  • http://www.somesite.com/subdirectory/index.php and http://www.somesite.com/subdirectory/index.asp - unless you are serving the source code, these make no sense to support
  • http://www.somesite.com/subdirectory/some-relevant-keywords - SEO is a constantly changing art and if this works for you then great
  • http://www.somesite.com/subdirectory/index.php?page=some-relevant-keywords, http://www.somesite.com/subdirectory/?page=some-relevant-keywords and http://www.somesite.com/subdirectory/?page=some-relevant-keywords&even=more-keywords - if there are an infinite number of ways to manipulate the content then this is great - but usually pages deserve their own URL, not a query string, and these types of URLs are to be avoided (try getting someone computer illiterate to type one of those in)
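The "extension where there is more than one representation" idea can be sketched roughly as follows in Python. Everything here is an assumption for illustration: the `REPRESENTATIONS` table, the function names, and the rule that the concrete URL is the extensionless path plus the extension (the site root is not handled). The Accept parsing is deliberately simplified, and how the concrete URL is then advertised (canonical LINK tag, headers) is left out.

```python
import posixpath

# Hypothetical table of representations this site can serve.
REPRESENTATIONS = {
    ".html": "text/html",
    ".atom": "application/atom+xml",
    ".xml": "application/xml",
}
DEFAULT_EXT = ".html"


def parse_accept(accept):
    """Return media types from an Accept header, highest q-value first."""
    ranked = []
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((q, fields[0].lower()))
    # sorted() is stable, so equal q-values keep their header order.
    ranked = sorted(ranked, key=lambda item: -item[0])
    return [media for q, media in ranked if q > 0]


def choose_representation(path, accept="*/*"):
    """Return (content_type, concrete_url, negotiated) for a request path.

    An explicit, known extension wins outright; otherwise the Accept
    header decides, falling back to HTML. A trailing slash is ignored.
    """
    base = path.rstrip("/")
    _, ext = posixpath.splitext(base)
    if ext in REPRESENTATIONS:
        return REPRESENTATIONS[ext], base, False
    for media in parse_accept(accept):
        for candidate_ext, content_type in REPRESENTATIONS.items():
            family = content_type.split("/")[0] + "/*"
            if media in (content_type, family, "*/*"):
                return content_type, base + candidate_ext, True
    return REPRESENTATIONS[DEFAULT_EXT], base + DEFAULT_EXT, True


print(choose_representation("/subdirectory/index.html"))
print(choose_representation("/subdirectory/some-relevant-keywords",
                            "application/atom+xml"))
```

With this arrangement the extensionless URL stays the one people type and share, while the extension is only needed to pin down a specific representation.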
