The XHTML Transition: It's Not That Difficult
Now that XHTML 1.0 is the W3C's Recommendation for the latest version of HTML, you should have started preparing your code for it. You're already coding to the HTML 4.01 recommendation and validating your code (if you aren't, you should start *now*), so all you need to know is how to make the transition, and when. But let's start with why:
Why the transition to XHTML?
XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4. The XHTML family document types are XML based, and ultimately are designed to work in conjunction with XML-based user agents. Well-designed HTML documents that distinguish structure and presentation will adapt more easily to new technologies. In addition, some of the most popular elements of the past are deprecated today and on their way to becoming obsolete.
XHTML is a reformulation of the three HTML 4 document types as applications of XML 1.0. It is intended to be used as a language for content that is both XML-conforming and, if some simple guidelines are followed, operates in HTML 4 conforming user agents.
The XHTML family is the next step in the evolution of the Internet. By migrating to XHTML today, content developers can enter the XML world with all of its intended benefits, while still remaining confident in their content's backward and future compatibility.
The Route
There are three steps on your way to perfect XHTML coding. If you haven't been coding to the HTML 4.01 recommendation, that is your first step. Next, you make small adjustments to your coding habits while still validating your code against HTML 4.01. Finally, you make the complete transition by changing the HTML version information in your document type declarations.
Step One: Coding to the HTML 4.01 Recommendations
Adding HTML Version Information
In your leap to HTML 4.01, your first amendment to your code is adding HTML version information at the top of each document. Here you have three document types to choose from: strict, transitional or frameset.
- The HTML 4.01 Strict DTD includes all elements and attributes that have not been deprecated or do not appear in frameset documents. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
- The HTML 4.01 Transitional DTD includes everything in the strict DTD plus deprecated elements and attributes (most of which concern visual presentation). For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
- The HTML 4.01 Frameset DTD includes everything in the transitional DTD plus the tags for frames. For documents that use this DTD, use this document type declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
"http://www.w3.org/TR/html4/frameset.dtd">
Deleting your font elements and your color/alignment attributes
The second adjustment you make is deleting your font elements, not only from your documents, but from your mind as well. The essential change between HTML 3.2 and HTML 4.0, and then 4.01, is separating presentation from content. Therefore most elements dealing with presentation are deprecated, in favor of Cascading Style Sheets (CSS). For the same reason, color and alignment attributes should also be removed.
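As a rough sketch of what this looks like in practice, the font markup below is replaced by a class and a style rule. The class name warning and the color and font values are illustrative assumptions, not part of any recommendation.

```html
<!-- Before: presentation mixed into the markup (deprecated) -->
<p align="center"><font color="#cc0000" face="Arial">Site closed for maintenance</font></p>

<!-- After: the markup carries structure only -->
<p class="warning">Site closed for maintenance</p>

<!-- The presentation moves to a style sheet (in the head, or better, an external file) -->
<style type="text/css">
  p.warning { color: #cc0000; font-family: Arial, sans-serif; text-align: center; }
</style>
```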
Adding the title attribute
Another adjustment is an addition, both to your code and your mind. Use the title attribute generously: in your anchors, your abbreviations, and anywhere an explanation would make your content more accessible.
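A short example, with made-up link text and title values, might look like this:

```html
<!-- title on a link describes the destination -->
<a href="http://www.w3.org/TR/xhtml1/" title="The XHTML 1.0 Recommendation at the W3C">XHTML 1.0</a>

<!-- title on an abbreviation supplies the expansion -->
<abbr title="Cascading Style Sheets">CSS</abbr>
```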
Increasing accessibility for people with physical limitations
I've already mentioned the title attribute, but you can do so much more: the alt attribute, the accesskey attribute, the lang attribute, and the label element. Use them all.
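The fragment below is one way these could be combined. The file name, the form action, the field name and the access key are placeholders chosen for illustration.

```html
<!-- alt: text equivalent for the image -->
<img src="logo.gif" alt="evolt.org logo" width="120" height="60">

<!-- lang: marks a change of language inside the page -->
<p>The French say <span lang="fr">bonjour</span>.</p>

<!-- label and accesskey: tie the caption to its control and give it a keyboard shortcut -->
<form action="/search" method="get">
  <p>
    <label for="query" accesskey="s">Search</label>
    <input type="text" name="query" id="query">
    <input type="submit" value="Go">
  </p>
</form>
```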
Remembering the meta tags
Meta tags specify information about the content of a page. Tell user agents what they are dealing with by declaring the content type and the content language of your HTML page. Examples:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
<META http-equiv="Content-Language" content="en-us">
Replacing your name attributes with ids
The final adjustment is a replacement, still both to code and mind. Wherever you have used the name attribute to identify an element in the past, start using the id attribute. The id attribute uniquely identifies an element in your document, which is useful not only for increased CSS usage, but also for marking destination anchors of links. This also matters for the XHTML transition later on, since the name attribute is deprecated in XHTML for the a, applet, form, frame, iframe, img and map elements. Form controls such as input, select and textarea still need their name attribute, because that is what gets submitted with the form. Caveat: support for id as a link destination is shaky in earlier browsers, so also including a name attribute with the same value on anchors is a good idea for backwards compatibility.
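A destination anchor that carries both attributes could be written like this. The anchor value route is just an example; when id and name appear on the same element, their values must be identical.

```html
<!-- Destination anchor: id for current browsers, name as a fallback for older ones -->
<h2><a id="route" name="route">The Route</a></h2>

<!-- A link pointing at that anchor -->
<p><a href="#route">Jump to The Route</a></p>
```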
Step Two: Adjusting your code to XHTML, but not your DTDs
The second step in your XHTML transition is to add those eccentric XHTML features to your HTML code. XHTML documents must be well-formed. This means that all elements must be nested correctly and must either have closing tags or, in the case of empty elements, end with a space and a slash ( />).
Keeping the tags lowercase
XML is case-sensitive, so all HTML element and attribute names must be written in lowercase in XHTML documents. This also applies to the element names used in the selectors of your cascading style sheets.
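For illustration, here is the same fragment before and after lowercasing. The class name and file name are invented for the example.

```html
<!-- HTML 4.01 habit: case does not matter -->
<P CLASS="intro">Welcome to <A HREF="about.html">our site</A></P>

<!-- XHTML-ready: element and attribute names in lowercase -->
<p class="intro">Welcome to <a href="about.html">our site</a></p>
```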
Closing and correctly nesting all tags
If an element is made up of opening and closing tags, use the closing tag, even where past versions of HTML marked it as optional. It is equally important to nest tags correctly: close a previously opened <em> before closing the paragraph it resides in.
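A small before-and-after sketch of both rules, using made-up content:

```html
<!-- Not well-formed: em is still open when the paragraph closes,
     and the list items are never closed -->
<p>This is <em>important.</p></em>
<ul>
  <li>First item
  <li>Second item
</ul>

<!-- Well-formed: every element is closed, innermost first -->
<p>This is <em>important.</em></p>
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
```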
"space-slashing" empty tags
Space-slashing means adding a space and a slash at the end of all empty tags, that is, tags that don't have closing tags. The slash tells an XML parser that the element ends right there. The XML specification also allows you to write a separate closing tag for an empty element instead, but as I understand it, browser support for that is shaky at best. The space before the slash is there mainly for backwards compatibility: older browsers might choke on your page without it.
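Applied to a few common empty elements, this might look as follows. The attribute values are placeholders.

```html
<!-- Each empty element ends with a space and a slash -->
<br />
<hr />
<img src="logo.gif" alt="evolt.org logo" />
<meta http-equiv="Content-Language" content="en-us" />
<input type="text" name="query" id="query" />
```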
Wrap attribute values in quotes
All attribute values must be quoted, even those which appear to be numeric. Example: border="1".
Adding the lang and xml:lang attributes
Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence. Example:
<html lang="en" xml:lang="en">
Stopping the Attribute Minimization
XML does not support attribute minimization. Attribute-value pairs must be written in full. Attribute names such as nowrap cannot occur in elements without their value being specified: nowrap="nowrap". Caveat: some older HTML user agents are unable to interpret boolean attributes when these appear in their full (non-minimized) form, as required by XML 1.0. Note that this problem doesn't affect user agents compliant with HTML 4. The following attributes are involved: compact, nowrap, ismap, declare, noshade, checked, disabled, readonly, multiple, selected, noresize, defer.
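As an illustration, here are two form controls first in minimized and then in full form. The field names and option text are invented for the example.

```html
<!-- Minimized: fine in HTML, not allowed in XHTML -->
<input type="checkbox" name="subscribe" checked>
<select name="language">
  <option selected>English</option>
</select>

<!-- Written in full, as XML requires -->
<input type="checkbox" name="subscribe" checked="checked" />
<select name="language">
  <option selected="selected">English</option>
</select>
```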
Adding Character Encoding
To specify a character encoding in the document, use both the encoding attribute specification on the xml declaration (e.g., <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g., <meta http-equiv="Content-type" content="text/html; charset=EUC-JP" />). The value of the encoding attribute of the xml declaration takes precedence.
Embedding Style Sheets and Scripts
Use external style sheets and scripts if your style sheet or script uses < or & or ]]> or --. Note that XML parsers are permitted to silently remove the contents of comments. Therefore, the historical practice of "hiding" scripts and style sheets within comments to make the documents backward compatible is unlikely to work as expected in XML-based implementations.
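A minimal sketch of a head section that keeps both outside the document follows. The file names styles.css and scripts.js are placeholders.

```html
<head>
  <title>evolt.org</title>
  <!-- External files keep the problematic characters out of the XML document -->
  <link rel="stylesheet" type="text/css" href="styles.css" />
  <!-- script is not an empty element, so it keeps its closing tag -->
  <script type="text/javascript" src="scripts.js"></script>
</head>
```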
Adjusting to allowed nesting
XHTML has stricter nesting rules than HTML. You have to be more careful as to how you build up your code and which elements you nest within another. Some combinations of nesting elements are forbidden. The elements in question are the following:
- a cannot contain other a elements.
- pre cannot contain the img, object, big, small, sub, or sup elements.
- button cannot contain the input, select, textarea, label, button, form, fieldset, iframe or isindex elements.
- label cannot contain other label elements.
- form cannot contain other form elements.
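Taking the first rule as an example, a forbidden combination and one possible way around it could look like this. The file names and link text are made up.

```html
<!-- Not allowed: a link nested inside another link -->
<p><a href="news.html">Read the <a href="archive.html">archived</a> news</a></p>

<!-- Allowed: two separate links -->
<p><a href="news.html">Read the news</a> or browse the <a href="archive.html">archive</a>.</p>
```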
Adding the XML namespace attribute
The XML namespace attribute is needed in all XHTML documents. It's good practice to start adding it to the root element (<html>) right away. The correct syntax is as follows:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
Step Three: Making that Transition
Adding the XML declaration
An XML declaration is not required, but it is strongly encouraged. It is necessary whenever the character encoding differs from the defaults (UTF-8 or UTF-16).
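Changing the HTML Version Information
Replace the HTML 4.01 document type declaration with its XHTML 1.0 counterpart, again choosing between strict, transitional and frameset:
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">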
Simple example of an XHTML document
Finally, let's put together a basic XHTML document showcasing what has been mentioned before in this article.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>evolt.org</title>
    <meta http-equiv="Content-type" content="text/html; charset=UTF-8" />
  </head>
  <body>
    <p>evolt.org, a community for the web developers, by the web developers.</p>
    <hr />
  </body>
</html>