XML Concept: Valid XML and Validation
Ceponkus, Hoodbhoy - Applied XML - Toolkit for Programmers
Two categories of XML documents exist: those that are well formed and those that are well formed and valid. A valid XML document is one whose structure can be checked against a Document Type Definition (DTD) and found to follow it.
A validating parser performs two levels of checking on an XML document. After checking for
syntactical correctness (that is, whether the document is well formed), the parser then
checks whether the document’s contents are structured according to the rules set out in
the DTD. A nonvalidating parser stops after the first check.
If the parser can verify that an XML document’s content is in accordance with the rules
specified in the DTD, the document is said to contain valid XML. The process of ensuring
that the structure is valid is called (you guessed it) validation.
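To make the two levels of checking concrete, here is a minimal sketch in Python. It rests on some assumptions: the parser in Python's standard library (xml.etree.ElementTree) checks only for well-formedness and does not validate against a DTD, so the second level is imitated with a hand-written rule. The note element, its required children, and the check_document function are hypothetical names chosen for illustration; they stand in for a DTD rule such as <!ELEMENT note (to, from, body)>.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for the DTD rule
# <!ELEMENT note (to, from, body)>. A real validating parser would
# read this rule from the DTD instead of from Python code.
REQUIRED_CHILDREN = ["to", "from", "body"]


def check_document(xml_text):
    """Classify xml_text as not well formed, well formed, or well formed and valid."""
    # Level 1: syntactical correctness (well-formedness).
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well formed"

    # Level 2: structure checked against the (imitated) DTD rule.
    child_tags = [child.tag for child in root]
    if root.tag == "note" and child_tags == REQUIRED_CHILDREN:
        return "well formed and valid"
    return "well formed"
```

A document with a mismatched tag fails the first check, a note that is missing its body passes the first check only, and a note containing to, from, and body in that order passes both.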
XML Goodie: Unicode Compliance
If you develop a lot of Web pages in languages other than English, you’ll be very happy
to learn that XML is specified to be Unicode compliant. Unicode (sometimes loosely called
double-byte characters) is a standard for text in all languages. Plain ASCII text uses
7 bits to represent each character, and even its 8-bit extensions can represent only
256 (2^8) unique characters. Unicode, on the other hand, was originally designed around
16 bits per character, which allows up to 65,536 (2^16) characters, and it has since been
extended beyond that limit: virtually enough to represent all characters of all
languages known to humanity (excluding Klingon and a few others).
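The arithmetic behind those figures is easy to verify. The following Python sketch is only an illustration; the function names distinct_values and encoded_length are hypothetical. It computes how many characters a fixed-width code of a given size can represent, and how many bytes a piece of text occupies in a given encoding.

```python
def distinct_values(bits):
    """Number of distinct characters a fixed-width code of `bits` bits can represent."""
    return 2 ** bits


def encoded_length(text, encoding):
    """Number of bytes that `text` occupies when stored in `encoding`."""
    return len(text.encode(encoding))
```

For example, distinct_values(8) gives 256 and distinct_values(16) gives 65,536. The letter A occupies one byte in UTF-8 and two bytes in UTF-16, while the euro sign occupies three bytes in UTF-8 and two in UTF-16.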
In terms of writing XML documents, provided you have the right Unicode editor, you can
create your documents in any language with which you are comfortable. XML parsers are
specified to be language independent.
For more information on Unicode in general, check out the Unicode Consortium’s Web
site at www.unicode.org.
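By default, parsers automatically recognize whether a document is in Unicode: every XML
parser is required to read both UTF-8 and UTF-16. If you want to include an explicit
statement, your XML declaration should include the encoding="UTF-8" attribute, as in
<?xml version="1.0" encoding="UTF-8"?>.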
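As a rough illustration of an explicit encoding statement, the Python sketch below builds a small document whose XML declaration names UTF-8, stores it as UTF-8 bytes, and parses it with the standard library's xml.etree.ElementTree. The greeting element and the two helper functions are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# XML declaration with an explicit encoding statement.
DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_document(body):
    """Prepend the declaration and store the document as UTF-8 bytes."""
    return (DECLARATION + body).encode("utf-8")


def parse_document(data):
    """Parse the bytes; the parser decodes them using the declared encoding."""
    return ET.fromstring(data)
```

Parsing build_document("<greeting>Привет</greeting>") returns a greeting element whose text is the original Russian word, with no extra decoding step on your part.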