The Quick Answer
XML (Extensible Markup Language) is a text format for storing and transporting structured data. It uses opening and closing tags to define elements, and every XML document must follow these rules:
- Every opening tag needs a matching closing tag (or be self-closing)
- Tags are case-sensitive
- Elements must be properly nested
- There must be exactly one root element
- Attribute values must be in quotes
Formatting XML with indentation does not change its meaning for most applications: it makes the document readable for humans, while the data a program extracts stays the same.
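Because a parser enforces the rules above, one rough way to check them is simply to try parsing. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    Hypothetical helper for illustration: it checks well-formedness only,
    not validity against a schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<a><b>ok</b></a>"))    # properly nested
print(is_well_formed("<a><b>oops</a></b>"))  # improperly nested
```

A boolean is enough for a quick gate; the parser's error message is more useful when you need to find the problem.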
XML Syntax Rules
Basic Element Structure
<element>content</element>
An XML element consists of an opening tag, content, and a closing tag. Elements can contain text, other elements, or both.
A Complete XML Example
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title lang="en">Thinking, Fast and Slow</title>
    <author>Daniel Kahneman</author>
    <year>2011</year>
    <price>14.99</price>
  </book>
</bookstore>
This document has one root element (<bookstore>), two child elements (<book>), and each book has four nested elements. The category and lang values are attributes.
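To make the structure concrete, here is a minimal sketch that walks the same bookstore data with Python's standard xml.etree.ElementTree. The XML declaration is left out of the string only to keep the sample short; the constant name BOOKSTORE_XML is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

BOOKSTORE_XML = """<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title lang="en">Thinking, Fast and Slow</title>
    <author>Daniel Kahneman</author>
    <year>2011</year>
    <price>14.99</price>
  </book>
</bookstore>"""

root = ET.fromstring(BOOKSTORE_XML)  # root is the <bookstore> element

for book in root.findall("book"):
    title = book.find("title")
    # get() reads an attribute; text and findtext() read element content.
    print(book.get("category"), title.get("lang"), title.text, book.findtext("price"))
```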
Self-Closing Tags
Elements with no content can use a self-closing syntax:
<image src="photo.jpg" alt="A photo" />
This is equivalent to <image src="photo.jpg" alt="A photo"></image>.
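One way to see the equivalence is to canonicalize both spellings and compare the results. This sketch assumes Python 3.8 or later, where xml.etree.ElementTree.canonicalize is available; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

short_form = '<image src="photo.jpg" alt="A photo" />'
long_form = '<image src="photo.jpg" alt="A photo"></image>'

# canonicalize() rewrites each document into Canonical XML text,
# so two equivalent spellings produce the same string.
same = ET.canonicalize(short_form) == ET.canonicalize(long_form)
print(same)
```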
XML Components
The XML Declaration
<?xml version="1.0" encoding="UTF-8"?>
The declaration is optional but recommended. It tells parsers which XML version and character encoding the document uses. When present, it must be the very first thing in the document, with no whitespace or comments before it.
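The placement rule is easy to confirm with a parser. The following sketch assumes Python's standard xml.etree.ElementTree (backed by expat); the helper parses and the sample documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

DECLARATION = b'<?xml version="1.0" encoding="UTF-8"?>'


def parses(document: bytes) -> bool:
    """Return True if the parser accepts the document (illustrative helper)."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


declaration_first = parses(DECLARATION + b"<root/>")
leading_newline = parses(b"\n" + DECLARATION + b"<root/>")
leading_comment = parses(b"<!-- note -->" + DECLARATION + b"<root/>")

print(declaration_first, leading_newline, leading_comment)
```

Only the first document is accepted; a single newline or a comment ahead of the declaration is enough to make the parser reject the input.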
Elements vs Attributes
Data can go in child elements or in attributes. There is no strict rule for which to use, but a common guideline:
| Use | Elements | Attributes |
|---|---|---|
| Complex data | Yes | No |
| Repeated items | Yes | No |
| Metadata | Sometimes | Yes |
| Simple identifiers | Sometimes | Yes |
| Data that needs children | Yes | No |
Example — same data, two approaches:
<!-- Data in elements -->
<person>
  <name>Alice</name>
  <age>30</age>
</person>
<!-- Data in attributes -->
<person name="Alice" age="30" />
Both are valid. Elements are more flexible; attributes are more compact.
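The choice also changes how consuming code reads the data. A small sketch, assuming Python's standard xml.etree.ElementTree and the two person samples above:

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring("<person><name>Alice</name><age>30</age></person>")
as_attributes = ET.fromstring('<person name="Alice" age="30" />')

# Child elements are looked up by tag; attributes by key on the element.
name_from_element = as_elements.findtext("name")
name_from_attribute = as_attributes.get("name")

# Either way the value arrives as a string and needs explicit conversion.
age_from_element = int(as_elements.findtext("age"))
age_from_attribute = int(as_attributes.get("age"))

print(name_from_element, age_from_element)
print(name_from_attribute, age_from_attribute)
```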
Namespaces
Namespaces prevent name collisions when combining XML from different sources. They use a URI as a unique identifier (the URI does not need to point to a real page):
<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Row 1</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:material>Wood</f:material>
  </f:table>
</root>
Here, h:table and f:table are different elements despite sharing the name "table."
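A parser makes the distinction visible by expanding each prefix to its URI. The sketch below assumes Python's standard xml.etree.ElementTree; the NS mapping is a lookup table defined here for illustration, and its prefixes only need to agree with the URIs, not with the prefixes used in the document.

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Row 1</h:td></h:tr></h:table>
  <f:table><f:material>Wood</f:material></f:table>
</root>"""

# Prefix-to-URI map used only for the find() calls below.
NS = {
    "h": "http://www.w3.org/1999/xhtml",
    "f": "http://example.com/furniture",
}

root = ET.fromstring(DOC)
html_table = root.find("h:table", NS)
furniture_table = root.find("f:table", NS)

# ElementTree stores qualified names as "{namespace-uri}local-name".
print(html_table.tag)
print(furniture_table.tag)
```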
CDATA Sections
CDATA sections let you include text that would otherwise need escaping:
<script>
  <![CDATA[
    if (a < b && c > d) {
      doSomething();
    }
  ]]>
</script>
Without CDATA, the < and & characters would cause parse errors.
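From the consumer's side, CDATA and entity escaping are two spellings of the same text. A short sketch, assuming Python's standard xml.etree.ElementTree, with the script condensed to one line for brevity:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<script><![CDATA[if (a < b && c > d) { doSomething(); }]]></script>"
)
with_escapes = ET.fromstring(
    "<script>if (a &lt; b &amp;&amp; c &gt; d) { doSomething(); }</script>"
)

# Both forms hand the application the same character data.
print(with_cdata.text)
print(with_cdata.text == with_escapes.text)
```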
Comments
<!-- This is a comment -->
Comments cannot appear before the XML declaration, cannot be nested, and cannot contain -- inside the comment body.
Why Formatting Matters
Minified XML saves bytes but is difficult to read:
<bookstore><book category="fiction"><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><year>1925</year><price>10.99</price></book></bookstore>
Formatted XML with indentation shows the hierarchy clearly:
<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
</bookstore>
Both are semantically identical. Formatting helps with:
- Debugging — spotting unclosed tags or wrong nesting
- Code review — understanding structure at a glance
- Version control — cleaner diffs when each element is on its own line
- Documentation — making config files readable for the next person
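As a rough sketch of what a formatter does, the minified document can be re-indented with Python's standard library. This assumes Python 3.9 or later, where xml.etree.ElementTree.indent is available; the two-space width is just one choice.

```python
import xml.etree.ElementTree as ET

MINIFIED = (
    '<bookstore><book category="fiction"><title>The Great Gatsby</title>'
    "<author>F. Scott Fitzgerald</author><year>1925</year>"
    "<price>10.99</price></book></bookstore>"
)

root = ET.fromstring(MINIFIED)
ET.indent(root, space="  ")  # modifies the tree in place
formatted = ET.tostring(root, encoding="unicode")
print(formatted)
```

Parsing the formatted output again yields the same elements, attributes, and text values as the minified input.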
Common XML Errors
1. Missing Closing Tags
<!-- Wrong -->
<name>Alice
<!-- Correct -->
<name>Alice</name>
Every opening tag must have a matching closing tag or be self-closing.
2. Improper Nesting
<!-- Wrong -->
<b><i>text</b></i>
<!-- Correct -->
<b><i>text</i></b>
Elements must close in reverse order of opening—last opened, first closed.
3. Unquoted Attribute Values
<!-- Wrong -->
<book category=fiction>
<!-- Correct -->
<book category="fiction">
Attribute values must be enclosed in single or double quotes.
4. Unescaped Special Characters
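Five characters have special meaning in XML, and each has a predefined escape. In text content, < and & must always be escaped. Escaping >, " and ' is always allowed, and a quote character must be escaped inside an attribute value delimited by the same quote: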
| Character | Escape | Name |
|---|---|---|
| < | &lt; | Less than |
| > | &gt; | Greater than |
| & | &amp; | Ampersand |
| " | &quot; | Double quote |
| ' | &apos; | Apostrophe |
<!-- Wrong -->
<note>Use x < 10 & y > 5</note>
<!-- Correct -->
<note>Use x &lt; 10 &amp; y &gt; 5</note>
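When XML is assembled by hand in code, escaping is usually delegated to a helper instead of typed out. A minimal sketch, assuming Python's standard xml.sax.saxutils module; the note element and its text attribute are only illustrative.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

raw = "Use x < 10 & y > 5"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
note_xml = f"<note>{escaped}</note>"

# Parsing turns the entity references back into the original characters.
round_tripped = ET.fromstring(note_xml).text

# quoteattr() escapes the value and wraps it in suitable quotes for an attribute.
attr_xml = f"<note text={quoteattr(raw)} />"

print(note_xml)
print(attr_xml)
```

Building the tree with a library and serializing it avoids manual escaping altogether, which is often the safer design.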
5. Multiple Root Elements
<!-- Wrong — two roots -->
<name>Alice</name>
<name>Bob</name>
<!-- Correct — single root -->
<people>
  <name>Alice</name>
  <name>Bob</name>
</people>
6. Case Mismatch
<!-- Wrong -->
<Name>Alice</name>
<!-- Correct -->
<name>Alice</name>
XML tags are case-sensitive. <Name> and <name> are different elements.
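A parser reports each of these mistakes with a message and a position. The sketch below feeds the six broken snippets to Python's standard xml.etree.ElementTree; describe_error is a hypothetical helper, and the exact wording of the messages comes from the underlying parser and may differ between versions.

```python
import xml.etree.ElementTree as ET

BROKEN_SNIPPETS = {
    "missing closing tag": "<name>Alice",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<book category=fiction></book>",
    "unescaped characters": "<note>Use x < 10 & y > 5</note>",
    "multiple roots": "<name>Alice</name><name>Bob</name>",
    "case mismatch": "<Name>Alice</name>",
}


def describe_error(xml_text: str):
    """Return the parser's error message, or None if the text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return str(err)  # the message includes the line and column
    return None


for label, snippet in BROKEN_SNIPPETS.items():
    print(f"{label}: {describe_error(snippet)}")
```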
XML vs JSON vs YAML
| Feature | XML | JSON | YAML |
|---|---|---|---|
| Readability | Moderate | Good | Best |
| Verbosity | High | Low | Lowest |
| Comments | Yes | No | Yes |
| Attributes | Yes | No | No |
| Namespaces | Yes | No | No |
| Schema validation | XSD, DTD, Relax NG | JSON Schema | No standard |
| Binary data | Base64 text | Base64 string | Binary tag |
| Typical use | Config files, SOAP, document markup | Web APIs, config | Config files, CI/CD |
When XML is the right choice:
- Document-oriented data with mixed content (text + markup)
- Systems that require formal schema validation (XSD)
- SOAP web services and enterprise integrations
- Data with namespace requirements
- RSS feeds, SVG graphics, XHTML, Android layouts
When JSON or YAML may be simpler:
- Web APIs exchanging structured data
- Application configuration where comments (YAML) or simplicity (JSON) matter
- Data without attributes or mixed content
Indentation: Spaces vs Tabs
Both work. Pick one and be consistent within a project:
- 2 spaces — compact, common in web development
- 4 spaces — more visual separation, common in enterprise XML
- Tabs — width adjustable per editor, some teams prefer this
The choice has no effect on parsing. It is purely a readability preference.
Frequently Asked Questions
What does "well-formed" XML mean?
Well-formed XML follows all syntax rules: matched tags, proper nesting, one root element, quoted attributes, and escaped special characters. A document can be well-formed without conforming to any schema.
What is the difference between well-formed and valid XML?
Well-formed means correct syntax. Valid means the document also conforms to a specific schema (DTD, XSD, or Relax NG) that defines allowed elements, attributes, and data types.
Does indentation affect XML parsing?
Not in most cases. Parsers report the whitespace between tags as text, but most applications ignore it when it carries no meaning. Whitespace inside text content is always passed through, and an element can signal that all of its whitespace matters with the xml:space="preserve" attribute.
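The whitespace that indentation adds can be seen directly in a parsed tree. This sketch assumes Python's standard xml.etree.ElementTree, which exposes it through the text and tail fields; the tiny documents are made up for illustration.

```python
import xml.etree.ElementTree as ET

compact = ET.fromstring("<a><b>x</b></a>")
indented = ET.fromstring("<a>\n  <b>x</b>\n</a>")

print(repr(compact.text))      # nothing precedes <b> in the compact form
print(repr(indented.text))     # the newline and two spaces before <b>
print(repr(indented[0].tail))  # the newline after </b>

# The element content itself is the same in both documents.
print(compact[0].text == indented[0].text)
```

Code that reads only element content and attributes sees no difference, while code that reads the text of a container element such as the outer a element does.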
Can XML have comments?
Yes. Use <!-- comment -->. Comments cannot appear before the XML declaration and cannot be nested.
Is XML case-sensitive?
Yes. <Book> and <book> are different elements. This applies to element names, attribute names, and attribute values (when used for matching).
What is the maximum size of an XML document?
There is no formal limit in the XML specification. Practical limits depend on the parser and available memory. Streaming parsers (SAX, StAX) can handle very large files without loading everything into memory.
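For a feel of incremental processing, here is a sketch using iterparse from Python's standard xml.etree.ElementTree as a stand-in for SAX or StAX style parsing. The in-memory BytesIO source and the people/name element names are assumptions that substitute for a large file on disk.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a large file; any binary file object works the same way.
source = io.BytesIO(
    b"<people>"
    + b"".join(b"<name>user%d</name>" % i for i in range(1000))
    + b"</people>"
)

count = 0
for event, elem in ET.iterparse(source, events=("end",)):
    if elem.tag == "name":
        count += 1
        elem.clear()  # drop the element's content once it has been handled

print(count)
```

Each element is handled as soon as its closing tag is read, so the full text of the document never has to be held in memory at once.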
How do I include special characters in XML?
Use entity references (&lt;, &amp;, etc.) or wrap content in a CDATA section. CDATA is better for large blocks of text with many special characters.
What is an XML namespace?
A namespace is a URI that qualifies element and attribute names to prevent collisions. It does not need to resolve to a web page—it just needs to be unique.
Should I use elements or attributes for data?
Use elements for complex, repeatable, or hierarchical data. Use attributes for simple metadata like identifiers, types, or flags. When in doubt, use elements—they are more extensible.
Can XML store binary data?
Not directly. Binary data is typically Base64-encoded and stored as text content. Base64 output contains no characters that need escaping, so a CDATA section is not required.
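A minimal round trip, assuming Python's standard base64 and xml.etree.ElementTree modules. The attachment element and its encoding attribute are hypothetical names chosen for this sketch, not a standard vocabulary.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 255, 16, 128, 60, 38])  # arbitrary bytes, including '<' and '&'

# Hypothetical element and attribute names, for illustration only.
attachment = ET.Element("attachment", {"encoding": "base64"})
attachment.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(attachment, encoding="unicode")
print(xml_text)

# Reading it back: parse, then decode the text content.
decoded = base64.b64decode(ET.fromstring(xml_text).text)
print(decoded == payload)
```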