XML Formatting Explained: Structure, Syntax, and Common Mistakes

Learn how XML is structured, why formatting matters, and how to avoid the most common XML errors.

The Quick Answer

XML (Extensible Markup Language) is a text format for storing and transporting structured data. It uses opening and closing tags to define elements, and every well-formed XML document must follow these rules:

  • Every opening tag needs a matching closing tag, or the element must be self-closing
  • Tags are case-sensitive
  • Elements must be properly nested
  • There must be exactly one root element
  • Attribute values must be in quotes

Indenting XML does not change its structure: the elements, attributes, and their data stay the same. It only adds whitespace between elements, which makes the document readable for humans and which most applications ignore.

XML Syntax Rules

Basic Element Structure

<element>content</element>

An XML element consists of an opening tag, content, and a closing tag. Elements can contain text, other elements, or both.

A Complete XML Example

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title lang="en">Thinking, Fast and Slow</title>
    <author>Daniel Kahneman</author>
    <year>2011</year>
    <price>14.99</price>
  </book>
</bookstore>

This document has one root element (<bookstore>), two child elements (<book>), and each book has four nested elements. The names category (on <book>) and lang (on <title>) are attributes.
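As a sketch of how a program reads this structure, the Python below uses the standard-library xml.etree.ElementTree module (our choice for illustration; any conforming parser would do). The function name read_books is hypothetical. The string starts directly with the declaration, with no leading newline.

```python
import xml.etree.ElementTree as ET

# The bookstore document from above; the declaration must be the first characters.
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="fiction">
    <title lang="en">The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
  <book category="non-fiction">
    <title lang="en">Thinking, Fast and Slow</title>
    <author>Daniel Kahneman</author>
    <year>2011</year>
    <price>14.99</price>
  </book>
</bookstore>"""


def read_books(xml_text):
    """Return one dict per <book>, combining attribute and child-element data."""
    # Pass bytes so the parser reads the encoding from the declaration.
    root = ET.fromstring(xml_text.encode("utf-8"))
    books = []
    for book in root.findall("book"):  # direct <book> children of the root
        title = book.find("title")
        books.append({
            "category": book.get("category"),  # attribute on <book>
            "title": title.text,                # text content of <title>
            "lang": title.get("lang"),          # attribute on <title>
            "author": book.findtext("author"),
            "year": int(book.findtext("year")),
            "price": float(book.findtext("price")),
        })
    return books


for b in read_books(BOOKSTORE_XML):
    print(b["category"], "|", b["title"], "|", b["year"])
```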

Self-Closing Tags

Elements with no content can use a self-closing syntax:

<image src="photo.jpg" alt="A photo" />

This is equivalent to <image src="photo.jpg" alt="A photo"></image>.
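The equivalence can be checked with a parser. A minimal sketch using Python's standard library; same_element is a hypothetical helper that compares tag, attributes, text, and children:

```python
import xml.etree.ElementTree as ET


def same_element(a, b):
    """True if two parsed elements have the same tag, attributes, text and children."""
    return (a.tag == b.tag
            and a.attrib == b.attrib
            and (a.text or "") == (b.text or "")
            and len(a) == len(b)
            and all(same_element(x, y) for x, y in zip(a, b)))


short_form = ET.fromstring('<image src="photo.jpg" alt="A photo" />')
long_form = ET.fromstring('<image src="photo.jpg" alt="A photo"></image>')

print(same_element(short_form, long_form))        # True
print(ET.tostring(long_form, encoding="unicode"))  # written back in self-closing form
```

ElementTree serializes an element with no content in the self-closing form, whichever form the input used.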

XML Components

The XML Declaration

<?xml version="1.0" encoding="UTF-8"?>

The declaration is optional but recommended. It tells parsers which XML version and character encoding the document uses. When present, it must be the very first thing in the document: no whitespace, comments, or other content may come before it.
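A quick way to see the placement rule, sketched with Python's ElementTree (the parses helper is hypothetical):

```python
import xml.etree.ElementTree as ET


def parses(xml_bytes):
    """Return True if the bytes are well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


print(parses(b'<?xml version="1.0" encoding="UTF-8"?><root/>'))   # True
print(parses(b'<root/>'))                                          # True: the declaration is optional
print(parses(b' <?xml version="1.0" encoding="UTF-8"?><root/>'))  # False: whitespace before it
print(parses(b'<!-- hi --><?xml version="1.0"?><root/>'))          # False: a comment before it
```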

Elements vs Attributes

Data can go in child elements or in attributes. There is no strict rule for which to use, but a common guideline:

Kind of data                Elements     Attributes
Complex data                Yes          No
Repeated items              Yes          No
Metadata                    Sometimes    Yes
Simple identifiers          Sometimes    Yes
Data that needs children    Yes          No

Example — same data, two approaches:

<!-- Data in elements -->
<person>
  <name>Alice</name>
  <age>30</age>
</person>

<!-- Data in attributes -->
<person name="Alice" age="30" />

Both are valid. Elements are more flexible; attributes are more compact.
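Reading the data back differs only in where the program looks. A minimal Python sketch with ElementTree:

```python
import xml.etree.ElementTree as ET

as_elements = ET.fromstring("<person><name>Alice</name><age>30</age></person>")
as_attributes = ET.fromstring('<person name="Alice" age="30" />')

# Child elements: read the text content of each child.
from_elements = (as_elements.findtext("name"), int(as_elements.findtext("age")))
# Attributes: read values from the element's attribute map.
from_attributes = (as_attributes.get("name"), int(as_attributes.get("age")))

print(from_elements == from_attributes)  # True
```

In both forms the value arrives as a string; converting age to a number is up to the application unless a schema declares its type.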

Namespaces

Namespaces prevent name collisions when combining XML from different sources. They use a URI as a unique identifier (the URI does not need to point to a real page):

<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table>
    <h:tr>
      <h:td>Row 1</h:td>
    </h:tr>
  </h:table>
  <f:table>
    <f:material>Wood</f:material>
  </f:table>
</root>

Here, h:table and f:table are different elements even though both have the local name "table," because each prefix is bound to a different namespace URI.
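Parsers resolve each prefix to its URI, so the prefix itself is only a local shorthand. A sketch with Python's ElementTree, using the namespace example above in condensed form; the search prefixes html and furn are arbitrary choices:

```python
import xml.etree.ElementTree as ET

DOC = """<root xmlns:h="http://www.w3.org/1999/xhtml"
      xmlns:f="http://example.com/furniture">
  <h:table><h:tr><h:td>Row 1</h:td></h:tr></h:table>
  <f:table><f:material>Wood</f:material></f:table>
</root>"""

root = ET.fromstring(DOC)

# ElementTree replaces each prefix with its full URI, in "{uri}local" form.
print([child.tag for child in root])

# When searching, map prefixes of our own choosing to the same URIs.
ns = {"html": "http://www.w3.org/1999/xhtml", "furn": "http://example.com/furniture"}
print(root.find("html:table/html:tr/html:td", ns).text)  # Row 1
print(root.find("furn:table/furn:material", ns).text)    # Wood
```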

CDATA Sections

CDATA sections let you include text that would otherwise need escaping:

<script>
  <![CDATA[
    if (a < b && c > d) {
      doSomething();
    }
  ]]>
</script>

Without CDATA, the < and & characters would cause parse errors.
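A parser reports the same text whether the characters sit in a CDATA section or are escaped with entity references. Sketched in Python:

```python
import xml.etree.ElementTree as ET

with_cdata = ET.fromstring(
    "<script><![CDATA[if (a < b && c > d) { doSomething(); }]]></script>")
with_entities = ET.fromstring(
    "<script>if (a &lt; b &amp;&amp; c &gt; d) { doSomething(); }</script>")

print(with_cdata.text)                        # if (a < b && c > d) { doSomething(); }
print(with_cdata.text == with_entities.text)  # True
```

The CDATA markers are not kept in the parsed tree, so ElementTree writes the text back out with entity references instead.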

Comments

<!-- This is a comment -->

Comments cannot appear before the XML declaration, cannot be nested, and cannot contain -- inside the comment body.
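Parsers report comments separately from data, and many tools discard them. A Python sketch; the config and debug element names are hypothetical, and the insert_comments option of TreeBuilder requires Python 3.8 or later:

```python
import xml.etree.ElementTree as ET

DOC = "<config><!-- This is a comment --><debug>true</debug></config>"

# Default parser: comments are dropped while the tree is built.
default_root = ET.fromstring(DOC)
print([child.tag for child in default_root])  # ['debug']

# Python 3.8+: ask the tree builder to keep comments as Comment elements.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept_root = ET.fromstring(DOC, parser=parser)
print(kept_root[0].tag is ET.Comment)  # True
print(repr(kept_root[0].text))         # ' This is a comment '

# "--" inside a comment body makes the document not well-formed.
try:
    ET.fromstring("<config><!-- a -- b --></config>")
except ET.ParseError as err:
    print("ParseError:", err)
```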

Why Formatting Matters

Minified XML saves bytes but is difficult to read:

<bookstore><book category="fiction"><title>The Great Gatsby</title><author>F. Scott Fitzgerald</author><year>1925</year><price>10.99</price></book></bookstore>

Formatted XML with indentation shows the hierarchy clearly:

<bookstore>
  <book category="fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <year>1925</year>
    <price>10.99</price>
  </book>
</bookstore>

Both carry the same data; the formatted version only adds whitespace between elements, which most applications ignore. Formatting helps with:

  • Debugging — spotting unclosed tags or wrong nesting
  • Code review — understanding structure at a glance
  • Version control — cleaner diffs when each element is on its own line
  • Documentation — making config files readable for the next person
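Converting between the two forms is a common tooling task. A minimal sketch with Python's ElementTree, assuming Python 3.9+ for ET.indent; pretty and minify are hypothetical helper names, and minify assumes element-only content, where whitespace between tags carries no data:

```python
import xml.etree.ElementTree as ET

MINIFIED = ('<bookstore><book category="fiction"><title>The Great Gatsby</title>'
            '<author>F. Scott Fitzgerald</author><year>1925</year>'
            '<price>10.99</price></book></bookstore>')


def pretty(xml_text, space="  "):
    """Re-indent XML text (ET.indent needs Python 3.9 or later)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)  # rewrites whitespace-only text and tails in place
    return ET.tostring(root, encoding="unicode")


def minify(xml_text):
    """Remove whitespace-only text between elements (element-only content only)."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


formatted = pretty(MINIFIED)
print(formatted)
print(minify(formatted) == MINIFIED)  # True
```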

Common XML Errors

1. Missing Closing Tags

<!-- Wrong -->
<name>Alice

<!-- Correct -->
<name>Alice</name>

Every opening tag must have a matching closing tag or be self-closing.

2. Improper Nesting

<!-- Wrong -->
<b><i>text</b></i>

<!-- Correct -->
<b><i>text</i></b>

Elements must close in reverse order of opening—last opened, first closed.

3. Unquoted Attribute Values

<!-- Wrong -->
<book category=fiction>

<!-- Correct -->
<book category="fiction">

Attribute values must be enclosed in single or double quotes.

4. Unescaped Special Characters

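XML predefines five entity references for characters that have special meaning in markup. In text content, < and & must always be escaped; > only has to be escaped in the sequence ]]>, though escaping it everywhere is common and harmless. The quote characters need escaping only inside an attribute value delimited by the same kind of quote.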

Character    Escape    Name
<            &lt;      Less than
>            &gt;      Greater than
&            &amp;     Ampersand
"            &quot;    Double quote
'            &apos;    Apostrophe

<!-- Wrong -->
<note>Use x < 10 & y > 5</note>

<!-- Correct -->
<note>Use x &lt; 10 &amp; y &gt; 5</note>
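When text comes from program input, escaping is best left to a library. A sketch using Python's xml.sax.saxutils, whose escape handles &, <, and >, and whose quoteattr also picks suitable quotes for an attribute value:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

text = "Use x < 10 & y > 5"
note = "<note>" + escape(text) + "</note>"
print(note)  # <note>Use x &lt; 10 &amp; y &gt; 5</note>

# The escaped form parses back to the original text.
print(ET.fromstring(note).text == text)  # True

# quoteattr adds the surrounding quotes; it uses single quotes here
# because the value contains double quotes.
print("<note title=" + quoteattr('Say "hi" & wave') + "/>")
```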

5. Multiple Root Elements

<!-- Wrong — two roots -->
<name>Alice</name>
<name>Bob</name>

<!-- Correct — single root -->
<people>
  <name>Alice</name>
  <name>Bob</name>
</people>

6. Case Mismatch

<!-- Wrong -->
<Name>Alice</name>

<!-- Correct -->
<name>Alice</name>

XML tags are case-sensitive. <Name> and <name> are different elements.
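All six mistakes are well-formedness errors, so any XML parser rejects them. A sketch that runs the snippets above through Python's ElementTree; check is a hypothetical helper, and the unquoted-attribute case is made self-closing so the missing quotes are its only error. Exact error messages vary by parser.

```python
import xml.etree.ElementTree as ET


def check(xml_text):
    """Return "ok" if xml_text is well-formed, else the parser's error message."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as err:
        return str(err)


# The "wrong" snippets from errors 1-6.
WRONG = {
    "missing closing tag": "<name>Alice",
    "improper nesting": "<b><i>text</b></i>",
    "unquoted attribute": "<book category=fiction/>",
    "unescaped characters": "<note>Use x < 10 & y > 5</note>",
    "multiple roots": "<name>Alice</name>\n<name>Bob</name>",
    "case mismatch": "<Name>Alice</name>",
}

# The corresponding corrected snippets.
CORRECT = [
    "<name>Alice</name>",
    "<b><i>text</i></b>",
    '<book category="fiction"/>',
    "<note>Use x &lt; 10 &amp; y &gt; 5</note>",
    "<people><name>Alice</name><name>Bob</name></people>",
    "<name>Alice</name>",
]

for label, snippet in WRONG.items():
    print(f"{label}: {check(snippet)}")
print(all(check(s) == "ok" for s in CORRECT))  # True
```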

XML vs JSON vs YAML

Feature              XML                                   JSON                YAML
Readability          Moderate                              Good                Best
Verbosity            High                                  Low                 Lowest
Comments             Yes                                   No                  Yes
Attributes           Yes                                   No                  No
Namespaces           Yes                                   No                  No
Schema validation    XSD, DTD, Relax NG                    JSON Schema         No standard
Binary data          Base64 text                           Base64 string       !!binary tag (Base64)
Typical use          Config files, SOAP, document markup   Web APIs, config    Config files, CI/CD

When XML is the right choice:

  • Document-oriented data with mixed content (text + markup)
  • Systems that require formal schema validation (XSD)
  • SOAP web services and enterprise integrations
  • Data with namespace requirements
  • RSS feeds, SVG graphics, XHTML, Android layouts

When JSON or YAML may be simpler:

  • Web APIs exchanging structured data
  • Application configuration where comments (YAML) or simplicity (JSON) matter
  • Data without attributes or mixed content
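Because JSON has no attributes, converting XML to JSON requires a mapping convention. The sketch below uses one hypothetical convention ("@" prefixes for attributes, "#text" for text beside attributes, arrays for repeated tags); it ignores mixed content and is not a standard mapping:

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Hypothetical XML-to-JSON mapping; text mixed with child elements is dropped."""
    result = {"@" + key: value for key, value in elem.attrib.items()}
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if not result:
            return text  # plain leaf element becomes just its text
        if text:
            result["#text"] = text
        return result
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated child tag becomes a JSON array.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


doc = ET.fromstring('<book category="fiction"><title lang="en">The Great Gatsby</title>'
                    '<year>1925</year></book>')
print(json.dumps({doc.tag: element_to_dict(doc)}))
```

Every value comes out as a string (year becomes "1925"), since XML carries no data types without a schema.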

Indentation: Spaces vs Tabs

Both work. Pick one and be consistent within a project:

  • 2 spaces — compact, common in web development
  • 4 spaces — more visual separation, common in enterprise XML
  • Tabs — width adjustable per editor, some teams prefer this

The choice has no effect on the document's data. Parsers see different whitespace between elements, which most applications ignore, so it is purely a readability preference.
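A sketch in Python (3.9+ for ET.indent) that indents the same document three ways and compares what remains once whitespace-only text is ignored; indented and data_view are hypothetical helpers:

```python
import xml.etree.ElementTree as ET

SOURCE = "<person><name>Alice</name><age>30</age></person>"


def indented(xml_text, space):
    """Return xml_text re-indented with the given unit (ET.indent needs Python 3.9+)."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space=space)
    return ET.tostring(root, encoding="unicode")


def data_view(xml_text):
    """List each element's tag, attributes and non-whitespace text, ignoring indentation."""
    root = ET.fromstring(xml_text)
    return [(e.tag, dict(e.attrib), (e.text or "").strip()) for e in root.iter()]


variants = {
    "2 spaces": indented(SOURCE, "  "),
    "4 spaces": indented(SOURCE, "    "),
    "tabs": indented(SOURCE, "\t"),
}
views = [data_view(text) for text in variants.values()]

print(variants["2 spaces"] != variants["tabs"])  # True: the raw text differs
print(all(view == views[0] for view in views))   # True: the data does not
```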

Frequently Asked Questions

What does "well-formed" XML mean?

Well-formed XML follows all syntax rules: matched tags, proper nesting, one root element, quoted attributes, and escaped special characters. A document can be well-formed without conforming to any schema.

What is the difference between well-formed and valid XML?

Well-formed means correct syntax. Valid means the document also conforms to a specific schema (DTD, XSD, or Relax NG) that defines allowed elements, attributes, and data types.
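Python's standard library parses XML but does not validate against DTD, XSD, or Relax NG schemas. As a stand-in, this hypothetical hand-written rule check (not a real schema) shows a document that is well-formed yet breaks rules a schema might enforce:

```python
import xml.etree.ElementTree as ET

REQUIRED = ("title", "author", "year", "price")  # hypothetical rules, not a real schema


def rule_violations(xml_text):
    """Return rule violations; raises ET.ParseError if the text is not well-formed."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, book in enumerate(root.findall("book"), start=1):
        for name in REQUIRED:
            if book.find(name) is None:
                problems.append(f"book {i}: missing <{name}>")
        price = book.findtext("price")
        if price is not None:
            try:
                float(price)
            except ValueError:
                problems.append(f"book {i}: price {price!r} is not a number")
    return problems


doc = """<bookstore>
  <book category="fiction"><title>The Great Gatsby</title><year>1925</year><price>cheap</price></book>
</bookstore>"""
print(rule_violations(doc))
# ['book 1: missing <author>', "book 1: price 'cheap' is not a number"]
```

For real schema validation, a validating library such as lxml (which provides XMLSchema, RelaxNG, and DTD classes) is the usual route.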

Does indentation affect XML parsing?

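Not for well-formedness, and usually not for meaning. A parser reports the whitespace between tags as text, but most applications treat it as insignificant when an element contains only other elements. Whitespace does matter in mixed content (text interleaved with child elements) and in elements where the application preserves it, such as an XHTML <pre> element or any element marked xml:space="preserve"; reindenting those can change the data.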
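A short Python sketch of the difference: the same two child elements, compact and indented. When only the child elements matter, the extra text is noise; when the content is mixed text, it becomes part of that text.

```python
import xml.etree.ElementTree as ET

compact_root = ET.fromstring("<p><b>Hello</b><i>world</i></p>")
indented_root = ET.fromstring("<p>\n  <b>Hello</b>\n  <i>world</i>\n</p>")

# The indentation shows up as text before the first child and as each child's tail.
print(repr(compact_root.text), repr(indented_root.text))  # None '\n  '

# Joining all text nodes exposes the difference.
print(repr("".join(compact_root.itertext())))   # 'Helloworld'
print(repr("".join(indented_root.itertext())))  # '\n  Hello\n  world\n'
```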

Can XML have comments?

Yes. Use <!-- comment -->. Comments cannot appear before the XML declaration and cannot be nested.

Is XML case-sensitive?

Yes. <Book> and <book> are different elements. Element names and attribute names are case-sensitive, and XML performs no case folding on attribute values, so category="Fiction" and category="fiction" are different values.

What is the maximum size of an XML document?

There is no formal limit in the XML specification. Practical limits depend on the parser and available memory. Streaming parsers (SAX, StAX) can handle very large files without loading everything into memory.
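A streaming sketch with Python's ElementTree.iterparse: it handles each <book> as soon as its end tag is read, then clears the root so finished elements do not accumulate. total_price is a hypothetical helper, and the input is built in memory here but would normally be a large file.

```python
import io
import xml.etree.ElementTree as ET


def total_price(source):
    """Count <book> elements and sum their <price> values while streaming."""
    count, total = 0, 0.0
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == "book":
            total += float(elem.findtext("price"))
            count += 1
            root.clear()  # drop finished <book> elements so memory use stays roughly constant
    return count, total


# A small in-memory stand-in for a large file on disk.
data = io.BytesIO(b"<bookstore>" + b"<book><price>10.99</price></book>" * 3 + b"</bookstore>")
count, total = total_price(data)
print(count, round(total, 2))  # 3 32.97
```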

How do I include special characters in XML?

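Use the predefined entity references (&lt;, &amp;, etc.), numeric character references such as &#169;, or wrap content in a CDATA section. CDATA is convenient for large blocks of text with many special characters, but it cannot contain the sequence ]]>.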

What is an XML namespace?

A namespace is a URI that qualifies element and attribute names to prevent collisions. It does not need to resolve to a web page—it just needs to be unique.

Should I use elements or attributes for data?

Use elements for complex, repeatable, or hierarchical data. Use attributes for simple metadata like identifiers, types, or flags. When in doubt, use elements—they are more extensible.

Can XML store binary data?

Not directly. Binary data is typically Base64-encoded and stored as text content. The Base64 alphabet contains no characters that need escaping in XML, so a CDATA section is not required.
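A round trip sketched in Python; the attachment element and its encoding attribute are hypothetical names:

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes([0, 1, 2, 250, 251, 252, 253, 254, 255])  # arbitrary non-text bytes

# Encode: Base64 output uses only A-Z, a-z, 0-9, "+", "/" and "=",
# so it can go into an element's text without escaping or CDATA.
elem = ET.Element("attachment", {"encoding": "base64"})  # hypothetical element and attribute
elem.text = base64.b64encode(payload).decode("ascii")
xml_text = ET.tostring(elem, encoding="unicode")
print(xml_text)

# Decode: parse the XML, then reverse the Base64 step.
decoded = base64.b64decode(ET.fromstring(xml_text).text)
print(decoded == payload)  # True
```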

Format Your XML

XML Formatter

Paste unformatted XML to get properly indented output with validation. Supports 2-space, 4-space, and tab indentation.
