How to Convert XML to JSON — A Complete Guide

Learn how XML and JSON differ, when to convert between them, and how to handle attributes, namespaces, and nested elements correctly.

Quick Answer

XML to JSON conversion transforms markup-based XML documents into JavaScript-compatible JSON objects. Elements become object keys, text content becomes values, and attributes are typically prefixed (e.g., @id) to distinguish them from child elements.

The fastest way: paste your XML into an XML to JSON converter. For scripting, use Python's xmltodict library or Node.js xml2js.


What Is XML?

XML (eXtensible Markup Language) is a markup language for encoding documents in a format that's both human-readable and machine-readable. It uses hierarchical tags with opening and closing elements.

Example XML:

<?xml version="1.0" encoding="UTF-8"?>
<user id="123" active="true">
  <name>Alice Johnson</name>
  <email>[email protected]</email>
  <roles>
    <role>admin</role>
    <role>editor</role>
  </roles>
</user>

Key characteristics:

  • Tag-based structure: Opening tags (<name>) and closing tags (</name>) wrap content.
  • Attributes: Key-value pairs inside the opening tag: <user id="123">.
  • Namespaces: Prefixes like xmlns:ns avoid naming conflicts between vocabularies.
  • Mixed content: Elements can contain both text and child elements.
  • Self-describing: Tags provide meaning, making XML documents readable without a schema.
  • Strict syntax: Well-formed XML must follow precise rules (proper nesting, quoted attributes, case sensitivity).

XML is used for configuration files, SOAP APIs, RSS/Atom feeds, document formats (DOCX, SVG), and enterprise data interchange.


What Is JSON?

JSON (JavaScript Object Notation) is a lightweight data interchange format with a simpler structure than XML.

The same data in JSON:

{
  "user": {
    "@id": "123",
    "@active": "true",
    "name": "Alice Johnson",
    "email": "[email protected]",
    "roles": {
      "role": ["admin", "editor"]
    }
  }
}

Key differences from XML:

  • No tags: Uses key-value pairs and arrays instead of opening/closing tags.
  • No attributes: All data is properties — attributes must be converted to keys.
  • Typed values: Supports strings, numbers, booleans, null, objects, and arrays natively.
  • No namespaces: No built-in namespace mechanism.
  • More compact: Often around 30-50% smaller than equivalent XML, though the saving depends on how tag-heavy the document is.
  • JavaScript-native: JSON.parse() converts JSON text directly into native objects.

JSON dominates REST APIs, NoSQL databases, configuration files, and browser-server communication.


XML vs JSON: Key Structural Differences

Understanding the fundamental differences helps you convert correctly:

Aspect            | XML                             | JSON
------------------+---------------------------------+-------------------------------------
Data model        | Document tree with attributes   | Object/array hierarchy
Attributes        | First-class feature             | No equivalent (must convert to keys)
Text content      | Can mix with child elements     | Value must be a single type
Repeated elements | Multiple siblings with same tag | Arrays
Type system       | Everything is text              | Strings, numbers, booleans, null
Comments          | Supported (<!-- -->)            | Not supported
Namespaces        | Full support                    | No support
Schema validation | XSD, DTD, RelaxNG               | JSON Schema
Verbosity         | High (tags repeat)              | Low (keys once)

The conversion challenge: XML's richer features (attributes, mixed content, namespaces) must be mapped to JSON's simpler structure.


How XML to JSON Conversion Works

The standard conversion approach:

  1. Parse the XML into a DOM tree (Document Object Model).
  2. Process each element recursively:
    • Tag name becomes JSON object key
    • Attributes become prefixed keys (e.g., @id)
    • Text content becomes the value
    • Child elements become nested objects
  3. Handle repeated elements by converting to arrays.
  4. Handle mixed content with a special text key (e.g., #text).
  5. Output JSON with desired formatting.
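
The steps above can be sketched with Python's standard-library xml.etree.ElementTree. This is a minimal illustration, not how any particular converter is implemented. The helper names element_to_value and xml_to_json are hypothetical, the @ and #text keys are the conventions assumed here, and text that follows a child element (mixed content) is ignored.

```python
import json
import xml.etree.ElementTree as ET


def element_to_value(elem, attr_prefix="@", text_key="#text"):
    """Convert one element into a JSON-ready value (dict or string)."""
    node = {}

    # Attributes become prefixed keys
    for name, value in elem.attrib.items():
        node[attr_prefix + name] = value

    # Child elements become nested values; repeated tags collapse into a list
    for child in elem:
        value = element_to_value(child, attr_prefix, text_key)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    text = (elem.text or "").strip()
    if not node:
        return text  # text-only or empty element
    if text:
        node[text_key] = text  # text next to attributes or children
    return node


def xml_to_json(xml_string):
    """Parse the XML, convert it recursively, and format the result as JSON."""
    root = ET.fromstring(xml_string)
    return json.dumps({root.tag: element_to_value(root)}, indent=2)


sample = (
    '<user id="123"><name>Alice</name>'
    "<roles><role>admin</role><role>editor</role></roles></user>"
)
print(xml_to_json(sample))
```

Returning a plain string for text-only elements keeps the output compact, at the cost of the value's type changing once an attribute is added to the element.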

Attribute Handling Conventions

Since JSON has no attributes, converters use conventions:

Prefix convention (most common):

<user id="5" active="true">John</user>

Becomes:

{"user": {"@id": "5", "@active": "true", "#text": "John"}}

Merged convention (no prefix):

{"user": {"id": "5", "active": "true", "_text": "John"}}

Underscore prefix:

{"user": {"_id": "5", "_active": "true", "__text": "John"}}

Choose based on your downstream system's expectations. The @ prefix is the most widely used convention.
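
If a downstream system expects a different convention than the one your converter produced, the keys can be renamed after conversion. The following is a small sketch under that assumption. change_attr_prefix is a hypothetical helper, not part of any library, and it assumes the input is already a dict/list structure such as the output of a converter using the @ prefix.

```python
def change_attr_prefix(value, old="@", new="_"):
    """Recursively rename keys that start with `old` so they start with `new`."""
    if isinstance(value, dict):
        renamed = {}
        for key, child in value.items():
            if key.startswith(old):
                # Note: a renamed key can collide with a child element of the same name
                key = new + key[len(old):]
            renamed[key] = change_attr_prefix(child, old, new)
        return renamed
    if isinstance(value, list):
        return [change_attr_prefix(item, old, new) for item in value]
    return value


converted = {"user": {"@id": "5", "@active": "true", "#text": "John"}}
print(change_attr_prefix(converted))
```

Passing new="" gives the merged convention, which is only safe when no attribute shares a name with a child element.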

Array Detection

When sibling elements share a tag name, they become an array:

<colors>
  <color>red</color>
  <color>green</color>
  <color>blue</color>
</colors>

Becomes:

{
  "colors": {
    "color": ["red", "green", "blue"]
  }
}

A single <color> would be "color": "red" (not an array) unless "always array" mode is enabled.
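
When the converter has no "always array" mode, consuming code can normalize the value itself. A minimal sketch, where as_list is a hypothetical helper name and the two dictionaries stand in for converter output:

```python
def as_list(value):
    """Return lists unchanged, wrap single values, and map None to an empty list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


one = {"colors": {"color": "red"}}
many = {"colors": {"color": ["red", "green", "blue"]}}

for doc in (one, many):
    # The same loop works whether the source had one <color> or several
    for color in as_list(doc["colors"]["color"]):
        print(color)
```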


Converting XML to JSON in Code

Python (xmltodict library)

import xmltodict
import json

xml_string = """
<user id="123">
  <name>Alice</name>
  <age>28</age>
</user>
"""

# Convert to Python dict
data = xmltodict.parse(xml_string)

# Convert to JSON string
json_string = json.dumps(data, indent=2)
print(json_string)

Output:

{
  "user": {
    "@id": "123",
    "name": "Alice",
    "age": "28"
  }
}

Install: pip install xmltodict

Options:

# Force arrays for specific elements
data = xmltodict.parse(xml_string, force_list=('item', 'entry'))

# Custom attribute prefix
data = xmltodict.parse(xml_string, attr_prefix='_')

# Strip namespace prefixes (map the namespace URI to None)
data = xmltodict.parse(xml_string, process_namespaces=True, namespaces={'http://example.com': None})

JavaScript / Node.js (xml2js)

const xml2js = require('xml2js');

const xml = `
<user id="123">
  <name>Alice</name>
  <age>28</age>
</user>
`;

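const parser = new xml2js.Parser({
  attrkey: '@',         // attributes are grouped under a single "@" key
  charkey: '#text',     // text that sits next to attributes or children
  explicitArray: false  // only repeated elements become arrays
});

parser.parseString(xml, (err, result) => {
  if (err) {
    console.error('XML parse error:', err.message);
    return;
  }
  console.log(JSON.stringify(result, null, 2));
});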

Install: npm install xml2js

For better performance, use fast-xml-parser:

const { XMLParser } = require('fast-xml-parser');

const parser = new XMLParser({
  ignoreAttributes: false,
  attributeNamePrefix: '@_'
});

const json = parser.parse(xml);

Browser JavaScript (DOMParser)

No libraries needed — use the built-in DOMParser:

function xmlToJson(xml) {
  const parser = new DOMParser();
  const doc = parser.parseFromString(xml, 'text/xml');

  // DOMParser reports malformed XML through a <parsererror> element instead of throwing
  if (doc.querySelector('parsererror')) {
    throw new Error('Invalid XML');
  }

  function parseNode(node) {
    const obj = {};

    // Process attributes
    for (const attr of node.attributes) {
      obj['@' + attr.name] = attr.value;
    }

    // Process child nodes
    const children = {};
    let text = '';
    for (const child of node.childNodes) {
      if (child.nodeType === 3 || child.nodeType === 4) { // Text or CDATA node
        text += child.nodeValue;
      } else if (child.nodeType === 1) { // Element node
        const tag = child.nodeName;
        const result = parseNode(child);
        if (Object.hasOwn(children, tag)) {
          // Repeated tag: collect the values in an array
          if (!Array.isArray(children[tag])) {
            children[tag] = [children[tag]];
          }
          children[tag].push(result);
        } else {
          children[tag] = result;
        }
      }
    }

    text = text.trim();
    const merged = { ...obj, ...children };
    // Text-only or empty element: return the string itself
    if (Object.keys(merged).length === 0) return text;
    // Otherwise keep the text next to attributes and children
    // (for mixed content, the position of each text run is lost)
    if (text) merged['#text'] = text;
    return merged;
  }

  const root = doc.documentElement;
  return { [root.nodeName]: parseNode(root) };
}

Command Line (xmllint + Python)

For quick conversions, combine xmllint for validation with Python:

# Validate and convert
xmllint --noout file.xml && python3 -c "
import xmltodict, json, sys
print(json.dumps(xmltodict.parse(sys.stdin.read()), indent=2))
" < file.xml

Handling Edge Cases

Mixed Content

XML can mix text and elements:

<paragraph>This is <bold>important</bold> text.</paragraph>

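This is tricky because a JSON value cannot be a string and an object at the same time, and object keys do not record where each piece of text sat relative to the child elements. Common solutions: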

Preserve structure:

{
  "paragraph": {
    "#text": ["This is ", " text."],
    "bold": "important"
  }
}

Flatten to string (loses structure):

{"paragraph": "This is important text."}

Use ordered content arrays:

{
  "paragraph": [
    {"#text": "This is "},
    {"bold": "important"},
    {"#text": " text."}
  ]
}
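
The ordered-array form can be produced with the standard-library xml.etree.ElementTree, which stores the text before the first child in .text and the text after each child in that child's .tail. This is a rough sketch: ordered_content is a hypothetical helper, and attributes are ignored to keep it short.

```python
import xml.etree.ElementTree as ET


def ordered_content(elem, text_key="#text"):
    """Return an element's text runs and children as a list in document order."""
    parts = []
    if elem.text:
        parts.append({text_key: elem.text})
    for child in elem:
        # Children without nested elements are reduced to their text
        value = ordered_content(child, text_key) if len(child) else (child.text or "")
        parts.append({child.tag: value})
        if child.tail:
            parts.append({text_key: child.tail})
    return parts


xml = "<paragraph>This is <bold>important</bold> text.</paragraph>"
root = ET.fromstring(xml)
print({root.tag: ordered_content(root)})
```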

Namespaces

<ns:user xmlns:ns="http://example.com/users">
  <ns:name>Alice</ns:name>
</ns:user>

By default, prefixes are preserved:

{
  "ns:user": {
    "@xmlns:ns": "http://example.com/users",
    "ns:name": "Alice"
  }
}

For cleaner JSON, strip prefixes during conversion (available in most libraries).
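
As one illustration of stripping, Python's standard-library xml.etree.ElementTree does not keep the ns: prefix at all. It expands each tag to the form {namespace-uri}localname, so removing the namespace means dropping everything up to the closing brace. The helper name local_name is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


xml = """
<ns:user xmlns:ns="http://example.com/users">
  <ns:name>Alice</ns:name>
</ns:user>
"""

root = ET.fromstring(xml)
print(root.tag)  # the expanded name, including the namespace URI

result = {local_name(root.tag): {local_name(child.tag): child.text for child in root}}
print(result)
```

Stripping is only safe when local names are unique across the namespaces in the document. Otherwise two different elements end up under the same key.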

CDATA Sections

<script><![CDATA[
  if (a < b && c > d) {
    console.log("hello");
  }
]]></script>

CDATA allows unescaped special characters. The converter removes the CDATA wrapper:

{"script": "\n  if (a < b && c > d) {\n    console.log(\"hello\");\n  }\n"}
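
A quick way to convince yourself that the wrapper carries no data of its own: with Python's standard-library parser, a CDATA section and the same text written with entity escapes produce identical strings. The element name s and the sample expression are placeholders chosen for this sketch.

```python
import json
import xml.etree.ElementTree as ET

# The same content, once wrapped in CDATA and once with entity escapes
cdata = ET.fromstring("<s><![CDATA[a < b && c > d]]></s>").text
escaped = ET.fromstring("<s>a &lt; b &amp;&amp; c &gt; d</s>").text

print(cdata == escaped)
print(json.dumps({"script": cdata}))
```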

Empty Elements

<user>
  <name>Alice</name>
  <nickname/>
  <bio></bio>
</user>

Empty elements become empty strings or null, depending on the converter:

{
  "user": {
    "name": "Alice",
    "nickname": "",
    "bio": ""
  }
}
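
Which of the two you get depends on the tool. As an example, Python's standard-library xml.etree.ElementTree reports None as the text of an empty element whether it is written as <nickname/> or as <bio></bio>, so the choice between null and "" is made by the code that builds the JSON. A short sketch with illustrative variable names:

```python
import json
import xml.etree.ElementTree as ET

root = ET.fromstring("<user><name>Alice</name><nickname/><bio></bio></user>")

# Keep the parser's None, which json.dumps writes as null
as_null = {child.tag: child.text for child in root}

# Or normalize empty elements to empty strings
as_empty_string = {child.tag: child.text or "" for child in root}

print(json.dumps(as_null))
print(json.dumps(as_empty_string))
```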

Self-Closing Tags

<br/> and <br></br> are equivalent in XML. Both convert to an empty value.


Common Conversion Mistakes

  1. Ignoring attributes. XML attributes carry important data. Don't use converters that drop them.

  2. Losing array semantics. A single <item> converts to a value, but your code expects an array. Use "always array" mode or explicitly list array elements.

  3. Assuming text is typed. <age>28</age> becomes "age": "28" (string), not "age": 28 (number). Post-process if you need types.

  4. Not handling namespaces. Namespace prefixes in keys can break code expecting simple names like "user" instead of "ns:user".

  5. Mixed content confusion. If your XML mixes text and elements, test your converter's behavior carefully.

  6. Character encoding issues. XML declares encoding in the prolog (<?xml version="1.0" encoding="UTF-8"?>). Ensure your parser respects it.

  7. Invalid XML input. Unlike HTML, XML must be well-formed. Missing closing tags or unquoted attributes cause parse failures.

  8. Losing document order. JSON object key order isn't guaranteed in all environments (though modern JavaScript preserves insertion order). If order matters, use arrays.
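
For the encoding mistake in particular, a common cause is decoding the file with a guessed codec before parsing. A minimal sketch using Python's standard library, assuming an ISO-8859-1 document with a placeholder name: handing the raw bytes to the parser lets it apply the declared encoding itself.

```python
import xml.etree.ElementTree as ET

# Bytes as they might be read from a file opened in binary mode
raw = '<?xml version="1.0" encoding="ISO-8859-1"?><name>José</name>'.encode("iso-8859-1")

# Passing bytes lets the parser honor the encoding declared in the prolog
root = ET.fromstring(raw)
print(root.text)

# Decoding with the wrong codec before parsing fails (or silently corrupts text)
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("wrong codec:", exc.reason)
```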


Type Conversion

XML has no native types. Post-process JSON for typed values:

Python

import json

def add_types(obj):
    if isinstance(obj, dict):
        return {k: add_types(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [add_types(i) for i in obj]
    if isinstance(obj, str):
        # Try integer
        try:
            return int(obj)
        except ValueError:
            pass
        # Try float
        try:
            return float(obj)
        except ValueError:
            pass
        # Try boolean
        if obj.lower() == 'true':
            return True
        if obj.lower() == 'false':
            return False
        # Keep as string
        return obj
    return obj

typed_data = add_types(data)

JavaScript

function addTypes(obj) {
  if (Array.isArray(obj)) return obj.map(addTypes);
  if (typeof obj === 'object' && obj !== null) {
    return Object.fromEntries(
      Object.entries(obj).map(([k, v]) => [k, addTypes(v)])
    );
  }
  if (typeof obj === 'string') {
    if (obj === 'true') return true;
    if (obj === 'false') return false;
    const num = Number(obj);
    if (!isNaN(num) && obj.trim() !== '') return num;
  }
  return obj;
}

Caution: Don't blindly convert to numbers. Values like ZIP codes ("07102") and IDs ("00042") have meaningful leading zeros.
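
One way to act on that caution is to convert only strings whose form is unambiguous. The sketch below is an illustration under stated assumptions, not a complete type-inference scheme. infer_scalar and add_types_safe are hypothetical helpers, and the rule assumed here is that numbers with leading zeros, exponents, or special names such as NaN stay strings.

```python
import re

# Optional minus sign, no leading zeros (except "0" itself), optional fraction
_NUMBER = re.compile(r"-?(0|[1-9][0-9]*)(\.[0-9]+)?")


def infer_scalar(text):
    """Convert a string to bool/int/float only when its form is unambiguous."""
    if text in ("true", "false"):
        return text == "true"
    match = _NUMBER.fullmatch(text)
    if match is None:
        return text  # "07102", "1e5", "NaN", "" and ordinary words stay strings
    return float(text) if match.group(2) else int(text)


def add_types_safe(obj):
    """Apply infer_scalar to every string in a nested dict/list structure."""
    if isinstance(obj, dict):
        return {key: add_types_safe(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [add_types_safe(item) for item in obj]
    return infer_scalar(obj) if isinstance(obj, str) else obj


print(add_types_safe({"user": {"age": "28", "zip": "07102", "active": "true"}}))
```

Where an XML Schema or a known list of numeric fields is available, converting by field name is more reliable than guessing from the text.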


SOAP XML to JSON

SOAP responses have envelope wrappers:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUserResponse xmlns="http://example.com/users">
      <User>
        <Name>Alice</Name>
        <Email>[email protected]</Email>
      </User>
    </GetUserResponse>
  </soap:Body>
</soap:Envelope>

After conversion, extract the payload:

const response = convertedJson['soap:Envelope']['soap:Body']['GetUserResponse']['User'];
// { "Name": "Alice", "Email": "[email protected]" }

For cleaner results:

  1. Strip namespace prefixes during conversion
  2. Extract only the body content
  3. Rename keys to match your API conventions

Many SOAP-to-REST middleware tools do this automatically.
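
If you are doing the extraction yourself in Python, the first two steps can be sketched with the standard-library xml.etree.ElementTree. The helper names soap_body_payload and strip_ns are hypothetical, the envelope mirrors the example above, and the email address is a placeholder value. The sketch assumes SOAP 1.1 and a body with a single payload element.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def soap_body_payload(xml_string):
    """Return the first element inside <soap:Body>, or None if there is none."""
    root = ET.fromstring(xml_string)
    body = root.find(f"{{{SOAP_NS}}}Body")
    if body is None or len(body) == 0:
        return None
    return body[0]


def strip_ns(tag):
    """Drop the '{namespace-uri}' part that ElementTree puts in front of a tag."""
    return tag.rsplit("}", 1)[-1]


envelope = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUserResponse xmlns="http://example.com/users">
      <User>
        <Name>Alice</Name>
        <Email>alice@example.com</Email>
      </User>
    </GetUserResponse>
  </soap:Body>
</soap:Envelope>"""

payload = soap_body_payload(envelope)  # the GetUserResponse element
user = payload[0]                      # its first child, User
print({strip_ns(field.tag): field.text for field in user})
```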


Performance Considerations

Method                  | Best for                  | Notes
------------------------+---------------------------+-----------------------------------
Browser converter       | Quick one-off conversions | Limited to ~10 MB
Python xmltodict        | Scripting, automation     | Good balance of speed and features
Node.js fast-xml-parser | Large files, streaming    | Fastest for big documents
lxml (Python)           | Maximum control           | Steeper learning curve

For files under about 1 MB, any of these methods finishes almost immediately. For large XML files, use a streaming parser that processes elements as they are read instead of loading the entire document into memory.
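
As an illustration of the streaming approach, Python's standard-library xml.etree.ElementTree.iterparse yields elements as they are completed. This is a sketch under simplifying assumptions: stream_records is a hypothetical helper, each record is flat (text-only children, no attributes), and the in-memory bytes stand in for a large file.

```python
import io
import json
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one dict per <record_tag> element without keeping its subtree around."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # release the children of the finished record


xml_bytes = b"<items><item><id>1</id></item><item><id>2</id></item></items>"

# Each record is written as one JSON line as soon as it has been parsed
for record in stream_records(io.BytesIO(xml_bytes), "item"):
    print(json.dumps(record))
```

The root element still holds a reference to each cleared record, so very long documents may need additional cleanup beyond this sketch.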


When to Use XML vs JSON

Use XML when:

  • Working with SOAP services
  • Schema validation is critical (XSD)
  • Document-centric data (mixed content, formatting)
  • Existing infrastructure expects XML
  • Using XML-native tools (XSLT, XPath, XQuery)

Use JSON when:

  • Building REST APIs
  • Browser/JavaScript applications
  • NoSQL databases (MongoDB, CouchDB)
  • Configuration files for modern tools
  • Payload size matters

Modern trend: Most new APIs use JSON. XML remains important for enterprise integrations, document formats, and legacy systems.


Frequently Asked Questions

How do I convert XML to JSON?

Paste your XML into an XML to JSON converter and configure attribute handling. For automation, use Python's xmltodict library or Node.js xml2js/fast-xml-parser. The main challenge is mapping XML attributes to JSON keys.

What is the difference between XML and JSON?

XML is a markup language with tags, attributes, namespaces, and mixed content support. JSON is a simpler data format with objects, arrays, and typed values. XML is more verbose and feature-rich; JSON is more compact and JavaScript-native.

How are XML attributes converted to JSON?

Attributes become object properties with a prefix (commonly @): <user id="5"> becomes {"@id": "5"}. The prefix distinguishes attributes from child elements with the same name.

Why are all values strings in the JSON output?

XML has no data types — everything is text. Converters preserve this by outputting strings. For typed values, post-process with type inference or use XML Schema information.

How do I handle XML namespaces?

Namespace prefixes are preserved in key names: <ns:element> becomes "ns:element". Namespace declarations become @xmlns attributes. For cleaner output, configure your converter to strip or expand namespace prefixes.

Can I convert JSON back to XML?

Yes, using a JSON to XML converter. Roundtrips may not be perfect if the original XML had features like comments, processing instructions, or specific attribute ordering.
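
For a sense of what the reverse direction involves, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. It assumes the @ prefix for attributes and #text for text content. build_element is a hypothetical helper, and comments, processing instructions, and mixed-content ordering are not reconstructed.

```python
import xml.etree.ElementTree as ET


def build_element(tag, value, attr_prefix="@", text_key="#text"):
    """Build an element from a converted value (dict, list item, string, or None)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith(attr_prefix):
                elem.set(key[len(attr_prefix):], str(child))
            elif key == text_key:
                elem.text = str(child)
            else:
                # A list means the tag repeats once per item
                items = child if isinstance(child, list) else [child]
                for item in items:
                    elem.append(build_element(key, item, attr_prefix, text_key))
    elif value is not None:
        elem.text = str(value)
    return elem


data = {"user": {"@id": "123", "name": "Alice", "roles": {"role": ["admin", "editor"]}}}
root_tag, root_value = next(iter(data.items()))
xml = ET.tostring(build_element(root_tag, root_value), encoding="unicode")
print(xml)
```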

How do repeated XML elements become JSON arrays?

Multiple siblings with the same tag name automatically become an array: three <item> elements become "item": ["a", "b", "c"]. Enable "always array" mode for consistent structure.

What happens to CDATA sections?

CDATA wrappers are removed, and the content becomes a plain JSON string. The unescaped characters inside CDATA are preserved as-is.

How do I convert SOAP responses?

Convert the full SOAP envelope, then extract the body content from the nested structure. Strip namespace prefixes for cleaner JSON. Many SOAP libraries and middleware handle this automatically.

Which is better for APIs: XML or JSON?

JSON dominates modern REST APIs due to smaller size and JavaScript compatibility. XML is still used for SOAP services, enterprise integrations, and where schema validation is critical. Most new projects choose JSON.

