Last updated: March 18, 2024
XML is a human-readable markup language. However, if it’s not well-formatted, it isn’t easy to read or understand. For example, an XML file containing a single long line or XML without element indentations is difficult to visually comprehend. This is especially true when we want to display it in the Linux console.
In this tutorial, we’ll address several ways to pretty-print an XML file using Linux commands.
First of all, let’s take a look at an XML file, emails.xml, that we’ll use in our examples:
<emails> <email> <from>Kai</from> <to>Amanda</to> <time>2018-03-05</time>
<subject>I am flying to you</subject></email> <email>
<from>Jerry</from> <to>Tom</to> <time>1992-08-08</time> <subject>Hey Tom, catch me if you can!</subject>
</email> </emails>
The emails.xml file is well-formed XML. However, since it isn’t nicely formatted, it’s tough to read and understand.
We’ll take this file as an input example, and pretty-print it in the command line.
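To follow along, one way to recreate the sample file is a quoted here-document. This is only a convenience sketch; the content is copied from the listing above, and the file is written to the current directory:

```shell
# Recreate the sample file in the current directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything in the body.
cat > emails.xml <<'EOF'
<emails> <email> <from>Kai</from> <to>Amanda</to> <time>2018-03-05</time>
<subject>I am flying to you</subject></email> <email>
<from>Jerry</from> <to>Tom</to> <time>1992-08-08</time> <subject>Hey Tom, catch me if you can!</subject>
</email> </emails>
EOF
```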
There are many ways to format and output an XML file. In this tutorial, we’re going to address three command-line XML utilities: xmllint, XMLStarlet, and xml_pp.
Now, let’s print the emails.xml in a human-readable format.
The xmllint command is a member of the libxml2 package. Usually, we can use it to check if XML files are valid, parse XML files, or evaluate XPath expressions.
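As a quick illustration of those other uses, the sketch below checks that a file parses and then evaluates an XPath expression. It assumes xmllint from a reasonably recent libxml2 is installed and that emails.xml sits in the current directory; is_well_formed is a hypothetical helper name, while --noout and --xpath are regular xmllint options:

```shell
# is_well_formed FILE: succeed if xmllint can parse FILE.
# --noout suppresses the normal output, so only the exit status matters.
is_well_formed() {
    xmllint --noout "$1" 2>/dev/null
}

if is_well_formed emails.xml; then
    # Print the sender of the first email as a plain string
    xmllint --xpath 'string(//email[1]/from)' emails.xml
else
    echo "emails.xml is missing or not well-formed" >&2
fi
```

With the sample file, the XPath query should print the first sender, Kai.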
The xmllint utility has the --format option. With this option, we can reformat and reindent the XML. The syntax is straightforward:
xmllint --format XML_FILE
Let’s reformat our emails.xml using the xmllint command:
$ xmllint --format emails.xml
We get the output:
<?xml version="1.0"?>
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>
Now, the data in XML is much easier to read and understand.
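The formatted result doesn’t have to go to the terminal. As a sketch, assuming xmllint is installed, the hypothetical helper below rewrites a file with its pretty-printed version. It goes through a temporary file because a plain "> FILE" redirect would truncate the input before xmllint reads it:

```shell
# format_in_place FILE: pretty-print FILE and overwrite it with the result,
# but only if xmllint succeeded. (Illustrative helper, not part of xmllint.)
format_in_place() {
    tmp=$(mktemp) || return 1
    if xmllint --format "$1" > "$tmp"; then
        cat "$tmp" > "$1"    # overwrite the contents, keeping the file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage:
#   format_in_place emails.xml
```

If we only want a separate copy, xmllint also accepts --output FILE, and it reads from standard input when the file name is given as -.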
We also see the command adds the XML declaration <?xml version="1.0"?>, even though we don’t have it in our input file.
We can easily reformat XML files using the xmllint command together with the --format option. The default indent is two spaces. However, we can change it by setting the XMLLINT_INDENT environment variable.
Let’s reformat and print the emails.xml again. This time, let’s set four spaces as the indent:
$ XMLLINT_INDENT="    " xmllint --format emails.xml
The output of the command is:
<?xml version="1.0"?>
<emails>
    <email>
        <from>Kai</from>
        <to>Amanda</to>
        <time>2018-03-05</time>
        <subject>I am flying to you</subject>
    </email>
    <email>
        ...
    </email>
</emails>
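Since xmllint reads XMLLINT_INDENT from its environment, the variable has to actually reach the child process: a plain shell assignment followed by ; isn’t exported. The sketch below contrasts the three forms, using sh -c as a stand-in child process so that it runs even without xmllint; the probe variable is purely illustrative:

```shell
# A tiny child command that reports what it sees in XMLLINT_INDENT
probe='printf "[%s]\n" "${XMLLINT_INDENT-not set}"'

# 1. Plain assignment: a shell variable only, invisible to child processes
unset XMLLINT_INDENT
XMLLINT_INDENT="    "; sh -c "$probe"        # prints [not set]

# 2. Prefix assignment: placed in the environment of this one command
unset XMLLINT_INDENT
XMLLINT_INDENT="    " sh -c "$probe"         # prints [    ]

# 3. Export: visible to every later command in this shell session
export XMLLINT_INDENT="    "
sh -c "$probe"                               # prints [    ]
```

The prefix form is handy for one-off runs, whereas export is more convenient when we format many files in the same session.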
XMLStarlet is a command-line XML toolkit. It contains one executable called xml. Using this command, we can transform, query, validate, and edit XML documents and files.
Let’s take a look at the syntax for using the xml command:
xml [<options>] <command> [<cmd-options>]
We can use the format command (or the short form, fo) to reformat an XML file:
$ xml format emails.xml
It outputs:
<?xml version="1.0"?>
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>
As the output above shows, our emails.xml is pretty-printed. As with the xmllint command, the default indentation is two spaces.
Also like xmllint, the command adds the XML declaration if it’s missing from our input.
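One practical caveat: depending on how XMLStarlet was packaged, the executable may be installed as xmlstarlet rather than xml (the Debian and Ubuntu packages, for example, use xmlstarlet). A small sketch that picks whichever name is available; xml_cmd is an illustrative helper, not part of XMLStarlet:

```shell
# xml_cmd: print the name of the XMLStarlet executable found on PATH,
# or return non-zero if neither name is available.
xml_cmd() {
    for name in xmlstarlet xml; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

if XML=$(xml_cmd); then
    "$XML" format emails.xml
else
    echo "XMLStarlet does not appear to be installed" >&2
fi
```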
Next, let’s have a look at what format options the xml command provides.
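The xml format command has four options to control the output: -n (--noindent) disables indentation, -t (--indent-tab) indents with tabs, -s (--indent-spaces) <num> indents with <num> spaces, and -o (--omit-decl) omits the XML declaration.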
Let’s launch the xml format command with our emails.xml file again, and this time, we want to indent the output with eight spaces and omit the XML declaration:
$ xml fo -o -s 8 emails.xml
It outputs:
<emails>
        <email>
                <from>Kai</from>
                <to>Amanda</to>
                <time>2018-03-05</time>
                <subject>I am flying to you</subject>
        </email>
        <email>
                ...
        </email>
</emails>
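The other indentation switches work the same way. As a sketch, assuming the executable is on the PATH under the name xml: -t (--indent-tab) indents with tabs and -n (--noindent) turns indentation off, both as listed by xml fo --help:

```shell
if command -v xml >/dev/null 2>&1; then
    # Indent with one tab per nesting level
    xml fo -t emails.xml

    # No indentation, and no XML declaration
    xml fo -n -o emails.xml
else
    echo "the xml executable was not found on PATH" >&2
fi
```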
The xml_pp command is shipped with the Perl module XML::Twig. The name xml_pp stands for “XML Pretty-Printer”.
As its name suggests, xml_pp is designed to print XML documents in a pretty format. The syntax is straightforward:
xml_pp [options] XML_FILES
Let’s see if it can pretty-print our emails.xml:
$ xml_pp emails.xml
The command prints:
<emails>
  <email>
    <from>Kai</from>
    <to>Amanda</to>
    <time>2018-03-05</time>
    <subject>I am flying to you</subject>
  </email>
  <email>
    <from>Jerry</from>
    <to>Tom</to>
    <time>1992-08-08</time>
    <subject>Hey Tom, catch me if you can!</subject>
  </email>
</emails>
The output shows the indentation is two space characters here as well.
Also, if we look at the beginning of the output, the XML declaration is not added by default if our input doesn’t have one.
Unlike xmllint and xml format, xml_pp doesn’t let us set the indentation width: it reads no environment variable for it, and it provides no option to change it.
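The xml_pp utility supports options to control other aspects of the output, such as -s <style> to choose a pretty-print style (for example, nice, indented, or record) and -i to edit the file in place.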
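For instance, assuming xml_pp from XML::Twig is installed, -s selects one of the pretty-print styles described in its manual page, and -i edits the file in place, keeping a backup when an extension is attached directly to the flag. The file name emails-copy.xml is only an illustration:

```shell
if command -v xml_pp >/dev/null 2>&1; then
    # Print using the "record" style instead of the default one
    xml_pp -s record emails.xml

    # Reformat a copy in place; the untouched copy is kept as emails-copy.xml.bak
    cp emails.xml emails-copy.xml
    xml_pp -i.bak emails-copy.xml
else
    echo "xml_pp was not found on PATH" >&2
fi
```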
In this article, we’ve addressed how to pretty-print an XML file using some handy utilities, such as xmllint, XMLStarlet, and xml_pp.