Tag Archives: XML - Page 4

Create individual xml files from one file with many xmls

Say you have a file with a bunch of nicely formatted xml documents inside and you want them all as individual files. I come across this problem quite often, so I thought it would be a good idea to have a script here showing how I usually solve it (PHP >= 4.3).

<?php
// Check the number of arguments (script name + 4 parameters)
if ($argc != 5) {
    echo "Usage: splitxml <filename> <starttag> <endtag> <path>\n";
    echo "<filename> - filename of file with xmls in\n";
    echo "<starttag> - first tag of the xml you want to extract\n";
    echo "<endtag> - last tag of the xml you want to extract\n";
    echo "<path> - where you want the files to be created\n";
    exit(1);
}

// Read file into an array, one element per row
$xml = file($argv[1]);

$starttag = $argv[2];
$endtag = $argv[3];
$resultpath = $argv[4];

$tmp = array();
$counter = 0;
$recording = false;

foreach ($xml as $row) {
    // Found start tag - start "recording"
    if (strpos($row, $starttag) !== false) {
        $recording = true;
    }
    // Save all rows if "recording"
    if ($recording) {
        $tmp[] = $row;
    }
    // Found end tag - stop "recording" and save file
    if ($recording && strpos($row, $endtag) !== false) {
        $fp = fopen($resultpath.'/file'.$counter++.'.xml', 'w');
        fwrite($fp, implode("", $tmp));
        fclose($fp);
        $tmp = array();
        $recording = false;
    }
}

Here is an example of how the file can look (books.xml):

<book>
    <author>Astrid Lindgren</author>
</book>
<book>
    <title>Barnen i Bullerbyn</title>
    <author>Astrid Lindgren</author>
</book>

To get each book xml into its own file, just run the above script with the following parameters:

php splitxml books.xml '<book>' '</book>' books/

This will create the files file0.xml and file1.xml in the directory books/.

file0.xml:

<book>
    <author>Astrid Lindgren</author>
</book>

file1.xml:

<book>
    <title>Barnen i Bullerbyn</title>
    <author>Astrid Lindgren</author>
</book>
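
The "recording" logic can also be tried without touching the file system. The following is only a sketch under that assumption: splitXmlRows() is a hypothetical helper name, not part of the script, and it returns the extracted documents as strings instead of writing them to disk.

```php
<?php
// Hypothetical helper: the same start/end-tag scan as the script,
// but working on an array of rows and returning the documents.
function splitXmlRows(array $rows, $starttag, $endtag)
{
    $docs = array();
    $tmp = array();
    $recording = false;

    foreach ($rows as $row) {
        // Found start tag - start "recording"
        if (strpos($row, $starttag) !== false) {
            $recording = true;
        }
        // Save all rows while "recording"
        if ($recording) {
            $tmp[] = $row;
        }
        // Found end tag - stop "recording" and keep the document
        if ($recording && strpos($row, $endtag) !== false) {
            $docs[] = implode("", $tmp);
            $tmp = array();
            $recording = false;
        }
    }
    return $docs;
}

// Sample rows, in the shape file() would return them
$rows = array(
    "<book>\n",
    "    <author>Astrid Lindgren</author>\n",
    "</book>\n",
    "<book>\n",
    "    <title>Barnen i Bullerbyn</title>\n",
    "    <author>Astrid Lindgren</author>\n",
    "</book>\n",
);
$docs = splitXmlRows($rows, '<book>', '</book>');
echo count($docs), "\n"; // prints 2
```

Because the scan works row by row, it relies on every start and end tag sitting on its own row. A file where several documents share one row would need a real XML parser instead.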

Create XML document with PHP SimpleXML

In my work I often need to create new XML documents, and since SimpleXML arrived in PHP 5 this has become very easy (the addChild() and addAttribute() methods used here need PHP 5.1.3 or later). Before PHP 5 I used the DOM XML extension, which can be a little cumbersome for small and simple XML documents. Here I create a simple XML document as an example of how I now normally do it.

$xml = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><mydoc></mydoc>');

$xml->addAttribute('version', '1.0');
$xml->addChild('datetime', date('Y-m-d H:i:s'));

$person = $xml->addChild('person');
$person->addChild('firstname', 'Someone');
$person->addChild('secondname', 'Something');
$person->addChild('telephone', '123456789');
$person->addChild('email', 'me@something.com');

$address = $person->addChild('address');
$address->addChild('homeaddress', 'Andersgatan 2, 432 10 Göteborg');
$address->addChild('workaddress', 'Andersgatan 3, 432 10 Göteborg');

echo $xml->asXML();

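This will create an XML document looking like this (indented here for readability; asXML() itself puts the whole root element on one row):

<?xml version="1.0" encoding="utf-8"?>
<mydoc version="1.0">
  <datetime>2010-12-12 16:45:12</datetime>
  <person>
    <firstname>Someone</firstname>
    <secondname>Something</secondname>
    <telephone>123456789</telephone>
    <email>me@something.com</email>
    <address>
      <homeaddress>Andersgatan 2, 432 10 Göteborg</homeaddress>
      <workaddress>Andersgatan 3, 432 10 Göteborg</workaddress>
    </address>
  </person>
</mydoc>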
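
Reading values back out of such a document is just as short. As a minimal sketch, assuming a trimmed-down version of the document built above: child elements are reached as properties and attributes with array syntax, and a cast to string gives the text value.

```php
<?php
// Build a small document the same way as in the example
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><mydoc></mydoc>');
$xml->addAttribute('version', '1.0');

$person = $xml->addChild('person');
$person->addChild('firstname', 'Someone');

$address = $person->addChild('address');
$address->addChild('homeaddress', 'Andersgatan 2, 432 10 Göteborg');

// Elements are read as properties, attributes with array syntax.
// Without the cast you get a SimpleXMLElement object, not a string.
$firstname = (string) $xml->person->firstname;
$home = (string) $xml->person->address->homeaddress;
$version = (string) $xml['version'];

echo $firstname, "\n"; // prints Someone
echo $home, "\n";
echo $version, "\n";   // prints 1.0
```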

From one row xml to nice print xml in a jiffy with xmllint

When I get xml documents to work with, they are usually all on one row. This makes them impossible (or at least very hard) to read. One solution I found is xmllint with its superb --format option, which makes xmllint output the one-row xml as a nicely indented, multi-row xml document.

Our ugly one row xml (ugly.xml):

<?xml version="1.0" encoding="utf-8"?><movies><movie><title>The Brave One</title><genre>Thriller</genre></movie><movie><title>Instinct</title><genre>Drama</genre></movie></movies>

Run the command:

xmllint --format ugly.xml

or (if you have no file):

echo '<?xml version="1.0" encoding="utf-8"?><movies><movie><title>The Brave One</title><genre>Thriller</genre></movie><movie><title>Instinct</title><genre>Drama</genre></movie></movies>' | xmllint --format -

And you get:

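<?xml version="1.0" encoding="utf-8"?>
<movies>
  <movie>
    <title>The Brave One</title>
    <genre>Thriller</genre>
  </movie>
  <movie>
    <title>Instinct</title>
    <genre>Drama</genre>
  </movie>
</movies>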

Much better!
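
If xmllint is not at hand, something similar can be done from PHP. This is only a sketch and not what xmllint does internally: it uses PHP's DOM extension, and formatXml() is a hypothetical helper name chosen for the example.

```php
<?php
// Hypothetical helper: re-indent an XML string with PHP's DOM extension
function formatXml($xmlString)
{
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // ignore whitespace-only text nodes on load
    $dom->formatOutput = true;        // add line breaks and indentation on save
    $dom->loadXML($xmlString);
    return $dom->saveXML();
}

$ugly = '<?xml version="1.0" encoding="utf-8"?><movies><movie><title>The Brave One</title><genre>Thriller</genre></movie><movie><title>Instinct</title><genre>Drama</genre></movie></movies>';

echo formatXml($ugly);
```

Both properties have to be set before loadXML() is called, otherwise existing whitespace in the input can keep the output from being re-indented.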

Pipe result into Vim

xmllint --format ugly.xml | vim -

Note the trailing '-' sign: it tells vim to read its input from standard input.