Quick PHP Tip: How to parse CDATA sections using SimpleXML

Applies to: simplexml_load_string and simplexml_load_file

Problem: Text inside CDATA sections appears to be missing when an XML document is parsed with SimpleXML and printed.

Consider the XML below:

 
$str = '<rootNode>';
$str.='<childNode>some text goes here</childNode>';
$str.='</rootNode>';

To parse it, we use the following syntax:

$xml = simplexml_load_string($str);
print_r($xml);

Printing it outputs:

 
SimpleXMLElement Object
(
    [childNode] => some text goes here
)

That's OK. Now take the same XML, but this time with the text enclosed in a CDATA section.

$str = '<rootNode>';
$str.='<childNode><![CDATA[some text goes here]]></childNode>';
$str.='</rootNode>';
$xml = simplexml_load_string($str);
print_r($xml);

Printing this gives the following output:

 
SimpleXMLElement Object
(
    [childNode] => SimpleXMLElement Object
        (
        )
 
)

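Yes, it's empty. This is because print_r only displays plain text nodes, and by default the CDATA section is kept as a separate node type, so the dump looks as if SimpleXML had ignored everything inside CDATA. The text is not actually lost: casting the element to a string, as in (string) $xml->childNode, still returns it.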

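Solution: Set the 3rd parameter to LIBXML_NOCDATA while parsing.

Both simplexml_load_string and simplexml_load_file accept optional parameters after the XML source: the 2nd is the name of the class to return (SimpleXMLElement by default) and the 3rd is a set of libxml options.

With LIBXML_NOCDATA as the 3rd parameter, CDATA sections are merged into ordinary text nodes, so their content shows up like any other text.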

 
$xml = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);

This will output as desired:

 
SimpleXMLElement Object
(
    [childNode] => some text goes here
)
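The same second and third arguments can be passed to simplexml_load_file. The snippet below is only a minimal sketch: the temporary file and the variable names $path and $text are assumptions made for illustration, not part of the original example.

```php
<?php
// Write the sample document to a temporary file (illustrative path only).
$path = tempnam(sys_get_temp_dir(), 'cdata');
file_put_contents(
    $path,
    '<rootNode><childNode><![CDATA[some text goes here]]></childNode></rootNode>'
);

// Same class name and options arguments as with simplexml_load_string.
$xml = simplexml_load_file($path, 'SimpleXMLElement', LIBXML_NOCDATA);
unlink($path);

// Both loader functions return false when the document cannot be parsed.
if ($xml === false) {
    exit("Could not parse the XML file\n");
}

$text = (string) $xml->childNode;
echo $text, "\n"; // prints "some text goes here"
```

Checking the return value against false is worth keeping in real code, because a malformed file would otherwise only fail later, when a property is accessed on a non-object.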

Please note that using the third parameter requires PHP >=5.1 compiled with libxml.
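If only individual values are needed, the flag may not be required at all, since casting an element to a string returns its text content, including CDATA. A minimal sketch, assuming the same sample document as above (the variable name $text is made up for this example):

```php
<?php
// Same document as above, parsed without any libxml options.
$str = '<rootNode>';
$str .= '<childNode><![CDATA[some text goes here]]></childNode>';
$str .= '</rootNode>';

$xml = simplexml_load_string($str);

// The string cast collects the element's text, CDATA sections included.
$text = (string) $xml->childNode;
echo $text, "\n"; // prints "some text goes here"
```

LIBXML_NOCDATA remains the more convenient choice when the whole object is dumped for inspection, as in the print_r output shown earlier.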

I will be publishing a new post shortly on how to use SimpleXML to parse xml and extract data.
