Quick PHP Tip: How to parse CDATA sections using SimpleXML
Applies to: simplexml_load_string and simplexml_load_file
Problem: text inside CDATA sections seems to disappear when an XML document is loaded with SimpleXML.
Consider the XML below:
$str = '<rootNode>';
$str .= '<childNode>some text goes here</childNode>';
$str .= '</rootNode>';
To parse it, we use the following call:
$xml = simplexml_load_string($str);
Printing the result with print_r($xml) gives:
SimpleXMLElement Object ( [childNode] => some text goes here )
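A complete, runnable version of this first example, as a sketch assuming a current PHP CLI. Besides print_r(), it reads the element as a string, which is the usual way to get at its text:

```php
<?php
// Sample document from the post: plain text, no CDATA.
$str = '<rootNode>';
$str .= '<childNode>some text goes here</childNode>';
$str .= '</rootNode>';

// Parse with the default class and options.
$xml = simplexml_load_string($str);

// Dump the object structure (the print_r output shown above).
print_r($xml);

// Casting an element to string returns its text content.
echo (string) $xml->childNode, "\n"; // some text goes here
```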
That's fine. Now take the same XML, but this time with the text wrapped in a CDATA section:
$str = '<rootNode>';
$str .= '<childNode><![CDATA[some text goes here]]></childNode>';
$str .= '</rootNode>';
Printing this with print_r() gives:
SimpleXMLElement Object ( [childNode] => SimpleXMLElement Object ( ) )
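Yes, it's empty. The text is not actually lost, though: by default SimpleXML keeps the CDATA section as a separate CDATA node, and print_r() only displays ordinary text nodes, so childNode shows up as an empty object. Reading the element as a string, e.g. (string) $xml->childNode, still returns the text. The problem bites when you dump, convert, or serialize the parsed object (json_encode() drops the text in the same way).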
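A short script to check this yourself. It is a sketch: the json_encode() line is only there to show another place where the CDATA text goes missing without the fix described next.

```php
<?php
// Same document, but the text is wrapped in a CDATA section.
$str = '<rootNode>';
$str .= '<childNode><![CDATA[some text goes here]]></childNode>';
$str .= '</rootNode>';

// Parsed without any libxml options.
$xml = simplexml_load_string($str);

// print_r() shows childNode as an empty SimpleXMLElement...
print_r($xml);

// ...yet a string cast still reads the CDATA content.
$text = (string) $xml->childNode;
echo $text, "\n"; // some text goes here

// json_encode() sees the object the way print_r() does, so the text is missing here.
echo json_encode($xml), "\n";
```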
Solution: pass LIBXML_NOCDATA as the third parameter when parsing.
simplexml_load_string() (and simplexml_load_file()) accept up to five parameters; the first three are the ones that matter here:
- The XML string to parse (for simplexml_load_file(), the path of the file to load)
- Optional: the name of the class of the returned object, which should extend SimpleXMLElement (by default a SimpleXMLElement object is returned)
- Optional: libxml options, given as a bitmask of LIBXML_* constants. This parameter provides the solution to our CDATA problem.
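The sketch below exercises the second and third parameters together. MyElement and its text() helper are hypothetical names made up for this example; the only real requirement is that the class extends SimpleXMLElement. Several libxml options can be combined with a bitwise OR; LIBXML_NOBLANKS is added here only to show that.

```php
<?php
// Hypothetical subclass, used only to illustrate the second ($class_name) parameter.
class MyElement extends SimpleXMLElement
{
    // Hypothetical convenience helper: return this element's text content.
    public function text(): string
    {
        return (string) $this;
    }
}

$str = '<rootNode><childNode><![CDATA[some text goes here]]></childNode></rootNode>';

// 1st: the XML string, 2nd: class of the returned object, 3rd: libxml options.
$xml = simplexml_load_string($str, 'MyElement', LIBXML_NOCDATA | LIBXML_NOBLANKS);

// Child elements are returned as the same class, so text() is available on them too.
echo get_class($xml), "\n";         // MyElement
echo $xml->childNode->text(), "\n"; // some text goes here
```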
With LIBXML_NOCDATA, libxml merges CDATA sections into ordinary text nodes, which SimpleXML then handles like any other text:
$xml = simplexml_load_string($str,'SimpleXMLElement', LIBXML_NOCDATA);
Printing the result now gives the expected output:
SimpleXMLElement Object ( [childNode] => some text goes here )
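To see what "merged into text nodes" means in practice, this sketch parses the CDATA document twice and serializes both trees back with asXML(). The comments show only the childNode part of each serialized document, not the full output.

```php
<?php
$str = '<rootNode><childNode><![CDATA[some text goes here]]></childNode></rootNode>';

// Without the flag the CDATA section stays a CDATA node.
$plain = simplexml_load_string($str);
// With LIBXML_NOCDATA libxml turns it into an ordinary text node.
$merged = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);

// asXML() re-serializes each tree, which makes the difference visible.
echo $plain->asXML();  // contains <childNode><![CDATA[some text goes here]]></childNode>
echo $merged->asXML(); // contains <childNode>some text goes here</childNode>

// Conversions such as json_encode() now see the text as well.
echo json_encode($merged), "\n";
```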
Please note that the third parameter requires PHP 5.1 or later, compiled with libxml.
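The same options work with simplexml_load_file(), whose first argument is a file path instead of an XML string. A minimal sketch, assuming the XML lives in a temporary file created just for this demo:

```php
<?php
// Write the CDATA sample to a temporary file (demo only).
$str = '<rootNode><childNode><![CDATA[some text goes here]]></childNode></rootNode>';
$path = tempnam(sys_get_temp_dir(), 'cdata');
file_put_contents($path, $str);

// Same class name and options parameters as simplexml_load_string().
$xml = simplexml_load_file($path, 'SimpleXMLElement', LIBXML_NOCDATA);
unlink($path); // clean up the temporary file

// Both loader functions return false if the document cannot be parsed.
if ($xml === false) {
    exit("Could not parse the XML file\n");
}

print_r($xml); // childNode now prints as a plain string
```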
I will be publishing a new post shortly on how to use SimpleXML to parse XML and extract data.