Monday, May 9, 2016

Reading XML in Perl

You could read XML by brute force, but fortunately there are libraries that do it quickly. Here's a sample script that reads XML with XML::LibXML::Reader and also pretty-prints the XML embedded inside each user element, which had been encoded (that is, < is stored as &lt;). The embedded XML is parsed with a second parser wrapped in eval. XML::LibXML dies when it hits malformed markup, so without eval the first bad record would stop the whole script instead of being skipped. The code below shows how it's done.
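    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;
    use XML::LibXML::Reader;   # exports the XML_READER_TYPE_* constants
    use HTML::Entities;        # exports decode_entities

    my $reader = XML::LibXML::Reader->new(location => 'test.xml')
        or die "cannot read test.xml: $!";
    my $parser = XML::LibXML->new();   # second parser for the embedded XML
    my $inUser = 0;

    while ($reader->read) {
        # Track whether we are between <user> and </user>; an empty <user/>
        # has no end tag, so it must not switch the flag on.
        if ($reader->name eq 'user') {
            $inUser = $reader->nodeType == XML_READER_TYPE_ELEMENT
                      && !$reader->isEmptyElement;
        }
        if ($inUser and $reader->nodeType == XML_READER_TYPE_TEXT) {
            print $reader->nodePath, "\n";
            # The reader already turns &lt; into <; decode_entities removes a
            # second layer of escaping (e.g. &amp;lt;) if the data has one.
            my $userXml = decode_entities($reader->value);
            # Drop a leading "Comment: ...;" line that is not part of the XML.
            $userXml =~ s/^Comment:.*;\n//;
            # load_xml dies on malformed markup; eval keeps the loop going.
            my $doc = eval { $parser->load_xml(string => $userXml) };
            if (!$doc) {
                warn "skipping malformed XML at ", $reader->nodePath, ": $@";
                next;
            }
            print $doc->toString(1), "\n";   # 1 = indent the output
        }
    }
    $reader->finish;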
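The script above needs test.xml plus two CPAN modules, so here is a small core-Perl sketch of the two steps inside its loop that you can run anywhere. The helpers unescape_xml and root_element are hypothetical stand-ins, not part of XML::LibXML or HTML::Entities: unescape_xml undoes one level of escaping (only the five predefined XML entities and numeric character references, far fewer than decode_entities knows), and root_element merely pretends to parse by dying when the text does not start with a tag, the way load_xml dies on malformed markup. The sample payloads are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: undo one level of XML escaping with core Perl only.
sub unescape_xml {
    my ($text) = @_;
    my %named = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    # One left-to-right pass: the "&" produced from "&amp;" is never rescanned,
    # so "&amp;lt;" becomes "&lt;" rather than "<".
    $text =~ s{&(?:(lt|gt|quot|apos|amp)|#([0-9]+)|#x([0-9A-Fa-f]+));}
              {defined $1 ? $named{$1} : defined $2 ? chr($2) : chr(hex($3))}ge;
    return $text;
}

# Hypothetical stand-in for XML::LibXML->load_xml: it only checks for a
# leading tag name and dies otherwise, as the real parser dies on bad input.
sub root_element {
    my ($xml) = @_;
    die "no root element found\n" unless $xml =~ /^\s*<([A-Za-z_][\w.-]*)/;
    return $1;
}

my @payloads = (
    '&lt;user&gt;Tom &amp;amp; Jerry&lt;/user&gt;',
    'Comment: not XML at all;',
    '&lt;admin/&gt;',
);
my @roots;
for my $escaped (@payloads) {
    my $xml = unescape_xml($escaped);
    my $root;
    # eval traps the die, so one bad payload does not end the loop.
    unless (eval { $root = root_element($xml); 1 }) {
        warn "skipping payload: $@";
        next;
    }
    push @roots, $root;
}
print "@roots\n";    # prints: user admin
```

The single substitution pass is the point of unescape_xml: decoding &amp; in a separate, earlier pass would turn &amp;lt; all the way into < instead of &lt;.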
