How To Parse and Convert XML to CSV using Pythonby Shahid Ayoub
XML is an acronym standing for Extensible Markup Language. XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.
XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose – ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
As the data in XML format is not readable by general users, we may require to change it to some user friendly format like CSV. Since, CSV files can easily be opened using LibreOffice Calc in Ubuntu or Microsoft Excel in Windows, the need for XML to CSV conversion is high.
Now let us take an example of parsing dummy resident data of a state in XML format and convert it to CSV. The XML data(resident_data.xml) for the state will look like,
<State> <Resident Id="100"> <Name>Sample Name</Name> <PhoneNumber>1234567891</PhoneNumber> <EmailAddress>firstname.lastname@example.org</EmailAddress> <Address> <StreetLine1>Street Line1</StreetLine1> <City>City Name</City> <StateCode>AE</StateCode> <PostalCode>12345</PostalCode> </Address> </Resident> <Resident Id="101"> <Name>Sample Name1</Name> <PhoneNumber>1234567891</PhoneNumber> <EmailAddress>email@example.com</EmailAddress> <Address> <StreetLine1>Current Address</StreetLine1> <City>Los Angeles</City> <StateCode>CA</StateCode> <PostalCode>56666</PostalCode> </Address> </Resident> . . . . </State>
CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV file format separates values using commas as delimiters. For XML data, tags will be headers for the CSV file and values the descriptive data.
Parsing and conversion to CSV of above mentioned XML file will be done by the following python code:
import xml.etree.ElementTree as ET import csv tree = ET.parse("resident_data.xml") root = tree.getroot() # open a file for writing Resident_data = open('/tmp/ResidentData.csv', 'w') # create the csv writer object csvwriter = csv.writer(Resident_data) resident_head =  count = 0 for member in root.findall('Resident'): resident =  address_list =  if count == 0: name = member.find('Name').tag resident_head.append(name) PhoneNumber = member.find('PhoneNumber').tag resident_head.append(PhoneNumber) EmailAddress = member.find('EmailAddress').tag resident_head.append(EmailAddress) Address = member.tag resident_head.append(Address) csvwriter.writerow(resident_head) count = count + 1 name = member.find('Name').text resident.append(name) PhoneNumber = member.find('PhoneNumber').text resident.append(PhoneNumber) EmailAddress = member.find('EmailAddress').text resident.append(EmailAddress) Address = member.text address_list.append(Address) City = member.text address_list.append(City) StateCode = member.text address_list.append(StateCode) PostalCode = member.text address_list.append(PostalCode) resident.append(address_list) csvwriter.writerow(resident) Resident_data.close()
The CSV file will look like this for few records :
|Sample Namefirstname.lastname@example.org||[‘Street Line1′, ‘City Name’, ‘AE’, ‘12345’]|
|Sample Name1email@example.com||[‘Current Address’, ‘Los Angeles’, ‘CA’, ‘56666’]|
The python program written above will open a CSV file in tmp folder and write content of XML file into it and close it at the end. Make sure to close the file at the end in order to save contents.
XML files can be of much more complex structure and for this we just need to modify the above code accordingly.
I hope this was useful and for any query drop a comment here I will get back to you.