How To Parse and Convert XML to CSV using Python

by Shahid Ayoub

XML is an acronym standing for Extensible Markup Language. XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.

XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two classes for this purpose – ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.

As the data in XML format is not readable by general users, we may require to change it to some user friendly format like CSV. Since, CSV files can easily be opened using LibreOffice Calc in Ubuntu or Microsoft Excel in Windows, the need for XML to CSV conversion is high.

Now let us take an example of parsing dummy resident data of a state in XML format and convert it to CSV. The XML data(resident_data.xml) for the state will look like,

<State>
<Resident Id="100">
<Name>Sample Name</Name>
	<PhoneNumber>1234567891</PhoneNumber>
	<EmailAddress>sample_name@example.com</EmailAddress>
	<Address>
		<StreetLine1>Street Line1</StreetLine1>
		<City>City Name</City>
		<StateCode>AE</StateCode>
		<PostalCode>12345</PostalCode>
	</Address>
</Resident>
<Resident Id="101">
	<Name>Sample Name1</Name>
	<PhoneNumber>1234567891</PhoneNumber>
	<EmailAddress>sample_name1@example.com</EmailAddress>
	<Address>
		<StreetLine1>Current Address</StreetLine1>
		<City>Los Angeles</City>
		<StateCode>CA</StateCode>
		<PostalCode>56666</PostalCode>
	</Address>
</Resident>
.
.
.
.
</State>

CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV file format separates values using commas as delimiters. For XML data, tags will be headers for the CSV file and values the descriptive data.

Parsing and conversion to CSV of above mentioned XML file will be done by the following python code:

import xml.etree.ElementTree as ET
import csv

tree = ET.parse("resident_data.xml")
root = tree.getroot()

# open a file for writing

Resident_data = open('/tmp/ResidentData.csv', 'w')

# create the csv writer object

csvwriter = csv.writer(Resident_data)
resident_head = []

count = 0
for member in root.findall('Resident'):
	resident = []
	address_list = []
	if count == 0:
		name = member.find('Name').tag
		resident_head.append(name)
		PhoneNumber = member.find('PhoneNumber').tag
		resident_head.append(PhoneNumber)
		EmailAddress = member.find('EmailAddress').tag
		resident_head.append(EmailAddress)
		Address = member[3].tag
		resident_head.append(Address)
		csvwriter.writerow(resident_head)
		count = count + 1

	name = member.find('Name').text
	resident.append(name)
	PhoneNumber = member.find('PhoneNumber').text
	resident.append(PhoneNumber)
	EmailAddress = member.find('EmailAddress').text
	resident.append(EmailAddress)
	Address = member[3][0].text
	address_list.append(Address)
	City = member[3][1].text
	address_list.append(City)
	StateCode = member[3][2].text
	address_list.append(StateCode)
	PostalCode = member[3][3].text
	address_list.append(PostalCode)
	resident.append(address_list)
	csvwriter.writerow(resident)
Resident_data.close()

The CSV file will look like this for few records :

Name PhoneNumber EmailAddress Address
Sample Name 1234567891 sample_name@example.com [‘Street Line1’, ‘City Name’, ‘AE’, ‘12345’]
Sample Name1 1234567891 sample_name1@example.com [‘Current Address’, ‘Los Angeles’, ‘CA’, ‘56666’]

 

The python program written above will open a CSV file in tmp folder and write content of XML file into it and close it at the end. Make sure to close the file at the end in order to save contents.

XML files can be of much more complex structure and for this we just need to modify the above code accordingly.

I hope this was useful and for any query drop a comment here I will get back to you.

6 thoughts on “How To Parse and Convert XML to CSV using Python”

  1. Hello,
    Thanks for the neat script. I was wondering , If I wanted to extract value of Resident ID , how would I go about doing it.. and also if the same tag had multiple values for eg : , how do i extract a,b

    Thanks

    1. You can use attrib to retrieve id or any other attribute of an element. Moreover if there are multiple values within a tag you can use a python script to extract values on the basis of delimiter like comma(,).

  2. Hi!
    First thanks for the tutorial!
    I tried to run this on this xml:

    60
    713
    10

    and it does not work, of course, I changed to relative tag and all.
    It doesn’t get into the loop for some reason (I changed it to look for AbsoluteHigh)
    I really need help.
    Thanks

Leave a Reply to Shahid Ayoub Cancel reply

Your email address will not be published. Required fields are marked *

Tools & Practices

Tools and Technologies we use at Applied

Contact us now

Popular Posts