2/08/2012

Python and GML/XML Parsing

For my inaugural post, I will show how Python's versatility made short work of a conceptually straightforward problem that was a pain to solve by other means. I had geospatial data for Poland in GML format (an XML-based format for geographic features). However, the GML schema was missing, and an initial attempt to read the file in QGIS resulted in latitude and longitude being reversed, so Poland ended up rotated and mapped onto the Arabian Peninsula.

I tried to solve this in several ways. First, I used the ogr2ogr utility from GDAL to convert the GML file to an ESRI shapefile as follows:

ogr2ogr --config GML_INVERT_AXIS_ORDER_IF_LAT_LONG NO -f "ESRI Shapefile" -a_srs "EPSG:4326" gminy.shp admin_gminy.aspx.xml

Yet the output still had latitude and longitude reversed. An attempt to use FME, a proprietary software package for geospatial data conversion, also failed because it could not find the schema file.

In the end, the fastest solution was to write a short Python script that parses the XML tree and swaps the order of the coordinates. I wish I had started with this, since it took all of 15 minutes to write and would have saved me a lot of time.

Note that the geographic coordinates in the XML file appear within the relevant elements as:

51.5174171815126 15.725049057718 51.5173922307042 15.7006014584274 51.5158010559858 15.6992569374652 ... 

My goal was to parse these elements and swap each pair of coordinates so that the example above becomes:

15.725049057718 51.5174171815126 15.7006014584274 51.5173922307042 15.6992569374652 51.5158010559858 ... 
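
The pair swap can be sketched in isolation before touching any XML. This is only an illustration: the function name swap_axis_order is made up for this example, and it assumes plain two-dimensional coordinates (no elevation values) separated by whitespace.

```python
def swap_axis_order(pos_list):
    """Swap each pair of values in a whitespace-separated coordinate string."""
    values = pos_list.split()
    if len(values) % 2 != 0:
        # assumption: 2D data, so values must come in pairs
        raise ValueError("expected an even number of coordinate values")
    swapped = list(values)
    # values at even positions take the odd-position values and vice versa
    swapped[0::2] = values[1::2]
    swapped[1::2] = values[0::2]
    return " ".join(swapped)


original = "51.5174171815126 15.725049057718 51.5173922307042 15.7006014584274"
print(swap_axis_order(original))
```

Applying the function twice gives back the original string, which is a handy sanity check.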



Here is the Python script, which also shows how to parse XML using the ElementTree module:

# import os (to expand '~' in file paths) and the ElementTree module
import os
from xml.etree import ElementTree as ET

# indicate xml file location; ET.parse() does not expand '~' by itself,
# so expand it to the home directory explicitly
file_xml = os.path.expanduser('~/Desktop/admin_gminy.aspx.xml')

# open up file and parse xml tree using ElementTree module
tree = ET.parse(file_xml)

# if you want to iterate over all the elements of the tree and print them
# (warning: can be really long)
for t in tree.iter():
    print(t)

# the elements of the tree that store geographic coordinates in this
# case have the tag '{http://www.opengis.net/gml}posList'
# this for loop will iterate over every element with relevant tag
for t in tree.iter('{http://www.opengis.net/gml}posList'):
    # split string into list of coordinates
    full_cord = t.text.split()

    # create new list of NAs for reversed coordinates
    new_cord = ['NA'] * len(full_cord)

    # a value at an even index (0, 2, ...) moves up by one position,
    # a value at an odd index (1, 3, ...) moves back by one position
    # this essentially reverses the order of latitude & longitude
    for i in range(len(full_cord)):
        if i % 2 == 0:
            new_cord[i+1] = full_cord[i]
        else:
            new_cord[i-1] = full_cord[i]

    # update xml tree element with reversed coordinates
    t.text = " ".join(new_cord)

# export new tree (again expanding '~' to the home directory)
file_out = os.path.expanduser('~/Desktop/admin_gminy_rev.xml')
tree.write(file_out, encoding="utf-8", xml_declaration=None, method="xml")
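
One thing to watch when writing the tree back out: ElementTree replaces namespace prefixes it does not know with generated ones such as ns0. The result is equivalent XML, but if you would rather keep the gml prefix, registering it before writing should do the trick. Here is a minimal sketch that uses a made-up two-point fragment in place of the real file, so the element names and values are assumptions for illustration only:

```python
from xml.etree import ElementTree as ET

GML_NS = "http://www.opengis.net/gml"

# hypothetical fragment standing in for the real GML file
SAMPLE = (
    '<gml:LinearRing xmlns:gml="http://www.opengis.net/gml">'
    "<gml:posList>51.5 15.7 51.4 15.6</gml:posList>"
    "</gml:LinearRing>"
)

# keep the 'gml' prefix on output instead of a generated one like 'ns0'
ET.register_namespace("gml", GML_NS)

root = ET.fromstring(SAMPLE)
for pos in root.iter("{%s}posList" % GML_NS):
    values = pos.text.split()
    # swap every (lat, lon) pair in place
    values[0::2], values[1::2] = values[1::2], values[0::2]
    pos.text = " ".join(values)

# serialize to text so the result can be inspected
result = ET.tostring(root).decode("ascii")
print(result)
```

In the full script, the same register_namespace call would go anywhere before tree.write.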
