A quick and dirty path finding algorithm for .shp files

The 'Problem'

Let's assume, you have a set of paths (lines) in a .shp file and try to come up with a way to create one .shp file for every possible combination of paths between a point A and a point B ... Trying to brute force this can be quite time consuming, i.e. not possible if the number of paths to consider is more than 5 or so ;-)

I came up with a bit of a probabilistic hack using a Monte-Carlo like approach to solve this in a timely manner. Though there is now formal guarantee, that the algorthm finds all possible connections, cranking up the number of runs to ridiculous numbers makes it very unlikely that you missed something.

I planned on writing this up as a QGIS-Plugin, but didn't find the time so far ... Let me know if there is a need for that.

Python to the rescue

First import the modules needed to read .shp-files and numpy and random to use in the algorithm:

from osgeo import ogr
import os
import numpy as np
import random

Then you can define a helper function to find the start and endpoints of a line geometry:

def get_xy_start_end(geom):
    return geom.GetPoints()[0][0], geom.GetPoints()[0][1], geom.GetPoints()[-1][0], geom.GetPoints()[-1][1]

And two other small functions, one to check if an endpoint of one line is close to another lines startpoint, given a threshold dx ...

def coord_similar(x1, x2, dx=50.0):
    if abs(float(x1)-float(x2))<dx:
        return True
    else:
        return False

... and one to cycle through all the lines in a given dictionary to check if they have already been used and to wrap the function above.

def find_connecting(x, y, L, D):
        l = []
        for i in D.keys():
            if i not in L:
                test1 = (coord_similar(x, D[i][0]) and coord_similar(y, D[i][1]))
                if test1:
                    l.append(i)
                else: pass
            else: pass
        return l

Let's open up the .shp-file and get some parameters from it

inshapefile = 'shapefile.shp'
driver = ogr.GetDriverByName('ESRI Shapefile')
indatasource = driver.Open(inshapefile, 0)
inlyr = indatasource.GetLayerByIndex(0)
extent = inlyr.GetExtent()

featurecount = inlyr.GetFeatureCount()
print featurecount, ' features found in layer.\n'

We define an empty list to contain the collection of connected lines, that have been found and read in all features into a dictionary for convenient access

finallist = []

D = {}
for i in range(featurecount):
    feat = inlyr.GetFeature(i)
    geom = feat.GetGeometryRef()
    fid = feat.GetFID()
    D[fid]=get_xy_start_end(geom)

Now we need to define the two points we want connected (in the .shp-files coordinate system of course)

start = 0
startx, starty = ... some coordinates
endx, endy = ... some coordinates

The main idea is now to start at some point, find all lines that have starting points close to it and choose one at random. Then take the endpoint of the chosen line and repeat as long as needed to reach the targeted endpoint. Then repeat the same process again and because the choice which connected lines are chosen is random, we end up (over many repeats) with all possible connections.

Of course this will result in a huge number of double connections, which we just don't save, because we found them already. Below is the code, I'll clean it up and comment at some point ... Probably ;-). However if you have questions about it, drop me an email. I might be able to help.

max_path_segments = 100
i = 0

fids = D.keys()

for r in range(50000):
    if r%1000==0: print r, '/ 50000'
    x,y = startx, starty
    k=0
    L = [0]
    while not (coord_similar(x, endx) and coord_similar(y, endy)) and k<max_path_segments:
        k+=1
        x,y = D[L[-1]][2], D[L[-1]][3]
        c = find_connecting(x, y, L, D)
        if len(c)>1:
            nfid = random.choice(c)
            L.append(nfid)
        elif len(c)==1:
            nfid = c[0]
            L.append(nfid)
        else:
            pass
        x,y = D[nfid][2], D[nfid][3]
    if L not in finallist:
        finallist.append(L)
    else:
        pass

print len(finallist)

That's it already. Note that this will stop latest after max_path_segments different connections have been found.

Now all that needs to be done, is to package each collection of lines into a seperate .shp-file and save to a new file. Done.