python - Find closest line to each point on big dataset, possibly using shapely and rtree -
i have simplified map of city has streets in linestrings , addresses points. need find closest path each point street line. have working script this, runs in polynomial time has nested loop. 150 000 lines (shapely linestring) , 10 000 points (shapely point), takes 10 hours finish on 8 gb ram computer.
the function looks (sorry not making entirely reproducible):
import pandas pd import shapely shapely import point, linestring def connect_nodes_to_closest_edges(edges_df , nodes_df, edges_geom, nodes_geom): """finds closest line points , returns 2 dataframes: edges_df nodes_df """ in range(len(nodes_df)): point = nodes_df.loc[i,nodes_geom] shortest_distance = 100000 j in range(len(edges_df)): line = edges_df.loc[j,edges_geom] if line.distance(point) < shortest_distance: shortest_distance = line.distance(point) closest_street_index = j closest_line = line ...
then save results in table new column adds shortest path point line new column.
is there way make faster addition function?
if example filter out lines every point 50m away or so, speed every iteration?
is there way make faster using rtree package? able find answer makes script finding intersection of polygons faster, can't seem make working closest point line.
faster way of polygon intersection shapely
https://pypi.python.org/pypi/rtree/
sorry if answered, didn't find answer here nor on gis.stackexchange
thank advice!
here have solution using rtree
library. idea build boxes contain segments in diagonal, , use boxes build rtree. time-expensive operation. later, query rtree box centered in point. several hits need check minimum, number of hits (hopefuly) orders of magnitud lower checking against segments.
in solutions
dict get, each point, line id, nearest segment, nearest point (a point of segment), , distance point.
there comments in code you. take account can serialize rtree later use. in fact, recomend build rtree, save it, , use it. because exceptions adjustments of constants min_size
, infty
raise, , not want lose computation did building rtree.
a small min_size
mean have errors in solutions because if box around point not intersect segment, intersect box of segment not nearest segment (it easy think case).
a big min_size
mean have false positives, in extreme case make code try segments, , in same position before, or worst, because building rtree don't use.
if data real data city, imagine know address intersecting segment distance smaller few blocks. make search practically logaritmic.
one more comment. i'm assuming there not segments large. since using segments diagonals of boxes in rtree, if have large segments in line, mean huge box assigned segment, , addresses boxes intersect it. avoid this, can increase artificially resolution of linestrins adding more intermediate points.
import math rtree import index shapely.geometry import polygon, linestring infty = 1000000 min_size = .8 # min_size should vaule such if build box centered in each # point edges of size 2*min_size, know priori @ least 1 # segment intersected box. otherwise, inexact # solution, there exception checking this, though. def distance(a, b): return math.sqrt( (a[0]-b[0])**2 + (a[1]-b[1])**2 ) def get_distance(apoint, segment): = apoint b, c = segment # t = <a-b, c-b>/|c-b|**2 # because p(a) = t*(c-b)+b ortogonal projection of vector # on rectline includes points b , c. t = (a[0]-b[0])*(c[0]-b[0]) + (a[1]-b[1])*(c[1]-b[1]) t = t / ( (c[0]-b[0])**2 + (c[1]-b[1])**2 ) # if t 0 <= t <= 1 projection in interior of # segment b-c, , point minimize distance # (by pitagoras theorem). if 0 < t < 1: pcoords = (t*(c[0]-b[0])+b[0], t*(c[1]-b[1])+b[1]) dmin = distance(a, pcoords) return pcoords, dmin elif t <= 0: return b, distance(a, b) elif 1 <= t: return c, distance(a, c) def get_rtree(lines): def generate_items(): sindx = 0 lid, l in lines: in xrange(len(l)-1): a, b = l[i] c, d = l[i+1] segment = ((a,b), (c,d)) box = (min(a, c), min(b,d), max(a, c), max(b,d)) #box = left, bottom, right, top yield (sindx, box, (lid, segment)) sindx += 1 return index.index(generate_items()) def get_solution(idx, points): result = {} p in points: pbox = (p[0]-min_size, p[1]-min_size, p[0]+min_size, p[1]+min_size) hits = idx.intersection(pbox, objects='raw') d = infty s = none h in hits: nearest_p, new_d = get_distance(p, h[1]) if d >= new_d: d = new_d s = (h[0], h[1], nearest_p, new_d) result[p] = s print s #some checking remove after adjust constants if s == none: raise exception("it seems infty not big enough.") pboxpol = ( (pbox[0], pbox[1]), (pbox[2], pbox[1]), (pbox[2], pbox[3]), (pbox[0], pbox[3]) ) if not polygon(pboxpol).intersects(linestring(s[1])): msg = "it seems min_size not big enough. " msg += "you inexact solutions if remove exception." raise exception(msg) return result
i tested functions example.
xcoords = [i*10.0/float(1000) in xrange(1000)] l1 = [(x, math.sin(x)) x in xcoords] l2 = [(x, math.cos(x)) x in xcoords] points = [(i*10.0/float(50), 0.8) in xrange(50)] lines = [('l1', l1), ('l2', l2)] idx = get_rtree(lines) solutions = get_solution(idx, points)
Comments
Post a Comment