In my previous post I was talking about the new SQL Server 2008 spatial features and how they help with most types of location data, but I mentioned the big issue is the trace back one, where a line isn’t valid if it goes back on itself. I thought I’d explain this more clearly, so consider the following points that define a line:
-0.17187595367432515, 51.50221132119076
-0.17191886901856548, 51.503840854209066
-0.1680564880371023, 51.50378742782091
-0.17191886909856548, 51.503840854209066
This defines 3 segments: vertically up, across to the right, and then back to the left, back to the second point. It’s not immediately obvious that there are two horizontal lines, because they are so close to each other, but they are there.
Points 2 and 4 are almost the same, only 1 digit difference and that at the 11th decimal place, and that’s enough to make this a valid line string. Change that single digit so points 2 and 4 are the same and it’s invalid; attempt to create a LINESTRING based on the invalid points gives:
A .NET Framework error occurred during execution of user-defined routine or aggregate “geography”:
System.ArgumentException: 24200: The specified input does not represent a valid geography instance.
The figures above are to a very high resolution, but consider a general purpose GPS unit, with perhaps only 5 significant places for it’s resolution, the chances of points overlapping increases. You can easily take two GPS readings without moving and get two different points, or three readings and three points, but giving only two locations (I have data where this has happened). The accuracy of modern GPS receivers is around 5-10m, sometimes better, sometimes worse; more expensive differential or gyro-based GPS units give more accuracy, down to centimetres, but these are more expensive and generally not available in consumer units.
Most applications that track objects, whether they be people or vehicles, could easily come up against this sort of data. It’s not hard to think of scenarios; consider tracking a car, which is stopped in traffic; a car coming the opposite way wishes to turn in front of it, so the first car reverses back a little, gives the second car room, then edges forward again as traffic moves. As we start to rely upon location based services more, our reliance upon accurate data increases. My view is that the rules on validity of line strings needs to be relaxed, although some browsing indicates that the rules are a well-established standard, so I’m on a knife edge: extend the standard or break it to make my life easier? I’m a huge fan of standards, but this one doesn’t seem to be that practical in real life. Many of the examples you see show a few individual points, or county and country boundaries, but there’s very little real life tracking data.
Multiple Line Strings
Even if Microsoft do relax the limitations on what constitutes a valid line string, it’s not going to appear until the next version, so workarounds need to be used. First is the support for multiple geometries, which is easier to explain if we go back to simple points:
0, 0
0, 10
0, 5
This is an invalid LINESTRING, so you couldn’t do this:
DECLARE @line Geography =
Geography::STGeomFromText(‘LINESTRING(0 0, 0 10, 0 5)’, 4326)
But you could represented this as a MULTILINESTRING:
DECLARE @line Geography =
Geography::STGeomFromText(‘MULTILINESTRING(0 0, 0 10),(0 10, 0 5)’, 4326)
This consists of two line strings, so it perfectly valid. The problem now becomes, how do you know where to split the list of points? You could, perhaps, build up a collection; if a point already exists in the collection, start a new set of points. It would work, but isn’t very efficient (although if you don’t create these very often that might be irrelevant), plus it’s harder to do in SQL.
Exception Handling in SQL
Another option, which I’m currently investigating, could be to just wrap the creation of the line string in a try catch. Yes, for those that didn’t know, SQL has exception handling. For example, I could do:
DECLARE @line Geography
BEGIN TRY
SET @line = Geography::STGeomFromText(‘LINESTRING(0 0, 0 10, 0 5)’, 4326)
END TRY
BEGIN CATCH
END CATCH
This solves the problem of the invalid geometry by simply ignoring the exception. The geography object exists and contains the correct data, although there’s no guarantee that any of the spatial methods will produce the correct results when used on this invalid type.
Investigations continue …
October 8, 2009 at 11:17 am
There are two more options:
1) Create a Multilinestring and add a Linestring for each two consecutive points. This way you don’t need to worry about exception handling and other issues. On the other hand this will double total number of points. Other than that, there should not be other issues, like if you want to calculate length or see if the path is intersecting with some objects.
2) Construct a geometry Linestring by putting directly all Lat,Lon values. Then call MakeValid and construct Geography by parsing WKT constructed by MakeValid, such as:
geography::Parse(geom.MakeValid().ToString())
October 9, 2009 at 9:15 am
Milan
Number 1 is an interesting option, I’d not thought about just duplicating every pair. I’m, not sure this is something I want to do given the amount of data I have, but it might be worth persuing.
Using Geometry is something I’m investigating at the moment, trying to work out wheth MakeValid changes the data in a way that makes it invalid for us.