ipona

What in heavens name am I doing?

SQL Server Spatial and Invalid Geography data

| 0 comments

I’ve been investigating the issues with dealing with invalid geography data, from live GPS points, and how that can be stored in SQL Server spatial types. Firstly I must retract the statement about using TRY/CATCH, since although you can use it, the resultant type won’t contain invalid data, it will be null. I have so much data I screwed the tests. So now my options are to create MULTILINESTRINGs or use the Geometry type, both of which I’ve considered and are both solutions that were suggested as a comment on the previous entry. I also spent some time yesterday discussing this with Johannes Kebeck (who incidentally is a great speaker), and he pretty much came to the same conclusions.

Using MULTILINESTRINGs

Let’s look at the MULTILINESTRING option in more detail, using some realistic data. Consider the following set of points:

-0.173721313476559 51.503727323059300
-0.171961784362805 51.503860889088400
-0.168142318725583 51.503820819320800
-0.168292522430424 51.502338213148900
-0.171983242034926 51.502324856117400
-0.172004699707046 51.503834175913900
-0.168099403381342 51.503794106122800
-0.165288448333734 51.503727323059300

This represents the following line:

image

It’s difficult to see the flow properly here, but it’s along, down, back, up, along again. If I shift a couple of points down a bit, you can see the flow a little better, starting from the left, doing a loop down before heading back to the right.

image

In real life we could end up with either up these situations; the difference between points 2 and 6 (the top left of the box area) in the second picture is only 4 metres, so well within the bounds of GPS drift. But, the above diagram is just to illustrate the flow of points better, so is really an aside. Let’s get back to the original point data, which we’d normally load into a spatial type using the following:

DECLARE @coords nvarchar(max) = ‘LINESTRING(-0.17372131347655934 51.50372732305939, -0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085, -0.16829252243042417 51.50233821314898, -0.171983242034926 51.5023248561174, -0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085, -0.1652884483337343 51.50372732305939)’
DECLARE @line Geography = Geography::STGeomFromText(@coords, 4326)

But of course we can’t use this code, because the data creates an invalid geography type; the lines between points 2 – 3 and 6 – 7 are the same.With the MULTILINESTRING approach, we’d use the following:

DECLARE @coords nvarchar(max) = ‘MULTILINESTRING((-0.17372131347655934 51.50372732305939, -0.17196178436280585 51.50386088908848),(-0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085),(-0.168142318725583 51.50382081932085, -0.16829252243042417 51.50233821314898),(-0.16829252243042417 51.50233821314898, -0.171983242034926 51.5023248561174),(-0.171983242034926 51.5023248561174, -0.17196178436280585 51.50386088908848),(-0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085),(-0.168142318725583 51.50382081932085, -0.1652884483337343 51.50372732305939))’
SET @multiline = Geometry::STMLineFromText(@coords, 4326)

This just repeats each pair of coordinates, apart from the first and last. It’s still an invalid geography type though, so I’m not sure of the value of this. Everything I’ve read, and the blog comment plus Johannes’s comments led me to the view that this would be valid, but I still get an exception when trying to create this.

Making valid data with Geometry types

The second option is to load the original data into a geometry type instead of a geography type:

DECLARE @coords nvarchar(max) = ‘LINESTRING(-0.17372131347655934 51.50372732305939, -0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085, -0.16829252243042417 51.50233821314898, -0.171983242034926 51.5023248561174, -0.17196178436280585 51.50386088908848, -0.168142318725583 51.50382081932085, -0.1652884483337343 51.50372732305939)’
DECLARE @line Geometry = Geometry::STGeomFromText(@coords, 4326)

This loads without an exception because the geometry type can contain invalid data. You can use IsValid() to check, and MakeValid() to make it into a valid type:

DECLARE @fixed Geography = Geography::Parse(@line2.MakeValid().ToString())

We now have a geography type, with corrected data. What does it look like? Well, on an image, pretty much the same:

image

The points however, show that the data has changed:

  Original     Made Valid  
1 -0.173721313476559 51.503727323059300   -0.173721313476559 51.503727323059300
2 -0.171961784362805 51.503860889088400   -0.171961784362805 51.503860889088400
3 -0.168142318725583 51.503820819320800   -0.168142318725583 51.503820819320800
4 -0.168292522430424 51.502338213148900   -0.168292522430424 51.502338213148900
5 -0.171983242034926 51.502324856117400   -0.171983242034926 51.502324856117400
6 -0.172004699707046 51.503834175913900   -0.171961784362805 51.503860889088400
7 -0.168099403381342 51.503794106122800   -0.168142318725583 51.503820819320800
8 -0.165288448333734 51.503727323059300   -0.165288448333734 51.503727323059300

It’s not so easy to see, but points 6 and 7 have changed. What’s the difference? Well, we can use the spatial features to find out:

DECLARE @p1 geography = Geography::Point(51.503834175913900, -0.172004699707046, 4326)
DECLARE @p2 geography = Geography::Point(51.503860889088400, -0.171961784362805, 4326)

DECLARE @p3 geography = Geography::Point(51.503794106122800, -0.168099403381342, 4326)
DECLARE @p4 geography = Geography::Point(51.503820819320800, -0.168142318725583, 4326)

PRINT @p1.STDistance(@p2)
PRINT @p3.STDistance(@p4)

The answer for both is 4.20861 metres apart. Is this accurate enough? Possibly, depending upon your application, but I’m not sure it is for me, especially if you consider the data could be used for legal reasons. We’d still have the original point data, but if the corrected line string data is used to make decisions, there could be an issue. It would be good to work out what the formula used is when SQL Server makes this line string valid (one day I’ll delve into the assembly and look); the real question is, could making the line valid radically change the ordering of points, as it does for some geometry types. Consider the following:

image

Made up from the following points:

MULTILINESTRING ((0 10, 5 10), (5 10, 10 10), (10 10, 10 5), (10 5, 5 5), (5 5, 5 10), (5 10, 10 10), (10 10, 15 10))

Making this valid, results in the same view, the same set of lines, but the points are completely different.

MULTILINESTRING ((15 10, 10 10), (0 10, 5 10, 5 5, 10 5, 10 10, 5 10))

If all you’re describing is a shape, or a boundary, or a set of lines, then this is perfectly acceptable, but when the ordering of the points within your line is critical, then MakeValid won’t work for you. Will this affect my real life data? Who knows, I’ve yet to perform an analysis; if MakeValid only moves points (and even then that 4m move may be too much), then it might be OK, but if it strips points and re-orders, then I’ll have to abandon all plans to use line strings.

In defence of SQL Server, I understand why the data is deemed invalid, because that’s what the standard says, and Microsoft have followed the standard. But this is one area where the standard really sucks for real life tracking data; maybe these scenarios weren’t considered, or they were considered and the view was that line strings wouldn’t be suitable.

Leave a Reply

Required fields are marked *.