How to tell if a point is in water
Problem statement
Suppose you import a point to the map, representable as a single node, such as a shelter, a picnic table, a bench or anything else. An imprecision in the point’s coordinates or in the borders of water features already placed on the map may lead to a situation where the node gets placed in water.
This is not uncommon for OpenStreetMap, as many water bodies are still only roughly outlined from low-resolution satellite imagery that was available more than 10 years ago.
Regardless of what caused it, such node placement is untidy and confusing.
Addressing this situation fully automatically is problematic. Detecting it and informing a human about it should be easier.
So, given a pair of latitude/longitude numbers for a node and an existing map in some form, we want to determine with high fidelity (and in reasonable time) whether or not these coordinates point inside any water body.
Honest solution
In the case of OpenStreetMap, its contents are available in the form of data extracts which contain (multi)polygons for closed and open linear features, such as oceans, lakes, rivers, islands etc.
I do not have a working solution for it, so I can only outline some basic ideas of how I would approach the problem in an honest manner.
- Download all multipolygons corresponding to water surfaces. Both positive and negative (i.e. inner islands) polygons inside water multipolygons must be considered. The selection may be limited to a chosen country, administrative unit or any other type of border if there is a guarantee that the point will be inside that border.
- For a given point, determine the set of polygons possibly encircling it. A quick solution here is to see whether a polygon’s bounding box contains the point or not.
- Determine which polygons encircle the point for sure. The corner case where the point lies precisely on a borderline, or indistinguishably close to it, creates complications; see below.
- For all encircling polygons, create a nesting hierarchy. Basically, we need to take into account that the mainland contains water bodies, which can contain islands, which can contain other smaller lakes, and so on. An interesting question here is what to do when two polygons are unordered, i.e. two lakes’ borders cross each other, or an island is not specified as an inner boundary of a lake. These issues are manifestations of mapping imperfections, which are very common in OSM, but they must be addressed.
- The type of the “smallest” encircling polygon determines the answer. If it is of land type, the point is outside water.
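The bounding-box filter and the encirclement test from the steps above can be sketched as follows. This is only a minimal illustration, not a working solution: it assumes a single lake given as one outer ring plus a list of inner rings (islands), treats longitude/latitude as plane coordinates (reasonable only for small areas), and uses the well-known even-odd ray casting rule. All names (bbox_contains, ring_contains, in_water) are hypothetical, and the full nesting hierarchy is reduced to one level of islands.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (lon, lat), treated as plane coordinates
Ring = Sequence[Point]        # polygon ring; the last point may repeat the first


def bbox_contains(ring: Ring, p: Point) -> bool:
    """Cheap pre-filter: is the point inside the ring's bounding box?"""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)


def ring_contains(ring: Ring, p: Point) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # The edge crosses the horizontal line through p only if its ends
        # lie on different sides of that line.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def in_water(outer: Ring, inners: List[Ring], p: Point) -> bool:
    """True if p is inside the outer ring and not on any island."""
    if not bbox_contains(outer, p) or not ring_contains(outer, p):
        return False
    return not any(ring_contains(hole, p) for hole in inners)


lake = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
island = [(4.0, 4.0), (6.0, 4.0), (6.0, 6.0), (4.0, 6.0)]
print(in_water(lake, [island], (2.0, 2.0)))  # a point in the lake
print(in_water(lake, [island], (5.0, 5.0)))  # a point on the island
```

Points lying exactly on a border are not handled specially here, which is precisely the complication mentioned in the list.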
The resulting answer may be one of three alternatives.
- Yes, the point is in water.
- No, the point is outside water.
- Cannot determine. The point happens to be so close to a water/land border that floating point precision, or rather input data precision, does not allow a conclusive answer.
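One possible way to express these three outcomes is sketched below. It assumes that the inside/outside decision is already available from some other routine, and that "close enough" is defined by a tolerance chosen by the caller in the same units as the coordinates. The names Answer, distance_to_ring and classify are hypothetical.

```python
import math
from enum import Enum
from typing import Sequence, Tuple

Point = Tuple[float, float]


class Answer(Enum):
    IN_WATER = "yes"
    ON_LAND = "no"
    UNKNOWN = "cannot determine"


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Plane distance from p to the segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment, a == b
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_ring(p: Point, ring: Sequence[Point]) -> float:
    """Smallest distance from p to any edge of the ring."""
    n = len(ring)
    return min(distance_to_segment(p, ring[i], ring[(i + 1) % n]) for i in range(n))


def classify(inside_water: bool, border_distance: float, tolerance: float) -> Answer:
    """Refuse to answer when the point is within tolerance of the border."""
    if border_distance <= tolerance:
        return Answer.UNKNOWN
    return Answer.IN_WATER if inside_water else Answer.ON_LAND
```

The tolerance would have to reflect the precision of the input data rather than the precision of floating point arithmetic.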
Quick and dirty way
I did not have time to code the above ideas into an algorithm. What is more problematic is the sheer number of polygons to be considered for each tested point. A quick and dirty solution in an interpreted language might turn out to be too slow for practical purposes.
Calculating whether a point is inside a vector polygon is rather computationally intensive. It is much simpler to determine an answer when map data is raster. But wait, there already exists a representation of OSM data in the form of raster images: tile servers produce it!
Another solution was quickly outlined and implemented.
- From the latitude/longitude and a chosen zoom level, determine the raster tile number.
- Download the tile image from a tile server.
- From the latitude/longitude, determine the pixel coordinates inside the tile.
- Extract the pixel color (RGB or another format) from the tile image.
- Compare the color of the pixel against the predetermined color of water for the given map style.
Yeah, this is cheap and dirty, as we basically delegate the most difficult part of the job to a remote computer. That’s the whole point of it.
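The arithmetic behind the tile number and pixel steps can be sketched as follows. This assumes the usual Web Mercator "slippy map" tiling with 256-pixel tiles and the URL layout of the standard OSM tile server. The water color (170, 211, 223) is an assumption about the current standard style and should be checked against an actual tile. Downloading and decoding the image is left out, and all function names are hypothetical.

```python
import math
from typing import Tuple

TILE_SIZE = 256                # pixels per tile side (assumed)
WATER_RGB = (170, 211, 223)    # assumed water color of the standard style


def locate_pixel(lat: float, lon: float, zoom: int) -> Tuple[int, int, int, int]:
    """Return (tile_x, tile_y, pixel_x, pixel_y) for a coordinate.

    Valid for latitudes within the Web Mercator range (about +/-85.05).
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = (lon + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n
    # Clamp so that lon == 180 or extreme latitudes stay on the last tile.
    tile_x = max(0, min(int(x), n - 1))
    tile_y = max(0, min(int(y), n - 1))
    pixel_x = max(0, min(int((x - tile_x) * TILE_SIZE), TILE_SIZE - 1))
    pixel_y = max(0, min(int((y - tile_y) * TILE_SIZE), TILE_SIZE - 1))
    return tile_x, tile_y, pixel_x, pixel_y


def tile_url(zoom: int, tile_x: int, tile_y: int) -> str:
    """URL of a tile on the standard OSM tile server."""
    return f"https://tile.openstreetmap.org/{zoom}/{tile_x}/{tile_y}.png"


def is_water_color(rgb: Tuple[int, int, int],
                   water: Tuple[int, int, int] = WATER_RGB,
                   tolerance: int = 0) -> bool:
    """Compare a pixel color with the water color, channel by channel."""
    return all(abs(c - w) <= tolerance for c, w in zip(rgb, water))


tx, ty, px, py = locate_pixel(0.0, 0.0, 1)
print(tile_url(1, tx, ty), px, py)
```

A non-zero tolerance in is_water_color is only a crude way to cope with slightly distorted colors; with lossless tiles an exact match should be enough.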
The Python code for this idea is in this script.
Many things to consider
- The method won’t work if there is no fixed water color used in tiles, or if the same color is also used to denote other types of surfaces. For example, water is represented with a “textured” fill in some rendering styles. Wetland also often uses pixels of the same color as water. The Mapnik map style provides solidly filled water tiles.
- Tiles in lossy formats (e.g. JPEG) may pose a similar challenge, as pixel colors may be distorted by compression. Standard OSM Mapnik servers serve lossless PNG.
- A higher zoom level means fewer chances for false positives/negatives but also incurs a higher delay on the server side. Tiles with lower zoom are more likely to be generated faster or to already be in a remote cache.
- Remote tile servers must operate with fresh data to generate up-to-date tile images.
- The algorithm may be enhanced to report “don’t know” outcomes by comparing surrounding pixels of the tile and detecting when the central pixel is actually close to non-water-colored pixels, which indicates closeness to a water/land border.
- The method won’t work if water surfaces have overlay rendering of unrelated data. For example, sea routes, administrative borders, military and park surfaces are shaded differently even when they cover water.
- Zoom levels generated by public servers are limited. Typically, you cannot zoom in further than zoom 19 with standard OSM servers. Some alternative servers (e.g. Basque style) provide tiles up to zoom level 20, but they have lower capacity.
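The “don’t know” enhancement mentioned in the list could look roughly like this. The sketch assumes the tile has already been decoded into rows of RGB tuples, uses an assumed water color, and treats the situation symmetrically: a land pixel close to water is also reported as unknown. The function name and the default radius are hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def classify_pixel(pixels: Sequence[Sequence[RGB]], px: int, py: int,
                   water_rgb: RGB = (170, 211, 223), radius: int = 2) -> str:
    """Return "water", "land" or "unknown" for the pixel at column px, row py.

    "unknown" is returned when any pixel within `radius` of the central one
    disagrees with it about being water, i.e. a border is probably nearby.
    """
    height = len(pixels)
    width = len(pixels[0])
    centre_is_water = pixels[py][px] == water_rgb
    # The window is simply cut off at the tile edges.
    for y in range(max(0, py - radius), min(height, py + radius + 1)):
        for x in range(max(0, px - radius), min(width, px + radius + 1)):
            if (pixels[y][x] == water_rgb) != centre_is_water:
                return "unknown"
    return "water" if centre_is_water else "land"
```

Because the window is cut off at the tile edges, a point near the edge of a tile is checked against fewer neighbours; a more careful version would also look at the adjacent tile.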
Notes on tile server load and (ab)use
The official OSM tile servers are not meant to be continuously used in the manner described above. Therefore, all sorts of remote errors may be reported instead of a valid tile image. One can expect timeouts for tiles that failed to be generated and are not in the cache, timeouts for overloaded servers, bandwidth limits for clients hogging the network etc.
It helps to cache recently downloaded tiles on the local file system so that repeated requests for the same or nearby points will not have to wait for a remote reply.
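A minimal sketch of such a cache is given below. It assumes tiles are stored as files under a zoom/x/y.png directory layout and that the actual download is done by a function supplied by the caller. The name cached_tile and the fetch parameter are hypothetical, and nothing here handles expiry of stale tiles.

```python
import os
from typing import Callable


def cached_tile(cache_dir: str, zoom: int, x: int, y: int,
                fetch: Callable[[int, int, int], bytes]) -> bytes:
    """Return tile bytes from the local cache, calling fetch() only on a miss."""
    path = os.path.join(cache_dir, str(zoom), str(x), f"{y}.png")
    if os.path.exists(path):
        with open(path, "rb") as f:
            return f.read()
    data = fetch(zoom, x, y)  # remote request happens only here
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return data
```

Since nearby points usually fall on the same tile, even this simple scheme avoids most repeated downloads when a batch of imported nodes covers a small area.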
Ideally, for serious usage you should install your own local tile server and tune its rendering style to best fit your goals. For example, no additional shading of water should be applied.
A local tile server will also be faster and more reliable to access.