Images to OSM
This project uses the Mask R-CNN algorithm to detect features in satellite images. The goal is to test the Mask R-CNN neural network algorithm and improve OpenStreetMap by adding high quality baseball, soccer, tennis, football, and basketball fields to the map.
The Mask R-CNN paper was published in March 2017 by Facebook AI Research (FAIR).
This paper claims state of the art performance for detecting instance segmentation masks. The paper is an exciting result because "solving" the instance segmentation mask problem will benefit numerous practical applications outside of Facebook and OpenStreetMap.
Using Mask R-CNN successfully on a new data set would be a good indication that the algorithm is generic enough to be applicable to many problems. However, the number of publicly available data sets with enough images to train this algorithm is limited, because collecting and annotating data for 50,000+ images is expensive and time consuming.
Microsoft's Bing satellite tiles, combined with the OpenStreetMap data, are a good source of segmentation mask data. The opportunity to work with a cutting edge AI algorithm while doing my favorite hobby (OSM) was too much to pass up.
Mask R-CNN finding baseball, basketball, and tennis fields in Bing images.
Mask R-CNN Implementation
At this time (end of 2017), Facebook AI Research has not yet released their implementation. Matterport, Inc has graciously released a very nice Python implementation of Mask R-CNN on GitHub using Keras and TensorFlow. This project is based on Matterport, Inc's work.
Why Sports Fields
Sports fields are a good fit for the Mask R-CNN algorithm.
- They are visible in the satellite images regardless of the tree cover, unlike, say, buildings.
- They have a "blob" shape rather than a line shape like a road.
- If successful, they are easy to conflate and import back into OSM, because they are isolated features.
Training with OSM
The stretch goal for this project is to train a neural network to human level performance and to completely map the sports fields in Massachusetts in OSM. Unfortunately, the existing data in OSM is not of high enough quality to train any algorithm to human level performance. The plan is to iteratively train, feed corrections back to OSM, and re-train, bootstrapping the algorithm and OSM together. Hopefully a virtuous circle between OSM and the algorithm will form until the algorithm is as good as a human mapper.
The training workflow is in trainall.py, which calls the following scripts in sequence.
- getdatafromosm.py uses overpass to download the data for the sports fields.
- gettilesfrombing.py uses the OSM data to download the required Bing tiles. The script downloads the data slowly. Expect it to take around 2 days to run the first time.
- maketrainingimages.py combines the OSM data and the Bing tiles into a set of training images and masks. Expect it to take 12 hours to run each time.
- train.py actually runs training for the Mask R-CNN algorithm. Expect this to take 4 days to run on a single GTX 1080 with 8GB of memory.
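The sequencing described above can be sketched as a small driver. This is only an illustration of the idea, not the contents of trainall.py: the script names come from the list above, while the run_steps helper and its injectable run argument are hypothetical.

```python
import subprocess
import sys

# Script names are taken from the list above; the driver itself is a sketch.
STEPS = [
    "getdatafromosm.py",
    "gettilesfrombing.py",
    "maketrainingimages.py",
    "train.py",
]


def run_steps(steps, run=subprocess.run):
    """Run each script in order with the current interpreter.

    Returns the list of scripts that completed. The `run` argument is
    injectable so the sequencing can be exercised without the real scripts.
    """
    completed = []
    for script in steps:
        # check=True makes subprocess.run raise CalledProcessError on a
        # non-zero exit code, so a failed download or image build stops
        # the chain before the multi-day training step starts.
        run([sys.executable, script], check=True)
        completed.append(script)
    return completed

# Usage (from the directory that holds the scripts):
#     run_steps(STEPS)
```

Stopping at the first failure matters here because the later steps are expensive and depend entirely on the output of the earlier ones.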
Convert Results to OSM File
createosmanomaly.py runs the neural network over the training image set and suggests changes to OSM.
This script converts the neural network output masks into candidate OSM ways. It does this by fitting perfect rectangles to the tennis and basketball mask boundaries. For baseball fields, the OSM ways are a fitted 90 degree wedge and the simplified mask boundary. The mask fitting is a nonlinear optimization problem, and it is performed with a simplex optimizer using a robust Huber cost function. The simplex optimizer was used because I was too lazy to code a partial derivative function. The boundary being fit is not a Gaussian process; therefore the Huber cost function is a better choice than a standard least squares cost function.
The unknown rotation of the features makes the fitting optimization highly non-convex. In plain English, the optimization gets stuck in local valleys if it is started far away from the optimal solution. This is handled by simply seeding the optimizer at several rotations and emitting all the high quality fits. A human using the reviewosmanomaly.py script sorts out which rotation is the right one. Hopefully, as the neural network performance on baseball fields improves, the alternate rotations can be removed.
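A minimal sketch of the rectangle case follows, assuming the mask boundary is available as an N x 2 NumPy array of points. It is not the code in createosmanomaly.py. The parameterization (center, width, height, rotation), the function names, the default delta, and the number of seeds are all assumptions made for illustration. SciPy's Nelder-Mead method stands in for the simplex optimizer.

```python
import numpy as np
from scipy.optimize import minimize


def boundary_distance(params, pts):
    """Unsigned distance from each point to the edge of a rotated rectangle."""
    cx, cy, w, h, theta = params
    c, s = np.cos(theta), np.sin(theta)
    # Rotate the points into the rectangle's own axis-aligned frame.
    dx, dy = pts[:, 0] - cx, pts[:, 1] - cy
    x = c * dx + s * dy
    y = -s * dx + c * dy
    # Signed distance to an axis-aligned box with half sizes w/2 and h/2.
    qx = np.abs(x) - abs(w) / 2
    qy = np.abs(y) - abs(h) / 2
    outside = np.hypot(np.maximum(qx, 0), np.maximum(qy, 0))
    inside = np.minimum(np.maximum(qx, qy), 0)
    return np.abs(outside + inside)


def huber_cost(params, pts, delta=1.0):
    """Mean Huber loss: quadratic for small residuals, linear for outliers."""
    r = boundary_distance(params, pts)
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta)).mean()


def fit_rectangle(pts, n_seeds=4, delta=1.0):
    """Fit from several starting rotations; return (cost, params) best first."""
    center = pts.mean(axis=0)
    d = pts - center
    fits = []
    # A rectangle repeats every 90 degrees once width and height may swap,
    # so the seeds only need to cover [0, pi/2).
    for theta0 in np.linspace(0, np.pi / 2, n_seeds, endpoint=False):
        c, s = np.cos(theta0), np.sin(theta0)
        x = c * d[:, 0] + s * d[:, 1]
        y = -s * d[:, 0] + c * d[:, 1]
        # Initial size guess: extent of the points in the seed's frame.
        start = [center[0], center[1], x.max() - x.min(), y.max() - y.min(), theta0]
        res = minimize(huber_cost, start, args=(pts, delta),
                       method="Nelder-Mead",
                       options={"xatol": 1e-6, "fatol": 1e-9, "maxiter": 4000})
        fits.append((float(res.fun), res.x))
    fits.sort(key=lambda f: f[0])
    return fits
```

Calling fit_rectangle(boundary_points) returns one candidate per seed. A caller could keep every candidate whose cost falls below a chosen threshold, which mirrors the idea of emitting all the high quality fits for human review. The threshold itself would need tuning against real masks.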
In order to hit the stretch goal, the training data from OSM will need to be pristine. The script will need to be extended to identify incorrectly tagged fields and fields that are poorly traced. For now, it simply identifies fields that are missing from OSM.
reviewosmanomaly.py is run next to visually approve or reject the changes suggested in the anomaly directory.
Note that this is the only script that requires user interaction. The script clusters together suggestions from createosmanomaly.py and presents a gallery of options. The user visually inspects the image gallery and approves or rejects the changes suggested by createosmanomaly.py. The images show the final way geometry drawn over the Bing satellite images.
createfinalosm.py creates the final .osm files from the anomaly review done by reviewosmanomaly.py. It breaks up the output so that each final OSM file stays under the 10,000 element limit of the OSM API.
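The file splitting can be pictured as a simple greedy batching step. The sketch below is an assumption about how such a split could work, not the logic in createfinalosm.py. It treats each way as a list of node coordinates and counts every node plus the way itself as one element each. The split_ways name and the max_elements default are hypothetical.

```python
def split_ways(ways, max_elements=10000):
    """Group ways into batches whose element count does not exceed the limit.

    Each way is a list of node coordinates. A way contributes one element
    per node plus one element for the way itself (an assumed counting rule).
    """
    batches, current, count = [], [], 0
    for way in ways:
        size = len(way) + 1
        # Start a new batch when adding this way would pass the limit.
        # A single oversized way still gets a batch of its own.
        if current and count + size > max_elements:
            batches.append(current)
            current, count = [], 0
        current.append(way)
        count += size
    if current:
        batches.append(current)
    return batches
```

Each returned batch would then be written to its own .osm file. Keeping a way and its nodes in the same batch avoids references that span files.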
Phase 1 - Notes
Phase 1 of the project is training the neural network directly off of the unimproved OSM data, and importing missing fields from the training images back into OSM. About 2,800 missing fields were identified and will soon be imported back into OSM.
For tennis and basketball courts the performance is quite good. The masks are rectangles with few false positives. Like a human mapper, the network has no problem handling clusters of tennis and basketball courts, rotations, occlusions from trees, and different colored pavement. It is close to, but not quite at, human performance. After the missing fields are imported into OSM, hopefully it will reach human level performance.
The good news/bad news story is the baseball fields. They are much more challenging and interesting than the tennis and basketball courts. First off, they have a large variation in scale. A baseball field for very small children is 5x to 6x smaller than a full sized field for adults. The primary feature for identifying a baseball field is the infield diamond, but the infield is only a small part of the full baseball field. To map a baseball field, the large featureless grassy outfield must be included. The outfields have to be extrapolated out from the infield. In cases where there is an outfield fence, the neural network does quite well at terminating the outfield at the fence. But most baseball fields don't have an outfield fence or even a painted line. The outfields stretch out, maintaining their wedge shape, until they "bump" into something else: a tree line, a road, or another field. Complicating the situation, the OSM human mappers, like the neural network, are also confused about how to map outfields without a fence! About 10% of the mapped baseball fields are just the infields.
The phase 1 neural network had no trouble identifying the infields, but it struggled with baseball outfields without fences. Of the 2,800 identified fields, only the baseball fields with excellent outfields were included. Many missing baseball fields had to be skipped because of poor outfield performance. Hopefully the additional high quality outfield data imported into OSM will improve its performance in this challenging area in the next phase.
- Ubuntu 17.10
- A Bing key. Create a secrets.py file and add bingKey = "your key"
- Create a Python 3.6 virtual environment
- In the virtual environment, run "pip install -r requirements.txt"
- TensorFlow 1.3+
- Keras 2.0.8+.