Using ElasticSearch for Spatial Search tutorial
In this post I will demonstrate using ElasticSearch to spatially query records and filter them by attributes.
ElasticSearch is built on top of Lucence which in version 4.0 supports Spatial query features, for those interested here is an example and link to javadocs.
You will need to install ElasticSearch, read about installing it here, also install marvel plugin for configuration and testing.
For this demonstration I have used data from Geofabrik. I have used point.shp file from Great Britan but you can use data from any other country as well.
To extract the data,I wrote a quick python script using pyshp library
One can easily install it by typing sudo easy_install pyshp in shell
Mapping type setup
Before uploading the records we need to create an Index in elasticsearch and apply schema mapping where we will instruct elasticsearch to create a geo index on location attribute.
We can do this easily by issuing a curl request
curl -XPUT http://localhost:9200/places -d '
{
"mappings": {
"place": {
"properties": {
"id": {"type": "double"},
"name": {"type": "string"},
"type": {"type": "string"},
"location": {"type": "geo\_point"}
}
}
}
}
Once the schema for index places has been set up, its now time to add records to the index.
In the python script below modify the name of the shape file path and execute.
import shapefile
import urllib2
import json
sf = shapefile.Reader("points")
sr = sf.shapeRecords()
for r in sr:
try :
if r.record[2].strip() and r.record[3].strip():
req = urllib2.Request('http://localhost:9200/places/place/')
req.add\_header('Content-Type', 'application/json')
data = {'id': r.record[0].strip(),'name':r.record[2].strip(),'type':r.record[3].strip(),'location':{'lat':r.shape.points[0][1],'lon':r.shape.points[0][0]}}
response = urllib2.urlopen(req, json.dumps(data))
print r.record[2]
except Exception,e:
print e
#print "ERROR ",r.record[0],r.record[2],r.record[3] , r.shape.points[0][0], r.shape.points[0][1]
pass
The script inserts all records which have a valid name and type column into index.
Verify
Once complete we can quickly verify the number of records in the index by issuing.
GET places/_count from marvel or from a browser http://localhost:9200/places/_count
Spatial Search
In Marvel/Sense
GET places/\_search
{
"sort" : [
{
"\_geo\_distance" : {
"location" : {
"lat": 51.5286416,
"lon": -0.10159870000006777
},
"order" : "asc",
"unit" : "km"
}
}
],
"query": {
"filtered" : {
"query" : {
"match\_all" : {}
},
"filter" : {
"geo\_distance" : {
"distance" : "20km",
"location" : {
"lat": 51.5286416,
"lon": -0.10159870000006777
}
}
}
}
}
}
or by curl
curl -XGET 'http://localhost:9200/places/place/\_search?pretty=true' -d '
{
"sort" : [
{
"\_geo\_distance" : {
"location" : {
"lat": 51.5286416,
"lon": -0.10159870000006777
},
"order" : "asc",
"unit" : "km"
}
}
],
"query": {
"filtered" : {
"query" : {
"match\_all" : {}
},
"filter" : {
"geo\_distance" : {
"distance" : "20km",
"location" : {
"lat": 51.5286416,
"lon": -0.10159870000006777
}
}
}
}
}
}'
To search by Geodistance as well as a term filter, modify the query to.
GET places / \_search ? size = 100 & from = 0 {
"sort": [{
"\_geo\_distance": {
"location": {
"lat": 51.5286416,
"lon": -0.10159870000006777
},
"order": "asc",
"unit": "km"
}
}],
"query": {
"filtered": {
"query": {
"bool": {
"should": [{
"term": {
"type": "pub"
}
}]
}
},
"filter": {
"geo\_distance": {
"distance": "1km",
"location": {
"lat": 51.5286416,
"lon": -0.10159870000006777
}
}
}
}
}
}
For better visualisation I have created a nice webapp here.
Hope this helps