Batch importing data in Addok
For now, two formats are supported:
- JSON stream
- Nominatim PostgreSQL database
JSON stream
By default, Addok expect a line delimited JSON stream as input file. This means one JSON object per line.
Keys
The expected keys are the ones declared in the FIELDS
attribute of your
configuration, plus some special cases:
- type is expected even if not declared as index field
- importance is expected; must be a
float
between 0 and 1 - id is expected
- lat and lon are expected
- housenumbers has a special format:
{number: {lat: yyy, lon: xxx}}
(optionally you can add anid
key for each housenumber entry)
Example
{"id": "091220245G","type": "street","name": "Chemin du Capitany","postcode": "09000","lat": "42.985755","lon": "1.620815","city": "Foix","departement": "Ariège", "region": "Midi-Pyrénées","importance": 0.4330 ,"housenumbers":{"13": {"lat": 42.984811,"lon": 1.620876},"15": {"lat": 42.984753,"lon": 1.620853},"39": {"lat": 42.983546,"lon": 1.621076},"8": {"lat": 42.985458,"lon": 1.621405},"23": {"lat": 42.984555,"lon": 1.620902},"29": {"lat": 42.984316,"lon": 1.620935},"37": {"lat": 42.983634,"lon": 1.621068},"19": {"lat": 42.984621,"lon": 1.620882},"33": {"lat": 42.983977,"lon": 1.621015},"6": {"lat": 42.985452,"lon": 1.621171},"25": {"lat": 42.984532,"lon": 1.620908},"21": {"lat": 42.984577,"lon": 1.620895},"1": {"lat": 42.986563,"lon": 1.620000},"11bis": {"lat": 42.985138,"lon": 1.621094},"17": {"lat": 42.984682,"lon": 1.620868},"3": {"lat": 42.986394,"lon": 1.620150}}}
{"id": "091220259X","type": "street","name": "Avenue du Cardié","postcode": "09000","lat": "42.964308","lon": "1.595493","city": "Foix","departement": "Ariège", "region": "Midi-Pyrénées","importance": 0.4447 ,"housenumbers":{"13": {"lat": 42.964574,"lon": 1.595418},"19": {"lat": 42.964575,"lon": 1.595455},"15": {"lat": 42.964575,"lon": 1.595431},"6": {"lat": 42.964263,"lon": 1.595222},"1bis": {"lat": 42.964322,"lon": 1.596193},"17": {"lat": 42.964575,"lon": 1.595443},"23": {"lat": 42.964298,"lon": 1.594952},"27": {"lat": 42.964245,"lon": 1.594844},"1": {"lat": 42.964357,"lon": 1.596028},"2bis": {"lat": 42.964245,"lon": 1.596177},"4": {"lat": 42.964279,"lon": 1.595486},"2": {"lat": 42.964291,"lon": 1.595999},"8": {"lat": 42.964250,"lon": 1.595043},"5": {"lat": 42.964572,"lon": 1.595620},"33": {"lat": 42.963863,"lon": 1.594884},"9": {"lat": 42.964574,"lon": 1.595389},"31": {"lat": 42.964156,"lon": 1.594847},"21": {"lat": 42.964314,"lon": 1.595171},"11": {"lat": 42.964574,"lon": 1.595406},"7": {"lat": 42.964342,"lon": 1.595551},"35": {"lat": 42.963611,"lon": 1.594916},"3": {"lat": 42.964366,"lon": 1.595803},"25": {"lat": 42.964272,"lon": 1.594840}}}
Update and delete
If you want to manage diffs, you can add an _action
key with one of the
following values:
update
: will first deindex documentdelete
: will deindex document; only keyid
is required then
Command line
To run the actual import:
addok batch path/to/file.sjson
Then you need to index ngrams:
addok ngrams
Example with BANO
-
Download BANO data and uncompress it
-
Run batch command:
addok batch path/to/full.sjson
-
Index edge ngrams:
addok ngrams
If you only want a subset of the data (the whole BANO dataset requires 20GB of RAM), you can extract it from full file with a command like:
sed -n 's/"Île-de-France"/&/p' path/to/full.sjson > idf.sjson
Import from Nominatim
Once you have a Nominatim database up and running, just run:
addok batch --dbuser ybon
addok ngrams
Check out the available settings for PSQL.
Read from stdin
import
command can read from stdin, for example:
$ addok batch < ~/Data/geo/ban/communes-context.json
Or:
$ less ~/Data/geo/ban/communes-context.json | addok batch
More options
Run addok --help
to see the available options.