Introduction
Osm2pgsql is used to import OSM data into a PostgreSQL/PostGIS database for rendering into maps and many other uses. Usually it is only part of a toolchain, for instance other software is needed that do that actual rendering (i.e. turning the data into a map), deliver the maps to the user etc.
Osm2pgsql is a fairly complex piece of software and it interacts with the database system and other pieces of the toolchain in complex ways. It will take a while until you get some experience with it. It is strongly recommended that you try out osm2pgsql with small extracts of the OSM data, for instance the data for your city. Do not start with importing data for the whole planet, it is easy to get something wrong and it will break after processing the data for hours or days. You can save yourself a lot of trouble doing some trial runs starting with small extracts and working your way up, observing memory and disk usage along the way.
It helps to be familiar with the PostgreSQL database system and the PostGIS extension to it as well as the SQL database query language. Some knowledge of the Lua language is also useful.
This manual always documents the current version of osm2pgsql. If same information is only valid in certain versions, the section will have a note like this: Version >= 1.4.0
It is recommended that you always use the newest released version of osm2pgsql. Earlier versions sometimes contain bugs that have long since been fixed.
System Requirements
Operating System
Osm2pgsql works on
Linux,
Windows, and
macOS.
Osm2pgsql is developed on Linux and most of the developers don’t have experience with running it on anything but Linux, so it will probably works best there. This documentation is also somewhat geared towards Linux users. That being said, we strive to make the software work on Windows and macOS as well. Please report any problems you might have.
Only 64bit systems are supported.
Read the installation instructions for details on how to install osm2pgsql on your system.
Main Memory
Memory requirements for your system will vary widely depending on the size of your input file. Is it city-sized extract or the whole planet? As a rule of thumb you need at least as much main memory as the PBF file with the OSM data is large. So for a planet file you currently need at least 64 GB RAM. Osm2pgsql will not work with less than 2 GB RAM.
More memory can be used as cache and speed up processing a lot.
Disk
You are strongly encouraged to use a SSD (NVMe if possible) for your database. This will be much faster than traditional spinning hard disks.
Requirements for your system will vary widely depending on
- the amount of data you want to store (city-sized extract or the whole planet?)
- whether or not you want to update the data regularly
Database Software
You need the PostgreSQL database system with the PostGIS extension installed.
Osm2pgsql aims to support all PostgreSQL and PostGIS versions that are currently supported by their respective maintainers. Currently PostgreSQL versions 9.6 and above and PostGIS versions 2.2 and above are supported. (Earlier versions might work but are not tested any more.) PostgreSQL version 11 or above and PostGIS version 2.4 or above is recommended.
In some cases older versions of osm2pgsql have problems with newer database software versions. You should always run the newest released osm2pgsql version which should ensure you get any fixes we have done to make osm2pgsql work well with any PostgreSQL version.
Osm2pgsql does not work with database systems other than PostgreSQL. The PostgreSQL/PostGIS combination is unique in its capabilities and we are using some of their special features. We are not planning to support any other database systems. There are some database systems out there which claim compatibility with PostgreSQL, they might work or they might not work. Please tell us if you have any experience with them.
Lua Scripting Language
Some parts of osm2pgsql require the use of the Lua scripting language. It is highly recommended that you use a version of osm2pgsql with Lua support enabled, because it gives you a much more flexible configuration. If available osm2pgsql can also be compiled using the Lua JIT (just in time) compiler.
Osm2pgsql can also be compiled without Lua support. In that case the pgsql and gazetteer outputs are the only available.
Preparing the Database
Before you can import any OSM data into a database, you need a database.
Installing PostgreSQL/PostGIS
You need the PostgreSQL database software and the PostGIS plugin for the database. Please read the respective documentation on how to install it. On Linux this can almost always be done with the package manager of your distribution.
Creating a Database
To create a database that can be used by osm2pgsql follow these steps:
- Create a database user that osm2pgsql will use. This user doesn’t need
any special rights. We’ll use
osmuser
here. - Create a database that osm2pgsql will use belonging to the user you just
created. We’ll use
osm
as a name here. - Enable the
postgis
andhstore
extensions in the newly created database.
On a typical Linux system you’ll have a system user called postgres
which
has admin privileges for the database system. We’ll use that user for these
steps.
Here are typical commands used:
sudo -u postgres createuser osmuser
sudo -u postgres createdb --encoding=UTF8 --owner=osmuser osm
sudo -u postgres psql osm --command='CREATE EXTENSION postgis;'
sudo -u postgres psql osm --command='CREATE EXTENSION hstore;'
Security Considerations
Osm2pgsql does not need any special database rights, it doesn’t need superuser status and doesn’t need to create databases or roles. You should create a database user for the specific use by osm2pgsql which should not have any special PostgreSQL privileges.
Any osm2pgsql setup will need a database to work on. You should create this as
PostgreSQL superuser and change ownership (with --owner
option on the
createdb
command or an OWNER=
clause on the CREATE DATABASE
SQL command)
to the user osm2pgsql is using. This way the database user that osm2pgsql uses
doesn’t need database creation privileges.
Typical osm2pgsql setups need the postgis
and hstore
extensions to be
enabled in the database. To install these you need superuser privileges in the
database. Enable them (with CREATE EXTENSION
) as PostgreSQL superuser on the
database that osm2pgql is using before you run osm2pgsql.
Of course osm2pgsql needs to be able to create tables and write to the
database. Usually it can do this as owner of the database created for it.
Using the data, on the other hand, doesn’t need any of those rights. So the
map rendering software you are using, for instance, usually only needs to
read the data. It is recommended that you run these as a different database
user, distinct from the database user osm2pgsql is using and only give that
user SELECT
rights (with the GRANT
command).
Version >= 1.4.0 If you are using a security scheme based on
database schemas in your database you can use the --middle-schema
and
--output-pgsql-schema
options and the schema
table option in the flex
output, respectively, to tell osm2pgsql to load data into specific schemas. You
have to create those schemas and give them the correct rights before running
osm2pgsql.
Encoding
OpenStreetMap data is from all around the world, it always uses UTF-8 encoding. osm2pgsql will write the data as is into the database, so it has to be in UTF-8 encoding, too.
On any modern system the default encoding for new databases should be UTF-8,
but to make sure, you can use the -E UTF8
or --encoding=UTF8
options when
creating the database for osm2pgsql with createdb
.
Tuning the PostgreSQL Server
Usual installs of the PostgreSQL server come with a default configuration that
is not well tuned for large databases. You should change these settings in
postgresql.conf
and restart PostgreSQL before running osm2pgsql, otherwise
your system will be much slower than necessary.
The following settings are geared towards a system with 64GB RAM and a fast SSD. The values in the second column are suggestions to provide a good starting point for a typical setup, you might have to adjust them for your use case. The value in the third column is the default set by PostgreSQL 13.
Config Option | Proposed Value | Pg 13 Default | Remark |
---|---|---|---|
shared_buffers | 1GB | 128MB | Lower than typical PostgreSQL recommendations to give osm2pgsql priority to RAM. |
work_mem | 50MB | 4MB | |
maintenance_work_mem | 10GB | 64MB | Improves CREATE INDEX |
autovacuum_work_mem | 2GB | -1 | -1 uses maintenance_work_mem |
wal_level | minimal | replica | Reduces WAL activity if replication is not required during data load. Must also set max_wal_senders=0 . |
checkpoint_timeout | 60min | 5min | Increasing this value reduces time-based checkpoints and increases time to restore from PITR |
max_wal_size | 10GB | 1GB | PostgreSQL > 9.4 only. For PostgreSQL <= 9.4 set checkpoint_segments = 100 or higher. |
checkpoint_completion_target | 0.9 | 0.5 | Spreads out checkpoint I/O of more of the checkpoint_timeout time, reducing spikes of disk activity |
max_wal_senders | 0 | 10 | See wal_level |
random_page_cost | 1.0 | 2 | Assuming fast SSDs |
Here are the proposed settings for copy-and-paste into a config file:
shared_buffers = 1GB
work_mem = 50MB
maintenance_work_mem = 10GB
autovacuum_work_mem = 2GB
wal_level = minimal
checkpoint_timeout = 60min
max_wal_size = 10GB
checkpoint_completion_target = 0.9
max_wal_senders = 0
random_page_cost = 1.0
Increasing values for max_wal_size
and checkpoint_timeout
means that
PostgreSQL needs to run checkpoints less often but it can require additional
space on your disk and increases time required for Point in Time Recovery
(PITR) restores. Monitor the Postgres logs for warnings indicating checkpoints
are occuring too frequently: HINT: Consider increasing the configuration
parameter "max_wal_size".
Autovacuum must not be switched off because it ensures that the tables are
frequently analysed. If your machine has very little memory, you might consider
setting autovacuum_max_workers = 1
and reduce autovacuum_work_mem
even
further. This will reduce the amount of memory that autovacuum takes away from
the import process.
For additional details see the Server Configuration chapter and Populating a Database in the Performance Tips chapter in the PostgreSQL documentation.
Expert tuning
The suggestions in this section are potentially dangerous and are not suitable for all environments. These settings can cause crashes and/or corruption. Corruption in a Postgres instance can lead to a “bricked” instance affecting all databases in the instance.
Config Option | Proposed Value | Pg 13 Default | Remark |
---|---|---|---|
full_page_writes | off | on | Warning: Risks data corruption. Set back to on after import. |
fsync | off | on | Warning: Risks data corruption. Set back to on after import. |
Additional information is on the PostgreSQL wiki: Tuning Your PostgreSQL
Server.
The section titled synchronous_commit
contains important information to the
synchronous_commit
and fsync
settings.
Geometry Processing
An important part of what osm2pgsql does is creating geometries from OSM data. In OSM only nodes have a location, ways get their geometry from member nodes and relations get their geometry from member ways and nodes. Osm2pgsql assembles all the data from the related objects into valid geometries.
Geometry Types
The geometry types supported by PostGIS are from the Simple Features defined by the OpenGIS Consortium (OGC). Osm2pgsql creates geometries of these types from OSM data.
Geometry type | Created from OSM data |
---|---|
Point | Created from nodes. |
LineString | Created from ways. |
Polygon | Created from closed ways or some relations. |
MultiPoint | Never created. |
MultiLineString | Created from (split up) ways or some relations. |
MultiPolygon | Created from closed ways or some relations. |
Single vs. Multi Geometries
Generally osm2pgsql will create the simplest geometry it can. Nodes will turn into Points, ways into LineStrings or Polygons. A multipolygon relation can be turned into a Polygon or MultiPolygon, depending on whether it has one or more outer rings. Similarly, a route relation can be turned into a LineString if the route is connected from start to finish or a MultiLineString if it is unconnected or there are places, where the route splits up.
In some cases osm2pgsql will split up Multi* geometries into simple geometries and add each one in its own database row. This can make rendering faster, because the renderer can deal with several smaller geometries instead of having to handle one large geometry. But, depending on what you are doing with the data, in can also lead to problems. This blogpost has some deeper discussion of this issue. See the flex and pgsql output chapters for details how to configure this.
It will also mean that your id columns are not unique, because there are now multiple rows created from the same OSM object. See the Primary Keys and Unique IDs section for an option how to work around this.
Geometry Validity
Point geometries are always valid (as long as the coordinates are inside the correct range). LineString geometries are also always valid, lines might cross themselves, but that is okay. This is more complex for Polygon and MultiPolygon geometries: There are multiple ways how such a geometry can be invalid. For instance, if the boundary of a polygon is drawn in a figure eight, the result will not be a valid polygon. The database will happily store such invalid polygons, but this can lead to problems later, when you try to draw them or do calculations (such as the area) based on the invalid geometry. That’s why osm2pgsql never loads invalid geometries into your database. Instead the object is simply ignored without any message.
You can use the Areas view of the OpenStreetMap inspector to help diagnose problems with multipolygons.
Processing of Nodes
Node geometries are always converted into Point geometries.
Processing of Ways
Depending on the tags, OSM ways model either a LineString or a Polygon, or
both! A way tagged with highway=primary
is usally a linear feature, a way
tagged landuse=farmland
is usually a polygon feature. If a way with polygon
type tags is not closed, the geometry is invalid, this is an error and the
object is ignored. For some tags, like man_made=pier
non-closed ways are
linear features and closed ways are polygon features.
If a mapper wants to override how a way should be interpreted, they can use the
area
tag: The tag area=yes
turns a normally linear feature into a polygon
feature, for instance it turns a pedestrian street (highway=pedestrian
) into
a pedestrian area. The tag area=no
turns a polygon feature into a linear
feature.
There is no definite list which tags indicate a linear or polygon feature. Osm2pgsql lets the user decide. It depends on your chosen output (see next chapter) how to configure this. The example config files have lists that should cover most of the commonly used tags, but you might have to extend the lists if you are using more unusal tags.
Osm2pgsql can split up long LineStrings created from ways into smaller
segments. This can make rendering of tiles faster, because smaller geometries
need to be retrieved from the database when rendering a specific tile. The
pgsql output always splits up long LineStrings, in latlong projection, lines
will not be longer than 1°, in Web Mercator lines will not be longer than
100,000 units (about 100,000 meters at the equator). The flex output will only
split LineStrings if the split_at
transformation parameter is used, see the
Geometry transformations section in the flex
output chapter for details. See also the Single vs. Multi
Geometries section above.
Processing of Relations
Relations come in many variations and they can be used for all sorts of
geometries. Usually it depends on the type
tag of a relation what kind of
geometry it should have:
Relation type | Typical geometry created |
---|---|
type=multipolygon | (Multi)Polygon |
type=boundary | (Multi)LineString or (Multi)Polygon depending on whether you are interested in the boundary itself or the area it encloses. |
type=route | (Multi)LineString |
The osm2pgsql flex output can create a (Multi)Polygon or (Multi)LineString geometry from any relation, other geometries are currently not supported. See the Geometry transformations section in the flex output chapter for details.
If you are using the old “C transform” of the pgsql output, the geometry
types for relations of type multipolygon
, boundary
, and route
are
hardcoded. If you are using the “Lua transform” you can configure them.
Projections
Osm2pgsql can create geometries in many projections. If you are using the pgsql output, the projection can be chosen with command line options, when using the flex output, the projections are specified in the Lua style file. The default is always “Web Mercator”.
![]() |
Latlong (WGS84)The original latitude & longitude coordinates from OpenStreetMap in the WGS84 coordinate reference system. This is typically chosen if you want to do some kind of analytics on the data or reproject it later. |
![]() |
Web MercatorThis is the projection used most often for tiled web maps. It is the default in osm2pgsql. Data beyond about 85° North and South will be cut off, because it can not be represented in this projection. |
![]() |
Other ProjectionsIf osm2pgsql was compiled with support for the PROJ library, it supports all projections supported by that library. Version >= 1.4.0 Call |
Note that mapping styles often depend on the projection used. Most mapping style configurations will enable or disable certain rendering styles depending on the map scale or zoom level. But a meaningful scale will depend on the projection. Most styles you encounter are probably made for Web Mercator and will need changes to get nice maps in other projections.
Running osm2pgsql
Basic command line operation
Osm2pgsql can work in one of two ways: Import only or Import and Update.
If you are new to osm2pgsql we recommend you try the import only approach first and use a small extract of the OSM data.
Import Only
OSM data is imported once into the database and will not change afterwards. If you want to update the data, you have to delete it and do a full re-import. This approach is used when updates aren’t needed or only happen rarely. It is also sometimes the only option, for instance when you are using extracts for which change files are not available. This is also a possible approach if disk space is tight, because you don’t need to store data needed only for updates.
In this mode osm2pgsql is used with the -c, --create
command line option.
This is also the default mode. The following command lines are equivalent:
osm2pgsql -c OSMFILE
osm2pgsql --create OSMFILE
osm2pgsql OSMFILE
In case the system doesn’t have much main memory, you can add the -s, --slim
and --drop
options. In this case less main memory is used, instead more data
is stored in the database. This makes the import much slower, though. See the
chapter Middle for details on these options.
Import and Update
In this approach OSM data is imported once and afterwards updated more or less regularly from OSM change files. OSM offers minutely, hourly, and daily change files. This mode is used when regular updates are desired and change files are available.
In this mode osm2pgsql is used the first time with the -c, --create
command
line option (this is also the default mode) and, additionally the -s, --slim
options. The following command lines are equivalent:
osm2pgsql -c -s OSMFILE
osm2pgsql --create --slim OSMFILE
osm2pgsql --slim OSMFILE
For the update runs the -a, --append
and -s, --slim
options must be used.
The following command lines are equivalent:
osm2pgsql -a -s OSMFILE
osm2pgsql --append --slim OSMFILE
This approach needs more disk space for your database than the “Import Only” approach, because all the information necessary for the updates must be stored somewhere.
Usually you will need to use at least the -C, --cache
and --flat-nodes
command line options when doing imports and updates. See the chapter
Middle for details.
Getting Help or Version
To get help or the program version call osm2pgsql with one of the following options:
Option | Description |
---|---|
-h, --help | Print help. Add -v, --verbose for more verbose help. |
-V, --version | Print osm2pgsql version. Also prints versions of some libraries used. |
Logging
Osm2pgsql will write information about what it is doing to the console (to
STDERR). If you’d rather want this information in a file, use the output
redirection in your shell (osm2pgsql ... 2>osm2pgsql.log
).
Several command line options allow you to change what will be logged and how:
Option | Description |
---|---|
--log-level=LEVEL | Version >= 1.4.0 Set log level (debug , info (default), warn , or error ). |
--log-progress=VALUE | Version >= 1.4.0 Enable (true ) or disable (false ) progress logging. Setting this to auto will enable progress logging on the console and disable it if the output is redirected to a file. Default: true. |
--log-sql | Version >= 1.4.0 Enable logging of SQL commands for debugging. |
--log-sql-data | Version >= 1.4.0 Enable logging of all data added to the database. This will write out a huge amount of data! For debugging. |
-v, --verbose | Version >= 1.4.0 Same as --log-level=debug . |
Database Connection
In create and append mode you have to tell osm2pgsql which database to access.
If left unset, it will attempt to connect to the default database (usually the
username) using a unix domain socket. Most usage only requires setting
-d, --database
.
Option | Description |
---|---|
-d, --database=DB | Database name or PostgreSQL conninfo string. |
-U, --user=USERNAME | Database user. |
-W, --password | Force password prompt. |
-H, --host=HOST | Database server hostname or unix domain socket location. |
-P, --port=PORT | Database server port. |
You can also use libpq environment variables to set connection parameters. For a full list of available parameters, please consult the PostgreSQL documentation.
Version >= 1.4.0 Instead of specifying a database name with the
-d, --database
option you can also specify a connection string in the form of
a keyword/value connection string (something like host=localhost port=5432
dbname=mydb
) or a URI
(postgresql://[user[:password]@][netloc][:port][,...][/dbname][?param1=value1&...]
)
See the PostgreSQL
documentation
for details.
Processing the OSM Data
Osm2pgsql processing can be separated into multiple steps:
- The Input reads the OSM data from the OSM file.
- The Middle stores all objects and keeps track of relationships between objects.
- The Output transforms the data and loads it into the database.
This is, of course, a somewhat simplified view, but it is enough to understand the operation of the program. The following sections will describe each of the steps in more detail.
The Input
Depending on the operational mode your input files are either
- OSM data files (in create mode), or
- OSM change files (in append mode)
Usually osm2pgsql will autodetect the file format, but see the -r,
--input-reader
option below. Osm2pgsql can not work with OSM history files.
OSM data files are almost always sorted, first nodes in order of their ids, then ways in order of their ids, then relations in order of their ids. The planet files, change files, and usual extracts all follow this convention.
Osm2pgsql can only read OSM files ordered in this way. This allows some optimizations in the code which speed up the normal processing.
Older versions of osm2pgsql will sometimes work or appear to work with unsorted data, but it is not recommended to rely on this, because the processed data might be corrupted. Versions from 1.3.0 warn when you are using unsorted data. From version 1.4.0 on, only sorted OSM files are allowed.
See the Appendix C for more information on how to get and prepare OSM data for use with osm2pgsql.
Option | Description |
---|---|
-r, --input-reader=FORMAT | Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format. |
-b, --bbox=BBOX | Apply a bounding box filter in format MINLON,MINLAT,MAXLON,MAXLAT on the imported data. Example: --bbox -0.5,51.25,0.5,51.75 |
Use of the -b, --bbox
option is not recommended; it is a crude way of
limiting the amount of data loaded into the database and it will not always do
what you expect it to do, especially at the boundaries of the bounding box. If
you use it, choose a bounding box that is larger than your actual area of
interest. A better option is to create an extract of the data before using
osm2pgsql. See Appendix C for options.
Working with Multiple Input Files
Usually you are using osm2pgsql with a single input file.
Do not use osm2pgsql with more than one input file in versions before 1.4.0. It might work or it might not.
Version >= 1.4.0 Osm2pgsql can read multiple input files at once, merging the data from the input files ignoring any duplicate data. For this to work the input files must all have their data from the same point in time. You can use this to import two or more geographical extracts into the same database. If the extracts are from different points in time and contain different versions of the same object, this will fail!
Do not use multiple change files as input in append mode, merge and simplify them first.
The Middle
The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which ways are used by which nodes, or which members a relation has. It also keeps track of all node locations. This information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changes objects themselves and not all the related objects needed for creating an objects geometry.
More details are in its own chapter.
The Outputs
Osm2pgsql imports OSM data into a PostgreSQL database. How it does this is governed by the output (sometimes called a backend). Several outputs for different use cases are available:
- The flex Output Version >= 1.3.0
-
This is the most modern and most flexible output option. If you are starting a new project, use this output. Many future improvements to osm2pgsql will only be available in this output.
-
Unlike all the other output options there is almost no limit to how the OSM data can be imported into the database. You can decide which OSM objects should be written to which columns in which database tables. And you can define any transformations to the data you need, for instance to unify a complex tagging schema into a simpler schema if that’s enough for your use case.
-
This output is described in detail in its own chapter.
- The pgsql Output
-
The pgsql output is the original output and the one that most people who have been using osm2pgsql know about. Many tutorials you find on the Internet only describe this output. It is quite limited in how the data can be written to the database, but many setups still use it.
-
This output comes in two “flavours”: With the original “C transformation” and with the somewhat newer “Lua transformation” which allows some changes to the data before it is imported.
-
This output is described in detail in its own chapter.
- The gazetteer Output
-
The gazetteer output is a specialized output used for Nominatim only. There is no information in this manual about its use, see the Nominatim documentation for details.
- The multi Output
-
The multi output is deprecated, it will be removed in a future version of osm2pgsql. If you are using it, switch to the flex output as soon as possible.
- The null Output
-
The null output doesn’t write the data anywhere. It is used for testing and benchmarking, not for normal operation. If
--slim
is used with the null output, the middle tables are still generated.
Here are the command line options pertaining to the outputs (see later chapters for command line options only used for specific outputs):
Option | Description |
---|---|
-O, --output=OUTPUT | Select the output. Available outputs are: flex, pgsql (default), gazetteer, multi, and null |
-S, --style=STYLE | The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is /usr/share/osm2pgsql/default.style , for other outputs there is no default.) |
Environment Variables
Osm2pgsql itself doesn’t interpret any environment variables, but several of the libraries it uses do.
- The PostgreSQL database access library (
libpq
) understands many environment variables. - The libosmium library has some internal settings you can change.
- If osm2pgsql was compiled with the PROJ library, you can use various environment variables to control the behaviour of the library.
You can access environment variables from Lua config scripts with
os.getenv("VAR")
.
The Flex Output
Version >= 1.3.0 The flex output appeared first in version 1.3.0 of osm2pgsql.
The Flex output is experimental. Everything in here is subject to change.
The flex output, as the name suggests, allows for a flexible configuration that tells osm2pgsql what OSM data to store in your database and exactly where and how. It is configured through a Lua file which
- defines the structure of the output tables and
- defines functions to map the OSM data to the database data format
Use the -s, --style=FILE
option to specify the name of the Lua file
along with the -O, --output=flex
option to specify the use of the flex output.
Unlike the pgsql output, the flex output doesn’t use command line options for configuration, but the Lua config file only.
The flex style file is a Lua script. You can use all the power of the Lua
language. This description assumes that you
are somewhat familiar with the Lua language, but it is pretty easy to pick up
the basics and you can use the example config files in the
flex-config
directory which contain lots of comments to get you started.
All configuration is done through the osm2pgsql
global object in Lua. It has
the following fields and functions:
Field / Function | Description |
---|---|
version | The version of osm2pgsql as a string. |
mode | Either "create" or "append" depending on the command line options (-c, --create or -a, --append ). |
stage | Either 1 or 2 (1st/2nd stage processing of the data). See below. |
define_node_table(NAME, COLUMNS[, OPTIONS]) | Define a node table. |
define_way_table(NAME, COLUMNS[, OPTIONS]) | Define a way table. |
define_relation_table(NAME, COLUMNS[, OPTIONS]) | Define a relation table. |
define_area_table(NAME, COLUMNS[, OPTIONS]) | Define an area table. |
define_table(OPTIONS) | Define a table. This is the more flexible function behind all the other define_*_table() functions. It gives you more control than the more convenient other functions. |
Osm2pgsql also provides some additional functions in the Lua helper library described in Appendix B.
Defining a Table
You have to define one or more database tables where your data should end up.
This is done with the osm2pgsql.define_table()
function or, more commonly,
with one of the slightly more convenient functions
osm2pgsql.define_(node|way|relation|area)_table()
.
In create mode, osm2pgsql will create those tables for you in the database.
Basic Table Definition
The simple way to define a table looks like this:
osm2pgsql.define_(node|way|relation|area)_table(NAME, COLUMNS[, OPTIONS])
Here NAME
is the name of the table, COLUMNS
is a list of Lua tables
describing the columns as documented below. OPTIONS
is a Lua table with
options for the table as a whole. An example might be more helpful:
local restaurants = osm2pgsql.define_node_table('restaurants', {
{ column = 'name', type = 'text' },
{ column = 'tags', type = 'hstore' },
{ column = 'geom', type = 'point' }
})
In this case we are creating a table that is intended to be filled with OSM
nodes, the table called restaurants
will be created in the database with four
colums: A name
column of type text
(which will presumably later be filled
with the name of the restaurant), a column called tags
with type hstore
(which will presumable be filled later with all tags of a node), a column
called geom
which will contain the Point geometry of the node and an Id
column called node_id
.
Each table is either a node table, way table, relation table, or area table. This means that the data for that table comes primarily from a node, way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object id will be stored in the table so that later updates to those OSM objects (or deletions) will be properly reflected in the tables. Area tables are special, they can contain data derived from ways and from (multipolygon) relations.
Advanced Table Definition
Sometimes the define_(node|way|relation|area)_table()
functions are a bit to
restrictive, for instance if you want more control over the type and naming of
the Id column(s). In this case you can use the function
osm2pgsql.define_table()
.
Here are the available OPTIONS
for the osm2pgsql.define_table(OPTIONS)
function. You can use the same options on the
define_(node|way|relation|area)_table()
functions, except the name
and
columns
options.
Table Option | Description |
---|---|
name | The name of the table (without schema). |
ids | A Lua table defining how this table should handle ids (see the Id Handling section for details). Note that you can define tables without Ids, but they can not be updated by osm2pgsql. |
columns | An array of columns (see the Defining Columns section for details). |
schema | Set the PostgreSQL schema to be used for this table. The schema must exist in the database before you start osm2pgsql. By default no schema is set which usually means the tables will be created in the public shema. |
data_tablespace | The PostgreSQL tablespace used for the data in this table. |
index_tablespace | The PostgreSQL tablespace used for all indexes of this table. |
All the osm2pgsql.define*table()
functions return a database table Lua
object. You can call the following functions on it:
Function | Description |
---|---|
name() | The name of the table as specified in the define function. |
schema() | The schema of the table as specified in the define function. |
columns() | The columns of the table as specified in the define function. |
add_row() | Add a row to the database table. See below for details. |
Id Handling
Id columns allows osm2pgsql to track which database table entries have been generated by which objects. When objects change or are deleted, osm2pgsql uses the ids to update or remove all existing rows from the database with those ids.
If you are using the osm2pgsql.define_(node|way|relation|area)_table()
convenience functions, osm2pgsql will automatically create an id column named
(node|way|relation|area)_id
, respectively.
If you want more control over the id column(s), use the
osm2pgsql.define_table()
function. You can then use the ids
option to
define how the id column(s) should look like. To generate the same
“restaurants” table described above, the define_table()
call will look like
this:
local restaurants = osm2pgsql.define_table({
name = 'restaurants',
ids = { type = 'node', id_column = 'node_id' },
columns = {
{ column = 'name', type = 'text' },
{ column = 'tags', type = 'hstore' },
{ column = 'geom', type = 'point' }
}})
If you don’t have an ids
field, the table is generated without Id. Tables
without Id can not be updated, so you can fill them in create mode, but they
will not work in append mode.
The following values are allowed for the type
of the ids
field:
Type | Description |
---|---|
node | A node Id will be stored ‘as is’ in the column named by id_column . |
way | A way Id will be stored ‘as is’ in the column named by id_column . |
relation | A relation Id will be stored ‘as is’ in the column named by id_column . |
area | A way Id will be stored ‘as is’ in the column named by id_column , for relations the Id will be stored as negative number. |
any | Any type of object can be stored. See below. |
The any
type is for tables that can store any type of OSM object (nodes,
ways, or relations). There are two ways the id can be stored:
- If you have a
type_column
setting in yourids
field, it will store the type of the object in an additionalchar(1)
column asN
,W
, orR
. - If you don’t have a
type_column
, the id of nodes stays the same (so they are positive), way ids will be stored as negative numbers and relation ids are stored as negative numbers and 1e17 is subtracted from them. This results in distinct ranges for all ids so they can be kept apart. (This is the same transformation that the Imposm program uses.)
Unique Ids
It is often desirable to have a unique PRIMARY KEY on database tables. Many programs need this.
There seems to be a natural unique key, the OSM node, way, or relation ID the data came from. But there is a problem with that: Depending on the configuration, osm2pgsql sometimes adds several rows to the output tables with the same OSM ID. This typically happens when long linestrings are split into shorter pieces or multipolygons are split into their constituent polygons, but it can also happen if your Lua configuration adds two rows to the same table.
If you need unique keys on your database tables there are two options: Using those natural keys and making sure that you don’t have duplicate entries. Or adding an additional ID column. The latter is easier to do and will work in all cases, but it adds some overhead.
Using Natural Keys for Unique Ids
To use OSM IDs as primary keys, you have to make sure that
- you only ever add a single row per OSM object to an output table, i.e. do
not call
add_row
multiple times on the same table for the same OSM object. - osm2pgsql doesn’t split long linestrings into smaller ones or multipolygons
into polygons. So you can not use the
split_at
option on the geometry transformation. - Version == 1.3.0 osm2pgsql doesn’t split multipolygons into
polygons. So you have to set
multi = true
on allarea
geometry transformations.
You probably also want an index on the ID column. If you are running in slim
mode, osm2pgsql will create that index for you. But in non-slim mode you have
to do this yourself with CREATE UNIQUE INDEX
. You can also use ALTER
TABLE
to make the column an “official” primary key column.
Using an Additional ID Column
PostgreSQL has the somewhat magic “serial” data type. If you use that datatype in a column definition, PostgreSQL will add an integer column to the table and automatically fill it with an autoincrementing value.
In the flex config you can add such a column to your tables with something like this:
...
{ column = 'id', type = 'serial', create_only = true },
...
The create_only
tells osm2pgsql that it should create this column but not
try to fill it when adding rows (because PostgreSQL does it for us).
You probably also want an index on this column. After the first import of your
data using osm2pgsql, use CREATE UNIQUE INDEX
to create one. You can also use
ALTER
TABLE
to make the column an “official” primary key column.
Defining Columns
In the table definitions the columns are specified as a list of Lua tables with the following keys:
Key | Description |
---|---|
column | The name of the PostgreSQL column (required). |
type | The type of the column as described above (required). |
not_null | Set to true to make this a NOT NULL column. (Optional, default false .) |
create_only | Set to true to add the column to the CREATE TABLE command, but do not try to fill this column when adding data. This can be useful for SERIAL columns or when you want to fill in the column later yourself. (Optional, default false .) |
projection | On geometry columns only. Set to the EPSG id or name of the projection. (Optional, default web mercator, 3857 .) |
Defining Geometry Columns
Most tables will have a geometry column. (Currently only zero or one geometry columns are supported.) The types of the geometry column possible depend on the type of the input data. For node tables you are pretty much restricted to point geometries, but there is a variety of options for relation tables for instance.
The supported geometry types are:
Geometry type | Description |
---|---|
point | Point geometry, usually created from nodes. |
linestring | Linestring geometry, usually created from ways. |
polygon | Polygon geometry for area tables, created from ways or relations. |
multipoint | Currently not used. |
multilinestring | Created from (possibly split up) ways or relations. |
multipolygon | For area tables, created from ways or relations. |
geometry | Any kind of geometry. Also used for area tables that should hold both polygon and multipolygon geometries. |
By default geometry columns will be created in web mercator (EPSG 3857). To
change this, set the projection
parameter of the column to the EPSG code
you want (or one of the strings latlon(g)
, WGS84
, or merc(ator)
, case
is ignored).
There is one special geometry column type called area
. It can be used in
addition to a polygon
or multipolygon
column. Unlike the normal geometry
column types, the resulting database type will not be a geometry type, but
real
. It will be filled automatically with the area of the geometry. The area
will be calculated in web mercator, or you can set the projection
parameter
of the column to 4326
to calculate it with WGS84 coordinates. Other
projections are currently not supported.
Defining Other Columns
In addition to id and geometry columns, each table can have any number of
“normal” columns using any type supported by PostgreSQL. Some types are
specially recognized by osm2pgsql: text
, boolean
, int2
(smallint
),
int4
(int
, integer
), int8
(bigint
), real
, hstore
, and
direction
. See the Type Conversion section
for details on how this type affects the conversion of OSM data to the
database types.
Instead of the above types you can use any SQL type you want. If you do that
you have to supply the PostgreSQL string representation for that type when
adding data to such columns (or Lua nil
to set the column to NULL
). This
can be used, for instance, to create JSON(B) columns. You have to provide valid
JSON from your Lua script in this case. See the example config
places.lua for how this can be done.
Processing Callbacks
You are expected to define one or more of the following functions:
Callback function | Description |
---|---|
osm2pgsql.process_node(object) | Called for each new or changed node. |
osm2pgsql.process_way(object) | Called for each new or changed way. |
osm2pgsql.process_relation(object) | Called for each new or changed relation. |
They all have a single argument of type table (here called object
) and no
return value. If you are not interested in all object types, you do not have
to supply all the functions.
These functions are called for each new or modified OSM object in the input file. No function is called for deleted objects, osm2pgsql will automatically delete all data in your database tables that derived from deleted objects. Modifications are handled as deletions followed by creation of a “new” object, for which the functions are called.
The parameter table (object
) has the following fields and functions:
Field / Function | Description |
---|---|
id | The id of the node, way, or relation. |
tags | A table with all the tags of the object. |
version | Version of the OSM object. (*) |
timestamp | Timestamp of the OSM object, time in seconds since the epoch (midnight 1970-01-01). (*) |
changeset | Changeset containing this version of the OSM object. (*) |
uid | User id of the user that created or last changed this OSM object. (*) |
user | User name of the user that created or last changed this OSM object. (*) |
grab_tag(KEY) | Return the tag value of the specified key and remove the tag from the list of tags. (Example: local name = object:grab_tag('name') ) This is often used when you want to store some tags in special columns and the rest of the tags in an hstore column. |
get_bbox() | Get the bounding box of the current node or way. This function returns four result values: the lot/lat values for the bottom left corner of the bounding box, followed by the lon/lat values of the top right corner. Both lon/lat values are identical in case of nodes. Example: lon, lat, dummy, dummy = object.get_bbox() (This function doesn’t work for relations currently.) |
is_closed | Ways only: A boolean telling you whether the way geometry is closed, i.e. the first and last node are the same. |
nodes | Ways only: An array with the way node ids. |
members | Relations only: An array with member tables. Each member table has the fields type (values n , w , or r ), ref (member id) and role . |
These are only available if the -x|--extra-attributes
option is used and the
OSM input file actually contains those fields.
You can do anything in those processing functions to decide what to do with
this data. If you are not interested in that OSM object, simply return from the
function. If you want to add the OSM object to some table call the add_row()
function on that table:
-- definition of the table:
table_pois = osm2pgsql.define_node_table('pois', {
{ column = 'tags', type = 'hstore' },
{ column = 'name', type = 'text' },
{ column = 'geom', type = 'point' },
})
...
function osm2pgsql.process_node(object)
...
table_pois:add_row({
tags = object.tags,
name = object.tags.name,
geom = { create = 'point' }
})
...
end
The add_row()
function takes a single table parameter, that describes what to
fill into all the database columns. Any column not mentioned will be set to
NULL
.
The geometry column is somewhat special. You have to define a geometry transformation that will be used to transform the OSM object data into a geometry that fits into the geometry column. See the next section for details.
Note that you can’t set the object id, this will be handled for you behind the scenes.
Geometry Transformations
Currently these geometry transformations are supported:
{ create = 'point' }
. Only valid for nodes, create a ‘point’ geometry.{ create = 'line' }
. For ways or relations. Create a ‘linestring’ or ‘multilinestring’ geometry.{ create = 'area' }
For ways or relations. Create a ‘polygon’ or ‘multipolygon’ geometry.
Some of these transformations can have parameters:
- The
line
transformation has an optional parametersplit_at
. If this is set to anything other than 0, long linestrings will be split up into parts no longer than this value. - Version == 1.3.0 The
area
transformation has an optional parametermulti
. If this is set tofalse
(the default), a multipolygon geometry will be split up into several polygons. If this is set totrue
, the multipolygon geometry is kept as one. - Version >= 1.4.0 The
area
transformation has an optional parametersplit_at
. If this is not set or set tonil
(the default), (multi)polygon geometries will be kept as is, if this is set to'multi'
multipolygon geometries will be split into their polygon parts.
Note that in general it is your responsibility to make sure the column type
of a geometry column can take the geometries created by the transformation.
A point
geometry column will not be able to store the result of an area
transformation.
Version >= 1.4.0 If a transformation will result in a polygon
geometry, but the column type of the geometry is multipolygon
. The polygon
will be “wrapped” automatically in a multipolygon to fit into the column.
This can be useful when you want all your polygons and multipolygons to be
of the same database type. Some programs handle columns of uniform geometry
type better.
If no geometry transformation is set, osm2pgsql will, in some cases, assume a default transformation. These are the defaults:
- For node tables, a
point
column gets the node location. - For way tables, a
linestring
column gets the complete way geometry, apolygon
column gets the way geometry as area (if the way is closed and the area is valid).
Stages
When processing OSM data, osm2pgsql reads the input file(s) in order, nodes
first, then ways, then relations. This means that when the ways are read and
processed, osm2pgsql can’t know yet whether a way is in a relation (or in
several). But for some use cases we need to know in which relations a way is
and what the tags of these relations are or the roles of those member ways.
The typical case are relations of type route
(bus routes etc.) where we
might want to render the name
or ref
from the route relation onto the
way geometry.
The osm2pgsql flex output supports this use case by adding an additional
“reprocessing” step. Osm2pgsql will call the Lua function
osm2pgsql.select_relation_members()
for each added, modified, or deleted
relation. Your job is to figure out which way members in that relation might
need the information from the relation to be rendered correctly and return
those ids in a Lua table with the only field ‘ways’. This is usually done with
a function like this:
function osm2pgsql.select_relation_members(relation)
if relation.tags.type == 'route' then
return { ways = osm2pgsql.way_member_ids(relation) }
end
end
Instead of using the helper function osm2pgsql.way_member_ids()
which
returns the ids of all way members, you can write your own code, for instance
if you want to check the roles.
Note that select_relation_members()
is called for deleted relations and for
the old version of a modified relation as well as for new relations and the
new version of a modified relation. This is needed, for instance, to correctly
mark member ways of deleted relations, because they need to be updated, too.
The decision whether a way is to be marked or not can only be based on the
tags of the relation and/or the roles of the members. If you take other
information into account, updates might not work correctly.
In addition you have to store whatever information you need about the relation
in your process_relation()
function in a global variable.
After all relations are processed, osm2pgsql will reprocess all marked ways by
calling the process_way()
function for them again. This time around you have
the information from the relation in the global variable and can use it.
If you don’t mark any ways, nothing will be done in this reprocessing stage.
(It is currently not possible to mark nodes or relations. This might or might not be added in future versions of osm2pgsql.)
You can look at osm2pgsql.stage
to see in which stage you are.
You want to do all the processing you can in stage 1, because it is faster and there is less memory overhead. For most use cases, stage 1 is enough.
Processing in two stages can add quite a bit of overhead. Because this feature is new, there isn’t much operational experience with it. So be a bit careful when you are experimenting and watch memory and disk space consumption and any extra time you are using. Keep in mind that:
- All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.
- Keeping track of way ids marked in stage 1 needs some memory.
- To do the extra processing in stage 2, time is needed to get objects out of the object store and reprocess them.
- Osm2pgsql will create an id index on all way tables to look up ways that need to be deleted and re-created in stage 2.
Type Conversions
The add_row()
command will try its best to convert Lua values into
corresponding PostgreSQL values. But not all conversions make sense. Here
are the detailed rules:
- Lua values of type
function
,userdata
, orthread
will always result in an error. - The Lua type
nil
is always converted toNULL
. - If the result of a conversion is
NULL
and the column is defined asNOT NULL
, an error is thrown. - The Lua type
table
is converted to the PostgreSQL typehstore
if and only if all keys and values in the table are string values. A Luatable
can not be converted to any other PostgreSQL type. - For
boolean
columns: The number0
is converted tofalse
, all other numbers aretrue
. Strings are converted as follows:"yes"
,"true"
,"1"
aretrue
;"no"
,"false"
,"0"
arefalse
, all others areNULL
. - For integer columns (
int2
,int4
,int8
): Booleantrue
is converted to1
,false
to0
. Numbers that are not integers or outside the range of the type result inNULL
. Strings are converted to integers if possible otherwise the result isNULL
. - For
real
columns: Booleans result in an error, all numbers are used as is, strings are converted to a number, if that is not possible the result isNULL
. - For
direction
columns (stored asint2
in the database): Booleantrue
is converted to1
,false
to0
. The number0
results in0
, all positive numbers in1
, all negative numbers in-1
. Strings"yes"
and"1"
will result in1
,"no"
and"0"
in0
,"-1"
in-1
. All other strings will result inNULL
. - For text columns and any other not specially recognized column types, booleans result in an error and numbers are converted to strings.
If you want any other conversions, you have to do them yourself in your Lua code. Osm2pgsql provides some helper functions for other conversions, see the Lua helper library (Appendix B).
The Pgsql Output
The pgsql output is the original output osm2pgsql started with. It was designed for rendering OpenStreetMap data, principally with Mapnik. It is still widely used although it is somewhat limited in how the data can be represented in the database.
If you are starting a new project with osm2pgsql, we recommend you use the flex output instead. The pgsql output will not receive all the new features that the flex output has or will get in the future. There are a lot of old configs and styles out there that need the pgsql output, though. So it will not go away anytime soon.
Database Layout
The pgsql output always creates a fixed list of four database tables shown
below. The PREFIX
can be set with the -p, --prefix
option, the default is
planet_osm
. (Note that this option is also interpreted by the middle!)
Table | Description |
---|---|
PREFIX_point | Point geometries created from nodes. |
PREFIX_line | Line geometries created from ways and relations tagged type=route . |
PREFIX_roads | Contains some of the same data as the line table, but with attributes selected for low-zoom rendering. It does not only contain roads! |
PREFIX_polygon | Polygon geometries created from closed ways and relations tagged type=multipolygon or type=boundary . |
All tables contain a geometry column named way
(which is not the most
intuitive name, but it can’t be changed now because of backwards
compatibility).
Style File
Some aspects of how the pgsql output converts OSM data into PostgreSQL tables can be configured via a style file. The default style file that is usually installed with osm2pgsql is suitable for rendering many styles. It contains some documentation of the style syntax and works well as an example or starting point for your own changes. With a custom style file, you can control how different object types and tags map to different columns and data types in the database.
The style file is a plain text file containing four columns separated by spaces. As each OSM object is processed it is checked against conditions specified in the first two columns. If they match, processing options from the third and fourth columns are applied.
Column | Description |
---|---|
1. OSM object type | Can be node , way or both separated by a comma. way will also apply to relations with type=multipolygon , type=boundary , or type=route ; all other relations are ignored. |
2. Tag | The tag to match on. If the fourth column is linear or polygon , a column for this tag will be created in each of the point, line, polygon, and road tables. |
3. PostgreSQL data type | Specifies how data will be stored in the tag’s PostgreSQL table column. Possible values are text , int4 , or real . If the fourth column is delete or phstore , this column has no meaning and should just be set to text . |
4. Flags | Zero or more flags separated by commas. For possible values see below. |
Possible values for the flags are:
Flag | Description |
---|---|
linear | Specifies that ways with this tag should be imported as lines by default, even if they are closed. Other conditions can still override this. |
polygon | Specifies that closed ways with this tag should be imported as polygons by default. Other conditions can still override this. |
nocolumn | The two flags above automatically create a column for the specified tag. This flag overrides this behaviour such that no column is created. This is especially useful for hstore, where all key value data is stored in the hstore column such that no dedicated columns are needed. |
phstore | The same as polygon,nocolumn for backward compatibility |
delete | Prevents the specified tag (but not the entire object) from being stored in the database. Useful for tags that tend to have long values but will not be used for rendering, such as source=* . This flag only affects --slim mode imports. |
nocache | This flag is deprecated and does nothing. |
Closed ways with tag area=yes
and relations with type=multipolygon
or
type=boundary
will be imported as polygons even if no polygon flag is set.
Non-closed ways and closed ways with area=no
will always be imported as
lines.
Nodes are always placed in the “point” table, never in the “line” or “polygon” table.
Usually only objects which result in at least one of the columns declared in the style file being not NULL will be added to the database. But see below in the Use of Hstore section.
The style file may also contain comments. Any text between a #
and the end of
the line will be ignored.
Special ‘Tags’
There are several special values that can be used in the tag column (second column) of the style file for creating additional fields in the database which contain things other than tag values.
Special tag | Description |
---|---|
way_area | Creates a database column that stores the area (calculated in the units of the projection, normally Mercator meters) for any objects imported as polygons. Use with real as the data type. See also the --reproject-area option. |
z_order | Adds a column that is used for ordering objects when rendering. It mostly applies to objects with highway=* or railway=* . Use with int4 as the data type. |
osm_user | Adds a column that stores the username of the last user to edit an object in the database (*). |
osm_uid | Adds a column that stores the user ID number of the last user to edit an object in the database (*). |
osm_version | Adds a column that stores the version of an object in the database (ie, how many times the object has been modified) (*). |
osm_timestamp | Adds a column that stores the date and time that the most recent version of an object was added to OpenStreetMap (*). |
To use these, you must use the command line option --extra-attributes
.
If importing with both --hstore
and --extra-attributes
the meta-data will
end up in the tags hstore column regardless of the style file.
Schemas and Tablespaces
Usually all tables, indexes and functions that the pgsql output creates are in the default public schema and use the default tablespace.
Version >= 1.4.0 You can use the command line option
--output-pgsql-schema=SCHEMA
to tell osm2pgsql that it should use the
specified schema to create the tables, indexes, and functions in. Note that you
have to create the schema before running osm2pgsql and make sure the database
user was granted the rights to create tables, indexes, and functions in this
schema. Read more about schemas in the PostgreSQL
documentation.
Sometimes you want to create special PostgreSQL
tablespaces
and put some tables or indexes into them. Having often used indexes on fast SSD
drives, for instance, can speed up processing a lot. There are three command
line options that the pgsql output interprets: To set the tablespace for output
tables and indexes, use the --tablespace-main-data=TABLESPC
and
--tablespace-main-index=TABLESPC
options, respectively. You can also use the
-i, --tablespace-index=TABLESPC
option which will set the tablespace for the
pgsql output as well as for the middle! Note that it is your job to create the
tablespaces before calling osm2pgsql and making sure the disk behind it is
large enough.
Coastline Processing
The natural=coastline
tag is suppressed by default, even if you import the
natural=*
key. Many maps get the coastlines from a different
source, so it does not need to
import them from the input file. You can use the --keep-coastlines
parameter
to change this behavior if you want coastlines in your database. See the OSM
wiki for more
information on coastline processing.
Use of Hstore
Hstore is a PostgreSQL data
type
that allows storing arbitrary key-value pairs in a single column. It needs to
be installed on the database with CREATE EXTENSION hstore;
Hstore is used to give more flexibility in using additional tags without reimporting the database, at the cost of less speed and more space.
By default, the pgsql output will not generate hstore columns. The following options are used to add hstore columns of one type or another:
-
--hstore
or-k
adds any tags not already in a conventional column to a hstore column calledtags
. With the default stylesheet this would result in tags likehighway
appearing in a conventional column while tags not in the style likename:en
orlanes:forward
would appear only in the hstore column. -
--hstore-all
or-j
adds all tags to a hstore column calledtags
, even if they’re already stored in a conventional column. With the standard stylesheet this would result in tags likehighway
appearing in conventional column and the hstore column while tags not in the style likename:en
orlanes:forward
would appear only in the hstore column. -
--hstore-column
or-z
, which adds an additional column for tags starting with a specified string, e.g.--hstore-column 'name:'
produces a hstore column that contains allname:xx
tags. This option can be used multiple times.
You can not use both --hstore
and --hstore-all
together.
The following options can be used to modify the behaviour of the hstore columns:
-
--hstore-match-only
modifies the above options and prevents objects from being added if they only have tags in the hstore column and no tags in the non-hstore columns. If neither of the above options is specified, this option is ignored. -
--hstore-add-index
adds indexes to all hstore columns. This option is ignored if there are no hstore columns. Using indexes can speed up arbitrary queries, but for most purposes partial indexes will be faster. You have to create those yourself.
Either --hstore
or --hstore-all
when combined with --hstore-match-only
should give the same rows as no hstore, just with the additional hstore column.
Note that when you are using the --extra-attributes
option, all your nodes,
ways, and relations essentially get a few extra tags. Together with the hstore
options above, the object attributes might end up in your hstore column(s)
possibly using quite a lot of space. This is especially true for the majority
of nodes that have no tags at all, which means they would normally not appear
in your output tables. You might want to use --hstore-match-only
in that
case.
Lua Tag Transformations
The pgsql output supports Lua scripts to
rewrite tags before they enter the database. Use the command line option
--tag-transform-script=SCRIPT
to enable this.
There is inevitably a performance hit with any extra processing. The Lua tag
transformation is a little slower than the C++-based default, but this will
probably not matter much in practice. But Lua pre-processing may save you
further processing later. The Lua transformations allow you, for instance, to
unify disparate tagging (for example, highway=path; foot=yes
and
highway=footway
) and perform complex queries, potentially more efficiently
than writing them as rules in your stylesheet which are executed at rendering
time.
Note that this is a totally different mechanism than the Lua scripts used in the flex output.
The Lua script needs to implement the following functions:
function filter_tags_node(tags, num_tags)
return filter, tags
function filter_tags_way(tags, num_tags)
return filter, tags, polygon, roads
function filter_basic_tags_rel(tags, num_tags)
return filter, tags
These take a set of tags as a Lua key-value table, and an integer which is the number of tags supplied.
The first return value is filter
, a flag which you should set to 1
if the
way/node/relation should be filtered out and not added to the database, 0
otherwise. (They will still end up in the slim mode tables, but not in the
rendering tables.)
The second return value is tags
, a transformed (or unchanged) set of tags,
which will be written to the database.
filter_tags_way
returns two additional flags:
poly
should be1
if the way should be treated as a polygon, or0
if it is to be treated as a line.roads
should be1
if the way should be added to theroads
table,0
otherwise.
function filter_tags_relation_member(tags, member_tags, roles, num_members)
return filter, tags, member_superseded, boundary, polygon, roads
The function filter_tags_relation_member
is more complex and can handle more
advanced relation tagging, such as multipolygons that take their tags from the
member ways.
This function is called with the tags from the relation; a set of tags for each of the member ways (member relations and nodes are ignored); the set of roles for each of the member ways; and the number of members. The tag and role sets are both arrays (indexed tables).
As with the other functions, it should return a filter
flag, and a
transformed set of tags
to be applied to the relation in later processing.
The third return value, member_superseded
, is obsolete and will be ignored.
The fourth and fifth return values, boundary
and polygon
, are flags that
specify if the relation should be processed as a line, a polygon, or both (e.g.
administrative boundaries). Set the final return value, roads
, to 1
if the
geometry should be added to the roads
table.
There is a sample tag transform Lua script in the repository as an example, which (nearly) replicates the built-in transformations and can be used as a template for one’s own scripts.
Test your Lua script with small excerpts before applying it to a whole country or even the planet.
Pgsql Output Command Line Options
These are all the command line options interpreted by the pgsql output:
Option | Description |
---|---|
-i, --tablespace-index=TABLESPC | Store all indexes in the PostgreSQL tablespace TABLESPC . This option also affects the middle tables. |
--tablespace-main-data=TABLESPC | Store the data tables in the PostgreSQL tablespace TABLESPC . |
--tablespace-main-index=TABLESPC | Store the indexes in the PostgreSQL tablespace TABLESPC . |
--latlong | Store coordinates in degrees of latitude & longitude. |
-m, --merc | Store coordinates in Spherical Mercator (Web Mercator, EPSG:3857) (the default). |
-E, --proj=SRID | Use projection EPSG:SRID . |
-p, --prefix=PREFIX | Prefix for table names (default: planet_osm ). This option affects the middle as well as the pgsql output table names. |
--tag-transform-script=SCRIPT | Specify a Lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database. |
-x, --extra-attributes | Include attributes (user name, user id, changeset id, timestamp and version). This also requires additional entries in your style file. |
-k, --hstore | Add tags without column to an additional hstore (key/value) column in the database tables. |
-j, --hstore-all | Add all tags to an additional hstore (key/value) column in the database tables. |
-z, --hstore-column=PREFIX | Add an additional hstore (key/value) column named PREFIX containing all tags that have a key starting with PREFIX , eg \--hstore-column "name:" will produce an extra hstore column that contains all name:xx tags. |
--hstore-match-only | Only keep objects that have a value in at least one of the non-hstore columns. |
--hstore-add-index | Create indexes for all hstore columns after import. |
-G, --multi-geometry | Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM object can therefore use several rows in the output tables. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables. |
-K, --keep-coastlines | Keep coastline data rather than filtering it out. By default objects tagged natural=coastline will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data. |
--reproject-area | Compute area column using Spherical Mercator coordinates even if a different projection is used for the geometries. |
--output-pgsql-schema=SCHEMA | Version >= 1.4.0 Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql and multi outputs (default is no schema, i.e. the public schema is used). |
Middle
This chapter is incomplete.
The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which ways are used by which nodes, or which members a relation has. It also keeps track of all node locations. This information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changed objects themselves and not all the related objects needed for creating an object’s geometry.
Database Structure
The middle stores its data in the database in the following tables. The
PREFIX
can be set with the -p, --prefix
option, the default is
planet_osm
. (Note that this option is also interpreted by the pgsql output!)
Table | Description |
---|---|
PREFIX_nodes | OSM nodes (not used if the flat node store is used |
PREFIX_ways | OSM ways |
PREFIX_rels | OSM relations |
The names and structure of these tables, colloquially referred to as “slim tables”, are an internal implemention detail of osm2pgsql. While they do not usually change between releases of osm2pgsql, be advised that if you rely on the content or layout of these tables in your application, it is your responsibility to check whether your assumptions are still true in a newer version of osm2pgsql before updating. See this issue for a discussion of this topic.
Flat Node Store
--flat-nodes
specifies that instead of a table in PostgreSQL, a binary
file is used as a database of node locations. This should only be used on full
planet imports or very large extracts (e.g. Europe) but in those situations
offers significant space savings and speed increases, particularly on
mechanical drives. The file takes approximately 8 bytes * maximum node ID, or
more than 50 GiB, regardless of the size of the extract.
If you are using the --drop
option, the flat node file will be deleted
after import.
Caching
--cache
specifies how much memory in MB to allocate for caching information.
In --slim
mode, this is just node positions while in non-slim mode it has to
store information about ways and relations too. The rule of thumb in slim mode
is as follows: use the size of the PBF file you are trying to import or about
75% of RAM, whatever is smaller. Make sure there is enough RAM left for
PostgreSQL. It needs at least the amount of shared_buffers
given in its
configuration. You may also set --cache
to 0 to disable node caching
completely. This makes only sense when a flat node file is given and there
is not enough RAM to fit most of the cache.
Bucket Index for Slim Mode
Version >= 1.4.0 This is only available from osm2pgsql version 1.4.0!
The default is still to create the old index for now until we have some more experience with it.
Osm2pgsql can use an index for way node lookups in slim mode that needs a lot less disk space than earlier versions did. For a planet the savings can be about 200 GB! Because the index is so much smaller, imports are faster, too. Lookup times are slightly slower, but this shouldn’t be an issue for most people.
If you are not using slim mode and/or not doing updates of your database, this does not apply to you.
For backwards compatibility osm2pgsql will never update an existing database to the new index. It will keep using the old index. So you do not have to do anything when upgrading osm2pgsql.
If you want to use the new index, there are two ways of doing this: The “safe” way for most users and the “doit-it-yourself” way for expert users. Note that once you switched to the new index, older versions of osm2pgsql will not work correctly any more.
Update for Most Users
In 1.4.0 this does not work yet. Currently the default is still to create the old type of index.
If your database was created with an older version of osm2pgsql you might want to start again from an empty database. Just do a reimport and osm2pgsql will use the new space-saving index.
Update for Expert Users
This is only for users who are very familiar with osm2pgsql and PostgreSQL operation. You can break your osm2pgsql database beyond repair if something goes wrong here and you might not even notice.
You can create the index yourself by following these steps:
Drop the existing index. Replace {prefix}
by the prefix you are using.
Usually this is planet_osm
:
DROP INDEX {prefix}_ways_nodes_idx;
Create the index_bucket
function needed for the index. Replace
{way_node_index_id_shift}
by the number of bits you want the id to be
shifted. If you don’t have a reason to use something else, use 5
:
CREATE FUNCTION {prefix}_index_bucket(int8[]) RETURNS int8[] AS $$
SELECT ARRAY(SELECT DISTINCT unnest($1) >> {way_node_index_id_shift})
$$ LANGUAGE SQL IMMUTABLE;
Now you can create the new index. Again, replace {prefix}
by the prefix
you are using:
CREATE INDEX {prefix}_ways_nodes_bucket_idx ON {prefix}_ways
USING GIN ({prefix}_index_bucket(nodes))
WITH (fastupdate = off);
If you want to create the index in a specific tablespace you can do this:
CREATE INDEX {prefix}_ways_nodes_bucket_idx ON {prefix}_ways
USING GIN ({prefix}_index_bucket(nodes))
WITH (fastupdate = off) TABLESPACE {tablespace};
Id Shift (for Experts)
When an OSM node is changing, the way node index is used to look up all ways that use that particular node and therefore might have to be updated, too. This index is quite large, because most nodes are in at least one way.
When creating a new database (when used in create mode with slim option),
osm2pgsql can create a “bucket index” using a configurable id shift for the
nodes in the way node index. This bucket index will create index entries not
for each node id, but for “buckets” of node ids. It does this by shifting the
node ids a few bits to the right. As a result there are far fewer entries
in the index, it becomes a lot smaller. This is especially true in our case,
because ways often contain consecutive nodes, so if node id n
is in a way,
there is a good chance, that node id n+1
is also in the way.
On the other hand, looking up an id will result in false positives, so the database has to retrieve more ways than absolutely necessary, which leads to the considerable slowdown.
You can set the shift with the command line option
--middle-way-node-index-id-shift
. Values between about 3 and 6 might make
sense, from some tests it looks like 5 is a good value.
To completely disable the bucket index and create an index compatible with
earlier versions of osm2pgsql, use --middle-way-node-index-id-shift=0
.
(This is currently still the default.)
Middle Command Line Options
Option | Description |
---|---|
-i, --tablespace-index=TABLESPC | Store all indexes in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indexes on faster storage like SSDs. This option affects the middle as well as the pgsql output indexes. |
--tablespace-slim-data=TABLESPC | Store the slim mode tables in the given tablespace. |
--tablespace-slim-index=TABLESPC | Store the indexes of the slim mode tables in the given tablespace. |
-p, --prefix=PREFIX | Prefix for table names (default: planet_osm ). This option affects the middle as well as the pgsql output table names. |
-s, --slim | Store temporary data in the database. Without this mode, all temporary data is stored in RAM and if you do not have enough the import will not work successfully. With slim mode, you should be able to import the data even on a system with limited RAM, although if you do not have enough RAM to cache at least all of the nodes, the time to import the data will likely be greatly increased. |
--drop | Drop the slim mode tables from the database and the flat nodes file once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indexes need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing. |
-C NUM, --cache=NUM | Only for slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Each cached node requires 8 bytes of cache, plus about 10% - 30% overhead. As a rule of thumb, give a bit more than the size of the import file in PBF format. If the RAM is not big enough, use about 75% of memory. Make sure to leave enough RAM for PostgreSQL. It needs at least the amount of shared_buffers given in its configuration. Defaults to 800. |
--cache-strategy=STRATEGY | There are a number of different modes in which osm2pgsql can organize its node cache in RAM. These are optimized for different assumptions of the data and the hardware resources available. Currently available strategies are dense, chunked, sparse and optimized. dense assumes that the node id numbers are densely packed, i.e. only a few IDs in the range are missing / deleted. For planet extracts this is usually not the case, making the cache very inefficient and wasteful of RAM. sparse assumes node IDs in the data are not densely packed, greatly increasing caching efficiency in these cases. If node IDs are densely packed, like in the full planet, this strategy has a higher overhead for indexing the cache. optimized uses both dense and sparse strategies for different ranges of the ID space. On a block by block basis it tries to determine if it is more effective to store the block of IDs in sparse or dense mode. This is the default and should be typically used. |
-x, --extra-attributes | Include attributes of each object in the middle tables and make them available to the outputs. Attributes are: user name, user id, changeset id, timestamp and version. |
--flat-nodes=FILENAME | The flat-nodes mode is a separate method to store slim mode node information on disk. Instead of storing this information in the main PostgreSQL database, this mode creates its own separate custom database to store the information. As this custom database has application level knowledge about the data to store and is not general purpose, it can store the data much more efficiently. Storing the node information for the full planet requires more than 300GB in PostgreSQL, the same data is stored in “only” 50GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for full planet imports as it doesn’t work well with small imports. The default is disabled. |
--middle-schema=SCHEMA | Version >= 1.4.0 Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle (default is no schema, i.e. the public schema is used). |
--middle-way-node-index-id-shift SHIFT | Version >= 1.4.0 Set ID shift for way node bucket index in middle. Experts only. See Bucket Index for slim mode for details. |
Expire
This chapter is incomplete.
When osm2pgsql is processing OSM changes, it can create a list of tiles that will be affected by those changes. This list can later be used to delete any changed tiles you might have cached. Osm2pgsql only creates this list. How to actually expire the tiles it outside the scope of osm2pgsql. Expire only makes sense in append mode.
Option | Description |
---|---|
-e, --expire-tiles=[MIN_ZOOM-]MAX-ZOOM | Create a tile expiry list |
-o, --expire-output=FILENAME | Output file name for expired tiles list |
--expire-bbox-size=SIZE | Max size for a polygon to expire the whole polygon, not just the boundary |
Advanced Topics
Notes on Memory Usage
Importing on OSM file into the database is very demanding in terms of RAM usage. Osm2pgsql and PostgreSQL are running in parallel at this point and both need memory.
PostgreSQL blocks at least the part of RAM that has been configured with the
shared_buffers
parameter during PostgreSQL tuning and needs some memory on
top of that. (See Tuning the PostgreSQL
Server).
Osm2pgsql needs at least 2GB of RAM for its internal data structures,
potentially more when it has to process very large relations. In addition it
needs to maintain a cache for node locations. The size of this cache can be
configured with the parameter --cache
.
When importing with a flatnode file (option --flat-nodes
), it is best to
disable the node cache completely (--cache=0
) and leave the memory for the
system cache to speed up accessing the flatnode file.
For imports without a flatnode file, set --cache
approximately to the size of
the OSM pbf file you are importing. (Note that the --cache
setting is in
MByte). Make sure you leave enough RAM for PostgreSQL and osm2pgsql as
mentioned above. If the system starts swapping or you are getting out-of-memory
errors, reduce the cache size or consider using a flatnode file.
Parallel Processing
Some parts of the osm2pgsql processing can run in parallel. Depending on the hardware resources of you machine, this can make things faster or slower or even overwhelm your system when too many things happen in parallel and your memory runs out. For normal operation the defaults should work, but you can fine-tune the behaviour using some command line options.
Osm2pgsql will do some of its processing in parallel. Usually it will use
however many threads your CPUs support, but no more than 4. For most use
cases this should work well, but you can tune the number of threads used
with the --number-processes
command line option. (Note that the option is
a bit of a misnomer, because this sets the number of threads used, they are
all in a single process.) If disks are fast enough e.g. if you have an SSD,
then this can greatly increase speed of the “going over pending ways” and
“going over pending relations” stages on a multi-core server. Past 8 threads
or so this will probably not gain you any speed advantage.
By default osm2pgsql starts the clustering and index building on all tables in
parallel to increase performance. This can be a disadvantage on slow disks, or
if you don’t have enough RAM for PostgreSQL to do the parallel index building.
PostgreSQL potentially needs the amount of memory set in the
maintenance_work_mem
config setting for each index it builds. With 7 or more
indexes being built in parallel this can mean quite a lot of memory is needed.
Use the --disable-parallel-indexing
option to build the indexes one after
the other.
Handling of Forward Dependencies
Whenever a node changes, osm2pgsql will find all ways and relations that have this node as a member and reprocess them. Similarly, whenever a way changes, their parent relations are reprocessed. So osm2pgsql will automatically handle “forward dependencies” from nodes and ways to their parents. Almost always this is the behaviour you want.
Version >= 1.4.0 This behaviour can be disabled with the command
line option --with-forward-dependencies=false
. It is used by Nominatim which
uses the specialized Gazetteer output which doesn’t need this behaviour.
Using Database While Updating
To improve performance osm2pgsql uses several parallel threads to import or update the OSM data. This means that that there is no transaction around all the updates. If you are querying the database while osm2pgsql is running, you might be able to see some updates but not others. While an import is running you should not query the data. For updates it depends a bit on your use case.
In most cases this is not a huge problem, because OSM objects are mostly independent of one another. If you are
- writing OSM objects into multiple tables,
- using two-stage processing in the Flex output, or
- doing complex queries joining several tables
you might see some inconsistent data, although this is still rather unlikely. If you are concerned by this, you should stop any other use of the database while osm2pgsql is running. This is something that needs to be done outside osm2pgsql, because osm2pgsql doesn’t know what else is running and whether and how it might interact with osm2pgsql.
Note that even if you are seeing inconsistent data at some point, the moment osm2pgsql is finished, the data will be consistent again. If you are running a tile server and using the expire functionality, you will, at that point, re-render all tiles that might be affected making your tiles consistent again.
Tips & Tricks
The flex output allows a wide range of configuration options. Here are some extra tips & tricks.
Using create_only
Columns for Postprocessed Data
Sometimes it is useful to have data in table rows that osm2pgsql can’t create. For instance you might want to store the center of polygons for faster rendering of a label.
To do this define your table as usual and add an additional column, marking
it create_only
. In our example case the type of the column should be the
PostgreSQL type GEOMETRY(Point, 3857)
, because we don’t want osm2pgsql to
do anything special here, just create the column with this type as is.
polygons_table = osm2pgsql.define_area_table('polygons', {
{ column = 'tags', type = 'hstore' },
{ column = 'geom', type = 'geometry' },
{ column = 'center', type = 'GEOMETRY(Point, 3857)', create_only = true },
{ column = 'area', type = 'area' },
})
After running osm2pgsql as usual, run the following SQL command:
UPDATE polygons SET center = ST_Centroid(geom) WHERE center IS NULL;
If you are updating the data using osm2pgsql --append
, you have to do this
after each update. When osm2pgsql inserts new rows they will always have a
NULL
value in center
, the WHERE
condition makes sure that we only do
this (possibly expensive) calculation once.
Accessing Environment Variables from Lua
In Lua scripts you can access environment variables with os.getenv("VAR")
.
The Lua config scripts that osm2pgsql uses are normal Lua scripts, so you
can do that there, too.
Getting and Preparing OSM Data
Osm2pgsql imports OSM data into a database. But where does this data come from? This appendix shows how to get the data and, possibly, prepare it for use with osm2pgsql.
Data Formats
There are several common formats for OSM data and osm2pgsql supports all of them (XML, PBF, and O5M). If you have the option, you should prefer the PBF format, because the files are smaller than those in other formats and faster to read.
OSM data files usually have a suffix .osm
, .osm.gz
, .osm.bz2
, or
.osm.pbf
, sometimes just .pbf
. They are used in osm2pgsql “create” mode.
OSM change files usually have the suffix .osc.gz
. They are used in
osm2pgsql “append” mode.
Getting the Data
The Planet File
The OpenStreetMap project publishes the so-called “planet file”, which contains a dump of the current full OSM database. This dump is available in XML and PBF format.
Downloads are available on planet.osm.org and several mirrors.
If you are new to osm2pgsql we recommend you start experimenting with a small extract, not the planet file! The planet file is huge (tens of GBytes) and it will take many hours to import.
Geographical Extracts
Importing data into the database takes time and uses a lot of disk space. If you only need a portion of the OSM data it often makes sense to only import an extract of the data.
Geographical extracts for countries, states, cities, etc. are available from several sources.
The extracts from Geofabrik are very popular. They are updated daily and also offer daily change files suitable for updating an osm2pgsql database.
If you can’t find a suitable extract, see below for creating your own.
Updating an Existing Database
The OpenStreetMap database changes all the time. To get these changes into your database, you need to download the OSM change files, sometimes called diffs or replication diffs, which contain those changes. They are available from planet.osm.org. Depending on how often you want to update your database, you can get minutely, hourly, or daily change files.
Some services offering OSM data extracts for download also offer change files suitable for updating those extracts. Geofabrik has daily change files for all its updates. See the extract page for a link to the replication URL. (Note that change files go only about 3 months back. Older files are deleted.) download.openstreetmap.fr has minutely change files for all its extracts.
To keep an osm2pgsql database up to date you need to know the replication
(base) URL, i.e. the URL of the directory containing a state.txt
file.
Versions >1.4.1 of osm2pgsql come with a script scripts/osm2pgsql-replication
which is the easiest way to keep an osm2pgsql database up to date. You need
PyOsmium installed for this to
work. Run the script regularly (for instance from a cron job or through
systemd) to automatically download recent changes and import them into your
database. (But please make sure you don’t call this script too often to keep
the load on the server to a minimum. Checking every minute for a file that
will only change once a day is not okay!) Call the script with option --help
to get usage information.
If this script is not available in your version of osm2pgsql or you want more control over the update process, there are other options. You need a program to download the change files and keep track of where you are in the replication process. Then you load the changes into the database using osm2pgsql’s “append” mode.
We recommend the
pyosmium_get_changes.py
tool from the PyOsmium project.
With it, downloading all changes since you ran the program the last time is
just a single command.
When you have changes for many months or years it might make more sense to drop your database completely and re-import from a more current OSM data file instead of updating the database from change files.
If you have imported an extract into an osm2pgsql database but there are no change files for the area of the extract, you can still use the replication diffs from planet.osm.org to update your database. But, because those extracts contain data for the whole planet, your database will keep accumulating more and more data outside the area of your extract that you don’t really need.
Preparing OSM Data for Use by Osm2pgsql
Before some OSM data file can be given to osm2pgsql it is sometimes necessary to prepare it in some way. This chapter explains some options.
For most of these steps, the recommended application is the osmium command line tool, which offers a wide range of functionality of slicing and dicing OSM data.
One handy command is osmium
fileinfo
which can tell you a lot about an OSM file, including how many objects it
contains, whether it is sorted, etc.
Creating Geographical Extracts
You can create your own extracts from the planet file or from existing
extracts with the osmium
extract
command. It can create extracts of OSM data from a bounding box or from a
boundary.
Merging OSM Data Files
If you are working with extracts you sometimes have several extracts, lets say for different countries, that you want to load into the same database. The correct approach in this case is to merge those extracts first and then import the resulting file with osm2pgsql.
You can use the osmium
merge
command for this.
Merging OSM Change Files
To speed up processing when you have many OSM change files, you can merge
several change files into larger files and then process the larger files with
osm2pgsql. The osmium
merge-changes
command will do this for you. Make sure to use the option -s, --simplify
.
Usually you should not merge change files for more than a day or so when doing this, otherwise the amount of changes osm2pgsql has to process in one go becomes too large.
OSM Data with Negative Ids
OSM data usually uses positive numbers for the ids of nodes, ways, and relations. Negative numbers are sometimes used for inofficial OSM data, for instance when non-OSM data is mixed with OSM data to make sure the are no id clashes.
Osm2pgsql can only handle positive ids. It uses negative ids internally (for multipolygon geometries in the database).
If you have negative ids in your input file, you can renumber it first. You can
use the osmium
renumber
command
for this.
Older versions of osm2pgsql will sometimes work or appear to work with negative ids, but it is not recommended to rely on this, because the processed data might be corrupted. Versions from 1.3.0 warn when you are using negative ids. From version 1.4.0 on, only positive ids are allowed.
Handling Unsorted OSM Data
OSM data files are almost always sorted, first nodes in order of their ids, then ways in order of their ids, then relations in order of their ids. The planet files, change files, and usual extracts all follow this convention.
Osm2pgsql can only read OSM files ordered in this way. This allows some optimizations in the code which speed up the normal processing.
If you have an unsorted input file, you should sort it first. You can use the
osmium sort
command for this.
Older versions of osm2pgsql will sometimes work or appear to work with unsorted data, but it is not recommended to rely on this, because the processed data might be corrupted. Versions from 1.3.0 warn when you are using unsorted data. From version 1.4.0 on, only sorted OSM files are allowed.
Working with OSM History Data
OpenStreetMap offers complete dumps of all OSM data which include not only the current version of all objects like the normal planet dump, but also all earlier version of OSM objects including deleted ones.
Like most other OSM software, osm2pgsql can not handle this data.
For some use cases there is a workaround: Create extracts from the full history
dump for specific points in time and feed those to osm2pgsql. You can use the
osmium
time-filter
command to create such extracts.
Lua Library for Flex Output
The flex output includes a small library of useful Lua helper functions.
All functions are in the osm2pgsql
namespace.
These functions are available on the flex output, they cannot be used in the Lua tag transformations of the pgsql output.
Name | clamp |
---|---|
Synopsis | osm2pgsql.clamp(VALUE, MIN, MAX) |
Description | Return VALUE if it is between MIN and MAX, MIN if it is smaller, or MAX if it is larger. All parameters must be numbers. If VALUE is `nil`, `nil` is returned. |
Example | osm2pgsql.clamp(2, 3, 4) ⟶ 3 |
Name | has_prefix(available from version 1.4.0) |
---|---|
Synopsis | osm2pgsql.has_prefix(STRING, PREFIX) |
Description | Returns `true` if the STRING starts with PREFIX. If STRING is `nil`, `nil` is returned. |
Example | osm2pgsql.has_prefix('addr:city', 'addr:') ⟶ true |
Name | has_suffix(available from version 1.4.0) |
---|---|
Synopsis | osm2pgsql.has_suffix(STRING, SUFFIX) |
Description | Returns `true` if the STRING ends with SUFFIX. If STRING is `nil`, `nil` is returned. |
Example | osm2pgsql.has_suffix('tiger:source', ':source') ⟶ true |
Name | make_check_values_func |
---|---|
Synopsis | osm2pgsql.make_check_values_func(VALUES[, DEFAULT]) |
Description | Return a function that will check its only argument against the list of VALUES. If it is in that list, it will be returned, otherwise the DEFAULT (or nil) will be returned. |
Example | local get_highway_value = osm2pgsql.make_check_values_func({ 'motorway', 'trunk', 'primary', 'secondary', 'tertiary' }, 'road') ... if object.tags.highway then local highway_type = get_highway_value(object.tags.highway) ... end |
Name | make_clean_tags_func |
---|---|
Synopsis | osm2pgsql.make_clean_tags_func(KEYS) |
Description | Return a function that will remove all tags (in place) from its only argument if the key matches KEYS. KEYS is an array containing keys, key prefixes (ending in `*`), or key suffixes (starting with `*`). The generated function will return `true` if it removed all tags, `false` if there are still tags left. |
Example | local clean_tags = osm2pgsql.make_clean_tags_func{'source', 'source:*', '*:source', 'note'} function osm2pgsql.process_node(node) if clean_tags(node.tags) then return end ... end |
Name | split_string(available from version 1.4.0) |
---|---|
Synopsis | osm2pgsql.split_string(STRING[, DELIMITER]) |
Description | Split STRING on DELIMITER (default: ';' (semicolon)) and return an array table with the results. Items in the array will have any whitespace at beginning and end removed. If STRING is `nil`, `nil` is returned. |
Example | local opening_hours = osm2pgsql.split_string(object.tags.opening_hours) |
Name | split_unit(available from version 1.4.0) |
---|---|
Synopsis | osm2pgsql.split_unit(STRING, DEFAULT_UNIT) |
Description | Split STRING of the form "VALUE UNIT" (something like "10 mph" or "20km") into the VALUE and the UNIT and return both. The VALUE must be a negative or positive integer or real number. The space between the VALUE and UNIT is optional. If there was no unit in the string, the DEFAULT_UNIT will be returned instead. Return `nil` if the STRING doesn't have the right pattern or is `nil`. |
Example | value, unit = osm2pgsql.split_unit(object.tags.maxspeed, 'km/h') |
Name | trim(available from version 1.4.0) |
---|---|
Synopsis | osm2pgsql.trim(STRING) |
Description | Return STRING with whitespace characters removed from the beginning and end. If STRING is `nil`, `nil` is returned. |
Example | local name = osm2pgsql.trim(object.tags.name) |
Name | way_member_ids |
---|---|
Synopsis | osm2pgsql.way_member_ids(RELATION) |
Description | Return an array table with the ids of all way members of RELATION. |
Example | function osm2pgsql.select_relation_members(relation) if relation.tags.type == 'route' then return { ways = osm2pgsql.way_member_ids(relation) } end end |
Command Line Options
This appendix contains an overview of all command line options.
Main Options
Option | Description |
---|---|
-a, --append | Run in append mode. |
-c, --create | Run in create mode (this is the default if -a, --append is not used. |
Help/Version Options
Option | Description |
---|---|
-h, --help | Print help. Add -v, --verbose for more verbose help. |
-V, --version | Print osm2pgsql version. Also prints versions of some libraries used. |
Logging Options
Option | Description |
---|---|
--log-level=LEVEL | Version >= 1.4.0 Set log level (debug , info (default), warn , or error ). |
--log-progress=VALUE | Version >= 1.4.0 Enable (true ) or disable (false ) progress logging. Setting this to auto will enable progress logging on the console and disable it if the output is redirected to a file. Default: true. |
--log-sql | Version >= 1.4.0 Enable logging of SQL commands for debugging. |
--log-sql-data | Version >= 1.4.0 Enable logging of all data added to the database. This will write out a huge amount of data! For debugging. |
-v, --verbose | Version >= 1.4.0 Same as --log-level=debug . |
Database Options
Option | Description |
---|---|
-d, --database=DB | Database name or PostgreSQL conninfo string. |
-U, --user=USERNAME | Database user. |
-W, --password | Force password prompt. |
-H, --host=HOST | Database server hostname or unix domain socket location. |
-P, --port=PORT | Database server port. |
Input Options
Option | Description |
---|---|
-r, --input-reader=FORMAT | Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format. |
-b, --bbox=BBOX | Apply a bounding box filter in format MINLON,MINLAT,MAXLON,MAXLAT on the imported data. Example: --bbox -0.5,51.25,0.5,51.75 |
Middle Options
Option | Description |
---|---|
-i, --tablespace-index=TABLESPC | Store all indexes in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indexes on faster storage like SSDs. This option affects the middle as well as the pgsql output indexes. |
--tablespace-slim-data=TABLESPC | Store the slim mode tables in the given tablespace. |
--tablespace-slim-index=TABLESPC | Store the indexes of the slim mode tables in the given tablespace. |
-p, --prefix=PREFIX | Prefix for table names (default: planet_osm ). This option affects the middle as well as the pgsql output table names. |
-s, --slim | Store temporary data in the database. Without this mode, all temporary data is stored in RAM and if you do not have enough the import will not work successfully. With slim mode, you should be able to import the data even on a system with limited RAM, although if you do not have enough RAM to cache at least all of the nodes, the time to import the data will likely be greatly increased. |
--drop | Drop the slim mode tables from the database and the flat nodes file once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indexes need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing. |
-C NUM, --cache=NUM | Only for slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Each cached node requires 8 bytes of cache, plus about 10% - 30% overhead. As a rule of thumb, give a bit more than the size of the import file in PBF format. If the RAM is not big enough, use about 75% of memory. Make sure to leave enough RAM for PostgreSQL. It needs at least the amount of shared_buffers given in its configuration. Defaults to 800. |
--cache-strategy=STRATEGY | There are a number of different modes in which osm2pgsql can organize its node cache in RAM. These are optimized for different assumptions of the data and the hardware resources available. Currently available strategies are dense, chunked, sparse and optimized. dense assumes that the node id numbers are densely packed, i.e. only a few IDs in the range are missing / deleted. For planet extracts this is usually not the case, making the cache very inefficient and wasteful of RAM. sparse assumes node IDs in the data are not densely packed, greatly increasing caching efficiency in these cases. If node IDs are densely packed, like in the full planet, this strategy has a higher overhead for indexing the cache. optimized uses both dense and sparse strategies for different ranges of the ID space. On a block by block basis it tries to determine if it is more effective to store the block of IDs in sparse or dense mode. This is the default and should be typically used. |
-x, --extra-attributes | Include attributes of each object in the middle tables and make them available to the outputs. Attributes are: user name, user id, changeset id, timestamp and version. |
--flat-nodes=FILENAME | The flat-nodes mode is a separate method to store slim mode node information on disk. Instead of storing this information in the main PostgreSQL database, this mode creates its own separate custom database to store the information. As this custom database has application level knowledge about the data to store and is not general purpose, it can store the data much more efficiently. Storing the node information for the full planet requires more than 300GB in PostgreSQL, the same data is stored in “only” 50GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for full planet imports as it doesn’t work well with small imports. The default is disabled. |
--middle-schema=SCHEMA | Version >= 1.4.0 Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle (default is no schema, i.e. the public schema is used). |
--middle-way-node-index-id-shift SHIFT | Version >= 1.4.0 Set ID shift for way node bucket index in middle. Experts only. See Bucket Index for slim mode for details. |
Output Options
Option | Description |
---|---|
-O, --output=OUTPUT | Select the output. Available outputs are: flex, pgsql (default), gazetteer, multi, and null |
-S, --style=STYLE | The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is /usr/share/osm2pgsql/default.style , for other outputs there is no default.) |
Pgsql Output Options
Option | Description |
---|---|
-i, --tablespace-index=TABLESPC | Store all indexes in the PostgreSQL tablespace TABLESPC . This option also affects the middle tables. |
--tablespace-main-data=TABLESPC | Store the data tables in the PostgreSQL tablespace TABLESPC . |
--tablespace-main-index=TABLESPC | Store the indexes in the PostgreSQL tablespace TABLESPC . |
--latlong | Store coordinates in degrees of latitude & longitude. |
-m, --merc | Store coordinates in Spherical Mercator (Web Mercator, EPSG:3857) (the default). |
-E, --proj=SRID | Use projection EPSG:SRID . |
-p, --prefix=PREFIX | Prefix for table names (default: planet_osm ). This option affects the middle as well as the pgsql output table names. |
--tag-transform-script=SCRIPT | Specify a Lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database. |
-x, --extra-attributes | Include attributes (user name, user id, changeset id, timestamp and version). This also requires additional entries in your style file. |
-k, --hstore | Add tags without column to an additional hstore (key/value) column in the database tables. |
-j, --hstore-all | Add all tags to an additional hstore (key/value) column in the database tables. |
-z, --hstore-column=PREFIX | Add an additional hstore (key/value) column named PREFIX containing all tags that have a key starting with PREFIX , eg \--hstore-column "name:" will produce an extra hstore column that contains all name:xx tags. |
--hstore-match-only | Only keep objects that have a value in at least one of the non-hstore columns. |
--hstore-add-index | Create indexes for all hstore columns after import. |
-G, --multi-geometry | Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM object can therefore use several rows in the output tables. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables. |
-K, --keep-coastlines | Keep coastline data rather than filtering it out. By default objects tagged natural=coastline will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data. |
--reproject-area | Compute area column using Spherical Mercator coordinates even if a different projection is used for the geometries. |
--output-pgsql-schema=SCHEMA | Version >= 1.4.0 Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql and multi outputs (default is no schema, i.e. the public schema is used). |
Expire Options
Option | Description |
---|---|
-e, --expire-tiles=[MIN_ZOOM-]MAX-ZOOM | Create a tile expiry list |
-o, --expire-output=FILENAME | Output file name for expired tiles list |
--expire-bbox-size=SIZE | Max size for a polygon to expire the whole polygon, not just the boundary |
Advanced Options
Option | Description |
---|---|
-I, --disable-parallel-indexing | Disable parallel clustering and index building on all tables, build one index after the other. |
--number-processes=THREADS | Specifies the number of parallel threads used for certain operations. The default is to set this between 1 and 4 depending on the number of CPUs you have. Values up to 32 are possible but probably not useful. Note that each thread opens multiple connections to the database and you will probably reach the limit of allowed database connections. |
--with-forward-dependencies=BOOL | Version >= 1.4.0 Propagate changes from nodes to ways and node/way members to relations (Default: true ). |
Upgrading
Some osm2pgsql changes have slightly changed the database schema it expects. If
updating an old database, a migration may be needed. The migrations here assume
the default planet_osm
prefix.
It is frequently better to reimport as this will also recluster the tables and remove table or index bloat.
Upgrade to 0.93: Unprojected Slim Coordinates
The method of storing coordinates in the middle (slim) tables has changed. There is no migration and a reload is required.
Upgrade to 0.91: Default Projection
The default projection was moved from 900913 to 3857. This does not effect
users using -l
or -E
, but if using no projection options or -m
a
migration is needed:
ALTER TABLE planet_osm_roads ALTER COLUMN way TYPE geometry(LineString,3857) USING ST_SetSRID(way,3857);
ALTER TABLE planet_osm_point ALTER COLUMN way TYPE geometry(Point,3857) USING ST_SetSRID(way,3857);
ALTER TABLE planet_osm_line ALTER COLUMN way TYPE geometry(LineString,3857) USING ST_SetSRID(way,3857);
ALTER TABLE planet_osm_polygon ALTER COLUMN way TYPE geometry(Geometry,3857) USING ST_SetSRID(way,3857);
Upgrade to: 0.88.0: z_order
Changes
In version 0.88.0 the z_order
logic in the pgsql output was changed,
requiring an increase in z_order
values. To migrate to the new range of
values, run
UPDATE planet_osm_line SET z_order = z_order * 10;
UPDATE planet_osm_roads SET z_order = z_order * 10;
This will not apply the new logic, but will get the existing z_orders
in the
right group of 100 for the new logic.
If you are not using the z_orders
column, this change may be ignored.
Upgrade to: 0.87.0 Pending Removal
Version 0.87.0 moved the in-database tracking of pending ways and relations to in-memory, for an increase in speed. This requires removal of the pending column and a partial index associated with it.
ALTER TABLE planet_osm_ways DROP COLUMN pending;
ALTER TABLE planet_osm_rels DROP COLUMN pending;
32 Bit to 64 Bit ID Migration
Old databases may have been imported with 32 bit node IDs, while current OSM
data requires 64 bit IDs. A database this old should not be migrated, but
reloaded. To migrate, the type of ID columns needs to be changed to bigint
.