Osm2pgsql v2 Manual

Introduction

This is the manual for version 2.x.y of osm2pgsql. If you are running an older version of osm2pgsql, refer to the manual v1.

Osm2pgsql is used to import OSM data into a PostgreSQL/PostGIS database for rendering into maps and many other uses. Usually it is only part of a toolchain, for instance other software is needed for the actual rendering (i.e. turning the data into a map) or delivery of the maps to the user etc.

Osm2pgsql is a fairly complex piece of software and it interacts with the database system and other pieces of the toolchain in complex ways. It will take a while until you get some experience with it. It is strongly recommended that you try out osm2pgsql with small extracts of the OSM data, for instance the data for your city. Do not start with importing data for the whole planet, it is easy to get something wrong and it will break after processing the data for hours or days. You can save yourself a lot of trouble doing some trial runs starting with small extracts and working your way up, observing memory and disk usage along the way.

It helps to be familiar with the PostgreSQL database system and the PostGIS extension to it as well as the SQL database query language. Some knowledge of the Lua language is also useful.

This manual always documents the current version of osm2pgsql. If same information is only valid in certain versions, the section will have a note like this: Version >= 2.0.0 This manual is only for version 2 and above, refer to the manual v1 for older versions.

Sections labelled Experimental describe new features which might change at any point without notice. We encourage you to experiment with them and report how they work for you, but don’t rely on them for production use.

It is recommended that you always use the newest released version of osm2pgsql. Earlier versions sometimes contain bugs that have long since been fixed.

System Requirements

✎

Operating System

Osm2pgsql works on Linux, Windows, macOS, and other systems. Only 64bit systems are supported.

Osm2pgsql is developed on Linux and most of the developers don’t have experience with running it on anything but Linux, so it probably functions best on that platform. This documentation is also somewhat geared towards Linux users. That being said, we strive to make the software work well on all systems. Please report any problems you might have.

Read the installation instructions for details on how to install osm2pgsql on your system.

Main Memory

Memory requirements for your system will vary widely depending on the size of your input file. Is it city-sized extract or the whole planet? As a rule of thumb you need at least as much main memory as the PBF file with the OSM data is large. So for a planet file you currently need at least 64 GB RAM, better are 128 GB. Osm2pgsql will not work with less than 2 GB RAM.

More memory can be used as cache by the system, osm2pgsql, or PostgreSQL and speed up processing a lot.

Disk

You are strongly encouraged to use a SSD (NVMe if possible) for your database. This will be much faster than traditional spinning hard disks.

Requirements for your system will vary widely depending on

the amount of data you want to store (city-sized extract or the whole planet?)
whether or not you want to update the data regularly

See the Sizing appendix for some ideas on how large a disk you might need.

Database Software

You need the PostgreSQL database system with the PostGIS extension installed.

Osm2pgsql aims to support all PostgreSQL and PostGIS versions that are currently supported by their respective maintainers. Currently PostgreSQL versions 9.6 and above and PostGIS versions 2.5 and above are supported. (Earlier versions might work but are not tested any more.) PostgreSQL version 11 or above is recommended.

In some cases older versions of osm2pgsql have problems with newer database software versions. You should always run the newest released osm2pgsql version which should ensure you get any fixes we have done to make osm2pgsql work well with any PostgreSQL version.

Osm2pgsql does not work with database systems other than PostgreSQL. The PostgreSQL/PostGIS combination is unique in its capabilities and we are using some of their special features. We are not planning to support any other database systems. There are some database systems out there which claim compatibility with PostgreSQL, they might work or they might not work. Please tell us if you have any experience with them.

Osm2pgsql uses prepared statements. This sometimes causes issues with connection poolers that either don’t work correctly with prepared statements or are not configured correctly. Please make sure you understand what issues your connection pooler might have with prepared statements and configure it correctly if you are using a connection pooler. (It is also recommended that you use version 2.1.0 or newer if using a connection pooler, because it contains some key improvements.)

Lua Scripting Language

Osm2pgsql requires the use of the Lua scripting language. If available osm2pgsql can also be compiled using the Lua JIT (just in time) compiler. Use of Lua JIT is recommended, especially for larger systems, because it will speed up processing.

Security Note: Lua scripts can do basically anything on your computer that the user running osm2pgsql is allowed to do, such as reading and writing files, opening network connections etc. This makes the Lua config files really versatile and allows lots of great functionality. But you should never use a Lua configuration file from an unknown source without checking it.

Preparing the Database

✎

Before you can import any OSM data into a database, you need a database.

Installing PostgreSQL/PostGIS

You need the PostgreSQL database software and the PostGIS plugin for the database. Please read the respective documentation on how to install it. On Linux this can almost always be done with the package manager of your distribution.

Creating a Database

To create a database that can be used by osm2pgsql follow these steps:

Create a database user that osm2pgsql will use. This user doesn’t need any special rights. We’ll use osmuser here.
Create a database that osm2pgsql will use belonging to the user you just created. We’ll use osm as a name here.
Enable the postgis extension in the newly created database.
Optionally enable the hstore extension in the database, it is used in some popular configurations, but not needed by osm2pgsql itself.

On a typical Linux system you’ll have a system user called postgres which has admin privileges for the database system. We’ll use that user for these steps.

Here are typical commands used:

sudo -u postgres createuser osmuser
sudo -u postgres createdb --encoding=UTF8 --owner=osmuser osm
sudo -u postgres psql osm --command='CREATE EXTENSION postgis;'
sudo -u postgres psql osm --command='CREATE EXTENSION hstore;'

Security Considerations

Osm2pgsql does not need any special database rights, it doesn’t need superuser status and doesn’t need to create databases or roles. You should create a database user for the specific use by osm2pgsql which should not have any special PostgreSQL privileges.

This manual assumes that you have ‘peer’ authentication configured when using a local database. That means a local user may connect to the database with its own username without needing to type a password. This authentication method is the default on Debian/Ubuntu-based distributions. Never configure ‘trust’ authentication where all local users can connect as any user to the database.

Any osm2pgsql setup will need a database to work on. You should create this as PostgreSQL superuser and change ownership (with --owner option on the createdb command or an OWNER= clause on the CREATE DATABASE SQL command) to the user osm2pgsql is using. This way the database user that osm2pgsql uses doesn’t need database creation privileges.

Typical osm2pgsql setups need the postgis and hstore extensions to be enabled in the database. To install these you need superuser privileges in the database. Enable them (with CREATE EXTENSION) as PostgreSQL superuser on the database that osm2pgsql is using before you run osm2pgsql.

Of course osm2pgsql needs to be able to create tables and write to the database. Usually it can do this as owner of the database created for it. Using the data, on the other hand, doesn’t need any of those rights. So the map rendering software you are using, for instance, usually only needs to read the data. It is recommended that you run these as a different database user, distinct from the database user osm2pgsql is using and only give that user SELECT rights (with the GRANT command).

If you are using a security scheme based on database schemas in your database you can use the --schema, --middle-schema and --output-pgsql-schema options and the schema table option in the flex output, respectively, to tell osm2pgsql to load data into specific schemas. You have to create those schemas and give them the correct rights before running osm2pgsql.

Before PostgreSQL 15 all database users could add objects (such as tables and indexes) to the public schema by default. Since PostgreSQL 15, by default, only the owner of the database is allowed to do this. If osm2pgsql is using a database user that’s not the owner of the database, you have to grant this user CREATE rights on the public schema of the database, or you have to configure osm2pgsql to use a different schema as described above. See the PostgreSQL manual for the details.

Encoding

OpenStreetMap data is from all around the world, it always uses UTF-8 encoding. Osm2pgsql will write the data as is into the database, so it has to be in UTF-8 encoding, too.

On any modern system the default encoding for new databases should be UTF-8, but to make sure, you can use the -E UTF8 or --encoding=UTF8 options when creating the database for osm2pgsql with createdb.

Tuning the PostgreSQL Server

Usual installs of the PostgreSQL server come with a default configuration that is not well tuned for large databases. You should change these settings in postgresql.conf and restart PostgreSQL before running osm2pgsql, otherwise your system will be much slower than necessary.

The following settings are geared towards a system with 128GB RAM and a fast SSD. The values in the second column are suggestions to provide a good starting point for a typical setup, you might have to adjust them for your use case. The value in the third column is the default set by PostgreSQL 17.

Config Option	Proposed Value	Pg 17 Default	Remark
shared_buffers	1GB	128MB	Lower than typical PostgreSQL recommendations to give osm2pgsql priority to RAM.
work_mem	50MB	4MB
maintenance_work_mem	10GB	64MB	Improves `CREATE INDEX`
autovacuum_work_mem	2GB	-1	-1 uses `maintenance_work_mem`
wal_level	minimal	replica	Reduces WAL activity if replication is not required during data load. Must also set `max_wal_senders=0`.
checkpoint_timeout	60min	5min	Increasing this value reduces time-based checkpoints and increases time to restore from PITR
max_wal_size	10GB	1GB	PostgreSQL > 9.4 only. For PostgreSQL <= 9.4 set `checkpoint_segments = 100` or higher.
checkpoint_completion_target	0.9	0.9	Spreads out checkpoint I/O of more of the `checkpoint_timeout` time, reducing spikes of disk activity
max_wal_senders	0	10	See `wal_level`
random_page_cost	1.0	4.0	Assuming fast SSDs

Here are the proposed settings for copy-and-paste into a config file:

shared_buffers = 1GB
work_mem = 50MB
maintenance_work_mem = 10GB
autovacuum_work_mem = 2GB
wal_level = minimal
checkpoint_timeout = 60min
max_wal_size = 10GB
checkpoint_completion_target = 0.9
max_wal_senders = 0
random_page_cost = 1.0

Increasing values for max_wal_size and checkpoint_timeout means that PostgreSQL needs to run checkpoints less often but it can require additional space on your disk and increases time required for Point in Time Recovery (PITR) restores. Monitor the PostgreSQL logs for warnings indicating checkpoints are occurring too frequently: HINT: Consider increasing the configuration parameter "max_wal_size".

Autovacuum must not be switched off because it ensures that the tables are frequently analysed. If your machine has very little memory, you might consider setting autovacuum_max_workers = 1 and reduce autovacuum_work_mem even further. This will reduce the amount of memory that autovacuum takes away from the import process.

For additional details see the Server Configuration chapter and Populating a Database in the Performance Tips chapter in the PostgreSQL documentation.

Expert Tuning

The suggestions in this section are potentially dangerous and are not suitable for all environments. These settings can cause crashes and/or corruption. Corruption in a PostgreSQL instance can lead to a “bricked” instance affecting all databases in the instance.

Config Option	Proposed Value	Pg 17 Default	Remark
full_page_writes	off	on	Warning: Risks data corruption. Set back to `on` after import.
fsync	off	on	Warning: Risks data corruption. Set back to `on` after import.

Additional information is on the PostgreSQL wiki: Tuning Your PostgreSQL Server. The section titled synchronous_commit contains important information to the synchronous_commit and fsync settings.

Using a Template Database

Databases are actually created in PostgreSQL by copying a template database. If you don’t specify a database to copy from, the template1 database is used. PostgreSQL allows you to change template1 or to create additional template databases. If you create a lot of databases for use with osm2pgsql, you can do initializations like CREATE EXTENSTION postgis; once in a template database and they will be available in any database you create from them.

See the PostgreSQL manual for details.

Database Maintainance

PostgreSQL tables and indexes that change a lot tend to get larger and larger over time. This can happen even with autovacuum running. This does not affect you if you just import OSM data, but when you update it regularly, you have to keep this in mind. You might want to occasionally re-import the database from scratch.

This is not something specific to osm2pgsql, but a general PostgreSQL issue. If you are running a production database, you should inform yourself about possible issues and what to do about them in the PostgreSQL literature.

Geometry Processing

✎

An important part of what osm2pgsql does is creating geometries from OSM data. In OSM only nodes have a location, ways get their geometry from member nodes and relations get their geometry from member nodes and ways. Osm2pgsql assembles all the data from the related objects into valid geometries.

Geometry Types

The geometry types supported by PostGIS are from the Simple Features defined by the OpenGIS Consortium (OGC). Osm2pgsql creates geometries of the following types from OSM data:

Geometry type	Created from OSM data
Point	Created from nodes.
LineString	Created from ways.
Polygon	Created from closed ways or some relations.
MultiPoint	Created from nodes or some relations.
MultiLineString	Created from (split up) ways or some relations.
MultiPolygon	Created from closed ways or some relations.
GeometryCollection	Created from relations.

Single vs. Multi Geometries

Generally osm2pgsql will create the simplest geometry it can. Nodes will turn into Points, ways into LineStrings or Polygons. A multipolygon relation can be turned into a Polygon or MultiPolygon, depending on whether it has one or more outer rings. Similarly, a route relation can be turned into a LineString if the route is connected from start to finish or a MultiLineString if it is unconnected or there are places, where the route splits up.

In some cases osm2pgsql will split up Multi* geometries into simple geometries and add each one in its own database row. This can make rendering faster, because the renderer can deal with several smaller geometries instead of having to handle one large geometry. But, depending on what you are doing with the data, it can also lead to problems. This blogpost has some deeper discussion of this issue. See the flex and pgsql output chapters for details on how to configure this. It will also mean that your id columns are not unique, because there are now multiple rows created from the same OSM object. See the Primary Keys and Unique IDs section for an option how to work around this.

When using the flex output, you can decide yourself what geometries to create using the as_point(), as_linestring(), as_polygon(), as_multipoint(), as_multilinestring(), as_multipolygon(), and as_geometrycollection() functions. See the Flex Output chapter for details.

Geometry Validity

Point geometries are always valid (as long as the coordinates are inside the correct range). LineString geometries are valid if they have at least two distinct points. Osm2pgsql will collapse consecutive indentical points in a linestring into a single point. Note that a line crossing itself is still valid and not a problem.

Validity is more complex for Polygon and MultiPolygon geometries: There are multiple ways how such a geometry can be invalid. For instance, if the boundary of a polygon is drawn in a figure eight, the result will not be a valid polygon. The database will happily store such invalid polygons, but this can lead to problems later, when you try to draw them or do calculations based on the invalid geometry (such as calculating the area).

You can use the Areas view of the OpenStreetMap inspector to help diagnose problems with multipolygons.

Osm2pgsql makes sure that there are no invalid geometries in the database, either by not importing them in the first place or by using an ST_IsValid() check after import. You’ll either get NULL in your geometry columns instead or the row is not inserted at all (depending on config).

The transform() function in Lua config files projects a geometry into a different SRS. In special cases this can make the geometry invalid, for instance when two distinct points are projected onto the same point.

Processing of Nodes

Node geometries are always converted into Point geometries.

Processing of Ways

Depending on the tags, OSM ways model either a LineString or a Polygon, or both! A way tagged with highway=primary is usually a linear feature, a way tagged landuse=farmland is usually a polygon feature. If a way with polygon type tags is not closed, the geometry is invalid, this is an error and the object is ignored. For some tags, like man_made=pier non-closed ways are linear features and closed ways are polygon features.

If a mapper wants to override how a way should be interpreted, they can use the area tag: The tag area=yes turns a normally linear feature into a polygon feature, for instance it turns a pedestrian street (highway=pedestrian) into a pedestrian area. The tag area=no turns a polygon feature into a linear feature.

There is no definite list which tags indicate a linear or polygon feature. Osm2pgsql lets the user decide. It depends on your chosen output (see next chapter) how to configure this. Some of the example config files have lists that should cover most of the commonly used tags, but you might have to extend the lists if you are using more unusual tags.

Osm2pgsql can split up long LineStrings created from ways into smaller segments. This can make rendering of tiles faster, because smaller geometries need to be retrieved from the database when rendering a specific tile. The pgsql output always splits up long LineStrings, in latlong projection, lines will not be longer than 1°, in Web Mercator lines will not be longer than 100,000 units (about 100,000 meters at the equator). In the flex output you have full control over line splitting by using (or not using) the segmentize() function. See also the Single vs. Multi Geometries section above.

When using the flex output, you can decide yourself what geometries to create from a way using the as_linestring(), or as_polygon() functions. See the Flex Output chapter for details.

Processing of Relations

Relations come in many variations and they can be used for all sorts of geometries. Usually it depends on the type tag of a relation what kind of geometry it should have:

Relation type	Typical geometry created
type=multipolygon	(Multi)Polygon
type=boundary	(Multi)LineString or (Multi)Polygon depending on whether you are interested in the boundary itself or the area it encloses.
type=route	(Multi)LineString

When using the flex output, you can decide yourself what geometries to create from a way using the as_multipoint(), as_multilinestring(), or as_multipolygon() functions. Also supported now is the as_geometrycollection() function which creates GeometryCollection from all member nodes and ways of a relation (relation members are ignored).

If you are using the old “C transform” of the pgsql output, the geometry types for relations of type multipolygon, boundary, and route are hardcoded. If you are using the “Lua transform” of the pgsql output you can configure them somewhat.

Note that osm2pgsql will ignore the roles (inner and outer) on multipolygon and boundary relations when assembling multipolygons, because the roles are sometimes wrong or missing.

Relations with more than 32767 members are ignored by osm2pgsql. This protects osm2pgsql and any post-processing against bad data. The OSM database limits the number of members to 32000, but historically more members were allowed, so keep this in mind if you are working with older OSM data.

Handling of Incomplete OSM Data

Sometimes you will feed incomplete OSM data to osm2pgsql. Most often this will happen when you use geographical extracts, either data you downloaded from somewhere or created yourself. Incomplete data, in this case, means that some member nodes of ways or members of relations are missing. Often this can not be avoided when creating an extract, you just have to put that cut somewhere.

When osm2pgsql encounters incomplete OSM data it will still try to do its best to use it. Ways missing some nodes will be shortened to the available part. Multipolygons might be missing some parts. In most cases this will work well (if your extract has been created with a sufficiently large buffer), but sometimes this will lead to wrong results. In the worst case, if a complete outer ring of a multipolygon is missing, the multipolygon will appear “inverted”, with outer and inner rings switching their roles.

Unfortunately there isn’t much that osm2pgsql (or anybody) can do to improve this, this is just the nature of OSM data.

In debug mode osm2pgsql will log the ids of missing nodes and the ways they are missing from.

Projections

Osm2pgsql can create geometries in many projections. If you are using the pgsql output, the projection can be chosen with command line options. When using the flex output, the projections are specified in the Lua style file. The default is always “Web Mercator”.

When using the flex output, osm2pgsql will usually magically transform any geometry you are writing into a database table into the projection you defined your tables with. But you can use the transform() function on the geometry to force a certain transformation. This is useful, for instance, if you want to calculate the area of a polygon in a specific projection. See the Flex Output chapter for details.

Equirectangular projection. Image source: Wikipedia (https://en.wikipedia.org/wiki/File:Equirectangular_projection_SW.jpg)

Latlong (WGS84)

EPSG:4326

The original latitude & longitude coordinates from OpenStreetMap in the WGS84 coordinate reference system. This is typically chosen if you want to do some kind of analytics on the data or reproject it later.

Mercator projection. Image source: Wikipedia (https://en.wikipedia.org/wiki/File:Mercator_projection_Square.JPG)

Web Mercator

EPSG:3857

This is the projection used most often for tiled web maps. It is the default in osm2pgsql.

Data beyond about 85° North and South will be cut off, because it can not be represented in this projection.

Mollweide Projection. Image source: Wikipedia (https://en.wikipedia.org/wiki/File:Mollweide_projection_SW.jpg)

Other Projections

If osm2pgsql was compiled with support for the PROJ library, it supports all projections supported by that library.

Call osm2pgsql --version to see whether your binary was compiled with PROJ and with which version.

Note that mapping styles often depend on the projection used. Most mapping style configurations will enable or disable certain rendering styles depending on the map scale or zoom level. But a meaningful scale will depend on the projection. Most styles you encounter are probably made for Web Mercator and will need to be changed if you want to use them for other projections.

Running osm2pgsql

✎

Basic command line operation

Osm2pgsql can work in one of two ways: Import only or Import and Update.

If you are new to osm2pgsql we recommend you try the import only approach first and use a small extract of the OSM data.

Import Only

OSM data is imported once into the database and will not change afterwards. If you want to update the data, you have to delete it and do a full re-import. This approach is used when updates aren’t needed or only happen rarely. It is also sometimes the only option, for instance when you are using extracts for which change files are not available. This is also a possible approach if disk space is tight, because you don’t need to store data needed only for updates.

In this mode osm2pgsql is used with the -c, --create command line option. This is also the default mode. The following command lines are equivalent:

osm2pgsql -c OSMFILE
osm2pgsql --create OSMFILE
osm2pgsql OSMFILE

In case the system doesn’t have much main memory, you can add the -s, --slim and --drop options. In this case less main memory is used, instead more data is stored in the database. This makes the import much slower, though. See the chapter Middle for details on these options.

Import and Update

In this approach OSM data is imported once and afterwards updated more or less regularly from OSM change files. OSM offers minutely, hourly, and daily change files. This mode is used when regular updates are desired and change files are available.

In this mode osm2pgsql is used the first time with the -c, --create command line option (this is also the default mode) and, additionally the -s, --slim options. The following command lines are equivalent:

osm2pgsql -c -s OSMFILE
osm2pgsql --create --slim OSMFILE
osm2pgsql --slim OSMFILE

For the update runs the -a, --append and -s, --slim options must be used. The following command lines are equivalent:

osm2pgsql -a -s OSMFILE
osm2pgsql --append --slim OSMFILE

The OSMFILE in this case will usually be an OSM change file (with suffix .osc or .osc.gz).

You can not use replication diffs downloaded from planet.osm.org directly with osm2pgsql, see Updating an Existing Database for details.

This approach needs more disk space for your database than the “Import Only” approach, because all the information necessary for the updates must be stored somewhere.

Usually you will need to use at least the -C, --cache or -F, --flat-nodes command line options when doing imports and updates. See the chapter Middle for details.

Osm2pgsql needs the tables, indexes, etc. that it creates to be where it expects them to be. Do not rename of otherwise change anything created by osm2pgsql!

Getting Help or Version

To get help or the program version call osm2pgsql with one of the following options:

Option	Description
-h, --help	Print help. Add `-v, --verbose` for more verbose help.
-V, --version	Print osm2pgsql version. Also prints versions of some libraries used.

Logging

Osm2pgsql will write information about what it is doing to the console (to STDERR). If you’d rather want this information in a file, use the output redirection in your shell (osm2pgsql ... 2>osm2pgsql.log).

Several command line options allow you to change what will be logged and how:

Option	Description
--log-level=LEVEL	Set log level (`debug`, `info` (default), `warn`, or `error`).
--log-progress=VALUE	Enable (`true`) or disable (`false`) progress logging. Setting this to `auto` will enable progress logging on the console and disable it if the output is redirected to a file. Default: true.
--log-sql	Enable logging of SQL commands for debugging.
--log-sql-data	Enable logging of all data added to the database. This will write out a huge amount of data! For debugging.
-v, --verbose	Same as `--log-level=debug`.

Database Connection

In create and append mode you have to tell osm2pgsql which database to access. If left unset, it will attempt to connect to the default database (usually the username) using a unix domain socket. Most usage only requires setting -d, --database.

Option	Description
-d, --database=DB	Database name or PostgreSQL conninfo string.
-U, --user=USERNAME	Version < 2.2.0 Database user.
-U, --username=USERNAME, --user=USERNAME	Version >= 2.2.0 Database user.
-W, --password	Force password prompt. Do not put the password on the command line!
-H, --host=HOST	Database server hostname or unix domain socket location.
-P, --port=PORT	Database server port.
--schema=SCHEMA	Default for various schema settings throughout osm2pgsql (default: `public`). The schema must exist in the database and be writable by the database user.

Note that the -W or --password option does not take a password as argument! It just makes sure that osm2pgsql will ask for a password interactively.

You can also use libpq environment variables to set connection parameters. For a full list of available parameters, please consult the PostgreSQL documentation.

When you need a password for your database connection and want to run osm2pgsql inside scripts, then use a pgpass file with appropriate permissions to store the password. All other methods of giving the password to osm2pgsql are inherently insecure. You can either put a .pgpass file in the user’s home directory or supply its location through the PGPASSFILE environment variable.

Instead of specifying a database name with the -d, --database option you can also specify a connection string in the form of a keyword/value connection string (something like host=localhost port=5432 dbname=mydb) or a URI (postgresql://[user[:password]@][netloc][:port][,...][/dbname][?param1=value1&...]) See the PostgreSQL documentation for details.

Processing the OSM Data

Diagram showing overview of osm2pgsql data processing from OSM data to Input, then Middle, then Output and the Database

Osm2pgsql processing can be separated into multiple steps:

The Input reads the OSM data from the OSM file.
The Middle stores all objects and keeps track of relationships between objects.
The Output transforms the data and loads it into the database.

This is, of course, a somewhat simplified view, but it is enough to understand the operation of the program. The following sections will describe each of the steps in more detail.

The Input

Depending on the operational mode your input files are either

OSM data files (in create mode), or
OSM change files (in append mode)

Usually osm2pgsql will autodetect the file format, but see the -r, --input-reader option below. Osm2pgsql can not work with OSM history files.

OSM data files are almost always sorted, first nodes in order of their ids, then ways in order of their ids, then relations in order of their ids. The planet files, change files, and usual extracts all follow this convention.

Osm2pgsql can only read OSM files ordered in this way. This allows some optimizations in the code which speed up the normal processing.

See the Appendix A for more information on how to get and prepare OSM data for use with osm2pgsql.

Option	Description
-r, --input-reader=FORMAT	Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format.
-b, --bbox=BBOX	Apply a bounding box filter in format `MINLON,MINLAT,MAXLON,MAXLAT` on the imported data. Example: --bbox -0.5,51.25,0.5,51.75

Use of the -b, --bbox option is not recommended; it is a crude way of limiting the amount of data loaded into the database and it will not always do what you expect it to do, especially at the boundaries of the bounding box. If you use it, choose a bounding box that is larger than your actual area of interest. A better option is to create an extract of the data before using osm2pgsql. See Appendix A for options.

Working with Multiple Input Files

Usually you are using osm2pgsql with a single input file.

Osm2pgsql can read multiple input files at once, merging the data from the input files ignoring any duplicate data. For this to work the input files must all have their data from the same point in time. You can use this to import two or more geographical extracts into the same database. If the extracts are from different points in time and contain different versions of the same object, this will fail!

Do not use multiple change files as input in append mode, merge and simplify them first.

The Middle

The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which ways are used by which nodes, or which members a relation has. It also keeps track of all node locations. This information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changes objects themselves and not all the related objects needed for creating an objects geometry.

More details are in its own chapter.

The Outputs

Osm2pgsql imports OSM data into a PostgreSQL database. How it does this is governed by the output (sometimes called a backend). Several outputs for different use cases are available:

The flex Output: This is the most modern and most flexible output option. If you are starting a new project, use this output. Future improvements to osm2pgsql will only be available in this output.; Unlike all the other output options there is almost no limit to how the OSM data can be imported into the database. You can decide which OSM objects should be written to which columns in which database tables. And you can define any transformations to the data you need, for instance to unify a complex tagging schema into a simpler schema if that’s enough for your use case.; This output is described in detail in its own chapter.
The pgsql Output: The pgsql output is the original output. It is now deprecated but still widely used. Many tutorials you find on the Internet only describe this output. It is quite limited in how the data can be written to the database, but many setups still use it.; This output comes in two “flavours”: With the original “C transformation” and with the somewhat newer “Lua transformation” which allows some changes to the data before it is imported.; This output is described in detail in its own chapter.
The null Output: The null output doesn’t write the data anywhere. It is used for testing and benchmarking, not for normal operation. If --slim is used with the null output, the middle tables are still generated.

Here are the command line options pertaining to the outputs (see later chapters for command line options only used for specific outputs):

Option Description

-O, --output=OUTPUT Select the output. Available outputs are: flex, pgsql (default), and null. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.

-S, --style=STYLE The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is /usr/share/osm2pgsql/default.style, for other outputs there is no default.) You don’t need to add this option in append mode, because osm2pgsql will remember it from the import. If you set it in append mode anyway, it will use the new setting for this and future updates.

Option	Description
-O, --output=OUTPUT	Select the output. Available outputs are: flex, pgsql (default), and null. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.
-S, --style=STYLE	The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is `/usr/share/osm2pgsql/default.style`, for other outputs there is no default.) You don’t need to add this option in append mode, because osm2pgsql will remember it from the import. If you set it in append mode anyway, it will use the new setting for this and future updates.

Environment Variables

Osm2pgsql itself doesn’t interpret any environment variables, but several of the libraries it uses do.

The PostgreSQL database access library (libpq) understands many environment variables.
The libosmium library has some internal settings you can change.
If osm2pgsql was compiled with the PROJ library, you can use various environment variables to control the behaviour of the library.

You can access environment variables from Lua config scripts with os.getenv("VAR").

The Flex Output

✎

The flex output, as the name suggests, allows for a flexible configuration that tells osm2pgsql what OSM data to store in your database and exactly where and how. It is configured through a Lua file which

defines the structure of the output tables and
defines functions to map the OSM data to the database data format

Use the -S, --style=FILE option to specify the name of the Lua file along with the -O flex or --output=flex option to specify the use of the flex output.

The flex output is much more capable than the legacy pgsql, which will be removed at some point. Use the flex output for any new projects and switch old projects over.

The flex style file is a Lua script. You can use all the power of the Lua language. It gives you a lot of freedom to write complex preprocessing scripts that might even use external libraries to extend the capabilities of osm2pgsql. But it also means that the scripts may mess with any part of your system when written badly. Only run Lua scripts from trusted sources!

This description assumes that you are somewhat familiar with the Lua language, but it is pretty easy to pick up the basics and you can use the example config files in the flex-config directory which contain lots of comments to get you started.

All configuration is done through the osm2pgsql global object in Lua. It has the following fields and functions:

Field / Function	Description
osm2pgsql.version	The version of osm2pgsql as a string.
osm2pgsql.config_dir	The directory where your Lua config file is. Useful when you want to include more files from Lua.
osm2pgsql.mode	Either `"create"` or `"append"` depending on the command line options (`-c, --create` or `-a, --append`).
osm2pgsql.properties	A Lua table which gives read-only access to the contents of the properties.
osm2pgsql.stage	Either `1` or `2` (1st/2nd stage processing of the data). See below.
osm2pgsql.define_expire_output(OPTIONS)	Define an expire output table or file. See below.
osm2pgsql.define_node_table(NAME, COLUMNS[, OPTIONS])	Define a node table.
osm2pgsql.define_way_table(NAME, COLUMNS[, OPTIONS])	Define a way table.
osm2pgsql.define_relation_table(NAME, COLUMNS[, OPTIONS])	Define a relation table.
osm2pgsql.define_area_table(NAME, COLUMNS[, OPTIONS])	Define an area table.
osm2pgsql.define_table(OPTIONS)	Define a table. This is the more flexible function behind all the other `define_*_table()` functions. It gives you more control than the more convenient other functions.

Osm2pgsql also provides some additional functions in the Lua helper library described in Appendix B.

Defining a Table

Usually you want to define one or more database tables where your data should end up. This is done with the osm2pgsql.define_table() function or, more commonly, with one of the slightly more convenient functions osm2pgsql.define_(node|way|relation|area)_table(). In create mode, osm2pgsql will create those tables for you in the database.

Basic Table Definition

The simple way to define a table looks like this:

osm2pgsql.define_(node|way|relation|area)_table(NAME, COLUMNS[, OPTIONS])

Here NAME is the name of the table, COLUMNS is a list of Lua tables describing the columns as documented below. OPTIONS is a Lua table with options for the table as a whole. An example might be more helpful:

local restaurants = osm2pgsql.define_node_table('restaurants', {
    { column = 'name', type = 'text' },
    { column = 'tags', type = 'jsonb' },
    { column = 'geom', type = 'point' }
})

In this case we are creating a table that is intended to be filled with OSM nodes, the table called restaurants will be created in the database with four columns: A name column of type text (which will presumably later be filled with the name of the restaurant), a column called tags with type jsonb (which will presumable be filled later with all tags of a node), a column called geom which will contain the Point geometry of the node and an Id column called node_id.

Each table is either a node table, way table, relation table, or area table. This means that the data for that table comes primarily from a node, way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object id will be stored in the table so that later updates to those OSM objects (or deletions) will be properly reflected in the tables. Area tables are special, they can contain data derived from ways and from (multipolygon) relations.

Advanced Table Definition

Sometimes the osm2pgsql.define_(node|way|relation|area)_table() functions are a bit too restrictive, for instance if you want more control over the type and naming of the Id column(s). In this case you can use the function osm2pgsql.define_table().

Here are the available OPTIONS for the osm2pgsql.define_table(OPTIONS) function. You can use the same options on the osm2pgsql.define_(node|way|relation|area)_table() functions, except the name and columns options.

Table Option	Description
name	The name of the table (without schema).
ids	A Lua table defining how this table should handle ids (see the Id Handling section for details). Note that you can define tables without Ids, but they can not be updated by osm2pgsql.
columns	An array of columns (see the Defining Columns section for details).
schema	Set the PostgreSQL schema to be used for this table. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set.
data_tablespace	The PostgreSQL tablespace used for the data in this table.
index_tablespace	The PostgreSQL tablespace used for all indexes of this table.
cluster	Set clustering strategy. Use `"auto"` (default) to enable clustering by geometry, osm2pgsql will choose the best method. Use `"no"` to disable clustering.
indexes	Define indexes to be created on this table. If not set, the default is to create a GIST index on the first (or only) geometry column.

All the osm2pgsql.define*table() functions return a database table Lua object. You can call the following functions on it:

Function	Description
:name()	The name of the table as specified in the define function.
:schema()	The schema of the table as specified in the define function.
:columns()	The columns of the table as specified in the define function.
:insert(PARAMS)	Add a row to the database table. See below for details.

Id Handling

Id columns allows osm2pgsql to track which database table entries have been generated by which objects. When objects change or are deleted, osm2pgsql uses the ids to update or remove all existing rows from the database with those ids.

If you want more control over the id column(s), use the osm2pgsql.define_table() function. You can then use the ids option to define how the id column(s) should look like. To generate the same “restaurants” table described above, the osm2pgsql.define_table() call will look like this:

local restaurants = osm2pgsql.define_table({
    name = 'restaurants',
    ids = { type = 'node', id_column = 'node_id' },
    columns = {
        { column = 'name', type = 'text' },
        { column = 'tags', type = 'jsonb' },
        { column = 'geom', type = 'point' }
}})

If you don’t have an ids field, the table is generated without Id. Tables without Id can not be updated, so you can fill them in create mode, but they will not work in append mode.

The following values are allowed for the type of the ids field:

Type	Description
node	A node Id will be stored ‘as is’ in the column named by `id_column`.
way	A way Id will be stored ‘as is’ in the column named by `id_column`.
relation	A relation Id will be stored ‘as is’ in the column named by `id_column`.
area	A way Id will be stored ‘as is’ in the column named by `id_column`, for relations the Id will be stored as negative number.
any	Any type of object can be stored. See below.
tile	Special case for generalized data stored with tile x and y coordinates. See below.

The any type is for tables that can store any type of OSM object (nodes, ways, or relations). There are two ways the id can be stored:

If you have a type_column setting in your ids field, it will store the type of the object in an additional char(1) column as N, W, or R.
If you don’t have a type_column, the id of nodes stays the same (so they are positive), way ids will be stored as negative numbers and relation ids are stored as negative numbers and 1e17 is subtracted from them. This results in distinct ranges for all ids so they can be kept apart. (This is the same transformation that the Imposm program uses.)

Osm2pgsql will only create these Id indexes if an updatable database is created, i.e. if osm2pgsql is run with --slim (but not --drop). You can set the optional field create_index in the ids setting to 'always' to force osm2pgsql to always create this index, even in non-updatable databases (the default is 'auto', only create the index if needed for updating).

Generalized data (see Generalization chapter) is sometimes stored in tables indexed by x, y tile coordinates. For such tables use the tile value for the ids field. Two columns called x and y with SQL type int will be created (it is not possible to change those column names or the type). In “create” mode, if the database is updatable or when the create_index option is set to always, an index will automatically be generated on those columns after the generalization step is run with osm2pgsql-gen.

Unique Ids

It is often desirable to have a unique PRIMARY KEY on database tables. Many programs need this.

There seems to be a natural unique key, the OSM node, way, or relation ID the data came from. But there is a problem with that: It depends on the Lua config file whether a key is unique or not. Nothing prevents you from inserting multiple rows into the database with the same id. It often makes sense in fact, for instance when splitting a multipolygon into its constituent polygons.

If you need unique keys on your database tables there are two options: Using those natural keys and making sure that you don’t have duplicate entries. Or adding an additional ID column. The latter is easier to do and will work in all cases, but it adds some overhead.

Using Natural Keys for Unique Ids

To use OSM IDs as primary keys, you have to make sure that you only ever add a single row per OSM object to an output table, i.e. do not call insert multiple times on the same table for the same OSM object.

Set the create_index option in the ids setting (see above) to 'unique' to get a UNIQUE index instead of the normal non-unique index for the ID column. Version >= 2.1.0 Set it to 'primary_key' to define a primary key for this table instead of just a unique index.

Using an Additional ID Column

PostgreSQL has the somewhat magic “serial” data type. If you use that datatype in a column definition, PostgreSQL will add an integer column to the table and automatically fill it with an autoincrementing value.

In the flex config you can add such a column to your tables with something like this:

...
{ column = 'id', sql_type = 'serial', create_only = true },
...

The create_only tells osm2pgsql that it should create this column but not try to fill it when adding rows (because PostgreSQL does it for us).

You probably also want an index on this column. See the chapter on Defining Indexes on how to create this index from osm2pgsql.

Since PostgreSQL 10 you can use the GENERATED ... AS IDENTITY clause instead of the SERIAL type which does something very similar, although using anything but a proper PostgreSQL type here is not officially supported.

Defining Columns

In the table definitions the columns are specified as a list of Lua tables with the following keys:

Key	Description
column	The name of the PostgreSQL column (required).
type	The type of the column (Optional, default `'text'`).
sql_type	The SQL type of the column (Optional, default depends on `type`, see next table).
not_null	Set to `true` to make this a `NOT NULL` column. (Optional, default `false`.)
create_only	Set to `true` to add the column to the `CREATE TABLE` command, but do not try to fill this column when adding data. This can be useful for `SERIAL` columns or when you want to fill in the column later yourself. (Optional, default `false`.)
projection	On geometry columns only. Set to the EPSG id or name of the projection. (Optional, default web mercator, `3857`.)
expire	On geometry columns only. Set expire output. See Defining and Using Expire Outputs (Optional.)

The type field describes the type of the column from the point of view of osm2pgsql, the sql_type describes the type of the column from the point of view of the PostgreSQL database. Usually they are either the same or the SQL type is derived directly from the type according to the following table. But it is possible to set them individually for special cases.

`type`	Default for `sql_type`	Notes
text	`text`	(default)
bool, boolean	`boolean`
int2, smallint	`int2`
int4, int, integer	`int4`
int8, bigint	`int8`
real	`real`
hstore	`hstore`	PostgreSQL extension `hstore` must be loaded
json	`json`
jsonb	`jsonb`
direction	`int2`
geometry	`geometry(GEOMETRY,SRID)`	(*)
point	`geometry(POINT,SRID)`	(*)
linestring	`geometry(LINESTRING,SRID)`	(*)
polygon	`geometry(POLYGON,SRID)`	(*)
multipoint	`geometry(MULTIPOINT,SRID)`	(*)
multilinestring	`geometry(MULTILINESTRING,SRID)`	(*)
multipolygon	`geometry(MULTIPOLYGON,SRID)`	(*)
geometrycollection	`geometry(GEOMETRYCOLLECTION,SRID)`	(*)

The SRID for the geometry SQL types comes from the projection parameter.

For special cases you usually want to keep the type field unset and only set the sql_type field. So if you want to have an array of integer column, for instance, set only the sql_type to int[]. The type field will default to text which means you have to create the text representation of your array in the form that PostgreSQL expects it.

For details on how the data is converted depending on the type, see the section on Type Conversions.

The content of the sql_type field is not checked by osm2pgsql, it is passed on to PostgreSQL as is. Do not set this to anything else but a valid PostgreSQL type. Correct use of sql_type is not easy. Only use sql_type if you have some familiarity with PostgreSQL types and how they are converted from/to text. If you get this wrong you’ll get error messages about “Ending COPY mode” that are hard to interpret. Considering using the json or jsonb type instead, for which osm2pgsql does the correct conversions.

Defining Geometry Columns

Most tables will have a geometry column. The types of the geometry column possible depend on the type of the input data. For node tables you are pretty much restricted to point geometries, but there is a variety of options for relation tables for instance.

You can have zero, one or more geometry columns. An index will only be built automatically for the first (or only) geometry column you define. The table will be clustered by the first (or only) geometry column (unless disabled).

The supported geometry types are:

Geometry type	Description
point	Point geometry, usually created from nodes.
linestring	Linestring geometry, usually created from ways.
polygon	Polygon geometry for area tables, created from ways or relations.
multipoint	Currently not used.
multilinestring	Created from (possibly split up) ways or relations.
multipolygon	For area tables, created from ways or relations.
geometrycollection	Geometry collection, created from relations.
geometry	Any kind of geometry. Also used for area tables that should hold both polygon and multipolygon geometries.

By default geometry columns will be created in web mercator (EPSG 3857). To change this, set the projection parameter of the column to the EPSG code you want (or one of the strings latlon(g), WGS84, or merc(ator), case is ignored).

Geometry columns can have expire configurations attached to them. See the section on Defining and Using Expire Outputs for details.

Defining Indexes

Osm2pgsql will always create indexes on the id column(s) of all tables if the database is updateable (i.e. in slim mode), because it needs those indexes to update the database. You can not control those indexes with the settings described in this section.

To define indexes, set the indexes field of the table definition to an array of Lua tables. If the array is empty, no indexes are created for this table (except possibly an index on the id column(s)). If there is no indexes field (or if it is set to nil) a GIST index will be created on the first (or only) geometry column of this table.

The following fields can be set in an index definition. You have to set at least the method and either column or expression.

Key	Description
column	The name of the column the index should be created on. Can also be an array of names. Required, unless `expression` is set.
name	Optional name of this index. (Default: Let PostgreSQL choose the name.)
expression	A valid SQL expression used for indexes on expressions. Can not be used together with `column`.
include	A column name or list of column names to include in the index as non-key columns. (Only available from PostgreSQL 11.)
method	The index method (‘btree’, ‘gist’, …). See the PostgreSQL docs for available types (required).
tablespace	The tablespace where the index should be created. Default is the tablespace set with `index_tablespace` in the table definition.
unique	Set this to `true` or `false` (default). Note that you have to make sure yourself never to add non-unique data to this column.
where	A condition for a partial index. This has to be set to a text that makes valid SQL if added after a `WHERE` in the `CREATE INDEX` command.

If you need an index that can not be expressed with these definitions, you have to create it yourself using the SQL CREATE INDEX command after osm2pgsql has finished its job.

Defining and Using Expire Outputs

When osm2pgsql is working in ‘append’ mode, i.e. when it is updating an existing database from OSM change files, it can figure out which changes will potentially affect which Web Mercator tiles, so that you can re-render those tiles later. See the Expire chapter for some general information about expiry.

The list of tile coordinates can be written to a file and/or to a database table. Use the osm2pgsql.define_expire_output() Lua function to define an expire output. The function has a single paramater, a Lua table with the following fields:

Field	Description
maxzoom	The maximum zoom level for which tile coordinates are written out. Default: 0.
minzoom	The minimum zoom level for which tile coordinates are written out. Optional. Default is the same as maxzoom.
filename	The filename of the output file. Optional.
schema	The database schema for the output table. The schema must exist in the database and be writable by the database user. Optional. By default the schema set with `--schema` is used, or `public` if that is not set.
table	The database table for the output. Optional.

You have to supply the filename and/or the table (possibly with schema). You can provide either or both. The database table will be created for you if it isn’t available already.

Defined expire outputs can then be used in table definitions. Geometry columns using Web Mercator projection (EPSG 3857) can have an expire field which specifies which expire outputs should be triggered by changes affecting this geometry.

Field	Description
output	The expire output defined with `osm2pgsql.define_expire_output()`.
mode	How polygons are converted to tiles. Can be `full-area` (default), `boundary-only`, or `hybrid`.
full_area_limit	In `hybrid` mode, set the maximum area size for which a full-area expiry is done. Above this `boundary-only` is used.
buffer	The size of the buffer around geometries to be expired as a fraction of the tile size.

For an example showing how the expire output works, see the flex-config/expire.lua example config file.

Processing Callbacks

You are expected to define one or more of the following functions:

Callback function	Description
osm2pgsql.process_node(object)	Called for each new or changed tagged node.
osm2pgsql.process_way(object)	Called for each new or changed tagged way.
osm2pgsql.process_relation(object)	Called for each new or changed tagged relation.
osm2pgsql.process_untagged_node(object)	Called for each new or changed untagged node.
osm2pgsql.process_untagged_way(object)	Called for each new or changed untagged way.
osm2pgsql.process_untagged_relation(object)	Called for each new or changed untagged relation.

They all have a single argument of type table (here called object) and no return value. If you are not interested in all object types, you do not have to supply all the functions.

Usually you are only interested in tagged objects, i.e. OSM objects that have at least one tag, so you will only define one or more of the first three functions. But if you are interested in untagged objects also, define the last three functions. If you want to have the same behaviour for untagged and tagged objects, you can define the functions to be the same.

These functions are called for each new or modified OSM object in the input file. No function is called for deleted objects, osm2pgsql will automatically delete all data in your database tables that were derived from deleted objects. Modifications are handled as deletions followed by creation of a “new” object, for which the functions are called.

You can do anything in those processing functions to decide what to do with this data. If you are not interested in that OSM object, simply return from the function. If you want to add the OSM object to some table call the insert() function on that table.

The parameter table (object) has the following fields and functions:

Field / Function	Description
.id	The id of the node, way, or relation.
.type	The object type as string (`node`, `way`, or `relation`).
.tags	A table with all the tags of the object.
.version	Version of the OSM object. (*)
.timestamp	Timestamp of the OSM object, time in seconds since the epoch (midnight 1970-01-01). (*)
.changeset	Changeset containing this version of the OSM object. (*)
.uid	User id of the user that created or last changed this OSM object. (*)
.user	User name of the user that created or last changed this OSM object. (*)
.is_closed	Ways only: A boolean telling you whether the way geometry is closed, i.e. the first and last node are the same.
.nodes	Ways only: An array with the way node ids.
.members	Relations only: An array with member tables. Each member table has the fields `type` (values `n`, `w`, or `r`), `ref` (member id) and `role`.
:grab_tag(KEY)	Return the tag value of the specified key and remove the tag from the list of tags. (Example: `local name = object:grab_tag('name')`) This is often used when you want to store some tags in special columns and the rest of the tags in an jsonb or hstore column.
:get_bbox()	Get the bounding box of the current node, way, or relation. This function returns four result values: the lon/lat values for the bottom left corner of the bounding box, followed by the lon/lat values of the top right corner. Both lon/lat values are identical in case of nodes. Example: `lon, lat, dummy, dummy = object:get_bbox()`. Relation members (nested relations) are not taken into account.
:as_point()	Create point geometry from OSM node object.
:as_linestring()	Create linestring geometry from OSM way object.
:as_polygon()	Create polygon geometry from OSM way object.
:as_multipoint()	Create (multi)point geometry from OSM node/relation object.
:as_multilinestring()	Create (multi)linestring geometry from OSM way/relation object.
:as_multipolygon()	Create (multi)polygon geometry from OSM way/relation object.
:as_geometrycollection()	Create geometry collection from OSM relation object.

These are only available if the OSM input file actually contains those fields. When handling updates they are only included if the middle table contain this data (i.e. when -x|--extra-attributes option was used).

The as_* functions will return a NULL geometry if the geometry can not be created for some reason, for instance a polygon can only be created from closed ways. This can also happen if your input data is incomplete, for instance when nodes referenced from a way are missing. You can check the geometry object for is_null(), for example object:as_multipolygon():is_null().

The as_linestring() and as_polygon() functions can only be used on ways.

The as_multipoint() function can be used on nodes and relations. For nodes it will always return a point geometry, for relations a point or multipoint geometry with all available node members.

The as_multilinestring() and as_multipolygon() functions, on the other hand, can be used for ways and for relations. The latter will either return a linestring/polygon or a multilinestring/multipolygon, depending on whether the result is a single geometry or a multi-geometry.

If you need all geometries of a relation, you can use as_geometrycollection(). It will contain all geometries which can be generated from node and way members of the relation in order of those members. Members of type “relation” are ignored. Node members will result in a point geometry, way members will result in a linestring geometry. Geometries that can’t be built are missing. (You can detect this by counting the number of geometries with num_geometries() and comparing that to the number of members of type node and relation.) If no valid geometry can be created, so if the geometry collection would be empty, a null geometry is returned instead.

The `after` Callback Functions

There are three additional processing functions that are sometimes useful:

Callback function	Description
osm2pgsql.after_nodes()	Called after all nodes are processed.
osm2pgsql.after_ways()	Called after all ways are processed.
osm2pgsql.after_relations()	Called after all relations are processed.

OSM data files contain objects in the order nodes, then ways, then relations. This means you will first get all the process_(untagged_)node calls, then the after_nodes call, then all the process_(untagged_)way calls, then the after_ways call, then all the process_(untagged_)relation calls, and lastly the after_relationscall.

All after_* callbacks are called at the appropriate place even if there are no objects of the corresponding type in the input.

The `insert()` Function

Use the insert() function to add data to a previously defined table:

-- definition of the table:
local table_pois = osm2pgsql.define_node_table('pois', {
    { column = 'tags', type = 'jsonb' },
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'point' },
})
...
function osm2pgsql.process_node(object)
...
    table_pois:insert({
        tags = object.tags,
        name = object.tags.name,
        geom = object:as_point()
    })
...
end

The insert() function takes a single table parameter, that describes what to fill into all the database columns. Any column not mentioned will be set to NULL. It returns true if the insert was successful.

Note that you can’t set the object id, this will be handled for you behind the scenes.

Handling of NULL in `insert()` Function

Any column not set, or set to nil (which is the same thing in Lua), or set to the null geometry, will be set to NULL in the database. If the column is defined with not_null = true in the table definition, the row will not be inserted. Usually that is just what you want, bad data is silently ignored that way.

If you want to check whether the insert actually happened, you can look at the return values of the insert() command. The insert() function actually returns up to four values:

local inserted, message, column, object = table:insert(...)

Value	Description
inserted	`true` if the row was inserted, `false` otherwise
message	A message telling you the reason why the insertion failed
column	The name of the column that triggered the failure
object	The OSM object we are currently processing. Useful for, say, logging the id

The last three are only set if the first is false.

Stages

When processing OSM data, osm2pgsql reads the input file(s) in order, nodes first, then ways, then relations. This means that when the nodes or ways are read and processed, osm2pgsql can’t know yet whether a node or way is in a relation (or in several). But for some use cases we need to know in which relations a node or way is and what the tags of these relations are or the roles of those member ways. The typical case are relations of type route (bus routes etc.) where we might want to render the name or ref from the route relation onto the way geometry.

The osm2pgsql flex output supports this use case by adding an additional “reprocessing” step. Osm2pgsql will call the Lua function osm2pgsql.select_relation_members() for each added, modified, or deleted relation. Your job is to figure out which node or way members in that relation might need the information from the relation to be rendered correctly and return those ids in a Lua table with the ‘nodes’ and/or ‘ways’ fields. This is usually done with a function like this:

function osm2pgsql.select_relation_members(relation)
    if relation.tags.type == 'route' then
        return {
            nodes = {},
            ways = osm2pgsql.way_member_ids(relation)
        }
    end
end

Instead of using the helper function osm2pgsql.way_member_ids() which returns the ids of all way members (or the analog osm2pgsql.node_member_ids() function), you can write your own code, for instance if you want to check the roles.

Note that select_relation_members() is called for deleted relations and for the old version of a modified relation as well as for new relations and the new version of a modified relation. This is needed, for instance, to correctly mark member ways of deleted relations, because they need to be updated, too. The decision whether a node/way is to be marked or not can only be based on the tags of the relation and/or the roles of the members. If you take other information into account, updates might not work correctly.

In addition you have to store whatever information you need about the relation in your process_relation() function in a global variable.

Make sure you use select_relation_members() only for deciding which nodes/ways to reprocess, do not store information about the relations from that function, it will not work with updates. Use the process_relation() function instead.

After all relations are processed, osm2pgsql will reprocess all marked nodes/ways by calling the process_node() and process_way() functions, respectively, for them again. This time around you have the information from the relation in the global variable and can use it.

If you don’t mark any nodes or ways, nothing will be done in this reprocessing stage.

(It is currently not possible to mark relations. This might or might not be added in future versions of osm2pgsql.)

You can look at osm2pgsql.stage to see in which stage you are.

You want to do all the processing you can in stage 1, because it is faster and there is less memory overhead. For most use cases, stage 1 is enough.

Processing in two stages can add quite a bit of overhead. Because this feature is new, there isn’t much operational experience with it. So be a bit careful when you are experimenting and watch memory and disk space consumption and any extra time you are using. Keep in mind that:

All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.
Keeping track of way ids marked in stage 1 needs some memory.
To do the extra processing in stage 2, time is needed to get objects out of the object store and reprocess them.
Osm2pgsql will create an id index on all way tables to look up ways that need to be deleted and re-created in stage 2.

Type Conversions

The insert() function will try its best to convert Lua values into corresponding PostgreSQL values. But not all conversions make sense. Here are the detailed rules:

Lua values of type function, userdata, or thread will always result in an error.
The Lua type nil is always converted to NULL.
If the result of a conversion is NULL and the column is defined as NOT NULL the data is not inserted, see above for details.
The Lua type table is converted to the PostgreSQL type hstore if and only if all keys and values in the table are string values.
For boolean columns: The number 0 is converted to false, all other numbers are true. Strings are converted as follows: "yes", "true", "1" are true; "no", "false", "0" are false, all others are NULL.
For integer columns (int2, int4, int8): Boolean true is converted to 1, false to 0. Numbers that are not integers or outside the range of the type result in NULL. Strings are converted to integers if possible otherwise the result is NULL.
For real columns: Booleans result in an error, all numbers are used as is, strings are converted to a number, if that is not possible the result is NULL.
For direction columns (stored as int2 in the database): Boolean true is converted to 1, false to 0. The number 0 results in 0, all positive numbers in 1, all negative numbers in -1. Strings "yes" and "1" will result in 1, "no" and "0" in 0, "-1" in -1. All other strings will result in NULL.
For json and jsonb columns string, number, and boolean values are converted to their JSON equivalent as you would expect. (The special floating point numbers NaN and Inf can not be represented in JSON and are converted to null). An empty table is converted to an (empty) JSON object, tables that only have consecutive integer keys starting from 1 are converted into JSON arrays. All other tables are converted into JSON objects. Mixed key types are not allowed. Osm2pgsql will detect loops in tables and return an error.
For text columns and any other not specially recognized column types, booleans result in an error and numbers are converted to strings.
Geometry objects are converted to their PostGIS counterparts. Null geometries are converted to database NULL. Geometries in WGS84 will automatically be transformed into the target column SRS if needed. Non-multi geometries will automatically be transformed into multi-geometries if the target column has a multi-geometry type.

If you want any other conversions, you have to do them yourself in your Lua code. Osm2pgsql provides some helper functions for other conversions, see the Lua helper library (Appendix B).

Geometry Objects in Lua

Lua geometry objects are created by calls such as object:as_point() or object:as_polygon() inside processing functions. It is not possible to create geometry objects from scratch, you always need an OSM object.

You can write geometry objects directly into geometry columns in the database using the table insert() function. But you can also transform geometries in multiple ways.

Geometry objects have the following functions. They are modelled after the PostGIS functions with equivalent names.

Function	Description
:area()	Returns the area of the geometry calculated on the projected coordinates. The area is calculated using the SRS of the geometry, the result is in map units. For any geometry type but (MULTI)POLYGON the result is always `0.0`. (See also `:spherical_area()`.)
:centroid()	Return the centroid (center of mass) of a geometry. Implemented for all geometry types.
:get_bbox()	Get the bounding box of this geometry. This function returns four result values: the lon/lat values for the bottom left corner of the bounding box, followed by the lon/lat values of the top right corner. Both lon/lat values are identical in case of points. Example: `lon, lat, dummy, dummy = object:as_polygon():centroid():get_bbox()`. If possible use the `get_bbox()` function on the OSM object instead, it is more efficient.
:geometries()	Returns an iterator for iterating over member geometries of a multi-geometry. See below for detail.
:geometry_n()	Returns the nth geometry (1-based) of a multi-geometry.
:geometry_type()	Returns the type of geometry as a string: `NULL`, `POINT`, `LINESTRING`, `POLYGON`, `MULTIPOINT`, `MULTILINESTRING`, `MULTIPOLYGON`, or `GEOMETRYCOLLECTION`.
:is_null()	Returns `true` if the geometry is a NULL geometry, `false` otherwise.
:length()	Returns the length of the geometry. For any geometry type but (MULTI)LINESTRING this is always `0.0`. The length is calculated using the SRS of the geometry, the result is in map units.
:line_merge()	Merge lines in a (MULTI)LINESTRING as much as possible into longer lines.
:num_geometries()	Returns the number of geometries in a multi-geometry. Always 0 for NULL geometries and always 1 for non-multi geometries.
:pole_of_inaccessibility(opts)	Calculate “pole of inaccessibility” of a polygon, a point farthest away from the polygon boundary, sometimes called the center of the maximum inscribed circle. Note that for performance reasons this is an approximation. It is intended as a reasonably good labelling point. One optional parameter opts, which must be a Lua table with options. The only option currently defined is `stretch`. If this is set to a value larger than 1 an ellipse instead of a circle is inscribed. This might be useful for labels which usually use more space horizontally. Use a value between 0 and 1 for a vertical ellipse.
:segmentize(max_segment_length)	Segmentize a (MULTI)LINESTRING, so that no segment is longer than `max_segment_length`. Result is a (MULTI)LINESTRING.
:simplify(tolerance)	Simplify (MULTI)LINESTRING geometries with the Douglas-Peucker algorithm. (Currently not implemented for other geometry types.)
:spherical_area()	Returns the area of the geometry calculated on the spheroid. The geometry must be in WGS 84 (4326). For any geometry type but (MULTI)POLYGON the result is always `0.0`. The result is in m². (See also `:area()`.)
:srid()	Return SRID of the geometry.
:transform(target_srid)	Transform the geometry to the target SRS.

The Lua length operator (#) returns the number of geometries in the geometry object, it is synonymous to calling num_geometries().

Converting a geometry to a string (tostring(geom)) returns the geometry type (as with the geometry_type() function).

All geometry functions that return geometries will return a NULL geometry on error. All geometry functions handle NULL geometry on input in some way. So you can always chain geometry functions and if there is any problem on the way, the result will be a NULL geometry. Here is an example:

local area = object:as_polygon():transform(3857):area()
-- area will be 0.0 if not a polygon or transformation failed

To iterate over the members of a multi-geometry use the geometries() function:

local geom = object:as_multipolygon()
for g in geom:geometries() do
    landuse.insert({
        geom = g,
        ...
    })
end

In Lua you can not get at the actual contents of the geometries, i.e. the coordinates and such. This is intentional. Writing functions in Lua that do something with the coordinates will be much slower than writing those functions in C++, so Lua scripts should concern themselves only with the high-level control flow, not the details of the geometry. If you think you need some function to access the internals of a geometry, start a discussion on Github.

Locators

Version >= 2.2.0 This feature is only available from version 2.2.0.

When processing OSM data it is often useful to know in which area some OSM object is. Maybe you want to draw specific highway shields in each state. Or you want to transliterate labels in different ways depending on the country the label is in. Or draw features differently depending on whether they are inside a national park or outside. Or set different default values for speed limits based on the country a highway is in. To do this you can use a locator.

A locator is initialized with one or more regions, each region has a name and a polygon or bounding box. A geometry of an OSM object can then be checked against this list of regions to figure out in which region(s) it is located.

Locators are created in Lua with define_locator():

local countries = osm2pgsql.define_locator({ name = 'countries' })

The name of the locator is used for informational and error messages only. You can have as many locators as you want, they work independently of each other.

Bounding boxes can be added to a locator with add_bbox(). Use the longitude and latitude of the lower left and upper right corners of the box as parameters:

local extract = osm2pgsql.define_locator({ name = 'extract' })
extract:add_bbox('inside', -32.35, 62.22, -7.35, 68.16)

Polygons can be added from the database by calling add_from_db() and specifying a SQL query which can return any number of rows, each defining a region with the name and the (multi)polygon as columns:

local extract = osm2pgsql.define_locator({ name = 'extract' })
extract:add_from_db("SELECT country_code, geom FROM countries")

The name can be any string (or anything the database can convert to a string when reading the data), for instance the name of a country or an ISO country code. You can have multiple regions with the same name, but you can not differentiate between them later. The geometry must be in WGS84 (EPSG:4326).

You can get region data from OSM or any other source, you’ll just have to import it into the database before running osm2pgsql. If you want to use OSM data, you can run osm2pgsql twice, once with a config file importing only the polygons and then with your normal config which reads the just imported data into a locator.

A locator can then be queried in the callbacks using all_intersecting() which returns a list of names of all regions that intersect the specified OSM object geometry. The results are not in any specific order.

local country_codes = extract:all_intersecting(object:as_point())

If the result is an empty array (Lua table), the object isn’t in any of the regions. (You can check for that with if #country_codes == 0 then ...).

Or you can use the function first_intersecting() which only returns a single region for those cases where there can be no overlapping data or where the details of objects straddling region boundaries don’t matter. It is faster than all_intersecting() because the search returns after the first region is found.

local cc = extract:first_intersecting(object:as_linestring())

The result is nil if the geometry isn’t in any of the regions.

Several example config files are provided in the flex-config/locator directory of the source code repository.

Locator Performance

Working with a Locator while the data is being loaded into the database is much faster than to do the same operations inside the database after the import.

You can have hundreds of thousands of regions in a locator, osm2pgsql handles that fine. Checking each OSM object against a list of all OSM country boundaries works just fine, even when importing a full planet.

Because the region polygons are stored in an R-tree internally, the biggest performance problem comes from very large polygons. It is much better to separate the polygons inside a multipolygon (for instance using ST_Dump) and to split polygons into smaller chunks (for instance using the ST_SubDivide function):

local extract = osm2pgsql.define_locator({ name = 'extract' })
extract:add_from_db([[
WITH polys AS (
    SELECT cc, (ST_Dump(geom)).geom AS geom FROM countries
)
SELECT cc, ST_Subdivide(geom, 200) FROM polys")
]])

It can take a while to compute the subdivision. So if you run osm2pgsql often, compute the subdivision once and store the result in an extra table and just load that from osm2pgsql.

The Pgsql Output

✎

The pgsql output is the original output osm2pgsql started with. It was designed for rendering OpenStreetMap data, principally with Mapnik. It is still widely used although it is much more limited than the more modern “flex” output in how the data can be represented in the database.

If you are starting a new project with osm2pgsql, we recommend you use the flex output instead. The pgsql output is marked as deprecated now and does not get any of the new features that the flex output has or will get in the future. The pgsql output will be removed at some point, so you should think about migrating your existing projects.

Database Layout

The pgsql output always creates a fixed list of four database tables shown below. The PREFIX can be set with the -p, --prefix option, the default is planet_osm. (Note that this option is also interpreted by the middle!)

Table	Description
PREFIX_point	Point geometries created from nodes.
PREFIX_line	Line geometries created from ways and relations tagged `type=route`.
PREFIX_roads	Contains some of the same data as the `line` table, but with attributes selected for low-zoom rendering. It does not only contain roads!
PREFIX_polygon	Polygon geometries created from closed ways and relations tagged `type=multipolygon` or `type=boundary`.

All tables contain a geometry column named way (which is not the most intuitive name, but it can’t be changed now because of backwards compatibility).

Style File

Some aspects of how the pgsql output converts OSM data into PostgreSQL tables can be configured via a style file. The default style file that is usually installed with osm2pgsql is suitable for rendering many styles. It contains some documentation of the style syntax and works well as an example or starting point for your own changes. With a custom style file, you can control how different object types and tags map to different columns and data types in the database.

The style file is a plain text file containing four columns separated by spaces. As each OSM object is processed it is checked against conditions specified in the first two columns. If they match, processing options from the third and fourth columns are applied.

Column	Description
1. OSM object type	Can be `node`, `way` or both separated by a comma. `way` will also apply to relations with `type=multipolygon`, `type=boundary`, or `type=route`; all other relations are ignored.
2. Tag	The tag to match on. If the fourth column is `linear` or `polygon`, a column for this tag will be created in each of the point, line, polygon, and road tables.
3. PostgreSQL data type	Specifies how data will be stored in the tag’s PostgreSQL table column. Possible values are `text`, `int4`, or `real`. If the fourth column is `delete` or `phstore`, this column has no meaning and should just be set to `text`.
4. Flags	Zero or more flags separated by commas. For possible values see below.

Possible values for the flags are:

Flag	Description
linear	Specifies that ways with this tag should be imported as lines by default, even if they are closed. Other conditions can still override this.
polygon	Specifies that closed ways with this tag should be imported as polygons by default. Other conditions can still override this.
nocolumn	The two flags above automatically create a column for the specified tag. This flag overrides this behaviour such that no column is created. This is especially useful for hstore, where all key value data is stored in the hstore column such that no dedicated columns are needed.
phstore	The same as `polygon,nocolumn` for backward compatibility
delete	Prevents the specified tag (but not the entire object) from being stored in the database. Useful for tags that tend to have long values but will not be used for rendering, such as `source=*`. This flag only affects `--slim` mode imports.
nocache	This flag is deprecated and does nothing.

Closed ways with tag area=yes and relations with type=multipolygon or type=boundary will be imported as polygons even if no polygon flag is set. Non-closed ways and closed ways with area=no will always be imported as lines.

Nodes are always placed in the “point” table, never in the “line” or “polygon” table.

Usually only objects which result in at least one of the columns declared in the style file being not NULL will be added to the database. But see below in the Use of Hstore section.

The style file may also contain comments. Any text between a # and the end of the line will be ignored.

Special ‘Tags’

There are several special values that can be used in the tag column (second column) of the style file for creating additional fields in the database which contain things other than tag values.

Special tag	Description
way_area	Creates a database column that stores the area (calculated in the units of the projection, normally Mercator meters) for any objects imported as polygons. Use with `real` as the data type. See also the `--reproject-area` option.
z_order	Adds a column that is used for ordering objects when rendering. It mostly applies to objects with `highway=` or `railway=`. Use with `int4` as the data type.
osm_user	Adds a column that stores the username of the last user to edit an object in the database (*).
osm_uid	Adds a column that stores the user ID number of the last user to edit an object in the database (*).
osm_version	Adds a column that stores the version of an object in the database (ie, how many times the object has been modified) (*).
osm_timestamp	Adds a column that stores the date and time that the most recent version of an object was added to OpenStreetMap (*).

To use these, you must use the command line option --extra-attributes.

If importing with both --hstore and --extra-attributes the meta-data will end up in the tags hstore column regardless of the style file.

Schemas and Tablespaces

Usually all tables, indexes and functions that the pgsql output creates are in the default schema and use the default tablespace.

You can use the command line option --output-pgsql-schema=SCHEMA to tell osm2pgsql that it should use the specified schema to create the tables, indexes, and functions in. Note that you have to create the schema before running osm2pgsql and make sure the database user was granted the rights to create tables, indexes, and functions in this schema. Read more about schemas in the PostgreSQL documentation. By default the public schema is used.

Sometimes you want to create special PostgreSQL tablespaces and put some tables or indexes into them. Having often used indexes on fast SSD drives, for instance, can speed up processing a lot. There are three command line options that the pgsql output interprets: To set the tablespace for output tables and indexes, use the --tablespace-main-data=TABLESPC and --tablespace-main-index=TABLESPC options, respectively. You can also use the -i, --tablespace-index=TABLESPC option which will set the index tablespace for the pgsql output as well as for the middle! Note that it is your job to create the tablespaces before calling osm2pgsql and making sure the disk behind it is large enough.

Coastline Processing

The natural=coastline tag is suppressed by default, even if you import the natural=* key. Many maps get the coastlines from a different source, so it does not need to import them from the input file. You can use the --keep-coastlines parameter to change this behavior if you want coastlines in your database. See the OSM wiki for more information on coastline processing.

Use of Hstore

Hstore is a PostgreSQL data type that allows storing arbitrary key-value pairs in a single column. It needs to be installed on the database with CREATE EXTENSION hstore;

Hstore is used to give more flexibility in using additional tags without reimporting the database, at the cost of less speed and more space.

By default, the pgsql output will not generate hstore columns. The following options are used to add hstore columns of one type or another:

--hstore or -k adds any tags not already in a conventional column to a hstore column called tags. With the default stylesheet this would result in tags like highway appearing in a conventional column while tags not in the style like name:en or lanes:forward would appear only in the hstore column.
--hstore-all or -j adds all tags to a hstore column called tags, even if they’re already stored in a conventional column. With the standard stylesheet this would result in tags like highway appearing in conventional column and the hstore column while tags not in the style like name:en or lanes:forward would appear only in the hstore column.
--hstore-column or -z, which adds an additional column for tags starting with a specified string, e.g. --hstore-column 'name:' produces a hstore column that contains all name:xx tags. This option can be used multiple times.

You can not use both --hstore and --hstore-all together.

The following options can be used to modify the behaviour of the hstore columns:

--hstore-match-only modifies the above options and prevents objects from being added if they only have tags in the hstore column and no tags in the non-hstore columns. If neither of the above options is specified, this option is ignored.
--hstore-add-index adds indexes to all hstore columns. This option is ignored if there are no hstore columns. Using indexes can speed up arbitrary queries, but for most purposes partial indexes will be faster. You have to create those yourself.

Either --hstore or --hstore-all when combined with --hstore-match-only should give the same rows as no hstore, just with the additional hstore column.

Note that when you are using the --extra-attributes option, all your nodes, ways, and relations essentially get a few extra tags. Together with the hstore options above, the object attributes might end up in your hstore column(s) possibly using quite a lot of space. This is especially true for the majority of nodes that have no tags at all, which means they would normally not appear in your output tables. You might want to use --hstore-match-only in that case.

Lua Tag Transformations

The pgsql output supports Lua scripts to rewrite tags before they enter the database. Use the command line option --tag-transform-script=SCRIPT to enable this.

There is inevitably a performance hit with any extra processing. The Lua tag transformation is a little slower than the C++-based default, but this will probably not matter much in practice. But Lua pre-processing may save you further processing later. The Lua transformations allow you, for instance, to unify disparate tagging (for example, highway=path; foot=yes and highway=footway) and perform complex queries, potentially more efficiently than writing them as rules in your stylesheet which are executed at rendering time.

Note that this is a totally different mechanism than the Lua scripts used in the flex output.

The Lua script needs to implement the following functions:

function filter_tags_node(tags, num_tags)
return filter, tags

function filter_tags_way(tags, num_tags)
return filter, tags, polygon, roads

function filter_basic_tags_rel(tags, num_tags)
return filter, tags

These take a set of tags as a Lua key-value table, and an integer which is the number of tags supplied.

The first return value is filter, a flag which you should set to 1 if the way/node/relation should be filtered out and not added to the database, 0 otherwise. (They will still end up in the slim mode tables, but not in the rendering tables.)

The second return value is tags, a transformed (or unchanged) set of tags, which will be written to the database.

filter_tags_way returns two additional flags:

poly should be 1 if the way should be treated as a polygon, or 0 if it is to be treated as a line.
roads should be 1 if the way should be added to the roads table, 0 otherwise.

function filter_tags_relation_member(tags, member_tags, roles, num_members)
return filter, tags, member_superseded, boundary, polygon, roads

The function filter_tags_relation_member is more complex and can handle more advanced relation tagging, such as multipolygons that take their tags from the member ways.

This function is called with the tags from the relation; a set of tags for each of the member ways (member relations and nodes are ignored); the set of roles for each of the member ways; and the number of members. The tag and role sets are both arrays (indexed tables).

As with the other functions, it should return a filter flag, and a transformed set of tags to be applied to the relation in later processing. The third return value, member_superseded, is obsolete and will be ignored. The fourth and fifth return values, boundary and polygon, are flags that specify if the relation should be processed as a line, a polygon, or both (e.g. administrative boundaries). Set the final return value, roads, to 1 if the geometry should be added to the roads table.

There is a sample tag transform Lua script in the repository as an example, which (nearly) replicates the built-in transformations and can be used as a template for one’s own scripts.

Test your Lua script with small excerpts before applying it to a whole country or even the planet. Be aware that the Lua tagtransform allows to run arbitrary code on your system. Only run scripts from trusted sources!

Pgsql Output Command Line Options

These are all the command line options interpreted by the pgsql output:

Option	Description
--tablespace-main-data=TABLESPC	Store the data tables in the PostgreSQL tablespace `TABLESPC`.
--tablespace-main-index=TABLESPC	Store the indexes in the PostgreSQL tablespace `TABLESPC`.
-l, --latlong	Store coordinates in degrees of latitude & longitude.
-m, --merc	Store coordinates in Spherical Mercator (Web Mercator, EPSG:3857) (the default).
-E, --proj=SRID	Use projection EPSG:`SRID`.
-p, --prefix=PREFIX	Prefix for table names (default: `planet_osm`). This option affects the middle as well as the pgsql output table names.
--tag-transform-script=SCRIPT	Specify a Lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database.
-x, --extra-attributes	Include attributes (user name, user id, changeset id, timestamp and version). This also requires additional entries in your style file.
-k, --hstore	Add tags without column to an additional hstore (key/value) column in the database tables.
-j, --hstore-all	Add all tags to an additional hstore (key/value) column in the database tables.
-z, --hstore-column=PREFIX	Add an additional hstore (key/value) column named `PREFIX` containing all tags that have a key starting with `PREFIX`, eg `\--hstore-column "name:"` will produce an extra hstore column that contains all `name:xx` tags.
--hstore-match-only	Only keep objects that have a value in at least one of the non-hstore columns.
--hstore-add-index	Create indexes for all hstore columns after import.
-G, --multi-geometry	Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM object can therefore use several rows in the output tables. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables.
-K, --keep-coastlines	Keep coastline data rather than filtering it out. By default objects tagged `natural=coastline` will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data.
--reproject-area	Compute area column using Spherical Mercator coordinates even if a different projection is used for the geometries.
--output-pgsql-schema=SCHEMA	Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql output. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set.

Middle

✎

The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which nodes are used by which ways, or which members a relation has. It also keeps track of all node locations. This information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changed objects themselves and not all the related objects needed for creating an object’s geometry.

The Properties Table

Osm2pgsql stores some properties into a database table. This table is always called osm2pgsql_properties. It will be created in the schema set with the --middle-schema option, or the public schema by default.

The properties table tracks some settings derived from the command line options and from the input file(s). It allows osm2pgsql to make sure that updates are run with compatible options.

The osm2pgsql_properties table contains two columns: The property column contains the name of a property and the value column its value. Properties are always stored as strings, for boolean properties the strings true and false are used.

The following properties are currently defined:

Property	Type	Description
`attributes`	bool	Import with OSM attributes (i.e. osm2pgsql was run with `-x` or `--extra-attributes`)?
`current_timestamp`	string	Largest timestamp of any object in any of the input file(s) in ISO format (`YYYY-mm-ddTHH:MM:SSZ`). Updated with each data update.
`db_format`	int	0 = not updatable, 1 = legacy format (not supported any more), 2 = current format.
`flat_node_file`	string	Absolute filename of the flat node file (specified with `-F` or `--flat-nodes`). See below for some details.
`import_timestamp`	string	Largest timestamp of any object in the in the input file(s) in ISO format (`YYYY-mm-ddTHH:MM:SSZ`). Only set for initial import.
`output`	string	The output as set with the `-O` or `--output` option.
`prefix`	string	Table name prefix set with `-p` or `--prefix`.
`replication_base_url`	string	For replication (see below).
`replication_sequence_number`	string	For replication (see below).
`replication_timestamp`	string	For replication (see below).
`style`	string	The style file as set with the `-S` or `--style` option. See below for some details.
`updatable`	bool	Is this database updatable (imported with `--slim` and without `--drop`)?
`version`	string	Version number of the osm2pgsql application that did the import.

When updating an existing database that has an osm2pgsql_properties table, osm2pgsql will check that the command line options used are compatible and will complain if they are not. Options that are not set on the command line will automatically be set if needed. So for instance if you import a database without -x but then use -x on updates, osm2pgsql will fail with an error message. If you use -x both on import and on update everything is fine. And if you use -x on import but not on update, osm2pgsql will detect that you used that option on import and also use it on update.

The name of the flat node and style files specified on the command line are converted to absolute file names and stored in the flat_node_file and style property, respectively. That means that osm2pgsql will find the files again even if you start it from a different current working directory. If you need to move the flat node or style file somewhere else, you can do that. The next time you run osm2pgsql, add the -F, --flat-nodes or -S, --style option again with the new file name and osm2pgsql will use the new name and update the properties table accordingly.

The current_timestamp and import_timestamp properties are not set if the input file(s) don’t contain timestamps on the OSM objects. (Timestamps in OSM files are optional, but most OSM files have them.)

The replication_* properties reflect the setting of the respective header fields in the input file on import and are used by osm2pgsql-replication to automatically update the database. That program will also update those fields. If you import multiple files, these properties will not be set.

The contents of the osm2pgsql_properties table are internal to osm2pgsql and you should never change them. There is one exception: You can add your own properties for anything you need, but make sure to start the property names with an underscore (_). this way the property names will never clash with any names that osm2pgsql might introduce in the future.

Database Structure

The middle stores its data in the database in the following tables. The PREFIX can be set with the -p, --prefix option, the default is planet_osm. (Note that this option is also interpreted by the pgsql output!)

Table	Description
PREFIX_nodes	OSM nodes
PREFIX_ways	OSM ways
PREFIX_rels	OSM relations
PREFIX_users	OSM users

If a flat node file is used (see below) the PREFIX_nodes table might be missing or empty, because nodes are stored in the flat node file instead. The users table is only used when the -x, --extra-attributes command line option is set (see below).

The tables have the following structure:

Column	Type	Description
id	`int8 NOT NULL`	Unique OSM id of this object. Primary key.
lat	`int4 NOT NULL`	(Nodes only) Latitude * 10⁷.
lon	`int4 NOT NULL`	(Nodes only) Longitude * 10⁷.
nodes	`int8[] NOT NULL`	(Ways only) Array of node ids.
members	`jsonb NOT NULL`	(Relations only) Contains all relation members, for the format see below.
tags	`jsonb`	Tags of this OSM object in the obvious key/value format.

If you need the PostGIS gemetries for nodes and ways you can create a view with something like this:

CREATE VIEW nodes_geom AS
    SELECT id, tags, ST_SetSRID(ST_MakePoint(lon::float / 10000000, lat::float / 10000000), 4326) AS geom
        FROM planet_osm_nodes;

CREATE VIEW ways_geom AS
    SELECT id, nodes, tags, ST_MakeLine(g.points) AS geom
        FROM planet_osm_ways w, LATERAL (
            SELECT array_agg(ST_SetSRID(ST_MakePoint(n.lon::float / 10000000, n.lat::float / 10000000), 4326)) AS points
            FROM unnest(nodes) AS x, planet_osm_nodes n WHERE x = n.id
        ) AS g;

The `members` Column

The members column contains a JSON array of JSON objects. For each member the JSON object contains the following fields:

Field	Type	Description
type	Char	Single character `N`, `W`, or `R` for the OSM object type node, way, or relation.
ref	Integer	The OSM id of the member.
role	Text	The role of the member.

A function planet_osm_member_ids(members int8[], type char(1)) is provided. (The prefix is the same as the tables, here planet_osm.) The function returns all ids in members of type type (‘N’, ‘W’, or ‘R’) as an array of int8. To get the ids of all way members of relation 17, for instance, you can use it like this:

SELECT planet_osm_member_ids(members, 'W') AS way_ids
    FROM planet_osm_rels WHERE id = 17;

Extra Attributes

The following extra columns are stored when osm2pgsql is run with the -x, --extra-attributes command line option (all columns can be NULL if the respective fields were not set in the input data):

Column	Type	Description
created	timestamp with time zone	Timestamp when this version of the object was created.
version	int4	Version number of this object.
changeset_id	int4	Id of the changeset that contains this object version.
user_id	int4	User id of the user who created this object version.

In addition the PREFIX_users table is created with the following structure:

Column	Type	Description
id	int4	Unique user id. Primary key.
name	text	User name (`NOT NULL`).

Indexes

PostgreSQL will automatically create BTREE indexes for primary keys on all middle tables. In addition there are the following indexes:

An GIN index on the nodes column of the ways table allows you to find all ways referencing a give node. You need to access this index with a query like this (which finds all ways referencing node 123):

SELECT * FROM planet_osm_ways
    WHERE nodes && ARRAY[123::int8]
    AND planet_osm_index_bucket(nodes) && planet_osm_index_bucket(ARRAY[123::int8]);

Note the extra condition using the planet_osm_index_bucket() function which makes sure the index can be used. The function will have the same prefix as your tables (by default planet_osm).

To find the relations referencing specific nodes or ways use the planet_osm_member_ids() function described above. There are indexes for members of type node and way (but not members of type relation) provided which use this function. Use a query like this:

SELECT * FROM planet_osm_rels
    WHERE planet_osm_member_ids(members, 'W'::char(1)) && ARRAY[456::int8];

Make sure to use the right casts (::char(1) for the type, ::int8 for the ids), without them PostgreSQL sometimes is not able to match the queries to the right functions and indexes.

By default there is no index on the tags column. If you need this, you can create it with

CREATE INDEX ON planet_osm_nodes USING gin (tags) WHERE tags IS NOT NULL;

Such an index will support queries like

SELECT id FROM planet_osm_nodes WHERE tags ? 'amenity'; -- check for keys
SELECT id FROM planet_osm_nodes WHERE tags @> '{"amenity":"post_box"}'::jsonb; -- check for tags

Reserved Names and Compatibility

For compatibility with older and future versions of osm2pgsql you should never create tables, indexes, functions or any other objects in the database with names that start with osm2pgsql_ or with the PREFIX you have configured (planet_osm_ by default). This way your objects will not clash with any objects that osm2pgsql creates.

Always read the release notes before upgrading in case osm2pgsql changes the format or functionality of any tables, indexes, functions or other objects in the database that you might be using.

Flat Node Store

-F or --flat-nodes specifies that instead of a table in PostgreSQL, a binary file is used as a database of node locations. This should only be used on full planet imports or very large extracts (e.g. Europe) but in those situations offers significant space savings and speed increases, particularly on mechanical drives.

The file will need approximately 8 bytes * maximum node ID, regardless of the size of the extract. With current OSM data (in 2024) that’s about 100 GB. As a good rule of thumb you can look at the current PBF planet file on planet.osm.org, the flat node file will probably be somewhat larger than that.

If you are using the --drop option, the flat node file will be deleted after import.

Caching

In slim-mode, i.e. when using the database middle, you can use the -C, --cache option to specify how much memory (in MBytes) to allocate for caching data. Generally more cache means your import will be faster. But keep in mind that other parts of osm2pgsql and the database will also need memory.

To decide how much cache to allocate, the rule of thumb is as follows: use the size of the PBF file you are trying to import or about 75% of RAM, whatever is smaller. Make sure there is enough RAM left for PostgreSQL. It needs at least the amount of shared_buffers given in its configuration.

You may also set --cache to 0 to disable caching completely to save memory. If you use a flat node store you should disable the cache, it will usually not help in that situation.

In non-slim mode, i.e. when using the RAM middle, the --cache setting is ignored. All data is stored in RAM and uses however much memory it needs.

Bucket Index for Slim Mode

Osm2pgsql can use an index for way node lookups in slim mode that needs a lot less disk space than earlier versions did. For a planet the savings can be about 200 GB! Because the index is so much smaller, imports are faster, too. Lookup times are slightly slower, but this shouldn’t be an issue for most people.

If you are not using slim mode and/or not doing updates of your database, this does not apply to you.

Middle Command Line Options

Option	Description
--tablespace-slim-data=TABLESPC	Store the slim mode tables in the given tablespace.
--tablespace-slim-index=TABLESPC	Store the indexes of the slim mode tables in the given tablespace.
-p, --prefix=PREFIX	Prefix for table names (default: `planet_osm`). This option affects the middle as well as the pgsql output table names. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.
-s, --slim	Store temporary data in the database. Without this mode, all temporary data is stored in RAM. If you do not have enough memory, the import will not work successfully. With slim mode, you can import the data even on a system with limited RAM, albeit much slower. Slim mode is also needed if you want to update your database later.
--drop	Drop the slim mode tables from the database and remove the flat nodes file once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indexes need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing.
-C NUM, --cache=NUM	Ignored in non-slim mode. In slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Default: 800 (MB).
-x, --extra-attributes	Include attributes of each object in the middle tables and make them available to the outputs. Attributes are: user name, user id, changeset id, timestamp and version. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.
-F FILENAME, --flat-nodes=FILENAME	The flat-nodes mode is a separate method to store node locations on disk. Instead of storing this information in RAM or in the main PostgreSQL database, this mode creates its own separate custom database to store the information. Storage is much more efficient. Storing the node information for the full planet requires hundreds of GBytes in PostgreSQL, the same data is stored in “only” 100GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for imports of the full planet or very large extracts. The default is disabled. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import. If you set it in append mode anyway, it will use the new setting for this and future updates.
--middle-schema=SCHEMA	Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set.
--middle-with-nodes	Usually nodes are only stored in the database middle tables when no flat nodes file is used. When a flat node file is used and you still want tagged nodes (and only those) in the database, use this option.

Expire

✎

When osm2pgsql is processing OSM changes, it can create a list of (Web Mercator) tiles that are affected by those changes. This list can later be used to delete or re-render any changed tiles you might have cached. Osm2pgsql only creates this list. How to actually expire the tiles is outside the scope of osm2pgsql. Expire only makes sense in append mode.

Tile expiry will probably only work when creating Web Mercator (EPSG:3857) geometries. When using other output projections osm2pgsql can still generate expire files, but it is unclear how useful they are. The flex output will only allow you to enable expire on Web Mercator geometry columns.

There are two ways to configure tile expiry. The old way (and the only way if you are using the pgsql output) is to use the -e, --expire-tiles, -o, --expire-output, and --expire-bbox-size command line options. This allows only a single expire output to be defined (and it has to go to a file).

The new way, which is only available with the flex output, allows you to define any number of expire outputs (and they can go to files or to database tables).

Expire Outputs

Each expire output has a minimum and maximum zoom level. For each geometry that needs to be expired, the coordinates of the affected tiles are determined and stored. When osm2pgsql is finished with its run, it writes the tile coordinates for all zoom levels between minimum and maximum zoom level to the output.

There is a somewhat special case: If you don’t define a zoom level for the expire output, zoom level 0 is assumed for the minimum and maximum zoom level. That means that any change will result in an expire entry for tile 0/0/0. You can use this to trigger some processing for any and all changes in a certain table for instance.

Memory requirements for osm2pgsql rise with the maximum zoom level used and the number of changes processed. This is usually no problem for zoom level 12 or 14 tiles, but might be an issue if you expire on zoom level 18 and have lots of changes to process.

The output can be written to a file or a table:

Expire File

The generated expire file is a text file that contains one tile coordinate per line in the format ZOOM/X/Y. Tiles appear only once in the file even if multiple geometry changes are affecting the same tile.

The file is written to in append mode, i.e. new tiles are added to the end of the file. You have to clear the file after processing the tiles.

Expire Table

The expire table has five columns zoom, x, y, first, and last. A primary key constraints on the first three columns makes sure that there is no duplicate data. The first and last columns contain the timestamp of the first time this tile was marked as expired and the last time, respectively. These two timestamps can be used to implement various expiry strategies.

You have to delete entries from this file after expiry is run.

Details of the Expire Calculations

To figure out which tiles need to be expired, osm2pgsql looks at the old and new geometry of each changed feature and marks all tiles as expired that are touched by either or both of those geometries. For new features, there is only the new geometry, of course. For deleted features only the old geometry is used.

If the geometry of a feature didn’t change but the tags changed in some way that will always trigger the expiry mechanism. It might well be that the tag change does not result in any visible change to the map, but osm2pgsql can’t know that, so it always marks the tiles for expiry.

Which tiles are marked to be expired depends on the geometry type generated:

For point geometries, the tile which contains this point is marked as expired.
For line geometries (linestring or multilinestring) all tiles intersected by the line are marked as expired.
For (multi)polygons there are several expire modes: In full-area mode all tiles in the bounding box of the polygon are marked as expired. In boundary-only mode only the lines along the boundary of the polygon are expired. In hybrid mode either can happen depending on the area of the polygon. Polygons with an area larger than full_area_limit are expired as if boundary-only is set, smaller areas in full-area mode. When using the flex output, you can set the mode and full_area_limit as needed for each geometry column. For the pgsql output the expire output always works in hybrid mode, use the --expire-bbox-size option to set the full_area_limit (default is 20000).

In each case neighboring tiles might also be marked as expired if the feature is within a buffer of the tile boundary. This is so that larger icons, thick lines, labels etc. which are rendered just at a tile boundary which “overflow” into the next tile are handled correctly. By default that buffer is set at 10% of the tile size, in the flex output it is possible to change it using the buffer setting.

Expire Command Line Options

These are the command line options to configure expiry. Use them with the pgsql or flex output.

When using the flex output it is recommended you switch to the new way of defining the expire output explained in Defining and Using Expire Outputs.

Option	Description
-e, --expire-tiles=[MIN-ZOOM-]MAX-ZOOM	Create a tile expiry list for zoom level `MAX-ZOOM` or all zoom levels between `MIN-ZOOM` and `MAX-ZOOM` (default is `0` which means the feature is disabled)
-o, --expire-output=FILENAME	Output file name for expired tiles list (default: `dirty_tiles`)
--expire-bbox-size=SIZE	Max width and height for a polygon to expire the whole polygon, not just the boundary (default `20000`)

Generalization

✎

Experimental Osm2pgsql has some limited support for generalization. See the generalization project page for some background and details. This work is experimental and everything described here might change without notice.

For the generalization functionality the separate program osm2pgsql-gen is provided. In the future this functionality might be integrated into osm2pgsql itself. The same Lua configuration file that is used for osm2pgsql is also used to configure the generalization. Generalization will only work together with the flex output.

The documentation in this chapter is incomplete. We are working on it…

Overview

Generalization is the process by which detailed map data is selected, simplified, or changed into something suitable for rendering on smaller scale maps (or smaller zoom levels). In osm2pgsql this is done with a separate program after an import or update finished. Data is processed in the database and/or read out of the database and processed in osm2pgsql and then written back.

The end result is that in addition to the usual tables created and filled by osm2pgsql you have a set of additional tables with the generalized data.

Generalization is currently only supported for Web Mercator (EPSG 3857). This is by far the most common use case, we can look at extending this later if needed.

Osm2pgsql supports several different strategies for generalization which use different algorithms suitable for different types of data. Each strategy has several configuration options. See the next section for general options used by most strategies and the section after that for all the details about the strategies.

Configuration

All tables needed for generalized data have to be configured just like any table in osm2pgsql. Currently there are some restrictions on the tables:

The input and output tables must use the same schema.
The geometry column used must have the same name as the geometry column in the table used as input for a generalizer.
Output tables for tile-based generalizers must have ids set to tile, which automatically ceates x and y columns for the tile coordinates. An index will also be created on those columns after generalization.

To add generalization to your config, add a callback function osm2pgsql.process_gen() and run generalizers in there:

function osm2pgsql.process_gen()
    osm2pgsql.run_gen(STRATEGY, { ... })
end

Replace STRATEGY with the strategy (see below) and add all parameters to the Lua table.

The following parameters are used by most generalizers:

Parameter	Type	Description
name	text	Identifier for this generalizer used for debug outputs and error message etc.
debug	bool	Set to `true` to enable debug logging for this generalizer. Debug logging must also be enabled with `-l, --log-level=debug` on the command line.
schema	text	Database schema for all tables. Default: `public`.
src_table	text	The table with the input data.
dest_table	text	The table where generalizer output data is written to.
geom_column	text	The name of the geometry column in the input and output tables (default: `geom`).

For more specific parameters see below.

You can also run any SQL command in the process_gen() callback with the run_sql() function:

osm2pgsql.run_sql({
    description = 'Descriptive name for this command for logging',
    sql = "UPDATE foo SET bar = 'x'"
})

The following fields are available in the run_sql() command:

Parameter	Type	Description
description	text	Descriptive name or short text for logging.
sql	text or array of texts	The SQL command to run. The `sql` field can be set to a string or to an array of strings in which case the commands in those strings will be run one after the other.
transaction	bool	Set to `true` to run the command(s) from the `sql` field in a transaction (Default: `false`).
if_has_rows	text	SQL command that is run first. If that SQL command returns any rows, the commands in `sql` are run, otherwise nothing is done. This can be used, to trigger generalizations only if something changed, for instance when an expire table contains something. Use a query like `SELECT 1 FROM expire_table LIMIT 1`. Default: none, i.e. the command in `sql` always runs.

Generalization Strategies

There are currently two types of strategies: Some strategies always work on all data in the input table(s). If there are changes in the input data, the processing has to restart from scratch. If you work with updates, you will usually only run these strategies once a day or once a week or so. In general those strategies make only sense for data that doesn’t change that often (or where changes are usually small) and if it doesn’t matter that data in smaller zoom levels are only updated occasionally.

The other type of strategy uses a tile-based approach. Whenever something changes, all tiles intersecting with the change will be re-processed. Osm2pgsql uses the existing expire mechanism to keep track of what to change.

Strategy `builtup`

This strategy derives builtup areas from landuse polygons, roads and building outlines. It is intended to show roughly were there are urban areas. Because it uses input data of several diverse types, it works reasonably well in different areas of the world even if the landuse tagging is incomplete.

This strategy is tile-based. Internally it uses a similar approach as the raster-union strategy but can work with several input tables.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
src_tables	text	Comma-separated list of input table names in the order landuse layer, buildings layer, roads layer.
image_extent	int	Width/height of the raster used for generalization (Default: 2048).
image_buffer	int	Buffer used around the raster image (default: 0).
min_area	real	Drop output polygons smaller than this. Default: off
margin	real	The overlapping margin as a percentage of `image_extent` for raster processing of tiles.
buffer_size	text	Amount by which polygons are buffered in pixels. Comma-separated list for each input file.
turdsize	int
zoom	int	Zoom level.
make_valid	bool	Make sure resulting geometries are valid.
area_column	text	Column name where to store the area of the result polygons.

See this blog post for some background.

Strategy `discrete-isolation`

When rendering a map with many point features like cities or mountain peaks it is often useful to only put the most important features on the map. Importance can be something like the number of people living in a city or the height of a peak. But if only the absolute importance is used, some areas on the map will get filled with many features, while others stay empty. The Discrete Isolation algorithm can be used to calculate a more relative measure of importance which tends to create a more evenly filled map.

This strategy always processes all features in a table.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
id_column	text	The name of the id column in the source table.
importance_column	text	The column in the source table with the importance metric. Column type must be a number type.

The src_table and dest_table have always to be the same.

You must have an index on the id column, otherwise this will be very slow! Set create_index = 'always' in your source table configuration.

You must have the following columns in your table. This is currently not configurable:

Column	Type	Description
discr_iso	real	Discrete isolation value
irank	int	Importance rank
dirank	int	Discrete isolation rank

Use these column definitions in your config file to add them:

{ column = 'discr_iso', type = 'real', create_only = true },
{ column = 'irank', type = 'int', create_only = true },
{ column = 'dirank', type = 'int', create_only = true },

See this blog post for some background.

Strategy `raster-union`

This strategy merges and simplifies polygons using a raster intermediate. It is intended for polygon layers such as landcover where many smaller features should be aggregated into larger ones. It does a very similar job as the vector-union strategy, but is faster.

This strategy is tile-based.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
image_extent	int	Width/height of the raster used for generalization (Default: 2048).
margin	real	The overlapping margin as a percentage of `image_extent` for raster processing of tiles.
buffer_size	text	Amount by which polygons are buffered in pixels (Default: 10).
zoom	int	Zoom level.
group_by_column	text	Name of a column in the source and destination tables used to group the geometries by some kind of classification (Optional).
expire_list	text
img_path	text	Used to dump PNGs of the “before” and “after” images to a file for debugging.
img_table	text	Used to dump “before” and “after” raster images to the database for debugging. The table will be created if it doesn’t exist already.
where	text	Optional WHERE clause to add to the SQL query getting the input data from the database. Must be empty or a valid SQL snippet.

Actual image extent used will be image_extent + 2 * margin * image_extent. margin * image_extent is rounded to nearest multiple of 64.

The img_path parameters can be set to help with debugging. Set img_path to something like this: some/dir/path/img. Resulting images will be in the directory some/dir/path and are named img-X-Y-TYPE-[io].png for input (i) or output (o) images. The TYPE is the value from the group_by_column.

See this blog post for some background.

Strategy `rivers`

This strategy is intended to find larger rivers with their width and aggregate them into longer linestrings. The implementation is incomplete and not usable at the moment.

This strategy always processes all features in a table.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
src_areas	text	Name of the input table with waterway areas.
width_column	text	Name of the number type column containing the width of a feature.

See this blog post for some background.

Strategy `vector-union`

This strategy merges and simplifies polygons using vector calculations. It is intended for polygon layers such as landcover where many smaller features should be aggregated into larger ones. It does a very similar job as the raster-union strategy, but is slower.

This strategy is tile-based.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
margin	real
buffer_size	text	Amount by which polygons are buffered in Mercator map units.
group_by_column	text	Column to group data by. Same column is used in the output for classification.
zoom	int	Zoom level.
expire_list	text

Strategy `tile-sql`

Run some SQL code for each tile. Use {ZOOM}, {X}, and {Y} in the SQL command to set the zoom level and tile coordinates.

This strategy is tile-based.

Parameters used by this strategy (see below for some additional general parameters):

Parameter	Type	Description
sql	int	The SQL code to run.
zoom	int	Zoom level.

Running osmp2gsql-gen

Here are the most important command line options:

Command line option	Description
-a, --append	Run in append (update) mode. Same option as with osm2pgsql.
-c, --create	Run in create (import) mode. Same option as with osm2pgsql. (This is the default.)
-j, --jobs=JOBS	Maximum number of threads to use. (Default: no threads.)
-l, --log-level=LEVEL	Set log level (`debug`, `info` (default), `warn`, or `error`).
--log-sql	Log all SQL commands send to the database.
--middle-schema=SCHEMA	Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set. Set this to the same value as used on the `osm2pgsql` command line.

Some strategies can run many jobs in parallel, speeding up processing a lot. Use the -j, --jobs option to set the maximum number of threads. If nothing else is running in parallel, try setting this to the number of available CPU cores.

To specify which database to work on osm2pgsql-gen uses the same command line options as osm2pgsql:

Option	Description
-d, --database=DB	Database name or PostgreSQL conninfo string.
-U, --user=USERNAME	Version < 2.2.0 Database user.
-U, --username=USERNAME, --user=USERNAME	Version >= 2.2.0 Database user.
-W, --password	Force password prompt. Do not put the password on the command line!
-H, --host=HOST	Database server hostname or unix domain socket location.
-P, --port=PORT	Database server port.
--schema=SCHEMA	Default for various schema settings throughout osm2pgsql (default: `public`). The schema must exist in the database and be writable by the database user.

Advanced Topics

✎

Notes on Memory Usage

Importing an OSM file into the database is very demanding in terms of RAM usage. Osm2pgsql and PostgreSQL are running in parallel at this point and both need memory. You also need enough memory for general file system cache managed by the operating system, otherwise all IO will become too slow.

PostgreSQL blocks at least the part of RAM that has been configured with the shared_buffers parameter during PostgreSQL tuning and needs some memory on top of that. (See Tuning the PostgreSQL Server). Note that the PostgreSQL manual recommends setting shared_buffers to 25% of the memory in your system, but this is for a dedicated database server. When you are running osm2pgsql on the same host as the database, this is usually way too much.

Osm2pgsql needs at least 2GB of RAM for its internal data structures, potentially more when it has to process very large relations. In addition it needs to maintain a cache for node locations. The size of this cache can be configured with the parameter --cache.

When importing with a flatnode file (option --flat-nodes), it is best to disable the node cache completely (--cache=0) and leave the memory for the system cache to speed up accessing the flatnode file.

For imports without a flatnode file, set --cache approximately to the size of the OSM pbf file you are importing. (Note that the --cache setting is in MByte). Make sure you leave enough RAM for PostgreSQL and osm2pgsql as mentioned above. If the system starts swapping or you are getting out-of-memory errors, reduce the cache size or consider using a flatnode file.

When you are running out of memory you’ll sometimes get a bad_alloc error message. But more often osm2pgsql will simply crash without any useful message. This is, unfortunately, something we can not do much about. The operating system is not telling us that there is no memory available, it simply ends the program. This is due to something called “overcommit”: The operating system will allow the program to allocate more memory than there is actually available because it is unlikely that all programs will actually need this memory and at the same time. Unfortunately when programs do, there isn’t anything that can be done except crash the program. Memory intensive programs like osm2pgsql tend to run into these problems and it is difficult to predict what will happen with any given set of options and input files. It might run fine one day and crash on another when the input is only slightly different.

Note also that memory usage numbers reported by osm2pgsql itself or tools such as ps and top are often confusing and difficult to interpret. If it looks like you are running out of memory, try a smaller extract, or, if you can, use more memory, before reporting a problem.

Parallel Processing

Some parts of the osm2pgsql processing can run in parallel. Depending on the hardware resources of you machine, this can make things faster or slower or even overwhelm your system when too many things happen in parallel and your memory runs out. For normal operation the defaults should work, but you can fine-tune the behaviour using some command line options.

Osm2pgsql will do some of its processing in parallel. Usually it will use however many threads your CPUs support, but no more than 4. For most use cases this should work well, but you can tune the number of threads used with the --number-processes command line option. (Note that the option is a bit of a misnomer, because this sets the number of threads used, they are all in a single process.) If disks are fast enough e.g. if you have an SSD, then this can greatly increase speed of the “going over pending ways” and “going over pending relations” stages on a multi-core server. Past 8 threads or so this will probably not gain you any speed advantage.

By default osm2pgsql starts the clustering and index building on all tables in parallel to increase performance. This can be a disadvantage on slow disks, or if you don’t have enough RAM for PostgreSQL to do the parallel index building. PostgreSQL potentially needs the amount of memory set in the maintenance_work_mem config setting for each index it builds. With 7 or more indexes being built in parallel this can mean quite a lot of memory is needed. Use the --disable-parallel-indexing option to build the indexes one after the other.

Using Database While Updating

To improve performance osm2pgsql uses several parallel threads to import or update the OSM data. This means that there is no transaction around all the updates. If you are querying the database while osm2pgsql is running, you might be able to see some updates but not others. While an import is running you should not query the data. For updates it depends a bit on your use case.

In most cases this is not a huge problem, because OSM objects are mostly independent of one another. If you are

writing OSM objects into multiple tables,
using two-stage processing in the Flex output, or
doing complex queries joining several tables

you might see some inconsistent data, although this is still rather unlikely. If you are concerned by this, you should stop any other use of the database while osm2pgsql is running. This is something that needs to be done outside osm2pgsql, because osm2pgsql doesn’t know what else is running and whether and how it might interact with osm2pgsql.

Note that even if you are seeing inconsistent data at some point, the moment osm2pgsql is finished, the data will be consistent again. If you are running a tile server and using the expire functionality, you will, at that point, re-render all tiles that might be affected making your tiles consistent again.

Handling Failed Imports or Updates

If a database import with osm2pgsql fails for any reason you have to start from the beginning. Fix the problem that caused the failure and run osm2pgsql again. You might want to re-initialize the database before that or remove some leftover tables and indexes, but you don’t have to.

If a database update with osm2pgsql fails for any reason just do the update again (after fixing the cause of the failure). As descibed in the previous section, you might have inconsistent map data until you run that update to completion. If you are running multiple updates and don’t know which one failed you can always apply all updates again from the beginning. If you are using osm2pgsql-replication simply re-run it after fixing the problem, it will figure out from where to restart the updates.

Clustering by Geometry

Typical use cases for a database created by osm2pgsql require querying the database by geometry. To speed up this kind of query osm2pgsql will by default cluster the data in the output tables by geometry which means that features which are in reality near to each other will also be near to each other on the disk and access will be faster.

If a table has multiple geometry columns, clustering will always be by the first geometry column.

This clustering is achieved by ordering and copying each whole table after the import. This will take some time and it means you will temporarily need twice the disk space.

When you are using the flex output, you can disable clustering by setting the cluster table option to no (see the Advanced Table Definition section).

Logging and Monitoring SQL Commands

The --log-sql option allows you to see what SQL commands are issued by osm2pgsql. Each log line starts with the date and time and SQL:, after that you see the connection number (C3) and the log information, usually an SQL command exactly as it was sent to the database.

For new connections the database backend process id is logged.

New connections also show the “context” which specifies which part of osm2pgsql has created this connection. The context and connection number will also appear in the application_name used for this connection which can be seen in the pg_stat_activity table from inside the database.

Tips & Tricks

✎

The flex output allows a wide range of configuration options. Here are some extra tips & tricks.

Using `create_only` Columns for Postprocessed Data

Sometimes it is useful to have data in table rows that osm2pgsql can’t create. For instance you might want to store the center of polygons for faster rendering of a label.

To do this define your table as usual and add an additional column, marking it create_only. In our example case the type of the column should be the PostgreSQL type GEOMETRY(Point, 3857), because we don’t want osm2pgsql to do anything special here, just create the column with this type as is.

polygons_table = osm2pgsql.define_area_table('polygons', {
    { column = 'tags', type = 'hstore' },
    { column = 'geom', type = 'geometry' },
    { column = 'center', sql_type = 'GEOMETRY(Point, 3857)', create_only = true },
    { column = 'area', type = 'area' },
})

After running osm2pgsql as usual, run the following SQL command:

UPDATE polygons SET center = ST_Centroid(geom) WHERE center IS NULL;

If you are updating the data using osm2pgsql --append, you have to do this after each update. When osm2pgsql inserts new rows they will always have a NULL value in center, the WHERE condition makes sure that we only do this (possibly expensive) calculation once.

Creating GENERATED Columns

From version 12 on PostgreSQL supports GENERATED columns. You can use these from osm2pgsql with a trick, by adding some “magic” wording to the sql_type in a column definition.

We don’t promise that this trick will work forever. Putting anything into the sql_type but a PostgreSQL type is strictly experimental.

It is probably best explained with an example. With this config you create a polygon table with an additional column center that is automatically filled with the centroid of the polygon added:

local polygons = osm2pgsql.define_area_table('polygons', {
    { column = 'tags', type = 'jsonb' },
    { column = 'geom', type = 'polygon', not_null = true },
    { column = 'center', create_only = true, sql_type = 'geometry(point, 3857) GENERATED ALWAYS AS (ST_Centroid(geom)) STORED' },
})

function osm2pgsql.process_way(object)
    if object.is_closed then
        polygons:insert({
            tags = object.tags,
            geom = object:as_polygon()
        })
    end
end

You could do this specific case simpler and faster by using the centroid() function in Lua. But PostGIS has a lot more interesting functions that you can use this way. Be sure to read the PostgreSQL documentation about the syntax and functionality of the GENERATED columns.

Make sure you have the create_only = true option set on the column, otherwise this will not work.

Accessing Environment Variables from Lua

In Lua scripts you can access environment variables with os.getenv("VAR"). The Lua config scripts that osm2pgsql uses are normal Lua scripts, so you can do that there, too.

Logging Memory Use of Lua Scripts

Lua scripts can use quite a lot of memory if you are not careful. This is usually only a problem when using two-stage processing. To monitor how much memory is currently used, use this function:

function create_memory_reporter(filename, frequency)
    local counter = 0
    local file = io.open(filename, 'w')
    file:write('timestamp,counter,mbyte\n')

    return function()
        if counter % frequency == 0 then
            local mem = collectgarbage('count')
            local ts = os.date('%Y-%m-%dT%H:%M:%S,')
            file:write(ts .. counter .. ',' .. math.ceil(mem / 1024) .. '\n')
            file:flush()
        end
        counter = counter + 1
    end
end

Then use it to create one or more memory_reporters. The first argument is the output file name, the second specifies after how many process callbacks it should trigger the output. Make sure this number is not too small, otherwise processing will become quite slow.

Here is an example use for the process_node callback:

local mr = create_memory_reporter('/tmp/osm2pgsql-lua-memlog.csv', 10000)

function osm2pgsql.process_node(object)
    mr()
    ...
end

You can have one memory reporter for nodes, ways, and relations together or have separate ones.

Checking Lua Scripts

If you have the Lua compiler (luac) installed (it comes with the installation of the Lua interpreter), you can use it to syntax check your Lua scripts independently of osm2pgsql. Just run luac -p SCRIPT.lua.

A script that fails this check will not work with osm2pgsql. But it is only a syntax check, it doesn’t run your script, so the script can still fail when used in osm2pgsql. But it helps getting rid of the simple errors quickly.

There is also the luacheck linter program which does a lot more checks. Install with luarocks install luacheck, on Debian/Ubuntu you can use the lua-checks package. Then run luacheck SCRIPT.lua.

Lua Script Performance

A complex Lua flex configuration might mean that osm2pgsql spends quite some time running your Lua code. Optimizing that code will be important.

You can somewhat increase processing speed by using the LuaJIT compiler, but you have to compile osm2pgsql yourself, because pre-built osm2pgsql binaries usually don’t use it.

Here are some tips:

Always use local variables which are cheap. It also makes your code more robust.
Don’t do work more than once. Store intermediate results in local variables instead of recomputing it. For instance if you need the geometry of an OSM object several times, get it once and store it (local geom = object:as_polygon(); local center = geom:centroid()).

See Lua Performance Tips from the author of Lua for some in-depth tips on how to improve your Lua code. This stackoverflow question also has some great information.

Lua Debug Output

Sometimes it is useful to print some debug output from a Lua script. For simple variables this can easily be done with the Lua built-in print() function.

For more complex data you can use the inspect Lua library. It is available from the source or using LuaRocks. Debian/Ubuntu users can install the lua-inspect package.

Getting and Preparing OSM Data

✎

Osm2pgsql imports OSM data into a database. But where does this data come from? This appendix shows how to get the data and, possibly, prepare it for use with osm2pgsql.

Data Formats

There are several common formats for OSM data and osm2pgsql supports all of them (XML, PBF, and O5M). If you have the option, you should prefer the PBF format, because the files are smaller than those in other formats and faster to read.

OSM data files usually have a suffix .osm, .osm.gz, .osm.bz2, or .osm.pbf, sometimes just .pbf. They are used in osm2pgsql “create” mode.

OSM change files usually have the suffix .osc.gz. They are used in osm2pgsql “append” mode.

Getting the Data

The Planet File

The OpenStreetMap project publishes the so-called “planet file”, which contains a dump of the current full OSM database. This dump is available in XML and PBF format.

Downloads are available on planet.osm.org and several mirrors.

If you are new to osm2pgsql we recommend you start experimenting with a small extract, not the planet file! The planet file is huge (tens of GBytes) and it will take many hours to import.

Geographical Extracts

Importing data into the database takes time and uses a lot of disk space. If you only need a portion of the OSM data it often makes sense to only import an extract of the data.

Geographical extracts for countries, states, cities, etc. are available from several sources.

The extracts from Geofabrik are very popular. They are updated daily and also offer daily change files suitable for updating an osm2pgsql database.

Always choose an extract slightly larger than the area you are interested in, otherwise you might have problems with the data at the boundary (also see Handling of Incomplete OSM Data).

If you can’t find a suitable extract, see below for creating your own.

Updating an Existing Database

The OpenStreetMap database changes all the time. To get these changes into your database, you need to download the OSM change files, sometimes called diffs or replication diffs, which contain those changes. They are available from planet.osm.org. Depending on how often you want to update your database, you can get minutely, hourly, or daily change files.

Some services offering OSM data extracts for download also offer change files suitable for updating those extracts. Geofabrik has daily change files for all its updates. See the extract page for a link to the replication URL. (Note that change files go only about 3 months back. Older files are deleted.) download.openstreetmap.fr has minutely change files for all its extracts.

To keep an osm2pgsql database up to date you need to know the replication (base) URL, i.e. the URL of the directory containing a state.txt file.

Keeping the database up-to-date with osm2pgsql-replication

Osm2pgsql comes with a script scripts/osm2pgsql-replication which is the easiest way to keep an osm2pgsql database up to date. The script requires PyOsmium and Psycopg (psycopg2 and psycopg3 both will work) to be installed.

Initialising the Update Process

Before you can download updates, osm2pgsql-replication needs to find the starting point from which to apply the updates. Run the initialisation like this:

osm2pgsql-replication init -d <dbname>

The -d parameter tells the replication script which database to connect to. The script also supports all other parameters mentioned in the section database connection including libpq environment variables. The only exception is the ‘-W’ parameter for interactive password prompts. When you need to supply a password, always use a pgpass file.

Osm2pgsql will store a table osm2pgsql_properties to your database for new imports. osm2pgsql-replication uses that table for storing the replication information.

By default the update server and interval will be set from the file headers, for planet dumps minutely updates from the OSM main servers will be used. If you want to use a different replication service, use the --server parameter.

It is safe to repeat initialisation at any time. For example, when you want to change the replication service, simply run the init command again with a different --server parameter.

Fetching updates

Fetching updates is as simple as running:

osm2pgsql-replication update -d <dbname> -- <parameters to osm2pgsql>

This fetches data from the replication service, saves it in a temporary file and calls osm2pgsql with the given parameters to apply the changes. Note that osm2pgsql-replication makes sure to only fetch a limited amount of data at the time to make sure that it does not use up too much RAM. If more data is available it will repeat the download and call of osm2pgsql until the database is up to date. You can change the amount of data downloaded at once with --max-diff-size, the default is 500MB.

Sometimes you need to run additional commands after osm2pgsql has updated the database, for example, when you use the expiry function. You can use the option --post-processing to give osm2pgsql-replication a script it is supposed to run after each call to osm2pgsql. Note that if the script fails, then the entire update process is considered a failure and aborted.

Putting it all together with systemd

osm2pgsql-replication works well as a systemd service that keeps your database up to date automatically. There are many ways to set up systemd. This section gives you a working example to get you started.

First set up a service that runs the updates. Add the following file as /etc/systemd/system/osm2pgsql-update.service:

[Unit]
Description=Keep osm2pgsql database up-to-date

[Service]
WorkingDirectory=/tmp
ExecStart=osm2pgsql-replication update -d <dbname> -- <parameters to osm2pgsql>
StandardOutput=append:/var/log/osm2pgsql-updates.log
User=<database user>
Type=simple
Restart=on-failure
RestartSec=5min

Make sure to adapt the database name, osm2pgsql parameters and the user that the script should be run as. The Restart parameters make sure that updates will be tried again, when something goes wrong like there is a temporary network error.

Now add a timer script that starts the service regularly. Add the following file as /etc/systemd/system/osm2pgsql-update.timer:

[Unit]
Description=Trigger a osm2pgsql database update

[Timer]
OnBootSec=10
OnUnitActiveSec=1h

[Install]
WantedBy=timers.target

This timer is good to bring your database up to date once per hour. If you use minutely updates, you can lower the value for OnUnitActiveSec to get the changes even more often. If you use a daily replication service like Geofabrik, set OnUnitActiveSec to at least 1 hour! If you prefer to run your updates only once every night, use OnCalendar=*-*-* 02:00 instead. This will update your server every night at 2am. See the man pages of systemd.timer for more information.

Now reload systemd so it scans the new scripts and enable the timer:

sudo systemctl daemon-reload
sudo systemctl enable osm2pgsql-update.timer
sudo systemctl start osm2pgsql-update.timer

Other methods for updating

If this script is not available in your version of osm2pgsql or you want more control over the update process, there are other options. You need a program to download the change files and keep track of where you are in the replication process. Then you load the changes into the database using osm2pgsql’s “append” mode.

We recommend the pyosmium_get_changes.py tool from the PyOsmium project. With it, downloading all changes since you ran the program the last time is just a single command.

When you have changes for many months or years it might make more sense to drop your database completely and re-import from a more current OSM data file instead of updating the database from change files.

Processing a daily update in one step is fine, but if you want to update data for longer time periods update it in steps, one for each day or so. If you update too much data in one step, there is a good chance the update will take a long time or even fail.

If you have imported an extract into an osm2pgsql database but there are no change files for the area of the extract, you can still use the replication diffs from planet.osm.org to update your database. But, because those extracts contain data for the whole planet, your database will keep accumulating more and more data outside the area of your extract that you don’t really need.

Be aware that you usually can not use OSM change files directly with osm2pgsql. The replication diffs contain all changes, including, occasionally, duplicate changes to the same object, if it was changed more than once in the period covered by the change file. You need to “simplify” the change files before use, for instance with the osmium merge-changes --simplify command. If you use osm2pgsql-replication or pyosmium_get_changes.py this will be taken care of automatically.

Rerunning a failed update

If you run osm2pgsql with --append and the update fails for some reason, for instance when your server crashes in the middle of it, you can just re-run the update again and everything should come out fine.

Preparing OSM Data for Use by Osm2pgsql

Before some OSM data file can be given to osm2pgsql it is sometimes necessary to prepare it in some way. This chapter explains some options.

For most of these steps, the recommended application is the osmium command line tool, which offers a wide range of functionality of slicing and dicing OSM data.

One handy command is osmium fileinfo which can tell you a lot about an OSM file, including how many objects it contains, whether it is sorted, etc.

Creating Geographical Extracts

You can create your own extracts from the planet file or from existing extracts with the osmium extract command. It can create extracts of OSM data from a bounding box or from a boundary.

There are different “cutting strategies” used by osmium extract leading to different results. Read the man page and understand the implications of those strategies before deciding which one to use.

Always cut an extract slightly larger than the area you are interested in, otherwise you might have problems with the data at the boundary (also see Handling of Incomplete OSM Data).

Merging OSM Data Files

If you are working with extracts you sometimes have several extracts, lets say for different countries, that you want to load into the same database. One approach in this case is to merge those extracts first and then import the resulting file with osm2pgsql. But osm2pgsql can also read multiple files at once.

You can use the osmium merge command for merging the input files.

Note that in any case for this to work the input files must all have their data from the same point in time. You can use this to import two or more geographical extracts into the same database. If the extracts are from different points in time and contain different versions of the same object, this will fail!

Merging OSM Change Files

To speed up processing when you have many OSM change files, you can merge several change files into larger files and then process the larger files with osm2pgsql. The osmium merge-changes command will do this for you. Make sure to use the option -s, --simplify. (That command also works with a single change file, if you just want to simplify one file.)

Usually you should not merge change files for more than a day or so when doing this, otherwise the amount of changes osm2pgsql has to process in one go becomes too large.

OSM Data with Negative Ids

OSM data usually uses positive numbers for the ids of nodes, ways, and relations. Negative numbers are sometimes used for inofficial OSM data, for instance when non-OSM data is mixed with OSM data to make sure the are no id clashes.

Osm2pgsql can only handle positive ids. It uses negative ids internally (for multipolygon geometries in the database).

If you have negative ids in your input file, you can renumber it first. You can use the osmium renumber command for this.

Handling Unsorted OSM Data

Osm2pgsql can only read OSM files ordered in this way. This allows some optimizations in the code which speed up the normal processing.

If you have an unsorted input file, you should sort it first. You can use the osmium sort command for this.

Working with OSM History Data

OpenStreetMap offers complete dumps of all OSM data which include not only the current version of all objects like the normal planet dump, but also all earlier version of OSM objects including deleted ones.

Like most other OSM software, osm2pgsql can not handle this data.

For some use cases there is a workaround: Create extracts from the full history dump for specific points in time and feed those to osm2pgsql. You can use the osmium time-filter command to create such extracts.

Lua Library for Flex Output

✎

The flex output includes a small library of useful Lua helper functions. All functions are in the osm2pgsql namespace.

These functions are available on the flex output, they cannot be used in the Lua tag transformations of the pgsql output.

Name	clamp
Synopsis	osm2pgsql.clamp(VALUE, MIN, MAX)
Description	Return VALUE if it is between MIN and MAX, MIN if it is smaller, or MAX if it is larger. All parameters must be numbers. If VALUE is `nil`, `nil` is returned.
Example	osm2pgsql.clamp(2, 3, 4) ⟶ 3

Name	has_prefix
Synopsis	osm2pgsql.has_prefix(STRING, PREFIX)
Description	Returns `true` if the STRING starts with PREFIX. If STRING is `nil`, `nil` is returned.
Example	osm2pgsql.has_prefix('addr:city', 'addr:') ⟶ true

Name	has_suffix
Synopsis	osm2pgsql.has_suffix(STRING, SUFFIX)
Description	Returns `true` if the STRING ends with SUFFIX. If STRING is `nil`, `nil` is returned.
Example	osm2pgsql.has_suffix('tiger:source', ':source') ⟶ true

Name	make_check_values_func
Synopsis	osm2pgsql.make_check_values_func(VALUES[, DEFAULT])
Description	Return a function that will check its only argument against the list of VALUES. If it is in that list, it will be returned, otherwise the DEFAULT (or nil) will be returned.
Example	local get_highway_value = osm2pgsql.make_check_values_func({ 'motorway', 'trunk', 'primary', 'secondary', 'tertiary' }, 'road') ... if object.tags.highway then local highway_type = get_highway_value(object.tags.highway) ... end

Name	make_clean_tags_func
Synopsis	osm2pgsql.make_clean_tags_func(KEYS)
Description	Return a function that will remove all tags (in place) from its only argument if the key matches KEYS. KEYS is an array containing keys, key prefixes (ending in ``), or key suffixes (starting with ``). The generated function will return `true` if it removed all tags, `false` if there are still tags left.
Example	local clean_tags = osm2pgsql.make_clean_tags_func{'source', 'source:', ':source', 'note'} function osm2pgsql.process_node(node) if clean_tags(node.tags) then return end ... end

Name	split_string
Synopsis	osm2pgsql.split_string(STRING[, DELIMITER])
Description	Split STRING on DELIMITER (default: ';' (semicolon)) and return an array table with the results. Items in the array will have any whitespace at beginning and end removed. If STRING is `nil`, `nil` is returned.
Example	local opening_hours = osm2pgsql.split_string(object.tags.opening_hours)

Name	split_unit
Synopsis	osm2pgsql.split_unit(STRING, DEFAULT_UNIT)
Description	Split STRING of the form "VALUE UNIT" (something like "10 mph" or "20km") into the VALUE and the UNIT and return both. The VALUE must be a negative or positive integer or real number. The space between the VALUE and UNIT is optional. If there was no unit in the string, the DEFAULT_UNIT will be returned instead. Return `nil` if the STRING doesn't have the right pattern or is `nil`.
Example	value, unit = osm2pgsql.split_unit(object.tags.maxspeed, 'km/h')

Name	trim
Synopsis	osm2pgsql.trim(STRING)
Description	Return STRING with whitespace characters removed from the beginning and end. If STRING is `nil`, `nil` is returned.
Example	local name = osm2pgsql.trim(object.tags.name)

Name	node_member_ids
Synopsis	osm2pgsql.node_member_ids(RELATION)
Description	Return an array table with the ids of all node members of RELATION.
Example	function osm2pgsql.select_relation_members(relation) if relation.tags.type == 'boundary' then return { ways = osm2pgsql.node_member_ids(relation) } end end

Name	way_member_ids
Synopsis	osm2pgsql.way_member_ids(RELATION)
Description	Return an array table with the ids of all way members of RELATION.
Example	function osm2pgsql.select_relation_members(relation) if relation.tags.type == 'route' then return { ways = osm2pgsql.way_member_ids(relation) } end end

Command Line Options

✎

This appendix contains an overview of all command line options.

Main Options

Option	Description
-a, --append	Run in append mode.
-c, --create	Run in create mode (this is the default if `-a, --append` is not used).

Help/Version Options

Option	Description
-h, --help	Print help. Add `-v, --verbose` for more verbose help.
-V, --version	Print osm2pgsql version. Also prints versions of some libraries used.

Logging Options

Option	Description
--log-level=LEVEL	Set log level (`debug`, `info` (default), `warn`, or `error`).
--log-progress=VALUE	Enable (`true`) or disable (`false`) progress logging. Setting this to `auto` will enable progress logging on the console and disable it if the output is redirected to a file. Default: true.
--log-sql	Enable logging of SQL commands for debugging.
--log-sql-data	Enable logging of all data added to the database. This will write out a huge amount of data! For debugging.
-v, --verbose	Same as `--log-level=debug`.

Database Options

Option	Description
-d, --database=DB	Database name or PostgreSQL conninfo string.
-U, --user=USERNAME	Version < 2.2.0 Database user.
-U, --username=USERNAME, --user=USERNAME	Version >= 2.2.0 Database user.
-W, --password	Force password prompt. Do not put the password on the command line!
-H, --host=HOST	Database server hostname or unix domain socket location.
-P, --port=PORT	Database server port.
--schema=SCHEMA	Default for various schema settings throughout osm2pgsql (default: `public`). The schema must exist in the database and be writable by the database user.

Input Options

Option	Description
-r, --input-reader=FORMAT	Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format.
-b, --bbox=BBOX	Apply a bounding box filter in format `MINLON,MINLAT,MAXLON,MAXLAT` on the imported data. Example: --bbox -0.5,51.25,0.5,51.75

Middle Options

Option	Description
--tablespace-slim-data=TABLESPC	Store the slim mode tables in the given tablespace.
--tablespace-slim-index=TABLESPC	Store the indexes of the slim mode tables in the given tablespace.
-p, --prefix=PREFIX	Prefix for table names (default: `planet_osm`). This option affects the middle as well as the pgsql output table names. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.
-s, --slim	Store temporary data in the database. Without this mode, all temporary data is stored in RAM. If you do not have enough memory, the import will not work successfully. With slim mode, you can import the data even on a system with limited RAM, albeit much slower. Slim mode is also needed if you want to update your database later.
--drop	Drop the slim mode tables from the database and remove the flat nodes file once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indexes need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing.
-C NUM, --cache=NUM	Ignored in non-slim mode. In slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Default: 800 (MB).
-x, --extra-attributes	Include attributes of each object in the middle tables and make them available to the outputs. Attributes are: user name, user id, changeset id, timestamp and version. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import.
-F FILENAME, --flat-nodes=FILENAME	The flat-nodes mode is a separate method to store node locations on disk. Instead of storing this information in RAM or in the main PostgreSQL database, this mode creates its own separate custom database to store the information. Storage is much more efficient. Storing the node information for the full planet requires hundreds of GBytes in PostgreSQL, the same data is stored in “only” 100GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for imports of the full planet or very large extracts. The default is disabled. You don’t need to add this option in append mode, because osm2pgsql will remember it from the import. If you set it in append mode anyway, it will use the new setting for this and future updates.
--middle-schema=SCHEMA	Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the middle. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set.
--middle-with-nodes	Usually nodes are only stored in the database middle tables when no flat nodes file is used. When a flat node file is used and you still want tagged nodes (and only those) in the database, use this option.

Output Options

Option Description

Pgsql Output Options

Option	Description
--tablespace-main-data=TABLESPC	Store the data tables in the PostgreSQL tablespace `TABLESPC`.
--tablespace-main-index=TABLESPC	Store the indexes in the PostgreSQL tablespace `TABLESPC`.
-l, --latlong	Store coordinates in degrees of latitude & longitude.
-m, --merc	Store coordinates in Spherical Mercator (Web Mercator, EPSG:3857) (the default).
-E, --proj=SRID	Use projection EPSG:`SRID`.
-p, --prefix=PREFIX	Prefix for table names (default: `planet_osm`). This option affects the middle as well as the pgsql output table names.
--tag-transform-script=SCRIPT	Specify a Lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database.
-x, --extra-attributes	Include attributes (user name, user id, changeset id, timestamp and version). This also requires additional entries in your style file.
-k, --hstore	Add tags without column to an additional hstore (key/value) column in the database tables.
-j, --hstore-all	Add all tags to an additional hstore (key/value) column in the database tables.
-z, --hstore-column=PREFIX	Add an additional hstore (key/value) column named `PREFIX` containing all tags that have a key starting with `PREFIX`, eg `\--hstore-column "name:"` will produce an extra hstore column that contains all `name:xx` tags.
--hstore-match-only	Only keep objects that have a value in at least one of the non-hstore columns.
--hstore-add-index	Create indexes for all hstore columns after import.
-G, --multi-geometry	Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM object can therefore use several rows in the output tables. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables.
-K, --keep-coastlines	Keep coastline data rather than filtering it out. By default objects tagged `natural=coastline` will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data.
--reproject-area	Compute area column using Spherical Mercator coordinates even if a different projection is used for the geometries.
--output-pgsql-schema=SCHEMA	Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql output. The schema must exist in the database and be writable by the database user. By default the schema set with `--schema` is used, or `public` if that is not set.

Expire Options

Option	Description
-e, --expire-tiles=[MIN-ZOOM-]MAX-ZOOM	Create a tile expiry list for zoom level `MAX-ZOOM` or all zoom levels between `MIN-ZOOM` and `MAX-ZOOM` (default is `0` which means the feature is disabled)
-o, --expire-output=FILENAME	Output file name for expired tiles list (default: `dirty_tiles`)
--expire-bbox-size=SIZE	Max width and height for a polygon to expire the whole polygon, not just the boundary (default `20000`)

Advanced Options

Option	Description
-I, --disable-parallel-indexing	Disable parallel clustering and index building on all tables, build one index after the other.
--number-processes=THREADS	Specifies the number of parallel threads used for certain operations. The default is to set this between 1 and 4 depending on the number of CPUs you have. Values up to 32 are possible but probably not useful. Note that each thread opens multiple connections to the database and you will probably reach the limit of allowed database connections.

Upgrading

✎

Upgrading to 2.0.0

Version 2.0.0 has some larger changes compared to the version 1 series. If you are using osm2pgsql in production, we recommend that you upgrade to version 1.11.0 first. Then run osm2pgsql as usual and read all the messages it prints carefully. If there are any messages about deprecated functionality act on them before upgrading to 2.0.0.

Upgrade information for older versions are in the v1 manual.

Middle table format: Only the new format for the middle tables (osm_planet_nodes/ways/rels) is supported. This only affects you if you use slim mode and update your database. If you don’t have a table named osm2pgsql_properties in your database you need to reimport your database. If you have a table osm2pgsql_properties, look for the entry db_format. If that is set to 1, you have to reimport your database. The command line option --middle-database-format was removed.; Support for non-bucket way node indexes or non-standard bucket sizes for way node indexes have been removed. The way node index will now always use the bucket index which is much smaller than a regular index. If you are not using the default bucket way node index, you have to reimport. The command line option --middle-way-node-index-id-shift which set this was removed.; If in doubt, reimport, old databases are probably bloated anyway, an occasional reimport helps with that. For more information about this see the Middle chapter.
Gazetteer output removed: The Gazetteer output was removed. Historically it was used for the Nominatim geocoder but that switched to using the flex output.
Removed command line options: The command line options --cache-strategy, --with-forward-dependencies, have been removed. The command line option -i, --tablespace-index has been removed, use the --tablespace-slim-index and/or --tablespace-main-index options instead. The options --middle-database-format and --middle-way-node-index-id-shift were removed, see above.; Generally the checks for command line options are a bit more strict, so some combinations might not work or show warnings.
New system requirements: Support for older systems has been removed. On Linux you need at least glibc 8 (for std::filesystem), versions of the proj library before version 6 are not supported any more. There is no dependency on the boost-property-tree library any more. Lua is now required, you can not build osm2pgsql without it any more.
Switching from add_row() to insert(): If you are using the flex output and your Lua config file uses the add_row() function, you need to change your configuration. The area column type has also been removed because it only works with add_row(). See this tutorial for all the details.
Check that Lua functions on OSM object are called correctly: Check that functions in Lua code such as object:as_point() are called correctly using the colon-syntax, not as object.as_point(). A warning is printed once per function if this is not the case. We are trying to get in line with common Lua syntax here which will allow some improvements later on.
Handling of untagged objects and object attributes has changed: The -x or --extra-attributes command line option did several things at once which made some processing much slower than it needed to be. From 2.0.0 onwards the behaviour is different:; OSM object attributes (version, timestamp, changeset, uid, user) will always be available in the flex Lua processing functions if those attributes are available in the input data.; When the -x or --extra-attributes command line option is used the behaviour for the “psgql” output is the same. But when using the “flex” output, this option only means that the attributes are stored in the middle tables and are available in the flex Lua processing functions.; Indepedent of the -x or --extra-attributes command line option, the processing functions process_node|way|relation() will never be called for untagged objects. Instead the functions process_untagged_node|way|relation() will be called, if available. It is up to the user to decide in which data they are interested in and define the functions accordingly.
Pgsql output deprecated: The old “pgsql” output is deprecated now, it will be removed in the version 3 series. This is still a few years off, but please start moving your configurations over to the “flex” output.

Sizing

✎

It is sometimes difficult to figure out how large a machine you need for an osm2pgsql database import or how long that import will take. This appendix will give some guidance on that. But remember that each situation is different, hardware is different, operating systems are different, database versions are different. And the exact configuration for the database and osm2pgsql can play a large role. So take the numbers here only as some rough first approximations and try it out yourself.

Here are the numbers for non-updateable (non-slim) imports:

Input file	PBF file size	Import time	Database size	RAM used
Switzerland	0.4 GB	1 to 4 mins	2 to 3 GB	3 GB
Germany	4 GB	18 to 39 mins	20 to 30 GB	10 GB
Europe	27 GB	130 to 240 mins	140 to 180 GB	60 GB
Planet	70 GB	5.5 to 9 hours	330 to 407 GB	120 GB

Here are the numbers for updateable (slim with flat nodes) imports:

Input file	PBF file size	Import time	Database size
Switzerland	0.4 GB	3 to 5 mins	4 to 5 GB
Germany	4 GB	20 to 42 mins	40 to 48 GB
Europe	27 GB	150 to 240 mins	260 to 310 GB
Planet	70 GB	6 to 10 hours	590 to 730 GB

Imports were run on a machine with AMD Ryzen 9 3900 12-Core Processor, 128 GB RAM and NVMe SSDs. The database was tuned according the chapter on server tuning. These values are from osm2pgsql version 1.7.0 and with PostgreSQL 14 using data from October 2022.

The imports were run with different configurations, using the pgsql output and the flex output (with LuaJIT disabled) from simple configurations to complex ones using the openstreetmap-carto style. RAM use is for osm2pgsql itself only, the database itself also needs memory. For updatable databases RAM use is always reported to be around the 80 GB needed for mapping the flat node file into RAM, but that’s not the actual memory used, so these numbers are not shown.

Osm2pgsql v2 Manual

Table of Contents

Introduction

System Requirements

Operating System

Main Memory

Disk

Database Software

Lua Scripting Language

Preparing the Database

Installing PostgreSQL/PostGIS

Creating a Database

Security Considerations

Encoding

Tuning the PostgreSQL Server

Expert Tuning

Using a Template Database

Database Maintainance

Geometry Processing

Geometry Types

Single vs. Multi Geometries

Geometry Validity

Processing of Nodes

Processing of Ways

Processing of Relations

Handling of Incomplete OSM Data

Projections

Latlong (WGS84)

Web Mercator

Other Projections

Running osm2pgsql

Basic command line operation

Import Only

Import and Update

Getting Help or Version

Logging

Database Connection

Processing the OSM Data

The Input

Working with Multiple Input Files

The Middle

The Outputs

Environment Variables

The Flex Output

Defining a Table

Basic Table Definition

Advanced Table Definition

Id Handling

Unique Ids

Using Natural Keys for Unique Ids

Using an Additional ID Column

Defining Columns

Defining Geometry Columns

Defining Indexes

Defining and Using Expire Outputs

Processing Callbacks

The after Callback Functions

The insert() Function

Handling of NULL in insert() Function

Stages

Type Conversions

Geometry Objects in Lua

Locators

Locator Performance

The Pgsql Output

Database Layout

Style File

Special ‘Tags’

Schemas and Tablespaces

Coastline Processing

Use of Hstore

Lua Tag Transformations

Pgsql Output Command Line Options

Middle

The Properties Table

Database Structure

The members Column

Extra Attributes

Indexes

Reserved Names and Compatibility

The `after` Callback Functions

The `insert()` Function

Handling of NULL in `insert()` Function

The `members` Column

Strategy `builtup`

Strategy `discrete-isolation`

Strategy `raster-union`

Strategy `rivers`

Strategy `vector-union`

Strategy `tile-sql`

Using `create_only` Columns for Postprocessed Data