osm2pgsql

Osm2pgsql Manual

This manual is work-in-progress. It is incomplete and some parts are assembled from bits and pieces here or there and don't fit well together yet.

Table of Contents

  1. 1. Introduction
  2. 2. System Requirements
  3. 3. Preparing the Database
  4. 4. Running osm2pgsql
  5. 5. The Flex Output
  6. 6. The Pgsql Output
  7. 7. Middle
  8. 8. Expire
  9. 9. Advanced Topics
  10. 10. Tips & Tricks
  11. A: Projections
  12. B: Lua Library for Flex Output
  13. C: Getting and Preparing OSM Data
  14. D: Command Line Options

Introduction

Osm2pgsql is used to import OSM data into a PostgreSQL/PostGIS database for rendering into maps and many other uses. Usually it is only part of a toolchain, for instance other software is needed that do that actual rendering (i.e. turning the data into a map), deliver the maps to the user etc.

Osm2pgsql is a fairly complex piece of software and it interacts with the database system and other pieces of the toolchain in complex ways. It will take a while until you get some experience with it. It is strongly recommended that you try out osm2pgsql with small extracts of the OSM data, for instance the data for your city. Do not start with importing data for the whole planet, it is easy to get something wrong and it will break after processing the data for hours or days. You can save yourself a lot of trouble doing some trial runs starting with small extracts and working your way up, observing memory and disk usage along the way.

It helps to be familiar with the PostgreSQL database system and the PostGIS extension to it as well as the SQL database query language. Some knowledge of the Lua language is also useful.

This manual always documents the current version of osm2pgsql. If same information is only valid in certain versions, the section will have a note like this: Version >= 1.4.0

It is recommended that you always use the newest released version of osm2pgsql. Earlier versions sometimes contain bugs that have long since been fixed.

System Requirements

Operating System

Osm2pgsql works on Linux, Windows, and macOS.

Osm2pgsql is developed on Linux and most of the developers don’t have experience with running it on anything but Linux, so it will probably works best there. This documentation is also somewhat geared towards Linux users. That being said, we strive to make the software work on Windows and macOS as well. Please report any problems you might have.

Only 64bit systems are supported.

Read the installation instructions for details on how to install osm2pgsql on your system.

Main Memory

Memory requirements for your system will vary widely depending on the size of your input file. Is it city-sized extract or the whole planet? As a rule of thumb you need at least as much main memory as the PBF file with the OSM data is large. So for a planet file you currently need at least 64 GB RAM. Osm2pgsql will not work with less than 2 GB RAM.

More memory can be used as cache and speed up processing a lot.

Disk

You are strongly encouraged to use a SSD (NVMe if possible) for your database. This will be much faster than traditional spinning hard disks.

Requirements for your system will vary widely depending on

  • the amount of data you want to store (city-sized extract or the whole planet?)
  • whether or not you want to update the data regularly

Database Software

You need the PostgreSQL database system with the PostGIS extension installed.

Osm2pgsql aims to support all PostgreSQL and PostGIS versions that are currently supported by their respective maintainers. Currently PostgreSQL versions 9.6 and above and PostGIS versions 2.2 and above are supported. PostgreSQL version 11 or above and PostGIS version 2.4 or above is recommended.

In some cases older versions of osm2pgsql have problems with newer database software versions. You should always run the newest released osm2pgsql version which should ensure you get any fixes we have done to make osm2pgsql work well with any PostgreSQL version.

Osm2pgsql does not work with database systems other than PostgreSQL. The PostgreSQL/PostGIS combination is unique in its capabilities and we are using some of their special features. We are not planning to support any other database systems. There are some database systems out there which claim compatibility with PostgreSQL, they might work or they might not work. Please tell us if you have any experience with them.

Lua Scripting Language

Some parts of osm2pgsql require the use of the Lua scripting language. It is highly recommended that you use a version of osm2pgsql with Lua support enabled, because it gives you a much more flexible configuration. If available osm2pgsql can also be compiled using the Lua JIT (just in time) compiler.

Osm2pgsql can also be compiled without Lua support. In that case the pgsql and gazetteer outputs are the only available.

Preparing the Database

Before you can import any OSM data into a database, you need a database.

Installing PostgreSQL/PostGIS

You need the PostgreSQL database software and the PostGIS plugin for the database. Please read the respective documentation on how to install it. On Linux this can almost always be done with the package manager of your distribution.

Creating a Database

To create a database that can be used by osm2pgsql follow these steps:

  1. Create a database user that osm2pgsql will use. This user doesn’t need any special rights. We’ll use osmuser here.
  2. Create a database that osm2pgsql will use belonging to the user you just created. We’ll use osm as a name here.
  3. Enable the postgis and hstore extensions in the newly created database.

On a typical Linux system you’ll have a system user called postgres which has admin privileges for the database system. We’ll use that user for these steps.

Here are typical commands used:

sudo -u postgres createuser osmuser
sudo -u postgres createdb --encoding=UTF8 --owner=osmuser osm
sudo -u postgres psql osm --command='CREATE EXTENSION postgis;'
sudo -u postgres psql osm --command='CREATE EXTENSION hstore;'

Security Considerations

Osm2pgsql does not need any special database rights, it doesn’t need superuser status and doesn’t need to create databases or roles. You should create a database user for the specific use by osm2pgsql which should not have any special PostgreSQL privileges.

Any osm2pgsql setup will need a database to work on. You should create this as PostgreSQL superuser and change ownership (with --owner option on the createdb command or an OWNER= clause on the CREATE DATABASE SQL command) to the user osm2pgsql is using. This way the database user that osm2pgsql uses doesn’t need database creation privileges.

Typical osm2pgsql setups need the postgis and hstore extensions to be enabled in the database. To install these you need superuser privileges in the database. Enable them (with CREATE EXTENSION) as PostgreSQL superuser on the database that osm2pgql is using before you run osm2pgsql.

Of course osm2pgsql needs to be able to create tables and write to the database. Usually it can do this as owner of the database created for it. Using the data, on the other hand, doesn’t need any of those rights. So the map rendering software you are using, for instance, usually only needs to read the data. It is recommended that you run these as a different database user, distinct from the database user osm2pgsql is using and only give that user SELECT rights (with the GRANT command).

Version >= 1.4.0 If you are using a security scheme based on database schemas in your database you can use the --middle-schema and --output-pgsql-schema options and the schema table option in the flex output, respectively, to tell osm2pgsql to load data into specific schemas. You have to create those schemas and give them the correct rights before running osm2pgsql.

Encoding

OpenStreetMap data is from all around the world, it always uses UTF-8 encoding. osm2pgsql will write the data as is into the database, so it has to be in UTF-8 encoding, too.

On any modern system the default encoding for new databases should be UTF-8, but to make sure, you can use the -E UTF8 or --encoding=UTF8 options when creating the database for osm2pgsql with createdb.

Tuning the PostgreSQL Server

Usual installs of the PostgreSQL server come with a default configuration that isn’t well tuned for large databases. You will have to change those settings and restart PostgreSQL before running osm2pgsql, otherwise your system will be much slower than needed.

This section is incomplete and possibly not up to date.

You should tune the following parameters in your postgresql.conf file:

shared_buffers = 2GB
maintenance_work_mem = (10GB)
autovacuum_work_mem = 2GB
work_mem = (50MB)
effective_cache_size = (24GB)
synchronous_commit = off
checkpoint_segments = 100 # only for postgresql <= 9.4
max_wal_size = 1GB # postgresql > 9.4
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9

See more in docs on https://nominatim.org/release-docs/latest/admin/Installation/

For details see also the Resource Consumption section in the Server Configuration chapter in the PostgreSQL documentation.

Running osm2pgsql

Basic command line operation

Osm2pgsql can work in one of two ways: Import only or Import and Update.

If you are new to osm2pgsql we recommend you try the import only approach first and use a small extract of the OSM data.

Import only

OSM data is imported once into the database and will not change afterwards. If you want to update the data, you have to delete it and do a full re-import. This approach is used when updates aren’t needed or only happen rarely. It is also sometimes the only option, for instance when you are using extracts for which change files are not available. This is also a possible approach if disk space is tight, because you don’t need to store data needed only for updates.

In this mode osm2pgsql is used with the -c, --create command line option. This is also the default mode. The following command lines are equivalent:

osm2pgsql -c OSMFILE
osm2pgsql --create OSMFILE
osm2pgsql OSMFILE

In case the system doesn’t have much main memory, you can add the -s, --slim and --drop options. In this case less main memory is used, instead more data is stored in the database. This makes the import much slower, though. See chapter XXX for details on these options.

Import and Update

In this approach OSM data is imported once and afterwards updated more or less regularly from OSM change files. OSM offers minutely, hourly, and daily change files. This mode is used when regular updates are desired and change files are available.

In this mode osm2pgsql is used the first time with the -c, --create command line option (this is also the default mode) and, additionally the -s, --slim options. The following command lines are equivalent:

osm2pgsql -c -s OSMFILE
osm2pgsql --create --slim OSMFILE
osm2pgsql --slim OSMFILE

For the update runs the -a, --append and -s, --slim options must be used. The following command lines are equivalent:

osm2pgsql -a -s OSMFILE
osm2pgsql --append --slim OSMFILE

This approach needs more disk space for your database, because all the information necessary for the updates must be stored somewhere.

Getting Help or Version

To get help or the program version run with the following options:

Short option Long option Description
-h --help Print help. Add -v, --verbose for more verbose help
-V --version Print osm2pgsql version

Database connection

In create and append mode you have to tell osm2pgsql which database to access using the following command line options:

Short option Long option Description
-d DB --database=DB Database name
-U USERNAME --user=USERNAME Database user
-W --password Force password prompt
-H HOST --host=HOST Database server hostname or socket location
-P PORT --port=PORT Database server port

Version >= 1.4.0 Instead of specifying a database name with the -d, --database option you can also specify a connection string in the form of a keyword/value connection string (something like host=localhost port=5432 dbname=mydb) or a URI (postgresql://[user[:password]@][netloc][:port][,...][/dbname][?param1=value1&...]) See the PostgreSQL documentation for details.

Processing the OSM data

Osm2pgsql processing can be separated into multiple steps:

  1. The Input reads the OSM data from the OSM file.
  2. The Middle stores all objects and keeps track of relationships between objects.
  3. The Output transforms the data and loads it into the database.

This is, of course, a somewhat simplified view, but it is enough to understand the operation of the program. The following sections will describe each of the steps in more detail.

The Input

Depending on the operational mode your input file is either

  • an OSM data file (in create mode), or
  • an OSM change file (in append mode)

Usually osm2pgsql will autodetect the file format, but see the -r, --input-reader option below. Osm2pgsql can not work with OSM history files.

See the Appendix C for more information on how to get and prepare OSM data for use with osm2pgsql.

Short option Long option Description
-r FORMAT --input-reader=FORMAT Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format.
-b BBOX --bbox=BBOX Apply a bounding box filter in format MINLON,MINLAT,MAXLON,MAXLAT on the imported data. Example: --bbox -0.5,51.25,0.5,51.75

Use of the -b, --bbox option is not recommended; it is a crude way of limiting the amount of data loaded into the database and it will not always do what you expect it to do, especially at the boundaries of the bounding box. If you use it, choose a bounding box that is larger than your actual area of interest. A better option is to create an extract of the data before using osm2pgsql. See Appendix C for options.

The Middle

The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which ways are used by which nodes, or which members a relation has. It also keeps track of all node locations. It information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changes objects themselves and not all the related objects needed for creating an objects geometry.

More details are in its own chapter.

The Outputs

Osm2pgsql imports OSM data into a PostgreSQL database. How it does this is governed by the output (sometimes called a backend). Several outputs for different use cases are available:

The flex Output Version >= 1.3.0

This is the most modern and most flexible output option. If you are starting a new project, use this output. Many future improvements to osm2pgsql will only be available in this output.

Unlike all the other output options there is almost no limit to how the OSM data can be imported into the database. You can decide which OSM objects should be written to which columns in which database tables. And you can define any transformations to the data you need, for instance to unify a complex tagging schema into a simpler schema if that’s enough for your use case.

This output is described in detail in its own chapter.

The pgsql Output

The pgsql output is the original output and the one that most people who have been using osm2pgsql know about. Many tutorials you find on the Internet only describe this output. It is quite limited in how the data can be written to the database, but many setups still use it.

This output comes in two “flavours”: With the original “C transformation” and with the somewhat newer “Lua transformation” which allows some changes to the data before it is imported.

This output is described in detail in its own chapter.

The gazetteer Output

The gazetteer output is a specialized output used for Nominatim only. There is no information in this manual about its use, see the Nominatim documentation for details.

The multi Output

The multi output is deprecated, it will be removed in a future version of osm2pgsql. If you are using it, switch to the flex output as soon as possible.

The null Output

The null output doesn’t write the data anywhere. It is used for testing and benchmarking, not for normal operation.

Here are the command line options pertaining to the outputs (see later chapters for command line options only used for specific outputs):

Short option Long option Description
-O OUTPUT --output=OUTPUT Select the output. Available outputs are: flex, pgsql (default), gazetteer, multi, and null
-S STYLE --style=STYLE The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is /usr/share/osm2pgsql/default.style, for other outputs there is no default.)

The Flex Output

Version >= 1.3.0 The flex output appeared first in version 1.3.0 of osm2pgsql.

The Flex output is experimental. Everything in here is subject to change.

The flex output, as the name suggests, allows for a flexible configuration that tells osm2pgsql what OSM data to store in your database and exactly where and how. It is configured through a Lua file which

  • defines the structure of the output tables and
  • defines functions to map the OSM data to the database data format

Use the -s, --style=FILE option to specify the name of the Lua file.

Unlike the pgsql output, the flex output doesn’t use command line options for configuration, but the Lua config file only.

The Style File

The flex style file is a Lua script. You can use all the power of the Lua language. This description assumes that you are somewhat familiar with the Lua language, but it is pretty easy to pick up the basics and you can use the example config files in the flex-config directory which contain lots of comments to get you started.

All configuration is done through the osm2pgsql object in Lua. It has the following fields and functions:

Field / Function Description
osm2pgsql.version The version of osm2pgsql as a string.
osm2pgsql.mode Either "create" or "append" depending on the command line options (-c, --create or -a, --append).
osm2pgsql.stage Either 1 or 2 (1st/2nd stage processing of the data). See below.
osm2pgsql.define_node_table(NAME, COLUMNS[, OPTIONS]) Define a node table.
osm2pgsql.define_way_table(NAME, COLUMNS[, OPTIONS]) Define a way table.
osm2pgsql.define_relation_table(NAME, COLUMNS[, OPTIONS]) Define a relation table.
osm2pgsql.define_area_table(NAME, COLUMNS[, OPTIONS]) Define an area table.
osm2pgsql.define_table(OPTIONS) Define a table. This is the more flexible function behind all the other define_*_table() functions. It gives you more control than the more convenient other functions.

Osm2pgsql also provides some additional functions in the Lua helper library described in Appendix B.

Defining a table

You have to define one or more tables where your data should end up. This is done with the osm2pgsql.define_table() function or one of the slightly more convenient functions osm2pgsql.define_(node|way|relation|area)_table().

osm2pgsql.define_table(OPTIONS)
osm2pgsql.define_(node|way|relation|area)_table(NAME, COLUMNS[, OPTIONS])

Here NAME is the name of the table, COLUMNS is a list of Lua tables describing the columns as documented below. OPTIONS is a Lua table with options for the table as a whole. When using the define_table() command, the NAME and COLUMNS are specified as options name and columns, respectively.

Each table is either a node table, way table, relation table, or area table. This means that the data for that table comes primarily from a node, way, relation, or area, respectively. Osm2pgsql makes sure that the OSM object id will be stored in the table so that later updates to those OSM objects (or deletions) will be properly reflected in the tables. Area tables are special, they can contain data derived from ways or from relations. Way ids will be stored as is, relation ids will be stored as negative numbers.

With the osm2pgsql.define_table() function you can also define tables that

  • don’t have any ids, but those tables will never be updated by osm2pgsql
  • take any OSM object, in this case the type of object is stored in an additional char(1) column.
  • are in a specific PostgresSQL tablespace (set option data_tablespace) or that get their indexes created in a specific tablespace (set option index_tablespace).
  • are in a specific schema (set option schema). Note that the schema has to be created before you start osm2pgsql.

If you are using the osm2pgsql.define_(node|way|relation|area)_table() convenience functions, osm2pgsql will automatically create an id column named (node|way|relation|area)_id, respectively. If you want more control over the id column(s), use the osm2pgsql.define_table() function.

Most tables will have a geometry column. (Currently only zero or one geometry columns are supported.) The types of the geometry column possible depend on the type of the input data. For node tables you are pretty much restricted to point geometries, but there is a variety of options for relation tables for instance.

The supported geometry types are:

Geometry type Description
point Point geometry, usually created from nodes.
linestring Linestring geometry, usually created from ways.
polygon Polygon geometry for area tables, created from ways or relations.
multipoint Currently not used.
multilinestring Created from (possibly split up) ways or relations.
multipolygon For area tables, created from ways or relations.
geometry Any kind of geometry. Also used for area tables that should hold both polygon and multipolygon geometries.

By default geometry columns will be created in web mercator (EPSG 3857). To change this, set the projection parameter of the column to the EPSG code you want (or one of the strings latlon(g), WGS84, or merc(ator), case is ignored).

There is one special geometry column type called area. It can be used in addition to a polygon or multipolygon column. Unlike the normal geometry column types, the resulting database type will not be a geometry type, but real. It will be filled automatically with the area of the geometry. The area will be calculated in web mercator, or you can set the projection parameter of the column to 4326 to calculate it with WGS84 coordinates. Other projections are currently not supported.

In addition to id and geometry columns, each table can have any number of “normal” columns using any type supported by PostgreSQL. Some types are specially recognized by osm2pgsql: text, boolean, int2 (smallint), int4 (int, integer), int8 (bigint), real, hstore, and direction. See the “Type conversion” section for details.

Instead of the above types you can use any SQL type you want. If you do that you have to supply the PostgreSQL string representation for that type when adding data to such columns (or Lua nil to set the column to NULL).

In the table definitions the columns are specified as a list of Lua tables with the following keys:

Key Description
column The name of the PostgreSQL column (required).
type The type of the column as described above (required).
not_null Set to true to make this a NOT NULL column. (Optional, default false.)
create_only Set to true to add the column to the CREATE TABLE command, but do not try to fill this column when adding data. This can be useful for SERIAL columns or when you want to fill in the column later yourself. (Optional, default false.)
projection On geometry columns only. Set to the EPSG id or name of the projection. (Optional, default web mercator, 3857.)

All the osm2pgsql.define*table() functions return a database table object. You can call the following functions on it:

Function Description
name() The name of the table as specified in the define function.
schema() The schema of the table as specified in the define function.
columns() The columns of the table as specified in the define function.
add_row() Add a row to the database table. See below for details.

Processing callbacks

You are expected to define one or more of the following functions:

Callback function Description
osm2pgsql.process_node(object) Called for each new or changed node.
osm2pgsql.process_way(object) Called for each new or changed way.
osm2pgsql.process_relation(object) Called for each new or changed relation.

They all have a single argument of type table (here called object) and no return value. If you are not interested in all object types, you do not have to supply all the functions.

These functions are called for each new or modified OSM object in the input file. No function is called for deleted objects, osm2pgsql will automatically delete all data in your database tables that derived from deleted objects. Modifications are handled as deletions followed by creation of a “new” object, for which the functions are called.

The parameter table (object) has the following fields and functions:

Field / Function Description
id The id of the node, way, or relation.
tags A table with all the tags of the object.
version Version of the OSM object. (*)
timestamp Timestamp of the OSM object, time in seconds since the epoch (midnight 1970-01-01). (*)
changeset Changeset containing this version of the OSM object. (*)
uid User id of the user that created or last changed this OSM object. (*)
user User name of the user that created or last changed this OSM object. (*)
grab_tag(KEY) Return the tag value of the specified key and remove the tag from the list of tags. (Example: local name = object:grab_tag('name')) This is often used when you want to store some tags in special columns and the rest of the tags in an hstore column.
get_bbox() Get the bounding box of the current node or way. This function returns four result values: the lot/lat values for the bottom left corner of the bounding box, followed by the lon/lat values of the top right corner. Both lon/lat values are identical in case of nodes. Example: lon, lat, dummy, dummy = object.get_bbox() (This function doesn’t work for relations currently.)
is_closed Ways only: A boolean telling you whether the way geometry is closed, i.e. the first and last node are the same.
nodes Ways only: An array with the way node ids.
members Relations only: An array with member tables. Each member table has the fields type (values n, w, or r), ref (member id) and role.

These are only available if the -x|--extra-attributes option is used and the OSM input file actually contains those fields.

You can do anything in those processing functions to decide what to do with this data. If you are not interested in that OSM object, simply return from the function. If you want to add the OSM object to some table call the add_row() function on that table:

-- definition of the table:
table_pois = osm2pgsql.define_node_table('pois', {
    { column = 'tags', type = 'hstore' },
    { column = 'name', type = 'text' },
    { column = 'geom', type = 'point' },
})
...
function osm2pgsql.process_node(object)
...
    table_pois:add_row({
        tags = object.tags,
        name = object.tags.name,
        geom = { create = 'point' }
    })
...
end

The add_row() function takes a single table parameter, that describes what to fill into all the database columns. Any column not mentioned will be set to NULL.

The geometry column is somewhat special. You have to define a geometry transformation that will be used to transform the OSM object data into a geometry that fits into the geometry column. See the next section for details.

Note that you can’t set the object id, this will be handled for you behind the scenes.

Geometry transformations

Currently these geometry transformations are supported:

  • { create = 'point' }. Only valid for nodes, create a ‘point’ geometry.
  • { create = 'line' }. For ways or relations. Create a ‘linestring’ or ‘multilinestring’ geometry.
  • { create = 'area' } For ways or relations. Create a ‘polygon’ or ‘multipolygon’ geometry.

Some of these transformations can have parameters:

  • The line transformation has an optional parameter split_at. If this is set to anything other than 0, linestrings longer than this value will be split up into parts no longer than this value.
  • The area transformation has an optional parameter multi. If this is set to false (the default), a multipolygon geometry will be split up into several polygons. If this is set to true, the multipolygon geometry is kept as one. It depends on this parameter whether you need a polygon or multipolygon geometry column.

If no geometry transformation is set, osm2pgsql will, in some cases, assume a default transformation. These are the defaults:

  • For node tables, a point column gets the node location.
  • For way tables, a linestring column gets the complete way geometry, a polygon column gets the way geometry as area (if the way is closed and the area is valid).

Stages

When processing OSM data, osm2pgsql reads the input file(s) in order, nodes first, then ways, then relations. This means that when the ways are read and processed, osm2pgsql can’t know yet whether a way is in a relation (or in several). But for some use cases we need to know in which relations a way is and what the tags of these relations are or the roles of those member ways. The typical case are relations of type route (bus routes etc.) where we might want to render the name or ref from the route relation onto the way geometry.

The osm2pgsql flex output supports this use case by adding an additional “reprocessing” step. Osm2pgsql will call the Lua function osm2pgsql.select_relation_members() for each added, modified, or deleted relation. Your job is to figure out which way members in that relation might need the information from the relation to be rendered correctly and return those ids in a Lua table with the only field ‘ways’. This is usually done with a function like this:

function osm2pgsql.select_relation_members(relation)
    if relation.tags.type == 'route' then
        return { ways = osm2pgsql.way_member_ids(relation) }
    end
end

Instead of using the helper function osm2pgsql.way_member_ids() which returns the ids of all way members, you can write your own code, for instance if you want to check the roles.

Note that select_relation_members() is called for deleted relations and for the old version of a modified relation as well as for new relations and the new version of a modified relation. This is needed, for instance, to correctly mark member ways of deleted relations, because they need to be updated, too. The decision whether a way is to be marked or not can only be based on the tags of the relation and/or the roles of the members. If you take other information into account, updates might not work correctly.

In addition you have to store whatever information you need about the relation in your process_relation() function in a global variable.

After all relations are processed, osm2pgsql will reprocess all marked ways by calling the process_way() function for them again. This time around you have the information from the relation in the global variable and can use it.

If you don’t mark any ways, nothing will be done in this reprocessing stage.

(It is currently not possible to mark nodes or relations. This might or might not be added in future versions of osm2pgsql.)

You can look at osm2pgsql.stage to see in which stage you are.

You want to do all the processing you can in stage 1, because it is faster and there is less memory overhead. For most use cases, stage 1 is enough.

Processing in two stages can add quite a bit of overhead. Because this feature is new, there isn’t much operational experience with it. So be a bit careful when you are experimenting and watch memory and disk space consumption and any extra time you are using. Keep in mind that:

  • All data stored in stage 1 for use in stage 2 in your Lua script will use main memory.
  • Keeping track of way ids marked in stage 1 needs some memory.
  • To do the extra processing in stage 2, time is needed to get objects out of the object store and reprocess them.
  • Osm2pgsql will create an id index on all way tables to look up ways that need to be deleted and re-created in stage 2.

Type conversions

The add_row() command will try its best to convert Lua values into corresponding PostgreSQL values. But not all conversions make sense. Here are the detailed rules:

  1. Lua values of type function, userdata, or thread will always result in an error.
  2. The Lua type nil is always converted to NULL.
  3. If the result of a conversion is NULL and the column is defined as NOT NULL, an error is thrown.
  4. The Lua type table is converted to the PostgreSQL type hstore if and only if all keys and values in the table are string values. A Lua table can not be converted to any other PostgreSQL type.
  5. For boolean columns: The number 0 is converted to false, all other numbers are true. Strings are converted as follows: "yes", "true", "1" are true; "no", "false", "0" are false, all others are NULL.
  6. For integer columns (int2, int4, int8): Boolean true is converted to 1, false to 0. Numbers that are not integers or outside the range of the type result in NULL. Strings are converted to integers if possible otherwise the result is NULL.
  7. For real columns: Booleans result in an error, all numbers are used as is, strings are converted to a number, if that is not possible the result is NULL.
  8. For direction columns (stored as int2 in the database): Boolean true is converted to 1, false to 0. The number 0 results in 0, all positive numbers in 1, all negative numbers in -1. Strings "yes" and "1" will result in 1, "no" and "0" in 0, "-1" in -1. All other strings will result in NULL.
  9. For text columns and any other not specially recognized column types, booleans result in an error and numbers are converted to strings.

If you want any other conversions, you have to do them yourself in your Lua code. Osm2pgsql provides some helper functions for other conversions, see the Lua helper library (Appendix B).

The Pgsql Output

This chapter is incomplete.

The pgsql output is the original output osm2pgsql started with. It was designed for rendering OpenStreetMap data, principally with Mapnik. It is still widely used although it is somewhat limited in how the data can be represented in the database.

If you are starting a new project with osm2pgsql, we recommend you use the flex output instead. The pgsql output will not receive all the new features that the flex output has or will get in the future. (There are a lot of old configs and styles out there that need the pgsql output, though. So it will not go away anytime soon.)

Database Layout

The pgsql output always creates a fixed list of four tables shown in the following table. The PREFIX can be set with the -p, --prefix option, the default is planet_osm.

Table Description
PREFIX_point Point geometries created from nodes.
PREFIX_line Line geometries created from ways and relations tagged type=route.
PREFIX_roads Contains some of the same data as the line table, but with attributes selected for low-zoom rendering. It does not only contain roads!
PREFIX_polygon Polygon geometries created from closed ways and relations tagged type=multipolygon or type=boundary.

All tables contain a geometry column named way (which is not the most intuitive name, but it can’t be changed now because of backwards compatibility).

Style file

Some aspects of how the pgsql output converts OSM data into PostgreSQL tables can be configured via a style file. The default style file that is usually installed with osm2pgsql is suitable for rendering the standard OSM Mapnik style or similar styles. It also contains the documentation of the style syntax. With a custom style file, you can control how different object types and tags map to different columns and data types in the database.

The style file is a plain text file containing four columns separated by spaces. As each OSM object is processed it is checked against conditions specified in the first two columns. If they match, processing options from the third and fourth columns are applied.

Column Description
1. OSM object type Can be node, way or both separated by a comma. way will also apply to relations with type=multipolygon, type=boundary, or type=route; all other relations are ignored.
2. Tag The tag to match on. If the fourth column is linear or polygon, a column for this tag will be created in each of the point, line, polygon, and road tables.
3. PostgreSQL data type Specifies how data will be stored in the tag’s PostgreSQL table column. Possible values are text, int4, or real. If the fourth column is delete or phstore, this column has no meaning and should just be set to text.
4. Flags Zero or more flags separated by commas. For possible values see below.

Possible values for the flags are:

Flag Description
linear Specifies that ways with this tag should be imported as lines by default, even if they are closed. Other conditions can still override this.
polygon Specifies that closed ways with this tag should be imported as polygons by default. This will override any linear flags that would apply to the same object. Closed ways with area=yes and relations with type=multipolygon or type=boundary will be imported as polygons even if no polygon flag is set. Non-closed ways and closed ways with area=no will always be imported as lines.
nocolumn The two flags above automatically create a column for the specified tag. This flag overrides this behaviour such that no column is created. This is especially useful for hstore, where all key value data is stored in the hstore column such that no dedicated columns are needed.
phstore The same as polygon,nocolumn for backward compatibility
delete Prevents the specified tag (but not the entire object) from being stored in the database. Useful for tags that tend to have long values but will not be used for rendering, such as source=*. This flag only affects --slim mode imports.
phstore Behaves like the polygon flag, but is used in hstore mode when you do not want to turn the tag into a separate PostgreSQL column.
nocache This flag is deprecated and does nothing.

The style file may also contain comments. Any text between a # and the end of the line will be ignored.

Special ‘tags’

There are several special values that can be used in the tag column (second column) of the style file for creating additional fields in the database which contain things other than tag values.

Special tag Description
way_area Creates a database column that stores the area (calculated in the units of the projection, normally Mercator meters) for any objects imported as polygons. Use with real as the data type. See also the --reproject-area option.
z_order Adds a column that is used for ordering objects in the render. It mostly applies to objects with highway=* or railway=*. Use with int4 as the data type.
osm_user Adds a column that stores the username of the last user to edit an object in the database (*).
osm_uid Adds a column that stores the user ID number of the last user to edit an object in the database (*).
osm_version Adds a column that stores the version of an object in the database (ie, how many times the object has been modified) (*).
osm_timestamp Adds a column that stores the date and time that the most recent version of an object was added to OpenStreetMap (*).

To use these, you must use the command line option --extra-attributes when importing.

Schemas and Tablespaces

Usually all tables, indexes and functions that the pgsql output creates are in the default public schema and use the default tablespace.

Version >= 1.4.0 You can use the command line option --output-pgsql-schema=SCHEMA to tell osm2pgsql that it should use the specified schema to create the tables, indexes, and functions in. Note that you have to create the schema before running osm2pgsql and make sure the database user was granted the rights to create tables, indexes, and functions in this schema. Read more about schemas in the PostgreSQL documentation.

Sometimes you want to create special PostgreSQL tablespaces and put some tables or indexes into them. Having often used indexes on fast SSD drives, for instance, can speed up processing a lot. There are three command line options that the pgsql output interprets: To set the tablespace for output tables and indexes, use the --tablespace-main-data=TABLESPC and --tablespace-main-index=TABLESPC options, respectively. You can also use the -i, --tablespace-index=TABLESPC option which will set the tablespace for the pgsql output as well as for the middle! Note that it is your job to create the tablespaces before calling osm2pgsql and making sure the disk behind it is large enough.

Coastline Processing

The natural=coastline tag is suppressed by default, even if you import the natural=* key. Many maps get the coastlines from a different source, so it does not need to import them from the input file. You can use the --keep-coastlines parameter to change this behavior if you want coastlines in your database. See the OSM wiki for more information on coastline processing.

Pgsql Output command line options

These are the command line options interpreted by the pgsql output:

Short option Long option Description
-i TABLESPC --tablespace-index=TABLESPC Store all indices in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indices on faster storage like SSDs.
  --tablespace-main-data=TABLESPC Store the data tables (non slim) in the given tablespace.
  --tablespace-main-index=TABLESPC Store the indices of the main tables (non slim) in the given tablespace.
  --latlong Store data in degrees of latitude & longitude.
-m --merc Store data in Spherical Mercator (Web Mercator, EPSG:3857) (the default).
-E SRID --proj=SRID Use projection EPSG:SRID.
-p PREFIX --prefix=PREFIX Prefix for table names (default: planet_osm).
  --tag-transform-script=SCRIPT Specify a lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database.
-x --extra-attributes Include attributes for each object in the database. This includes the username, userid, timestamp and version. Note: this option also requires additional entries in your style file.
-k --hstore Add tags without column to an additional hstore (key/value) column to PostgreSQL tables.
-j --hstore-all Add all tags to an additional hstore (key/value) column in PostgreSQL tables.
-z KEY_PREFIX --hstore-column=KEY_PREFIX Add an additional hstore (key/value) column containing all tags that start with the specified string, eg --hstore-column “name:” will produce an extra hstore column that contains all name:xx tags
  --hstore-match-only Only keep objects that have a value in one of the columns (normal action with --hstore is to keep all objects).
  --hstore-add-index Create indices for the hstore columns during import.
-G --multi-geometry Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM id can therefore have several rows. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables.
-K --keep-coastlines Keep coastline data rather than filtering it out. By default objects tagged natural=coastline will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data.
  --reproject-area Compute area column using spherical mercator coordinates.
  --output-pgsql-schema=SCHEMA Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql and multi outputs (default is no schema, i.e. the public schema is used).

Middle

This chapter is incomplete.

The middle keeps track of all OSM objects read by osm2pgsql and the relationships between those objects. It knows, for instance, which ways are used by which nodes, or which members a relation has. It also keeps track of all node locations. It information is necessary to build way geometries from way nodes and relation geometries from members and it is necessary when updating data, because OSM change files only contain changes objects themselves and not all the related objects needed for creating an objects geometry.

Short option Long option Description
-i NAME --tablespace-index=NAME Store all indices in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indices on faster storage like SSDs.
  --tablespace-slim-data=NAME Store the slim mode tables in the given tablespace.
  --tablespace-slim-index=NAME Store the indices of the slim mode tables in the given tablespace.
-s --slim Store temporary data in the database. Without this mode, all temporary data is stored in RAM and if you do not have enough the import will not work successfully. With slim mode, you should be able to import the data even on a system with limited RAM, although if you do not have enough RAM to cache at least all of the nodes, the time to import the data will likely be greatly increased.
  --drop Drop the slim mode tables from the database once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indices need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing.
-C NUM --cache=NUM Only for slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Each cached node requires 8 bytes of cache, plus about 10% - 30% overhead. As a rule of thumb, give a bit more than the size of the import file in PBF format. If the RAM is not big enough, use about 75% of memory. Make sure to leave enough RAM for PostgreSQL. It needs at least the amount of shared_buffers given in its configuration. Defaults to 800.
  --cache-strategy=STRATEGY There are a number of different modes in which osm2pgsql can organize its node cache in RAM. These are optimized for different assumptions of the data and the hardware resources available. Currently available strategies are dense, chunked, sparse and optimized. dense assumes that the node id numbers are densely packed, i.e. only a few IDs in the range are missing / deleted. For planet extracts this is usually not the case, making the cache very inefficient and wasteful of RAM. sparse assumes node IDs in the data are not densely packed, greatly increasing caching efficiency in these cases. If node IDs are densely packed, like in the full planet, this strategy has a higher overhead for indexing the cache. optimized uses both dense and sparse strategies for different ranges of the ID space. On a block by block basis it tries to determine if it is more effective to store the block of IDs in sparse or dense mode. This is the default and should be typically used.
-x --extra-attributes Include attributes for each object in the database. This includes the username, userid, timestamp and version. Note: this option also requires additional entries in your style file.
  --flat-nodes=FILENAME The flat-nodes mode is a separate method to store slim mode node information on disk. Instead of storing this information in the main PostgreSQL database, this mode creates its own separate custom database to store the information. As this custom database has application level knowledge about the data to store and is not general purpose, it can store the data much more efficiently. Storing the node information for the full planet requires more than 300GB in PostgreSQL, the same data is stored in “only” 50GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for full planet imports as it doesn’t work well with small imports. The default is disabled.
  --middle-way-node-index-id-shift SHIFT Set ID shift for way node bucket index in middle. Experts only. See XXX for details.

Expire

This chapter is incomplete.

When osm2pgsql is processing OSM changes, it can create a list of tiles that will be affected by those changes. This list can later be used to delete any changed tiles you might have cached. Osm2pgsql only creates this list. How to actually expire the tiles it outside the scope of osm2pgsql. Expire only makes sense in append mode.

Short option Long option Description
-e [MIN_ZOOM]MAX_ZOOM --expire-tiles=[MIN_ZOOM-]MAX-ZOOM Create a tile expiry list
-o FILENAME --expire-output=FILENAME Output file name for expired tiles list
  --expire-bbox-size=SIZE Max size for a polygon to expire the whole polygon, not just the boundary

Advanced Topics

These are some advanced options.

Needs more detail here.

Short option Long option Description
-I --disable-parallel-indexing By default osm2pgsql initiates the index building on all tables in parallel to increase performance. This can be a disadvantage on slow disks, or if you don’t have enough RAM for PostgreSQL to perform up to 7 parallel index building processes (e.g. because maintenance_work_mem is set high).
  --number-processes=THREADS Specifies the number of parallel threads used for certain operations. If disks are fast enough e.g. if you have an SSD, then this can greatly increase speed of the “going over pending ways” and “going over pending relations” stages on a multi-core server.

Tips & Tricks

The flex output allows a wide range of configuration options. Here are some extra tips & tricks.

Primary Keys and Unique IDs

It is often desirable to have a unique PRIMARY KEY on database tables. Many programs need this.

There seems to be a natural unique key, the OSM node, way, or relation ID the data came from. But there is a problem with that: Depending on the configuration, osm2pgsql sometimes adds several rows to the output tables with the same OSM ID. This typically happens when long linestrings are split into shorter pieces or multipolygons are split into their constituent polygons, but it can also happen if your Lua configuration file adds two rows to the same table.

If you need unique keys on your database tables there are two options: Using those natural keys and making sure that you don’t have duplicate entries. Or adding an additional ID column. The latter is easier to do and will work in all cases, but it adds some overhead.

Using Natural Keys

To use OSM IDs as primary keys, you have to make sure that

  • You only ever add a single row per OSM object to an output table, i.e. do not call add_row multiple times on the same table for the same OSM object.
  • osm2pgsql doesn’t split long linestrings into smaller ones. So you can not use the split_at option on the geometry transformation.
  • osm2pgsql doesn’t split multipolygons into polygons. So you have to set multi = true on all area geometry transformations.

You probably also want an index on the ID column. If you are running in slim mode, osm2pgsql will create that index for you. But in non-slim mode you have to do this yourself with CREATE UNIQUE INDEX. You can also use ALTER TABLE to make the column an “official” primary key column.

Using an Additional ID Column

PostgreSQL has the somewhat magic “serial” data types. If you use that datatype in a column definition, PostgreSQL will add an integer column to the table and automatically fill it with an autoincrementing value.

In the flex config you can add such a column to your tables with something like this:

...
{ column = 'id', type = 'serial', create_only = true },
...

The create_only tells osm2pgsql that it should create this column but not try to fill it when adding rows (because PostgreSQL does it for us).

You probably also want an index on this column. After the first import of your data using osm2pgsql, use CREATE UNIQUE INDEX to create one. You can also use ALTER TABLE to make the column an “official” primary key column.

Using create_only columns for postprocessed data

Sometimes it is useful to have data in table rows that osm2pgsql can’t create. For instance you might want to store the center of polygons for faster rendering of a label.

To do this define your table as usual and add an additional column, marking it create_only. In our example case the type of the column should be the PostgreSQL type GEOMETRY(Point, 3857), because we don’t want osm2pgsql to do anything special here, just create the column with this type as is.

polygons_table = osm2pgsql.define_area_table('polygons', {
    { column = 'tags', type = 'hstore' },
    { column = 'geom', type = 'geometry' },
    { column = 'center', type = 'GEOMETRY(Point, 3857)', create_only = true },
    { column = 'area', type = 'area' },
})

After running osm2pgsql as usual, run the following SQL command:

UPDATE polygons SET center = ST_Centroid(geom) WHERE center IS NULL;

If you are updating the data using osm2pgsql --append, you have to do this after each update. When osm2pgsql inserts new rows they will always have a NULL value in center, the WHERE condition makes sure that we only do this (possibly expensive) calculation once.

Projections

Osm2pgsql can create geometries in many projections. If you are using the pgsql output, the projection can be chosen with command line options, when using the flex output, the projections are specified in the Lua style file. The default is always “Web Mercator”.

Latlong (WGS84)

EPSG:4326

The original latitude & longitude coordinates from OpenStreetMap in the WGS84 coordinate reference system. This is typically chosen if you want to do some kind of analytics on the data or reproject it later.

Web Mercator

EPSG:3857

This is the projection used most often for tiled web maps. It is the default in osm2pgsql.

Data beyond about 85° North and South will be cut off, because it can not be represented in this projection.

Other Projections

If osm2pgsql was compiled with support for the PROJ library, it supports all projections supported by that library.

Lua Library for Flex Output

The flex output includes a small library of useful Lua helper functions.

Note: These functions are available on the flex output, they cannot be used in the Lua tag transformations of the pgsql output.

Nameclamp
Synopsisosm2pgsql.clamp(VALUE, MIN, MAX)
DescriptionReturn VALUE if it is between MIN and MAX, MIN if it is smaller, or MAX if it is larger. All parameters must be numbers.
Exampleosm2pgsql.clamp(2, 3, 4) ⟶ 3
Namemake_check_values_func
Synopsisosm2pgsql.make_check_values_func(VALUES[, DEFAULT])
DescriptionReturn a function that will check its only argument against the list of VALUES. If it is in that list, it will be returned, otherwise the DEFAULT (or nil) will be returned.
Examplelocal get_highway_value = osm2pgsql.make_check_values_func({ 'motorway', 'trunk', 'primary', 'secondary', 'tertiary' }, 'road') ... if object.tags.highway then local highway_type = get_highway_value(object.tags.highway) ... end
Nameway_member_ids
Synopsisosm2pgsql.way_member_ids(RELATION)
DescriptionReturn an array table with the ids of all way members of RELATION.
Examplefunction osm2pgsql.select_relation_members(relation) if relation.tags.type == 'route' then return { ways = osm2pgsql.way_member_ids(relation) } end end
Namemake_clean_tags_func
Synopsisosm2pgsql.make_clean_tags_func(KEYS)
DescriptionReturn a function that will remove all tags (in place) from its only argument if the key matches KEYS. KEYS is an array containing keys or key prefixes (ending in `*`). The generated function will return `true` if it removed all tags, `false` if there are still tags left.
Examplelocal clean_tags = osm2pgsql.make_clean_tags_func{'source', 'source:*', 'note'} function osm2pgsql.process_node(node) if clean_tags(node.tags) then return end ... end

Getting and Preparing OSM Data

Osm2pgsql imports OSM data into a database. But where does this data come from? This appendix shows how to get the data and, possibly, prepare it for use with osm2pgsql.

Data Formats

There are several common formats for OSM data and osm2pgsql supports all of them (XML, PBF, and O5M). If you have the option, you should prefer the PBF format, because the files are smaller than those in other formats and faster to read.

OSM data files usually have a suffix .osm, .osm.gz, .osm.bz2, or .osm.pbf, sometimes just .pbf. They are used in osm2pgsql “create” mode.

OSM change files usually have the suffix .osc.gz. They are used in osm2pgsql “append” mode.

Getting the data

The Planet file

The OpenStreetMap project publishes the so-called “planet file”, which contains a dump of the current full OSM database. This dump is available in XML and PBF format.

Downloads are available on planet.osm.org and several mirrors.

If you are new to osm2pgsql we recommend you start experimenting with a small extract, not the planet file! The planet file is huge (tens of GBytes) and it will take many hours to import.

Geographical Extracts

Importing data into the database takes time and uses a lot of disk space. If you only need a portion of the OSM data it often makes sense to only import an extract of the data.

Geographical extracts for countries, states, cities, etc. are available from several sources.

The extracts from Geofabrik are very popular. They are updated daily and also offer daily change files suitable for updating an osm2pgsql database.

If you can’t find a suitable extract, see below for creating your own.

Updating an existing database

There are two steps when updating an existing database.

  1. Download a change file (“diff”) with recent changes of the OSM data
  2. Load changes into the database

The second step is done with osm2pgsql in “append” mode, for the first step there are several options. We recommend the pyosmium_get_changes.py tool from the PyOsmium project. With it, downloading all changes since you ran the program the last time is just a single command.

OSM change files, sometimes called replication diffs, are available from planet.osm.org. Depending on how often you want to update your database, you can get minutely, hourly, or daily change files.

Some services offering OSM data extracts for download also offer change files suitable for updating those extracts.

When you have changes for many months or years it might make more sense to drop your database completely and re-import from a more current OSM data file instead of updating the database from change files.

If you have imported an extract into an osm2pgsql database but there are no change files for the area of the extract, you can use still use the extracts from planet.osm.org to update your database. But, because those extracts contain data for the whole planet, your database will keep accumulating more and more data outside the area of your extract that you don’t really need.

Preparing OSM data for use by osm2pgsql

Before some OSM data file can be given to osm2pgsql it is sometimes necessary to prepare it in some way. This chapter explains some options.

For most of these steps, the recommended application is the osmium command line tool, which offers a wide range of functionality of slicing and dicing OSM data.

One handy command is osmium fileinfo which can tell you a lot about an OSM file, including how many objects it contains, whether it is sorted, etc.

Creating geographical extracts

You can create your own extracts from the planet file or from existing extracts with the osmium extract command. It can create extracts of OSM data from a bounding box or from a boundary.

Merging OSM data files

If you are working with extracts you sometimes have several extracts, lets say for different countries, that you want to load into the same database. The correct approach in this case is to merge those extracts first and then import the resulting file with osm2pgsql.

You can use the osmium merge command for this.

Merging OSM change files

To speed up processing when you have many OSM change files, you can merge several change files into larger files and then process the larger files with osm2pgsql. The osmium merge-changes command will do this for you. Make sure to use the option -s, --simplify.

Usually you should not merge change files for more than a day or so when doing this, otherwise the amount of changes osm2pgsql has to process in one go becomes too large.

OSM data with negative ids

OSM data usually uses positive numbers for the ids of nodes, ways, and relations. Negative numbers are sometimes used for inofficial OSM data, for instance when non-OSM data is mixed with OSM data to make sure the are no id clashes.

Osm2pgsql can only handle positive ids. It uses negative ids internally (for multipolygon geometries in the database).

If you have negative ids in your input file, you can renumber it first. You can use the osmium renumber command for this.

Older versions of osm2pgsql will sometimes work or appear to work with negative ids, but it is not recommended to rely on this, because the processed data might be corrupted. Versions from 1.3.0 warn when you are using negative ids. From version 1.4.0 on, only positive ids are allowed.

Handling unsorted OSM data

OSM data files are almost always sorted, first nodes in order of their ids, then ways in order of their ids, then relations in order of their ids. The planet files, change files, and usual extracts all follow this convention.

Osm2pgsql can only read OSM files ordered in this way. This allows some optimizations in the code which speed up the normal processing.

If you have an unsorted input file, you should sort it first. You can use the osmium sort command for this.

Older versions of osm2pgsql will sometimes work or appear to work with unsorted data, but it is not recommended to rely on this, because the processed data might be corrupted. Versions from 1.3.0 warn when you are using unsorted data. From version 1.4.0 on, only sorted OSM files are allowed.

Working with OSM history data

OpenStreetMap offers complete dumps of all OSM data which include not only the current version of all objects like the normal planet dump, but also all earlier version of OSM objects including deleted ones.

Like most other OSM software, osm2pgsql can not handle this data.

For some use cases there is a workaround: Create extracts from the full history dump for specific points in time and feed those to osm2pgsql. You can use the osmium time-filter command to create such extracts.

Command Line Options

This appendix contains an overview of all command line options.

Main Options

Short option Long option Description
-a --append Run in append mode.
-c --create Run in create mode (this is the default if -a, --append is not used.

Help/Version Options

Short option Long option Description
-h --help Print help. Add -v, --verbose for more verbose help
-V --version Print osm2pgsql version

Database Options

Short option Long option Description
-d DB --database=DB Database name
-U USERNAME --user=USERNAME Database user
-W --password Force password prompt
-H HOST --host=HOST Database server hostname or socket location
-P PORT --port=PORT Database server port

Input Options

Short option Long option Description
-r FORMAT --input-reader=FORMAT Select format of the input file. Available choices are auto (default) for autodetecting the format, xml for OSM XML format files, o5m for o5m formatted files and pbf for OSM PBF binary format.
-b BBOX --bbox=BBOX Apply a bounding box filter in format MINLON,MINLAT,MAXLON,MAXLAT on the imported data. Example: --bbox -0.5,51.25,0.5,51.75

Middle Options

Short option Long option Description
-i NAME --tablespace-index=NAME Store all indices in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indices on faster storage like SSDs.
  --tablespace-slim-data=NAME Store the slim mode tables in the given tablespace.
  --tablespace-slim-index=NAME Store the indices of the slim mode tables in the given tablespace.
-s --slim Store temporary data in the database. Without this mode, all temporary data is stored in RAM and if you do not have enough the import will not work successfully. With slim mode, you should be able to import the data even on a system with limited RAM, although if you do not have enough RAM to cache at least all of the nodes, the time to import the data will likely be greatly increased.
  --drop Drop the slim mode tables from the database once the import is complete. This can greatly reduce the size of the database, as the slim mode tables typically are the same size, if not slightly bigger than the main tables. It does not, however, reduce the maximum spike of disk usage during import. It can furthermore increase the import speed, as no indices need to be created for the slim mode tables, which (depending on hardware) can nearly halve import time. Slim mode tables however have to be persistent if you want to be able to update your database, as these tables are needed for diff processing.
-C NUM --cache=NUM Only for slim mode: Use up to NUM MB of RAM for caching nodes. Giving osm2pgsql sufficient cache to store all imported nodes typically greatly increases the speed of the import. Each cached node requires 8 bytes of cache, plus about 10% - 30% overhead. As a rule of thumb, give a bit more than the size of the import file in PBF format. If the RAM is not big enough, use about 75% of memory. Make sure to leave enough RAM for PostgreSQL. It needs at least the amount of shared_buffers given in its configuration. Defaults to 800.
  --cache-strategy=STRATEGY There are a number of different modes in which osm2pgsql can organize its node cache in RAM. These are optimized for different assumptions of the data and the hardware resources available. Currently available strategies are dense, chunked, sparse and optimized. dense assumes that the node id numbers are densely packed, i.e. only a few IDs in the range are missing / deleted. For planet extracts this is usually not the case, making the cache very inefficient and wasteful of RAM. sparse assumes node IDs in the data are not densely packed, greatly increasing caching efficiency in these cases. If node IDs are densely packed, like in the full planet, this strategy has a higher overhead for indexing the cache. optimized uses both dense and sparse strategies for different ranges of the ID space. On a block by block basis it tries to determine if it is more effective to store the block of IDs in sparse or dense mode. This is the default and should be typically used.
-x --extra-attributes Include attributes for each object in the database. This includes the username, userid, timestamp and version. Note: this option also requires additional entries in your style file.
  --flat-nodes=FILENAME The flat-nodes mode is a separate method to store slim mode node information on disk. Instead of storing this information in the main PostgreSQL database, this mode creates its own separate custom database to store the information. As this custom database has application level knowledge about the data to store and is not general purpose, it can store the data much more efficiently. Storing the node information for the full planet requires more than 300GB in PostgreSQL, the same data is stored in “only” 50GB using the flat-nodes mode. This can also increase the speed of applying diff files. This option activates the flat-nodes mode and specifies the location of the database file. It is a single large file. This mode is only recommended for full planet imports as it doesn’t work well with small imports. The default is disabled.
  --middle-way-node-index-id-shift SHIFT Set ID shift for way node bucket index in middle. Experts only. See XXX for details.

Output Options

Short option Long option Description
-O OUTPUT --output=OUTPUT Select the output. Available outputs are: flex, pgsql (default), gazetteer, multi, and null
-S STYLE --style=STYLE The style file. This specifies how the data is imported into the database, its format depends on the output. (For the pgsql output, the default is /usr/share/osm2pgsql/default.style, for other outputs there is no default.)

Pgsql Output Options

Short option Long option Description
-i TABLESPC --tablespace-index=TABLESPC Store all indices in a separate PostgreSQL tablespace named by this parameter. This allows one to e.g. store the indices on faster storage like SSDs.
  --tablespace-main-data=TABLESPC Store the data tables (non slim) in the given tablespace.
  --tablespace-main-index=TABLESPC Store the indices of the main tables (non slim) in the given tablespace.
  --latlong Store data in degrees of latitude & longitude.
-m --merc Store data in Spherical Mercator (Web Mercator, EPSG:3857) (the default).
-E SRID --proj=SRID Use projection EPSG:SRID.
-p PREFIX --prefix=PREFIX Prefix for table names (default: planet_osm).
  --tag-transform-script=SCRIPT Specify a lua script to handle tag filtering and normalisation. The script contains callback functions for nodes, ways and relations, which each take a set of tags and returns a transformed, filtered set of tags which are then written to the database.
-x --extra-attributes Include attributes for each object in the database. This includes the username, userid, timestamp and version. Note: this option also requires additional entries in your style file.
-k --hstore Add tags without column to an additional hstore (key/value) column to PostgreSQL tables.
-j --hstore-all Add all tags to an additional hstore (key/value) column in PostgreSQL tables.
-z KEY_PREFIX --hstore-column=KEY_PREFIX Add an additional hstore (key/value) column containing all tags that start with the specified string, eg --hstore-column “name:” will produce an extra hstore column that contains all name:xx tags
  --hstore-match-only Only keep objects that have a value in one of the columns (normal action with --hstore is to keep all objects).
  --hstore-add-index Create indices for the hstore columns during import.
-G --multi-geometry Normally osm2pgsql splits multi-part geometries into separate database rows per part. A single OSM id can therefore have several rows. With this option, osm2pgsql instead generates multi-geometry features in the PostgreSQL tables.
-K --keep-coastlines Keep coastline data rather than filtering it out. By default objects tagged natural=coastline will be discarded based on the assumption that Shapefiles generated by OSMCoastline (https://osmdata.openstreetmap.de/) will be used for the coastline data.
  --reproject-area Compute area column using spherical mercator coordinates.
  --output-pgsql-schema=SCHEMA Use PostgreSQL schema SCHEMA for all tables, indexes, and functions in the pgsql and multi outputs (default is no schema, i.e. the public schema is used).

Expire Options

Short option Long option Description
-e [MIN_ZOOM]MAX_ZOOM --expire-tiles=[MIN_ZOOM-]MAX-ZOOM Create a tile expiry list
-o FILENAME --expire-output=FILENAME Output file name for expired tiles list
  --expire-bbox-size=SIZE Max size for a polygon to expire the whole polygon, not just the boundary

Advanced Options

Short option Long option Description
-I --disable-parallel-indexing By default osm2pgsql initiates the index building on all tables in parallel to increase performance. This can be a disadvantage on slow disks, or if you don’t have enough RAM for PostgreSQL to perform up to 7 parallel index building processes (e.g. because maintenance_work_mem is set high).
  --number-processes=THREADS Specifies the number of parallel threads used for certain operations. If disks are fast enough e.g. if you have an SSD, then this can greatly increase speed of the “going over pending ways” and “going over pending relations” stages on a multi-core server.