# Config file reference

This page describes the configuration options for Polars on-premises. The config file is a
standard TOML file divided into sections. Any option can be overridden using an environment
variable in the following format: `PC_CUBLET__section_name__key`.

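As an illustration of the naming pattern, the fragment below pairs two keys with the environment
variable names the pattern would produce. The values are placeholders, and this page does not
specify how letter case or nested keys (for example `client_service.bind_addr`) are handled, so
treat the commented names as assumptions rather than confirmed syntax.

```toml
# Placeholder values, for illustration only.
[scheduler]
# Assumed override name, following PC_CUBLET__section_name__key:
#   PC_CUBLET__scheduler__n_workers=8
n_workers = 4

[worker]
# Assumed override name:
#   PC_CUBLET__worker__heartbeat_period="5 secs"
heartbeat_period = "5 secs"
```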
Example configuration files can be found at
[Example Configurations](/polars-on-premises/bare-metal/configuration/example-configurations).

See the sidebar for detailed documentation on the main components and how to configure them
together.

### Top-level configuration

| Key            | Type    | Description                                                                                                                                              |
| -------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `cluster_id`   | string  | Logical ID for the cluster; workers and scheduler that share this ID will form a single cluster.<br>e.g. `prod-eu-1`; must be unique among all clusters. |
| `instance_id`  | string  | Unique ID for this node within the cluster, used for addressing and leader selection.<br>e.g. `scheduler`, `worker_0`; must be unique per cluster.       |
| `license`      | path    | Absolute path to the Polars on-premises license file required to start the process.<br>e.g. `/etc/polars/license.json`.                                  |
| `memory_limit` | integer | Hard memory budget in bytes for all components on this node; enforced via cgroups when delegated.<br>e.g. `1073741824` (1 GiB), `10737418240` (10 GiB).  |

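A minimal sketch of the top-level keys, using the example values from the table above. In TOML,
top-level keys have to appear before the first `[section]` header; the specific values here are
placeholders.

```toml
# Top-level keys: these must come before any [section] header.
cluster_id = "prod-eu-1"
instance_id = "worker_0"
license = "/etc/polars/license.json"
memory_limit = 10737418240 # 10 GiB, expressed in bytes
```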
### `[scheduler]` section

| Key                                                  | Type    | Description                                                                                                                                                                                                                                                                                                                                         |
| ---------------------------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`                                            | boolean | Whether the scheduler component runs in this process.<br> `true` for the leader node, `false` on pure workers.                                                                                                                                                                                                                                      |
| `allow_local_sinks`                                  | boolean | Whether workers are allowed to write to a shared/local disk visible to the scheduler.<br> `false` for fully remote/storage-only setups, `true` if you have a shared filesystem.                                                                                                                                                                     |
| `n_workers`                                          | integer | Expected number of workers in this cluster; the scheduler waits until that many workers are online before running queries.<br>e.g. `4`. |
| `anonymous_result_location`                          | object  | Destination for results of queries that do not have an explicit sink. Two backends are currently supported: a locally mounted path (must be reachable on the exact same path, and requires `allow_local_sinks` to be enabled) and S3. Either option must be network reachable by the scheduler, workers, and client.<br>e.g. `/mnt/storage/polars/results`.<br>e.g. `s3://bucket/path/to/key`. |
| `anonymous_result_location.local`                    | object  | Object used for local disk-backed anonymous results.                                                                                                                                                                                                                                                                                                |
| `anonymous_result_location.local.path`               | path    | Local path where anonymous results are stored.<br>e.g. `/mnt/storage/polars/results`.                                                                                                                                                                                                                                                               |
| `anonymous_result_location.s3`                       | object  | Object used for S3-backed anonymous results.                                                                                                                                                                                                                                                                                                        |
| `anonymous_result_location.s3.url`                   | string  | S3 bucket url.<br>e.g. `s3://bucket/path/to/key`.                                                                                                                                                                                                                                                                                                   |
| `anonymous_result_location.s3.aws_endpoint_url`      | string  | Storage option configuration, see [`scan_parquet()`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_parquet.html).                                                                                                                                                                                                                |
| `anonymous_result_location.s3.aws_region`            | string  | Storage option configuration.<br/>e.g. `eu-east-1`                                                                                                                                                                                                                                                                                                  |
| `anonymous_result_location.s3.aws_access_key_id`     | string  | Storage option configuration.                                                                                                                                                                                                                                                                                                                       |
| `anonymous_result_location.s3.aws_secret_access_key` | string  | Storage option configuration.                                                                                                                                                                                                                                                                                                                       |
| `client_service`                                     | object  | Object used for configuring the bind address of the client service. This is the service used by the polars-cloud Python client. Defaults to `0.0.0.0:5051`.                                                                                                                                                                                         |
| `client_service.bind_addr`                           | string  | Bind address for the client service.<br>e.g. `0.0.0.0:5051`.                                                                                                                                                                                                                                                                                        |
| `client_service.bind_addr.ip`                        | string  | IP address for the client service bind address.<br>e.g. `192.168.1.1`.                                                                                                                                                                                                                                                                              |
| `client_service.bind_addr.port`                      | integer | Port for the client service bind address.<br>e.g. `5051`.                                                                                                                                                                                                                                                                                           |
| `client_service.bind_addr.hostname`                  | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-1`.                                                                                                                                                                                                                                                                                 |
| `worker_service`                                     | object  | Object used for configuring the bind address of the worker service. This is an internal service used by the workers. Defaults to `0.0.0.0:5050`.                                                                                                                                                                                                    |
| `worker_service.bind_addr`                           | string  | Bind address for the worker service.<br>e.g. `0.0.0.0:5050`.                                                                                                                                                                                                                                                                                        |
| `worker_service.bind_addr.ip`                        | string  | IP address for the worker service bind address.<br>e.g. `192.168.1.1`.                                                                                                                                                                                                                                                                              |
| `worker_service.bind_addr.port`                      | integer | Port for the worker service bind address.<br>e.g. `5050`.                                                                                                                                                                                                                                                                                           |
| `worker_service.bind_addr.hostname`                  | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                                                                                                                                                                                                                                                 |

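The keys above could be combined as follows on a dedicated scheduler node. This is a sketch, not a
verified configuration: it assumes the string form of `bind_addr` shown in the table is accepted,
uses TOML dotted keys for nested options, and leaves out credentials. The bucket URL and region are
placeholders.

```toml
[scheduler]
enabled = true
allow_local_sinks = false # results go to S3 in this sketch, not to a shared disk
n_workers = 4
client_service.bind_addr = "0.0.0.0:5051"
worker_service.bind_addr = "0.0.0.0:5050"

# Anonymous results stored in S3 (placeholder bucket and region).
[scheduler.anonymous_result_location.s3]
url = "s3://bucket/path/to/key"
aws_region = "eu-west-1"
```

If a shared filesystem is available instead, the S3 table would be replaced by
`anonymous_result_location.local.path` with `allow_local_sinks = true`.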
### `[worker]` section

| Key                                         | Type    | Description                                                                                                                                                                                                                                              |
| ------------------------------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`                                   | boolean | Whether the worker component runs in this process.<br> `true` on worker nodes, `false` on the dedicated scheduler.                                                                                                                                       |
| `heartbeat_period`                          | string  | Interval for worker heartbeats towards the scheduler, used for liveness and load reporting. Accepts either an ISO 8601 duration or a [jiff "friendly" duration](https://docs.rs/jiff/0.2.18/jiff/fmt/friendly/).<br>e.g. `5 secs`.<br>e.g. `PT5S`. |
| `shuffle_location`                          | object  | Object used for shuffle data storage.                                                                                                                                                                                                                    |
| `shuffle_location.local`                    | object  | Object used for local disk-backed shuffle data storage.                                                                                                                                                                                                  |
| `shuffle_location.local.path`               | path    | Local path where shuffle/intermediate data is stored; fast local SSD is recommended.<br>e.g. `/mnt/storage/polars/shuffle`.                                                                                                                              |
| `shuffle_location.shared_filesystem`        | object  | Object used for shared filesystem-backed shuffle data storage.                                                                                                                                                                                           |
| `shuffle_location.shared_filesystem.path`   | path    | Shared filesystem path where shuffle/intermediate data is stored. Must be accessible by all workers on the same path.<br>e.g. `/mnt/storage/polars/shuffle`.                                                                                             |
| `shuffle_location.s3`                       | object  | Object used for S3-backed shuffle data storage.                                                                                                                                                                                                          |
| `shuffle_location.s3.url`                   | path    | Destination for shuffle/intermediate data.<br>e.g. `s3://bucket/path/to/key`.                                                                                                                                                                            |
| `shuffle_location.s3.aws_endpoint_url`      | string  | Storage option configuration, see [`scan_parquet()`](https://docs.pola.rs/api/python/stable/reference/api/polars.scan_parquet.html).                                                                                                                     |
| `shuffle_location.s3.aws_region`            | string  | Storage option configuration.<br/>e.g. `eu-east-1`                                                                                                                                                                                                       |
| `shuffle_location.s3.aws_access_key_id`     | string  | Storage option configuration.                                                                                                                                                                                                                            |
| `shuffle_location.s3.aws_secret_access_key` | string  | Storage option configuration.                                                                                                                                                                                                                            |
| `task_service`                              | object  | Object used for configuring the bind address of the task service. This is an internal service in the worker for receiving tasks from the scheduler. Defaults to `0.0.0.0:5052`.                                                                          |
| `task_service.bind_addr`                    | string  | Bind address for the task service.<br>e.g. `0.0.0.0:5052`.                                                                                                                                                                                               |
| `task_service.bind_addr.ip`                 | string  | IP address for the task service bind address.<br>e.g. `192.168.1.1`.                                                                                                                                                                                     |
| `task_service.bind_addr.port`               | integer | Port for the task service bind address.<br>e.g. `5052`.                                                                                                                                                                                                  |
| `task_service.bind_addr.hostname`           | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                                                                                                                                                      |
| `task_service.public_addr`                  | string  | Address at which this service is reachable by the scheduler. Defaults to the bind address if not set. This field is required when the bind address is `0.0.0.0`.<br>e.g. `192.168.1.1`.                                                                  |
| `task_service.public_addr.ip`               | string  | IP address for the task service public address.<br>e.g. `192.168.1.2`.                                                                                                                                                                                   |
| `task_service.public_addr.port`             | integer | Port for the task service public address.<br>e.g. `5052`.                                                                                                                                                                                                |
| `task_service.public_addr.hostname`         | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                                                                                                                                                      |
| `shuffle_service`                           | object  | Object used for configuring the bind address of the shuffle service. This is an internal service in the worker for exchanging shuffle data between workers. Defaults to `0.0.0.0:5053`. |
| `shuffle_service.bind_addr`                 | string  | Bind address for the shuffle service.<br>e.g. `0.0.0.0:5053`. |
| `shuffle_service.bind_addr.ip`              | string  | IP address for the shuffle service bind address.<br>e.g. `192.168.1.1`. |
| `shuffle_service.bind_addr.port`            | integer | Port for the shuffle service bind address.<br>e.g. `5053`. |
| `shuffle_service.bind_addr.hostname`        | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`. |
| `shuffle_service.public_addr`               | string  | Address at which this service is reachable by other nodes. Defaults to the bind address if not set. This field is required when the bind address is `0.0.0.0`.<br>e.g. `192.168.1.1`. |
| `shuffle_service.public_addr.ip`            | string  | IP address for the shuffle service public address.<br>e.g. `192.168.1.2`. |
| `shuffle_service.public_addr.port`          | integer | Port for the shuffle service public address.<br>e.g. `5053`. |
| `shuffle_service.public_addr.hostname`      | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`. |

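A worker node might combine these keys as sketched below. The addresses and paths are placeholders
taken from the table's examples, and the string form of the address fields is assumed to be
accepted. Because both services bind to `0.0.0.0` here, each one also sets `public_addr`, which the
table marks as required in that case.

```toml
[worker]
enabled = true
heartbeat_period = "5 secs"
shuffle_location.local.path = "/mnt/storage/polars/shuffle"

# Binding to 0.0.0.0 requires an explicit public address.
task_service.bind_addr = "0.0.0.0:5052"
task_service.public_addr = "192.168.1.2"
shuffle_service.bind_addr = "0.0.0.0:5053"
shuffle_service.public_addr = "192.168.1.2"
```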
### `[observatory]` section

| Key                                   | Type    | Description                                                                                                                                                                                                                                                                                                                               |
| ------------------------------------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`                             | boolean | Enable sending/receiving profiling data so clients can call `result.await_profile()`.<br> `true` on both scheduler and workers if you want profiles on queries; `false` to disable.                                                                                                                                                       |
| `max_metrics_bytes_total`             | integer | Maximum number of bytes that the host metrics of all workers may consume in total. If a system-wide memory limit is specified, this amount is added to the share that the scheduler takes. About 50 bytes of metrics are stored per worker per second. |
| `database_path`                       | string  | Location to use for storing profiling data. An SQLite database file will be created here, or if a file already exists it will be opened. If this points to a directory, a file in that directory will be created. Polars on-premises will automatically add the `cluster_id` to this file name to ensure uniqueness within the directory. |
| `service`                             | object  | Object used for configuring the bind address of the observatory service. This is an internal service in the scheduler for receiving profiling data from all nodes. Defaults to `0.0.0.0:5049`.                                                                                                                                            |
| `service.bind_addr`                   | string  | Bind address for the observatory service.<br>e.g. `0.0.0.0:5049`.                                                                                                                                                                                                                                                                         |
| `service.bind_addr.ip`                | string  | IP address for the observatory service bind address.<br>e.g. `192.168.1.1`.                                                                                                                                                                                                                                                               |
| `service.bind_addr.port`              | integer | Port for the observatory service bind address.<br>e.g. `5049`.                                                                                                                                                                                                                                                                            |
| `service.bind_addr.hostname`          | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                                                                                                                                                                                                                                       |
| `rest_api.enabled`                    | boolean | Whether to expose the observatory REST API; enabled by default. This is a public service for accessing the profiling data and host metrics data through a web interface. |
| `rest_api.service`                    | object  | Object used for configuring the bind address of the observatory REST API service. Defaults to `0.0.0.0:3001`.                                                                                                                                                                                                                             |
| `rest_api.service.bind_addr`          | string  | Bind address for the observatory REST API service.<br>e.g. `0.0.0.0:3001`.                                                                                                                                                                                                                                                                |
| `rest_api.service.bind_addr.ip`       | string  | IP address for the observatory REST API service bind address.<br>e.g. `192.168.1.1`.                                                                                                                                                                                                                                                      |
| `rest_api.service.bind_addr.port`     | integer | Port for the observatory REST API service bind address.<br>e.g. `3001`.                                                                                                                                                                                                                                                                   |
| `rest_api.service.bind_addr.hostname` | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                                                                                                                                                                                                                                       |

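The sketch below shows one way the `[observatory]` keys could fit together on the scheduler node.
The `database_path` value is a hypothetical directory, and the metrics budget is sized from the
estimate above: at about 50 bytes per worker per second, four workers produce roughly 200 bytes per
second, so 720000 bytes covers on the order of one hour.

```toml
[observatory]
enabled = true
# 4 workers * ~50 bytes/s * 3600 s = 720000 bytes (about one hour of host metrics).
max_metrics_bytes_total = 720000
# Hypothetical directory; a database file is created inside it.
database_path = "/var/lib/polars/observatory"
service.bind_addr = "0.0.0.0:5049"
rest_api.enabled = true
rest_api.service.bind_addr = "0.0.0.0:3001"
```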
### `[monitoring]` section

| Key                    | Type    | Description                                                                                                                                              |
| ---------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enabled`              | boolean | Enable sending/receiving monitoring data to the observatory service. If enabled, the address specified in `observatory_service.public_addr` of the `[static_leader]` section is used. |
| `host_metrics`         | object  | Object used for configuring the host metrics exporter.                                                                                                   |
| `host_metrics.enabled` | boolean | Enable/disable exporting host metrics from this node.                                                                                                    |

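As a brief sketch, monitoring with host metrics could be switched on like this. The observatory
address is a placeholder, included because monitoring data is sent to the address configured under
`[static_leader]`.

```toml
[monitoring]
enabled = true
host_metrics.enabled = true

[static_leader]
# Placeholder address of the node running the observatory service.
observatory_service.public_addr = "192.168.1.1"
```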
### `[static_leader]` section

| Key                                        | Type    | Description                                                                                                               |
| ------------------------------------------ | ------- | ------------------------------------------------------------------------------------------------------------------------- |
| `leader_instance_id`                       | string  | ID of the leader node; should match the scheduler’s `instance_id`.<br>Typically `scheduler` to match your scheduler node. |
| `scheduler_service.public_addr`            | string  | Address at which the scheduler client service is reachable from this node.<br>e.g. `192.168.1.1`.                         |
| `scheduler_service.public_addr.ip`         | string  | IP address for the scheduler client service public address.<br>e.g. `192.168.1.1`.                                        |
| `scheduler_service.public_addr.port`       | integer | Port for the scheduler client service public address.<br>e.g. `5051`.                                                     |
| `scheduler_service.public_addr.hostname`   | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                       |
| `observatory_service.public_addr`          | string  | Address at which the observatory service is reachable from this node.<br>e.g. `192.168.1.1`.                              |
| `observatory_service.public_addr.ip`       | string  | IP address for the observatory service public address.<br>e.g. `192.168.1.1`.                                             |
| `observatory_service.public_addr.port`     | integer | Port for the observatory service public address.<br>e.g. `5049`.                                                          |
| `observatory_service.public_addr.hostname` | string  | Alternative to `ip`, resolved once at startup.<br>e.g. `my-host-2`.                                                       |

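A minimal sketch of a `[static_leader]` section as it might appear on every node of a cluster whose
scheduler has `instance_id = "scheduler"`. The IP address is a placeholder, and the string form of
`public_addr` is assumed to be accepted.

```toml
[static_leader]
leader_instance_id = "scheduler" # should match the scheduler's instance_id
scheduler_service.public_addr = "192.168.1.1"
observatory_service.public_addr = "192.168.1.1"
```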