Availability telemetry
Availability telemetry provides information about standby and active nodes in your Vault instance. Enterprise installations also include replication metrics.
Default metrics
vault.ha.rpc.client.echo
Metric type | Value | Description |
---|---|---|
summary | ms | Time taken to send an echo request from a standby to the active node (also emitted by performance standbys) |
vault.ha.rpc.client.echo.errors
Metric type | Value | Description |
---|---|---|
counter | number | Number of standby echo request failures (also emitted by performance standbys) |
vault.ha.rpc.client.forward
Metric type | Value | Description |
---|---|---|
summary | ms | Time taken to forward a request from a standby to the active node |
vault.ha.rpc.client.forward.errors
Metric type | Value | Description |
---|---|---|
counter | number | Number of standby request forwarding failures |
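As a rough illustration of how the error counters and timing summaries above can be combined, the sketch below computes a failure ratio from a telemetry snapshot. The snapshot shape (`Counters` and `Samples` lists with `Name` and `Count` fields), the helper names, and all values are assumptions made for this example. It also assumes that every attempt, successful or not, contributes one timing sample. Check the output of your own telemetry sink before relying on any of this.

```python
# Hypothetical snapshot: field names and values are assumptions for illustration.
snapshot = {
    "Counters": [
        {"Name": "vault.ha.rpc.client.forward.errors", "Count": 3},
        {"Name": "vault.ha.rpc.client.echo.errors", "Count": 0},
    ],
    "Samples": [
        {"Name": "vault.ha.rpc.client.forward", "Count": 120, "Mean": 4.2},
        {"Name": "vault.ha.rpc.client.echo", "Count": 60, "Mean": 1.1},
    ],
}


def count_for(entries, name):
    """Return the Count recorded for the named metric, or 0 if it is absent."""
    for entry in entries:
        if entry["Name"] == name:
            return entry["Count"]
    return 0


def failure_ratio(snap, metric):
    """Ratio of failures (<metric>.errors) to timed attempts (<metric>)."""
    attempts = count_for(snap["Samples"], metric)
    errors = count_for(snap["Counters"], metric + ".errors")
    # Avoid dividing by zero when no attempts were recorded in the interval.
    return errors / attempts if attempts else 0.0


forward_ratio = failure_ratio(snapshot, "vault.ha.rpc.client.forward")
echo_ratio = failure_ratio(snapshot, "vault.ha.rpc.client.echo")
```

A sustained non-zero ratio for either metric would suggest that standbys are having trouble reaching the active node.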
Merkle tree metrics
vault.merkle.flushDirty
Metric type | Value | Description |
---|---|---|
summary | ms | The average time required to flush dirty pages to storage |
vault.merkle.flushDirty.num_pages
Metric type | Value | Description |
---|---|---|
gauge | pages | Number of pages flushed |
vault.merkle.flushDirty.outstanding_pages
Metric type | Value | Description |
---|---|---|
gauge | pages | Number of dirty pages waiting to be flushed |
vault.merkle.saveCheckpoint
Metric type | Value | Description |
---|---|---|
summary | ms | The average time required to save a checkpoint |
vault.merkle.saveCheckpoint.num_dirty
Metric type | Value | Description |
---|---|---|
gauge | pages | Number of dirty pages at checkpoint |
Write-ahead log (WAL) telemetry
vault.wal.deleteWALs
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to fully delete a write-ahead log |
vault.wal.flushReady
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to fully flush a write-ahead log that is ready for storage |
vault.wal.flushReady.queue_len
Metric type | Value | Description |
---|---|---|
summary | number | Current size of the write queue in the WAL system |
vault.wal.gc.deleted
Metric type | Value | Description |
---|---|---|
gauge | number | Number of write-ahead logs deleted during garbage collection |
vault.wal.gc.total
Metric type | Value | Description |
---|---|---|
gauge | number | Total number of write-ahead logs currently on disk |
vault.wal.loadWAL
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to load a write-ahead log |
vault.wal.persistWALs
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to persist a write-ahead log |
vault.wal.write_controller.d
Metric type | Value | Description |
---|---|---|
gauge | number | Current derivative value computed by the write controller. |
The `vault.wal.write_controller.d` metric has limited production use, but Vault developers may find `vault.wal.write_controller.d` useful for tuning or debugging controller behavior.
vault.wal.write_controller.i
Metric type | Value | Description |
---|---|---|
gauge | number | Current integral value computed by the write controller. |
The `vault.wal.write_controller.i` metric has limited production use, but Vault developers may find `vault.wal.write_controller.i` useful for tuning or debugging controller behavior.
vault.wal.write_controller.p
Metric type | Value | Description |
---|---|---|
gauge | number | Current proportional error value detected by the write controller. |
The `vault.wal.write_controller.p` metric has limited production use, but Vault developers may find `vault.wal.write_controller.p` useful for tuning or debugging controller behavior.
vault.wal.write_controller.reject_fraction
Metric type | Value | Description |
---|---|---|
gauge | number | The estimated fraction of write requests that must be rejected to maintain cluster stability. |
The write controller reject fraction is an estimate between 0 and 1.
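The `p`, `i`, and `d` metric names suggest a PID-style controller whose output is bounded to the 0 to 1 range. The sketch below is a textbook PID loop with a clamped output, included only to show how three such terms could combine into a fraction. It is not Vault's implementation. The class name, gains, error signal, and time step are all hypothetical.

```python
def clamp(value, low=0.0, high=1.0):
    """Bound value to the closed interval [low, high]."""
    return max(low, min(high, value))


class PIDSketch:
    """Textbook PID controller for illustration; NOT Vault's write controller."""

    def __init__(self, kp, ki, kd):
        # Hypothetical gains; Vault's actual tuning is not described here.
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt=1.0):
        """Return (p, i, d, fraction) for one control step."""
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample to difference against
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        p = self.kp * error
        i = self.ki * self.integral
        d = self.kd * derivative
        # Clamp the combined output so it can be read as a fraction.
        return p, i, d, clamp(p + i + d)


controller = PIDSketch(kp=0.5, ki=0.1, kd=0.05)
p, i, d, fraction = controller.update(error=0.4)
```

Under this reading, a fraction of 0 means no writes need to be rejected and a fraction near 1 means almost all of them do.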
Log shipping metrics
vault.logshipper.buffer.length
Metric type | Value | Description |
---|---|---|
gauge | buffer entries | Current length of the log shipper buffer |
vault.logshipper.buffer.max_length
Metric type | Value | Description |
---|---|---|
gauge | buffer entries | Maximum length of the log shipper buffer seen to date |
vault.logshipper.buffer.max_size
Metric type | Value | Description |
---|---|---|
gauge | bytes | Maximum allowable size of the log shipper buffer |
vault.logshipper.buffer.size
Metric type | Value | Description |
---|---|---|
gauge | bytes | Current size of the log shipper buffer |
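The two byte-based gauges can be combined into a utilization figure. The following is a minimal sketch, assuming you have already collected `vault.logshipper.buffer.size` and `vault.logshipper.buffer.max_size` from your telemetry sink. The function names, sample values, and 0.8 alert threshold are hypothetical.

```python
def buffer_utilization(size_bytes, max_size_bytes):
    """Fraction of the log shipper buffer in use (0.0 if max is unknown)."""
    if max_size_bytes <= 0:
        return 0.0
    return size_bytes / max_size_bytes


def is_buffer_nearly_full(size_bytes, max_size_bytes, threshold=0.8):
    """True when utilization meets or exceeds the (hypothetical) threshold."""
    return buffer_utilization(size_bytes, max_size_bytes) >= threshold


# Made-up sample readings for illustration.
utilization = buffer_utilization(size_bytes=512, max_size_bytes=1024)
nearly_full = is_buffer_nearly_full(size_bytes=900, max_size_bytes=1024)
```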
vault.logshipper.streamWALs.guard_found
Metric type | Value | Description |
---|---|---|
counter | number | Number of times Vault began streaming WAL entries and found a starting index in the Merkle tree |
vault.logshipper.streamWALs.missing_guard
Metric type | Value | Description |
---|---|---|
counter | number | Number of times Vault began streaming WAL entries without finding a starting index in the Merkle tree |
vault.logshipper.streamWALs.scanned_entries
Metric type | Value | Description |
---|---|---|
summary | entries | Number of entries scanned in the buffer before Vault found the correct entry |
Replication metrics (Enterprise)
Note
The following metrics only appear in telemetry results when replication is in an unhealthy state:
vault.replication.fetchRemoteKeys
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to fetch keys from a remote cluster participating in replication before Merkle tree delta generation occurs |
vault.replication.fsm.last_remote_wal
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last remote write-ahead log. |
Note
Standby nodes do not emit `last_remote_wal` details.
vault.replication.fsm.last_upstream_remote_wal
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last remote WAL segment received from the upstream cluster by the local cluster leader. |
vault.replication.merkle.commit_index
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last commit to the Merkle tree |
vault.replication.merkleDiff
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to perform a Merkle tree delta comparison among the clusters participating in replication |
vault.replication.merkleSync
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to perform a Merkle tree synchronization with the most recent delta generated by the clusters participating in replication |
vault.replication.rpc.client.conflicting_pages
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a conflicting pages request for the client |
vault.replication.rpc.client.create_token_register_auth_lease
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a register authentication token request for the client |
vault.replication.rpc.client.fetch_keys
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a fetch keys request for the client |
vault.replication.rpc.client.forward
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a forward request for the client |
vault.replication.rpc.client.guard_hash
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a guard hash request for the client |
vault.replication.rpc.client.persist_alias
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to persist an alias for the client |
vault.replication.rpc.client.register_auth
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a register authentication request for the client |
vault.replication.rpc.client.register_lease
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to register a lease for the client |
vault.replication.rpc.client.save_mfa_response_auth
Metric type | Value | Description |
---|---|---|
summary | ms | Time required by the client to save the MFA authentication response |
vault.replication.rpc.client.stream_wals
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to stream write-ahead logs for the client |
vault.replication.rpc.client.sub_page_hashes
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a sub-page hash request for the client |
vault.replication.rpc.client.sync_counter
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a counter sync request for the client |
vault.replication.rpc.client.upsert_group
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a group upsert request for the client |
vault.replication.rpc.client.wrap_in_cubbyhole
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a cubbyhole wrap request for the client |
vault.replication.rpc.dr.server.echo
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete an echo request for disaster recovery |
vault.replication.rpc.dr.server.fetch_keys_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a fetch keys request for disaster recovery |
vault.replication.rpc.server.auth_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete an authentication request |
vault.replication.rpc.server.bootstrap_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a bootstrap request |
vault.replication.rpc.server.conflicting_pages_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a conflicting pages request |
vault.replication.rpc.server.echo
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete an echo operation |
vault.replication.rpc.server.forwarding_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a forwarding request |
vault.replication.rpc.server.guard_hash_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a guard hash request |
vault.replication.rpc.server.persist_alias_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a request to persist an alias |
vault.replication.rpc.server.persist_persona_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a request to persist an alias |
vault.replication.rpc.server.save_mfa_response_auth
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to save an MFA authentication response |
vault.replication.rpc.server.stream_wals_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a request to stream write-ahead logs |
vault.replication.rpc.server.sub_page_hashes_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a sub-page hashes request |
vault.replication.rpc.server.sync_counter_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a counter sync request |
vault.replication.rpc.server.upsert_group_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete a group upsert request |
vault.replication.rpc.standby.server.create_token_register_auth_lease_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to service a create token request from a standby node |
vault.replication.rpc.standby.server.echo
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to service an echo request from a standby node |
vault.replication.rpc.standby.server.register_auth_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to service a register auth request from a standby node |
vault.replication.rpc.standby.server.register_lease_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to service a register lease request from a standby node |
vault.replication.rpc.standby.server.wrap_token_request
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to service a wrap token request from a standby node |
vault.replication.wal.gc
Metric type | Value | Description |
---|---|---|
summary | ms | Time required to complete one run of the WAL garbage collection process |
vault.replication.wal.last_dr_wal
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last write-ahead log for disaster recovery. All Vault Enterprise clusters emit this metric, regardless of cluster type. |
vault.replication.wal.last_performance_wal
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last write-ahead log for performance replication |
vault.replication.wal.last_wal
Metric type | Value | Description |
---|---|---|
gauge | number | Index of the last write-ahead log |
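One way the WAL index gauges can be used is to estimate how far a secondary trails its primary. The sketch below assumes you have already collected `vault.replication.wal.last_wal` from the primary and `vault.replication.fsm.last_remote_wal` from the secondary at roughly the same time. The function name and sample values are hypothetical.

```python
def wal_lag(primary_last_wal, secondary_last_remote_wal):
    """Number of WAL indexes the secondary trails the primary by.

    Readings taken at slightly different times can make the secondary
    appear ahead, so the result is floored at zero.
    """
    return max(0, primary_last_wal - secondary_last_remote_wal)


# Made-up sample readings for illustration.
lag = wal_lag(primary_last_wal=10_450, secondary_last_remote_wal=10_420)
```

A lag that keeps growing over successive readings is generally more informative than any single value.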