Example Rules
- Record high cpu utilization queries
- Complex rule
- Record queries with high memory usage
- Record queries with memory (rss) skew above 10%
- Record high CPU queries on a host when overall CPU utilization is high on that host
- Record high CPU query processes when overall CPU utilization on a host is high
- Record queries with high spillfile count
- Record queries with high vmem usage when segment vmem usage is high as well
- Record number of bytes written to disk on a host by any query process
- Record total number of bytes written to disk per sec on a host by all query processes
- Cancel any query where the session has run longer than 120 seconds
- Throttle the cpu utilization of a query
- Throttle and even out skew
This section provides examples of rules written for various purposes.
Note: Rules must be entered on a single line, but the rules shown in this section are wrapped for readability.
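Because the wrapped examples cannot be entered as shown, a small helper can join them before use. The following Python sketch is illustrative only: the function name to_single_line is hypothetical and is not part of any Greenplum tool. It assumes the rule contains no quoted text with consecutive spaces, since runs of whitespace are collapsed to one space.

```python
def to_single_line(wrapped_rule: str) -> str:
    """Collapse a rule that was wrapped over several lines into one line."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so joining with one space removes the wrapping.
    # Caveat: this also collapses repeated spaces inside quoted messages.
    return " ".join(wrapped_rule.split())


# The first example rule from this section, wrapped as it is displayed.
wrapped = """
rule add simple gpdb_record(message="Too much cpu for gpadmin")
when session_id:host:total_cpu > 100
and session_id:host:pid:usename = 'gpadmin'
"""

single = to_single_line(wrapped)
print(single)
```

The printed result is the same rule on one line, ready to be entered.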
Record high cpu utilization queries
The following rule invokes the gpdb_record action when the gpadmin user runs a query and its total CPU utilization on a host exceeds 100%.
rule add simple gpdb_record(message="Too much cpu for gpadmin")
when session_id:host:total_cpu > 100
and session_id:host:pid:usename = 'gpadmin'
Complex rule
This rule invokes gpdb_record for a query that meets either of the first two criteria below and also the third:
- its total CPU usage on a host is greater than 90% and it has been running for more than 45 seconds, or
- its CPU skew is greater than 20%, and
- its query text matches select.*test, for example a select on a table that contains “test” in its name.
rule add comborule gpdb_record(message="My Message")
when ((session_id:host:total_cpu > 90 and session_id:host:pid:runtime > 45)
or session_id:cpu_skew > 20)
and session_id:host:pid:current_query =~ /select.*test/
The rule shows how you can group Boolean expressions with parentheses.
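The effect of the grouping can be checked outside the rule engine. The Python sketch below mirrors the structure of the comborule condition with plain Boolean logic. The function names and sample values are hypothetical, and Python's re module and operator precedence are assumed here purely for illustration; the rule engine's own regular expression flavor and precedence rules are not confirmed by this section.

```python
import re


def comborule_matches(total_cpu, runtime, cpu_skew, current_query):
    """Grouped as in the rule: ((A and B) or C) and D."""
    cpu_or_skew = (total_cpu > 90 and runtime > 45) or cpu_skew > 20
    # re.search looks for the pattern anywhere in the query text.
    is_select_on_test = re.search(r"select.*test", current_query) is not None
    return cpu_or_skew and is_select_on_test


def ungrouped_matches(total_cpu, runtime, cpu_skew, current_query):
    """Same terms without parentheses; Python evaluates 'and' before 'or'."""
    is_select_on_test = re.search(r"select.*test", current_query) is not None
    return total_cpu > 90 and runtime > 45 or cpu_skew > 20 and is_select_on_test


# Hypothetical samples: (total_cpu, runtime, cpu_skew, current_query)
busy_insert = (95, 60, 0, "insert into test_orders values (1)")
skewed_select = (10, 10, 25, "select * from test_orders")

print(comborule_matches(*busy_insert))    # False: not a select
print(ungrouped_matches(*busy_insert))    # True: the CPU terms alone suffice
print(comborule_matches(*skewed_select))  # True: skew plus matching select
```

Without the parentheses, the query-text test would apply only to the skew term under Python's precedence, so a busy non-select statement would match. Writing the parentheses explicitly removes any dependence on precedence.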
Record queries with high memory usage
This rule records a message when a query process exceeds 20% of the resident memory on a host.
rule add transient mem_high_segment_useage_20
gpdb_record(message="MEM: high segment pctusage - 20%") when
host:pid:resident_size_pct > 20
and session_id:host:pid:usename =~/.*/
Record queries with memory (rss) skew above 10%
This rule calls the gpdb_record action to log when memory skew exceeds 10% on a host.
rule add mem_skew_10 gpdb_record(message="MEM: query skew 10")
when session_id:resident_size_pct_skew > 10
and session_id:host:pid:usename =~/.*/
Record high CPU queries on a host when overall CPU utilization is high on that host
This rule records queries whose total CPU utilization on a host exceeds 60% while the overall CPU utilization on that host exceeds 80%.
rule add high_query_cpu_on_host gpdb_record(message="High query CPU on host") when
session_id:host:total_cpu > 60 and
host:node_cpu_util > 80 and
session_id:host:pid:usename=~/.*/
Record high CPU query processes when overall CPU utilization on a host is high
This rule records query processes whose CPU utilization exceeds 10% while the overall CPU utilization on the host exceeds 80%.
rule add high_cpu gpdb_record(message="High CPU usage") when
host:pid:cpu_util > 10 and
host:node_cpu_util > 80 and
session_id:host:pid:runtime > 0
Record queries with high spillfile count
This rule records the total number of spillfiles created for a query across the cluster when that number exceeds the specified limit (2500 in this example).
rule add spills gpdb_record(message="High spillfile count") when
session_id:host:pid:spillfile_count_across_cluster > 2500
Record queries with high vmem usage when segment vmem usage is high as well
This rule records vmem usage of queries and segments when they both exceed specified limits. It can be used with a kill query action when the behavior of runaway_detector_activation_percent, which is to kill the query that consumes the highest amount of memory, is not desirable. If you intend to kill queries with this rule, it is recommended to turn off runaway_detector_activation_percent.
This rule can be further refined to select or filter out specific users, applications, databases, etc.
rule add high_vmem gpdb_record(message="High segment and query vmem usage") when
host:segment_id:total_vmem_size_pct > 50 and
session_id:host:segment_id:vmem_size_pct > 5 and
session_id:host:pid:runtime > 0
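One way to picture the refinement by user or database is to append further conditions with "and". The Python sketch below is a minimal illustration, not a Greenplum utility: add_filters is a hypothetical helper, and the added conditions reuse the usename and datname terms that appear in other rules in this section. Whether a particular combination is accepted by the rule engine is not verified here.

```python
def add_filters(rule: str, *conditions: str) -> str:
    """Append extra conditions to a single-line rule, joined with 'and'."""
    return " and ".join([rule.strip(), *conditions])


# The high_vmem rule from this section, written on a single line.
base = (
    'rule add high_vmem gpdb_record(message="High segment and query vmem usage") when '
    "host:segment_id:total_vmem_size_pct > 50 and "
    "session_id:host:segment_id:vmem_size_pct > 5 and "
    "session_id:host:pid:runtime > 0"
)

# Restrict the rule to one user and one database (example values).
refined = add_filters(
    base,
    "session_id:host:pid:usename = 'gpadmin'",
    "session_id:host:pid:datname = 'mydb'",
)
print(refined)
```

The refined rule keeps the original name; a different name would be needed to keep both versions side by side.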
Record number of bytes written to disk on a host by any query process
This rule records the number of bytes written to disk on a host by any query process connected to the mydb database.
rule add disk_write gpdb_record(message='disk') when
host:pid:disk_write_bytes > 0 and session_id:host:pid:datname = 'mydb'
Record total number of bytes written to disk per sec on a host by all query processes
This rule records the total number of bytes written to disk per second on a host by all query processes whose application name matches my_app.
rule add disk_write_per_sec gpdb_record(message='disk per sec') when
session_id:host:total_disk_write_bytes_per_sec > 0 and
session_id:host:pid:application_name =~ /my_app/
Cancel any query where the session has run longer than 120 seconds
This rule invokes the host:pg_cancel_backend action when session_id:host:pid:runtime exceeds two minutes.
rule add kill_long host:pg_cancel_backend()
when session_id:host:pid:runtime > 120
Throttle the cpu utilization of a query
This rule invokes the host:throttle_gpdb_query action when the CPU utilization of a process run by the gpadmin user exceeds 20% and the query has run for more than 20 seconds.
rule add throttle host:throttle_gpdb_query(max_cpu=30)
when host:pid:cpu_util > 20
and session_id:host:pid:usename = 'gpadmin'
and session_id:host:pid:runtime > 20
Throttle and even out skew
This rule invokes host:throttle_gpdb_query when the total CPU usage of a query on a host exceeds 100% and the current query is a select on the skewtest table.
rule add skewrule host:throttle_gpdb_query(max_cpu=50)
when session_id:host:total_cpu > 100
and session_id:host:pid:current_query =~ /select.*skewtest/
You can observe the effects of this rule in the gptop GPDB Skew page.