Example Rules

This section provides examples of rules written for various purposes.

Note: Rules must be entered on a single line, but the rules shown in this section are wrapped for readability.

Record high CPU utilization queries

The following rule invokes the gpdb_record action when the gpadmin user runs a query whose total CPU utilization on a host exceeds 100%.

rule add simple gpdb_record(message="Too much cpu for gpadmin")
when session_id:host:total_cpu > 100
and session_id:host:pid:usename = 'gpadmin'

Complex rule

This rule invokes gpdb_record for a query that meets either of the following criteria:

  • its total CPU usage on a host is greater than 90% and it has been running for more than 45 seconds, or
  • its CPU skew is greater than 20%,

and whose current query text matches the pattern select.*test, for example a select on a table that contains “test” in its name.

rule add comborule gpdb_record(message="My Message")
when ((session_id:host:total_cpu > 90 and session_id:host:pid:runtime > 45)
or session_id:cpu_skew > 20)
and session_id:host:pid:current_query =~ /select.*test/

The rule shows how you can group Boolean expressions with parentheses.
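The effect of the parentheses can be checked with a small Python sketch. It is only an illustration: the metric dictionary keys are hypothetical stand-ins for the rule variables, the =~ operator is assumed to behave like an unanchored regular-expression search, and the ungrouped variant assumes conventional precedence in which and binds tighter than or.

```python
import re

# Pattern from comborule; assumed to be matched anywhere in the query text.
TEST_SELECT = re.compile(r"select.*test")


def _terms(m: dict):
    """Evaluate the four comparisons on a hypothetical metrics snapshot."""
    a = m["total_cpu"] > 90
    b = m["runtime"] > 45
    c = m["cpu_skew"] > 20
    d = TEST_SELECT.search(m["current_query"]) is not None
    return a, b, c, d


def comborule_matches(m: dict) -> bool:
    """Grouped form as written in the rule: ((A and B) or C) and D."""
    a, b, c, d = _terms(m)
    return (a and b or c) and d


def ungrouped_matches(m: dict) -> bool:
    """Same terms without parentheses: (A and B) or (C and D) under the assumed precedence."""
    a, b, c, d = _terms(m)
    return a and b or c and d


snapshot = {"total_cpu": 95, "runtime": 60, "cpu_skew": 5,
            "current_query": "select count(*) from sales"}
print(comborule_matches(snapshot))  # False: query text does not match select.*test
print(ungrouped_matches(snapshot))  # True: CPU and runtime alone satisfy A and B
```

With these sample values the grouped rule does not fire, because the query text lacks "test", while the ungrouped form fires on CPU usage and runtime alone.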

Record queries with high memory usage

This rule records a message when the resident memory size of a query process exceeds 20% of the memory on its host.

rule add transient mem_high_segment_useage_20
gpdb_record(message="MEM: high segment pctusage - 20%") when
host:pid:resident_size_pct > 20
and session_id:host:pid:usename =~/.*/
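To get a feel for the threshold, the following Python arithmetic assumes that resident_size_pct is a process's resident set size (RSS) expressed as a percentage of the host's physical memory. This section does not define the metric, so treat that definition and the 64 GiB host as hypothetical.

```python
GIB = 1024 ** 3


def resident_size_pct(rss_bytes: int, host_mem_bytes: int) -> float:
    """Assumed definition: process RSS as a percentage of host physical memory."""
    return 100.0 * rss_bytes / host_mem_bytes


host_mem = 64 * GIB  # hypothetical 64 GiB segment host

# Under this assumption, the 20% threshold is 12.8 GiB of resident memory.
print(0.20 * host_mem / GIB)
print(resident_size_pct(13 * GIB, host_mem) > 20)  # True (20.3125%)
print(resident_size_pct(12 * GIB, host_mem) > 20)  # False (18.75%)
```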

Record queries with memory (RSS) skew above 10%

This rule calls the gpdb_record action to log a query when its resident memory (RSS) skew exceeds 10%.

rule add mem_skew_10 gpdb_record(message="MEM: query skew 10")
when session_id:resident_size_pct_skew > 10
and session_id:host:pid:usename =~/.*/

Record high CPU queries on a host when overall CPU utilization is high on that host

This rule records queries whose total CPU usage on a host exceeds 60% while the overall CPU utilization of that host exceeds 80%.

rule add high_query_cpu_on_host gpdb_record(message="High query CPU on host") when
session_id:host:total_cpu > 60 and
host:node_cpu_util > 80

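The two conditions act at different levels: one on a query's CPU usage on a host, the other on the host as a whole. The Python sketch below illustrates that pairing, assuming both conditions are evaluated against the same host (as the shared host component of the two variable names suggests). The HostSample type, host names, and sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HostSample:
    """Hypothetical snapshot of one host; not a GPCC data structure."""
    host: str
    node_cpu_util: float  # overall CPU utilization of the host (%)
    # session_id -> total CPU of that session's processes on this host (%)
    session_total_cpu: dict = field(default_factory=dict)


def high_query_cpu_on_host(samples, query_cpu=60.0, node_cpu=80.0):
    """Return (host, session_id) pairs that satisfy both conditions on the same host."""
    hits = []
    for s in samples:
        if s.node_cpu_util <= node_cpu:
            continue  # host is not busy overall, so none of its sessions qualify
        for session_id, cpu in s.session_total_cpu.items():
            if cpu > query_cpu:
                hits.append((s.host, session_id))
    return hits


samples = [
    HostSample("sdw1", 92.0, {101: 75.0, 102: 20.0}),
    HostSample("sdw2", 40.0, {101: 85.0}),
]
print(high_query_cpu_on_host(samples))  # [('sdw1', 101)]
```

Session 101 is recorded only on sdw1; on sdw2 it uses more CPU, but that host as a whole is not busy.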
Record high CPU query processes when overall CPU utilization on a host is high

This rule records query processes whose CPU utilization exceeds 10% while the overall CPU utilization of their host exceeds 80%.

rule add high_cpu gpdb_record(message="High CPU usage") when 
host:pid:cpu_util > 10 and 
host:node_cpu_util > 80 and
session_id:host:pid:runtime > 0

Record queries with a high spill file count

This rule records a query when the total number of spill files it has created across the cluster exceeds the specified limit (2500).

rule add spills gpdb_record(message="High spillfile count") when
session_id:host:pid:spillfile_count_across_cluster > 2500
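Because the count is taken across the cluster, a query can exceed the limit even when no single process does. The Python sketch below illustrates that aggregation, assuming the cluster-wide value is the sum of per-process spill file counts for a query; the row format and values are hypothetical.

```python
from collections import defaultdict


def spill_totals(rows):
    """Sum per-process spill file counts into one total per query (session).

    rows: iterable of (session_id, host, pid, spillfile_count) tuples;
    this input shape is hypothetical.
    """
    totals = defaultdict(int)
    for session_id, _host, _pid, count in rows:
        totals[session_id] += count
    return dict(totals)


def sessions_over_limit(rows, limit=2500):
    """Sessions whose cluster-wide spill file total exceeds limit."""
    return sorted(s for s, total in spill_totals(rows).items() if total > limit)


rows = [
    (7, "sdw1", 1001, 1400),
    (7, "sdw2", 2002, 1300),
    (8, "sdw1", 1003, 2400),
]
print(spill_totals(rows))         # {7: 2700, 8: 2400}
print(sessions_over_limit(rows))  # [7]
```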

Record queries with high vmem usage when segment vmem usage is high as well

This rule records a query when both the total vmem usage of its segment and the query's own vmem usage on that segment exceed the specified limits. You can use this rule with an action that kills queries when the behavior of runaway_detector_activation_percent, which kills the query that consumes the most memory, is not desirable. If you intend to kill queries with this rule, turn off runaway_detector_activation_percent.

You can refine this rule further to select or filter out specific users, applications, databases, and so on.

rule add high_vmem gpdb_record(message="High segment and query vmem usage") when 
host:segment_id:total_vmem_size_pct > 50 and
session_id:host:segment_id:vmem_size_pct > 5 and 
session_id:host:pid:runtime > 0

Record the number of bytes written to disk on a host by any query process

This rule records query processes in the mydb database that write any bytes to disk on a host.

rule add disk_write gpdb_record(message='disk') when 
host:pid:disk_write_bytes > 0 and session_id:host:pid:datname = 'mydb'

Record the total number of bytes written to disk per second on a host by all query processes

This rule records the total number of bytes written to disk per second on a host by all processes of a query, for queries whose application name matches my_app.

rule add disk_write_per_sec gpdb_record(message='disk per sec') when
session_id:host:total_disk_write_bytes_per_sec > 0 and 
session_id:host:pid:application_name =~ /my_app/
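The per-host total combines the write rates of all of a query's processes on that host. The Python sketch below assumes the total is a simple sum of per-process rates and that =~ performs an unanchored regular-expression search; the process records and their keys are hypothetical.

```python
import re


def disk_write_rate_totals(procs, app_pattern=r"my_app"):
    """Sum per-process write rates into one total per (session_id, host).

    Only processes whose application name matches app_pattern are counted,
    and zero totals are dropped to mirror the > 0 condition.
    """
    pattern = re.compile(app_pattern)
    totals = {}
    for p in procs:
        if not pattern.search(p["application_name"]):
            continue  # session belongs to some other application
        key = (p["session_id"], p["host"])
        totals[key] = totals.get(key, 0) + p["disk_write_bytes_per_sec"]
    return {k: v for k, v in totals.items() if v > 0}


procs = [
    {"session_id": 5, "host": "sdw1", "application_name": "my_app_etl", "disk_write_bytes_per_sec": 4096},
    {"session_id": 5, "host": "sdw1", "application_name": "my_app_etl", "disk_write_bytes_per_sec": 1024},
    {"session_id": 5, "host": "sdw2", "application_name": "my_app_etl", "disk_write_bytes_per_sec": 0},
    {"session_id": 6, "host": "sdw1", "application_name": "psql", "disk_write_bytes_per_sec": 8192},
]
print(disk_write_rate_totals(procs))  # {(5, 'sdw1'): 5120}
```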

Cancel any query where the session has run longer than 120 seconds

This rule invokes the host:pg_cancel_backend action when the runtime of a query process (session_id:host:pid:runtime) exceeds 120 seconds (two minutes).

rule add kill_long host:pg_cancel_backend()
when session_id:host:pid:runtime > 120

Throttle the CPU utilization of a query

This rule invokes the host:throttle_gpdb_query action when the CPU utilization of a gpadmin query process exceeds 20% and the query has run for more than 20 seconds.

rule add throttle host:throttle_gpdb_query(max_cpu=30)
when host:pid:cpu_util > 20
and session_id:host:pid:usename = 'gpadmin'
and session_id:host:pid:runtime > 20

Throttle and even out skew

This rule invokes host:throttle_gpdb_query when the total CPU usage of a query on a host exceeds 100% and the current query is a select on the skewtest table.

rule add skewrule host:throttle_gpdb_query(max_cpu=50)
when session_id:host:total_cpu > 100
and session_id:host:pid:current_query =~ /select.*skewtest/

You can observe the effects of this rule in the gptop GPDB Skew page.