Caveats

Rule Conditions Must Include a session_id

To write a rule that performs a Greenplum Database action (gpdb_record, pg_terminate_backend, host:throttle_gpdb_query), the condition must include a session_id, even when the intended condition is based solely on process information. For example, the following rule appears to terminate any query that uses more than 20% of system memory:

pg_terminate_backend() when host:pid:program_size_pct > 20

However, because this rule contains no session_id, Workload Manager cannot infer the query to terminate, and the rule will not be added. To get the desired behavior, add an always-true session_id condition to the rule, for example:

pg_terminate_backend() when host:pid:program_size_pct > 20
and session_id:host:pid:runtime > 0
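
The requirement above can be checked before a rule is submitted. The following Python sketch is a hypothetical pre-check, not part of Workload Manager: the function names and the regular expression are assumptions made for illustration. It treats any colon-separated identifier path after the when keyword as a metric reference, and it does not attempt to parse the full rule grammar (for example, a colon inside a regular-expression literal would be misread as a metric).

```python
import re
from typing import List

# Hypothetical helper, not a Workload Manager API.
# Assumption: a metric reference is a colon-separated identifier path,
# such as host:pid:program_size_pct or session_id:host:pid:runtime.
METRIC_RE = re.compile(r"[A-Za-z_]\w*(?::[A-Za-z_]\w*)+")


def condition_metrics(rule: str) -> List[str]:
    """Return the metric paths that appear after the 'when' keyword."""
    parts = re.split(r"\bwhen\b", rule, maxsplit=1)
    if len(parts) < 2:
        return []  # no condition found
    return METRIC_RE.findall(parts[1])


def has_session_scope(rule: str) -> bool:
    """True if at least one condition metric is scoped by session_id."""
    return any(m.startswith("session_id:") for m in condition_metrics(rule))


process_only = "pg_terminate_backend() when host:pid:program_size_pct > 20"
with_session = (
    "pg_terminate_backend() when host:pid:program_size_pct > 20\n"
    "and session_id:host:pid:runtime > 0"
)

print(has_session_scope(process_only))
print(has_session_scope(with_session))
```

A check like this only flags rules that lack a session_id-scoped metric. Workload Manager itself remains the authority on whether a rule is accepted.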

Queries Executing in Under Five Seconds Are Ignored

Queries that run for less than five seconds are ignored by Workload Manager in order to minimize load on the system and to help focus on queries that consume greater resources.

Avoid Race Conditions When Using Vmem Metrics

In rare cases, when the memory allocated for a segment is close to exceeding gp_vmem_protect_limit or runaway_detector_activation_percent, the vmem protector may kill a query that triggers one of these limits before Workload Manager can cancel another query that has met a vmem-related Workload Manager rule condition.

For example, query Q1 may be an important query that consumes a significant amount of memory. Workload Manager wants to protect Q1 by killing two less important queries, Q2 and Q3, which consume less memory. Suppose the total memory usage for a segment running these queries is close to runaway_detector_activation_percent, and Workload Manager decides at time t to kill Q2 and Q3. Q1 may then be killed at time t+1 because segment memory exceeds runaway_detector_activation_percent, while Q2 and Q3 are killed by Workload Manager only at time t+2, based on the decision made at time t.

This issue can be avoided by disabling runaway_detector_activation_percent and ensuring that a Workload Manager rule triggers well before gp_vmem_protect_limit can be reached. The host:segment_id:total_vmem_size_pct and session_id:host:segment_id:vmem_size_pct metrics can be used for this purpose. Here is an example rule:

cancel_Q2_vmem_exceed host:pg_cancel_backend() when 
    host:segment_id:total_vmem_size_pct > 65 and
    session_id:host:segment_id:vmem_size_pct > 5 and
    session_id:host:pid:current_query =~ /Q2/

To use these vmem metrics, first enable them as described in the Vmem section of the Workload Manager Metric Reference.
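
To illustrate why the rule threshold should sit well below the vmem limit, the following Python sketch models the race as a toy simulation. Everything in it is an assumption made for illustration: the function name, the linear memory growth, the tick-based delay between a rule decision and its action, and the sample numbers. It does not reflect how Workload Manager or the vmem protector are actually implemented or timed.

```python
def first_event(threshold_pct, start_pct, growth_per_tick,
                action_delay_ticks, limit_pct=100.0):
    """Toy model: report which mechanism acts first on a segment.

    Segment memory grows by growth_per_tick each tick. The rule makes its
    decision on the first tick where usage exceeds threshold_pct, and its
    action lands action_delay_ticks later. If usage reaches limit_pct
    before that, the vmem protector is assumed to act first.
    """
    if growth_per_tick <= 0:
        raise ValueError("growth_per_tick must be positive")

    usage = start_pct
    decided_at = None  # tick at which the rule condition was first met
    tick = 0
    while True:
        if usage >= limit_pct:
            return "vmem_protector"
        if decided_at is None and usage > threshold_pct:
            decided_at = tick
        if decided_at is not None and tick - decided_at >= action_delay_ticks:
            return "rule"
        usage += growth_per_tick
        tick += 1


# Hypothetical numbers: memory starts at 50%, grows 5 points per tick,
# and a rule action lands two ticks after the decision.
print(first_event(threshold_pct=65, start_pct=50,
                  growth_per_tick=5, action_delay_ticks=2))
print(first_event(threshold_pct=95, start_pct=50,
                  growth_per_tick=5, action_delay_ticks=2))
```

In this model the rule with a threshold of 65 acts first, while the rule with a threshold of 95 loses the race to the vmem protector. Real growth rates and delays vary, so the margin between the rule threshold and the limit should be chosen from observed behavior on your own system.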