Understanding Rules
This topic provides an introduction to Workload Manager rules including how to write them and how they behave in a Greenplum Database cluster with Workload Manager.
Rules Overview
A Workload Manager rule specifies an action to execute when a specified condition is detected in the Greenplum Database cluster. Administrators write Workload Manager rules to investigate problem queries, throttle queries that consume too much CPU, or simply terminate queries that could disrupt the database system.
The rulesengine
service on each Greenplum host evaluates rules against facts, called datums, collected from the Greenplum host operating systems and database processes. At regular intervals, datums are collected and submitted to the rulesengine
service on each host. When the rules engine matches a rule, it performs its action.
A rule has an action expression and a condition expression separated by the WHEN
keyword. It can be read as “do <action-exp> WHEN <condition-exp>”.
Here is a rule that terminates any session that has run for over 120 seconds:
pg_terminate_backend() when session_id:host:pid:runtime > 120
In the above rule, the action expression is pg_terminate_backend()
and the condition expression is session_id:host:pid:runtime > 120
.
The term session_id:host:pid:runtime
is a scoped datum; runtime
is the name of the datum and session_id:host:pid
is the scope. This scoped datum specifies the elapsed execution time for a query executor process on a segment host. The colon-delimited sections of the scope and datum identify the source of the value:
session_id
– ID of a Greenplum Database queryhost
– the name of a segment hostpid
– process ID of a query executor process running on the hostruntime
– elapsed time since the query executor process started
You create rules using the rule add
command in an interactive gp-wlm
session or with the --rule-add
command-line option. Each rule has a unique name used for managing the rule with commands such as rule modify
or rule delete
.
A rule may also be labeled transient
, which means the rule is active only until it is deleted or Workload Manager is restarted.
For details about the rule add
command syntax and usage see Add Rules.
For reference information about Workload Manager commands that manage existing rules (modify, delete, dump, import, and restore), see Managing Rules.
The next sections provide more detailed information about the components of a rule: action expressions, condition expressions, datums, and scopes.
Action Expression
The action to perform when a rule is triggered is specified with one of the following Workload Manager actions:
gpdb_record
– record a custom message and details of the database query process in thegp_wlm_records
database table.host:throttle_gpdb_query
– throttle a Greenplum Database query on a specified host.host:pg_cancel_backend
– cancel the current query in a session on a host by calling the PostgreSQLpg_cancel_backend()
function.pg_terminate_backend
– terminate a session by calling the PostgreSQLpg_terminate_backend()
function.
A rule’s condition expression always identifies a single query executor process on a single Greenplum segment host. When a rule’s action executes, it will have in its context the query’s session ID, a segment host name, and the process ID of a single query executor process on the host.
Each of the actions responds to the single Greenplum Database query executor process identified by the condition expression. See Rule Actions for reference information for the actions.
Action expressions are written as functions and can have zero or more arguments, specified with key=value
pairs in parentheses after the action name:
<action-name>(<arg1>=<value1>,<arg2>=<value2>,...)
gpdb_record
The gpdb_record
action writes the text specified in its message
argument to a log file, along with details of the database query process identified in the rule’s condition expression. For example, the gpdb_record
action can log a message when any query process exceeds 120 seconds:
gpdb_record(message='query runtime exceeds 120 seconds') when session_id:host:pid:runtime > 120
The gp_wlm_records
external Greenplum Database table provides SQL query access to the logged records. (See Querying the gp_wlm_records Table for more information.)
The gpdb_record
action has several arguments, but only the message
argument is required to be specified in the rule. Here is the full list of arguments for this action:
-
message
– Informative string describing the reason for recording -
current_query
- The text of the current query -
gpdb_segment_role
- Role of the database instance:GPDB_MASTER
orGPDB_SEGMENT
-
host
- The hostname of the segment -
pid
- The postgres process associated with the query -
query_start
- Query start time -
session_id
- Session id of the query -
usename
- Name of the user logged into this backend
With the exception of message
, a value for each of these arguments is inferred from the matched query process. gpdb_record
logs a record that includes the supplied message, all of these inferred values, the text of the rule, and context values from the condition expression.
host:throttle_gpdb_query
The host:throttle_gpdb_query
action holds a query to a maximum share of CPU on a host, specified in the max_cpu
argument as a percentage of CPU utilization.
The host:
prefix on the host:throttle_gpdb_query
action is a scope. The host:
scope indicates that the action will be performed only on the host machines where the rule’s condition is matched. The host:throttle_gpdb_query
action is currently the only scoped action. (Datums used in the condition expression are all scoped. See Datums and Scopes below for details.)
This host:throttle_gpdb_query
rule throttles a query on a host to 30% CPU utilization:
host:throttle_gpdb_query(max_cpu=30) when session_id:host:total_cpu > 20
The session_id:host:total_cpu
scoped datum is the total percentage of CPU used by all query executor processes on a host working on the same query.
Note that this rule establishes a range between 20% and 30% CPU utilization. Throttling on a host begins when total CPU utilization for the query exceeds 20% and ends when it drops below 20%. Throttling keeps the CPU utilization from exceeding 30%. Setting max_cpu
argument higher than the rule’s trigger threshold prevents rapidly alternating between throttling enabled and throttling disabled states that could occur if the threshold and maximum CPU are equal.
pg_cancel_backend
The host:pg_cancel_backend
action cancels a query on a host. It executes the pg_cancel_backend()
PostgreSQL function on the session matched by the condition expression.
The following rule cancels the current query in a session that exceeds 75% total CPU utilization on any segment host and has run for more than five minutes:
host:pg_cancel_backend() when session_id:host:total_cpu > 75 and session_id:host:pid:runtime > 300
When a rule cancels a query, Workload Manager logs the event in a log file on the segment host. These event records can be queried using the gp_wlm_events
database view. The view depends on Greenplum external tables on each segment host and must first be set up using manage-event-tables.sh
script. See Querying Workload Manager Event Data for details.
pg_terminate_backend
The pg_terminate_backend
action executes the PostgreSQL pg_terminate_backend()
function on the session matched by the condition expression. This is an unscoped action because a session must be terminated on all segments.
The following rule terminates a session that exceeds 75% total CPU utilization on any segment host and has run for more than five minutes:
pg_terminate_backend() when session_id:host:total_cpu > 75 and session_id:host:pid:runtime > 300
When a rule terminates a query, Workload Manager logs the event in a log file on each segment host. These event records can be queried using the gp_wlm_events
database view. The view depends on Greenplum external tables on each segment host and must first be set up using manage-event-tables.sh
script. See Querying Workload Manager Event Data for details.
Condition Expression
The condition expression (predicate) of a rule is a Boolean expression that identifies Greenplum Database queries you want to act upon.
Datums can be compared to values using the following operators.
Operator | Value Format | Description |
---|---|---|
= | A number for numeric datums or a quoted string for strings. | Matches only when the values are exactly equal. |
!= | A number for numeric datums or a quoted string for strings. | Matches when the values are not equal. |
=~ | Regular expression on the right side enclosed in slashes (/ ). datum =~ /sel.*by/ |
Performs a regular expression match between the string value and the specified regex. Posix regular expression syntax is used. |
> | Number | Greater than |
< | Number | Less than |
>= | Number | Greater than or equal to |
<= | Number | Less than or equal to |
Expressions can be arbitrarily complex, joining multiple comparisons with Boolean AND and OR operators and parentheses to enforce precedence. For example:
host:pid:cpu_util > 50 or (host:pid:cpu_util > 30 and session_id:host:pid:usename = "fred")
Including Clause
The including
keyword introduces a comma-separated list of datums to add to the context when a rule triggers. Any datum referenced in the condition expression is automatically added to the context. To add context values for datums not used in the condition expression, list the datums after the including
keyword.
Datums in the including
clause are specified without scopes. If the rules compiler cannot infer the scope from scopes already bound in the rule, the rule fails compilation with an error message.
The following rule adds the host:pid:long_name
and host:pid:avg_cpu_util
datums to the context:
gpdb_record(message="CPU over 50%") when host:pid:cpu_util > 50 including long_name, avg_cpu_util
The host:pid:cpu_util
datum is in the context because it is referenced in the condition clause.
When a gpdb_record
action triggers, the context datums are added to the context_args
column of the gp_wlm_events
table. When a host:pg_cancel_backend
or pg_terminate_backend
action triggers, the context datums are added to the context
column of the gp_wlm_events
view.
The additional datum values can provide useful information when investigating recorded messages and termination events.
Datums and Scopes
Datums are data items collected by the agent, and include operating system statistics, OS process statistics, and database query data.
Workload Manager provides a rich set of datums to use in condition expressions so that you can target queries and query processes with very specific characteristics. For example, a rule could target queries executed with a certain database role that access a certain table and use over 30% of CPU on any host.
The name of a datum is prefixed by its scope, which provides context for the datum. The host:pid
scope of the host:pid:cpu_util
datum, for example, means that the cpu_util
datum is the percentage of CPU used by an OS process (pid
) executing on a specific host (host
). The session_id:host:pid
scope for the session_id:host:pid:usename
datum indicates that the usename
datum is the database role executing a Greenplum Database segment query process. The session_id
is the id of the query and host
is the segment host where the query executor process, pid
, is executing.
Datums in the including
list of a rule are specified without scopes. The rules compiler searches for included datums in scopes already bound in the condition expression and fails if the scope cannot be inferred.
Rules must be written in a way to identify a single query executor process on a host. The following rule records a message when the resident memory size for any process exceeds 20%. The host:pid
scope does not include a session_id
, so an additional rexexp term is added to the condition expression match any query. This ensures that the host:pid:resident_size_pct
datum is from a query executor process and that the action has a known query when it executes. Without the session_id:host:pid:usename
comparison, this rule would fail to compile.
rule add mem_high_segment_useage_20
gpdb_record(message="MEM: high segment pct usage - 20%") when
host:pid:resident_size_pct > 20
and session_id:host:pid:usename =~/.*/
Workload Manager Datum Reference lists all of the datums, their scopes, and their data formats.
datid
Scopedatid
scope is for datums that are values from a single database in the Greenplum Database system. The datid:datname
datum, for example, can be used to restrict a rule to a specific database:
... and dataid:datname = 'my_db'
Datums with datid
scope must be combined in the condition expression with other datums that identify a query process.
gpdb
Scopegpdb
scope is for datums from the entire Greenplum Database system. There is currently just one such datum: gpdb:total_master_connections
, which is the total number of client connects for all databases in the system. This datum could be used to prevent a rule from triggering until a specified number of connections is exceeded.
host
Scopehost
scope applies to datums that are values from a single host in the Greenplum cluster. These include the current date and time values from the host and the host’s total CPU utilization.
host:segment_id
Scopehost:segment_id
Scope is used for datums from a single Greenplum segment. It is used for datums that report the virtual memory (vmem) usage for a segment.
host:pid
Scopehost:pid
scope is for datums referring to any operating system process on a host. These datums include the memory, CPU, and I/O statistics available from Linux for OS processes. Datums with host:pid
scope can be used to narrow a rule to query processes using more host resources than expected.
session_id
Scopesession_id
is the Greenplum cluster-wide ID for a database query. The datums with session_id
scope are CPU and disk I/O skew statistics for a single query that Workload Manager calculates from the host:pid
datums from all query executor processes on all segment hosts for the query.
session_id:host
Scopesession_id:host
scope includes datums that are aggregated memory, CPU, and I/O statistics for all processes on all hosts running a query.
session_id:host:segment_id
Scopesession_id:host:segment_id
scope includes datums that report the amount of virtual memory (vmem) consumed by a Greenplum segment for a session.
session_id:host:pid
Scopesession_id:host:pid
scope is used for datums that take values from a query executor process on a single segment host.