Best Practices for Rules

  1. Avoid creating rules that modify the condition the rule’s expression is matching. For example, consider this rule:

    host:throttle_gpdb_query(max_cpu=20) when host:pid:cpu_util > 30 and session_id:host:pid:runtime > 0

    If CPU usage goes above 30%, the rule triggers and reduces the usage to 20%. Once the usage falls below 30%, the rule no longer matches, so the throttling ends and usage can climb above 30% again, which retriggers the rule. This creates an undesirable cyclic behavior. Instead, create a rule like the following:

    host:throttle_gpdb_query(max_cpu=30) when host:pid:cpu_util > 20
    and session_id:host:pid:runtime > 0

    This rule triggers when CPU utilization exceeds 20% and throttles the CPU to 30% utilization. Because the throttled level still satisfies the rule's condition, the throttling continues until utilization drops below 20%. The session_id:host:pid:runtime condition is true for any running query and provides the necessary session_id for the throttle_gpdb_query action.

  2. Avoid creating rules that terminate a query based on skew alone. Consider the following rule:

    pg_terminate_backend when session_id:resident_size_pct_skew > 10

    This is a poor rule for two reasons. First, it terminates all queries when skew is above 10, including queries that were not contributing to skew. Second, well-behaved queries can temporarily experience skew high enough to meet this condition. For example, if the segments do not complete a query at the same time, skew can appear near the end of execution. A query could run normally across several nodes and then, as each node completes its portion of the query, its resource utilization drops, causing a temporary increase in skew while other nodes are still running.

  3. Rules that match data with the datid: scope trigger for any database in the cluster unless a predicate is added to confine the match to a target database. For example, this rule triggers whenever the number of connections to any single database exceeds 10:

    gpdb_record(message="exceeded 10 connections")
    when session_id:host:pid:runtime > 0
    and datid:numbackends > 10

    To confine the rule to a single database, add a predicate that filters on the database name:

    gpdb_record(message="exceeded 10 connections on foo")
    when session_id:host:pid:runtime > 0
    and datid:datname = "foo"
    and datid:numbackends > 10
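
The cyclic behavior described in the first practice can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not a model of how the rules engine actually schedules or applies actions: the query is assumed to demand a constant 50% CPU when unthrottled, the rule is assumed to be re-evaluated once per tick, and the throttle is assumed to take effect immediately. The function simulate and its parameters (threshold, max_cpu, demand, ticks) are hypothetical names introduced for illustration.

```python
def simulate(threshold, max_cpu, demand=50, ticks=6):
    """Return the CPU utilization observed at each tick (toy model)."""
    throttled = False
    history = []
    for _ in range(ticks):
        # While throttled, utilization is capped at max_cpu;
        # otherwise the query uses its full (assumed constant) demand.
        util = min(demand, max_cpu) if throttled else demand
        history.append(util)
        # The throttle stays in effect only while the rule's condition matches.
        throttled = util > threshold
    return history


# Rule that undoes its own condition: trigger above 30%, cap at 20%.
bad = simulate(threshold=30, max_cpu=20)
# Rule whose cap stays above the trigger: trigger above 20%, cap at 30%.
good = simulate(threshold=20, max_cpu=30)

print("trigger > 30, cap 20:", bad)
print("trigger > 20, cap 30:", good)
```

In this model the first rule alternates between 50% and 20% on every tick, because capping at 20% makes the condition false and releases the throttle. The second rule settles at 30% and stays there, because the capped value still satisfies the condition.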