Defining Workload Management Rules

With Greenplum Database resource groups, you can manage concurrency, CPU, and memory resources for categories of transactions you define. Each Greenplum Database user is assigned a single resource group in the pg_roles system table, and their transactions execute in that resource group.

Greenplum Command Center enables you to write assignment rules to more flexibly assign transactions to resource groups at execution time. You can also write a rule for each resource group to specify how long to wait before killing idle sessions.

Command Center saves your rules as a JSON value in the postgres database gpcc.workload_config table. The gp_wlm database extension reads the rules from the table and applies them. The gp_wlm extension also provides user-defined functions in the postgres database you can use to get and set the rules programmatically.

Assignment Rules

Greenplum Database defers to the workload management database extension to assign transactions to resource groups. Command Center users with Admin permission level can create assignment rules in Command Center to assign transactions to resource groups.

When a transaction begins, Greenplum Database calls the workload management extension to determine the resource group. The extension evaluates the assignment rules and, if a matching rule is found, returns that rule’s resource group. If no assignment rule matches, Greenplum Database falls back to the default behavior, assigning the transaction to the resource group specified for the current user in the pg_authid system table.

Idle Session Kill Rules

Command Center administrators can create idle session kill rules for resource groups that specify the maximum number of seconds that a session can remain idle before it is killed. When the session process on the Greenplum Database master becomes idle and the number of seconds has elapsed, the session kills itself.

Idle session kill rules are per resource group, allowing you to determine how long to wait for different classes of transactions.

Editing Rules for Resource Groups

In future Greenplum Command Center releases, you will be able to use the Workload Management user interface to interactively define workloads, assignment rules, and idle session kill rules. In the current release, you must edit the JSON document the workload management database extension evaluates.

JSON Editor

Greenplum Command Center saves the JSON document in a Greenplum Database table, gpcc.gp_wlm_config, in the postgres database. When you save your changes, Command Center updates the document in the database. The size of the JSON document must not exceed 1MB.

Warning! Never manually modify the gpcc.gp_wlm_config table.

When you first enable Workload Management in Command Center, a sample JSON document is displayed. The following sample has rules for the two default Greenplum Database resource groups, admin_group and default_group.

{
  "version": 1,
  "assignmentRules": [
    {
      "resourceGroupName": "admin_group",
      "roleName": "optionalRoleToFilterWith",
      "queryTags": "exampleKey1=exampleValue1;exampleKey2=exampleValue2",
      "disabled": true
    },
    {
      "resourceGroupName": "default_group",
      "queryTags": "exampleKey1=exampleValue1;exampleKey2=exampleValue2",
      "disabled": true
    }
  ],
  "idleSessionKillRules": {
    "admin_group": 
      {
        "timeoutSeconds": 7200
      },
    "default_group": 
      {
        "timeoutSeconds": 7200
      }
    }
}

JSON Format for Rules

This section describes the JSON object that stores the resource group assignment and idle session kill rules. The object has three members:

  • a version key/value pair
  • an assignmentRules array containing one element for each assignment rule
  • an idleSessionKillRules list containing one key/value pair for each resource group

version pair

version
The version value is an integer. It is reserved for future use. It can be set to 1.

assignmentRules array

assignmentRules
The assignmentRules array has one element for each assignment rule. Each element maps the rule to a resource group in Greenplum Database and defines the conditions that assign a transaction to that resource group.

Greenplum Command Center evaluates each rule in the assignmentRules array from top to bottom and stops at the first match, so the order is important.

The elements in the assignmentRules array can have the following key/value pairs.

resourceGroupName
The name of the Greenplum Database resource group. The resource group must already exist or have been created in Greenplum Database with the CREATE RESOURCE GROUP SQL statement.

roleName
The name of a Greenplum Database role to match against the current role identifier in the Greenplum Database session. The current role is initially the database role that authenticated with the Greenplum Database system to begin the session. A user with sufficient privileges can change the current role in the database session using the SET ROLE SQL command.

If no role is specified in the assignment rule and the query tags match, the transaction is assigned to the specified resource group. If the roleName value is present, however, the current database user must match the specified role.

queryTags
A list of query tags to match against the gpcc.query_tags parameter in the Greenplum Database session. A query tag is a user-defined <name>=<value> pair. Separate multiple query tags with semicolons. For example, the following statement, executed in the Greenplum Database session, sets the appName and appUser query tags to “tableau” and “bi_sales”, respectively.

=# SET gpcc.query_tags TO 'appName=tableau;appUser=bi_sales';

To match, all tags in the assignment rule’s queryTags value must be present in the gpcc.query_tags parameter in the database session. The order of the tags is not significant, and the gpcc.query_tags parameter can be a superset of the tags defined in the queryTags value.

If the queryTags value is empty, or the parameter omitted, the rule will match every session for the roleName database role.

If you set the gpcc.query_tags parameter inside of a transaction, you must commit the transaction before the new query tags are used to evaluate assignment rules.

The gpcc.query_tags parameter can be specified using the SET command, as above, or as a connection parameter with database clients that support it, such as psql. Here are two examples that show how to specify query tags on the psql command line:

$ PGOPTIONS="-c gpcc.query_tags='appName=tableau;appUser=bi_sales'" psql

$ psql postgresql://mdw:5432/postgres?options="-c gpcc.query_tags%3D'appName%3Dtableau;appUser%3Dbi_sales'"

In the second example, it is necessary to code the equals signs as %3D to prevent psql from interpreting the query tags as command-line arguments.

disabled
If set to true, the assignment rule is ignored when Command Center evaluates rules to assign transactions to workloads. This parameter is optional and its default value is "false".

idleSessionKillRules list

The value of the idleSessionKillRules object is a list of key/value pairs, where the key is the name of a Greenplum Database resource group. Each resource group should only appear as a key in the list once. The value is an object that contains the following key/value pair.

timeoutSeconds
The number of seconds after a session becomes idle before the session is killed. The value should generally be large enough to allow interactive users to enter SQL commands. A value of 600, for example, kills the session after 10 minutes in the idle state.

Rule Examples

Simple Assignment Rule Example

This example demonstrates how the Greenplum Database workload management extension assignment works to set the resource group for a transaction.

With the default resource groups, admin_group and default_group, any transaction the gpmon user executes is assigned to the admin_group, since gpmon is a superuser. This example enables the gpmon role to set query tags to temporarily execute transactions in the default_group resource group.

  1. Log in to the gpperfmon database with the gpmon role.

    gpperfmon=# SELECT rsgname FROM pg_stat_activity
    WHERE procpid = pg_backend_pid();
       rsgname
    -------------
     admin_group
    (1 row)
    

    If the result is empty, resource groups are not properly enabled. See Troubleshooting Workload Management for help getting resource groups set up.

  2. On the Command Center Admin> Workload Mgmt view, add the following assignment rule to the Assignment Rules JSON and click the Update button.

    {
      "resourceGroupName": "default_group",
      "roleName": "gpmon",
      "queryTags": "transType=default;app=psql"
    }
    
  3. In the gpperfmon database session, set the gpcc.query_tags parameter to transType=default and check the resource group.

    gpperfmon=# SET gpcc.query_tags TO 'transType=default';
    gpperfmon=# SELECT rsgname FROM pg_stat_activity WHERE procpid = pg_backend_pid();
       rsgname
    -------------
     admin_group
    (1 row)    
    

    The resource group is still admin_group because the query tag does not match all of the tags defined in the assignment rule.

  4. Set the gpcc.query_tags parameter to app=psql;reason=testing;transType=default and check the resource group.

    gpperfmon=# SET gpcc.query_tags TO 'app=psql;reason=testing;transType=default';
    gpperfmon=# SELECT rsgname FROM pg_stat_activity WHERE procpid = pg_backend_pid();
        rsgname
    ---------------
     default_group
    (1 row)
    

    Subsequent transactions in this session are assigned to the default_group resource group. Notice that you can set the query tags in any order and that tags not specified in the assignment rule queryTags value are ignored.

Benchmark Assignment Rule Example

This example tests two scenarios using a TPC-H benchmark test. (See the TPC web site for information about TPC benchmarks and source code.)

In scenario one, two roles (tpch_1 and tpch_4) execute a parallel workload in separate resource groups configured identically. In scenario two, both roles execute the workload in the same resource group with memory, cpu, and concurrency limits doubled.

Four assignment rules defined in the Command Center Admin> Workload Mgmt view assign transactions to resource groups for the two scenarios.

{
  "version": 1,
  "assignmentRules": [
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_1",
      "queryTags": "scenario=one",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group2",
      "roleName": "tpch_4",
      "queryTags": "scenario=one",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_1",
      "queryTags": "scenario=two",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_4",
      "queryTags": "scenario=two"
    }
  ]
}

The first two rules, with query tag "scenario=one", assign transactions executed by role tpch_1 to the tpch_group1 resource group and transactions executed by role tpch_4 to the tpch_group2 resource group. The last two rules, with query tag "scenario=two", assign transactions from both roles tpch_1 and tpch_4 to the tpch_group1 resource group.

Here is the resource group configuration for scenario one.

=# CREATE RESOURCE GROUP tpch_group1 with (CPU_RATE_LIMIT=25, MEMORY_LIMIT=20, CONCURRENCY=20);
=# CREATE RESOURCE GROUP tpch_group2 with (CPU_RATE_LIMIT=25, MEMORY_LIMIT=20, CONCURRENCY=20);
=# SELECT groupname, cpu_rate_limit, memory_limit, concurrency FROM 
    gp_toolkit.gp_resgroup_config;
   groupname   | cpu_rate_limit | memory_limit | concurrency
---------------+----------------+--------------+-------------
 default_group | 25             | 30           | 20
 admin_group   | 10             | 10           | 10
 tpch_group1   | 25             | 20           | 20
 tpch_group2   | 25             | 20           | 20
(4 rows)

Setting query tags for the transactions requires setting the gpcc.query_tags Greenplum Database session variable. For scenario one the gpcc.query_tags session parameter should be set as follows:

=# SET gpcc.query_tags to 'scenario=one';

Given the assignment rules we created, if the gpcc.query_tags parameter is not set, or is not set to 'scenario=one' or 'scenario=two' all transactions are assigned to the role’s default resource group, which is default_group if you have not explicitly set it.

If the client application allows defining session parameters on the connection string, you could define the gpcc.query_tags parameter on the connection request.

This example submits queries in SQL script files using the psql client utility, so the gpcc.query_tags parameter must be defined whenever psql starts. The psql command-line does not have an option to set arbitrary parameters, so we arrange for the SQL script to read the query tags from an OS environment variable.

  1. Set an environment variable to the content you want to define for gpcc.query_tags.

    $ export GPCC_QUERY_TAGS='scenario=one'
    
  2. Set the gpcc.query_tags session variable in the SQL script by inserting these lines before the start of the transaction.

    \set qtags '\'' `echo "$GPCC_QUERY_TAGS"` '\''
    SET gpcc.query_tags to :qtags;
    

When you execute the benchmark test, you can watch the Query Monitor to see that transactions are assigned to workload tpch_group1 or tpch_group2 according to the assignment rules.

Scenario One Query Monitor

This is the resource group configuration for scenario two.

=# DROP RESOURCE GROUP tpch_group2;
=# ALTER RESOURCE GROUP tpch_group1 SET CPU_RATE_LIMIT 50;
=# ALTER RESOURCE GROUP tpch_group1 SET MEMORY_LIMIT 40;
=# ALTER RESOURCE GROUP tpch_group1 SET concurrency 40;
=# SELECT groupname, cpu_rate_limit, memory_limit, concurrency
    FROM gp_toolkit.gp_resgroup_config;
   groupname   | cpu_rate_limit | memory_limit | concurrency
---------------+----------------+--------------+-------------
 default_group | 25             | 30           | 20
 admin_group   | 10             | 10           | 10
 tpch_group1   | 50             | 40           | 40
(3 rows)

Set the GPCC_QUERY_TAGS environment variable for scenario two.

export GPCC_QUERY_TAGS='scenario=two'

Run the benchmark test again and compare the workload assignments on the Query Monitor. All transactions should be executed in the tpch_group1 resource group.

Scenario Two Query Monitor

Accessing Workload Assignment Rules with Database Functions

The Greenplum Database workload management extension (gp_wlm) creates two user-defined functions in the postgres database to retrieve or set the workload assignment rules. The Greenplum Command Center Workload Management user interface calls these functions to display the current JSON string and to save your edits.

You can call these functions to programmatically query or configure workload management rules.

Warning! The gpcc.workload_config table should only be accessed by using the gpcc.get_workload_config() and gpcc.set_workload_config() functions or the Command Center user interface. Do not drop the table while the workload management extension is enabled.

The gpcc.set_workload_config() function requires valid JSON syntax, but does not validate the workload management rules. You must ensure that the JSON value contains a version variable, and correctly specified assignment and idle session kill rules.

gpcc.get_workload_config()
Retrieves the current workload assignment rules as a JSON value.

Example

postgres=# SELECT gpcc.get_workload_config();
                  get_workload_config
--------------------------------------------------------
 {
    "version": 1,
    "assignmentRules": [
      {
        "resourceGroupName": "default_group",
        "roleName": "gpadmin",
        "queryTags": "appName=tableau;appUser=bi_sales"
      },
      {
        "resourceGroupName": "admin_group",
        "roleName": "gpadmin",
        "queryTags": "appName=tableau;appUser=bi_acct",
        "disabled": true
      }
    ],
    "idleSessionKillRules": {
      "admin_group": {
        "timeoutSeconds": 600
      },
      "default_group": {
        "timeoutSeconds": 3600
      }
    }
  }

(1 row)

gpcc.set_workload_config()
Sets the workload assignment rules. The argument is a valid JSON value containing the assignment rules an. See JSON Parameters for descriptions of the parameters.

The gpcc.set_workload_config() function accepts any valid JSON value. You must ensure that the value contains a version element and properly specified assignment and idle session kill rules.

If you call gpcc.set_workload_config() within a transaction, you must commit the transaction before the workload managment extension applies the new rules.

Example

postgres=# SELECT gpcc.set_workload_config(
'
  { "version": 1, 
  "assignmentRules": 
     [ 
       {
         "resourceGroupName": "default_group",
         "roleName": "gpadmin",
         "queryTags": "appName=tableau;appUser=bi_sales"
       },
       {
         "resourceGroupName": "admin_group",
         "roleName": "gpadmin",
         "queryTags": "appName=tableau;appUser=bi_acct",
         "disabled": true
      }
    ],
    "idleSessionKillRules": 
    { 
      "admin_group": 
        { "timeoutSeconds": 600 },
      "default_group": 
        { "timeoutSeconds": 3600 } 
    }
  }
' 
);
 set_workload_config
---------------------
 t
(1 row)