Managing Workload Query Assignment Rules

When the Greenplum Command Center Workload Management feature is enabled, Greenplum Database defers to the workload management database extension to assign transactions to resource groups. Command Center users with Admin permission level can create assignment rules in Command Center to assign transactions to resource groups.

When a transaction begins, Greenplum Database calls the workload management extension to determine the resource group. The extension evaluates the assignment rules and, if a matching rule is found, returns that rule’s resource group. If no assignment rule matches, Greenplum Database falls back to the default behavior, assigning the transaction to the resource group specified for the current user in the pg_authid system table.

Editing Assignment Rules

In future Greenplum Command Center releases, you will be able to use the Workload Management user interface to interactively define workloads and assignment rules. In the current release, you must edit the JSON document the workload management database extension evaluates to determine the workload for a transaction.

JSON Editor

Greenplum Command Center saves the JSON document in a Greenplum Database table, gpcc.gp_wlm_config, in the postgres database. When you save your changes, Command Center updates the document in the database. The size of the JSON document must not exceed 1MB.

Warning! Never manually modify the gpcc.gp_wlm_config table.

When you first enable Workload Management in Command Center, a sample JSON document with two disabled assignment rules is displayed. The sample has rules for the two default Greenplum Database resource groups, admin_group and default_group.

{
  "version": 1,
  "assignmentRules": [
    {
      "resourceGroupName": "admin_group",
      "roleName": "optionalRoleToFilterWith",
      "queryTags": "exampleKey1=exampleValue1;exampleKey2=exampleValue2",
      "disabled": true
    },
    {
      "resourceGroupName": "default_group",
      "queryTags": "exampleKey1=exampleValue1;exampleKey2=exampleValue2",
      "disabled": true
    }
  ]
}

JSON Parameters

This section describes the JSON document that stores the workload rules.

version
The version parameter takes an integer value and is reserved for future use. It can be set to 1.

assignmentRules
The assignmentRules array has one entry for each assignment rule. Each entry maps the workload to a resource group in Greenplum Database and defines the conditions that assign a transaction to that resource group.

Greenplum Command Center evaluates each rule in the assignmentRules array from top to bottom and stops at the first match, so the order is important.

resourceGroupName
The name of the Greenplum Database resource group. In the current release, the resource group must already exist or have been created in Greenplum Database with the CREATE RESOURCE GROUP SQL statement.

roleName
The name of a Greenplum Database role to match against the current role identifier in the Greenplum Database session. The current role is initially the database role that authenticated with the Greenplum Database system to begin the session. A user with sufficient privileges can change the current role in the database session using the SET ROLE SQL command.

If no role is specified in the assignment rule and the query tags match, the transaction is assigned to the specified resource group. If the roleName parameter is present, however, the current database user must match the specified role.

queryTags
A list of query tags to match against the gpcc.query_tags parameter in the Greenplum Database session. A query tag is a user-defined <name>=<value> pair. Separate multiple query tags with semicolons. For example, the following statement, executed in the Greenplum Database session, sets the appName and appUser query tags to “tableau” and “bi_sales”, respectively.

=# SET gpcc.query_tags TO 'appName=tableau;appUser=bi_sales';

Assignment rules are evaluated when a transaction begins. If you set query tags inside of a transaction BEGIN/END block, the new query tags are not evaluated until after the transaction commits.

To match, all tags in the assignment rule’s queryTags value must be present in the gpcc.query_tags parameter in the database session. The order of the tags is not significant, and the gpcc.query_tags parameter can be a superset of the tags defined in the queryTags value.

If the queryTags value is empty, or the parameter omitted, the rule will match every session for the roleName database role.

If you set the gpcc.query_tags parameter inside of a transaction, you must commit the transaction before the new query tags are used to evaluate assignment rules.

The gpcc.query_tags parameter can be specified using the SET command, as above, or as a connection parameter with database clients that support it, such as psql. Here are two examples that show how to specify query tags on the psql command line:

$ PGOPTIONS="-c gpcc.query_tags='appName=tableau;appUser=bi_sales'" psql

$ psql postgresql://mdw:5432/postgres?options="-c gpcc.query_tags%3D'appName%3Dtableau;appUser%3Dbi_sales'"

In the second example, it is necessary to code the equals signs as %3D to prevent psql from interpreting the query tags as command-line arguments.

disabled
If set to true, the assignment rule is ignored when Command Center evaluates rules to assign transactions to workloads. This parameter is optional and its default value is "false".

Assignment Rule Examples

Simple Example

This example demonstrates how the Greenplum Database workload management extension assignment works to set the resource group for a transaction.

With the default resource groups, admin_group and default_group, any transaction the gpmon user executes is assigned to the admin_group, since gpmon is a superuser. This example enables the gpmon role to set query tags to temporarily execute transactions in the default_group resource group.

  1. Log in to the gpperfmon database with the gpmon role.

    gpperfmon=# SELECT rsgname FROM pg_stat_activity
    WHERE procpid = pg_backend_pid();
       rsgname
    -------------
     admin_group
    (1 row)
    

    If the result is empty, resource groups are not properly enabled. See Troubleshooting Workload Management for help getting resource groups set up.

  2. On the Command Center Admin> Workload Mgmt view, add the following assignment rule to the Assignment Rules JSON and click the Update button.

    {
      "resourceGroupName": "default_group",
      "roleName": "gpmon",
      "queryTags": "transType=default;app=psql"
    }
    
  3. In the gpperfmon database session, set the gpcc.query_tags parameter to transType=default and check the resource group.

    gpperfmon=# SET gpcc.query_tags TO 'transType=default';
    gpperfmon=# SELECT rsgname FROM pg_stat_activity WHERE procpid = pg_backend_pid();
       rsgname
    -------------
     admin_group
    (1 row)    
    

    The resource group is still admin_group because the query tag does not match all of the tags defined in the assignment rule.

  4. Set the gpcc.query_tags parameter to app=psql;reason=testing;transType=default and check the resource group.

    gpperfmon=# SET gpcc.query_tags TO 'app=psql;reason=testing;transType=default';
    gpperfmon=# SELECT rsgname FROM pg_stat_activity WHERE procpid = pg_backend_pid();
        rsgname
    ---------------
     default_group
    (1 row)
    

    Subsequent transactions in this session are assigned to the default_group resource group. Notice that you can set the query tags in any order and that tags not specified in the assignment rule queryTags value are ignored.

Benchmark Example

This example tests two scenarios using a TPC-H benchmark test. (See the TPC web site for information about TPC benchmarks and source code.)

In scenario one, two roles (tpch_1 and tpch_4) execute a parallel workload in separate resource groups configured identically. In scenario two, both roles execute the workload in the same resource group with memory, cpu, and concurrency limits doubled.

Four assignment rules defined in the Command Center Admin> Workload Mgmt view assign transactions to resource groups for the two scenarios.

{
  "version": 1,
  "assignmentRules": [
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_1",
      "queryTags": "scenario=one",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group2",
      "roleName": "tpch_4",
      "queryTags": "scenario=one",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_1",
      "queryTags": "scenario=two",
      "disabled": false
    },
    {
      "resourceGroupName": "tpch_group1",
      "roleName": "tpch_4",
      "queryTags": "scenario=two"
    }
  ]
}

The first two rules, with query tag "scenario=one", assign transactions executed by role tpch_1 to the tpch_group1 resource group and transactions executed by role tpch_4 to the tpch_group2 resource group. The last two rules, with query tag "scenario=two", assign transactions from both roles tpch_1 and tpch_4 to the tpch_group1 resource group.

Here is the resource group configuration for scenario one.

=# CREATE RESOURCE GROUP tpch_group1 with (CPU_RATE_LIMIT=25, MEMORY_LIMIT=20, CONCURRENCY=20);
=# CREATE RESOURCE GROUP tpch_group2 with (CPU_RATE_LIMIT=25, MEMORY_LIMIT=20, CONCURRENCY=20);
=# SELECT groupname, cpu_rate_limit, memory_limit, concurrency FROM 
    gp_toolkit.gp_resgroup_config;
   groupname   | cpu_rate_limit | memory_limit | concurrency
---------------+----------------+--------------+-------------
 default_group | 25             | 30           | 20
 admin_group   | 10             | 10           | 10
 tpch_group1   | 25             | 20           | 20
 tpch_group2   | 25             | 20           | 20
(4 rows)

Setting query tags for the transactions requires setting the gpcc.query_tags Greenplum Database session variable. For scenario one the gpcc.query_tags session parameter should be set as follows:

=# SET gpcc.query_tags to 'scenario=one';

Given the assignment rules we created, if the gpcc.query_tags parameter is not set, or is not set to 'scenario=one' or 'scenario=two' all transactions are assigned to the role’s default resource group, which is default_group if you have not explicitly set it.

If the client application allows defining session parameters on the connection string, you could define the gpcc.query_tags parameter on the connection request.

This example submits queries in SQL script files using the psql client utility, so the gpcc.query_tags parameter must be defined whenever psql starts. The psql command-line does not have an option to set arbitrary parameters, so we arrange for the SQL script to read the query tags from an OS environment variable.

  1. Set an environment variable to the content you want to define for gpcc.query_tags.

    $ export GPCC_QUERY_TAGS='scenario=one'
    
  2. Set the gpcc.query_tags session variable in the SQL script by inserting these lines before the start of the transaction.

    \set qtags '\'' `echo "$GPCC_QUERY_TAGS"` '\''
    SET gpcc.query_tags to :qtags;
    

When you execute the benchmark test, you can watch the Query Monitor to see that transactions are assigned to workload tpch_group1 or tpch_group2 according to the assignment rules.

Scenario One Query Monitor

This is the resource group configuration for scenario two.

=# DROP RESOURCE GROUP tpch_group2;
=# ALTER RESOURCE GROUP tpch_group1 SET CPU_RATE_LIMIT 50;
=# ALTER RESOURCE GROUP tpch_group1 SET MEMORY_LIMIT 40;
=# ALTER RESOURCE GROUP tpch_group1 SET concurrency 40;
=# SELECT groupname, cpu_rate_limit, memory_limit, concurrency
    FROM gp_toolkit.gp_resgroup_config;
   groupname   | cpu_rate_limit | memory_limit | concurrency
---------------+----------------+--------------+-------------
 default_group | 25             | 30           | 20
 admin_group   | 10             | 10           | 10
 tpch_group1   | 50             | 40           | 40
(3 rows)

Set the GPCC_QUERY_TAGS environment variable for scenario two.

export GPCC_QUERY_TAGS='scenario=two'

Run the benchmark test again and compare the workload assignments on the Query Monitor. All transactions should be executed in the tpch_group1 resource group.

Scenario Two Query Monitor

Accessing Workload Assignment Rules with Database Functions

The Greenplum Database workload management extension (gp_wlm) creates two user-defined functions in the postgres database to retrieve or set the workload assignment rules. The Greenplum Command Center Workload Management user interface calls these functions to display the current JSON string and to save your edits.

You can call these functions to programmatically query or configure workload management rules.

Warning! The gpcc.workload_config table should only be accessed by using the gpcc.get_workload_config() and gpcc.set_workload_config() functions or the Command Center user interface. Do not drop the table while the workload management extension is enabled.

The gpcc.set_workload_config() function requires valid JSON syntax, but does not validate the workload management rules. You must ensure that the JSON value contains a version variable, and an assignmentRules array with correctly specified workload assignment rules.

gpcc.get_workload_config()
Retrieves the current workload assignment rules as a JSON value.

Example

postgres=# SELECT gpcc.get_workload_config();
            get_workload_config
-----------------------------------------------------
{
  "version": 1,
  "assignmentRules": [
    {
      "resourceGroupName": "default_group",
      "roleName": "gpmon",
      "queryTags": "transType=default;db=gpperfmon"
    },
    {
      "resourceGroupName": "etl",
      "roleName": "etl",
      "queryTags": "source=east",
      "disabled": false
    },
    {
      "resourceGroupName": "etl",
      "roleName": "etl",
      "queryTags": "source=west",
      "disabled": true
    },
    {
      "resourceGroupName": "ds_projects",
      "roleName": "ds_user",
      "queryTags": "app=madlib"
    }
  ]
}
(1 row)

gpcc.set_workload_config()
Sets the workload assignment rules. The argument is a valid JSON value containing the assignment rules. See JSON Parameters for descriptions of the parameters.

The gpcc.set_workload_config() function accepts any valid JSON value. You must ensure that the value contains a version element and a properly specified assignmentRules array.

If you call gpcc.set_workload_config() within a transaction, you must commit the transaction before the workload management extension begins to apply the new rules.

Example

postgres=# SELECT gpcc.set_workload_config('{
postgres'#   "version": 1,
postgres'#   "assignmentRules": [
postgres'#     {
postgres'#       "resourceGroupName": "default_group",
postgres'#       "roleName": "gpmon",
postgres'#       "queryTags": "transType=default;db=gpperfmon"
postgres'#     },
postgres'#     {
postgres'#       "resourceGroupName": "etl",
postgres'#       "roleName": "etl",
postgres'#       "queryTags": "source=east",
postgres'#       "disabled": false
postgres'#     },
postgres'#     {
postgres'#       "resourceGroupName": "etl",
postgres'#       "roleName": "etl",
postgres'#       "queryTags": "source=west",
postgres'#       "disabled": true
postgres'#     },
postgres'#     {
postgres'#       "resourceGroupName": "ds_projects",
postgres'#       "roleName": "ds_user",
postgres'#       "queryTags": "app=madlib"
postgres'#     }
postgres'#   ]
postgres'# }
postgres'# ');
 set_workload_config
---------------------
 t
(1 row)