Alerts

On the Admin > Alerts page, an administrator can set up alert rules to detect and respond to events occurring in the Greenplum Database system and in currently executing database queries. When a rule is matched, Command Center logs a record.

You can set up email alerts by configuring an SMTP server in Greenplum Database or in Command Center. Additionally, you can create a send_alert.sh shell script to forward alerts to other destinations, such as an SMS gateway or a Slack channel. If the script is present, Command Center runs it whenever an alert is raised.

Command Center creates the gpmetrics schema in the gpperfmon database to store both rules and log records. See gpmetrics Schema Reference for information about the gpcc_alert_rule and gpcc_alert_log tables in the gpmetrics schema.

Configuring Alert Rules

Click EDIT to manage alert event rules. To enable an alert rule, enter any data required in the fields and check the box. Uncheck the box to disable the rule. Click SAVE when you have finished making changes to the alert configuration.

Cluster Level

Database connectivity failure

An alert is raised when either of these conditions is detected:
  • Command Center is running on the master host, the master host is up, but the database is down or cannot be reached.
  • Command Center is running on the standby master host, the standby master host is up, but the master host is down or cannot be reached, or the database is down or cannot be reached.

Command Center checks three times before raising the alert. If the host where Command Center should be running is down, no alert is raised.

Number of connections exceeds [N]

An alert is raised when the total number of database connections exceeds the number specified. The number of connections is checked every 30 seconds. After an alert is raised, the metrics collector checks the number of connections every 30 minutes until the number of connections drops below the threshold, and then it resumes checking every 30 seconds.

Master Level

Master Panic

An alert is raised when messages with PANIC severity are detected at the master host. A PANIC error causes all database sessions to abort.

Master Fatal

An alert is raised when messages with FATAL severity are detected at the master host. A FATAL error causes the current session to abort. To reduce noise, fatal messages logged due to password authentication errors do not raise alerts.

Master CPU exceeds [%] for [N] min

An alert is raised when the CPU usage on the master host exceeds the specified percentage for the specified number of minutes. Command Center samples CPU usage on the master host every 15 seconds and calculates the mean of the samples.
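As a rough illustration of the threshold check described above, the following sketch averages a window of 15-second samples and compares the mean with the configured percentage. The mean_exceeds function and the sample values are hypothetical, and the way Command Center actually windows its samples is an assumption here.

```shell
# mean_exceeds THRESHOLD SAMPLE...
# Prints the mean of the samples and returns 0 if the mean is above THRESHOLD.
# Hypothetical helper; not part of Command Center.
mean_exceeds() {
    threshold=$1
    shift
    [ "$#" -gt 0 ] || return 2          # no samples, nothing to compare
    printf '%s\n' "$@" | awk -v t="$threshold" '
        { sum += $1; n++ }
        END {
            mean = sum / n
            printf "%.2f\n", mean
            exit (mean > t ? 0 : 1)
        }'
}

# A 1-minute window holds four 15-second samples.
if mean_exceeds 80 90 85 70 95 >/dev/null; then
    echo "alert: mean CPU above 80%"
fi
```

Because the mean is compared rather than each sample, a single low sample (70 in this example) does not prevent the alert.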

Master Memory exceeds [%] for [N] min

An alert is raised when the percent of memory used on the master host exceeds the specified percentage for the specified number of minutes. Command Center samples memory usage on the master host every 15 seconds and calculates the mean of the samples. Only memory in use is considered; memory for buffers and cache is not included.

Segment Level

Segment failure

An alert is raised when one or more failed segments are detected. After the alert is raised, Command Center raises it again every 30 minutes until the segments are recovered.

Total disk space exceeds [%] on a segment host

An alert is raised when the total disk space in use across all segment hosts exceeds the specified percentage. Command Center gathers the available disk space and total disk space from each segment host in the Greenplum Database cluster. The percent of total disk space in use is calculated by the following formula:
     100 - sum(<available disk space>) / sum(<total disk space>) * 100
A disk space alert is raised no more than once every 24 hours.
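To make the formula above concrete, here is a small worked sketch. The disk_used_percent function and the sample figures are hypothetical; it only reproduces the arithmetic and does not query a cluster.

```shell
# disk_used_percent reads "available total" pairs on stdin, one segment host
# per line and all in the same unit, and prints
# 100 - sum(available) / sum(total) * 100.
# Hypothetical helper for illustration only.
disk_used_percent() {
    awk '
        NF >= 2 { avail += $1; total += $2 }
        END {
            if (total == 0) exit 1      # avoid division by zero
            printf "%.1f\n", 100 - avail / total * 100
        }'
}

# Three segment hosts with 1000 GB each and 400, 250, and 100 GB available.
printf '%s\n' "400 1000" "250 1000" "100 1000" | disk_used_percent   # prints 75.0
```

Note that the sums are taken over all hosts, so in this example the third host is 90% full while the calculated value is 75%.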

Segment Hosts Avg CPU exceeds [%] for [N] min

An alert is raised when the average percent of CPU used for all segment hosts exceeds the specified percentage for the specified number of minutes. Command Center samples all segment hosts every 15 seconds and calculates the mean of the samples.

Segment Hosts Avg Memory exceeds [%] for [N] min

An alert is raised when the average percent of memory used for all segment hosts exceeds the specified percentage for the specified number of minutes. Command Center samples all segment hosts every 15 seconds and calculates the mean of the samples. Only memory in use is considered; memory for buffers and cache is not included.

Query Level

Out of memory errors

An alert is raised when an executing query fails with an out of memory (OOM) error. Note that no alert is raised if there is insufficient memory to start the query.

Spill files for a query exceeds [N] GB

An alert is raised when the total disk space consumed by a running query’s spill files exceeds the specified number of gigabytes. An alert is raised only once per query.

Query runtime exceeds [N] min

An alert is raised when a query runtime exceeds the number of minutes specified. This alert is raised just once for a query.

Query is blocked for [N] min

An alert is raised if a query remains in a blocked state for longer than the specified number of minutes. If an alert is raised, and then the query unblocks, runs, and blocks again for the specified time, an additional alert is raised. Blocked time excludes the time a query is queued before it runs. It is possible for a “Query runtime exceeds [N] min” rule to also trigger while a query is blocked.

Configuring Alert Email

Command Center requires an SMTP server to send alert emails.

Command Center first attempts an encrypted TLS connection and then falls back to an unencrypted connection if TLS is not supported. The SMTP server must support one of the following authentication methods: NONE, PLAIN, LOGIN, or CRAM-MD5. Command Center will use the most secure of these methods the SMTP server allows.

Configuring email With Command Center

Click EDIT in the Manage email configuration panel.

Enter the alert email configuration in the following fields:

SMTP Server address

The name or IP address of the SMTP server and the SMTP port number. The port number is typically 587 for connections that upgrade to TLS encryption with STARTTLS, 465 for SMTP over implicit TLS, or 25 for connections without encryption. Example: smtp.example.com:465

Username

The username of the account to authenticate with the SMTP server. This is an optional field, only required if the SMTP server requires a username for authentication. Example: gpcc-alerts@example.com

Password

The password for the SMTP username. For security, the password is masked. This field is optional, only needed if the SMTP server requires a username and password for authentication.

Send emails to

The email addresses that receive alert emails. To add an address to the list, enter the address and press Enter. To remove an email address, click the X on the address.

From

The email address to use for the From: address in the alert email. Example: do-not-reply@example.com. If you leave this field blank, Command Center uses the default value, noreply-gpcc-alerts@pivotal.io.

When you click SAVE, Command Center sends a test email to the addresses in the Send emails to field. The email contains a list of the currently configured alert rules. If there is an error in the SMTP server or username/password configuration and the email cannot be sent, Command Center displays an error message.

Configuring email for Greenplum Database

The following server configuration parameters are used to configure SMTP email for Greenplum Database.

gp_email_smtp_server

The SMTP server and port. Example: smtp.example.com:465

gp_email_smtp_userid

The name of a user to authenticate with the SMTP service. Example: gpcc-alerts@example.com

gp_email_smtp_password

The password for the SMTP user.

gp_email_from

The email address to set as the email sender. Example: noreply-gpcc-alerts@example.com

gp_email_to

A semicolon-separated list of email addresses to receive alert messages. Example: gpcc-admin@example.com;gpdb-admin@example.com

Command Center uses the gp_email_smtp_server, gp_email_smtp_userid, and gp_email_smtp_password parameters if they are set. It ignores the remaining parameters.

You can check the current value of a configuration parameter by running the gpconfig -s command on the master host, for example:

$ gpconfig -s gp_email_smtp_server

Use the gpconfig -c option to set the values of server configuration parameters, for example:

$ gpconfig -c gp_email_smtp_server -v "smtp.example.com:465"
$ gpconfig -c gp_email_smtp_userid -v "gpcc-alerts@example.com"
$ gpconfig -c gp_email_smtp_password -v "changeme"
$ gpconfig -c gp_email_from -v "gpcc-alerts@example.com"
$ gpconfig -c gp_email_to -v "gpcc-admin@example.com;gpdb-admin@example.com"

Run gpstop -u to reload the configuration files after changing these configuration parameters.
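When the recipient list is maintained elsewhere, the semicolon-separated value for gp_email_to can be assembled in the shell. The sketch below uses a hypothetical join_addresses helper and prints the resulting gpconfig command for review instead of running it.

```shell
# join_addresses ADDRESS...
# Joins its arguments with semicolons, the separator gp_email_to expects.
# Hypothetical helper; not part of Greenplum Database.
join_addresses() {
    ( IFS=';'; printf '%s\n' "$*" )
}

recipients=$(join_addresses gpcc-admin@example.com gpdb-admin@example.com)

# Print the command so it can be checked before it is run on the master host.
printf 'gpconfig -c gp_email_to -v "%s"\n' "$recipients"
```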

Creating a Send Alert Script

The send alert script is a shell script that you can use to send Command Center alerts to destinations such as SMS gateways, pagers, team collaboration tools like Slack, chat servers, archive files, alternative email servers, and so on. You can use the send alert script in addition to sending email from Command Center, or as an alternative to sending alert emails from Command Center.

Command Center looks for the script $MASTER_DATA_DIRECTORY/gpmetrics/send_alert.sh on the host where Command Center is running—either the master host or standby host. If the file exists and is executable by the gpadmin user, Command Center executes the script. The following variables are set on the command line when the script runs.

Variable         Description
LINK             URL of the Greenplum Command Center web server.
QUERYID          ID of the query, if the alert was triggered by a query.
SERVERNAME       Name of the Greenplum Command Center server.
QUERYTEXT        The text of the query, if the alert was triggered by a query.
ACTIVERULENAME   Current text of the rule, with user-specified values included.
LOGID            Value of this alert’s id column in the gpmetrics.gpcc_alert_log table.
RULEDESCRIPTION  Text of the rule, including user-specified values, at the time the alert was raised.
ALERTDATE        Date the alert was raised.
ALERTTIME        Time the alert was raised.
SUBJECT          Subject line for email.
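A minimal sketch of a send alert script that appends each alert to an archive file, using only variables from the table above. The ALERT_ARCHIVE variable, its default path, and the line layout are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of a send_alert.sh that archives alerts to a file.
# ALERT_ARCHIVE and the line layout are hypothetical choices.

archive_alert() {
    archive=${ALERT_ARCHIVE:-/tmp/gpcc_alerts.log}
    # One line per alert: date, time, log ID, server, and rule text.
    printf '%s %s id=%s server=%s rule="%s"\n' \
        "${ALERTDATE:-}" "${ALERTTIME:-}" "${LOGID:-}" \
        "${SERVERNAME:-}" "${RULEDESCRIPTION:-}" >> "$archive"
    # Alerts triggered by a query also carry the query ID and text.
    if [ -n "${QUERYID:-}" ]; then
        printf '    query %s: %s\n' "$QUERYID" "${QUERYTEXT:-}" >> "$archive"
    fi
}

# Only write when the script is invoked with an alert to deliver.
if [ -n "${RULEDESCRIPTION:-}" ]; then
    archive_alert
fi
```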

An example script that you can customize is provided at $GPCC_HOME/alert-email/send_alert.sh.sample. The example formats the alert as HTML email text and pipes it through the Linux mail command.

To set up a send alert script:

  1. Copy the $GPCC_HOME/alert-email/send_alert.sh.sample file to $MASTER_DATA_DIRECTORY/gpmetrics/send_alert.sh.

  2. Customize the script with code to format and deliver the alert to your desired destination.

  3. Run gpcc start to restart Command Center and enable the script.
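As one possible customization for step 2, the sketch below posts the alert to a Slack incoming webhook with curl. SLACK_WEBHOOK_URL is a hypothetical variable that you would define yourself, the message layout is an arbitrary choice, and the escaping covers only backslashes, double quotes, tabs, carriage returns, and newlines.

```shell
#!/bin/bash
# Sketch of a send_alert.sh that posts to a Slack incoming webhook.
# SLACK_WEBHOOK_URL is a hypothetical variable, not set by Command Center.

# json_escape TEXT: make TEXT safe to embed in a JSON string.
# Tabs and carriage returns become spaces; other control characters
# are not handled.
json_escape() {
    printf '%s' "$1" | tr '\t\r' '  ' \
        | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
        | awk 'NR > 1 { printf "\\n" } { printf "%s", $0 }'
}

# build_payload: format the alert variables as a Slack message body.
build_payload() {
    text="${SUBJECT:-Command Center alert}: ${RULEDESCRIPTION:-} (${ALERTDATE:-} ${ALERTTIME:-}) ${LINK:-}"
    printf '{"text": "%s"}' "$(json_escape "$text")"
}

# Post only when a webhook URL has been configured.
if [ -n "${SLACK_WEBHOOK_URL:-}" ]; then
    curl -sS -X POST -H 'Content-Type: application/json' \
        --data "$(build_payload)" "$SLACK_WEBHOOK_URL"
fi
```

Keeping the payload construction in its own function makes it possible to check the message format without sending anything.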

SNMP Alert Script Example

SNMP support is discontinued in Greenplum Database 6, but you can use a Command Center alert script to send alerts to an SNMP manager.

This example alert script sends Command Center alerts as SNMP notifications in a format similar to Greenplum Database 5 SNMP notifications, using the SNMP OIDs for Greenplum Database as arguments.

This is the OID hierarchy structure for Greenplum Database:

iso(1)
  identified-organization(3)
    dod(6)
      internet(1)
        private(4)
          enterprises(1)
            gpdbMIB(31327)
              gpdbObjects(1)
                gpdbAlertMsg(1)

The following table lists the Greenplum Database OIDs:

Name                 Address                Type     Description
gpdbAlertMsg         1.3.6.1.4.1.31327.1.1  string   alert message text
gpdbAlertSeverity    1.3.6.1.4.1.31327.1.2  integer  severity level
gpdbAlertSqlState    1.3.6.1.4.1.31327.1.3  string   SQL standard error codes
gpdbAlertDetail      1.3.6.1.4.1.31327.1.4  string   detailed alert message text
gpdbAlertSqlStmt     1.3.6.1.4.1.31327.1.5  string   SQL statement generating this alert, if applicable
gpdbAlertSystemName  1.3.6.1.4.1.31327.1.6  string   hostname

Command Center sets environment variables that can be mapped to the gpdbAlertMsg, gpdbAlertSqlStmt, and gpdbAlertSystemName OIDs. For the other OIDs, the script sends an empty string or 0, depending on data type.
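The relationship between the object names and their numeric addresses can be sketched as a small lookup. The gpdb_oid function is a hypothetical helper built only from the values listed above.

```shell
# gpdb_oid NAME: print the numeric OID for a Greenplum Database alert object.
# Hypothetical helper; the values come from the OID list in this topic.
gpdb_oid() {
    base=1.3.6.1.4.1.31327.1      # enterprises(1.3.6.1.4.1).gpdbMIB.gpdbObjects
    case $1 in
        gpdbAlertMsg)        echo "$base.1" ;;
        gpdbAlertSeverity)   echo "$base.2" ;;
        gpdbAlertSqlState)   echo "$base.3" ;;
        gpdbAlertDetail)     echo "$base.4" ;;
        gpdbAlertSqlStmt)    echo "$base.5" ;;
        gpdbAlertSystemName) echo "$base.6" ;;
        *) echo "unknown OID name: $1" >&2; return 1 ;;
    esac
}

gpdb_oid gpdbAlertSqlStmt    # prints 1.3.6.1.4.1.31327.1.5
```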

Prerequisites

The SNMP service must be configured and enabled on the host where Command Center is running. If the snmpd file is not present in the /usr/sbin directory, then SNMP is not installed on the system.

To install SNMP on Red Hat and CentOS, use yum to install the net-snmp, net-snmp-libs, and net-snmp-utils packages. Configure SNMP following the operating system documentation. This example script specifies the SNMP community public, but you can set a different community string with the rocommunity parameter in the /etc/snmp/snmpd.conf file.

Enable and start the snmpd daemon with the systemctl command:

$ sudo systemctl enable snmpd.service
$ sudo systemctl start snmpd.service

Verify the system snmpd daemon is running with this command:

$ sudo snmpwalk -v 1 -c public localhost .1.3.6.1.2.1.1.1.0

If the daemon is running, the snmpwalk command output is similar to the following:

SNMPv2-MIB::sysDescr.0 = STRING: Linux gp-mdw 3.10.0-957.12.2.el7.x86_64 #1 SMP Tue May 14 21:24:32 UTC 2019 x86_64

Setting up SNMP Notifications

  1. As the gpadmin user, create the file $MASTER_DATA_DIRECTORY/gpmetrics/send_alert.sh with the following contents:

    #!/bin/bash
    # Database uptime in hundredths of a second, passed as the notification's uptime value.
    RUNNING_TIME=$(psql -At -c 'select (extract(epoch from (current_timestamp - pg_postmaster_start_time())) * 100)::bigint;')
    HOST=$(hostname)
    # Send an SNMPv2c notification. RULEDESCRIPTION and QUERYTEXT are set by Command Center.
    snmptrap -v 2c -c public \
        {##trapserver##} \
        "${RUNNING_TIME}" \
        "enterprises.31327.5.0.1" \
        SNMPv2-SMI::enterprises.31327.1.1 s "${RULEDESCRIPTION}" \
        SNMPv2-SMI::enterprises.31327.1.2 i 0 \
        SNMPv2-SMI::enterprises.31327.1.3 s "" \
        SNMPv2-SMI::enterprises.31327.1.4 s "" \
        SNMPv2-SMI::enterprises.31327.1.5 s "${QUERYTEXT}" \
        SNMPv2-SMI::enterprises.31327.1.6 s "${HOST}"
    

    Replace {##trapserver##} with the name of the host that receives the notifications, for example the host running the SNMP manager or the snmptrapd trap receiver.

  2. Make the script executable.

    $ chmod +x $MASTER_DATA_DIRECTORY/gpmetrics/send_alert.sh
    
  3. Restart Command Center to enable the script.

    $ gpcc start
    

Testing the SNMP Alert Script

To test the script, you can use the snmptrapd trap receiver.

  1. As root, edit the /etc/snmp/snmptrapd.conf file and add the following line to allow the daemon to receive unauthenticated notifications:

    disableAuthorization yes
    
  2. Start the snmptrapd daemon with this command:

    # /usr/sbin/snmptrapd -m ALL -Lf ~/snmptest.log
    

Check the snmptest.log file in the home directory of the user that started snmptrapd (typically /root/snmptest.log for root) for notifications.
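To produce a notification without waiting for a real alert, you can run the script by hand with sample values for the variables that Command Center would set. The run_sample_alert function and every value in it are made up for this sketch.

```shell
# run_sample_alert SCRIPT: run a send alert script with sample values in its
# environment. Hypothetical helper; all values are invented test data.
run_sample_alert() {
    RULEDESCRIPTION="Query runtime exceeds 1 min" \
    QUERYTEXT="select pg_sleep(120);" \
    QUERYID="sample-query-id" \
    ALERTDATE="2024-01-01" \
    ALERTTIME="12:00:00" \
    SUBJECT="Test alert" \
        /bin/bash "$1"
}

# Run the installed script if it exists on this host.
script="${MASTER_DATA_DIRECTORY:-}/gpmetrics/send_alert.sh"
if [ -f "$script" ]; then
    run_sample_alert "$script"
fi
```

The sample values are passed only to the script's environment, so they do not linger in the calling shell.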