Greenplum Command Center reports the total size for all spill files created for a query.
Greenplum Database creates spill files, also called workfiles, to save intermediate results when there is insufficient memory to execute a query in memory. Disk I/O is much slower than memory access, so a query that creates spill files will take longer to complete.
gp_toolkit schema contains views you can use to see details about spill file usage for current queries. You can see the number and sizes of spill files created for each operator in a query execution plan, and totals by query and segment. This is useful information to detect data skew and to help tune queries.
See the gp_toolkit Administrative Schema reference for descriptions of these views.
You can work to eliminate spill files by increasing the amount of memory available to the query or by optimizing the query to use the memory available more efficiently.
You may be able to revise the query to prevent spilling by eliminating or postponing memory-intensive operators.
Following are some ways to increase memory available to queries when resource group resource management is enabled in Greenplum Database.
- Decrease the resource group’s concurrency so that each query’s share of memory increases.
- Increase the resource group’s
MEMORY_SHARED_QUOTAparameter to increase the amount of resource group shared memory.
- Decrease the percentage of memory allocated to all resource groups to increase the amount of global shared memory.
When resource queue resource management is active, Greenplum Database can detect and terminate “runaway” queries that consume a high percentage of available memory. You can prevent runaway queries by limiting the number of spill files created or the total size of spill files created. See the
gp_workfile_limit* configuration parameters below for more information.
If you cannot prevent queries from spilling, it is important to ensure that the number of spill files created is mnimized and that problems such as CPU or data skew are found and corrected. Skew can create excessive numbers of spill files on one or more segments.
To minimize disk usage and I/O when spill files are created, make sure the
gp_workfile_compress_algorithm configuration parameter is set to ‘zlib’ and not 'none’.
Greenplum Database by default limits the number of spill files allowed per query for each segment to 100,000. You can raise or lower this limit, and you can also limit the number of spill files for all queries on a segment, and limit the disk space consumed by spill files per query and per segment. Use the following Greenplum Database server configuration parameters to manage spill files.
'none'. Set to
'zlib'to enable compression. Using compression reduces the number of I/O operations at the expense of increased CPU.