Greenplum Command Center reports the total size for all spill files created for a query.
Greenplum Database creates spill files, also called workfiles, to save intermediate results when there is insufficient memory to execute a query in memory. Disk I/O is much slower than memory access, so a query that creates spill files will take longer to complete.
Managing Spill Files for Small Queries
For a time-critical query expected to return a small result set quickly, you can work to eliminate spill files by increasing the amount of memory available to the query or by optimizing the query to use the memory available more efficiently.
Make sure the query is not affected by data skew and that table statistics are up to date.
A query that performs memory-intensive operations like sorts, joins, and hashes on large tables may need to create spill files. For these queries, it is important to ensure that the number of spill files created is not excessive and that problems such as CPU or data skew are found and corrected. Skew can create excessive numbers of spill files on one or more segments.
Also, make sure the
gp_workfile_compress_algorithm configuration parameter is set to ‘zlib’ and not 'none’. Compressing spill files improves query performance by reducing I/O.
Investigating Spill File Usage
gp_toolkit schema contains views you can use to see details about spill file usage for current queries. You can see the number and sizes of spill files created for each operator in a query execution plan, and totals by query and segment. This is useful information to detect data skew and to help tune queries.
See the gp_toolkit Administrative Schema reference for descriptions of these views.
Limiting Spill Files
Greenplum Database by default limits the number of spill files allowed per query for each segment to 100,000. You can raise or lower this limit, and you can also limit the number of spill files for all queries on a segment, and limit the disk space consumed by spill files per query and per segment. Use the following Greenplum Database server configuration parameters to manage spill files.
'none'. Set to
'zlib'to enable compression. Using compression reduces the number of I/O operations at the expense of increased CPU.