Spill Files

Greenplum Command Center reports the total size for all spill files created for a query.

Greenplum Database creates spill files, also called workfiles, to save intermediate results when there is insufficient memory to execute a query in memory. Disk I/O is much slower than memory access, so a query that creates spill files will take longer to complete.

Managing Spill Files for Small Queries

For a time-critical query expected to return a small result set quickly, you can work to eliminate spill files by increasing the amount of memory available to the query or by optimizing the query to use the memory available more efficiently.

Make sure the query is not affected by data skew and that table statistics are up to date.

Large Queries

A query that performs memory-intensive operations like sorts, joins, and hashes on large tables may need to create spill files. For these queries, it is important to ensure that the number of spill files created is not excessive and that problems such as CPU or data skew are found and corrected. Skew can create excessive numbers of spill files on one or more segments.

Also, make sure the gp_workfile_compress_algorithm configuration parameter is set to ‘zlib’ and not 'none’. Compressing spill files improves query performance by reducing I/O.

Investigating Spill File Usage

The gp_toolkit schema contains views you can use to see details about spill file usage for current queries. You can see the number and sizes of spill files created for each operator in a query execution plan, and totals by query and segment. This is useful information to detect data skew and to help tune queries.

See the gp_toolkit Administrative Schema reference for descriptions of these views.

Limiting Spill Files

Greenplum Database by default limits the number of spill files allowed per query for each segment to 100,000. You can raise or lower this limit, and you can also limit the number of spill files for all queries on a segment, and limit the disk space consumed by spill files per query and per segment. Use the following Greenplum Database server configuration parameters to manage spill files.

gp_workfile_limit_files_per_query
Sets the maximum number of spill files allowed per query per segment. Default is 100,000.

gp_workfile_limit_per_query
Sets the maximum disk size an individual query is allowed to use for spill files at each segment. The default value is 0, which means no limit is enforced.

gp_workfile_limit_per_segment
Sets the maximum total disk size that all running queries are allowed to use for creating spill files at each segment. The default value is 0, which means a limit is not enforced.

gp_workfile_compress_algorithm
Specifies the compression algorithm to use for spill files when a hash aggregation or hash join operation spills to disk during query processing. The default is 'none'. Set to 'zlib' to enable compression. Using compression reduces the number of I/O operations at the expense of increased CPU.

See also Managing Spill Files Generated by Queries.