Query Plan Execution
The Greenplum Database legacy and GPORCA query optimizers generate execution plans that produce the results requested by the query. A plan is a sequence of operators, such as table scans, joins, sorts, aggregates, and data motions.
When you click Run Explain on the query detail view, Command Center runs the SQL
EXPLAIN command for the query and displays the query execution plan next to the text of the query.
You can view the plan in text format or a graphical representation by clicking the Textual Explain or Plan & Progress tab. In the textual format, each plan node is flagged with an arrow (
->). In the graphical view, the nodes are represented with boxes that fill as the plan executes.
The plan executes from the bottom up. Each node in the plan performs an operation and passes results up to the next node in the plan.
Optimizer status: line reports whether the explain plan was generated using the GPORCA optimizer or the legacy query optimizer.
Each node contains this information:
- the type of operation to be performed
- cost estimates, a two-part estimate of the cost to produce the first row and the cost to produce all rows
- estimate of the number of rows to be produced by the operation
- width of the widest row produced by the operation
-> Hash Join (cost=0.00..1487.46 rows=375084 width=30)
A hash join operation joins two relations by producing a hash table on the join key of the smaller table and then scanning the larger table to find the matching rows.
This hash join example is estimated to produce 375084 rows. The widest row produced is 30 bytes.
The unit of cost is a single data block access, so cost is actually an estimate of the operator’s I/O activity. The estimated cost to produce all rows, the second number in the cost estimate, is cumulative; it includes the cost estimates from all earlier nodes in the plan.
Segments can work on portions of a query in parallel, each segment executing operators independently on their local data. When the plan requires exchanging data between segments, a data motion operator coordinates the data transfer between segments. The plan is divided into “slices” where these data motions occur.
A data motion node in a plan identifies the slice and the number of segments participating in the motion.
-> Broadcast Motion 4:4 (slice2; segments: 4) (cost=0.00..867.15 rows=10000 width=30)
In a broadcast motion, each segment broadcasts all of its rows for a table over the network so that every segment has a complete copy of the table. In this example, the broadcast motion marks the completion of
slice2 with four segments sending and four segments receiving.
Each segment has one or more backend processes working on a slice. Backend processes working on the same slice are called a “gang”.
Operators are processes that take as input database tables or the output from other operators, and perform some action to produce a transformed output.
A Function Scan node selects the partitions to scan. The function can be one of the following:
gp_partition_expansion- chooses all nodes
gp_partition_selection- chooses a partition with an equality expression
gp_partition_inversion- chooses partitions with a range expression
One table must be broadcast so that every segment can compare all rows of one table with the rows it has from the other table. This is expensive and is best used only for small tables.
One table must be broadcast so that every segment can compare all rows of one table with the rows it has from the other table.