Selectivity
The first measure, selectivity, represents a fraction of rows from a row set. The row set can be a base table, a view, or the result of a join or a
GROUP
BY
operator. The selectivity is tied to a query predicate, such as last_name
= 'Smith
', or a combination of predicates, such as last_name
= 'Smith
' AND
job_type
= 'Clerk
'. A predicate acts as a filter that filters a certain number of rows from a row set. Therefore, the selectivity of a predicate indicates how many rows from a row set will pass the predicate test. Selectivity lies in a value range from 0.0 to 1.0. A selectivity of 0.0 means that no rows will be selected from a row set, and a selectivity of 1.0 means that all rows will be selected.The estimator uses an internal default value for selectivity, if no statistics are available. Different internal defaults are used, depending on the predicate type. For example, the internal default for an equality predicate (
last_name
= 'Smith
') is lower than the internal default for a range predicate (last_name
> 'Smith
'). The estimator makes this assumption because an equality predicate is expected to return a smaller fraction of rows than a range predicate.When statistics are available, the estimator uses them to estimate selectivity. For example, for an equality predicate (
last_name
= 'Smith
'), selectivity is set to the reciprocal of the number n of distinct values of last_name
, because the query selects rows that all contain one out of n distinct values. If a histogram is available on the last_name
column, then the estimator uses it instead of the number of distinct values. The histogram captures the distribution of different values in a column, so it yields better selectivity estimates. Having histograms on columns that contain skewed data (in other words, values with large variations in number of duplicates) greatly helps the CBO generate good selectivity estimates.