Structured Query Language/Rollup Cube
In the chapter Grouping we have seen that the key word GROUP BY creates groups of rows within a result set. Additionally aggregat functions like SUM()
computes condensed values for each of those groups.
Because GROUP BY can summarize by multiple columns, there is often the requirement to compute summary values for 'super-groups', which arise by successively omitting one column after the next from the GROUP BY specification.
Example Table
editTo illustrate the situation, we offer an example table and typical questions to such kind of tables.
CREATE TABLE car_pool (
-- define columns (name / type / default value / nullable)
id DECIMAL NOT NULL,
producer VARCHAR(50) NOT NULL,
model VARCHAR(50) NOT NULL,
yyyy DECIMAL NOT NULL CHECK (yyyy BETWEEN 1970 AND 2020),
counter DECIMAL NOT NULL CHECK (counter >= 0),
CONSTRAINT car_pool_pk PRIMARY KEY (id)
);
--
INSERT INTO car_pool VALUES ( 1, 'VW', 'Golf', 2005, 5);
INSERT INTO car_pool VALUES ( 2, 'VW', 'Golf', 2006, 2);
INSERT INTO car_pool VALUES ( 3, 'VW', 'Golf', 2007, 3);
INSERT INTO car_pool VALUES ( 4, 'VW', 'Golf', 2008, 3);
INSERT INTO car_pool VALUES ( 5, 'VW', 'Passat', 2005, 5);
INSERT INTO car_pool VALUES ( 6, 'VW', 'Passat', 2006, 1);
INSERT INTO car_pool VALUES ( 7, 'VW', 'Beetle', 2005, 1);
INSERT INTO car_pool VALUES ( 8, 'VW', 'Beetle', 2006, 2);
INSERT INTO car_pool VALUES ( 9, 'VW', 'Beetle', 2008, 4);
INSERT INTO car_pool VALUES (10, 'Toyota', 'Corolla', 2005, 4);
INSERT INTO car_pool VALUES (11, 'Toyota', 'Corolla', 2006, 3);
INSERT INTO car_pool VALUES (12, 'Toyota', 'Corolla', 2007, 2);
INSERT INTO car_pool VALUES (13, 'Toyota', 'Corolla', 2008, 4);
INSERT INTO car_pool VALUES (14, 'Toyota', 'Prius', 2005, 1);
INSERT INTO car_pool VALUES (15, 'Toyota', 'Prius', 2006, 1);
INSERT INTO car_pool VALUES (16, 'Toyota', 'Hilux', 2005, 1);
INSERT INTO car_pool VALUES (17, 'Toyota', 'Hilux', 2006, 1);
INSERT INTO car_pool VALUES (18, 'Toyota', 'Hilux', 2008, 1);
--
COMMIT;
In the table, there are two different car producers, 6 models and 4 years. Typical questions to such tables are:
- Number of cars per producer or per model.
- Number of cars per combination of some criteria like: producer plus model or producer plus year.
- Total number of cars (without any criteria).
ROLLUP
editAs we have seen, the keyword GROUP BY offers condensed data for exactly one grouping level, producer plus model in this case.
SELECT producer, model, sum(counter) as cnt
FROM car_pool
GROUP BY producer, model
ORDER BY producer, cnt desc;
--
Toyota Corolla 13
Toyota Hilux 3
Toyota Prius 2
VW Golf 13
VW Beetle 7
VW Passat 6
In such situations, one would like to know also the corresponding values for upper groups: per producer or for the whole table. This can be achieved by submitting a slightly different SELECT.
SELECT producer, sum(counter) as cnt
FROM car_pool
GROUP BY producer
ORDER BY producer, cnt desc;
--
Toyota 18
VW 26
--
--
SELECT sum(counter) as cnt
FROM car_pool;
--
44
In principle, it is possible, to combine such SELECTs via UNION or to submit them sequentially. But because this is a standard requirement, SQL offers a more elegant solution, namely the extension of the GROUP BY with the ROLLUP keyword. Based on the results of the GROUP BY, it offers additional rows for every superordinate group, which arises by omitting the grouping criteria one after the other.
SELECT producer, model, sum(counter) as cnt
FROM car_pool
GROUP BY ROLLUP (producer, model); -- the MySQL syntax is: GROUP BY producer, model WITH ROLLUP
--
Toyota Corolla 13
Toyota Hilux 3
Toyota Prius 2
Toyota 18 <-- the additional row per first producer
VW Beetle 7
VW Golf 13
VW Passat 6
VW 26 <-- the additional row per next producer
44 <-- the additional row per all producers
The simple GROUP BY clause creates rows at the level of producer plus model. The ROLLUP keyword leads to additional rows where first the model and then model and producer are omitted.
CUBE
editThe ROLLUP keyword offers solutions where a hierarchical point of view is adequate. But in data warehouse applications, one likes to navigate freely through the aggregated data, not only from top to bottom. To support this requirement, the SQL standard offers the keyword CUBE. It is an extension of ROLLUP and offers additional rows for all possible combinations of the GROUP BY columns.
In the case of our above example with the two columns producer and model, the ROLLUP has created rows for 'producer-only' and 'no criteria' (= complete table). Additional to that, CUBE creates rows for 'model-only'. (If different producer would use the same model-name, such rows will lead to only 1 additional row.)
SELECT producer, model, SUM(counter) AS cnt
FROM car_pool
GROUP BY CUBE (producer, model); -- not supported by MySQL
--
Toyota Corolla 13
Toyota Hilux 3
Toyota Prius 2
Toyota - 18
VW Beetle 7
VW Golf 13
VW Passat 6
VW - 26
- Beetle 7 <--
- Corolla 13 <--
- Golf 13 <-- additional rows for 'model-only'
- Hilux 3 <--
- Passat 6 <--
- Prius 2 <--
- - 44
If there are tree grouping columns c1, c2, and c3, the keywords lead to the following grouping.
GROUP BY: | (c1, c2, c3) |
GROUP BY ROLLUP: | (c1, c2, c3), (c1, c2), (c1) and () |
GROUP BY CUBE: | (c1, c2, c3), (c1, c2), (c1, c3), (c2, c3), (c1), (c2), (c3) and () |