Friday, August 21, 2020

What Is The Difference Between GROUP BY and Windowing Functions? PART II

     Last time we covered the basic differences between GROUP BY and Windowing functions. We ended up with this query:

select distinct count(ID) over (partition by District) as count, 
    count(CountryCode) over (partition by District) as '#', 
    SUM(Population) over (partition by District)  as 'Pop City',        SUM(Population) over (partition by CountryCode) as 'Pop Country' from city 
where CountryCode IN ('USA','BEL') 
order by CountryCode,District;

    Which I admit is a little messy. We have three windows over rows with two separate qualifiers.  One of the nice things about Windowing Functions is that they clean up so nicely and also make them easier to read. What we need to do is break out the 'over (...) ' sections.  Let us call the 'partition by District' as D_win and partition by CountryCode as C_win.  

select distinct count(ID) over D_win as count, 
    count(CountryCode) over D_win as '#', 
    SUM(Population) over D_win  as 'Pop City', 
    SUM(Population) over C_win  as 'Pop Country' 
from city 
where CountryCode IN ('USA','BEL') 
window C_win as (partition by CountryCode),
D_win as (partition by District)
order by CountryCode,District;

    This is much easier to comprehend and will be much easier to modify.  

GROUP BY and ORDER BY Mutually Exclusive No More!

    Now back to GROUP BY.  

    Prior to MySQL 8.0.12 there was something a few folks tripped overyou cannot also use an ORDER BY clause to sort the results. In other words, ROLLUP and ORDER BY were mutually exclusive in MySQL. But this as changed in '12 and you can no send queries like this:

select region, 
    count(name) as '# counties'  
from country 
group by Region with rollup 
order by Region;
| region                    | # counties |
| NULL                      |        239 |
| Antarctica                |          5 |
| Australia and New Zealand |          5 |
| Baltic Countries          |          3 |
| British Islands           |          2 |
| Caribbean                 |         24 |
| Central Africa            |          9 |
| Central America           |          8 |
| Eastern Africa            |         20 |
| Eastern Asia              |          8 |
| Eastern Europe            |         10 |
| Melanesia                 |          5 |
| Micronesia                |          7 |
| Micronesia/Caribbean      |          1 |
| Middle East               |         18 |
| Nordic Countries          |          7 |
| North America             |          5 |
| Northern Africa           |          7 |
| Polynesia                 |         10 |
| South America             |         14 |
| Southeast Asia            |         11 |
| Southern Africa           |          5 |
| Southern and Central Asia |         14 |
| Southern Europe           |         15 |
| Western Africa            |         17 |
| Western Europe            |          9 |

    The reader of the output still has to know that the NULL entry under the region column is the rolled up total. And a lot of casual readers will be confused by having the total on the first line instead of the traditional placement on the last time. Now you can get the total on the bottom by adding DESC to the ORDER BY clause but now the regions are not ascending alpha order.  Dang trade offs.

    One thing to note is the query cost for the GROUP BY version is 264.40  (and yes these timing are relative to my hardware and should only be used for rough comparisions)

    But lets look at a Windowing Function version of this query:

select  distinct Region, 
        count(Name) over (partition by Region) 
from Country 
order by Region;

    The query cost here is 503.40!  Ouch!  The sort cost alone is 239.0! But Windowing Functions let you move the ORDER BY clause.

select    distinct Region, 
          count(Name) over (partition by Region order by Region) 
from Country;

    And the query cost is again 264.40. Yup, the same as the GROUP BY version.

    We do not have the ROLLUP functionality but as you saw in the last blog that it would not be hard to add.  First let us clean up the window definition.

select distinct Region,    
       count(Name) over w1 
from Country 
    w1 as (partition by Region order by Region);

    Then add a new window definition:

select distinct Region, 
       count(Name) over w1, 
       count(Name) over w2 
from Country 
    w1 as (partition by Region order by Region), 
    w2 as ();

    And the query cost is still 264.40.