Aggregation¶
You can use the following five aggregate functions for declarative queries: sum()
, count()
, min()
, max()
, avg()
and group_concat()
. Let’s see some examples of simple queries using these functions.
Total GPA of students from group 101:
sum(s.gpa for s in Student if s.group.number == 101)
Number of students with a GPA above three:
count(s for s in Student if s.gpa > 3)
First name of a student, who studies philosophy, sorted alphabetically:
min(s.name for s in Student if "Philosophy" in s.courses.name)
Birth date of the youngest student in group 101:
max(s.dob for s in Student if s.group.number == 101)
Average GPA in department 44:
avg(s.gpa for s in Student if s.group.dept.number == 44)
Names of students of group 101 joined by comma:
group_concat(s.name for s in Student if s.group.number == 101)
Note
Although Python already has the standard functions sum()
, count()
, min()
, and max()
, Pony adds its own functions under the same names. Also, Pony adds its own avg()
and group_concat()
functions. These functions are implemented in the pony.orm
module and they can be imported from there either “by the star”, or by its name.
The functions implemented in Pony expand the behavior of standard functions in Python; thus, if in a program these functions are used in their standard way, the import will not affect their behavior. But it also allows specifying a declarative query inside the function.
If one forgets to import these functions from the pony.orm
package, then an error will appear upon use of the Python standard functions sum()
, count()
, min()
, and max()
with a declarative query as a parameter:
TypeError: Use a declarative query in order to iterate over entity
Aggregate functions can also be used inside a query. For example, if you need to find not only the birth date of the youngest student in the group, but also the student himself, you can write the following query:
select(s for s in Student if s.group.number == 101
and s.dob == max(s.dob for s in Student
if s.group.number == 101))
Or, for example, to get all groups with an average GPA above 4.5:
select(g for g in Group if avg(s.gpa for s in g.students) > 4.5)
This query can be shorter if we use Pony attribute lifting feature:
select(g for g in Group if avg(g.students.gpa) > 4.5)
And this query shows all tags for article
select((article, group_concat(article.tags)) for article in Aricle)
Query object aggregate functions¶
You can call the aggregate methods of the Query
object:
select(sum(s.gpa) for s in Student)
Is equal to the following query:
select(s.gpa for s in Student).sum()
Here is the list of the aggregate functions:
Quety.group_concat()
Several aggregate functions in one query¶
SQL allows you including several aggregate functions in the same query. For example, we might want to receive both the lowest and the highest GPA for each group. In SQL, such a query would look like this:
SELECT s.group_number, MIN(s.gpa), MAX(s.gpa)
FROM Student s
GROUP BY s.group_number
This query will return the lowest and the highest GPA for each group. With Pony you can use the same approach:
select((s.group, min(s.gpa), max(s.gpa)) for s in Student)
Function count
¶
Aggregate queries often need to calculate the quantity of something. Here is how we get the number of students in Group 101:
count(s for s in Student if s.group.number == 101)
The number of students in each group related to the department 44:
select((g, count(g.students)) for g in Group if g.dept.number == 44)
or this way:
select((s.group, count(s)) for s in Student if s.group.dept.number == 44)
In the first example the aggregate function count()
receives a collection, and Pony will translate it into a subquery. (Actually, this subquery will be optimized by Pony and will be replaced with LEFT JOIN
).
In the second example, the function count()
receives a single object instead of a collection. In this case Pony will add a GROUP BY
section to the SQL query and the grouping will be done on the s.group
attribute.
If you use the count()
function without arguments, this will be translated to SQL COUNT(*)
. If you specify an argument, it will be translated to COUNT(DISTINCT column)
.
Conditional count
¶
There is another way of using the count()
function. Let’s assume that we want to get three numbers for each group:
The number of students that have a GPA less than 3
The number of students with GPA between 3 to 4
The number of students with GPA higher than 4
The query can be constructed this way:
select((g, count(s for s in g.students if s.gpa <= 3),
count(s for s in g.students if s.gpa > 3 and s.gpa <= 4),
count(s for s in g.students if s.gpa > 4)) for g in Group)
Although this query will work, it is pretty long and not very effecive - each count
will be translated into a separate subquery. For such situations, Pony provides a “conditional COUNT” syntax:
select((s.group, count(s.gpa <= 3),
count(s.gpa > 3 and s.gpa <= 4),
count(s.gpa > 4)) for s in Student)
This way, we put our condition into the count()
function. This query will not have subqueries, which makes it more effective.
Note
The queries above are not entirely equivalent: if a group doesn’t have any students, then the first query will select that group having zeros as the result of count()
, while the second query simply will not select the group at all. This happens because the second query selects the rows from the table Student, and if the group doesn’t have any students, then the table Student will not have any rows for this group.
If you want to get rows with zeros, then an effective SQL query should use the left_join()
function:
left_join((g, count(s.gpa <= 3),
count(s.gpa > 3 and s.gpa <= 4),
count(s.gpa > 4)) for g in Group for s in g.students)
More sophisticated aggregate queries¶
Using Pony you can do even more complex grouping. For example, you can group by an attribute part:
select((s.dob.year, avg(s.gpa)) for s in Student)
The birth year in this case is not a distinct attribute – it is a part of the dob
attribute.
You can have expressions inside the aggregate functions:
select((item.order, sum(item.price * item.quantity))
for item in OrderItem if item.order.id == 123)
Here is another way of making the same query:
select((order, sum(order.items.price * order.items.quantity))
for order in Order if order.id == 123)
In the second case, we use the attribute lifting concept. The expression order.items.price
creates an array of prices, while order.items.quantity
generates an array of quantities. As the result, in this example, we’ll have the sum of quantity multiplied by the price for each order item.
Queries with HAVING¶
The SELECT
statement has two different sections which are used for conditions: WHERE
and HAVING
. The WHERE
section is used more often and contains conditions which will be applied to each row. If a query contains aggregate functions, such as MAX
or SUM
, the SELECT
statement may also contain GROUP BY
and HAVING
sections. The conditions of the HAVING
section are applied after grouping the SQL query results. Typically the conditions of the HAVING
section always contain aggregate functions, while conditions in the WHERE
section may only contain aggregate functions inside a subquery.
When you write a query which contains aggregate functions, Pony needs to determine if the resulting SQL will contain the GROUP BY
and HAVING
sections and where it should put each condition from the Python query. If a condition contains an aggregate function, Pony places the condition into the HAVING
section. Otherwise it places the condition into the WHERE
section.
Consider the following query, which returns the tuples (Group
, count_of_students):
select((s.group, count(s)) for s in Student
if s.group.dept.number == 44 and avg(s.gpa) > 4)
In this query we have two conditions. The first condition is s.group.dept.number == 44
. Since it doesn`t include an aggregate function, Pony will place this condition into the WHERE
section. The second condition avg(s.gpa) > 4
contains the aggregate function avg
and will be placed into the HAVING
section.
Another question is what columns Pony should add to the GROUP BY
section. According to the SQL standard, any non-aggregated column which placed into the SELECT
statement should be added to the GROUP BY
section too. Let’s consider the following query:
SELECT A, B, C, SUM(D), MAX(E), COUNT(F)
FROM T1
WHERE ...
GROUP BY ...
HAVING ...
According to the SQL standard, we need to include the columns A
, B
and C
into the GROUP BY
section, because these columns appear in the SELECT
list and don’t wrapped with any aggregate function. Pony does exactly this. If your aggregated Pony query returns a tuple with several expressions, any non-aggregated expression will be placed into the GROUP BY
section. Let’s consider the same Pony query again:
select((s.group, count(s)) for s in Student
if s.group.dept.number == 44 and avg(s.gpa) > 4)
This query returns the tuples (Group
, count_of_students). The first element of the tuple, the Group
instance, is not aggregated, so it will be placed into the GROUP BY
section:
SELECT "s"."group", COUNT(DISTINCT "s"."id")
FROM "Student" "s", "Group" "group-1"
WHERE "group-1"."dept" = 44
AND "s"."group" = "group-1"."number"
GROUP BY "s"."group"
HAVING AVG("s"."gpa") > 4
The s.group
expression was placed into the GROUP BY
section, and the condition avg(s.gpa) > 4
was placed into the HAVING
section of the query.
Sometimes the condition which should be placed into the HAVING
section contains some non-aggregated columns. Such columns will be added to the GROUP BY
section, because according to the SQL standard it is forbidden to use a non-aggregated column inside the HAVING
section, if it was not added to the GROUP BY
list.
Another example:
select((item.order, item.order.total_price,
sum(item.price * item.quantity))
for item in OrderItem
if item.order.total_price < sum(item.price * item.quantity))
This query has the following condition: item.order.total_price < sum(item.price * item.quantity)
, which contains an aggregate function and should be added to the HAVING
section. But the part item.order.total_price
is not aggregated. Hence, it will be added to the GROUP BY
section in order to satisfy the SQL requirements.
Aggregate functions in order by section¶
The aggregate functions can be used inside the Query.order_by()
function. Here is an example:
select((s.group, avg(s.gpa)) for s in Student) \
.order_by(lambda s: desc(avg(s.gpa)))
Another way of ordering by an aggregated value is specifying the position number inside the Query.order_by()
method:
select((s.group, avg(s.gpa)) for s in Student).order_by(-2)