In this tutorial, you will learn how to use the SQL GROUP BY statement ot group your data into buckets according to their value in a specific column. You will also learn you to use aggregate functions to perform calculations on database data based on the data’s category iin the GROUP BY statement.

GROUP BY statements are one of the most important concepts for SQL programmers to understand. They are also fairly complex. Accordingly, be sure to understand the concepts from this lesson before proceeding through this course.

## What is a SQL GROUP BY Statement?

The SQL GROUP BY statement allows us to easily aggregate columns into categories. Let’s explore this idea with a simple example from our DVD Rental database.

We saw in our last lecture that within the `film`

table, there is a column called `rental_rate`

that contains the rental prices for each movie in the table. The `rental-rate`

column only holds three different values:

- $0.99
- $2.99
- $4.99

We can treat each of these rental rates as its own category using a GROUP BY statement. Once we have used the GROUP BY statement to create categories, we use an aggregate function to perform calculations on the values within those categories.

Let’s consider a generalized example. Here is the syntax for a typical GROUP BY statement:

```
SELECT category_column, AGG(data_column)
FROM table_name
GROUP BY category_column;
```

In this SQL example, `category_column`

is the column that we are using to define our categories. `data_column`

is the column that we are performing aggregate functions on, while AGG is a hypothetical aggregate function. `table_name`

is the table that contains both `category_column`

and `data_column`

.

It is important to note that a GROUP BY clause must appear right after a FROM statement or a WHERE statement. If this is not the case, your query will return an error.

This will make more sense once we have actually learned some aggregate functions. Let’s explore SQL aggregate functions next.

## What Are Aggregate Functions?

You cannot use the SQL GROUP BY statement without pairing it with an aggregate function. With that in mind, let’s learn what aggregate functions are and how they interact with the GROUP BY clause.

The purpose of aggregate functions is to take multiple inputs and return a single output.

PostgreSQL includes a number of aggregate functions. You can read the PostgreSQL documentation for a complete list of aggregate functions

In this section, I will introduce some of the most important aggregate functions and provide examples of how they function using our DVD Rental database. Specifically, I will introduce the following functions:

- AVG
- COUNT
- MAX
- MIN
- SUM

## The SQL AVG Function

The SQL AVG function allows you to easily calculate the average value of section of values.

For example, here is how you would calculate the average value of the `rental_rate`

column from the `film`

table in our DVD Rental database:

```
SELECT AVG(rental_rate)
FROM film;
```

Here is what the output of this SQL query looks like in pgAdmin:

As you can see, the average value in the `rental_rate`

column is $2.98.

We can pair the AVG function with the GROUP BY clause to generate average values within categories.

As an example, consider the `rental_duration`

column within the `film`

table. This column contains integer values from 3 to 7 that describe how long a customer can rent a specific movie for.

Let’s write a SQL query using GROUP BY that calculates the average value of `rental_rate`

for movies in each `rental_duration`

category:

```
SELECT rental_duration, ROUND(AVG(rental_rate),2)
FROM film
GROUP BY rental_duration;
```

Note that I have also wrapped the AVG function in a ROUND function with a second argument of `2`

to make the output of the query more readable.

Here’s what the output of this query looks like:

For the other aggregate functions that we explore in this lesson, we will follow a similar recipe of categorizing movies into different buckets based on `rental_duration`

, and then applying aggregate functions to those categories.

### The SQL COUNT Function

The SQL COUNT function combined with the GROUP BY clause allows you to easily count the number of items contained in each category.

We have already worked with the SQL COUNT function earlier in this course to count all of the rows within a specific column. However, we have not yet paired it with the GROUP BY statement, which has limited its capability.

The COUNT function simply counts the number of rows returned by a given SQL query. This means it does not matter which column the COUNT method is applied to. By convention, it is considered a best practice to simply use `COUNT(*)`

in any SQL query.

Let’s look at an example where we categorize the rows of the `film`

table based on `rental_duration`

and count the number of rows in each category:

```
SELECT rental_duration, COUNT(*)
FROM film
GROUP BY rental_duration;
```

Here is the output of this SQL query:

### The SQL MAX Function

The SQL MAX function allows you to easily determine the maximum value of a column.

As an example, we could determine the maximum value of the `length`

column of the `film`

table with the following SQL query:

```
SELECT MAX(length) FROM film;
```

Here is the output of this query:

As you can see, the film with the maximum length is 185 minutes long.

The SQL MAX function can be used with GROUP BY to determine the maximum values of different categories. Let’s again divide our `film`

table into categories based on `rental_duration`

and try and find the longest film within each category.

We can answer this question with the following query:

```
SELECT rental_duration, MAX(length)
FROM film
GROUP BY rental_duration;
```

The output of this code is below:

### The SQL MIN Function

Like the SQL MAX function, the SQL MIN Function allows you easily find the smallest value within a table column.

As an example, we can find the shortest movie in the `film`

table with the following query:

```
SELECT MIN(length) FROM film;
```

As you can see, the shortest film in our DVD Rental database is 46 minutes long.

Let’s explore how we can pair the SQL MIN function with the GROUP BY clause by finding the shortest film in each `rental_duration`

category. The following query would answer this question for us:

```sql

SELECT rental_duration, MIN(length)

FROM film

GROUP BY rental_duration;

```
Here is the output of this query:
![Example of the SQL MIN Function with GROUP BY](/images/sql-course/sql-group-by/sql-min-group-by-example.png)
### The SQL SUM Function
The SQL SUM function performs similarly to the other aggregate functions explored in this lesson. Namely, it allows us to calculate the sum of either a column (when applied to an entire column) or a category (when combined with the GROUP BY clause).
Here is an example of how we could use the SUM function to calculate how long it would take to watch every movie in the DVD Rental database:
```sql
SELECT SUM(length) FROM film;
```

Here is the output of this query:

As you can surely imagine by now, we can also combine the SUM function with the GROUP BY clause to calculate the sum of different categories within our table.

Here is how you could calculate the sum of the `length`

column for every `rental_duration`

category in a SQL query:

```
SELECT rental_duration, SUM(length)
FROM film
GROUP BY rental_duration;
```

Here is the output of this code:

## Final Thoughts

In this lesson, we explored how to use aggregate functions to take in multiple values, perform a calculation, and return a single value. Specifically, we worked with the following aggregate functions:

- AVG
- COUNT
- MAX
- MIN
- SUM

In the next section of this course, we’ll explore the SQL HAVING keyword in detail.