MySQL / Reading and querying data
How to filter query results in MySQL
Introduction
The SELECT
command is the primary means of retrieving data from a MySQL database. While the basic command allows you to specify the columns you want to display, the table to pull from, and the output format to use, much of the power of SELECT
comes from its ability to filter results.
Filtering queries allows you to return only the results that you're interested in by providing specific criteria that the records must match. There are many different ways to filter queries in SQL and in this guide, we'll introduce some of the most common filtering options available for your MySQL databases: WHERE
, GROUP BY
, HAVING
, and LIMIT
.
By familiarizing yourself with these optional clauses, you can learn to construct queries that target the correct data, even in databases with many records.
Using the WHERE
clause to define match criteria
One of the most flexible and most common ways of specifying your data requirements is with the WHERE
clause. The WHERE
clause provides a way of specifying the requirements that a record must meet to match the query. If a record does not satisfy all of the conditions specified by the WHERE
clause, it is not included in the query results.
The WHERE
clause works by specifying boolean expressions that are checked against each candidate row of data. If the result of the expression is false, the row will be removed from the results and will not be returned or continue to the next stage of processing. If the result of the expression is true, it satisfies the criteria of the search and will continue on for any further processing as a candidate row.
The basic syntax of the WHERE
clause looks like this:
SELECT * FROM <table> WHERE <condition>;
The <condition>
can be anything that results in a boolean value. MySQL does not have a dedicated builtin boolean type and uses the TINYINT
type to express boolean values instead. MySQL recognizes BOOLEAN
and BOOL
as aliases for the TINYINT
type.
Because of this implementation, nonzero values are considered true, while 0
is considered false. To handle the reverse case, the constant TRUE
is an alias for 1
, while FALSE
likewise is an alias for 0
.
Conditions are often formed using one or more of the following operators:
=
: equal to>
: greater than<
: less than>=
: greater than or equal to<=
: less than or equal to<>
or!=
: not equal<=>
:NULL
-safe equal to (returns 1 if both values areNULL
and 0 if just one value isNULL
)AND
: the logical "and" operator — joins two conditions and returnsTRUE
if both of the conditions areTRUE
OR
: logical "or" operator — joins two conditions and returnsTRUE
if at least one of the conditions areTRUE
IN
: value is contained in the list, series, or range that followsBETWEEN
: value is contained within the range the minimum and maximum values that follow, inclusiveIS NULL
: matches if value isNULL
NOT
: negates the boolean value that followsEXISTS
: the query that follows contains resultsLIKE
: matches against a pattern (using the wildcards%
to match 0 or more characters and_
to match a single character)REGEXP
orREGEXP_LIKE()
: matches against a pattern using regular expressionsSTRCMP
: Compares strings using lexicographical sort to determine which value comes first.
While the above list represents some of the most common test constructs, there are many other operators that may yield boolean results that can be used in conjunction with a WHERE
clause.
Prisma Client supports filtering by multiple criteria. Check out our documentation on filtering to learn more.
Examples using WHERE
One of the most common and straightforward checks is for equality, using the =
operator. Here, we check whether each row in the customer
table has a last_name
value equal to Smith
:
SELECT * FROM customer WHERE last_name = 'Smith';
We can add additional conditions to this to create compound expressions using logical operators. This example uses the AND
clause to add an additional test against the first_name
column. Valid rows must satisfy both of the given conditions:
SELECT * FROM customer WHERE first_name = 'John' AND last_name = 'Smith';
Similarly, we can check whether any of a series of conditions are met. Here, we check rows from the address
table to see whether the zip_code
value is equal to 60626 or the neighborhood
column is equal to the string "Roger's Park":
SELECT * FROM address WHERE zip_code = '60626' OR neighborhood = "Roger's Park";
The IN
operator can work like an comparison between a number of values, wrapped in parentheses. If there is a match with any of the given values, the expression is TRUE
:
SELECT * FROM customer WHERE last_name IN ('Smith', 'Johnson', 'Fredrich');
Here, we check against a string pattern using LIKE
. The %
works as a wildcard matching zero or more characters, so "Pete", "Peter", and any other string that begins with "Pete" would match:
SELECT * FROM customer WHERE last_name LIKE 'Pete%';
We could do a similar search using the REGEXP
function to check for matches using regular expressions. In this case, we check whether the value of last_name
begins with a "d" and contains the substring "on", which would match names like "Dickson", "Donald", and "Devon":
SELECT * FROM customer WHERE last_name REGEXP '^D.*on.*';
We can check whether a street number is within the 4000 block of addresses using the BETWEEN
and AND
operators to define an inclusive range:
SELECT * FROM address WHERE street_number BETWEEN 4000 AND 4999;
Here, we can display any customer
entries that have social security numbers that are not 9 digits long. We use the LENGTH()
function to get the number of digits in the field and the <>
to check for inequality:
SELECT * FROM customer WHERE LENGTH(SSN) <> 9;
Using the GROUP BY
clause to summarize multiple records
The GROUP BY
clause is another very common way to filter results by representing multiple results with a single row. The basic syntax of the GROUP BY
clause looks like this:
SELECT <columns> FROM <table> GROUP BY <columns_to_group>;
When a GROUP BY
clause is added to a statement, it tells MySQL to display a single row for each unique value for the given column or columns. This has some important implications.
Since the GROUP BY
clause is a way of representing multiple rows as a single row, MySQL can only execute the query if it can calculate a value for each of the columns it is tasked with displaying. This means that each column identified by the SELECT
portion of the statement has to either be:
- included in the
GROUP BY
clause to guarantee that each row has a unique value - abstracted to summarize all of the rows within each group
Practically speaking, this means that any columns in the SELECT
list not included in the GROUP BY
clause must use an aggregate function to produce a single result for the column for each group.
If you are connecting to your database with Prisma Client, you can use aggregations to compute over and summarize values.
Examples using GROUP BY
For the examples in this section, suppose that we have a table called pet
that we've defined and populated like so:
CREATE TABLE pet (id SERIAL PRIMARY KEY,type VARCHAR(50),name VARCHAR(50),color VARCHAR(50),age INT);INSERT INTO pet (type, name, color, age) VALUES('dog', 'Spot', 'brown', 3),('dog', 'Rover', 'black', 7),('dog', 'Sally', 'brown', 1),('cat', 'Sabrina', 'black', 8),('cat', 'Felix', 'white', 4),('cat', 'Simon', 'orange', 8),('rabbit', 'Buttons', 'grey', 4),('rabbit', 'Bunny', 'brown', 8),('rabbit', 'Briony', 'brown', 6);
The simplest use of GROUP BY
is to display the range of unique values for a single column. To do so, use the same column in SELECT
and GROUP BY
. Here, we see all of the colors used in the table:
SELECT color FROM pet GROUP BY color;
+--------+color |+--------+brown |black |white |orange |grey |+--------+5 rows in set (0.00 sec)
As you move beyond a single column in the SELECT
column list, you must either add the columns to the GROUP BY
clause or use an aggregate function to produce a single value for the group of rows being represented.
Here, we add type
to the GROUP BY
clause, meaning that each row will represent a unique combination of type
and color
values. We also add the age
column, summarized by the avg()
function to find the average age of each of the groups:
SELECT type, color, avg(age) AS average_age FROM pet GROUP BY type, color;
+--------+--------+-------------+type | color | average_age |+--------+--------+-------------+dog | brown | 2.0000 |dog | black | 7.0000 |cat | black | 8.0000 |cat | white | 4.0000 |cat | orange | 8.0000 |rabbit | grey | 4.0000 |rabbit | brown | 7.0000 |+--------+--------+-------------+7 rows in set (0.00 sec)
Aggregate functions work just as well with a single column in the GROUP BY
clause. Here, we find the average age of each type of animal:
SELECT type, avg(age) AS average_age FROM pet GROUP BY type;
+--------+-------------+type | average_age |+--------+-------------+dog | 3.6667 |cat | 6.6667 |rabbit | 6.0000 |+--------+-------------+3 rows in set (0.00 sec)
If we want to display the oldest of each type of animal, we could instead use the max()
function on the age
column. The GROUP BY
clause collapses the results into the same rows as before, but the new function alters the result in the other column:
SELECT type, max(age) AS oldest FROM pet GROUP BY type;
+--------+--------+type | oldest |+--------+--------+dog | 7 |cat | 8 |rabbit | 8 |+--------+--------+3 rows in set (0.00 sec)
Using the HAVING
clause to filter groups of records
The GROUP BY
clause is a way to summarize data by collapsing multiple records into a single representative row. But what if you want to narrow these groups based on additional factors?
The HAVING
clause is a modifier for the GROUP BY
clause that lets you specify conditions that each group must satisfy to be included in the results.
The general syntax looks like this:
SELECT <columns> FROM <table> GROUP BY <columns_to_group> HAVING <condition>
The operation is very similar to the WHERE
clause, with the difference being that WHERE
filters single records and HAVING
filters groups of records.
Examples using HAVING
Using the same table we introduced in the last section, we can demonstrate how the HAVING
clause works.
Here, we group the rows of the pet
table by unique values in the type
column, finding the minimum value of age
as well. The HAVING
clause then filters the results to remove any groups where the age is not greater than 1:
SELECT type, min(age) AS youngest FROM pet GROUP BY type HAVING min(age) > 1;
+--------+----------+type | youngest |+--------+----------+cat | 4 |rabbit | 4 |+--------+----------+2 rows in set (0.00 sec)
In this example, we group the rows in pet
by their color. We then filter the groups that only represent a single row. The result shows us every color that appears more than once:
SELECT color FROM pet GROUP BY color HAVING count(color) > 1;
+-------+color |+-------+brown |black |+-------+2 rows in set (0.00 sec)
We can perform a similar query to get the combinations of type
and color
that only a single animal has:
SELECT type, color FROM pet GROUP BY type, color HAVING count(color) = 1;
+--------+--------+type | color |+--------+--------+dog | black |cat | black |cat | white |cat | orange |rabbit | grey |+--------+--------+5 rows in set (0.00 sec)
Using the LIMIT
clause to set the maximum number of records
The LIMIT
clause offers a different approach to paring down the records your query returns. Rather than eliminating rows of data based on criteria within the row itself, the LIMIT
clause sets the maximum number of records returned by a query.
The basic syntax of LIMIT
looks like this:
SELECT * FROM <table> LIMIT <num_rows> [OFFSET <num_rows_to_skip>];
Here, the <num_rows>
indicates the maximum number of rows to display from the executed query. This is often used in conjunction with ORDER BY
clauses to get the rows with the most extreme values in a certain column. For example, to get the five best scores on an exam, a user could ORDER BY
a score
column and then LIMIT
the results to 5.
While LIMIT
counts from the top of the results by default, the optional OFFSET
keyword can be used to offset the starting position it uses. In effect, this allows you to paginate through results by displaying the number of results defined by LIMIT
and then adding the LIMIT
number to the OFFSET
to retrieve the following page.
If you are connecting to your database with Prisma Client, you can use pagination to iterate through results.
Examples using LIMIT
We will use the pet
table from earlier for the examples in this section.
As mentioned above, LIMIT
is often combined with an ORDER BY
clause to explicitly define the ordering of the rows before slicing the appropriate number. Here, we sort the pet
entries according to their age
, from oldest to youngest. We then use LIMIT
to display the top 5 oldest animals:
SELECT * FROM pet ORDER BY age DESC LIMIT 5;
+----+--------+---------+--------+------+id | type | name | color | age |+----+--------+---------+--------+------+4 | cat | Sabrina | black | 8 |6 | cat | Simon | orange | 8 |8 | rabbit | Bunny | brown | 8 |2 | dog | Rover | black | 7 |9 | rabbit | Briony | brown | 6 |+----+--------+---------+--------+------+5 rows in set (0.00 sec)
If we need a record for any single dog
within the table, we could construct a query like this. Keep in mind that while the result might be difficult to predict, this is not a random selection and should not be used as such:
SELECT * FROM pet WHERE type = 'dog' LIMIT 1;
+----+------+------+-------+------+id | type | name | color | age |+----+------+------+-------+------+1 | dog | Spot | brown | 3 |+----+------+------+-------+------+1 row in set (0.00 sec)
We can use the OFFSET
clause to paginate through results. We include an ORDER BY
clause to define a specific order for the results.
For the first query, we limit the results without specifying an OFFSET
to get the first 3 youngest entries:
SELECT * FROM pet ORDER BY age LIMIT 3;
+----+--------+---------+-------+------+id | type | name | color | age |+----+--------+---------+-------+------+3 | dog | Sally | brown | 1 |1 | dog | Spot | brown | 3 |7 | rabbit | Buttons | grey | 4 |+----+--------+---------+-------+------+3 rows in set (0.00 sec)
To get the next 3 youngest, we can add the number defined in LIMIT
to the OFFSET
to skip the results we've already retrieved:
SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 3;
+----+--------+---------+-------+------+id | type | name | color | age |+----+--------+---------+-------+------+7 | rabbit | Buttons | grey | 4 |9 | rabbit | Briony | brown | 6 |2 | dog | Rover | black | 7 |+----+--------+---------+-------+------+3 rows in set (0.00 sec)
If we add the LIMIT
to the OFFSET
again, we'll get the next 3 results:
SELECT * FROM pet ORDER BY age LIMIT 3 OFFSET 6;
+----+--------+---------+--------+------+id | type | name | color | age |+----+--------+---------+--------+------+4 | cat | Sabrina | black | 8 |6 | cat | Simon | orange | 8 |8 | rabbit | Bunny | brown | 8 |+----+--------+---------+--------+------+3 rows in set (0.00 sec)
This lets us retrieve rows of data from a query in manageable chunks.
Conclusion
Most of the time, when retrieving data from MySQL tables, you will likely be applying filtering conditions to pick out the appropriate records. Whether that is an unambiguous WHERE
clause that matches a specific id
using the =
equality operator, or a GROUP BY
clause that helps you summarize multiple records in a single value, filtering data is a normal part of working with records.
Understanding how to use these optional clauses to evaluate potential data against your criteria and mould the results accordingly allows MySQL to do the selection work for you. Using these constructs, you can extract useful information from large, semi-organized collections of data.