SiteKickr Web Development

MySQL Group Concatenation – The hidden function

I admit, my work life is pretty easy.  I work from home, I have awesome clients and a really great thermos that keeps my coffee warm for about 2 days. But that doesn't stop me from trying to make things easier, searching for the path of least resistance that will shave minutes off of any development task. That's why I'm a big fan of the MySQL GROUP_CONCAT function.
 

My Philosophy on GROUP_CONCAT

Before I get into detail, I thought I'd share my philosophy on the MySQL Group_Concat function:

  1. People like lists, and Group_Concat is great at turning data into lists.
  2. It makes server-side processing easier, no loops required in simple cases.
  3. It can turn multiple queries into a single query (eliminating nested loops).
  4. It's more efficient – lists have smaller footprints than matrices.
  5. I can produce dangerously long result rows if you're not careful.
  6. It's not part of standard SQL, so it isn't well known.

Before I get into some kinda cool examples, I just want to be sure reader's understand what this aggregate function does.
 

A Brief GROUP_CONCAT Intro

You probably know about other aggregate functions:

We use these functions in conjunction with the GROUP BY statement. I'll stop there, you're not looking for a basic lesson in SQL here. So let me introduce the GROUP_CONCAT function.

GROUP_CONCAT is an aggregate function like the others. And, like the other aggregate functions, it returns only one value. The big difference though is that the value it returns is actually a string, a string that is a concatenation of all values in the group.

Let's say you have a table of users, and each user can have one or more orders, stored in an orders table.

You are able to aggregate a list of all of a user's order numbers into one field, using GROUP_CONCAT.

SELECT users.username, GROUP_CONCAT(orders.order_id) as ordernums
FROM users
INNER JOIN orders
        ON orders.user_id = users.user_id
GROUP BY users.user_id

This will return a result set similar to:

Username        Ordernums
testuser1          5,16,32,103
testuser2          78,101
testuser3          104,106

 

Real World GROUP_CONCAT Examples

Now that you have an idea what GROUP_CONCAT does, for the purposes of solidifying that knowledge, let's have a look at some interesting ways to use it. These are real-life examples, ways I've actually employed GROUP_CONCAT in my development.
 

Direct to Semantic HTML

The HTML definition list (now called a description list in HTML 5) isn't used quite as often as it should be. I imagine it's a mix of it being more difficult to style than other tags and it's lack of understanding. It should be used more often to define relationships between a term and its definition or description. The MySQL GROUP_CONCAT function makes really easy work of directly outputting valid HTML in the form of a definition list.

Let's say we have a table like this:

Name                   Job Title
Dan Aykroyd        Vice-President
Chevy Chase        Account Executive
Billy Crystal         Systems Analyst

We're looking for a nice easy way to display a list of job titles, along with the person that holds that job title. We could use the traditional-style query that returns three result rows, or we could go with a query that aggregates all results into a single row, and formats it as HTML!

SELECT 
GROUP_CONCAT('<dt>',name, '</dt><dd>', job_title, '</dd>' SEPARATOR '\n') as jobs
FROM people

This will return a single row, with a single field that contains a nice little piece of HTML. HTML that we can directly output without any processing:

<dt>Dan Aykroyd</dt><dd>Vice-President</dd>
<dt>Chevy Chase</dt><dd>Account Executive</dd>
<dt>Billy Crystal</dt><dd>Systems Analyst</dd>

Of course, this type of use should be considered when a script depending on those query results is expecting HTML output.

 

Avoiding Multiple Queries and Nested Loops

Another example of something I did for a client of mine just a week ago, though I changed the non-essential details for this example.

A car dealership has an inventory of cars online, grouped by the make of the car. For instance:

Ford
2010 F150
2008 Mustang
2012 Escape

Toyota
2005 Corolla
2008 4Runner

Bob, the website manager, has the job of accepting requests from various people at the dealership to list cars on the website as they come into the lot. Anyone can submit a listing request car by filling out a simple form (choosing the Make, Model, Year & Price).

The difficulty that Bob is having though, is that there is poor communication within the dealership and employees are submitting cars that have already been submitted by someone else.

Bob needs a real quick way to see which cars have already been listed for a particular make, before approving a request.

Enter GROUP_CONCAT…

SELECT 
   c.id, c.year, c.name, ct.name as category_name
  (SELECT GROUP_CONCAT(' ', year, ' ', name)
   FROM cars
   WHERE car_type_id = ct.id AND approved = 1 
   GROUP BY car_type_id) as already_approved_list
FROM cars c
INNER JOIN car_types ct
        ON ct.id = c.car_type_id
WHERE c.approved = 0

This query will produce a result set like this:

ID   YEAR  NAME      CATEGORY_NAME   ALREADY_APPROVED_LIST
101  2012  Escape    Ford             2010 F150, 2008 Mustang, 2012 Escape
102  2011  4Runner   Ford             2005 Corolla, 2008 4Runner

By looking at this list, Bob can instantly see that the 2012 Escape is a duplicate listing and should not be approved.

There is only one query required to obtain this information, and hence, only one loop required to display it.

 

Concatenation with NULL is like multiplying by zero

In grade school, they taught you that any number multiplied by zero produces a result of zero. But, for some reason, your teacher forgot to mention that this also applies to MySQL string concatenation.

In MySQL (and other databases), any string concatenated with a NULL value produces a NULL result.

Let's illustrate with an example, building on the car inventory above:

Toyota
2005 Corolla
NULL 4Runner

ID   YEAR  NAME      CATEGORY_NAME   ALREADY_APPROVED_LIST
102  2011  4Runner   Ford             2005 Corolla

Note that the 4Runner isn't listed, even though its' name is supplied. Because the year is NULL, the resulting concatenation of year and name equals NULL.

That's outrageous, right! How can I possibly use the GROUP_CONCAT function if it doesn't work as expected when NULL values are introduced. NULL values are extremely common in my data.

Coalesce, is that the right name for it?

We learned in science class that the word coalesce means "to come together or unite into one mass". As in when drops of water coalesce into a puddle. But that's not really what's going on with the MySQL Coalesce function. The Coalesce function returns the first non-NULL argument passed to it.

In my opinion, coalescence is more closely related to string concatenation. I think it should have been Oracle's first priority, after buying MySQL, to do a deep investigation of who decided to name this function and why they never took a science class 😉

The MySQL development crew knows how important NULL values are, and how often they occur – so much so that they've provided a function specifically for dealing with NULL values. This function is named Coalesce.

Coalesce will return the first non-NULL argument which is passed into it. If you haven't heard about this function before, you can see how useful it can be in conjunction with concatenation functions.

It's use is simple. If we were to leverage Coalesce in our example above, it would look like this:

SELECT 
   c.id, c.year, c.name, 
   ct.name as category_name
  (SELECT 
   GROUP_CONCAT(' ', COALESCE(year, ''), 
                ' ', name)
   FROM cars
   WHERE car_type_id = ct.id AND 
         approved = 1 
   GROUP BY car_type_id) 
   as already_approved_list
FROM cars c
INNER JOIN car_types ct
        ON ct.id = c.car_type_id
WHERE c.approved = 0

If the year field happens to contain a NULL value, the Coalesce function will compare it against the other values passed in and return the empty string provided to it.

 

So that's my take on GROUP_CONCAT. Think about this function whenever you have nested query loops and you have that sneaking suspicion that something could be done a little bit better.