MySQL – left join on last (or first) record in the right table

Posted on October 28, 2008 by Murray Hopkins

After spending a few hours searching and testing this one I thought I’d better share it since I found many similar questions but no answers. I am not a MySQL expert so there may be a better way to do this. Please tell me if there is!!

This is a simplified example. I have a table of customers and any sales they may have made. I want to list ALL customers together with the LAST sale they made. ie even if there was no sale at all, I still want the customer data listed with NULLS for the sale data.

Sample Customer data:

custID	custFirst	custLast
1	John	Smith
2	Sally	Fields
3	Winston	Churchill

Sample Sales data:

saleID	saleDate	saleAmount	custID
1	2008-10-20 09:12:00	20	1
2	2007-12-03 18:45:00	14	1
3	2008-02-13 16:00:00	10	3
4	2006-07-08 18:00:00	30	3
5	2007-05-14 14:48:00	18	3
6	2008-10-22 19:00:00	57	1

And the result set from the query below:

custID	custFirst	custLast	saleID	saleDate	saleAmount	custID
1	John	Smith	6	2008-10-22 19:00:00	57	1
2	Sally	Fields	(NULL)	(NULL)	(NULL)	(NULL)
3	Winston	Churchill	3	2008-02-13 16:00:00	10	3

For John and Winston we have the most recent sale they made. Sally hasnt purchased anything yet.

This is the SQL:

SELECT
    customers.*, sale_tmp.*
FROM
    customers

    LEFT JOIN ( SELECT s1.*
		FROM sales as s1
		LEFT JOIN sales AS s2
		     ON s1.custID = s2.custID AND s1.saledate < s2.saledate
		WHERE s2.custID IS NULL ) as sale_tmp

    ON (customers.custID = sale_tmp.custID)

This relies on a couple of sql concepts linked together. The first is the idea that you can LEFT JOIN a temporary table just like a “real” table. So, this bit:

( SELECT s1.*
		FROM sales as s1
		LEFT JOIN sales AS s2
		     ON s1.custID = s2.custID AND s1.saledate < s2.saledate
		WHERE s2.custID IS NULL ) as sale_tmp

creates a temporary table called “sale_tmp” that we are LEFT JOINing to the customers table via the common custID columns. Dont worry about the table construction, just note that a table is being created here.

The second concept is called a within-group aggregate and this is the way that we get the most recent sale for each custID in the sales table. There is a very good explanation of that here:

http://www.artfulsoftware.com/infotree/queries.php?&bw=1395#101

So, to sum up, the technique is to create a temporary table (sale_tmp) that contains just the most recent sales for each custID in the sales table, then left join sale_tmp to customers.

If you wanted their first purchases, simply change

s1.saledate < s2.saledate

s1.saledate > s2.saledate

Here is the db dump if you want to play around.


CREATE TABLE `customers` (
  `custID` int(10) NOT NULL auto_increment,
  `custFirst` varchar(50) default NULL,
  `custLast` varchar(50) default NULL,
  PRIMARY KEY  (`custID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

/*Data for the table `customers` */

insert  into `customers`(`custID`,`custFirst`,`custLast`) values (1,'John','Smith'),(2,'Sally','Fields'),(3,'Winston','Churchill');

/*Table structure for table `sales` */

CREATE TABLE `sales` (
  `saleID` int(10) NOT NULL auto_increment,
  `saleDate` datetime default NULL,
  `saleAmount` double default NULL,
  `custID` int(10) default '0',
  PRIMARY KEY  (`saleID`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

/*Data for the table `sales` */

insert  into `sales`(`saleID`,`saleDate`,`saleAmount`,`custID`) values (1,'2008-10-20 09:12:00',20,1),(2,'2007-12-03 18:45:00',14,1),(3,'2008-02-13 16:00:00',10,3),(4,'2006-07-08 18:00:00',30,3),(5,'2007-05-14 14:48:00',18,3),(6,'2008-10-22 19:00:00',57,1);

Now I have that off my chest I can go back to using this in my real application.

37 thoughts on “MySQL – left join on last (or first) record in the right table”

Baldur Þór on December 12, 2008 at 7:48 am said:

Thanks a million, you just made my life a lot easier right now 😀

Reply ↓
- Nate Schmolze on March 7, 2013 at 4:23 pm said:
  
  second that
  
  Reply ↓
  - Shalini on December 21, 2017 at 7:48 pm said:
    
    thanks a lot
Murray Hopkins on December 13, 2008 at 7:48 am said:

I am glad you found it useful!
Murray

Reply ↓
Some Random Dude on February 3, 2009 at 8:20 am said:

Works great unless there is more than one sales record for the date of the latest sale. If there is more than one sale on that day you get a duplicate records for each sale. The following change to the code fixes this. In my case sale prices would be the same so grabbing any of them was fine for me. The following would not work if you wanted other details like the biggest or smallest sale of the day.

This code:

ON s1.custID = s2.custID AND s1.saledate < s2.saledate
WHERE s2.custID IS NULL

Should be changed to this:

ON s1.custID = s2.custID AND s1.saledate < s2.saledate
WHERE s2.custID IS NULL limit 1

Maybe it will help the next person. Regardless I learned something new today.

R.Dude

Reply ↓
- Murray Hopkins on February 3, 2009 at 9:19 am said:
  
  Good pickup! Thanks for that.
  
  Reply ↓
- Hnatt on March 11, 2011 at 10:27 am said:
  
  I think the LIMIT statement in subquery will not depend on the main query. So result of
  ( SELECT s1.* FROM sales as s1 LEFT JOIN sales AS s2 ON s1.custID = s2.custID AND s1.saledate < s2.saledate WHERE s2.custID IS NULL LIMIT 1)
  will allways be the same record for each row of main query. So all fields selected from joined table sale_tmp will be NULL except for the one where customers.custID will be equal to custID in that particular first row.
  
  Please correct me if I’m wrong. (And by wrong I don’t mean my bad English)
  
  Reply ↓
  - Deano on June 14, 2012 at 7:03 pm said:
    
    Your right, that adjustment would only work if you wanted a single persons latest sale and they happened to have multiple on the same day. It doesn’t produce a list of everyones latest because at you stated they all return NULL except one one that falls in the LIMIT.
Zumi on April 20, 2009 at 2:39 pm said:

I’ve been able to condense my four-query script to just two because of this. Thank you!

Reply ↓
Bob on December 18, 2009 at 11:05 pm said:

Thanks

Reply ↓
Travis on December 24, 2009 at 9:08 pm said:

You are my hero. This is a much more elegant solution that doing sub-queries in the SELECT or trying to do it with a regular left join and grouping. The latter works, but it doesn’t allow you to sort the records in the table you are joining on (the temp table).

Thanks so much!!

Reply ↓
zzandry on February 17, 2010 at 7:11 am said:

Thanks a lot, magic query,
(mifoka anana be loatra)

Reply ↓
Chris Burnham on July 9, 2010 at 3:38 am said:

Excellent, thanks! Did exactly what I needed it to. Never would have figured this out on my own. It’s just sad that there isn’t a more elegant way to do this.

Reply ↓
- Murray Hopkins on July 9, 2010 at 8:30 am said:
  
  Thanks. Yes, it is a bit convoluted!
  
  As a tip I have found that keeping the inner SELECT as small as possible made a big difference to the speed of the overall query – which is obvious I suppose. I had another application of this where that inner SELECT returned only a few records at first but became very large (>1000 records) over time and adding a few more WHERE clauses to the inner SELECT to trim the number down made a huge difference.
  
  Reply ↓
Felipe Urrego on July 27, 2010 at 2:57 am said:

Thanks a lot, I was looking for this along time ago (… not clear why on artful site was not found earlier… my fault 😛 ); always my solutions was to build the temporary table as a static object (or on memory if was a procedure related job… but ussually I use this kind of stuff for upload / update specific data); with this I´m able to save time doing complex queries without build intermediate static tables

Reply ↓
Tito on October 19, 2010 at 6:45 am said:

Boy, you made my day. Thanks a lot for a great article.

Reply ↓
Chris on February 21, 2011 at 12:13 pm said:

I tired the limit 1 addition since I have a table that I am matching that has several entires from the same day.
It ends up having issues though when I add the “limit 1”.
Turns the rest of the info to NULL.

Reply ↓
Donato on April 29, 2011 at 6:54 pm said:

For some magic reason, adding a GROUP BY clause to a JOIN, gives you the first record of the second table:

SELECT customers.*, sales.*
FROM customers
LEFT JOIN sales ON (customers.custID = sales.custID)
GROUP BY customers.custID

I do not know if this query can be changed to suit your needs…

Reply ↓
Programmer on June 27, 2011 at 8:12 pm said:

The GROUP BY clause only works if you have the latest purchases first in the database table for each customer, because the GROUP BY picks first rows for data and discards all the rest per each condition.

Best solution I’ve come up with is to have a subquery to order up the purchases in DESCENDING order, another subquery to GROUP them for each customer and then join them to the customers list.
It would look like this:

SELECT
c.*, s.*
FROM
customers AS c
LEFT JOIN (
SELECT *
FROM (
SELECT *
FROM sales
ORDER BY saledate DESC
) GROUP BY custID
) AS s
ON (c.custID = s.custID)

Reply ↓
Mihail Irintchev on September 22, 2011 at 9:26 pm said:

Good idea, just a small rationalization to to the original query from me as well (in order to keep the inner query simpler):

SELECT
customers.*, sale_tmp.*
FROM
customers
LEFT JOIN (
SELECT s.* FROM sales as s ORDER BY saleDate DESC
) as sale_tmp ON customers.custID = sale_tmp.custID

GROUP BY customers.custID

Seems more clear this way, but I admit that I haven’t put them to the test to see which one would be more efficient,

P.S. I am no SQL guru either 😉

Reply ↓
Martin on July 16, 2012 at 11:38 pm said:

This example helped me solve a problem I had been working on for hours. Thank you very much!

Reply ↓
Tomas on September 5, 2012 at 2:04 am said:

I have similiar problem. But I need the last record to some date. If I look on your data, let’s say I need the last sale of each user till ‘2008-10-21’. So for the record of John Smith it will select sale from ‘2008-10-20’, not from ‘2008-10-22’. Any suggestions?

Reply ↓
Meet Matt on November 23, 2012 at 9:36 pm said:

Thank you for your great solution! That’s exactly what I was looking for!
My problem was very similar: 3 tables (users, accounts and orders) and I used to merge two massive arrays in php in order to have a single array.
It produced even more problems when I needed to sort, filter and paginate data, because it required “walking” through the whole array unsetting some of it, so it required loads of memory and CPU.
Now everything is sorted, filtered and paginated on DB side!! That’s amazing!

Reply ↓
Coofer Cat on November 28, 2012 at 2:52 am said:

Great solution – thank you. Previously I’ve just pulled in the whole lot and filtered out what I don’t need in software. Clearly, doing it all in SQL is far, far better. It takes a bit of documenting in the code though, as it’s not immediately obvious what it’s doing. It still rocks though 🙂

Reply ↓
damijank on November 30, 2012 at 12:26 am said:

you could’ve used GROUP BY in the sub-select … would be more elegant and a little more efficient…

Reply ↓
- _ade on January 2, 2013 at 4:13 am said:
  
  @damijank – GROUP BY seems to lead to an arbitrary result from the joined table. See here: http://stackoverflow.com/questions/6468314/join-two-tables-returning-just-one-row-from-the-second-table-mysql
  
  @Murray – thanks a lot for this, a modified version has just solved a problem I’ve had for days.
  
  Reply ↓
  - Murray Hopkins on January 2, 2013 at 6:17 am said:
    
    You are very welcome!
- _ade on January 2, 2013 at 4:17 am said:
  
  @damijank
  EDIT: sorry, wrong link. This is the article I was referring to:
  http://stackoverflow.com/questions/6105451/limit-sql-join-when-using-codeigniter-active-record-class/14094232#14094232
  
  Reply ↓
  - Damijan Kozic on December 4, 2018 at 10:05 am said:
    
    I just got notified about these replies, it’s been SIX years since… :-O
    Since I’m already here, @_ade – this “Uncorrelated subquery” here says otherwise: https://dev.mysql.com/doc/refman/8.0/en/example-maximum-column-group-row.html
    True, I’ve left out some details but hey, I was pretty green back in 2012 😉
  - Murray Hopkins on December 4, 2018 at 10:12 am said:
    
    Sorry about the delay in approving. I wont bore you with the details. 😦
    
    Thanks for responding (unlike me). 😉
  - Damijan Kozic on December 4, 2018 at 10:32 am said:
    
    @Murray Hopkins
    Don’t sweat about it 🙂 It was nice to be reminded of something i’ve done so long ago, especially when It still holds true 😀 Have a nice one!
Random Guy on February 20, 2013 at 1:09 am said:

I believe you’ve re-invented this: http://dev.mysql.com/doc/refman/5.5/en/example-maximum-column-group-row.html

That’s honestly pretty impressive 🙂

Reply ↓
- Murray Hopkins on February 20, 2013 at 1:56 pm said:
  
  Hmmmm… I wonder why I didnt find that when I was trying to figure this out! Thanks.
  
  Reply ↓
Martin on February 11, 2014 at 11:59 pm said:

Thanks, I was stuck on an issue exactly like this.

Reply ↓
fabio on March 2, 2016 at 9:03 am said:

How can i use with this ‘ ORDER BY proc_agdate DESC LIMIT 1,1 ‘ on the same query?

i want second from the last data.

Reply ↓
webnator on July 15, 2021 at 10:59 pm said:

12 years later and this is still a lifesaver, thanks!!!!

Reply ↓
- Murray Hopkins on July 16, 2021 at 7:04 am said:
  
  Thank you. I am glad it is still helpful. And, after all the help I have had from various sources eg SO, etc it is great to be able to “give back” even in such a small way. 🙂
  
  Reply ↓