After spending a few hours searching and testing this one I thought I’d better share it since I found many similar questions but no answers. I am not a MySQL expert so there may be a better way to do this. Please tell me if there is!!
This is a simplified example. I have a table of customers and any sales they may have made. I want to list ALL customers together with the LAST sale they made. ie even if there was no sale at all, I still want the customer data listed with NULLS for the sale data.
Sample Customer data:
| custID | custFirst | custLast |
| 1 | John | Smith |
| 2 | Sally | Fields |
| 3 | Winston | Churchill |
Sample Sales data:
| saleID | saleDate | saleAmount | custID |
| 1 | 2008-10-20 09:12:00 | 20 | 1 |
| 2 | 2007-12-03 18:45:00 | 14 | 1 |
| 3 | 2008-02-13 16:00:00 | 10 | 3 |
| 4 | 2006-07-08 18:00:00 | 30 | 3 |
| 5 | 2007-05-14 14:48:00 | 18 | 3 |
| 6 | 2008-10-22 19:00:00 | 57 | 1 |
And the result set from the query below:
| custID | custFirst | custLast | saleID | saleDate | saleAmount | custID |
| 1 | John | Smith | 6 | 2008-10-22 19:00:00 | 57 | 1 |
| 2 | Sally | Fields | (NULL) | (NULL) | (NULL) | (NULL) |
| 3 | Winston | Churchill | 3 | 2008-02-13 16:00:00 | 10 | 3 |
For John and Winston we have the most recent sale they made. Sally hasnt purchased anything yet.
This is the SQL:
SELECT
customers.*, sale_tmp.*
FROM
customers
LEFT JOIN ( SELECT s1.*
FROM sales as s1
LEFT JOIN sales AS s2
ON s1.custID = s2.custID AND s1.saledate < s2.saledate
WHERE s2.custID IS NULL ) as sale_tmp
ON (customers.custID = sale_tmp.custID)
This relies on a couple of sql concepts linked together. The first is the idea that you can LEFT JOIN a temporary table just like a “real” table. So, this bit:
( SELECT s1.* FROM sales as s1 LEFT JOIN sales AS s2 ON s1.custID = s2.custID AND s1.saledate < s2.saledate WHERE s2.custID IS NULL ) as sale_tmp
creates a temporary table called “sale_tmp” that we are LEFT JOINing to the customers table via the common custID columns. Dont worry about the table construction, just note that a table is being created here.
The second concept is called a within-group aggregate and this is the way that we get the most recent sale for each custID in the sales table. There is a very good explanation of that here:
http://www.artfulsoftware.com/infotree/queries.php?&bw=1395#101
So, to sum up, the technique is to create a temporary table (sale_tmp) that contains just the most recent sales for each custID in the sales table, then left join sale_tmp to customers.
If you wanted their first purchases, simply change
s1.saledate < s2.saledate
to
s1.saledate > s2.saledate
Here is the db dump if you want to play around.
CREATE TABLE `customers` ( `custID` int(10) NOT NULL auto_increment, `custFirst` varchar(50) default NULL, `custLast` varchar(50) default NULL, PRIMARY KEY (`custID`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8; /*Data for the table `customers` */ insert into `customers`(`custID`,`custFirst`,`custLast`) values (1,'John','Smith'),(2,'Sally','Fields'),(3,'Winston','Churchill'); /*Table structure for table `sales` */ CREATE TABLE `sales` ( `saleID` int(10) NOT NULL auto_increment, `saleDate` datetime default NULL, `saleAmount` double default NULL, `custID` int(10) default '0', PRIMARY KEY (`saleID`) ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8; /*Data for the table `sales` */ insert into `sales`(`saleID`,`saleDate`,`saleAmount`,`custID`) values (1,'2008-10-20 09:12:00',20,1),(2,'2007-12-03 18:45:00',14,1),(3,'2008-02-13 16:00:00',10,3),(4,'2006-07-08 18:00:00',30,3),(5,'2007-05-14 14:48:00',18,3),(6,'2008-10-22 19:00:00',57,1);
Now I have that off my chest I can go back to using this in my real application.
December 12, 2008 at 7:48 am |
Thanks a million, you just made my life a lot easier right now
December 13, 2008 at 7:48 am |
I am glad you found it useful!
Murray
February 3, 2009 at 8:20 am |
Works great unless there is more than one sales record for the date of the latest sale. If there is more than one sale on that day you get a duplicate records for each sale. The following change to the code fixes this. In my case sale prices would be the same so grabbing any of them was fine for me. The following would not work if you wanted other details like the biggest or smallest sale of the day.
This code:
ON s1.custID = s2.custID AND s1.saledate < s2.saledate
WHERE s2.custID IS NULL
Should be changed to this:
ON s1.custID = s2.custID AND s1.saledate < s2.saledate
WHERE s2.custID IS NULL limit 1
Maybe it will help the next person. Regardless I learned something new today.
R.Dude
February 3, 2009 at 9:19 am |
Good pickup! Thanks for that.
April 20, 2009 at 2:39 pm |
I’ve been able to condense my four-query script to just two because of this. Thank you!