After spending a few hours searching and testing this one I thought I’d better share it since I found many similar questions but no answers. I am not a MySQL expert so there may be a better way to do this. Please tell me if there is!!
This is a simplified example. I have a table of customers and any sales they may have made. I want to list ALL customers together with the LAST sale they made. ie even if there was no sale at all, I still want the customer data listed with NULLS for the sale data.
Sample Customer data:
custID | custFirst | custLast |
1 | John | Smith |
2 | Sally | Fields |
3 | Winston | Churchill |
Sample Sales data:
saleID | saleDate | saleAmount | custID |
1 | 2008-10-20 09:12:00 | 20 | 1 |
2 | 2007-12-03 18:45:00 | 14 | 1 |
3 | 2008-02-13 16:00:00 | 10 | 3 |
4 | 2006-07-08 18:00:00 | 30 | 3 |
5 | 2007-05-14 14:48:00 | 18 | 3 |
6 | 2008-10-22 19:00:00 | 57 | 1 |
And the result set from the query below:
custID | custFirst | custLast | saleID | saleDate | saleAmount | custID |
1 | John | Smith | 6 | 2008-10-22 19:00:00 | 57 | 1 |
2 | Sally | Fields | (NULL) | (NULL) | (NULL) | (NULL) |
3 | Winston | Churchill | 3 | 2008-02-13 16:00:00 | 10 | 3 |
For John and Winston we have the most recent sale they made. Sally hasnt purchased anything yet.
This is the SQL:
SELECT customers.*, sale_tmp.* FROM customers LEFT JOIN ( SELECT s1.* FROM sales as s1 LEFT JOIN sales AS s2 ON s1.custID = s2.custID AND s1.saledate < s2.saledate WHERE s2.custID IS NULL ) as sale_tmp ON (customers.custID = sale_tmp.custID)
This relies on a couple of sql concepts linked together. The first is the idea that you can LEFT JOIN a temporary table just like a “real” table. So, this bit:
( SELECT s1.* FROM sales as s1 LEFT JOIN sales AS s2 ON s1.custID = s2.custID AND s1.saledate < s2.saledate WHERE s2.custID IS NULL ) as sale_tmp
creates a temporary table called “sale_tmp” that we are LEFT JOINing to the customers table via the common custID columns. Dont worry about the table construction, just note that a table is being created here.
The second concept is called a within-group aggregate and this is the way that we get the most recent sale for each custID in the sales table. There is a very good explanation of that here:
http://www.artfulsoftware.com/infotree/queries.php?&bw=1395#101
So, to sum up, the technique is to create a temporary table (sale_tmp) that contains just the most recent sales for each custID in the sales table, then left join sale_tmp to customers.
If you wanted their first purchases, simply change
s1.saledate < s2.saledate
to
s1.saledate > s2.saledate
Here is the db dump if you want to play around.
CREATE TABLE `customers` ( `custID` int(10) NOT NULL auto_increment, `custFirst` varchar(50) default NULL, `custLast` varchar(50) default NULL, PRIMARY KEY (`custID`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8; /*Data for the table `customers` */ insert into `customers`(`custID`,`custFirst`,`custLast`) values (1,'John','Smith'),(2,'Sally','Fields'),(3,'Winston','Churchill'); /*Table structure for table `sales` */ CREATE TABLE `sales` ( `saleID` int(10) NOT NULL auto_increment, `saleDate` datetime default NULL, `saleAmount` double default NULL, `custID` int(10) default '0', PRIMARY KEY (`saleID`) ) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8; /*Data for the table `sales` */ insert into `sales`(`saleID`,`saleDate`,`saleAmount`,`custID`) values (1,'2008-10-20 09:12:00',20,1),(2,'2007-12-03 18:45:00',14,1),(3,'2008-02-13 16:00:00',10,3),(4,'2006-07-08 18:00:00',30,3),(5,'2007-05-14 14:48:00',18,3),(6,'2008-10-22 19:00:00',57,1);
Now I have that off my chest I can go back to using this in my real application.