MySQL – left join on last (or first) record in the right table

After spending a few hours searching and testing this one I thought I’d better share it since I found many similar questions but no answers. I am not a MySQL expert so there may be a better way to do this. Please tell me if there is!!

This is a simplified example. I have a table of customers and any sales they may have made. I want to list ALL customers together with the LAST sale they made. ie even if there was no sale at all, I still want the customer data listed with NULLS for the sale data.

Sample Customer data:

custID custFirst custLast
1 John Smith
2 Sally Fields
3 Winston Churchill

Sample Sales data:

saleID saleDate saleAmount custID
1 2008-10-20 09:12:00 20 1
2 2007-12-03 18:45:00 14 1
3 2008-02-13 16:00:00 10 3
4 2006-07-08 18:00:00 30 3
5 2007-05-14 14:48:00 18 3
6 2008-10-22 19:00:00 57 1

And the result set from the query below:

custID custFirst custLast saleID saleDate saleAmount custID
1 John Smith 6 2008-10-22 19:00:00 57 1
2 Sally Fields (NULL) (NULL) (NULL) (NULL)
3 Winston Churchill 3 2008-02-13 16:00:00 10 3

For John and Winston we have the most recent sale they made. Sally hasnt purchased anything yet.

This is the SQL:

SELECT
    customers.*, sale_tmp.*
FROM
    customers

    LEFT JOIN ( SELECT s1.*
		FROM sales as s1
		LEFT JOIN sales AS s2
		     ON s1.custID = s2.custID AND s1.saledate < s2.saledate
		WHERE s2.custID IS NULL ) as sale_tmp

    ON (customers.custID = sale_tmp.custID)

This relies on a couple of sql concepts linked together. The first is the idea that you can LEFT JOIN a temporary table just like a “real” table. So, this bit:

( SELECT s1.*
		FROM sales as s1
		LEFT JOIN sales AS s2
		     ON s1.custID = s2.custID AND s1.saledate < s2.saledate
		WHERE s2.custID IS NULL ) as sale_tmp

creates a temporary table called “sale_tmp” that we are LEFT JOINing to the customers table via the common custID columns. Dont worry about the table construction, just note that a table is being created here.

The second concept is called a within-group aggregate and this is the way that we get the most recent sale for each custID in the sales table. There is a very good explanation of that here:

http://www.artfulsoftware.com/infotree/queries.php?&bw=1395#101

So, to sum up, the technique is to create a temporary table (sale_tmp) that contains just the most recent sales for each custID in the sales table, then left join sale_tmp to customers.

If you wanted their first purchases, simply change

s1.saledate < s2.saledate

to

s1.saledate > s2.saledate

Here is the db dump if you want to play around.


CREATE TABLE `customers` (
  `custID` int(10) NOT NULL auto_increment,
  `custFirst` varchar(50) default NULL,
  `custLast` varchar(50) default NULL,
  PRIMARY KEY  (`custID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

/*Data for the table `customers` */

insert  into `customers`(`custID`,`custFirst`,`custLast`) values (1,'John','Smith'),(2,'Sally','Fields'),(3,'Winston','Churchill');

/*Table structure for table `sales` */

CREATE TABLE `sales` (
  `saleID` int(10) NOT NULL auto_increment,
  `saleDate` datetime default NULL,
  `saleAmount` double default NULL,
  `custID` int(10) default '0',
  PRIMARY KEY  (`saleID`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8;

/*Data for the table `sales` */

insert  into `sales`(`saleID`,`saleDate`,`saleAmount`,`custID`) values (1,'2008-10-20 09:12:00',20,1),(2,'2007-12-03 18:45:00',14,1),(3,'2008-02-13 16:00:00',10,3),(4,'2006-07-08 18:00:00',30,3),(5,'2007-05-14 14:48:00',18,3),(6,'2008-10-22 19:00:00',57,1);

Now I have that off my chest I can go back to using this in my real application.

Looping over a query with CFScript

It is often convenient to be able to loop over a coldfusion query with CFScript. There is a way to do this even though CF7 does not provide a documented way to do it (and I dont know about CF8).

By harnessing the underlying Java you can do the following:

while( qry.Next() ) {
    qryRow = queryRowToStruct(qry);
    writeOutput( "name=" & qryRow.firstname );
}

(subsititute your relevant column name for “firstname”)

The Next() method is a Java iterator method and can be used on CF queries.

The queryRowToStruct() function is a modified version of the one found at
CFLib.org

The purpose of the function is to return a CF structure containing the data at the specified query row number. See the link above for details.

You just need to modify the line
var row = 1;
to
var row = query.getRow();

getRow() is another Java iterator method and it returns the sequential row number from the query. The Next() method increments the row counter on the query and since the query is passed into the queryRowToStruct() function by reference, each call the getRow() will return the current row number, starting at 1.

Addendum: In order to allow the queryRowToStruct() function to be used as it was originally intended, you need to make another small change so that you can use the function without the iterator and only one parameter ie so that queryRowToStruct(query) will return the first record from the query.

Here is the complete (modified) function (credits to Nathan Dintenfass (nathan@changemedia.com)):


function queryRowToStruct(query){
	// By default, do this to the first row of the query
	//var row = 1;
	// No. By using getRow() we can call this function sequentially and get the next record each time.
	// Or, we can pass the row number we want as the 2nd parameter as before.
	var row = query.getRow();
	//a var for looping
	var ii = 1;
	//the cols to loop over
	var cols = listToArray(query.columnList);
	//the struct to return
	var stReturn = structnew();

	// getRow() will return 0 if next() hasnt been called. 
	// So, to allow this function to work by calling it with one parameter and no previous call to next(), 
	// we need the following:
	if (row eq 0) row = 1; 

	//if there is a second argument, use that for the row number
	if(arrayLen(arguments) GT 1)
		row = arguments[2];
	//loop over the cols and build the struct from the query row
	for(ii = 1; ii lte arraylen(cols); ii = ii + 1){
		stReturn[cols[ii]] = query[cols[ii]][row];
	}		
	//return the struct
	return stReturn;
}

You can also still do this
qryRow = queryRowToStruct(query,3);
to get the third row (as per the original use of the function).

For other info on using Java and queries see:
href=”http://www.bennadel.com/blog/204-Using-ColdFusion-Query-s-Underlying-Java-Methods-For-Query-Manipulation-And-Logic.htm”