Recently I have been working on a data-mining problem that requires supervised learning. The problem is not supposed to be big: just a few hundred features. The interesting issue is the number of records, around half a million or more. Most of the implementations of supervised learning algorithms available on the web are not designed for such a volume of data. Scalability becomes a clear issue at this scale. For instance, algorithms that scale as n^3 with respect to the number of records become prohibitively costly, ruling out any feasible approach. Algorithms that require global processing, which defeats efficient parallelization, may not be an option either. For such reasons, I started working last fall on efficient implementations of GBML algorithms, focusing on (1) efficient implementations that exploit the available hardware, (2) minimizing the memory footprint required, and (3) massively exploiting the inherent parallelism of such methods. A few initial steps can be found here and here.
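
To make the scaling argument concrete, here is a quick back-of-the-envelope sketch (the assumed throughput of 10^9 elementary operations per second is an illustrative figure, not a measurement of any particular machine) showing why an n^3 algorithm is hopeless at half a million records while near-linear ones remain tractable:

```python
import math

n = 500_000          # number of records in the training set
ops_per_sec = 1e9    # assumed throughput: ~10^9 elementary operations per second

# Rough operation counts for common complexity classes at this data volume
costs = {
    "n":       n,
    "n log n": n * math.log2(n),
    "n^2":     n ** 2,
    "n^3":     n ** 3,
}

for name, ops in costs.items():
    seconds = ops / ops_per_sec
    print(f"{name:>8}: {ops:.2e} ops ~ {seconds:.2e} s ({seconds / 86_400:.2e} days)")
```

Under these assumptions, the n^3 case works out to roughly 1.25x10^17 operations, on the order of four years of computation, whereas the linear and n log n cases finish in well under a second.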