It originally came about as a result of Andrew Ng's Machine Learning course on Coursera. After the course I wanted to test out what I'd learned on a novel data set, something that would interest me, so I went immediately for bicycles. Along the way, though, I realized how difficult the data was to compile and that there is no universally available source for it, so I'm making one.
The comparison algorithm is in place but doesn't work very well at the moment. There are a number of improvements I intend to make in the coming months. First, I'll drop a number of irrelevant features from the algorithm and add a few new ones. The biggest challenge, though, will be imputing and interpolating missing features for many bikes. The data is spotty from manufacturer to manufacturer, so an apples-to-apples comparison is still difficult.
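As a rough illustration of what imputing a missing feature could look like, here is a minimal sketch that fills gaps with the median of similar bikes. It is not the method the site uses; the field names (`category`, `weight_kg`) and the function `impute_median` are hypothetical, and group-median filling is just one simple technique among many.

```python
from statistics import median


def impute_median(rows, feature, group_key):
    """Return copies of rows with missing `feature` values filled in.

    A missing value (None) is replaced by the median of the known values
    in the same group; if the group has no known values, the median over
    all rows is used instead. Hypothetical helper, for illustration only.
    """
    # Collect the known values of the feature for each group.
    known = {}
    for row in rows:
        value = row.get(feature)
        if value is not None:
            known.setdefault(row[group_key], []).append(value)

    all_known = [v for values in known.values() for v in values]

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input is left untouched
        if new_row.get(feature) is None:
            group_values = known.get(row[group_key])
            if group_values:
                new_row[feature] = median(group_values)
            elif all_known:
                new_row[feature] = median(all_known)
            # Otherwise nothing is known at all; leave the gap as None.
        filled.append(new_row)
    return filled


# Made-up sample data; the numbers are placeholders, not real bike specs.
bikes = [
    {"model": "A", "category": "road", "weight_kg": 8.0},
    {"model": "B", "category": "road", "weight_kg": 9.0},
    {"model": "C", "category": "road", "weight_kg": None},
    {"model": "D", "category": "mountain", "weight_kg": 13.0},
    {"model": "E", "category": "cruiser", "weight_kg": None},
]

filled = impute_median(bikes, "weight_kg", "category")
```

Filling within a group rather than across the whole data set keeps a road bike from inheriting a mountain bike's numbers, which matters when the point is a like-for-like comparison.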
A big takeaway from this project is discovering methods for decreasing experimentation time. I can't overstate how valuable a brand new, very powerful machine is. For this project I retired my MBP when I started hitting serious hardware limitations and built a Hackintosh. Being able to keep the entire dataset on your hard drive, with enough CPU to chomp on it, makes a dramatic difference in your ability and will to try different things out.
Check out the site; any feedback is appreciated. Many more features, including user-generated content and comments, are coming soon.