Here is my third interview. I spoke to Paul Lamere (pictured left) just an hour or two after my chat with Zac Johnson. Some of their points overlap: for example, Paul's Green Day/Weezer/Radiohead argument about collaborative filtering is much the same as the one Zac illustrated with the Beatles, Bob Marley and James Brown. However, there are also minor differences and nuances of emphasis, reflecting, perhaps, the perspectives of their employers.
Paul Lamere works for Sun Labs in Boston, where he's the Principal Investigator for a project called Search Inside the Music. Unusually among industrial research labs, Sun's are funded from the corporate treasury rather than by product groups, which gives them greater scope for blue-sky thinking while still focusing on the needs of Sun's customers, who include a variety of high-profile music service providers.
Paul is interested in the whole music discovery process. Citing the long tail effect, he sees the challenge as providing the link between a million songs in the backroom and the mobile device in your hand. The ideal would be to have an option on that device that just says "play me music I like".
Paul divides the approaches to this challenge into three:
- expert recommendations (such as Pitchfork reviews);
- social models and collaborative filtering (as used by Yahoo and Amazon);
- content-based models (including MusicIP and Pandora).
[Note that this is slightly different from Zac Johnson's three-way classification, which splits (3) above into editorial-driven and acoustic-driven, and doesn't include the expert recommendations. However, I think this difference could be very easily resolved.]
Paul points to some of the limitations of collaborative filtering models. There is a popularity bias, which means there is not a lot of rotation among the most listened-to, and most recommended, tracks. For example, on Last.fm, one song by The Postal Service was in the top 3 of the charts for nine months (see the charts from July 2005 and those from April 2006). Why does this happen? Paul gives an example where different users might say they liked the band Green Day. Let's say that, based on this, the service recommends tracks by Weezer and Radiohead. One user might think these were good recommendations and would continue using the service. Another, who didn't like these recommendations, might be put off and stop using the service. The net effect is that the Green Day/Weezer/Radiohead link is reinforced, because the service cannot collect data from listeners who don't recognise it. Another limitation of collaborative filtering is that it takes time for new releases to appear on the radar: several people have to listen to them first before they, literally, filter through the recommendation system. None of which is to imply that Paul doesn't like Last.fm — as Donald Norman said of the Mac user interface in the 1980s, it's good enough to be worth criticising.
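Paul's feedback-loop argument can be sketched in a few lines of code. This is a toy item-item collaborative filter over invented listening histories (the band names are just the ones from the interview, not real data): tracks are recommended when they co-occur with the seed band, and because satisfied listeners stay while dissatisfied ones leave, the dominant link only gets stronger.

```python
from collections import defaultdict

# Toy item-item collaborative filter: recommend bands that co-occur with
# the seed band in listeners' histories. The histories are invented
# purely to illustrate the mechanism Paul describes.
histories = [
    {"Green Day", "Weezer", "Radiohead"},
    {"Green Day", "Weezer"},
    {"Green Day", "Radiohead"},
    {"Green Day", "The Postal Service"},
]

def recommend(seed, histories, top_n=2):
    counts = defaultdict(int)
    for history in histories:
        if seed in history:
            for other in history - {seed}:
                counts[other] += 1
    # Rank by co-occurrence count, breaking ties alphabetically.
    return [band for band, _ in
            sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]]

print(recommend("Green Day", histories))  # → ['Radiohead', 'Weezer']

# A satisfied listener keeps using the service and feeds the link back in;
# a dissatisfied one leaves and contributes no counter-evidence at all.
histories.append({"Green Day", "Weezer", "Radiohead"})
print(recommend("Green Day", histories))  # the same link, now stronger
```

The asymmetry in the last step is the whole point: the system never sees the data that would weaken the Green Day/Weezer/Radiohead association.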
Sun's current focus, however, is mostly on content-based approaches, and Paul brings his background in speech recognition to bear, as well as Sun's experience of text analysis (similar to Google's analysis of web pages). Using these techniques, Sun can 'mine' music reviews to analyse the "words around the music" and identify when a particular track is described as "frenetic" or "jagged". People are better at formulating search queries using text, so they can then search for frenetic or jagged music.
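A minimal sketch of what mining the "words around the music" might look like, assuming a fixed descriptor vocabulary and a tiny invented corpus of review snippets (the track names and reviews are hypothetical): build an inverted index from descriptor words to tracks, so that a text query like "frenetic" can find matching music.

```python
import re
from collections import defaultdict

# Hypothetical descriptor vocabulary and review snippets, invented
# for illustration; a real system would mine thousands of reviews.
DESCRIPTORS = {"frenetic", "jagged", "mellow", "lush"}

reviews = {
    "Track A": "A frenetic, jagged blast of guitar noise.",
    "Track B": "Mellow, lush arrangements throughout.",
    "Track C": "Frenetic drumming drives the whole record.",
}

# Inverted index: descriptor word -> set of tracks described by it.
index = defaultdict(set)
for track, text in reviews.items():
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in DESCRIPTORS:
            index[word].add(track)

print(sorted(index["frenetic"]))  # → ['Track A', 'Track C']
print(sorted(index["jagged"]))    # → ['Track A']
```

The point of the index is exactly the one Paul makes: people are good at typing words, so turning review prose into a word-to-track mapping lets them search for music the way they already search the web.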
That said, Paul recognises that the content-based approach won't be the 'final answer', but will need to be combined with other models. In particular, Paul is looking to combine this approach with a model that tracks individual listening. This enables you to make recommendations that are based not just on the music content but are also tailored to the individual by taking into account their preferences.
Evaluation of recommendation systems is a problem, Paul observes. What criteria do you apply? Paul is also keen to distinguish between simple music recommendations, which effectively say to their users "Here's a disparate bunch of songs that we think you might like", and playlist generators that say "Here is a coherent group of songs that go together".
Where do Sun's interests lie in this field? One area is in supporting the analysis of large catalogues of songs for music service providers. The analyses Paul is working on require 10-20 seconds of CPU time per song. You could run a personal music collection through this kind of analysis in a weekend, but a catalogue of millions of tracks would take several years. To get round this problem, Sun offers its Utility Grid solution, which enables clients to hire large amounts of CPU time by the hour, instead of buying large numbers of servers for one-off or infrequent tasks.
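Paul's figures are easy to check on the back of an envelope. Taking 15 seconds per song as the midpoint of his 10-20 second estimate, and assuming (my numbers, not his) a personal collection of 10,000 songs and a catalogue of 5 million tracks:

```python
# Back-of-envelope check of the analysis times in the interview,
# assuming 15 s of CPU per song (midpoint of the 10-20 s estimate).
SECONDS_PER_SONG = 15

personal = 10_000 * SECONDS_PER_SONG / 3600                    # hours
catalogue = 5_000_000 * SECONDS_PER_SONG / (3600 * 24 * 365)   # years

print(f"{personal:.0f} hours")   # ~42 hours: a long weekend on one CPU
print(f"{catalogue:.1f} years")  # ~2.4 years on one CPU
```

The same 5 million tracks spread across a few hundred rented CPUs would finish in days, which is the case for the Utility Grid model in a nutshell.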
More generally, Sun would like to be seen as the neutral Switzerland of the music industry battles. They hope their enabling technologies like music fingerprinting and Open DRM can facilitate the development of the celestial jukebox nirvana. Paul hopes to see more interoperability between music discovery services. This would make it possible to take recommendations from Last.fm and feed them into MusicIP to find out which are acoustically similar, thus 'mashing up' music recommendations and playlist generation.
When I ask about Web 2.0, Paul explains how this is an extension of the neutral enabling role Sun has set for itself. If you license detailed music metadata from specialist providers, it's expensive. There is plenty of open data in sources like Wikipedia and creative-commons-licensed blogs, but it is unstructured and therefore difficult for machines to use. This is where text mining and information extraction come in: harvesting the open, unstructured data to turn it into structured data and then provide an API for the metadata.
Finally, we discuss visualisation, which I know from his blog is one of Paul's interests. Paul talks me through the animation of his Search Inside the Music demonstration. Each of the spheres is an mp3 file. Their colour denotes genre (for example, the dark blue tracks are classical), and the spatial clustering reflects acoustic similarity. The big album cover in the background is for the song currently playing, and the six smaller ones around it are those most acoustically similar. With this system you can specify a start song for a playlist and an end song somewhere else in the 3D space, and the system will draw a line between them, including other songs along the way, with the sequence optimised to minimise what Paul calls "iPod whiplash". Is this a prototype of a consumer product? Could it fit on a mobile phone screen? Paul says that its principal purpose is just to demonstrate possibilities: though some technology partners have expressed interest, the user interface guys "just roll their eyes" at the prospect of the usability and practicality issues involved.
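The start-to-end playlist idea can be sketched geometrically. This is my own simplification, not Sun's algorithm: each song is a point in a hypothetical 2D acoustic feature space (the real system works in many more dimensions), and the playlist takes the songs that lie close to the straight line between the start and end songs, ordered by their position along it, so that neighbouring songs stay acoustically similar.

```python
import math

# Invented songs in a hypothetical 2D acoustic feature space.
songs = {
    "start": (0.0, 0.0),
    "end":   (10.0, 0.0),
    "a":     (2.0, 0.5),
    "b":     (7.0, -0.4),
    "c":     (5.0, 4.0),   # far from the start->end line: skipped
    "d":     (4.5, 0.2),
}

def playlist(start, end, songs, max_detour=1.0):
    (x0, y0), (x1, y1) = songs[start], songs[end]
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    picked = []
    for name, (x, y) in songs.items():
        if name in (start, end):
            continue
        # Fractional position along the start->end line (0..1) and
        # perpendicular distance from it.
        t = ((x - x0) * dx + (y - y0) * dy) / (length ** 2)
        off = abs((x - x0) * dy - (y - y0) * dx) / length
        if 0 < t < 1 and off <= max_detour:
            picked.append((t, name))
    # Ordering by position along the line keeps adjacent songs similar,
    # which is roughly what minimising "iPod whiplash" amounts to.
    return [start] + [name for _, name in sorted(picked)] + [end]

print(playlist("start", "end", songs))  # → ['start', 'a', 'd', 'b', 'end']
```

Song "c" is acoustically interesting but too far off the path, so it is left out; everything that survives is sequenced by how far along the journey it sits.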
With all this clever technology at his disposal, Paul confesses to being amazed at how popular the whiplash-inducing shuffle feature is. But it shows, he says, how people want to be able to find and hear music without thinking about it too hard.