
11 December 2007

Detecting exceptions and fraud on recommender systems

[Screenshot from iLike profile]

I stumbled across this iLike profile when I noticed someone who'd listened to David Bowie over 2 billion times. "Patrick S" has, according to his profile, been alive less than a billion seconds (and iTunes has been available for only the latter part of his life), yet he's managed to register over 2 billion plays for 32 separate artists. The play data appears to be imported by iLike from iTunes.

Who knows whether this is just idle mischief, a software glitch, or a concerted effort at gaming or 'shilling' the iLike recommender system? Whatever the cause, recommender systems clearly have to detect such incongruent profiles — which can't be hard in cases as gross as this — and make sure the data is excluded from their recommendation algorithms. Cleaning up the profiles for this user and the 32 artists concerned would be a good idea, too, lest they undermine others' confidence in what they see on the site.
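A check like the one described above could be as simple as comparing a play count with the time available to accumulate it. The sketch below is a hypothetical illustration, not anything iLike is known to do: the 30-second minimum per play is an arbitrary assumption, and the window start (9 January 2001, when iTunes was first released) is simply the earliest date such a count could begin.

```python
from datetime import datetime, timezone

# Assumed lower bound on the length of a single play (hypothetical, not an iLike rule).
MIN_SECONDS_PER_PLAY = 30


def max_possible_plays(since: datetime, now: datetime,
                       min_seconds_per_play: int = MIN_SECONDS_PER_PLAY) -> int:
    """Most plays anyone could log by listening back to back, non-stop."""
    elapsed = (now - since).total_seconds()
    return max(0, int(elapsed // min_seconds_per_play))


def is_plausible(play_count: int, since: datetime, now: datetime) -> bool:
    """Reject negative counts and counts above the physical upper bound."""
    return 0 <= play_count <= max_possible_plays(since, now)


if __name__ == "__main__":
    # Window start assumed to be iTunes' first release; window end is the post date.
    since = datetime(2001, 1, 9, tzinfo=timezone.utc)
    now = datetime(2007, 12, 11, tzinfo=timezone.utc)
    print(is_plausible(2_147_483_647, since, now))  # False
    print(is_plausible(5_000, since, now))          # True
```

Even at one second per play, 2 billion plays would need more than a billion seconds, so a count like Patrick's fails this test with any sensible threshold.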


Comments

Note also that the play count of 2,147,483,647 happens to be 2^31 - 1. This is a significant number since it is the largest signed value that can be represented in a 32-bit integer.
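We don't know how iLike actually stores play counts, but if the field is a signed 32-bit integer, 2,147,483,647 is the last value it can hold: anything larger is either rejected or wraps around to a negative number. The helper names below are hypothetical; the sketch uses Python's `struct` module (format `"<i"`, a signed 32-bit int) to test the range, and plain modular arithmetic to show two's-complement wraparound.

```python
import struct

INT32_MAX = 2**31 - 1  # 2,147,483,647


def wrap_int32(x: int) -> int:
    """Value x would end up as if stored in a signed 32-bit two's-complement integer."""
    return (x + 2**31) % 2**32 - 2**31


def fits_int32(x: int) -> bool:
    """True if x can be packed as a signed 32-bit int without error."""
    try:
        struct.pack("<i", x)  # "<i" = little-endian, standard-size signed 32-bit int
        return True
    except struct.error:
        return False


print(INT32_MAX)                  # 2147483647
print(wrap_int32(INT32_MAX + 1))  # -2147483648
print(fits_int32(INT32_MAX))      # True
print(fits_int32(INT32_MAX + 1))  # False
```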

I suspect that this is idle mischief. You can follow the trail as patrick s experiments to find the largest play counts:

Joey Schlabs 2,147,483,646
Paloalto 2,113,430,281
Andre 3000 2,112,545,318
Seven Band 2,112,303,498
Nick Cave & The Bad Seeds 2,108,076,443
Elvis Presley 2,104,761,844

Perhaps more troubling is that iLike doesn't catch such an obvious hack and filter it out. If you look at the fans of John Mellencamp (http://www.ilike.com/artist/John+Mellencamp/top_listeners), all you see is patrick_s; the rest of the fans are normalized to zero. That's a bad user experience.
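We can only guess at how iLike draws its top-listener list, but if each fan is scaled against the top listener, a single absurd count squeezes everyone else to zero. The sketch below assumes that max-based scaling; apart from patrick_s's count, the fan names, counts and the 1,000,000 plausibility cap are made up for illustration.

```python
def bar_percentages(counts: dict[str, int]) -> dict[str, int]:
    """Scale each listener's count against the top listener, as a rounded percentage."""
    top = max(counts.values())
    return {user: round(100 * c / top) for user, c in counts.items()}


def drop_above(counts: dict[str, int], cap: int) -> dict[str, int]:
    """Discard counts above a (hypothetical) plausibility cap before scaling."""
    return {user: c for user, c in counts.items() if c <= cap}


fans = {"patrick_s": 2_147_483_647, "fan_a": 1_200, "fan_b": 800}  # fan_a/fan_b are invented
print(bar_percentages(fans))                         # {'patrick_s': 100, 'fan_a': 0, 'fan_b': 0}
print(bar_percentages(drop_above(fans, 1_000_000)))  # {'fan_a': 100, 'fan_b': 67}
```

Filtering before scaling, rather than after, is what restores a meaningful ranking for everyone else.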

(I tried the same thing a while back to see what last.fm would do: when I told it I had played Deerhoof a modest 200,000 times, it filtered the count out.)
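We don't know how last.fm decided to reject that count. One generic way to catch such a value, without knowing anything about the user's history, is to compare it with other listeners of the same artist using a robust statistic. The sketch below swaps in the modified z-score (median and median absolute deviation, with the conventional 0.6745 factor and 3.5 cutoff); the listener names and counts are invented.

```python
from statistics import median


def robust_outliers(counts: dict[str, int], cutoff: float = 3.5) -> set[str]:
    """Flag users whose modified z-score, 0.6745 * (x - median) / MAD, exceeds cutoff."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # When most counts are identical, MAD is 0 and the score is undefined;
        # this simplified sketch flags nothing rather than dividing by zero.
        return set()
    return {user for user, c in counts.items()
            if abs(0.6745 * (c - med) / mad) > cutoff}


listeners = {"a": 120, "b": 95, "c": 300, "d": 60, "e": 200_000}  # invented data
print(robust_outliers(listeners))  # {'e'}
```

Median-based statistics are used here because a single huge value would drag a mean and standard deviation along with it, partly hiding itself.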

Moreover, if iLike gets the data from the iTunes personal library, one only has to open the associated XML file and type in the desired number of plays per song. That's it!
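If an importer does read that XML file, it could at least cross-check each count against track length. The sketch below assumes the library export is a property list with a top-level `Tracks` dictionary whose entries carry `Artist`, `Total Time` (in milliseconds) and `Play Count` keys, which is our understanding of the iTunes XML layout rather than anything iLike has documented. It uses Python's standard `plistlib`, builds a tiny invented sample in memory instead of reading a real file, and borrows the one-billion-second window from the post.

```python
import plistlib
from collections import defaultdict


def listening_seconds_by_artist(library: dict) -> dict[str, float]:
    """Total implied listening time per artist: play count x track length."""
    totals = defaultdict(float)
    for track in library.get("Tracks", {}).values():
        plays = track.get("Play Count", 0)
        seconds = track.get("Total Time", 0) / 1000  # assumed to be milliseconds
        totals[track.get("Artist", "Unknown")] += plays * seconds
    return dict(totals)


def suspicious_artists(library: dict, seconds_available: float) -> set[str]:
    """Artists whose implied listening time exceeds the time actually available."""
    return {artist for artist, secs in listening_seconds_by_artist(library).items()
            if secs > seconds_available}


# With a real file: with open(path_to_library_xml, "rb") as f: library = plistlib.load(f)
# (path_to_library_xml is a hypothetical placeholder.)
sample_xml = plistlib.dumps({"Tracks": {
    "1": {"Track ID": 1, "Artist": "David Bowie", "Total Time": 240_000, "Play Count": 2_147_483_647},
    "2": {"Track ID": 2, "Artist": "Elvis Presley", "Total Time": 150_000, "Play Count": 42},
}})
library = plistlib.loads(sample_xml)
print(suspicious_artists(library, seconds_available=1_000_000_000))  # {'David Bowie'}
```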

Thanks for this input.

Ah, Paul, some people recognise 2^31 - 1 immediately, and some don't ;-)

Oscar, I just tried editing the play count in the XML file as you describe, but iTunes seemed to outwit me: when I started it up again, it set the count back to what it had been before. I suspect it's simple, but not quite that simple (for me!).

It's a good illustration of how automated recommendation systems lack the most basic common sense. Machines are great at spotting patterns and apparent associations, but their recommendations are at best automated guesses (albeit sometimes very good guesses). When a friend recommends something, though, I know it's going to be good - there's a bit more substance to the recommendation than a mathematical algorithm.

Funny, I noticed a similar glitch on a blog traffic site about 2 years ago. It wasn't an automated recommendation system, but it was used in a similar way.

The Truth Laid Bare used to be a worthwhile way to compare blog traffic -- until a few people figured out how to get multiple blogs. But what struck me about your post was how similar the two data runs looked:

Here's the faked, misleading traffic data:

28) Athletics Nation :: An Oakland A's Blog 40455 visits/day
29) Red Reporter :: A Cincinnati Reds Blog 40455 visits/day
30) Bruins Nation :: A UCLA Bruins weblog 40455 visits/day
31) Camden Chat :: A Baltimore Orioles Blog 40455 visits/day

On and on for a dozen sites -- it struck me similarly to your list above.
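One giveaway in lists like these is that independent sites almost never report exactly the same figure. A minimal sketch of that check follows, using the four entries quoted above plus one hypothetical blog for contrast; the function name and the three-site threshold are arbitrary choices.

```python
from collections import defaultdict


def shared_values(stats: dict[str, int], min_group: int = 3) -> dict[int, list[str]]:
    """Map each value reported by at least min_group sites to those sites."""
    groups = defaultdict(list)
    for site, value in stats.items():
        groups[value].append(site)
    return {value: sorted(sites) for value, sites in groups.items()
            if len(sites) >= min_group}


traffic = {
    "Athletics Nation": 40455,
    "Red Reporter": 40455,
    "Bruins Nation": 40455,
    "Camden Chat": 40455,
    "Hypothetical Blog": 1312,  # made-up contrast entry
}
print(shared_values(traffic))
# {40455: ['Athletics Nation', 'Bruins Nation', 'Camden Chat', 'Red Reporter']}
```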

The full fisking is here: http://bigpicture.typepad.com/comments/2006/02/gaming_the_blog.html
