I keep hearing people say that Twitter is going to be overrun with spam now that it is becoming mainstream. I really don’t understand this viewpoint, so I will take some time here to outline what they could be talking about, and what can be done about it.
This is in reply to http://www.twine.com/item/123c9051b-g8/can-twitter-survive-what-is-about-to-happen-to-it specifically, but these ideas have been mentioned over and over.
Short Version: We need to stop worrying about spam on Twitter, and start worrying about all the cool stuff we can make.
Kinds of Spam
A person you are following is tweeting too much? How is that spam? Simply unfollow them. This is one of the big ones I don’t understand people complaining about. It’s OPT IN to follow people; if you don’t like what they say, unfollow them!
The current implementation of hashtags does have a spam problem, because it is a publish model and not a follow model: anyone can include a #hashtag and it will get picked up by a hashtag aggregator. This is the common problem of broadcast media. It can be solved by filtering hashtags to only certain users, or by some other kind of grouping concept (a Twitter account that retweets a hashtag only from the people it follows, for example). You could also do the filtering on the web end, showing only hashtags from users X and Y.
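As a rough sketch of the "only hashtags from users X and Y" idea, the filter could look something like the following. The `Tweet` class and the `filter_hashtag` function are made up for illustration and are not part of any Twitter API; the sketch assumes the aggregator already has a list of tweets with an author and a text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    # Hypothetical minimal tweet record, not a real API object.
    author: str
    text: str


def hashtags(text):
    """Return the set of lowercase hashtags (without '#') found in the text."""
    return set(re.findall(r"#(\w+)", text.lower()))


def filter_hashtag(tweets, tag, trusted_authors):
    """Keep only tweets that use `tag` and come from a trusted author."""
    tag = tag.lstrip("#").lower()
    trusted = {name.lower() for name in trusted_authors}
    return [
        t for t in tweets
        if t.author.lower() in trusted and tag in hashtags(t.text)
    ]


if __name__ == "__main__":
    stream = [
        Tweet("alice", "Loving the talks at #pycon"),
        Tweet("spammer", "Cheap pills, click here #pycon"),
        Tweet("bob", "Lunch time"),
    ]
    for tweet in filter_hashtag(stream, "#pycon", {"alice", "bob"}):
        print(tweet.author, tweet.text)
```

The trusted set could just as easily be "everyone this account follows", which turns the publish model back into a follow model.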
This seems like a problem that could be solved by the hashtag aggregators. Currently they are just dumb aggregators, and adding relevancy would probably be easy. This also screams out as an area where Bayesian Filtering would be useful, since you have a tag that is presumably about a topic.
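To make the Bayesian filtering idea a bit more concrete, here is a minimal naive Bayes sketch that an aggregator might train per hashtag. Everything in it (the `HashtagFilter` class, the two labels, the tiny training set) is an assumption for illustration; a real filter would need far more training data and better tokenization.

```python
import math
import re
from collections import Counter

LABELS = ("relevant", "spam")


class HashtagFilter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def log_score(self, text, label):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab = set()
        for other in LABELS:
            vocab.update(self.word_counts[other])
        # Smoothed prior so an untrained label never yields log(0).
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
        for word in self.tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocab) + 1))
        return score

    def classify(self, text):
        return max(LABELS, key=lambda label: self.log_score(text, label))


if __name__ == "__main__":
    f = HashtagFilter()
    f.train("great keynote at #pycon about asyncio", "relevant")
    f.train("#pycon sprint on django docs today", "relevant")
    f.train("cheap pills buy now #pycon", "spam")
    f.train("win a free iphone click now #pycon", "spam")
    print(f.classify("buy cheap iphone now #pycon"))
```

Because the tag is presumably about one topic, the "relevant" vocabulary for that tag stays fairly narrow, which is exactly the situation where this kind of filter tends to do well.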
Again, this is the same as the “Hypertweeting argument”. I follow @slicehoststatus because it is just updates about my connectivity. They have a @slicehost account that is more customer service oriented that I don’t care about. Services will have logical separation between their feeds or they won’t be used.
The post does mention some good solutions to the problem. I will address my thoughts on those here as well.
Number of Followers as a Filter.
This seems trivial as a filter, and likely to be gamed. It might be useful when combined with other metrics, such as how long a user has been on the site and how many people they are following (and have followed!). This is where the idea of metadata being important comes in.
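A minimal sketch of what "combined with other metrics" might mean is below. The `Account` fields, the `trust_score` function, and the way the three signals are multiplied together are all assumptions picked for illustration, not a tested formula; the point is only that an account with a large but gamed follower count can still score low.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Hypothetical metadata; field names are illustrative only.
    followers: int
    following: int
    ever_followed: int  # everyone the account has followed, including unfollowed
    age_days: int


def trust_score(account):
    """Combine follower share, account age and follow retention into [0, 1)."""
    total = account.followers + account.following + 1
    follower_share = account.followers / total
    # Accounts older than a year get full credit for age.
    maturity = min(account.age_days / 365, 1.0)
    # Mass follow-then-unfollow behaviour shows up as low retention.
    if account.ever_followed > 0:
        retention = min(account.following / account.ever_followed, 1.0)
    else:
        retention = 1.0
    return follower_share * maturity * retention


if __name__ == "__main__":
    regular = Account(followers=500, following=300, ever_followed=350, age_days=800)
    churner = Account(followers=2000, following=5000, ever_followed=60000, age_days=14)
    print(trust_score(regular), trust_score(churner))
```

Note that the second account has four times the followers of the first, so a filter on follower count alone would rank it higher.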
Re-Tweeting Activity as a Filter.
Perhaps, but this needs to be formalized. I would really like Twitter to formalize RTs so that I can filter them out, because I personally find very little value in them. Twitter already has functionality (liking tweets) that seems like a more logical choice to use.
Metadata for Filtering.
He makes a good point that metadata is what is needed. It will be really hard to do these calculations outside of Twitter (and as it grows, it will be hard internally too). I really think that the spam problem is a non-issue, and Twitter will work just fine without any additional spam protection. The metadata will be really interesting for a lot of other applications.
I have yet to see a post that has made me think that Twitter will have a real spam problem. The opt-in subscription method is really genius, and makes spam almost impossible. The core model of Twitter will stay spam free, because I only get content from people I follow. External services (search and aggregators) will suffer from spam problems until they get better spam filtering.
I think the real problem of Twitter is how to find interesting people to follow, and not how to remove spam. This is where the problem of spam and filtering really comes into play. Starting with a network of people you know, and branching out from there, is how Twitter will work. The social graph is really interesting in that realm.
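Branching out from the people you know can be sketched as a simple friends-of-friends count over the social graph. The `following` dictionary and the `suggest_follows` function are hypothetical; the sketch assumes you can get the set of accounts each user follows.

```python
from collections import Counter


def suggest_follows(following, user, limit=3):
    """Suggest accounts followed by the most people that `user` already follows.

    `following` maps a username to the set of usernames they follow.
    """
    mine = following.get(user, set())
    counts = Counter()
    for friend in mine:
        for candidate in following.get(friend, set()):
            # Skip yourself and anyone you already follow.
            if candidate != user and candidate not in mine:
                counts[candidate] += 1
    # Most shared connections first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


if __name__ == "__main__":
    graph = {
        "me": {"alice", "bob"},
        "alice": {"carol", "dave", "me"},
        "bob": {"carol", "erin"},
    }
    print(suggest_follows(graph, "me"))
```

Since every suggestion comes through someone you chose to follow, a spam account only shows up here if people you trust already follow it.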
A lot of the conversation above leads into other really interesting areas of data analysis. Stopping spam is easy, so let’s go data mining :)