Ah, Reddit: Sure, it’s a wonderful time-waster and, at times, a very useful resource. But as anyone who has plunged in too deeply can tell you, it also has some dark corners that host some rather soul-crushing interactions. Inspired by a Reddit post asking users which sections of the site, or subreddits, they felt were the most toxic, Ben Bell of the company Idibon decided to map these nether realms, so he pulled a bunch of comments from Reddit in an attempt to identify both the cesspools of toxicity and the oases of supportive conversation.
As he explains in a blog post that is very much worth reading, Bell defined a remark as “toxic” if it contained either an ad hominem attack or an instance of overt racial, homophobic, or misogynistic bigotry, and then set about using sentiment analysis, or computer analysis of the emotional content of a snippet of language, to start producing some statistics. The problem, as he wrote in an email to Science of Us, is that “Toxicity as we defined it would have been too complicated for sentiment analysis to pick up.” But on the other hand, it wasn’t realistic to have humans look at every comment Bell and Idibon wanted to analyze.
The solution? Have humans and computers tag-team the problem: First Idibon’s algorithm tagged a bunch of comments as positive or negative in sentiment — a task current sentiment-analysis technology can handle fairly well — and then humans checked each one to appropriately label it as supportive, toxic, or neither.
Then Bell examined the extent to which the analyzed comments had been up-voted or down-voted on Reddit (if a subreddit has lots of toxic comments but they are consistently down-voted to oblivion, it would be unfair to label that subreddit as “toxic”), allowing him to produce some pretty graphs. Here’s one running down where a bunch of subreddits rank in terms of how frequently the sampled comments exhibit signs of toxicity and supportiveness (mouse over to get the subreddit’s name):
It’s kind of cool that GetMotivated, devoted to users helping spur one another on in their daily lives with inspirational quotes, and DIY, devoted, as the name implies, to DIY projects, are both so supportive — one would like to think that when good people assemble in an online space, they can keep the tone positive and helpful, and that seems to be what’s going on with these two subreddits.
But back to humanity’s dark side for a second — here’s a graph showing which subreddits exhibit the most bigotry (again, mouse over):
Here Bell has some interesting things to say about how different communities police themselves:
Looking specifically at bigoted comments, the importance of taking score into account rather than number of comments becomes even more apparent. For a small number of communities (/r/Libertarian, /r/Jokes, /r/community, and /r/aww) the total aggregated score of comments that our annotators labeled as bigoted was actually negative – so despite having bigoted comments present in their communities, those bigoted comments were rejected by the community as a whole. On the other end of the spectrum we see /r/TheRedPill, a subreddit dedicated to proud male chauvinism, where bigoted comments received overwhelming approval from the community at large.
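For the curious, the "total aggregated score" idea Bell describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Idibon's actual code: the field names (`subreddit`, `label`, `score`), the function name, and the sample data are all made up, and the sketch simply sums the net vote score of every comment that an annotator labeled as bigoted.

```python
from collections import defaultdict


def bigoted_score_by_subreddit(comments):
    """Sum the vote scores of comments labeled 'bigoted', per subreddit.

    `comments` is a list of dicts with hypothetical keys:
    'subreddit', 'label' (assigned by a human annotator), and
    'score' (net votes, which can be negative).
    """
    totals = defaultdict(int)
    for comment in comments:
        if comment["label"] == "bigoted":
            totals[comment["subreddit"]] += comment["score"]
    return dict(totals)


# Invented sample data, for illustration only.
comments = [
    {"subreddit": "example_a", "label": "bigoted", "score": -12},
    {"subreddit": "example_a", "label": "bigoted", "score": 3},
    {"subreddit": "example_a", "label": "neither", "score": 40},
    {"subreddit": "example_b", "label": "bigoted", "score": 25},
    {"subreddit": "example_b", "label": "bigoted", "score": 7},
]

totals = bigoted_score_by_subreddit(comments)
for name, total in sorted(totals.items()):
    # A negative total means the community, on balance, voted the comments down.
    verdict = "rejected" if total < 0 else "not rejected"
    print(f"{name}: {total} ({verdict})")
```

In this toy data both communities contain two bigoted comments, so a simple count would treat them the same; only the summed score shows that one community voted them down and the other voted them up.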
So one of the morals here is that just about any online community can attract jerks; what matters is whether the non-jerks join forces to mute the jerks. Jerk-muting: the internet’s lost art.