‘Toxic’ open source GitHub discussions analyzed in study

Toxic discussions on open-source GitHub projects tend to involve entitlement, subtle insults, and arrogance, according to an academic study. That contrasts with the toxic behavior – typically bad language, hate speech, and harassment – found on other corners of the web.

Whether that seems obvious or not, it’s an interesting point to consider. For one thing, it means technical and non-technical methods for detecting and curbing toxic behavior elsewhere on the internet may not work well on GitHub. And if you’re involved in communities on the code-hosting giant, you may find this research useful in combating trolls and unacceptable conduct.

It may also mean systems intended to automatically detect and report toxicity in open-source projects, or at least those on GitHub, may need to be developed specifically for that task, given the distinctive form toxicity takes there.


The researchers – Courtney Miller, Sophie Cohen, Daniel Klug, Bogdan Vasilescu, and Christian Kästner – describe their findings in a paper titled “‘Did You Miss My Comment or What?’ Understanding Toxicity in Open Source Discussions,” which was presented last month at the ACM/IEEE International Conference on Software Engineering in Pittsburgh, Pennsylvania.

In a video explainer, Miller, a doctoral student at the Institute for Software Research at Carnegie Mellon University (CMU) and lead author on the paper, says the project adopted the definition of toxicity proposed by those working on Google’s Perspective API: “rude, disrespectful, or unreasonable language that is likely to make someone leave a discussion.”


The open source community’s long tradition of blunt interaction has led many projects to adopt codes of conduct, the paper notes. The reason for doing so is to encourage contributors to join open source projects and to keep them from being driven away by trolling and other forms of hostility.

The researchers acknowledge that “toxicity in open source is often written off as a naturally occurring if not necessary facet of open source culture.” And while there are those who defend a more rough-and-tumble mode of online interaction, there are consequences for angry interactions. Witness the departures in the Perl community over hostility.

“Toxicity is different in open-source communities,” Miller said in a CMU news release. “It is more contextual, entitled, subtle and passive-aggressive.”


“... many open source contributors have cited toxic and continuously negative behavior as their reason for disengaging (see Section 2 of our paper for more details). Because of this, it was important to consider toxicity that could be considered toxic to a wide spectrum of open source contributors.”

Toxicity in open source projects is relatively rare – in previous work, the researchers found only about six in every 1,000 GitHub issues to be toxic. That meant a random sampling of issues wouldn’t serve the research objective, so the group adopted several strategies for identifying toxic issues and comments: running a language-based detector, searching for mentions of “codes of conduct,” and looking at threads that had been locked or deleted.
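
To get a feel for why random sampling falls short, here is a rough Python sketch of the arithmetic, along with one way the selection strategies described above could be combined into a filter for manual review. It is an illustration only, not the researchers' tooling: the `Issue` fields, the `detector_score` value, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass

# Prevalence reported in the researchers' earlier work: about 6 per 1,000 issues.
TOXIC_RATE = 6 / 1000


def expected_random_sample_size(target_toxic: int, rate: float = TOXIC_RATE) -> int:
    """Issues you would expect to read, on average, to find `target_toxic` toxic ones."""
    return round(target_toxic / rate)


@dataclass
class Issue:
    # Hypothetical record; field names are assumptions, not GitHub's schema.
    text: str
    locked: bool = False
    deleted: bool = False
    detector_score: float = 0.0  # assumed output of some language-based detector


def is_candidate(issue: Issue, threshold: float = 0.8) -> bool:
    """Flag an issue for manual review if any one of the heuristics fires."""
    return (
        issue.detector_score >= threshold
        or "code of conduct" in issue.text.lower()
        or issue.locked
        or issue.deleted
    )


if __name__ == "__main__":
    # Roughly how many randomly chosen issues it would take to find 100 toxic ones.
    print(expected_random_sample_size(100))
```

At the reported rate, collecting 100 toxic issues by chance would mean reading on the order of 16,667 issues, which is why a targeted filter followed by human review is the more practical route.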

The result was a data set of 100 toxic issues on GitHub. What the researchers found was that toxicity on the Microsoft-owned website has its own particular characteristics.


The computer scientists note that GitHub Issues, while they include insults, arrogance, and trolling seen elsewhere, do not exhibit the severe language common on platforms like Reddit and Twitter. Beyond milder language, GitHub differs in its abundance of entitled comments – people making demands as if their expectations were based on a contract or payment.


The researchers identify a variety of triggers for toxic behavior, which mostly occur in large, popular projects. These include: trouble using software, technical disagreements, politics/ideology, and past interactions.


“The harms of toxicity were outside the scope of this project, but informally we observed that one thing that seemed to be an efficient way of curbing toxicity was for maintainers to cite their project’s code of conduct and lock the thread as too heated,” said Miller. “This seemed to help reduce the amount of time and emotional labor involved with dealing with the toxicity.”


