AI systems to detect ‘hate speech’ could have ‘disproportionate negative impact’ on African Americans.
A new Cornell University study reveals that some artificial intelligence systems created by universities to identify “prejudice” and “hate speech” online might be racially biased themselves and that their implementation could backfire, leading to the over-policing of minority voices online.
[Researcher, Thomas] Davidson said tweets written in “African American English,” or AAE, may be more likely to be considered offensive “due to […] internal biases.” For example, terms such as nigga and bitch are common hate speech “false positives.” “We need to consider whether the linguistic markers we use to identify potentially abusive language may be associated with language used by members of protected categories,” the study’s conclusion states.
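The failure mode the study describes can be sketched in a few lines: a classifier keyed to surface tokens flags dialect-typical usage regardless of speaker or intent. The blocklist, function name, and example sentences below are illustrative inventions, not from the study, and real systems use trained models rather than keyword lists, but the underlying mechanism is the same.

```python
# Hypothetical keyword-matching "abuse" detector, for illustration only.
# A term common in in-group, non-hostile speech sits on the blocklist,
# so benign uses are flagged exactly like hostile ones.

FLAGGED_TERMS = {"bitch"}  # illustrative blocklist entry

def is_flagged(tweet: str) -> bool:
    """Flag a tweet if any token matches the blocklist, ignoring all context."""
    tokens = tweet.lower().split()
    return any(token.strip(".,!?") in FLAGGED_TERMS for token in tokens)

print(is_flagged("love you bitch, see you tonight"))  # True: a false positive
print(is_flagged("have a great day"))                 # False
```

Because the model sees only the marker and not the speaker or context, communities whose everyday speech contains more of the flagged markers are penalised at a higher rate, which is the disparity the study reports.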
“Human error” and “inadequate training” have been cited as explanations.
Update, via the comments:
Given the volume of research that’s subordinate to the conceit that anything reflecting poorly on a Designated Victim Group must therefore, by definition, be an unconscionable act of bias, it’s refreshing to see that the authors of the study do concede that the effect they denounce is most likely a result of statistical differences in actual behaviour:
Different communities have different speech norms, such that a model suitable for one community may discriminate against another… The ‘n-word’… can be extremely racist or quotidian, depending on the speaker… we should not penalise African-Americans for using [it].
However, the authors seem quaintly mystified by the fact that tweets by black people “are classified as containing sexism almost twice as frequently.” And whether the word bitch and various common synonyms should result in flagging and censure only when used by white people and other, as it were, unprotected categories is left to the imagination.
Also, open thread.