Trawling for Hate in the Network

Double portrait: Prof. Dr. Melanie Siegel and Prof. Dr. Melina Alexa

Melanie Siegel, Professor of Information Science in the Media faculty at h_da, is working on the automatic detection of hate speech and fake news in her new research project ‘DeTox’. In cooperation with the reporting platform ‘Hessen gegen Hetze’ (‘Hesse Against Hate’), 2,500 tweets were analysed using automated artificial intelligence methods. An innovative aspect of the project is its focus on German-language social media texts. Siegel recently published a second textbook on the subject together with her colleague Melina Alexa.

By Alexandra Welsch, 20.9.2021

‘So Annalena Baerbock intends to abolish widows’ pensions in order to use the money to help integrate refugees’. What? No: this is a straightforward case of ‘Fake News’. The false factual claim was posted on Facebook shortly after the Federal Chairperson of the Green party announced her intention to stand for the post of Chancellor. It was viewed by many and, as expected and intended, provoked a veritable shitstorm. That the Green politician “has got a screw loose!” was one of the more moderate comments in the wave of hate that followed. This is far from the first time that the politician has been a victim of ‘Hate Speech’.

It may by now sound like a truism, but according to the experienced computational linguist Melanie Siegel it is a solid fact that warrants serious and rigorous examination: "We have noticed that there is a huge problem with ‘Fake News’ and ‘Hate Speech’ on social media platforms", says the professor of Information Science at the Department of Media at Darmstadt University of Applied Sciences. She has been studying destructive forms of communication systematically for several years. Now, shortly before the federal election, the storm is brewing again. As an example, Siegel says, “there was lots of stuff spread around against Annalena Baerbock”. In this context, Siegel considers ‘Fake News’ particularly vulgar and destructive “because it etches away at our attitude towards conventional politics and thus destabilises our entire society”.

Portrait of Prof. Dr. Melanie Siegel. “Hate Speech destabilises our democracy”: Prof. Dr. Melanie Siegel researches AI methods for automatically recognising Hate Speech on the internet.

The information scientist is currently working intensively on a new research project to find out how lies and hatred on the Internet can be systematically identified: "DeTox - Detection of Toxicity and Aggression in Postings and Comments on the Net" is the name of the project, which is investigating and developing automated detection and classification procedures for hate speech and fake news using artificial intelligence methods. The project is not limited to identifying toxic content: it is also designed to trace how such content spreads and to record individual criminal offences in a form that is useful to the authorities. It is a cooperative venture between the reporting platform "Hessen gegen Hetze" (Hesse Against Hate) of the Hessen Cyber Competence Center, the h_da Research Center for Applied Computer Science and the Fraunhofer Institute for Secure Information Technology SIT in Darmstadt. It is funded by the Hesse Ministry of the Interior and Sports.

2,500 Tweets: a feast of hate and lies just waiting to be exposed

The basis is 2,500 German-language Twitter postings, i.e. Tweets, which are provided to Siegel's research team in pre-classified form by the participating Hessen Cyber Competence Center (H3C). H3C provides postings, comments and images that attack other people on the basis of characteristics such as nationality, skin colour, ethnic or religious affiliation, ideology, physical condition, gender, sexual orientation, political opinions, outward appearance or social status. One early result: social media is clearly becoming dominated by those who “defame, insult then threaten”. Furthermore, posts are carefully calibrated to give the impression that these extremist opinions are widely held. The under-financed social media platforms seem genuinely overwhelmed. “This is just one of the urgent calls for automated methods to identify suspicious comments.”

The trawlers for lies and hate employ algorithms and AI to automatically classify and identify texts. Siegel reports that she already has ample experience in this area from similar projects. For instance, within the framework of the international GermEval research contest, which she helped to initiate, three separate events were held in which several machine learning methods were applied to thousands of Tweets and compared. These initiatives now form the basis on which the methods are being further refined and optimised.

The focus on German language social media texts

One of the computational linguist's key areas of expertise is ‘sentiment analysis’, a method for automatically examining publicly expressed opinions. Information extracted from texts written by internet users is analysed to determine how the authors relate, emotionally, to specific issues, products or events. Together with h_da professor Melina Alexa, who teaches in the online communication program, she recently published a second book on the subject ("Tools for Social Listening and Sentiment Analysis"). The book provides practical guidelines for using sentiment analysis tools. "Before, there was no textbook related to the German language," Siegel points out. As her fellow expert Professor Melina Alexa explains, “according to study results, the most important issues in communications management include dealing with the demands of the increase in volume and speed of information flow, and the use of big data and algorithms in communication.” Both the book and the DeTox project pursue similar objectives: practical text analysis and effective filters for German-language internet texts, a niche that has so far been overshadowed by the dominant work on English.

Portrait of Professor Melina Alexa. As an expert in ‘Sentiment Analysis’ and ‘Social Listening’, Professor Melina Alexa collaborated with her colleague Melanie Siegel to compile a practice-oriented textbook on the subject.

The Fraunhofer SIT institute has developed a software tool specifically for the DeTox project. It enables Tweets to be classified, or ‘annotated’. It records which individual Tweets should be highlighted and to what extent a comment is positive, negative or neutral, as well as whether it is outright ‘Hate Speech’ or at least relevant in a legal sense. The entire project places significant responsibility on the people working on it, as Siegel is aware: “we need to talk to people, to fashion some form of effective discussion”. In addition, each Tweet is evaluated by three members of staff: “This is an enormous effort!” To this end, four students, working 10 hours per week, have been brought in as temporary assistants to help with the workload. The tool will be continuously trained by feeding it tweets so that it learns to recognize relevant features in texts automatically.
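The article does not say how the three evaluations of a Tweet are combined into one label. As a purely illustrative sketch, assuming a simple majority vote (the function name `majority_label` and the label strings are hypothetical, not part of the Fraunhofer SIT tool), the aggregation step could look like this in Python:

```python
from collections import Counter


def majority_label(labels):
    """Return the label chosen by most annotators, or None on a tie.

    Assumption: annotations are plain strings such as "positive",
    "negative" or "neutral"; the real tool's label set is not known.
    """
    counts = Counter(labels).most_common()
    # A tie between the two most frequent labels means no majority.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


if __name__ == "__main__":
    print(majority_label(["negative", "negative", "neutral"]))
    print(majority_label(["positive", "negative", "neutral"]))
```

Returning `None` on a tie, rather than picking a label arbitrarily, would leave such Tweets for the kind of discussion among annotators that Siegel describes.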

Subjective and offensive expressions are generally suspicious

Although the project is scheduled to run until mid-2022, initial results already offer useful insights. Melanie Siegel cites an example: "The fact that you can also recognize Fake News by the type of language used." Mina Schütz, a doctoral student involved in the project, worked this out. ‘Hate Speech’ is relatively easy to detect because of the offensive language used. Exposing elaborate lies, however, is far more difficult, as these are generally woven into commonplace phraseology. As Schütz explains: “There are linguistic aspects that suggest Fake News.” For example, the use of many personal pronouns, exclamation points, or emojis indicates a subjective, emotional language that is not used in serious scientific or journalistic texts. A further warning sign is an exaggerated headline followed by very little actual text. Based on such findings, Melanie Siegel adapts the machine learning systems she uses. But one major challenge in looking at fake news remains, she says: "I can't define the truth content, because I can't cross-check all the facts."
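The linguistic cues mentioned here (personal pronouns, exclamation points, emojis, and a long headline over little text) can be turned into simple numeric features. The following Python sketch is only an illustration under stated assumptions: the pronoun list is a small, incomplete sample, the emoji pattern covers only some common code-point blocks, and the function name `subjectivity_features` is hypothetical. It is not the feature set used in DeTox.

```python
import re

# Assumed, incomplete sample of German personal pronouns (illustrative only).
PRONOUNS = {"ich", "du", "er", "sie", "es", "wir", "ihr",
            "mich", "dich", "uns", "euch", "mir", "dir"}

# Rough emoji match over some common code-point blocks; not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

# \w is Unicode-aware for str patterns, so umlauts count as word characters.
WORD_RE = re.compile(r"\w+")


def subjectivity_features(headline, body):
    """Count surface cues of subjective, emotional language in a post."""
    words = [w.lower() for w in WORD_RE.findall(body)]
    n_words = max(len(words), 1)  # avoid division by zero for empty bodies
    return {
        "pronoun_ratio": sum(w in PRONOUNS for w in words) / n_words,
        "exclamations": headline.count("!") + body.count("!"),
        "emojis": len(EMOJI_RE.findall(headline + body)),
        "headline_to_body": len(WORD_RE.findall(headline)) / n_words,
    }


if __name__ == "__main__":
    print(subjectivity_features("Skandal!!!", "Wir wissen es! \U0001F621"))
```

Features like these could serve as input to a classifier, but on their own they only indicate a subjective style. As Siegel notes, they say nothing about whether a claim is true.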

But as Professor Siegel points out, the DeTox team is by no means developing a system for fully automated filtering of Hate Speech and Fake News. "Freedom of speech is a valuable commodity, and I would never leave that to machines in a completely automated way." Rather, she says, it is about a tool for pre-classification, an aid for the people who ultimately have to evaluate and sort the material. People like those who work at the "Hessen gegen Hetze" platform. Melanie Siegel believes that "we can't leave it up to the social media platforms to filter things."

Translation: Paul Comley

Textbooks on the subject by the h_da experts:

Siegel, M. & Alexa, M. (2020). Sentiment-Analyse deutschsprachiger Meinungsäußerungen. Springer, Wiesbaden. www.springer.com/de/book/9783658296988

Alexa, M., & Siegel, M. (2021). Tools für Social Listening und Sentiment-Analyse. Springer, Wiesbaden. www.springer.com/de/book/9783658334673

Additional links:

Project website: projects.fzai.h-da.de/detox/projekt/
Hessische Meldestelle für Hasskommentare (Hesse reporting office for hate comments): www.hessengegenhetze.de

Contact details

Christina Janssen
Science editor
Tel.: +49.6151.16-30112
E-Mail: christina.janssen@h-da.de