RabakBench: Unlocking Multilingual AI Safety in Singapore's Unique Linguistic Landscape

Singapore’s vibrant linguistic tapestry, woven from English, Singlish, Chinese, Malay, and Tamil, poses a distinctive challenge for building safe Large Language Models (LLMs). Recognizing this, researchers have introduced RabakBench, a multilingual safety benchmark designed specifically for low-resource languages and localized contexts. The benchmark addresses a critical gap: evaluating LLM safety across diverse linguistic and cultural nuances, an area where current models often falter.

Bridging the Gap: The Need for Localized Safety

LLMs have demonstrated impressive multilingual capabilities, but their safety mechanisms often break down outside high-resource languages and cultural contexts. This can lead to culturally specific harms being propagated, or to benign local phrases being misread as harmful. RabakBench tackles this by focusing on Singapore’s linguistic environment, particularly its widely used English-based creole, Singlish, which blends in elements of Malay, Hokkien, Tamil, and other languages.

A Scalable, Three-Stage Pipeline for Robust Benchmarking

The creation of RabakBench followed a three-stage pipeline:

  1. Generation: The process began with real-world Singlish web content, which was then augmented through LLM-driven “red teaming”: using LLMs to deliberately generate adversarial examples that probe content safety classifiers for weaknesses, uncovering cases where harmful content is missed or benign content is flagged incorrectly. For example, the system might produce a Singlish phrase that uses local slang to mask a derogatory sentiment, testing whether a classifier can detect it. (A minimal sketch of this loop follows the list.)

  2. Labeling: To label the generated data efficiently, the researchers used a semi-automated approach. They first identified LLMs whose safety annotations showed high agreement with human judgments; these selected LLMs then acted as “labelers,” assigning fine-grained safety categories and severity levels to each example. Majority voting among the LLM labelers kept labels stable and closely mirrored human consensus. Together with the translation stage below, the pipeline produced over 5,000 safety-labeled examples across four languages. (A majority-vote sketch follows the list.)

  3. Translation: The labeled Singlish dataset was then translated into Chinese, Malay, and Tamil. A crucial requirement was high-fidelity translation: preserving not only the semantic meaning but also the specific nuances of toxicity in the original Singlish text. This was achieved through native-speaker verification and careful LLM prompting that prevents harmful content from being sanitized or misrepresented. For instance, a Singlish phrase containing a culturally specific insult would be translated so that it retains its offensive force in the target language. (An illustrative prompt template follows the list.)
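To make the generation stage concrete, here is a minimal sketch of the classifier-probing loop from step 1. Both helper functions are hypothetical stand-ins for the LLM rewriter and the classifier under test; the paper’s actual prompts and models are not shown here.

```python
# Hedged sketch of the red-teaming loop (step 1). The two helpers below are
# hypothetical placeholders, not the paper's actual components.

def generate_variants(seed: str) -> list[str]:
    # Placeholder: in the real pipeline an LLM rewrites the seed using local
    # slang, code-mixing, or obfuscation while preserving its harmful intent.
    return [seed]

def is_flagged(text: str) -> bool:
    # Placeholder: the content safety classifier being probed.
    return False

def find_classifier_misses(harmful_seeds: list[str]) -> list[str]:
    """Collect adversarial rewrites of known-harmful seeds that the
    classifier fails to flag; these become benchmark candidates."""
    misses = []
    for seed in harmful_seeds:
        for variant in generate_variants(seed):
            if not is_flagged(variant):  # harmful content slipped through
                misses.append(variant)
    return misses
```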
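The labeling stage’s majority vote can be sketched as follows. The label strings and the number of LLM labelers are illustrative, not the paper’s actual taxonomy.

```python
from collections import Counter

def majority_vote(labels: list[str]) -> str | None:
    """Return the label chosen by a strict majority of LLM labelers,
    or None when no consensus exists (e.g., route to human review)."""
    (top_label, count), = Counter(labels).most_common(1)
    return top_label if count > len(labels) / 2 else None

# Illustrative fine-grained labels in a "category/severity" form.
print(majority_vote(["hate/level_2", "hate/level_2", "hate/level_1"]))  # hate/level_2
print(majority_vote(["safe", "hate/level_1", "insults/level_1"]))       # None
```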
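For the translation stage, the “careful LLM prompting” might resemble the template below. This is an assumed illustration of the instruction not to sanitize harmful content, not the paper’s actual prompt; native-speaker verification would then confirm each translation preserves the original severity.

```python
# Hypothetical prompt template for toxicity-preserving translation (step 3).
TRANSLATION_PROMPT = """\
Translate the following Singlish text into {target_language}.
Preserve the meaning, register, and severity of any toxic or offensive
content; do not soften, censor, or euphemize it.

Text: {text}
Translation:"""

def build_translation_prompt(text: str, target_language: str) -> str:
    return TRANSLATION_PROMPT.format(text=text, target_language=target_language)
```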

Key Contributions and Impact

RabakBench offers several key contributions:

  • Localized Safety Benchmark: It provides a much-needed benchmark for evaluating LLM safety in low-resource, multilingual settings, with a specific focus on Singapore’s context. The benchmark uses a detailed taxonomy of harm categories with severity levels, enabling more nuanced analysis than simple safe/unsafe classifications.
  • Scalable Pipeline: The methodology offers a reproducible framework for building similar localized safety datasets in other low-resource environments, combining scalable data generation, efficient LLM-assisted labeling, and high-fidelity translation.
  • Guardrail Evaluation: The paper evaluates 11 popular LLM guardrail classifiers on RabakBench, revealing significant performance degradation relative to their English-centric evaluations and underscoring the urgent need for language-specific safety tuning (a scoring sketch follows this list).
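
One way to surface the degradation the authors report is to score each guardrail’s false negative rate per language. The sketch below assumes binary gold labels derived from the benchmark’s severity taxonomy; the record format and data are illustrative.

```python
from collections import defaultdict

def per_language_fnr(records):
    """records: iterable of (language, gold_unsafe, flagged) tuples.
    Returns the fraction of gold-unsafe examples the guardrail
    failed to flag, broken down by language."""
    missed = defaultdict(int)   # unsafe examples the guardrail let through
    unsafe = defaultdict(int)   # total unsafe examples per language
    for lang, gold_unsafe, flagged in records:
        if gold_unsafe:
            unsafe[lang] += 1
            if not flagged:
                missed[lang] += 1
    return {lang: missed[lang] / unsafe[lang] for lang in unsafe}

# Illustrative records, not real benchmark data.
records = [
    ("singlish", True, False),  # harmful but not flagged -> a miss
    ("singlish", True, True),
    ("malay",    True, True),
]
print(per_language_fnr(records))  # {'singlish': 0.5, 'malay': 0.0}
```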

By releasing RabakBench and its associated evaluation code, the researchers aim to empower the community to build safer and more globally equitable AI systems, particularly in regions with rich linguistic diversity.