Mixture-of-Thought Enables More Robust Logical Reasoning in LLMs

A new study introduces “Mixture-of-Thought” (MoT), a framework that significantly improves the logical reasoning capabilities of Large Language Models (LLMs). The paper, published on arXiv, addresses a key limitation of current LLM-based reasoning approaches: their reliance on a single reasoning modality, typically natural language. Humans, in contrast, naturally leverage multiple modalities like natural language, code, and symbolic logic to solve problems. MoT aims to bridge this gap.

MoT allows LLMs to reason across three complementary modalities: natural language, code, and a newly introduced “truth-table” modality. The truth-table modality systematically enumerates logical cases, helping to mitigate common errors in natural language reasoning, such as “missing branches” and “invalid converses.”

Think of it like this: imagine a question like, “If Peter Parker is either a superhero or a civilian, and if Thor is happy, does Peter Parker wear a uniform?”

  • Natural Language: The LLM might break the problem down step by step in plain English: “If Thor is happy, then the Hulk is angry. The Hulk wakes up when angry…” This is intuitive, but prone to errors if a step is missed.

  • Code: The LLM would translate the problem into Python code, defining classes for “Hulk,” “Thor,” and “PeterParker” with attributes for their states (angry, happy, superhero, etc.) and methods reflecting the premises. While more structured, it’s still an abstraction.

  • Truth Table: The LLM constructs a table listing every possible combination of truth values for the relevant propositions, for example Thor is happy (T), the Hulk is angry (H), Peter Parker is a superhero (S), and so on. It then prunes the combinations that are inconsistent with the given premises. Because every case is enumerated, this approach often avoids the common pitfalls of natural language reasoning; a minimal sketch of the idea follows this list.
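To make the truth-table modality concrete, here is a minimal Python sketch of the case-enumeration idea. The propositions and premises below are invented for the Peter Parker example and are not taken from the paper; in MoT the truth-table trace is generated by the LLM itself as text rather than executed as code.

```python
# Hypothetical truth-table sketch: enumerate every assignment of the
# propositions, keep only the rows consistent with the premises, and
# check whether the conclusion holds in all surviving rows.
from itertools import product

variables = ["thor_happy", "hulk_angry", "peter_superhero", "peter_uniform"]

# Invented premises for illustration, written as Boolean constraints.
premises = [
    lambda v: v["thor_happy"],                                   # Thor is happy
    lambda v: (not v["thor_happy"]) or v["hulk_angry"],          # Thor happy -> Hulk angry
    lambda v: (not v["hulk_angry"]) or v["peter_superhero"],     # Hulk angry -> Peter is a superhero
    lambda v: (not v["peter_superhero"]) or v["peter_uniform"],  # superhero -> wears a uniform
]

def conclusion(v):
    return v["peter_uniform"]  # "Peter Parker wears a uniform"

# All 2^n truth assignments, pruned to those satisfying every premise.
assignments = (dict(zip(variables, values))
               for values in product([True, False], repeat=len(variables)))
consistent = [v for v in assignments if all(p(v) for p in premises)]

if all(conclusion(v) for v in consistent):
    print("Entailed: True")    # the conclusion holds in every consistent case
elif not any(conclusion(v) for v in consistent):
    print("Entailed: False")   # the conclusion fails in every consistent case
else:
    print("Unknown")           # it holds in some cases and fails in others
```

Because every case is enumerated, a missed branch or an invalid converse shows up as a surviving counterexample row rather than a silent gap in the argument, which is exactly the failure mode this modality is meant to catch.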

The MoT framework operates in two phases:

  1. Self-Evolving MoT Training: The LLM jointly learns from filtered, self-generated rationales across modalities. It starts with a small set of examples and iteratively generates reasoning traces in each modality. These traces are then checked for correctness and consistency, and high-quality traces are used to further train the model. This process creates a continuous feedback loop, improving the model’s reasoning abilities over time.

  2. MoT Inference: At inference time, the trained model generates a response under each of the three modalities. A voting mechanism over their final answers then produces the prediction, leveraging the complementary strengths of the three modalities (a rough sketch of both phases follows this list).
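The Python sketch below shows roughly what these two phases could look like. The helpers `generate` and `extract_answer`, the gold labels, and the exact voting rule are assumptions made for illustration; the paper's actual training loop and inference pipeline are more involved.

```python
# Rough sketch of the two MoT phases, assuming two hypothetical helpers:
#   generate(question, modality) -> a reasoning trace from the model
#   extract_answer(trace)        -> the final answer parsed from that trace
from collections import Counter

MODALITIES = ["natural_language", "code", "truth_table"]

def filter_self_generated_traces(examples, generate, extract_answer):
    """Phase 1 (self-evolving training): keep only the self-generated traces
    whose final answer matches the gold label; they become new training data."""
    kept = []
    for question, gold_answer in examples:
        for modality in MODALITIES:
            trace = generate(question, modality)
            if extract_answer(trace) == gold_answer:  # correctness filter
                kept.append((question, modality, trace))
    return kept

def mot_inference(question, generate, extract_answer):
    """Phase 2 (inference): one trace per modality, then a majority vote."""
    answers = [extract_answer(generate(question, m)) for m in MODALITIES]
    # Ties could be broken by modality priority or model confidence (not shown).
    return Counter(answers).most_common(1)[0][0]
```

In the self-evolving loop, the kept traces would be fed back into fine-tuning and the cycle repeated, so that each round produces better traces in all three modalities than the last.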

The researchers evaluated MoT on logical reasoning benchmarks, including FOLIO and ProofWriter. Results show that MoT consistently and significantly outperforms strong LLM baselines with single-modality chain-of-thought approaches, achieving accuracy gains of up to 11.7 percentage points. In one experiment, a 9B-parameter MoT model even matched the performance of GPT-4 + Logic-LM on FOLIO.

Analysis revealed that MoT is particularly effective on harder logical reasoning problems. A fine-grained error study confirmed the role of the truth-table modality in helping overcome key bottlenecks in natural language reasoning.

The researchers have made their training code publicly available on GitHub, enabling further research and development in this area. The MoT framework offers a promising approach to enhancing the logical reasoning capabilities of LLMs, bringing them closer to human-level cognitive versatility.