3 months ago

Cumulative Reasoning with Large Language Models

Yifan Zhang Jingqin Yang Yang Yuan Andrew Chi-Chih Yao

Abstract

Recent advancements in large language models (LLMs) have shown remarkable progress, yet their ability to solve complex problems remains limited. In this work, we introduce Cumulative Reasoning (CR), an approach that utilizes LLMs cumulatively and iteratively, mirroring human thought processes for problem-solving. CR decomposes tasks into smaller, manageable components and leverages previous propositions for effective composition, significantly enhancing problem-solving capabilities. We demonstrate CR's advantage through several complex reasoning tasks: it outperforms existing methods in logical inference tasks with up to a 9.3% improvement, achieving 98.04% accuracy on the curated FOLIO wiki dataset. In the Game of 24, it achieves 98% accuracy, marking a 24% improvement over the prior state-of-the-art. In solving MATH problems, CR achieves a 4.2% increase from previous methods and a 43% relative improvement in the most challenging level 5 problems. When incorporating a code environment with CR, we further harness LLMs' reasoning capabilities and outperform the Program of Thought (PoT) method by 38.8%. The code is available at https://github.com/iiis-ai/cumulative-reasoning.
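The cumulative idea described above (keep a growing set of checked intermediate results and compose new ones from them) can be illustrated on the Game of 24. The following is a minimal sketch, not the authors' implementation: a deterministic enumeration stands in for the LLM that would propose steps, and the names `propose` and `solve_24` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def propose(state):
    """Toy stand-in for an LLM proposer: combine two numbers of a state.

    Yields (new_state, step_text) pairs. Exact fractions avoid rounding errors.
    """
    for i, j in combinations(range(len(state)), 2):
        a, b = state[i], state[j]
        rest = [x for k, x in enumerate(state) if k not in (i, j)]
        candidates = [
            (a + b, f"{a} + {b}"),
            (a * b, f"{a} * {b}"),
            (a - b, f"{a} - {b}"),
            (b - a, f"{b} - {a}"),
        ]
        if b != 0:
            candidates.append((a / b, f"{a} / {b}"))
        if a != 0:
            candidates.append((b / a, f"{b} / {a}"))
        for value, text in candidates:
            yield tuple(sorted(rest + [value])), f"{text} = {value}"


def solve_24(numbers, target=24):
    """Accumulate reachable states; return the steps reaching the target, or None."""
    start = tuple(sorted(Fraction(n) for n in numbers))
    reached = {start: []}  # every accepted state, with the steps that produced it
    frontier = [start]
    while frontier:
        state = frontier.pop(0)
        if len(state) == 1:
            if state[0] == target:
                return reached[state]
            continue
        for new_state, step in propose(state):
            # Only genuinely new results are added to the accumulated set.
            if new_state not in reached:
                reached[new_state] = reached[state] + [step]
                frontier.append(new_state)
    return None


if __name__ == "__main__":
    for step in solve_24([4, 7, 8, 8]):
        print(step)
```

Each accepted state is stored once and reused as a starting point for later compositions, which is the part of the sketch that mirrors the accumulation of previous propositions. In the actual method an LLM would generate and check the candidate steps; here both are replaced by exhaustive arithmetic.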

Code Repositories

iiis-ai/cumulative-reasoning (official, PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
math-word-problem-solving-on-math | CR (GPT-4-turbo model, w/ code) | Accuracy: 72.2
math-word-problem-solving-on-math | CR (GPT-4 model, w/o code) | Accuracy: 58.0
