| o1-mini + Language Agent Tree Search (Hamming.ai) | 82.3 | Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models | |
| GPT-3.5 Turbo + Language Agent Tree Search | 81.1 | Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models | |
| GPT-4 (Self-Debugging with unit tests + trace) | 80.2 | Teaching Large Language Models to Self-Debug | |