Papers
Cutting-edge AI research papers, updated daily, to help you keep up with the latest AI trends

A Survey on Large Language Model Benchmarks

Waver: Wave Your Way to Lifelike Video Generation

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries
Deep Think with Confidence
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Intern-S1: A Scientific Multimodal Foundation Model
Language-Guided Tuning: Enhancing Numeric Optimization with Textual Feedback
NiceWebRL: a Python library for human subject experiments with reinforcement learning environments
From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction
DuPO: Enabling Reliable LLM Self-Verification via Dual Preference Optimization
From Scores to Skills: A Cognitive Diagnosis Framework for Evaluating Financial Large Language Models
Granary: Speech Recognition and Translation Dataset in 25 European Languages
TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting
Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs
Training-Free Text-Guided Color Editing with Multi-Modal Diffusion Transformer
Evaluating Podcast Recommendations with Profile-Aware LLM-as-a-Judge
MultiRef: Controllable Image Generation with Multiple Visual References
Prompt Orchestration Markup Language
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
HPSv3: Towards Wide-Spectrum Human Preference Score
ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
Evaluating Identity Leakage in Speaker De-Identification Systems
Next Visual Granularity Generation
4DNeX: Feed-Forward 4D Generative Modeling Made Easy
ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
An integrated microwave neural network for broadband computation and communication
GTool: Graph Enhanced Tool Planning with Large Language Model
Observation of dendrite formation at Li metal-electrolyte interface by a machine-learning enhanced constant potential framework