Papers
Daily updated cutting-edge AI research papers to help you keep up with the latest AI trends

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection

Mem-α: Learning Memory Construction via Reinforcement Learning
Search Self-play: Pushing the Frontier of Agent Capability without Supervision
CudaForge: An Agent Framework with Hardware Feedback for CUDA Kernel Optimization
ScaleNet: Scaling up Pretrained Neural Networks with Incremental Parameters
Optimizing Mixture of Block Attention
FractalForensics: Proactive Deepfake Detection and Localization via Fractal Watermarks
Chain-of-Thought Hijacking
InstanceAssemble: Layout-Aware Image Generation via Instance Assembling Attention
3EED: Ground Everything Everywhere in 3D
DetectiumFire: A Comprehensive Multi-modal Dataset Bridging Vision and Language for Fire Understanding
CHIP: A multi-sensor dataset for 6D pose estimation of chairs in industrial settings
Geometrically-Constrained Agent for Spatial Reasoning
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
DiP: Taming Diffusion Models in Pixel Space
Architecture Decoupling Is Not All You Need For Unified Multimodal Model
Vision Bridge Transformer at Scale
AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
OpenApps: Simulating Environment Variations to Measure UI-Agent Reliability
Qwen3-VL Technical Report
G2VLM: Geometry Grounded Vision Language Model with Unified 3D Reconstruction and Spatial Reasoning
Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following
MIRA: Multimodal Iterative Reasoning Agent for Image Editing
ENACT: Evaluating Embodied Cognition with World Modeling of Egocentric Interaction
Canvas-to-Image: Compositional Image Generation with Multimodal Controls
Video Generation Models Are Good Latent Reward Models
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Think Visually, Reason Textually: Vision-Language Synergy in ARC
Harmony: Harmonizing Audio and Video Generation through Cross-Task Synergy
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation