Papers by Faculty and Students of the Gaoling School of Artificial Intelligence Accepted at AAAI 2025
**Abstract:** On December 10, the acceptance results for the AAAI Conference on Artificial Intelligence (AAAI 2025) were announced, and 21 papers by faculty and students of the Gaoling School of Artificial Intelligence at Renmin University of China (RUC) were accepted. Organized by the Association for the Advancement of Artificial Intelligence, AAAI is one of the oldest and most comprehensive top-tier academic conferences in artificial intelligence and is rated a Class A international academic conference by the China Computer Federation (CCF). The conference is scheduled to take place from February 25 to March 4, 2025, in Philadelphia, Pennsylvania, USA. The accepted papers span a broad range of AI research, with a particular focus on large language models (LLMs), vision-language models, retrieval-augmented generation (RAG), and applications of AI in domains such as e-commerce, urban planning, and protein structure analysis. Brief summaries of the accepted papers follow.

1. **Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers** - This paper explores using LLMs as prompt optimizers that iteratively refine task prompts to improve performance. The authors take a new perspective by drawing an analogy between prompt optimization and gradient-based model optimization, focusing on key factors such as the update direction and the update method. They introduce the Gradient-inspired Prompt Optimizer (GPO), which uses a cosine-based decay strategy to control the edit distance of each update while refining task prompts. Extensive experiments demonstrate the effectiveness and efficiency of GPO.
2. **Leveraging Large Vision-Language Model as User Intent-aware Encoder for Composed Image Retrieval** - Addressing the challenge of composed image retrieval (CIR), this paper introduces CIR-LVLM, a method that fine-tunes a large vision-language model (LVLM) to better understand and carry out the user's intended modifications. CIR-LVLM uses a unified multimodal framework to capture more complete information from the reference image and employs task-level and instance-level prompts to strengthen reasoning. The method shows significant potential for multimodal retrieval, outperforming models such as CLIP.
3. **One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models** - This paper presents a method for improving RAG by learning scalable and pluggable virtual tokens. The approach keeps the original parameters of the LLM unchanged and fine-tunes only the virtual tokens, improving performance on RAG tasks without compromising the model's general capabilities. The method is validated on 12 question-answering tasks, where it shows superior performance.
4. **Descriptive and Discriminative Document Identifiers for Generative Retrieval** - Focusing on generative document retrieval, this paper introduces D2-DocID, a method for designing document identifiers (DocIDs) that are both descriptive and discriminative. D2-DocID improves retrieval by representing documents accurately and distinguishing them from one another, even in the presence of noise. Experiments on the MS MARCO and NQ320K datasets show significant improvements.
5. **Toward General Instruction-Following Alignment for Retrieval-Augmented Generation** - This paper addresses aligning LLMs with user instructions in RAG systems. The authors introduce VIF-RAG, a data synthesis framework that generates high-quality instruction data for fine-tuning LLMs. They also present FollowRAG, a benchmark for evaluating instruction-following alignment, on which VIF-RAG proves effective at improving LLM performance across various tasks.
6. **AdaO2B: Adaptive Online to Batch Conversion for Out-of-Distribution Generalization** - In the context of online optimization, this paper proposes AdaO2B, a method for converting online learning into batch learning when data are not independent and identically distributed (non-i.i.d.). AdaO2B uses a context-aware weighting function to combine models and achieve out-of-distribution (OOD) generalization. The method is validated on both synthetic and real-world datasets, showing robust OOD performance.
7. **Trigger3: Refining Query Correction via Adaptive Model Selector** - This paper introduces Trigger3, a multi-level adaptive model selection method for query correction in search. Trigger3 reduces computational cost by using smaller models for corrections and resorting to larger models only when necessary. Experiments on public and commercial datasets show that Trigger3 improves query correction accuracy while maintaining high inference efficiency.
8. **Enhancing Audiovisual Speech Recognition by Bifocal Preference Optimization** - This paper proposes BPO-AVASR, a method that improves audiovisual speech recognition (AV-ASR) in noisy environments and spontaneous dialogue. BPO-AVASR uses a bifocal preference optimization strategy that integrates input-side and prediction-side preferences to improve model performance. The method outperforms previous state-of-the-art models on real-world video speech recognition tasks.
9. **EyEar: Learning Audio Synchronized Human Gaze Trajectory based on Physics-informed Dynamics** - Addressing the limited understanding of human gaze in audio-visual environments, this paper introduces EyEar, a physics-informed learning framework that predicts human gaze trajectories from synchronized audio and visual inputs. EyEar accounts for natural eye movement, visual salience, and audio semantics, and uses a probability-density-based scoring method to improve the stability and reliability of the model.
10. **Merging Mechanisms for Ads and Organic Items in E-commerce Platforms** - This paper tackles the challenge of merging ads and organic items on e-commerce search result pages. The authors propose the G-Fix and G-Change mechanisms, which ensure optimal merging while maintaining incentive compatibility and individual rationality. Theoretical and experimental results show that these mechanisms outperform existing methods on multi-objective allocation tasks.
11. **GenAuction: A Generative Auction for Online Advertising** - This paper introduces GenAuction, an auction mechanism for online advertising that selects winning pages rather than individual ads. GenAuction uses a generator-evaluator architecture to optimize both short-term and long-term key performance indicators (KPIs) while leveraging global context information. Experiments on real industrial data and online A/B tests demonstrate its effectiveness and its potential in practical applications.
12. **On Designing the Optimal Integrated Ad Auction in E-commerce Platforms** - This paper proposes JINTER Net, a joint integrated regret network for optimizing the integration of ads and organic content on e-commerce platforms. JINTER Net selects directly from a pool of candidates to generate an optimal list, accounting for both platform revenue and user satisfaction. The method is validated on simulated and real datasets, showing significant improvements over baseline models.
13. **A Plug-and-Play Bregman ADMM Module for Inferring Event Branches in Temporal Point Processes** - This paper introduces a Bregman Alternating Direction Method of Multipliers (BADMM) module for inferring event branch structures in temporal point processes (TPPs). The module imposes sparsity and low-rank constraints on event transition matrices, improving both model interpretability and performance. Experiments on synthetic and real datasets demonstrate its effectiveness.
14. **An Optimal Transport-based Latent Mixer for Robust Multi-modal Learning** - This paper proposes an optimal transport-based latent mixer (OTM) for robust multi-modal learning in distributed environments. OTM aligns and enhances multi-modal data in the latent space, enabling efficient and accurate model training without requiring aligned data or shared distributions. Experiments show that OTM outperforms baseline models on clustering and classification tasks.
15. **WatE: A Wasserstein t-distributed Embedding Method for Information-enriched Graph Visualization** - This paper presents WatE, a graph visualization method that represents each graph as an ellipse rather than a single point, preserving node-level structural information. WatE uses a Wasserstein t-distributed embedding approach to train graph neural networks, improving visualization and clustering performance.
16. **Stability and Generalization of Zeroth-Order Decentralized Stochastic Gradient Descent with Changing Topology** - This paper analyzes the generalization performance of zeroth-order decentralized stochastic gradient descent (ZO-DSGD) under changing network topologies. The authors provide a theoretical framework showing how the generalization of local models is influenced by the number of clients, the local sample sizes, and the network structure. The study offers insights for developing decentralized zeroth-order methods.
17. **HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multitask Learning** - This paper introduces HeMeNet, a graph neural network for multitask learning in protein structure analysis. HeMeNet captures heterogeneous relationships between atoms and achieves task-specific learning through an equivariant network. The model outperforms state-of-the-art methods on a benchmark dataset, demonstrating its effectiveness in protein function prediction tasks.
18. **Controlling Large Language Models Through Concept Activation Vectors** - This paper proposes GCAV, a lightweight framework for controlling LLMs by training concept activation vectors (CAVs) on a small dataset. During inference, GCAV injects the CAVs into the model's activation layers to adjust its output, giving better control over the generated text. Experiments show that GCAV can generate personalized content with improved accuracy and efficiency.
19. **FAP-CD: Fairness-Driven Age-Friendly Community Planning via Conditional Diffusion Generation** - This paper addresses the need for age-friendly community planning in rapidly aging societies. FAP-CD uses a conditional diffusion generation framework to learn the joint probability distribution of age-friendly facilities and their spatial relationships. The method ensures equitable service distribution and optimizes 15-minute walkability, outperforming competitive baselines in balancing age-friendly needs with regional fairness.
20. **MotifGPL: Motif-Enhanced Graph Prototype Learning for Deciphering Urban Social Segregation** - This paper proposes MotifGPL, a framework for analyzing urban social segregation through graph prototype learning. MotifGPL integrates points of interest (POIs), street view images, and mobility indices to extract key prototypes and match them with motif patterns, revealing the structural and dynamic factors that influence social segregation. Extensive experiments validate its ability to reveal critical patterns and support the development of low-segregation urban structures.
21. **RATT: A Thought Structure for Coherent and Correct LLM Reasoning** - This paper introduces RATT, a retrieval-augmented thought tree structure designed to enhance the reasoning and decision-making capabilities of LLMs. RATT balances logical coherence and factual correctness by combining the strengths of RAG and LLMs at each step of the reasoning process. The method significantly improves performance across various tasks, demonstrating its potential for producing reliable and coherent reasoning.

These papers highlight the cutting-edge research under way at RUC's Gaoling School of Artificial Intelligence, showcasing advances in AI methodology and its practical applications across multiple domains.
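The GPO summary says only that a cosine-based decay strategy controls the edit distance of prompt updates. As a minimal sketch, assuming edit distance is counted in words and the schedule caps how many words a single update may change, the idea might look like the following. The function names, the word-level distance, and the default caps are hypothetical illustrations, not details taken from the paper.

```python
import math


def edit_budget(step, total_steps, max_words, min_words=1):
    """Cosine-decayed cap on how many words one prompt update may change (hypothetical)."""
    progress = min(step, total_steps) / total_steps
    value = min_words + 0.5 * (max_words - min_words) * (1.0 + math.cos(math.pi * progress))
    return round(value)


def word_edit_distance(a, b):
    """Word-level Levenshtein distance between two prompts."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, xw in enumerate(x, start=1):
        cur = [i] + [0] * len(y)
        for j, yw in enumerate(y, start=1):
            cur[j] = min(prev[j] + 1,               # delete a word of a
                         cur[j - 1] + 1,            # insert a word of b
                         prev[j - 1] + (xw != yw))  # substitute, free if the words match
        prev = cur
    return prev[-1]


def accept_candidate(old_prompt, new_prompt, step, total_steps, max_words=20):
    """Keep a candidate prompt only if its edit fits within this step's budget."""
    return word_edit_distance(old_prompt, new_prompt) <= edit_budget(step, total_steps, max_words)


old = "Answer the question."
new = "Answer the question carefully."
print(accept_candidate(old, new, step=0, total_steps=10))   # True: early steps allow large edits
print(accept_candidate(old, new, step=10, total_steps=10))  # False: the final budget is 1 word
```

The schedule mirrors a cosine learning-rate decay: early updates may rewrite much of the prompt, while later ones are restricted to small, local edits.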
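The Trigger3 summary describes a multi-level selector that handles most corrections with small models and escalates to large models only when necessary. Below is a minimal sketch of such a cascade. It assumes a cheap trigger first decides whether a query needs correction at all, and a confidence threshold then decides when to escalate. The class, the threshold value, and the toy models are hypothetical, and Trigger3's actual selection criteria may differ.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A corrector maps a query to (corrected_query, confidence in [0, 1]).
Corrector = Callable[[str], Tuple[str, float]]


@dataclass
class CascadeCorrector:
    """Hypothetical three-level cascade: trigger, then small model, then large model."""
    needs_correction: Callable[[str], bool]  # cheap first-level check
    small_model: Corrector
    large_model: Corrector
    threshold: float = 0.8  # escalate when the small model is less confident than this

    def correct(self, query: str) -> Tuple[str, str]:
        """Return the output query and the level that produced it."""
        if not self.needs_correction(query):
            return query, "skip"  # no model call at all
        fixed, confidence = self.small_model(query)
        if confidence >= self.threshold:
            return fixed, "small"  # the cheap model is trusted
        fixed, _ = self.large_model(query)
        return fixed, "large"  # fall back to the expensive model


# Toy stand-ins for real correction models (hypothetical).
def toy_trigger(q: str) -> bool:
    return "iphnoe" in q or "headfones" in q


def toy_small(q: str) -> Tuple[str, float]:
    fixed = q.replace("iphnoe", "iphone")
    return fixed, 0.4 if "headfones" in fixed else 0.95  # unsure when a typo remains


def toy_large(q: str) -> Tuple[str, float]:
    return q.replace("iphnoe", "iphone").replace("headfones", "headphones"), 0.99


corrector = CascadeCorrector(toy_trigger, toy_small, toy_large)
print(corrector.correct("iphnoe headfones"))  # ('iphone headphones', 'large')
```

Most of the savings come from the first two levels: queries that need no correction never reach a model, and easy typos never reach the large one.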
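OTM is described only as aligning multi-modal data in a latent space with optimal transport. One standard way to do this is sketched here as a generic illustration, not as OTM's actual procedure. It computes an entropic transport plan with Sinkhorn iterations, maps one modality's latents onto the other's by barycentric projection, and then mixes the two. The sketch assumes both modalities are already encoded into latent vectors of the same dimension with uniform weights. `sinkhorn`, `ot_align`, `mix_latents`, and the `eps` and `lam` values are hypothetical.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.1, n_iters=500):
    """Entropic optimal transport plan between histograms a and b (Sinkhorn-Knopp)."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)    # rescale rows toward marginal a
        v = b / (K.T @ u)  # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]


def ot_align(x_src, x_tgt, eps=0.1, n_iters=500):
    """Map source latents onto target latents by barycentric projection of the OT plan."""
    cost = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()  # rescale so exp(-cost / eps) cannot underflow
    a = np.full(len(x_src), 1.0 / len(x_src))  # uniform weights (an assumption)
    b = np.full(len(x_tgt), 1.0 / len(x_tgt))
    plan = sinkhorn(a, b, cost, eps, n_iters)
    mapped = (plan @ x_tgt) / plan.sum(axis=1, keepdims=True)
    return plan, mapped


def mix_latents(x_src, mapped, lam=0.5):
    """Convex combination of each source latent and its transported counterpart."""
    return lam * x_src + (1.0 - lam) * mapped


rng = np.random.default_rng(0)
image_latents = rng.normal(size=(4, 3))  # hypothetical modality A
text_latents = rng.normal(size=(5, 3))   # hypothetical modality B
plan, mapped = ot_align(image_latents, text_latents)
mixed = mix_latents(image_latents, mapped)
```

Because the plan only needs pairwise costs, the two sets of samples need not be paired or equal in number, which matches the summary's point about not requiring aligned data.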
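WatE represents each graph as an ellipse and learns the embedding with a Wasserstein t-distributed objective. Suppose each ellipse is read as a 2D Gaussian, with a centre plus a covariance whose eigenvectors and eigenvalues give the ellipse's axes. Then the 2-Wasserstein distance between two embeddings has a standard closed form, and a t-SNE-style heavy-tailed kernel can turn that distance into an affinity. The sketch below computes both in plain Python. It illustrates these ingredients under that Gaussian assumption and is not WatE's actual loss. All function names are hypothetical.

```python
import math


def matmul2(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sqrtm2(m):
    """Principal square root of a 2x2 symmetric positive-definite matrix.

    Uses sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M)).
    """
    (a, b), (c, d) = m
    s = math.sqrt(max(a * d - b * c, 0.0))  # clamp tiny negative rounding error
    t = math.sqrt(a + d + 2.0 * s)
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]


def w2_squared(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between 2D Gaussians N(m1, s1) and N(m2, s2):
    ||m1 - m2||^2 + tr(s1 + s2 - 2 (s2^1/2 s1 s2^1/2)^1/2)."""
    r2 = sqrtm2(s2)
    cross = sqrtm2(matmul2(matmul2(r2, s1), r2))
    mean_term = (m1[0] - m2[0]) ** 2 + (m1[1] - m2[1]) ** 2
    trace_term = (s1[0][0] + s1[1][1] + s2[0][0] + s2[1][1]
                  - 2.0 * (cross[0][0] + cross[1][1]))
    return mean_term + trace_term


def t_affinity(dist_sq):
    """Heavy-tailed Student-t (one degree of freedom) affinity, as used in t-SNE."""
    return 1.0 / (1.0 + dist_sq)


# Two ellipses with the same centre: one stretched along y, one along x.
tall = ([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]])
wide = ([0.0, 0.0], [[4.0, 0.0], [0.0, 1.0]])
d2 = w2_squared(*tall, *wide)
print(d2)  # 2.0
```

Unlike a distance between points, this distance is nonzero for two ellipses with the same centre, so differences in shape, and not only location, survive in the visualization.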
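The ZO-DSGD paper analyzes the stability and generalization of zeroth-order decentralized SGD over a changing topology. Its summary does not spell out the algorithm itself. For orientation, the sketch below shows one common instance. Each client estimates its gradient from two function evaluations along a random Gaussian direction, then mixes its parameters with a neighbour's through a doubly stochastic matrix that changes every round. The two-point estimator, the rotating-neighbour topology, the quadratic toy losses, and all hyperparameters are assumptions for illustration.

```python
import numpy as np


def zo_grad(f, x, mu, rng):
    """Two-point zeroth-order gradient estimate of f at x along a random Gaussian direction."""
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u


def mixing_matrix(n, shift):
    """Doubly stochastic matrix: node i averages itself with node (i + shift) mod n.

    Changing `shift` from round to round yields a time-varying topology (hypothetical choice).
    """
    W = 0.5 * np.eye(n)
    for i in range(n):
        W[i, (i + shift) % n] += 0.5
    return W


def zo_dsgd(local_losses, dim, steps=2000, lr=0.005, mu=1e-3, seed=0):
    """Run decentralized ZO-SGD; returns one parameter vector per client (needs >= 2 clients)."""
    rng = np.random.default_rng(seed)
    n = len(local_losses)
    X = np.zeros((n, dim))
    for t in range(steps):
        W = mixing_matrix(n, shift=1 + t % (n - 1))  # topology changes every round
        grads = np.stack([zo_grad(f, X[i], mu, rng) for i, f in enumerate(local_losses)])
        X = W @ X - lr * grads  # gossip averaging followed by a local zeroth-order step
    return X


# Toy problem: client i minimises ||x - c_i||^2, so the global optimum is the mean of the c_i.
centres = np.array([[1.0, 2.0], [3.0, 2.0], [2.0, 0.0], [2.0, 4.0]])
losses = [lambda x, c=c: float(np.sum((x - c) ** 2)) for c in centres]
X = zo_dsgd(losses, dim=2)
print(X.mean(axis=0))  # close to [2, 2]
```

Only loss values are queried, never gradients. This is the setting whose generalization the paper studies as the number of clients, the local sample sizes, and the topology vary.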
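GCAV trains concept activation vectors on a small dataset and injects them into activation layers at inference time. The sketch below illustrates the general activation-steering idea using synthetic vectors in place of real LLM hidden states. As a simplification, it takes the concept direction as the normalized difference of class means instead of training a linear classifier, and it uses a fixed steering strength `alpha`. How GCAV actually selects layers and strengths is not reproduced here, and all names are hypothetical.

```python
import numpy as np


def concept_vector(pos_acts, neg_acts):
    """Unit vector pointing from concept-negative toward concept-positive activations.

    Simplification: difference of class means rather than a trained linear probe.
    """
    v = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def steer(hidden, cav, alpha):
    """Add alpha * cav to every hidden state (rows of `hidden`)."""
    return hidden + alpha * cav


rng = np.random.default_rng(0)
dim = 8
concept_axis = np.eye(dim)[0]  # synthetic: the concept lives on the first coordinate
pos = rng.normal(size=(50, dim)) + 2.0 * concept_axis  # activations for concept-positive prompts
neg = rng.normal(size=(50, dim))                       # activations for concept-negative prompts
cav = concept_vector(pos, neg)

hidden = rng.normal(size=(3, dim))  # stand-in for one layer's hidden states
steered = steer(hidden, cav, alpha=4.0)
print((steered - hidden) @ cav)  # each row moved 4.0 along the concept direction
```

Steering adds a vector and leaves the weights untouched, which is why approaches of this kind are lightweight: the base model is unchanged, and a concept can be switched on or off per request.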
