Hermes unleashes self-improving AI on NVIDIA RTX and DGX Spark
Nous Research has released Hermes, a self-improving agentic AI framework that has rapidly gained adoption, surpassing 140,000 GitHub stars within three months and becoming the most utilized agent on OpenRouter. Designed to run reliably and autonomously on local hardware, Hermes addresses historical challenges in agent stability and self-improvement. The system is model-agnostic and optimized for continuous 24/7 operation, making it particularly suited for NVIDIA RTX personal computers, RTX PRO workstations, and the new DGX Spark appliance. The agent pairs effectively with Qwen 3.6, a new series of open-weight large language models from Alibaba. Qwen 3.6 introduces significant efficiency gains, with the 35B parameter model delivering performance that exceeds previous 120B parameter counterparts while utilizing only about 20GB of memory. Similarly, the 27B model matches the accuracy of the massive 400B Qwen 3.5 397B model despite being one-sixteenth the size. These compact, high-performance models enable complex reasoning tasks on local machines without requiring data center-scale infrastructure. Hermes distinguishes itself through four key capabilities centered on local execution. By running entirely on user hardware, the agent ensures privacy and eliminates reliance on cloud APIs. NVIDIA RTX GPUs provide the necessary specialized computing power, with Tensor Cores accelerating AI inference to reduce latency. This allows the agent to execute multistep tasks and refine its own skills in seconds rather than minutes. The NVIDIA DGX Spark serves as the ideal hardware companion for this always-on workflow. This compact, standalone machine features 128GB of unified memory and one petaflop of AI performance, enabling it to sustain the heavy computational demands of mixture-of-experts models throughout the day. When paired with Qwen 3.6, DGX Spark allows users to run concurrent workloads and handle continuous autonomous planning with high throughput. Getting started with Hermes on NVIDIA hardware is accessible to developers and enthusiasts alike. The agent is available via the Nous Research GitHub repository and integrates seamlessly with popular local runtimes such as llama.cpp, LM Studio, and Ollama. Support for LM Studio and Ollama is included out of the box, simplifying the setup process for local AI agents. In related developments, the NVIDIA ecosystem continues to expand its support for open models. Recent updates include NVIDIA RTX PRO GPUs delivering up to three times faster token generation with Qwen 3.6 models using llama.cpp. Google's Gemma 4 models are now available as NVFP4 checkpoints for Blackwell GPUs, offering similar performance boosts through multi-token prediction. Additionally, Mistral Medium version 3.5 now includes compatibility for local inference on NVIDIA systems. The industry is also seeing growth in open source tools designed to secure and optimize local AI. NVIDIA recently launched NemoClaw, an open stack that enhances OpenClaw experiences by improving security and supporting local models. The addition of Windows Subsystem for Linux (WSL2) support in NemoClaw has made it easier for Windows users to deploy these agents. As agentic AI evolves, the combination of Hermes and NVIDIA hardware represents a significant shift toward powerful, private, and self-sustaining artificial intelligence. This approach allows both individual users and developers to build robust local tooling without the latency or costs associated with cloud-dependent solutions.
