The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Adrian Kosowski Przemysław Uznański Jan Chorowski Zuzanna Stamirowska Michał Bartoszkiewicz

Abstract

The relationship between computing systems and the brain has served as motivation for pioneering theoreticians since John von Neumann and Alan Turing. Uniform, scale-free biological networks, such as the brain, have powerful properties, including the ability to generalize over time, the main barrier facing Machine Learning on the path to Universal Reasoning Models.

We introduce 'Dragon Hatchling' (BDH), a new Large Language Model architecture based on a scale-free, biologically inspired network of n locally-interacting neuron particles. BDH combines strong theoretical foundations with inherent interpretability without sacrificing Transformer-like performance. BDH is a practical, performant, state-of-the-art attention-based state-space sequence learning architecture. In addition to being a graph model, BDH admits a GPU-friendly formulation. It exhibits Transformer-like scaling laws: empirically, BDH rivals GPT-2 performance on language and translation tasks at the same number of parameters (10M to 1B) and for the same training data.

BDH can be represented as a brain model. During inference, the working memory of BDH relies entirely on synaptic plasticity with Hebbian learning using spiking neurons. We confirm empirically that specific, individual synapses strengthen their connection whenever BDH hears or reasons about a specific concept while processing language inputs. The neuron interaction network of BDH is a graph of high modularity with a heavy-tailed degree distribution. The BDH model is biologically plausible, explaining one possible mechanism that human neurons could use to achieve speech.

BDH is designed for interpretability. Activation vectors of BDH are sparse and positive. We demonstrate monosemanticity in BDH on language tasks. Interpretability of state, which goes beyond interpretability of neurons and model parameters, is an inherent feature of the BDH architecture.
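The abstract's central mechanism, working memory held in synapses that strengthen by Hebbian learning while activations stay sparse and positive, can be illustrated with a toy sketch. This is not the BDH architecture or its GPU formulation; the functions `relu`, `hebbian_step`, and `recall`, the learning rate `lr`, and the `decay` term are hypothetical choices made only to show the general idea that "neurons that fire together wire together".

```python
# Toy illustration only (NOT the BDH implementation): a synaptic matrix
# acts as working memory and is updated by a Hebbian outer-product rule.
# Names lr, decay, hebbian_step, recall are hypothetical.
from typing import List

Matrix = List[List[float]]


def relu(v: List[float]) -> List[float]:
    """Rectify a vector so every activation is non-negative (many become zero)."""
    return [max(0.0, x) for x in v]


def hebbian_step(sigma: Matrix, pre: List[float], post: List[float],
                 lr: float = 1.0, decay: float = 0.0) -> Matrix:
    """Return sigma' with sigma'[i][j] = (1 - decay) * sigma[i][j] + lr * post[i] * pre[j].

    A synapse i <- j is strengthened only when both of its endpoint neurons
    are active, so sparse activations touch only a few synapses per step.
    """
    return [[(1.0 - decay) * sigma[i][j] + lr * post[i] * pre[j]
             for j in range(len(pre))]
            for i in range(len(post))]


def recall(sigma: Matrix, cue: List[float]) -> List[float]:
    """Read the memory: propagate a cue through the synapses, then rectify."""
    return relu([sum(w * c for w, c in zip(row, cue)) for row in sigma])


if __name__ == "__main__":
    n = 4
    sigma = [[0.0] * n for _ in range(n)]
    x = relu([1.0, -0.5, 0.0, 2.0])   # sparse, positive: [1.0, 0.0, 0.0, 2.0]
    y = relu([0.0, 3.0, -1.0, 0.0])   # sparse, positive: [0.0, 3.0, 0.0, 0.0]
    sigma = hebbian_step(sigma, pre=x, post=y)
    print(sigma[1])          # [3.0, 0.0, 0.0, 6.0]; all other rows stay zero
    print(recall(sigma, x))  # [0.0, 15.0, 0.0, 0.0]
```

Because only one post-synaptic and two pre-synaptic neurons are active, just two synapses change, which mirrors in miniature the abstract's observation that individual synapses strengthen when a specific concept is processed.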
