
Zephyr: Direct Distillation of LM Alignment

Abstract

We aim to produce a smaller language model that is aligned to user intent. Previous research has shown that applying distilled supervised fine-tuning (dSFT) on larger models significantly improves task accuracy; however, these models are unaligned, i.e. they do not respond well to natural prompts. To distill this property, we experiment with the use of preference data from AI Feedback (AIF). Starting from a dataset of outputs ranked by a teacher model, we apply distilled direct preference optimization (dDPO) to learn a chat model with significantly improved intent alignment. The approach requires only a few hours of training without any additional sampling during fine-tuning. The final result, Zephyr-7B, sets the state-of-the-art on chat benchmarks for 7B parameter models, and requires no human annotation. In particular, results on MT-Bench show that Zephyr-7B surpasses Llama2-Chat-70B, the best open-access RLHF-based model. Code, models, data, and tutorials for the system are available at https://github.com/huggingface/alignment-handbook.
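The dDPO step trains directly on teacher-ranked preference pairs, with no reward model or sampling loop. A minimal sketch of the per-pair DPO objective it optimizes (function name and the beta value are illustrative, not from the paper; inputs are summed token log-probabilities under the policy and the frozen dSFT reference model):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one (chosen, rejected) preference pair.

    beta scales the implicit reward and controls how far the policy
    may drift from the reference (dSFT) model.
    """
    # Implicit reward margin: difference of policy-vs-reference
    # log-ratios between the chosen and rejected completions.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Negative log-sigmoid of the margin; small when the policy
    # favors the chosen answer more strongly than the reference does.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly, the margin is zero and the loss is log 2; widening the policy's preference for the chosen completion drives the loss toward zero, which is what pushes the student toward the teacher's rankings.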

