Language Modelling On Wiki 40B
Metrics
Perplexity
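Perplexity is the exponentiated average negative log-likelihood per token, so lower is better. A minimal sketch of the computation (the `perplexity` helper is illustrative, not part of any benchmark code):

```python
import math

def perplexity(log_probs):
    """Perplexity = exp(mean negative log-likelihood).

    log_probs: natural-log probabilities the model assigned
    to each token in the evaluation text.
    """
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# A model that assigns probability 0.25 to every token is as
# uncertain as a uniform choice over 4 tokens: perplexity ~ 4.
print(perplexity([math.log(0.25)] * 4))
```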
Results
Performance results of various models on this benchmark
| Model | Perplexity | Paper Title | Repository |
|---|---|---|---|
| Combiner-Fixed-8k | 16.60 | Combiner: Full Attention Transformer with Sparse Computation Cost | |
| Combiner-Axial-8k | 16.49 | Combiner: Full Attention Transformer with Sparse Computation Cost | |
| FLASH-Quad-8k | 14.998 | Transformer Quality in Linear Time | |