A 280B-parameter Transformer model that was later outperformed by the smaller, compute-optimal Chinchilla model (70B parameters), which was trained on more data with a similar compute budget.