05-04-2024, 00:35   #4
s12a
Senior Member
 
Joined: Jan 2008
Posts: 11136
A new technique from Google DeepMind researchers would reportedly save computation (and time) during inference, dynamically, depending on the token being predicted.

https://arxiv.org/abs/2404.02258

Quote:
[Submitted on 2 Apr 2024]
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models

Transformer-based language models spread FLOPs uniformly across input sequences. In this work we demonstrate that transformers can instead learn to dynamically allocate FLOPs (or compute) to specific positions in a sequence, optimising the allocation along the sequence for different layers across the model depth. Our method enforces a total compute budget by capping the number of tokens (k) that can participate in the self-attention and MLP computations at a given layer. The tokens to be processed are determined by the network using a top-k routing mechanism. Since k is defined a priori, this simple procedure uses a static computation graph with known tensor sizes, unlike other conditional computation techniques. Nevertheless, since the identities of the k tokens are fluid, this method can expend FLOPs non-uniformly across the time and model depth dimensions. Thus, compute expenditure is entirely predictable in sum total, but dynamic and context-sensitive at the token-level. Not only do models trained in this way learn to dynamically allocate compute, they do so efficiently. These models match baseline performance for equivalent FLOPS and wall-clock times to train, but require a fraction of the FLOPs per forward pass, and can be upwards of 50% faster to step during post-training sampling.
Twitter thread that explains it in plain language: https://twitter.com/TheSeaMouse/stat...Tu5ad4lXOgAtZQ
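
To make the top-k routing described in the abstract a bit more concrete, here is a rough toy sketch in plain Python. It is not the paper's implementation: the names route_top_k, mod_layer and block are made up for illustration, tokens are plain floats instead of vectors, the router scores are supplied by hand rather than learned, and block is just a stand-in for the self-attention + MLP computation. The only thing it tries to show is that exactly k tokens per layer go through the expensive block, while the others pass through unchanged on the residual path.

```python
def route_top_k(router_scores, k):
    """Return the positions (in sequence order) of the k highest-scoring tokens."""
    if not 0 <= k <= len(router_scores):
        raise ValueError("k must be between 0 and the sequence length")
    # Rank positions by score, highest first, and keep the first k.
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i],
                    reverse=True)
    return sorted(ranked[:k])


def mod_layer(tokens, router_scores, k, block):
    """Apply `block` only to the k routed tokens; the rest skip the layer.

    `block` is a hypothetical stand-in for self-attention + MLP.
    """
    selected = set(route_top_k(router_scores, k))
    out = []
    for i, x in enumerate(tokens):
        if i in selected:
            out.append(x + block(x))  # routed token: residual update
        else:
            out.append(x)             # skipped token: passes through unchanged
    return out


tokens = [1.0, 2.0, 3.0, 4.0]
scores = [0.1, 0.9, 0.3, 0.8]  # hand-picked; a real router would learn these
print(route_top_k(scores, k=2))                               # [1, 3]
print(mod_layer(tokens, scores, k=2, block=lambda x: 10 * x))  # [1.0, 22.0, 3.0, 44.0]
```

Since k is fixed ahead of time, the number of calls to block per layer is always k regardless of the input, which is the "static computation graph with known tensor sizes" point from the abstract. Which positions get picked is what changes with the context.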




