10-04-2024, 11:29   #10
s12a
Senior Member
Joined: Jan 2008
Posts: 11136
In some variants, RNN (Recurrent Neural Network) architectures, the predecessors of today's Transformer, can compete with or even outperform it. RWKV is a particularly interesting example, and today a paper was released that describes in detail the improvements introduced in its latest versions, 5 and 6.

RWKV is also a series of LLMs that are fully open source, not just open weight.

https://arxiv.org/abs/2404.05892

Quote:
[Submitted on 8 Apr 2024]
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license.
Models at: https://huggingface.co/RWKV
Training code at: https://github.com/RWKV/RWKV-LM
Inference code at: https://github.com/RWKV/ChatRWKV
Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer
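To give an idea of what "matrix-valued states" and "dynamic recurrence" mean in practice, here is a minimal sketch in Python of a simplified, single-head WKV recurrence as I understand it from the paper; it is not the official code from the repos above. The state S is a D×D matrix updated per token as S_t = diag(w_t)·S_{t-1} + k_tᵀv_t, and the output reads r_t·(diag(u)·k_tᵀv_t + S_{t-1}). With the same decay vector w at every step this resembles Eagle (RWKV-5); letting w_t change per token mimics Finch (RWKV-6), where the decay is computed from the input (here it is simply passed in). Token shift, gating, normalization, multiple heads and projections are all omitted, and the function names are hypothetical. The second function computes the same outputs from the explicit sum over past tokens, as a check that the recurrent form is equivalent.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(k: Vector, v: Vector) -> Matrix:
    """k^T v: D x D matrix with entry (i, j) = k[i] * v[j]."""
    return [[ki * vj for vj in v] for ki in k]


def vec_mat(r: Vector, m: Matrix) -> Vector:
    """Row vector times matrix: result[j] = sum_i r[i] * m[i][j]."""
    return [sum(r[i] * m[i][j] for i in range(len(r))) for j in range(len(m[0]))]


def wkv_recurrent(rs: List[Vector], ks: List[Vector], vs: List[Vector],
                  ws: List[Vector], u: Vector) -> List[Vector]:
    """Simplified single-head WKV with a matrix-valued state (hypothetical sketch).

    ws[t] holds per-channel decays in (0, 1). Reusing one vector for every t
    resembles a static (Eagle-like) decay; varying it per token resembles a
    data-dependent (Finch-like) decay. u is a per-channel bonus that applies
    only to the current token.
    """
    d = len(u)
    state = [[0.0] * d for _ in range(d)]  # empty D x D state before the first token
    outputs = []
    for r, k, v, w in zip(rs, ks, vs, ws):
        kv = outer(k, v)
        # wkv_t = diag(u) k_t^T v_t + S_{t-1}: the current token gets the bonus u
        wkv = [[u[i] * kv[i][j] + state[i][j] for j in range(d)] for i in range(d)]
        outputs.append(vec_mat(r, wkv))
        # S_t = diag(w_t) S_{t-1} + k_t^T v_t: decay old memory, then add the new token
        state = [[w[i] * state[i][j] + kv[i][j] for j in range(d)] for i in range(d)]
    return outputs


def wkv_direct(rs: List[Vector], ks: List[Vector], vs: List[Vector],
               ws: List[Vector], u: Vector) -> List[Vector]:
    """Same outputs from the explicit sum over past tokens (O(T^2), for checking)."""
    d = len(u)
    outputs = []
    for t in range(len(rs)):
        wkv = [[u[i] * ks[t][i] * vs[t][j] for j in range(d)] for i in range(d)]
        for s in range(t):
            # past token s is decayed by the product of w_m for m = s+1 .. t-1
            decay = [1.0] * d
            for m in range(s + 1, t):
                decay = [decay[c] * ws[m][c] for c in range(d)]
            for i in range(d):
                for j in range(d):
                    wkv[i][j] += decay[i] * ks[s][i] * vs[s][j]
        outputs.append(vec_mat(rs[t], wkv))
    return outputs


if __name__ == "__main__":
    random.seed(0)
    D, T = 3, 5

    def rand_vec() -> Vector:
        return [random.uniform(-1.0, 1.0) for _ in range(D)]

    rs = [rand_vec() for _ in range(T)]
    ks = [rand_vec() for _ in range(T)]
    vs = [rand_vec() for _ in range(T)]
    ws = [[random.uniform(0.5, 0.99) for _ in range(D)] for _ in range(T)]
    u = [0.5] * D
    fast = wkv_recurrent(rs, ks, vs, ws, u)
    slow = wkv_direct(rs, ks, vs, ws, u)
    max_diff = max(abs(a - b) for fa, fb in zip(fast, slow) for a, b in zip(fa, fb))
    print(f"max difference between recurrent and direct form: {max_diff:.2e}")
```

The point of the recurrent form is that each step only touches a fixed D×D state per head, so memory and per-token cost do not grow with sequence length, unlike a Transformer's KV cache; the direct form has to revisit every past token.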


__________________
CPU Intel i7-12700K ~ Cooler Noctua NH-D15S ~ Motherboard MSI PRO Z690-A WIFI DDR4 ~ RAM Corsair Vengeance LPX 64 GB DDR4-3600
GPU MSI GeForce RTX 3090 GAMING X TRIO 24G ~ SSD SK hynix Platinum P41 2TB + Samsung 990 Pro 4TB
PSU Corsair RM850x ~ Case Fractal Design Define C ~ Display Dell U2412M (A00) + NEC EA231WMi ~ OS

Last edited by s12a: 10-04-2024 at 11:37.