Old 04-04-2024, 13:59   #3
s12a
Senior Member
Joined: Jan 2008
Messages: 10922
Not a new paper, but loosely related to the Anthropic one from the other day.

https://arxiv.org/abs/2312.01552
Quote:
[Submitted on 4 Dec 2023]
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning

The alignment tuning process of large language models (LLMs) typically involves instruction learning through supervised fine-tuning (SFT) and preference tuning via reinforcement learning from human feedback (RLHF). A recent study, LIMA (Zhou et al. 2023), shows that using merely 1K examples for SFT can achieve significant alignment performance as well, suggesting that the effect of alignment tuning might be "superficial." This raises questions about how exactly the alignment tuning transforms a base LLM.
We analyze the effect of alignment tuning by examining the token distribution shift between base LLMs and their aligned counterpart. Our findings reveal that base LLMs and their alignment-tuned versions perform nearly identically in decoding on the majority of token positions. Most distribution shifts occur with stylistic tokens. This direct evidence strongly supports the Superficial Alignment Hypothesis suggested by LIMA.

Based on these findings, we rethink the alignment of LLMs by posing the research question: how effectively can we align base LLMs without SFT or RLHF? To address this, we introduce a simple, tuning-free alignment method, URIAL. URIAL achieves effective alignment purely through in-context learning (ICL) with base LLMs, requiring as few as three constant stylistic examples and a system prompt. We conduct a fine-grained and interpretable evaluation on a diverse set of examples, named JUST-EVAL-INSTRUCT. Results demonstrate that base LLMs with URIAL can match or even surpass the performance of LLMs aligned with SFT or SFT+RLHF. We show that the gap between tuning-free and tuning-based alignment methods can be significantly reduced through strategic prompting and ICL. Our findings on the superficial nature of alignment tuning and results with URIAL suggest that deeper analysis and theoretical understanding of alignment is crucial to future LLM research.
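For anyone curious, the "token distribution shift" analysis mentioned in the abstract boils down to something like the rough sketch below (not the paper's code; the Llama-2 7B base/chat pair and the prompt are just examples I picked): decode greedily with the aligned model, then check at each position whether the base model would have predicted the same next token anyway.

```python
# Rough sketch of the distribution-shift idea, assuming a base/chat pair that
# shares the same tokenizer. Model names and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_name = "meta-llama/Llama-2-7b-hf"        # base model (example choice)
chat_name = "meta-llama/Llama-2-7b-chat-hf"   # its aligned counterpart

tok = AutoTokenizer.from_pretrained(chat_name)
base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.float16, device_map="auto")
chat = AutoModelForCausalLM.from_pretrained(chat_name, torch_dtype=torch.float16, device_map="auto")

prompt = "Explain what a hash table is."
ids = tok(prompt, return_tensors="pt").to(chat.device)

# Greedy decode with the aligned (chat) model.
out = chat.generate(**ids, max_new_tokens=100, do_sample=False)
full = out[0]
gen_start = ids["input_ids"].shape[1]

# Score the whole decoded sequence with the base model in one forward pass,
# then count how often its top-1 prediction matches the chat model's token.
with torch.no_grad():
    base_logits = base(full.unsqueeze(0).to(base.device)).logits[0]

agree, total = 0, full.shape[0] - gen_start
for pos in range(gen_start, full.shape[0]):
    if base_logits[pos - 1].argmax().item() == full[pos].item():
        agree += 1
print(f"Base model's top-1 matches the aligned model's token at {agree}/{total} positions")
```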
In practice, it was already known that base models can be "aligned" simply by providing a few examples resembling "real" answers, achieving performance competitive with, and in some cases better than, that of chat models. So what Anthropic treats as "jailbreaking" can in practice simply be seen as aligning the model to the user's preferences via in-context learning (ICL). And, unlike actual finetuning, it requires no significant computational resources, so even large models like Llama-2-70B or Mixtral 8x7B can easily be turned into powerful chatbots without particular restrictions.
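The URIAL-style prompting itself is basically just this: a fixed preamble plus a handful of constant, well-written Q/A examples placed in front of the user's query, fed to a plain base model. A minimal sketch (the preamble, the three examples and the base model below are my own placeholders, not the paper's actual prompt):

```python
# Minimal sketch of tuning-free alignment via in-context learning:
# constant preamble + a few fixed stylistic Q/A examples + the user query,
# sent to a base (non-chat) model. Everything below is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # any base model works in principle
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16, device_map="auto")

SYSTEM = ("Below are conversations between a human and a helpful, honest AI assistant. "
          "The assistant answers in detail, in a polite and well-structured style.\n\n")

# Three constant "stylistic" examples, reused verbatim for every query.
EXAMPLES = [
    ("What is the capital of France?",
     "The capital of France is Paris, which has been the country's political and cultural centre for centuries."),
    ("How do I boil an egg?",
     "Place the egg in a pot of cold water, bring it to a boil, then simmer for about 8-10 minutes for a hard-boiled egg."),
    ("Can you write a haiku about autumn?",
     "Leaves drift on cold wind,\nred and gold fade into dusk,\nthe year exhales slow."),
]

def build_prompt(user_query: str) -> str:
    parts = [SYSTEM]
    for q, a in EXAMPLES:
        parts.append(f"# Query:\n{q}\n\n# Answer:\n{a}\n\n")
    parts.append(f"# Query:\n{user_query}\n\n# Answer:\n")
    return "".join(parts)

ids = tok(build_prompt("Explain the difference between RAM and VRAM."), return_tensors="pt").to(model.device)
out = model.generate(**ids, max_new_tokens=256, do_sample=False)
text = tok.decode(out[0][ids["input_ids"].shape[1]:], skip_special_tokens=True)
# A base model tends to keep going with a new "# Query:" block, so cut there.
print(text.split("# Query")[0].strip())
```

The whole trick is that the base model continues the pattern it sees in context, so the style and "helpfulness" of the fixed examples carry over to the new answer without touching the weights.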

Last edited by s12a : 04-04-2024 at 14:03.