#221
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#222
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#223
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#224
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
New speculation from the forums points to new functional units capable of handling either address generation or arithmetic (hybrid units), so no longer separate units or ports dedicated to addresses.
In short, if I understood correctly, we're talking about a total of 8 ALU ops or 8 address ops, but NOT at the same time.

Last edited by Ren: 17-11-2009 at 13:13.
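To make the "not at the same time" part concrete, here is a minimal sketch of how such hybrid ports could behave. This is my own toy model: the port count of 8 comes from the rumour above, everything else is assumption.

Code:
# Toy model of "hybrid" issue ports: every port accepts either an ALU op
# or an address-generation (AGU) op, but only one op per port per cycle.
# Port count and behaviour are illustrative assumptions, not Bulldozer specs.

NUM_PORTS = 8

def issue_cycle(pending_alu, pending_agu):
    """Greedily fill the 8 hybrid ports for one cycle.

    Returns (issued_alu, issued_agu, leftover_alu, leftover_agu).
    """
    free_ports = NUM_PORTS
    issued_alu = min(pending_alu, free_ports)
    free_ports -= issued_alu
    issued_agu = min(pending_agu, free_ports)
    return issued_alu, issued_agu, pending_alu - issued_alu, pending_agu - issued_agu

if __name__ == "__main__":
    # All-ALU mix: 8 ALU ops go through in one cycle.
    print(issue_cycle(8, 0))   # (8, 0, 0, 0)
    # All-AGU mix: 8 address ops go through in one cycle.
    print(issue_cycle(0, 8))   # (0, 8, 0, 0)
    # Mixed: 8 + 8 ops share the same ports, so a second cycle is needed.
    print(issue_cycle(8, 8))   # (8, 0, 0, 8)

Any mix of ALU and address ops shares the same 8 slots, so 8 of one kind plus anything of the other necessarily spills into the next cycle.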
#225
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
#226
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
More news I hadn't read yet.

Apparently some writing on the origins of Bulldozer is circulating, straight from one of the engineers who came up with the clustered core. Here it is in full:

Andy Krazy Glew
Newsgroups: comp.arch
From: "Andy \"Krazy\" Glew" <ag-n...@patten-glew.net>
Date: Sat, 14 Nov 2009 22:50:01 -0800
Local: Sun, Nov 15 2009 7:50 am
Subject: Re: Bulldozer details + bobcat

> Bulldozer details + bobcat

BRIEF: AMD's Bulldozer is an MCMT (MultiCluster MultiThreaded) microarchitecture. That's my baby!

DETAIL: Thursday was both a very good day and a very bad day for me. Good, because my MCMT ideas finally seem to be going into a product. Bad, because I ended up driving 4 hours from where I work with IV in the Seattle area back to Portland, to my wife who was taken to a hospital emergency room.

The latter is personal. The former is, well, personal too, but also professional. I can't express how good it feels to see MCMT become a product. It's been public for years, but it gets no respect until it is in a product. It would have been better if I had stayed at Intel to see it through. I know that I won't get any credit for it. (Except from some of the guys who were at AMD at the time.) But it feels good nevertheless.

The only bad thing is that some guys I know at AMD say that Bulldozer is not really all that great a product, but is shipping just because AMD needs a model refresh. "Sometimes you just gotta ship what you got." If this is so, and if I deserve any credit for CMT, then I also deserve some of the blame. Although it might have been different, better, if I had stayed.

I came up with MCMT in 1996-2000 while at the University of Wisconsin. It became public via presentations. I brought MCMT back to Intel in 2000, and to AMD in 2002. I was beginning to despair of MCMT ever seeing the light of day. I thought that when I left AMD in 2004, the MCMT ideas may have left with me. Apparently not. I must admit that I am surprised to see that the concept endured so many years - 5+ years after I left, 7+ years to market. Apparently they didn't have any better ideas.

True, there were rumors. For example, Chuck Moore presented a slide with Multicluster Multithreading on it to analysts in 2004 or 2005. But things went quiet. There were several patents filed, with diagrams that looked very much like the ones I drew for the K10 proposal. But, one often sees patent applications for cancelled projects.

Of course, AMD has undoubtedly changed and evolved MCMT in many ways since I first proposed it to them. For example, I called the set of an integer scheduler, integer execution units, and an L1 data cache a "cluster", and the whole thing, consisting of shared front end, shared FP, and 2 or more clusters, a processor core. Apparently AMD is calling my clusters their cores, and my core their cluster. It has been suggested that this change of terminology is motivated by marketing, so that they can say they have twice as many cores.

My original motivation for MCMT was to work around some of the limitations of Hyperthreading on Willamette. E.g. Willamette had a very small L0 data cache, 4K in some of the internal proposals, although it shipped at 8K. Two threads sharing such a tiny L0 data cache thrash. Indeed, this is one of the reasons why hyperthreading is disabled on many systems, including many current Nhm based machines with much larger closest-in caches. At the time, the small L0s were a given.
You couldn't build a Willamette style "fireball" high frequency machine, and have a much bigger cache, and still preserve the same small cache latency. To avoid threads thrashing each other, I wanted to give each thread their own L0. But, you can't do so, and still keep sharing the execution units and scheduler - you can't just build a 2X larger array, or put two arrays side by side, and expect to have the same latency. Wires. Therefore, I had to replicate the execution units, and enough of the scheduler so that the "critical loop" of Scheduler->Execution->Data Cache was all isolated from the other thread/cluster. Hence, the form of multi-cluster multi-threading you see in Bulldozer.

True, there are differences, and I am sure more will become evident as more Bulldozer information becomes public. For example, although I came up with MCMT to make Willamette-style threading faster, I have always wanted to put SpMT, Speculative Multithreading, on such a substrate. SpMT has potential to speed up a single thread of execution, by splitting it up into separate threads and running the separate threads on different clusters, whereas Willamette-style hyperthreading, and Bulldozer-style MCMT (apparently), only speed up workloads that have existing independent threads.

I still want to build SpMT. My work at Wisconsin showed that SpMT on a Willamette substrate was constrained by Willamette's poor threading microarchitecture, so naturally I had to first create the best explicit threading microarchitecture I could, and then run SpMT on top of it. If I received arrows in my back for MCMT, I received 10 times as many arrows for SpMT. And yet still I have hope for it. Unfortunately, I am not currently working on SpMT. Haitham Akkary, the father of DMT, continues the work.

I also tried, and still continue, to explore other ways of speeding up single threads using multiple clusters. Although I remain an advocate of SpMT, I have always recognized the value of MCMT as an explicit threaded microarchitecture.

Perhaps I should say here that my MCMT had a significant difference from clustering in, say, the Alpha 21264, http://www.hotchips.org/archives/hc10/2 ... 10.1.1.pdf
Those clusters bypass to each other: there is a fast bypass within a cluster, and a slightly slower (+1 cycle) bypass of results between clusters. The clusters are execution units only, and share the data cache. This bypassing makes it easy (or at least easier) to spread a single thread across both clusters. My MCMT clusters, on the other hand, do NOT bypass to each other. This motivates separate threads per cluster, whether explicit or implicit.

I have a whole taxonomy of different sorts of clustering:
* fast vs slow bypass clusters
* fully bypassed vs. partially bypassed
* mechanisms to reduce bypassing
* physical layout of clusters
* bit interleaved datapaths
* datapaths flowing in opposite directions, with bypassing where they touch
* what's in the cluster
  * execute only
  * execute + data cache
  * schedule + execute + data cache
  * renamer + schedule + execute + datacache ...
* what gets shared between clusters
  * front-end
  * renamer?
  * data-cache - L0? L1? L2?
  * TLBs...
  * MSHRs...
  * FP...

Anyway: if it has an L0 or L1 data cache in the cluster, with or without the scheduler, it's my MCMT. If no cache in the cluster, not mine (although I have enumerated many such possibilities).

Motivated by my work to use MCMT to speed up single threads, I often propose a shared L2 instruction scheduler, to load balance between the clusters dynamically.
Although I admit that I only really figured out how to do that properly after I left AMD, and before I joined Intel. How to do this is part of the Multi-star microarchitecture, M*, that is my next step beyond MCMT.

Also, although it is natural to have a single (explicit) thread per cluster in MCMT, I have also proposed allowing two threads per cluster. Mainly motivated by SpMT: I could fork to a "runt thread" running in the same cluster, and then migrate the runt thread to a different cluster. Intra-cluster forking is faster than inter-cluster forking, and does not disturb the parent thread. But, if you are not doing SpMT, there is much less motivation for multiple threads per cluster. I would not want to do that unless I was also trying to build a time-switched lightweight threading system. Which, as you can imagine if you know me, I have also proposed. In fact, I hope to go to the SC'09 Workshop on that topic.

I will be quite interested to see whether Bulldozer's cluster-private L1 caches (in AMD's swapped terminology, core-private L1 caches) are write through or write-back. Willamette's L0 was write-through. I leaned towards write-back, because my goal was to isolate clusters from each other, to reduce thrashing. Also, because write-back lends itself better to a speculative versioning cache, useful for SpMT.

With Willamette as background, I leaned towards a relatively small, L0, cache in the cluster. Also, such a small L0 can often be pitch-matched with the cluster execution unit datapath. A big L1, such as Bulldozer seems to have, nearly always has to lie out of the datapath, and requires wire turns. Wire turns waste area. I have, from time to time, proposed putting the alignment muxes and barrel shifters in the wire turn area. I'm surprised that a large cluster L1 makes sense, but that's the sort of thing that you can only really tell from layout.

Some posters have been surprised by sharing the FP. Of course, AMD's K7 design, with separate clusters for integer and FP, was already half-way there. They only had to double the integer cluster. It would have been harder for Intel to go MCMT, since the P6 family had shared integer and FP. Willamette might have been easier to go MCMT, since it had separate FP.

Anyway... of course, for FP threads you might like to have thread-private FP. But, in some ways, it is the advent of expensive FP, like Bulldozer's 2 sets of 128 bit, 4x32 bit, FMAs, that justify integer MCMT: the FP is so big that the overhead of replicating the integer cluster, including the OOO logic, is a drop in the bucket. You'd like to have per-cluster-thread FP, but such big FP workloads are often so memory intensive that they thrash the shared-between-clusters L2 cache: threading may be disabled anyways. As it is, you get good integer threads via MCMT, and you get 1 integer thread and 1 FP thread. Two FP threads may have some slowdown, although, again, if memory intensive they may be blocking on memory, and hence allowing the other FP thread to use the FP. But two purely computational FP threads will almost undoubtedly block, unless the schedulers are piss-poor and can't use all of the FP for a single thread (e.g. by being too small).

I certainly want to explore possibilities such as SpMT and other single thread speedups. But I know that you can't build all the neat ideas in one project. Apparently MCMT by itself was enough for AMD Bulldozer. (Actually, I am sure that there are other new ideas in Bulldozer. Just apparently not SpMT or spreading a single thread across clusters.)
Look at the time-lag: 10-15 years from when I came up with MCMT in Wisconsin, 1996-2000. It is now 7-5 years from when I was at AMD, 2002-2004, and it will be another 2 years or so before Bulldozer is a real force in the marketplace.

I don't expect to get any credit for MCMT. In fact, I'm sure I'm going to get shit for this post. I don't care. I know. The people who were there, who saw my presentations and read my proposals, know. But, e.g. Chuck Moore wasn't there at start; he came in later. Even Mike Haertel, my usual collaborator, wasn't there; he was hired in later, although before Chuck. Besides, Mike Haertel thinks that MCMT is obvious. That's cool, although I ask: if MCMT is obvious, then why isn't Intel doing it?

Companies like Intel and AMD need idea generating people like me about once every 10 years. In between, they don't need new ideas. They need new incremental improvements of existing ideas.

Anyway... It's cool to see MCMT becoming real. It gives me hope that my follow-on to MCMT, M*, may still, eventually, also become real.

Last edited by Ren: 16-11-2009 at 20:34.
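To see in numbers why Glew puts so much weight on giving each cluster its own small data cache, here is a toy direct-mapped cache model. This is my own sketch: the cache size, line size and access streams are all made-up illustrative values, not taken from Willamette or Bulldozer. Two threads whose working sets collide thrash a single shared L0, while a replicated per-cluster L0 of the same size only pays the compulsory misses.

Code:
# Toy direct-mapped cache model: compare two threads sharing one small L0
# with the same threads each given their own replicated L0.
# Sizes, line size, and access streams are made-up illustrative values.

LINE = 64  # bytes per cache line

def run(addresses, num_lines):
    """Return the number of misses for one direct-mapped cache."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // LINE
        idx = line % num_lines
        if tags[idx] != line:
            tags[idx] = line
            misses += 1
    return misses

def interleave(a, b):
    """Round-robin two access streams, the way two threads sharing a cache would."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

if __name__ == "__main__":
    # Each thread loops repeatedly over its own 4 KB working set; the two sets
    # live at different base addresses but are aligned so they conflict in a
    # shared direct-mapped cache (the worst-case thrashing scenario).
    t0 = [0x00000 + (i * LINE) % 4096 for i in range(512)]
    t1 = [0x10000 + (i * LINE) % 4096 for i in range(512)]

    shared = run(interleave(t0, t1), 64)   # one 4 KB L0 shared by both threads
    private = run(t0, 64) + run(t1, 64)    # one private 4 KB L0 per cluster

    print("shared 4 KB L0, 2 threads   :", shared, "misses")   # every access misses
    print("private 4 KB L0 per cluster :", private, "misses")  # only compulsory misses

Replicating the cache per cluster, rather than building one bigger shared array, is exactly the trade-off he describes: the per-cluster latency stays the same and the threads stop evicting each other.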
#227
Senior Member
Joined: Jan 2002
Location: Trance City
Posts: 7299
up
#229
Member
Joined: Nov 2009
Posts: 78
Quote:
I generally agree with what you say, except that Bulldozer is not 4-issue wide. Or rather, it has two 2-issue-wide integer clusters (assuming they are similar to the K10 ones; otherwise, who knows...). The L1d caches are separate, so that seems unlikely.
A 3-way architecture is never fully exploited, as you say; still, if they went to so much trouble to build them (the Core 2s are even 4-wide, albeit with plenty of ifs and buts), using all those extra transistors must have been worth it. Imho AMD figured that with a heavily multi-threaded architecture it gets more performance per unit of area by shifting the balance from ILP to TLP, at least imho.
Ciao!
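To put rough numbers on that ILP-to-TLP shift (illustrative figures of my own, not AMD data): if a single thread only exposes around 1.5 instructions per cycle of parallelism, two narrow clusters running two threads beat one wide core running one thread out of a similar issue budget.

Code:
# Back-of-the-envelope ILP vs TLP comparison. The "available ILP" figure is an
# illustrative assumption (typical integer code rarely sustains much above ~1.5).

def core_throughput(issue_width, available_ilp):
    """Sustained IPC of one core/cluster: limited by its width or by the code's ILP."""
    return min(issue_width, available_ilp)

if __name__ == "__main__":
    ilp = 1.5  # assumed parallelism a single thread exposes

    one_wide_core = core_throughput(4, ilp)        # one 4-wide core, one thread
    two_clusters = 2 * core_throughput(2, ilp)     # two 2-wide clusters, two threads

    print("one 4-wide core, 1 thread     :", one_wide_core, "IPC total")  # 1.5
    print("two 2-wide clusters, 2 threads:", two_clusters, "IPC total")   # 3.0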
#230
Senior Member
Joined: Nov 2003
Posts: 24169
A reminder that AMD is preparing its own Turbo mode, called "Automatic Processor Overclocking" (some rumors say it will even be introduced with Thuban).
Its Multi-Threading Technology, or SMT, will in all likelihood be its answer to Intel's HyperThreading, and for now Reverse HyperThreading should be tied to the execution of Intel's AVX instructions.
A summary from the good bjt2:
Quote:
#231
Senior Member
Joined: Sep 2008
Location: Province of Reggio, Costa dei Gelsomini :D
Posts: 1691
Quote:
The L1d caches are separate, but the L1i is shared by both clusters; in theory this should mean that on a fetch from the L1i both clusters receive the same operations but execute them on different operands (or one executes one piece of code and the other another), which would explain the reason for the two L1Ds. A hypothetical ICU would then dispatch the operations efficiently to both of the distributed schedulers. At least, that's how I understood it.
As for the efficiency of the Core 2s, well, according to some people who understand this better than I do, the average IPC is about 10% higher than the K8s (http://www.realworldtech.com/page.cf...2808015436&p=9) and hypothetically equal to the K10s except in memory-intensive applications; why there are some differences is something you would have to ask the compiler wizards. Moreover, if you go and look, the average IPC of both is around 1.1-1.2 instructions per clock cycle. To squeeze 4 instructions per cycle out of a Core 2 Quad you need applications written entirely for it, since it suffers from scheduler inefficiencies (Xbitlabs has an article about this, if I'm not mistaken). So the transistors spent on a 4-wide architecture are, in my opinion, completely useless (worth zero) if ad hoc programming is required; and besides, if you look, the biggest expense is in cache and control logic, which eat up more than 80% of the die size.
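Taking the 1.1-1.2 average IPC figure at face value (it is just the estimate quoted above, not a measured number), a quick calculation of issue-slot utilization shows how little of a 4-wide machine average code actually exercises.

Code:
# Rough issue-slot utilization, using the average IPC figure quoted above
# (1.1-1.2 instructions per clock) purely as an illustrative input.

def slot_utilization(avg_ipc, issue_width):
    """Fraction of issue slots actually used each cycle, on average."""
    return avg_ipc / issue_width

if __name__ == "__main__":
    for ipc in (1.1, 1.2):
        print(f"IPC {ipc} on a 4-wide core    -> {slot_utilization(ipc, 4):.0%} of slots used")
        print(f"IPC {ipc} on a 2-wide cluster -> {slot_utilization(ipc, 2):.0%} of slots used")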
#232
Senior Member
Joined: Sep 2005
Posts: 4337
I don't know whether there is anything new here, and to be honest I don't even know whether it is real or fake, but I found this on a newsgroup:
Quote:
#233
Senior Member
Joined: Dec 2005
Location: Catanzaro (successful trades: many)
Posts: 2197
Great thread, really interesting. Subscribed.
#234
Senior Member
Joined: Dec 2000
Location: Parma
Posts: 3121
#236
Senior Member
Joined: Nov 2003
Posts: 24169
Quote:
Click here...
#237
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
In short, he had the idea for CMT in 1996-2000, presented his ideas when he was at AMD in 2000-2002, and it is almost certain that additions have since been made to his ideas.
#238
Senior Member
Joined: Apr 2005
Location: MC
Posts: 7649
So... did I misunderstand, or will the desktop version of Bulldozer be compatible with Socket AM3?
Will the upcoming 800-series chipsets be enough to support the new architecture?
#239
Senior Member
Joined: Apr 2005
Location: MC
Posts: 7649
Well, the best case would be backward compatibility with the 700 series, even with some limitations (as it was for Socket AM2)... who knows whether the current top-end AM3 boards will survive...
#240
Senior Member
Joined: May 2006
Location: Caserta
Posts: 2722
Quote: