#221
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#222
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#223
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
Quote:
#224
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
New speculation from the forums points to new functional units capable of handling either address generation or arithmetic (hybrid units), so no longer separate units or ports dedicated to addresses.
In short, if I understood correctly, we're talking about a total of 8 ALU ops or 8 address ops, but NOT at the same time.

Last edited by Ren: 17-11-2009 at 13:13.
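To make the "not at the same time" part concrete, here is a minimal sketch of how such hybrid ports could behave. This is my own toy model: the port count of 8 comes from the rumour above, everything else is assumption.

Code:
# Toy model of "hybrid" issue ports: every port accepts either an ALU op
# or an address-generation (AGU) op, but only one op per port per cycle.
# Port count and behaviour are illustrative assumptions, not Bulldozer specs.

NUM_PORTS = 8

def issue_cycle(pending_alu, pending_agu):
    """Greedily fill the 8 hybrid ports for one cycle.

    Returns (issued_alu, issued_agu, leftover_alu, leftover_agu).
    """
    free_ports = NUM_PORTS
    issued_alu = min(pending_alu, free_ports)
    free_ports -= issued_alu
    issued_agu = min(pending_agu, free_ports)
    return issued_alu, issued_agu, pending_alu - issued_alu, pending_agu - issued_agu

if __name__ == "__main__":
    # All-ALU mix: 8 ALU ops go through in one cycle.
    print(issue_cycle(8, 0))   # (8, 0, 0, 0)
    # All-AGU mix: 8 address ops go through in one cycle.
    print(issue_cycle(0, 8))   # (0, 8, 0, 0)
    # Mixed: 8 + 8 ops share the same ports, so a second cycle is needed.
    print(issue_cycle(8, 8))   # (8, 0, 0, 8)

Any mix of ALU and address ops shares the same 8 slots, so 8 of one kind plus anything of the other necessarily spills into the next cycle.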
#225
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
#226
Senior Member
Joined: Apr 2003
Location: Roma
Posts: 3237
More news I hadn't read yet.

Apparently some writing on the origins of Bulldozer is circulating, straight from one of the engineers who came up with the clustered core. Here it is in full:

Andy Krazy Glew
Newsgroups: comp.arch
From: "Andy \"Krazy\" Glew" <ag-n...@patten-glew.net>
Date: Sat, 14 Nov 2009 22:50:01 -0800
Local: Sun, Nov 15 2009 7:50 am
Subject: Re: Bulldozer details + bobcat

> Bulldozer details + bobcat

BRIEF: AMD's Bulldozer is an MCMT (MultiCluster MultiThreaded) microarchitecture. That's my baby!

DETAIL: Thursday was both a very good day and a very bad day for me. Good, because my MCMT ideas finally seem to be going into a product. Bad, because I ended up driving 4 hours from where I work with IV in the Seattle area back to Portland, to my wife who was taken to a hospital emergency room.

The latter is personal. The former is, well, personal too, but also professional. I can't express how good it feels to see MCMT become a product. It's been public for years, but it gets no respect until it is in a product. It would have been better if I had stayed at Intel to see it through. I know that I won't get any credit for it. (Except from some of the guys who were at AMD at the time.) But it feels good nevertheless.

The only bad thing is that some guys I know at AMD say that Bulldozer is not really all that great a product, but is shipping just because AMD needs a model refresh. "Sometimes you just gotta ship what you got." If this is so, and if I deserve any credit for CMT, then I also deserve some of the blame. Although it might have been different, better, if I had stayed.

I came up with MCMT in 1996-2000 while at the University of Wisconsin. It became public via presentations. I brought MCMT back to Intel in 2000, and to AMD in 2002. I was beginning to despair of MCMT ever seeing the light of day. I thought that when I left AMD in 2004, the MCMT ideas may have left with me. Apparently not. I must admit that I am surprised to see that the concept endured so many years - 5+ years after I left, 7+ years to market. Apparently they didn't have any better ideas.

True, there were rumors. For example, Chuck Moore presented a slide with Multicluster Multithreading on it to analysts in 2004 or 2005. But things went quiet. There were several patents filed, with diagrams that looked very much like the ones I drew for the K10 proposal. But, one often sees patent applications for cancelled projects.

Of course, AMD has undoubtedly changed and evolved MCMT in many ways since I first proposed it to them. For example, I called the set of an integer scheduler, integer execution units, and an L1 data cache a "cluster", and the whole thing, consisting of shared front end, shared FP, and 2 or more clusters, a processor core. Apparently AMD is calling my clusters their cores, and my core their cluster. It has been suggested that this change of terminology is motivated by marketing, so that they can say they have twice as many cores.

My original motivation for MCMT was to work around some of the limitations of Hyperthreading on Willamette. E.g. Willamette had a very small L0 data cache, 4K in some of the internal proposals, although it shipped at 8K. Two threads sharing such a tiny L0 data cache thrash. Indeed, this is one of the reasons why hyperthreading is disabled on many systems, including many current Nhm based machines with much larger closest-in caches. At the time, the small L0s were a given.
You couldn't build a Willamette style "fireball" high frequency machine, and have a much bigger cache, and still preserve the same small cache latency. To avoid threads thrashing each other, I wanted to give each thread their own L0. But, you can't do so, and still keep sharing the execution units and scheduler - you can't just build a 2X larger array, or put two arrays side by side, and expect to have the same latency. Wires. Therefore, I had to replicate the execution units, and enough of the scheduler so that the "critical loop" of Scheduler->Execution->Data Cache was all isolated from the other thread/cluster. Hence, the form of multi-cluster multi-threading you see in Bulldozer.

True, there are differences, and I am sure more will become evident as more Bulldozer information becomes public. For example, although I came up with MCMT to make Willamette-style threading faster, I have always wanted to put SpMT, Speculative Multithreading, on such a substrate. SpMT has potential to speed up a single thread of execution, by splitting it up into separate threads and running the separate threads on different clusters, whereas Willamette-style hyperthreading, and Bulldozer-style MCMT (apparently), only speed up workloads that have existing independent threads.

I still want to build SpMT. My work at Wisconsin showed that SpMT on a Willamette substrate was constrained by Willamette's poor threading microarchitecture, so naturally I had to first create the best explicit threading microarchitecture I could, and then run SpMT on top of it. If I received arrows in my back for MCMT, I received 10 times as many arrows for SpMT. And yet still I have hope for it. Unfortunately, I am not currently working on SpMT. Haitham Akkary, the father of DMT, continues the work.

I also tried, and still continue, to explore other ways of speeding up single threads using multiple clusters. Although I remain an advocate of SpMT, I have always recognized the value of MCMT as an explicit threaded microarchitecture.

Perhaps I should say here that my MCMT had a significant difference from clustering in, say, the Alpha 21264, http://www.hotchips.org/archives/hc10/2 ... 10.1.1.pdf
Those clusters bypass to each other: there is a fast bypass within a cluster, and a slightly slower (+1 cycle) bypass of results between clusters. The clusters are execution units only, and share the data cache. This bypassing makes it easy (or at least easier) to spread a single thread across both clusters. My MCMT clusters, on the other hand, do NOT bypass to each other. This motivates separate threads per cluster, whether explicit or implicit.

I have a whole taxonomy of different sorts of clustering:
* fast vs slow bypass clusters
* fully bypassed vs. partially bypassed
* mechanisms to reduce bypassing
* physical layout of clusters
* bit interleaved datapaths
* datapaths flowing in opposite directions, with bypassing where they touch
* what's in the cluster
  * execute only
  * execute + data cache
  * schedule + execute + data cache
  * renamer + schedule + execute + datacache ...
* what gets shared between clusters
  * front-end
  * renamer?
  * data-cache - L0? L1? L2?
  * TLBs...
  * MSHRs...
  * FP...

Anyway: if it has an L0 or L1 data cache in the cluster, with or without the scheduler, it's my MCMT. If no cache in the cluster, not mine (although I have enumerated many such possibilities).

Motivated by my work to use MCMT to speed up single threads, I often propose a shared L2 instruction scheduler, to load balance between the clusters dynamically.
Although I admit that I only really figured out how to do that properly after I left AMD, and before I joined Intel. How to do this is part of the Multi-star microarchitecture, M*, that is my next step beyond MCMT.

Also, although it is natural to have a single (explicit) thread per cluster in MCMT, I have also proposed allowing two threads per cluster. Mainly motivated by SpMT: I could fork to a "runt thread" running in the same cluster, and then migrate the runt thread to a different cluster. Intra-cluster forking is faster than inter-cluster forking, and does not disturb the parent thread. But, if you are not doing SpMT, there is much less motivation for multiple threads per cluster. I would not want to do that unless I was also trying to build a time-switched lightweight threading system. Which, as you can imagine if you know me, I have also proposed. In fact, I hope to go to the SC'09 Workshop on that topic.

I will be quite interested to see whether Bulldozer's cluster-private L1 caches (in AMD's swapped terminology, core-private L1 caches) are write through or write-back. Willamette's L0 was write-through. I leaned towards write-back, because my goal was to isolate clusters from each other, to reduce thrashing. Also, because write-back lends itself better to a speculative versioning cache, useful for SpMT.

With Willamette as background, I leaned towards a relatively small, L0, cache in the cluster. Also, such a small L0 can often be pitch-matched with the cluster execution unit datapath. A big L1, such as Bulldozer seems to have, nearly always has to lie out of the datapath, and requires wire turns. Wire turns waste area. I have, from time to time, proposed putting the alignment muxes and barrel shifters in the wire turn area. I'm surprised that a large cluster L1 makes sense, but that's the sort of thing that you can only really tell from layout.

Some posters have been surprised by sharing the FP. Of course, AMD's K7 design, with separate clusters for integer and FP, was already half-way there. They only had to double the integer cluster. It would have been harder for Intel to go MCMT, since the P6 family had shared integer and FP. Willamette might have been easier to go MCMT, since it had separate FP.

Anyway... of course, for FP threads you might like to have thread-private FP. But, in some ways, it is the advent of expensive FP, like Bulldozer's 2 sets of 128 bit, 4x32 bit, FMAs, that justify integer MCMT: the FP is so big that the overhead of replicating the integer cluster, including the OOO logic, is a drop in the bucket. You'd like to have per-cluster-thread FP, but such big FP workloads are often so memory intensive that they thrash the shared-between-clusters L2 cache: threading may be disabled anyways. As it is, you get good integer threads via MCMT, and you get 1 integer thread and 1 FP thread. Two FP threads may have some slowdown, although, again, if memory intensive they may be blocking on memory, and hence allowing the other FP thread to use the FP. But two purely computational FP threads will almost undoubtedly block, unless the schedulers are piss-poor and can't use all of the FP for a single thread (e.g. by being too small).

I certainly want to explore possibilities such as SpMT and other single thread speedups. But I know that you can't build all the neat ideas in one project. Apparently MCMT by itself was enough for AMD Bulldozer. (Actually, I am sure that there are other new ideas in Bulldozer. Just apparently not SpMT or spreading a single thread across clusters.)
Look at the time-lag: 10-15 years from when I came up with MCMT in Wisconsin, 1996-2000. It is now 7-5 years from when I was at AMD, 2002-2004, and it will be another 2 years or so before Bulldozer is a real force in the marketplace.

I don't expect to get any credit for MCMT. In fact, I'm sure I'm going to get shit for this post. I don't care. I know. The people who were there, who saw my presentations and read my proposals, know. But, e.g. Chuck Moore wasn't there at start; he came in later. Even Mike Haertel, my usual collaborator, wasn't there; he was hired in later, although before Chuck. Besides, Mike Haertel thinks that MCMT is obvious. That's cool, although I ask: if MCMT is obvious, then why isn't Intel doing it?

Companies like Intel and AMD need idea generating people like me about once every 10 years. In between, they don't need new ideas. They need new incremental improvements of existing ideas.

Anyway... It's cool to see MCMT becoming real. It gives me hope that my follow-on to MCMT, M*, may still, eventually, also become real.

Last edited by Ren: 16-11-2009 at 20:34.
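To see in numbers why Glew puts so much weight on giving each cluster its own small data cache, here is a toy direct-mapped cache model. This is my own sketch: the cache size, line size and access streams are all made-up illustrative values, not taken from Willamette or Bulldozer. Two threads whose working sets collide thrash a single shared L0, while a replicated per-cluster L0 of the same size only pays the compulsory misses.

Code:
# Toy direct-mapped cache model: compare two threads sharing one small L0
# with the same threads each given their own replicated L0.
# Sizes, line size, and access streams are made-up illustrative values.

LINE = 64  # bytes per cache line

def run(addresses, num_lines):
    """Return the number of misses for one direct-mapped cache."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        line = addr // LINE
        idx = line % num_lines
        if tags[idx] != line:
            tags[idx] = line
            misses += 1
    return misses

def interleave(a, b):
    """Round-robin two access streams, the way two threads sharing a cache would."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

if __name__ == "__main__":
    # Each thread loops repeatedly over its own 4 KB working set; the two sets
    # live at different base addresses but are aligned so they conflict in a
    # shared direct-mapped cache (the worst-case thrashing scenario).
    t0 = [0x00000 + (i * LINE) % 4096 for i in range(512)]
    t1 = [0x10000 + (i * LINE) % 4096 for i in range(512)]

    shared = run(interleave(t0, t1), 64)   # one 4 KB L0 shared by both threads
    private = run(t0, 64) + run(t1, 64)    # one private 4 KB L0 per cluster

    print("shared 4 KB L0, 2 threads   :", shared, "misses")   # every access misses
    print("private 4 KB L0 per cluster :", private, "misses")  # only compulsory misses

Replicating the cache per cluster, rather than building one bigger shared array, is exactly the trade-off he describes: the per-cluster latency stays the same and the threads stop evicting each other.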
#227
Senior Member
Joined: Jan 2002
Location: Trance City
Posts: 7299
up
#229
Member
Joined: Nov 2009
Posts: 78
Quote:
I generally agree with what you say, except that Bulldozer is not 4-issue wide. Or rather, it has two 2-issue-wide integer clusters (assuming they are similar to the K10 ones; otherwise, who knows...). The L1d caches are separate, so that seems unlikely.
A 3-way architecture is never fully exploited, as you say; still, if they went to so much trouble to build them (the Core 2s are even 4-wide, albeit with plenty of ifs and buts), using all those extra transistors must have been worth it. Imho AMD figured that with a heavily multi-threaded architecture it gets more performance per unit of area by shifting the balance from ILP to TLP, at least imho.
Ciao!
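To put rough numbers on that ILP-to-TLP shift (illustrative figures of my own, not AMD data): if a single thread only exposes around 1.5 instructions per cycle of parallelism, two narrow clusters running two threads beat one wide core running one thread out of a similar issue budget.

Code:
# Back-of-the-envelope ILP vs TLP comparison. The "available ILP" figure is an
# illustrative assumption (typical integer code rarely sustains much above ~1.5).

def core_throughput(issue_width, available_ilp):
    """Sustained IPC of one core/cluster: limited by its width or by the code's ILP."""
    return min(issue_width, available_ilp)

if __name__ == "__main__":
    ilp = 1.5  # assumed parallelism a single thread exposes

    one_wide_core = core_throughput(4, ilp)        # one 4-wide core, one thread
    two_clusters = 2 * core_throughput(2, ilp)     # two 2-wide clusters, two threads

    print("one 4-wide core, 1 thread     :", one_wide_core, "IPC total")  # 1.5
    print("two 2-wide clusters, 2 threads:", two_clusters, "IPC total")   # 3.0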
#230
Senior Member
Joined: Nov 2003
Posts: 24169
A reminder that AMD is preparing its own Turbo mode, called "Automatic Processor Overclocking" (some rumors say it will even be introduced with Thuban).
Its Multi-Threading Technology, or SMT, will in all likelihood be its answer to Intel's HyperThreading, and for now Reverse HyperThreading should be tied to the execution of Intel's AVX instructions.
A summary from the good bjt2:
Quote:
#231
Senior Member
Joined: Sep 2008
Location: Province of Reggio, Costa dei Gelsomini :D
Posts: 1691
Quote:
The L1d caches are separate, but the L1i is shared by both clusters; in theory this should mean that on a fetch from the L1i both clusters receive the same operations but execute them on different operands (or one executes one piece of code and the other another), which would explain the reason for the two L1Ds. A hypothetical ICU would then dispatch the operations efficiently to both of the distributed schedulers. At least, that's how I understood it.
As for the efficiency of the Core 2s, well, according to some people who understand this better than I do, the average IPC is about 10% higher than the K8s (http://www.realworldtech.com/page.cf...2808015436&p=9) and hypothetically equal to the K10s except in memory-intensive applications; why there are some differences is something you would have to ask the compiler wizards. Moreover, if you go and look, the average IPC of both is around 1.1-1.2 instructions per clock cycle. To squeeze 4 instructions per cycle out of a Core 2 Quad you need applications written entirely for it, since it suffers from scheduler inefficiencies (Xbitlabs has an article about this, if I'm not mistaken). So the transistors spent on a 4-wide architecture are, in my opinion, completely useless (worth zero) if ad hoc programming is required; and besides, if you look, the biggest expense is in cache and control logic, which eat up more than 80% of the die size.
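Taking the 1.1-1.2 average IPC figure at face value (it is just the estimate quoted above, not a measured number), a quick calculation of issue-slot utilization shows how little of a 4-wide machine average code actually exercises.

Code:
# Rough issue-slot utilization, using the average IPC figure quoted above
# (1.1-1.2 instructions per clock) purely as an illustrative input.

def slot_utilization(avg_ipc, issue_width):
    """Fraction of issue slots actually used each cycle, on average."""
    return avg_ipc / issue_width

if __name__ == "__main__":
    for ipc in (1.1, 1.2):
        print(f"IPC {ipc} on a 4-wide core    -> {slot_utilization(ipc, 4):.0%} of slots used")
        print(f"IPC {ipc} on a 2-wide cluster -> {slot_utilization(ipc, 2):.0%} of slots used")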
#232
Senior Member
Joined: Sep 2005
Posts: 4337
I don't know whether there is anything new here, and to be honest I don't even know whether it is real or fake, but I found this on a newsgroup:
Quote:
#233
Senior Member
Joined: Dec 2005
Location: Catanzaro (successful trades: many)
Posts: 2197
Great thread, really interesting. Subscribed.
#234
Senior Member
Joined: Dec 2000
Location: Parma
Posts: 3121
#236
Senior Member
Joined: Nov 2003
Posts: 24169
Quote:
Click here...
#237
Senior Member
Joined: Apr 2005
Location: Napoli
Posts: 6817
In short, he had the idea for CMT in 1996-2000, presented his ideas when he was at AMD in 2000-2002, and it is almost certain that additions have since been made to his ideas.
#238
Senior Member
Joined: Apr 2005
Location: MC
Posts: 7649
So... did I misunderstand, or will the desktop version of Bulldozer be compatible with Socket AM3?
Will the upcoming 800-series chipsets be enough to support the new architecture?
#239
Senior Member
Joined: Apr 2005
Location: MC
Posts: 7649
Well, the best case would be backward compatibility with the 700 series, even with some limitations (as it was for Socket AM2)... who knows whether the current top-end AM3 boards will survive...
#240
Senior Member
Joined: May 2006
Location: Caserta
Posts: 2722
Quote: