--- Log opened Sun Dec 03 00:00:42 2017 | ||
bandvig | Let me describe the background of my question. | 04:38 |
---|---|---|
bandvig | Actually, I'm not an expert in multi-processor systems. Now I'm reading Culler's book "Parallel Computer Architecture". | 04:39 |
bandvig | If I understand correctly, when designing an SMP machine we have to decide which memory consistency model the machine will implement. | 04:39 |
bandvig | The model could be: (a) an academic variant like Sequential Consistency (SC) or Processor Consistency (PC) etc. | 04:39 |
bandvig | (b) an industry variant like SPARC's Total Store Order (TSO) or Partial Store Order (PSO) | 04:39 |
bandvig | (c) something self designed. | 04:40 |
bandvig | Various hardware solutions, like atomic and memory synchronization instructions or | 04:40 |
bandvig | cache architecture and cache synchronization, should be designed so that the SMP machine functions | 04:40 |
bandvig | in accordance with the selected memory consistency model. | 04:41 |
bandvig | While internally in hardware we can use various tricks to speed up our SMP system, | 04:41 |
bandvig | the programmer should still see the declared consistency model. | 04:41 |
bandvig | We can build a simple SMP machine which is sequentially consistent using just | 04:41 |
bandvig | single-level write-through snoop-invalidation caches (without write buffers!!!) and appropriately integrated l.lwa/l.swa execution. | 04:42 |
bandvig | Next we can implement various synchronization algorithms with l.lwa/l.swa on the assumption | 04:42 |
bandvig | that our SMP machine is sequentially consistent. | 04:42 |
bandvig | However, if we add write buffers to our HW design, | 04:42 |
bandvig | they break sequential consistency and our previously designed software starts malfunctioning. | 04:43 |
bandvig | We have either to implement some trick in hardware to keep SC from the programmer's view (I'm not sure that is possible for this example) | 04:43 |
bandvig | or declare another memory consistency model for our (now different!!!) SMP machine (like SPARC's TSO, for example). | 04:43 |
bandvig | For the second variant we have to redesign our SMP software libraries. | 04:43 |
bandvig | OR1K spec is quite unclear regarding multi-processor systems. | 04:43 |
bandvig | "The OpenRISC 1000 architecture specifies a weakly ordered memory model for uniprocessor and shared memory multiprocessor systems". | 04:44 |
bandvig | If I understand correctly, "weak order" and "strong order" do not describe a memory consistency model by themselves. | 04:44 |
bandvig | They are comparative categories. For example, TSO is weaker than SC, but stronger than PSO. | 04:44 |
bandvig | So, the OR1K spec does not say anything about how to build a multi-core OpenRISC system. | 04:46 |
bandvig | wallento: If I understand correctly, a tile of OpTiMSoC could be an SMP machine. | 04:46 |
bandvig | wallento: If so, could you describe in a few words which consistency model is used inside a tile and how it maps to the hardware decisions? | 04:46 |
wallento | bandvig, I think 1.2 is supposed to help there | 05:24 |
wallento | but I agree we should work this out more clearly | 05:25 |
wallento | the main reason is that there are no high performance cores where it matters | 05:25 |
wallento | mor1kx is TSO I think with the store buffer | 05:26 |
wallento | I propose you start a proper thread on the mailing list for that with the clear goal to make the spec better in this respect | 05:27 |
wallento | I think we should keep the consistency model simple and with the common use case of OpenRISC in mind | 05:36 |
shorne | bandvig: right, that much detail is not there, spec version 1.2 (released a few weeks or months back) tries to explain the options a bit | 06:01 |
shorne | but does not get into TSO/PSO details, also one thing I see in other architectures is memory barriers (write barrier, read barrier), which we don't really have | 06:02 |
shorne | we just depend on the lwa/swa pairs and cache coherency via sync | 06:03 |
shorne | Probably we can do what we always do and find an architecture or architectures we like, compare their models and specs, and see which fits best. It would be good to discuss on the list | 06:05 |
bandvig | shorne: If I understand correctly, a TSO machine doesn't need barriers, but for PSO and SPARC V9's Relaxed Memory Order (RMO) they are mandatory. | 06:53 |
bandvig | By the way, here https://pages.cpsc.ucalgary.ca/~higham/Research/papers/WriteBuffers.pdf is an interesting article | 06:53 |
bandvig | where it is proven that TSO/PSO can be correctly implemented on a machine with write buffers while RMO can't. | 06:54 |
bandvig | I've downloaded the 1.2 spec. Today or tomorrow I'll write an initial letter to the mailing list regarding the multi-core spec. | 06:56 |
shorne | bandvig: thank you, I think we have TSO now, but if there is a benefit to relaxing it to PSO or RMO I have no objection if people really want it, just more bugs to fix in the kernel :) | 07:06 |
shorne | But probably no one sane would want to go with a really relaxed model like Alpha had | 07:07 |
shorne | bandvig: if you haven't read it, this one is really good too https://www.kernel.org/doc/Documentation/memory-barriers.txt | 07:07 |
--- Log closed Mon Dec 04 00:00:43 2017 |
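
Editor's illustration of bandvig's 04:42-04:43 point that adding write buffers breaks sequential consistency. This is a minimal Python sketch of the classic store-buffering litmus test, not OpenRISC or mor1kx code: the two-thread `PROGRAM`, the `outcomes` function, and the toy machine model (one FIFO store buffer per core, loads forwarded from the core's own buffer) are all assumptions made for illustration. It exhaustively explores every interleaving and collects the final register values.

```python
# Toy model (an assumption, not a description of any real core):
# each core runs its instructions in program order; with write_buffers=True
# a store first enters the core's private FIFO buffer and reaches shared
# memory at some later, arbitrary point.

# Store-buffering litmus test. Instructions are
# ("st", address, value) or ("ld", address, destination_register).
PROGRAM = (
    (("st", "x", 1), ("ld", "y", "r0")),  # core 0
    (("st", "y", 1), ("ld", "x", "r1")),  # core 1
)


def outcomes(program, write_buffers):
    """Return the set of reachable final (r0, r1) pairs."""
    results = set()
    seen = set()
    start = (
        tuple(0 for _ in program),   # program counter per core
        tuple(() for _ in program),  # FIFO store buffer per core
        (("x", 0), ("y", 0)),        # shared memory, kept as a sorted tuple
        (),                          # registers, kept as a sorted tuple
    )
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        mem_d, reg_d = dict(mem), dict(regs)
        progressed = False
        for t, thread in enumerate(program):
            # Option 1: drain the oldest buffered store of core t to memory.
            if bufs[t]:
                progressed = True
                addr, val = bufs[t][0]
                new_mem = dict(mem_d)
                new_mem[addr] = val
                new_bufs = bufs[:t] + (bufs[t][1:],) + bufs[t + 1:]
                stack.append(
                    (pcs, new_bufs, tuple(sorted(new_mem.items())), regs))
            # Option 2: core t executes its next instruction.
            if pcs[t] < len(thread):
                progressed = True
                op, addr, arg = thread[pcs[t]]
                new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and write_buffers:
                    new_bufs = (bufs[:t] + (bufs[t] + ((addr, arg),),)
                                + bufs[t + 1:])
                    stack.append((new_pcs, new_bufs, mem, regs))
                elif op == "st":
                    new_mem = dict(mem_d)
                    new_mem[addr] = arg
                    stack.append(
                        (new_pcs, bufs, tuple(sorted(new_mem.items())), regs))
                else:
                    # Load: the newest matching entry in the core's own
                    # buffer wins, otherwise read shared memory.
                    val = mem_d[addr]
                    for buf_addr, buf_val in bufs[t]:
                        if buf_addr == addr:
                            val = buf_val
                    new_regs = dict(reg_d)
                    new_regs[arg] = val
                    stack.append(
                        (new_pcs, bufs, mem, tuple(sorted(new_regs.items()))))
        if not progressed:  # all cores finished and all buffers drained
            results.add((reg_d["r0"], reg_d["r1"]))
    return results


if __name__ == "__main__":
    print("no write buffers:  ", sorted(outcomes(PROGRAM, False)))
    print("with write buffers:", sorted(outcomes(PROGRAM, True)))
```

In this model the outcome `r0 == 0 and r1 == 0` is unreachable without write buffers and becomes reachable with them. That is the extra behaviour that software written on the assumption of SC would not expect.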