IRC logs for #openrisc Tuesday, 2014-06-24

--- Log opened Tue Jun 24 00:00:48 2014
10:10 <juliusb> stekern: cool thanks, it's something I'd like to document
10:10 <juliusb> how to run the mor1kx verification env under FuseSoC, but you should have given me enough to get started, thanks
10:22 <stekern> wallento: looks like it's your "keep in WRITE state on snoop_hit" that causes troubles
10:22 <stekern> + did you consider all possible cases when you removed IDLE?
10:23 <stekern> I know I said it might be redundant, but I was only thinking about the WRITE->READ case then
10:23 <stekern> it takes two cycles for a written value to be valid on the read port unless you have bypass registers in the RAM
10:24 <stekern> I think it might fail on Xilinx devices now
10:25 <stekern> but I might be wrong too; if you have considered that possibility, then it's probably fine
10:28 <…> actually, I remodeled the RAMs so they should fail even in verilator if it's an issue
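stekern's point about read-during-write timing can be illustrated with a small behavioural model of a synchronous RAM. This is a hypothetical sketch (the class `SyncRam` and its `clock` method are not mor1kx code), assuming the RAM returns the old contents when the same address is read and written in the same cycle; with `bypass=True`, a bypass register forwards the write data instead.

```python
class SyncRam:
    """Hypothetical single-clock RAM model: one write port, one registered read port."""

    def __init__(self, depth, bypass=False):
        self.mem = [0] * depth
        self.bypass = bypass

    def clock(self, we, waddr, wdata, raddr):
        """Apply one rising clock edge and return the registered read data."""
        if self.bypass and we and waddr == raddr:
            rdata = wdata              # bypass register forwards the new value
        else:
            rdata = self.mem[raddr]    # read-during-write returns the old contents
        if we:
            self.mem[waddr] = wdata
        return rdata


# Without bypass: a read issued in the write cycle still sees the old value,
# so the new value only shows up on the read port one cycle later.
ram = SyncRam(depth=4)
assert ram.clock(we=True, waddr=1, wdata=0xAA, raddr=1) == 0
assert ram.clock(we=False, waddr=0, wdata=0, raddr=1) == 0xAA

# With bypass: the written value is visible immediately.
ram_bp = SyncRam(depth=4, bypass=True)
assert ram_bp.clock(we=True, waddr=1, wdata=0xAA, raddr=1) == 0xAA
```

Whether a real device behaves like the old-data case depends on the RAM primitive and how it is inferred, which is the kind of difference stekern suspects on Xilinx.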
10:37 <stekern> but back to the actual problem: I don't understand the purpose of staying in WRITE on snoop_hit?
10:38 <wallento> stekern: yes, that's exactly the issue I identified
10:38 <wallento> maybe you can help me
10:39 <wallento> problem is the address changes too fast
10:39 <wallento> it's because the read->write transition is delayed
10:39 <wallento> I am not sure whether this can be handled by the LSU
10:39 <wallento> or whether I need to store it in the cache's state machine, which could be done when detecting we_i and snoop_read_tagmem
10:40 <stekern> which address changes too fast?
10:41 <wallento> it already has the next read if we have a write->read sequence
10:41 <wallento> I think that may be a general issue in the LSU if the store buffer is full
10:42 <wallento> hence I think it may be better addressed there
10:42 <wallento> I will send you an ELF file plus timestamp via mail
10:43 <stekern> hmm, it has nothing to do with the store buffer being full, but what I don't understand is the purpose of staying in WRITE state when snoop_hit is asserted?
10:43 <wallento> I think it may also occur when the store buffer is full
10:43 <wallento> it's if the write is not executed immediately
10:43 <wallento> the processor can change the address too fast
10:44 <stekern> but the write to the cache is always a 1-cycle operation
10:44 <wallento> not if there is a snoop
10:44 <wallento> and if the store buffer is full it's also delayed, correct?
10:45 <stekern> yeah... but I mean in the cache state machine
10:45 <wallento> I implemented it this way to avoid an extra snoop state
10:46 <wallento> otherwise we need some extra registers to store the context of an operation or so
10:46 <stekern> when you are in WRITE state, you've got to get out of there within one cycle (unless the next upcoming instruction is a store)
10:46 <wallento> this is impossible if there is a snoop
10:46 <wallento> the snoop needs to be handled instantly
10:46 <wallento> otherwise we need backpressure
10:47 <wallento> and I think backpressure to the CPU is better than to the bus
10:47 <stekern> ...but you're not doing anything with the snoop in the WRITE state?
10:47 <wallento> yes, I am
10:48 <stekern> (I'm looking for a bit of an explanation of how things work)
10:48 <wallento> I need to check the tagmem if no snoop tagmem is there
10:48 <wallento> and I might need to access the tagmem on a hit
10:49 <wallento> that's the reason the version with extra tag memory for snooping "works"
10:49 <wallento> once you start reading the tagmem, the probability of delaying a write cycle increases
10:49 <stekern> ...but isn't the whole snoop mechanism just there to invalidate the address that was "snoop written"?
10:49 <wallento> but it needs two cycles
10:49 <wallento> let me explain:
10:51 <wallento> - you need to read the tag memory to check whether the cache has a copy. If OPTION_DCACHE_SNOOP_TAGMEM == "NONE" this is the normal tag memory, otherwise it reads from the duplicate snoopmem
10:51 <wallento> - if the tag matches you need to write; the write is shared among both memories if present
10:52 <wallento> - the first cycle is the snoop_read, the second is the snoop_check
10:53 <wallento> - if the tag memory is involved (i.e., there is only one tag memory), the snoop_read_tagmem and snoop_check_tagmem signals are asserted to block the other cache operations
10:53 <wallento> - you cannot read from the tag memory if snoop_read_tagmem matches
10:53 <wallento> - the hit result is not valid for normal operation if snoop_check_tagmem is up
10:55 <wallento> - REFILL is not affected at all, because once you are in there and get acks you cannot get snoops (as you hold the bus). This is an assumption for the moment: that snoops only come from the same bus you access
10:56 <wallento> - READ is also safe; as you pointed out, it is the actual IDLE state. It is only necessary to check that !snoop_check_tagmem when !cpu_hit
10:57 <wallento> - the WRITE transitions are also safe; we stay in WRITE as long as a snoop forbids access to the tagmem
10:57 <wallento> (there is a further fix I will commit soon)
10:57 <wallento> but you need to allow WRITE to take longer than one cycle
10:58 <wallento> the LSU can check this via the snoop_*_tagmem and snoop_hit signals
10:58 <wallento> but I am a little confused about where to do this properly; I think storebuffer_write is the correct place
11:00 <wallento> alternatively I can store the address in the cache, but I think the same could happen if the store buffer is full; I will try to stress this situation and verify my statement ;)
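The two-cycle snoop sequence wallento describes, and why WRITE can then last longer than one cycle, can be sketched as a simple cycle count. This is a hypothetical model, not the mor1kx RTL: it assumes a single shared tag memory (OPTION_DCACHE_SNOOP_TAGMEM == "NONE") and, as a simplification, treats the tag memory as busy during both snoop cycles whether or not the snoop hits. Only the names snoop_read_tagmem and snoop_check_tagmem come from the discussion; the function `write_done_cycle` is made up for illustration.

```python
def write_done_cycle(snoop_starts, write_cycle, max_cycles=100):
    """Return the cycle in which a cache WRITE can finally use the tag memory.

    snoop_starts: cycles in which a snoop arrives (hypothetical input).
    Each snoop occupies the shared tag memory for two cycles:
    snoop_read (tag lookup) followed by snoop_check (compare/invalidate).
    """
    snoops = set(snoop_starts)
    for cycle in range(write_cycle, write_cycle + max_cycles):
        snoop_read_tagmem = cycle in snoops          # 1st snoop cycle
        snoop_check_tagmem = (cycle - 1) in snoops   # 2nd snoop cycle
        if not (snoop_read_tagmem or snoop_check_tagmem):
            return cycle  # tag memory free: the write completes, leave WRITE
    raise RuntimeError("write starved by back-to-back snoops")


print(write_done_cycle([], 0))      # 0: no snoop, single-cycle write
print(write_done_cycle([0], 0))     # 2: WRITE held for two extra cycles
print(write_done_cycle([0, 2], 0))  # 4: WRITE held for four extra cycles
```

Every extra cycle spent in WRITE is a cycle in which the CPU side has to be held off, which is what the rest of the discussion turns on.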
11:04 <stekern> wouldn't it be easier to just stall the CPU on a snoop_hit and then invalidate the address via the INVALIDATE state?
11:05 <stekern> because if you want to wait in WRITE state, you will need to stall the CPU while doing so; that's why it's not working now
11:07 <wallento> if you do so, performance will die
11:08 <wallento> it's already bad if you don't have the extra snoop memory
11:08 <stekern> yeah, I think you can remove the version without the extra snoop memory; that's never going to work well
11:08 <wallento> when a snoop and a write occur concurrently, you have to backpressure one of them
11:08 <wallento> there is no way to avoid this
11:09 <stekern> I don't know what "backpressure" means in this context ;)
11:09 <wallento> that you can delay the upstream operation
11:09 <wallento> which means either telling the CPU to stall
11:09 <wallento> or the bus
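The "backpressure" choice can be written down as a tiny arbiter for a single tag-memory write port. A minimal sketch, assuming the policy wallento prefers (stall the CPU rather than the bus): when a snoop invalidate and a CPU write want the port in the same cycle, the snoop wins and the CPU is stalled. `Grant` and `arbitrate` are hypothetical names, not mor1kx signals.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    snoop: bool      # snoop invalidate gets the tag-memory write port
    cpu: bool        # CPU write gets the tag-memory write port
    cpu_stall: bool  # backpressure to the CPU


def arbitrate(cpu_write_req, snoop_write_req):
    """Grant the single tag-memory write port; snoops are never backpressured."""
    if snoop_write_req:
        # The bus is not made to wait, so the CPU stalls if it also wanted the port.
        return Grant(snoop=True, cpu=False, cpu_stall=cpu_write_req)
    return Grant(snoop=False, cpu=cpu_write_req, cpu_stall=False)


for cpu in (False, True):
    for snoop in (False, True):
        print(cpu, snoop, arbitrate(cpu, snoop))
```

Backpressuring the bus instead would swap the roles, and would need some way to hold off the snooped transaction on the bus side.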
11:10 <wallento> maybe, once I confirm that the same can happen when the store buffer is full, this would be needed nevertheless
11:10 <wallento> can't we use lsu_valid_o maybe?
11:10 <wallento> I am sorry, my knowledge ends at the interface from the LSU to the pipeline
11:10 <wallento> maybe you know
11:13 <stekern> I'll go for a walk and think about it a bit, bbl ;)
11:13 <wallento> enjoy your walk
11:14 <wallento> I will also rethink the whole thing, but I am quite sure that this is the last bit needed for it to work
12:13 <stekern> wallento: yes, negating lsu_valid_o will stall the CPU
16:15 <wallento> stekern: I think the problem comes from exec vs. ctrl; let me briefly describe the situation
16:16 <wallento> - there are two consecutive stores (or a store followed by a load)
16:17 <wallento> - exec_lsu_adr_i is A and exec_op_lsu_store_i is high during the first cycle
16:18 <wallento> - there is a snoop and therefore store_buffer_write stays low (I think the same applies if the store buffer is full); lsu_valid_o is also low
16:19 <wallento> - dc_adr is A due to this situation
16:19 <wallento> - the pipeline advances; next cycle:
16:20 <wallento> - ctrl_lsu_adr_i is now A and ctrl_op_store_i is high; the next write sets exec_lsu_adr_i to B and exec_op_store_i is high
16:20 <wallento> - no snoop, so dc_we is now 1 and store_buffer_write is high
16:21 <wallento> - dc_adr is B
16:21 <wallento> - the write to A got lost
16:22 <wallento> which is the stack in my case, and the program crashes
16:22 <wallento> I will commit and send you an ELF
16:22 <wallento> plus put the waveform somewhere
16:23 <wallento> I am not sure how best to solve this; maybe the pipeline should not advance from exec to ctrl if lsu_valid_o is not set? Otherwise the LSU needs to track whether the exec operation is effective or not
16:27 <wallento> mmh, lsu_valid_o goes to wb_mux and not to ctrl
16:30 <wallento> 195.26 us
16:33 <wallento> ah, it's execute_ctrl :)
16:55 <wallento> sorry, I had the wrong one, it's 195.63
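The lost-store sequence above can be reproduced with a deliberately simplified two-stage model. This is a hypothetical sketch, not the mor1kx pipeline: each store sits in exec for at least one cycle, a snoop in that cycle blocks the cache write (lsu_valid_o low), and the made-up `stall_on_invalid` flag selects whether the pipeline waits for lsu_valid_o before moving the store on to ctrl. Addresses A and B follow the description; everything else is assumed.

```python
def run_stores(stores, snoop_cycles, stall_on_invalid):
    """Return the cache contents after issuing `stores` ((address, data) pairs)."""
    cache = {}
    pos = 0     # index of the store currently in the exec stage
    cycle = 0
    while pos < len(stores):
        adr, data = stores[pos]
        lsu_valid_o = cycle not in snoop_cycles  # a snoop blocks the write this cycle
        if lsu_valid_o:
            cache[adr] = data                    # dc_we asserted with dc_adr = adr
        if lsu_valid_o or not stall_on_invalid:
            pos += 1                             # store moves on from exec to ctrl
        cycle += 1
    return cache


stores = [("A", 1), ("B", 2)]
print(run_stores(stores, snoop_cycles={0}, stall_on_invalid=False))  # {'B': 2}: A lost
print(run_stores(stores, snoop_cycles={0}, stall_on_invalid=True))   # {'A': 1, 'B': 2}
```

Holding the store in exec until lsu_valid_o is set corresponds to the option wallento raises at 16:23, and to stekern's remark that negating lsu_valid_o stalls the CPU.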
20:47 <stekern> wallento left again...
20:47 <stekern> I'll put something in the log to paste when he gets back
20:48 <stekern> I probably need to read the snoop code more in depth, because to me it's not clear that the invalidation works at all
20:49 <stekern> to me, the natural implementation would be as follows:
20:50 <stekern> 1) create a duplicate of the tag mem to do snoop lookups from
20:50 <stekern> 2) when a snoop lookup is a hit, the pipeline is stalled and the cache line is invalidated
20:51 <stekern> as far as I can tell, 1 is now implemented
20:57 <stekern> but to me it seems that 2 doesn't happen; instead the snoop invalidation just hijacks the write port of the tag mem and discards the current operation
20:57 <stekern> I don't understand how that's supposed to work
20:58 <stekern> because the current operation is most likely completely unrelated to the snoop hit
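stekern's two-step proposal can be sketched for a direct-mapped cache. A minimal sketch under stated assumptions: `SnoopingCache` and its methods are hypothetical, the cache is assumed direct-mapped with one tag per set, and "stall" is modelled only as a return value. Snoop lookups read only the duplicate tag memory, so they never disturb the CPU's tag accesses; only a hit touches the main tag memory, and that is the cycle in which the pipeline is stalled.

```python
class SnoopingCache:
    """Direct-mapped tag store with a duplicate copy used only for snoop lookups."""

    def __init__(self, num_sets, line_bytes):
        self.num_sets = num_sets
        self.line_bytes = line_bytes
        self.tagmem = [None] * num_sets        # main tag memory (None = invalid)
        self.snoop_tagmem = [None] * num_sets  # duplicate, read by snoops only

    def _split(self, addr):
        line = addr // self.line_bytes
        return line % self.num_sets, line // self.num_sets  # (index, tag)

    def refill(self, addr):
        idx, tag = self._split(addr)
        self.tagmem[idx] = tag        # refills update both copies
        self.snoop_tagmem[idx] = tag

    def cpu_hit(self, addr):
        idx, tag = self._split(addr)
        return self.tagmem[idx] == tag

    def snoop(self, addr):
        """Handle a snooped bus write; return True if the CPU must stall this cycle."""
        idx, tag = self._split(addr)
        if self.snoop_tagmem[idx] != tag:
            return False              # lookup in the duplicate (step 1) missed
        self.tagmem[idx] = None       # hit (step 2): invalidate the line in both copies
        self.snoop_tagmem[idx] = None
        return True                   # main tag memory was written, so stall


cache = SnoopingCache(num_sets=4, line_bytes=16)
cache.refill(0x40)
print(cache.snoop(0x80), cache.cpu_hit(0x40))  # False True: other line, no stall
print(cache.snoop(0x44), cache.cpu_hit(0x40))  # True False: same line, invalidated
```

The stall only happens on a snoop hit, so the cost scales with actual sharing rather than with bus traffic in general.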
21:14 <stekern> oh, and I think we could just use a true_dpram for the "duplicate" tag mem
21:20 <stekern> actually, I don't think we'd need to stall at all then
21:21 <stekern> just use the other port to invalidate the snooped cache line
21:21 <…> …'ll need some changes to the state machine, since the true_dpram doesn't have the bypass logic, though
21:42 <stekern> ah... no, that's of course not going to work
21:43 <stekern> we'll still need to stall on the write
--- Log closed Wed Jun 25 00:00:49 2014
