IRC logs for #openrisc Sunday, 2012-04-22

jonibo\|laptop	hi stekern	11:44
stekern	hello	11:44
jonibo\|laptop	figured it might be as easy to discuss here as by mail	11:44
jonibo\|laptop	the cache invalidate register, as far as I can tell, only works at startup today because it's a 1-way cache	11:45
stekern	nah, it works because it's not spec-compliant	11:45
jonibo\|laptop	how so?	11:46
stekern	the spec says that _only_ the address that is written to CBIR should be invalidated, but or1200 wipes any tag matching	11:47
jonibo\|laptop	ok, that's broken	11:47
jonibo\|laptop	but it's a 1-way cache	11:47
jonibo\|laptop	so it's the same thing	11:47
jonibo\|laptop	it should match on the tag, find the way it corresponds to, and invalidate that cache-way only	11:48
stekern	is it? I don't think so	11:48
jonibo\|laptop	but for a 1-way cache, _any_ tag will be the only way	11:48
jonibo\|laptop	or maybe not...??? let me think	11:49
jonibo\|laptop	actually, you're right... it's _really_ broken	11:50
stekern	I don't think 1-way or multi-way makes any difference in this case	11:50
jonibo\|laptop	no, it should not invalidate the line if the tag doesn't match	11:50
jonibo\|laptop	so, you're right	11:50
jonibo\|laptop	anyway, it should still be invalidated at reset, automatically	11:50
jonibo\|laptop	otherwise we need to loop over the entire physical memory space	11:50
stekern	but the problem is, to do it according to the spec, you'd have to loop through the whole memory to invalidate the whole cache	11:51
jonibo\|laptop	yeah, that's what you want to avoid doing	11:51
jonibo\|laptop	are you _for_ the invalidate at reset? i interpreted your mail as you being against it	11:52
jonibo\|laptop	automatic, I mean	11:52
stekern	so a "invalidate entire cache" command would be needed for that	11:52
jonibo\|laptop	nah, that's not needed... you don't want to bother software with this at all	11:52
jonibo\|laptop	it should just be done automatically at reset	11:53
jonibo\|laptop	who wants a cache with unknown state at startup anyway?	11:53
jonibo\|laptop	you want the whole thing invalid	11:53
stekern	I'm for it in theory, but it'll bloat the hardware	11:53
jonibo\|laptop	yeah, I guess so	11:54
jonibo\|laptop	today we use the fact that the or1200 is broken w.r.t. BIR to get a quick invalidate at startup	11:55
jonibo\|laptop	but it will be brutal if we need to do every tag individually	11:56
stekern	how do other architectures handle the dilemma?	11:56
jonibo\|laptop	not sure... i'm pretty sure most caches come up invalidated at reset, though	11:57
stekern	I know lm32 invalidates on startup, and it doesn't have a fine-grained invalidate flush. If you invalidate/flush, the whole cache goes	11:57
jonibo\|laptop	that's no good	11:58
stekern	no	11:58
jonibo\|laptop	for DMA you want to be able to flush just the line you've modified	11:58
stekern	but I wonder, is it so bad that you might get some collateral damage when you invalidate? (i.e., the way or1200 works)	11:59
jonibo\|laptop	not sure... i've got no numbers	11:59
jonibo\|laptop	but maybe	12:00
jonibo\|laptop	it's not a performance win in any case	12:00
jonibo\|laptop	for a multi-way cache, it's worse	12:00
jonibo\|laptop	anyway, i've got to run... i'm for hardware invalidate and spec-compliant BIR	12:03
jonibo\|laptop	i'll leave it at that	12:03
jonibo\|laptop	even if the hardware invalidate is implemented as an instruction loop in ROM that just iterates over all memory and invalidates the cache for it	12:04
jonibo\|laptop	(the entire memory space, that would be... :) )	12:04
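
The two BIR behaviours argued over above can be compared on a toy model. The following is only a sketch: the class `Cache`, the methods `bir_spec` and `bir_index_only`, and the geometry (16-byte lines, 512 sets, direct-mapped) are hypothetical names and values chosen for illustration, not taken from the or1200 sources or the architecture spec.

```python
# Toy direct-mapped cache: one (valid, tag) entry per set. Sizes are assumed.
LINE_BYTES = 16
NUM_SETS = 512

def split(addr):
    """Return (set index, tag) for a physical address."""
    block = addr // LINE_BYTES
    return block % NUM_SETS, block // NUM_SETS

class Cache:
    def __init__(self):
        self.valid = [False] * NUM_SETS
        self.tag = [0] * NUM_SETS

    def fill(self, addr):
        idx, tag = split(addr)
        self.valid[idx], self.tag[idx] = True, tag

    def hit(self, addr):
        idx, tag = split(addr)
        return self.valid[idx] and self.tag[idx] == tag

    def bir_spec(self, addr):
        """Invalidate only if the addressed block is actually cached."""
        idx, tag = split(addr)
        if self.valid[idx] and self.tag[idx] == tag:
            self.valid[idx] = False

    def bir_index_only(self, addr):
        """Invalidate whatever sits in the addressed set, ignoring the tag."""
        idx, _ = split(addr)
        self.valid[idx] = False

a = 0x1000
b = a + NUM_SETS * LINE_BYTES    # same set as a, different tag

c = Cache()
c.fill(b)
c.bir_spec(a)                    # tag mismatch, so this is a no-op
spec_keeps_b = c.hit(b)

c = Cache()
c.fill(b)
c.bir_index_only(a)              # collateral damage: b is thrown out
index_only_keeps_b = c.hit(b)

# With index-only behaviour, one write per set clears the whole cache.
c = Cache()
for addr in range(0, 4 * NUM_SETS * LINE_BYTES, LINE_BYTES):
    c.fill(addr)                 # later fills to a set overwrite earlier ones
for s in range(NUM_SETS):
    c.bir_index_only(s * LINE_BYTES)
all_clear = not any(c.valid)
```

In this model the tag-matching variant leaves `b` cached, while the index-only variant evicts it but lets software clear every set with `NUM_SETS` writes. That is the trade-off being discussed: a cheap startup loop against unintended invalidations at runtime.
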
juliusb_	so this whole thing is solved by my proposal here: http://opencores.org/or1k/Architecture_Specification#Cache_Block_Invalidate_Behaviour_Clarification	12:57
juliusb_	basically we should just observe the set number of the address written to the BIR	12:57
juliusb_	and then invalidate that block	12:57
juliusb_	all done, simple, compatible with what we have now, and makes it simple for software to loop through and invalidate each set	12:58
juliusb_	regarding multiway - I've been asking this question for over a year now ;)	12:58
juliusb_	i say just invalidate all ways, simplest thing	12:58
juliusb_	as there's no way-specific block invalidate reg	12:58
juliusb_	but.... the way you guys just discussed is more sensible - only invalidate if the address is in cache	12:59
stekern	yes, and jonibo has a point, you'll have the performance penalty	13:00
juliusb_	but my way is less of a burden on hardware implementation (i'm not a fan of putting in heaps of logic just for single use at reset)	13:00
juliusb_	performance penalty?	13:00
juliusb_	regarding all of these changes to OR1K - I'm more inclined to go with something which, despite maybe not being 100% the best approach, gives us maximum backward compatibility with minimum amount of work to adapt existing software and models to the new spec	13:02
stekern	yes, since you've got the collateral damage of addresses you did not mean to invalidate	13:02
juliusb_	so, in this case, defining the behaviour of the cache BIR means we don't have to change anything in OR1200 or software	13:02
juliusb_	(but we're clarifying the behaviour for future developers and users)	13:03
stekern	I wonder how much logic is actually needed to do the invalidate on reset, should only be a counter basically (and some control logic for the state in the fsm)	13:03
stekern	is it only during reset that you'd want to invalidate the whole cache?	13:05
juliusb_	You'd also want to have it capable of being run by poking SPR bits too	13:13
juliusb_	but really, I'm against putting in this sort of stuff - I say, for the simplicity of the implementation, we should leave this to software	13:14
juliusb_	we're going to need it for all the memories, and that'll add up	13:14
juliusb_	overall transistor count, though, to do the invalidation - it's something I'd like to know which is smaller to do - the 8 instructions it takes in software to do it (8*32-bits = 256 FFs, essentially) or the hardware (probably a counter as wide as the number of addresses we need to clear and some muxing)	13:16
juliusb_	clearly the reset-by-FSM thing will be more power efficient and be quicker, but i'm still not sold on moving chunks of on-shot initialisation stuff into HW	13:17
juliusb_	one-shot	13:17
stekern	I agree on that, on many FPGA targets you could parameterize it away though	13:21
stekern	(if it is really only needed on reset)	13:22
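
To put rough numbers on the reset options mentioned above, here is a minimal sketch assuming a cache with 512 sets of 16-byte lines and a 32-bit physical address space. These figures and the function `reset_invalidate` are illustrative assumptions, not or1200 parameters. It models the "counter plus a bit of control logic" idea and counts the BIR writes a software loop would need under each invalidate behaviour.

```python
# Assumed geometry, for illustration only (not taken from the or1200 sources).
LINE_BYTES = 16
NUM_SETS = 512
ADDR_BITS = 32

def reset_invalidate(valid):
    """Model of a reset FSM: a counter visits one set per cycle and clears it."""
    cycles = 0
    counter = 0
    while counter < len(valid):
        valid[counter] = False
        counter += 1
        cycles += 1
    return cycles

valid = [True] * NUM_SETS        # unknown power-up state, modelled as all-valid
cycles = reset_invalidate(valid)

counter_bits = (NUM_SETS - 1).bit_length()      # width of the hardware counter
per_set_writes = NUM_SETS                       # software loop, index-only BIR
per_block_writes = 2**ADDR_BITS // LINE_BYTES   # software loop, tag-matching BIR

print(cycles, counter_bits, per_set_writes, per_block_writes)
```

Under these assumptions the hardware walk takes 512 cycles with a 9-bit counter, an index-only software loop takes 512 BIR writes, and a loop against a strictly tag-matching BIR that knows nothing about the cache contents needs 2^28 writes to cover the whole address space. A system with less physical memory would need proportionally fewer.
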
jonibo\|laptop	juliusb_: I don't care much for your BIR clarification... I can accept that the or1200 is broken and does it that way, but let's not generalize that error	15:26
jonibo\|laptop	the reset case is special, let's ignore that for now	15:26
jonibo\|laptop	but at runtime you want a sane invalidation behaviour	15:26
jonibo\|laptop	if the line's not in cache, it's a no-op	15:26
jonibo\|laptop	and for multi-way, it's inelegant to trample over cache lines that may be in use by other processes	15:27
jonibo\|laptop	it's a conundrum, I know, and the or1200 implementation is fine as long as it's documented... but for the next generation we might be able to come up with something more elegant... just not sure what that should be at this point	15:27
juliusb_	well, i say my described behaviour should be fine - it's more of a hardware-centric view, I'll accept that (basically use that BIR as a line invalidate interface)	16:14
jonibo\|laptop	it's fine for the reset case	16:15
jonibo\|laptop	it's less nice for regular operation	16:15
juliusb_	so remove any idea of it being "intelligent" i guess	16:15
jonibo\|laptop	like I said, the or1200 does it that way... that's an implementation detail	16:16
jonibo\|laptop	that's fine	16:16
jonibo\|laptop	i don't like the idea of generalizing it, though	16:16
juliusb_	yes, but it was done that way for a reason	16:16
jonibo\|laptop	i understand that... it's less than optimal	16:17
juliusb_	and that reason is to avoid reset logic, and probably to get around a sloppily defined cache system	16:17
juliusb_	or rather, work with a sloppily defined cache system	16:17
jonibo\|laptop	it's not that sloppily defined...	16:17
jonibo\|laptop	in fact, it's pretty well-defined in the spec	16:18
jonibo\|laptop	the only problem is the reset case	16:18
stekern	i was just about to say that	16:18
jonibo\|laptop	as it stands now SW is required to do a _long_ loop to invalidate... that's fine per se	16:18
jonibo\|laptop	it just makes for a long startup time, but it's a one-time cost	16:18
jonibo\|laptop	and the or1200 solves that for the time being with a "less than optimal" solution... but it works	16:19
jonibo\|laptop	but I don't care to see that encoded in the spec	16:19
jonibo\|laptop	because I hope that someday somebody will come along and implement this properly... and then the spec shouldn't stand in their way	16:19
juliusb_	hmm, no, there's issues with what happens when the EA written into BIR isn't in the cache (it's not clear in the spec)	16:19
jonibo\|laptop	what? it's a no-op	16:20
juliusb_	spec says that EA is "EA that targets byte inside cache block"	16:20
jonibo\|laptop	isn't that obvious	16:20
juliusb_	no it isn't	16:20
juliusb_	because that just says targets byte inside cache block	16:20
jonibo\|laptop	ok... it seems obvious to me	16:20
juliusb_	that says nothing about matching EA to the tag address and invalidating only in that case	16:20
juliusb_	that definition says to me it does an address mapping of the EA to the appropriate bytes in the cache	16:20
jonibo\|laptop	yeah, but think about it... what's the point of an invalidate? either the line is in cache and you want a fetch next time it's accessed, or it's not in cache in which case you get that anyway	16:21
juliusb_	yes, true, but you still might have a case where you want to entirely clear the cache for a context switch or something	16:21
juliusb_	which is basically the reset case	16:21
jonibo\|laptop	no, never...	16:22
jonibo\|laptop	the cache is physically tagged	16:22
jonibo\|laptop	you never clear the cache	16:22
jonibo\|laptop	the MMU makes sure that processes can't access others' cached data	16:22
juliusb_	OK	16:22
juliusb_	yes of course, sounds good	16:22
juliusb_	so, as always, basically I'm arguing for the thing which is the simplest to implement in HW :)	16:23
jonibo\|laptop	i know exactly where you're coming from though... I had this conversation with myself last year!	16:23
juliusb_	current system is	16:23
jonibo\|laptop	yeah, and I'm arguing for a "correct" spec and "cutting corners in implementations is fine as long as you respect the spec"	16:23
jonibo\|laptop	...which is what we have with the or1200	16:23
jonibo\|laptop	almost	16:23
juliusb_	so i'm arguing to adapt the spec to do what OR1200 does now. To do it the way it should be done would require 1) some reset logic or some clear-all-cache-block-tags feature, and 2) something to read and compare the block tag when BIR is written to, to determine if it should be done or not	16:23
jonibo\|laptop	actually, not "almost"... it does respect the spec	16:24
jonibo\|laptop	yeah, if it were optimal, you'd have that	16:24
juliusb_	and then there's the issue of multi-way	16:24
jonibo\|laptop	but the or1200 cuts corners on 2) and invalidates everytime... that's fine, it's just less than optimal	16:24
juliusb_	which I've, again, gone with the simplest, quickest, dirtiest way of handling it	16:24
juliusb_	:)	16:24
jonibo\|laptop	and your case 1) is a sw problem	16:25
jonibo\|laptop	we don't even have multi-way	16:25
jonibo\|laptop	...in implementation, I mean	16:25
juliusb_	I think stekern has it working somewhere	16:25
jonibo\|laptop	ok	16:25
juliusb_	i'm quite close to being able to publish my new CPU - got word that a release has been drafted and i've just got to go through the process of showing it to people that can OK it	16:26
juliusb_	so i'd hope within a week or two	16:26
juliusb_	:)	16:26
jonibo\|laptop	yay!	16:26
juliusb_	... but that's an aside, but I think stekern was playing with multi-way cache in that	16:26
juliusb_	i really need to run	16:26
stekern	yes, that's correct, 2-way is (optionally) available there	16:27
juliusb_	:)	16:27
jonibo\|laptop	ok... hope you see my point, though	16:27
jonibo\|laptop	it's an issue for the spec	16:27
juliusb_	but, quick and dirty multi-way invalidate works well, too, and I imagine would be very simple to implement	16:27
jonibo\|laptop	it's _not_ an issue for the spec :)	16:28
jonibo\|laptop	it's for the implementation documentation	16:28
juliusb_	well, I want the implementations and the spec to be in harmony	16:28
jonibo\|laptop	no...	16:28
juliusb_	so we have to change one or the other	16:28
jonibo\|laptop	no, the implementation is just sub-optimal... but it's still correct	16:29
jonibo\|laptop	i don't see an issue here	16:29
juliusb_	no but it's not clear from spec how you clear it at reset	16:29
stekern	What does the "Missing cache block in the local processor does not cause any action" mean?	16:29
juliusb_	or it's not clear what happens for multi-way	16:29
jonibo\|laptop	on the or1200 we can cheat because the BIR isn't so clever... on another implementation we can't cheat	16:30
juliusb_	stekern: mmm, yes, perhaps that's the sentence saying it should not do anything if the line isn't there	16:30
jonibo\|laptop	stekern: it means "no-op"	16:30
juliusb_	mmm ok	16:30
juliusb_	I'm wrong!	16:30
juliusb_	:)	16:30
juliusb_	in that case the OR1200 is wrong	16:30
jonibo\|laptop	not "wrong", "just-enlightened" :)	16:30
juliusb_	because it doesn't do nothing, it does invalidate the block	16:30
juliusb_	so in this case, one or the other must change	16:31
juliusb_	and im really late!!	16:31
juliusb_	bbl	16:31
jonibo\|laptop	ok, but I don't agree it has to change	16:31
juliusb_	(cache invalidate discussions are surprisingly exciting)	16:31
jonibo\|laptop	it's a nice "cheat"	16:31
jonibo\|laptop	:)	16:31
stekern	well, tbh, my cache implementation cheats too ;)	16:31
jonibo\|laptop	it just causes a performance degradation	16:31
jonibo\|laptop	cheats are fine as long as they are fundamentally correct...	16:31
jonibo\|laptop	poor performance is another issue altogether	16:32
jonibo\|laptop	stekern: how does your implementation cheat?	16:32
stekern	it does the invalidation the or1200 way	16:32
jonibo\|laptop	right... which is a performance hit, but nothing else	16:33
stekern	yes, it's not gonna break software that expects correct behaviour	16:33
jonibo\|laptop	how?	16:33
jonibo\|laptop	I don't see why that would _break_ anything	16:33
stekern	that's why it's an "OK" cheat	16:33
jonibo\|laptop	oh sorry, misread you	16:33
stekern	:)	16:33
jonibo\|laptop	yeah, I agree	16:33
jonibo\|laptop	exactly, it's an implementation detail... those are fine... these get documented in the release notes and then you're done with it	16:34
jonibo\|laptop	but let's not update the arch spec to conform to the implementation just because somebody decided to cut that particular corner	16:35
jonibo\|laptop	as for the cache invalidation at reset... I'd say we just defer that discussion until we have an implementation that actually has a BIR that considers the address tag in question	16:37
jonibo\|laptop	...it's moot until then	16:37
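
The multi-way question raised in the discussion (invalidate only the way whose tag matches, or simply every way in the addressed set) can be illustrated the same way. This is a sketch under assumed parameters: a 2-way cache with 256 sets and 16-byte lines, with hypothetical helper names throughout. It is not a description of any existing implementation.

```python
# Toy set-associative cache; geometry is assumed for illustration.
LINE_BYTES = 16
NUM_SETS = 256
NUM_WAYS = 2

def split(addr):
    """Return (set index, tag) for a physical address."""
    block = addr // LINE_BYTES
    return block % NUM_SETS, block // NUM_SETS

def new_cache():
    # cache[set][way] holds the cached tag, or None when the way is invalid
    return [[None] * NUM_WAYS for _ in range(NUM_SETS)]

def fill(cache, addr, way):
    idx, tag = split(addr)
    cache[idx][way] = tag

def hit(cache, addr):
    idx, tag = split(addr)
    return tag in cache[idx]

def invalidate_matching_way(cache, addr):
    """Tag-aware invalidate: a miss is a no-op, other ways are untouched."""
    idx, tag = split(addr)
    for way in range(NUM_WAYS):
        if cache[idx][way] == tag:
            cache[idx][way] = None

def invalidate_all_ways(cache, addr):
    """Quick-and-dirty invalidate: drop every way of the addressed set."""
    idx, _ = split(addr)
    cache[idx] = [None] * NUM_WAYS

a = 0x2000
b = a + NUM_SETS * LINE_BYTES    # same set as a, different tag

precise = new_cache()
fill(precise, a, 0)
fill(precise, b, 1)
invalidate_matching_way(precise, a)
precise_result = (hit(precise, a), hit(precise, b))

blunt = new_cache()
fill(blunt, a, 0)
fill(blunt, b, 1)
invalidate_all_ways(blunt, a)
blunt_result = (hit(blunt, a), hit(blunt, b))
```

Here the tag-aware version removes only `a`, while the all-ways version also discards `b`. Both leave memory contents correct from software's point of view; the blunt version just costs extra refills, which is the performance argument made in the log.
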
-!- Netsplit .net <-> .split quits: jonibo		23:54