IRC logs for #openrisc Monday, 2012-11-05

@juliusb	cool, made mor1kx-prontoespresso CPU go to sleep when you have a l.j 0 instruction	01:38
@juliusb	and wakes up on tick and PIC interrupts	01:38
@stekern	juliusb: (sleep) is that just for testing or how do you intend to "continue" after the sleep?	05:22
@stekern	juliusb: configurable burst length sounds good	06:30
@stekern	and I agree that it probably will help shake out bugs running the tests with different settings there	06:30
@stekern	regarding figuring out several months later what you've done being a good form of masochism: I'm usually inclined to do cleanup when I try to understand what I've done	06:34
@stekern	(finding some cruft in the icache atm)	06:34
@juliusb	stekern: sleep is what I'm calling the fetch stage shutting down and not fetching anymore	11:40
@juliusb	an interrupt causes fetching to begin from either the PIC or tick vector	11:40
@juliusb	it's a very small patch to put in that functionality, actually	11:43
@juliusb	https://github.com/juliusbaxter/mor1kx/commit/6ef4dd4722df1a7d5d4d306f9de47bed462abbfd	11:43
_franck_	it becomes a big patch when you commit white space / tab changes ;)	11:48
@juliusb	ya :/	11:48
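The sleep behaviour described above can be sketched as a small behavioural model. This is only an illustration, not the logic from the linked commit: the class and function names are hypothetical, the vector addresses are the standard OpenRISC 1000 tick timer (0x500) and external interrupt (0x800) vectors, and giving the tick interrupt priority over the PIC when both arrive together is an assumption.

```python
# Hypothetical behavioural model of the fetch-stage "sleep"; not taken from the mor1kx RTL.
TICK_VECTOR = 0x500  # OpenRISC 1000 tick timer exception vector
PIC_VECTOR = 0x800   # OpenRISC 1000 external interrupt vector


def is_jump_to_self(insn):
    """True for `l.j 0`: opcode 0x00 in bits 31:26 and a zero 26-bit offset."""
    return (insn & 0xFFFFFFFF) == 0


class FetchSleepModel:
    def __init__(self, reset_pc=0x100):
        self.pc = reset_pc
        self.sleeping = False

    def step(self, insn=None, tick_irq=False, pic_irq=False):
        """Return the next fetch address, or None while fetching is shut down."""
        if self.sleeping:
            if tick_irq:      # assumption: tick wins if both are pending
                self.pc = TICK_VECTOR
            elif pic_irq:
                self.pc = PIC_VECTOR
            else:
                return None   # still asleep, nothing is fetched
            self.sleeping = False
            return self.pc
        if insn is not None and is_jump_to_self(insn):
            self.sleeping = True  # stop fetching until an interrupt arrives
            return None
        self.pc += 4  # interrupts while awake are not modelled here
        return self.pc
```

For example, feeding the model an `l.nop` and then an `l.j 0` puts it to sleep, and a later call with `pic_irq=True` resumes fetching at 0x800.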
@juliusb	so in my rewrite of the fetch unit for pronto, I have started to use an easier-to-read coding style of big if-else statements	11:48
@juliusb	rather than single lines with a bunch of mux statements in them	11:48
@juliusb	much easier to follow and debug	11:49
_franck_	much better I think	11:49
@juliusb	like this crap: https://github.com/juliusbaxter/mor1kx/blob/8826f37b0c1391e8748a5a6a53d49a3a72676c84/rtl/verilog/mor1kx_fetch_prontoespresso.v#L137	11:50
@juliusb	no more of that	11:50
@juliusb	it's just too annoying to debug	11:50
@juliusb	haha	11:50
_franck_	your link doesn't work	11:51
_franck_	https://github.com/juliusbaxter/mor1kx/blob/8826f37b0c1391e8748a5a6a53d49a3a72676c84/rtl/verilog/mor1kx_fetch_pro jonmaste~	11:51
_franck_	oups	11:52
_franck_	yes it works	11:52
_franck_	yeah I really dislike this kind of thing. The kind of thing one can find in or1200 RTL	11:53
@stekern	yeah, I tried to stay away from those when I redid the cappuccino fetch stage ;)	11:53
@stekern	this is the worst I got in there: https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_fetch_cappuccino.v#L128	11:53
@stekern	but it's hard in the fetcher, you find some corner case to some corner case and they easily build up	11:54
_franck_	once you've found them all, you rewrite it ;)	11:54
@juliusb	_franck_: Indeed :)	11:55
@juliusb	that's a tame one stekern	11:55
@stekern	_franck_: that's exactly what happened ;)	11:55
@stekern	I think I rewrote that fetcher 3 or 4 times	11:56
@juliusb	there's some nasty ones in the ctrl stages too	11:56
@juliusb	haha	11:56
@juliusb	I synthesised with my new fetch stage, the critical path isn't into or out of the fetch unit, so I'm happy :)	11:56
@stekern	was it before?	11:58
@stekern	haha, I just saw that signalname: awkward_transition_to_branch_target ;)	12:08
@juliusb	no, but considering I was doing a bunch of decoding of the incoming instruction for control of the next bus access cycle, I worried it might be	12:49
@juliusb	yes, it was awkward! annoying branch right at the end of a burst	12:49
@juliusb	in hindsight I made the fetch unit far too complicated	12:49
@juliusb	for pronto, probably espresso too	12:49
@juliusb	i made it buffer instructions wherever it could, which is just annoying, because you have to cover all the corner cases	12:50
@juliusb	it just adds a lot of complexity for not much gain IMO, and if you're tightly coupled with the memory, then it's no biggie	12:50
@juliusb	I honestly reckon that new fetch stage would go well with the cache for the prontoespresso	12:51
@juliusb	in fact you could do 0 latency branching I think	12:51
@stekern	cache for pronto?	12:55
jemarch	are you guys going to FSCONS next weekend?	12:59
@stekern	juliusb: how does your fetch stage work now?	13:02
@stekern	the problem with the cache is that you always have the one cycle latency before you know if it was a hit or miss	13:03
@juliusb	ah right	13:03
@juliusb	jemarch: yes olofk and I will be at FSCONS	13:03
@stekern	that was the whole reason for the closely coupled fetcher and cache	13:04
jemarch	juliusb: great. As it is now traditional a bunch of us GNU people will be there :)	13:04
@juliusb	stekern: it detects branch instructions as they come in, so deasserts the request out line immediately, ensuring as short a bus turnaround time as possible	13:05
@juliusb	jemarch: currently we're the only talk listed for the embedded track: https://fscons.org/2012/schedule/track/embeddded-systems/	13:05
@stekern	deasserts right after the ack on the fetched branch?	13:06
jemarch	juliusb: will be there.	13:06
@stekern	so you have a small decoder in your fetch stage? ;)	13:07
@stekern	I guess that makes sense for (pronto)espresso	13:07
@juliusb	stekern: yep	13:08
jemarch	hmm, now that I see the schedule it is quite disappointing	13:08
jemarch	https://fscons.org/2012/schedule/	13:08
@juliusb	jemarch: we can make our own fun :)	13:09
jemarch	sure :)	13:09
@juliusb	I think they're still getting it together	13:10
@juliusb	at least, so I'm led to believe	13:10
jemarch	I just developed an algorithm for a _very_ efficient way to model a MMU TLB	13:10
jemarch	would be glad to share it :)	13:11
@juliusb	stekern: https://github.com/juliusbaxter/mor1kx/blob/master/rtl/verilog/mor1kx_fetch_prontoespresso.v#L161	13:11
@juliusb	jemarch: sounds good, you should explain that to me over a beer at some point :)	13:11
jemarch	are you putting openrisc in espresso machines?	13:12
@juliusb	stekern: we also can determine if flag-based-branches will occur, and the target for everything except l.jr and l.jalr (this is actually duplicating some logic I believe, possibly)	13:12
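A rough sketch of that kind of fetch-stage pre-decode follows, assuming the standard OpenRISC 1000 encodings (opcode in bits 31:26, a 26-bit word offset for l.j, l.jal, l.bnf and l.bf). The function names are hypothetical and this is not the prontoespresso logic; it also assumes the flag value passed in is already settled when the branch is seen.

```python
# Hypothetical pre-decode sketch; opcodes are the OpenRISC 1000 major opcodes.
OPC_J, OPC_JAL, OPC_BNF, OPC_BF = 0x00, 0x01, 0x03, 0x04
OPC_JR, OPC_JALR = 0x11, 0x12


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def predecode_branch(insn, pc, flag):
    """Return (taken, target); target is None when it cannot be computed here."""
    opcode = (insn >> 26) & 0x3F
    if opcode in (OPC_JR, OPC_JALR):
        # Register-indirect jumps: the target needs a register file read.
        return True, None
    if opcode in (OPC_J, OPC_JAL, OPC_BNF, OPC_BF):
        offset = sign_extend(insn & 0x03FFFFFF, 26) << 2  # word offset to bytes
        target = (pc + offset) & 0xFFFFFFFF
        if opcode == OPC_BF:
            taken = bool(flag)
        elif opcode == OPC_BNF:
            taken = not flag
        else:
            taken = True
        return taken, target
    return False, None  # not a branch or jump
```

With this, `predecode_branch(0x13FFFFFF, 0x2000, True)` (an l.bf with offset -1) reports a taken branch to 0x1FFC, while an l.jr yields a taken jump with no target available in the fetch stage.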
@juliusb	jemarch: no, this is the name of the pipelines	13:13
@juliusb	they've been named after coffees	13:13
jemarch	ah, funny :)	13:13
@juliusb	we have 3 pipelines at present, cappuccino, espresso and the pronto espresso, which is the same length as the espresso but without a delay (slot)	13:13
@juliusb	:)	13:13
@juliusb	stekern: anyway, so far this fetch unit passes all the tests and executes them in less time, maybe 10% in some cases	13:14
jemarch	is the delay slot optional in openrisc?	13:14
@juliusb	jemarch: it is now	13:14
jemarch	heh, I see	13:14
@juliusb	we've agreed on some architectural modifications which allow this	13:14
jemarch	that will have a big impact in the toolchain, simulator, etc	13:15
@juliusb	sure, but the work has already been done	13:16
@juliusb	we have a switch, -mno-delay, to emit code without delay slots	13:16
jemarch	yep	13:16
jemarch	I was thinking on optimizations in the simulator	13:16
@juliusb	simulator, RTL models, etc, have all been updated, kernel port not, but it's pretty trivial to modify the assembly code	13:16
@juliusb	well, the big thing for my mind is the RTL gets simpler	13:17
jemarch	sure	13:17
jemarch	you are using Verilog, aren't you?	13:17
@juliusb	and doing things like I've done, optimising the fetch unit for the small pipeline CPU is easier	13:17
@juliusb	yep	13:17
@juliusb	I'm no fan of VHDL	13:17
jemarch	oh	13:17
@juliusb	plus, there are no open source simulators	13:17
jemarch	I am :D	13:17
@juliusb	ghdl maybe	13:18
jemarch	GNU vhdl	13:18
@juliusb	does it work?	13:18
jemarch	yes	13:18
jemarch	it can run most of the grlib models	13:18
@juliusb	is it actively developed?	13:18
jemarch	more or less	13:18
jemarch	it is pretty stable	13:18
@juliusb	oh nice	13:18
@juliusb	still, I don't like VHDL, it does too much	13:18
@juliusb	all those types, uck	13:19
jemarch	I will use it the following months for TLM<->RTL cosimulation	13:19
@juliusb	great	13:19
@juliusb	but still, if you code in Verilog you can make use of Verilator	13:19
jemarch	never worked with SystemC before though	13:19
jemarch	verilator is simply great :)	13:19
@juliusb	I agree	13:20
@juliusb	makes extremely fast models	13:20
@juliusb	really helps with verification	13:20
jemarch	so, many beer-topics for FSCONS	13:21
jemarch	:)	13:21
@juliusb	jemarch: indeed	13:21
@juliusb	looking forward to it	13:21
@juliusb	we can also debate the license I've put out since last year	13:21
jemarch	which is not GPL I guess	13:21
@juliusb	http://juliusbaxter.net/ohdl/	13:21
@juliusb	nope	13:21
@juliusb	but you can relicense things under the OHDL to GPL if you desire	13:21
jemarch	looks interesting	13:22
@juliusb	that is permitted, like the Mozilla public license, on which it's based	13:22
jemarch	will read it later	13:22
@juliusb	nps	13:22
@juliusb	stekern: so the idea is to turn around the accesses on the bus as quick as possible by 1) deasserting the req as soon as we can, which we now achieve 2) putting out the new address as soon as we can, which we can do for all but l.j[al]r insns	13:23
@juliusb	it's kind of annoying actually, to have the 1 cycle turnaround for Wishbone, although I'm not certain if it's needed or not, I've done it just to be safe (by this I mean deassert cyc/stb for 1 cycle between two accesses)	13:24
@juliusb	you kind-of need it to terminate burst accesses	13:24
@juliusb	although, now I think of it, the cache would work better if you put out the addresses and then expected the result a cycle later	13:25
@stekern	well, you don't need to, but if you don't you'll get that annoying loop between cyc/stb and ack	13:25
@stekern	that's how the cache works now	13:25
@juliusb	rather than, as on Wishbone, where burst accesses give you the data on the same cycle as the address; AMBA doesn't though, I think, which is why designing for AMBA would be a bit easier to integrate with a cache	13:26
@stekern	the address is put out from the address stage 1 cycle before the fetch stage expects it	13:26
@juliusb	yep	13:26
@juliusb	for the cache, right?	13:26
@stekern	exactly	13:26
@juliusb	yep, so AMBA does this, the address cycle and the data cycle	13:26
@juliusb	it's a bit of a failing of Wishbone now that I've written for both, but oh well, for small enough doesn't it doesn't matter too much	13:27
@juliusb	s/small enough/small enough designs/	13:27
@juliusb	urgh, you get what I mean	13:27
@juliusb	hehe	13:27
@stekern	I'll get there when I've stared at the sentence long enough ;)	13:28
@juliusb	anyway, it is a bit bad to spread the decode logic like I've done, I think it's a bit of duplication, but hopefully it's worth it	13:28
@juliusb	the performance improvement I've seen is about 10% so far, although that's also including the linear bursting, which improves performance too	13:29
@juliusb	for long sequences of single-cycle instructions without branches you just stream them in	13:29
@juliusb	very nice	13:29
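To make the turnaround cost concrete, here is an idealised cycle count, assuming a zero-wait-state memory that acks one word per active cycle and a fixed number of idle cycles between accesses. The function name and parameters are hypothetical, and the model says nothing about the roughly 10% figure measured above; it only shows why longer linear bursts reduce the number of turnarounds.

```python
def fetch_cycles(n_insns, burst_len=1, turnaround=1):
    """Bus cycles to fetch n_insns sequential words.

    Assumes one ack per active cycle and `turnaround` idle cycles
    (cyc/stb deasserted) between consecutive accesses.
    """
    if n_insns <= 0:
        return 0
    accesses = -(-n_insns // burst_len)  # ceiling division
    return n_insns + (accesses - 1) * turnaround


single = fetch_cycles(16, burst_len=1)   # one word per access
burst4 = fetch_cycles(16, burst_len=4)   # four-word linear bursts
stream = fetch_cycles(16, burst_len=16)  # one long burst, no turnaround
```

Under these assumptions, 16 instructions take 31 cycles as single accesses, 19 cycles with four-word bursts, and 16 cycles when streamed in one burst.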
@juliusb	I also want to do stuff like posted writes in the LSU	13:29
@stekern	yes	13:30
@juliusb	anyway, it's a cool little core with optimisations like these	13:30
@stekern	I'm working on the icache 'bypasser' atm	13:30
@juliusb	and, I'm getting closer to playing with ideas like superscalar, you know, LSU is in use for a store and the following instruction is something which just uses the pipeline, so why can't we do both	13:31
@juliusb	actually, that's not superscalar	13:31
@juliusb	but, you know, trying to up throughput from 1 insn at a time where possible	13:31
@juliusb	it would be easy I reckon, to take a lot of pairs and execute them simultaneously	13:32
@juliusb	store followed by an arithmetic op or a branch or something is a good one	13:34
@stekern	then I want to look at splitting up mem and wb stage	13:34
@juliusb	yes?	13:34
@juliusb	out of order stuff could be possible to do if you wanted to do a known-single-cycle op while you're doing your load	13:35
@stekern	the pipeline forwarding probably needs a look over as well, wouldn't it be better to take the result directly from the stage instead of saving it in the rf	13:38
@stekern	?	13:38
@juliusb	yeah you'd probably need to do that. In single cycle, you're safe to assume that whatever you just did will be back in the RF, but if you're doing anything tricky, yes, perhaps you can't assume that	13:39
@juliusb	and so would be good to store intermediate results from LSU/ALU in those stages	13:39
@juliusb	now consider debug :(	13:40
@juliusb	like, software or hardware breakpoints	13:40
@juliusb	you'd have to have 2 running modes, one like, non-debug mode, one debug mode	13:40
@juliusb	where you kind-of revert back to single-issue	13:40
@juliusb	I would argue debugging tricky-mode issues is hardware debugging, and you probably wouldn't try at all to make it debuggable by software debugging	13:41
@stekern	tricky-mode issues?	13:42
@juliusb	like, out of order, ALU-while-LSU etc.	13:42
@juliusb	any tricky pipeline handling to try and speed things up	13:42
@stekern	ah, yeah	14:02
LoneTech	hello	14:37
_franck_	hi	14:43
@juliusb	LoneTech: hi Yann	15:31
@stekern	yes, icache bypasser is starting to pass tests!	17:49
@juliusb	:)	18:35
@stekern	still failing some...	18:49
@stekern	it's not completely trivial, I could of course just make it incredibly slow and simple, but I'd want it to be able to make use of the bursting of the wb_if	18:49
@stekern	yay, all tests pass	19:15
@olofk	juliusb: Get off your lazy ass and apply the or1ktrace patch I sent you ;)	21:22
@juliusb	sir yes sir!	23:03
@juliusb	sorry, it's Guy Fawkes night here	23:28
@juliusb	been out watching fireworks	23:28
@juliusb	and now everyone is letting them off around the neighbourhood	23:28
@juliusb	olofk: you there?	23:35
@juliusb	I've added --enable-shared to the first build stage here: http://opencores.org/or1k/OpenRISC_GNU_tool_chain#Installation_of_development_versions	23:40
@juliusb	and it didn't work first go :(	23:40
@juliusb	just deleted my half-built version and am giving it another go	23:42
