@juliusb | cool, made the mor1kx-prontoespresso CPU go to sleep when it hits an l.j 0 instruction | 01:38 |
---|---|---|
@juliusb | and wakes up on tick and PIC interrupts | 01:38 |
@stekern | juliusb: (sleep) is that just for testing or how do you intend to "continue" after the sleep? | 05:22 |
@stekern | juliusb: configurable burst length sounds good | 06:30 |
@stekern | and I agree that it probably will help shake out bugs running the tests with different settings there | 06:30 |
@stekern | regarding figuring out what you've done several months later: that can be a good form of masochism. I'm usually inclined to do cleanup when I try to understand what I've done | 06:34 |
@stekern | (finding some cruft in the icache atm) | 06:34 |
@juliusb | stekern: sleep is what I'm calling the fetch stage shutting down and not fetching anymore | 11:40 |
@juliusb | an interrupt causes fetching to begin from either the PIC or tick vector | 11:40 |
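A minimal behavioural sketch of the sleep/wake behaviour described above, in Python rather than Verilog so it can be run directly. It assumes `l.j 0` encodes as the all-zero word (major opcode 0x00, offset 0, i.e. a jump to itself), that there is no delay slot (as on pronto espresso), and that the tick-timer and external-interrupt (PIC) exceptions use the standard OpenRISC vectors 0x500 and 0x800. It ignores the SR interrupt-enable bits, and `FetchSleepModel` is a hypothetical name, not mor1kx code.

```python
L_J_SELF = 0x00000000  # l.j 0: major opcode 0x00, offset 0 -> jumps to itself
TICK_VECTOR = 0x500    # OpenRISC tick timer exception vector
PIC_VECTOR = 0x800     # OpenRISC external interrupt exception vector


class FetchSleepModel:
    """Hypothetical cycle-level model of a fetch stage that sleeps on l.j 0."""

    def __init__(self, reset_pc=0x100):
        self.pc = reset_pc
        self.sleeping = False

    def step(self, insn, tick_irq=False, pic_irq=False):
        """Advance one cycle; `insn` is the word just fetched from self.pc.

        Returns the next fetch address, or None while the stage sleeps.
        """
        if tick_irq or pic_irq:
            # An interrupt wakes the stage and fetching restarts at a vector.
            # This sketch arbitrarily gives the tick timer priority.
            self.sleeping = False
            self.pc = TICK_VECTOR if tick_irq else PIC_VECTOR
            return self.pc
        if self.sleeping or insn == L_J_SELF:
            # A jump-to-self can never make progress, so stop requesting.
            self.sleeping = True
            return None
        # Sequential fetch (no delay slot, as on pronto espresso).
        self.pc = (self.pc + 4) & 0xFFFFFFFF
        return self.pc
```

Once asleep, the model issues no further requests until an interrupt arrives, which matches "sleep" meaning the fetch stage simply stops fetching.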
@juliusb | it's a very small patch to put in that functionality, actually | 11:43 |
@juliusb | https://github.com/juliusbaxter/mor1kx/commit/6ef4dd4722df1a7d5d4d306f9de47bed462abbfd | 11:43 |
_franck_ | it becomes a big patch when you commit white space / tab changes ;) | 11:48 |
@juliusb | ya :/ | 11:48 |
@juliusb | so in my rewrite of the fetch unit for pronto, I've started using an easier-to-read coding style of big if-else statements | 11:48 |
@juliusb | rather than single lines with a bunch of mux statements in them | 11:48 |
@juliusb | much easier to follow and debug | 11:49 |
_franck_ | much better I think | 11:49 |
@juliusb | like this crap: https://github.com/juliusbaxter/mor1kx/blob/8826f37b0c1391e8748a5a6a53d49a3a72676c84/rtl/verilog/mor1kx_fetch_prontoespresso.v#L137 | 11:50 |
@juliusb | no more of that | 11:50 |
@juliusb | it's just too annoying to debug | 11:50 |
@juliusb | haha | 11:50 |
_franck_ | your link doesn't work | 11:51 |
_franck_ | https://github.com/juliusbaxter/mor1kx/blob/8826f37b0c1391e8748a5a6a53d49a3a72676c84/rtl/verilog/mor1kx_fetch_pro jonmaste~ | 11:51 |
_franck_ | oops | 11:52 |
_franck_ | yes it works | 11:52 |
_franck_ | yeah, I really dislike this kind of thing. The kind of thing one can find in the or1200 RTL | 11:53 |
@stekern | yeah, I tried to stay away from those when I redid the cappuccino fetch stage ;) | 11:53 |
@stekern | this is the worst I got in there: https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_fetch_cappuccino.v#L128 | 11:53 |
@stekern | but it's hard in the fetcher: you find corner cases of corner cases, and they build up easily | 11:54 |
_franck_ | once you've found them all, you rewrite it ;) | 11:54 |
@juliusb | _franck_: Indeed :) | 11:55 |
@juliusb | that's a tame one stekern | 11:55 |
@stekern | _franck_: that's exactly what happened ;) | 11:55 |
@stekern | I think I rewrote that fetcher 3 or 4 times | 11:56 |
@juliusb | there's some nasty ones in the ctrl stages too | 11:56 |
@juliusb | haha | 11:56 |
@juliusb | I synthesised with my new fetch stage; the critical path isn't into or out of the fetch unit, so I'm happy :) | 11:56 |
@stekern | was it before? | 11:58 |
@stekern | haha, I just saw that signal name: awkward_transition_to_branch_target ;) | 12:08 |
@juliusb | no, but considering I was doing a bunch of decoding of the incoming instruction to control the next bus access cycle, I worried it might be | 12:49 |
@juliusb | yes, it was awkward! annoying branch right at the end of a burst | 12:49 |
@juliusb | in hindsight I made the fetch unit far too complicated | 12:49 |
@juliusb | for pronto, probably espresso too | 12:49 |
@juliusb | I made it buffer instructions wherever it could, which is just annoying, because you have to cover all the corner cases | 12:50 |
@juliusb | it just adds a lot of complexity for not much gain IMO, and if you're tightly coupled to the memory, then it's no biggie | 12:50 |
@juliusb | I honestly reckon that new fetch stage would go well with the cache for the prontoespresso | 12:51 |
@juliusb | in fact you could do 0 latency branching I think | 12:51 |
@stekern | cache for pronto? | 12:55 |
jemarch | are you guys going to FSCONS next weekend? | 12:59 |
@stekern | juliusb: how does your fetch stage work now? | 13:02 |
@stekern | the problem with the cache is that you always have the one cycle latency before you know if it was a hit or miss | 13:03 |
@juliusb | ah right | 13:03 |
@juliusb | jemarch: yes olofk and I will be at FSCONS | 13:03 |
@stekern | that was the whole reason for the closely coupled fetcher and cache | 13:04 |
jemarch | juliusb: great. As is now traditional, a bunch of us GNU people will be there :) | 13:04 |
@juliusb | stekern: it detects branch instructions as they come in and deasserts the outgoing request line immediately, keeping bus turnaround time as short as possible | 13:05 |
@juliusb | jemarch: currently we're the only talk listed for the embedded track: https://fscons.org/2012/schedule/track/embeddded-systems/ | 13:05 |
@stekern | deasserts right after the ack on the fetched branch? | 13:06 |
jemarch | juliusb: will be there. | 13:06 |
@stekern | so you have a small decoder in your fetch stage? ;) | 13:07 |
@stekern | I guess that makes sense for (pronto)espresso | 13:07 |
@juliusb | stekern: yep | 13:08 |
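A small Python sketch of the kind of "small decoder in the fetch stage" discussed here: it classifies an incoming ORBIS32 word by its 6-bit major opcode and drops the bus request in the same cycle the ack returns a branch or jump. The opcode values (l.j 0x00, l.jal 0x01, l.bnf 0x03, l.bf 0x04, l.jr 0x11, l.jalr 0x12) are from the OpenRISC 1000 architecture manual; `next_req` is a hypothetical name, and the real mor1kx fetch logic also has to deal with exceptions, l.rfe and other cases this ignores.

```python
# ORBIS32 major opcodes (bits 31:26) for control-transfer instructions
OPC_J, OPC_JAL, OPC_BNF, OPC_BF = 0x00, 0x01, 0x03, 0x04
OPC_JR, OPC_JALR = 0x11, 0x12
BRANCH_OPCODES = {OPC_J, OPC_JAL, OPC_BNF, OPC_BF, OPC_JR, OPC_JALR}


def opcode(insn):
    """Extract the 6-bit major opcode from a 32-bit instruction word."""
    return (insn >> 26) & 0x3F


def is_branch(insn):
    """True for any jump or conditional branch."""
    return opcode(insn) in BRANCH_OPCODES


def next_req(req, ack, insn):
    """Hypothetical request logic for the next cycle.

    Keep fetching sequentially, but deassert the request as soon as an
    acked word turns out to be a branch/jump, so the bus can be turned
    around toward the branch target instead of fetching past it.
    """
    if ack and is_branch(insn):
        return False
    return req
```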
jemarch | hmm, now that I see the schedule it is quite disappointing | 13:08 |
jemarch | https://fscons.org/2012/schedule/ | 13:08 |
@juliusb | jemarch: we can make our own fun :) | 13:09 |
jemarch | sure :) | 13:09 |
@juliusb | I think they're still getting it together | 13:10 |
@juliusb | at least, so I'm led to believe | 13:10 |
jemarch | I just developed an algorithm for a _very_ efficient way to model an MMU TLB | 13:10 |
jemarch | would be glad to share it :) | 13:11 |
@juliusb | stekern: https://github.com/juliusbaxter/mor1kx/blob/master/rtl/verilog/mor1kx_fetch_prontoespresso.v#L161 | 13:11 |
@juliusb | jemarch: sounds good, you should explain that to me over a beer at some point :) | 13:11 |
jemarch | are you putting openrisc in espresso machines? | 13:12 |
@juliusb | stekern: we can also determine whether flag-based branches will be taken, and the target for everything except l.jr and l.jalr (this possibly duplicates some logic, I believe) | 13:12 |
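Resolving the target in fetch works because l.j, l.jal, l.bf and l.bnf are PC-relative: per the OpenRISC 1000 manual, the 26-bit immediate N is sign-extended, shifted left by two and added to the branch's own address. l.jr and l.jalr take their target from a register, which the fetch stage doesn't have, hence the exception. This Python sketch assumes the SR[F] flag value is already resolved when the branch arrives (in hardware it may still be in flight from the previous instruction); `early_branch` is a hypothetical name.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def early_branch(pc, insn, flag):
    """Return (taken, target) for PC-relative branches, or None when the
    target can't be known in fetch (l.jr/l.jalr) or insn isn't a branch.

    Assumes `flag` is the already-resolved SR[F] value.
    """
    op = (insn >> 26) & 0x3F
    if op not in (0x00, 0x01, 0x03, 0x04):   # l.j, l.jal, l.bnf, l.bf
        return None
    offset = sign_extend(insn & 0x03FFFFFF, 26) << 2
    target = (pc + offset) & 0xFFFFFFFF
    if op == 0x04:
        taken = flag            # l.bf: branch if flag set
    elif op == 0x03:
        taken = not flag        # l.bnf: branch if flag clear
    else:
        taken = True            # l.j / l.jal are unconditional
    return taken, target
```

For example, `early_branch(0x1000, 0x10000004, True)` is an l.bf four words forward with the flag set, so it yields `(True, 0x1010)`.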
@juliusb | jemarch: no, this is the name of the pipelines | 13:13 |
@juliusb | they've been named after coffees | 13:13 |
jemarch | ah, funny :) | 13:13 |
@juliusb | we have 3 pipelines at present, cappuccino, espresso and the pronto espresso, which is the same length as the espresso but without a delay (slot) | 13:13 |
@juliusb | :) | 13:13 |
@juliusb | stekern: anyway, so far this fetch unit passes all the tests and executes them in less time, maybe 10% in some cases | 13:14 |
jemarch | is the delay slot optional in openrisc? | 13:14 |
@juliusb | jemarch: it is now | 13:14 |
jemarch | heh, I see | 13:14 |
@juliusb | we've agreed on some architectural modifications which allow this | 13:14 |
jemarch | that will have a big impact in the toolchain, simulator, etc | 13:15 |
@juliusb | sure, but the work has already been done | 13:16 |
@juliusb | we have a switch, -mno-delay, to emit code without delay slots | 13:16 |
jemarch | yep | 13:16 |
jemarch | I was thinking of optimizations in the simulator | 13:16 |
@juliusb | simulator, RTL models, etc. have all been updated; the kernel port hasn't, but it's pretty trivial to modify the assembly code | 13:16 |
@juliusb | well, the big thing to my mind is the RTL gets simpler | 13:17 |
jemarch | sure | 13:17 |
jemarch | you are using Verilog, aren't you? | 13:17 |
@juliusb | and doing things like I've done, optimising the fetch unit for the small pipeline CPU is easier | 13:17 |
@juliusb | yep | 13:17 |
@juliusb | I'm no fan of VHDL | 13:17 |
jemarch | oh | 13:17 |
@juliusb | plus, there are no open source simulators | 13:17 |
jemarch | I am :D | 13:17 |
@juliusb | ghdl maybe | 13:18 |
jemarch | GNU vhdl | 13:18 |
@juliusb | does it work? | 13:18 |
jemarch | yes | 13:18 |
jemarch | it can run most of the grlib models | 13:18 |
@juliusb | is it actively developed? | 13:18 |
jemarch | more or less | 13:18 |
jemarch | it is pretty stable | 13:18 |
@juliusb | oh nice | 13:18 |
@juliusb | still, I don't like VHDL, it does too much | 13:18 |
@juliusb | all those types, uck | 13:19 |
jemarch | I will use it the following months for TLM<->RTL cosimulation | 13:19 |
@juliusb | great | 13:19 |
@juliusb | but still, if you code in Verilog you can make use of Verilator | 13:19 |
jemarch | never worked with SystemC before though | 13:19 |
jemarch | verilator is simply great :) | 13:19 |
@juliusb | I agree | 13:20 |
@juliusb | makes extremely fast models | 13:20 |
@juliusb | really helps with verification | 13:20 |
jemarch | so, many beer-topics for FSCONS | 13:21 |
jemarch | :) | 13:21 |
@juliusb | jemarch: indeed | 13:21 |
@juliusb | looking forward to it | 13:21 |
@juliusb | we can also debate the license I've put out since last year | 13:21 |
jemarch | which is not GPL I guess | 13:21 |
@juliusb | http://juliusbaxter.net/ohdl/ | 13:21 |
@juliusb | nope | 13:21 |
@juliusb | but you can relicense things under the OHDL to GPL if you desire | 13:21 |
jemarch | looks interesting | 13:22 |
@juliusb | that is permitted, like the Mozilla public license, on which it's based | 13:22 |
jemarch | will read it later | 13:22 |
@juliusb | nps | 13:22 |
@juliusb | stekern: so the idea is to turn around the accesses on the bus as quickly as possible by 1) deasserting the req as soon as we can, which we now achieve, and 2) putting out the new address as soon as we can, which we can do for all but l.j[al]r insns | 13:23 |
@juliusb | it's kind of annoying actually, to have the 1 cycle turnaround for Wishbone, although I'm not certain whether it's needed; I've done it just to be safe (by this I mean deassert cyc/stb for 1 cycle between two accesses) | 13:24 |
@juliusb | you kind-of need it to terminate burst accesses | 13:24 |
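A cycle-count sketch (not RTL) of the turnaround cost just described: cyc/stb stay high for every beat of a burst, then drop for one cycle before the next access. It assumes a zero-wait-state slave that acks each beat in the cycle it is requested, and `wishbone_trace` is a hypothetical helper. Under those assumptions, four single accesses take 7 cycles where one 4-beat burst takes 4, which is why short, branch-heavy fetch sequences feel the gap most.

```python
def wishbone_trace(bursts, turnaround=1):
    """Per-cycle (cyc_stb, ack) trace for back-to-back Wishbone accesses.

    bursts: list of beat counts, one entry per access (1 = single access).
    Assumes every beat is acked in the cycle it is requested, and that the
    master deasserts cyc/stb for `turnaround` cycles between two accesses.
    """
    trace = []
    for i, beats in enumerate(bursts):
        if i > 0:
            # Idle cycle(s) that terminate the previous access.
            trace.extend([(0, 0)] * turnaround)
        # Within a burst, one beat is transferred per cycle.
        trace.extend([(1, 1)] * beats)
    return trace
```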
@juliusb | although, now I think of it, the cache would work better if you put out the addresses and then expected the result a cycle later | 13:25 |
@stekern | well, you don't *need* to, but if you don't you'll get that annoying loop between cyc/stb and ack | 13:25 |
@stekern | that's how the cache works now | 13:25 |
@juliusb | rather than on Wishbone, where burst accesses give you the data in the same cycle as the address; AMBA doesn't, I think, which is why designing for AMBA would make it a bit easier to integrate with a cache | 13:26 |
@stekern | the address is put out from the address stage 1 cycle before the fetch stage expects it | 13:26 |
@juliusb | yep | 13:26 |
@juliusb | for the cache, right? | 13:26 |
@stekern | exactly | 13:26 |
@juliusb | yep, so AMBA does this, the address cycle and the data cycle | 13:26 |
@juliusb | it's a bit of a failing of Wishbone now that I've written for both, but oh well, for small enough doesn't it doesn't matter too much | 13:27 |
@juliusb | s/small enough/small enough designs/ | 13:27 |
@juliusb | urgh, you get what I mean | 13:27 |
@juliusb | hehe | 13:27 |
@stekern | I'll get there when I've stared at the sentence long enough ;) | 13:28 |
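A Python sketch of why the address has to go out a cycle early for the cache: with a registered tag read, an address presented in cycle n only yields its hit/miss answer in cycle n+1, much like a separate address phase and data phase. The direct-mapped organisation, the 64-set/16-byte-line geometry and the class name are all assumptions for illustration, not the mor1kx icache.

```python
class PipelinedICacheModel:
    """Hypothetical direct-mapped icache with a registered tag read:
    an address presented in cycle n gives its hit/miss in cycle n+1,
    so the address stage must run one cycle ahead of the fetch stage."""

    def __init__(self, sets=64, line_bytes=16):
        self.sets = sets
        self.line_bytes = line_bytes
        self.tags = [None] * sets
        self.pending = None          # address registered in the previous cycle

    def _index_tag(self, addr):
        line = addr // self.line_bytes
        return line % self.sets, line // self.sets

    def fill(self, addr):
        """Mark the line containing `addr` as present."""
        idx, tag = self._index_tag(addr)
        self.tags[idx] = tag

    def clock(self, addr):
        """Present `addr` (or None) this cycle; return (prev_addr, hit) for
        the address presented in the previous cycle, or None if there was none."""
        result = None
        if self.pending is not None:
            idx, tag = self._index_tag(self.pending)
            result = (self.pending, self.tags[idx] == tag)
        self.pending = addr
        return result
```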
@juliusb | anyway, it is a bit bad to spread the decode logic like I've done, I think it's a bit of duplication, but hopefully it's worth it | 13:28 |
@juliusb | the performance improvement I've seen is about 10% so far, although that's also including the linear bursting, which improves performance too | 13:29 |
@juliusb | for long sequences of single-cycle instructions without branches you just stream them in | 13:29 |
@juliusb | very nice | 13:29 |
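For reference, Wishbone B3/B4 registered-feedback cycles signal an incrementing-address burst with CTI=3'b010 and BTE=2'b00 (linear), and mark the last beat with CTI=3'b111 (end of burst). This Python sketch generates the beat sequence for a linear instruction burst of configurable length; `linear_burst` is a hypothetical helper and wait states are ignored.

```python
CTI_INCR = 0b010     # incrementing-address burst cycle
CTI_END = 0b111      # end of burst
BTE_LINEAR = 0b00    # linear (non-wrapping) burst


def linear_burst(start, beats, word_bytes=4):
    """Return a list of (adr, cti, bte) tuples, one per beat."""
    if beats < 1:
        raise ValueError("a burst needs at least one beat")
    seq = []
    for i in range(beats):
        # Every beat but the last announces that another beat follows.
        cti = CTI_END if i == beats - 1 else CTI_INCR
        seq.append(((start + i * word_bytes) & 0xFFFFFFFF, cti, BTE_LINEAR))
    return seq
```

Long runs of straight-line code map onto bursts like this, which is where the streaming gain comes from.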
@juliusb | I also want to do stuff like posted writes in the LSU | 13:29 |
@stekern | yes | 13:30 |
@juliusb | anyway, it's a cool little core with optimisations like these | 13:30 |
@stekern | I'm working on the icache 'bypasser' atm | 13:30 |
@juliusb | and, I'm getting closer to playing with ideas like superscalar, you know, LSU is in use for a store and the following instruction is something which just uses the pipeline, so why can't we do both | 13:31 |
@juliusb | actually, that's not superscalar | 13:31 |
@juliusb | but, you know, trying to raise throughput above 1 insn at a time where possible | 13:31 |
@juliusb | it would be easy I reckon, to take a lot of pairs and execute them simultaneously | 13:32 |
@juliusb | a store followed by an arithmetic op or a branch or something is a good one | 13:34 |
@stekern | then I want to look at splitting up mem and wb stage | 13:34 |
@juliusb | yes? | 13:34 |
@juliusb | out of order stuff could be possible to do if you wanted to do a known-single-cycle op while you're doing your load | 13:35 |
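A hypothetical pairing rule for the "store followed by an arithmetic op or branch" idea and the load-overlap variant, written over a simplified instruction record rather than real encodings. It is not mor1kx code; the `Insn` type and the rules are assumptions for illustration. A store writes no register, so a following single-cycle op can't depend on it; with a load, the overlapped op must neither read nor write the load's destination register.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Insn:
    kind: str                      # "load", "store", "alu" or "branch"
    dest: Optional[int] = None     # destination GPR, if any
    srcs: Tuple[int, ...] = ()     # source GPRs


def can_overlap(mem_op: Insn, nxt: Insn) -> bool:
    """Sketch: may `nxt` execute while the LSU handles `mem_op`?"""
    if mem_op.kind not in ("load", "store"):
        return False
    if nxt.kind not in ("alu", "branch"):
        return False               # one LSU; only known single-cycle ops overlap
    if mem_op.kind == "load" and mem_op.dest is not None:
        if mem_op.dest in nxt.srcs:
            return False           # RAW: nxt needs the loaded value
        if nxt.dest == mem_op.dest:
            return False           # WAW: results would retire in the wrong order
    # Assumes the store's operands were read at issue, so nothing nxt
    # writes can affect it.
    return True
```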
@stekern | the pipeline forwarding probably needs a look-over as well; wouldn't it be better to take the result directly from the stage instead of saving it in the RF | 13:38 |
@stekern | ? | 13:38 |
@juliusb | yeah you'd probably need to do that. In single cycle, you're safe to assume that whatever you just did will be back in the RF, but if you're doing anything tricky, yes, perhaps you can't assume that | 13:39 |
@juliusb | and so would be good to store intermediate results from LSU/ALU in those stages | 13:39 |
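A minimal operand-bypass sketch of taking results "directly from the stage": each later stage may hold an in-flight (dest, value) result, and an operand read takes the youngest matching one before falling back to the register file. The stage list and its ordering are assumptions about a generic in-order pipeline, not a description of mor1kx's actual forwarding paths.

```python
from typing import Optional, Sequence, Tuple

# An in-flight result: (destination register, value), or None if the stage
# holds no register-writing instruction this cycle.
StageResult = Optional[Tuple[int, int]]


def read_operand(reg: int, rf: Sequence[int],
                 stages: Sequence[StageResult]) -> int:
    """Return the value of `reg`, forwarding from pipeline stages.

    `stages` is ordered youngest first (e.g. [execute, mem, writeback]), so
    the first match is the most recent write to `reg`.
    """
    for result in stages:
        if result is not None and result[0] == reg:
            return result[1]       # bypass: take the value straight from the stage
    return rf[reg]                 # no in-flight write: the register file is current
```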
@juliusb | now consider debug :( | 13:40 |
@juliusb | like, software or hardware breakpoints | 13:40 |
@juliusb | you'd have to have 2 running modes, one like, non-debug mode, one debug mode | 13:40 |
@juliusb | where you kind-of revert back to single-issue | 13:40 |
@juliusb | I would argue debugging tricky-mode issues is hardware debugging, and you probably wouldn't try at all to make it debuggable by software debugging | 13:41 |
@stekern | tricky-mode issues? | 13:42 |
@juliusb | like, out of order, ALU-while-LSU etc. | 13:42 |
@juliusb | any tricky pipeline handling to try and speed things up | 13:42 |
@stekern | ah, yeah | 14:02 |
LoneTech | hello | 14:37 |
_franck_ | hi | 14:43 |
@juliusb | LoneTech: hi Yann | 15:31 |
@stekern | yes, icache bypasser is starting to pass tests! | 17:49 |
@juliusb | :) | 18:35 |
@stekern | still failing some... | 18:49 |
@stekern | it's not completely trivial; I could of course just make it incredibly slow and simple, but I'd want it to be able to make use of the bursting of the wb_if | 18:49 |
@stekern | yay, all tests pass | 19:15 |
@olofk | juliusb: Get off your lazy ass and apply the or1ktrace patch I sent you ;) | 21:22 |
@juliusb | sir yes sir! | 23:03 |
@juliusb | sorry, it's Guy Fawkes night here | 23:28 |
@juliusb | been out watching fireworks | 23:28 |
@juliusb | and now everyone is letting them off around the neighbourhood | 23:28 |
@juliusb | olofk: you there? | 23:35 |
@juliusb | I've added --enable-shared to the first build stage here: http://opencores.org/or1k/OpenRISC_GNU_tool_chain#Installation_of_development_versions | 23:40 |
@juliusb | and it didn't work first go :( | 23:40 |
@juliusb | just deleted my half-built version and am giving it another go | 23:42 |