IRC logs for #openrisc Monday, 2012-11-05

@juliusbcool, made mor1kx-prontoespresso CPU go to sleep when you have a l.j 0 instruction01:38
@juliusband wakes up on tick and PIC interrupts01:38
@stekernjuliusb: (sleep) is that just for testing or how do you intend to "continue" after the sleep?05:22
@stekernjuliusb: configurable burst length sounds good06:30
@stekernand I agree that it probably will help shake out bugs running the tests with different settings there06:30
@stekernregarding figuring out several months later what you've done can be a good form of masochism, I'm usually inclined to do cleanup when I try to understand what I've done06:34
@stekern(finding some cruft in the icache atm)06:34
@juliusbstekern: sleep is what I'm calling the fetch stage shutting down and not fetching anymore11:40
@juliusban interrupt causes fetching to begin from either the PIC or tick vector11:40
@juliusbit's a very small patch to put in that functionality, actually11:43
_franck_it becomes a big patch when you commit white space / tab changes ;)11:48
@juliusbya :/11:48
@juliusbso in my rewrite of the fetch unit for pronto, I have started to use an easier-to-read coding style of big if-else statements11:48
@juliusbrather than single lines with a bunch of mux statements in them11:48
@juliusbmuch easier to follow and debug11:49
_franck_much better I think11:49
@juliusblike this crap:
@juliusbno more of that11:50
@juliusbit's just too annoying to debug11:50
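(The pasted link is gone, but the contrast being discussed is roughly this — signal names invented for illustration, not actual mor1kx code, and the two forms are alternatives, not meant to coexist:)

```verilog
// Dense single-line mux style -- hard to follow and debug:
assign next_pc = branch_taken ? branch_target :
                 exception    ? exception_vector :
                 stall        ? pc : pc + 4;

// Equivalent big if-else style -- much easier to read:
always @(*)
  if (branch_taken)
    next_pc = branch_target;
  else if (exception)
    next_pc = exception_vector;
  else if (stall)
    next_pc = pc;
  else
    next_pc = pc + 4;
```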
_franck_your link doesn't work11:51
_franck_  jonmaste~11:51
_franck_yes it works11:52
_franck_yeah I really dislike this kind of thing. The kind of thing one can find in or1200 RTL11:53
@stekernyeah, I tried to stay away from those when I redid the cappuccino fetch stage ;)11:53
@stekernthis is the worst I got in there:
@stekernbut it's hard in the fetcher, you find some corner case to some corner case and they easily build up11:54
_franck_once you've found them all, you rewrite it ;)11:54
@juliusb_franck_: Indeed :)11:55
@juliusbthat's a tame one stekern11:55
@stekern_franck_: that's exactly what happened ;)11:55
@stekernI think I rewrote that fetcher 3 or 4 times11:56
@juliusbthere's some nasty ones in the ctrl stages too11:56
@juliusbi synthesised with my new fetch stage, the critical path isn't into or out of the fetch unit, so I'm happy :)11:56
@stekernwas it before?11:58
@stekernhaha, I just saw that signal name: awkward_transition_to_branch_target ;)12:08
@juliusbno, but considering I was doing a bunch of decoding of the incoming instruction for control of the next bus access cycle, i worried it might12:49
@juliusbyes, it was awkward! annoying branch right at the end of a burst12:49
@juliusbin hindsight I made the fetch unit far too complicated12:49
@juliusbfor pronto, probably espresso too12:49
@juliusbi made it buffer instructions wherever it could, which is just annoying, because you have to cover all the corner cases12:50
@juliusbit just adds a lot of complexity for not much gain IMO, and if you're tightly coupled with the memory, then it's no biggie12:50
@juliusbI honestly reckon that new fetch stage would go well with the cache for the prontoespresso12:51
@juliusbin fact you could do 0 latency branching I think12:51
@stekerncache for pronto?12:55
jemarchare you guys going to FSCONS next weekend?12:59
@stekernjuliusb: how does your fetch stage work now?13:02
@stekernthe problem with the cache is that you always have the one cycle latency before you know if it was a hit or miss13:03
@juliusbah right13:03
@juliusbjemarch: yes olofk and I will be at FSCONS13:03
@stekernthat was the whole reason for the closely coupled fetcher and cache13:04
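(That one-cycle latency comes from the synchronous tag RAM; roughly, with invented names and widths, not the actual mor1kx cache:)

```verilog
// The tag RAM read is synchronous, so the tag for an address only
// appears the cycle after the address is presented -- hence hit/miss
// is always known one cycle late.
reg [19:0] tag_ram [0:255];
reg [19:0] tag_out;
reg [31:0] addr_r;
always @(posedge clk) begin
  tag_out <= tag_ram[addr[9:2]]; // tag arrives one cycle after addr
  addr_r  <= addr;               // delay the address to match
end
assign hit = (tag_out == addr_r[29:10]); // compare against delayed addr
```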
jemarchjuliusb:  great.  As it is now traditional a bunch of us GNU people will be there :)13:04
@juliusbstekern: it detects branch instructions as they come in, so deasserts the request out line immediately, keeping bus turnaround time as short as possible13:05
@juliusbjemarch: currently we're the only talk listed for the embedded track:
@stekerndeasserts right after the ack on the fetched branch?13:06
jemarchjuliusb: will be there.13:06
@stekernso you have a small decoder in your fetch stage? ;)13:07
@stekernI guess that makes sense for (pronto)espresso13:07
@juliusbstekern: yep13:08
jemarchhmm, now that I see the schedule it is quite disappointing13:08
@juliusbjemarch: we can make our own fun :)13:09
jemarchsure :)13:09
@juliusbI think they're still getting it together13:10
@juliusbat least, so I'm led to believe13:10
jemarchI just developed an algorithm for a _very_ efficient way to model a MMU TLB13:10
jemarchwould be glad to share it :)13:11
@juliusbjemarch: sounds good, you should explain that to me over a beer at some point :)13:11
jemarchare you putting openrisc in espresso machines?13:12
@juliusbstekern: we also can determine if flag-based-branches will occur, and the target for everything except l.jr and l.jalr (this is actually duplicating some logic I believe, possibly)13:12
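(A rough sketch of such a fetch-stage pre-decode — the opcode values are from the OpenRISC 1000 spec, everything else is invented for illustration and is not the actual mor1kx logic:)

```verilog
// OR1K keeps the opcode in insn[31:26].
wire [5:0] opc = ibus_dat_i[31:26];
wire decode_jump   = (opc == 6'h00) | (opc == 6'h01); // l.j, l.jal
wire decode_branch = (opc == 6'h03) | (opc == 6'h04); // l.bnf, l.bf
wire decode_jr     = (opc == 6'h11) | (opc == 6'h12); // l.jr, l.jalr
// The target is computable right here for everything except
// l.jr/l.jalr, whose target sits in a register:
wire [31:0] early_target =
    pc + {{4{ibus_dat_i[25]}}, ibus_dat_i[25:0], 2'b00};
```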
@juliusbjemarch: no, this is the name of the pipelines13:13
@juliusbthey've been named after coffees13:13
jemarchah, funny :)13:13
@juliusbwe have 3 pipelines at present, cappuccino, espresso and the pronto espresso, which is the same length as the espresso but without a delay (slot)13:13
@juliusbstekern: anyway, so far this fetch unit passes all the tests and executes them in less time, maybe 10% in some cases13:14
jemarchis the delay slot optional in openrisc?13:14
@juliusbjemarch: it is now13:14
jemarchheh, I see13:14
@juliusbwe've agreed on some architectural modifications which allow this13:14
jemarchthat will have a big impact in the toolchain, simulator, etc13:15
@juliusbsure, but the work has already been done13:16
@juliusbwe have a switch, -mno-delay, to emit code without a delay slot13:16
jemarchI was thinking on optimizations in the simulator13:16
@juliusbsimulator, RTL models, etc, have all been updated, kernel port not, but it's pretty trivial to modify the assembly code13:16
@juliusbwell, the big thing for my mind is the RTL gets simpler13:17
jemarchyou are using Verilog, aren't you?13:17
@juliusband doing things like I've done, optimising the fetch unit for the small pipeline CPU is easier13:17
@juliusbI'm no fan of VHDL13:17
@juliusbplus, there's no open source simulators13:17
jemarchI am :D13:17
@juliusbghdl maybe13:18
jemarchGNU vhdl13:18
@juliusbdoes it work?13:18
jemarchit can run most of the grlib models13:18
@juliusbis it actively developed?13:18
jemarchmore or less13:18
jemarchit is pretty stable13:18
@juliusboh nice13:18
@juliusbstill, I don't like VHDL, it does too much13:18
@juliusball those types, uck13:19
jemarchI will use it the following months for TLM<->RTL cosimulation13:19
@juliusbbut still, if you code in Verilog you can make use of Verilator13:19
jemarchnever worked with SystemC before thou13:19
jemarchverilator is simply great :)13:19
@juliusbI agree13:20
@juliusbmakes extremely fast models13:20
@juliusbreally helps with verification13:20
jemarchso, many beer-topics for FSCONS13:21
@juliusbjemarch: indeed13:21
@juliusblooking forward to it13:21
@juliusbwe can also debate the license I've put out since last year13:21
jemarchwhich is not GPL I guess13:21
@juliusbbut you can relicense things under the OHDL to GPL if you desire13:21
jemarchlooks interesting13:22
@juliusbthat is permitted, like the Mozilla public license, on which it's based13:22
jemarchwill read it later13:22
@juliusbstekern: so the idea is to turn around the accesses on the bus as quick as possible by 1) deasserting the req as soon as we can, which we now achieve 2) putting out the new address as soon as we can, which we can do for all but l.j[al]r insns13:23
@juliusbit's kind of annoying actually, to have the 1 cycle turnaround for Wishbone, although I'm not certain if it's needed or not, I've done it just to be safe (by this I mean deassert cyc/stb for 1 cycle between two accesses)13:24
@juliusbyou kind-of need it to terminate burst accesses13:24
@juliusbalthough, now I think of it, the cache would work better if you put out the addresses and then expected the result a cycle later13:25
@stekernwell, you don't *need* to, but if you don't you'll get that annoying loop between cyc/stb and ack13:25
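(The one-cycle turnaround, sketched — only cyc/stb/ack are Wishbone signal names, the surrounding logic is made up:)

```verilog
// Registering cyc_o and dropping it for one cycle after every ack
// means a combinational cyc/stb -> ack -> cyc/stb loop can never
// form between master and slave.
always @(posedge clk)
  if (rst)
    cyc_o <= 1'b0;
  else if (cyc_o & ack_i)
    cyc_o <= 1'b0;        // one dead cycle between accesses
  else if (want_access)
    cyc_o <= 1'b1;
assign stb_o = cyc_o;
```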
@stekernthat's how the cache works now13:25
@juliusbrather than on Wishbone, where the burst accesses give you the data on the same cycle as the address; AMBA doesn't though I think, which is why designing for AMBA would be a bit easier to integrate with a cache13:26
@stekernthe address is put out from the address stage 1 cycle before the fetch stage expects it13:26
@juliusbfor the cache, right?13:26
@juliusbyep, so AMBA does this, the address cycle and the data cycle13:26
@juliusbit's a bit of a failing of Wishbone now that I've written for both, but oh well, for small enough doesn't it doesn't matter too much13:27
@juliusbs/small enough/small enough designs/13:27
@juliusburgh, you get what I mean13:27
@stekernI'll get there when I've stared at the sentence long enough ;)13:28
@juliusbanyway, it is a bit bad to spread the decode logic like I've done, I think it's a bit of duplication, but hopefully it's worth it13:28
@juliusbthe performance improvement I've seen is about 10% so far, although that's also including the linear bursting, which improves performance too13:29
@juliusbfor long sequences of single-cycle instructions without branches you just stream them in13:29
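(Wishbone B3 registered-feedback bursts signal this streaming with the cycle-type tags; a minimal sketch — the cti/bte encodings are from the Wishbone spec, the rest is invented:)

```verilog
assign cti_o = fetch_is_last ? 3'b111  // end-of-burst
                             : 3'b010; // incrementing burst
assign bte_o = 2'b00;                  // linear (non-wrapping) burst
```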
@juliusbvery nice13:29
@juliusbI also want to do stuff like posted writes in the LSU13:29
@juliusbanyway, it's a cool little core with optimisations like these13:30
@stekernI'm working on the icache 'bypasser' atm13:30
@juliusband, I'm getting closer to playing with ideas like superscalar, you know, LSU  is in use for a store and the following instruction is something which just uses the pipeline, so why can't we do both13:31
@juliusbactually, that's not superscalar13:31
@juliusbbut, you know, trying to up throughput from 1 insn at a time where possible13:31
@juliusbit would be easy I reckon, to take a lot of pairs and execute them simultaneously13:32
@juliusbstore followed by an arithmetic op or a branch or something, are a good one13:34
@stekernthen I want to look at splitting up mem and wb stage13:34
@juliusbout of order stuff could be possible to do if you wanted to do a known-single-cycle op while you're doing your load13:35
@stekernthe pipeline forwarding probably needs an overhaul as well, wouldn't it be better to take the result directly from the stage instead of saving it in the rf13:38
@juliusbyeah you'd probably need to do that. In single cycle, you're safe to assume that whatever you just did will be back in the RF, but if you're doing anything tricky, yes, perhaps you can't assume that13:39
@juliusband so would be good to store intermediate results from LSU/ALU in those stages13:39
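(A forwarding mux along those lines might look like this — hypothetical names, not mor1kx code:)

```verilog
// Take the newest copy of a source register: execute stage first,
// then mem/wb stage, and only fall back to the register file.
wire fwd_ex  = ex_rf_we  & (ex_rd  == rs_addr);
wire fwd_mem = mem_rf_we & (mem_rd == rs_addr);
assign rs_value = fwd_ex  ? ex_result  :
                  fwd_mem ? mem_result :
                            rf_dout;
```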
@juliusbnow consider debug :(13:40
@juliusblike, software or hardware breakpoints13:40
@juliusbyou'd have to have 2 running modes, one like, non-debug mode, one debug mode13:40
@juliusbwhere you kind-of revert back to single-issue13:40
@juliusbI would argue debugging tricky-mode issues is hardware debugging, and you probably wouldn't try at all to make it debuggable by software debugging13:41
@stekerntricky-mode issues?13:42
@juliusblike, out of order, ALU-while-LSU etc.13:42
@juliusbany tricky pipeline handling to try and speed things up13:42
@stekernah, yeah14:02
@juliusbLoneTech: hi Yann15:31
@stekernyes, icache bypasser is starting to pass tests!17:49
@stekernstill failing some...18:49
@stekernit's not completely trivial, I could of course just make it incredibly slow and simple, but I'd want it to be able to make use of the bursting of the wb_if18:49
@stekernyay, all tests pass19:15
@olofkjuliusb: Get off your lazy ass and apply the or1ktrace patch I sent you ;)21:22
@juliusbsir yes sir!23:03
@juliusbsorry, it's Guy Fawkes night here23:28
@juliusbbeen out watching fireworks23:28
@juliusband now everyone is letting them off around the neighbourhood23:28
@juliusbolofk: you there?23:35
@juliusbI've added --enable-shared to the first build stage here:
@juliusband it didn't work first go :(23:40
@juliusbjust deleted my half-built version and am giving it another go23:42
