--- Log opened Mon May 13 00:00:56 2013 | ||
-!- Netsplit *.net <-> *.split quits: zhai365 | 08:27 | |
-!- Netsplit *.net <-> *.split quits: zhai365 | 08:36 | |
-!- Netsplit over, joins: zhai365 | 08:37 | |
-!- Netsplit *.net <-> *.split quits: mboehnert | 11:21 | |
-!- Netsplit over, joins: mboehnert | 11:25 | |
--- Log closed Mon May 13 12:11:55 2013 | ||
--- Log opened Mon May 13 12:12:09 2013 | ||
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal] | 12:12 | |
-!- Irssi: Join to #openrisc was synced in 18 secs | 12:12 | |
stekern | ok, that branch prediction bug is at least fixed now... | 13:21 |
---|---|---|
stekern | still no luck with the nfsboot | 13:22 |
--- Log closed Mon May 13 13:44:07 2013 | ||
--- Log opened Mon May 13 13:44:22 2013 | ||
-!- Irssi: #openrisc: Total of 21 nicks [0 ops, 0 halfops, 0 voices, 21 normal] | 13:44 | |
-!- Irssi: Join to #openrisc was synced in 15 secs | 13:44 | |
-!- Netsplit *.net <-> *.split quits: hno | 13:59 | |
-!- Logxen- is now known as Logxen | 14:30 | |
--- Log closed Mon May 13 17:13:42 2013 | ||
--- Log opened Mon May 13 17:13:57 2013 | ||
-!- Irssi: #openrisc: Total of 22 nicks [0 ops, 0 halfops, 0 voices, 22 normal] | 17:13 | |
!leguin.freenode.net [freenode-info] help freenode weed out clonebots -- please register your IRC nick and auto-identify: http://freenode.net/faq.shtml#nicksetup | 17:13 | |
-!- Irssi: Join to #openrisc was synced in 19 secs | 17:14 | |
-!- Netsplit *.net <-> *.split quits: juliusb, trem | 17:18 | |
-!- Netsplit over, joins: trem | 17:21 | |
-!- Netsplit *.net <-> *.split quits: larks | 17:43 | |
juliusb_ | man this has been netsplitting badly lately | 19:55 |
stekern | I'm netsplitting by kicking the ethernet switch power plug out of its socket | 20:01 |
stekern | ...and as I said, the nfsboot problem was something really silly... 'rw' is not a valid option | 20:03 |
juliusb_ | ahh is that all it was?? :P | 20:32 |
stekern | yup :/ | 20:35 |
stekern | but branch prediction seems stable now at least, and with the nfs root working, I can more easily test some more serious stuff than just running 'top' ;) | 20:38 |
juliusb_ | hardcore | 20:45 |
juliusb_ | that's some serious work | 20:45 |
juliusb_ | I haven't had much time lately :( still struggling to get my little flops cache working | 20:46 |
juliusb_ | so many annoying corner cases | 20:46 |
juliusb_ | it works with like 2,4,8 instructions but 16 it breaks | 20:46 |
juliusb_ | on eceptiosn | 20:46 |
juliusb_ | exceptions | 20:46 |
juliusb_ | so your stuff with branch prediction is faster? | 20:46 |
stekern | faster than stalling on l.sfxx; l.b(n)f, yes | 20:47 |
stekern | slower than resolving the branch completely in decode | 20:48 |
stekern | this is still completely based solely on coremark though | 20:48 |
stekern | the numbers are roughly: 80 for stall on sf;bf, 90 for resolving in decode and 87 with branch prediction | 20:50 |
juliusb_ | ah cool | 20:52 |
juliusb_ | that's pretty close to resolving in decode! | 20:52 |
stekern | it's just a simple static backwards taken, forward not, and the prediction is done in decode and then the real resolving is in execute | 20:53 |
juliusb_ | ah right | 20:54 |
stekern | and I implemented it as 'flag prediction', so in execute you just check if the real flag is equal to the predicted flag | 20:54 |
juliusb_ | is it a lot more logic? | 20:54 |
stekern | no, it's pretty simple logic and I haven't even tried to optimize it yet | 20:55 |
juliusb_ | cool | 20:56 |
stekern | the actual prediction logic is very simple, just compare the msb in the imm field | 20:56 |
stekern | (and check if it's a bf or bnf to get the predicted flag) | 20:57 |
stekern | the control logic in the fetcher is a bit hairier | 20:57 |
juliusb_ | is it not complex and annoying to cancel the fetch that's wrong? | 20:57 |
juliusb_ | yeah, that was my guess | 20:57 |
stekern | basically, I'm looking if there is a cond branch in decode stage and if so and the mispredicted signal goes high, then gate all signals out to decode | 21:00 |
stekern | that's about it, but of course there were a lot of cases I had forgotten about ;) | 21:00 |
stekern | the last bug was that I forgot to gate the immu exceptions | 21:01 |
juliusb_ | yeah, I know what you mean, the major portion of the update is there, it's just neatening around the edges | 21:02 |
stekern | everything is in three (messy) commits here: https://github.com/skristiansson/mor1kx/commits/master | 21:03 |
stekern | espresso fails this test btw: http://oompa.chokladfabriken.org/tmp/or1k-sfbf.S | 21:07 |
stekern | I might take a closer look at it, but if I forget =P | 21:07 |
juliusb_ | nps I should check it out | 21:08 |
juliusb_ | can you submit a pull request to add that test to mor1kx-dev-env? | 21:08 |
stekern | yeah, I will | 21:09 |
stekern | I think there is one pending too | 21:10 |
stekern | (pull request) | 21:10 |
juliusb_ | hmmm, I'd better check that | 21:15 |
juliusb_ | you're right | 21:16 |
juliusb_ | I don't like to say it... but maybe I just made my flop cache finally work?! | 21:18 |
stekern | that test is in that pull request too now | 21:18 |
juliusb_ | (regression still running) | 21:18 |
juliusb_ | great, thanks man | 21:18 |
stekern | heh, famous last words(?) | 21:18 |
juliusb_ | It's so annoying, in the end because of the way the pipeline is basically made expecting delays on the bus when branches occur, I had to insert delays into the logic to handle the feeding of instructions out of the cache into the pipeline | 21:19 |
juliusb_ | feels kinda stupid | 21:19 |
juliusb_ | I could feed the instructions into it immediately, but no, somewhere somehow it'll break | 21:19 |
juliusb_ | really needs a rewrite for the fast-as-possible pipeline | 21:19 |
juliusb_ | you know what it is stekern , it's some of your adopted homeland's spirit. I'm listening to Sibelius' Finlandia :) | 21:20 |
juliusb_ | There's something about getting OpenRISC working and Finland | 21:20 |
juliusb_ | I will keep it on loop while the regression runs | 21:21 |
juliusb_ | :) | 21:21 |
juliusb_ | to be honest, I was thinking of ditching espresso | 21:21 |
juliusb_ | also | 21:21 |
juliusb_ | I mean, if you want the super smallest implementation, you might as well just go for pronto | 21:22 |
juliusb_ | maybe it's not hard to keep it | 21:22 |
juliusb_ | it's just more work | 21:22 |
stekern | are they really that different? | 21:25 |
juliusb_ | not really, but getting more different | 21:25 |
juliusb_ | I just don't know who would use espresso over pronto | 21:25 |
stekern | I've been thinking about if the delay slot perhaps could just be a parameter | 21:26 |
stekern | in cappuccino | 21:26 |
juliusb_ | oh cool :) | 21:26 |
stekern | I mean there's not _that_ much delay slot specific stuff, it's in the fetcher and the exceptions | 21:27 |
stekern | I think I wait until the it's boringly stable with the delay slot before I set out on that though | 21:28 |
stekern | -the | 21:28 |
juliusb_ | good plan | 21:28 |
stekern | but right now there's only one bug I know of, setting caches to 16kb on the atlys board makes it act up | 21:30 |
stekern | oh, I almost forgot, I've been playing with cappuccino and milkymist again | 21:31 |
juliusb_ | oh yes? | 21:32 |
juliusb_ | an even milkier coffee? | 21:32 |
juliusb_ | :) | 21:33 |
stekern | with the set flag critical path cut away, we're able to run at 83 MHz (that's what the milkymist-ng soc runs at) and we get about the same results in coremark as lm32 | 21:33 |
stekern | with the branch prediction in place | 21:33 |
stekern | lm32: http://pastie.org/7821264 | 21:33 |
stekern | cappuccino: http://pastie.org/7827438 | 21:34 |
stekern | to be fair, I ran with it compiled with gcc 4.5.1 too: http://pastie.org/7827601 | 21:35 |
stekern | -with | 21:35 |
stekern | we're about 1200 LUTs larger than lm32 in a comparable setup though :/ | 21:36 |
juliusb_ | hmm interesting | 21:42 |
juliusb_ | so cappuccino performance is the same then? | 21:44 |
juliusb_ | I'm surprised lm32 is that good, I thought it was about the same as the OR1200 | 21:44 |
stekern | in coremark at least, yes | 21:45 |
juliusb_ | hah, I'm just going to do some area comparisons for my new pronto design, and it's so nice running this quartus tool chain, it's so quick | 21:46 |
juliusb_ | in contrast, this afternoon, I kicked off just the backend for a Virtex 7 build (synthesis was done a couple of days ago, that takes about 8 hours) | 21:47 |
stekern | at least as long as you keep your fingers away from the speed optimisation buttons ;) | 21:47 |
juliusb_ | So placement took about 3 hours, now it's trying to route | 21:49 |
stekern | (same as or1200) naah, I've known all along that it's a good implementation | 21:49 |
juliusb_ | in an hour it's already done an initial route, and has just done a "Rip-up and Reroute" | 21:50 |
juliusb_ | these designs are beasts, these smaller ones are so much more manageable :) | 21:50 |
juliusb_ | oh yeah, why do you think the lm32 is good, just good ISA? | 21:50 |
stekern | no, the isa is pretty much the same as or1k | 21:51 |
juliusb_ | so the implementation is simply better then? | 21:51 |
stekern | than or1200? is that so hard? ;) | 21:52 |
juliusb_ | no, not at all haha | 21:52 |
juliusb_ | well, it's not that bad really | 21:52 |
juliusb_ | in terms of performance, maybe a bit big, but definitely not nice to understand what's going on | 21:52 |
stekern | they do a trick with the multiplier that we might...ehrm...lend | 21:54 |
juliusb_ | oh really? | 21:54 |
juliusb_ | btw how big is cappuccino in LC again? you want to get it under 4k ideally right? | 21:54 |
stekern | around 4800 | 21:55 |
juliusb_ | pronto with serial multiply, shifter, is 3479LC, 1317FF | 21:56 |
stekern | (mul) when you have the three-stage mul, instead of stalling execute stage, write the pipelined result in to the rf in wb stage | 21:57 |
stekern | and interlock if instructions after the mul need the result | 21:57 |
stekern | isn't threestage mul smaller than serial on cyclone iv? | 21:58 |
juliusb_ | oh, not sure! | 21:58 |
juliusb_ | could be | 21:58 |
stekern | 14 LC it claims to occupy here | 21:59 |
juliusb_ | we need some automated build system which does runs with a bunch of different parameters on each technology to see what the implementation stats are | 21:59 |
stekern | yup, and run the tests against that | 21:59 |
juliusb_ | indeed :) | 21:59 |
stekern | too | 21:59 |
stekern | shouldn't be hard to just generate a parameter.v file that you just include in the top file | 22:01 |
juliusb_ | ya | 22:01 |
juliusb_ | I'll get around to it one of these days | 22:01 |
stekern | I got to take a look at migen when I was playing with milkymist too, seems pretty handy for soc generation, not sure I'd write cores in it | 22:05 |
stekern | https://github.com/skristiansson/milkymist-ng-mor1kx/blob/master/milkymist/mor1kx/__init__.py | 22:05 |
stekern | mor1kx instantiated in it | 22:05 |
juliusb_ | oh cool | 22:05 |
juliusb_ | yeah, fair enough :) | 22:06 |
juliusb_ | Python is the right language to do this in | 22:06 |
juliusb_ | you're going to call me a liar if I say this, but I think that for some reason, pronto-espresso mor1kx is _smaller_ with the flop-cache enabled?! | 22:07 |
juliusb_ | so 3479LC,1317FF with it disabled | 22:07 |
stekern | are you looking at fitter or synthesis result? | 22:08 |
juliusb_ | Then 3283LC,1081FF with 4-word flop cache | 22:08 |
juliusb_ | umm, the map.rpt file | 22:08 |
juliusb_ | .map.rpt file | 22:08 |
juliusb_ | I must be doing something really wrong, because with the cache supposedly disabled, there's 588 flops in there, but with it enabled it's less ?! | 22:09 |
juliusb_ | ahhh I know why | 22:09 |
juliusb_ | whoops | 22:09 |
juliusb_ | I was supposed to set the parameter to "DISABLED" not "NONE" | 22:10 |
juliusb_ | :P | 22:10 |
stekern | ;) | 22:10 |
stekern | my numbers have been from the .fit.rpt file | 22:10 |
juliusb_ | ahh right ok | 22:10 |
juliusb_ | I'll check those instead | 22:10 |
stekern | for comparison between builds, it probably doesn't matter | 22:11 |
juliusb_ | cool well the regressions are passing for this flop cache guy, I think I'll do some committing this weekend, very busy rest of week :( | 22:11 |
juliusb_ | and want to test it on the board before I put it in there | 22:11 |
stekern | and in the map file you don't get as many odd results from optimisations | 22:12 |
stekern | I think | 22:12 |
stekern | anyways, time for bed, now that everything is working ;) | 22:12 |
juliusb_ | ok, I'll reference fit results in future | 22:13 |
stekern | tomorrow is a new day to break things | 22:13 |
juliusb_ | me too, night! | 22:14 |
juliusb_ | ah that looks better, pronto with no flopcache is 2991LC,762FF (from fit.rpt) | 22:15 |
juliusb_ | 8 instruction flopcache takes us up to 3555LC, 1246FF | 22:20 |
juliusb_ | (fetch went from 319LC, 104FF to 939LC, 588FF) | 22:21 |
--- Log closed Tue May 14 00:00:57 2013 |
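
Editor's sketch of the 'flag prediction' scheme stekern describes at 20:53-20:57 (static backward-taken / forward-not-taken, predicted in decode, checked in execute). This is a small Python model, not the mor1kx RTL. The opcode values (0x04 for l.bf, 0x03 for l.bnf) and the 26-bit immediate layout are assumptions based on the OR1K instruction encoding, and the function names are hypothetical.

```python
# Behavioural model only; names and constants are illustrative assumptions.
OPC_BNF = 0x03  # assumed OR1K opcode for l.bnf (branch if flag clear)
OPC_BF = 0x04   # assumed OR1K opcode for l.bf (branch if flag set)


def predict_flag(insn):
    """Decode stage: return the predicted flag, or None if insn is not
    a conditional branch."""
    opcode = (insn >> 26) & 0x3F
    if opcode not in (OPC_BF, OPC_BNF):
        return None
    # The msb of the 26-bit immediate is the sign of the branch offset:
    # set means a backward branch, which is statically predicted taken.
    predict_taken = bool((insn >> 25) & 1)
    # l.bf is taken when the flag is set, l.bnf when it is clear, so the
    # flag value that matches the prediction differs between the two.
    if opcode == OPC_BF:
        return predict_taken
    return not predict_taken


def mispredicted(predicted_flag, real_flag):
    """Execute stage: the prediction was wrong if the flags differ."""
    return predicted_flag != real_flag
```

Predicting the flag rather than the branch direction means the execute-stage check is one comparison, regardless of whether the branch was l.bf or l.bnf.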
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!
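
Editor's sketch of the automated parameter sweep discussed at 21:59-22:01, where stekern suggests generating a parameter.v file that the top file includes. It enumerates every combination of a set of options and renders each as a block of Verilog `define lines. The option names in the usage note are hypothetical placeholders, not taken from the log, and the step that runs synthesis and the tests is left out.

```python
from itertools import product


def sweep(options):
    """Yield one dict per combination of the given option values."""
    names = sorted(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def render_parameter_file(config):
    """Render one configuration as the text of an includable Verilog file."""
    lines = ["// generated file, do not edit"]
    for name, value in sorted(config.items()):
        if isinstance(value, str):
            # String-valued parameters are quoted, e.g. "DISABLED"
            value = '"%s"' % value
        lines.append("`define %s %s" % (name, value))
    return "\n".join(lines) + "\n"
```

For example, `sweep({"MULTIPLIER": ["SERIAL", "THREESTAGE"], "CACHE_WORDS": [4, 8]})` yields four configurations; each can be written out with `render_parameter_file` before kicking off a build and collecting the LC/FF numbers from the fitter report.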