--- Log opened Wed May 29 00:00:19 2013 | ||
stekern | juliusb: to once again show that I'm following up on things I promise to do, I'm looking at this now: http://pastie.org/6470345 | 03:40 |
---|---|---|
stekern | and it's interesting, the latest linaro arm toolchain miscompiles that too | 03:41 |
stekern | http://pastie.org/7977989 | 03:43 |
stekern | so presumable, it's a generic gcc 4.8 bug | 03:47 |
stekern | presumably | 03:47 |
stekern | well, actually, not a bug, but a change in behaviour | 04:26 |
stekern | since signed integer overflow is undefined behaviour according to the standard | 04:27 |
stekern | oh, when it's compiled in the testsuite, it's compiled with '-fwrapv', thus defining the behaviour | 04:41 |
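The pasted test case isn't reproduced in this log. As a general illustration only, this C sketch contrasts an overflow check that GCC may fold away (because signed overflow is undefined without `-fwrapv`) with a portable check that never overflows. The function name `add_overflows` is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* UB-prone pattern (shown only as a comment, not compiled):
 *     int increment_is_larger(int x) { return x + 1 > x; }
 * Without -fwrapv, GCC may fold this to "return 1", since it may
 * assume x + 1 never wraps.  With -fwrapv it returns 0 for INT_MAX. */

/* Returns true if a + b would overflow int.  The operands are checked
 * before any addition, so no signed overflow happens and the result
 * does not depend on -fwrapv or the optimisation level. */
bool add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* sum would exceed INT_MAX */
    else
        return a < INT_MIN - b;   /* sum would fall below INT_MIN */
}
```

GCC 5 and later, and Clang, also provide `__builtin_add_overflow` for the same purpose.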
juliusb | bfff, my flop-cache improvements only yield an improvement from 13 CoreMarks to 17 CoreMarks with a 32-word cache | 08:39 |
juliusb | I think I'll just plumb in the real cache now | 08:39 |
juliusb | this is on pronto | 08:39 |
juliusb | the pipeline logic is all sorted out to be able to handle that I'm quite sure | 08:39 |
* stekern likes that idea | 08:40 | |
juliusb | I think the flop cache is useful if you put in, say, an 8-word cache; that way tight loops will be cached | 08:41 |
stekern | I agree, but wonder how many loops in a typical program are that tight? | 08:43 |
juliusb | memcpy()? | 08:45 |
stekern | oh, good point | 08:45 |
juliusb | memset()? | 08:45 |
juliusb | things like that | 08:45 |
juliusb | but yes, beyond that, not much | 08:45 |
juliusb | the hotpath, too | 08:45 |
olofk | flop cache? | 08:45 |
olofk | Is it some kind of "L0 cache"? | 08:46 |
juliusb | the flop cache will keep only the instructions on the hot path, up to 8, so you can do quite a bit | 08:46 |
juliusb | yes, register/flop cache - not intended to be made out of memory macros in either ASIC or FPGA, but instead just registers | 08:46 |
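To make the "tight loops" argument concrete, here is a small C model of an 8-entry, fully associative instruction cache held in registers. It is a sketch under stated assumptions: the names `flop_cache`, `flop_cache_fetch` and `loop_hits` are hypothetical, and round-robin replacement is assumed because the log doesn't say which policy the pronto espresso cache uses. In hardware all tag compares happen in parallel, which is why the hit signal is available immediately; the loop here just models that.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOP_CACHE_WORDS 8   /* hypothetical size, matching the 8-word example */

struct flop_cache {
    bool     valid[FLOP_CACHE_WORDS];
    uint32_t tag[FLOP_CACHE_WORDS];   /* word address of the cached instruction */
    uint32_t data[FLOP_CACHE_WORDS];  /* the instruction word itself */
    unsigned next;                    /* round-robin replacement pointer (assumed policy) */
};

/* Look up a word address; on a miss, fill from the backing store 'mem'.
 * Returns true on a hit. */
bool flop_cache_fetch(struct flop_cache *c, uint32_t word_addr,
                      const uint32_t *mem, uint32_t *insn)
{
    for (unsigned i = 0; i < FLOP_CACHE_WORDS; i++) {
        if (c->valid[i] && c->tag[i] == word_addr) {
            *insn = c->data[i];
            return true;
        }
    }
    /* Miss: fetch from memory and replace the next entry in turn. */
    unsigned v = c->next;
    c->valid[v] = true;
    c->tag[v] = word_addr;
    c->data[v] = mem[word_addr];
    c->next = (v + 1) % FLOP_CACHE_WORDS;
    *insn = c->data[v];
    return false;
}

/* Execute a loop body of 'body_len' consecutive instructions 'iters'
 * times, starting from a cold cache, and return the number of hits. */
unsigned loop_hits(const uint32_t *mem, uint32_t start,
                   unsigned body_len, unsigned iters)
{
    struct flop_cache c = {0};
    unsigned hits = 0;
    uint32_t insn;
    for (unsigned it = 0; it < iters; it++)
        for (unsigned pc = 0; pc < body_len; pc++)
            hits += flop_cache_fetch(&c, start + pc, mem, &insn);
    return hits;
}
```

With this policy an 8-instruction loop run 10 times misses only on the first pass (72 hits out of 80), while a 9-instruction loop evicts each instruction just before it is needed again and gets no hits at all, so the benefit drops off sharply once a loop outgrows the cache.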
juliusb | one performance advantage is that you get the hit indication immediately (if you're not running too quickly) - but this works fine at 50MHz | 08:47 |
juliusb | one downside is the resource use/design size and routability of the design when it's big | 08:47 |
juliusb | but for 8 words you have 8*32 + about 8*20 flops, so ~400 flops, or maybe ~2k-3k gates in ASIC | 08:48 |
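The flop estimate above works out as follows; the roughly 20 bits of tag and valid state per entry is juliusb's approximation, not a derived figure:

```latex
8 \times 32 + 8 \times 20 = 256 + 160 = 416 \approx 400 \ \text{flops}
```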
juliusb | could be worth it on your embedded, low power RISC processor | 08:49 |
juliusb | but on FPGA, it makes a lot more sense to use the block RAMs | 08:50 |
olofk | So it's meant as an alternative L1 implementation? | 08:51 |
olofk | Or is it still backed with an L1 cache? | 08:51 |
juliusb | yes, could be coupled with an L1 | 08:51 |
juliusb | but I don't think it matters much with the pronto espresso | 08:51 |
juliusb | there's still stalls all over the place after branches for seemingly no good reason :-/ | 08:52 |
juliusb | well, not many, but we don't get zero-stall branch execution when executing from that guy | 08:52 |
olofk | A bit related, what's the state of branch prediction in the or1k family? | 08:53 |
juliusb | stekern might be able to tell you more, but I understand there's none except for some experimental stuff in the mor1kx cappuccino pipeline | 08:55 |
olofk | Have you designed mor1kx to make it easier to hook up custom predictors? | 08:56 |
olofk | I remember that looked pretty hard in or1200 | 08:56 |
juliusb | hmm, I don't know about every pipeline in the mor1kx, maybe in the cappuccino, as there already is one, it could be easier to replace | 08:57 |
stekern | mor1kx cappuccino has a beautifully crafted branch prediction framework | 08:59 |
stekern | I'm heavily biased regarding the beautifulness since I put it there, but I at least took some care to make it easy to swap prediction schemes | 09:02 |
stekern | olofk: has anyone considered putting branch prediction into or1200? | 09:02 |
stekern | https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_branch_prediction.v | 09:04 |
stekern | that's where you'd do the changes if you want to do something fancier than the static one that's in there now | 09:05 |
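What the static scheme in `mor1kx_branch_prediction.v` actually keys on isn't stated here, and the module is Verilog. As a language-neutral sketch only, this C model assumes backward-taken/forward-not-taken (BTFNT), a common static scheme, and keeps each scheme behind one function pointer to mirror the "easy to swap" design goal. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A predictor takes the branch's sign-extended displacement and
 * guesses whether the branch will be taken. */
typedef bool (*predict_fn)(int32_t displacement);

/* Static BTFNT: loop back-edges branch backwards, so guess "taken"
 * for negative displacements and "not taken" otherwise. */
bool predict_btfnt(int32_t displacement)
{
    return displacement < 0;
}

/* Alternative static scheme: always predict not taken. */
bool predict_never_taken(int32_t displacement)
{
    (void)displacement;
    return false;
}

/* Count correct predictions over a trace of (displacement, taken) pairs. */
unsigned count_correct(predict_fn predict, const int32_t *disp,
                       const bool *taken, unsigned n)
{
    unsigned correct = 0;
    for (unsigned i = 0; i < n; i++)
        correct += (predict(disp[i]) == taken[i]);
    return correct;
}
```

For a loop back-edge that is taken nine times and then falls through once, BTFNT gets 9 of 10 right and "never taken" gets 1 of 10. Swapping schemes only means passing a different function, which is the property the hardware framework is after.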
juliusb | that is beautiful and simple | 09:06 |
ams | Someone should make one of these (not that this has anything to do with openrisc, but anyway): http://homepage.cs.uiowa.edu/~jones/arch/risc/ | 09:20 |
olofk | ams: I think that there are a few single-instruction Open Source CPUs floating around. Pretty sure that I read about one quite recently | 09:43 |
olofk | stekern: I considered adding BP to or1200 a while ago, but was told it would probably be messy | 09:43 |
ams | olofk: ah, well, i was always fond of the move instruction set... | 09:43 |
ams | big bus, where you can attach specialized hardware and just map it to an address.. | 09:44 |
ams | (something to do with my love for PDP's) | 09:44 |
olofk | stekern: I think that your branch prediction module was just a big mess. You could probably split it into a few separate modules instead ;) | 09:45 |
stekern | olofk: I know ;) | 09:46 |
olofk | I'll send a patch | 09:46 |
stekern | the rule of thumb is, there should be at least one module per output port, right? | 09:47 |
olofk | Yes, but I usually work around that by concatenating all signals into one large inout vector. That way I can have larger modules and still fulfill that requirement | 09:51 |
stekern | good thinking, I'll adopt that strategy in the future | 09:52 |
olofk | module(inout [124:0] a); | 09:52 |
olofk | So much easier to work with | 09:52 |
stekern | totally, and so much easier to remember that one signal name than many different | 09:57 |
olofk | :) | 09:58 |
stekern | also, interconnect between different modules becomes a breeze | 09:58 |
stekern | the only disadvantage is when you realize that 125 signals isn't enough, but that's pretty easily solved by preprocessing the source with a quickbasic program | 10:01 |
olofk | Yes, but you only need DEFINT A-A to make everything run fast | 10:03 |
stekern | A-Z you mean? | 10:15 |
olofk | Not if your only signal name is a | 10:18 |
olofk | That defint thing is still my all-time favorite programming hack. Who the hell thought that up? | 10:19 |
stekern | but then DEFINT A should be enough | 10:27 |
_franck_ | for your information: http://opencores.org/bug,view,1809 | 14:05 |
_franck_ | after fighting some strange behavior with the UART, I registered it and then found that this bug has been tracked since 2010 | 14:06 |
_franck_ | *I registered srx_pad_i | 14:07 |
stekern | _franck_: better late than never, huh? =) | 18:30 |
stekern | what was the strange behaviour? I've seen missing characters sometimes, but I've never suspected the UART in those cases | 18:31 |
_franck_ | I'm using the UART in a design we tested from - 0 to +70 degrees Celsius | 19:17 |
_franck_ | the UART is used intensively | 19:17 |
_franck_ | at some point (when it was cold) we lost some characters (a lot of them), so I asked myself if the RX was registered... | 19:19 |
_franck_ | * -20 degrees | 19:19 |
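The opencores bug report itself isn't reproduced here. As a generic sketch of why registering `srx_pad_i` helps: an asynchronous pin wired straight into several flops can be sampled differently by each of them in the same cycle (or drive one metastable), whereas a synchronizer gives the rest of the design a single, consistent value at the cost of some latency. The log doesn't say whether the fix used one or two stages; two stages, the usual practice, is assumed. This C cycle model (hypothetical `struct sync2` and `sync2_clock`) cannot reproduce metastability; it only shows the structure and the two-cycle latency.

```c
#include <stdbool.h>

/* Two-flop synchronizer for an asynchronous input such as a UART RX pin. */
struct sync2 {
    bool meta;    /* first flop: may go metastable in real hardware */
    bool stable;  /* second flop: the only value the rest of the design reads */
};

/* One clock edge: the second flop samples the first, and the first
 * samples the pin.  Returns the synchronized output after the edge. */
bool sync2_clock(struct sync2 *s, bool pad_in)
{
    s->stable = s->meta;
    s->meta = pad_in;
    return s->stable;
}
```

Starting from an idle-high line, a start bit (low) on the pin shows up at the synchronized output on the second clock edge, not the first.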
--- Log closed Thu May 30 00:00:21 2013 |