IRC logs for #openrisc Wednesday, 2013-05-29

--- Log opened Wed May 29 00:00:19 2013
stekernjuliusb: to once again show that I'm following up on things I promise to do, I'm looking at this now:
stekernand it's interesting, the latest linaro arm toolchain miscompiles that too03:41
stekernso presumably, it's a generic gcc 4.8 bug03:47
stekernwell, actually, not a bug, but a change in behaviour04:26
stekernsince signed integer overflow is undefined behaviour according to the standard04:27
stekernoh, when it's compiled in the testsuite, it's compiled with '-fwrapv', thus defining the behaviour04:41
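
The effect of '-fwrapv' that stekern describes can be sketched as follows. This is only a Python model of 32-bit two's complement wraparound, not the miscompiled test case itself (the log does not show it), and the helper names are hypothetical. Without '-fwrapv' a C compiler may assume signed overflow never happens, so a check like `x + 1 < x` can legally be optimized away; with it, the addition wraps as modelled here.

```python
# Model of 32-bit signed addition with wraparound, i.e. the behaviour that
# -fwrapv defines. All names here are hypothetical and not from the log.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def wrap_int32(value):
    """Reduce an arbitrary Python int to the 32-bit two's complement range."""
    return (value + 2**31) % 2**32 - 2**31


def add_wrapv(a, b):
    """Signed add as if compiled with -fwrapv: overflow wraps around."""
    return wrap_int32(a + b)


def overflow_check_with_wrapv(x):
    """The classic 'x + 1 < x' test, which is only meaningful if overflow wraps."""
    return add_wrapv(x, 1) < x


if __name__ == "__main__":
    print(add_wrapv(INT32_MAX, 1) == INT32_MIN)   # True
    print(overflow_check_with_wrapv(INT32_MAX))   # True
    print(overflow_check_with_wrapv(41))          # False
```
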
juliusbbfff, my flop-cache improvements only yield an improvement from 13 CoreMarks to 17 CoreMarks with a 32-word cache08:39
juliusbI think I'll just plumb in the real cache now08:39
juliusbthis is on pronto08:39
juliusbthe pipeline logic is all sorted out to be able to handle that I'm quite sure08:39
* stekern likes that idea08:40
juliusbI think the flop cache is useful if you put in, say, an 8-word cache, that way tight loops will be cached08:41
stekernI agree, but wonder how many loops in a typical program are that tight?08:43
stekernoh, good point08:45
juliusbthings like that08:45
juliusbbut yes, beyond that, not much08:45
juliusbthe hotpath, too08:45
olofkflop cache?08:45
olofkIs it some kind of "L0 cache"?08:46
juliusbthe flop cache will keep only the instructions on the hot path, up to 8, so you can do quite a bit08:46
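
The point about tight loops fitting in an 8-word cache can be illustrated with a toy model. The log does not say how the flop cache is organised, so the direct-mapped, one-word-per-entry layout below is an assumption, and `FlopCache` and `run_loop` are hypothetical names.

```python
class FlopCache:
    """Toy direct-mapped instruction cache with one word per entry.

    The organisation is an assumption for illustration only.
    """

    def __init__(self, num_words=8):
        self.num_words = num_words
        self.tags = [None] * num_words  # word address held by each entry
        self.hits = 0
        self.misses = 0

    def fetch(self, word_addr):
        """Look up one instruction word; return True on a hit."""
        index = word_addr % self.num_words
        if self.tags[index] == word_addr:
            self.hits += 1
            return True
        self.tags[index] = word_addr  # refill the entry on a miss
        self.misses += 1
        return False


def run_loop(cache, start, length, iterations):
    """Fetch a loop body of `length` consecutive words `iterations` times."""
    for _ in range(iterations):
        for offset in range(length):
            cache.fetch(start + offset)
    return cache.hits, cache.misses


if __name__ == "__main__":
    # A 6-word loop fits: only the first pass misses.
    print(run_loop(FlopCache(8), 0x100, 6, 100))   # (594, 6)
    # A 12-word loop does not fit: conflicting words keep evicting each other.
    print(run_loop(FlopCache(8), 0x100, 12, 10))   # (36, 84)
```

In this model a loop body that fits in the cache misses only on its first pass, while a body larger than the cache keeps missing on every word that shares an entry with another.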
juliusbyes, register/flop cache - not intended to be made out of memory macros in either ASIC or FPGA, but instead just registers08:46
juliusbone performance advantage is that you get the hit indication immediately (if you're not running too quickly) - but this works fine at 50MHz08:47
juliusbone downside is the resource use/design size and routability of the design when it's big08:47
juliusbbut for 8 words you have 8*32 + about 8*20 flops, so ~400 flops, or maybe ~2k-3k gates in ASIC08:48
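
The flop estimate above works out as follows. The 32 data bits and roughly 20 extra bits per word come from juliusb's figures; reading the extra bits as tag and valid state is an assumption, and the function name is hypothetical.

```python
def flop_cache_flops(num_words, data_bits=32, overhead_bits=20):
    """Flip-flops needed: data bits plus per-word overhead (assumed tag/valid)."""
    return num_words * data_bits + num_words * overhead_bits


if __name__ == "__main__":
    print(flop_cache_flops(8))    # 416, i.e. the "~400 flops" quoted above
    print(flop_cache_flops(32))   # 1664 for the 32-word variant
```
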
juliusbcould be worth it on your embedded, low power RISC processor08:49
juliusbbut on FPGA, it makes a lot more sense to use the block RAMs08:50
olofkSo it's meant as an alternative L1 implementation?08:51
olofkOr is it still backed with an L1 cache?08:51
juliusbyes, could be coupled with an L108:51
juliusbbut I don't think it matters much with the prontoespresso08:51
juliusbthere's still stalls all over the place after branches for seemingly no good reason :-/08:52
juliusbwell, not many, but we don't get zero-stall branch execution when executing from that guy08:52
olofkA bit related, what's the state of branch prediction in the or1k family?08:53
juliusbstekern might be able to tell you more, but I understand there's none except for some experimental stuff in the mor1kx cappuccino pipeline08:55
olofkHave you designed mor1kx to make it easier to hook up custom predictors?08:56
olofkI remember that looked pretty hard in or120008:56
juliusbhmm, I don't know about every pipeline in the mor1kx, maybe in the cappuccino, as there already is one, it could be easier to replace08:57
stekernmor1kx cappuccino has a beautifully crafted branch prediction framework08:59
stekernI'm heavily biased regarding the beautifulness since I put it there, but I at least took some care that it should be easy to swap prediction schemes09:02
stekernolofk: has anyone considered putting branch prediction into or1200?09:02
stekernthat's where you'd do the changes if you want to do something fancier than the static one that's in there now09:05
juliusbthat is beautiful and simple09:06
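
The idea of swappable prediction schemes can be sketched in software. This is not the mor1kx code: the interface, the class names, and the choice of "backward taken, forward not taken" as the static scheme are all assumptions for illustration. The log only says that the current predictor is static.

```python
class StaticBackwardTaken:
    """Static scheme (assumed): backward branches taken, forward not taken."""

    def predict(self, pc, target):
        return target < pc

    def update(self, pc, taken):
        pass  # static scheme: nothing to learn


class TwoBitCounter:
    """Dynamic scheme: one 2-bit saturating counter per table entry."""

    def __init__(self, entries=16):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not taken, 2-3 predict taken

    def predict(self, pc, target):
        return self.counters[pc % self.entries] >= 2

    def update(self, pc, taken):
        i = pc % self.entries
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, trace):
    """Run a trace of (pc, target, taken) tuples through any predictor."""
    wrong = 0
    for pc, target, taken in trace:
        if predictor.predict(pc, target) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


if __name__ == "__main__":
    # A forward branch that is taken 9 times, then falls through once.
    trace = [(0x40, 0x80, True)] * 9 + [(0x40, 0x80, False)]
    print(count_mispredictions(StaticBackwardTaken(), trace))  # 9
    print(count_mispredictions(TwoBitCounter(), trace))        # 2
```

Because both classes expose the same `predict` and `update` methods, the surrounding logic does not change when the scheme is swapped.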
amsSomeone should make one of these (not that this has anything to do with openrisc, but anyway):
olofkams: I think that there are a few single-instruction Open Source CPUs floating around. Pretty sure that I read about one quite recently09:43
olofkstekern: I considered adding BP to or1200 a while ago, but was told it would probably be messy09:43
amsolofk: ah, well, i was always fond of the move instruction set...09:43
amsbig bus, where you can attach specialized hardware and just map it to an address..09:44
ams(something to do with my love for PDP's)09:44
olofkstekern: I think that your branch prediction module was just a big mess. You could probably split it into a few separate modules instead ;)09:45
stekernolofk: I know ;)09:46
olofkI'll send a patch09:46
stekernthe rule of thumb is, there should be at least one module per output port, right?09:47
olofkYes, but I usually work around that by concatenating all signals into one large inout vector. That way I can have larger modules and still fulfil that requirement09:51
stekerngood thinking, I'll adopt that strategy in the future09:52
olofkmodule(inout [124:0] a);09:52
olofkSo much easier to work with09:52
stekerntotally, and so much easier to remember that one signal name than many different09:57
stekernalso, interconnect between different modules becomes a breeze09:58
stekernthe only disadvantage is when you realize that 125 signals isn't enough, but that's pretty easily solved by preprocessing the source with a quickbasic program10:01
olofkYes, but you only need DEFINT A-A to make everything run fast10:03
stekernA-Z you mean?10:15
olofkNot if your only signal name is a10:18
olofkThat defint thing is still my all-time favorite programming hack. Who the hell thought that up?10:19
stekernbut then DEFINT A should be enough10:27
_franck_for your information:,view,180914:05
_franck_after a fight with some strange behavior with the uart, I registered it and then I found this bug is already tracked since 201014:06
_franck_*I registered srx_pad_i14:07
stekern_franck_: better late than never, huh? =)18:30
stekernwhat is strange behaviour? I've seen missing characters sometimes, but I've never suspected the UART in those cases18:31
_franck_I'm using the UART in a design we tested from - 0 to +70 degrees Celsius19:17
_franck_the UART is used intensively19:17
_franck_at some point (when it's cold) we had some characters lost (a lot of them) so I asked myself if the RX was registered...19:19
_franck_* -20 degrees19:19
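
Registering an asynchronous input such as srx_pad_i is commonly done with a chain of two flip-flops. The sketch below models only the structure of such a chain. It assumes two stages (the log does not say how many were added), it cannot model metastability itself, and the class name is hypothetical.

```python
class TwoFlopSynchronizer:
    """Structural model of a two-stage register chain on an async input."""

    def __init__(self, reset_value=1):  # a UART RX line idles high
        self.ff1 = reset_value
        self.ff2 = reset_value

    def clock(self, async_in):
        """Advance one clock edge and return the second flop's new value."""
        self.ff2 = self.ff1   # second stage takes the previous first stage
        self.ff1 = async_in   # first stage samples the raw pad
        return self.ff2


if __name__ == "__main__":
    sync = TwoFlopSynchronizer()
    samples = [1, 0, 0, 0, 1, 1]  # start bit arrives on the second edge
    print([sync.clock(s) for s in samples])  # [1, 1, 0, 0, 0, 1]
```

The receiver logic then only ever sees the second flop, so every consumer of the signal gets the same value within a clock cycle.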
--- Log closed Thu May 30 00:00:21 2013
