IRC logs for #openrisc Wednesday, 2014-08-20

--- Log opened Wed Aug 20 00:00:15 2014
stekerndalias: didn't you disguss that with poke53281 already?03:16
stekernI get: alignof.c:6:31: error: 'max_align_t' undeclared (first use in this function)03:18
poke53281you have to take the stddef from gcc.04:12
poke53281and in principle yes, we discussed this.04:12
poke53281the max_align_t was 4 and the sizeof I think was 8. It's somewhere in the logbook.04:13
poke53281sorry, in the chatlogs04:13
daliassize is definitely not 804:45
daliasthe min possible size is 16 (8 bytes each for long long and long double)04:45
daliasyeah i was just unclear on the alignment. i know the natural alignment of those types on or1k is 4, but it's also 4 on i386, whereas max_align_t gets alignment 8 on i38604:46
daliasbecause gcc has this stupid concept of separate in-struct/out-of-struct alignments and __alignas__ uses the "out of struct alignment"04:46
daliasso i was trying to confirm whether the "out of struct alignment" on or1k is also 4 for these types04:47
daliasand according to jor1k's gcc, it is04:47
daliasanyway, in order to have the public headers not depend on stupid gcc implementation-internals that should never have leaked (the whole point of the "out of struct alignment" is that it should be 100% non-observable; observable alignments are supposed to use the "in struct" alignment)04:48
daliasi think we just need per-arch definitions of max_align_t04:49
poke53281 22:0405:00
poke53281"alignof(max_align_t) => 4     sizeof(max_align_t) => 16"05:00
poke53281with musl-compiled gcc but stddef.h from gcc.05:01
poke53281The jor1k's gcc on the official site is compiled with uclibc.05:01
poke53281I don't think that matters anyhow05:03
poke53281dump question, how did you figure it out? The stddef.h is not included. :)05:05
poke53281Oh wait, there is one :)05:09
poke53281I wonder whether sizeof(max_align_t) has any reasonable application, dalias.05:40
stekernif I read the stddef.h from gcc, it's only exposed for c11/c++11?06:16
stekernbut yeah, 'in struct' alignment should be 8 for that06:17
stekernerr... no06:18
poke53281Take my pastie. That works06:28
poke53281It takes the important part from stddef.h06:28
poke53281otherwise you have to take g++. Indeed06:28
poke53281according to the specification, the alignment should be 4.06:29
poke53281The vector types don't matter here.06:29
poke53281because the vector types are no basic types.06:30
stekernyes, but the 'in struct' alignment doesn't show from that, only the 'out of struct' (but as I understood it, that was what dalias was interested in)06:46
stekernanyway.. I've been annoyed by the alignment rules we have before, why aren't our long long and double aligned to 8 and why don't we have stack alignment of 8?06:48
poke53281hehe, don't know. I am also annoyed by our calling conventions :)06:51
poke53281Probably the or1k ABI was not done by an expert group, but only of one person.06:52
stekernyou're speaking about the vararg calling convention?06:53
stekernit's a bit sucky, but really, it's just bad code that breaks with it06:54
poke53281well, at the moment it seems, that directfb is indeed the one and only program were this is important.06:54
stekern...directfb should be fixed then06:54
stekernregardless of what vararg calling convention we use ;)06:55
poke53281Yes, but at the beginning I was afraid, that a lot of programs use this conventions. Now, I seems, that it is only directfb.06:56
stekerniirc dalias pointed out some case on x86(_64) where that kind of assumptions break too06:56
poke53281that would make sense for a more complicated ABI. e. g. providing the floatiing points numbers directly in the SSE registers or so. But I have never read what AMD suggested as ABI.06:59
poke53281good to know.07:04
wallentostekern: everything looks good, thanks!07:49
stekernwallento: great!07:50
stekernif I just can catch this bug I've been chasing the last week, I'd then like to start moving over multicore stuff into master07:51
wallentois it in the snooping?07:51
stekernI'm not sure where the bug actually is07:51
wallentobtw. any planning for multicore handson at ORConf?07:51
stekerncould even be a sw bug07:52
stekernI'm hoping on at least being able to give a Linux SMP demo07:52
stekernthe problem is that one cpu takes a spinlock twice07:53
wallentosounds awful to debug07:56
stekernit's... interesting =)07:56
wallentodebugging is learning ;)07:57
stekernI can catch the occasion where the spinlock is aqcuired for the second time, but I need to get to how it get from between the initial lock/unlock to that07:58
stekernby my understanding, that shouldn't be able to happen under the circumstances that lock is held... but as you say, debugging is learning07:58
stekernI've made some assumptions that wasn't true already on the way to where I am right now, so I might have made some more mis-assumptions =)07:59
maxpalnwell, panic() is called from within pcpu_dump_alloc_info() - which in itself is interesting because the C code for that routine doesn't include a call to panic() - pcpu_dump_alloc_info() is only called from within pcpu_setup_first_chunk(). I am guessing pcpu_setup_first_chunk() is expected behaviour - so whatever goes wrong must happen after the jump to this function.09:48
stekernBUG_ON => panic09:50
maxpalnwell, that is interesting - the version of percpu.c you pointed me at differs from the one in my kernel!09:51
stekernwhat kernel version are you using?09:52
maxpalnah, no its not!09:52
maxpalnbut you have made the connection that BUG_ON equates to a jump to panic() - gotcha!09:52
maxpalnI am a little confused though - it would seem that BUG_ON will always be called (unless ai->nr_groups == 0, which i am guessing it wouldn't be if everything were working well) which suggests BUG_ON only calls panic if the expression passed to it is true (or possibly false, but I'll assume true)09:55
maxpalnny idea where BUG_ON is defined?09:55
maxpalnmaking more sense now - at least the BUG_ON bit09:59
ysionneauto navigate in kernel sources lxr is very convenient :
ysionneau < something like that10:01
maxpalnysionneau: you're right, that is much more convenient!!10:01
stekernas you found out, BUG_ON means literally what it says, 'there is a bug on the condition within the ()'10:07
maxpaln:-) yeah, sometimes SW really does do what it says on the tin!10:07
maxpalnhmm, so panic gets called because gi->nr_units is not equal to upa. Since neither of these variables means much to me I am not sure I can infer much from that. ANyone got any clues? Otherwise, back to the HW...10:12
maxpalnwell, not so much equal as an even divisor...10:14
stekernit's not an ideal place to get a crash... you could try to locate from the asm where those values are loaded and check those loads10:15
maxpalnyeah, I was coming round to that approach...10:16
maxpalnI do like the implication that there are some *ideal* places to get a crash! :-)10:16
stekernwell, a crash in a small function that is not inlined because of an access to a global variable, that's pretty close to ideal ;)10:18
maxpalnok, I think I might be onto something10:46
maxpalnmaking a few leaps of faith - I think I have identified the section of assembler that relates to that modulo expression BUG_ON(gi->nr_units % upa)10:48
maxpalnc007e158:84 d6 00 20 l.lwz r6,0x20(r22)10:48
maxpalnc007e15c:e0 a6 93 09 l.div r5,r6,r1810:48
maxpalnc007e160:e0 45 93 06 l.mul r2,r5,r1810:48
maxpalnc007e164:e0 c6 10 02 l.sub r6,r6,r210:48
maxpalnc007e168:bc 26 00 00 l.sfnei r6,0x010:48
maxpalnMy spidey-sense immediately picked up the presence of the l.mul since I already have suspsicions about that10:49
maxpalntracking through the analyser (and this bit is tricky because the register reads/writes are out of sequence) I can see that r5 and r18 are both 1 but r2 gets written with 010:49
maxpalnI thought I had run a mult testbench though - maybe I should go back and check the results...10:50
stekernat what point is r5 1?10:51
maxpalnit gets written with 1 on the clock cycle after the l.div completes10:52
maxpalnoh, hang-oin - this is interesting10:53
stekernit should be 2 cycles10:53
maxpalnyep, two clock cycles you are right10:54
maxpalnso, this is interesting -10:54
maxpalnat the start of the l.mul r5 is read10:54
maxpalnand contains 010:54
maxpalnthen 1 clock cycle later the result of the l.div writes it with 110:55
stekernyeah, but that's all expected (and accounted for)10:55
maxpalnOK - so the l.mul operation on values of 1 and 0 really should return 010:56
stekernyou can't look at the register file RAM outputs, you have to look at the output to execute stage10:56
maxpalnis that right?10:56
stekernbecause the result from the div will be forwarded to the mul10:56
maxpalnaha - ok, the output to the execute stage - let me see if I can find that.10:56
maxpalnthat would be better than trying to track the RAM outputs10:57
stekernit's execute_rfa_o and execute_rfb_o10:57
stekernwhat version/commit of mor1kx are you using? I've made pretty significant changes to this area lately10:58
stekern(but those signals are still named the same, so that info holds11:00
maxpalnhmmm, what's the best way to check - I checked it out a few weeks ago, July 29th11:00
ysionneaucat .git/HEAD11:05
maxpaln[root@localhost mor1kx]# cat .git/HEAD11:05
maxpalnref: refs/heads/master11:05
ysionneauright :p11:06
ysionneaugit rev-parse HEAD11:06
maxpaln:-) - I looekd at it and thought 'that isn't what he meant' but since I don't really know my way around git I decided to offer it up anyway11:07
ysionneauit's my fault, I should have thought about 'what if your on a branch tip'11:07
ysionneautry git rev-parse HEAD11:08
stekern...or just git log11:08
maxpalnhmmm, unusually my linux machine is doing something instead of responding to me....11:09
maxpaln[root@localhost mor1kx]# git log11:11
maxpalncommit 17afdf2250c5dc4853b4032622a78c14b8dd3ec311:11
maxpalnAuthor: Stefan Kristiansson <>11:11
maxpalnDate:   Tue Jul 22 16:27:42 2014 +030011:11
maxpalnthink it could be lunch time while I rebuild a new FPGA image11:12
maxpalnback shortly...11:12
stekernok, that's before the rework11:15
stekernso, I wouldn't expect you to find anything I might have broke then ;)11:15
maxpalnok, back on it now11:48
maxpalnso at the start of the l.mul instruction execute_rfa_o and execute_rfb_o are both 1 - which is consistent with my earelier conclusions11:48
maxpalnand on the last clock cycle of the l.mul instruction, r2 gets written with 011:49
maxpalnI am assuming the store to the destination register happens on this last clock cycle of the intruction11:50
maxpalnoh, hang on11:50
maxpalnmisreading the lines11:50
maxpalnr2 gets written two clock cycles after the and of l.mul - it still gets written with 0 though11:50
maxpalnso, at this stage I am thinking this is incorrect and that the execution of the l.mul instruction is not right11:51
maxpalnBefore I dive into this, would you agree?11:51
stekernyou could look at mul_result too11:55
maxpalnwell, that was considerably more difficult than it should have been - mainly due to my own dipshittery14:01
maxpalnbut I can now see very clearly that mul_result is zero for all time14:01
maxpalnI can see op_mul_i assert, the two operands set to 114:01
maxpalnand the result stay as 014:02
maxpalnso diving into how this has been implemented on our hardware I think the problem is that the format of teh RTL doesn't really map into our dedicated HW blocks very well14:02
maxpalnit isn't clear to me why the result is always zero - I am sure there is a reason, but I guess the real question is what is the best way to resolve.14:03
maxpalnThe PIPELINED implementation doesn't really look to map that well either - the addition of the ctrl signal to gate the multiplication is unlikely to map into anything sensible in silicon.14:04
maxpalnAs a local fix I could easily add another implementation, something like LATTICE_MULT that simply instantiates a HW macro14:05
maxpalnbut its not that portable and not ideal for many reasons14:05
maxpalnreviewing the synthesis and map logs to see if there are clues about what it is in the RTL that gets mapped incorrectly....14:11
maxpalnHmmm, well at the very least this RTL looks to map inefficiently to our architecture in Synplify. A 32x32 mult should use two Multiplier slices - Synplify maps the mor1kx to 4. I suspect that somewhere in there is the root cause of my problems.14:34
maxpalnSo I think I might understand the problem -14:41
maxpalnthis line is, I think, aiming to keep the result of the MULT to the width of the operands (32 in this case, but it seems the code has at least provisioned for 64-bit):14:43
maxpalnmul_result1 <= (mul_opa * mul_opb) & {OPTION_OPERAND_WIDTH{1'b1}};14:43
maxpalnI suspect that this is confusing Synplify since our multiplier allows the result to naturally grow to the full bit width of the MULT operation - 64-bit in this case (128-bit in 64-bit mode)14:44
maxpalnBy truncating the result to 32-bits using '& {OPTION_OPERAND_WIDTH{1'b1}', Synplify appears to be splitting the MULT into two operations that each produce a 32-bit result.14:46
maxpalnThis wouldn't necessarily explain the 'always zero' result but it would explain why Synplify instantiates 4 MULTs instead of 2.14:47
maxpalnrunning an experiment now to test my hypothesis.14:47
stekernif that '& {OPTION_OPERAND_WIDTH{1'b1}};' causes problems, I don't see a reason to keep it15:18
maxpalnwell, it isn't clear what the root cause is at the moment. Synplify is definitely the culprit but everything I try produces a different result to what I expect. I have discovered that the cause of the always zero result is the clock enables to the ALU section of our DSP block being tied to GND - never helpful!! But a simpler testcase based on your code didn't do the same thing! :-|15:20
stekernfun ;)15:21
maxpalnyep :-)15:28
maxpalnyowser - first breakthrough15:52
maxpalntook a while15:52
maxpalnthe implementation of mul_signed_overflow is at the heart of it15:52
maxpalnSynplify tries to merge this into the instantiation of the DSP block but makes a mess of it and ends up having no clock enable on the ALU instance. I am wondering if the presence of the pseudo clock enable op_mul_i on the operand register is causing the issue!!15:53
maxpalnhmm, gotta say- this is feeling like a synthesis bug rather than a mapping issue!16:07
maxpalnok, false alarm - misunderstanding on my part. Back on the hunt...16:11
maxpalnactually, I've changed my mind again - I think this is a Synthesis bug. But the fact I can't make up my mind suggests I should come back to this with a fresh pair of eyes tomorrow.16:18
maxpalnThe problem appears to be that on one of the two DSP slices Synplify attaches neither a clock nor a clock enable to the ALU - this results in an all zero output almost certainly triggering the overall result of zero. I can't see how a coding style could produce this results - but a fresh look tomorrow should help.16:20
stekernyeah, that sounds like a bug alright.16:22
stekernbut it's not like we don't have work-arounds for other bugs in the tools in the codebase ;)16:23
maxpaln:-) yeah, I am sure I can work around it - but it's nice to have it reproduced in a small testcase. I think I'll submit it to the SW group and let them deal with SYnplify - it seems to get fixed quicker that way. I'll think about a way to work around it tomorrow...16:25
daliasstekern, re: alignment, i think it's a mistake for any new ABI to define the alignment of any fundamental type as anything other than the largest power of two which divides the size of the type (which, for most types, is just the size of the type)17:05
dalias(note: that's the max possible alignment)17:06
daliasstekern, for maximum extensibility without having to change the ABI, you want the max possible alignment.17:06
daliasunfortunately 'fixing' it for or1k looks like it would be pretty much work17:06
stekernyes, I agree18:42
--- Log closed Thu Aug 21 00:00:16 2014

Generated by 2.15.2 by Marius Gedminas - find it at!