IRC logs for #openrisc Wednesday, 2014-08-20

--- Log opened Wed Aug 20 00:00:15 2014
stekern	dalias: didn't you disguss that with poke53281 already?	03:16
stekern	*discuss	03:17
stekern	I get: alignof.c:6:31: error: 'max_align_t' undeclared (first use in this function)	03:18
poke53281	you have to take the stddef from gcc.	04:12
poke53281	and in principle yes, we discussed this.	04:12
poke53281	the max_align_t was 4 and the sizeof I think was 8. It's somewhere in the logbook.	04:13
poke53281	sorry, in the chatlogs	04:13
dalias	size is definitely not 8	04:45
dalias	the min possible size is 16 (8 bytes each for long long and long double)	04:45
dalias	yeah i was just unclear on the alignment. i know the natural alignment of those types on or1k is 4, but it's also 4 on i386, whereas max_align_t gets alignment 8 on i386	04:46
dalias	because gcc has this stupid concept of separate in-struct/out-of-struct alignments and __alignas__ uses the "out of struct alignment"	04:46
dalias	so i was trying to confirm whether the "out of struct alignment" on or1k is also 4 for these types	04:47
dalias	and according to jor1k's gcc, it is	04:47
dalias	anyway, in order to have the public headers not depend on stupid gcc implementation-internals that should never have leaked (the whole point of the "out of struct alignment" is that it should be 100% non-observable; observable alignments are supposed to use the "in struct" alignment)	04:48
dalias	i think we just need per-arch definitions of max_align_t	04:49
poke53281	http://juliusbaxter.net/openrisc-irc/%23openrisc.2014-08-08.log.html 22:04	05:00
poke53281	"alignof(max_align_t) => 4 sizeof(max_align_t) => 16"	05:00
poke53281	with musl-compiled gcc but stddef.h from gcc.	05:01
poke53281	The jor1k's gcc on the official site is compiled with uclibc.	05:01
poke53281	I don't think that matters anyhow	05:03
poke53281	dump question, how did you figure it out? The stddef.h is not included. :)	05:05
poke53281	Oh wait, there is one :)	05:09
poke53281	http://pastie.org/9487968	05:21
poke53281	I wonder whether sizeof(max_align_t) has any reasonable application, dalias.	05:40
stekern	if I read the stddef.h from gcc, it's only exposed for c11/c++11?	06:16
stekern	+correctly	06:16
stekern	it=max_align_t	06:16
stekern	but yeah, 'in struct' alignment should be 8 for that	06:17
stekern	err... no	06:18
poke53281	Take my pastie. That works	06:28
poke53281	It takes the important part from stddef.h	06:28
poke53281	otherwise you have to take g++. Indeed	06:28
poke53281	according to the specification, the alignment should be 4.	06:29
poke53281	The vector types don't matter here.	06:29
poke53281	because the vector types are no basic types.	06:30
stekern	yes, but the 'in struct' alignment doesn't show from that, only the 'out of struct' (but as I understood it, that was what dalias was interested in)	06:46
stekern	anyway.. I've been annoyed by the alignment rules we have before, why aren't our long long and double aligned to 8 and why don't we have stack alignment of 8?	06:48
poke53281	hehe, don't know. I am also annoyed by our calling conventions :)	06:51
poke53281	Probably the or1k ABI was not done by an expert group, but only of one person.	06:52
stekern	you're speaking about the vararg calling convention?	06:53
poke53281	Yes	06:53
stekern	it's a bit sucky, but really, it's just bad code that breaks with it	06:54
poke53281	well, at the moment it seems, that directfb is indeed the one and only program were this is important.	06:54
poke53281	Yes	06:54
stekern	...directfb should be fixed then	06:54
stekern	regardless of what vararg calling convention we use ;)	06:55
poke53281	Yes, but at the beginning I was afraid, that a lot of programs use this conventions. Now, I seems, that it is only directfb.	06:56
stekern	iirc dalias pointed out some case on x86(_64) where that kind of assumptions break too	06:56
stekern	http://juliusbaxter.net/openrisc-irc/%23openrisc.2014-07-07.log.html#t03:53	06:59
poke53281	that would make sense for a more complicated ABI. e. g. providing the floatiing points numbers directly in the SSE registers or so. But I have never read what AMD suggested as ABI.	06:59
poke53281	good to know.	07:04
wallento	stekern: everything looks good, thanks!	07:49
stekern	wallento: great!	07:50
stekern	if I just can catch this bug I've been chasing the last week, I'd then like to start moving over multicore stuff into master	07:51
wallento	is it in the snooping?	07:51
stekern	I'm not sure where the bug actually is	07:51
wallento	btw. any planning for multicore handson at ORConf?	07:51
stekern	could even be a sw bug	07:52
stekern	I'm hoping on at least being able to give a Linux SMP demo	07:52
stekern	the problem is that one cpu takes a spinlock twice	07:53
wallento	sh*t	07:55
wallento	sounds awful to debug	07:56
stekern	it's... interesting =)	07:56
wallento	debugging is learning ;)	07:57
stekern	I can catch the occasion where the spinlock is aqcuired for the second time, but I need to get to how it get from between the initial lock/unlock to that	07:58
stekern	by my understanding, that shouldn't be able to happen under the circumstances that lock is held... but as you say, debugging is learning	07:58
stekern	I've made some assumptions that wasn't true already on the way to where I am right now, so I might have made some more mis-assumptions =)	07:59
maxpaln	well, panic() is called from within pcpu_dump_alloc_info() - which in itself is interesting because the C code for that routine doesn't include a call to panic() - pcpu_dump_alloc_info() is only called from within pcpu_setup_first_chunk(). I am guessing pcpu_setup_first_chunk() is expected behaviour - so whatever goes wrong must happen after the jump to this function.	09:48
stekern	maxpaln: http://lxr.free-electrons.com/source/mm/percpu.c#L1141	09:50
stekern	BUG_ON => panic	09:50
maxpaln	well, that is interesting - the version of percpu.c you pointed me at differs from the one in my kernel!	09:51
stekern	what kernel version are you using?	09:52
maxpaln	ah, no its not!	09:52
maxpaln	but you have made the connection that BUG_ON equates to a jump to panic() - gotcha!	09:52
maxpaln	I am a little confused though - it would seem that BUG_ON will always be called (unless ai->nr_groups == 0, which i am guessing it wouldn't be if everything were working well) which suggests BUG_ON only calls panic if the expression passed to it is true (or possibly false, but I'll assume true)	09:55
maxpaln	ny idea where BUG_ON is defined?	09:55
maxpaln	making more sense now - at least the BUG_ON bit	09:59
ysionneau	to navigate in kernel sources lxr is very convenient : http://lxr.free-electrons.com/ident?i=BUG_ON	10:00
ysionneau	http://lxr.free-electrons.com/source/include/asm-generic/bug.h#L48 < something like that	10:01
maxpaln	ysionneau: you're right, that is much more convenient!!	10:01
stekern	as you found out, BUG_ON means literally what it says, 'there is a bug on the condition within the ()'	10:07
maxpaln	:-) yeah, sometimes SW really does do what it says on the tin!	10:07
stekern	=P	10:07
maxpaln	hmm, so panic gets called because gi->nr_units is not equal to upa. Since neither of these variables means much to me I am not sure I can infer much from that. ANyone got any clues? Otherwise, back to the HW...	10:12
maxpaln	well, not so much equal as an even divisor...	10:14
stekern	it's not an ideal place to get a crash... you could try to locate from the asm where those values are loaded and check those loads	10:15
maxpaln	yeah, I was coming round to that approach...	10:16
maxpaln	I do like the implication that there are some ideal places to get a crash! :-)	10:16
stekern	well, a crash in a small function that is not inlined because of an access to a global variable, that's pretty close to ideal ;)	10:18
maxpaln	:-)	10:19
maxpaln	ok, I think I might be onto something	10:46
maxpaln	making a few leaps of faith - I think I have identified the section of assembler that relates to that modulo expression BUG_ON(gi->nr_units % upa)	10:48
maxpaln	c007e158:84 d6 00 20 l.lwz r6,0x20(r22)	10:48
maxpaln	c007e15c:e0 a6 93 09 l.div r5,r6,r18	10:48
maxpaln	c007e160:e0 45 93 06 l.mul r2,r5,r18	10:48
maxpaln	c007e164:e0 c6 10 02 l.sub r6,r6,r2	10:48
maxpaln	c007e168:bc 26 00 00 l.sfnei r6,0x0	10:48
maxpaln	My spidey-sense immediately picked up the presence of the l.mul since I already have suspsicions about that	10:49
stekern	mmm	10:49
maxpaln	tracking through the analyser (and this bit is tricky because the register reads/writes are out of sequence) I can see that r5 and r18 are both 1 but r2 gets written with 0	10:49
maxpaln	I thought I had run a mult testbench though - maybe I should go back and check the results...	10:50
stekern	at what point is r5 1?	10:51
maxpaln	it gets written with 1 on the clock cycle after the l.div completes	10:52
maxpaln	oh, hang-oin - this is interesting	10:53
stekern	it should be 2 cycles	10:53
maxpaln	yep, two clock cycles you are right	10:54
maxpaln	so, this is interesting -	10:54
maxpaln	at the start of the l.mul r5 is read	10:54
maxpaln	and contains 0	10:54
maxpaln	then 1 clock cycle later the result of the l.div writes it with 1	10:55
stekern	yeah, but that's all expected (and accounted for)	10:55
maxpaln	OK - so the l.mul operation on values of 1 and 0 really should return 0	10:56
stekern	you can't look at the register file RAM outputs, you have to look at the output to execute stage	10:56
maxpaln	is that right?	10:56
stekern	because the result from the div will be forwarded to the mul	10:56
maxpaln	aha - ok, the output to the execute stage - let me see if I can find that.	10:56
maxpaln	that would be better than trying to track the RAM outputs	10:57
stekern	it's execute_rfa_o and execute_rfb_o	10:57
stekern	what version/commit of mor1kx are you using? I've made pretty significant changes to this area lately	10:58
stekern	(but those signals are still named the same, so that info holds	11:00
stekern	)	11:00
maxpaln	hmmm, what's the best way to check - I checked it out a few weeks ago, July 29th	11:00
ysionneau	cat .git/HEAD	11:05
maxpaln	[root@localhost mor1kx]# cat .git/HEAD	11:05
maxpaln	ref: refs/heads/master	11:05
ysionneau	right :p	11:06
ysionneau	git rev-parse HEAD	11:06
ysionneau	sorry	11:06
maxpaln	:-) - I looekd at it and thought 'that isn't what he meant' but since I don't really know my way around git I decided to offer it up anyway	11:07
ysionneau	it's my fault, I should have thought about 'what if your on a branch tip'	11:07
ysionneau	try git rev-parse HEAD	11:08
stekern	...or just git log	11:08
ysionneau	yup	11:09
maxpaln	hmmm, unusually my linux machine is doing something instead of responding to me....	11:09
maxpaln	[root@localhost mor1kx]# git log	11:11
maxpaln	commit 17afdf2250c5dc4853b4032622a78c14b8dd3ec3	11:11
maxpaln	Author: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>	11:11
maxpaln	Date: Tue Jul 22 16:27:42 2014 +0300	11:11
maxpaln	think it could be lunch time while I rebuild a new FPGA image	11:12
maxpaln	back shortly...	11:12
stekern	ok, that's before the rework	11:15
stekern	so, I wouldn't expect you to find anything I might have broke then ;)	11:15
maxpaln	:-)	11:33
maxpaln	ok, back on it now	11:48
maxpaln	so at the start of the l.mul instruction execute_rfa_o and execute_rfb_o are both 1 - which is consistent with my earelier conclusions	11:48
maxpaln	and on the last clock cycle of the l.mul instruction, r2 gets written with 0	11:49
maxpaln	I am assuming the store to the destination register happens on this last clock cycle of the intruction	11:50
maxpaln	oh, hang on	11:50
maxpaln	misreading the lines	11:50
maxpaln	r2 gets written two clock cycles after the and of l.mul - it still gets written with 0 though	11:50
maxpaln	so, at this stage I am thinking this is incorrect and that the execution of the l.mul instruction is not right	11:51
maxpaln	Before I dive into this, would you agree?	11:51
stekern	yeah	11:54
stekern	you could look at mul_result too	11:55
maxpaln	coolio	11:57
maxpaln	well, that was considerably more difficult than it should have been - mainly due to my own dipshittery	14:01
maxpaln	but I can now see very clearly that mul_result is zero for all time	14:01
maxpaln	I can see op_mul_i assert, the two operands set to 1	14:01
maxpaln	and the result stay as 0	14:02
maxpaln	so diving into how this has been implemented on our hardware I think the problem is that the format of teh RTL doesn't really map into our dedicated HW blocks very well	14:02
maxpaln	it isn't clear to me why the result is always zero - I am sure there is a reason, but I guess the real question is what is the best way to resolve.	14:03
maxpaln	The PIPELINED implementation doesn't really look to map that well either - the addition of the ctrl signal to gate the multiplication is unlikely to map into anything sensible in silicon.	14:04
maxpaln	As a local fix I could easily add another implementation, something like LATTICE_MULT that simply instantiates a HW macro	14:05
maxpaln	but its not that portable and not ideal for many reasons	14:05
maxpaln	reviewing the synthesis and map logs to see if there are clues about what it is in the RTL that gets mapped incorrectly....	14:11
maxpaln	Hmmm, well at the very least this RTL looks to map inefficiently to our architecture in Synplify. A 32x32 mult should use two Multiplier slices - Synplify maps the mor1kx to 4. I suspect that somewhere in there is the root cause of my problems.	14:34
maxpaln	So I think I might understand the problem -	14:41
maxpaln	this line is, I think, aiming to keep the result of the MULT to the width of the operands (32 in this case, but it seems the code has at least provisioned for 64-bit):	14:43
maxpaln	mul_result1 <= (mul_opa * mul_opb) & {OPTION_OPERAND_WIDTH{1'b1}};	14:43
maxpaln	I suspect that this is confusing Synplify since our multiplier allows the result to naturally grow to the full bit width of the MULT operation - 64-bit in this case (128-bit in 64-bit mode)	14:44
maxpaln	By truncating the result to 32-bits using '& {OPTION_OPERAND_WIDTH{1'b1}', Synplify appears to be splitting the MULT into two operations that each produce a 32-bit result.	14:46
maxpaln	This wouldn't necessarily explain the 'always zero' result but it would explain why Synplify instantiates 4 MULTs instead of 2.	14:47
maxpaln	running an experiment now to test my hypothesis.	14:47
stekern	if that '& {OPTION_OPERAND_WIDTH{1'b1}};' causes problems, I don't see a reason to keep it	15:18
maxpaln	well, it isn't clear what the root cause is at the moment. Synplify is definitely the culprit but everything I try produces a different result to what I expect. I have discovered that the cause of the always zero result is the clock enables to the ALU section of our DSP block being tied to GND - never helpful!! But a simpler testcase based on your code didn't do the same thing! :-\|	15:20
stekern	fun ;)	15:21
maxpaln	yep :-)	15:28
maxpaln	yowser - first breakthrough	15:52
maxpaln	took a while	15:52
maxpaln	the implementation of mul_signed_overflow is at the heart of it	15:52
maxpaln	Synplify tries to merge this into the instantiation of the DSP block but makes a mess of it and ends up having no clock enable on the ALU instance. I am wondering if the presence of the pseudo clock enable op_mul_i on the operand register is causing the issue!!	15:53
maxpaln	hmm, gotta say- this is feeling like a synthesis bug rather than a mapping issue!	16:07
maxpaln	ok, false alarm - misunderstanding on my part. Back on the hunt...	16:11
maxpaln	actually, I've changed my mind again - I think this is a Synthesis bug. But the fact I can't make up my mind suggests I should come back to this with a fresh pair of eyes tomorrow.	16:18
ysionneau	:)	16:19
maxpaln	The problem appears to be that on one of the two DSP slices Synplify attaches neither a clock nor a clock enable to the ALU - this results in an all zero output almost certainly triggering the overall result of zero. I can't see how a coding style could produce this results - but a fresh look tomorrow should help.	16:20
stekern	yeah, that sounds like a bug alright.	16:22
stekern	but it's not like we don't have work-arounds for other bugs in the tools in the codebase ;)	16:23
maxpaln	:-) yeah, I am sure I can work around it - but it's nice to have it reproduced in a small testcase. I think I'll submit it to the SW group and let them deal with SYnplify - it seems to get fixed quicker that way. I'll think about a way to work around it tomorrow...	16:25
dalias	stekern, re: alignment, i think it's a mistake for any new ABI to define the alignment of any fundamental type as anything other than the largest power of two which divides the size of the type (which, for most types, is just the size of the type)	17:05
dalias	(note: that's the max possible alignment)	17:06
dalias	stekern, for maximum extensibility without having to change the ABI, you want the max possible alignment.	17:06
dalias	unfortunately 'fixing' it for or1k looks like it would be pretty much work	17:06
stekern	yes, I agree	18:42
--- Log closed Thu Aug 21 00:00:16 2014

Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!