--- Log opened Sat Mar 07 00:00:11 2015
stekern | poke53281: I tested your 'mandelpar' against mor1kx's fpu now, got ~500 sec vs ~700 sec with and without -mhard-float | 09:43 |
stekern | on 4 cores | 09:43 |
stekern | I modified it to use paletted fb too | 09:46 |
stekern | my lk pull-request got merged too | 09:55 |
bandvig | stekern: good news about lk!!! | 09:56 |
bandvig | stekern: does 'mandelpar' include a lot of trigonometric functions, logarithms, square roots, pow, etc. (i.e. functions rather than plain arithmetic)? | 09:57 |
stekern | bandvig: this is the original code I got from poke53281 http://pastie.org/9692650 | 09:59 |
bandvig | stekern: I couldn't download it. "Sorry, there is no pastie #9692650 or it has been removed. Why not create a new pastie?" | 10:01 |
stekern | oh, I had it in cache obviously... | 10:02 |
stekern | http://pastie.org/10007030 | 10:03 |
bandvig | stekern: btw, which SoC you use for multicore? optimsoc? | 10:26 |
poke53281 | stekern: Great | 10:41 |
poke53281 | But 500 sec vs 700 sec is not that good. | 10:42 |
poke53281 | The code contains the log functions. Are these also executed in hardware? | 10:43 |
poke53281 | Maybe you have to compile musl also with hard-float | 10:43 |
poke53281 | Or is a log calculation provided by gcc and not libc? | 10:46 |
poke53281 | But I hope you like openmp as much as I do. | 10:48 |
bandvig | stekern: poke53281: yes, it contains log() and (I believe) at least sqrt() as a part of abs(complex). These functions aren't supported in hardware. So it should be checked whether these functions (and other ones, of course) are computed with soft-float or hard-float arithmetic. | 11:05 |
dalias | ? | 11:14 |
poke53281 | The question mark tells me that the software-implemented log and sqrt functions are provided by gcc and not musl :) | 11:15 |
dalias | no | 11:17 |
dalias | i meant the question of hard vs soft does not make sense | 11:17 |
dalias | if you have hard float and you're using it, soft float will not be used for anything | 11:18 |
dalias | sqrt and log just need to be built up from elementary (hard) float operations rather than having hardware do the whole operation as a unit | 11:18 |
poke53281 | well, the hard-float supports only very basic operations. If the program uses other numeric functions, the library which contains these functions should also be compiled with hard-float. | 11:18 |
dalias | and the way this is done is the same whether the underlying float arithmetic is hard or soft | 11:18 |
dalias | ah i see. are you thinking of a case where the app was compiled with hard-float but libc was compiled for soft? | 11:19 |
poke53281 | really? | 11:19 |
poke53281 | Yes | 11:19 |
dalias | yes. sqrt is just a .c file | 11:19 |
dalias | it doesn't care if +-*/ are implemented with hardware or software | 11:19 |
poke53281 | Yes, the C file doesn't care. But the compiled lib cares. | 11:20 |
dalias | well either way it computes it in the same manner. it's just a matter of whether the +-*/ are optimized | 11:21 |
poke53281 | Yes | 11:21 |
dalias | that's what i meant | 11:21 |
dalias | sorry for the confusion | 11:21 |
poke53281 | I wonder why the hard-float unit gives only a speed up of 40%. | 11:22 |
poke53281 | And that could be a reason. | 11:23 |
dalias | how fast is your hard float? | 11:23 |
poke53281 | I don't know. Ask bandvig and stekern. | 11:24 |
dalias | unless fpu ops are as fast (or nearly) as integer ops, hard float is probably not going to be a "huge" win | 11:24 |
poke53281 | They are probably slower. | 11:28 |
dalias | yeah | 11:28 |
bandvig | dalias: poke53281: :))) I've got a 10...20 times speed up on the Whetstone tests which use arithmetic (+-*/) only. But I don't see any improvement for the Whetstone tests which use functions. I use NewLIB. | 11:31 |
bandvig | Give me several lines, I'll put up the whole table with results. | 11:32 |
dalias | well you would need to compile newlib with hardfloat | 11:33 |
poke53281 | sorry, I don't use logf and sqrtf | 11:34 |
poke53281 | but log and sqrt, which use double. | 11:35 |
poke53281 | So maybe, this is the reason. | 11:35 |
bandvig | dalias: yes, I believe it is the path for further improvement also for 'mandelpar' | 11:35 |
bandvig | poke53281: it also must be corrected | 11:36 |
poke53281 | stekern: Please change every function to the corresponding single-precision floating point function and compile your library also with hard-float. Then test again. | 11:37 |
bandvig | poke53281: btw, you use abs(complex<float>). Does it involve sqrtf(float) or sqrt(double)? | 11:39 |
poke53281 | don't know | 11:40 |
poke53281 | well, abs includes sqrt | 11:41 |
poke53281 | better would be to calculate the square of it and compare against 4, not 2. | 11:41 |
bandvig | poke53281: perhaps it could safely be replaced with sqrtf(real(z)*real(z)+imag(z)*imag(z)) | 11:41 |
dalias | poke53281, using tgmath.c could do that automatically :-p | 11:42 |
dalias | but tgmath.h is hideous | 11:42 |
dalias | bandvig, cabsf? | 11:42 |
poke53281 | never heard of tgmath. | 11:42 |
dalias | tgmath.h was a hideous addition in c99 | 11:43 |
poke53281 | Well, I usually never use float. It is terribly inaccurate. mandelpar was never meant to be fast, just a way to test SMP with a parallelized program. | 11:44 |
dalias | yeah, float is pretty bad for most things | 11:46 |
dalias | makes sense for audio samples tho | 11:46 |
bandvig | dalias: perhaps you are right, cabsf(), I'm not very familiar with the complex lib. | 11:46 |
stekern | bandvig: poke53281: my test was just of the type "something that uses the fpu" | 11:46 |
poke53281 | http://pastie.org/10007124 | 11:47 |
poke53281 | try this | 11:47 |
dalias | btw it's unfortunate when fpus lack sqrt instruction | 11:47 |
dalias | sqrt is one of the most expensive ops to do in C | 11:47 |
dalias | because it needs to be exact/correctly-rounded, not just a good approximation | 11:48 |
stekern | not so much trying to read out the performance of the fpu | 11:48 |
bandvig | poke53281: I would make a correction: (z.real()*z.real()+z.imag()*z.imag()) <= 4.0f or (z.real()*z.real()+z.imag()*z.imag()) <= (float)4. | 11:48 |
poke53281 | Ok | 11:49 |
stekern | yeah, that shaves off 410 sec of it | 11:51 |
stekern | (i.e. 90 sec remaining) | 11:52 |
poke53281 | great | 11:53 |
stekern | and no, I didn't recompile anything other than the actual program with -mhard-float | 11:53 |
bandvig | dalias: personally I'm interested exactly in float, as I widely use it to implement acquisition/tracking algorithms in digital receivers. Using float speeds up the design cycle many times. | 11:54 |
stekern | vs 170 when compiled with softfloat | 11:56 |
poke53281 | 90 vs 170. Sounds better. | 11:59 |
poke53281 | Ok, so there might still be the log function for the color calculation. | 12:01 |
poke53281 | And the conversion float to int. | 12:01 |
stekern | bandvig: did you do any deliberate area optimisations too? I recall that the or1200 fpu was about the same size as or1200, while pfpu32 is only about half the size of mor1kx | 12:02 |
stekern | poke53281: the color calculation is precalculated in my modification | 12:02 |
poke53281 | Ok | 12:03 |
bandvig | poke53281: float <-> int are supported in FPU | 12:03 |
stekern | http://pastie.org/10007137 | 12:04 |
poke53281 | Ok, that means, that the effective speedup is around a factor of two. | 12:06 |
bandvig | stekern: in fact, the FPU was almost completely refactored. In particular, OR1200-FPU uses separate post-normalization units for each operation; Mor1kX-FPU uses common align and rounding post-operation steps. | 12:08 |
stekern | ah, ok. | 12:11 |
stekern | nice work | 12:12 |
bandvig | Thanks. Additionally, OR1200-FPU uses digit-recurrence division. In Mor1kX-FPU, Goldschmidt division is implemented and the DIV/MUL units share a multiplier. | 12:15 |
stekern | yeah, I saw your question about that on the list earlier and the commit | 12:17 |
bandvig | stekern: it looks like my last post overwrote yours with the new pastie.org reference, didn't it? | 12:18 |
stekern | sorry, couldn't parse that, what do you mean? | 12:20 |
bandvig | well, how to say... was http://pastie.org/10007137 your last post with a pastie.org reference? If "yes", don't worry. | 12:25 |
bandvig | well, next 12 lines will contain Whetstone comparison | 12:28 |
bandvig | please, be patient | 12:29 |
bandvig | Single Precision C/C++ Whetstone Benchmark | 12:29 |
bandvig | Loop content soft-float OR1200-FPU mor1kx-FPU | 12:29 |
bandvig | N1 floating point (MFLOPS) 0.409 3.200 9.600 | 12:30 |
bandvig | N2 floating point (MFLOPS) 0.336 3.360 6.720 | 12:30 |
bandvig | N3 if then else (MOPS) 0.000 0.000 0.000 | 12:30 |
bandvig | N4 fixed point (MOPS) 2.250 31.500 31.500 | 12:30 |
bandvig | N5 sin,cos etc. (MOPS) 0.019 0.020 0.020 | 12:30 |
bandvig | N6 floating point (MFLOPS) 0.409 2.075 7.706 | 12:31 |
bandvig | N7 assignments (MOPS) 0.000 0.000 0.000 | 12:31 |
bandvig | N8 exp,sqrt etc. (MOPS) 0.009 0.009 0.009 | 12:31 |
bandvig | MWIPS 0.954 1.128 1.156 | 12:31 |
bandvig | done | 12:31 |
bandvig | well, let's get back to library building flags. I've disassembled libm.a from NewLIB. I didn't find any lf.* instructions. | 13:37 |
bandvig | On the other hand, sinf(), for example, is computed by Taylor series through calls to __kernel_sinf() and __kernel_cosf(). | 13:37 |
bandvig | It means that we have to set the -mhard-float option somewhere in the makefiles to build a hard-float variant of NewLIB's libm.a with lf.* instructions. | 13:38 |
bandvig | So, could someone advise me how to do it? | 13:38 |
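[editorial aside] One common way to inject target compile flags into a newlib build is via `CFLAGS_FOR_TARGET` at configure time. The flag variable is the standard newlib/GCC-style build convention; the paths and prefix below are hypothetical, and whether this interacts correctly with the or1k multilib setup would need checking:

```shell
# Hedged sketch, untested for or1k: pass -mhard-float to the
# target compiler when building newlib. Paths are hypothetical.
mkdir build-newlib && cd build-newlib
../newlib/configure --target=or1k-elf --prefix=/opt/or1k \
    CFLAGS_FOR_TARGET="-O2 -mhard-float"
make && make install
```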
bandvig | it is interesting that I found l.mul in the disassembled libm.a. It means that at least -mhard-mul was used. Am I right? | 13:45 |
rcallan | hi. Is there a list of development boards being actively developed on? There seem to be several old webpages and dead links | 14:11 |
stekern | bandvig: -mhard-mul is default, yes | 14:25 |
bandvig | stekern: but I haven't found how -mhard-mul is provided into command line of or1k's gcc while building NewLIB. Do you know that? | 15:17 |
dalias | if it's the compiler default it doesn't need to be provided | 15:34 |
dalias | -mhard-float presumably uses hard mul unless you do -mno-hard-mul too, no? | 15:35 |
bandvig | dalias: stekern: Oh, I hadn't understood correctly. I thought -mhard-mul was provided on the NewLIB build command line by default. | 16:11 |
bandvig | stekern: btw, "$or1k-elf-gcc --target-help" lists the or1k-specific options, but it doesn't say whether any of them are active by default. Are there other default options? | 16:15 |
bandvig | dalias: Actually, I don't know about the relations between the or1k-specific options. Is there a description somewhere? | 16:18 |
dalias | i dunno either | 16:18 |
stekern | bandvig: I'm not sure if there's a way to see from the command line the default options, but you can get it from here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/common/config/or1k/or1k-common.c#L59 | 19:59 |
stekern | and all the MASK_ options can be seen here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/config/or1k/or1k.opt | 20:00 |
stekern | from that you get the Init value of mredzone too | 20:00 |
bandvig | stekern: Thanks. I'll look at that. And I've got a genius idea. :) Let's add '(default)' to the --target-help output to mark the options activated by default. :) | 20:06 |
stekern | yeah, but to be honest, I don't have the faintest idea how the --target-help list is generated ;) | 20:14 |
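[editorial aside] Mainline GCC can in fact report the state of each target option with `-Q --help=target`, which prints an `[enabled]`/`[disabled]` marker next to every option; whether the or1k port of that era wires its defaults into this output would need to be verified. The output lines below are illustrative, not captured from a real run:

```shell
# GCC's -Q --help=target shows which target options are in effect,
# e.g. (illustrative output, not from an actual or1k toolchain):
or1k-elf-gcc -Q --help=target
#   -mhard-mul            [enabled]
#   -mhard-float          [disabled]
```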
--- Log closed Sun Mar 08 00:00:12 2015 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!