--- Log opened Sat Mar 07 00:00:11 2015
stekern | poke53281: I tested your 'mandelpar' against mor1kx's fpu now, got ~500 sec vs ~700 sec with and without -mhard-float | 09:43 |
stekern | on 4 cores | 09:43 |
stekern | I modified it to use paletted fb too | 09:46 |
stekern | my lk pull-request got merged too | 09:55 |
bandvig | stekern: good news about lk!!! | 09:56 |
bandvig | stekern: does 'mandelpar' include a lot of trigonometric functions, logarithms, square roots, pow, etc. (i.e. functions rather than plain arithmetic)? | 09:57 |
stekern | bandvig: this is the original code I got from poke53281 http://pastie.org/9692650 | 09:59 |
bandvig | stekern: I couldn't download it. "Sorry, there is no pastie #9692650 or it has been removed. Why not create a new pastie?" | 10:01 |
stekern | oh, I had it in cache obviously... | 10:02 |
stekern | http://pastie.org/10007030 | 10:03 |
bandvig | stekern: btw, which SoC you use for multicore? optimsoc? | 10:26 |
poke53281 | stekern: Great | 10:41 |
poke53281 | But 500 sec vs 700 sec is not that good. | 10:42 |
poke53281 | The code contains the log functions. Are these also executed in hardware? | 10:43 |
poke53281 | Maybe you have to compile musl also with hard-float | 10:43 |
poke53281 | Or is a log calculation provided by gcc and not libc? | 10:46 |
poke53281 | But I hope you like openmp as much as I do. | 10:48 |
bandvig | stekern: poke53281: yes, it contains log() and (I believe) at least sqrt() as a part of abs(complex). These functions aren't supported in hardware. So it should be checked whether these functions (and other ones, of course) are computed with soft-float or hard-float arithmetic. | 11:05 |
dalias | ? | 11:14 |
poke53281 | The question mark tells me that the software-implemented log and sqrt functions are provided by gcc and not musl :) | 11:15 |
dalias | no | 11:17 |
dalias | i meant the question of hard vs soft does not make sense | 11:17 |
dalias | if you have hard float and you're using it, soft float will not be used for anything | 11:18 |
dalias | sqrt and log just need to be built up from elementary (hard) float operations rather than having hardware do the whole operation as a unit | 11:18 |
poke53281 | well, the hard-float supports only very basic operations. If the program uses other numeric functions, the library which contains these functions should also be compiled with hard-float. | 11:18 |
dalias | and the way this is done is the same whether the underlying float arithmetic is hard or soft | 11:18 |
dalias | ah i see. are you thinking of a case where the app was compiled with hard-float but libc was compiled for soft? | 11:19 |
poke53281 | really? | 11:19 |
poke53281 | Yes | 11:19 |
dalias | yes. sqrt is just a .c file | 11:19 |
dalias | it doesn't care if +-*/ are implemented with hardware or software | 11:19 |
poke53281 | Yes, the C file doesn't care. But the compiled lib cares. | 11:20 |
dalias | well either way it computes it in the same manner. it's just a matter of whether the +-*/ are optimized | 11:21 |
poke53281 | Yes | 11:21 |
dalias | that's what i meant | 11:21 |
dalias | sorry for the confusion | 11:21 |
poke53281 | I wonder why the hard-float unit gives only a speed up of 40%. | 11:22 |
poke53281 | And that could be a reason. | 11:23 |
dalias | how fast is your hard float? | 11:23 |
poke53281 | I don't know. Ask bandvig and stekern. | 11:24 |
dalias | unless fpu ops are as fast (or nearly) as integer ops, hard float is probably not going to be a "huge" win | 11:24 |
poke53281 | They are probably slower. | 11:28 |
dalias | yeah | 11:28 |
bandvig | dalias: poke53281: :))) I've got a 10...20 times speed up on the Whetstone tests which use arithmetic (+-*/) only. But I don't see any improvement for the Whetstone tests which use functions. I use NewLIB. | 11:31 |
bandvig | Give me several lines, I'll put up the whole table with results. | 11:32 |
dalias | well you would need to compile newlib with hardfloat | 11:33 |
poke53281 | sorry, I don't use logf and sqrtf | 11:34 |
poke53281 | but log and sqrt, which use double. | 11:35 |
poke53281 | So maybe, this is the reason. | 11:35 |
bandvig | dalias: yes, I believe it is the path for further improvement also for 'mandelpar' | 11:35 |
bandvig | poke53281: it also must be corrected | 11:36 |
poke53281 | stekern: Please change every function to the corresponding single-precision floating point function and compile your library also with hard-float. Then test again. | 11:37 |
bandvig | poke53281: btw, you use abs(complex<float>). Does it involve sqrtf(float) or sqrt(double)? | 11:39 |
poke53281 | don't know | 11:40 |
poke53281 | well, abs includes sqrt | 11:41 |
poke53281 | better would be to calculate the square of it and compare against 4, not 2. | 11:41 |
bandvig | poke53281: perhaps it could safely be replaced with sqrtf(real(z)*real(z)+imag(z)*imag(z)) | 11:41 |
dalias | poke53281, using tgmath.c could do that automatically :-p | 11:42 |
dalias | but tgmath.h is hideous | 11:42 |
dalias | bandvig, cabsf? | 11:42 |
poke53281 | never heard of tgmath. | 11:42 |
dalias | tgmath.h was a hideous addition in c99 | 11:43 |
poke53281 | Well, I usually never use float. It is terribly inaccurate. mandelpar was never meant to be fast, just a way to test SMP with a parallelized program. | 11:44 |
dalias | yeah, float is pretty bad for most things | 11:46 |
dalias | makes sense for audio samples tho | 11:46 |
bandvig | dalias: perhaps you are right, cabsf(), I'm not very familiar with the complex lib. | 11:46 |
stekern | bandvig: poke53281: my test was just of the type "something that uses the fpu" | 11:46 |
poke53281 | http://pastie.org/10007124 | 11:47 |
poke53281 | try this | 11:47 |
dalias | btw it's unfortunate when fpus lack sqrt instruction | 11:47 |
dalias | sqrt is one of the most expensive ops to do in C | 11:47 |
dalias | because it needs to be exact/correctly-rounded, not just a good approximation | 11:48 |
stekern | not so much trying to read out the performance of the fpu | 11:48 |
bandvig | poke53281: I would make a correction: (z.real()*z.real()+z.imag()*z.imag()) <= 4.0f or (z.real()*z.real()+z.imag()*z.imag()) <= (float)4. | 11:48 |
poke53281 | Ok | 11:49 |
stekern | yeah, that shaves off 410 sec of it | 11:51 |
stekern | (i.e. 90 sec remaining) | 11:52 |
poke53281 | great | 11:53 |
stekern | and no, I didn't recompile anything other than the actual program with -mhard-float | 11:53 |
bandvig | dalias: personally I'm interested exactly in float, as I widely use it to implement acquisition/tracking algorithms in digital receivers. Using float speeds up the design cycle many times. | 11:54 |
stekern | vs 170 when compiled with softfloat | 11:56 |
poke53281 | 90 vs 170. Sounds better. | 11:59 |
poke53281 | Ok, so there might still be the log function for the color calculation. | 12:01 |
poke53281 | And the conversion float to int. | 12:01 |
stekern | bandvig: did you do any deliberate area optimisations too? I recall that the or1200 fpu was about the same size as or1200, while pfpu32 is only about half the size of mor1kx | 12:02 |
stekern | poke53281: the color calculation is precalculated in my modification | 12:02 |
poke53281 | Ok | 12:03 |
bandvig | poke53281: float <-> int are supported in FPU | 12:03 |
stekern | http://pastie.org/10007137 | 12:04 |
poke53281 | Ok, that means, that the effective speedup is around a factor of two. | 12:06 |
bandvig | stekern: in fact, the FPU was almost completely refactored. In particular, OR1200-FPU uses separate post-normalization units for each operation; Mor1kX-FPU uses common align and rounding post-operation steps. | 12:08 |
stekern | ah, ok. | 12:11 |
stekern | nice work | 12:12 |
bandvig | Thanks. Additionally, OR1200-FPU uses digit-recurrence division. In Mor1kX-FPU, Goldschmidt division is implemented and the DIV/MUL units share a multiplier. | 12:15 |
stekern | yeah, I saw your question about that on the list earlier and the commit | 12:17 |
bandvig | stekern: it looks like my last post overwrote yours with the new pastie.org reference, didn't it? | 12:18 |
stekern | sorry, couldn't parse that, what do you mean? | 12:20 |
bandvig | well, how to say... was http://pastie.org/10007137 your last post with a pastie.org reference? If "yes", don't worry. | 12:25 |
bandvig | well, next 12 lines will contain Whetstone comparison | 12:28 |
bandvig | please, be patient | 12:29 |
bandvig | Single Precision C/C++ Whetstone Benchmark | 12:29 |
bandvig | Loop content soft-float OR1200-FPU mor1kx-FPU | 12:29 |
bandvig | N1 floating point (MFLOPS) 0.409 3.200 9.600 | 12:30 |
bandvig | N2 floating point (MFLOPS) 0.336 3.360 6.720 | 12:30 |
bandvig | N3 if then else (MOPS) 0.000 0.000 0.000 | 12:30 |
bandvig | N4 fixed point (MOPS) 2.250 31.500 31.500 | 12:30 |
bandvig | N5 sin,cos etc. (MOPS) 0.019 0.020 0.020 | 12:30 |
bandvig | N6 floating point (MFLOPS) 0.409 2.075 7.706 | 12:31 |
bandvig | N7 assignments (MOPS) 0.000 0.000 0.000 | 12:31 |
bandvig | N8 exp,sqrt etc. (MOPS) 0.009 0.009 0.009 | 12:31 |
bandvig | MWIPS 0.954 1.128 1.156 | 12:31 |
bandvig | done | 12:31 |
bandvig | well, let's get back to library building flags. I've disassembled libm.a from NewLIB. I didn't find any lf.* instructions. | 13:37 |
bandvig | On the other hand, sinf(), for example, is computed by Taylor series through calls to __kernel_sinf() and __kernel_cosf(). | 13:37 |
bandvig | It means that we have to set the -mhard-float option somewhere in the makefiles to build a hard-float variant of NewLIB's libm.a with lf.* instructions. | 13:38 |
bandvig | So, could someone advise me how to do it? | 13:38 |
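[editorial aside] One common way to inject target compile flags into a newlib build is via `CFLAGS_FOR_TARGET` at configure time. The flag variable is the standard newlib/GCC-style build convention; the paths and prefix below are hypothetical, and whether this interacts correctly with the or1k multilib setup would need checking:

```shell
# Hedged sketch, untested for or1k: pass -mhard-float to the
# target compiler when building newlib. Paths are hypothetical.
mkdir build-newlib && cd build-newlib
../newlib/configure --target=or1k-elf --prefix=/opt/or1k \
    CFLAGS_FOR_TARGET="-O2 -mhard-float"
make && make install
```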
bandvig | it is interesting that I found l.mul in the disassembled libm.a. It means that at least -mhard-mul was used. Am I right? | 13:45 |
rcallan | hi. Is there a list of development boards being actively developed on? There seem to be several old webpages and dead links | 14:11 |
stekern | bandvig: -mhard-mul is default, yes | 14:25 |
bandvig | stekern: but I haven't found how -mhard-mul is provided into command line of or1k's gcc while building NewLIB. Do you know that? | 15:17 |
dalias | if it's the compiler default it doesn't need to be provided | 15:34 |
dalias | -mhard-float presumably uses hard mul unless you do -mno-hard-mul too, no? | 15:35 |
bandvig | dalias: stekern: Oh, I hadn't understood correctly. I thought -mhard-mul was provided on the NewLIB build command line by default. | 16:11 |
bandvig | stekern: btw, "$or1k-elf-gcc --target-help" lists the or1k-specific options, but it doesn't say whether any of them are active by default. Are there other default options? | 16:15 |
bandvig | dalias: Actually, I don't know about the relations between the or1k-specific options. Is there a description somewhere? | 16:18 |
dalias | i dunno either | 16:18 |
stekern | bandvig: I'm not sure if there's a way to see from the command line the default options, but you can get it from here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/common/config/or1k/or1k-common.c#L59 | 19:59 |
stekern | and all the MASK_ options can be seen here: https://github.com/openrisc/or1k-gcc/blob/or1k/gcc/config/or1k/or1k.opt | 20:00 |
stekern | from that you get the Init value of mredzone too | 20:00 |
bandvig | stekern: Thanks. I'll look at that. And I've got a genius idea. :) Let's add '(default)' to the --target-help output to mark the options activated by default. :) | 20:06 |
stekern | yeah, but to be honest, I don't have the faintest idea how the --target-help list is generated ;) | 20:14 |
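[editorial aside] Mainline GCC can in fact report the state of each target option with `-Q --help=target`, which prints an `[enabled]`/`[disabled]` marker next to every option; whether the or1k port of that era wires its defaults into this output would need to be verified. The output lines below are illustrative, not captured from a real run:

```shell
# GCC's -Q --help=target shows which target options are in effect,
# e.g. (illustrative output, not from an actual or1k toolchain):
or1k-elf-gcc -Q --help=target
#   -mhard-mul            [enabled]
#   -mhard-float          [disabled]
```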
--- Log closed Sun Mar 08 00:00:12 2015 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!