stekern | I just added support for l.cmov in llvm, now I just need some hardware that supports it... | 06:51 |
---|---|---|
jemarch | hi | 12:42 |
juliusb | hi jose | 16:01 |
jeremybennett | Hi all | 16:02 |
juliusb | stekern: l.cmov huh? in which situations does LLVM use that? | 16:02 |
stekern | juliusb: in rather many | 16:04 |
juliusb | cool | 16:05 |
stekern | somevar = somecond ? opt1 : opt2; | 16:05 |
stekern | for example ;) | 16:05 |
juliusb | yeah that's great | 16:07 |
stekern | select_cc gets lowered into it, instead of l.sfxx; l.bf skip; | 16:07 |
juliusb | !! | 16:07 |
juliusb | that seems like a much better way to do things than branching over an assign | 16:07 |
juliusb | branching is expensive :( | 16:07 |
stekern | I'm rather certain our gcc port could use it too, if someone put in support for it | 16:08 |
juliusb | have you got any benchmark software running on LLVM yet? | 16:15 |
juliusb | or rather, compiling on LLVM yet which you can then run and evaluate? | 16:15 |
juliusb | I guess you need the HW support for l.cmov as you mentioned | 16:15 |
stekern | or1ksim does support it, but that one isn't really reliable for benchmarking, so I'm planning on implementing it in your mor1kx baby ;;) | 16:18 |
stekern | as far as benchmarks go, I've run dhrystone and coremark on or1200 and mor1kx comparing gcc4.5.1, gcc4.8.0 and llvm | 16:22 |
stekern | result is: mor1kx is about 33% faster in coremark than or1200, in the order of speed: gcc4.8, gcc4.5.1, llvm | 16:24 |
stekern | gcc4.8 produces better results than gcc4.5.1 on mor1kx, while gcc4.5.1 produces better results than gcc4.8.0 on or1200 | 16:26 |
stekern | mor1kx is faster in dhrystone than or1200 | 16:28 |
juliusb | cool | 16:28 |
juliusb | wow one third faster, i'm a little bit surprised | 16:28 |
juliusb | that branching thing must be cheaper in mor1kx ;) | 16:29 |
stekern | guess so :) | 16:29 |
stekern | grepping for l.cmov in coremark shows 14 instances when compiled with cmov support, will be interesting to see the results when compared to it compiled without it | 16:42 |
stekern | and how much area it will steal to implement it | 16:43 |
stekern | shouldn't be that much, a mux and a bit of control logic | 16:44 |
juliusb | yeah, I'd expect marginal increase in area | 16:48 |
juliusb | if you put it through the ALU, and just mux out the result based on the flag | 16:48 |
stekern | yup, that's the plan ;) | 16:48 |
juliusb | cmov_result = flag ? a : b; | 16:48 |
stekern | exactly, my guess is that it's more work feeding back the flag into the alu (i.e. wiring in the top module) than to implement it =p | 17:34 |
juliusb | you could be lazy and just register it there in the ALU too, to save doing the wiring ;) | 17:42 |
juliusb | we calculate it in the ALU and when to write it, so you have the info in that module | 17:42 |
juliusb | :P | 17:43 |
juliusb | maybe the synthesis tool would be smart enough to optimise it away, but i doubt it | 17:43 |
stekern | yeah, I saw that, but there's some glue logic to sr[f] and the clear/set so I think it still might be clearer to feed back the "ready-made" flag | 17:45 |
juliusb | ya | 18:45 |
* juliusb agrees | 18:45 |
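
A rough sketch of the two lowerings discussed above, modelled in plain C rather than OpenRISC assembly. It assumes `l.cmov` behaves as juliusb wrote it (`cmov_result = flag ? a : b`), with the flag produced by an earlier `l.sfxx`-style compare. The type `or1k_sr_t` and the functions `set_flag_gts`, `cmov`, `max_branch` and `max_cmov` are hypothetical names made up for illustration; they are not taken from LLVM, or1ksim or mor1kx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the supervision register flag bit, SR[F]. */
typedef struct {
    bool flag;
} or1k_sr_t;

/* Models an l.sfxx-style compare (here: signed greater-than) that sets the flag. */
void set_flag_gts(or1k_sr_t *sr, int32_t a, int32_t b)
{
    sr->flag = a > b;
}

/* Models l.cmov as described in the log: a plain mux on the flag. */
int32_t cmov(const or1k_sr_t *sr, int32_t a, int32_t b)
{
    return sr->flag ? a : b;
}

/* Branching form: compare, then conditionally skip over an assignment. */
int32_t max_branch(int32_t x, int32_t y)
{
    or1k_sr_t sr;
    int32_t result = x;      /* assume x is the larger value */

    set_flag_gts(&sr, x, y);
    if (!sr.flag) {          /* flag clear: take the other assignment */
        result = y;
    }
    return result;
}

/* Conditional-move form: compare, then select without any control flow. */
int32_t max_cmov(int32_t x, int32_t y)
{
    or1k_sr_t sr;

    set_flag_gts(&sr, x, y);
    return cmov(&sr, x, y);
}

/* Small self-check that both forms agree on a few inputs. */
void check_equivalence(void)
{
    assert(max_branch(3, 7) == max_cmov(3, 7));
    assert(max_branch(7, 3) == max_cmov(7, 3));
    assert(max_branch(-1, -5) == max_cmov(-1, -5));
    assert(max_branch(4, 4) == max_cmov(4, 4));
}
```

Both functions compute the same value; the difference is only that `max_cmov` has no branch to resolve, which is the property stekern and juliusb are after. Whether a given compiler actually emits `l.cmov` for such code depends on the target options and is not shown here.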