--- Log opened Sun Sep 25 00:00:27 2016 | ||
olofk | kc5tja: Congratulations on your progress! | 00:45 |
---|---|---|
kc5tja | Thanks! | 00:50 |
mor1kx | [mor1kx] spacemonkeydelivers closed pull request #40: Introducing PCU (master...master) https://github.com/openrisc/mor1kx/pull/40 | 08:32 |
mor1kx | [mor1kx] spacemonkeydelivers opened pull request #41: Introducing PCU (master...pcu) https://github.com/openrisc/mor1kx/pull/41 | 08:40 |
mor1kx | [mor1kx] spacemonkeydelivers opened pull request #42: Fixes for saturating counter branch predictor (master...bpred_fixes) https://github.com/openrisc/mor1kx/pull/42 | 08:41 |
mor1kx | [mor1kx] spacemonkeydelivers opened pull request #43: Introducing gshare branch predictor (master...gshare) https://github.com/openrisc/mor1kx/pull/43 | 08:42 |
bandvig | I've downloaded stekern's (http://oompa.chokladfabriken.org/tmp/dhry/) dhry test | 09:45 |
bandvig | and run dhry-atlys.bin (converted to u-boot image) on the two pipelines: CAPPUCCINO and MAROCCHINO. | 09:45 |
bandvig | Initial results were: | 09:45 |
bandvig | CAPPUCCINO: DMIPS / DMIPS/MHz: 69 / 1.380000 | 09:45 |
bandvig | MAROCCHINO: DMIPS / DMIPS/MHz: 82 / 1.640000 | 09:45 |
bandvig | To achieve better results, I rebuilt the benchmark with GCC-5.3.0 | 09:45 |
bandvig | (the tool chain was built with "old" newLIB - from openrisc/or1k-src repository, | 09:46 |
SMDhome1 | bandvig: have you modified sources of dhrystone? | 09:46 |
bandvig | because we couldn't build newLIB with "mhard-float" option http://juliusbaxter.net/openrisc-irc/%23openrisc.2016-07-07.log.html) | 09:46 |
bandvig | Compiler command line was: | 09:46 |
bandvig | or1k-elf-gcc -flto -pipe -O2 -mcmov -mhard-mul -mhard-div -mhard-float -mboard=atlys dhry.c -lm -o dhry_lto.elf | 09:46 |
bandvig | In fact, in my configuration both pipelines contain a hardware multiplier, divider and l.cmov. Also "link time optimization" was activated: -flto. | 09:46 |
bandvig | Results: | 09:47 |
bandvig | CAPPUCCINO: DMIPS / DMIPS/MHz: 183 / 3.660000 | 09:47 |
bandvig | MAROCCHINO: DMIPS / DMIPS/MHz: 197 / 3.940000 | 09:47 |
bandvig | SMDhome1: I used exactly the source I downloaded from the link http://oompa.chokladfabriken.org/tmp/dhry | 09:48 |
bandvig | SMDhome1: Moreover, please let me repeat, initially I just used the pre-compiled binary: dhry-atlys.bin | 09:51 |
SMDhome1 | bandvig: I believe the results are not quite correct, because the compiler could have removed some parts of unused code (nothing got printed) | 09:51 |
SMDhome1 | Could you, please, compile and run this one: http://fossies.org/linux/privat/old/dhrystone-2.1.tar.gz/ | 09:51 |
bandvig | SMDhome1: Does the fossies source contain OR1K timer related stuff? | 09:55 |
SMDhome1 | bandvig: nope, it doesn't | 09:56 |
wallento | olofk: still don't get the vhdl library issue | 11:39 |
wallento | everything gets compiled into worklib, right? | 11:39 |
wallento | is there an example somewhere? | 11:41 |
wallento | which fails | 11:41 |
wallento | xil_defaultlib I mean | 11:43 |
wallento | olofk: added top_module | 11:52 |
ZipCPU|Laptop | bandvig SMDhome1: There are some curious differences between the oompa version and the one I've been using. | 13:50 |
ZipCPU|Laptop | I'll note two: | 13:51 |
ZipCPU|Laptop | 1. The function call to test1(10,20) before the routine, guaranteeing that all of the code will be loaded into the cache before starting. | 13:51 |
ZipCPU|Laptop | 2. The initialization of Ch_Loc to zero, allowing the compiler to then optimize based upon it. | 13:51 |
ZipCPU|Laptop | Further, the instructions for Dhrystone specifically state that the *two* Dhrystone files must be compiled separately, not as a single file. | 13:52 |
ZipCPU|Laptop | This becomes partly a linker test, then, since they need to be placed together without final optimizations available. | 13:52 |
ZipCPU|Laptop | Enabling the link-time-optimization flag disables this part of the test, and therefore artificially inflates the score. | 13:52 |
ZipCPU|Laptop | To be a valid Dhrystone measure, the dhry.c file needs to be split properly into the two component files, dhry1.c and dhry2.c. | 13:53 |
ZipCPU|Laptop | These files must then be compiled separately, linked together, and then the test may begin. | 13:53 |
bandvig | ZipCPU|Laptop: the requirement to prevent using LTO sounds strange to me. | 14:10 |
bandvig | In fact, if I want to achieve maximum performance I'm going to use any kind of suitable optimizations. And LTO is one of them. | 14:10 |
bandvig | Regarding caches. Default cache size for instructions and data is 32Kbytes for each. Dhry binary is ~83.5 Kbytes. It could not be cached completely. | 14:14 |
Dan | bandvig: The no-LTO requirement is a consequence of the Dhrystone instructions. I did not create that requirement. I'm just trying to make certain that one Dhrystone number can properly be compared to another. | 14:34 |
Dan | bandvig: As for the caches, I think 1) the impact is minimal (if any), 2) that it would only affect the first time through the loop, and 3) that the code that is actually executed is definitely small enough to fit within the respective caches. | 14:35 |
Dan | Keep in mind, a lot of the data requirement is for the statistics reporting at the end. | 14:35 |
-!- Dan is now known as ZipCPU | 14:35 | |
ZipCPU | There are lots of strings, maintained in the code space, for that reporting. These are not used until after the code has completed. | 14:36 |
ZipCPU | I'm sure the same is true of the libraries that link with it. | 14:36 |
ZipCPU | As I recall, I was able to get the entire test to fit within 1kW (4kB) when I worked with it. | 14:37 |
olofk | wallento: Add a dependency on libstorage and try this code to see that it fails without library assignment http://a6cc216f27fa3665.paste.se/ | 16:08 |
olofk | Aha! Finally found why the or1k-basic test fails. It's because of an uninitialized wire when the FPU is enabled | 16:27 |
olofk | If anyone sees bandvig, please ask him if spr_bus_ack_fpu_i is supposed to be connected to something. It's currently not, and that brings in an undefined value which causes havoc deep inside mor1kx | 16:31 |
olofk | Maybe stekern_ or wallento knows | 16:32 |
stekern_ | why are you guys so obsessed with running dhrystone? I always thought coremark was regarded as a better test. I almost exclusively used that to compare the changes I did to mor1kx. | 17:11 |
-!- stekern_ is now known as stekern | 17:12 | |
ZipCPU | stekern: I may be the source of the Dhrystone obsession. | 17:38 |
ZipCPU | I was hoping to have a comparison between the ZipCPU and mor1kx-generic to present as part of ORCONF this year. | 17:39 |
ZipCPU | While I have downloaded Coremark to examine it, I have not gotten it to run and I may not be able to get it to run without violating its rules. | 17:39 |
ZipCPU | For example, coremark depends upon a byte-size of 8-bits. On the ZipCPU, the byte-size is 32-bits. | 17:39 |
ZipCPU | This makes for all kinds of hassles when trying to port software to the ZipCPU. | 17:40 |
ZipCPU | Still ... I was looking for a benchmark. | 17:40 |
ZipCPU | I'm open to alternatives ... ? | 17:40 |
ZipCPU|Laptop | For bandvig when he returns: I just ran his code in mor1kx-generic, and got nowhere near the score he's claiming. | 19:43 |
ZipCPU|Laptop | (Even with the criticisms mentioned above ... I didn't fix those ...) | 19:43 |
ZipCPU|Laptop | I wonder if the problem is in mor1kx-generic ... that it's somehow not up to the speed of the OR1k ATLYS implementation? | 19:44 |
ZipCPU|Laptop | Okay, I can now just about reproduce bandvig's work. Looks like my big problem was not including the -mhard-div compiler flag. | 22:30 |
ZipCPU|Laptop | However, if you combine the two files, as I mentioned above, you get an (illegal) artificial performance boost of perhaps 25% or so. | 22:31 |
ZipCPU|Laptop | Since it violates the "rules" of Dhrystone, his measure remains inflated, while the one I had calculated had been much too low. | 22:31 |
ZipCPU|Laptop | Oh, and I should point out, I'm making my measurements with mor1kx-generic. While I did try or1200-generic, it was significantly slower. | 22:36 |
kc5tja | Getting all of OP-IMM instructions working (I think), but now I've broken JALR somehow. | 23:03 |
--- Log closed Mon Sep 26 00:00:06 2016 |
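
Note on the figures quoted in the log: each result is given as a DMIPS / DMIPS/MHz pair. DMIPS is conventionally the raw Dhrystones-per-second score divided by 1757 (the VAX 11/780 reference score), and DMIPS/MHz divides that by the clock frequency in MHz. The sketch below is only a sanity check on that arithmetic. The function and variable names are made up for illustration, and the clock frequency is inferred from the quoted pairs, since the log itself never states it.

```python
# Reference score of the VAX 11/780, the conventional "1 MIPS" machine
# used to normalise Dhrystone results.
VAX_DHRYSTONES_PER_SECOND = 1757


def dmips(dhrystones_per_second):
    """Convert a raw Dhrystones-per-second score to DMIPS."""
    return dhrystones_per_second / VAX_DHRYSTONES_PER_SECOND


def implied_clock_mhz(dmips_value, dmips_per_mhz):
    """Recover the clock frequency implied by a DMIPS and DMIPS/MHz pair."""
    return dmips_value / dmips_per_mhz


# (DMIPS, DMIPS/MHz) pairs exactly as quoted in the log.
# The dictionary keys are descriptive labels added for this sketch.
QUOTED_RESULTS = {
    ("CAPPUCCINO", "pre-compiled binary"): (69, 1.38),
    ("MAROCCHINO", "pre-compiled binary"): (82, 1.64),
    ("CAPPUCCINO", "GCC 5.3.0 with -flto"): (183, 3.66),
    ("MAROCCHINO", "GCC 5.3.0 with -flto"): (197, 3.94),
}

if __name__ == "__main__":
    for (pipeline, build), (score, per_mhz) in QUOTED_RESULTS.items():
        clock = implied_clock_mhz(score, per_mhz)
        print(f"{pipeline} ({build}): implied clock {clock:.1f} MHz")
```

All four quoted pairs work out to a 50 MHz clock, so the numbers are at least internally consistent. This says nothing about whether the -flto build is a valid Dhrystone measurement, which is the point under discussion in the log.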