IRC logs for #openrisc Tuesday, 2012-12-18

ErantAnyone familiar with the minsoc?01:07
ErantI'm trying to get it to work with my Atlys board, and the GENERIC_TAP, but it's throwing out errors about tap_top and adbg_top01:08
ErantI tried adding tap_top.v to the list of Verilog files in minsoc_top.prj, and I added jtag_top.xst to the list of synthesis files, but still:01:08
ErantERROR:HDLCompiler:1654 - "/home/parallels/minsoc/prj/../rtl/verilog/minsoc_top.v" Line 505: Instantiating <tap_top> from unknown module <tap_top>01:08
ErantI'm using XST 14.3, under Linux.01:09
ErantIt works (or at least synthesizes fine), if I select the internal scan variant.01:09
ErantIt looks like the tap_top.v needs to be one of the blackboxes.02:12
wazIs there any plan for a 64-bit implementation?02:30
wazOn the other side, which fmax have you reached in an FPGA?02:32
juliusbother side hey?02:36
juliusbi didn't realise the otherside of the 64-bit implementation question was maximum frequency, an interesting take02:36
juliusbanyway, no 64-bit implementation02:36
juliusbyou'd need the compiler to support it, as well as the SOC you're running on02:36
juliusbno one's done it, no real need02:37
juliusbmaximum frequency, that's another question02:39
juliusbit depends on the RTL implementation, the FPGA, the constraints02:39
juliusbfastest on cheap commodity FPGAs like spartan 6 is about 80-100MHz02:39
juliusbErant: sorry, not sure about that myself02:40
juliusbperhaps a module is missing, the actual TAP02:40
Erantjuliusb: It looks like the tap was missing, but adding it wasn't as obvious to me as it should've been.02:41
ErantIt's synthesizing now.02:41
ErantWe'll see if it works02:41
wazSorry, I should have said "on the other hand" or moreover.02:51
ErantWe'll see if it works, if the damned physical synthesis didn't hang.02:52
wazI would want to know the possible fmax on both low-end and high-end Altera devices. I'm doing a comparison between OpenRISC and Nios II.02:54
Erantwaz: fmax doesn't tell you much. You're probably much more interested in the MIPS, for example.02:56
ErantOr FLOPS02:56
wazI know that fmax is not enough for comparison purposes, but it tells me something about the optimization of the pipelines. I prefer CoreMark.02:56
ErantAnyway, I just synthed minsoc (a minimal SoC based on the OpenRISC), and the synthesizer says ~125MHz on a Spartan6.02:57
wazPretty good I think. For example, if I remember correctly, the Leon-3 can't exceed 100 MHz in low-end devices.03:05
ErantI wouldn't call the Spartan6 truly low-end (Though I should try and run this for my Virtex-4 and Stratix-II)03:07
wazCyclones and Spartans are low-end, according to their manufacturers.03:10
ErantMjeh, fine.03:12
wazWhy is there no real need for 64-bit computing?03:17
juliusbthere's no real need for a 64-bit OpenRISC03:17
juliusbI don't see the application03:18
juliusbit just makes it bigger, probably slower03:18
juliusbmy focus is on embedded computing, anyway03:18
juliusbi don't see any use03:18
wazNo "current need" perhaps, but we better prepare for the next challenges of tomorrow.03:19
wazLarger address spaces come to my mind.03:20
wazTake as an example that Altera has expanded the support in its system generator (QSys) for address buses larger than 32 bits.03:21
wazRAM is pretty cheap these days.03:21
wazAlso, how can we know how much slower it will be, if it's never been implemented?03:24
waz(OK, fmax goes down, but it's still worth trying)03:25
Erantwaz: More memory can always be solved with a clever MMU.04:06
wazSimple is almost always better. Look at PAE in x86. Barely used. Programmers prefer a linear address space.04:34
wazMore memory could possibly suppress the need for demand-paging in many systems.04:40
waz(I'm starting to ramble)04:41
stekernwaz: if you want coremark scores, this is mor1kx running @80 MHz on an atlys board (spartan6)05:38
stekernI get the same result running 80 MHz on a de0 nano board (cyclone IV)05:38
stekernI don't agree with juliusb that there aren't applications for 64-bit, it's just that it hasn't scratched anyone's itch enough to implement it05:42
stekernI mean if there aren't applications, why are there 64-bit versions of MIPS?05:42
wazstekern: Thanks for the info. Those 80MHz were the maximum achievable?05:43
stekernyes, at least in orpsoc05:44
stekernthat's a lot better than or1200 though, that doesn't go over 50MHz05:44
stekern(don't know how Erant got it to go to 125MHz on minsoc, that doesn't sound feasible)05:45
stekernwaz: I'm hoping to further improve both the fmax and MIPS on the mor1kx cappuccino pipeline though05:51
stekernwould be interesting to see a comparison with Nios II in an otherwise identical soc05:54
stekernMy guess is that it's faster (both fmax and mips wise), but that's subject to (hopefully) change ;)05:57
stekernthey have the advantage of being a (vendor-specific) FPGA only implementation05:58
stekernthat allows them to take a couple of shortcuts that we try to avoid05:59
Erantstekern: Eh, it's what the synthesizer put out. We'll see after PAR06:10
ErantWhich, for some reason, is taking a _really_ long time...06:12
ErantLike, I kicked it off an hour ago.06:12
ErantAnd it's in Phase 5 of the PAR06:13
ErantTotal REAL time to MAP completion:  57 mins 48 secs06:14
ErantThat's on a Linux VM with 2 cores assigned to it, Intel i7 at 3.4GHz, and 4GB RAM.06:14
ErantAnd it's not even done PARing06:15
wazstekern: I think that we will need to compare the complete systems under similar loads. With "load" I mean without inadvertently congesting the generated netlist and provoking lower fmax (after adding peripherals, etc).06:15
wazMay I ask what subsets of the specification are implemented in minsoc?06:16
Erantwaz: It's the same core. The SoC surrounding it is smaller.06:17
Erantor1200, that is06:18
stekernmor1kx cappuccino is pretty similar to or1200 in what is implemented, the biggest thing missing atm is MMUs06:20
stekern(i.e. or1200 has them, mor1kx doesn't)06:20
stekernor1200 has FPU too, where mor1kx doesn't06:21
stekernthere's some descriptions on the different implementations06:23
wazIt only contains ORBIS32 then. With MMU and FPU the fmax should be lower, so a full or1200 is slower.06:26
wazAt least, I guess that.06:27
stekernnot necessarily06:31
stekernthe fpu can be pretty decoupled from the critical paths (at least I imagine it could)06:31
wazIn an ideal world. However, even manufacturers are afraid of adding FPUs because of critical paths (Microblaze and Nios II are the proof).06:34
stekernis that really the reason?06:35
stekernI mean, if that's the case, wouldn't that be a trade-off choice for the user then06:36
wazOh no, it's not the reason. My point is that with an FPU, chances are high that fmax will drop. Even the highly optimized and specific implementations of the FPGA vendors don't do it (although marketing reasons definitely count for them, at least for clients that get excited by higher frequencies).06:46
ErantTotal REAL time to PAR completion: 45 mins 48 secs <-- There has to be something wrong here :/07:03
ErantMAP + PAR ended up being an hour and 40 minutes.07:04
stekernErant: it usually takes ~25 min to fully synth+par orpsoc for atlys at my ws07:36
stekernactually or1200 has a bit higher fmax than I thought on cyclone iv when MMUs are disabled, I get around 72 MHz using the same setup as for mor1kx cappuccino (where I got 80 MHz)07:43
stekernwaz: and probably the demand for it doesn't live up to the cost of implementing one07:47
stekernwith mmus enabled or1200 get around 62 MHz on the cyclone iv07:51
wazstekern: with both MMU and FPU enabled?07:52
stekernno, only MMUs07:52
stekernFPU has been disabled on all runs07:53
stekernI'm running a MMUs disabled FPU enabled test run now07:53
wazOnly ORBIS32?07:53
wazIf that's the case, there is a bottleneck there.07:54
stekernin the MMU?07:54
wazNo, in the pipeline. And also I forgot if both I and D caches are included.07:54
stekernboth are included (both on or1200 and mor1kx cappuccino)07:55
wazHow large is the cost in the fmax of adding precise interrupts (or the whole interrupt subsystem)? If it can be disabled.07:57
wazfor testing.07:57
stekernhaven't tested07:58
wazIt could be an interesting test.07:58
stekernyou still have exceptions that you really can't disable though07:58
stekernwell you could, but...07:58
wazSure, you'll need to do heavy modifications.07:59
wazHow many stages does it have?07:59
wazpipeline stages07:59
stekernI reckon there's some critical paths there, but not much you can do about them if you want it to be usable still ;)07:59
stekernor1200 has 4, mor1kx cappuccino has 607:59
stekern... so there's a lot more room to move things around in cappuccino to resolve the bottlenecks08:00
waz4 is too low.08:00
wazDefinitely a very critical path has formed.08:01
stekernI agree, but it depends on your application08:01
stekernshorter pipeline => you can get away with smaller implementation08:01
wazDo you refer to trading area vs timing?08:01
stekern... something or1200 fails to do though ;)08:01
wazOK. I got it. Still I think that some people may want maximum performance.08:02
stekernI agree, that's why I'm working on that on the mor1kx cappuccino08:02
wazEven Nios II, which has almost the same features, has 5 stages in the minimum pipelined version.08:02
stekernthat's what's great about the mor1kx, it's pretty easy to modify the pipelines in it08:03
stekernjuliusb (the mor1kx founder) has a 3-stage pipeline version of it as well08:04
wazMay I ask something about a specific point? I'm thinking about how costly it is to implement loads with a granularity lower than 32 bits.08:04
wazI mean, byte and half-word loads.08:05
wazYou need to mux to select the value you just got from the cache (assume a hit).08:05
stekernactually, that's a pretty interesting thought08:06
stekernit is pretty critical path, I agree08:06
stekernyou could make byte and half-word accesses slower (i.e. registering those results) and possibly gain something there08:07
stekernsomething that should be pretty easy to test too08:07
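The selection step waz describes (pick a byte or half-word out of the 32-bit word returned by the cache, then extend it) can be sketched in software. This is only an illustrative model, assuming big-endian byte lanes and aligned half-words; the function name `extract_load` and its parameters are hypothetical and are not taken from the or1200 or mor1kx RTL.

```python
def extract_load(word, addr, size, signed):
    """Model the byte/half-word mux that follows a 32-bit cache read.

    word   -- 32-bit value returned by the cache (assumed big-endian lanes)
    addr   -- byte address of the access; only the low two bits are used
    size   -- access size in bytes: 1, 2 or 4
    signed -- True to sign-extend, False to zero-extend
    """
    if size == 4:
        return word & 0xFFFFFFFF
    offset = addr & 0x3
    if size == 2:
        offset &= 0x2  # assumption: half-word accesses are aligned
    # Big-endian: byte offset 0 selects the most significant lane.
    shift = (4 - size - offset) * 8
    mask = (1 << (size * 8)) - 1
    value = (word >> shift) & mask
    if signed and (value >> (size * 8 - 1)) & 1:
        value |= 0xFFFFFFFF & ~mask  # fill the upper bits with the sign
    return value


print(hex(extract_load(0x11223344, 0, 1, False)))
print(hex(extract_load(0x11223344, 2, 2, False)))
```

In hardware the shift and the extension are a multiplexer tree on the load data path, which is why the chat treats it as a candidate critical path; registering the sub-word result, as stekern suggests, would trade an extra cycle on those accesses for a shorter path.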
stekernok, FPU enabled and MMU disabled results in an fmax of 73 MHz (so 1 MHz faster than without)08:09
wazI found out some days ago that the Alpha architecture only allows 32-bit and 64-bit loads (being a 64-bit architecture). DEC people realized that smaller loads would be too costly for its superscalar implementation. Here we talk about scalar ones of course, but still -as you say- it's quite costly.08:10
wazI was almost shocked about this clever choice.08:11
wazstekern, the results are interesting. Evidence that synthesis realms are sometimes mysterious.08:14
stekernyeah, I wouldn't read into that 1 MHz increase too much ;)08:15
wazHow much larger is the area?08:15
stekernI didn't look08:16
stekernbut IIRC, the FPU is pretty large08:16
wazAh OK.08:16
stekerncould possibly be implemented smaller/better too08:16
stekernI think juliusb just took an existing FPU core and hooked it up to or120008:17
stekernhe can correct me if I'm wrong ;)08:17
wazIt would be interesting to see the STA report.08:19
stekernmmus enabled and fpu enabled => 65 MHz08:20
wazYou should publish online some runs, given how long it takes.08:20
wazThat's pretty much expected.08:21
stekernthat's from the last run, with mmus and fpu enabled08:23
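To put the four or1200 runs side by side, here is a small sketch that tabulates the fmax figures quoted above and computes each one's change relative to the MMUs-off, FPU-off run. The numbers are the approximate values from the chat; that all four runs used the same Cyclone IV setup is an assumption, and the names used here are purely illustrative.

```python
# Approximate or1200 fmax figures (MHz) quoted in the log.
# Keys are (mmus_enabled, fpu_enabled).
OR1200_FMAX_MHZ = {
    (False, False): 72,
    (True, False): 62,
    (False, True): 73,
    (True, True): 65,
}


def percent_change(fmax_mhz, baseline_mhz):
    """Relative change of fmax against the baseline, in percent."""
    return (fmax_mhz - baseline_mhz) / baseline_mhz * 100.0


def summarize(table):
    """Return (mmus, fpu, fmax, delta_percent) rows, baseline = both off."""
    baseline = table[(False, False)]
    rows = []
    for (mmus, fpu), fmax in sorted(table.items()):
        rows.append((mmus, fpu, fmax, round(percent_change(fmax, baseline), 1)))
    return rows


for mmus, fpu, fmax, delta in summarize(OR1200_FMAX_MHZ):
    mmu_text = "on" if mmus else "off"
    fpu_text = "on" if fpu else "off"
    print(f"MMUs {mmu_text:3} FPU {fpu_text:3} {fmax} MHz ({delta:+.1f}%)")
```

With single runs and "around" figures, differences of a few MHz are within the noise of place and route, so the table is better read as a trend than as a measurement.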
wazThank you. Question: why don't you enable parallel compilation?08:27
stekernhere's the map report:
stekern|or1200_fpu:or1200_fpu | 4071 (44) ; 1263 (37)08:28
stekernabout as large as the rest of the cpu ;)08:29
stekernparallel compilation, that's not available in the web version, no?08:29
wazIn the last version (12.1) if you enable TalkBack it's free.08:34
waz(and also the 64-bit version is included)08:34
stekernah, ok08:35
stekernperhaps time to upgrade ;)08:35
stekerndoes the parallel compilation make a huge difference though?08:37
stekernin ISE it's hardly noticeable08:37
wazI need to test it more, but I read in forums that about 20~35%08:37
wazNot much really in old versions.08:38
wazBut perhaps it has been enhanced.08:38
stekernok, that's pretty much08:38
stekerncompilation times aren't unbearable as it is now anyways08:39
wazAt least the environment feels snappier (with fancy graphics included).08:39
stekernoh, I don't use the gui much... ;)08:39
stekernnothing is as snappy as a b&w console you know08:40
wazFor batch building, nothing better.08:40
wazBut I must say that Quartus is a quite simple IDE. Feels snappy in part because of that.08:42
stekernyeah, I agree, it is good08:42
stekernand they have fixed a very annoying "feature" they had in the older versions08:43
stekernif you would press ctrl-x in the editor, it would cut the current line, even if nothing was selected08:43
wazI've never seen that.08:44
stekernyou could imagine what happens when an emacs user is doing 'ctrl-x ctrl-s' by old habit08:44
stekernwell, it bit me a couple of times (I believe it was in version 9 something)08:45
wazPretty bad behavior.08:45
wazBut it doesn't compare with what happened in some previous version. The text editor went crazy.08:46
wazThe lines blended in front of you.08:47
wazIf you didn't scroll to refresh the drawn content and you saved the file, the lines blended, well, disappeared.08:49
stekernheh, that's nice...08:49
stekernI haven't figured out how to get a textual representation of the Worst-Case paths though08:50
wazThe timing closure recommendations?08:50
stekernI mean the path reporting in the TimeQuest UI08:51
wazTry exporting it as HTML08:52
stekernwouldn't I need to do that in the ui?08:55
stekernI'd like to avoid starting the ui all together08:55
wazI think that the "Report Timing Closure Recommendations" is more useful.08:56
wazCould you please export the "Long Combinational Path" report?08:57
wazI didn't understand what you asked about needing to do it in the UI.08:58
stekernI meant, to be able to export to HTML, don't I have to start the UI to do that?08:59
wazNo, just right click on the name of the generated report.09:00
wazExport and select HTML file type.09:00
stekernright clicking in the commandline isn't doing anything for me, am I doing it wrong? ;)09:02
stekernI was probably not clear enough, I want to generate that report from the commandline09:03
wazOh, I can't help you much. I barely use the command line (for Nios).09:05
stekernwhere's that "Long Combinational Path" report?09:06
stekernI haven't seen that09:06
wazIn the Timequest UI, it is generated under the "Report Timing Closure Recommendations".09:08
wazThanks for the information, stekern.09:16
wazI have to go. I hope to collaborate with something for the project in the near future.09:18
wazThanks again and bye.09:18
stekernjust when I was about to thank you for the guidance about Timing Closure Recommendations09:23
stekernoh, and tweaking the compilation settings upped the fmax to 89 MHz on mor1kx09:45
mor1kx[mor1kx] skristiansson pushed 3 new commits to master:
mor1kxmor1kx/master 2fb5c78 Stefan Kristiansson: Remove remains from when icache was located outside cpu13:08
mor1kxmor1kx/master 7dae85b Stefan Kristiansson: cappuccino/ctrl_branch: connect pipeline_flush to imm_branch reset13:08
mor1kxmor1kx/master b0f0adc Stefan Kristiansson: move dcache to lsu13:08
stekernjuliusb: I changed the mor1kx github description from 'mor1kx' to 'mor1kx - an OpenRISC 1000 processor IP core'13:16
stekernbecause I realised that it's hard to get what it actually is when browsing pages like this:
LoneTechwhy is gc-sections architecture specific?14:05
stekernLoneTech: in BFD?14:14
LoneTechI assume so14:18
stekernthe only really architecture specific thing I have seen going on there is updating the got reference counting for the sections being removed14:19
stekernwhy the or32 didn't support it, I don't know14:21
stekernwell, it still doesn't, but the or1k toolchain does14:22
stekernand the openrisc target in bfd did as well14:22
LoneTechI am clearly a bit out of date on how many permutations of toolchains there are14:23
stekernlet me bring you up to date :)14:23
stekernin ~2000 Johan Rydberg submitted a cgen generated openrisc target to "sourceware"14:24
stekernin ~2001 Ivan Guzvinec submitted a or32 target to "sourceware"14:25
stekernin ~2011 Julius Baxter merged the two into the or1k target14:26
stekernand world order has (almost) been restored ;)14:26
LoneTechthank you :)14:31
blueCmdare there any patch-series for gcc 4.7 and binutils 2.23 ?16:18
blueCmdalso, has anybody made any real effort in porting glibc?16:18
stekernblueCmd: no, but for gcc 4.8 and 2.22.5216:23
stekernwas a while since it was synced against upstream16:23
stekernto your second question, no16:23
blueCmdstekern: ah, where can I find these patches?16:25
blueCmdsynced against upstream isn't the same as that they were accepted by upstream, right?16:26
blueCmdoh, sweet. is this  linked on the opencores-site?16:27
stekernthere are some dynamic linking patches that I haven't pushed there (I should clean them up) here too:16:27
jeremybennettblueCmd: There is a lot about the tool chains on the Wiki, including test results.16:28
blueCmdjeremybennett: indeed, as somewhat of a newcomer it's quite hard to know what information is current16:28
stekernI probably should just merge my changes and do a changelog right away16:29
stekernand cleanup the few warts later16:29
blueCmdjeremybennett: I found it under "Installation of development versions"16:30
blueCmdso I guess it's there :)16:30
blueCmdstekern: is there some kind of plan for merging these patches with upstream?16:44
stekernthe biggest problem with that is to get permission from all the people that have hacked on it over the years to assign the copyrights to fsf16:59
stekernor at least that's my understanding of it16:59
blueCmdAh, I see. it would probably be really beneficial to openrisc though.17:08
blueCmdstekern: may I bother you for what configure flags you use when bootstrapping or1k-gcc?17:20
blueCmdwhen it tries to build mno-delay/ it wants to link with libc, which I don't have yet17:21
jeremybennettstekern: You are right about the legal problems.17:22
jeremybennettIt is why I have always held off doing it.17:22
jeremybennettWe'd have to get assignment from all the people who had been involved.17:22
blueCmdbut that's a problem that won't go away, isn't it?17:23
jeremybennettblueCmd: It's very useful to have feedback on the Wiki from new users. Please edit it to make it clearer in the light of your experience.17:23
jeremybennettblueCmd: You are right, but to solve the problem requires a great deal of very tedious effort, and it is hard to justify spending that effort.17:24
blueCmdjeremybennett: Yeah, I understand17:25
blueCmdjeremybennett: I might just do that, there is often a lot of redundant and outdated information. I'm going to write an article (mainly for my own use) when I get everything running as I want it anyway so it will fit nicely17:27
jeremybennettblueCmd: Thanks17:28
blueCmdstekern: oh man, did I miss that somewhere? :(17:29
stekernno, it's my own cheat-sheet17:29
blueCmdah, puh17:29
stekernit has been pointed out that two make install steps are missing in that17:29
blueCmdthat seems correct yes, not that I would have read it that closely to have noticed it though :P17:31
stekernthat should have those17:33
blueCmdstekern: jeremybennett: do you guys work with openrisc or what are your stories?17:49
jeremybennettblueCmd: I run an open source software company specializing in compiler tool chains for embedded systems17:55
stekernblueCmd: I'm in it for the kicks ;)17:55
blueCmdstekern: jeremybennett cool :)17:56
stekernactually, I was supposed to use openrisc in a hobby project around two years ago, I kind of forgot about that hobby project and kept hacking on various openrisc projects instead...17:59
blueCmdheh, that might as well be me in 2 years then :P18:01
stekernyeah, watch out, it is an addictive drug ;)18:01
blueCmdcool, gcc 4.8.0 just compiled linux 3.6.10 and it works. great!18:01
blueCmdhaha, this is great - my init-process runs!18:05
blueCmdstekern: jeremybennett thanks for your help :)18:05
jeremybennettblueCmd: You're welcome.18:23
poke53281Stekern: I have problems running shared library programs with this toolchain20:05
poke53281static: no problem20:05
poke53281but when I run my shared library hello world program I get the error message20:06
poke53281"/bin/sh: ./a.out: not found"20:06
poke53281ldd with debug messages gives me the output20:06
poke53281 # ./ldd a.out20:06
poke53281ldd: can't open cache '/etc/'20:06
poke53281checking sub-depends for '/usr/lib/'20:06
poke53281argc=1 argv=0x7feaded4 envp=0x7feadedc20:06
poke53281ELF header=0x3000000020:06
poke53281First Dynamic section entry=0x30009c8c20:06
poke53281Scanning DYNAMIC section20:06
poke53281Done scanning DYNAMIC section20:06
poke53281About to do library loader relocations20:06
poke53281Done relocating ldso; we can now use globals and make function calls!20:06
poke53281_dl_get_ready_to_run:446: Cool, ldso survived making function calls20:06
poke53281_dl_get_ready_to_run:625: Position Independent Executable: app_tpnt->loadaddr=0x20:06
poke53281_dl_malloc:236: mmapping more memory20:06
poke53281So, in the end the program finds all libraries. But still the same problem.20:07
poke53281Sorry for spamming the chat :)20:07
poke53281Normally I would track down the problem with strace. But unfortunately it is not supported.20:14
ErantRight now it's kinda hacky, but I got minsoc to run on an atlys board (*** Self-test PASSED ***, yay), who do I talk to about getting that port into mainline?20:35
poke53281@stekern: Found the problem. The compiled program tries to find the program interpreter in /usr/lib/ . But uClibc doesn't use this file. A symlink to /lib/ solves the problem.21:00
poke53281So, either there is an error in gcc or uClibc.21:01
poke53281"readelf -l program_name" shows the link to the program interpreter. You can try it in your toolchain21:02
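The program interpreter that `readelf -l` reports is stored in the ELF file's PT_INTERP program header. As a rough sketch of where that string lives, the following reads it straight from the file bytes; the function name `read_interp` is made up for illustration, and the code assumes a well-formed ELF32 or ELF64 image with no error handling beyond the magic check.

```python
import struct

PT_INTERP = 3  # program header type holding the interpreter path


def read_interp(data):
    """Return the PT_INTERP string of an ELF image, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is64 = data[4] == 2                      # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = ">" if data[5] == 2 else "<"       # EI_DATA: 2 = big-endian
    if is64:
        (phoff,) = struct.unpack_from(end + "Q", data, 32)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 54)
    else:
        (phoff,) = struct.unpack_from(end + "I", data, 28)
        phentsize, phnum = struct.unpack_from(end + "HH", data, 42)
    for index in range(phnum):
        base = phoff + index * phentsize
        (p_type,) = struct.unpack_from(end + "I", data, base)
        if p_type != PT_INTERP:
            continue
        if is64:
            (offset,) = struct.unpack_from(end + "Q", data, base + 8)
            (filesz,) = struct.unpack_from(end + "Q", data, base + 32)
        else:
            (offset,) = struct.unpack_from(end + "I", data, base + 4)
            (filesz,) = struct.unpack_from(end + "I", data, base + 16)
        # The segment contents are a NUL-terminated path string.
        return data[offset:offset + filesz].rstrip(b"\0").decode()
    return None
```

Calling `read_interp(open("a.out", "rb").read())` on the failing binary should show the same path that `readelf -l` prints. If that path does not exist on the target, the kernel cannot start the program, which would be consistent with the misleading "not found" message from the shell.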
-!- X-Scale is now known as Guest11621:22
-!- X-Scale` is now known as X-Scale21:23
stekernpoke53281: it's actually in bfd, it's hardcoded to /usr/lib/
stekernI was supposed to change that, but obviously I've forgotten about it21:31
stekernah, I see that peter gavin has merged my changes and synced with mainline21:41
stekernmaybe we should just push that to github/openrisc21:41
poke53281Ok, I will clone and try again tomorrow.21:44
poke53281Next Problem  :)21:44
poke53281can't resolve symbol '__udiv13' in lib '/usr/lib/'21:44
poke53281This is a simple hello world program. Nothing fancy.21:45
poke53281Had no time so far to figure out the problem. Maybe you have a short answer.21:47
poke53281can't resolve symbol '__udivsi3' in lib '/usr/lib/'.21:47
poke53281This is the correct error message21:48
stekernyeah, I thought the udiv13 was weird21:48
stekern__udivsi3, that should be in libgcc21:48
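For context, `__udivsi3` is the libgcc helper for unsigned 32-bit division; the compiler emits calls to it when it does not use a hardware divide instruction. The sketch below shows the kind of shift-and-subtract loop such a helper can be built on. It is not libgcc's actual implementation, and the Python function name merely mirrors the symbol for readability.

```python
def udivsi3(numerator, denominator):
    """Unsigned 32-bit division by restoring shift-and-subtract."""
    numerator &= 0xFFFFFFFF
    denominator &= 0xFFFFFFFF
    if denominator == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    # Walk the numerator from its most significant bit downwards.
    for bit in range(31, -1, -1):
        remainder = (remainder << 1) | ((numerator >> bit) & 1)
        if remainder >= denominator:
            remainder -= denominator
            quotient |= 1 << bit
    return quotient


print(udivsi3(100, 7))
```

Because the call is generated by the compiler rather than written by the programmer, even a hello world can end up depending on the symbol, so an unresolved `__udivsi3` at run time points at how libgcc was linked rather than at the program itself.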
poke53281The whole output21:55
stekernhmm, can't say I can see why that is happening from the top of my head22:27
blueCmdErant: is it,400,836&Prod=ATLYS&CFID=131907&CFTOKEN=13787544 ?23:34
Erantpoke53281: Can you nm that libc? See what symbols it exports?23:36
blueCmdErant: that's just weird, my plan was to get some SoC running on that board in ~february or something - I would love to take a look at your patches23:39
blueCmdnot that I can help you or anything, just curious23:39
ErantblueCmd: They're quite simple. And I'm not using the internal JTAG scan chain23:41
ErantI'm using the adv_sys_debug TAP brought out to the Pmod connector23:41
Erant(And then an FT2232 dongle)23:41
blueCmdI haven't looked at the board yet, I will borrow it from a friend around february, so I don't know anything about the internals of it yet23:42
ErantRight now I'm trying to get physical synthesis to be faster. There's something wrong here...23:42
