IRC logs for #openrisc Monday, 2016-09-26

--- Log opened Mon Sep 26 00:00:06 2016
--- Day changed Mon Sep 26 2016
promachhi, I have some Verilog and VHDL background but very little CPU/processor design experience. How should I approach OpenRISC or OpenSPARC?00:00
promachFinde: must I finish computer architecture course to learn openrisc mor1kx ?00:06
Findeno no I'm just teasing00:06
promachto be honest, I am really a beginner00:06
Findeunfortunately I'm yet to find a good intermediate online resource for what comes after digital logic design and before that computer architecture class00:07
promachI see00:07
FindeI'm sure some others would have good recommendations for you00:08
kc5tjaNot really.  :)  (At least not in my case.)00:17
kc5tjaI would strongly recommend learning how to build stack-architecture (Forth language) CPUs.00:17
kc5tjaThese are reasonably good performing machines, but they are architecturally very simple (on par with accumulator architecture CPUs).00:18
kc5tjaThis will teach you about how state machines are used in CPUs to keep track of operating state, and how to decode instructions, etc.00:18
kc5tjaI would consider moving to RISC architectures only after you feel comfortable with stack CPUs.00:18
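A minimal sketch of the kind of stack machine kc5tja describes, written as an explicit fetch/execute state machine. The opcode names (PUSH, ADD, MUL, DUP, HALT) and the tuple encoding are assumptions made for illustration; they are not taken from any real Forth CPU.

```python
# Toy stack CPU: a two-state machine (FETCH -> EXECUTE) with a tiny decoder.
# Opcodes and the tuple-based instruction format are hypothetical.
FETCH, EXECUTE, HALTED = "FETCH", "EXECUTE", "HALTED"


def run(program):
    """Run a list of instruction tuples and return the final data stack."""
    stack, pc, state, instr = [], 0, FETCH, None
    while state != HALTED:
        if state == FETCH:
            instr = program[pc]      # read the instruction at the program counter
            pc += 1
            state = EXECUTE
        else:                        # EXECUTE: decode the opcode and act on the stack
            op = instr[0]
            if op == "PUSH":
                stack.append(instr[1])
            elif op == "ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "MUL":
                b, a = stack.pop(), stack.pop()
                stack.append(a * b)
            elif op == "DUP":
                stack.append(stack[-1])
            elif op == "HALT":
                state = HALTED
                continue
            else:
                raise ValueError(f"unknown opcode: {op}")
            state = FETCH
    return stack


# (2 + 3) duplicated and multiplied with itself
result = run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("DUP",), ("MUL",), ("HALT",)])
```

In hardware the same structure becomes a state register plus a decoder on the instruction bits; because operands are implicit (top of stack), there are no register fields to decode.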
kc5tjaOP-IMM and JALR are implemented and fixed, respectively.00:21
kc5tja(for my own RISC-V CPU work)00:21
kc5tjaThinking of calling it an evening before I do any more damage.  :D00:21
promachstack-architecture ?00:24
promachkc5tja: like
kc5tjaI was going to confirm his question, but oh well.00:55
-!- Netsplit *.net <-> *.split quits: nurelin01:21
-!- Netsplit *.net <-> *.split quits: mithro01:21
-!- Netsplit *.net <-> *.split quits: mripard, olofk, poke5328101:21
-!- Netsplit over, joins: nurelin, poke5328101:28
bandvigolofk:  in fact the only register to control FPU and return its status is FPCSR which is a part of System Group #0.07:07
bandvigOR1K spec doesn’t define any other FPU related control/status registers.07:07
bandvigAs far as I understand, the FPU-related SPR bus signals were created just to follow the general approach, i.e. in the same way as for MMUs and caches, for example.07:08
bandvigAs the FPU itself doesn’t contain any SPRs, and FPCSR access is implemented internally in CTRL, all FPU-related SPR bus signals, such as spr_bus_dat_fpu_i or spr_bus_ack_fpu_i, should be forced to zero.07:08
bandvigstekern: regarding CoreMark. In fact, until yesterday I had used only CoreMark, as you do. I tried Dhry just because I saw several discussions around it here.07:22
bandvigSo, it was interesting for me for a short time. However, after ZipCPU’s explanation that the Dhry usage rules discourage using LTO (note for ZipCPU: I clearly understand that this requirement is not yours), I agree with you that CoreMark is much more suitable.07:23
bandvigSo, I’m going to continue with CoreMark only.07:23
stekernbandvig: but what's supposed to go under OR1K_SPR_FPU_BASE then?07:39
stekernyou're saying that nothing is defined under that?07:39
ZipCPU|Laptopbandvig: I did run tests with and without LTO.  If you merge the two files, dhry_1 and dhry_2 into dhry.c, then LTO only buys you a couple of clocks outside the loop.  It doesn't significantly impact the score.07:48
ZipCPU|LaptopMerging the two files, dhry_1.c and dhry_2.c, into one file does significantly impact the score, however.07:48
ZipCPU|Laptopkc5tja: Congratulations on your kestrel success!  Sounds like it's now time to go looking for a next stage test case ... ;)07:51
olofk_bandvig: Ah ok. So then we just need a simple patch to force them to zero somewhere in the design07:54
olofk_But I can't understand why this hasn't been a problem before. For me it failed on the first mtspr (or mfspr)07:55
bandvigstekern: yes, it looks like clarification is needed.  The OR1K_SPR_FPU_BASE specifies the FPU’s group in accordance with the OR1K architecture manual.08:04
bandvigAt the same time there isn’t a register in the group (also in accordance with the manual). The only FPU-related SPR is FPCSR from group #0 (system group).08:04
bandvigZipCPU: Do you use Dhry source with OR1K-related timer stuff from NewLIB?08:08
ZipCPU|Laptopbandvig: I've actually been using simulator counts.  I run Dhry twice at two different iteration numbers, take the difference in simulator counts, divide by two (the simulator counts include high and low clock cycles), and use that as my clock count.08:09
ZipCPU|LaptopSo, for example, last night I was running 120k iterations and 10k iterations.  Subtract the two, and then divide by 220k to get the number of clocks per iteration.08:10
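The differencing method described above can be sketched as follows. The function name and the raw count values in the example are made up for illustration; only the 120k/10k iteration numbers and the divide-by-two for high and low clock phases come from the discussion.

```python
def clocks_per_iteration(counts_long, counts_short, iters_long, iters_short):
    """Estimate clocks per benchmark iteration from two simulator runs.

    Subtracting the two runs cancels fixed startup/teardown cost. The
    simulator counts both the high and low half of each clock, so the
    difference is divided by two to get clock cycles.
    """
    delta_counts = counts_long - counts_short
    delta_iters = iters_long - iters_short
    return delta_counts / (2 * delta_iters)


# Hypothetical counts: 4000 counts of fixed overhead, 1000 counts per iteration.
long_run = 120_000 * 1000 + 4000
short_run = 10_000 * 1000 + 4000
cpi = clocks_per_iteration(long_run, short_run, 120_000, 10_000)
```

With 120k and 10k iterations the denominator is 2 * 110k = 220k, which matches the "divide by 220k" above.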
ZipCPU|LaptopWhile I'd love to run or1k on hardware, I don't have de0, de1, de2, or atlys. ;)  So I've been using mor1kx-generic.08:11
ZipCPU|LaptopI've also made adjustments to mor1kx-generic so that the UART works--without pseudo simulation instructions.  I would like to push these upstream, but ... I'm not certain how to do so.08:12
stekernbandvig: ok, so they are purely for "custom spr" then. but we should still hook the spr bus up on fpu implementations that do not have any custom sprs08:26
stekern(and just set the ack to zero)08:27
Andy___did it finally happen? Did OpenCores died?08:29
Andy___] die08:29
wallentoolofk: I see, fixed and added to the PR08:41
wallentoAndy___: it seems like it, but they seem to be recovering again08:42
bandvigstekern: Hm… I think a more accurate way to handle accesses to unimplemented SPRs is: detect the access request on the SPR bus, set data to zero (permanently), and generate a 1-clock ACK.08:43
bandvigIf ACK were permanently zero, then a request to (say) the empty FPU group would hang the CPU.08:43
bandvigBut this behavior should be implemented for every empty group or non-existent SPR. In turn, that would increase LUT consumption :).08:44
stekernthat's a good point08:44
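A rough behavioural model of the two options discussed above, as a sketch only. The class and function names are hypothetical, and the handshake is an assumption (the requester holds its strobe until it sees ACK); this is not the mor1kx implementation.

```python
class EmptySprGroup:
    """Stub for an SPR group with no registers.

    Data is permanently zero; ACK pulses for one clock when a request
    is seen, so the requester is released instead of waiting forever.
    """

    def __init__(self):
        self.ack = 0
        self.dat = 0  # never changes

    def tick(self, stb):
        # One clock edge: raise ACK for a single cycle in response to a request.
        self.ack = 1 if (stb and not self.ack) else 0
        return self.ack, self.dat


def access(slave_tick, max_cycles=16):
    """Hold the request until ACK. Return (cycles, data), or None on timeout."""
    for cycle in range(1, max_cycles + 1):
        ack, dat = slave_tick(1)
        if ack:
            return cycle, dat
    return None  # in real hardware the CPU would simply hang here


with_pulse = access(EmptySprGroup().tick)        # acknowledged, reads zero
tied_to_zero = access(lambda stb: (0, 0))        # ACK forced to zero: never completes
```

The second case is the hang bandvig points out: with the ACK input tied low, an mtspr or mfspr to the empty group never finishes.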
olofk_wallento: You won't believe what I just did :)09:34
-!- olofk_ is now known as olofk09:34
wallentoolofk: are you anti-PRs? ;)09:37
wallentoall this cherry-picking :)09:38
olofkwallento: I don't think it makes sense to bring in half-finished commits in the tree. Just makes bisecting harder09:38
olofkAnd yes, I like my trees straight :)09:39
_franck__olofk: I think you can now merge pull request without having an extra commit message09:47
_franck__something like merge and squash option in github09:48
_franck__(if this is what you're talking about)09:48
wallentoYeah, I read it too before, but cannot find the option10:44
_franck__wallento: I did it once, there was a pull button and a pull & squash button IIRC10:47
wallento_frank__: thanks, I found it needs to be activated:
wallentoolofk, do it10:49
olofkThat could be handy. I usually do fetch and cherry-pick from FETCH_HEAD with a rebase to squash if needed13:34
olofkTwitter is going WILD over the new FuseSoC vivado backend :)13:35
olofkIt's almost like people are actually using FuseSoC :)13:35
imphilolofk, travis is also going wild about broken commits! :)13:56
stekernolofk: why did a lemon steal monade?14:09
olofkimphil: Travis is happy, but not appveyor. It's because pip is a shitty broken sorry excuse for a package manager14:28
olofkstekern: :)14:29
olofkIt's a bit ironic that FuseSoC is now better at handling dependencies than pip14:29
olofkThe thing is that one package requires attrs >= 15.2.0, and another one requires attrs < 16.2.0. pip sees the first one and happily installs the latest version, totally ignoring the other request14:35
olofkFor Travis, I manually install 16.0.0 first to have a known good version. Just haven't figured out how to do that in a windows environment with appveyor yet14:35
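The conflict olofk describes can be illustrated with a small sketch that intersects the two constraints instead of honouring only the first. The helper names and the list of available versions are assumptions for illustration; this is not how pip or FuseSoC resolve dependencies internally.

```python
def parse(version):
    """Turn '16.0.0' into (16, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick(available, lower_inclusive, upper_exclusive):
    """Return the newest version satisfying lower <= v < upper, or None."""
    lo, hi = parse(lower_inclusive), parse(upper_exclusive)
    candidates = [v for v in available if lo <= parse(v) < hi]
    return max(candidates, key=parse) if candidates else None


# Illustrative list of versions an index might offer.
available = ["15.2.0", "16.0.0", "16.1.0", "16.2.0"]

first_constraint_only = max(available, key=parse)    # what "latest" picks
both_constraints = pick(available, "15.2.0", "16.2.0")
```

Taking only the first requirement selects 16.2.0, which violates the second one; intersecting both leaves 16.1.0 as the newest acceptable version. Pinning a known good version up front, as done for Travis above, sidesteps the problem.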
kc5tjaZipCPU|Laptop: Thanks.  I'm planning on OP-REG instructions next, since I can re-use a lot of the logic I implemented for OP-IMM.17:38
kc5tjaZipCPU|Laptop: After that come the *BIG* whammies: loads and stores.17:38
ZipCPU|LaptopWhat simulator are you working with?  Can I offer any of my Verilator code to help?17:39
kc5tjaI'm not really sure what Verilator will buy me.  Maybe I can use it in a larger integration test.  I'd definitely like some help with that, since then I can feed it more comprehensive instruction set test programs.17:41
ZipCPU|LaptopAh ... please allow me to share ...17:42
kc5tjaThe current test benches are designed to be just enough to get a working machine, but I definitely have a lot of gaps in my test coverage.17:42
ZipCPU|LaptopI have, within Verilator, implemented an entire simulation of most of my projects.17:42
ZipCPU|LaptopThis includes simulated hardware.17:42
ZipCPU|LaptopThe simulation is so complete that my host s/w, which would communicate with the FPGA, can't tell the difference.17:42
ZipCPU|LaptopSure, it runs slower, but my goal is always that it runs in a cycle accurate fashion.17:43
kc5tjaAwesome.  I'm definitely interested.17:43
ZipCPU|LaptopHence, for the XuLA2-LX25 SoC I built, I can (using Verilator) run a Dhrystone test-bench and count clocks.17:43
ZipCPU|LaptopThose clocks will be (if I did everything right) identical to the clocks I'd have on the real hardware.17:43
ZipCPU|LaptopWhen someone tries to run a program on my hardware and tells me it doesn't work, I usually try to run the same program within Verilator to see if it runs or doesn't.17:44
ZipCPU|LaptopIf it fails in Verilator, I can debug it any way I choose--although I am rather partial to debug by printf.17:44
ZipCPU|LaptopTo do this, I need to create simulated versions of all of the hardware my FPGA must interact with.17:45
ZipCPU|LaptopSo ... I have cycle accurate simulated hardware for: flash, block RAM, SD-card (via SPI), UART (exports to TCP/IP), and so on.17:46
ZipCPU|LaptopI even built a VGA simulation once ... ;)17:46
kc5tjaSorry for lag on my part; at work, working from home, interruptions from managers and cats alike.17:53
kc5tjaReading backlog17:54
ZipCPU|LaptopNot a problem--my wife came in to talk to me, and the puppy wanted to play with her rope in the meantime here.17:55
kc5tjare: counting Dhrystones, that's sweet!  Ideally, I'd like to get a virtual Kestrel-3 that includes video output at some point, just so I can play around with it, even if it's ultra-slow.17:55
kc5tjaIt'd be a nice demo to have at the RISC-V workshop.17:55
kc5tjaOh, I see you did VGA simulation.  How quick was that from the user's perspective?17:58
ZipCPU|LaptopLet me check ...17:59
ZipCPU|LaptopSo, on every clock tick I called a vga simulation routine.  (Sort of the way Verilator works)18:00
ZipCPU|LaptopOn every 4th clock (assuming a 100MHz FPGA clock) a pixel would be output to the screen.18:01
ZipCPU|LaptopFrom the user perspective, though, on every clock tick the screen buffer was updated to the window18:01
ZipCPU|Laptopas necessary.18:01
ZipCPU|LaptopBasically the inner loop was a GTK++ class, that called its update routine as fast as possible.18:03
ZipCPU|LaptopSo if the windowing system needed to do updates, the FPGA simulator wouldn't get called, but otherwise the simulator would hog a full CPU.18:03
ZipCPU|LaptopI liked it because I could verify the VGA timings for a 640x480@60Hz screen, as well as making sure18:03
ZipCPU|Laptopthat my video compression and decompression worked.  (There wasn't enough RAM on that board18:04
ZipCPU|Laptopfor a proper offscreen buffer ...)18:04
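A sketch of the per-tick VGA model described above: one call per FPGA clock, one pixel every fourth clock. The class name and structure are hypothetical. The 800x525 total frame size (visible area plus blanking) is the standard 640x480@60Hz timing, stated here from general VGA knowledge and not from the discussion.

```python
H_VISIBLE, H_TOTAL = 640, 800   # pixels per line, including blanking
V_VISIBLE, V_TOTAL = 480, 525   # lines per frame, including blanking


class VgaSim:
    """Counts pixels and lines, advancing one pixel every N system clocks."""

    def __init__(self, clocks_per_pixel=4):
        self.clocks_per_pixel = clocks_per_pixel
        self.clk = 0
        self.x = 0
        self.y = 0
        self.frames = 0

    def visible(self):
        """True while the current position is inside the displayed area."""
        return self.x < H_VISIBLE and self.y < V_VISIBLE

    def tick(self):
        """Call once per simulated FPGA clock."""
        self.clk += 1
        if self.clk % self.clocks_per_pixel:
            return                       # not a pixel clock edge yet
        self.x += 1
        if self.x == H_TOTAL:            # end of line
            self.x = 0
            self.y += 1
            if self.y == V_TOTAL:        # end of frame
                self.y = 0
                self.frames += 1


def refresh_hz(system_clock_hz=100e6, clocks_per_pixel=4):
    """Frame rate implied by the pixel clock and the total frame size."""
    return system_clock_hz / (clocks_per_pixel * H_TOTAL * V_TOTAL)
```

With a 100MHz clock divided by four, the pixel clock is 25MHz and the frame rate comes out just under 60Hz. One frame costs 1.68 million calls to `tick()`, which gives a feel for why such a simulation is slow from the user's side.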
kc5tjaOK, that must have been quite a slow user experience in the simulator.20:38
kc5tjaBut, then again, correctness over speed.   :)20:38
kc5tjaThat's just the thing I'd need to verify correct behavior of the CGIA when I get to that point.20:38
ZipCPU|LaptopKeep in mind, the user wasn't doing anything more than pointing at static slides.  While bad, it wasn't that bad.20:48
kc5tjaWell, I'm sure there were ways of speeding things up.  Not using G++ comes to mind; something like SDL might have given some higher throughput, especially with direct access to textures.  But, yeah, I doubt Verilator will compete any time soon with MAME.  :D22:22
--- Log closed Tue Sep 27 00:00:30 2016
