--- Log opened Mon Sep 26 00:00:06 2016 | ||
--- Day changed Mon Sep 26 2016 | ||
promach | hi, I have some verilog and VHDL background but very very little CPU/processor design experience. May I know how I would approach openrisc or opensparc? | 00:00 |
---|---|---|
Finde | https://www.coursera.org/learn/comparch | 00:00 |
Finde | :) | 00:00 |
promach | Finde: must I finish computer architecture course to learn openrisc mor1kx ? | 00:06 |
Finde | no no I'm just teasing | 00:06 |
promach | ? | 00:06 |
promach | to be honest, I am really a beginner | 00:06 |
Finde | unfortunately I'm yet to find a good intermediate online resource for what comes after digital logic design and before that computer architecture class | 00:07 |
promach | I see | 00:07 |
Finde | I'm sure some others would have good recommendations for you | 00:08 |
kc5tja | Not really. :) (At least not in my case.) | 00:17 |
kc5tja | I would strongly recommend learning how to build stack-architecture (Forth language) CPUs. | 00:17 |
kc5tja | These are reasonably good performing machines, but they are architecturally very simple (on par with accumulator architecture CPUs). | 00:18 |
kc5tja | This will teach you about how state machines are used in CPUs to keep track of operating state, and how to decode instructions, etc. | 00:18 |
kc5tja | I would consider moving to RISC architectures only after you feel comfortable with stack CPUs. | 00:18 |
kc5tja | OP-IMM and JALR are implemented and fixed, respectively. | 00:21 |
kc5tja | (for my own RISC-V CPU work) | 00:21 |
kc5tja | Thinking of calling it an evening before I do any more damage. :D | 00:21 |
promach | stack-architecture ? | 00:24 |
promach | kc5tja: like https://people.ece.cornell.edu/land/courses/ece5760/DE2/Stack_cpu.html | 00:24 |
promach | ? | 00:25 |
kc5tja | I was going to confirm his question, but oh well. | 00:55 |
-!- Netsplit *.net <-> *.split quits: nurelin | 01:21 | |
-!- Netsplit *.net <-> *.split quits: mithro | 01:21 | |
-!- Netsplit *.net <-> *.split quits: mripard, olofk, poke53281 | 01:21 | |
-!- Netsplit over, joins: nurelin, poke53281 | 01:28 | |
bandvig | olofk: in fact the only register to control FPU and return its status is FPCSR which is a part of System Group #0. | 07:07 |
bandvig | OR1K spec doesn’t define any other FPU related control/status registers. | 07:07 |
bandvig | As far as I understand, the FPU-related SPR bus signals were created just for a general approach, i.e. in the same way as for MMUs and caches, for example. | 07:08 |
bandvig | As the FPU itself doesn’t contain any SPR, and FPCSR access is implemented internally in CTRL, all FPU-related SPR-bus signals like spr_bus_dat_fpu_i or spr_bus_ack_fpu_i should be forced to zero. | 07:08 |
bandvig | stekern: regarding CoreMark. In fact till yesterday I’ve used only CoreMark as you. I tried Dhry just because I saw several discussions around it here. | 07:22 |
bandvig | So, it was interesting for me for a short time. However, after ZipCPU’s explanation that the Dhry usage methodology discourages using LTO (specifically for ZipCPU: I clearly understand that this requirement is not yours), I agree with you that CoreMark is much more suitable. | 07:23 |
bandvig | So, I’m going to continue with CoreMark only. | 07:23 |
stekern | bandvig: but what's supposed to go under OR1K_SPR_FPU_BASE then? | 07:39 |
stekern | you're saying that nothing is defined under that? | 07:39 |
ZipCPU|Laptop | bandvig: I did run tests with and without LTO. If you merge the two files, dhry_1 and dhry_2 into dhry.c, then LTO only buys you a couple of clocks outside the loop. It doesn't significantly impact the score. | 07:48 |
ZipCPU|Laptop | Merging the two files, dhry_1.c and dhry_2.c, into one file does significantly impact the score, however. | 07:48 |
ZipCPU|Laptop | kc5tja: Congratulations on your kestrel success! Sounds like it's now time to go looking for a next stage test case ... ;) | 07:51 |
olofk_ | bandvig: Ah ok. So then we just need a simple patch to force them to zero somewhere in the design | 07:54 |
olofk_ | But I can't understand why this hasn't been a problem before. For me it failed on the first mtspr (or mfspr) | 07:55 |
bandvig | stekern: yes, it looks like clarification is needed. OR1K_SPR_FPU_BASE specifies the FPU’s group in accordance with the OR1K architecture manual. | 08:04 |
bandvig | At the same time there isn’t a register in the group (also in accordance with the manual). The only FPU-related SPR is FPCSR from group #0 (system group). | 08:04 |
bandvig | ZipCPU: Do you use Dhry source with OR1K-related timer stuff from NewLIB? | 08:08 |
ZipCPU|Laptop | bandvig: I've actually been using simulator counts. I run Dhry twice at two different iteration numbers, take the difference in simulator counts, divide by two (the simulator counts include high and low clock cycles), and use that as my clock count. | 08:09 |
ZipCPU|Laptop | So, for example, last night I was running 120k iterations and 10k iterations. Subtract the two, and then divide by 220k to get the number of clocks per iteration. | 08:10 |
ZipCPU|Laptop | While I'd love to run or1k on hardware, I don't have de0, de1, de2, or atlys. ;) So I've been using mor1kx-generic. | 08:11 |
ZipCPU|Laptop | I've also made adjustments to mor1kx-generic so that the UART works--without pseudo simulation instructions. I would like to push these upstream, but ... I'm not certain how to do so. | 08:12 |
stekern | bandvig: ok, so they are purely for "custom spr" then. but we should still hook the spr bus up on fpu implementations that do not have any custom sprs | 08:26 |
stekern | (and just set the ack to zero) | 08:27 |
Andy___ | did it finally happen? Did OpenCores died? | 08:29 |
Andy___ | ] die | 08:29 |
wallento | olofk: I see, fixed and added to the PR | 08:41 |
wallento | Andy___: it seems like it, but they seem to be recovering again | 08:42 |
bandvig | stekern: Hm… I think the more accurate way to handle accesses to unimplemented SPRs is: detect the access request on the SPR bus, set data to zero (permanently), generate a 1-clock ACK. | 08:43 |
bandvig | If ACK were permanently zero, then a request to (say) the empty FPU group would hang the CPU. | 08:43 |
bandvig | But the behavior should be implemented for all empty groups or non-existent SPRs. In turn, it would increase LUT consumption :). | 08:44 |
stekern | that's a good point | 08:44 |
olofk_ | wallento: You won't believe what I just did :) | 09:34 |
-!- olofk_ is now known as olofk | 09:34 | |
wallento | thanks | 09:37 |
wallento | olofk: are you anti-PRs? ;) | 09:37 |
wallento | all this cherry-picking :) | 09:38 |
olofk | wallento: I don't think it makes sense to bring in half-finished commits in the tree. Just makes bisecting harder | 09:38 |
olofk | And yes, I like my trees straight :) | 09:39 |
wallento | lol | 09:40 |
_franck__ | olofk: I think you can now merge pull requests without having an extra commit message | 09:47 |
_franck__ | something like merge and squash option in github | 09:48 |
_franck__ | (if this is what you're talking about) | 09:48 |
wallento | Yeah, I read it too before, but cannot find the option | 10:44 |
_franck__ | wallento: I did it once, there was a pull button and a pull & squash button IIRC | 10:47 |
wallento | _franck__: thanks, I found it needs to be activated: https://github.com/blog/2141-squash-your-commits | 10:49 |
wallento | olofk, do it | 10:49 |
olofk | ahhh...cool. That could be handy. I usually do fetch and cherry-pick from FETCH_HEAD with a rebase to squash if needed | 13:34 |
olofk | Twitter is going WILD over the new FuseSoC vivado backend :) | 13:35 |
olofk | It's almost like people are actually using FuseSoC :) | 13:35 |
imphil | olofk, travis is also going wild about broken commits! :) | 13:56 |
stekern | olofk: why did a lemon steal monade? | 14:09 |
olofk | imphil: Travis is happy, but not appveyor. It's because pip is a shitty broken sorry excuse for a package manager | 14:28 |
olofk | stekern: :) | 14:29 |
olofk | It's a bit ironic that FuseSoC is now better at handling dependencies than pip | 14:29 |
olofk | The thing is that one package requires attrs >= 15.2.0, and another one requires attrs < 16.2.0. pip sees the first one and happily installs the latest version, totally ignoring the other request | 14:35 |
olofk | For Travis, I manually install 16.0.0 first to have a known good version. Just haven't figured out how to do that in a windows environment with appveyor yet | 14:35 |
kc5tja | ZipCPU|Laptop: Thanks. I'm planning on OP-REG instructions next, since I can re-use a lot of the logic I implemented for OP-IMM. | 17:38 |
kc5tja | ZipCPU|Laptop: After that come the *BIG* whammies: loads and stores. | 17:38 |
ZipCPU|Laptop | What simulator are you working with? Can I offer any of my Verilator code to help? | 17:39 |
kc5tja | iverilog. | 17:40 |
ZipCPU|Laptop | Shucks. | 17:41 |
kc5tja | I'm not really sure what Verilator will buy me. Maybe I can use it in a larger integration test. I'd definitely like some help with that, since then I can feed it more comprehensive instruction set test programs. | 17:41 |
ZipCPU|Laptop | Ah ... please allow me to share ... | 17:42 |
kc5tja | The current test benches are designed to be just enough to get a working machine, but I definitely have a lot of gaps in my test coverage. | 17:42 |
ZipCPU|Laptop | I have, within Verilator, implemented an entire simulation of most of my projects. | 17:42 |
ZipCPU|Laptop | This includes simulated hardware. | 17:42 |
ZipCPU|Laptop | The simulation is so complete that my host s/w, which would communicate with the FPGA, can't tell the difference. | 17:42 |
ZipCPU|Laptop | Sure, it runs slower, but my goal is always that it runs in a cycle accurate fashion. | 17:43 |
kc5tja | Awesome. I'm definitely interested. | 17:43 |
ZipCPU|Laptop | Hence, for the XuLA2-LX25 SoC I built, I can (using Verilator) run a Dhrystone test-bench and count clocks. | 17:43 |
ZipCPU|Laptop | Those clocks will be (if I did everything right) identical to the clocks I'd have on the real hardware. | 17:43 |
ZipCPU|Laptop | When someone tries to run a program on my hardware and tells me it doesn't work, I usually try to run the same program within Verilator to see if it runs or doesn't. | 17:44 |
ZipCPU|Laptop | If it fails in Verilator, I can debug it any way I choose--although I am rather partial to debug by printf. | 17:44 |
ZipCPU|Laptop | To do this, I need to create simulated versions of all of the hardware my FPGA must interact with. | 17:45 |
ZipCPU|Laptop | So ... I have cycle accurate simulated hardware for: flash, block RAM, SD-card (via SPI), UART (exports to TCP/IP), and so on. | 17:46 |
ZipCPU|Laptop | I even built a VGA simulation once ... ;) | 17:46 |
kc5tja | Sorry for lag on my part; at work, working from home, interruptions from managers and cats alike. | 17:53 |
kc5tja | Reading backlog | 17:54 |
ZipCPU|Laptop | Not a problem--my wife came in to talk to me, and the puppy wanted to play with her rope in the meantime here. | 17:55 |
kc5tja | re: counting Dhrystones, that's sweet! Ideally, I'd like to get a virtual Kestrel-3 that includes video output at some point, just so I can play around with it, even if it's ultra-slow. | 17:55 |
kc5tja | It'd be a nice demo to have at the RISC-V workshop. | 17:55 |
kc5tja | Oh, I see you did VGA simulation. How quick was that from the user's perspective? | 17:58 |
ZipCPU|Laptop | Let me check ... | 17:59 |
ZipCPU|Laptop | So, on every clock tick I called a vga simulation routine. (Sort of the way Verilator works) | 18:00 |
ZipCPU|Laptop | On every 4th clock (assuming a 100MHz FPGA clock) a pixel would be output to the screen. | 18:01 |
ZipCPU|Laptop | From the user perspective, though, on every clock tick the screen buffer was updated to the window | 18:01 |
ZipCPU|Laptop | as necessary. | 18:01 |
ZipCPU|Laptop | Basically the inner loop was a GTK++ class, that called its update routine as fast as possible. | 18:03 |
ZipCPU|Laptop | So if the windowing system needed to do updates, the FPGA simulator wouldn't get called, but otherwise the simulator would hog a full CPU. | 18:03 |
ZipCPU|Laptop | I liked it because I could verify the VGA timings for a 640x480@60Hz screen, as well as making sure | 18:03 |
ZipCPU|Laptop | that my video compression and decompression worked. (There wasn't enough RAM on that board | 18:04 |
ZipCPU|Laptop | for a proper offscreen buffer ...) | 18:04 |
kc5tja | OK, that must have been quite a slow user experience in the simulator. | 20:38 |
kc5tja | But, then again, correctness over speed. :) | 20:38 |
kc5tja | That's just the thing I'd need to verify correct behavior of the CGIA when I get to that point. | 20:38 |
ZipCPU|Laptop | Keep in mind, the user wasn't doing anything more than pointing at static slides. While bad, it wasn't that bad. | 20:48 |
kc5tja | Well, I'm sure there were ways of speeding things up. Not using G++ comes to mind; something like SDL might have given some higher throughput, especially with direct access to textures. But, yeah, I doubt Verilator will compete any time soon with MAME. :D | 22:22 |
--- Log closed Tue Sep 27 00:00:30 2016 |
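
Note on the 08:26 to 08:44 exchange between stekern and bandvig about empty SPR groups: the difference between "ACK tied to zero" and "data tied to zero with a 1-clock ACK" can be sketched with a small behavioural model. This is Python, not mor1kx RTL, and it assumes a simplified bus master that holds its request until it sees ACK. The names `EmptySprGroup` and `wait_for_ack` and the 16-cycle limit are hypothetical and exist only for this illustration.

```python
class EmptySprGroup:
    """Behavioural model of an SPR group with no registers (hypothetical)."""

    def __init__(self, ack_enabled):
        self.ack_enabled = ack_enabled  # False models "ACK tied to zero"
        self.ack = 0                    # registered ACK output
        self.dat = 0                    # read data is permanently zero

    def tick(self, stb):
        # ACK goes high for exactly one clock when a request is seen,
        # and only if this variant answers requests at all.
        self.ack = 1 if (self.ack_enabled and stb and not self.ack) else 0
        return self.ack, self.dat


def wait_for_ack(group, max_cycles=16):
    """Hold the request until ACK; return cycles waited, or None on a hang."""
    for cycle in range(1, max_cycles + 1):
        ack, _dat = group.tick(stb=1)
        if ack:
            return cycle
    return None  # the simplified master would still be stalled here
```

In this model `wait_for_ack(EmptySprGroup(ack_enabled=True))` completes on the first clock, while the `ack_enabled=False` variant never completes, which is the hang bandvig describes.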
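
Note on ZipCPU|Laptop's 08:09 to 08:10 description of counting clocks: the arithmetic can be written out as a short Python sketch. It assumes, as described in the log, that the simulator count advances twice per clock (once for the high and once for the low phase) and that two runs at different iteration counts are subtracted to cancel startup overhead. The function name and the simulator counts in the example are made up for illustration; only the 120k and 10k iteration numbers come from the log.

```python
def clocks_per_iteration(count_long, count_short, iters_long, iters_short):
    """Estimate Dhrystone clocks per iteration from two simulator runs."""
    # Subtracting the two runs removes the fixed startup and teardown cost.
    half_cycles = count_long - count_short
    # The simulator counts both clock phases, so halve to get clocks.
    clocks = half_cycles / 2
    return clocks / (iters_long - iters_short)


# Hypothetical simulator counts for runs of 120k and 10k iterations.
cpi = clocks_per_iteration(26_500_000, 2_300_000, 120_000, 10_000)
print(cpi)  # 110.0
```

Dividing the count difference by 2 and then by the 110k iteration difference is the same as dividing it by 220k, which matches the figure quoted in the log.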