--- Log opened Fri Aug 19 00:00:32 2016 | ||
kc5tja | ZipCPU: Take pictures and upload them to Github? | 00:36 |
---|---|---|
kc5tja | Initial and thoroughly untested rewrite of the instruction decode logic for SMG takes only 59 lines of code. | 01:29 |
kc5tja | hand-crafted Verilog takes 793 lines of code. | 01:30 |
kc5tja | Verilog output from SMG takes up 377 LOC. | 01:32 |
ZipCPU | kc5tja: Yes, but ... how readable is that code? Would you, by reading it, know what was going on? | 07:19 |
-!- [R2]asahsieh is now known as asahsieh | 11:01 | |
kc5tja | No, because it's impossible to name all the intermediate products. | 12:23 |
kc5tja | You do see inputs and outputs in human readable form though, obviously. | 12:23 |
kc5tja | But, that's not the point. | 12:23 |
kc5tja | You're supposed to use the state table as your "source code." | 12:23 |
kc5tja | The Verilog is just intermediate representation describing the intended circuit. | 12:24 |
kc5tja | In fact, the quantity of "nameless" interior products is a good indicator why I kept failing to complete my previous designs; the complexity is substantial. | 12:25 |
kc5tja | 59 to 377 LOC for a *stack* CPU; I'd hate to see what it'll be for a RISC-V. | 12:26 |
ZipCPU | On a lighter note, my (new) wishbone controlled DDR3 SDRAM controller has now made it through 2M clocks of testing with no failures, and it's still going! | 12:30 |
ZipCPU | (That's simulated testing ...) | 12:31 |
ZipCPU | The extended, non-pipelined, write test has passed simulation ... | 12:39 |
ZipCPU | The end result of all the clocking mess is that I can now read/write 64-bits to the core at a time. I've even implemented the wishbone select bits--a rather new practice for me ... | 12:44 |
ZipCPU | FPGA clock speed:200MHz. Memory clock speed: 400 MHz. (I could've gone up to 600, but didn't like dividing by threes ...) | 12:46 |
mafm | threes are evil, definitely | 12:53 |
ZipCPU | I mean ... the memory would always transfer four memory clocks worth of data, but I would be parallelizing that at 3 at a time. Yuck. | 12:54 |
ZipCPU | Okay -- first test failure on pipelined writes: it's taking 4 clocks to write 2 words. Still: it passed the non-pipelined read and write tests, so ... things are moving forward! | 12:58 |
mafm | never mind me, just kidding :) | 13:02 |
mafm | are you designing a new CPU, not risc-v, not openrisc, not anything else? | 13:02 |
ZipCPU | Yes. Although ... the peripherals (such as the memory controller I am working on) should work on (roughly) any wishbone compliant bus. | 13:14 |
ZipCPU | mafm: Gotta run. If you'd like to chat or learn about what I'm up to, I should be back in about an hour or so. | 13:15 |
mafm | ZipCPU: will do, thanks... nothing serious but I am curious about a couple of things | 13:16 |
kc5tja | ZipCPU: YAY! (on SDRAM core) | 13:46 |
ZipCPU | mafm: I'm back, and I'd be happy to answer any questions you might have. | 14:08 |
mafm | ZipCPU: just chit-chat :P | 14:08 |
ZipCPU | Sure--that too. | 14:08 |
mafm | I'm curious if you're using a new design because of learning, or for some other reason? | 14:08 |
ZipCPU | (These simulations take a while to run ...) | 14:08 |
ZipCPU | Yes. | 14:09 |
ZipCPU | Okay, perhaps that's too simple, but the answer is "yes" to both questions. | 14:09 |
ZipCPU | It's the first CPU I've put together, and so it's helping to cement a lot of the Computer Architecture lessons I learned so many years ago. | 14:09 |
ZipCPU | I'm also trying to build something that is "simpler" than the other options out there, although the measure of what constitutes "simpler" has eluded me. :( | 14:10 |
ZipCPU | The 200+ instructions in the OpenRISC ISA were part of my initial motivation. | 14:10 |
mafm | so you don't plan to create an ecosystem around it (compilers etc) and try to market it and so on, I suppose? | 14:11 |
SMDhome1 | ZipCPU, do the 200+ instructions for openrisc include all the extensions and both 32/64 modes? | 14:12 |
ZipCPU | Well ... I have managed to build GCC/binutils back ends. I haven't yet gotten newlib working. | 14:12 |
mafm | since I see people complaining here about the complexity of RISC-V (mostly kc5tja and you in the last few days), I was wondering if you had some plans to create another "openrisc" | 14:12 |
ZipCPU | SMDhome1: 200+ instructions is what you get when you count every instruction in the ISA. | 14:12 |
mafm | you or anybody else | 14:13 |
ZipCPU | mafm: Here's my problem: What "measure" can I use to determine if I've met my goals? kc5tja's complaining makes a nice anecdote, but it's not really a valid "measure". | 14:13 |
ZipCPU | That's why I've been so interested in kc5tja's comments. | 14:13 |
mafm | it seems that RISC-V is completely locked down from the PoV of contributing to the design, so I wouldn't be amazed that new "openriscs" start to emerge | 14:13 |
ZipCPU | SMDhome1: Counting all those instructions does include all 32/64 bit modes, vector operations, and more. But when I first started, I didn't realize you could get flavours of the CPU that didn't support all 200+ instructions. | 14:14 |
ZipCPU | mafm: Is RISC-V finally locked down? Every ISA document I've looked at seems to still have gigantic holes in it. | 14:14 |
ZipCPU | One feeling I get from both RISC-V and OpenRISC is that, whether true or not, I get the feeling that they were designed by committee. | 14:15 |
ZipCPU | You know, the difference between the horse and the camel: the horse was designed by committee? | 14:15 |
SMDhome1 | ZipCPU Ok, I'm looking at the ISA doc now and I guess you're right: counting all vector/dsp/float extensions could make it 200+ instructions. I've looked only at the base set. | 14:15 |
mafm | I mean "locked down" in the sense that if you try to approach them and try to contribute, you will probably be ignored (unless you're somehow admitted to the Inner Ring) | 14:16 |
ZipCPU | (Sorry, the camel was designed by committee ... you'd think I'd get my jokes straight.) | 14:16 |
mafm | so for anybody wanting to at least have the possibility to participate in the design, I suppose that one has to go one's own way | 14:16 |
mafm | (I'm a latecomer to openrisc, so not sure about the history there) | 14:17 |
ZipCPU | mafm: Understand, though, the ZipCPU has separate goals from OpenRISC and RISC-V. | 14:17 |
ZipCPU | ZipCPU is not designed to be a one-size-fits-all replacement for the greatest processors out there. | 14:17 |
SMDhome1 | And as an offtopic theme: I just wonder how the ISAs are made? I mean you have to do a lot of research and profiling before, don't you? | 14:17 |
ZipCPU | SMDhome1: Perhaps that makes the most sense. That's not how I started, however. | 14:18 |
ZipCPU | The instruction set I started with was designed initially by the bus definition and the ease of decoding an instruction. | 14:18 |
ZipCPU | As I went along, I discovered the instruction set was inadequate for all I wanted to do, and so I switched from 4-bit opcodes to a 5-bit opcode. | 14:20 |
SMDhome1 | ZipCPU, I don't know for sure, but is insn decoding really such a big deal? Could you estimate the difference between decoding an openrisc and a zipcpu instruction, and how it affects the design? | 14:20 |
ZipCPU | That is, 5 bits of every 32-bit instruction, determine what that instruction does. | 14:20 |
mafm | ZipCPU: well, I was wondering if you had plans to create another iteration of openrisc, whether with ZipCPU or with a later evolution | 14:20 |
ZipCPU | SMDhome1: Good question. Sadly, I haven't dived into either OpenRISC or RISC-V implementations. My problem is, I'd like to make comparisons between them and the ZipCPU, and without having dived into them ... I'm not able to truly make the comparisons I'd like. | 14:21 |
mafm | (I didn't know that openrisc had 64-bit mode, btw) | 14:21 |
ZipCPU | Thus, while I'd like to declare that the ZipCPU is simpler, I have no objective measure at this point whereby to declare that's true. | 14:21 |
ZipCPU | The closest thing I have is LUT count ... | 14:21 |
SMDhome1 | ZipCPU yep, let's refer to LUT count | 14:22 |
ZipCPU | mafm: The ZipCPU has one drawback which will prevent it from ever replacing either OpenRISC or RISC-V without a major redesign: | 14:22 |
ZipCPU | The ZipCPU does not support byte or half-word instructions. All operations, and indeed the "byte size" of the ZipCPU, are 32 bits. | 14:22 |
SMDhome1 | ZipCPU I mean, does it really matter if the decoder part takes e.g. 200 vs 300 LUTs? | 14:23 |
ZipCPU | SMDhome1: I think what matters more, as far as I'm concerned, is total LUT count--decoder included. | 14:23 |
SMDhome1 | mafm: I guess bandvig(I could misspell his nickname) has his own performance branch of openrisc and he's heading to OoO design w/ Tomasulo's algo | 14:24 |
ZipCPU | I've made a series of LUT measures that I intend to present at ORCONF in October. Sadly, they are very one sided: | 14:24 |
ZipCPU | here's the LUT counts for several implementations of the ZipCPU--I just don't have anything valid to compare them with. | 14:24 |
SMDhome1 | ZipCPU you want to fit whole SoC(cpu core + mem controller + some periph) into average fpga, right? | 14:25 |
ZipCPU | For example, I can fit both CPU, peripherals, and O/S within a Spartan 6 LX4, such as Digilent's CMod-S6. | 14:26 |
ZipCPU | The CMod-S6 has no external memory, so the whole thing needed to fit in either block RAM or flash. | 14:26 |
ZipCPU | The example, though, has its problems. While I can fit, I had to turn off caches, pipelining, divides, and so forth in order to do so. | 14:26 |
SMDhome1 | Why don't you just take a bigger fpga then? | 14:27 |
ZipCPU | 'Cause I wouldn't get bragging rights for fitting in the smallest FPGA. | 14:27 |
ZipCPU | ;) | 14:27 |
ZipCPU | I am working with larger FPGA's, though, such as the Artix-7 35T (Basys-3, Arty) or the Spartan 6, LX25 (XuLA2-LX25). | 14:28 |
mafm | ZipCPU: I see. since you were porting GCC and binutils and so on, I was wondering if you had plans to bring it to the market or something :) | 14:29 |
ZipCPU | My premise is this: if you just wanted a fast CPU, you would've bought one. You instead purchased an FPGA for a reason. Why was it? | 14:29 |
SMDhome1 | ZipCPU I thought Spartan6 is rather outdated, isn't it? | 14:29 |
ZipCPU | Yeah, I think the Spartan 6 is a touch outdated. I wouldn't start any new projects on it today. Still, I get to keep the bragging rights on the LX4 comment. | 14:29 |
ZipCPU | mafm: The entire ZipCPU project, tool chain and all, is released under the GPL. | 14:30 |
ZipCPU | mafm: Part of my purpose is that I may now use it as a resume of sorts. Thus, even if you don't want the CPU, I can still sell the services of someone who has built such a CPU. | 14:31 |
kc5tja | RISC-V user instruction set is locked down. Future revisions are bug-fixes and "filling in the holes" only type documents. | 14:31 |
kc5tja | Privileged-mode specifications are NOT yet locked down, but close (they're at v1.9). | 14:31 |
SMDhome1 | ZipCPU, since your cpu has only load/store word, how do you read/write a byte? load - modify - store? | 14:31 |
ZipCPU | mafm: SMDhome1: Will I see either of you at ORCONF? I intend to give a full presentation on my reasoning there ... | 14:32 |
ZipCPU | SMDhome1: The easy answer is that "byte"s have 32-bits. | 14:32 |
mafm | kc5tja: didn't 2.1 change the function calling convention? it doesn't seem very bugfix-only to me | 14:33 |
mafm | :D | 14:33 |
SMDhome1 | ZipCPU I'm going to be there | 14:33 |
kc5tja | mafm: The calling convention is not relevant to the instruction set. | 14:33 |
kc5tja | mafm: That change just brought the specs in-line with what Linux compilers were already doing. | 14:33 |
SMDhome1 | kc5tja: are you going to be at ORCONF too? | 14:34 |
kc5tja | mafm: But even so, let's say tomorrow I decide that X31 is the new return address register for some reason, that has no bearing on how hardware implementors create instruction decoders for the architecture. | 14:34 |
kc5tja | SMDhome1: I can't afford to fly overseas, sorry. | 14:34 |
mafm | ZipCPU: sadly probably not, will be starting a new job in a new place and lots of things to sort out in the next months | 14:34 |
kc5tja | Plus, I can no longer find my passport, which is somewhat troubling. | 14:34 |
SMDhome1 | kc5tja: oh, I hope to meet you somehow on my next trip to CA then | 14:34 |
ZipCPU | So, one of the things I'll be presenting is a slide showing LUT counts and Dhrystone measures versus capability. | 14:35 |
kc5tja | SMDhome1: Any idea when that will be? I was originally planning to meet with olofk in October, but last I heard, he won't have that option this time. | 14:35 |
ZipCPU | mafm: I'll have to answer all of your questions in this forum then. ;) | 14:35 |
kc5tja | SMDhome1: Also, I'm located in the San Francisco Bay Area/Silicon Valley region. | 14:36 |
SMDhome1 | kc5tja: That depends on my bosses and not me. I was going to spend this summer in Santa Clara but plans have changed. Let's hope I can come in winter: I bet the weather in CA is even better in December ;) | 14:37 |
kc5tja | That depends on where you are. In Los Angeles and San Diego, yes. | 14:38 |
kc5tja | In Bay Area, it's not always the most pleasant; overcast, rainy at times, surprisingly cold if you happen to be near the water, etc. Certainly better than any winter in New York state though, so if your native climate is similar, you'll find even Bay Area's winters warm and cozy in comparison. | 14:39 |
kc5tja | But as someone who exhibits symptoms of SAD, even Bay Area winters can be a real downer for me. | 14:39 |
mafm | kc5tja: true, not the instructions, but I meant that they were not only editorial changes... and I wouldn't be surprised if they come-up with spec 3.x at some point | 14:39 |
kc5tja | mafm: It actually was an editorial change. GCC calling conventions actually had changed about a year before the standard was updated. | 14:41 |
SMDhome1 | kc5tja: Since the temperature is somewhere about 0 celsius, it's enough for me | 14:41 |
kc5tja | Oh, heavens, yes, you'll be walking around in a tank-top and flipflops with a drink and a little umbrella in it here. ;) | 14:41 |
kc5tja | even in the dead of winter. | 14:41 |
kc5tja | 0C is about what winters in NY state are like. | 14:42 |
kc5tja | Often colder, but not by much. | 14:42 |
ZipCPU | So you'll be "dreaming of a white christmas" since you never get one, then? ;) | 14:42 |
kc5tja | Mostly blustery snow and wind. | 14:42 |
mafm | kc5tja: ah, interesting, I didn't know that | 14:42 |
kc5tja | ZipCPU: I want to be as far away from a white christmas as I can get. | 14:43 |
ZipCPU | I'm a Minnesota type. There's just something wrong with green grass in winter time. | 14:43 |
SMDhome1 | kc5tja: I'm from Moscow, we could have e.g. -20 and it's sort of normal here | 14:43 |
kc5tja | ZipCPU: When I moved from NY to CA, it was in November, people outside of LAX were wearing Uggs and fur coats because it was "cold outside." I was in a t-shirt and beaming with how warm it was. | 14:43 |
kc5tja | SMDhome1: That degree of chill is rare (except for wind-chill) in NY, but I've felt it. I had to attend an outdoor funeral in -28C, but those are quite rare. | 14:44 |
mafm | kc5tja: my binaries compiled with recent compiler versions won't work in older system images though, I attributed that to the changes in calling conventions and so on | 14:45 |
kc5tja | SMDhome1: It is fun in one respect: throwing boiling water into the sky and watching it come down as ice particles as fine as some snow. :) | 14:45 |
SMDhome1 | Btw, are there any ready dev kits with an openrisc/risc-v thing? I'm too lazy to port it to my board. | 14:46 |
kc5tja | SMDhome1: picorv32 is a 32-bit RISC-V processor core written in Verilog that looks easy to synthesize. Not sure how easy it is to get RISC-V versions of GCC working with it though. | 14:47 |
kc5tja | Only thing is, this CPU was designed before the new privilege spec was even released to the public, so it's not standards compliant enough to run Linux or the like. It's intended for embedded use. | 14:48 |
kc5tja | Places like lowRISC, PULP, and SiFive have real silicon in hand, but they're still testing it and it is not up for sale yet. | 14:49 |
kc5tja | Considering it takes about 6 months for a semiconductor wafer run, this is probably the point where we wait the longest. | 14:49 |
SMDhome1 | kc5tja: I don't know much about risc-v right now. As far as I know there are some specs written by guys from Berkeley and others. And lots of people do their own implementation of the risc-v isa. | 14:49 |
kc5tja | To be fair, I am too, in part to accomplish something I can be proud of, and also in part, because no silicon exists yet. :) | 14:50 |
kc5tja | commercially, that is. | 14:50 |
SMDhome1 | kc5tja: I've seen a presentation from HotChips'15 about RISC-V. It was quite amazing, especially the OoO + superscalar version which could compete w/ ARM cpus. | 14:52 |
SMDhome1 | kc5tja: so you're implementing your own risc-v core, is that correct? | 14:52 |
kc5tja | Trying to. Not succeeding very well just yet, but hopefully I'll catch a break on my latest attempt. | 14:54 |
SMDhome1 | kc5tja: may I ask why you are doing this? I mean there is (or should be) an opensource design which could be taken as a reference so you could modify it in any way. Or do you want to make something specific? E.g. an ultra low power cpu for embedded. | 14:56 |
kc5tja | brb, then will answer. | 14:58 |
SMDhome1 | thanks | 14:58 |
kc5tja | SMDhome1: My original plan was to extend the PicoRV32 CPU to 64-bits. | 15:04 |
kc5tja | But the amount of rework involved was rather substantial; it was almost an entire redesign. | 15:04 |
SMDhome1 | kc5tja: does RISC-V 32bit isa differ that much to 64 bit one? | 15:05 |
kc5tja | Not only would I need to widen all the hardwired 32-bit data paths to 64-bits, I would have needed to implement the machine-mode features that were missing too. | 15:05 |
kc5tja | SMDhome1: They're quite similar; but that's only the programmer-facing aspect. | 15:06 |
kc5tja | The Verilog code backing that needs adjustment, and it's not as simple as tweaking a single define. | 15:06 |
kc5tja | The other thing that prevents me from using PicoRV32 is its front-side bus. | 15:06 |
kc5tja | I need a 16-bit Wishbone interface to talk to external RAM. | 15:07 |
SMDhome1 | kc5tja: Ok, I think I should look through the risc-v isa | 15:07 |
ZipCPU | kc5tja: But ... you have a hardware specific reason for needing a 16-bit Wishbone, right? | 15:07 |
ZipCPU | It's not something having to do with RISC-V, per se, but rather your chosen implementation? | 15:08 |
kc5tja | ZipCPU: That's all I could fit on a 96-pin DIN 41612 connector. | 15:08 |
kc5tja | Also, my circuitry is built with asynchronous SRAM chips, which typically expose a 16-bit interface. | 15:08 |
kc5tja | Otherwise, no other reason. | 15:08 |
kc5tja | If I could go with a 64-bit bus, I would have done so; much easier and would have simplified pretty much everything. | 15:09 |
ZipCPU | My point is, the RISC-V doesn't require such an interface to the external world, so SMDhome1 won't find those details in the ISA. | 15:09 |
kc5tja | Correct; the RISC-V cores that SiFive are using typically have 256-bit paths to cache memory, for example. | 15:09 |
kc5tja | (assuming they haven't tweaked the Rocket cores) | 15:10 |
ZipCPU | Wow: 256-bit paths?! Sheesh. And here I get excited about my single 32-bit path. | 15:11 |
SMDhome1 | kc5tja: but you could implement 256 to 16bit bus converter, right? | 15:11 |
kc5tja | SMDhome1: I don't use cache, so I don't need a 256-bit interconnect. I'm building my core matched precisely to the 16-bit bus width. | 15:12 |
kc5tja | I could use a 64-bit (or 128-bit, or whatever) to 16-bit bridge, yes; however, it's extra logic, another state-machine to implement, and takes up resources. | 15:13 |
kc5tja | Instead, I'm building my core's instruction decode state logic explicitly with the 16-bit bus in mind. | 15:13 |
kc5tja | One plan I had for the S64X7 (my 64-bit Forth CPU concept) was to use such a bridge, but that was before I saw how much space the CPU took up. I don't think I can fit both the CPU and bridge on the same iCE40-HX8K part. | 15:14 |
kc5tja | Not, at least, as it's currently written. | 15:15 |
SMDhome1 | iCE40-HX8K is sort of opensource fpga platform w/ opensource tools and etc? | 15:15 |
kc5tja | Yes. | 15:15 |
kc5tja | It's what the Yosys and Project Icestorm toolchain targets. | 15:16 |
ZipCPU | Does the iCE40 have any on-chip block RAM? | 15:18 |
kc5tja | Yes, and let me tell you how much I #*@#%ing hate it. | 15:19 |
kc5tja | I don't mind synchronous writes to RAM, but synchronous reads is utterly inexcusable. | 15:19 |
ZipCPU | Really? I've been loving on-chip block RAM. | 15:19 |
kc5tja | 50% *easily* of the hardships I've had to endure on my CPU designs are directly attributable to that misfeature. | 15:19 |
ZipCPU | What's difficult about synchronous reads from on-chip block RAM? | 15:20 |
kc5tja | Block RAMs are great, when they have asynchronous read ports. | 15:20 |
kc5tja | You are forced to literally pipeline E V E R Y T H I N G, even stuff that doesn't/shouldn't. | 15:20 |
ZipCPU | So ... is this a "feature" of the iCE chips? | 15:21 |
kc5tja | It's very much a mis-feature. | 15:21 |
kc5tja | For use as a register file, for instance, I'm forced to insert an extra cycle in the execution pipeline. | 15:21 |
ZipCPU | Or ... is it a feature of Yosys? | 15:21 |
kc5tja | For the stack CPU, I cannot just have a single "fetch" instruction, I have to split it into "here's the address" and "now fetch" pair of instructions. | 15:21 |
kc5tja | Yosys doesn't care; it's just a synthesis tool. No, this is a severe misfeature of iCE40s. | 15:22 |
SMDhome1 | stupid question: can I rely on the bitstream toolchain (it's hard for me to spell synthesizer) to place block-RAM, if there is any, for reg bla[][] or should I instantiate it manually? | 15:22 |
ZipCPU | But ... given how simple a stack CPU is, wouldn't you want to pipeline it? | 15:22 |
kc5tja | Xilinx gets this VERY right. Synchronous writes, async reads. It's bliss. | 15:22 |
kc5tja | No. That just adds latency. | 15:23 |
kc5tja | Now every block RAM access is two cycles instead of one. | 15:23 |
kc5tja | The top of stack feeds into the ALU, and the ALU feeds _directly_ into the top of stack register again. | 15:23 |
kc5tja | There's literally zero pipelining in a stack CPU. | 15:23 |
ZipCPU | Okay ... but because it is a stack CPU, you find yourself *heavily* using the block RAM then? | 15:24 |
kc5tja | No, my stack CPU assumes external memory (asynchronous RAM). | 15:24 |
ZipCPU | Is the stack in external memory? | 15:25 |
kc5tja | If I did build it around block RAMs, I'm forced to make every instruction take two cycles to run. | 15:25 |
kc5tja | Everything. | 15:25 |
kc5tja | It has zero internal memory except for the stacks, which are implemented via Verilog "reg" statements. | 15:25 |
kc5tja | There's just no way I could reconcile single-cycle instruction execution with iCE40's block RAMs. Just can't do it. | 15:26 |
kc5tja | And that's the thing | 15:26 |
kc5tja | iCE40's block RAMs are built for FIFOs and queues, not for general purpose storage. | 15:26 |
ZipCPU | So ... your IPC is 1/2, or was it 1/4? | 15:26 |
kc5tja | The S64X7 has IPC 1/1, but Polaris (RV64) had 1/4. | 15:27 |
ZipCPU | But ... doesn't a stack machine need to both read and write during each instruction? | 15:39 |
ZipCPU | Read operand one from stack, read operand two from stack, do operation, write result back to stack ...? | 15:39 |
kc5tja | Not if your stacks are on-chip. | 15:52 |
kc5tja | The stack elements are just registers. | 15:52 |
ZipCPU | But I thought you weren't using block RAM ...? | 15:52 |
kc5tja | Register != block RAM. | 15:52 |
kc5tja | Like I said, I'm just using reg [63:0] x, y, z; in Verilog to represent the top-most 3 stack items. | 15:53 |
ZipCPU | Okay. | 15:53 |
kc5tja | Z is wired directly to the ALU, and ALU back into Z. | 15:53 |
kc5tja | X and Y are wired to each other (and to Z), as well as ancillary ports, like the data bus and such. | 15:53 |
ZipCPU | Is that all the stack you have, those three registers? | 15:53 |
kc5tja | I have a total of 7 data stack registers, and 5 return-stack registers. | 15:54 |
kc5tja | Although, 3 data registers is the absolute minimum you can get by with. (My S16X4 only had 3 to start with.) | 15:54 |
kc5tja | I've considered the possibility of removing the deeper stack elements as well, but if you're going to support interrupts at some point, you definitely want to have at least 6 registers for your data stack. | 15:55 |
kc5tja | (3 for user-mode programs, plus another 3 for interrupt handler use, possibly to store the user-state to memory.) | 15:55 |
kc5tja | Return stacks aren't strictly necessary, but are really handy to have, so I'd recommend maybe 3-deep R-stack as well. | 15:56 |
kc5tja | They're mostly used to preserve Forth runtime semantics in the face of colon definitions and >R and R> (Forth) words. | 15:56 |
ZipCPU | mafm: That isn't to say that I haven't thought about building my own RISC CPU ..... :) | 16:52 |
mafm | ZipCPU: yeah, that was the other part, I was wondering if it would be easier to start from RISC-V (I assume that it's what you mean) or OpenRISC as base | 16:57 |
mafm | because of the complications that people comment here with decoding, etc | 16:57 |
mafm | the immediates... | 16:57 |
ZipCPU | Personally? I'd start from scratch because I can. ;) No offense to either team. | 16:58 |
ZipCPU | If you'll recall, one of my goals has always been to learn. I'd get more starting from scratch than I would starting from either base. | 16:58 |
ZipCPU | If I did such a project, my goals would expand to include: out of order execution, and full byte/half-word/word support. | 16:58 |
ZipCPU | Oh, and I'd want multiple issue as well. | 16:59 |
ZipCPU | That would be a fun project. :) | 16:59 |
ZipCPU | One of the fun and unique things about working independently is that you can pivot on a dime. | 17:02 |
ZipCPU | As it is, I've already made a couple ISA changes since the ISA first came out. I didn't have to discuss it with a committee or working group, I just ... had a problem, a need, and then a fix. It's fun being agile. | 17:03 |
kc5tja | mafm: Yes, start from scratch. RISC-V is not intended to be the simplest possible CPU architecture; only an open source, industry-grade CPU that supports a wide variety of microarchitectures. | 17:04 |
kc5tja | O.O | 17:04 |
kc5tja | DUUHHH.... | 17:04 |
kc5tja | WHY | 17:04 |
kc5tja | DID I NOT THINK OF THIS SOONER?! | 17:04 |
ZipCPU | What, are we handing out "olympic" scores to CPU ISA's? | 17:04 |
ZipCPU | :) | 17:05 |
kc5tja | No, the ISA is OK (\o/ 8.7); but it's not designed for first-time implementation. | 17:05 |
ZipCPU | I've learned another thing along this road, that isn't necessarily apparent from either OpenRISC or RISC-V ISA: | 17:06 |
ZipCPU | Different people have different needs. As a result, there are multiple builds of each CPU, adding or removing various features. | 17:06 |
ZipCPU | When I'm trying to fit in a tighter area, I add in as many features as I can, but that's it. | 17:06 |
kc5tja | Epiphany was that I should (instead of implementing a stack CPU) implement a very minimal RISC CPU, ignoring all the oddities of RISC-V completely, and then 'evolve' the design as close to RISC-V as possible (if time permits, even completing the implementation) before the fifth RISC-V workshop. | 17:07 |
kc5tja | The only problem is that I'd have to tweak my software development tooling with each architectural revision. | 17:07 |
kc5tja | Will need to think of this further when I have more time. | 17:07 |
ZipCPU | (CRUD! The DDR3 controller passed the single read/write test, the pipelined read/write test, but failed on the random access test.) | 17:07 |
ZipCPU | kc5tja: I've ended up with multiple `defines and parameter IMPLEMENT_x's throughout my design, allowing it to be expanded or | 17:08 |
ZipCPU | shrunk as necessary. | 17:08 |
ZipCPU | In many ways, I start with a working CPU, implement a new feature, make that new feature "optional" and go on. | 17:09 |
ZipCPU | For example, one feature I'd like to implement would be fast subroutine returns. Right now, a subroutine return is a jump to register value, and it suffers a full pipeline stall. | 17:09 |
kc5tja | Well, what I mean is, for example, how to handle traps of various kinds, the more bizarre architectural features (like wonky immediate encodings), whether or not CPU traps on certain illegal instruction patterns, etc. | 17:09 |
ZipCPU | Sure. Makes a lot of sense. | 17:10 |
kc5tja | Well, the problem I have is that I've built all my software development tooling targeting RISC-V "as defined" in the specs already. | 17:10 |
kc5tja | I don't want to have to waste that effort. | 17:10 |
kc5tja | But it's looking like I'm going to have to. | 17:10 |
mafm | ZipCPU: that's true, but that would be only for educational purposes. the obvious benefit of reusing designs would be support from existing software ecosystem, for example (if the design is not incompatible) | 17:14 |
ZipCPU | True, but how hard can a new design be? My big problem is that I don't support 8 and 16-bit words. This keeps me from compiling standard libraries. But, if I had that ... how much more difficult is an operating system? | 17:16 |
ZipCPU | Sure, the assembler and compiler take some work. Been there, done that, could do it again now if I needed to. | 17:16 |
ZipCPU | But at some point, programs just "compile" and "build" and the CPU gets out of the way, right? | 17:16 |
mafm | ZipCPU: yes, but again, for educational purposes. if one wants to put the design to the market, it's much easier to say "and you can go to debian and download the image for risc-v" | 17:17 |
ZipCPU | So let's go back to the "market" business. What "market"? Am I going to compete with Intel? AMD? ARM? While I'd love to, it's not likely. | 17:18 |
ZipCPU | If a "market" is going to drive a product, it starts with a need that isn't being filled by the current market products. | 17:19 |
kc5tja | Funny you mention the cell size issue; a surprising amount of S64X7 logic goes into multi-width loads and stores, and I suspect that'll be the same for RISC-V too. | 17:19 |
ZipCPU | Were I to try to compete head to head with RISC-V or OpenRISC ... I'd have a *long* way to go to catch up. | 17:19 |
mafm | ZipCPU: I'm thinking of something like the rationale in Parallella to go with risc-v | 17:20 |
mafm | or Nvidia's internal designs ("falcon") | 17:20 |
mafm | (be back in a bit) | 17:20 |
kc5tja | I'm genuinely surprised that Falcon is switching to RISC-V, honestly. | 17:21 |
ZipCPU | Is RISC-V really mature enough to support it? | 17:21 |
kc5tja | I mean, this company is the world leader in GPU design; you'd think they'd've already managed to build a reasonable RISC on their own. | 17:21 |
kc5tja | Definitely; RISC-V's most basic definition (support for only RV64I and RV64S) is at least at MIPS 3000 level of capability. | 17:22 |
ZipCPU | No, I'm more referring to the exception model and the other ... requirements of a chip other than just a simple set of basic instructions. | 17:23 |
kc5tja | RV64S provides the trap and exception model. | 17:23 |
kc5tja | Not sure what other requirements you have in mind. | 17:24 |
ZipCPU | Maybe I'm missing something. I just didn't think the spec regarding that was yet complete, or perhaps that it had just finished becoming so. | 17:24 |
kc5tja | It's not finalized, no. But in Falcon's case, which is a deep-embedded application, it doesn't matter. Where it will become more of an issue is for desktop and server CPUs. | 17:26 |
kc5tja | here, upward compatibility of OSes, and not just user apps, becomes important. | 17:26 |
kc5tja | Also, it's why RISC-V instruction decode is so damn annoying; invalid CSRs and other features are *defined* to cause illegal instruction traps, precisely to support emulation of obsolete or undefined features. | 17:27 |
kc5tja | If it weren't for that, decoding RISC-V instructions would be a piece of cake (decode low 5 bits of opcode space for most instructions, passing another 3 bits of the instruction register to the ALU to determine precise operation). | 17:27 |
ZipCPU | Hmmm ... why not do that, catching any gross violations, and then let the ALU catch any finer point violations? | 17:29 |
kc5tja | Clarify? | 17:30 |
ZipCPU | Sort of like a two-stage invalid instruction detector. The first stage does a coarse illegal instruction detection, subsequent stages detect more detailed. | 17:31 |
kc5tja | The meaning of the function control bits changes with the class of instruction. 000 = ADD for OP and OP-IMM instructions, but load byte/store byte for LOAD/STORE instructions. | 17:32 |
ZipCPU | Exactly! | 17:32 |
ZipCPU | Break it into classes. If it doesn't fit in any class, declare an illegal instruction. Otherwise let the code for handling the specific class figure out the further details. | 17:33 |
kc5tja | That's what I currently have (broken into case statements). But the logic is still bloody huge. | 17:33 |
kc5tja | It's easier if I can just write everything out as a giant truth-table (with inputs on one side, and outputs on the other side), and generate Verilog from that. | 17:34 |
kc5tja | Another problem I have with CPU design in general, and RISC design in particular, is the inherent non-testability of the circuit. | 17:34 |
kc5tja | It's just not testable. | 17:35 |
kc5tja | And that only adds to my distress. | 17:35 |
kc5tja | Because I hate, hate, hate not having benchtests for how things work on the inside. | 17:35 |
kc5tja | I rely on tests as much for documentation on how things work as I do anything else. | 17:35 |
ZipCPU | Yes, yes, yes. Testability is *very* important. I became persona non grata at my last office for saying that ... :) | 17:36 |
ZipCPU | We were trying to sell a product no one wanted, partly because no one could *prove* that it worked. | 17:36 |
ZipCPU | I came up with a way to know whether or not it worked, and immediately became the customer's best friend and my boss's worst enemy. | 17:37 |
kc5tja | Because now your boss knew he had to actually work. | 17:39 |
kc5tja | They hate it when you do that. :) | 17:39 |
ZipCPU | That was before the lawyers got involved. Apparently "fraud" includes knowing something about your product that you aren't sharing with the customer. | 17:40 |
ZipCPU | At least I wasn't "asked" or "told" to leave ... | 17:40 |
olofk | For heaven's sake. Haven't I told you not to talk when I'm gone? I just spent a lot of time going through the backlog here :) | 17:50 |
olofk | But here it goes | 17:50 |
ZipCPU | ROFL! | 17:50 |
olofk | Regarding big companies talking about their love of RISC-V (such as NVidia), I highly suspect they use this as a lever in negotiations with ARM | 17:51 |
olofk | I used to work for a big company that were always very careful to avoid single-sourcing so that they wouldn't be caught in bad contracts | 17:51 |
ZipCPU | (You haven't read very far back, yet, have you ... :-> ) | 17:52 |
olofk | SMDhome1: You asked about ready ports for RISC-V and OpenRISC designs. I'd advise you to check out what's available in the FuseSoC core library (orpsoc-cores). There are OpenRISC ports there for several boards, and also a few RISC-V designs | 17:54 |
olofk | And about the inability to affect the RISC-V core decisions, we tried to get a community-drafted OpenRISC 2000 spec out of the door some years ago | 17:55 |
olofk | The discussion from back then is still on OpenCores wiki. A funny thing about that is that mostly all of the things we wanted to improve from the or1k spec are present in the RISC-V spec. And since we didn't have manpower at that point to get anywhere, we have now dropped the idea | 17:56 |
ZipCPU | Do you mean ... the faults in Or1k are also faults of RISC-V? | 17:57 |
olofk | Ah. That came out wrong :) | 17:57 |
kc5tja | olofk: That, and 64-bit capability, are why I opted for RISC-V instead of OpenRISC. | 17:57 |
olofk | kc5tja: OpenRISC has had 64-bit mode for 15 years | 17:58 |
ZipCPU | (SUCCESS!!!!! The new/updated DDR3 memory controller now passes its entire (updated) <simulated> test suite!!) | 17:58 |
kc5tja | olofk: Not very well advertised; literally every page I read on the topic showed it was "in development" in some way. | 17:58 |
kc5tja | ZipCPU: Congrats! | 17:59 |
olofk | Just that no one has implemented it. I think part of the reason is that until quite recently, a 64-bit CPU in an FPGA wasn't that feasible | 17:59 |
ZipCPU | That leaves tomorrow (or next week, more likely) to start working on the actual hardware interface (again). | 18:00 |
olofk | And I think RISC-V is a good ISA, from what I understand about it. I still would find it interesting to see what a modern community-drafted ISA would look like though | 18:00 |
kc5tja | olofk: Wish I'd known about this 4 years ago. :) | 18:01 |
olofk | And the original or1k ISA was most definitely not designed by committee. It was a couple of Slovenian students who had read the Patterson/Hennessy books and thought, hey, we should implement this in an FPGA :) | 18:01 |
olofk | From what I understand. I wasn't involved until 2010 or so | 18:02 |
ZipCPU | Heheh ... so ... there'd still be some use for some young upstart to come in and build a better ISA? | 18:04 |
* ZipCPU breaks into devious grin. | 18:04 | |
kc5tja | OR3K -- 128-bit design. ;D | 18:05 |
olofk | 128-bit might be too much. I think we should go for 64-bit with an extra leap bit every four years | 18:06 |
ZipCPU | Something to compete with SSA? and the various Vector math beasts? | 18:06 |
olofk | See. There are already ideas being thrown around :) | 18:06 |
ZipCPU | Oh, come on, Cray was working on a vector CPU with 64x64 vectors over a generation ago ... | 18:06 |
ZipCPU | Surely a new design could do at least as well ... | 18:07 |
olofk | Most times I use a CPU in an FPGA it's as a glorified state machine to handle some basic task, so performance isn't the only thing that counts. I could well see ZipCPU being of use for people who need a small 32-bit CPU | 18:07 |
ZipCPU | Ahh ... shucks ... that's one of the best compliments I've gotten for this work yet ... (Thanks). | 18:08 |
olofk | I think Jan Gray's GRVI Phalanx is really cool too. Not sure he'll show up at orconf this year | 18:08 |
kc5tja | olofk: According to a few trends, sometime around 2025, we'll exceed 64-bit address widths (at least in a data center context). | 18:12 |
kc5tja | Though, honestly, an ISA really ought to be width independent. 16-bit, or 256-bit, it ought not matter how wide your registers are. | 18:13 |
kc5tja | I've seen strong, vociferous opposition to 128-bit machines before, and I remember that attitude when 64-bit was being discussed for the first time in the industry. | 18:13 |
kc5tja | "We'll never need 64 bits!" the argument went. But then, there was a time when 24-bit address buses were so large that nobody could possibly use 16MB of memory either. | 18:14 |
olofk | No one will ever need more than 640kB wide buses | 18:15 |
ZipCPU | kc5tja: Have you read Agner's CPU/ISA proposal? He's proposed a CPU/ISA with vector registers, but that can handle vector length independent code. That way, when the vector register length is increased, the same code still works without needing to be recompiled. | 18:15 |
kc5tja | OK, I'll agree that a 5,242,880-bit wide bus will be hard to package and use, and needing that will be long after I'm dead. :) | 18:16 |
kc5tja | ZipCPU: Krste was talking about that for RISC-V during the 3rd workshop. | 18:16 |
ZipCPU | So ... it isn't that outrageous?? | 18:17 |
kc5tja | Nope. Since about half of the RISC-V Workshop attendees are supercomputer engineers, they were salivating over it. | 18:17 |
olofk | supercomputer engineers, are those the guys who think they need something that they might not necessarily need? :) | 18:18 |
ZipCPU | My Dad was a supercomputer engineer, so watch it! ;) | 18:20 |
ZipCPU | Actually, he was a systems programmer for Cray Research--so, close enough. | 18:20 |
ZipCPU | Let's see, some favorite quotes. Most people would say, "If it ain't broken, don't fix it." | 18:21 |
ZipCPU | The engineer says, "If it ain't broke, it ain't got 'nough features." | 18:21 |
ZipCPU | The program manager says, "If it ain't broke, shoot the engineer." | 18:22 |
ZipCPU | kc5tja: Did you notice that http://fpga.org/grvi-phalanx/ claims that a RISC-V core can be implemented in 320 6-LUTs? | 18:24 |
olofk | ZipCPU, kc5tja: Yeah, he packed like 300 cores or so in a high-end Xilinx FPGA | 18:27 |
-!- blueCmd is now known as biot_ | 18:32 | |
-!- biot_ is now known as blueCmd | 18:33 | |
olofk | Time for bed. Now you all keep quiet until I return ;) | 18:35 |
kc5tja | ZipCPU: Veeeeerrry stripped down, w/out traps of any kind, and it's a subset of the instructions provided in RV64I and RV64M. | 18:39 |
kc5tja | Oh, and 32-bit. | 18:39 |
kc5tja | Picorv32 is actually a more complete RISC-V implementation than Phalanx cores. | 18:39 |
kc5tja | AND, he packed those LUTs by hand. | 18:41 |
kc5tja | That's something I'm not willing to do, b/c I have no intention of selling my designs. | 18:41 |
--- Log closed Sat Aug 20 00:00:33 2016 |
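
Editor's note on the 17:27 to 17:33 exchange: the "break it into classes" idea can be sketched roughly as below. This is only an illustration in Python, not kc5tja's or ZipCPU's actual logic. The names `CLASSES`, `ILLEGAL` and `decode` are made up for this note, only four RV32I instruction classes are covered, and funct7 (for example ADD versus SUB) is deliberately ignored. Stage one picks the class from opcode bits [6:2]; stage two lets that class interpret the three function bits [14:12], so 000 means ADD-like for OP/OP-IMM but byte access for LOAD/STORE.

```python
# Hypothetical two-stage decoder sketch for a small RV32I subset.
# Stage 1: coarse check, choose the instruction class from opcode bits [6:2].
# Stage 2: fine check, the class interprets funct3 (bits [14:12]) itself.
# Assumption: funct7 is ignored, so ADD/SUB and SRL/SRA are not separated.

ILLEGAL = ("ILLEGAL", None)

CLASSES = {
    0b00000: ("LOAD", {0b000: "LB", 0b001: "LH", 0b010: "LW",
                       0b100: "LBU", 0b101: "LHU"}),
    0b01000: ("STORE", {0b000: "SB", 0b001: "SH", 0b010: "SW"}),
    0b00100: ("OP-IMM", {0b000: "ADDI", 0b001: "SLLI", 0b010: "SLTI",
                         0b011: "SLTIU", 0b100: "XORI", 0b101: "SRLI/SRAI",
                         0b110: "ORI", 0b111: "ANDI"}),
    0b01100: ("OP", {0b000: "ADD/SUB", 0b001: "SLL", 0b010: "SLT",
                     0b011: "SLTU", 0b100: "XOR", 0b101: "SRL/SRA",
                     0b110: "OR", 0b111: "AND"}),
}


def decode(insn):
    """Return (class_name, operation) or ILLEGAL for a 32-bit word."""
    # Full-width (32-bit) encodings have both low opcode bits set.
    if insn & 0b11 != 0b11:
        return ILLEGAL
    # Stage 1: does the word belong to any known class?
    entry = CLASSES.get((insn >> 2) & 0b11111)
    if entry is None:
        return ILLEGAL
    class_name, funct3_table = entry
    # Stage 2: the class decides whether its funct3 value is valid.
    operation = funct3_table.get((insn >> 12) & 0b111)
    if operation is None:
        return ILLEGAL
    return (class_name, operation)
```

For example, `decode(0x00100093)` (the encoding of `addi x1, x0, 1`) falls into the OP-IMM class, while a LOAD word with funct3 = 011 passes stage one but is rejected in stage two, because this subset has no 64-bit load.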