--- Log opened Thu Dec 15 00:00:28 2016
-!- ZipCPU_ is now known as ZipCPU | 08:27 |
arand_ | Does the dcache addressing behaviour differ depending on if it's disabled or enabled? | 10:03 |
olofk_ | shorne: You're in the news today :) http://phoronix.com/scan.php?page=news_item&px=Linux-4.10-OpenRISC | 16:41 |
olofk_ | kc5tja: Yes, there are some plans and discussions for future Wishbone. A packet mode has been suggested as one part of a new spec | 16:42 |
olofk_ | I was actually in a video conference a few days ago with a bunch of people from different particle accelerators around the world, who are using wishbone as well and want to make improvements | 16:42 |
olofk_ | We will try to move things forward with wishbone, but it hasn't risen high enough on the list of priorities yet | 16:43 |
ZipCPU | olofk_: You mean ... bonus points are available for anyone who can build something faster/better/cheaper ... first? ;) | 16:45 |
olofk_ | ZipCPU: Especially cheaper. We must bring down the price on wishbone quite a bit :) | 16:48 |
ZipCPU | Maybe I could get you something that would cost 4x fewer $? | 16:49 |
olofk_ | That would be fantastic! If you do that I'll give you a 70% discount on FuseSoC | 16:49 |
ZipCPU | LOL. | 16:50 |
ZipCPU | I wonder if the UART-WB converter might offer a possible form. Every transaction takes up to 36 bits on a bus. The first four bits define what sort of transaction it is, the other 32 are available for data. | 16:52 |
ZipCPU | One word can be used to set an address, the next (several) to write to the address, or (if so commanded) to read a number of values. | 16:53 |
ZipCPU | Hence, a read request would require two words to be transmitted, one for the address, one for the number of items to read. (Whether or not to increment the address between reads is captured in there too.) | 16:54 |
ZipCPU | A write request would require one address word, and then one word per word written to the bus. The length is determined by the number of words in the message. One bit of each word tells you whether or not to increment the address along the way. | 16:54 |
ZipCPU | Perhaps an extra word would need to be added to the beginning of every packet as well, identifying the node and giving an ID for the transaction. | 16:55 |
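The word format ZipCPU describes above can be sketched in a few lines. This is a hedged illustration, not the actual debug-bus encoding: the command codes (CMD_SET_ADDR and friends) and the exact field positions are assumptions made up for this sketch; only the 4-bit-command-plus-32-bit-payload split and the two-word read / address-plus-data write shape come from the discussion.

```python
# Hypothetical command codes -- the real converter's encoding may differ.
CMD_SET_ADDR = 0x1
CMD_WRITE    = 0x2
CMD_READ     = 0x3

def pack_word(cmd, payload):
    """Pack a 4-bit command and a 32-bit payload into one 36-bit word."""
    assert 0 <= cmd < 16 and 0 <= payload < (1 << 32)
    return (cmd << 32) | payload

def read_request(addr, count):
    """A read takes two words: one sets the address, one gives the count."""
    return [pack_word(CMD_SET_ADDR, addr), pack_word(CMD_READ, count)]

def write_request(addr, data):
    """A write takes one address word plus one word per value written;
    the burst length is simply the number of words in the message."""
    return [pack_word(CMD_SET_ADDR, addr)] + [pack_word(CMD_WRITE, d) for d in data]
```

Note how the write burst needs no explicit length field, which is the point being made above.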
olofk_ | Doing like that would make serializing more natural, and I guess that's why you chose this format for the bridge | 16:58 |
olofk_ | (out of curiosity. Have you looked at etherbone, which is a standard for transmitting WB over ethernet?) | 16:58 |
ZipCPU | No ... I didn't know there was such, as I've been tempted to build my own ethernet over wishbone protocol. | 16:59 |
olofk_ | http://www.ohwr.org/projects/etherbone-core | 17:00 |
olofk_ | anyway, back to your idea | 17:00 |
ZipCPU | Okay ... so the width of any channel would need to be the word size, plus the word size / 8, plus two bits. | 17:03 |
ZipCPU | The word size is obvious, one bit for each data bit to be sent. | 17:03 |
ZipCPU | The word size / 8 shouldn't be too hard to figure out: One bit for each of the select lines. | 17:04 |
ZipCPU | The last two bits are (for each word written) a '1' to specify a write (as opposed to anything else), and a 0/1 to specify whether or not the address should be incremented between writes. | 17:04 |
ZipCPU | Thus, for a 32-bit data channel, you would need 42 bits total. Realistically, you'd need two more: one for STB, another for STALL. | 17:05 |
ZipCPU | STB traveling with the packet, STALL going against the current. | 17:06 |
ZipCPU | Probably needs more specific thought/documentation, but ... it should be quite doable. | 17:06 |
ZipCPU | You could even define words for particular responses, such as passing around an "interrupt", a "bus error", a notice that a "bus reset" has taken place, or a "lock request" and/or "lock clear" signal. | 17:11 |
ZipCPU | If you keep the signal lines at 6-bits or less, then a 6-LUT should process them nicely, no? ;) | 17:11 |
olofk_ | I haven't given that idea much thought, but I did create a bus about two years ago with some other ideas | 17:15 |
ZipCPU | Realistically, the purpose of such a bus might be to handle packet switched requests, which would then be turned into WB actions at their destinations. | 17:16 |
olofk_ | I called it CAMD as a working name. The idea there was to separate Command/Address and Data/Mask into separate buses | 17:16 |
olofk_ | And have a third channel for responses | 17:16 |
ZipCPU | Wouldn't you want an economy of wires? Since you'd be using a packet approach, wouldn't you wish to start with an address at the beginning of the packet? | 17:17 |
ZipCPU | The reverse channel could be identical, save for a bit in the Start of Packet word specifying that it is a return response. | 17:18 |
olofk_ | One argument against serializing (as in your proposal) is that data might well be the width of a cache line or DDR burst (say 256 bits) while you still want a 32-bit address space | 17:18 |
olofk_ | For off-chip, serializing is important, and I think we should consider that use case from the beginning, but we also need to optimize for on-chip behaviour | 17:18 |
ZipCPU | If the address is smaller than the amount of data, stuff the bits--no problem. If the data width is smaller than the address width, use multiple clocks to send the address. | 17:19 |
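ZipCPU's reply above reduces to a one-line ceiling division: a narrow address rides in a single wide beat (with the spare bits stuffed), while a wide address is split over several clocks of a narrow channel. A sketch, with the function name my own invention:

```python
def address_beats(addr_width, data_width):
    """Clock cycles needed to carry an address over the data channel.
    Uses ceiling division: an address narrower than the data width
    still occupies one full beat."""
    return max(1, -(-addr_width // data_width))
```

So olofk_'s 256-bit-data/32-bit-address case costs one beat, while e.g. a 64-bit address over a 32-bit channel would take two.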
olofk_ | I will do my best to find compelling arguments against your idea, and hopefully I'll come up with none :) | 17:23 |
olofk_ | There are a few of course, but I'm not sure they are that strong | 17:23 |
ZipCPU | So, a read request might look something like: http://imgur.com/a/Scq30 | 17:24 |
olofk_ | 1. You will lose bandwidth every time you send a new address instruction (but you can make sure to write large bursts to make that penalty small) | 17:25 |
olofk_ | 2. It will add logic to all slaves to decode the combined data/address bus | 17:25 |
olofk_ | (not sure that is a lot however) | 17:25 |
olofk_ | 3. It will not be backwards-compatible (but I find it hard to come up with any scheme that is a significant improvement and still retains backwards compatibility) | 17:26 |
ZipCPU | As for backwards compatibility, I already have something similar to what would be needed to bridge between this and the current WB (B4, pipelined ;) | 17:27 |
ZipCPU | The big problem you will have is the stall line on the reverse, since nothing in the current WB allows responses to stall. | 17:27 |
olofk_ | Oh yes. Any new scheme would definitely need flow control on the return channel | 17:28 |
olofk_ | I would probably still want to handle valid/ready flags outside of the payload | 17:29 |
olofk_ | So that the underlying medium can take care of this | 17:29 |
olofk_ | But now I need to sleep :) | 17:30 |
ZipCPU | Rgr. | 17:30 |
ZipCPU | Maybe kc5tja and I will solve it before you wake back up. | 17:30 |
kc5tja | olofk_: Yay! | 17:32 |
ZipCPU | kc5tja: Have you followed the discussion so far? | 17:33 |
kc5tja | I just read up. | 17:36 |
ZipCPU | Any thoughts? You were looking for a packet based ... WB replacement? | 17:37 |
kc5tja | I'd like to see one eventually, but I don't need it now. | 17:39 |
kc5tja | But I think I need to clarify what I mean by packets. | 17:39 |
kc5tja | AXI4 transmits entire packets in a single clock (well, req/ack) cycle. | 17:39 |
kc5tja | The availability of five different channels means that a single interconnect can be participating in five different transactions at any given time. | 17:40 |
ZipCPU | Not really. They transmit entire packet requests in a single clock. The packet responses still take about one clock per data word. | 17:40 |
kc5tja | Sure, if you're streaming. AXI4 supports 1024- and 2048-bit wide buses, which is an entire cache-line. | 17:40 |
ZipCPU | Do you envision a single interconnect to handle five different transactions per clock, or TDM those transactions in a more traditional packet structure? | 17:40 |
kc5tja | Both, for different applications. Multitasking links for more efficient use of on-chip resources, and TDM for off-chip access. | 17:41 |
kc5tja | But, like I said, if you tweak WB to meet the former spec, you'll end up with AXI4 *anyway*. | 17:42 |
ZipCPU | Yeah, I'd like to avoid creating AXI4. | 17:43 |
kc5tja | The thing is, AXI4's spec allows concurrent transactions on multiple channels, so it's actually possible to bridge (non-pipelined) Wishbone to AXI4 pretty easily. It just involves a ton of wiring. | 17:52 |
kc5tja | You don't even need a state machine as long as word widths are the same. | 17:52 |
ZipCPU | Yeah ... except for AXI4 burst requests ... ;) | 17:56 |
ZipCPU | I just finished building an AXI4-WB bridge, and those burst requests were what made the whole thing difficult. | 17:57 |
kc5tja | How so? | 17:58 |
ZipCPU | In the end, I needed a FIFO with four pointers in it: One to reference the request that was received from AXI. This included any expanded burst request. The second pointer was to WB requests that had been issued. The third to WB ACK's received (and possibly data), and the fourth kept track of responses given. | 17:59 |
ZipCPU | I'm not looking forward to debugging the timing errors associated with reading and writing from/to those FIFOs. | 18:00 |
kc5tja | You need FIFOs with pipelined WB too. | 18:00 |
kc5tja | The master must store a transaction in a FIFO so that it knows what transaction the next ACK applies to. | 18:00 |
kc5tja | Are you targeting AXI3 or AXI4? | 18:00 |
ZipCPU | I mean, consider this, what if you receive a request for a 256-long burst. If your FIFO is 16 long, it will be a while before you can accept any more requests, so you'll need to drop the READY lines on the input channels. | 18:00 |
ZipCPU | (AXI4) | 18:00 |
ZipCPU | Hmm ... yeah, I do have other bus FIFOs wandering around, but I have tried to keep them out of the interconnect. | 18:02 |
ZipCPU | In general, only the source master is maintaining any FIFOs. | 18:02 |
ZipCPU | And ... not all of my source masters maintain FIFOs. | 18:03 |
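The four-pointer FIFO and the READY backpressure problem described in the last few exchanges can be modeled in a few lines. This is a hedged sketch, not ZipCPU's actual bridge: the class and pointer names are invented here, and only the four roles of the pointers (request received, WB request issued, WB ACK returned, response delivered) and the rule of dropping READY when a burst cannot fit come from the discussion.

```python
class BurstFifo:
    """Toy model of a four-pointer FIFO inside an AXI4-to-WB bridge."""

    def __init__(self, depth=16):
        self.depth    = depth
        self.received = 0   # beats accepted from AXI (bursts expanded)
        self.issued   = 0   # Wishbone requests placed on the bus
        self.acked    = 0   # Wishbone ACKs (and data) returned
        self.replied  = 0   # responses handed back to the AXI master

    def space(self):
        # Entries free between the newest accepted beat and the oldest
        # beat whose response has not yet been delivered.
        return self.depth - (self.received - self.replied)

    def ready(self, burst_len):
        # Drop READY on the input channel when a whole burst cannot be
        # absorbed -- e.g. a 256-beat burst against a 16-deep FIFO.
        return burst_len <= self.space()
```

This makes the 256-vs-16 example above concrete: the bridge must deassert READY and drain responses before it can accept the burst.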
kc5tja | When the Kestrel advances to the point of needing caches, the CPU suddenly finds itself in a different clock domain from the rest of the system. FIFOs are compulsory for that case, which is why my cached version of the CPU will use B4 not B3. :) | 18:03 |
gnawzie | hello | 22:19 |
ZipCPU|Laptop | Hello gnawzie! | 22:19 |
gnawzie | What's up? | 22:19 |
ZipCPU|Laptop | It's late at night here, and I'm going to lose it soon. | 22:20 |
gnawzie | lol | 22:20 |
ZipCPU|Laptop | How about you? You must be just getting up for the day? | 22:20 |
gnawzie | No it's about 6:20pm now | 22:20 |
gnawzie | I'm thinking of using openrisc for a drone flight controller | 22:20 |
ZipCPU|Laptop | Ahh ... PST. EST here. | 22:20 |
ZipCPU|Laptop | Wait ... 6:20pm isn't PST, ... that's not in the US at all is it? | 22:21 |
gnawzie | its alaska | 22:22 |
ZipCPU|Laptop | Ok, that makes sense. | 22:24 |
ZipCPU|Laptop | Just saw your question on ##fpga. "Is OpenRISC any good"? | 22:24 |
gnawzie | haha | 22:24 |
ZipCPU|Laptop | This is the forum to chat with folks who use it. Only problem is, most of those folks are on European time. | 22:24 |
ZipCPU|Laptop | We'll see them in the morning. | 22:24 |
gnawzie | okay | 22:24 |
ZipCPU|Laptop | What are you looking for? | 22:25 |
gnawzie | just a bit overwhelmed by the complexity | 22:25 |
ZipCPU|Laptop | Yeah ... tell me about it. I understand completely! | 22:26 |
ZipCPU|Laptop | What level are you looking to understand OpenRISC at? | 22:26 |
ZipCPU|Laptop | User/application level? Bare metal? Verilog? Compile/tools level? | 22:26 |
ZipCPU|Laptop | I ask because the complexity at each level is different. Some are easier than others. | 22:27 |
gnawzie | I really don't know the structure of those; I have no direction lol | 22:28 |
ZipCPU|Laptop | Well, okay, let's try this a different way: what do you want to accomplish? | 22:29 |
ZipCPU|Laptop | In many ways, OpenRISC is a tool. Which tool you use depends upon the task at hand. | 22:30 |
gnawzie | Okay... that makes sense then. It's not a core you can simply instantiate and use | 22:32 |
ZipCPU|Laptop | Well ... I might disagree with that. Many people have just instantiated it and used it. | 22:32 |
ZipCPU|Laptop | Placing it into your own design will require building a bus with peripherals on it, connecting the core to that bus, and perhaps even a debug core to that bus. | 22:33 |
ZipCPU|Laptop | Then you would load instructions into the CPU, and off you go! | 22:33 |
ZipCPU|Laptop | Ok, back to that load instructions part ... that will require building the toolchain, compiler, etc., and then using it to build your program. All quite doable. | 22:34 |
ZipCPU|Laptop | While the core will run Linux, I think you'll find that many of the folks on this forum aren't running Linux on it. | 22:34 |
gnawzie | yeah | 22:38 |
ZipCPU|Laptop | The other thing to know is that many of the folks on this forum have made those tasks REALLY easy. | 22:39 |
ZipCPU|Laptop | However, I need to head to bed for the night. Feel free to ask further questions, but do please stick around for the answers. | 22:40 |
ZipCPU|Laptop | You might not notice them till sometime tomorrow morning. | 22:41 |
--- Log closed Fri Dec 16 00:00:29 2016 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!