--- Log opened Wed Oct 24 00:00:26 2018 | ||
-!- flyback is now known as scarface- | 01:00 | |
-!- scarface- is now known as flyback | 01:00 | |
alown_cara | ZipCPU: Thanks for following that up with olofk. Unfortunately the wb_intercon appears to lack full support for tags. (Use of LGPL3 also makes things more complex than (say) MIT/BSD/etc.) | 06:51 |
alown_cara | ZipCPU: Looks like I will get to make my own (buggy) crossbar generation script. | 06:52 |
ZipCPU | alown_cara: What are you trying to do? | 06:52 |
ZipCPU | I have my own cross bar generation program. Might it help you at all? | 06:52 |
alown_cara | Plausibly, though the thing you can probably have guessed from the license questions is that I am currently wearing a "company" hat. | 06:53 |
ZipCPU | Sure | 06:53 |
ZipCPU | I have a program I use that I call AutoFPGA | 06:53 |
* alown_cara looks on github | 06:54 | |
ZipCPU | Currently, it builds Wishbone B4/pipeline crossbars | 06:54 |
ZipCPU | https://github.com/ZipCPU/autofpga | 06:54 |
ZipCPU | The code it produces may be licensed as you wish. AutoFPGA asserts no copyright restrictions on that code, similar to GCC | 06:54 |
alown_cara | Interesting. | 06:55 |
ZipCPU | I've told myself often that it could be used for non WB/B4/pipeline crossbars, but I have yet to implement that capability into it | 06:55 |
alown_cara | Can I ask which toolchains you have run this though so far? | 06:55 |
ZipCPU | I've used the AutoFPGA output on Vivado, Quartus, Yosys, and Verilator toolchains | 06:56 |
alown_cara | Hmm. For Quartus I would struggle to shift the consensus from Qsys... But, I am currently looking for a good interconnect for a Lattice based design. | 06:57 |
ZipCPU | I have used this with an iCE40 design, https://github.com/ZipCPU/icozip | 06:58 |
ZipCPU | I've also used it with https://github.com/ZipCPU/tinyzip, just ... the pre-production hardware I have doesn't seem to be supported anymore, so I've never loaded this design and proven that it works | 06:59 |
ZipCPU | I have used this approach with Qsys as well. | 06:59 |
alown_cara | My personal hat is very interested, I need to read a bit more and check it suits the company requirements though. | 07:00 |
ZipCPU | The biggest problem you might expect would be WB/B3(classic) vs WB/B4/pipeline | 07:00 |
ZipCPU | I should also ask ... what "tags" are you hoping to support? | 07:01 |
alown_cara | The underlying device hardware somewhat dictates that at least parts must be built to interface with WB classic. | 07:01 |
ZipCPU | What class of underlying hardware are you using? SDRAM? | 07:02 |
alown_cara | Mach XO3L(F) EFB | 07:02 |
ZipCPU | Not familiar with that one | 07:03 |
* ZipCPU googles | 07:03 | |
alown_cara | A block of silicon embedded in the device for doing SPI/I2C/internal-NVRAM/timer/etc. with an internal wishbone classic slave interface | 07:04 |
ZipCPU | Looking at it now. Looks like AutoFPGA would do nicely with it | 07:05 |
alown_cara | Tag-wise, I was contemplating whether a new type of addressing tag would be the easiest way to adapt existing IP, built with burst sizes declared during the burst setup phase (Avalon-MM), to WB. | 07:06 |
ZipCPU | Ahh ... do you have an Avalon-MM capability with burst mode support? | 07:07 |
ZipCPU | I built an Avalon->WB bridge, just didn't add burst mode to it | 07:07 |
ZipCPU | Wasn't that hard to do. Burst mode might be more difficult though. | 07:07 |
alown_cara | I have a lot of IP that uses bursts (often a different max burst size per component) | 07:07 |
alown_cara | WB(4) bursts appear to have quite a different nature. | 07:08 |
ZipCPU | I don't use the burst mode in WB(4), and I don't think my code is any less effective as a result. The same would not be true of WB3 | 07:08 |
alown_cara | True. | 07:08 |
ZipCPU | Using WB4/p, you can just issue one bus transaction after another to issue the whole burst, without actually issuing a burst | 07:09 |
ZipCPU | You get all the performance, but with none of the complexity | 07:09 |
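For illustration, a minimal sketch of what "issuing the whole burst without a burst" looks like from the master side of a WB B4/pipelined bus: STB stays high and the address advances on every un-stalled clock, while acknowledgements are counted separately. This is an editor's sketch, not code from either participant's repositories; the module name, the LGLEN parameter, and the start trigger are assumptions following common Wishbone conventions.

```verilog
// Hypothetical WB B4/pipelined "burst" built from back-to-back single
// requests.  Read-only for brevity; LGLEN and i_start are illustrative.
module wbp_burst_read #(
	parameter AW = 16,
	parameter LGLEN = 4                     // issue 2^LGLEN reads per run
) (
	input  wire          i_clk, i_reset,
	input  wire          i_start,           // kick off a run of reads
	input  wire [AW-1:0] i_base,            // starting (word) address
	// Wishbone B4/pipelined master port
	output reg           o_wb_cyc, o_wb_stb,
	output wire          o_wb_we,
	output reg  [AW-1:0] o_wb_addr,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [31:0]   i_wb_data
);
	reg [LGLEN:0] reqs_left, acks_left;

	assign o_wb_we = 1'b0;                  // reads only in this sketch

	initial { o_wb_cyc, o_wb_stb } = 2'b00;
	always @(posedge i_clk)
	if (i_reset)
		{ o_wb_cyc, o_wb_stb } <= 2'b00;
	else if (!o_wb_cyc) begin
		if (i_start) begin
			{ o_wb_cyc, o_wb_stb } <= 2'b11;
			o_wb_addr <= i_base;
			reqs_left <= (1 << LGLEN);
			acks_left <= (1 << LGLEN);
		end
	end else begin
		if (o_wb_stb && !i_wb_stall) begin
			// Request accepted: present the next one immediately,
			// without waiting for this one's ACK to return
			o_wb_addr <= o_wb_addr + 1;
			reqs_left <= reqs_left - 1;
			if (reqs_left == 1)
				o_wb_stb <= 1'b0;           // that was the last request
		end

		if (i_wb_ack) begin
			acks_left <= acks_left - 1;
			// Returned i_wb_data would be consumed here
			if (acks_left == 1)
				o_wb_cyc <= 1'b0;           // last response: release the bus
		end
	end
endmodule
```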
alown_cara | (Also, the autofpga readme appears to note that "no support is provided for WB B3 [..] yet") | 07:09 |
ZipCPU | This is true | 07:09 |
ZipCPU | The Mach component appears to be B4 though, doesn't it? | 07:09 |
* ZipCPU checks again | 07:09 | |
alown_cara | However, lots of underlying things need some set-up time before they are able to service (say) a read burst; providing the burst size info in the first cycle helps a lot with this. | 07:10 |
ZipCPU | So, here's how I've gotten around that: | 07:10 |
ZipCPU | 1) I assume any transaction may be either singular, or linear in addressing | 07:10 |
ZipCPU | 2) Any transaction may begin a burst, of an unknown length | 07:11 |
alown_cara | The Mach reference guide refers to its WB implementation as "classic". | 07:11 |
ZipCPU | 3) Within a burst, addresses are constant or incrementing | 07:11 |
ZipCPU | alown_cara: The WB/B4 spec discusses how to bridge from classic to pipeline and back again | 07:11 |
alown_cara | Also true. | 07:11 |
alown_cara | Maybe I am missing something, but I don't think (1)-(3) help when there is extensive latency to performing these operations, but high bandwidth. | 07:12 |
ZipCPU | I think they do, but let me try to explain | 07:12 |
ZipCPU | Let me ask one question first: none of these peripherals appear to be high bandwidth peripherals: I2C, SPI, counter, etc | 07:13 |
ZipCPU | Why are you interested in high performance bursting? That doesn't seem to make any sense. | 07:13 |
ZipCPU | On top of that, the EFB I/O doesn't support bursting either | 07:15 |
alown_cara | It does, but I haven't provided enough context on the constraints, and the supporting systems to explain why. | 07:15 |
ZipCPU | Well, okay, let me return to (1)-(3) | 07:15 |
alown_cara | Sure. | 07:15 |
ZipCPU | If there is an extensive latency to perform the operations, then the first operation sets up the transfer, and (at least with wb/p) the second one waits at the peripheral (not the master) | 07:16 |
ZipCPU | As a result, only the first item suffers from any latency, the rest go immediately to the peripheral when it is ready | 07:16 |
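A minimal sketch of the slave-side behaviour being described, assuming a never-stalling pipelined slave with a fixed internal latency: requests enter a short response pipeline, so only the first access pays the latency and the rest complete one per clock. The LATENCY parameter, the internal memory, and all names are illustrative assumptions, not taken from any of the repositories mentioned.

```verilog
// Fixed-latency, never-stalling WB B4/pipelined slave sketch.  A production
// slave would also flush the response pipe if CYC were dropped mid-run.
module wbp_latent_slave #(
	parameter AW = 10,
	parameter LATENCY = 4
) (
	input  wire          i_clk, i_reset,
	input  wire          i_wb_cyc, i_wb_stb, i_wb_we,
	input  wire [AW-1:0] i_wb_addr,
	input  wire [31:0]   i_wb_data,
	output wire          o_wb_stall,
	output wire          o_wb_ack,
	output wire [31:0]   o_wb_data
);
	reg [31:0] mem [0:(1<<AW)-1];
	reg [LATENCY-1:0] ack_pipe;
	reg [31:0] data_pipe [0:LATENCY-1];
	integer k;

	assign o_wb_stall = 1'b0;               // always ready for a new request

	always @(posedge i_clk) begin
		// Stage 0: accept a request every single clock
		ack_pipe[0]  <= (!i_reset) && i_wb_cyc && i_wb_stb;
		data_pipe[0] <= mem[i_wb_addr];
		if (i_wb_cyc && i_wb_stb && i_wb_we)
			mem[i_wb_addr] <= i_wb_data;

		// Remaining stages just delay the response
		for (k = 1; k < LATENCY; k = k + 1) begin
			ack_pipe[k]  <= (!i_reset) && ack_pipe[k-1];
			data_pipe[k] <= data_pipe[k-1];
		end
	end

	// Only the first access waits LATENCY clocks; the rest stream out
	assign o_wb_ack  = ack_pipe[LATENCY-1];
	assign o_wb_data = data_pipe[LATENCY-1];
endmodule
```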
ZipCPU | I think I've written about this extensively on zipcpu.com | 07:17 |
ZipCPU | Ahh, I have a good slide for you. Interested in comparing two bus interaction charts? | 07:17 |
alown_cara | I was just pulling up the spec for the timing diagrams again. | 07:17 |
ZipCPU | Check out slide 26 (internal marking) of https://github.com/ZipCPU/zipcpu/blob/master/doc/orconf.pdf | 07:19 |
ZipCPU | Now compare that with slide 27 (the next one, also based upon internal marking) | 07:19 |
ZipCPU | That should show you the performance you can expect when using the pipelined mode | 07:20 |
ZipCPU | The problem with WB classic is that the bus master has to wait for the slave to respond before issuing a second request | 07:20 |
ZipCPU | WB pipeline changes this so that the master only has to wait until the interconnect accepts the request before sending an additional one | 07:21 |
alown_cara | This would help in theory but only if wb/p allows this overlap to be extended to N outstanding requests. | 07:21 |
ZipCPU | It may be extended arbitrarily | 07:21 |
ZipCPU | I personally limit the extensions within the code I formally verify, to help the formal verification, but the spec creates no limit on the length of the transaction when done in this fashion | 07:22 |
ZipCPU | The longest burst I've done (so far) has been 1024 transactions using this approach. (That was my DMA engine that I used for that purpose) | 07:23 |
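A sketch of the sort of outstanding-request bookkeeping being described, written with immediate assertions in the style usable with yosys-smtbmc. This is illustrative only, not the actual property files; it assumes registered (not same-cycle) acknowledgements, and the LGDEPTH bound is exactly the kind of artificial limit added to keep a proof tractable, since the spec itself imposes none.

```verilog
// Count requests that have been accepted but not yet acknowledged on a
// WB B4/pipelined interface, and bound that count for formal purposes.
module wbp_outstanding #(
	parameter LGDEPTH = 4
) (
	input wire i_clk, i_reset,
	input wire i_wb_cyc, i_wb_stb, i_wb_stall, i_wb_ack
);
	reg [LGDEPTH:0] outstanding;

	initial outstanding = 0;
	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc)
		outstanding <= 0;
	else case ({ i_wb_stb && !i_wb_stall, i_wb_ack })
	2'b10:   outstanding <= outstanding + 1;  // request accepted, no ack
	2'b01:   outstanding <= outstanding - 1;  // ack, no new request
	default: outstanding <= outstanding;      // both at once, or neither
	endcase

`ifdef	FORMAL
	always @(*) begin
		// With registered acks, an ACK with nothing outstanding would be
		// a protocol violation
		if (outstanding == 0)
			assert(!i_wb_ack);
		// Artificial bound, added only to keep induction tractable
		assume(outstanding < (1 << LGDEPTH));
	end
`endif
endmodule
```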
ZipCPU | Can I interest you in an article setting up formal wishbone properties, and comparing WB to AXI and Avalon? http://zipcpu.com/zipcpu/2017/11/07/wb-formal.html | 07:25 |
* alown_cara is surprised he doesn't recall this article, as he definitely recalls reading others there linked from HN | 07:26 | |
ZipCPU | Not all of my articles have been cross posted to HN | 07:27 |
ZipCPU | Indeed, I think only about 10 or so have | 07:27 |
* ZipCPU goes to count | 07:27 | |
ZipCPU | Ok, only 14 have been cross posted to HN | 07:28 |
alown_cara | Some more of the context: WB bus interaction that occurs over a high latency link (which is also relatively bandwidth starved) operates in a packetized manner. | 07:28 |
ZipCPU | Go on | 07:29 |
alown_cara | So, this extended approach would require the requesting packetizer to accept all the addressing cycles, to count how many there are, before it could issue the packet to the other side. | 07:29 |
alown_cara | (Which would need to do the same thing to the responses) | 07:29 |
ZipCPU | Not sure I followed. Can you explain? | 07:29 |
ZipCPU | Ahh, nvm | 07:30 |
ZipCPU | Got it | 07:30 |
ZipCPU | Go on | 07:30 |
alown_cara | As such, a tag present during the first cycle, indicating the number of cycles in the burst, would allow this packet to be issued on the first cycle | 07:30 |
alown_cara | allowing a certain amount of overlap | 07:30 |
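To make the idea concrete, a hypothetical sketch of such a tag: a TGA-style address tag carrying the burst length on the first strobe, which a packetizing bridge could use to emit its request header immediately instead of counting address cycles. Every signal name and the packet format here are invented for illustration; the data/response path of the packetizer is omitted.

```verilog
// Hypothetical burst-length address tag (i_wb_tga_len) feeding a packetizer.
module wbp_burst_tag_packetizer #(
	parameter AW = 24
) (
	input  wire          i_clk, i_reset,
	// Wishbone slave port, plus a burst-length tag valid on the first STB
	input  wire          i_wb_cyc, i_wb_stb, i_wb_we,
	input  wire [AW-1:0] i_wb_addr,
	input  wire [7:0]    i_wb_tga_len,      // number of beats in this burst
	output wire          o_wb_stall,
	// Request header toward the (assumed) packet link
	output reg           o_hdr_valid,
	output reg  [AW+8:0] o_hdr,
	input  wire          i_hdr_ready
);
	reg started;

	// Hold off further beats only while a header is waiting to leave
	assign o_wb_stall = o_hdr_valid && !i_hdr_ready;

	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc) begin
		started     <= 1'b0;
		o_hdr_valid <= 1'b0;
	end else begin
		if (o_hdr_valid && i_hdr_ready)
			o_hdr_valid <= 1'b0;
		if (i_wb_stb && !o_wb_stall && !started) begin
			// First beat: the tag already gives the total length, so the
			// request packet can be issued immediately
			started     <= 1'b1;
			o_hdr_valid <= 1'b1;
			o_hdr       <= { i_wb_we, i_wb_addr, i_wb_tga_len };
		end
	end
endmodule
```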
ZipCPU | If the link is bandwidth starved, then the only benefit you would get would be from reading, right? | 07:30 |
alown_cara | yep. | 07:30 |
ZipCPU | How bandwidth starved? Are you coming from a serial port perhaps? | 07:31 |
alown_cara | No, it is a popular packet-based protocol, just that most of the bandwidth is reserved for other purposes. | 07:31 |
ZipCPU | Network packet? | 07:31 |
ZipCPU | It sounds like what you need/want is just an Avalon-MM -> WB/classic bridge, right? Do you have other peripherals you need to access as well while you are at it? | 07:32 |
alown_cara | Of course. | 07:33 |
ZipCPU | Heheh | 07:33 |
alown_cara | (to the latter part) | 07:33 |
ZipCPU | Do you have a strong need to reconfigure often? In other words, would it make more sense to build the interconnect by hand? | 07:33 |
alown_cara | AVMM is a huge spec though, so an actually fully compliant bridge would be quite a project in itself. | 07:33 |
ZipCPU | I have an AVMM->WB(B4/p) bridge, but I understand what you mean--it's not "fully compliant". However, it has been good enough for me. | 07:34 |
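For reference, a minimal non-burst Avalon-MM to WB B4/pipelined bridge might look roughly like the following. This is not ZipCPU's actual bridge: it handles a single outstanding transaction at a time, omits byte enables and SEL, and the signal names simply follow the usual Avalon-MM and Wishbone conventions.

```verilog
// Minimal Avalon-MM (slave port) to Wishbone B4/pipelined (master port)
// bridge sketch, one transaction in flight at a time.
module avmm2wbp #(
	parameter AW = 16, DW = 32
) (
	input  wire          i_clk, i_reset,
	// Avalon-MM slave port
	input  wire          s_av_read, s_av_write,
	input  wire [AW-1:0] s_av_address,
	input  wire [DW-1:0] s_av_writedata,
	output wire          s_av_waitrequest,
	output reg           s_av_readdatavalid,
	output reg  [DW-1:0] s_av_readdata,
	// Wishbone B4/pipelined master port
	output reg           o_wb_cyc, o_wb_stb, o_wb_we,
	output reg  [AW-1:0] o_wb_addr,
	output reg  [DW-1:0] o_wb_data,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [DW-1:0] i_wb_data
);
	// Hold off new Avalon requests until the current WB transaction is done
	assign s_av_waitrequest = o_wb_cyc;

	always @(posedge i_clk)
	if (i_reset) begin
		o_wb_cyc <= 1'b0;
		o_wb_stb <= 1'b0;
		s_av_readdatavalid <= 1'b0;
	end else begin
		s_av_readdatavalid <= 1'b0;

		if (!o_wb_cyc && (s_av_read || s_av_write)) begin
			// Launch one WB transaction per accepted Avalon request
			o_wb_cyc  <= 1'b1;
			o_wb_stb  <= 1'b1;
			o_wb_we   <= s_av_write;
			o_wb_addr <= s_av_address;
			o_wb_data <= s_av_writedata;
		end else if (o_wb_cyc) begin
			if (!i_wb_stall)
				o_wb_stb <= 1'b0;            // request accepted downstream
			if (i_wb_ack) begin
				o_wb_cyc <= 1'b0;            // transaction complete
				s_av_readdata      <= i_wb_data;
				s_av_readdatavalid <= !o_wb_we;  // only reads return data
			end
		end
	end
endmodule
```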
alown_cara | I was hoping to avoid needing to do the hand-crafted bit during the initial R&D, but given the device is also rapidly running out of available resources due to being a bit small... | 07:34 |
ZipCPU | Yes, there is that. iCE40 hx8k? | 07:34 |
alown_cara | (The big two vendors can't even implement AVMM<->AXI3/4 fully, so...) | 07:34 |
ZipCPU | wb_intercon isn't known for a low-resource interconnect IIRC | 07:35 |
alown_cara | I'm not using wb_intercon. | 07:35 |
ZipCPU | AutoFPGA will do low resource decoding, but it has some other difficulties you've just mentioned | 07:35 |
ZipCPU | (I know, but you were considering it) | 07:35 |
alown_cara | True. | 07:35 |
ZipCPU | That suggests something handcrafted might be ideal | 07:36 |
ZipCPU | I do have a blog article discussing a hand crafted interconnect | 07:36 |
* alown_cara is wondering if he could get approval to extend autofpga as necessary and upstream generic stuff. | 07:36 | |
ZipCPU | It's not really that hard, but it does get *REALLY* annoying when you start to need to make changes | 07:36 |
alown_cara | Given how "static" the requirements on this project are, I would be encountering that a lot. | 07:36 |
ZipCPU | I do have some commercial work I'll be needing to do soon as well. My goal with that work would be to support a full AXI4->WBp bridge, including all of AXI4's burst modes as well. | 07:38 |
alown_cara | The optimal result, resource-wise, would probably be some weird mixture of shared-bus and crossbar for different master<->slave combinations. | 07:38 |
ZipCPU | Sure | 07:39 |
alown_cara | Last task I did, I started with AVMM, migrated bits to AXI4, then re-migrated bits to AXI3 for transaction locking support. | 07:39 |
ZipCPU | Do you use AV a lot? | 07:39 |
alown_cara | As a (mostly) Intel/Altera shop, the answer would be yes (for better and worse). | 07:40 |
* ZipCPU just might have a set of formal properties for AVMM -- they just don't support burst mode (yet) | 07:40 | |
* alown_cara simply has a great time writing simulation code to test the obvious bits, then makes it software's problem to find the bugs :p | 07:41 | |
ZipCPU | My problem is that one mistake can lock up the hardware hard. You can read about my "one mistake" here if you are interested. http://zipcpu.com/blog/2018/02/09/first-cyclonev.html | 07:42 |
alown_cara | Heheh, when you say the ARM was issuing these out-of-order, I presume you mean that whatever system was in charge of maintaining coherency when loading into cache was unaware that your target locations had strict ordering requirements? | 07:48 |
* alown_cara is intrigued by the comments on the use of formal verification, as he has been leaning strongly into SVA testing approaches at the moment | 07:48 | |
ZipCPU | Pretty much | 07:48 |
ZipCPU | The FIFO required items to be read in order, and it ignored the address | 07:49 |
ZipCPU | The ARM tried to load addresses starting on 8-word boundaries, then came back and filled in the gaps | 07:50 |
alown_cara | My ARM is a little rusty, but I thought most arm-based SoCs had to expose special ports if they are meant to be coherent, as the exact details of the various levels of cache (if they exist) are left to the SoC implementor. | 07:51 |
alown_cara | (thinking about what happens in Zynq and on Tegra chips) | 07:51 |
ZipCPU | My knowledge of internal ARM details is essentially non-existent---other than the scars from the fails I've suffered through. :D | 07:52 |
alown_cara | Anyway, I should go and do some other bugfixing for now, and will have a play with autofpga later today. | 07:55 |
alown_cara | Thanks for all the help. | 07:56 |
ZipCPU | Feel free to write as you have the need | 07:56 |
ZipCPU | My pleasure! | 07:56 |
alown_cara | Sure. I will idle around here for a bit then. (I might make my personal hat join in too). | 07:57 |
alown_cara | ZipCPU: I have had a bit of time to look over autofpga, and whilst I think it would definitely work, it doesn't quite seem right (it solves a slightly different problem). | 13:03 |
alown_cara | ZipCPU: All I was really looking for (to see if it already existed) was a tool that takes a description of master/slave ports and builds a piece of WB interconnect to join them together; autofpga focuses on the higher-level problem of building and maintaining the whole system. | 13:04 |
ZipCPU | I'm not sure I'd draw the same conclusion | 13:05 |
ZipCPU | While AutoFPGA has the capability to build much more than just the interconnect, it's primarily a copy/paste program. If you don't give it more information, it won't build the other parts for you. | 13:05 |
ZipCPU | Hence, you get what you put into it. | 13:06 |
alown_cara | Fair enough. I haven't particularly tried to use it to achieve anything yet, just going by what it seemed to be. | 13:06 |
ZipCPU | If you just want an interconnect, just grab the main.v output and you will be there. | 13:06 |
ZipCPU | That's what I essentially did when working with Qsys | 13:06 |
alown_cara | Hmm. Would I be correct to say that, of the sample component files, the rtcdate.txt is the simplest wishbone slave component that pulls in a module? (rather than providing data implicitly as the pwrcount.txt seems to?) | 13:12 |
ZipCPU | RTCDATE is pretty simple, yes | 13:15 |
ZipCPU | There are actually four different types of module incorporation: SINGLE (where the result of any read is already known on the clock of the read itself), DOUBLE (where it takes a clock to get to the result), | 13:16 |
alown_cara | Yeah, I was about to follow up with a question about these distinctions, having seen icd.txt | 13:16 |
ZipCPU | OTHER (where the read/write may take some multiple number of clocks to complete), and MEMORY (similar to other, but impacts the linker script) | 13:16 |
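Two toy peripherals sketching what the SINGLE and DOUBLE contracts imply at the RTL level (editor's illustration, not the actual AutoFPGA demo components such as pwrcount or rtcdate). Neither ever stalls, and neither generates its own ACK; the interconnect can supply that on a fixed schedule, as discussed below.

```verilog
// A SINGLE: never stalls, and the result is already valid on the clock of
// the read itself (here, a free-running power-up counter; the bus strobe
// isn't even needed to form the answer).
module single_pwrcount (
	input  wire        i_clk,
	output wire [31:0] o_wb_data
);
	reg [31:0] counter = 0;
	always @(posedge i_clk)
		counter <= counter + 1;
	assign o_wb_data = counter;             // valid on the read strobe itself
endmodule

// A DOUBLE: still never stalls, but the answer appears exactly one clock
// later -- essentially a SINGLE with one register stage added for timing.
module double_readreg (
	input  wire        i_clk,
	input  wire        i_wb_stb, i_wb_we,
	input  wire [3:0]  i_wb_addr,
	input  wire [31:0] i_wb_data,
	output reg  [31:0] o_wb_data
);
	reg [31:0] regs [0:15];
	always @(posedge i_clk) begin
		if (i_wb_stb && i_wb_we)
			regs[i_wb_addr] <= i_wb_data;
		o_wb_data <= regs[i_wb_addr];       // registered: fixed one-clock delay
	end
endmodule
```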
alown_cara | Does OTHER imply that autofpga will not attempt to do anything intelligent to it, and simply executes the @X.INSERTs? | 13:17 |
ZipCPU | In all cases, the X.INSERT's will be applied | 13:17 |
ZipCPU | The "intelligent" stuff has to do with how the wires are then created to the interconnect | 13:18 |
ZipCPU | s/created/created and connected/ | 13:18 |
alown_cara | ala businfo.cpp's create_sio/create_dio? | 13:19 |
ZipCPU | Those would be two of the pieces | 13:21 |
ZipCPU | create_sio creates the connections for the SINGLE's, and create_dio for the DOUBLE's | 13:21 |
ZipCPU | Check out the "writeout_bus_logic" function in businfo.cpp, if you want to look into where this connection takes place. | 13:22 |
alown_cara | That explains how that bit ties together. | 13:24 |
ZipCPU | My plan to support additional bus types was to create a new bus class for each type, and have that new class include the function necessary to the task--similar to writeout_bus_logic for WB/B4/p | 13:25 |
alown_cara | Sorry, got pulled in to another discussion. | 13:39 |
alown_cara | I am intrigued by what benefit a DOUBLE provides, given it can't stall? | 13:40 |
alown_cara | Is this just a timing-improved version of SINGLE (add an extra register stage)? | 13:41 |
ZipCPU | The DOUBLE and SINGLE peripherals allow me to simplify the result gathering process. | 13:43 |
ZipCPU | Not only can they not stall, but they also have very specific acknowledgement cycles. | 13:43 |
ZipCPU | This allows the return logic to be simplified--I no longer need to check for an acknowledgement for example, since I know exactly when I will see it. | 13:44 |
alown_cara | Ah, so that is the distinction with OTHER, which forces you to wait for acks as relevant? | 13:44 |
ZipCPU | Yes, exactly! | 13:45 |
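A sketch of the simplified return path this allows, loosely in the spirit of what AutoFPGA generates rather than copied from it: because these peripherals never stall and respond on a fixed schedule, the ACK is just a delayed copy of the strobe, and the data mux needs no acknowledgement tracking. Names and widths are assumptions.

```verilog
// Simplified return path for SINGLE-type slaves: one fixed clock of delay,
// no per-peripheral ACKs to wait for or arbitrate between.
module sio_return #(
	parameter NS = 4                        // number of SINGLE-type slaves
) (
	input  wire             i_clk, i_reset,
	input  wire             i_wb_cyc, i_wb_stb,
	input  wire [1:0]       i_sel,          // which SINGLE is addressed (NS=4)
	input  wire [NS*32-1:0] i_sio_data,     // all SINGLE results, concatenated
	output reg              o_wb_ack,
	output reg  [31:0]      o_wb_data
);
	always @(posedge i_clk) begin
		// The ACK is simply the strobe, delayed one clock
		o_wb_ack  <= (!i_reset) && i_wb_cyc && i_wb_stb;
		o_wb_data <= i_sio_data[i_sel*32 +: 32];
	end
endmodule
```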
alown_cara | Hmm. Maybe I should try and build the system I have in mind with this and see how far I can get... | 13:45 |
ZipCPU | I'd be glad to support you from here. | 13:46 |
alown_cara | Out of interest: how is the buserr.txt component being used? | 13:46 |
alown_cara | (It looks like AXI?) | 13:46 |
ZipCPU | It's just a peripheral used to return the address of the last bus error | 13:46 |
ZipCPU | It shouldn't look like AXI ... | 13:46 |
alown_cara | I read "AWID" and thought write id. | 13:46 |
ZipCPU | I use it within the ZipCPU so that I can tell, after a bus error, what the cause of the error was. | 13:47 |
ZipCPU | Ahhh ... I think that was short for "Address WIDth" | 13:47 |
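A rough sketch of such a bus-error address register (the real component is described by buserr.txt in the AutoFPGA repository; the port names here are invented, and AW stands for the "Address WIDth" that AWID abbreviates).

```verilog
// Latch the address of the last bus error, and expose it as a read-only,
// SINGLE-style Wishbone register.
module buserr #(
	parameter AW = 30
) (
	input  wire          i_clk,
	input  wire          i_err,             // strobed on any bus error
	input  wire [AW-1:0] i_err_addr,        // address that faulted
	input  wire          i_wb_stb,          // unused: value is always valid
	output wire [31:0]   o_wb_data
);
	reg [AW-1:0] last_err_addr = 0;

	always @(posedge i_clk)
	if (i_err)
		last_err_addr <= i_err_addr;        // remember the offending address

	assign o_wb_data = { {(32-AW){1'b0}}, last_err_addr };
endmodule
```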
alown_cara | Does the presence of biarbiter.txt mean that each bus is inherently single-master? | 13:49 |
alown_cara | (Or did you just want extra control over that particular bus<->bus transfer?) | 13:49 |
ZipCPU | Correct. Each bus has a single master, but the bi-arbiter can be used to create arbitrary interconnect topologies. | 13:50 |
ZipCPU | The biarbiter is a slave to two busses, and a master of another. | 13:50 |
alown_cara | (In this case "zip" and "wbu" -> "dwb") | 13:50 |
* alown_cara wonders if it could emit a dot graph of the resulting generated bus topology | 13:51 | |
ZipCPU | Yes, and then dwb goes through a delay to become wb | 13:51 |
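A rough sketch of a two-into-one arbiter of the kind described: a slave to two upstream buses (called A and B here, standing in for zip and wbu) and a master of one downstream bus, with the grant held for as long as the owner keeps CYC raised. This is an editor's illustration with simple fixed priority; the actual biarbiter lives in ZipCPU's repositories.

```verilog
// Two-upstream, one-downstream Wishbone B4/pipelined arbiter sketch.
module wbp_biarbiter #(
	parameter AW = 24, DW = 32
) (
	input  wire          i_clk, i_reset,
	// Upstream bus A (e.g. the CPU)
	input  wire          i_a_cyc, i_a_stb, i_a_we,
	input  wire [AW-1:0] i_a_addr,
	input  wire [DW-1:0] i_a_data,
	output wire          o_a_stall, o_a_ack,
	output wire [DW-1:0] o_a_rdata,
	// Upstream bus B (e.g. the debug/host bus)
	input  wire          i_b_cyc, i_b_stb, i_b_we,
	input  wire [AW-1:0] i_b_addr,
	input  wire [DW-1:0] i_b_data,
	output wire          o_b_stall, o_b_ack,
	output wire [DW-1:0] o_b_rdata,
	// Downstream (combined) bus
	output wire          o_wb_cyc, o_wb_stb, o_wb_we,
	output wire [AW-1:0] o_wb_addr,
	output wire [DW-1:0] o_wb_data,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [DW-1:0] i_wb_rdata
);
	reg a_owner;

	// The grant never moves while the current owner holds CYC high
	initial a_owner = 1'b1;
	always @(posedge i_clk)
	if (i_reset)
		a_owner <= 1'b1;
	else if (!i_a_cyc && !i_b_cyc)
		a_owner <= 1'b1;                    // both idle: default to A
	else if (a_owner && !i_a_cyc)
		a_owner <= 1'b0;                    // A idle, B waiting: grant B
	else if (!a_owner && !i_b_cyc)
		a_owner <= 1'b1;                    // B released: grant A

	// Forward the owner's request downstream
	assign o_wb_cyc  = a_owner ? i_a_cyc  : i_b_cyc;
	assign o_wb_stb  = a_owner ? i_a_stb  : i_b_stb;
	assign o_wb_we   = a_owner ? i_a_we   : i_b_we;
	assign o_wb_addr = a_owner ? i_a_addr : i_b_addr;
	assign o_wb_data = a_owner ? i_a_data : i_b_data;

	// The non-owner simply sees a stalled bus; ACKs route only to the owner
	assign o_a_stall =  a_owner ? i_wb_stall : 1'b1;
	assign o_b_stall = !a_owner ? i_wb_stall : 1'b1;
	assign o_a_ack   =  a_owner && i_wb_ack;
	assign o_b_ack   = !a_owner && i_wb_ack;

	// Read data may fan out to both; the qualified ACK says whose it is
	assign o_a_rdata = i_wb_rdata;
	assign o_b_rdata = i_wb_rdata;
endmodule
```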
ZipCPU | I'd like to, but can't (yet). Even better, I'd love to be able to edit that dot graph to create the desired bus topology. | 13:52 |
ZipCPU | I'm just not there yet. | 13:52 |
ZipCPU | I need to step away for lunch. | 13:52 |
ZipCPU | I'll be back later | 13:52 |
alown_cara | ok. Thanks. I will see you tomorrow then. | 13:54 |
-!- Netsplit *.net <-> *.split quits: shorne, flyback, alown_cara, M6HZ | 14:49 | |
-!- Netsplit over, joins: M6HZ | 14:50 | |
--- Log closed Thu Oct 25 00:00:28 2018 |