--- Log opened Wed Oct 24 00:00:26 2018 | ||
-!- flyback is now known as scarface- | 01:00 | |
-!- scarface- is now known as flyback | 01:00 | |
alown_cara | ZipCPU: Thanks for following that up with olofk. Unfortunately the wb_intercon appears to lack full support for tags. (Use of LGPL3 also makes things more complex than (say) MIT/BSD/etc.) | 06:51 |
alown_cara | ZipCPU: Looks like I will get to make my own (buggy) crossbar generation script. | 06:52 |
ZipCPU | alown_cara: What are you trying to do? | 06:52 |
ZipCPU | I have my own cross bar generation program. Might it help you at all? | 06:52 |
alown_cara | Plausibly, though the thing you can probably have guessed from the license questions is that I am currently wearing a "company" hat. | 06:53 |
ZipCPU | Sure | 06:53 |
ZipCPU | I have a program I use that I call AutoFPGA | 06:53 |
* alown_cara looks on github | 06:54 | |
ZipCPU | Currently, it builds Wishbone B4/pipeline crossbars | 06:54 |
ZipCPU | https://github.com/ZipCPU/autofpga | 06:54 |
ZipCPU | The code it produces may be licensed as you wish. AutoFPGA asserts no copyright restrictions on that code, similar to GCC | 06:54 |
alown_cara | Interesting. | 06:55 |
ZipCPU | I've told myself often that it could be used for non WB/B4/pipeline crossbars, but I have yet to implement that capability into it | 06:55 |
alown_cara | Can I ask which toolchains you have run this though so far? | 06:55 |
ZipCPU | I've used the AutoFPGA output on Vivado, Quartus, Yosys, and Verilator toolchains | 06:56 |
alown_cara | Hmm. For Quartus I would struggle to shift the consensus from Qsys... But, I am currently looking for a good interconnect for a Lattice based design. | 06:57 |
ZipCPU | I have used this with an iCE40 design, https://github.com/ZipCPU/icozip | 06:58 |
ZipCPU | I've also used it with https://github.com/ZipCPU/tinyzip, just ... the pre-production hardware I have doesn't seem to be supported anymore, so I've never loaded this design and proven that it works | 06:59 |
ZipCPU | I have used this approach with Qsys as well. | 06:59 |
alown_cara | My personal hat is very interested, I need to read a bit more and check it suits the company requirements though. | 07:00 |
ZipCPU | The biggest problem you might expect would be WB/B3(classic) vs WB/B4/pipeline | 07:00 |
ZipCPU | I should also ask ... what "tags" are you hoping to support? | 07:01 |
alown_cara | The underlying device hardware somewhat dictates that at least parts must be built to interface with WB classic. | 07:01 |
ZipCPU | What class of underlying hardware are you using? SDRAM? | 07:02 |
alown_cara | Mach XO3L(F) EFB | 07:02 |
ZipCPU | Not familiar with that one | 07:03 |
* ZipCPU googles | 07:03 | |
alown_cara | A block of silicon embedded in the device for doing SPI/I2C/internal-NVRAM/timer/etc. with an internal wishbone classic slave interface | 07:04 |
ZipCPU | Looking at it now. Looks like AutoFPGA would do nicely with it | 07:05 |
alown_cara | Tag-wise, I was contemplating whether a new type of addressing tag would be the easiest way to adapt existing IP, built with burst sizes declared during the burst setup phase (Avalon-MM), to WB. | 07:06 |
ZipCPU | Ahh ... do you have an Avalon-MM capability with burst mode support? | 07:07 |
ZipCPU | I built an Avalon->WB bridge, just didn't add burst mode to it | 07:07 |
ZipCPU | Wasn't that hard to do. Burst mode might be more difficult though. | 07:07 |
alown_cara | I have a lot of IP that uses bursts (often a different max burst size per component) | 07:07 |
alown_cara | WB(4) bursts appear to have quite a different nature. | 07:08 |
ZipCPU | I don't use the burst mode in WB(4), and I don't think my code is any less effective as a result. The same would not be true of WB3 | 07:08 |
alown_cara | True. | 07:08 |
ZipCPU | Using WB4/p, you can just issue one bus transaction after another to issue the whole burst, without actually issuing a burst | 07:09 |
ZipCPU | You get all the performance, but with none of the complexity | 07:09 |
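For illustration, a minimal sketch of what "issuing the whole burst without a burst" looks like from the master side of a WB B4/pipelined bus: STB stays high and the address advances on every un-stalled clock, while acknowledgements are counted separately. This is an editor's sketch, not code from either participant's repositories; the module name, the LGLEN parameter, and the start trigger are assumptions following common Wishbone conventions.

```verilog
// Hypothetical WB B4/pipelined "burst" built from back-to-back single
// requests.  Read-only for brevity; LGLEN and i_start are illustrative.
module wbp_burst_read #(
	parameter AW = 16,
	parameter LGLEN = 4                     // issue 2^LGLEN reads per run
) (
	input  wire          i_clk, i_reset,
	input  wire          i_start,           // kick off a run of reads
	input  wire [AW-1:0] i_base,            // starting (word) address
	// Wishbone B4/pipelined master port
	output reg           o_wb_cyc, o_wb_stb,
	output wire          o_wb_we,
	output reg  [AW-1:0] o_wb_addr,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [31:0]   i_wb_data
);
	reg [LGLEN:0] reqs_left, acks_left;

	assign o_wb_we = 1'b0;                  // reads only in this sketch

	initial { o_wb_cyc, o_wb_stb } = 2'b00;
	always @(posedge i_clk)
	if (i_reset)
		{ o_wb_cyc, o_wb_stb } <= 2'b00;
	else if (!o_wb_cyc) begin
		if (i_start) begin
			{ o_wb_cyc, o_wb_stb } <= 2'b11;
			o_wb_addr <= i_base;
			reqs_left <= (1 << LGLEN);
			acks_left <= (1 << LGLEN);
		end
	end else begin
		if (o_wb_stb && !i_wb_stall) begin
			// Request accepted: present the next one immediately,
			// without waiting for this one's ACK to return
			o_wb_addr <= o_wb_addr + 1;
			reqs_left <= reqs_left - 1;
			if (reqs_left == 1)
				o_wb_stb <= 1'b0;           // that was the last request
		end

		if (i_wb_ack) begin
			acks_left <= acks_left - 1;
			// Returned i_wb_data would be consumed here
			if (acks_left == 1)
				o_wb_cyc <= 1'b0;           // last response: release the bus
		end
	end
endmodule
```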
alown_cara | (Also, the autofpga readme appears to note that "no support is provided for WB B3 [..] yet") | 07:09 |
ZipCPU | This is true | 07:09 |
ZipCPU | The Mach component appears to be B4 though, doesn't it? | 07:09 |
* ZipCPU checks again | 07:09 | |
alown_cara | However, lots of underlying things need some set-up time before they are able to service (say) a read burst; providing the burst size info in the first cycle helps a lot with this. | 07:10 |
ZipCPU | So, here's how I've gotten around that: | 07:10 |
ZipCPU | 1) I assume any transaction may be either singular, or linear in addressing | 07:10 |
ZipCPU | 2) Any transaction may begin a burst, of an unknown length | 07:11 |
alown_cara | The Mach reference guide refers to its WB implementation as "classic". | 07:11 |
ZipCPU | 3) Within a burst, addresses are constant or incrementing | 07:11 |
ZipCPU | alown_cara: The WB/B4 spec discusses how to bridge from classic to pipeline and back again | 07:11 |
alown_cara | Also true. | 07:11 |
alown_cara | Maybe I am missing something, but I don't think (1)-(3) help when there is extensive latency to performing these operations, but high bandwidth. | 07:12 |
ZipCPU | I think they do, but let me try to explain | 07:12 |
ZipCPU | Let me ask one question first: none of these peripherals appear to be high bandwidth peripherals: I2C, SPI, counter, etc | 07:13 |
ZipCPU | Why are you interested in high performance bursting? That doesn't seem to make any sense. | 07:13 |
ZipCPU | On top of that, the EFB I/O doesn't support bursting either | 07:15 |
alown_cara | It does, but I haven't provided enough context on the constraints, and the supporting systems to explain why. | 07:15 |
ZipCPU | Well, okay, let me return to (1)-(3) | 07:15 |
alown_cara | Sure. | 07:15 |
ZipCPU | If there is an extensive latency to perform the operations, then the first operation sets up the transfer, and (at least with wb/p) the second one waits at the peripheral (not the master) | 07:16 |
ZipCPU | As a result, only the first item suffers from any latency, the rest go immediately to the peripheral when it is ready | 07:16 |
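A minimal sketch of the slave-side behaviour being described, assuming a never-stalling pipelined slave with a fixed internal latency: requests enter a short response pipeline, so only the first access pays the latency and the rest complete one per clock. The LATENCY parameter, the internal memory, and all names are illustrative assumptions, not taken from any of the repositories mentioned.

```verilog
// Fixed-latency, never-stalling WB B4/pipelined slave sketch.  A production
// slave would also flush the response pipe if CYC were dropped mid-run.
module wbp_latent_slave #(
	parameter AW = 10,
	parameter LATENCY = 4
) (
	input  wire          i_clk, i_reset,
	input  wire          i_wb_cyc, i_wb_stb, i_wb_we,
	input  wire [AW-1:0] i_wb_addr,
	input  wire [31:0]   i_wb_data,
	output wire          o_wb_stall,
	output wire          o_wb_ack,
	output wire [31:0]   o_wb_data
);
	reg [31:0] mem [0:(1<<AW)-1];
	reg [LATENCY-1:0] ack_pipe;
	reg [31:0] data_pipe [0:LATENCY-1];
	integer k;

	assign o_wb_stall = 1'b0;               // always ready for a new request

	always @(posedge i_clk) begin
		// Stage 0: accept a request every single clock
		ack_pipe[0]  <= (!i_reset) && i_wb_cyc && i_wb_stb;
		data_pipe[0] <= mem[i_wb_addr];
		if (i_wb_cyc && i_wb_stb && i_wb_we)
			mem[i_wb_addr] <= i_wb_data;

		// Remaining stages just delay the response
		for (k = 1; k < LATENCY; k = k + 1) begin
			ack_pipe[k]  <= (!i_reset) && ack_pipe[k-1];
			data_pipe[k] <= data_pipe[k-1];
		end
	end

	// Only the first access waits LATENCY clocks; the rest stream out
	assign o_wb_ack  = ack_pipe[LATENCY-1];
	assign o_wb_data = data_pipe[LATENCY-1];
endmodule
```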
ZipCPU | I think I've written about this extensively on zipcpu.com | 07:17 |
ZipCPU | Ahh, I have a good slide for you. Interested in comparing two bus interaction charts? | 07:17 |
alown_cara | I was just pulling up the spec for the timing diagrams again. | 07:17 |
ZipCPU | Check out slide 26 (internal marking) of https://github.com/ZipCPU/zipcpu/blob/master/doc/orconf.pdf | 07:19 |
ZipCPU | Now compare that with slide 27 (the next one, also based upon internal marking) | 07:19 |
ZipCPU | That should show you the performance you can expect when using the pipelined mode | 07:20 |
ZipCPU | The problem with WB classic is that the bus master has to wait for the slave to respond before issuing a second request | 07:20 |
ZipCPU | WB pipeline changes this so that the master only has to wait until the interconnect accepts the request before sending an additional one | 07:21 |
alown_cara | This would help in theory but only if wb/p allows this overlap to be extended to N outstanding requests. | 07:21 |
ZipCPU | It may be extended arbitrarily | 07:21 |
ZipCPU | I personally limit the extensions within the code I formally verify, to help the formal verification, but the spec creates no limit on the length of the transaction when done in this fashion | 07:22 |
ZipCPU | The longest burst I've done (so far) has been 1024 transactions using this approach. (That was my DMA engine that I used for that purpose) | 07:23 |
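A sketch of the sort of outstanding-request bookkeeping being described, written with immediate assertions in the style usable with yosys-smtbmc. This is illustrative only, not the actual property files; it assumes registered (not same-cycle) acknowledgements, and the LGDEPTH bound is exactly the kind of artificial limit added to keep a proof tractable, since the spec itself imposes none.

```verilog
// Count requests that have been accepted but not yet acknowledged on a
// WB B4/pipelined interface, and bound that count for formal purposes.
module wbp_outstanding #(
	parameter LGDEPTH = 4
) (
	input wire i_clk, i_reset,
	input wire i_wb_cyc, i_wb_stb, i_wb_stall, i_wb_ack
);
	reg [LGDEPTH:0] outstanding;

	initial outstanding = 0;
	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc)
		outstanding <= 0;
	else case ({ i_wb_stb && !i_wb_stall, i_wb_ack })
	2'b10:   outstanding <= outstanding + 1;  // request accepted, no ack
	2'b01:   outstanding <= outstanding - 1;  // ack, no new request
	default: outstanding <= outstanding;      // both at once, or neither
	endcase

`ifdef	FORMAL
	always @(*) begin
		// With registered acks, an ACK with nothing outstanding would be
		// a protocol violation
		if (outstanding == 0)
			assert(!i_wb_ack);
		// Artificial bound, added only to keep induction tractable
		assume(outstanding < (1 << LGDEPTH));
	end
`endif
endmodule
```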
ZipCPU | Can I interest you in an article setting up formal wishbone properties, and comparing WB to AXI and Avalon? http://zipcpu.com/zipcpu/2017/11/07/wb-formal.html | 07:25 |
* alown_cara is surprised he doesn't recall this article, as he definitely recalls reading others there linked from HN | 07:26 | |
ZipCPU | Not all of my articles have been cross posted to HN | 07:27 |
ZipCPU | Indeed, I think only about 10 or so have | 07:27 |
* ZipCPU goes to count | 07:27 | |
ZipCPU | Ok, only 14 have been cross posted to HN | 07:28 |
alown_cara | Some more of the context: WB bus interaction that occurs over a high latency link (which is also relatively bandwidth starved) operates in a packetized manner. | 07:28 |
ZipCPU | Go on | 07:29 |
alown_cara | So, this extended approach would require the requesting packetizer to accept all the addressing cycles, to count how many there are, before it could issue the packet to the other side. | 07:29 |
alown_cara | (Which would need to do the same thing to the responses) | 07:29 |
ZipCPU | Not sure I followed. Can you explain? | 07:29 |
ZipCPU | Ahh, nvm | 07:30 |
ZipCPU | Got it | 07:30 |
ZipCPU | Go on | 07:30 |
alown_cara | As such, a tag present during the first cycle, indicating the number of cycles in the burst, would allow this packet to be issued on the first cycle | 07:30 |
alown_cara | allowing a certain amount of overlap | 07:30 |
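To make the idea concrete, a hypothetical sketch of such a tag: a TGA-style address tag carrying the burst length on the first strobe, which a packetizing bridge could use to emit its request header immediately instead of counting address cycles. Every signal name and the packet format here are invented for illustration; the data/response path of the packetizer is omitted.

```verilog
// Hypothetical burst-length address tag (i_wb_tga_len) feeding a packetizer.
module wbp_burst_tag_packetizer #(
	parameter AW = 24
) (
	input  wire          i_clk, i_reset,
	// Wishbone slave port, plus a burst-length tag valid on the first STB
	input  wire          i_wb_cyc, i_wb_stb, i_wb_we,
	input  wire [AW-1:0] i_wb_addr,
	input  wire [7:0]    i_wb_tga_len,      // number of beats in this burst
	output wire          o_wb_stall,
	// Request header toward the (assumed) packet link
	output reg           o_hdr_valid,
	output reg  [AW+8:0] o_hdr,
	input  wire          i_hdr_ready
);
	reg started;

	// Hold off further beats only while a header is waiting to leave
	assign o_wb_stall = o_hdr_valid && !i_hdr_ready;

	always @(posedge i_clk)
	if (i_reset || !i_wb_cyc) begin
		started     <= 1'b0;
		o_hdr_valid <= 1'b0;
	end else begin
		if (o_hdr_valid && i_hdr_ready)
			o_hdr_valid <= 1'b0;
		if (i_wb_stb && !o_wb_stall && !started) begin
			// First beat: the tag already gives the total length, so the
			// request packet can be issued immediately
			started     <= 1'b1;
			o_hdr_valid <= 1'b1;
			o_hdr       <= { i_wb_we, i_wb_addr, i_wb_tga_len };
		end
	end
endmodule
```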
ZipCPU | If the link is bandwidth starved, then the only benefit you would get would be from reading, right? | 07:30 |
alown_cara | yep. | 07:30 |
ZipCPU | How bandwidth starved? Are you coming from a serial port perhaps? | 07:31 |
alown_cara | No, it is a popular packet-based protocol, just that most of the bandwidth is reserved for other purposes. | 07:31 |
ZipCPU | Network packet? | 07:31 |
ZipCPU | It sounds like what you need/want is just an Avalon-MM -> WB/classic bridge, right? Do you have other peripherals you need to access as well while you are at it? | 07:32 |
alown_cara | Of course. | 07:33 |
ZipCPU | Heheh | 07:33 |
alown_cara | (to the latter part) | 07:33 |
ZipCPU | Do you have a strong need to reconfigure often? In other words, would it make more sense to build the interconnect by hand? | 07:33 |
alown_cara | AVMM is a huge spec though, so an actually fully compliant bridge would be quite a project in itself. | 07:33 |
ZipCPU | I have an AVMM->WB(B4/p) bridge, but I understand what you mean--it's not "fully compliant". However, it has been good enough for me. | 07:34 |
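For reference, a minimal non-burst Avalon-MM to WB B4/pipelined bridge might look roughly like the following. This is not ZipCPU's actual bridge: it handles a single outstanding transaction at a time, omits byte enables and SEL, and the signal names simply follow the usual Avalon-MM and Wishbone conventions.

```verilog
// Minimal Avalon-MM (slave port) to Wishbone B4/pipelined (master port)
// bridge sketch, one transaction in flight at a time.
module avmm2wbp #(
	parameter AW = 16, DW = 32
) (
	input  wire          i_clk, i_reset,
	// Avalon-MM slave port
	input  wire          s_av_read, s_av_write,
	input  wire [AW-1:0] s_av_address,
	input  wire [DW-1:0] s_av_writedata,
	output wire          s_av_waitrequest,
	output reg           s_av_readdatavalid,
	output reg  [DW-1:0] s_av_readdata,
	// Wishbone B4/pipelined master port
	output reg           o_wb_cyc, o_wb_stb, o_wb_we,
	output reg  [AW-1:0] o_wb_addr,
	output reg  [DW-1:0] o_wb_data,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [DW-1:0] i_wb_data
);
	// Hold off new Avalon requests until the current WB transaction is done
	assign s_av_waitrequest = o_wb_cyc;

	always @(posedge i_clk)
	if (i_reset) begin
		o_wb_cyc <= 1'b0;
		o_wb_stb <= 1'b0;
		s_av_readdatavalid <= 1'b0;
	end else begin
		s_av_readdatavalid <= 1'b0;

		if (!o_wb_cyc && (s_av_read || s_av_write)) begin
			// Launch one WB transaction per accepted Avalon request
			o_wb_cyc  <= 1'b1;
			o_wb_stb  <= 1'b1;
			o_wb_we   <= s_av_write;
			o_wb_addr <= s_av_address;
			o_wb_data <= s_av_writedata;
		end else if (o_wb_cyc) begin
			if (!i_wb_stall)
				o_wb_stb <= 1'b0;            // request accepted downstream
			if (i_wb_ack) begin
				o_wb_cyc <= 1'b0;            // transaction complete
				s_av_readdata      <= i_wb_data;
				s_av_readdatavalid <= !o_wb_we;  // only reads return data
			end
		end
	end
endmodule
```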
alown_cara | I was hoping to avoid needing to do the hand-crafted bit during the initial R&D, but given the device is also rapidly running out of available resources due to being a bit small... | 07:34 |
ZipCPU | Yes, there is that. iCE40 hx8k? | 07:34 |
alown_cara | (The big two vendors can't even implement AVMM<->AXI3/4 fully, so...) | 07:34 |
ZipCPU | wb_intercon isn't known for a low-resource interconnect IIRC | 07:35 |
alown_cara | I'm not using wb_intercon. | 07:35 |
ZipCPU | AutoFPGA will do low resource decoding, but it has some other difficulties you've just mentioned | 07:35 |
ZipCPU | (I know, but you were considering it) | 07:35 |
alown_cara | True. | 07:35 |
ZipCPU | That suggests something handcrafted might be ideal | 07:36 |
ZipCPU | I do have a blog article discussing a hand crafted interconnect | 07:36 |
* alown_cara is wondering if he could get approval to extend autofpga as necessary and upstream generic stuff. | 07:36 | |
ZipCPU | It's not really that hard, but it does get *REALLY* annoying when you start to need to make changes | 07:36 |
alown_cara | Given how "static" the requirements on this project are, I would be encountering that a lot. | 07:36 |
ZipCPU | I do have some commercial work I'll be needing to do soon as well. My goal with that work would be to support a full AXI4->WBp bridge, including all of AXI4's burst modes as well. | 07:38 |
alown_cara | The optimal result, resource-wise, would probably be some weird mixture of shared-bus and crossbar for different master<->slave combinations. | 07:38 |
ZipCPU | Sure | 07:39 |
alown_cara | Last task I did, I started with AVMM, migrated bits to AXI4, then re-migrated bits to AXI3 for transaction locking support. | 07:39 |
ZipCPU | Do you use AV a lot? | 07:39 |
alown_cara | As a (mostly) Intel/Altera shop, the answer would be yes (for better and worse). | 07:40 |
* ZipCPU just might have a set of formal properties for AVMM -- they just don't support burst mode (yet) | 07:40 | |
* alown_cara simply has a great time writing simulation code to test the obvious bits, then makes it software's problem to find the bugs :p | 07:41 | |
ZipCPU | My problem is that one mistake can lock up the hardware hard. You can read about my "one mistake" here if you are interested. http://zipcpu.com/blog/2018/02/09/first-cyclonev.html | 07:42 |
alown_cara | Heheh, when you say the ARM was issuing these out-of-order, I presume you mean that whatever system was in charge of maintaining coherency when loading into cache was unaware that your target locations had strict ordering requirements? | 07:48 |
* alown_cara is intrigued by the comments on the use of formal verification, as he has been leaning strongly into SVA testing approaches at the moment | 07:48 | |
ZipCPU | Pretty much | 07:48 |
ZipCPU | The FIFO required items to be read in order, and it ignored the address | 07:49 |
ZipCPU | The ARM tried to load addresses starting on 8-word boundaries, then came back and filled in the gaps | 07:50 |
alown_cara | My ARM is a little rusty, but I thought most arm-based SoCs had to expose special ports if they are meant to be coherent, as the exact details of the various levels of cache (if they exist) are left to the SoC implementor. | 07:51 |
alown_cara | (thinking about what happens in Zynq and on Tegra chips) | 07:51 |
ZipCPU | My knowledge of internal ARM details is essentially non-existent---other than the scars from the fails I've suffered through. :D | 07:52 |
alown_cara | Anyway, I should go and do some other bugfixing for now, and will have a play with autofpga later today. | 07:55 |
alown_cara | Thanks for all the help. | 07:56 |
ZipCPU | Feel free to write as you have the need | 07:56 |
ZipCPU | My pleasure! | 07:56 |
alown_cara | Sure. I will idle around here for a bit then. (I might make my personal hat join in too). | 07:57 |
alown_cara | ZipCPU: I have had a bit of time to look over autofpga, and whilst I think it would definitely work, it doesn't quite seem right (it solves a slightly different problem). | 13:03 |
alown_cara | ZipCPU: All I was really looking for (to see if it already existed) was a tool that takes a description of master/slave ports and builds a piece of WB interconnect to join them together; autofpga focuses on the higher-level problem of building and maintaining the whole system. | 13:04 |
ZipCPU | I'm not sure I'd draw the same conclusion | 13:05 |
ZipCPU | While AutoFPGA has the capability to build much more than just the interconnect, it's primarily a copy/paste program. If you don't give it more information, it won't build the other parts for you. | 13:05 |
ZipCPU | Hence, you get what you put into it. | 13:06 |
alown_cara | Fair enough. I haven't particularly tried to use it to achieve anything yet, just going by what it seemed to be. | 13:06 |
ZipCPU | If you just want an interconnect, just grab the main.v output and you will be there. | 13:06 |
ZipCPU | That's what I essentially did when working with Qsys | 13:06 |
alown_cara | Hmm. Would I be correct to say that, of the sample component files, the rtcdate.txt is the simplest wishbone slave component that pulls in a module? (rather than providing data implicitly as the pwrcount.txt seems to?) | 13:12 |
ZipCPU | RTCDATE is pretty simple, yes | 13:15 |
ZipCPU | There are actually four different types of module incorporation: SINGLE (where the result of any read is already known on the clock of the read itself), DOUBLE (where it takes a clock to get to the result), | 13:16 |
alown_cara | Yeah, I was about to follow up with a question about these distinctions, having seen icd.txt | 13:16 |
ZipCPU | OTHER (where the read/write may take some multiple number of clocks to complete), and MEMORY (similar to other, but impacts the linker script) | 13:16 |
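Two toy peripherals sketching what the SINGLE and DOUBLE contracts imply at the RTL level (editor's illustration, not the actual AutoFPGA demo components such as pwrcount or rtcdate). Neither ever stalls, and neither generates its own ACK; the interconnect can supply that on a fixed schedule, as discussed below.

```verilog
// A SINGLE: never stalls, and the result is already valid on the clock of
// the read itself (here, a free-running power-up counter; the bus strobe
// isn't even needed to form the answer).
module single_pwrcount (
	input  wire        i_clk,
	output wire [31:0] o_wb_data
);
	reg [31:0] counter = 0;
	always @(posedge i_clk)
		counter <= counter + 1;
	assign o_wb_data = counter;             // valid on the read strobe itself
endmodule

// A DOUBLE: still never stalls, but the answer appears exactly one clock
// later -- essentially a SINGLE with one register stage added for timing.
module double_readreg (
	input  wire        i_clk,
	input  wire        i_wb_stb, i_wb_we,
	input  wire [3:0]  i_wb_addr,
	input  wire [31:0] i_wb_data,
	output reg  [31:0] o_wb_data
);
	reg [31:0] regs [0:15];
	always @(posedge i_clk) begin
		if (i_wb_stb && i_wb_we)
			regs[i_wb_addr] <= i_wb_data;
		o_wb_data <= regs[i_wb_addr];       // registered: fixed one-clock delay
	end
endmodule
```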
alown_cara | Does OTHER imply that autofpga will not attempt to do anything intelligent to it, and simply executes the @X.INSERTs? | 13:17 |
ZipCPU | In all cases, the X.INSERT's will be applied | 13:17 |
ZipCPU | The "intelligent" stuff has to do with how the wires are then created to the interconnect | 13:18 |
ZipCPU | s/created/created and connected/ | 13:18 |
alown_cara | ala businfo.cpp's create_sio/create_dio? | 13:19 |
ZipCPU | Those would be two of the pieces | 13:21 |
ZipCPU | create_sio creates the connections for the SINGLE's, and create_dio for the DOUBLE's | 13:21 |
ZipCPU | Check out the "writeout_bus_logic" function in businfo.cpp, if you want to look into where this connection takes place. | 13:22 |
alown_cara | That explains how that bit ties together. | 13:24 |
ZipCPU | My plan to support additional bus types was to create a new bus class for each type, and have that new class include the function necessary to the task--similar to writeout_bus_logic for WB/B4/p | 13:25 |
alown_cara | Sorry, got pulled in to another discussion. | 13:39 |
alown_cara | I am intrigued by what benefit a DOUBLE provides, given it can't stall? | 13:40 |
alown_cara | Is this just a timing-improved version of SINGLE (add an extra register stage)? | 13:41 |
ZipCPU | The DOUBLE and SINGLE peripherals allow me to simplify the result gathering process. | 13:43 |
ZipCPU | Not only can they not stall, but they also have very specific acknowledgement cycles. | 13:43 |
ZipCPU | This allows the return logic to be simplified--I no longer need to check for an acknowledgement for example, since I know exactly when I will see it. | 13:44 |
alown_cara | Ah, so that is the distinction with OTHER, which forces you to wait for acks as relevant? | 13:44 |
ZipCPU | Yes, exactly! | 13:45 |
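A sketch of the simplified return path this allows, loosely in the spirit of what AutoFPGA generates rather than copied from it: because these peripherals never stall and respond on a fixed schedule, the ACK is just a delayed copy of the strobe, and the data mux needs no acknowledgement tracking. Names and widths are assumptions.

```verilog
// Simplified return path for SINGLE-type slaves: one fixed clock of delay,
// no per-peripheral ACKs to wait for or arbitrate between.
module sio_return #(
	parameter NS = 4                        // number of SINGLE-type slaves
) (
	input  wire             i_clk, i_reset,
	input  wire             i_wb_cyc, i_wb_stb,
	input  wire [1:0]       i_sel,          // which SINGLE is addressed (NS=4)
	input  wire [NS*32-1:0] i_sio_data,     // all SINGLE results, concatenated
	output reg              o_wb_ack,
	output reg  [31:0]      o_wb_data
);
	always @(posedge i_clk) begin
		// The ACK is simply the strobe, delayed one clock
		o_wb_ack  <= (!i_reset) && i_wb_cyc && i_wb_stb;
		o_wb_data <= i_sio_data[i_sel*32 +: 32];
	end
endmodule
```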
alown_cara | Hmm. Maybe I should try and build the system I have in mind with this and see how far I can get... | 13:45 |
ZipCPU | I'd be glad to support you from here. | 13:46 |
alown_cara | Out of interest: how is the buserr.txt component being used? | 13:46 |
alown_cara | (It looks like AXI?) | 13:46 |
ZipCPU | It's just a peripheral used to return the address of the last bus error | 13:46 |
ZipCPU | It shouldn't look like AXI ... | 13:46 |
alown_cara | I read "AWID" and thought write id. | 13:46 |
ZipCPU | I use it within the ZipCPU so that I can tell, after a bus error, what the cause of the error was. | 13:47 |
ZipCPU | Ahhh ... I think that was short for "Address WIDth" | 13:47 |
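A rough sketch of such a bus-error address register (the real component is described by buserr.txt in the AutoFPGA repository; the port names here are invented, and AW stands for the "Address WIDth" that AWID abbreviates).

```verilog
// Latch the address of the last bus error, and expose it as a read-only,
// SINGLE-style Wishbone register.
module buserr #(
	parameter AW = 30
) (
	input  wire          i_clk,
	input  wire          i_err,             // strobed on any bus error
	input  wire [AW-1:0] i_err_addr,        // address that faulted
	input  wire          i_wb_stb,          // unused: value is always valid
	output wire [31:0]   o_wb_data
);
	reg [AW-1:0] last_err_addr = 0;

	always @(posedge i_clk)
	if (i_err)
		last_err_addr <= i_err_addr;        // remember the offending address

	assign o_wb_data = { {(32-AW){1'b0}}, last_err_addr };
endmodule
```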
alown_cara | Does the presence of biarbiter.txt mean that each bus is inherently single-master? | 13:49 |
alown_cara | (Or did you just want extra control over that particular bus<->bus transfer?) | 13:49 |
ZipCPU | Correct. Each bus has a single master, but the bi-arbiter can be used to create arbitrary interconnect topologies. | 13:50 |
ZipCPU | The biarbiter is a slave to two busses, and a master of another. | 13:50 |
alown_cara | (In this case "zip" and "wbu" -> "dwb") | 13:50 |
* alown_cara wonders if it could emit a dot graph of the resulting generated bus topology | 13:51 | |
ZipCPU | Yes, and then dwb goes through a delay to become wb | 13:51 |
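A rough sketch of a two-into-one arbiter of the kind described: a slave to two upstream buses (called A and B here, standing in for zip and wbu) and a master of one downstream bus, with the grant held for as long as the owner keeps CYC raised. This is an editor's illustration with simple fixed priority; the actual biarbiter lives in ZipCPU's repositories.

```verilog
// Two-upstream, one-downstream Wishbone B4/pipelined arbiter sketch.
module wbp_biarbiter #(
	parameter AW = 24, DW = 32
) (
	input  wire          i_clk, i_reset,
	// Upstream bus A (e.g. the CPU)
	input  wire          i_a_cyc, i_a_stb, i_a_we,
	input  wire [AW-1:0] i_a_addr,
	input  wire [DW-1:0] i_a_data,
	output wire          o_a_stall, o_a_ack,
	output wire [DW-1:0] o_a_rdata,
	// Upstream bus B (e.g. the debug/host bus)
	input  wire          i_b_cyc, i_b_stb, i_b_we,
	input  wire [AW-1:0] i_b_addr,
	input  wire [DW-1:0] i_b_data,
	output wire          o_b_stall, o_b_ack,
	output wire [DW-1:0] o_b_rdata,
	// Downstream (combined) bus
	output wire          o_wb_cyc, o_wb_stb, o_wb_we,
	output wire [AW-1:0] o_wb_addr,
	output wire [DW-1:0] o_wb_data,
	input  wire          i_wb_stall, i_wb_ack,
	input  wire [DW-1:0] i_wb_rdata
);
	reg a_owner;

	// The grant never moves while the current owner holds CYC high
	initial a_owner = 1'b1;
	always @(posedge i_clk)
	if (i_reset)
		a_owner <= 1'b1;
	else if (!i_a_cyc && !i_b_cyc)
		a_owner <= 1'b1;                    // both idle: default to A
	else if (a_owner && !i_a_cyc)
		a_owner <= 1'b0;                    // A idle, B waiting: grant B
	else if (!a_owner && !i_b_cyc)
		a_owner <= 1'b1;                    // B released: grant A

	// Forward the owner's request downstream
	assign o_wb_cyc  = a_owner ? i_a_cyc  : i_b_cyc;
	assign o_wb_stb  = a_owner ? i_a_stb  : i_b_stb;
	assign o_wb_we   = a_owner ? i_a_we   : i_b_we;
	assign o_wb_addr = a_owner ? i_a_addr : i_b_addr;
	assign o_wb_data = a_owner ? i_a_data : i_b_data;

	// The non-owner simply sees a stalled bus; ACKs route only to the owner
	assign o_a_stall =  a_owner ? i_wb_stall : 1'b1;
	assign o_b_stall = !a_owner ? i_wb_stall : 1'b1;
	assign o_a_ack   =  a_owner && i_wb_ack;
	assign o_b_ack   = !a_owner && i_wb_ack;

	// Read data may fan out to both; the qualified ACK says whose it is
	assign o_a_rdata = i_wb_rdata;
	assign o_b_rdata = i_wb_rdata;
endmodule
```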
ZipCPU | I'd like to, but can't (yet). Even better, I'd love to be able to edit that dot graph to create the desired bus topology. | 13:52 |
ZipCPU | I'm just not there yet. | 13:52 |
ZipCPU | I need to step away for lunch. | 13:52 |
ZipCPU | I'll be back later | 13:52 |
alown_cara | ok. Thanks. I will see you tomorrow then. | 13:54 |
-!- Netsplit *.net <-> *.split quits: shorne, flyback, alown_cara, M6HZ | 14:49 | |
-!- Netsplit over, joins: M6HZ | 14:50 | |
--- Log closed Thu Oct 25 00:00:28 2018 |