--- Log opened Thu Sep 22 00:00:22 2016 |
olofk | Hoolootwo: Yeah, I've been bitten by that a few times. It's really annoying. We should fix it | 02:10 |
Hoolootwo | at least for now, could something be thrown in the readme? | 02:11 |
olofk | Hoolootwo: Will do. I forgot about it as I generally don't use the xterm at all, but connect via telnet instead | 02:11 |
Hoolootwo | ah okay, I will probably end up doing that too eventually | 02:12 |
olofk | I found it works a bit better than xterm | 02:12 |
olofk | The issue there is that if you boot linux, it will just stop at the prompt without any indication that it's waiting for a telnet connection | 02:13 |
olofk | I guess that or1ksim in general could be a bit more descriptive :) | 02:13 |
olofk | We should also make 32MB RAM the default, so you don't strictly need to use a config file | 02:14 |
olofk | You know what, I'll file a few bugs so I don't forget | 02:14 |
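(For context, a hedged sketch of the kind of or1ksim config file being discussed, following the sim.cfg section syntax; the base addresses, IRQ, and TCP port here are only examples, not values taken from the conversation:)

```
/* RAM sized to 32MB so no custom size is needed */
section memory
  type = unknown
  name = "RAM"
  baseaddr = 0x00000000
  size = 0x02000000          /* 32MB */
end

/* UART exposed over TCP; connect with e.g. "telnet localhost 10084"
   instead of using the xterm channel */
section uart16550
  enabled = 1
  baseaddr = 0x90000000
  irq = 2
  channel = "tcp:10084"
end
```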
olofk | ZipCPU|Laptop: I think a potential improvement could be to store the provider info outside of the core file, as a separate file. This has crossed my mind a few times, but there are of course some drawbacks to this approach too | 02:24 |
olofk | So for now, the general workflow I use myself is to store a .core file in the repo without a provider section. This is always up-to-date with the head of the repo | 02:25 |
olofk | For orpsoc-cores I prefer to store only proper releases and point to a specific version, tag, or commit in the provider section | 02:26 |
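(As an aside, a minimal sketch of what such a .core file might look like in the CAPI=1 format used by orpsoc-cores at the time; the core, user, and repo names are placeholders, and the in-repo copy would simply leave out the [provider] section so it always tracks the head of the repo:)

```ini
CAPI=1
[main]
name = mycore-1.0

[verilog]
src_files = rtl/verilog/mycore.v

[provider]
name    = github
user    = someuser
repo    = mycore
version = v1.0
```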
olofk | And regarding MicroBlaze. I noticed the same thing a few years ago. It's an interesting architecture, and I'm not convinced it's all bad | 02:27 |
olofk | It allows you to have a high bandwidth connection to your RAM that doesn't have to wait for slow transfers on the peripheral bus | 02:28 |
Hoolootwo | from what I have seen, for applications where you really need I/O, you do DMA on the MicroBlaze | 02:28 |
olofk | There are however some complications. I worked with a dual-core setup that had one part of the RAM shared, which meant it couldn't be cached. That was a bit tricky to get right, since I had to feed a segment of the peripheral bus to the RAM | 02:29 |
olofk | Hoolootwo: Yes. If you're transferring large amounts of data | 02:30 |
olofk | But many of the I/O transfers are just small and slow accesses, like talking to an SPI controller or UART | 02:31 |
olofk | And having separate buses will avoid having the CPU wait for the bus to be free when it wants to talk to the RAM | 02:31 |
shorne | stekern: I am looking at jonas's change to sys_rt_sigreturn. I understand that he made the change to switch the return path from the normal syscall return path to the exception return path. But it's not that big of a difference, as the syscall return path checks for pending work and then jumps to the exception return path if there is any. | 03:43 |
shorne | So other than some restored registers it's not too much different. Do you have any idea what jonas meant when he said he reworked that patch? | 03:44 |
shorne | I didn't see any rework in my rebase. If you don't know I'll just send him a mail | 03:44 |
stekern | afair, that is the reworked patch, and he did get feedback from sebastian macke about it. But, I might be remembering wrong | 03:45 |
stekern | I'm pretty sure I would have picked up the latest version if there was a more recent one though | 03:46 |
shorne | yeah, I can't find anything in any history; also the comment says "comment from the original patch" | 03:46 |
shorne | Do you know if it was discussed on the linux kernel mailing list before, or just on the openrisc list? | 03:46 |
shorne | since the openrisc archive seems gone now | 03:47 |
stekern | just the openrisc list | 03:47 |
shorne | I see, ok, I might just have to shoot jonas a mail. I read through everything and it seems ok | 03:47 |
stekern | I can try to forward the messages to you, I have them in my own archive | 03:48 |
shorne | that would be great if you can | 03:48 |
stekern | done | 03:50 |
shorne | Hmm, so it seems the first patch still returned via the syscall path, and the second returned via the exception path | 03:58 |
shorne | but then Sebastian says strace is really broken, and jonas says he will look again | 03:58 |
shorne | so there might be a 3rd patch | 03:58 |
olofk | Jonas hasn't been active for about four years, so don't get your hopes up that he did another one | 04:01 |
shorne | yeah, I am kind of thinking that | 04:01 |
shorne | well, I guess I have to try this patch with strace and see if it breaks | 04:02 |
stekern | shorne: I strongly remember that there was no follow-up after that last mail, sebastian might remember if he did some more testing of it | 04:08 |
stekern | poke53281 <- sebastian | 04:08 |
shorne | stekern: thanks, it's good to know you remember no updates. Interestingly it seems that thread was just between you, Jonas, and Sebastian | 04:10 |
shorne | looks like it was off the mailing list (judging by the forwarded headers) | 04:10 |
stekern | that might very well be the case | 04:10 |
shorne | wayback machine only has lists.openrisc.net till 2012 | 04:11 |
shorne | it doesn't have the mails though, I'll test it | 04:14 |
shorne | https://www.mail-archive.com/linux@lists.openrisc.net/index.html#00432 | 04:17 |
shorne | I found this | 04:17 |
shorne | good | 04:17 |
shorne | those mails were definitely not on the list. anyway, I've got work to do | 04:24 |
shorne | (not those patches) | 04:24 |
ZipCPU|Laptop | olofk: I can understand Xilinx's purpose in having four busses. 1. Separate instruction and data busses avoid a bottleneck, 2. Caching a bus that can be cached is an advantage, 3. Having a wide bus speeds things up with memory--especially since the default DDR3 width is 128 bits. | 06:20 |
ZipCPU|Laptop | What gets me is that none of these busses are truly pipelined. | 06:21 |
ZipCPU|Laptop | They are heavy, feature-laden beasts that are (in terms of performance) inherently slow. | 06:21 |
wallento | about that core file provider, there is still the plan to bring up an api.librecores.org that provides all cores as .core files or others | 06:24 |
olofk | ZipCPU|Laptop: AXI4 is definitely pipelined | 06:26 |
ZipCPU|Laptop | Not the way Xilinx implemented it for their MicroBlaze. | 06:28 |
ZipCPU|Laptop | According to the docs, they only allow one request in flight at a time--even though the bus allows more. | 06:28 |
ZipCPU|Laptop | For the peripheral bus, that's one 32-bit word request. | 06:28 |
ZipCPU|Laptop | For the memory/cachable busses, that's one 128-bit request that may be pipelined if the bus is smaller. | 06:29 |
olofk | ah ok | 06:29 |
olofk | The problem with pipelined accesses is that you lose exact exceptions | 06:29 |
ZipCPU|Laptop | Not necessarily. I was reading through the LM32 wishbone spec yesterday, and they handled that this way: | 06:30 |
ZipCPU|Laptop | Every STB pulse gets either an ACK, ERR, or RTY signal in return. | 06:30 |
ZipCPU|Laptop | So, as long as the ERR signal doesn't come before your ACK signal (which could happen if you cross devices ...) your exceptions remain exact. | 06:31 |
olofk | But that is without pipelining | 06:32 |
olofk | You have to wait for the ack, err or rty to come back before sending another one | 06:33 |
olofk | This is basically what all non-pipelined wb masters do | 06:33 |
ZipCPU|Laptop | Why wait? The alternative is that you are prepared to roll back several operations. You need to maintain that information anyway, in order to know which register to place the result into for a read request. | 06:35 |
olofk | Yes. Rolling back is another option, but it's also more complex | 06:35 |
olofk | But I doubt that lm32 implements wb4 pipelined mode | 06:36 |
olofk | uBlaze sends a burst request to fill a cache line, when needed. It's the same thing we do with mor1kx | 06:39 |
olofk | I don't see how pipelining would help here | 06:40 |
olofk | (Can't believe I'm defending uBlaze after all the bad things I have said about it) :) | 06:41 |
ZipCPU|Laptop | So here's a question there: is the flash on the cache line, or just the DDR3 memory? | 07:02 |
ZipCPU|Laptop | QSPI flash can get a *big* benefit from pipelining. | 07:03 |
ZipCPU|Laptop | olofk: Regarding rollback ... for loads, I don't retire instructions until the memory operation is complete. There's nothing that needs to be rolled back as a result. Writes can be (but aren't yet) done the same way. | 07:04 |
ZipCPU|Laptop | Nothing truly then needs to be rolled back. | 07:05 |
olofk | Not sure how they do QSPI Flash. It's likely not pipelined, so either they read from the peripheral bus, or they DMA to the memory | 07:05 |
olofk | Do you have a cache? | 07:05 |
olofk | Without a cache, pipelined accesses are definitely a benefit | 07:08 |
shorne | olofk: any word from opencores.org? | 07:13 |
ZipCPU|Laptop | olofk: Currently, I have an instruction cache but no data cache. I also have no MMU, so concurrency is not a big issue for me (yet). I intend to fix/change both of these, but that hasn't happened yet. | 07:18 |
olofk | shorne: Not from what I have heard | 07:20 |
kc5tja | WithOUT a cache, pipelining is a benefit? Unless you're streaming data, that has been the case in my models. Software reads and writes to random locations in memory more frequently than not, so you have to incur the 70ns (for DRAM) penalty with sufficient frequency that SDRAM protocol overhead actually makes it a slower choice than asynchronous 70ns DRAM. | 10:47 |
kc5tja | s/has been the case/has not been the case/ | 10:47 |
kc5tja | QSPI is not pipelined; it is, however, a burst transfer device. | 10:48 |
kc5tja | Its protocol is a lot like SDRAM's protocol, only with more clock cycles. | 10:49 |
kc5tja | You send it the read command (which includes the address) in about 6 cycles or so, then you wait some more cycles (with no transfers) while the device accesses the flash contents, and then you start streaming data back. | 10:50 |
kc5tja | If a particular device does support some kind of pipelining, it's at most six cycles (or however many it takes to receive a command), which is likely to be but a tiny fraction of your burst length. | 10:51 |
Hoolootwo | I can't seem to get to opencores.org from any of my various locations, is it down for anyone else? | 14:44 |
Hoolootwo | dns seems to resolve, but no http gets through | 14:44 |
wallento | it's dead | 14:59 |
Hoolootwo | how dead? | 15:00 |
Hoolootwo | gone forever, or should it be back relatively soon? | 15:01 |
mafm | it's not dead, it's pining for the fjords | 15:01 |
* mafm wearing a Cleese t-shirt right now, conveniently | 15:02 | |
Hoolootwo | mafm++ | 15:03 |
ZipCPU|Laptop | kc5tja: I intend to discuss the benefit of pipelining without a cache at ORCONF. Indeed, part of my presentation will show Dhrystone measures with and without pipelining. | 15:30 |
ZipCPU|Laptop | As for QSPI, if every access requires the six clocks for address plus two dummy clocks before data shows up, then you've just made my point. | 15:31 |
ZipCPU|Laptop | A first access in a group requires 8 clocks to start, then can produce one 32-bit data value every 8 clocks. | 15:31 |
ZipCPU|Laptop | If you string your operations together, sequentially, then you can read from the flash in 8+8N clocks. | 15:32 |
ZipCPU|Laptop | This is one form of "pipelining". | 15:32 |
ZipCPU|Laptop | As for SDRAM, the DDR3 SDRAM I'm working with will have an access time of (roughly) 9+N clocks. | 15:34 |
ZipCPU|Laptop | Pipelining lets you exploit the N instead of requiring that N be 1 every time. | 15:34 |
ZipCPU|Laptop | BTW: N is the number of 128-bit words you wish to read (or write--you just can't switch mid transfer without stalling) | 15:35 |
ZipCPU|Laptop | One more comment on the Dhrystone measure: that is with and without pipelining on the *data* channel. The *instruction* channel is both pipelined and cached as soon as the cache in the CPU is enabled, and hence the CPU is pipelined. (The option connects the two within the ZipCPU.) | 15:37 |
ZipCPU|Laptop | Indeed, I get a rough 50% improvement in my Dhrystone score by implementing pipelining ... even without a data cache. | 15:37 |
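(Purely illustrative arithmetic using the 8+8N figure quoted above, comparing accesses strung together against issuing each 32-bit word as its own 8+8-clock transaction:)

```
separate transactions : N * (8 + 8) clocks    e.g. N = 64 -> 1024 clocks
strung together       : 8 + 8 * N   clocks    e.g. N = 64 ->  520 clocks (roughly 2x)
```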
-!- Dan__ is now known as Guest63860 | 15:42 | |
-!- Guest63860 is now known as ZipCPU | 15:42 | |
kc5tja | ZipCPU|Laptop: Nice. I wish I could attend. :( | 15:50 |
kc5tja | re: 8+8N clocks -- that's not pipelining. That's bursting. Pipelining is when you can *start* transaction N+1 *before* transaction N completes. | 15:51 |
kc5tja | Otherwise, RS-232 communications is highly pipelined transmission of data. ;-) | 15:52 |
ZipCPU | Perhaps I'm not defining pipelining the same way. Hmm ... here's my definition: the controller never relinquishes control of the bus and would, barring stalls from the WB slave, issue one request per clock and then wait for one ack per request. | 15:56 |
ZipCPU | With that definition, RS-232 is not "pipelined" unless the wishbone master holds onto the bus and the RS-232 device stalls everything while waiting for its next byte. | 15:56 |
ZipCPU | Similarly, a more traditional RS-232 device would maintain a FIFO buffer, which could easily be set up for M+N clocks, where M is any bus propagation time and N is the number of transfers necessary to either fill the buffer or finish the message. | 15:57 |
kc5tja | Clocking doesn't really enter into any definition I've seen (only used as examples). | 16:02 |
kc5tja | For example, a CPU with a pipeline can still take 5 cycles to execute an instruction, but if it has a 5-deep pipeline, it can "appear" to have an instruction latency of 1 cycle. | 16:02 |
kc5tja | That's because it's busy processing 5 instructions at any given time (bubbles notwithstanding). | 16:02 |
kc5tja | But, a pipeline doesn't always have to be clock synchronous. The 80386 actually had a limited pipeline which allowed up to three instructions to execute at once, but the minimum latency was 2 cycles. | 16:03 |
kc5tja | Another example more relevant to Flash SQPI devices is the RapidIO interconnect. | 16:05 |
kc5tja | In as little as four clocks (but can be more depending on the kind of packet and how wide your interconnect is), you can kick off any bus transaction you like. | 16:05 |
kc5tja | In a couple of other clocks, you'll get back an acknowledgement that the packet was received by the network. | 16:06 |
kc5tja | However, that doesn't mean your transaction has completed. It just means that the interconnect is now free for another if you support that. | 16:06 |
kc5tja | The receipt of the "ok I'm done" for your transaction might come hundreds of clocks later. In the meantime, you can "queue up" a number of other transactions, some of which might even complete out of order (!!). | 16:07 |
ZipCPU | I still think "pipeline" is appropriate. This for two reasons: 1) the WB spec calls this type of access "pipelined", and 2) the bus (not necessarily the peripheral) is acting in a pipelined fashion -- even by your definition. | 16:07 |
kc5tja | However, what you _cannot_ do is break up an individual transaction's burst of data. | 16:07 |
kc5tja | I cannot agree with that definition. | 16:08 |
ZipCPU | Consider a bus with multiple stages within it. If there's one request in each stage, you then have a pipeline. | 16:09 |
kc5tja | (1) WB treats bus transactions as single-beat things, even in a pipelined implementation. The "pipeline" depth in the controller _must_ match the interconnect's register depth (plus the pipeline depth in the peripheral), or it will fall out of synchronization. | 16:09 |
ZipCPU | If the peripheral at the far end only accepts one access every x clocks, that doesn't negate the fact that the bus itself was pipelined | 16:10 |
kc5tja | Yes, because the other x-1 clocks are impossible to use for a transaction. | 16:10 |
ZipCPU | Now wait a second here ... if a CPU has five pipeline stages, and whenever you perform a multiply the multiply stage takes 8 clocks, that doesn't mean the CPU isn't pipelined. | 16:12 |
kc5tja | That's not what I said. | 16:12 |
kc5tja | What makes it pipelined is the fact that you can have up to 5 instructions in flight at the same time. Key word: SAME time. | 16:13 |
ZipCPU | Then ... what have I missed? You offered a CPU as an example of what defined a pipeline, and I'm pointing out that any pipeline can stall. | 16:13 |
ZipCPU | Ok, but I can still have five bus transactions in flight at the SAME time, even if the peripheral at the end stalls the bus. | 16:13 |
kc5tja | Simply stuffing a stream of data down a pipe doesn't make it pipelined. What makes it pipelined is the _capability_ for that pipe to have _multiple_ and _independent_ transactions in flight at once. | 16:14 |
kc5tja | For example, if your CPU can request to read the next instruction from program space _before_ the currently executing instruction completes a data fetch from data space, then you have a pipelined bus. | 16:15 |
kc5tja | This is why I say flash QPI devices are bursted, not pipelined. You cannot start read #2 until read #1 has completely finished. | 16:16 |
ZipCPU | So ... if a CPU issues a write command to address 0, 1, 2, and 3, before being stalled by the peripheral that needs to wait 14 clocks before the first request completes, and 8 clocks for every request thereafter ... that's not a pipeline? | 16:16 |
ZipCPU | And then once that first request completes, the CPU issues a command to write address 4 -- even before 1, 2, and 3 have completed ... that's not a pipeline? | 16:17 |
kc5tja | Nope. That's just burst-mode with lots of wait-states. | 16:17 |
kc5tja | Wait, you just said that write to 3 stalls the CPU. | 16:17 |
ZipCPU | Yes. The 3rd write stalled the CPU, not the first two. | 16:18 |
kc5tja | I need to see a timing diagram, because this is too confusing to disentangle on IRC alone. | 16:18 |
ZipCPU | Do you have WB B4 spec available to you? | 16:19 |
kc5tja | Yes. It's on my desktop. | 16:19 |
ZipCPU | Okay, let's compare illustration 3-10 on page 49 with ... | 16:20 |
ZipCPU | 3-11 on page 51. | 16:20 |
ZipCPU | 3-11 is what I'm calling "pipelined" | 16:20 |
kc5tja | OK, that is pipelined by virtue of the fact that the address bus, WE, and other control signals change value every (non-stalled) cycle. That is to say, EVERY cycle is potentially a unique read or write transaction. | 16:23 |
ZipCPU | Yes! | 16:23 |
kc5tja | What raised my objection is (let me type it out) | 16:23 |
kc5tja | Flash SQPI devices cannot support that mode of operation. At all. | 16:24 |
kc5tja | What they DO have, is a set of clock cycles where you send an address and a WE bit, | 16:24 |
kc5tja | followed by some access time latency, | 16:24 |
kc5tja | followed by one or more cycles of contiguously addressed data. | 16:24 |
ZipCPU | (Let me know when you are done ...) | 16:25 |
kc5tja | (heh, sorry, had an interruption at the door) | 16:27 |
kc5tja | But, all the while, it's one transaction. | 16:27 |
kc5tja | These 23 clocks (or whatever) all correspond to a _single_ WB bus cycle. | 16:27 |
kc5tja | I guess the spec calls them block transactions instead of burst transactions. | 16:28 |
kc5tja | Still got Motorola terminology in my brain. | 16:28 |
kc5tja | Does that make sense? | 16:28 |
ZipCPU | I think so ... perhaps our confusion is in the difference between the device itself and the controller. | 16:29 |
kc5tja | Now, can you USE pipelining for this operation? Absolutely. And I honestly would probably prefer it over block transactions because it seems to give more control over timing. | 16:29 |
kc5tja | Might be. Different things have different pipelines, which is semantically equivalent to different clock domains. | 16:30 |
kc5tja | Either way, no matter which terminology you use, I only ask that you be consistent with it. :) | 16:30 |
ZipCPU | So, the multiple QSPI controllers I've written have all been both internally pipelined (especially this last one), and used the pipeline bus mode. | 16:31 |
ZipCPU | I can understand why you might say, though, that the interface itself is not pipelined. | 16:31 |
* kc5tja nods | 16:32 | |
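(To make the distinction concrete, here is a minimal, hypothetical sketch of a Wishbone B4 pipelined read master in the illustration 3-11 sense; it is not taken from the ZipCPU, mor1kx, or LM32. A new address is presented on every un-stalled cycle while a counter tracks requests still waiting for their ACK, so several transactions are in flight at once:)

```verilog
module wb_pipelined_reader #(parameter AW = 32, DW = 32) (
    input  wire           clk, rst,
    input  wire           start,          // begin a burst of reads
    input  wire [AW-1:0]  start_addr,
    input  wire [7:0]     count,          // number of reads to issue
    output reg            wb_cyc, wb_stb,
    output reg  [AW-1:0]  wb_adr,
    input  wire           wb_stall,       // slave back-pressure (B4 pipelined)
    input  wire           wb_ack,
    input  wire [DW-1:0]  wb_dat_i
);
    reg [7:0]    to_issue;     // requests not yet presented
    reg [8:0]    outstanding;  // presented but not yet ACKed
    reg [DW-1:0] rd_data;      // last word returned (would feed the CPU/cache)

    wire issue  = wb_cyc && wb_stb && !wb_stall;  // request accepted this cycle
    wire retire = wb_cyc && wb_ack;               // response returned this cycle

    always @(posedge clk) begin
        if (rst) begin
            wb_cyc <= 1'b0; wb_stb <= 1'b0; to_issue <= 0; outstanding <= 0;
        end else begin
            if (start && !wb_cyc) begin
                wb_cyc   <= 1'b1;
                wb_stb   <= (count != 0);
                wb_adr   <= start_addr;
                to_issue <= count;
            end
            if (issue) begin
                // present the next address without waiting for the previous ACK
                wb_adr   <= wb_adr + (DW/8);
                to_issue <= to_issue - 1'b1;
                if (to_issue == 1)
                    wb_stb <= 1'b0;               // last request just went out
            end
            if (retire)
                rd_data <= wb_dat_i;
            outstanding <= outstanding + issue - retire;
            // drop CYC once everything issued has been acknowledged
            if (wb_cyc && !wb_stb && (outstanding - retire) == 0)
                wb_cyc <= 1'b0;
        end
    end
endmodule
```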
shorne | wallento: FYI, I am building musl with a host gcc version of 6.11. It seems that gcc-6 cannot build gcc-5 due to this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69959 | 17:36 |
-!- Netsplit *.net <-> *.split quits: hammond, simoncoo1, andrzejr, eliask, ssvb, jeremybennett, wallento, Hoolootwo, SMDwrk, nurelin, (+7 more, use /NETSPLIT to show all of them) | 17:37 | |
shorne | wallento ... just got split | 17:37 |
shorne | great | 17:37 |
-!- Netsplit over, joins: eliask, simoncoo1, kc5tja, rokka | 17:38 | |
shorne | It seems we need to bump up to gcc 5.4.0 (which I did here) https://github.com/stffrdhrn/or1k-gcc/tree/musl-5.4.0 | 17:39 |
shorne | I just did a merge from the gcc 5.4.0 release into or1k to create or1k-5.4.0, and then rebased musl-5.3.0 onto or1k-5.4.0 to create musl-5.4.0 | 17:41 |
shorne | wallento: to repeat, I did the bump to gcc 5.4.0 here: https://github.com/stffrdhrn/or1k-gcc/tree/musl-5.4.0 | 17:41 |
shorne | it was very smooth | 17:41 |
shorne | no conflicts | 17:41 |
shorne | now my musl build is running again | 17:42 |
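(A rough sketch of that branch workflow; the branch names come from the repo linked above, but the upstream tag name and the exact git invocations are assumptions:)

```sh
# in a clone of the or1k-gcc repo
git checkout -b or1k-5.4.0 or1k
git merge gcc-5_4_0-release          # upstream GCC 5.4.0 release tag (name assumed)

git checkout -b musl-5.4.0 musl-5.3.0
git rebase or1k-5.4.0                # replay the musl patches onto the new or1k base
```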
shorne | Hoolootwo: what do you need from opencores? openrisc-related data is here: http://openrisc.io/, and the opencores repos are here: http://freecores.github.io/ | 18:46 |
Hoolootwo | shorne, I was just looking at the topic, and I keep finding broken opencores links on google about openrisc stuff | 23:09 |
--- Log closed Fri Sep 23 00:00:24 2016 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!