IRC logs for #openrisc Monday, 2016-12-05

--- Log opened Mon Dec 05 00:00:13 2016
bandvigshorne: I’m not sure what kind of tests you are talking about. Let me describe what I did.04:23
bandvigI implemented a combined 32- and 64-bit FPU in the MAROCCHINO pipe (https://github.com/openrisc/mor1kx/tree/marocchino_devel) and rebuilt the Atlys SoC with the core.04:23
bandvigI also rebuilt the NewLIB toolchain with modified binutils.04:23
bandvigWith this toolchain I rebuilt the SoftFloat test tool and verified 32- and 64-bit FPU instruction execution.04:23
bandvigThe computational results were right.04:24
bandvigThere are differences in flags generation, especially for NaNs. However, I suspect that the SoftFloat library doesn’t follow the IEEE standard accurately.04:24
bandvigSo, additional investigations are needed here.04:24
bandvigAt last I compiled Whetstone benchmarks in both 32- and 64-bit variants04:24
bandvig(in terms of computed digits, but of course with different MIPS, MOPS and so on) similar for both variants and to results I found on the Internet for other machines.04:25
bandvigShould I make some additional tests?04:25
bandvigCorrection: ... and 64-bit variants and got results (in terms ...04:27
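The log does not say how hardware results were compared against the SoftFloat reference. As a minimal sketch, assuming results are compared as raw binary32 bit patterns, one way to keep differing NaN encodings from showing up as mismatches is to treat any NaN as matching any other NaN. The helper names `f32_bits`, `is_nan_bits32` and `results_match32` are hypothetical and are not part of SoftFloat or the mor1kx test flow.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_nan_bits32(bits: int) -> bool:
    """True if the pattern encodes a NaN: exponent all ones, fraction non-zero."""
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def results_match32(hw_bits: int, ref_bits: int) -> bool:
    """Bit-exact comparison, except that any NaN is accepted for any NaN.

    Assumption: NaN sign and payload are not checked here; a stricter
    test bench may want to compare them as well.
    """
    if is_nan_bits32(hw_bits) and is_nan_bits32(ref_bits):
        return True
    return hw_bits == ref_bits
```

Exception flags would still need a separate comparison; this sketch only covers the result values.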
shornebandvig: no I think that is fine.  I think just having this text in the commit text would be good.04:45
shorneDo we need to specify any flags to get gcc to output the floating point instructions?  I am a bit worried if this is included by default code will break if fpu is not available04:46
bandvigshorne: Yes, we do. By default GCC doesn’t emit FP instructions; it uses software routines instead.05:35
bandvigTo force emitting single precision FP-instructions the option -mhard-float should be used.05:35
bandvigTo emit (single + double) we have to use two options simultaneously: -mhard-float -mdouble-float.05:35
bandvigThe single -mdouble-float is forbidden if I remember correctly.05:35
bandvigBy the way, to use single and double precision FP instructions for mathematical functions (i.e. sin, cos, log, etc)05:47
bandvigwe need to provide the -mhard-float -mdouble-float options to GCC at the “NewLIB” stage (http://openrisc.io/newlib/building.html) while building the NewLIB-based toolchain.05:48
bandvigOn my PC I follow olofk’s advice and edit NewLIB’s config.host file appropriately.05:48
bandvigAn alternative is a multilib configuration that extends the currently implemented delay-slot/no-delay-slot/compatible-delay-slot variants.05:55
bandvigHowever, I haven’t tried this because a multilib configuration takes much longer to compile.05:55
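The option rules described above can be sketched as a small helper that a build script might use to pick compiler flags. This is only an illustration: `fpu_cflags` is a hypothetical name, and the rule that -mdouble-float may not be used alone is taken from bandvig's recollection in this log rather than checked against the compiler.

```python
from typing import List


def fpu_cflags(single: bool, double: bool) -> List[str]:
    """Return the FPU-related GCC options for the requested precision support.

    Assumption (from the discussion, stated "if I remember correctly"):
    -mdouble-float is only valid together with -mhard-float.
    """
    if double and not single:
        raise ValueError("-mdouble-float requires -mhard-float")
    flags: List[str] = []
    if single:
        flags.append("-mhard-float")    # emit single precision FP instructions
    if double:
        flags.append("-mdouble-float")  # additionally emit double precision
    return flags
```

With both arguments false the list is empty, which corresponds to the default of software floating point routines.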
shornebandvig: understood, I think this is good to have, and thanks for the good work.  We should merge, but I would want to test first. If someone else wants to help I am fine with it.  But we should work on pushing these changes to FSF upstream as well.06:39
shorneafter gdb I guess06:39
shornepushed some more cleanups to https://github.com/stffrdhrn/binutils-gdb/tree/or1k-upstream09:42
shorneit's just brutal what we have to do to the doxygen comments09:42
-!- jonmasters_ is now known as jonmasters13:54
-!- Netsplit *.net <-> *.split quits: rokka14:28
-!- Netsplit over, joins: rokka14:30
kc5tjaBeen studying AXI(4) lately.  Came to the conclusion that it's pretty much what you'd arrive at if you split Wishbone transactions by read/write direction, and by address/data phase.16:18
kc5tjaBut then I realized, STALL_I is kind of like A(rw)READY, and ACK_I is kind of like (rw)VALID.16:18
kc5tjaThere's no equivalent to write response channel that I can see.16:19
kc5tjaSo, to get concurrent read/write throughput like you can on AXI, use two Wishbone "channels", one for input, one for output, and make both support B4 pipelining.  The interconnect can then properly mux the traffic on these channels as it sees fit.  And I never have to worry about a transaction ID in the process, since that's an implementation detail of the interconnect.16:22
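A rough cycle-level sketch of one pipelined Wishbone channel may help make the STALL_I/ACK_I comparison concrete. It assumes the master keeps its strobe asserted while it has requests left, a request is accepted on any cycle where STALL_I is low, and the slave answers with ACK_I a fixed number of cycles later, in order. The function name, the fixed latency and the stall list are all assumptions made for illustration, not part of the Wishbone specification.

```python
from collections import deque
from typing import List, Sequence, Tuple


def simulate_pipelined(num_requests: int,
                       stall_pattern: Sequence[bool],
                       latency: int = 2) -> List[Tuple[int, int, int]]:
    """Return (request index, accept cycle, ack cycle) for each request.

    stall_pattern[c] is the STALL_I value the master sees on cycle c;
    cycles past the end of the list are treated as not stalled.
    """
    if latency < 1:
        raise ValueError("latency must be at least 1 cycle")
    pending = deque()  # entries: (index, accept cycle, cycle the ACK is due)
    log: List[Tuple[int, int, int]] = []
    issued = 0
    cycle = 0
    while len(log) < num_requests:
        # Return channel: at most one ACK per cycle, always in request order.
        if pending and pending[0][2] == cycle:
            idx, accepted, _ = pending.popleft()
            log.append((idx, accepted, cycle))
        # Request channel: a request is accepted only when STALL_I is low.
        stalled = stall_pattern[cycle] if cycle < len(stall_pattern) else False
        if issued < num_requests and not stalled:
            pending.append((issued, cycle, cycle + latency))
            issued += 1
        cycle += 1
    return log
```

Two independent instances of such a channel, one for reads and one for writes, would model the concurrent read/write arrangement described above. Because ACKs come back in request order on each channel, the master needs no transaction ID.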
kc5tjaI'm pretty convinced Wishbone is many years ahead of its time, and that it still has surprises in store for the future.16:23
ZipCPU|Laptop^^ +116:38
ZipCPU|LaptopOne thing AXI has that wishbone doesn't is flow control on the return channel.16:38
ZipCPU|LaptopWell ... that and more configuration options than the Cadillac salesman will offer you.16:39
kc5tjaTrue; the lack of a LAST signal is annoying, but that can at least be included in a tag bus.16:48
kc5tjaOr are you thinking about something else?16:48
ZipCPU|LaptopNo, I was thinking more about the fact that even the return busses have VALID and READY signals.16:55
kc5tjaAhh, yeah, good point.16:57
kc5tjaWhile not _official_, it seems like an easy alteration/enhancement to the basic specification to add a STALL_O signal too.16:59
ZipCPU|LaptopSure, you could, but I really don't think it's necessary.16:59
ZipCPU|LaptopRemember: I'm of the simple is better mind.  Dump the tag lines.  ;)16:59
kc5tjaI don't either.  But in cases where it's required or helpful...17:00
kc5tjaThe only case where I see AXI being a fundamentally better bus, I think, is when you have, say, 5 independently running cores (say, a C-slowed/retimed design with 5 contexts), and a transfer on context 3 completes before a transfer from context 2.17:01
kc5tjaBut, then again, AXI4 disallows out of order transfer completion.17:01
kc5tjaI don't think the tag lines are required.17:02
ZipCPU|LaptopHmm ... if you want out of order completion, you will need to tag every bus request with an ID.17:19
ZipCPU|LaptopPersonally, I think the bus should be optimized for sequential access, rather than out of order access.  It's  just a whole lot easier to optimize hardware for that purpose.17:20
kc5tjaIf you have an arbiter coupling three CPUs to a single interconnect port, from the POV of that port, you're going to have three independently operating transactions in flight, and not all of them will complete in order.17:26
kc5tjaSo from the POV of a particular master, I agree -- build for sequential access.17:27
kc5tjaBut the _interconnect_ has to be able to deal with out of order transactions.17:27
ZipCPU|LaptopYeah, I hear you, but ... from my own experience, one transaction will completely tie up a slave's capability.  The slave isn't likely to be able to handle more than one transaction at a time, but can be optimized for sequential accesses.17:33
ZipCPU|LaptopThe opportunity for multiple masters is limited to multiple masters attempting to contact separate and individual slaves.17:33
ZipCPU|LaptopIn that case, you want a smart interconnect that can connect master A to slave X, master B to slave Z, master C to slave Q, etc.17:34
ZipCPU|LaptopEach slave can then be optimized for sequential access, and out of order access just happens at ... whatever order/rate it happens at.  It's now independent of the bus.17:35
kc5tjaRight; just about the only way to do that is a crossbar switch, which is easy enough to build.17:55
kc5tjaBut switches take up space in a design.  Sometimes a ring would be better.17:55
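The crossbar idea discussed above can be sketched as a per-cycle grant decision: masters that target different slaves are all granted at once, and only masters contending for the same slave have to wait. The function name and the fixed-priority rule (lowest master name wins) are assumptions for illustration; a real interconnect could just as well use round-robin arbitration.

```python
from typing import Dict


def crossbar_grants(requests: Dict[str, str]) -> Dict[str, str]:
    """Map each requested slave to the single master granted this cycle.

    requests maps master name -> slave name it wants to reach.
    Assumption: fixed priority, the lowest master name wins a contested slave.
    """
    grants: Dict[str, str] = {}
    for master in sorted(requests):
        slave = requests[master]
        if slave not in grants:  # slave still free this cycle
            grants[slave] = master
    return grants
```

For example, `crossbar_grants({"A": "X", "B": "Z", "C": "Q"})` grants all three masters in the same cycle, while `{"A": "X", "B": "X"}` grants only A and leaves B to retry.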
ZipCPU|LaptopAgreed with everything up to the "ring".  Ahm ... how would a ring work? Do you have a good reference for that?18:02
ZipCPU|LaptopI guess I don't have any intuition for how a ring shaped bus might be made with wishbone.18:02
ZipCPU|LaptopWould masters and slaves be equal partners on the ring, or would the ring just be the implementation of the interconnect itself?18:02
kc5tjaRing would be the interconnect itself.  There'd be registers along each hop as well, allowing the ring to span the die without incurring too much latency.18:04
kc5tjaAddresses flow clockwise, data counter-clockwise.  (Although, I don't think it matters except for performance reasons.)18:05
kc5tjaRings are often used for NoCs.  Jan's interconnect is a 2D mesh composed of orthogonal rings, for example.18:05
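To illustrate why the direction of the data ring only matters for performance, here is a small latency sketch. It assumes nodes are numbered clockwise, each hop passes through one register and costs one cycle, and there is no contention. The function name and parameters are hypothetical.

```python
def ring_round_trip(master: int, slave: int, nodes: int,
                    data_counter_clockwise: bool = True) -> int:
    """Hops for an address to reach the slave plus data to return to the master."""
    if nodes <= 0:
        raise ValueError("ring needs at least one node")
    # Addresses always travel clockwise from master to slave.
    addr_hops = (slave - master) % nodes
    if data_counter_clockwise:
        # Data retraces the same hops in the opposite direction.
        data_hops = (slave - master) % nodes
    else:
        # Data continues clockwise the rest of the way around the ring.
        data_hops = (master - slave) % nodes
    return addr_hops + data_hops
```

Under these assumptions, counter-rotating rings give a round trip of twice the clockwise distance, while a single direction always costs one full lap for any slave other than the master's own node. Both orientations are functionally equivalent; nearby slaves are simply cheaper with the counter-rotating arrangement.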
kc5tjaZipCPU|Laptop: it's just occurred to me that you can use Wishbone in "pipeline" configuration (as distinct from mode; how confusing is that?) to implement AXI-like channels.23:36
--- Log closed Tue Dec 06 00:00:15 2016
