--- Log opened Mon Dec 05 00:00:13 2016
bandvig | shorne: I’m not sure what kind of tests you are talking about. Let me describe what I did. | 04:23 |
bandvig | I implemented a combined 32- and 64-bit FPU in the MAROCCHINO pipeline (https://github.com/openrisc/mor1kx/tree/marocchino_devel) and rebuilt the Atlys SoC with the core. | 04:23 |
bandvig | I also rebuilt the NewLIB tool chain with modified binutils. | 04:23 |
bandvig | With this tool chain I rebuilt the SoftFloat test tool and verified 32- and 64-bit FPU instruction execution. | 04:23 |
bandvig | The computational results were right. | 04:24 |
bandvig | There are differences in flag generation, especially for NaNs. However, I suspect that the SoftFloat library doesn’t follow the IEEE standard accurately. | 04:24 |
bandvig | So, additional investigation is needed here. | 04:24 |
bandvig | At last I compiled the Whetstone benchmark in both 32- and 64-bit variants and got results | 04:24 |
bandvig | (in terms of computed digits, but, of course, different MIPS, MOPS and so on) similar for both variants and to results I found on the Internet for other machines. | 04:25 |
bandvig | Should I run some additional tests? | 04:25 |
shorne | bandvig: no, I think that is fine. I think just having this text in the commit message would be good. | 04:45 |
shorne | Do we need to specify any flags to get gcc to output the floating point instructions? I am a bit worried that if this is enabled by default, code will break when the FPU is not available. | 04:46 |
bandvig | shorne: Yes, we do. By default GCC doesn’t emit FP instructions; it uses software routines instead. | 05:35 |
bandvig | To force emitting single-precision FP instructions, the -mhard-float option should be used. | 05:35 |
bandvig | To emit both single- and double-precision instructions we have to use two options simultaneously: -mhard-float -mdouble-float. | 05:35 |
bandvig | Using -mdouble-float alone is forbidden, if I remember correctly. | 05:35 |
bandvig | By the way, to use single- and double-precision FP instructions for math functions (i.e. sin, cos, log, etc.) | 05:47 |
bandvig | we need to provide the -mhard-float -mdouble-float options to GCC at the “NewLIB” stage (http://openrisc.io/newlib/building.html) while building the NewLIB-based tool chain. | 05:48 |
bandvig | On my PC I followed olofk’s advice and edited NewLIB’s config.host file in an appropriate manner. | 05:48 |
bandvig | An alternative way is a multilib configuration extending the currently implemented delay-slot/no-delay-slot/compatible-delay-slot variants. | 05:55 |
bandvig | However, I haven’t tried to do this because a multilib configuration takes much more time to compile. | 05:55 |
shorne | bandvig: understood, I think this is good to have, and thanks for the good work. We should merge, but I would want to test first. If someone else wants to help I am fine with it. But we should work on pushing these changes as well to fsf upstream. | 06:39 |
shorne | after gdb I guess | 06:39 |
shorne | pushed some more cleanup to https://github.com/stffrdhrn/binutils-gdb/tree/or1k-upstream | 09:42 |
shorne | it's just brutal what we have to do to the doxygen comments | 09:42 |
-!- jonmasters_ is now known as jonmasters | 13:54 |
-!- Netsplit *.net <-> *.split quits: rokka | 14:28 | |
-!- Netsplit over, joins: rokka | 14:30 | |
kc5tja | Been studying AXI(4) lately. Came to the conclusion that it's pretty much what you'd arrive at if you split Wishbone transactions by read/write direction, and by address/data phase. | 16:18 |
kc5tja | But then I realized, STALL_I is kind of like A(rw)READY, and ACK_I is kind of like (rw)VALID. | 16:18 |
kc5tja | There's no equivalent to write response channel that I can see. | 16:19 |
kc5tja | So, to get concurrent read/write throughput like you can on AXI, use two Wishbone "channels", one for input, one for output, and make both support B4 pipelining. The interconnect can then properly mux the traffic on these channels as it sees fit. And I never have to worry about a transaction ID in the process, since that's an implementation detail of the interconnect. | 16:22 |
kc5tja | I'm pretty convinced Wishbone is many years ahead of its time, and that it still has surprises in store for the future. | 16:23 |
ZipCPU|Laptop | ^^ +1 | 16:38 |
ZipCPU|Laptop | One thing AXI has that wishbone doesn't is flow control on the return channel. | 16:38 |
ZipCPU|Laptop | Well ... that and more configuration options than the Cadillac salesman will offer you. | 16:39 |
kc5tja | True; the lack of a LAST signal is annoying, but that can at least be included in a tag bus. | 16:48 |
kc5tja | Or are you thinking about something else? | 16:48 |
ZipCPU|Laptop | No, I was thinking more about the fact that even the return busses have VALID and READY signals. | 16:55 |
kc5tja | Ahh, yeah, good point. | 16:57 |
kc5tja | While not _official_, it seems like an easy alteration/enhancement to the basic specification to add a STALL_O signal too. | 16:59 |
ZipCPU|Laptop | Sure, you could, but I really don't think it's necessary. | 16:59 |
ZipCPU|Laptop | Remember: I'm of the simple is better mind. Dump the tag lines. ;) | 16:59 |
kc5tja | I don't either. But in cases where it's required or helpful... | 17:00 |
kc5tja | The only case where I see AXI being a fundamentally better bus, I think, is when you have, say, 5 independently running cores (say, a C-slowed/retimed design with 5 contexts), and a transfer on context 3 completes before a transfer from context 2. | 17:01 |
kc5tja | But, then again, AXI4 disallows out of order transfer completion. | 17:01 |
kc5tja | I don't think the tag lines are required. | 17:02 |
ZipCPU|Laptop | Hmm ... if you want out of order completion, you will need to tag every bus request with an ID. | 17:19 |
ZipCPU|Laptop | Personally, I think the bus should be optimized for sequential access, rather than out of order access. It's just a whole lot easier to optimize hardware for that purpose. | 17:20 |
kc5tja | If you have an arbiter coupling three CPUs to a single interconnect port, from the POV of that port, you're going to have three independently operating transactions in flight, and not all of them will complete in order. | 17:26 |
kc5tja | So from the POV of a particular master, I agree -- build for sequential access. | 17:27 |
kc5tja | But the _interconnect_ has to be able to deal with out of order transactions. | 17:27 |
ZipCPU|Laptop | Yeah, I hear you, but ... from my own experience, one transaction will completely tie up a slave's capability. The slave isn't likely to be able to handle more than one transaction at a time, but it can be optimized for sequential accesses. | 17:33 |
ZipCPU|Laptop | The opportunity for multiple masters is limited to multiple masters attempting to contact separate and individual slaves. | 17:33 |
ZipCPU|Laptop | In that case, you want a smart interconnect that can connect master A to slave X, master B to slave Z, master C to slave Q, etc. | 17:34 |
ZipCPU|Laptop | Each slave can then be optimized for sequential access, and out of order access just happens at ... whatever order/rate it happens at. It's now independent of the bus. | 17:35 |
kc5tja | Right; just about the only way to do that is a crossbar switch, which is easy enough to build. | 17:55 |
kc5tja | But switches take up space in a design. Sometimes a ring would be better. | 17:55 |
ZipCPU|Laptop | Agreed with everything up to the "ring". Ahm ... how would a ring work? Do you have a good reference for that? | 18:02 |
ZipCPU|Laptop | I guess I don't have any intuition for how a ring shaped bus might be made with wishbone. | 18:02 |
ZipCPU|Laptop | Would masters and slaves be equal partners on the ring, or would the ring just be the implementation of the interconnect itself? | 18:02 |
kc5tja | Ring would be the interconnect itself. There'd be registers along each hop as well, allowing the ring to span the die without incurring too much latency. | 18:04 |
kc5tja | Addresses flow clockwise, data counter-clockwise. (Although, I don't think it matters except for performance reasons.) | 18:05 |
kc5tja | Rings are often used for NoCs. Jan's interconnect is a 2D mesh composed of orthogonal rings, for example. | 18:05 |
kc5tja | ZipCPU|Laptop: it's just occurred to me that you can use Wishbone in "pipeline" configuration (as distinct from mode; how confusing is that?) to implement AXI-like channels. | 23:36 |
--- Log closed Tue Dec 06 00:00:15 2016 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!