--- Log opened Thu Oct 02 00:00:19 2014 | ||
stekern | olofk: in my multi-core soc, wb_intercon is a huge bottleneck fmax-wise, do you have any ideas how we can improve that? | 03:05 |
---|---|---|
stekern | I think wb_mux is the worst bottleneck | 03:12 |
olofk | stekern: I see. I was actually a bit worried about that when I made it, but thought it would be better to do a simple implementation first and improve it when needed | 06:47 |
olofk | Can we afford to make it registered and trade latency for fmax, or should we try to optimize the combinatorial paths? | 06:49 |
olofk | For single accesses, like control/status registers, I guess that latency isn't a problem | 06:49 |
olofk | Well, latency isn't really a problem for accessing larger data either, but we have to make sure not to lose bandwidth | 06:50 |
olofk | I see two ways around that | 06:53 |
olofk | 1. Use the delayed ack feature from wb b4. That would require all masters to be b4 aware | 06:53 |
olofk | 2. Register the data and let the mux fake an ack even though we don't know if it was a successful transaction. The issue here is that the master won't know which access caused an error | 06:55 |
olofk | (2) Given that we don't recover from bus errors, I guess it's not that big a deal if the error comes a little later | 06:56 |
stekern | what's the delayed ack from b4? | 06:59 |
olofk | You can fire away a stream of data without waiting for an ack | 07:00 |
olofk | The slave will return acks for all transactions, and when all acks have returned, the master knows that the burst was completed | 07:00 |
stekern | ok. but I'll refuse to use any b4 features before the spec document is free from obnoxious restrictions | 07:01 |
olofk | There was something weird there, right? Remind me | 07:02 |
stekern | well, look at page 2: http://cdn.opencores.org/downloads/wbspec_b4.pdf | 07:03 |
stekern | "[NO] Can be edited completely and your name put on it." | 07:03 |
olofk | Can be offered through auction sites | 07:04 |
stekern | makes you want to laugh (if you compare wb3 and wb4) | 07:04 |
olofk | Hmm... they don't look at all the same | 07:05 |
stekern | no, and someone put their name on it | 07:05 |
olofk | I remember that Richard Herveille was upset about b4 | 07:06 |
olofk | B4 was created in joint cooperation with CERN. We should talk to them at orconf and see what they think | 07:06 |
stekern | yeah, I don't think he was consulted at all about it, yet he is the steward of the spec... | 07:06 |
olofk | That would be good to bring up at the OpenRISC forum slot at orconf, but it's the last talk on sunday so I guess people will be quite tired by then | 07:08 |
olofk | Or have gone home already | 07:08 |
stekern | yeah, but we can of course bring it up off schedule | 07:10 |
olofk | Yep. | 07:10 |
stekern | that aside, where can I read about the delayed acks? | 07:11 |
olofk | 83 | 07:11 |
olofk | Page 83 | 07:11 |
stekern | hmm, ok. It's not at all clear to me that the behaviour you were describing is allowed. are you allowed to assert stb for several cycles without the previous being acked? | 07:15 |
olofk | Note to self. Stop smoking crack when consulting data sheets | 07:16 |
stekern | I think "stop smoking" crack on its own might be good advice too ;) | 07:17 |
stekern | ...the crack was supposed to be within "" in that sentence... | 07:18 |
olofk | stekern: Stop "smoking" crack when you are on IRC | 07:19 |
stekern | yeah, all good advice... | 07:19 |
olofk | Anyhow. I can't find it either | 07:20 |
olofk | I'm checking ohwr.org for clues | 07:20 |
stekern | I was thinking that we'd make all the slow ctrl accesses registered and keep them separated from the "fast" interfaces (like main mem) | 07:22 |
olofk | That's a good idea | 07:22 |
olofk | We probably want to separate them later on anyway to support wider mem accesses | 07:22 |
olofk | As the small-modules-crazy person that I am, I think that we could split that up into two intercon blocks, and have a dedicated registering component between them | 07:23 |
stekern | I think it's enough to add a "REGISTERED" parameter to the module | 07:24 |
olofk | Where do you want the registers? | 07:24 |
stekern | on everything | 07:25 |
olofk | Before mux, between mux and arbiter, and after arbiter? | 07:25 |
stekern | ah you meant like that | 07:26 |
stekern | well, I think it might be enough in the muxer | 07:27 |
stekern | and to register adr, cyc, stb & we | 07:28 |
stekern | so, between mux and arbiter | 07:28 |
olofk | I prefer having it between the components. Then we can make a dedicated component that we can put in wherever we need to | 07:29 |
stekern | because, you'll have different muxes for the fast and slow buses, but not different arbiters, right? | 07:30 |
olofk | hmm | 07:30 |
stekern | well, I guess you could do the split with an arbiter too | 07:31 |
stekern | but yeah, sure, a separate registering module is fine. as long as you don't make a separate repo for it ;) | 07:33 |
olofk | :) | 07:33 |
olofk | Yes! No matter what I previously said. Now I'm awesome | 08:18 |
olofk | stekern: How did you hook up the stream writer? | 08:24 |
stekern | olofk: http://pastie.org/9613276 | 09:09 |
olofk | stekern: I was more thinking interconnect-wise | 09:16 |
stekern | ah | 09:18 |
stekern | http://pastie.org/9613331 | 09:19 |
stekern | so, I connect it to one of the SDRAM ports | 09:22 |
stekern | (that I should rename from eth0 to dma0 or something, since it's not eth0 specific anymore) | 09:22 |
olofk | I'm not getting any data, but the problem could be pretty much anywhere, and I can't use signaltap as easily now that I have the gdb connection | 09:44 |
olofk | So once again, I consider using diila | 09:46 |
stekern | since you most likely have the most interesting signals in the top-module, it's pretty suitable | 09:48 |
stekern | the annoying part comes when you want to look at some signal deep down in a sub module hierarchy | 09:49 |
olofk | Yep, I need to look at the toplevel first. wb writes to the register don't seem to get there | 10:16 |
olofk | Why can I never remember how to make gdb set a byte? | 11:01 |
olofk | hmm... what's wrong with this? set *(char *)0x91000000 = 0xff | 11:03 |
olofk | It works when I write to RAM. Must be something weird with my peripheral accesses | 11:04 |
stekern | but it works from software? | 12:09 |
olofk | Don't have any software | 14:22 |
olofk | stekern: How does triggering work in diila | 14:28 |
olofk | ? | 14:28 |
olofk | It trigs when trig_i == value of register 0 ? | 14:29 |
olofk | Future improvements: Add register for setting don't care bits, selecting edge/level trigger | 14:33 |
stekern | yeah, I know but I haven't needed to change stuff like that at runtime yet :) | 14:45 |
olofk | Looks like diila won't help anything here | 17:39 |
olofk | I can't write to its registers | 17:39 |
stekern | not even from the cpu? | 17:41 |
olofk | haven't tried that | 17:41 |
olofk | Just via debug interface | 17:41 |
olofk | But it's the same problem I'm seeing with wb_stream_writer, so I suspect that I fucked up the instantiation somehow | 17:42 |
olofk | Hmm... now I can't write to gpio either | 17:43 |
stekern | can it be related to the changes you did? | 17:43 |
stekern | it's de0 nano you're using? | 17:45 |
olofk | hm.. I can write to the gpio direction register, and when I read back 0x91000000 I get 0xff000000 | 17:45 |
olofk | It's based on de0_nano. I pretty much only added the MyriadRF stuff | 17:45 |
olofk | ah ok... the GPIO probably works. Just that I need to set register 1 to see the contents of reg 0 | 17:46 |
stekern | yes | 17:47 |
olofk | hmm | 17:48 |
olofk | Now I'm getting something from 0x96000000, which should be diila | 17:48 |
olofk | Nahh.. I just get back the last value I wrote to GPIO when I read from the diila address space | 17:49 |
stekern | that doesn't sound good | 17:51 |
olofk | ahh.. wait a minute. I'm looking at the whole address vector in one place in wb_stream_writer_cfg | 17:52 |
olofk | That's no good | 17:52 |
olofk | Because the intercon doesn't clear the top bits | 17:52 |
olofk | Doesn't explain what's wrong with diila, but could explain my stream writer problems | 17:53 |
stekern | ah, yes.. I do like this in my instantiation: .wbs_adr_i(wb_m2s_streamer0_slave_adr[11:0]), | 17:56 |
olofk | I used 5:0, but that should work too I guess | 17:57 |
stekern | olofk: quick chance to review before I push: http://pastie.org/9614389 | 17:57 |
olofk | Can't see any problems | 17:59 |
olofk | But it sucks that it can't be easily parametrized | 17:59 |
stekern | what do you want to parameterize? | 18:01 |
olofk | mainly width, for >32 bit vectors | 18:02 |
stekern | yeah | 18:03 |
olofk | Still can't write to my streamer regs :( | 18:03 |
stekern | maybe I should change all the integer to longint | 18:04 |
olofk | Is there a longint in verilog? | 18:05 |
olofk | ah, $time is 64 bit, right? | 18:05 |
olofk | gtg | 18:05 |
stekern | ah, no. longint is only in systemverilog | 18:15 |
olofk | Ah fuck it. Not again. I forgot to add the diila instance to the slaves list | 20:06 |
olofk | Ah fuck it. Not again. I forgot to regenerate wb_intercon | 20:28 |
olofk | Ah fuck it. Design doesn't fit anymore | 20:34 |
stekern | what have you stuffed in there? | 20:37 |
olofk | Everything but the kitchen sink | 20:37 |
stekern | =P | 20:37 |
olofk | Not too much really | 20:37 |
stekern | you probably need to cut down on the blockrams when diila is present | 20:38 |
stekern | like set the mor1kx cache to something smaller | 20:38 |
olofk | Yeah, I suspect the problem is the block rams | 20:38 |
olofk | Would it work to just decrease the diila blockrams to 512 instead of 1024? Or will that break anything? | 20:39 |
stekern | nothing except the vcd generating script I think | 20:40 |
stekern | should be fairly straightforward to mend that | 20:40 |
olofk | I'll try that first | 20:40 |
olofk | I see that diila.v is 168 lines long. You should definitely split that up into at least four modules. Maybe put two of the modules in different repos | 20:41 |
olofk | hmm.. halving the diila mem only made total usage go down from 114% to 108% | 20:44 |
olofk | Which params should I touch on mor1kx to make it smaller? | 20:44 |
stekern | the D/ICACHE_BLOCK/SET ones | 20:50 |
stekern | ...and _WAYS | 20:50 |
olofk | Nice. There's a great hierarchical usage summary in the map report that I could import into a spreadsheet | 20:51 |
stekern | why do you need to import it to a spreadsheet? | 20:51 |
stekern | are you shooting to become a manager? | 20:52 |
olofk | Because the lines were so long and emacs wrapped them so I couldn't get an overview | 20:52 |
olofk | :) | 20:52 |
stekern | you know that emacs can truncate lines ;) | 20:52 |
olofk | I made a quick VBA script to put them in my access database | 20:52 |
olofk | Ok, so changing SET_WIDTH from 8 to say 6, will that do? | 20:53 |
olofk | Is size=SET_WIDTH*BLOCK_WIDTH*WAYS? | 20:53 |
stekern | 2^ that, yes | 20:55 |
stekern | what are they now? | 20:55 |
stekern | BLOCK=4, SET=8, WAYS=1 => 4K cache | 20:55 |
stekern | but decreasing the cache will not make it much smaller other than use less block ram | 20:57 |
olofk | it was 5,8,2 | 20:57 |
olofk | setting it to 5,6,2 for I and D worked | 20:58 |
olofk | Nope. Still no action in diila | 20:59 |
olofk | I give up for today | 21:04 |
stekern | I don't give up, I go to bed | 21:11 |
--- Log closed Fri Oct 03 00:00:20 2014 |
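
The cache sizing exchange around 20:53 to 20:58 can be checked with a few lines of arithmetic. This is only a sketch, assuming that BLOCK_WIDTH and SET_WIDTH are log2 values (bytes per block and number of sets) and that WAYS is a plain count, which is consistent with the "BLOCK=4, SET=8, WAYS=1 => 4K" example above. The function name `cache_size_bytes` is made up for illustration, and tag memory is ignored.

```python
def cache_size_bytes(block_width: int, set_width: int, ways: int) -> int:
    """Data storage of one cache in bytes.

    Assumption: block_width and set_width are log2 values,
    ways is a plain count. Tag storage is not included.
    """
    return (1 << block_width) * (1 << set_width) * ways


# Configurations mentioned in the log, as (BLOCK, SET, WAYS).
for name, cfg in [
    ("stekern's example", (4, 8, 1)),
    ("olofk before", (5, 8, 2)),
    ("olofk after", (5, 6, 2)),
]:
    print(f"{name}: {cache_size_bytes(*cfg)} bytes")
```

Under these assumptions, going from 5,8,2 to 5,6,2 shrinks each cache by a factor of four, which fits the observation that the design fit again once both the I and D caches were changed.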
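
The address decoding problem found around 17:52 to 17:57 (the intercon passes the full address through, so a slave that compares the whole vector never sees its register offsets) can be illustrated with a small model. This is a hedged sketch rather than the actual RTL: `local_offset` and `REG_OFFSET` are hypothetical names, and the base address is the diila one from the log, used purely as an example.

```python
def local_offset(bus_addr: int, addr_bits: int) -> int:
    """Keep only the low addr_bits of the bus address.

    This mirrors slicing the address in the instantiation,
    e.g. adr[11:0] for addr_bits=12 or adr[5:0] for addr_bits=6.
    """
    return bus_addr & ((1 << addr_bits) - 1)


BASE = 0x96000000  # diila address space, taken from the log
REG_OFFSET = 0x4   # hypothetical register offset inside the slave

bus_addr = BASE + REG_OFFSET

# Comparing the full address against the offset never matches,
# because the intercon leaves the top bits set.
print(bus_addr == REG_OFFSET)

# Slicing first recovers the offset, for either width used in the log.
print(local_offset(bus_addr, 12) == REG_OFFSET)
print(local_offset(bus_addr, 6) == REG_OFFSET)
```

Both the 12-bit and the 6-bit slice work here as long as the slave's registers fit inside the sliced range.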