IRC logs for #openrisc Thursday, 2014-10-02

--- Log opened Thu Oct 02 00:00:19 2014
stekernolofk: in my multi-core soc, wb_intercon is a huge bottleneck fmax-wise, do you have any ideas how we can improve that?03:05
stekernI think wb_mux is the worst bottleneck03:12
olofkstekern: I see. I was actually a bit worried about that when I made it, but thought it would be better to do a simple implementation first and improve it when needed06:47
olofkCan we afford to make it registered and trade fmax for latency, or should we try to optimize the combinatorial paths?06:49
olofkFor single accesses, like control/status registers, I guess that latency isn't a problem06:49
olofkWell, latency isn't really a problem for accessing larger data either, but we have to make sure not to lose bandwidth06:50
olofkI see two ways around that06:53
olofk1. Use the delayed ack feature from wb b4. That would require all masters to be b4 aware06:53
olofk2. Register the data and let the mux fake an ack even though we don't know if it was a successful transaction. The issue here is that the master won't know which access caused an error06:55
olofk(2) Given that we don't recover from bus errors, I guess it's not that big a deal if the error comes a little later06:56
stekernwhat's the delayed ack from b4?06:59
olofkYou can fire away a stream of data without waiting for an ack07:00
olofkThe slave will return acks for all transactions, and when all acks have returned, the master knows that the burst was completed07:00
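The bandwidth concern above can be put into rough numbers. This is a minimal sketch, not taken from wb_intercon: it assumes an idealized zero-wait-state slave, no stalls, and a hypothetical `added_latency` parameter counting the extra cycles that registering adds to each request/ack round trip. It compares a master that waits for every ack against one that issues a request each cycle, as olofk describes.

```python
def classic_cycles(transfers: int, added_latency: int) -> int:
    """Cycles used when the master waits for each ack before the next request."""
    # Every transfer pays the full round trip: 1 cycle plus the added latency.
    return transfers * (1 + added_latency)


def pipelined_cycles(transfers: int, added_latency: int) -> int:
    """Cycles used when a new request is issued every cycle and acks trail behind."""
    # The added latency is paid once, while the pipeline fills.
    return transfers + added_latency


if __name__ == "__main__":
    # Hypothetical figure: two extra cycles of latency from the register stages.
    for n in (1, 8, 64):
        print(n, classic_cycles(n, 2), pipelined_cycles(n, 2))
```

Under these assumptions a single control/status access costs the same either way, while a long burst only keeps its bandwidth in the pipelined case.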
stekernok. but I'll refuse to use any b4 features before the spec document is free from obnoxious restrictions07:01
olofkThere was something weird there, right? Remind me07:02
stekernwell, look at page 2:
stekern"[NO] Can be edited completely and your name put on it."07:03
olofkCan be offered through auction sites07:04
stekernmakes you want to laugh (if you compare wb3 and wb4)07:04
olofkHmm... they don't look at all the same07:05
stekernno, and someone put their name on it07:05
olofkI remember that Richard Herveille was upset about b407:06
olofkB4 was created in joint cooperation with CERN. We should talk to them at orconf and see what they think07:06
stekernyeah, I don't think he was consulted at all about it, yet he is the steward of the spec...07:06
olofkThat would be good to bring up at the OpenRISC forum slot at orconf, but it's the last talk on sunday so I guess people will be quite tired by then07:08
olofkOr have gone home already07:08
stekernyeah, but we can of course bring it up off schedule07:10
stekernthat aside, where can I read about the delayed acks?07:11
olofkPage 8307:11
stekernhmm, ok. It's not at all clear to me that the behaviour you were describing is allowed. are you allowed to assert stb for several cycles without the previous being acked?07:15
olofkNote to self. Stop smoking crack when consulting data sheets07:16
stekernI think "stop smoking" crack on its own might be good advice too ;)07:17
stekern...the crack was supposed to be within "" in that sentence...07:18
olofkstekern: Stop "smoking" crack when you are on IRC07:19
stekernyeah, all good advice...07:19
olofkAnyhow. I can't find it either07:20
olofkI'm checking for clues07:20
stekernI was thinking that we'd make all the slow ctrl accesses registered and keep them separated from the "fast" interfaces (like main mem)07:22
olofkThat's a good idea07:22
olofkWe probably want to separate them later on anyway to support wider mem accesses07:22
olofkAs the small-modules-crazy person that I am, I think that we could split that up into two intercon blocks, and have a dedicated registering component between them07:23
stekernI think it's enough to add a "REGISTERED" parameter to the module07:24
olofkWhere do you want the registers?07:24
stekernon everything07:25
olofkBefore mux, between mux and arbiter, and after arbiter?07:25
stekernah you meant like that07:26
stekernwell, I think it might be enough in the muxer07:27
stekernand to register adr, cyc, stb & we07:28
stekernso, between mux and arbiter07:28
olofkI prefer having it between the components. Then we can make a dedicated component that we can put in wherever we need to07:29
stekernbecause, you'll have different muxes for the fast and slow buses, but not different arbiters, right?07:30
stekernwell, I guess you could do the split with an arbiter too07:31
stekernbut yeah, sure, a separate registering module is fine. as long as you don't make a separate repo for it ;)07:33
olofkYes! No matter what I previously said. Now I'm awesome08:18
olofkstekern: How did you hook up the stream writer?08:24
olofkstekern: I was more thinking interconnect-wise09:16
stekernso, I connect it to one of the SDRAM ports09:22
stekern(that I should rename from eth0 to dma0 or something, since it's not eth0 specific anymore)09:22
olofkI'm not getting any data, but the problem could be pretty much anywhere, and I can't use signaltap as easily now when I have the gdb connection09:44
olofkSo once again, I consider using diila09:46
stekernsince you most likely have the most interesting signals in the top-module, it's pretty suitable09:48
stekernthe annoying part comes when you want to look at some signal deep down in a sub module hierarchy09:49
olofkYep, I need to look at the toplevel first. wb writes to the register don't seem to get there10:16
olofkWhy can I never remember how to make gdb set a byte?11:01
olofkhmm... what's wrong with this? set *(char *)0x91000000 = 0xff11:03
olofkIt works when I write to RAM. Must be something weird with my peripheral accesses11:04
stekernbut it works from software?12:09
olofkDon't have any software14:22
olofkstekern: How does triggering work in diila14:28
olofkIt trigs when trig_i == value of register 0 ?14:29
olofkFuture improvements: Add register for setting don't care bits, selecting edge/level trigger14:33
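The don't-care idea above amounts to a masked compare. The sketch below is not diila's implementation: the function name and the `care_mask` register are hypothetical, and it assumes the current trigger is a plain level comparison of `trig_i` against the value in register 0, as olofk's question suggests.

```python
def level_trigger(trig_i: int, value: int, care_mask: int) -> bool:
    """Return True when trig_i matches value on every bit set in care_mask."""
    # Bits cleared in care_mask are "don't care" and never block the trigger.
    return (trig_i & care_mask) == (value & care_mask)


if __name__ == "__main__":
    # An all-ones mask reproduces the plain "trig_i == register 0" comparison.
    print(level_trigger(0x12, 0x12, 0xFF))
    # Ignore the low nibble: any sample whose high nibble is 0x1 triggers.
    print(level_trigger(0x1F, 0x10, 0xF0))
```

An edge trigger would additionally compare against the previous sample, which this sketch leaves out.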
stekernyeah, I know but I haven't needed to change stuff like that at runtime yet :)14:45
olofkLooks like diila won't help anything here17:39
olofkI can't write to its registers17:39
stekernnot even from the cpu?17:41
olofkhaven't tried that17:41
olofkJust via debug interface17:41
olofkBut it's the same problem I'm seeing with wb_stream_writer, so I suspect that I fucked up the instantiation somehow17:42
olofkHmm... now I can't write to gpio either17:43
stekerncan it be related to the changes you did?17:43
stekernit's de0 nano you're using?17:45
olofkhm.. I can write to the gpio direction register, and when I read back 0x91000000 I get 0xff00000017:45
olofkIt's based on de0_nano. I pretty much only added the MyriadRF stuff17:45
olofkah ok... the GPIO probably works. Just that I need to set register 1 to see the contents of reg 017:46
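One possible reading of the 0xff000000 readback: OpenRISC is big-endian, so a byte written at the lowest address of a 32-bit word lands in the most significant byte lane. The sketch below only models that byte ordering with Python's `struct` module. The helper name is made up for illustration and nothing here models the GPIO core itself.

```python
import struct


def write_byte_big_endian(word: int, byte_offset: int, value: int) -> int:
    """Overwrite one byte of a 32-bit big-endian word and return the new word."""
    raw = bytearray(struct.pack(">I", word))  # byte 0 is the most significant
    raw[byte_offset] = value & 0xFF
    return struct.unpack(">I", bytes(raw))[0]


if __name__ == "__main__":
    # A byte write of 0xff to offset 0 of a cleared word.
    print(hex(write_byte_big_endian(0x00000000, 0, 0xFF)))  # 0xff000000
```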
olofkNow I'm getting something from 0x96000000, which should be diila17:48
olofkNahh.. I just get back the last value I wrote to GPIO when I read from the diila address space17:49
stekernthat doesn't sound good17:51
olofkahh.. wait a minute. I'm looking at the whole address vector in one place in wb_stream_writer_cfg17:52
olofkThat's no good17:52
olofkBecause the intercon doesn't clear the top bits17:52
olofkDoesn't explain what's wrong with diila, but could explain my stream writer problems17:53
stekernah, yes.. I do like this in my instantiation: .wbs_adr_i(wb_m2s_streamer0_slave_adr[11:0]),17:56
olofkI used 5:0, but that should work too I guess17:57
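The address issue discussed above can be sketched as follows. Since the intercon passes the full bus address through, a slave that compares the whole vector against small register offsets never matches; slicing off the low bits (5:0 or 11:0) fixes that. The base address 0x96000000 is borrowed from the log purely as a sample value, and the helper name is hypothetical.

```python
def local_offset(bus_address: int, addr_bits: int) -> int:
    """Keep only the low addr_bits of the address, like adr[addr_bits-1:0]."""
    return bus_address & ((1 << addr_bits) - 1)


if __name__ == "__main__":
    access = 0x96000004          # sample access to the register at offset 4
    print(access == 0x4)         # False: the full vector still has the top bits
    print(local_offset(access, 6) == 0x4)   # True with adr[5:0]
    print(local_offset(access, 12) == 0x4)  # True with adr[11:0]
```

Both slices give the same result as long as the register offsets fit in the narrower one.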
stekernolofk: quick chance to review before I push:
olofkCan't see any problems17:59
olofkBut it sucks that it can't be easily parametrized17:59
stekernwhat do you want to parameterize?18:01
olofkmainly width, for >32 bit vectors18:02
olofkStill can't write to my streamer regs :(18:03
stekernmaybe I should change all the integer to longint18:04
olofkIs there a longint in verilog?18:05
olofkah, $time is 64 bit, right?18:05
stekernah, no. longint is only in systemverilog18:15
olofkAh fuck it. Not again. I forgot to add the diila instance to the slaves list20:06
olofkAh fuck it. Not again. I forgot to regenerate wb_intercon20:28
olofkAh fuck it. Design doesn't fit anymore20:34
stekernwhat have you stuffed in there?20:37
olofkEverything but the kitchen sink20:37
olofkNot too much really20:37
stekernyou probably need to cut down on the blockrams when diila is present20:38
stekernlike set the mor1kx cache to something smaller20:38
olofkYeah, I suspect the problem is the block rams20:38
olofkWould it work to just decrease the diila blockrams to 512 instead of 1024? Or will that break anything?20:39
stekernnothing except the vcd generating script I think20:40
stekernshould be fairly straightforward to mend that20:40
olofkI'll try that first20:40
olofkI see that diila.v is 168 lines long. You should definitely split that up into at least four modules. Maybe put two of the modules in different repos20:41
olofkhmm.. halving the diila mem only made total usage go down from 114% to 108%20:44
olofkWhich params should I touch on mor1kx to make it smaller?20:44
stekernthe D/ICACHE_BLOCK/SET ones20:50
stekern...and _WAYS20:50
olofkNice. There's a great hierarchical usage summary in the map report that I could import into a spreadsheet20:51
stekernwhy do you need to import it to a spreadsheet?20:51
stekernare you shooting to become a manager?20:52
olofkBecause the lines were so long and emacs wrapped them so I couldn't get an overview20:52
stekernyou know that emacs can truncate lines ;)20:52
olofkI made a quick VBA script to put them in my access database20:52
olofkOk, so changing SET_WIDTH from 8 to say 6, will that do?20:53
stekern2^ that, yes20:55
stekernwhat are they now?20:55
stekernBLOCK=4, SET=8, WAYS=1 => 4K cache20:55
stekernbut decreasing the cache will not make it much smaller other than use less block ram20:57
olofkit was 5,8,220:57
olofksetting it to 5,6,2 for I and D worked20:58
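The cache arithmetic in this exchange can be checked with a few lines. This is a sketch based on stekern's "BLOCK=4, SET=8, WAYS=1 => 4K" example, assuming the BLOCK and SET parameters are log2 widths (block size in bytes and number of sets) and that the sizes cover data storage only, not tag memory.

```python
def cache_bytes(block_width: int, set_width: int, ways: int) -> int:
    """Data capacity in bytes: ways * 2**set_width sets * 2**block_width bytes."""
    return ways * (1 << (block_width + set_width))


if __name__ == "__main__":
    print(cache_bytes(4, 8, 1))  # stekern's example: 4096
    print(cache_bytes(5, 8, 2))  # olofk's original setting: 16384
    print(cache_bytes(5, 6, 2))  # olofk's reduced setting: 4096
```

Under that assumption, going from 5,8,2 to 5,6,2 shrinks each cache from 16 KiB to 4 KiB, which fits with the design fitting again.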
olofkNope. Still no action in diila20:59
olofkI give up for today21:04
stekernI don't give up, I go to bed21:11
--- Log closed Fri Oct 03 00:00:20 2014
