IRC logs for #openrisc Friday, 2013-08-02

--- Log opened Fri Aug 02 00:00:53 2013
olofkI finally pushed some wishbone interconnect utilities to orpsocv3. With test benches!!! :)00:14
stekernolofk: nice!05:39
stekernimplementing PL1 isn't as easy as it first seemed07:05
stekernat least not in hw07:09
stekernsince you need to know if the pl1 bit is set before you've read the pl1 bit07:10
jonibostekern: i'm not seeing the problem there... why do you need to know if it's set?07:23
stekernbecause you need to know which bits in the address that is being accessed you should use as index to the tlbs, don't you?07:25
stekernthere are of course ways to solve that:07:28
jonibowell, really you'd need to index the high 8 bits, check PL1 and then index 19 bits07:29
stekern1) assume PL1 is set and reread if not (*very* bad idea)07:29
joniboit's all parallelizable, right?07:29
joniboyou'd search 8 bits and 19 bits at the same time and use the 8 bit value if PL1 is set07:29
stekern2) speculatively read with both indexes and let the ones with PL1 set overrule07:30
joniboyes, 2) is what I tried to say above07:30
stekernbut it's messy, since the second port of the RAM is used for SPR accesses07:30
joniboright, I see07:31
stekern3) cache which tlb's have PL1 set07:31
joniboheh, 3) amounts to having ATB's07:32
stekernI think 2 is still the right option, and it's mostly messy for the itlb, since the dtlb spr accesses are exclusive to memory accesses07:32
stekern(3 = ATB) I know, I wonder if they are a product of a similar conversation as we are having now =P07:33
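A minimal sketch of option 2) (read with both indexes, let a PL1 hit overrule), written as a software model rather than hardware. It assumes a direct-mapped TLB with 128 sets, 8 KB pages indexed by virt_addr[19:13] and 16 MB pages indexed by virt_addr[30:24]. The names Entry, small_index, huge_index and lookup are hypothetical and not taken from mor1kx or or1200.

```python
from dataclasses import dataclass
from typing import List, Optional

SETS = 128         # assumed number of TLB sets
SMALL_SHIFT = 13   # 8 KB pages (PL1 clear)
HUGE_SHIFT = 24    # 16 MB pages (PL1 set)


@dataclass
class Entry:
    valid: bool = False
    pl1: bool = False
    vpn: int = 0   # virtual page number at the entry's own page size
    ppn: int = 0   # physical page number at the entry's own page size


def small_index(va: int) -> int:
    return (va >> SMALL_SHIFT) % SETS


def huge_index(va: int) -> int:
    return (va >> HUGE_SHIFT) % SETS


def lookup(tlb: List[Entry], va: int) -> Optional[int]:
    # Both reads happen "in parallel"; neither depends on the other.
    small = tlb[small_index(va)]
    huge = tlb[huge_index(va)]
    # A matching entry with PL1 set overrules the 8 KB result.
    if huge.valid and huge.pl1 and huge.vpn == va >> HUGE_SHIFT:
        return (huge.ppn << HUGE_SHIFT) | (va & ((1 << HUGE_SHIFT) - 1))
    if small.valid and not small.pl1 and small.vpn == va >> SMALL_SHIFT:
        return (small.ppn << SMALL_SHIFT) | (va & ((1 << SMALL_SHIFT) - 1))
    return None  # TLB miss
```

In hardware the two reads would need two RAM ports (or two cycles), which is where the conflict with SPR accesses on the second port comes from.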
joniboyeah, that thought crossed my mind, too07:35
stekernit turned out to be simpler to just use the X/U/W bits straight in hw, rather than use a lookup table07:38
jonibogood to hear... it seemed to me that that should be the case07:39
jonibobut the PL1 thing is tricky07:39
joniboand it gets us 16MB pages that are marginally useful (really a bit too big)07:39
jonibohow is the TLB implemented... memory?07:41
joniboblock ram?07:42
stekerntwo of them actually, one for match regs and one for trans regs07:45
stekernwell, in mor1kx at least07:45
stekerniirc they are combined to one in or120007:46
jonibohmm... i'm not sure I understand how this works07:53
jonibodoes it _iterate_ over the match registers, or does it try to match against all 128 in parallel?07:54
stekernno, it reads the match register with the index and checks if it matches the current address07:55
jonibothat makes sense07:56
jonibothat's what the kernel does, too07:56
jonibothe TLB index for the 8-bit indexed virtual address is always 0, right?07:57
stekernif you'd have multiple way tlbs you'd have to iterate over the ways (read all of them in parallel)07:57
jonibo((VIRTADDRESS&0xff000000)>>13)mod128 = 008:00
joniboso you can only ever have PL1 set at index 008:00
stekernnot sure what you meant by that, but the max indexing from the 19 bit is virt_addr[20:13]08:00
stekernso, yes08:01
stekern(I think)08:01
joniboyeah, that's right08:01
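The arithmetic behind "only ever at index 0" can be checked directly. This sketch assumes 128 sets and that a huge page is indexed with the same 8 KB scheme (shift by 13, take the low 7 bits); the function name is made up for illustration.

```python
SETS = 128
PAGE_SHIFT = 13  # 8 KB pages


def set_index_of_huge_base(va: int) -> int:
    # Masking with 0xff000000 clears bits [23:0]. After shifting right by 13,
    # bits [10:0] of the result are still zero, and mod 128 only looks at
    # bits [6:0], so the result is 0 for every address.
    return ((va & 0xFF000000) >> PAGE_SHIFT) % SETS
```

The result depends on which address bits feed the index: it only holds when the 16 MB page base goes through the 8 KB indexing.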
stekernso 3) perhaps isn't such a bad idea after all08:01
stekernand doesn't need to amount to ATB, right?08:01
stekernor am I missthinking this?08:02
jonibobut, it's kind of hokey08:02
jonibothere can only ever be 1 huge page in the TLB08:02
joniboif you use _only_ huge pages you end up never using 127 of the 128 slots in the TLB08:02
joniboit's not strictly necessary to "pre-index" the TLB as we do... it's just a simplification08:03
joniboreally you'd want an LRU over the 128 slots08:03
stekernumm, isn't the LRU for determining which way should go?08:03
joniboyou'd want to say "TLB, cache this virt to phys mapping for me with these flags" and the TLB caches the entry into a slot for you08:04
jonibothe way LRU is a second LRU08:04
joniboi'm saying we shouldn't have 128 TR and MATCH regs... we should have _one_ and the CPU should figure out where to cache the entry based on an LRU over the 128 available slots08:04
jonibodoes this even make sense...???08:05
jonibowhy would you need ways in that case?08:05
joniboas it stands now, if I copy data from address 0xXXX02000 to address 0xXXX12000, I get a TLB miss for every read/write08:07
joniboumm, these addresses aren't quite right, but hopefully you get my point08:08
stekernmakes sense, but sounds awfully hard to implement an LRU over 128 entries08:08
stekernyeah, I get your point08:08
joniboLRU over 128 entries... yeah, sounds "impossible"08:09
stekernso using ways with a more modest amount of entries to LRU over is a trade off08:09
jonibo64 entries w/ 2 ways is already a win and quite easy to implement08:10
jonibo4 ways is trickier, I think, as the LRU is a bit complicated08:10
stekernyeah, and 128x2 isn't any harder (I'm running mor1kx with 128 entries)08:11
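A small simulation of the copy case, as a sketch only. It assumes 8 KB pages, set index = VPN mod sets, and true LRU within a set. The source and destination are chosen 1 MB apart (128 sets x 8 KB) so that they really do land in the same set; those addresses and the helper names are assumptions for illustration, not the ones from the discussion.

```python
from collections import OrderedDict

PAGE_SHIFT = 13  # 8 KB pages


def count_misses(accesses, sets, ways):
    """Count TLB misses for a set-associative TLB with LRU replacement."""
    tlb = [OrderedDict() for _ in range(sets)]
    misses = 0
    for va in accesses:
        vpn = va >> PAGE_SHIFT
        entries = tlb[vpn % sets]
        if vpn in entries:
            entries.move_to_end(vpn)         # mark as most recently used
        else:
            misses += 1
            if len(entries) >= ways:
                entries.popitem(last=False)  # evict least recently used
            entries[vpn] = True
    return misses


def copy_trace(src, dst, nbytes, step=4):
    """Alternating read/write addresses of a word-by-word copy."""
    trace = []
    for off in range(0, nbytes, step):
        trace.append(src + off)
        trace.append(dst + off)
    return trace


trace = copy_trace(0x00002000, 0x00102000, 64)  # 16 reads + 16 writes
print(count_misses(trace, sets=128, ways=1))    # direct-mapped
print(count_misses(trace, sets=64, ways=2))     # 64 entries w/ 2 ways
```

With one way, the two pages keep evicting each other, so every access misses; with two ways, only the first touch of each page misses.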
jonibojust looking quickly at x86 for reference, they have effectively an ATB08:11
jonibo14 entries, fully associative for 2/4MB pages08:11
joniboand the main TLB is for 4kb pages08:12
stekernyeah, that does solve the problem with "only 1 huge page"08:12
joniboagain, it would be nice if the hardware picked the way to cache an entry into for you08:13
jonibobased on the LRU bits08:13
stekernyou mean "it will be nice when" ;)08:13
jonibothe arch spec kind of mandates that you pick it yourself08:14
stekernregarding 4-way LRU, Stefan Wallenotwits shared a neat trick for that:
jonibocool... that's doable then08:17
stekern"picking it yourself" <- not sure I understood what you meant by that08:19
joniboyou need to specify the way to fill by writing to the appropriate MR/TR registers08:19
jonibothere's a unique register  for each way08:19
stekernah, ok, I get what you mean. You are thinking in the case of software reload08:20
joniboI still think a single MR/TR pair is sufficient... the HW selects the way for you based on LRU bits and the HW selects the TLB slot to fill based on the VIRT page frame in the MR register08:21
stekernyes, that's kind of silly...08:21
joniboyou'd write the MR register, and when you write the TR register it triggers a TLB cache entry fill based on the contents of the two registers08:22
joniboif you _really_ need to control it yourself, you could have a third register with 7 bits for slot and 1/2 bits for way... but seems like overkill08:23
jonibotoday we have 2 x slots x ways TLB registers08:23
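A sketch of the single MR/TR pair idea as a software model. Everything here is hypothetical: the class and method names are invented, MR is reduced to a bare VPN and TR to a bare PPN (no flag bits), and replacement is plain LRU per set. It only illustrates the sequence "write MR, then the TR write triggers the fill".

```python
PAGE_SHIFT = 13  # 8 KB pages


class SinglePairTlb:
    def __init__(self, sets=128, ways=2):
        self.sets, self.ways = sets, ways
        # Each set is a list of (vpn, ppn), least recently used first.
        self.entries = [[] for _ in range(sets)]
        self.mr = 0

    def write_mr(self, vpn):
        self.mr = vpn  # only latched; nothing happens yet

    def write_tr(self, ppn):
        # The TR write triggers the fill: the set comes from the VPN in MR,
        # the way is whatever the LRU state says is oldest.
        s = self.entries[self.mr % self.sets]
        s[:] = [e for e in s if e[0] != self.mr]  # drop a stale mapping
        if len(s) >= self.ways:
            s.pop(0)                              # evict least recently used
        s.append((self.mr, ppn))

    def translate(self, va):
        vpn = va >> PAGE_SHIFT
        s = self.entries[vpn % self.sets]
        for i, (tag, ppn) in enumerate(s):
            if tag == vpn:
                s.append(s.pop(i))                # mark most recently used
                return (ppn << PAGE_SHIFT) | (va & ((1 << PAGE_SHIFT) - 1))
        return None


def per_way_register_count(sets, ways):
    # "2 x slots x ways": one MR and one TR per slot per way.
    return 2 * sets * ways
```

For 128 sets and 2 ways the per-way scheme exposes 512 registers, against 2 (or 3 with an explicit slot/way register) in the sketch above.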
jonibowith a 4-way TLB, we'd probably find that kernel pages wouldn't be bumped out of the TLB very often which would make it less interesting to pursue things like using a dedicated TLB entry for a huge kernel page that never gets flushed08:25
joniboso what's the way forward then...?  multiple ways partially solves the problem solved by huge pages (not completely, but it makes things better).  might be better to pursue that than PL1/ATB...?08:40
jonibothe PL1 bit is starting to seem like utter nonsense that should be removed from the spec08:41
joniboas things stand now, it could only ever be set on "set 0"08:41
stekernthinking about that a bit more, I'm not sure that's true, why can't it be set on other sets?08:48
jonibo((VIRTADDRESS&0xff000000)>>13)mod128 = 008:49
stekernbut that's for the 8kb page case, no?08:49
stekernfor pl1=1, you would use virt_addr[28:21] to index the tlbs08:50
stekern(assuming 128 sets)08:50
stekernfor pl1=0 you would use virt_addr[20:13]08:51
jonibook, sure, if you do it that way then I guess you'd get 128 huge pages too08:51
joniboif you do it this way then you don't even need to store the bottom 7 bits of the page frame because you get it from the TLB index automatically, right?08:54
jonibofor pl=1 you'd use bits virt_addr[30:24] to find the TLB set and you'd only need to match the high bit in the entry's VPN field, so the VPN field could be 1 bit there08:58
jonibofor pl=0, you'd only have to match the 19-7=12 high bits of the VPN field08:59
jonibothe VPN field only needs to be 12 bits, really08:59
joniboof course, this depends on how many sets the TLB has09:02
jonibofor 64 sets you'd need 13 bits09:02
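The bit counts above follow from tag bits = address bits - page offset bits - index bits. A small sketch, assuming 32-bit virtual addresses and a power-of-two number of sets (the function name is made up):

```python
def tag_bits(sets, page_shift, va_bits=32):
    """Bits of the VPN that must be stored and compared in each entry."""
    index_bits = sets.bit_length() - 1   # log2(sets), sets is a power of two
    return va_bits - page_shift - index_bits


print(tag_bits(128, 13))  # 8 KB pages, 128 sets
print(tag_bits(64, 13))   # 8 KB pages, 64 sets
print(tag_bits(128, 24))  # 16 MB pages (PL1), 128 sets
```

This gives 12, 13 and 1 bits, matching the numbers in the discussion.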
juliusbolofk: nice one with the!09:04
stekernyes, virt_addr[30:24], where did I get 28:21 from?09:34
stekernand yeah, you could probably save a couple of bits by reusing the ones from vpn09:44
stekern(otoh, atm mor1kx saves reserved bits to  mem too, so you'd probably want to optimize those away first ;))09:44
joniboVPN at 13 bits + PPN at 19 bits will all fit into 1 word09:53
joniboand then you could move the "flag/status/etc" bits to a separate area... does that buy you anything?09:53
joniboor if the VPN is 12 bits (which works for 128 entry TLB), then you can get the "PL1" flag in there as well09:54
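A sketch of the one-word packing for the 128-set case: 12-bit VPN tag + PL1 flag + 19-bit PPN = 32 bits. The field order (tag in [31:20], PL1 in [19], PPN in [18:0]) is an arbitrary assumption for illustration and is not the layout of the architecture's MR/TR registers.

```python
VPN_BITS = 12   # tag width for 128 sets, 8 KB pages
PPN_BITS = 19


def pack(vpn_tag, pl1, ppn):
    if not 0 <= vpn_tag < (1 << VPN_BITS):
        raise ValueError("vpn_tag does not fit in 12 bits")
    if not 0 <= ppn < (1 << PPN_BITS):
        raise ValueError("ppn does not fit in 19 bits")
    return (vpn_tag << 20) | (int(bool(pl1)) << 19) | ppn


def unpack(word):
    vpn_tag = (word >> 20) & ((1 << VPN_BITS) - 1)
    pl1 = bool((word >> 19) & 1)
    ppn = word & ((1 << PPN_BITS) - 1)
    return vpn_tag, pl1, ppn
```

The permission and status bits would then have to live somewhere else, which is the trade-off raised above.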
stekernif you're not going somewhere special with having words, block rams don't need to be built up of 32-bit words09:56
jonibook, then there may be real saving to be had09:57
joniboi'll be back in 30 minutes09:57
stekernbut having them separated into translate and match helps with SPR/READ and write, so I'm inclined to keep that10:01
stekernbrushing away unused bits is just something I haven't got around to do10:02
stekern*SPR read/write10:02
stekernwell, write anyway, read isn't much of a problem10:05
juliusbalright - it really is official - I'm going to talk at OHS2013 -
juliusb(only for 6 minutes but it still counts!)10:23
olofkjuliusb: I see that you're on the "Democratizing Knowledge" track. Considering what elitistic bastards we all are, I'm not sure they put your talk in the right place ;)14:31
juliusbolofk: haha yes, very true, I'll be sure to ask the organisers to vet the audience beforehand and eject those who are not worthy17:39
olofkjuliusb: That's the spirit :)18:21
-!- Netsplit *.net <-> *.split quits: poke53282, olofk19:20
-!- Netsplit over, joins: olofk19:23
stekernI think I've got muxing of spr rw and normal itlb accesses onto one port working now19:40
olofkIn the or1200-basic test, there is an access to memory address 0x12345678. Is that supposed to exist? Wouldn't that require an insane amount of memory to simulate?23:43
olofkOr does it assume that the memory wraps?23:46
--- Log closed Sat Aug 03 00:00:55 2013
