--- Log opened Sun Feb 16 00:00:41 2014 | ||
stekern | pgavin: interesting thought, I never considered an option of invalidating the branch prediction tables | 01:27 |
---|---|---|
stekern | mor1kx actually has a branch predictor too, but it's just a simple static one right now, I've been planning to add a dynamic one to it though | 01:28 |
stekern | ysionneau: to be honest, the hw page tree walker didn't improve performance by huge amounts | 01:46 |
stekern | one reason is that the tlb handler and the pagetables will likely be in cache and currently the hw pagewalker doesn't go through the cache to walk the tables | 01:47 |
stekern | speaking of which, I took a look at the WIP lm32 netbsd port, I didn't quite get the tlb miss handler implementation | 01:51 |
pgavin | stekern: right now my BTB has logic to iterate through each entry and invalidate it at reset time | 05:19 |
pgavin | but that's not quite consistent with other aspects of the ISA | 05:19 |
stekern | is it common for architectures to expose the branch predictor like that? | 05:26 |
stekern | one disadvantage with it is that you somewhat define how the implementation should look | 05:27 |
stekern | exposing my ignorance here - why is it better to explicitly invalidate them than to have "random" predictions? | 05:30 |
stekern | if it's just a matter of avoiding 'x'es in simulations, then I'd brush it off as a simulation issue and I think it should be handled by simulation specific logic | 05:32 |
pgavin | stekern: it's not really any better | 05:50 |
pgavin | but my pipeline depends on the BTB never returning an incorrect target | 05:50 |
pgavin | so the valid bits in the btb need to be flushed | 05:51 |
pgavin | but if the BPB initially has random values it's not really a big deal | 05:51 |
pgavin | also, on a context switch the BTB will need to be flushed | 05:53 |
pgavin | because it uses virtual addresses | 05:53 |
pgavin | I suppose the btb flush could be tied to one of the l.*sync instructions somehow | 05:54 |
stekern | pgavin: hmm, ok, I see | 09:33 |
stekern | why would virtual addresses matter though? it's just pc relative addresses | 09:36 |
stekern | ah, or you mean the branch source | 09:36 |
ysionneau | 02:51 < stekern> speaking of which, I took a look at the WIP lm32 netbsd port, I didn't quite get the tlb miss handler implementation < it was very wrong until yesterday or 2 days ago | 09:37 |
ysionneau | I committed quite a few changes these last few days | 09:37 |
stekern | ysionneau: I looked at it after your commits yesterday | 09:37 |
ysionneau | I had a hard time getting it "working" | 09:37 |
stekern | why do you need to save that much context though? | 09:37 |
ysionneau | ok, I agree it might not be obvious and might not be coded the best way | 09:37 |
stekern | I haven't looked too deeply I admit ;) | 09:38 |
ysionneau | stekern: I could save less, but since I am calling _do_real_tlb_miss_handling which is in C | 09:38 |
ysionneau | then I preferred saving everything, to avoid any future issue | 09:38 |
stekern | yeah, that was part of my question, what's the difference between fake and real? | 09:38 |
ysionneau | but indeed _do_real_tlb_miss_handling is just using sp r1 r2 r3 and not much more | 09:38 |
ysionneau | fake is used during very early boot | 09:39 |
stekern | ok, so that's boot tlb miss handlers | 09:39 |
ysionneau | fake is replaced by real at the end of pmap_bootstrap() (in lm32/lm32/lm32_pmap.c) | 09:39 |
ysionneau | fake only does PA = va - base_virt + base_phys | 09:39 |
stekern | but what is the C function doing? | 09:39 |
ysionneau | C is doing the page table lookup | 09:39 |
ysionneau | I could do it in assembly | 09:40 |
ysionneau | but then I'm pretty sure anyway that someday I will need to call do_fault() or some internal netbsd mechanism | 09:40 |
ysionneau | for instance when a user space process wants to execute a non-executable page | 09:40 |
stekern | yeah, that was my third question, at what point is the pagefault handler called | 09:41 |
ysionneau | I think this is handled in machine independent C code | 09:41 |
ysionneau | for now I don't call it, which is "bad" | 09:41 |
ysionneau | but as far as the boot is going right now I don't need it | 09:41 |
stekern | could be, in Linux the pagefault handler is a bit "boiler platey" | 09:41 |
stekern | ok, fair enough, that straightens out my question marks - you have a C function as the tlb miss handler for easier debugging during development | 09:42 |
pgavin | stekern: the BTB is used to avoid adding the branch PC to the immediate | 09:48 |
pgavin | it's written to after the target is calculated | 09:48 |
pgavin | it made the most sense to use virtual addresses for both | 09:48 |
pgavin | writing the physical address of the target to the BTB would require translating the target address before the write | 09:49 |
pgavin | plus there would be other difficulties | 09:49 |
stekern | ok, I understand. And I agree, if you're going to save something, it should be virtual addresses | 09:51 |
ysionneau | stekern: I could certainly improve performance by only saving 2 or 3 registers and then check if I need to call a C function, then save the remaining regs only if I need to | 09:55 |
ysionneau | but for now: I need to get the kernel running more than getting it to run fast :) | 09:55 |
stekern | ysionneau: sure, I'm not condemning that, I was just confused about the heavy-weightness of it, and thought the C-function did something more involved than just walking the pagetables | 13:22 |
stekern | pgavin: so if I understand your implementation right - you need to know if the value in the BTB is valid early, and when you have calculated the target, you can't start backing out of a wrong value in it, right? | 13:24 |
stekern | are you only using the BTB for conditional jumps, or are you buffering other branches as well? | 13:26 |
stekern | I guess you could snoop the immucr for writes to invalidate the BTB, to get back to your original problem | 13:28 |
ysionneau | ok I understand :) | 13:29 |
-!- Netsplit *.net <-> *.split quits: jonmasters | 17:08 | |
pgavin | stekern: the only reason the pipeline requires the BTB always produce the correct target is because I don't want to compare the BTB target with calculated target | 19:08 |
pgavin | it's really just to avoid extra logic | 19:08 |
pgavin | the BTB buffers conditional and unconditional direct jumps/branches | 19:09 |
pgavin | I only write a target to the BTB if the branch was taken but predicted not taken | 19:10 |
pgavin | (I think) | 19:10 |
pgavin | so another thing is that I want to support L1 caches with associativity > 2 | 19:17 |
pgavin | and the LRU state table needs to be flushed | 19:17 |
pgavin | with associativity == 2 the LRU state is only 1 bit, but higher associativities have a special encoding that needs to be initialized at reset | 19:18 |
pgavin | I suppose the LRU table entries could be flushed when the CBIRs are written | 19:23 |
pgavin | but I think there would be more serious problems if those entries aren't flushed and an entry is accessed than there would be for e.g. the mor1kx | 19:24 |
pgavin | actually now that I've thought about it a bit I think the LRU logic would work correctly even if all bits were initially random | 20:35 |
--- Log closed Mon Feb 17 00:00:43 2014 |
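
Editor's note: the BTB scheme pgavin describes (entries keyed by the branch's virtual PC, valid bits flushed at reset and on a context switch, a target written only when a branch was taken but predicted not taken) can be modelled in a few lines of C. This is only a rough software sketch, not pgavin's hardware: the table size, the direct-mapped indexing, the full-PC tag, and every name (`btb`, `btb_flush`, `btb_lookup`, `btb_update`) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 64u  /* hypothetical size; must be a power of two */

typedef struct {
    bool     valid;   /* cleared by a flush; a hit needs valid && tag match */
    uint32_t tag;     /* full virtual PC of the branch (assumed tag scheme) */
    uint32_t target;  /* virtual address of the branch target */
} btb_entry;

typedef struct {
    btb_entry e[BTB_ENTRIES];
} btb;

/* Direct-mapped index from the word-aligned virtual PC (assumption). */
static unsigned btb_index(uint32_t pc) {
    return (pc >> 2) & (BTB_ENTRIES - 1u);
}

/* Invalidate every entry. In the discussion this happens at reset, and
 * would also be needed on a context switch because the tags are virtual. */
void btb_flush(btb *b) {
    for (unsigned i = 0; i < BTB_ENTRIES; i++)
        b->e[i].valid = false;
}

/* Returns true and writes *target only for a valid entry with a matching
 * tag, so a flushed or never-written entry cannot supply a target. */
bool btb_lookup(const btb *b, uint32_t pc, uint32_t *target) {
    const btb_entry *e = &b->e[btb_index(pc)];
    if (!e->valid || e->tag != pc)
        return false;
    *target = e->target;
    return true;
}

/* Record a target only when the branch was taken but predicted not taken. */
void btb_update(btb *b, uint32_t pc, uint32_t target,
                bool taken, bool predicted_taken) {
    if (!taken || predicted_taken)
        return;
    btb_entry *e = &b->e[btb_index(pc)];
    e->valid  = true;
    e->tag    = pc;
    e->target = target;
}
```

In this model the valid bit is what lets the pipeline trust a hit without comparing it against the calculated target: after `btb_flush`, every lookup misses until `btb_update` has stored a freshly calculated target. Two address spaces that reuse the same virtual PC would still alias, which is why the flush is also wanted on a context switch.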