stekernjuliusb: to get zero branch overhead without delay slot, wouldn't you need to do the branch address calculation in fetch stage?03:16
stekernwithout branch prediction03:18
stekernyou would even need branch target prediction to avoid doing the branch address calculation there03:19
stekernI'm still planning on adding simple branch prediction to cappuccino; I don't think there's another way to get any decent fmax out of it without crippling all l.b(n)f branches to be 2-cycle03:21
stekernor more correctly, crippling all l.sfxx; l.b(n)f to be 3-cycle03:22
stekernI'm not sure what scheme would be best though, initially I thought about a "traditional" backward-branches-taken, forward-branches-not-taken scheme03:24
stekernbut I'm playing with the idea of just using the old flag value and rechecking in the execute stage whether it was a misprediction03:25
stekernbecause for loops I think that would work pretty well: basically the first jump is potentially a misprediction and the last is a misprediction03:27
stekernfor conditional statements it's a bit of coin tossing, but so is the backwards-taken-forward-not03:28
stekernanother benefit of using the old value is that the compiler can be taught to avoid branch mispredictions (by inserting instructions between the l.sfxx and the l.b(n)f)03:30
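
The two schemes discussed above can be compared with a small behavioural model. This is only a sketch under stated assumptions, not cappuccino's actual pipeline: it assumes an l.bf-style branch (taken when the flag is set), a loop whose closing branch is taken on every iteration except the last, and a hypothetical `flag_resolved` switch standing in for "the compiler put enough instructions between the l.sfxx and the l.bf that the new flag is already visible at fetch". All function names are made up for illustration.

```python
def loop_outcomes(iterations):
    """Outcomes of a loop-closing branch: taken every time except the last."""
    return [True] * (iterations - 1) + [False]


def btfn_mispredictions(outcomes, backward=True):
    """Static scheme: predict taken for backward branches, not taken for forward."""
    prediction = backward
    return sum(1 for taken in outcomes if taken != prediction)


def old_flag_mispredictions(outcomes, initial_flag, flag_resolved=False):
    """Predict using whatever flag value is visible when the branch is fetched.

    Assumption: the branch is l.bf-like, so "taken" equals the flag value
    written by the preceding l.sfxx.
    flag_resolved=False models l.sfxx immediately followed by the branch,
    so only the previous (old) flag value is visible at fetch.
    flag_resolved=True models enough instructions scheduled in between
    that the new flag is already visible.
    """
    visible_flag = initial_flag
    misses = 0
    for taken in outcomes:
        if flag_resolved:
            visible_flag = taken   # new flag already written before fetch
        if visible_flag != taken:
            misses += 1            # detected later, in the execute stage
        visible_flag = taken       # flag is up to date after the branch
    return misses


outcomes = loop_outcomes(10)
print("BTFN:", btfn_mispredictions(outcomes))
print("old flag, initially clear:", old_flag_mispredictions(outcomes, False))
print("old flag, initially set:", old_flag_mispredictions(outcomes, True))
print("old flag, scheduled:", old_flag_mispredictions(outcomes, True, flag_resolved=True))
```

In this model the old-flag scheme always misses the loop exit and may miss the first iteration depending on the flag left over from earlier code, while the scheduled case never mispredicts because the prediction is no longer a guess.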
