@stekern | hmm, otherwise it's probably not so problematic, except when an exception happens while a bubble is in the execute stage | 06:45 |
---|---|---|
@stekern | or actually, perhaps that's not a problem, the same pc should be in execute so it will return back to that insn | 07:06 |
@stekern | ah, but I need to prevent the execute pc from advancing into the ctrl/mem stage when the bubble nop moves from execute->ctrl/mem | 09:11 |
@stekern | ok, think I've got it working now, at least all tests except uart and the debug except tests work as they should | 10:15 |
@stekern | and the uart and debug except tests didn't work before either | 10:15 |
@juliusb | stekern: nice one! | 10:51 |
@juliusb | yes, the hazards are annoying to handle, but have to be done | 10:51 |
@stekern | they are not only annoying, it's hard to not create slow paths when solving them | 10:54 |
@stekern | before the nop bubble, that hazard handling limited it to ~30MHz, now it's back up to ~85MHz | 10:55 |
@stekern | still some other stuff that is making long paths, but it's getting closer | 10:56 |
@stekern | one is an odd comb-loop with the ticktimer exception | 10:56 |
@juliusb | hmmm | 11:23 |
ams | indeed. | 11:34 |
@stekern | (I might well have accidentally created that loop, so don't waste time thinking about that, not just yet at least ;)) | 12:32 |
@stekern | found out what was wrong with the uart-simple test | 13:18 |
@stekern | I had screwed up the flag tracking (the one that I so proudly said I had fixed some days ago)... | 13:18 |
@stekern | I was looking at the flag_set and flag_clear signals and comparing them to the flag bit in sr | 13:19 |
@stekern | but the sr is not written until the mem stage, so there's a 1-clock gap between flag_set and sr[FLAG] going high | 13:21 |
-!- derRicha1d is now known as derRichard | 16:49 |
@stekern | think I've killed the last slow path (with an extremely ugly hack), fmax landed on 78 MHz for cyclone IV and 82 MHz for spartan6 | 18:48 |
@stekern | it's not extremely bloated either, map report for spartan6 claims 1248 slices | 18:49 |
@stekern | the extremely ugly hack has to do with the condition where an lsu op in the mem stage is stalling a branch in the execute stage | 18:52 |
@stekern | a lot of logic assumes that a branch can't be stalled (in particular the fetcher) | 18:53 |
@stekern | so I tried holding off branch_occur with execute_valid (that's pretty ugly too), but that created a slow path on cyclone IV | 18:54 |
@stekern | so I'm using the bubble logic to squeeze in a nop between lsu ops and branches | 18:54 |
@stekern | ... as a temporary solution, I might add ;) | 18:56 |
@stekern | juliusb: this is interesting, comparing to the numbers you have in your presentation for cappuccino: 3211 (1234), it looks like it's smaller now: 2773 (1198) | 19:13 |
@stekern | what did you have in the system when you did those comparisons? | 19:13 |
@stekern | and how did you get or1200 to get an fmax of 95 MHz? | 19:14 |
@stekern | I mean, it never goes over 50 MHz in orpsoc | 19:17 |
-!- Netsplit *.net <-> *.split quits: ams, derRichard | 22:01 |
@juliusb | stekern: very nice work mate! | 23:01 |
@juliusb | it's annoying to try to get the best performance out of the fetch stage when taking into account the possibility of stalls elsewhere :-/ | 23:02 |
@juliusb | stekern: I think I did just a top-level wrapper for the mor1kx and or1200 to get those numbers | 23:02 |
@juliusb | and gave it a 100MHz clock I think | 23:02 |
@juliusb | to get those numbers | 23:03 |
-!- derRicha1d is now known as derRichard | 23:03 |
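
A minimal Python sketch of the flag-tracking problem stekern describes above: flag_set/flag_clear are raised in the execute stage, but sr[FLAG] is only written one stage later, so there is a cycle where neither reflects the new flag value. The cycle-level model, the `simulate` and `resolve` names, and the single in-flight register are assumptions made for illustration; this is not the actual mor1kx RTL.

```python
def resolve(op, fallback):
    """Apply a pending 'set'/'clear' op on top of a fallback flag value."""
    if op == "set":
        return True
    if op == "clear":
        return False
    return fallback


def simulate(events, cycles):
    """Hypothetical model: events maps cycle -> 'set' or 'clear', meaning the
    execute stage asserts flag_set or flag_clear in that cycle.
    Returns one (naive, tracked, sr_flag) tuple per cycle."""
    sr_flag = False   # flag bit in SR, written one stage after execute
    inflight = None   # op that left execute last cycle, not yet in SR
    trace = []
    for cycle in range(cycles):
        op = events.get(cycle)
        # Naive view: only compare execute-stage signals against SR.
        naive = resolve(op, sr_flag)
        # Tracked view: also account for the op still on its way to SR.
        tracked = resolve(op, resolve(inflight, sr_flag))
        trace.append((naive, tracked, sr_flag))
        # Clock edge: the in-flight op lands in SR, the execute op moves on.
        sr_flag = resolve(inflight, sr_flag)
        inflight = op
    return trace


if __name__ == "__main__":
    for cycle, row in enumerate(simulate({1: "set"}, 5)):
        print(cycle, row)
```

In this model the naive view falls back to the stale SR value for one cycle after the set, while the tracked view stays high until SR catches up.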