@stekern | wtf, I remove the slowest/most critical path that ISE found, then it comes up with an _even_ slower path... | 06:02 |
---|---|---|
@stekern | juliusb: I'm starting to believe that your "early branch detection" actually could increase fmax on cappuccino | 09:12 |
@stekern | since it seems that most of our timing problems are related to detecting branches and calculating the branch target | 09:20 |
@stekern | splitting them up might make that net more lightweight | 09:20 |
@stekern | and atm, if I have got this correctly, all branches are actually detected in execute stage | 09:21 |
@stekern | for cappuccino | 09:21 |
@stekern | so, what branches was it that you "detect early" in prontoespresso? | 09:26 |
@stekern | I guess I can't avoid waiting for execute stage to do register jumps | 09:27 |
@juliusb | stekern: check out my mor1kx repo, the fetch_prontoespresso.v in there | 12:24 |
@juliusb | I detect all of them, so I can halt the bus burst access; and for all but the jumps that use register addresses, I calculate where we're jumping to and immediately put it out on the bus | 12:24 |
@juliusb | there's some optimisation to do in terms of removing the logic in the ctrl stage which looks at the jump target from the execute stage (so that stuff gets optimised away and we only have 1 set of decode logic, hopefully, for the jump/branch instructions, but that's for later) | 12:25 |
@stekern | ok, sounds good | 12:26 |
@juliusb | I think also maybe syscalls, traps | 12:26 |
@juliusb | i'm not sure how big the decode logic is tbh | 12:26 |
@stekern | what I have in mind for cappuccino would be to move the branch calculations that can be done early to the decode stage, and register them so they're available at the same time as they are now. And register the rest and take the 1 cycle latency for that | 12:28 |
@stekern | (technically, the ones that get moved get 1 cycle latency too, but since we move them to an earlier stage, we should not see a performance regression with that) | 12:29 |
@stekern | of course, if it's no trouble to feed the output from the decode stage directly, we'll do that | 12:30 |
@stekern | I think it's exception and addresses from registers that are causing most troubles | 12:31 |
@stekern | with some "proof-of-concept" hacks, I get the most critical path going from rf through dcache back to rf, which should be possible to handle with split up mem/wb stages | 12:32 |
@stekern | and with some even more "proof-of-concept" hacks on that I get fmax to be 85 MHz for de0-nano and orpsoc | 12:33 |
@juliusb | what do you mean by "register the rest"? do you mean register the l.j[al]r instructions in execute stage? | 12:36 |
@stekern | no, just the branch_occur and branch_target | 12:37 |
@juliusb | the reason I was calculating the branch address in the execute stage was mainly to save logic by using the ALU's adder | 12:37 |
@juliusb | ahh right OK i'm with you | 12:37 |
@juliusb | but it's hard not to have a dedicated adder for branch targets | 12:38 |
@stekern | (alu reuse) yeah, I know | 12:38 |
@juliusb | as we've found out ;) plus you always take a latency hit on your jumps/branches so it's worth throwing some logic at making those operations as fast as possible | 12:38 |
@stekern | I agree | 12:39 |
_franck_ | @VPI experts: http://pastie.org/5345888 | 16:14 |
_franck_ | how do I debug this? | 16:14 |
_franck_ | I tried to compile jtag_vpi.c with the -g flag, then I got: | 16:15 |
_franck_ | http://pastie.org/5345902 | 16:15 |
_franck_ | it doesn't help that much... | 16:15 |
_franck_ | any idea ? | 16:16 |
@juliusb | _franck_: wow, I'm not really sure to be honest | 17:03 |
@juliusb | stack smashing haha | 17:03 |
_franck_ | it just crashes at some point... :) | 17:03 |
@juliusb | I'm not too sure of how different things are with -g and what might be doing some bad things to the stack | 17:06 |
@juliusb | did you do a complete clean and rebuild of everything? | 17:06 |
@juliusb | perhaps changing one bit and not recompiling the whole thing is bad | 17:06 |
@juliusb | I'd delete everything and rebuild | 17:06 |
@juliusb | all VPI-related object files, all Icarus-generated things | 17:06 |
_franck_ | I'll try cleaning everything.... | 17:07 |
_franck_ | so there is no way you know of that I could gdb jtag_vpi.o? | 17:07 |
_franck_ | the C part of the VPI object | 17:08 |
_franck_ | I'll debug with prints | 17:08 |
@juliusb | Oh I'm sure you could GDB to it somehow | 17:08 |
@juliusb | just run the vvp under gdb and then run it like you normally would and just set a breakpoint where you want it | 17:08 |
@juliusb | I imagine that'd work OK? | 17:08 |
_franck_ | yeah I'll try | 17:09 |
@juliusb | but i'd clean everything | 17:09 |
@juliusb | as Icarus is nice and open source you could even recompile that with all the debugging on :) | 17:10 |
_franck_ | that will be the next step if I can't find anything | 17:11 |
@juliusb | but debugging with printf() might be the quickest and easiest | 17:12 |
_franck_ | I can connect to the sim with OpenOCD, read the JTAG ID, dump registers | 17:14 |
@juliusb | awesome | 17:14 |
_franck_ | now when I load I get this bug | 17:14 |
_franck_ | http://pastie.org/5346157 | 17:15 |
_franck_ | gdb says a bit more | 17:15 |
_franck_ | I'll see what I can do | 17:15 |
@juliusb | oh, looks like a genuine misuse of something in the VPI code | 17:17 |
@juliusb | you triggered an assertion in Icarus | 17:17 |
@juliusb | send_result_to_server (userdata=0x0) at jtag_vpi.c:243 | 17:19 |
@juliusb | whatever you're passing there it doesn't like | 17:19 |
_franck_ | what is this pointer anyway ? | 17:20 |
_franck_ | userdata ? | 17:20 |
@juliusb | something which is passed to that function of yours, send_result_to_server() isn't it? | 17:20 |
@juliusb | I would have to see the code of that function | 17:20 |
_franck_ | is implicitly (? do we say that) used by vpi_scan or other vpi functions ? | 17:21 |
@juliusb | huh? I don't get you | 17:24 |
@juliusb | you're asking if this function is implicitly used? | 17:24 |
_franck_ | the userdata pointer | 17:24 |
@juliusb | I have 0 idea, is this code you wrote or stuff already in the VPI libs? | 17:24 |
_franck_ | http://pastie.org/5346206 | 17:24 |
@juliusb | oh yeah that looks like code I wrote | 17:25 |
_franck_ | I took a lot from jp_vpi.c | 17:25 |
_franck_ | already in orpsoc | 17:25 |
@juliusb | yep, umm, it's called in a system call I think | 17:25 |
@juliusb | $send_result_to_server() | 17:25 |
@juliusb | in the verilog, I think | 17:25 |
_franck_ | ok, anyway, that's not the point... | 17:26 |
@juliusb | well that's where your crash is | 17:26 |
_franck_ | yes, I'll printf around to see what is going on | 17:27 |
@juliusb | you should check if argh is null or something | 17:27 |
@juliusb | and argval | 17:27 |
@juliusb | i would double check if they're null | 17:27 |
_franck_ | yeah, I'll check everything | 17:28 |
@juliusb | it looks like it expects a length integer and an array of data; if your verilog isn't using the function like that then it'll probably crash | 17:29 |
_franck_ | this function works at some point before the crash | 17:30 |
@juliusb | how and where are you using it? | 17:31 |
@juliusb | the $send_result_to_server() or whatever | 17:31 |
_franck_ | flip_tms = 0; | 17:32 |
_franck_ | do_scan_chain; | 17:32 |
_franck_ | $send_result_to_server(length, buffer_in); | 17:32 |
_franck_ | integer length; | 17:32 |
_franck_ | reg [31:0] buffer_in [0:4095]; | 17:32 |
@juliusb | you sure that's the only place? | 17:32 |
_franck_ | no, in two different case(cmd) branches | 17:33 |
_franck_ | http://pastie.org/5346260 | 17:34 |
_franck_ | see | 17:34 |
_franck_ | `CMD_SCAN_CHAIN and `CMD_SCAN_CHAIN_FLIP_TMS | 17:34 |
@juliusb | it looks like you've removed some checking I did, I had this same code in a couple of functions which double checked the type it got as vpiMemory (or, vpiRegArray as Modelsim liked to have it called) | 17:34 |
_franck_ | yes, it was an unneeded check in my case and I wanted fewer lines of code ;) | 17:35 |
@juliusb | haha yep, but it may not catch the bug anyway, perhaps it's the first get-value thing | 17:36 |
@juliusb | so, I don't know man, are you sure there's no other variables named integer? | 17:36 |
@juliusb | I would gdb through it then and double check what gets passed in there | 17:36 |
@juliusb | and I would delete and rebuild everything | 17:36 |
_franck_ | yeah don't worry, I'll find it, that's the game we play :) | 17:36 |
@juliusb | indeed | 17:37 |
@juliusb | :) | 17:37 |
@juliusb | removing checking stuff from code like that is usually not a good idea, I probably did it for a reason ;) | 17:38 |
@juliusb | i know it saves space, but it may not save you time in the end | 17:38 |
_franck_ | probably, I'll put it back, it was planned anyway for the release version | 17:39 |
@olofk | the userdata pointer is generally not needed | 17:43 |
@olofk | Many of the examples include it, but you can remove it safely. | 17:43 |
@olofk | IIRC, you can only assign that pointer when you register the systf. Might be wrong on that one, though | 17:45 |
@olofk | One fun part of VPI is that you can register syscalls during a simulation. I have absolutely no idea when that could be useful | 17:51 |
@juliusb | haha | 17:53 |
@juliusb | omg | 17:53 |
@juliusb | have code on the simulated CPU write its own syscall | 17:53 |
@olofk | :) | 17:53 |
@juliusb | that's got to be a feature of skynet | 17:54 |
@juliusb | haha that would be awesome, seriously | 17:54 |
@olofk | Yeah, I was actually thinking of creating the world's first verilog virus | 17:54 |
@juliusb | It's C though, you'd kind of need to write a lot of stuff in the VPI world to do it | 17:54 |
@olofk | To be more precise, it's shared objects, so I guess you could use whatever language you want as long as you provide the right hooks | 17:55 |
@juliusb | hmmm | 17:55 |
@juliusb | so some sort of C-Python thing | 17:55 |
@olofk | I know that there is some python-based test environment that uses VPI | 17:56 |
@juliusb | and have the software on the processor emit Python strings which could then be evaluated in the Python via VPI | 17:56 |
@juliusb | omg pretty sweet | 17:56 |
@olofk | python's exec function could work there | 17:56 |
@juliusb | yep | 17:56 |
@juliusb | but then... the systemcall in the verilog would need to be precompiled? | 17:57 |
@olofk | I'm going to implement callbacks in the orpsocv3 .core files using exec. Just hope that no one writes evil core descriptions :) | 17:57 |
@juliusb | .core files? | 17:57 |
@juliusb | a thing for ORPSoCv3? | 17:57 |
@olofk | yes | 17:57 |
@olofk | Well, a bit off-topic, but I just discovered python's exec. Still happy about that :) | 17:58 |
@juliusb | Oh | 17:58 |
@juliusb | no way? | 17:58 |
@olofk | yes way | 17:59 |
@juliusb | I love it, we embed all sorts of Python in our scripts around here and exec() away on the fly | 17:59 |
@olofk | c00l. I would love to replace large parts of our scripting environment with something better (python-based) | 18:00 |
@olofk | Never been a big tcl fan | 18:01 |
@olofk | juliusb: How do you run test cases for mor1kx? Is it the same elf to bin to vmem mechanism as in orpsocv2? | 18:29 |
@juliusb | umm I think so | 18:53 |
@juliusb | TCL is the worst thing ever | 18:53 |
@juliusb | vs Python it's like chalk and cheese | 18:53 |
@olofk | jeremybennett: I'm trying to break out cpu/parse.c from or1ksim, but it looks like a bit more work than I anticipated. Where is oraddr_t declared, for example? | 19:50 |
@olofk | aha! in arch.h | 19:50 |
@olofk | Are we using coff files anymore? | 20:05 |
_franck_ | juliusb: it was a stack smashing... | 20:32 |
_franck_ | an out of bound index in an array | 20:33 |
_franck_ | nothing to do with vpi_get_value(array_word, &argval) :) | 20:35 |
@olofk | Hey, can someone please ack my declare pcreg_boot before use patch? It's a bit problematic that or1200 is unusable at the moment | 20:36 |
@olofk | in modelsim at least | 20:36 |
@juliusb | olofk: silence = ack | 20:59 |
@juliusb | but, I'll pass an eye over it | 20:59 |
@juliusb | you're doing an assign though, I guess this is some v2k thing? | 21:00 |
@olofk | yes. I think we are ready to use a feature that has been around for 11 years now ;) | 21:23 |
@stekern | olofk: I reckon we should just post, commit, and post a note that it's committed for straightforward stuff like that | 21:38 |
@olofk | Yeah, I was on the verge of doing that. It was a trivial fix | 21:51 |
_franck_ | as I'm new to verilog and icarus, I wonder if someone can explain why I see this?! | 23:54 |
_franck_ | http://picpaste.com/pics/franck-9jkS60mr.1352411631.png | 23:54 |
_franck_ | it is a basic for loop | 23:55 |
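
The crash discussed above turned out to be an out-of-bounds array index that smashed the stack. The log does not show the faulty code, so the following is only a minimal sketch of the kind of guard that catches this class of bug early. The helper names `store_word` and `copy_words` are hypothetical and do not come from jtag_vpi.c; only the 4096-word size mirrors `reg [31:0] buffer_in [0:4095]` from the log.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 4096-entry Verilog memory quoted in the log. */
#define BUF_WORDS 4096

/* Hypothetical helper: store one word, refusing out-of-range indices.
 * Returns 0 on success, -1 if buf is NULL or idx is outside the buffer. */
static int store_word(uint32_t *buf, size_t buf_words, size_t idx, uint32_t value)
{
    if (buf == NULL || idx >= buf_words)
        return -1;
    buf[idx] = value;
    return 0;
}

/* Hypothetical helper: copy at most dst_words words from src to dst.
 * Returns the number of words actually copied, so the caller can
 * detect truncation instead of overrunning the destination. */
static size_t copy_words(uint32_t *dst, size_t dst_words,
                         const uint32_t *src, size_t nwords)
{
    size_t n = nwords < dst_words ? nwords : dst_words;
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return n;
}
```

A caller would pass `BUF_WORDS` as the capacity and treat a `-1` return, or a copy count smaller than the requested length, as a bug to report rather than something to ignore.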
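
The early part of the log distinguishes branches whose target can be computed from the instruction word alone from register jumps (`l.j[al]r`), which have to wait for a register read. As a rough software model of that split, and not a description of how mor1kx does it in hardware, the sketch below separates the two classes and computes the PC-relative target with a dedicated add. The function names are hypothetical, and the opcode values and 26-bit immediate layout are written from memory of the OpenRISC 1000 architecture manual, so they should be verified there before being relied on.

```c
#include <stdint.h>

/* Assumed OR1K major opcodes (top 6 bits of the instruction word);
 * check these against the architecture manual. */
#define OP_L_J    0x00u
#define OP_L_JAL  0x01u
#define OP_L_BNF  0x03u
#define OP_L_BF   0x04u
#define OP_L_JR   0x11u
#define OP_L_JALR 0x12u

/* Nonzero if the target depends only on the PC and the immediate,
 * so it could be computed before the execute stage. */
static int is_imm_branch(uint32_t insn)
{
    uint32_t op = insn >> 26;
    return op == OP_L_J || op == OP_L_JAL || op == OP_L_BNF || op == OP_L_BF;
}

/* Nonzero if the target comes from a register, so it cannot be known
 * until the register file has been read. */
static int is_reg_jump(uint32_t insn)
{
    uint32_t op = insn >> 26;
    return op == OP_L_JR || op == OP_L_JALR;
}

/* Target = PC + sign_extend(imm26 << 2). This is the dedicated adder
 * mentioned in the discussion, as opposed to reusing the ALU. */
static uint32_t imm_branch_target(uint32_t pc, uint32_t insn)
{
    uint32_t imm26 = insn & 0x03ffffffu;
    uint32_t offset = imm26 << 2;      /* word offset to byte offset */
    if (imm26 & 0x02000000u)           /* bit 25 is the sign bit */
        offset |= 0xf0000000u;         /* sign-extend to 32 bits */
    return pc + offset;                /* unsigned add wraps modulo 2^32 */
}
```

In this model only `is_reg_jump()` instructions need the extra wait; everything else yields a target as soon as the instruction word is available, which is the property the early-detection discussion is after.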