stekern | hmm, seems like inetd takes ~100% cpu when or1k linux is running under qemu | 07:27 |
---|---|---|
stekern | could it be the ethmac in qemu that is acting up? | 08:17 |
juliusb | what happens if you disable that...? | 12:23 |
stekern | dunno | 12:28 |
stekern | killhupping inetd seems to "solve" it | 12:28 |
juliusb | and was that responsible for slow execution of qemu? | 12:29 |
stekern | functionally it looks pretty good at least, went through the gcc regression | 12:29 |
juliusb | cool | 12:29 |
stekern | yes, but I haven't really tested if it's much faster after the killhup | 12:30 |
juliusb | cool | 12:30 |
stekern | juliusb: what was the critical path you saw on cappuccino? | 12:59 |
stekern | I'm seeing it in the pipeline forward logic | 13:02 |
juliusb | http://pastie.org/5088170 | 13:03 |
stekern | and iirc, that's where I saw it on milkymist too | 13:03 |
stekern | or, I think it's the pipeline forward logic | 13:03 |
stekern | it's from rf into rf | 13:03 |
juliusb | http://pastie.org/5088172 | 13:03 |
juliusb | they're the 2 runs on virtex 5 I did, in the 2 configs | 13:04 |
juliusb | one with multiply and one without | 13:04 |
juliusb | one is from the RF, through the ALU, calculating an address for the LSU which then goes into the data cache | 13:04 |
juliusb | fair enough | 13:04 |
juliusb | that'd be easy enough to register, imposing a cycle latency on the dcache lookups | 13:05 |
juliusb | mmm, maybe I had caches turned off in the "smaller" implementation | 13:06 |
juliusb | let me check | 13:06 |
juliusb | yeah, cache disabled in the first pastie | 13:06 |
juliusb | hence path is different | 13:06 |
juliusb | http://pastie.org/5088177 | 13:07 |
juliusb | that's spartan6, with cache and multipliers | 13:07 |
juliusb | different path again | 13:07 |
stekern | heh | 13:09 |
stekern | http://pastie.org/5088177 <- I don't quite get that | 14:26 |
stekern | why is it going through lsu into the icache? | 14:26 |
juliusb | umm lsu valid signal (which can stall the execute stage) going through the control unit and controlling whether the fetch stage should advance | 14:51 |
stekern | ah, right | 15:01 |
stekern | juliusb: I just boldly deleted 32 lines of code, hope you will not miss'em too much ;) | 16:21 |
stekern | https://github.com/openrisc/mor1kx/commit/b3fdc8374ddc6bfca30c159810eb3aa7f4cdf7b7 | 16:21 |
juliusb | nps :) all cappuccino-specific stuff, so I guess if that still passes regression then all good | 16:35 |
stekern | yeah, you could probably rip that out in espresso too, the signals are just dangling | 16:37 |
juliusb | there's a bit of that around the place | 17:10 |
juliusb | we also should factor out the tick timer and pic logic from the ctrl stages to neaten things up | 17:10 |
stekern | mmm, and look into what more could be made generic | 17:38 |
stekern | (I don't think there was a difference between the espresso rf and cappuccino rf before I just ripped out those 32 lines) | 17:38 |
stekern | it has to be done with consideration though, if you make too much of it generic, changes to one pipeline might break the others | 17:39 |
stekern | I take back what I said about rf, there are some other differences | 17:49 |
juliusb | yeah I'm pretty sure there are differences, otherwise I wouldn't have kept them separate, in keeping with the mission statement of module reuse within the processor | 18:27 |
stekern | :) | 19:02 |
juliusb | ok, now i have a hack to mor1kx-dev-env to allow you to point a variable to the GCC source directory and it will build and run all of the c torture tests | 19:03 |
juliusb | will rely on the newlib startup code etc | 19:03 |
juliusb | obviously it's not doing all of the testing of the gcc testsuite because there's a lot of stuff per test which will test different compiler optimisations | 19:04 |
juliusb | but i'm just interested in getting a bucketload of C code to run on the processor | 19:04 |
juliusb | so something like: make vlt-tests GCC_TESTS=1 GCC_SRC=/localhome/jules/git/openrisc/or1k-gcc | 19:06 |
juliusb | will run the verilator model against each file | 19:06 |
juliusb | 1200-odd tests | 19:06 |
juliusb | better than nothing :) | 19:06 |
stekern | cool | 19:09 |
stekern | juliusb: http://pastebin.com/t3XE1HHu | 19:09 |
stekern | how does that look? | 19:09 |
stekern | I'm mostly wondering about the generate logic, functionality I have already tested | 19:10 |
juliusb | yeah cool, adding a ffl1 registering stage, and thus valid signal? np | 19:10 |
juliusb | can you name the ffl1 wire ffl1_result? | 19:11 |
juliusb | hmm or like | 19:11 |
juliusb | ffl1_result_wire | 19:11 |
stekern | yeah, sure | 19:12 |
juliusb | was that a long path you were seeing? | 19:12 |
stekern | on cyclone iv it's the most critical path | 19:13 |
juliusb | wow | 19:13 |
juliusb | fair enough | 19:13 |
juliusb | with cache enabled? | 19:13 |
juliusb | i love looking at or1knd disassembly :) | 19:13 |
juliusb | so much nicer than having all those delay slots | 19:13 |
stekern | both with and without | 19:14 |
juliusb | ah interesting | 19:18 |
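
The ffl1 discussion above (19:09 to 19:13) concerns putting a register after the find-first/find-last-one logic and adding a valid signal. The pastebin patch itself is not reproduced in this log, so the following is only a rough behavioural sketch in Python, not the mor1kx implementation. The `ff1`/`fl1` results follow the OpenRISC `l.ff1`/`l.fl1` convention (1-based bit position, 0 when no bit is set); the class `RegisteredFfl1` and its `clock` method are hypothetical names introduced purely for illustration.

```python
def ff1(x: int, width: int = 32) -> int:
    """1-based position of the lowest set bit, or 0 if no bit is set."""
    x &= (1 << width) - 1
    if x == 0:
        return 0
    # x & -x isolates the lowest set bit; bit_length gives its 1-based position
    return (x & -x).bit_length()


def fl1(x: int, width: int = 32) -> int:
    """1-based position of the highest set bit, or 0 if no bit is set."""
    x &= (1 << width) - 1
    return x.bit_length()


class RegisteredFfl1:
    """Hypothetical model of a one-cycle registering stage (an assumption,
    not taken from the patch): the combinational result is captured on a
    clock edge and only then flagged as valid."""

    def __init__(self) -> None:
        self.result = 0
        self.valid = False

    def clock(self, op_valid: bool, operand: int, find_last: bool = False) -> None:
        # Combinational value (the "ffl1_result_wire" role in the chat)
        wire = fl1(operand) if find_last else ff1(operand)
        # Register it; valid simply follows the incoming op_valid by one edge
        self.result = wire
        self.valid = op_valid


if __name__ == "__main__":
    stage = RegisteredFfl1()
    print(stage.valid)            # nothing registered yet
    stage.clock(True, 0b1010000)  # issue an ff1 operation
    print(stage.valid, stage.result)
```

Registering the result this way trades one cycle of latency for a shorter combinational path, which is the trade-off being weighed in the conversation.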