--- Log opened Sat Apr 06 00:00:03 2013 |
-!- Netsplit *.net <-> *.split quits: jonibo, simoncook, LoneTech, zhai365, rolloTomasi, orsoc1_ | 00:31 | |
stekern | I've got branch resolving in decode stage pretty stable now, all tests pass and it boots linux in the cycle accurate model and on de0-nano | 03:49 |
---|---|---|
stekern | and it seems the effort was not altogether wasted: http://pastie.org/7336326 | 03:49 |
stekern | so about a 9% increase in coremark | 03:54 |
stekern | from 84.7 to 92.3 | 03:54 |
stekern | let's see about dhrystone, 1.38 was the result before moving the branches | 03:56 |
stekern | http://pastie.org/7336391 | 03:57 |
stekern | so, about 6% increase there (1.38->1.46) | 03:58 |
stekern | a bit of regression in the fmax department though, only ~70 MHz now | 05:25 |
stekern | it's actually the same path that olofk saw on virtex6, l.sfxxx -> l.bf | 05:26 |
stekern | but now it's worse, since the branch signal isn't registered into execute stage | 05:27 |
stekern | it looks like the 'a < b' operation is the one that is causing most of the trouble, wonder if there is a simpler way to calculate that condition | 11:32 |
stekern | simpler in terms of logic elements that is.. | 11:32 |
stekern | I guess the adder result and the carry could be used for that, it has a fair amount of logic attached before it though, in the b_mux. some of that logic could perhaps be moved to decode (since it is actually decoding the instruction) | 13:04 |
@juliusb | awesome work stekern !! | 13:04 |
stekern | thanks ;) | 13:04 |
@juliusb | is that at 50MHz? | 13:05 |
stekern | we do pretty much 2*or1200 performance in coremark now | 13:05 |
stekern | yes, 50 MHz | 13:05 |
@juliusb | so getting close to 2 CoreMarks/MHz | 13:05 |
stekern | yup | 13:07 |
stekern | there's still the 'get 1-cycle stores' trick left to do | 13:09 |
@juliusb | wow | 13:09 |
@juliusb | that'll be a lean mean Linux machine :) | 13:10 |
@juliusb | I really should look at doing stuff like that for the other pipelines | 13:10 |
stekern | after that, I think we pretty much max out what the ISA gives us for a single issue, in order pipeline | 13:11 |
stekern | doesn't the wb spec allow for 'pipelined' writes without using burst? | 13:13 |
@juliusb | not sure really | 13:15 |
@juliusb | posted pipelined writes? | 13:16 |
@juliusb | stekern: regarding de0 nano build - you don't simulate it much, do you? | 13:16 |
stekern | no, just registered writes, I guess | 13:16 |
@juliusb | why do you want pipelined writes without using burst? | 13:17 |
stekern | I have simulated it some time, long ago ;) | 13:17 |
@juliusb | modelsim choked on something in mor1kx | 13:18 |
@juliusb | the declaration of write_pending as a reg after it's read before that in mor1kx_dcache.v | 13:18 |
stekern | you need to pull openrisc/mor1kx ;) | 13:19 |
@juliusb | -"before that" | 13:19 |
@juliusb | oh that's fixed? | 13:19 |
@juliusb | ok | 13:19 |
stekern | it's already fixed there | 13:19 |
@juliusb | didn't pull yesterday I think | 13:19 |
stekern | a bit annoying that verilator doesn't catch those | 13:20 |
@juliusb | ya | 13:20 |
@juliusb | does icarus? | 13:20 |
@juliusb | I don't think it does? | 13:20 |
stekern | nope | 13:20 |
@juliusb | what about synthesis tools? | 13:20 |
stekern | quartus doesn't, xst does | 13:20 |
@juliusb | wow, inconsistent | 13:21 |
@juliusb | who cares, between icarus, verilator, modelsim, xst, quartus | 13:21 |
stekern | I want to do 1-cycle writes without needing to have writeback cache | 13:22 |
@juliusb | and the odd run with synopsys Design Compiler, I'm sure we'll have everything covered | 13:22 |
@juliusb | ah I see | 13:22 |
stekern | because right now it's the wb bus that is the bottleneck, you can write to sdram in 1-cycle | 13:23 |
@juliusb | yeah just have a posted writes thing on the bus interface | 13:23 |
@juliusb | only stall if there's something else coming up after it which requires the bus is available (cache miss or another write perhaps, but even in the case of another write ideally that doesn't stall, either.) | 13:23 |
@juliusb | maybe a posted-writes FIFO? | 13:24 |
stekern | yeah, but it still feels silly, when the (main) memory actually can handle 1 write per cycle | 13:26 |
@juliusb | I think it makes it more generic to have that capability... | 13:29 |
@juliusb | buses always introduce annoying latency | 13:31 |
stekern | oh, I agree, but I'm still wondering if it's permitted by the wb spec to do registered writes | 13:32 |
@juliusb | so I notice the de0 nano's bootrom address offset is 0x0b0000 | 13:33 |
@juliusb | I guess the same SPI using the FPGA image also houses a little bit of software? | 13:33 |
@juliusb | s/using/storing/ | 13:34 |
stekern | because even with a posted-write fifo/store buffer, that will fill up when the wb bus is doing 2-cycle stores | 13:34 |
stekern | yes, you can sneak in a u-boot and linux in there ;) | 13:34 |
@juliusb | very good :) | 13:34 |
@juliusb | sure, it'll fill up - I'd say you'd want to size a buffer, though, so it could post up to 16 writes, for a big stack save to be allowed to execute single cycle | 13:35 |
stekern | I agree, again, and that's something we should do, in the future | 13:36 |
stekern | but perhaps a 1-word store buffer in the cache would be something that could help a little, at least on the occasional 1-word stores | 13:43 |
stekern | that would be easy to implement | 13:43 |
stekern | I'll stick to my plan though, clean up this branch in decode stage, add dbus burst support and then start optimising for area and speed | 13:45 |
@juliusb | cool | 13:48 |
stekern | http://pastie.org/7340535 | 13:50 |
stekern | that's how you do a flash image for de0-nano | 13:50 |
@juliusb | nice ok thanks :) | 13:51 |
stekern | don't thank me, it was Gong Tao that instructed me ;) | 13:51 |
@juliusb | who's that? | 13:51 |
@juliusb | I'm only simulating things at present | 13:51 |
@juliusb | synthesis works but want to check everything in sim first | 13:51 |
stekern | a guy that did a hit and run with a couple of linux, de0 nano and u-boot fixes | 13:52 |
@juliusb | :) | 13:52 |
@juliusb | a good kind of hit-and-run | 13:52 |
stekern | yup | 13:52 |
@juliusb | does the SDRAM controller take a while to start up on the de0 nano? | 13:57 |
@juliusb | like a powerup delay or something? | 13:57 |
stekern | yes | 13:59 |
stekern | you might want to decrease that when you simulate | 13:59 |
@juliusb | yup | 13:59 |
stekern | there's a POWERUP_DELAY parameter | 14:00 |
stekern | it's in us, and default is 200 | 14:01 |
@juliusb | ah I see! | 14:02 |
@juliusb | I was looking around in the guts of the versatile SDRAM controller | 14:03 |
stekern | don't ;) | 14:03 |
@juliusb | I used to have to do that for a living | 14:03 |
@juliusb | now I don't, and if I don't have to, I won't :) | 14:04 |
stekern | it's using my fancy pancy controller | 14:04 |
stekern | fancy pancy and riddled with bugs ;) | 14:04 |
@juliusb | ya | 14:04 |
stekern | but I think I've killed most of them by now | 14:04 |
@juliusb | you wrote your own though in the end it looks like? | 14:05 |
stekern | yup, from scratch | 14:05 |
@juliusb | and it looks a lot better! | 14:05 |
@juliusb | truly legendary | 14:05 |
stekern | I could do that, but I was too lazy to change the define from VERSATILE_SDRAM ;) | 14:05 |
@juliusb | hah :) | 14:06 |
stekern | juliusb: you are using the combinatorial decode in (pronto)espresso, right? | 14:51 |
stekern | I'm asking because I'm thinking we perhaps should break out the registered decode from the mor1kx_decode.v and make that purely combinatorial | 14:55 |
@juliusb | make mor1kx_decode purely comb? | 14:55 |
@juliusb | ya registering is disabled in both espresso cores | 14:56 |
stekern | because I'm adding a fair amount of 'cappuccino' specific stuff in there now, and I've got a feeling it'd be cleaner to make a mor1kx_decode_cappuccino.v wrapper that does all the registering and other cruft | 14:57 |
@juliusb | ah OK so instantiate the decode unit in your cappuccino wrapper? | 14:57 |
stekern | and that would then consume the generic mor1kx_decode.v | 14:57 |
@juliusb | and therefore have no need for registering? | 14:57 |
stekern | yes, exactly | 14:57 |
@juliusb | ya | 14:57 |
@juliusb | btw I spotted some not-very-nice behaviour in de0 nano | 14:58 |
stekern | what kind of? | 14:59 |
@juliusb | basically the dbus_arbiter uses the registered slave decode to control which slave's signals are driven | 14:59 |
@juliusb | but there's some other signals which are directly decoded and put out combinatorially | 14:59 |
@juliusb | so cyc and stb out of the arbiter_debus uses wbm_cyc_o & wb_slave_sel_r[N] where N is the slave number | 15:00 |
@juliusb | wbm_cyc_o is combinatorially muxed through from one of the two bus masters | 15:01 |
@juliusb | but, for a cycle the slave select guy is still pointing at the previous master | 15:01 |
@juliusb | so in the case where it was last pointing at the SDRAM but then points at the SPI0 block, it gives SDRAM a single cycle of indicating it wants to write | 15:02 |
@juliusb | so, a solution is to register all of the master's signals | 15:02 |
@juliusb | (I think that's a good idea) | 15:02 |
@juliusb | or combinatorially decode which slave should be selected | 15:03 |
stekern | umm, it's probably me that has done that (half-assly) | 15:09 |
stekern | but comb decode is no good | 15:09 |
stekern | no, that's not me, but I have maybe noticed the same thing you are noticing now, because that sel_r sounds awfully familiar, I have fiddled with it at some point | 15:12 |
@juliusb | ya we can probably just register the incoming master selects | 15:14 |
stekern | yeah, I have removed the _r in some old mor1kx dev repo of mine | 15:15 |
@juliusb | ok I've just registered the cyc/stb from the master | 15:22 |
@juliusb | appears to work | 15:22 |
@juliusb | so I'm going to look at putting this on the board now - is there a guide anywhere to setting up the debug stuff over the adv_dbg_bridge? | 15:23 |
@juliusb | _franck_: your openOCD port supports using adv_dbg_sys reight? | 15:28 |
@juliusb | right, even | 15:28 |
stekern | it does, I'm using that, but I can't for the life of me find the config options | 15:31 |
stekern | it's probably somewhere in the irc logs | 15:32 |
stekern | ./configure --enable-usb_blaster_libftdi --enable-adv_debug_sys --enable-altera_vjtag --enable-maintainer-mode | 15:33 |
stekern | http://www.elec4fun.fr/2011-03-30-10-16-30/2012-08-22-20-50-31/or1k-on-de1-openocd | 15:40 |
stekern | http://juliusbaxter.net/openrisc-irc/%23openrisc.2012-11-21.log.html#t15:56 | 15:42 |
@juliusb | cool | 18:14 |
stekern | I've been using or1k-elf-gdb lately btw, haven't noticed any problems with that | 18:20 |
stekern | otoh, I'm mostly using it to load stuff into memory and reset the pc | 18:20 |
@juliusb | haha | 18:25 |
@juliusb | not single stepping? | 18:25 |
@juliusb | _franck_ did a bunch of work, right? | 18:25 |
stekern | yeah, he brought everything up to date with the latest gdb AFAIK | 18:26 |
@juliusb | nice | 18:26 |
@juliusb | another man who deserves a beer or 8 | 18:26 |
@juliusb | :) | 18:26 |
stekern | well, after reloading linux, I usually do: spr npc 0x100; si; si | 18:27 |
@juliusb | cool | 18:27 |
@juliusb | is all of his work on the openrisc/or1k-src repo? | 18:27 |
stekern | just to get it out of itlb missing | 18:27 |
stekern | yes, just do the build instructions, but with enable-gdb instead | 18:28 |
stekern | I just tested espresso on de0-nano, for the fun of it | 18:29 |
stekern | I don't think I've run that on real hw before | 18:29 |
stekern | of course I had to do dhry and coremark | 18:30 |
stekern | http://pastie.org/7343163 | 18:30 |
@juliusb | it'll be bad because of no caches | 18:32 |
stekern | it's of course unfair, it has no caches and is connected to sdram, would be interested to see that with blockram | 18:32 |
@juliusb | ya | 18:32 |
stekern | one odd thing is, I get worse coremark and dhry scores when I run in the cycle accurate model than on de0-nano for cappuccino | 18:33 |
@juliusb | hmmmm..... | 18:43 |
@juliusb | with a big internal SRAM? | 18:43 |
@juliusb | maybe the timing isn't being calculated correctly? | 18:43 |
@juliusb | would be interesting to just run a raw cycle counter to see exactly how long it really takes in both situations | 18:45 |
@juliusb | when enabling just gdb I get a build error: No rule to make target `../libgui/src/libgui.a | 18:46 |
@juliusb | got to run, bbl | 18:47 |
stekern | hmm, odd | 18:47 |
stekern | worked fine here (tm) | 18:47 |
win2mac | hello guys i want to start playing around with fpga on linux. Which fpga board should i get? | 18:55 |
stekern | what's your budget? and what kind of peripherals would you like? | 19:01 |
win2mac | i just want to experiment with cpu design .. i was thinking about de0 nano | 19:02 |
stekern | that's a good choice | 19:03 |
win2mac | i am also considering buying logic analyzer http://dangerousprototypes.com/open-logic-sniffer/ | 19:04 |
stekern | that I have no experience with | 19:05 |
win2mac | that one is pretty cheap but has limited frequency range | 19:07 |
win2mac | should i get standalone unit? | 19:11 |
stekern | what do you intend to use it for? | 19:11 |
win2mac | question is if i really need it at beginning | 19:15 |
win2mac | i will get de0 nano and will see | 19:16 |
stekern | you can get pretty far by using internal logic analysers | 19:19 |
win2mac | you mean fpga internal la? | 19:20 |
stekern | yup | 19:21 |
win2mac | ok | 19:21 |
win2mac | i am kind of hobbyist because i will learn fpga at college :) | 19:22 |
stekern | cool, what college? | 19:26 |
win2mac | where are you from? | 19:27 |
stekern | sweden, but currently living in finland | 19:28 |
win2mac | czech republic | 19:28 |
win2mac | i want to go abroad to study | 19:31 |
mor1kx | [mor1kx] skristiansson pushed 12 new commits to master: https://github.com/openrisc/mor1kx/compare/f8a7fdc1a8d1...001f4fd7d9c5 | 21:09 |
mor1kx | mor1kx/master 2450048 Stefan Kristiansson: cappuccino/fetch: generate NOPs on reset and pipeline flush | 21:09 |
mor1kx | mor1kx/master 70eb105 Stefan Kristiansson: decode: generate jal result | 21:09 |
mor1kx | mor1kx/master 5437177 Stefan Kristiansson: cappuccino: use jal result from decode... | 21:09 |
--- Log closed Sun Apr 07 00:00:04 2013 |
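
Note on the 'a < b' question (11:32 to 13:04): stekern suggests that the adder result and the carry could be used to calculate the less-than condition instead of a separate comparator. The following is only a minimal Python sketch of that idea, not the mor1kx implementation. It assumes a 32-bit datapath and a subtraction done as `a + ~b + 1`; the function names `sub_flags`, `less_than_unsigned` and `less_than_signed` are hypothetical and chosen for illustration.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit datapath


def sub_flags(a, b):
    """Compute a - b as a + ~b + 1; return (result, carry, overflow)."""
    a &= MASK
    b &= MASK
    total = a + ((~b) & MASK) + 1
    result = total & MASK
    carry = total >> 32  # carry out of bit 31; 1 means "no borrow"
    sign_a, sign_b, sign_r = a >> 31, b >> 31, result >> 31
    # Signed overflow: operands differ in sign and the result's sign
    # differs from the sign of a.
    overflow = (sign_a ^ sign_b) & (sign_a ^ sign_r)
    return result, carry, overflow


def less_than_unsigned(a, b):
    """Unsigned a < b: the subtraction borrowed, so the carry is clear."""
    _, carry, _ = sub_flags(a, b)
    return carry == 0


def less_than_signed(a, b):
    """Signed a < b: sign bit of the result XOR signed overflow."""
    result, _, overflow = sub_flags(a, b)
    return bool((result >> 31) ^ overflow)
```

In hardware terms this reuses the one subtractor and adds a few gates for the flag. Whether it actually shortens the l.sfxxx -> l.bf path depends on how much logic sits in front of the adder (the b_mux mentioned at 13:04), which this sketch does not model.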
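
Note on the posted-writes discussion (13:22 to 13:43): the argument is that a store buffer hides the bus latency for occasional stores, but fills up during a long run of stores if the wb bus needs 2 cycles per write. The toy model below is a rough sketch of that reasoning under loudly stated assumptions: the CPU issues back-to-back stores, nothing else uses the bus, and the bus retires one buffered write every `bus_cycles` cycles. The function `stall_cycles` and its parameters are hypothetical and do not correspond to anything in mor1kx.

```python
def stall_cycles(num_stores, depth, bus_cycles=2):
    """Count CPU stall cycles while issuing num_stores back-to-back stores.

    depth      -- number of entries in the posted-write buffer (>= 1)
    bus_cycles -- cycles the bus needs to retire one buffered write
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    occupancy = 0  # entries currently held in the buffer
    bus_busy = 0   # cycles left on the write currently on the bus
    issued = 0
    stalls = 0
    while issued < num_stores:
        # CPU side: post a store if there is room, otherwise stall.
        if occupancy < depth:
            occupancy += 1
            issued += 1
        else:
            stalls += 1
        # Bus side: start on the oldest entry when idle, then advance.
        if bus_busy == 0 and occupancy > 0:
            bus_busy = bus_cycles
        if bus_busy > 0:
            bus_busy -= 1
            if bus_busy == 0:
                occupancy -= 1  # entry leaves the buffer once written
    return stalls


if __name__ == "__main__":
    for depth in (1, 4, 16):
        print(depth, stall_cycles(16, depth))
```

In this model a single store never stalls with a 1-entry buffer, which matches the "occasional 1-word stores" case, while a 16-entry buffer absorbs a 16-store burst without stalling, in line with the stack-save sizing suggested at 13:35. With `bus_cycles=1` no buffer depth ever stalls, which is the situation stekern would prefer if the bus could keep up with the memory.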