IRC logs for #openrisc Saturday, 2013-04-06

--- Log opened Sat Apr 06 00:00:03 2013
-!- Netsplit *.net <-> *.split quits: jonibo, simoncook, LoneTech, zhai365, rolloTomasi, orsoc1_00:31
stekernI've got branch resolving in decode stage pretty stable now, all tests pass and it boots linux in the cycle accurate model and on de0-nano03:49
stekernand it seems the effort was not altogether wasted:
stekernso about a 9% increase in coremark03:54
stekernfrom 84.7 to 92.303:54
stekernlet's see about dhrystone, 1.38 was the result before moving the branches03:56
stekernso, about 6% increase there (1.38->1.46)03:58
stekerna bit of regression in the fmax department though, only ~70 MHz now05:25
stekernit's actually the same path that olofk saw on virtex6, l.sfxxx -> l.bf05:26
stekernbut now it's worse, since the branch signal isn't registered into execute stage05:27
stekernit looks like the 'a < b' operation is the one that is causing most of the trouble, wonder if there is a simpler way to calculate that condition11:32
stekernsimpler in terms of logic elements that is..11:32
stekernI guess the adder result and the carry could be used for that, it has a fair amount of logic attached before it though, in the b_mux. some of that logic could perhaps be moved to decode (since it is actually decoding the instruction)13:04
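Editor's sketch of the idea stekern raises here: deriving 'a < b' from the adder result and carry instead of a separate comparator. This is a bit-level Python model, not mor1kx code. It assumes a 32-bit datapath and that the adder computes a - b as a + ~b + 1; the names `sub_flags`, `less_unsigned` and `less_signed` are hypothetical.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath

def sub_flags(a, b):
    """Model a - b on a 32-bit adder as a + ~b + 1.

    Inputs are 32-bit patterns (0 .. 0xFFFFFFFF).
    Returns (result, carry_out, signed_overflow).
    """
    a &= MASK
    b &= MASK
    total = a + (~b & MASK) + 1
    result = total & MASK
    carry = total >> 32                      # 1 means "no borrow"
    sign_a = (a >> 31) & 1
    sign_b = (b >> 31) & 1
    sign_r = (result >> 31) & 1
    # Subtraction overflows when the operands have different signs and the
    # result sign differs from the sign of a.
    overflow = (sign_a ^ sign_b) & (sign_a ^ sign_r)
    return result, carry, overflow

def less_unsigned(a, b):
    """Unsigned a < b: the subtraction borrowed, i.e. carry out is 0."""
    _, carry, _ = sub_flags(a, b)
    return carry == 0

def less_signed(a, b):
    """Signed a < b: result sign bit XOR signed overflow."""
    result, _, overflow = sub_flags(a, b)
    return bool(((result >> 31) & 1) ^ overflow)
```

For example, `less_unsigned(1, 0xFFFFFFFF)` is true while `less_signed(1, 0xFFFFFFFF)` is false, since 0xFFFFFFFF is -1 when read as signed. Whether this saves logic elements in practice depends on how much sits in front of the adder, as noted in the message above.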
@juliusbawesome work stekern !!13:04
stekernthanks ;)13:04
@juliusbis that at 50MHz?13:05
stekernwe do pretty much 2*or1200 performance in coremark now13:05
stekernyes, 50 MHz13:05
@juliusbso getting close to 2 CoreMarks/MHz13:05
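Editor's sketch: the figures quoted above can be checked with a few lines of arithmetic. The scores and the 50 MHz clock are taken from the log; the helper name `percent_gain` is hypothetical.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0

# Scores quoted in the log, before and after moving branch
# resolution into the decode stage.
coremark_gain = percent_gain(84.7, 92.3)    # roughly 9 %
dhrystone_gain = percent_gain(1.38, 1.46)   # roughly 6 %

# CoreMark score normalised to the 50 MHz clock mentioned in the log.
coremark_per_mhz = 92.3 / 50.0              # a bit under 2 CoreMark/MHz

print(f"CoreMark gain:  {coremark_gain:.1f} %")
print(f"Dhrystone gain: {dhrystone_gain:.1f} %")
print(f"CoreMark/MHz:   {coremark_per_mhz:.3f}")
```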
stekernthere's still the 'get 1-cycle stores' trick left to do13:09
@juliusbthat'll be a lean mean Linux machine :)13:10
@juliusbI really should look at doing stuff like that for the other pipelines13:10
stekernafter that, I think we pretty much max out what the ISA gives us for a single issue, in order  pipeline13:11
stekerndoesn't the wb spec allow for 'pipelined' writes without using burst?13:13
@juliusbnot sure really13:15
@juliusbposted pipelined writes?13:16
@juliusbstekern: regarding de0 nano build - you don't simulate it much, do you?13:16
stekernno, just registered writes, I guess13:16
@juliusbwhy do you want pipelined writes without using burst?13:17
stekernI have simulated it some time, long ago ;)13:17
@juliusbmodelsim choked on something in mor1kx13:18
@juliusbthe declaration of write_pending as a reg after it's read before that in mor1kx_dcache.v13:18
stekernyou need to pull openrisc/mor1kx ;)13:19
@juliusb-"before that"13:19
@juliusboh that's fixed?13:19
stekernit's already fixed there13:19
@juliusbdidn't pull yesterday I think13:19
stekerna bit annoying that verilator doesn't catch those13:20
@juliusbdoes icarus?13:20
@juliusbI don't think it does?13:20
@juliusbwhat about synthesis tools?13:20
stekernquartus doesn't, xst does13:20
@juliusbwow, inconsistent13:21
@juliusbwho cares, between icarus, verilator, modelsim, xst, quartus13:21
stekernI want to do 1-cycle writes without needing to have writeback cache13:22
@juliusband the odd run with synopsys Design Compiler, I'm sure we'll have everything covered13:22
@juliusbah I see13:22
stekernbecause right now it's the wb bus that is the bottle neck, you can write to sdram in 1-cycle13:23
@juliusbyeah just have a posted writes thing on the bus interface13:23
@juliusbonly stall if there's something else coming up after it which requires the bus is available (cache miss or another write perhaps, but even in the case of another write ideally that doesn't stall, either.)13:23
@juliusbmaybe a posted-writes FIFO?13:24
stekernyeah, but it still feels silly, when the (main) memory actually can handle 1 write per cycle13:26
@juliusbI think it makes it more generic to have that capability...13:29
@juliusbbuses always introduce annoying latency13:31
stekernoh, I agree, but I'm still wondering if it's permitted by the wb spec to do registered writes13:32
@juliusbso I notice the de0 nano's bootrom address offset is 0x0b000013:33
@juliusbI guess the same SPI using the FPGA image also houses a little bit of software?13:33
stekernbecause even with a posted-write fifo/store buffer, that will fill up when the wb bus is doing 2-cycle stores13:34
stekernyes, you can sneak in a u-boot and linux in there ;)13:34
@juliusbvery good :)13:34
@juliusbsure, it'll fill up - I'd say you'd want to size a buffer, though, so it could post up to 16 writes, for a big stack save to be allowed to execute single cycle13:35
stekernI agree, again, and that's something we should do, in the future13:36
stekernbut perhaps a 1-word store buffer in the cache would be something that could help a little, at least on the occasional 1-word stores13:43
stekernthat would be easy to implement13:43
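Editor's sketch of the store-buffer trade-off discussed above: a toy cycle model, not mor1kx code. It assumes the CPU issues back-to-back stores (one per cycle whenever the buffer has a free slot) and that the bus needs 2 cycles per write, as in the "2-cycle stores" mentioned in the log. The function name and parameters are hypothetical.

```python
def cycles_to_issue(num_stores, depth, bus_cycles_per_write=2):
    """Cycles the CPU needs to issue `num_stores` consecutive stores
    through a posted-write buffer holding `depth` entries."""
    if depth < 1:
        raise ValueError("buffer depth must be at least 1")
    occupancy = 0        # entries currently held in the buffer
    head_remaining = 0   # bus cycles left for the entry being written
    issued = 0
    cycles = 0
    while issued < num_stores:
        cycles += 1
        # Bus side: work on the oldest entry, retire it when done.
        if occupancy > 0:
            if head_remaining == 0:
                head_remaining = bus_cycles_per_write
            head_remaining -= 1
            if head_remaining == 0:
                occupancy -= 1
        # CPU side: post a store if there is room, otherwise stall.
        if occupancy < depth:
            occupancy += 1
            issued += 1
    return cycles

# A 16-store burst (e.g. a big stack save) with different buffer depths.
for depth in (1, 4, 16):
    print(depth, cycles_to_issue(16, depth))
```

In this model a 16-entry buffer lets a 16-store burst issue without stalling, while a 1-word buffer only hides the bus latency for isolated stores, which matches the reasoning in the conversation. A sustained stream of stores still fills any finite buffer as long as the bus drains slower than the CPU writes.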
stekernI'll stick to my plan though, clean up this branch in decode stage, add dbus burst support and then start optimising for area and speed13:45
stekernthat's how you do a flash image for de0-nano13:50
@juliusbnice ok thanks :)13:51
stekerndon't thank me, it was Gong Tao that instructed me ;)13:51
@juliusbwho's that?13:51
@juliusbI'm only simulating things at present13:51
@juliusbsynthesis works but want to check everything in sim first13:51
stekerna guy that did a hit and run with a couple of linux, de0 nano and u-boot fixes13:52
@juliusba good kind of hit-and-run 13:52
@juliusbdoes the SDRAM controller take a while to start up on the de0 nano?13:57
@juliusblike a powerup delay or something?13:57
stekernyou might want to decrease that when you simulate13:59
stekernthere's a POWERUP_DELAY parameter14:00
stekernit's in us, and default is 20014:01
@juliusbah I see!14:02
@juliusbI was looking around in the guts of the versatile SDRAM controller14:03
stekerndon't ;)14:03
@juliusbI used to have to do that for a living14:03
@juliusbnow I don't, and if I don't have to, I won't :)14:04
stekernit's using my fancy pancy controller14:04
stekernfancy pancy and riddled with bugs ;)14:04
stekernbut I think I've killed most of them by now14:04
@juliusbyou wrote your own though in the end it looks like?14:05
stekernyup, from scratch14:05
@juliusband it looks a lot better!14:05
@juliusbtruly legendary14:05
stekernI could do that, but I was too lazy to change the define from VERSATILE_SDRAM ;)14:05
@juliusbhah :)14:06
stekernjuliusb: you are using the combinatorial decode in (pronto)espresso, right?14:51
stekernI'm asking because I'm thinking we perhaps should break out the registered decode from the mor1kx_decode.v and make that purely combinatorial14:55
@juliusbmake mor1kx_decode purely comb?14:55
@juliusbya registering is disabled in both espresso cores14:56
stekernbecause I'm adding a fair amount of 'cappuccino' specific stuff in there now, and I've got a feeling it'd be cleaner to make a mor1kx_decode_cappuccino.v wrapper that does all the registering and other cruft14:57
@juliusbah OK so instantiate the decode unit in your cappuccino wrapper?14:57
stekernand that would then consume the generic mor1kx_decode.v14:57
@juliusband therefore have no need for registering?14:57
stekernyes, exactly14:57
@juliusbbtw I spotted some not-very-nice behaviour in de0 nano14:58
stekernwhat kind of?14:59
@juliusbbasically the dbus_arbiter uses the registered slave decode to control which slave's signals are driven14:59
@juliusbbut there's some other signals which are directly decoded and put out combinatorially14:59
@juliusbso cyc and stb out of the arbiter_debus uses wbm_cyc_o & wb_slave_sel_r[N] where N is the slave number15:00
@juliusbwbm_cyc_o is combinatorially muxed through from one of the two bus masters15:01
@juliusbbut, for a cycle the slave select guy is still pointing at the previous master15:01
@juliusbso in the case where it was last pointing at the SDRAM but then points at the SPI0 block, it gives SDRAM a single cycle of indicating it wants to write15:02
@juliusbso, a solution is to register all of the master's signals15:02
@juliusb(I think that's a good idea)15:02
@juliusbor combinatorially decode which slave should be selected15:03
stekernumm, it's probably me that has done that (half-assly)15:09
stekernbut comb decode is no good15:09
stekernno, that's not me, but I have maybe noticed the same thing you are noticing now, because that sel_r sounds awfully familiar, I have fiddled with it at some point15:12
@juliusbya we can probably just register the incoming master selects15:14
stekernyeah, I have removed the _r in some old mor1kx dev repo of mine15:15
@juliusbok I've just registered the cyc/stb from the master15:22
@juliusbappears to work15:22
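Editor's sketch of the arbiter behaviour juliusb describes: a toy Python model, not the actual de0 nano arbiter. It assumes the slave select is registered (so it lags the master by one cycle) and compares a combinatorial cyc against a registered one. The reset value of the select, the request trace and all names are hypothetical, and the Wishbone ack handshake is left out.

```python
SDRAM, SPI0 = 0, 1

def simulate(requests, register_cyc):
    """requests: one (cyc, target_slave) pair per clock from the master.

    Returns, per clock, the slave that sees cyc asserted (or None).
    The slave select is always registered; cyc is registered only
    when `register_cyc` is True.
    """
    sel_r = SDRAM   # registered slave select (reset value assumed)
    cyc_r = 0       # registered copy of the master's cyc
    seen = []
    for cyc, target in requests:
        cyc_out = cyc_r if register_cyc else cyc
        seen.append(sel_r if cyc_out else None)
        # Clock edge: both registers capture the master's outputs.
        sel_r = target
        cyc_r = cyc
    return seen

# Two clocks of SDRAM access, one idle clock, then an SPI0 access.
trace = [(1, SDRAM), (1, SDRAM), (0, SDRAM),
         (1, SPI0), (1, SPI0), (0, SPI0)]

print(simulate(trace, register_cyc=False))
print(simulate(trace, register_cyc=True))
```

With the combinatorial cyc, the first clock of the SPI0 access is steered to SDRAM because the registered select still holds the previous target. Registering cyc as well delays it by the same one cycle as the select, so the two stay aligned, at the cost of one cycle of added latency.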
@juliusbso I'm going to look at putting this on the board now - is there a guide anywhere to setting up the debug stuff over the adv_dbg_bridge?15:23
@juliusb_franck_: your openOCD port supports using adv_dbg_sys reight?15:28
@juliusbright, even15:28
stekernit does, I'm using that, but I can't for the life find the config options15:31
stekernit's probably somewhere in the irc logs15:32
stekern./configure --enable-usb_blaster_libftdi --enable-adv_debug_sys --enable-altera_vjtag --enable-maintainer-mode15:33
stekernI've been using or1k-elf-gdb lately btw, haven't noticed any problems with that18:20
stekernotoh, I'm mostly using it to load stuff into memory and reset the pc18:20
@juliusbnot single stepping?18:25
@juliusb_franck_ did a bunch of work, right?18:25
stekernyeah, he brought everything up to date with the latest gdb AFAIK18:26
@juliusbanother man who deserves a beer or 818:26
stekernwell, after reloading linux, I usually do: spr npc 0x100; si; si18:27
@juliusbis all of his work on the openrisc/or1k-src repo?18:27
stekernjust to get it out of itlb missing18:27
stekernyes, just do the build instructions, but with enable-gdb instead18:28
stekernI just tested espresso on de0-nano, for the fun of it18:29
stekernI don't think I've run that on real hw before18:29
stekernof course I had to do dhry and coremark18:30
@juliusbit'll be bad because of no caches18:32
stekernit's of course unfair, it has no caches and connected to sdram, would be interested to see that with blockram18:32
stekernone odd thing is, I get worse coremark and dhry scores when I run in the cycle accurate model than on de0-nano for cappuccino18:33
@juliusbwith a big internal SRAM?18:43
@juliusbmaybe the timing isn't being calculated correctly?18:43
@juliusbwould be interesting to just run a raw cycle counter to see exactly how long it really takes in both situations18:45
@juliusbwhen enabling just gdb I get a build error: No rule to make target `../libgui/src/libgui.a18:46
@juliusbgot to run, bbl18:47
stekernhmm, odd18:47
stekernworked fine here (tm)18:47
win2machello guys i want to start playing around with fpga on linux. Which fpga board should i get?18:55
stekernwhat's your budget? and what kind of peripherals would you like?19:01
win2maci just want to experiment with cpu design .. i was thinking about de0 nano19:02
stekernthat's a good choice19:03
win2maci am also considering buying logic analyzer
stekernthat I have no experience with19:05
win2macthat one is pretty cheap but has limited frequency range19:07
win2macshould i get standalone unit?19:11
stekernwhat do you intend to use it for?19:11
win2macquestion is if i really need it at beginning19:15
win2maci will get de0 nano and will see19:16
stekernyou can get pretty far by using internal logic analysers19:19
win2macyou mean fpga internal la?19:20
win2maci am kind of hobbyist because i will learn fpga at college :)19:22
stekerncool, what college?19:26
win2macwhere are you from?19:27
stekernsweden, but currently living in finland19:28
win2macczech republic19:28
win2maci want to go abroad to study19:31
mor1kx[mor1kx] skristiansson pushed 12 new commits to master:
mor1kxmor1kx/master 2450048 Stefan Kristiansson: cappuccino/fetch: generate NOPs on reset and pipeline flush21:09
mor1kxmor1kx/master 70eb105 Stefan Kristiansson: decode: generate jal result21:09
mor1kxmor1kx/master 5437177 Stefan Kristiansson: cappuccino: use jal result from decode...21:09
--- Log closed Sun Apr 07 00:00:04 2013
