IRC logs for #openrisc Wednesday, 2013-03-27

--- Log opened Wed Mar 27 00:00:48 2013
stekernjuliusb: ah, on prontoespresso? that's interestings03:28
stekern-s03:28
stekernyeah, I'm falling in love with verilator, mor1kx runs at around 0.5MHz here in it03:29
stekernthat's blasting fast03:30
andresjkstekern, does verilator support vhdl?03:37
andresjkI'm using ISim for now :)03:38
andresjkI have a question about wishbone block transactions. I have my "dma" core which is a master-slave. I can set up the starting address and the number of registers and the fsm does single reads until all the mem regs are retrieved. I believe that block transactions can increase the performance and I want to upgrade my core. According to the wishbone documentation the block transactions allow a maximum number of reads, I think it's about 8. My question is: if I03:49
andresjk have to read more than 8 registers my core should finish the block transaction, but how does it continue? I thought about interrupts, but then I thought about a delay counter so the signals STB and CYC are not asserted for so long.03:49
andresjkAccording to the wishbone documentation the peripheral can assert STB and CYC as long as it wants, but obviously that is not recommended because of stalling.03:49
andresjkSo what's the correct approach?03:49
mor1kx[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/fb355dce9f6f78bde6c5a904d6b7802f8efcf88607:21
mor1kxmor1kx/master fb355dc Stefan Kristiansson: doc: cappuccino implementation updates + few other fixes...07:21
stekernandresjk: as the name suggests, it supports verilog07:22
stekernto answer your other question, use the b3 burst mode07:23
stekernjuliusb: I changed the layout of the hierarchy section a bit, I had to in order to get >4 level bullet lists working with latex07:25
andresjkyes, that's what I have in mind, but my question is about the length of the burst07:27
andresjkwhat should the core do when the length of the data is larger07:28
andresjkthe slave can insert a wait state07:29
andresjkbut the arbiter will never switch if cyc and stb are asserted07:30
stekernumm, if it has more data than the burst length, a new burst cycle is started07:32
stekernI'm not sure I understand the problem07:33
andresjkthe problem is the next burst cycle would continue to have ownership of the bus because the cyc signal is asserted, so until the whole data is transferred the bus cannot be used by another peripheral. It will stall for a while.07:36
andresjkor not?07:36
stekernafter an end-of-burst there'll be a gap where another peripheral can peek in07:41
stekernfor linear bursts there is no limit on the burst length, but it's probably wise to limit them to not experience the kind of problem you are describing07:43
andresjkyeah, it makes sense. The graph was misleading.07:46
andresjkthanks stekern07:46
stekernjust to make clear, you deassert cyc_o on end-of-burst as well07:47
andresjkyes, I checked the documentation again and it says so in the last clock, but in the diagram cyc_o was asserted, that's why I was confused.07:50
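A minimal Python sketch of the approach stekern describes above: a long transfer is cut into several bursts, each ending with an end-of-burst beat, so cyc_o can be dropped between them and another master gets a chance at the bus. The function name `split_into_bursts`, the default `max_burst=8` and the 4-byte word size are assumptions for illustration, not part of any core; the CTI encodings are the Wishbone B3 values for "incrementing burst" and "end-of-burst".

```python
CTI_INCREMENT = 0b010      # Wishbone B3: incrementing burst cycle
CTI_END_OF_BURST = 0b111   # Wishbone B3: end-of-burst


def split_into_bursts(start_addr, n_words, max_burst=8, word_bytes=4):
    """Return a list of bursts; each burst is a list of (address, cti) beats.

    Hypothetical helper: the master is assumed to deassert cyc_o after the
    last beat of each burst, which leaves a gap for the arbiter.
    """
    if n_words < 0 or max_burst < 1:
        raise ValueError("n_words must be >= 0 and max_burst >= 1")
    bursts = []
    addr = start_addr
    remaining = n_words
    while remaining > 0:
        length = min(remaining, max_burst)
        beats = []
        for i in range(length):
            # Only the final beat of a burst carries the end-of-burst tag.
            cti = CTI_END_OF_BURST if i == length - 1 else CTI_INCREMENT
            beats.append((addr + i * word_bytes, cti))
        bursts.append(beats)
        addr += length * word_bytes
        remaining -= length
    return bursts
```

Reading 10 words from 0x100 with this sketch gives one burst of 8 beats followed by one of 2 beats starting at 0x120.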
andresjkabout the pipeline, should I implement pipelined instead of sync transactions?07:51
stekernwhat do you mean?07:52
andresjkI think that in pipelined mode the master doesn't wait for an ack to continue. It saves 1 clk08:01
andresjkwell I guess I will implement the burst mode first08:01
andresjkthanks!08:01
stekernah, you're speaking of the pipeline mode in b408:12
stekernI don't think there are any slaves supporting that, and you can basically obtain that with bursting anyways08:12
stekernthe only thing you can't do with bursting that you could do with pipelining is "random access pipelining"08:13
stekernif you are writing to a linear address space, there is no advantage of the pipeline mode08:13
stekernand the B4 document has some very odd licensing and restriction notes, so I have dismissed it mostly because of that08:14
stekernjuliusb: hmm, doesn't the bus if do redundant accesses when it is in b3_read_bursting mode?09:24
stekernI'm thinking we should communicate more information to the bus if, so it properly can end the burst09:34
juliusbstekern: nice work on the documentation update10:36
juliusbabout redundant accesses - not sure there.10:38
juliusbIt probably does. I thought it did correctly signal the end-of-burst10:38
stekernit does correctly signal the end-of-burst10:41
stekernbut I think it might do that access even if the internal interface hasn't requested it10:43
stekernI'm not certain either, have to check it10:43
juliusboh, probably because it's adhering to the burst-length10:46
juliusbyou can set the default burst length10:46
stekernhmm, what if I'm doing 7 accesses and jump, what happens then?10:50
stekernassuming fetcher without cache here10:52
stekernor just fetches 1 instruction and then stall for something10:53
stekernhttp://oompa.chokladfabriken.org/tmp/bus_if_end_of_burst.png <- like here10:55
juliusbyeah not sure, but it'll handle it I think10:56
juliusbit'll know the internal request line has gone down and put out the bus finished thing I think10:57
stekernnaah, the waveform I just pasted tells me otherwise ;)10:57
juliusb?!? CTI has gone 3'b11110:57
juliusbthat's right isn't it? The internal guy, for some reason, just wants 0x100, gets it then doesn't want the next one yet so deasserts req10:59
juliusbthe bus interface then says it's the end of the burst, so puts CTI=3'b111, and then discards that next ack10:59
stekernyeah, but the if is still fetching 10410:59
juliusbfine10:59
juliusbit has to according to the standard10:59
stekernthat was my point10:59
juliusbI see you want to change CTI while the request is in flight... not sure it can do that10:59
juliusbnot sure you're allowed to I mean11:00
juliusbso you have a cool-down cycle11:00
juliusbyeah I reckon you can't11:00
stekernno, I want to be able to tell the bus if that "hey, the next address I'm going to give you is not going to be something you can burst"11:00
juliusbok11:01
stekernbecause otherwise we will never be able to do data bus bursts11:01
juliusbsure, just set CTI=3'b111 then I think, or BTE or 0 or whatever it is to do a "classic" cycle11:01
stekernimagine if you have a fifo at 0x104...11:01
juliusbyou can do repeating address accesses on wishbone I think11:02
stekernwell, I want to burst in some cases, and not on some11:02
juliusbok, so for data though, you're talking11:02
juliusbnot instruction I guess11:02
juliusbthat wishbone bursty guy was only written for instruction fetch11:03
juliusb(in mor1kx)11:03
juliusbno support for data burst writes11:03
juliusbbut for cache lines it should be good11:03
juliusbactually, ignore that last line11:03
stekerninstruction is not so much of a problem, since you can harmlessly read that extra access without "side-effects"11:03
juliusbI mean, the usual behaviour when dealing with data cache lines is fine for cache line fills/stores11:04
stekernyeah, it works ok in that case too, as long as you have the burstlength set-up to match the cache-line length11:04
juliusbyep11:04
juliusbwhich you should11:05
juliusbthose _should_ definitely match11:05
juliusbsilly not to11:05
stekernyeah, I have no problem with that11:05
juliusbbut I still don't understand your first issue, though11:05
juliusbon your data port, when you are doing things which are not either a) single reads/writes when you don't have data cache or b) full line loads/stores?11:06
juliusbwhy are you concerned about the case where it's neither of those?11:06
stekernI'm concerned over the fact that I can't tell the bus if which of those two I want to do11:06
stekernbecause I want to access peripherals without side-effects and do bursty cacheline refills11:07
juliusboh, OK, is there not a way to tell the bus interface in the mor1kx not to burst?11:07
juliusbfair enough, that may be the case11:07
stekernno, that's exactly what I'm trying to say ;)11:08
juliusbmaybe need to add that then :)11:08
stekernthat I think we need a system to tell it that11:08
stekernthe simplest is just a 1-bit "burst_o" signal that is asserted together with "req_o"11:14
stekernor then you could have the next address connected to the bus_if and let it decide if it can try to burst it11:15
stekernI have no strong feelings in one or the other direction, at least not at the moment :)11:17
stekernactually, maybe that burstlength parameter could be a signal instead11:33
stekernotoh, you always want either some constant burst_length or 1, so maybe that's just a waste11:34
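A small Python sketch of the first idea stekern floats, a 1-bit burst hint presented together with the request. It only illustrates how such a hint could select the cycle type; `next_cti`, `burst_hint` and `beats_left` are hypothetical names, and this is not how the mor1kx bus interface is implemented. The CTI encodings are the Wishbone B3 ones (classic, incrementing burst, end-of-burst).

```python
CTI_CLASSIC = 0b000        # Wishbone B3: classic cycle
CTI_INCREMENT = 0b010      # Wishbone B3: incrementing burst cycle
CTI_END_OF_BURST = 0b111   # Wishbone B3: end-of-burst


def next_cti(burst_hint, beats_left):
    """Pick the CTI for the next access.

    burst_hint -- hypothetical 1-bit "burst_o" style signal from the core
    beats_left -- accesses still to do, including this one
    """
    if beats_left < 1:
        raise ValueError("beats_left must be at least 1")
    if not burst_hint:
        # Peripheral access: a single classic cycle, no speculative extra read.
        return CTI_CLASSIC
    return CTI_INCREMENT if beats_left > 1 else CTI_END_OF_BURST
```

With the hint low, an access to something like the FIFO at 0x104 mentioned above would stay a single cycle, while cache line refills could still burst.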
olofkOk... so BST != GMT ?13:39
olofkhttp://doodle.com/68xs348c2hdptrkf13:40
olofkNeed to add more info on how to participate, and I'm not sure about the times, but is it good enough to be sent to the mailing list?13:42
olofkHas anyone used $fseek in verilog, btw? I'm having a problem with large seek values (probably over 0x80000000). Dividing the seek into smaller relative steps works, but it would be nice to always have it treated as unsigned. Any ideas?13:47
juliusbthe UK is on BST as of Sunday14:25
juliusbCurrently on GMT (or UTC? but UTC=GMT I believe)14:25
juliusb$fseek: no never really used it I think14:26
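A minimal Python sketch of the workaround olofk mentions for $fseek, under the assumption (olofk only says "probably") that the offset is treated as a signed 32-bit value. `as_signed32` shows why 0x80000000 and above would come out negative, and `seek_steps` cuts a large offset into relative steps that each fit in a positive signed 32-bit integer. Both names are hypothetical helpers, not Verilog system tasks.

```python
INT32_MAX = 0x7FFFFFFF


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def seek_steps(offset, max_step=INT32_MAX):
    """Split a non-negative offset into steps no larger than max_step."""
    if offset < 0 or max_step < 1:
        raise ValueError("offset must be >= 0 and max_step >= 1")
    steps = []
    while offset > 0:
        step = min(offset, max_step)
        steps.append(step)
        offset -= step
    return steps
```

In Verilog the first step would presumably be issued relative to the start of the file and the remaining ones relative to the current position.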
stekernhttp://pastie.org/713943216:25
juliusb84?!?!?16:29
stekerntoday I learned that accessing a 'sig1[5:2]' with a 'sig2[1:0]' (i.e. sig1[sig2]) gives different results in the cycle-accurate simulator than in quartus synthesis16:32
stekernverilator obviously treats it the same as if sig1 would be [3:0]16:33
stekern(it should have been, but I had screwed up my calculations)16:33
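A Python sketch of the two interpretations of sig1[sig2] that stekern ran into. `select_declared` honours the declared range [5:2], where indices 0 and 1 fall outside the vector (a Verilog bit-select read out of range yields x, modelled here as None). `select_zero_based` models the behaviour stekern reports for verilator, as if the vector were [3:0]. Both function names and the packed-integer representation are assumptions for illustration; the sketch makes no claim about what Quartus produced.

```python
def select_declared(value, msb, lsb, index):
    """Bit-select on a vector declared [msb:lsb]; bit lsb sits at position 0.

    Returns None for an out-of-range index (standing in for Verilog 'x').
    """
    if not lsb <= index <= msb:
        return None
    return (value >> (index - lsb)) & 1


def select_zero_based(value, width, index):
    """Bit-select as if the same vector had been declared [width-1:0]."""
    if not 0 <= index < width:
        return None
    return (value >> index) & 1
```

For a 4-bit value 0b0110 and index 2, the declared-range view returns the lowest bit (0) while the zero-based view returns the third bit (1), so the two readings can disagree.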
stekernyeah, 84, so 4 better than before my pipeline rework (we had 80 then)16:34
stekerngit HEAD gives 7416:34
stekern+ we still have a fmax of ~8016:34
stekernthings move in the right direction at least ;)16:35
juliusbawesome work16:39
* juliusb applauds16:42
juliusbso this is better fetch integration with the cache unit?16:44
juliusbthis is due to, I mean16:44
stekernnah, this is making loads from dcache 1-cycle16:49
stekern+due to16:49
stekernso now the cache memory's read address comes straight from the alu and the result is ready in ctrl/mem stage16:50
stekernuncached lsu accesses are all handled in ctrl/mem stage16:51
stekerni.e. the address from execute_alu is only connected to the read port of the cache16:51
stekernjust like it should, but the cache logic needed some massaging to understand this16:52
stekernthere's still a nasty path on the wb-bus though, since write acks come directly from the soc bus16:54
stekernactually, read acks too, when the cache is disabled16:56
juliusbah of course, single cycle data access16:59
juliusbHow big can you make the caches? You should pump them right up (128,256KB?) so it can fit in the whole coremark app16:59
juliusbrun it once16:59
juliusbthen run it again :)16:59
juliusb(without re-initing the caches, of course)16:59
juliusbfor maxi coremark results16:59
stekernthat thought has struck me too ;)17:00
juliusbwould be good to have a nice headline number17:00
stekernbut I think 16KB is enough, all fits in that17:00
stekernso, now both icache and dcache are virtually indexed, physically tagged17:01
juliusbfrom what I've been doing at work lately (doing a bit of research into whether to use a cortex m3 or an inhouse thing) headline numbers are important. People read it and believe it - even though they take them with a grain of salt (they appear to _massively_ depend on your compiler - there's results for the same RTL which range from < 1 Coremark/MHz to over 3!!!)17:01
juliusbno, it'd be more than 16KB, surely?!17:02
stekernwhich means mor1kx should be 'massively' faster than or1200 when running linux17:02
stekernthat imposes a limit on cache sizes unfortunately though17:03
juliusbalthough, I usually look at the size of the whole text section which usually contains all of the statically linked library code, too17:03
juliusbso maybe it's only 16KB's worth17:03
juliusbmight be worth getting it to do 2 runs through17:03
stekernIIRC, increasing caches increased coremark scores, but only up until 16KB17:03
juliusbsee if the numbers improve17:03
juliusbhmm OK17:03
stekernI probably should add some kind of error if immu is enabled and the cache size is too large...17:06
stekerncan you do compile time erroring (or even better warning) from the verilog source?17:07
stekernblock + set width can't be larger than 1317:10
stekernwell it can, but then you have to take counter measures in software17:10
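A Python sketch of the limit stekern states for a virtually indexed, physically tagged cache: the index bits (block width plus set width) have to stay within the page offset, otherwise software has to deal with aliasing. The value 13 is taken from stekern's line; that it corresponds to an 8 KiB page is an assumption, as are the names `vipt_alias_free` and `way_size_bytes` and the reading of the two widths as log2 of bytes per line and log2 of the number of sets.

```python
def vipt_alias_free(block_width, set_width, page_offset_bits=13):
    """True if the cache index fits entirely inside the page offset."""
    return block_width + set_width <= page_offset_bits


def way_size_bytes(block_width, set_width):
    """Bytes covered by one cache way: 2^(block_width + set_width)."""
    return 1 << (block_width + set_width)
```

With 32-byte lines (block width 5), a set width of 8 gives an 8 KiB way and passes the check; a set width of 9 does not. A larger total cache would then have to come from more ways rather than more sets.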
juliusbbased on generates I think you can, yes17:24
juliusbgenerate an $error() or something?17:24
juliusbAm I making that up? :P17:24
stekernI don't know, but wouldn't that be runtime?17:27
stekernhave to investigate17:27
juliusbruntime or synthesis17:33
juliusbmaybe?!17:33
juliusbin the generate part of the statement you could put something surrounded by `ifdef which is a blatant error message17:34
juliusbso you never hit it in sim17:34
stekernmmm, I guess some ifdef magic might do it17:47
stekernI don't mind hitting it in sim though, but I also want it to show during synthesis17:48
andresjkHi, I'm facing a rare problem. I have a peripheral with 5 data registers; when scaling to 50 regs the serial output of the orpsoc doesn't show any data. After rescaling to 30 regs it shows data but suddenly the kernel crashes18:20
andresjkthe fpga utilization is about 50% and in the summary there is no issue with the registers in my core18:21
glowplugIs it possible to test the peripheral in simulation before flashing to the fpga?18:38
andresjkI did it for the first version with 5 registers and it worked. It's a good point, I will try it18:41
andresjkthanks18:41
glowplugNo problem.  If you don't mind me asking, what peripheral is it and is your board a Xilinx or Altera?18:43
andresjkIt's basically a master-slave which, given a starting address and the number of memory registers, will retrieve the data from the memory into the registers. It is going to be part of a more complex peripheral which is meant to do some kind of processing18:45
andresjkmy board is a Xilinx18:46
andresjkI don't know why but I think the issue is more related to the PAR than my logic since it works fine18:47
glowplugThe peripheral passed with 50 registers in simulation?18:47
andresjkI haven't tried it yet. It did for 5 registers just fine. I thought it was going to be straightforward to just scale it up to 50 but I was wrong xD18:48
glowplugFair enough.  =D18:57
mor1kx[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/12c2717b8cf7fe8481f0b5dbc6e5549f6d1798f919:41
mor1kxmor1kx/master 12c2717 Stefan Kristiansson: cappccino/lsu: dcache: make cache loads 1-cycle...19:41
stekernjuliusb: I'm confused (again), which ram are we using when simulating? ram_wb_b3 or wb_ram_b3?20:00
stekernram_wb_b3 obviously20:02
stekernhmm, I just tried booting up my "reference Linux image" on or1200 and it freezes right after: init started: BusyBox v1.19.0.git (2011-02-16 08:10:12 CET)20:29
stekernconclusion, mor1kx is now more stable than or1200 =P20:29
stekernI should get a newer version and test against20:30
stekernnewer version of the kernel that is20:31
glowplugThat is fantastic!  Do you guys ever stop working?  Haha20:33
stekernglowplug: the more stable than or1200 was not serious, I can get that image to crash on mor1kx too ;)20:36
stekernas for stop working, so much fun to do, so little time...20:42
glowplugI guess it's not work if it's fun.  =D22:45
glowplugDoes mor1kx utilize microcode for any complex instructions?22:45
glowplugOr any instructions at all for that matter.  Haha22:54
--- Log closed Thu Mar 28 00:00:50 2013
