IRC logs for #openrisc Wednesday, 2013-03-27

--- Log opened Wed Mar 27 00:00:48 2013
stekern	juliusb: ah, on prontoespresso? that's interestings	03:28
stekern	-s	03:28
stekern	yeah, I'm falling in love with verilator, mor1kx runs at around 0.5MHz here in it	03:29
stekern	that's blasting fast	03:30
andresjk	stekern, does verilator support vhdl?	03:37
andresjk	Im using Isim for now :)	03:38
andresjk	I have a question about wishbone block transactions. I have my "dma" core which is a master-slave. I can set up the starting address and the number of registers and the fsm does single reads until all the mem regs are retrieved. I believe that block transactions can increase the performance and I want to upgrade my core. According to the wishbone documentation the block transactions allow a maximum number of reads, I think it's about 8. My question is, if I	03:49
andresjk	have to read more than 8 registers my core should finish the block transaction, but how does it continue? I thought about interrupts but then I thought about a delay counter so the signals STB and CYC are not asserted for so long.	03:49
andresjk	According to the wishbone documentation the peripheral can assert STB and CYC as long as it wants but obviously is not recommended because of stalling.	03:49
andresjk	So whats the correct approach?	03:49
mor1kx	[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/fb355dce9f6f78bde6c5a904d6b7802f8efcf886	07:21
mor1kx	mor1kx/master fb355dc Stefan Kristiansson: doc: cappuccino implementation updates + few other fixes...	07:21
stekern	andresjk: as the name suggests, it supports verilog	07:22
stekern	to answer your other question, use the b3 burst mode	07:23
stekern	juliusb: I changed the layout of the hierarchy section a bit, I had to in order to get >4 level bullet lists working with latex	07:25
andresjk	yes, that's what I have in mind. but my question is about the length of the burst	07:27
andresjk	what should the core do when the length of the data is larger	07:28
andresjk	the slave can insert a wait state	07:29
andresjk	but the arbiter will never switch if cyc and stb are asserted	07:30
stekern	umm, if it has more data than the burst length, a new burst cycle is started	07:32
stekern	I'm not sure I understand the problem	07:33
andresjk	the problem is the next burst cycle would continue to have ownership of the bus because the cyc signal is asserted, so until the whole data is transferred the bus cannot be used by another peripheral. It will stall for a while.	07:36
andresjk	or not?	07:36
stekern	after an end-of-burst there'll be a gap where another peripheral can peek in	07:41
stekern	for linear bursts there is no limit on the burst length, but it's probably wise to limit them to not experience the kind of problem you are describing	07:43
andresjk	yeah, it makes sense. The graph was misleading.	07:46
andresjk	thanks stekern	07:46
stekern	just to make clear, you deassert cyc_o on end-of-burst as well	07:47
andresjk	yes, I checked the documentation again and it says so in the last clock, but in the diagram cyc_o was asserted, that's why my confusion.	07:50
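The approach stekern describes (start a new burst when there is more data than one burst covers, and deassert cyc_o at each end-of-burst so another master can get the bus) can be sketched as a small planning function. This is only an illustrative model, not code from any core discussed here: the function name `plan_bursts`, the default limit of 8 beats, and the 4-byte word size are assumptions. The CTI values are the Wishbone B3 encodings for classic, incrementing-burst and end-of-burst cycles.

```python
# Wishbone B3 cycle type identifier (CTI) encodings used in this sketch.
CTI_CLASSIC = 0b000       # single classic cycle
CTI_INCR_BURST = 0b010    # incrementing burst, more beats follow
CTI_END_OF_BURST = 0b111  # last beat of a burst


def plan_bursts(start_addr, n_words, max_burst=8, word_bytes=4):
    """Split n_words sequential reads into bursts of at most max_burst beats.

    Returns a list of bursts; each burst is a list of (address, cti) beats.
    The model assumes CYC/STB are deasserted between consecutive bursts,
    which is the gap where the arbiter can grant another master.
    max_burst and word_bytes are hypothetical defaults for illustration.
    """
    if max_burst < 1:
        raise ValueError("max_burst must be at least 1")
    bursts = []
    addr = start_addr
    remaining = n_words
    while remaining > 0:
        beats = min(remaining, max_burst)
        burst = []
        for i in range(beats):
            if beats == 1:
                cti = CTI_CLASSIC        # nothing to burst, single access
            elif i == beats - 1:
                cti = CTI_END_OF_BURST   # tell the slave this is the last beat
            else:
                cti = CTI_INCR_BURST
            burst.append((addr, cti))
            addr += word_bytes
        bursts.append(burst)
        remaining -= beats
    return bursts
```

Calling `plan_bursts(0x100, 18)` with the assumed limit of 8 gives three bursts of 8, 8 and 2 beats. Picking a smaller `max_burst` trades throughput for shorter bus ownership, which is the trade-off mentioned above.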
andresjk	about the pipeline, should I implement pipelined instead of sync transactions?	07:51
stekern	what do you mean?	07:52
andresjk	I think that in pipelined mode the master doesn't wait for an ack to continue. It saves 1 clk	08:01
andresjk	well I guess I will implement the burst mode first	08:01
andresjk	thanks!	08:01
stekern	ah, you're speaking of the pipeline mode in b4	08:12
stekern	I don't think there are any slaves supporting that, and you can basically obtain that with bursting anyways	08:12
stekern	the only thing you can't do with bursting that you could do with pipelining is "random access pipelining"	08:13
stekern	if you are writing to a linear address space, there is no advantage of the pipeline mode	08:13
stekern	and the B4 document has some very odd licensing and restriction notes, so I have dismissed it mostly because of that	08:14
stekern	juliusb: hmm, doesn't the bus if do redundant accesses when it is in b3_read_bursting mode?	09:24
stekern	I'm thinking we should communicate more information to the bus if, so it properly can end the burst	09:34
juliusb	stekern: nice work on the documentation update	10:36
juliusb	about redundant accesses - not sure there.	10:38
juliusb	It probably does. I thought it did correctly signal the end-of-burst	10:38
stekern	it does correctly signal the end-of-burst	10:41
stekern	but I think it might do that access even if the internal interface hasn't requested it	10:43
stekern	I'm not certain either, have to check it	10:43
juliusb	oh, probably because it's adhering to the burst-length	10:46
juliusb	you can set the default burst length	10:46
stekern	hmm, what if I'm doing 7 accesses and jump, what happens then?	10:50
stekern	assuming fetcher without cache here	10:52
stekern	or just fetches 1 instruction and then stall for something	10:53
stekern	http://oompa.chokladfabriken.org/tmp/bus_if_end_of_burst.png <- like here	10:55
juliusb	yeah not sure, but it'll handle it I think	10:56
juliusb	it'll know the internal request line has gone down and put out the bus finished thing I think	10:57
stekern	naah, the waveform I just pasted tells me otherwise ;)	10:57
juliusb	?!? CTI has gone 3'b111	10:57
juliusb	that's right isn't it? The internal guy, for some reason, just wants 0x100, gets it then doesn't want the next one yet so deasserts req	10:59
juliusb	the bus interface then says it's the end of the burst, so puts CTI=3'b111, and then discards that next ack	10:59
stekern	yeah, but the if is still fetching 104	10:59
juliusb	fine	10:59
juliusb	it has to according to the standard	10:59
stekern	that was my point	10:59
juliusb	I see you want to change CTI while the request is in flight... not sure it can do that	10:59
juliusb	not sure you're allowed to I mean	11:00
juliusb	so you have a cool-down cycle	11:00
juliusb	yeah I reckon you can't	11:00
stekern	no, I want to be able to tell the bus if that "hey, the next address I'm going to give you is not going to be something you can burst"	11:00
juliusb	ok	11:01
stekern	because otherwise we will never be able to do data bus bursts	11:01
juliusb	sure, just set CTI=3'b111 then I think, or BTE or 0 or whatever it is to do a "classic" cycle	11:01
stekern	imagine if you have a fifo at 0x104...	11:01
juliusb	you can do repeating address accesses on wishbone I think	11:02
stekern	well, I want to burst in some cases, and not on some	11:02
juliusb	ok, so for data though, you're talking	11:02
juliusb	not instruction I guess	11:02
juliusb	that wishbone bursty guy was only written for instruction fetch	11:03
juliusb	(in mor1kx)	11:03
juliusb	no support for data burst writes	11:03
juliusb	but for cache lines it should be good	11:03
juliusb	actually, ignore that last line	11:03
stekern	instruction is not so much of a problem, since you can harmlessly read that extra access without "side-effects"	11:03
juliusb	I mean, the usual behaviour when dealing with data cache lines is fine for cache line fills/stores	11:04
stekern	yeah, it works ok in that case too, as long as you have the burstlength set-up to match the cache-line length	11:04
juliusb	yep	11:04
juliusb	which you should	11:05
juliusb	those _should_ definitely match	11:05
juliusb	silly not to	11:05
stekern	yeah, I have no problem with that	11:05
juliusb	but I still don't understand your first issue, though	11:05
juliusb	on your data port, when you are doing things which are not either a) single reads/writes when you don't have data cache or b) full line loads/stores?	11:06
juliusb	why are you concerned about the case where it's neither of those?	11:06
stekern	I'm concerned over the fact that I can't tell the bus if which of those two I want to do	11:06
stekern	because I want to access peripherals without side-effects and do bursty cacheline refills	11:07
juliusb	oh, OK, is there not a way to tell the bus interface in the mor1kx not to burst?	11:07
juliusb	fair enough, that may be the case	11:07
stekern	no, that's exactly what I'm trying to say ;)	11:08
juliusb	maybe need to add that then :)	11:08
stekern	that I think we need a system to tell it that	11:08
stekern	the simplest is just a 1-bit "burst_o" signal that is asserted together with "req_o"	11:14
stekern	or then you could have the next address connected to the bus_if and let it decide if it can try to burst it	11:15
stekern	I have no strong feelings in one or the other direction, at least not at the moment :)	11:17
stekern	actually, maybe that burstlength parameter could be a signal instead	11:33
stekern	otoh, you always want either some constant burst_length or 1, so maybe thats just a waste	11:34
olofk	Ok... so BST != GMT ?	13:39
olofk	http://doodle.com/68xs348c2hdptrkf	13:40
olofk	Need to add more info on how to participate, and I'm not sure about the times, but is it good enough to be sent to the mailing list?	13:42
olofk	Has anyone used $fseek in verilog, btw? I'm having a problem with large seek values (probably over 0x80000000). Dividing the seek into smaller relative steps work, but it would be nice to always have it treated as unsigned. Any ideas?	13:47
juliusb	the UK is on BST as of Sunday	14:25
juliusb	Currently on GMT (or UTC? but UTC=GMT I believe)	14:25
juliusb	$fseek: no never really used it I think	14:26
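The workaround olofk mentions for $fseek (dividing a large seek into smaller relative steps) can be modelled outside Verilog. The sketch below assumes the problem is a signed 32-bit offset argument, which the discussion suspects but does not confirm. The names `relative_seek_steps` and `seek_unsigned` are made up for illustration.

```python
import io
import os

INT32_MAX = 0x7FFFFFFF  # largest value a signed 32-bit offset can hold


def relative_seek_steps(offset, max_step=INT32_MAX):
    """Break an unsigned absolute offset into forward relative steps,
    each no larger than max_step."""
    if offset < 0:
        raise ValueError("offset is treated as unsigned and must be >= 0")
    steps = []
    while offset > 0:
        step = min(offset, max_step)
        steps.append(step)
        offset -= step
    return steps


def seek_unsigned(f, offset, max_step=INT32_MAX):
    """Seek to an absolute offset using only bounded relative seeks."""
    f.seek(0, os.SEEK_SET)            # rewind, then walk forward in steps
    for step in relative_seek_steps(offset, max_step):
        f.seek(step, os.SEEK_CUR)
    return f.tell()
```

A Verilog version would follow the same shape: one seek to the start of the file, then a loop of relative seeks, each kept below 0x80000000.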
stekern	http://pastie.org/7139432	16:25
juliusb	84?!?!?	16:29
stekern	today I learned that accessing a 'sig1[5:2]' with a 'sig2[1:0]' (i.e. sig1[sig2]) give different results in the cycle-accurate simulator than quartus synthesis	16:32
stekern	verilator obviously treats it the same as if sig1 would be [3:0]	16:33
stekern	(it should have been, but I had screwed up my calculations)	16:33
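The indexing mismatch stekern describes can be illustrated with a small model. In Verilog, a bit-select on a vector declared [5:2] uses the declared indices, so indices 0 and 1 are out of range and read as x. Treating the same vector as [3:0] makes all four values of a 2-bit index valid. The helper names below are hypothetical, and `None` stands in for x. The sketch only shows the two interpretations; it does not claim which one any particular tool implements.

```python
def bit_select_declared(value, msb, lsb, index):
    """Bit-select on a vector declared [msb:lsb] (msb >= lsb).

    The index is interpreted against the declared range; an out-of-range
    index returns None, standing in for Verilog's 'x'.
    """
    if not (lsb <= index <= msb):
        return None
    return (value >> (index - lsb)) & 1


def bit_select_zero_based(value, width, index):
    """Bit-select as if the vector had been declared [width-1:0]."""
    if not (0 <= index < width):
        return None
    return (value >> index) & 1


# The same 4 stored bits, read with every value of a 2-bit index.
stored = 0b1010
declared = [bit_select_declared(stored, 5, 2, i) for i in range(4)]
zero_based = [bit_select_zero_based(stored, 4, i) for i in range(4)]
```

Here `declared` is `[None, None, 0, 1]` while `zero_based` is `[0, 1, 0, 1]`, so any design that indexes a [5:2] vector with a raw 2-bit signal depends on how out-of-range selects are handled.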
stekern	yeah, 84, so 4 better than before my pipeline rework (we had 80 then)	16:34
stekern	git HEAD gives 74	16:34
stekern	+ we still have a fmax of ~80	16:34
stekern	things move in the right direction at least ;)	16:35
juliusb	awesome work	16:39
* juliusb applauds		16:42
juliusb	so this is better fetch integration with the cache unit?	16:44
juliusb	this is due to, I mean	16:44
stekern	nah, this is making loads from dcache 1-cycle	16:49
stekern	+due to	16:49
stekern	so now the cache memory's read address comes straight from the alu and the result is ready in ctrl/mem stage	16:50
stekern	uncached lsu accesses are all handled in ctrl/mem stage	16:51
stekern	i.e. the address from execute_alu is only connected to the read port of the cache	16:51
stekern	just like it should, but the cache logic needed some massaging to understand this	16:52
stekern	there's still a nasty path on the wb-bus though, since write acks come directly from the soc bus	16:54
stekern	actually, read acks too, when the cache is disabled	16:56
juliusb	ah of course, single cycle data access	16:59
juliusb	How big can you make the caches? You should pump them right up (128,256KB?) so it can fit in the whole coremark app	16:59
juliusb	run it once	16:59
juliusb	then run it again :)	16:59
juliusb	(without re-initing the caches, of course)	16:59
juliusb	for maxi coremark results	16:59
stekern	that thought has struck me too ;)	17:00
juliusb	would be good to have a nice headline number	17:00
stekern	but I think 16KB is enough, all fits in that	17:00
stekern	so, now both icache and dcache are virtually indexed, physically tagged	17:01
juliusb	from what I've been doing at work lately (doing a bit of research into whether to use a cortex m3 or an inhouse thing) headline numbers are important. People read it and believe it - even though they take them with a grain of salt (they appear to _massively_ depend on your compiler - there's results for the same RTL which range from < 1 Coremark/MHz to over 3!!!)	17:01
juliusb	no, it'd be more than 16KB, surely?!	17:02
stekern	which means mor1kx should be 'massively' faster than or1200 when running linux	17:02
stekern	that imposes a limit on cache sizes unfortunately though	17:03
juliusb	although, I usually look at the size of the whole text section which usually contains all of the statically linked library code, too	17:03
juliusb	so maybe it's only 16KB's worth	17:03
juliusb	might be worth getting it to do 2 runs through	17:03
stekern	IIRC, increasing caches increased coremark scores, but only up until 16KB	17:03
juliusb	see if the numbers improve	17:03
juliusb	hmm OK	17:03
stekern	I probably should add some kind of error if immu is enabled and the cache size is too large...	17:06
stekern	can you do compile time erroring (or even better warning) from the verilog source?	17:07
stekern	block + set width can't be larger than 13	17:10
stekern	well it can, but then you have to take counter measures in software	17:10
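The limit stekern gives (block + set width not larger than 13) matches the usual constraint for a virtually indexed, physically tagged cache: the index bits must fit inside the page offset, and OpenRISC 1000 pages are 8 KiB, i.e. 13 offset bits. A minimal check is sketched below, assuming both widths are log2 values and the block width counts bytes. The function names are hypothetical and not taken from mor1kx.

```python
PAGE_OFFSET_BITS = 13  # 8 KiB pages: 2**13 bytes


def way_size_bytes(block_width, set_width):
    """Size of one cache way when both widths are log2 values in bytes/sets."""
    return 1 << (block_width + set_width)


def vipt_alias_free(block_width, set_width, page_offset_bits=PAGE_OFFSET_BITS):
    """True when every index bit lies inside the page offset, so virtual and
    physical indices agree and no software countermeasures are needed."""
    return block_width + set_width <= page_offset_bits


def check_cache_config(block_width, set_width, mmu_enabled):
    """Raise on a configuration that can alias when the MMU is enabled."""
    if mmu_enabled and not vipt_alias_free(block_width, set_width):
        raise ValueError(
            "block_width + set_width = %d exceeds %d page offset bits"
            % (block_width + set_width, PAGE_OFFSET_BITS)
        )
```

Under these assumptions, 32-byte blocks (width 5) with 256 sets (width 8) give an 8 KiB way and pass the check, while 512 sets (width 9) would not. A larger total cache would then have to come from adding ways rather than sets.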
juliusb	based on generates I think you can, yes	17:24
juliusb	generate an $error() or something?	17:24
juliusb	Am I making that up? :P	17:24
stekern	I don't know, but wouldn't that be runtime?	17:27
stekern	have to investigate	17:27
juliusb	runtime or synthesis	17:33
juliusb	maybe?!	17:33
juliusb	in the generate part of the statement you could put something surrounded by `ifdef which is a blatant error message	17:34
juliusb	so you never hit it in sim	17:34
stekern	mmm, I guess some ifdef magic might do it	17:47
stekern	I don't mind hitting it in sim though, but I also want it to show during synthesis	17:48
andresjk	Hi, Im facing a rare problem. I have a peripheral with 5 data registers, when scaling to 50 regs the serial output of the orpsoc doesnt show any data. After rescaling to 30 regs it shows data but suddenly the kernel crashes	18:20
andresjk	the fpga utilization is about 50% and in the summary there is no issue with the registers in my core	18:21
glowplug	Is it possible to test the peripheral in simulation before flashing to the fpga?	18:38
andresjk	I did it for the first version of 5 registers and it worked. Its a good point, I will try it	18:41
andresjk	thanks	18:41
glowplug	No problem. If you dont mind me asking what peripheral is it and is your board a Xilinx or Altera?	18:43
andresjk	Its basically a master-slave which given a starting address and the number of memory registers will retrieve the data from the memory into the registers. Its is going to be part of a more complex peripheral which is meant to do some kind of processing	18:45
andresjk	my board is a Xilinx	18:46
andresjk	I dont know why but I think the issue is more related to the PAR than my logic since it works fine	18:47
glowplug	The peripheral passed with 50 registers in simulation?	18:47
andresjk	I haven't tried it yet. It did for 5 registers just fine. I thought it was going to be straightforward to just scale it up to 50 but I was wrong xD	18:48
glowplug	Fair enough. =D	18:57
mor1kx	[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/12c2717b8cf7fe8481f0b5dbc6e5549f6d1798f9	19:41
mor1kx	mor1kx/master 12c2717 Stefan Kristiansson: cappccino/lsu: dcache: make cache loads 1-cycle...	19:41
stekern	juliusb: I'm confused (again), which ram are we using when simulating? ram_wb_b3 or wb_ram_b3?	20:00
stekern	ram_wb_b3 obviously	20:02
stekern	hmm, I just tried booting up my "reference Linux image" on or1200 and it freezes right after: init started: BusyBox v1.19.0.git (2011-02-16 08:10:12 CET)	20:29
stekern	conclusion, mor1kx is now more stable than or1200 =P	20:29
stekern	I should get a newer version and test against	20:30
stekern	newer version of the kernel that is	20:31
glowplug	That is fantastic! Do you guys ever stop working? Haha	20:33
stekern	glowplug: the more stable than or1200 was not serious, I can get that image to crash on mor1kx too ;)	20:36
stekern	as for stop working, so much fun to do, so little time...	20:42
glowplug	I guess it's not work if it's fun. =D	22:45
glowplug	Does mor1kx utilize microcode for any complex instructions?	22:45
glowplug	Or any instructions at all for that matter. Haha	22:54
--- Log closed Thu Mar 28 00:00:50 2013
