--- Log opened Wed Mar 27 00:00:48 2013 | ||
stekern | juliusb: ah, on prontoespresso? that's interestings | 03:28 |
---|---|---|
stekern | -s | 03:28 |
stekern | yeah, I'm falling in love with verilator, mor1kx runs at around 0.5MHz here in it | 03:29 |
stekern | that's blasting fast | 03:30 |
andresjk | stekern, does verilator support vhdl? | 03:37 |
andresjk | I'm using ISim for now :) | 03:38 |
andresjk | I have a question about wishbone block transactions. I have my "dma" core which is a master-slave. I can set up the starting address and the number of registers and the fsm does single reads until all the mem regs are retrieved. I believe that block transactions can increase the performance and I want to upgrade my core. According to the wishbone documentation the block transactions allow a maximum number of reads, I think it's about 8. My question is: if I | 03:49 |
andresjk | have to read more than 8 registers, my core should finish the block transaction, but how does it continue? I thought about interrupts, but then I thought about a delay counter so that the STB and CYC signals are not asserted for so long. | 03:49 |
andresjk | According to the wishbone documentation the peripheral can assert STB and CYC as long as it wants, but obviously that is not recommended because of stalling. | 03:49 |
andresjk | So what's the correct approach? | 03:49 |
mor1kx | [mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/fb355dce9f6f78bde6c5a904d6b7802f8efcf886 | 07:21 |
mor1kx | mor1kx/master fb355dc Stefan Kristiansson: doc: cappuccino implementation updates + few other fixes... | 07:21 |
stekern | andresjk: as the name suggests, it supports verilog | 07:22 |
stekern | to answer your other question, use the b3 burst mode | 07:23 |
stekern | juliusb: I changed the layout of the hierarchy section a bit, I had to in order to get >4 level bullet lists working with latex | 07:25 |
andresjk | yes, that's what I have in mind, but my question is about the length of the burst | 07:27 |
andresjk | what should the core do when the length of the data is larger? | 07:28 |
andresjk | the slave can insert a wait state | 07:29 |
andresjk | but the arbiter will never switch if cyc and stb are asserted | 07:30 |
stekern | umm, if it has more data than the burst length, a new burst cycle is started | 07:32 |
stekern | I'm not sure I understand the problem | 07:33 |
andresjk | the problem is the next burst cycle would continue to have ownership of the bus because the cyc signal is asserted, so until the whole data is transferred the bus cannot be used by another peripheral. It will stall for a while. | 07:36 |
andresjk | or not? | 07:36 |
stekern | after an end-of-burst there'll be a gap where another peripheral can peek in | 07:41 |
stekern | for linear bursts there is no limit on the burst length, but it's probably wise to limit them so you don't experience the kind of problem you are describing | 07:43 |
andresjk | yeah, it makes sense. The graph was misleading. | 07:46 |
andresjk | thanks stekern | 07:46 |
stekern | just to make clear, you deassert cyc_o on end-of-burst as well | 07:47 |
andresjk | yes, I checked the documentation again and it says so, on the last clock, but in the diagram cyc_o was asserted, hence my confusion. | 07:50 |
andresjk | about the pipeline, should I implement pipelined instead of sync transactions? | 07:51 |
stekern | what do you mean? | 07:52 |
andresjk | I think that in pipelined mode the master doesn't wait for an ack to continue. It saves 1 clk | 08:01 |
andresjk | well I guess I will implement the burst mode first | 08:01 |
andresjk | thanks! | 08:01 |
stekern | ah, you're speaking of the pipeline mode in b4 | 08:12 |
stekern | I don't think there are any slaves supporting that, and you can basically obtain that with bursting anyways | 08:12 |
stekern | the only thing you can't do with bursting that you could do with pipelining is "random access pipelining" | 08:13 |
stekern | if you are writing to a linear address space, there is no advantage of the pipeline mode | 08:13 |
stekern | and the B4 document has some very odd licensing and restriction notes, so I have dismissed it mostly because of that | 08:14 |
stekern | juliusb: hmm, doesn't the bus if do redundant accesses when it is in b3_read_bursting mode? | 09:24 |
stekern | I'm thinking we should communicate more information to the bus if, so it properly can end the burst | 09:34 |
juliusb | stekern: nice work on the documentation update | 10:36 |
juliusb | about redundant accesses - not sure there. | 10:38 |
juliusb | It probably does. I thought it did correctly signal the end-of-burst | 10:38 |
stekern | it does correctly signal the end-of-burst | 10:41 |
stekern | but I think it might do that access even if the internal interface hasn't requested it | 10:43 |
stekern | I'm not certain either, have to check it | 10:43 |
juliusb | oh, probably because it's adhering to the burst-length | 10:46 |
juliusb | you can set the default burst length | 10:46 |
stekern | hmm, what if I'm doing 7 accesses and jump, what happens then? | 10:50 |
stekern | assuming fetcher without cache here | 10:52 |
stekern | or just fetches 1 instruction and then stall for something | 10:53 |
stekern | http://oompa.chokladfabriken.org/tmp/bus_if_end_of_burst.png <- like here | 10:55 |
juliusb | yeah not sure, but it'll handle it I think | 10:56 |
juliusb | it'll know the internal request line has gone down and put out the bus finished thing I think | 10:57 |
stekern | naah, the waveform I just pasted tells me otherwise ;) | 10:57 |
juliusb | ?!? CTI has gone 3'b111 | 10:57 |
juliusb | that's right isn't it? The internal guy, for some reason, just wants 0x100, gets it then doesn't want the next one yet so deasserts req | 10:59 |
juliusb | the bus interface then says it's the end of the burst, so puts CTI=3'b111, and then discards that next ack | 10:59 |
stekern | yeah, but the if is still fetching 104 | 10:59 |
juliusb | fine | 10:59 |
juliusb | it has to according to the standard | 10:59 |
stekern | that was my point | 10:59 |
juliusb | I see you want to change CTI while the request is in flight... not sure it can do that | 10:59 |
juliusb | not sure you're allowed to I mean | 11:00 |
juliusb | so you have a cool-down cycle | 11:00 |
juliusb | yeah I reckon you can't | 11:00 |
stekern | no, I want to be able to tell the bus if that "hey, the next address I'm going to give you is not going to be something you can burst" | 11:00 |
juliusb | ok | 11:01 |
stekern | because otherwise we will never be able to do data bus bursts | 11:01 |
juliusb | sure, just set CTI=3'b111 then I think, or BTE or 0 or whatever it is to do a "classic" cycle | 11:01 |
stekern | imagine if you have a fifo at 0x104... | 11:01 |
juliusb | you can do repeating address accesses on wishbone I think | 11:02 |
stekern | well, I want to burst in some cases, and not on some | 11:02 |
juliusb | ok, so for data though, you're talking | 11:02 |
juliusb | not instruction I guess | 11:02 |
juliusb | that wishbone bursty guy was only written for instruction fetch | 11:03 |
juliusb | (in mor1kx) | 11:03 |
juliusb | no support for data burst writes | 11:03 |
juliusb | but for cache lines it should be good | 11:03 |
juliusb | actually, ignore that last line | 11:03 |
stekern | instruction is not so much of a problem, since you can harmlessly read that extra access without "side-effects" | 11:03 |
juliusb | I mean, the usual behaviour when dealing with data cache lines is fine for cache line fills/stores | 11:04 |
stekern | yeah, it works ok in that case too, as long as you have the burstlength set-up to match the cache-line length | 11:04 |
juliusb | yep | 11:04 |
juliusb | which you should | 11:05 |
juliusb | those _should_ definitely match | 11:05 |
juliusb | silly not to | 11:05 |
stekern | yeah, I have no problem with that | 11:05 |
juliusb | but I still don't understand your first issue, though | 11:05 |
juliusb | on your data port, when are you doing things which are not either a) single reads/writes when you don't have a data cache or b) full line loads/stores? | 11:06 |
juliusb | why are you concerned about the case where it's neither of those? | 11:06 |
stekern | I'm concerned over the fact that I can't tell the bus if which of those two I want to do | 11:06 |
stekern | because I want to access peripherals without side-effects and do bursty cacheline refills | 11:07 |
juliusb | oh, OK, is there not a way to tell the bus interface in the mor1kx not to burst? | 11:07 |
juliusb | fair enough, that may be the case | 11:07 |
stekern | no, that's exactly what I'm trying to say ;) | 11:08 |
juliusb | maybe need to add that then :) | 11:08 |
stekern | that I think we need a system to tell it that | 11:08 |
stekern | the simplest is just a 1-bit "burst_o" signal that is asserted together with "req_o" | 11:14 |
stekern | or then you could have the next address connected to the bus_if and let it decide if it can try to burst it | 11:15 |
stekern | I have no strong feelings in one or the other direction, at least not at the moment :) | 11:17 |
stekern | actually, maybe that burstlength parameter could be a signal instead | 11:33 |
stekern | otoh, you always want either some constant burst_length or 1, so maybe that's just a waste | 11:34 |
olofk | Ok... so BST != GMT ? | 13:39 |
olofk | http://doodle.com/68xs348c2hdptrkf | 13:40 |
olofk | Need to add more info on how to participate, and I'm not sure about the times, but is it good enough to be sent to the mailing list? | 13:42 |
olofk | Has anyone used $fseek in verilog, btw? I'm having a problem with large seek values (probably over 0x80000000). Dividing the seek into smaller relative steps works, but it would be nice to always have it treated as unsigned. Any ideas? | 13:47 |
juliusb | the UK is on BST as of Sunday | 14:25 |
juliusb | Currently on GMT (or UTC? but UTC=GMT I believe) | 14:25 |
juliusb | $fseek: no never really used it I think | 14:26 |
stekern | http://pastie.org/7139432 | 16:25 |
juliusb | 84?!?!? | 16:29 |
stekern | today I learned that accessing a 'sig1[5:2]' with a 'sig2[1:0]' (i.e. sig1[sig2]) gives different results in the cycle-accurate simulator than in quartus synthesis | 16:32 |
stekern | verilator obviously treats it the same as if sig1 would be [3:0] | 16:33 |
stekern | (it should have been, but I had screwed up my calculations) | 16:33 |
stekern | yeah, 84, so 4 better than before my pipeline rework (we had 80 then) | 16:34 |
stekern | git HEAD gives 74 | 16:34 |
stekern | + we still have a fmax of ~80 | 16:34 |
stekern | things move in the right direction at least ;) | 16:35 |
juliusb | awesome work | 16:39 |
* | juliusb applauds | 16:42 |
juliusb | so this is better fetch integration with the cache unit? | 16:44 |
juliusb | this is due to, I mean | 16:44 |
stekern | nah, this is making loads from dcache 1-cycle | 16:49 |
stekern | +due to | 16:49 |
stekern | so now the cache memory's read address comes straight from the alu and the result is ready in ctrl/mem stage | 16:50 |
stekern | uncached lsu accesses are all handled in ctrl/mem stage | 16:51 |
stekern | i.e. the address from execute_alu is only connected to the read port of the cache | 16:51 |
stekern | just like it should, but the cache logic needed some massaging to understand this | 16:52 |
stekern | there's still a nasty path on the wb-bus though, since write acks come directly from the soc bus | 16:54 |
stekern | actually, read acks too, when the cache is disabled | 16:56 |
juliusb | ah of course, single cycle data access | 16:59 |
juliusb | How big can you make the caches? You should pump them right up (128,256KB?) so it can fit in the whole coremark app | 16:59 |
juliusb | run it once | 16:59 |
juliusb | then run it again :) | 16:59 |
juliusb | (without re-initing the caches, of course) | 16:59 |
juliusb | for maxi coremark results | 16:59 |
stekern | that thought has struck me too ;) | 17:00 |
juliusb | would be good to have a nice headline number | 17:00 |
stekern | but I think 16KB is enough, all fits in that | 17:00 |
stekern | so, now both icache and dcache are virtually indexed, physically tagged | 17:01 |
juliusb | from what I've been doing at work lately (doing a bit of research into whether to use a cortex m3 or an inhouse thing) headline numbers are important. People read it and believe it - even though they take them with a grain of salt (they appear to _massively_ depend on your compiler - there's results for the same RTL which range from < 1 Coremark/MHz to over 3!!!) | 17:01 |
juliusb | no, it'd be more than 16KB, surely?! | 17:02 |
stekern | which means mor1kx should be 'massively' faster than or1200 when running linux | 17:02 |
stekern | that imposes a limit on cache sizes unfortunately though | 17:03 |
juliusb | although, I usually look at the size of the whole text section which usually contains all of the statically linked library code, too | 17:03 |
juliusb | so maybe it's only 16KB's worth | 17:03 |
juliusb | might be worth getting it to do 2 runs through | 17:03 |
stekern | IIRC, increasing caches increased coremark scores, but only up until 16KB | 17:03 |
juliusb | see if the numbers improve | 17:03 |
juliusb | hmm OK | 17:03 |
stekern | I probably should add some kind of error if immu is enabled and the cache size is too large... | 17:06 |
stekern | can you do compile time erroring (or even better warning) from the verilog source? | 17:07 |
stekern | block + set width can't be larger than 13 | 17:10 |
stekern | well it can, but then you have to take counter measures in software | 17:10 |
juliusb | based on generates I think you can, yes | 17:24 |
juliusb | generate an $error() or something? | 17:24 |
juliusb | Am I making that up? :P | 17:24 |
stekern | I don't know, but wouldn't that be runtime? | 17:27 |
stekern | have to investigate | 17:27 |
juliusb | runtime or synthesis | 17:33 |
juliusb | maybe?! | 17:33 |
juliusb | in the generate part of the statement you could put something surrounded by `ifdef which is a blatant error message | 17:34 |
juliusb | so you never hit it in sim | 17:34 |
stekern | mmm, I guess some ifdef magic might do it | 17:47 |
stekern | I don't mind hitting it in sim though, but I also want it to show during synthesis | 17:48 |
andresjk | Hi, I'm facing a rare problem. I have a peripheral with 5 data registers, when scaling to 50 regs the serial output of the orpsoc doesn't show any data. After rescaling to 30 regs it shows data but suddenly the kernel crashes | 18:20 |
andresjk | the fpga utilization is about 50% and in the summary there is no issue with the registers in my core | 18:21 |
glowplug | Is it possible to test the peripheral in simulation before flashing to the fpga? | 18:38 |
andresjk | I did it for the first version with 5 registers and it worked. It's a good point, I will try it | 18:41 |
andresjk | thanks | 18:41 |
glowplug | No problem. If you don't mind me asking, what peripheral is it and is your board a Xilinx or Altera? | 18:43 |
andresjk | It's basically a master-slave which, given a starting address and the number of memory registers, will retrieve the data from the memory into the registers. It is going to be part of a more complex peripheral which is meant to do some kind of processing | 18:45 |
andresjk | my board is a Xilinx | 18:46 |
andresjk | I don't know why but I think the issue is more related to the PAR than my logic since it works fine | 18:47 |
glowplug | The peripheral passed with 50 registers in simulation? | 18:47 |
andresjk | I haven't tried it yet. It did for 5 registers just fine. I thought it was going to be straightforward to just scale it up to 50 but I was wrong xD | 18:48 |
glowplug | Fair enough. =D | 18:57 |
mor1kx | [mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/12c2717b8cf7fe8481f0b5dbc6e5549f6d1798f9 | 19:41 |
mor1kx | mor1kx/master 12c2717 Stefan Kristiansson: cappccino/lsu: dcache: make cache loads 1-cycle... | 19:41 |
stekern | juliusb: I'm confused (again), which ram are we using when simulating? ram_wb_b3 or wb_ram_b3? | 20:00 |
stekern | ram_wb_b3 obviously | 20:02 |
stekern | hmm, I just tried booting up my "reference Linux image" on or1200 and it freezes right after: init started: BusyBox v1.19.0.git (2011-02-16 08:10:12 CET) | 20:29 |
stekern | conclusion, mor1kx is now more stable than or1200 =P | 20:29 |
stekern | I should get a newer version and test against | 20:30 |
stekern | newer version of the kernel that is | 20:31 |
glowplug | That is fantastic! Do you guys ever stop working? Haha | 20:33 |
stekern | glowplug: the more stable than or1200 was not serious, I can get that image to crash on mor1kx too ;) | 20:36 |
stekern | as for stop working, so much fun to do, so little time... | 20:42 |
glowplug | I guess it's not work if it's fun. =D | 22:45 |
glowplug | Does mor1kx utilize microcode for any complex instructions? | 22:45 |
glowplug | Or any instructions at all for that matter. Haha | 22:54 |
--- Log closed Thu Mar 28 00:00:50 2013 |
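
The burst discussion between andresjk and stekern (07:22 to 07:47) can be sketched as a small Python model: a long transfer is cut into bounded bursts, the last beat of each burst carries the end-of-burst cycle type, and the master would drop cyc_o between bursts so the arbiter can grant the bus to someone else. This is only an illustration, not code from any core mentioned in the log. The function name `split_into_bursts`, the default limit of 8 beats and the 4-byte word size are assumptions; the CTI encodings are the Wishbone B3 values (3'b010 incrementing burst, 3'b111 end-of-burst).

```python
CTI_INCREMENT = 0b010     # Wishbone B3: incrementing burst cycle
CTI_END_OF_BURST = 0b111  # Wishbone B3: end-of-burst


def split_into_bursts(start_addr, num_words, max_burst_len=8, word_bytes=4):
    """Split a linear transfer into bursts of at most max_burst_len beats.

    Hypothetical helper: returns a list of bursts, each burst being a list
    of (address, cti) beats. The gap between two bursts is where the master
    would deassert cyc_o and let the arbiter switch.
    """
    if num_words < 0 or max_burst_len < 1:
        raise ValueError("num_words must be >= 0 and max_burst_len >= 1")

    bursts = []
    addr = start_addr
    remaining = num_words
    while remaining > 0:
        beats = min(remaining, max_burst_len)
        burst = []
        for i in range(beats):
            is_last = i == beats - 1
            cti = CTI_END_OF_BURST if is_last else CTI_INCREMENT
            burst.append((addr, cti))
            addr += word_bytes
        bursts.append(burst)
        remaining -= beats
    return bursts


if __name__ == "__main__":
    # 20 words starting at 0x100, at most 8 beats per burst
    for n, burst in enumerate(split_into_bursts(0x100, 20)):
        print("burst", n, [(hex(a), format(c, "03b")) for a, c in burst])
```

Keeping the limit as a parameter mirrors stekern's remark that linear bursts have no fixed length limit but are probably best kept short so other masters are not starved.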
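
stekern's constraint at 17:10 (block + set width can't be larger than 13 for a virtually indexed, physically tagged cache unless software takes countermeasures) can be written down as a simple parameter check. The sketch below is a Python model of that arithmetic only, not mor1kx code. The function names and the `page_offset_bits` parameter are hypothetical, and reading 13 as the page offset width of an 8 KiB page is an assumption here.

```python
def way_size_bytes(block_width, set_width):
    """Bytes addressed by the index and block offset bits of one cache way."""
    return 1 << (block_width + set_width)


def vipt_index_fits(block_width, set_width, page_offset_bits=13):
    """True if the cache index lies entirely inside the page offset.

    When this holds, the virtual and physical address agree on every index
    bit, so a virtually indexed, physically tagged lookup needs no extra
    handling in software. page_offset_bits=13 is an assumed default.
    """
    return block_width + set_width <= page_offset_bits


if __name__ == "__main__":
    for block_width, set_width in [(5, 8), (5, 9)]:
        print(
            block_width,
            set_width,
            way_size_bytes(block_width, set_width),
            vipt_index_fits(block_width, set_width),
        )
```

Under this limit the total cache can still grow by adding ways, since extra associativity adds no index bits.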