--- Log opened Sun Mar 01 00:00:02 2015 | ||
stekern | rschmidlin: I'm having problems with what I'm working with and dcache enabled now | 04:22 |
---|---|---|
stekern | so maybe there's a bug in mor1kx | 04:22 |
stekern | ...or this is completely unrelated | 04:22 |
stekern | annoying that or1ksim doesn't implement support for the PL1 bit | 05:12 |
stekern | nope, my dcache problem was a pure sw bug | 07:49 |
stekern | I was invalidating dcache according to NCS instead of 1 << NCS | 07:49 |
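
The 07:49 bug comes down to treating a log2 field as a count. A minimal sketch of the arithmetic follows. It assumes the DCCFGR field layout as recalled from the OpenRISC 1000 architecture manual (NCW in bits 2:0, NCS in bits 6:3, CBS in bit 7). The helper names and the example register value are hypothetical, and this is a model of the loop bounds, not stekern's actual code.

```python
# Sketch only: models the loop bound of a dcache invalidate routine.
# Assumed DCCFGR layout: NCW = bits 2:0, NCS = bits 6:3, CBS = bit 7.

def decode_dccfgr(dccfgr):
    """Return (ways, sets, block_bytes) encoded in a DCCFGR value."""
    ncw = dccfgr & 0x7          # log2(number of ways)
    ncs = (dccfgr >> 3) & 0xF   # log2(number of sets)
    cbs = (dccfgr >> 7) & 0x1   # 0 -> 16-byte blocks, 1 -> 32-byte blocks
    return 1 << ncw, 1 << ncs, 16 << cbs


def invalidate_addresses(dccfgr, buggy=False):
    """Addresses a routine would write to the block-invalidate SPR, one per set.

    With buggy=True the loop runs NCS times instead of 1 << NCS times,
    which is the mistake described in the log.
    """
    _ways, sets, block_bytes = decode_dccfgr(dccfgr)
    ncs = (dccfgr >> 3) & 0xF
    count = ncs if buggy else sets
    return [i * block_bytes for i in range(count)]


# Hypothetical configuration: 1 way, 256 sets, 32-byte lines = 8 KiB.
EXAMPLE_DCCFGR = (1 << 7) | (8 << 3) | 0
```

With this example value the correct loop touches 256 lines and the buggy one only 8, so most of the cache would keep stale contents.
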
rschmidlin | good morning all | 09:04 |
rschmidlin | stekern, the changes are actually unrelated to the cache problems. I only made these changes to allow me to synthesize mor1kx with ISE14.07. Despite the changes, I'm still having the multiple signal drivers issue. | 09:06 |
stekern | rschmidlin: yeah, I know, my comment was only related to your cache | 09:12 |
stekern | problems | 09:13 |
stekern | but you're only having problems with ISE14.07 for spartan3 right? | 09:13 |
rschmidlin | well, I didn't try any other. But yes. I was catching up with the conversation with bandvig | 09:28 |
rschmidlin | I actually implemented his fpu32_v1.0 tag. I certainly want an FPU. But I got stuck with the caches. | 09:29 |
bandvig | hello all! | 09:32 |
rschmidlin | hello! | 09:33 |
rschmidlin | stekern, you are right. Synthesis ran through for Spartan6. | 09:33 |
rschmidlin | I didn't expect that | 09:34 |
bandvig | I've just started synthesis of the atlys (Spartan6 based board) project with ISE14.6. In about 20 minutes I'll report whether there are any issues. But I don't remember having any problem with either the cache or "multiple signal drivers issues". | 09:37 |
rschmidlin | bandvig, my bad. I cross-reported something here. | 09:40 |
rschmidlin | Problem (1): caches are not working with wb_ram_b3.v on an Artix 6. | 09:41 |
rschmidlin | Problem (2): I get multiple drivers for signals when implementing mor1kx for Spartan3A DSP. | 09:41 |
rschmidlin | (1) implementation from Vivado. (2) implementation from ISE14.7. | 09:41 |
stekern | rschmidlin: can you paste the program that fails? | 10:11 |
rschmidlin | http://pastie.org/9991119 | 10:16 |
rschmidlin | sorry, I was actually reproducing the one I have at work. There is still an error in this. | 10:17 |
rschmidlin | I believe this should give you the idea and contain everything I have in it: http://pastie.org/9991129 | 10:22 |
stekern | and your caches are 8kb and the cache line size is 32? | 10:25 |
rschmidlin | http://pastie.org/9991136 | 10:29 |
rschmidlin | stekern, yes | 10:41 |
bandvig | rschmidlin: About problem (1). It looks like I can't help you. My atlys project is very old (in fact the mor1kx is the only fresh part). So, it just doesn't include wb_ram_b3.v. | 10:45 |
rschmidlin | stekern, the data cache (wb_ram_b3.v) gives me a buserr at line 95. wb_ram.v (wb_ram_b3.v with integrated arbiter) works with data cache. The instruction cache gives me a trap instruction at line 63. | 10:45 |
rschmidlin | bandvig, no worries. I should get a ddr memory controller if I really go down the mor1kx way. Though I only wanted to have a simple system to gather some performance measurements. I thought about instantiating a MIG and wrapping and I thought I could use atlys code. But Vivado has a new MIG interface, without possibility for many interfaces and arbitration. I'd have to adapt the complete wrapper. | 10:48 |
gr8 | what happened to the openrisc server? | 10:52 |
bandvig | gr8: if you mean openrisc.net, it is down. | 11:05 |
bandvig | gr8: we discussed it yesterday: http://juliusbaxter.net/openrisc-irc/%23openrisc.2015-02-28.log.html | 11:07 |
stekern | I synced openrisc/linux to v3.19 now | 11:09 |
stekern | rschmidlin: there weren't any problems running your test asm on my sockit at least | 11:21 |
stekern | http://pastie.org/9991180 | 11:21 |
stekern | that's the exact asm I ran | 11:21 |
bandvig | stekern: cool. By the way, I saw in the logs that if I push a commit into openrisc/mor1kx, the information about the commit appears here. | 11:24 |
bandvig | stekern: It could be useful to organize the same for any project from openrisc. | 11:24 |
stekern | bandvig: yes, I know, I turned on the irc notifications on mor1kx. | 11:26 |
stekern | but I want to be careful adding too much of those, it easily becomes noisy if every little commit to every project is notified in the logs | 11:27 |
bandvig | stekern: ok. And I would like to resume our discussion from yesterday about pipelines. | 11:44 |
bandvig | First of all I understood your point about separate REGs for FPU... | 11:44 |
bandvig | Second, I've thought over your proposal with a fresh morning brain :)... | 11:45 |
bandvig | Let me take several lines to describe what I've understood... | 11:45 |
gr8 | bandvig: ok thanks | 11:47 |
bandvig | stekern: Let's use a single issue scheme (ISU denotes "issue unit")... | 12:02 |
bandvig | There is also a kind of queue on the WB side. The queue contains the ordered identifiers of the units from which WB waits for ready signals. | 12:03 |
bandvig | (1 of 4) each time the issue logic places an instruction into one of the parallel pipes, it also sends the unit identifier into the next slot of WB's queue | 12:03 |
bandvig | (2 of 4) a conflict occurs when ready signals have been raised by units other than the one at the WB queue's head. | 12:03 |
bandvig | In that case all such units stall (and the ISU isn't able to put a new instruction into them, even a data-independent one) till the unit whose identifier equals the WB queue head's raises its ready flag. | 12:03 |
bandvig | (3 of 4) After the conflict is resolved (the unit of interest has provided its result) the WB queue advances and grants access to the GPRs to the next unit in order. | 12:03 |
bandvig | (4 of 4) Of course, if WB's queue is full, the whole pipe stalls till the conflict is resolved. | 12:03 |
bandvig | uff... done... :) | 12:04 |
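
The four points above can be modelled in a few lines. This is only a behavioural sketch under stated assumptions: the class and method names are hypothetical, each unit is assumed to hold at most one in-flight instruction, and nothing here reflects actual mor1kx RTL.

```python
from collections import deque


class WritebackQueue:
    """Behavioural model of an in-order write-back queue of unit identifiers."""

    def __init__(self, depth):
        self.depth = depth
        self.order = deque()   # unit ids in issue order, head = next to write back
        self.ready = set()     # units that have raised their ready flag

    def can_issue(self, unit):
        # Point 4: a full queue stalls issue.
        # Assumption: a unit still waiting in the queue cannot accept more work.
        return len(self.order) < self.depth and unit not in self.order

    def issue(self, unit):
        # Point 1: issuing an instruction also records the unit id in the queue.
        if not self.can_issue(unit):
            return False
        self.order.append(unit)
        return True

    def mark_ready(self, unit):
        self.ready.add(unit)

    def writeback(self):
        """One WB step: retire the head unit if it is ready, else return None.

        Points 2 and 3: units that are ready but not at the head simply wait;
        once the head retires, the next unit in order gets access to the GPRs.
        """
        if self.order and self.order[0] in self.ready:
            unit = self.order.popleft()
            self.ready.discard(unit)
            return unit
        return None
```

For example, if "mul" is issued before "alu" and "alu" finishes first, `writeback()` returns `None` until "mul" is marked ready, after which the two retire in issue order.
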
rschmidlin | stekern, thanks a lot. That was from a DDR3 controller, wasn't it? Good to know that it works with the DDR controllers. The Xilinx block rams, and the way they are interpreted in the Wishbone wrapper should be the problem. | 12:32 |
stekern | bandvig: yes, that sounds good. pretty much like how the current storebuffer works for the lsu | 12:36 |
stekern | rschmidlin: yes, it was from DDR3 ram, but I think I have a ram_wb_b3.v instantiation somewhere on this soc too | 12:39 |
stekern | what's the difference between wb_ram_b3.v and ram_wb_b3? | 12:40 |
stekern | looks like I can't even write into that SRAM... | 13:10 |
stekern | and the test would have been moot anyway, since I have it mapped at 0x80000000 and everything above that is uncached anyway | 13:11 |
Me1234 | mailman-owner@lists.openrisc.net 8:00 AM (10 hours ago) | 14:08 |
Me1234 | I got an email from lists.openrisc.net . It means these are DNS problems. | 14:09 |
bandvig | stekern: yes, I also thought about storebuffer as a model for WB queue | 14:16 |
bandvig | In fact the approach is also a good starting point for subsequent performance improvement. | 14:20 |
bandvig | Let me propose a plan for further development of mor1kx... | 14:22 |
bandvig | (1 of 4) The 1-st step (let me repeat). Single issue, paralleled units, conflict control on WB stage. | 14:22 |
bandvig | Proposed code name is Latte (?). (Wikipedia: "A cappuccino differs from a caffè latte in that it is prepared with much less steamed or textured milk than the caffè latte..." | 14:23 |
bandvig | So Latte is more steamed milk (paralleled units) and more textured milk (smarter conflict control)) :)) | 14:23 |
bandvig | (2 of 4) The 2-nd step. Expand the 1-st step by implementing out-of-order completion, i.e. a full featured reorder buffer in WB. | 14:24 |
bandvig | Proposed code name is Marocchino (espresso, steamed milk, cocoa powder). | 14:24 |
bandvig | (3 of 4) The 3-rd step. Expand step 2 with an implementation of the Tomasulo algorithm. By the way, the reservation stations could play the role of separate REGs for each unit. | 14:24 |
bandvig | Proposed code name is Miel (espresso, steamed milk, cinnamon and honey) | 14:24 |
bandvig | (4 of 4) Multi issue extension of step 3. Proposed code name is Lungo (from ORCONF-2014 materials) | 14:24 |
bandvig | To all. Comments? Proposals (about code names for example)? | 14:25 |
bandvig | Personally, I'm planning to start implementing the discussed approach (paralleled units with stalling from WB) in several days. | 14:40 |
stekern | bandvig: sounds like a plan | 14:46 |
rschmidlin | Hmm, I have taken the implementation from mor1kx-generic. It also has ram_wb_b3.v and not wb_ram_b3.v. I don't know anything about the wb_ram_b3.v implementation then. | 14:55 |
rschmidlin | I had a working data cache with ram_wb.v which essentially includes an arbiter before the memory. | 14:57 |
bandvig | stekern: Yes, it is a plan :). Let's say it is my style. Before each FPU iteration I usually made a plan with a set of small steps to achieve the next goal. | 15:02 |
bandvig | stekern: Of course I don't force anybody to follow the proposed 4 steps. | 15:02 |
olofk | bandvig: Like the coffee names and implementation ideas, so Go go go! :) | 16:44 |
olofk | rschmidlin: ram_wb_b3 (from orpsocv2, with built-in arbiter) was never meant for synthesising and shouldn't be used anymore | 16:45 |
olofk | wb_ram (from yours truly) is the way to go. If you find problems with that, I'm all ears | 16:46 |
olofk | oh... I see that three systems still use ram_wb | 16:47 |
olofk | mor1kx-generic and or1200-generic aren't intended for synthesis, so that's no big deal | 16:47 |
olofk | sockit however... | 16:48 |
olofk | ah no.. mor1kx-generic doesn't use ram_wb after all. It uses my Wishbone memory BFM which is definitely not synthesisable | 16:56 |
olofk | Is Nathan Yawn still active btw? | 16:58 |
olofk | I would like to do a new release of adv_debug_sys | 16:58 |
olofk | With the patches we have gathered in FuseSoC | 16:58 |
rschmidlin | I don't think so. But you can certainly contact him about the patches. | 17:02 |
olofk | Nope. Wrong again. mor1kx-generic does use ram_wb | 17:05 |
rschmidlin | olofk, what should matter is if we can get an internal memory based on block/distributed rams for FPGAs working with mor1kx. | 17:05 |
olofk | rschmidlin: Yes. The whole idea with wb_ram was to have a synthesisable RAM with multiple backends for different FPGAs | 17:05 |
olofk | But I only ever wrote a generic backend | 17:06 |
rschmidlin | olofk, Vivado and ISE are able to infer block rams from that code Olof. | 17:06 |
rschmidlin | so it must be good enough for Xilinx already. | 17:07 |
olofk | rschmidlin: I had to do some tricks to get that working with ISE, and it turns out there are still some problems with Quartus | 17:07 |
rschmidlin | olofk, if you are looking for something usable with Quartus, strip out the Wishbone things and go with this: http://opencores.org/websvn,filedetails?repname=minsoc&path=%2Fminsoc%2Ftrunk%2Frtl%2Fverilog%2Fminsoc_onchip_ram_top.v | 17:10 |
rschmidlin | It simply generates banks of 4 8-bit memory blocks and muxes them together correctly. | 17:10 |
olofk | rschmidlin: Well, there's another thing that made things a bit more complicated | 17:11 |
rschmidlin | olofk, but I'd leave ram_wb_b3.v as it is in case it is already working with Quartus | 17:11 |
olofk | I wanted to have the option to preload the RAM from a verilog memory file | 17:12 |
olofk | That works fine in simulation (and ISE I think) | 17:13 |
olofk | But Quartus decides to split up my 32-bit memory to 4 8-bit memories (even if they have a primitive with byte-wise write enables) | 17:13 |
olofk | And when it splits up the memory it no longer loads the data | 17:13 |
olofk | I wanted the preloading so that I could have a bootloader that could also hold some volatile data | 17:14 |
olofk | But I've decided to drop that idea since the FPGA tools just won't play nice | 17:14 |
olofk | So then 4 8-bit memories work fine, even if it's a bit of waste of memories | 17:15 |
olofk | Another way to work around that would be to directly instantiate Altera primitives to get one 32-bit memory with byte-wise write enable | 17:16 |
olofk | But then it can only load their stupid .mif format | 17:16 |
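
The trade-off olofk describes (one 32-bit memory with byte-wise write enables versus four 8-bit memories) can be pictured with a small behavioural model. This is a sketch with hypothetical names, assuming lane 0 holds bits 7:0 and a Wishbone-style 4-bit byte select. It says nothing about how Quartus or ISE actually map the memory.

```python
class ByteLaneRam:
    """32-bit word RAM modelled as four independent 8-bit memories."""

    def __init__(self, words):
        # One list per byte lane; lane 0 is assumed to hold bits 7:0.
        self.lanes = [[0] * words for _ in range(4)]

    def write(self, word_addr, data, sel):
        # sel is a 4-bit byte-select mask; bit n enables a write to lane n.
        for lane in range(4):
            if sel & (1 << lane):
                self.lanes[lane][word_addr] = (data >> (8 * lane)) & 0xFF

    def read(self, word_addr):
        # Reassemble the 32-bit word from the four lanes.
        return sum(self.lanes[lane][word_addr] << (8 * lane) for lane in range(4))

    def preload(self, image):
        # A 32-bit image has to be split per lane before it can initialise
        # four separate memories; here that is just a full-width write.
        for addr, word in enumerate(image):
            self.write(addr, word, 0xF)
```

The `preload` step is where the split matters: an initialisation file written for one 32-bit memory no longer lines up once the memory is implemented as four separate byte-wide arrays.
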
rschmidlin | ahh, you want the synthesizer to initialize the memory for you? | 17:16 |
olofk | Yes. That works mostly fine except for the tool bugs | 17:17 |
olofk | But I'm giving up on that now and will create a wb_rom component instead that can be preloaded in a portable way | 17:18 |
olofk | Then I'll probably drop the initialization code from wb_ram, or at least explain that it's broken on some devices (Cyclone IV at least) | 17:18 |
rschmidlin | I get it. However, my current problem seems to be the interaction between the caches' Wishbone interfaces and the memory Wishbone wrapper. | 17:18 |
olofk | Did you run simulations on it? | 17:19 |
rschmidlin | olofk, I believe there is already a rom module simply written out by a script, isn't there? | 17:19 |
olofk | rschmidlin: Yes there is, but that one is a bit awkward | 17:19 |
olofk | I'm just dropping the write enables from wb_ram | 17:20 |
olofk | That makes it easier to switch contents at compile-time | 17:20 |
olofk | (see or1k_bootloaders) | 17:20 |
rschmidlin | olofk, the simulations work. I was discussing that with stekern the whole weekend. He told me he has already heard of issues where Xilinx memories don't behave according to their description, and that the caches are somewhat picky about burst transactions. But I'm left somewhat clueless on what to do. My next step will be to deny bursts first, see if the system works, and then try to put them back in. | 17:20 |
rschmidlin | I thought about going over that libelf situation with fusesoc on mac this weekend. But I had other things to do. | 17:22 |
olofk | rschmidlin: I remember that the synthesis and simulation behaved differently when I first tested wb_ram on a spartan6. Had to create workarounds in the code | 17:27 |
olofk | About libelf, what we want to do is to make fusesoc pick up include files that aren't in the standard gcc include path I think | 17:28 |
olofk | we should use fusesoc.conf for that | 17:28 |
olofk | fusesoc.conf could be used for a lot of other things too, like setting paths to the EDA tools and enabling the monochrome mode | 17:29 |
olofk | Right now it only has one option I think :) | 17:29 |
olofk | Can I pass a linker script to or1k-elf-as? | 17:33 |
stekern | olofk: you can pass it to gcc (which will pass it to ld) | 18:02 |
stekern | or directly to ld, it's not called a *linker* script for nothing ;) | 18:03 |
ams | olofk: -Xl switch | 19:14 |
ams | olofk: or -Wl | 19:14 |
ams | olofk: oh, script .. not reading well.. but same thing .. -T for script name | 19:14 |
olofk | hahaha I found a limitation in the icecube2 software. The line listing which verilog files to use seems to have a maximum length of 2047 characters | 20:46 |
olofk | Ah ok. It's possible to put each file on a separate line | 20:47 |
--- Log closed Mon Mar 02 00:00:04 2015 |