IRC logs for #openrisc Saturday, 2015-02-28

--- Log opened Sat Feb 28 00:00:00 2015
stekernrschmidlin: thanks, I'll take a look at that.04:13
stekernyou mentioned you had to hack up some always @* too, do you have a diff with that?04:13
stekernat least the first multi source complaint is assigned in such always @* statement04:16
stekernbah, what good is notes when you can't find where you put them?04:47
GeneralStupidi have one question which drives me crazy :)09:44
GeneralStupidnow i have built the OpenRISC, (ok need testing etc... but dont talk about that), then i want to add an IP Slave... How do i do this?09:44
GeneralStupidDo i have to add the verilog files to Quartus?09:45
GeneralStupidAnd, is it possible to mix verilog and vhdl?09:45
olofkGeneralStupid: Yes, it's possible to mix verilog and vhdl when you build for an FPGA target. It's more tricky for simulation. The vendors charge an insane amount of money for an extra mixed-language license09:51
olofkWhat kind of core do you want to add?09:51
GeneralStupidolofk: i would like to add a custom ipslave for calculating something i already have in hardware and software. A gammatone filter for example.09:53
olofkIs it an existing core, or are you planning to write it?09:55
GeneralStupidpartially existing, but i have to write it for my job09:56
stekernfun, my uart driver works with or1ksim and verilator sim, but not in hw09:56
stekernbet it's the baud div reg that doesn't get written with the right value09:56
olofkstekern: uart16550?09:57
GeneralStupidolofk: i hope i dont have to simulate everything together...09:58
GeneralStupidi also like the Qsys, Nios II stuff in theory. But 400$ is a lot of money :-D10:00
stekernolofk: yes10:59
olofkI'm doing a tagged new release of the core on github now11:02
stekernbut this is a sw bug11:02
olofkRight now I really regret having patches automatically applied11:02
olofkAh right.11:02
olofkDid you look at the newlib code?11:02
stekernturns out that there's uart output before the init code is run11:03
stekernso, since the baudrate div reg is 0, there's no uart transmit11:03
stekern...but verilator and or1ksim doesn't care about the baudrate div reg, so it happens to work there11:04
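The failure described above can be illustrated with a toy model. This is a minimal sketch, not the driver under discussion: the `UartModel` class, its `ignore_divisor` flag, the 50 MHz clock and the 115200 baud rate are all assumptions made for illustration. It relies only on the usual 16550 convention that the divisor latch holds clock / (16 * baud), and on the behaviour reported in the log that a divisor of 0 gives no transmit on hardware while the simulators ignore the register.

```python
def uart16550_divisor(clk_hz: int, baud: int) -> int:
    """Divisor latch value for a 16550-style UART (16x oversampling)."""
    if clk_hz <= 0 or baud <= 0:
        raise ValueError("clock and baud rate must be positive")
    return round(clk_hz / (16 * baud))


class UartModel:
    """Hypothetical toy UART: transmits only if the baud divisor is non-zero.

    ignore_divisor=True mimics a simulator that does not model the
    baud rate generator (an assumption based on the log, not on any
    simulator's source code).
    """

    def __init__(self, ignore_divisor: bool = False):
        self.divisor = 0          # reset value: baud generator not running
        self.ignore_divisor = ignore_divisor
        self.sent = []

    def init(self, clk_hz: int, baud: int) -> None:
        """Program the divisor latch; must run before any output."""
        self.divisor = uart16550_divisor(clk_hz, baud)

    def putc(self, ch: str) -> bool:
        """Return True if the character actually left the UART."""
        if self.divisor == 0 and not self.ignore_divisor:
            return False          # hardware: nothing is transmitted
        self.sent.append(ch)
        return True


if __name__ == "__main__":
    hw = UartModel()
    sim = UartModel(ignore_divisor=True)
    print("hw before init:", hw.putc("A"))    # output before init is lost
    print("sim before init:", sim.putc("A"))  # simulator happens to work
    hw.init(50_000_000, 115200)               # assumed clock and baud rate
    print("hw after init:", hw.putc("A"))
```

With these assumed numbers the divisor comes out as 27. The point of the sketch is only the ordering: output issued before `init` is dropped on the hardware-like model but accepted by the simulator-like one.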
stekernno, I didn't look at the newlib code, I used a driver from the omap3 platform11:04
olofkAh right. That's the benefit of using a standard interface (in theory :))11:05
stekernhah, it was even worse... It doesn't even call the init function from anywhere...11:11
stekernstill not working, but I think that's due to the first problem...11:16
olofkDoing some fixes for the patch handling in FuseSoC. Just sucks that I can't really test it :)11:17
stekernbah, now the uart's working, but there's other problems11:33
bandvigHello, all! What is going on with OpenRISC.NET? It looks like it is down.15:42
juliusbbandvig: yeah it's been down for a couple of weeks now I think. olofk had a plan I believe16:30
bandvigjuliusb: thanks16:53
bandvigwell... let me announce that I've finished single precision FPU for cappuccino pipeline17:00
bandvigI'm planning to write more detailed description of achievements in mailing list as soon as OpenRISC.NET up17:00
bandvigThe new FPU implementation is (let say) pipeline ready...17:02
bandvigThat means:17:05
bandvig(1) it is potentially expandable to ability to execute most of FP instructions (without data dependency) one by one...17:06
bandvig(2) it is potentially expandable to ability to execute most of FP instructions in parallel with integer and load/store ones (again in case of  data independence)17:08
bandvigHowever, it looks like cappuccino pipe isn't very suitable to implement these features.17:10
bandvigSo, currently each FP instruction stalls pipe till it finishes (analogue to other multi-clock instructions like integer division)17:11
bandvigAs declared in ORCONF-2014 materials, there is a plan to implement more complex pipeline (name proposal: lungo) with multi-issue and out of order execution...17:17
bandvigIt is good ambition. However, personally I would like to start with a simpler step, namely to implement a pipe similar to Beyond Semiconductor's BA25...17:21
olofkbandvig: That's awesome!17:27
bandvigInitially even without out of order completion, but with the possibility to execute FP/Integer/Load-Store instructions in parallel in case of data independence17:27
olofkWe still have the mailing list at lists.opencores.org17:27
olofkI guess this is one of those rare situations where we are actually helped by the dual lists :)17:28
olofkand I'll take care of in a few days17:28
bandvigolofk: I've even two accounts OpenCores. However, It was strange, but I had a quite long way to achieve correct registration (with some exchange of letters with OpenCores admins). At all, after several iterations, I wasn't successful to get subscription on :( So, I'm looking forward openrisc.net17:42
GeneralStupidwhats the problem with opencores?17:43
GeneralStupidi thought openrisc is the initial project ?!17:43
olofkbandvig: Aha. I thought you could subscribe to the mailing list without an account17:43
olofkGeneralStupid: Long story with different people wanting different things some years ago17:44
GeneralStupidolofk: ohh :( i have to say opencores is a bit confusing for me. but i like the idea of it17:45
olofkGeneralStupid: Yes. I think that's one thing we can all agree on :)17:46
GeneralStupidi think there should be more guidance17:51
GeneralStupidthat whole stuff is really complex... and IMHO the most people want just use that stuff for their own developments and not getting experts in it first17:52
olofkGeneralStupid: Yes, it is a very steep learning curve, and we have known about this problem for years. It is getting a bit more straight-forward every year but we still got a long way to go17:54
stekernbandvig: the problem is that the fpu use the normal regs17:55
stekern(that has some nice properties too though)17:55
stekernso, unless you can finish the FPU operation before wb, I'm not sure there are much benefit of not stalling already in execute17:57
GeneralStupidolofk: i think fusesoc is the right way.17:57
olofkGeneralStupid: I hope so too, but it is also intended to be quite low-level and I would love if someone did some GUI or CLI on top of it to make certain operations easier17:58
GeneralStupidolofk: whats now needed is some tool like Qsys ... or something which can manage "easily" IPSlaves or create custom IP Slaves (with pre defined interface)17:59
GeneralStupidolofk: yes. :)17:59
olofkI have been experimenting with IP-XACT and Kactus2 for top-level generation. Hope to push some of the results soonish17:59
olofkThat would cover some of the areas that qsys do17:59
GeneralStupidolofk: i am working at a big institute... and the work which i want to do is very interesting for the professor - if i do it well i can probably do my doctoral thesis with him...18:00
olofkBut using standards and open source tools instead of proprietary vendor-lock-in crap18:00
bandvigolofk: I even tried to subscribe to opencores' lists without an opencores account. I sent a subscription request through the web form, but haven't got any e-mail response. I tried again, but without success as the subscription engine reported that such an account already exists. I tried to send a post and it was rejected. I stopped my experiments with lists on opencores. Now I don't even remember exactly my account for the list (but I'm able to login in Op18:00
GeneralStupidolofk: so iam very interested and i have to take time for that...18:00
GeneralStupidolofk: probably i could try write something with Qt ...18:00
olofkGeneralStupid: That's great. We have had quite a few doctor grade students coming in here to help out. It's been very good18:01
GeneralStupidolofk: we are using MUCH proprietary stuff... The work i want to do is standard at our institute (IMS) but they wanted to see if it is working with open source too...18:01
olofkbandvig: Ahh. right. That's probably because they auto-created an mailing list account from your opencores account. Yeah. That sucks, and it's not easy to get help :/18:02
olofkGeneralStupid: Well, we are happy to help if they are interested in learning more about the open source options18:03
GeneralStupidolofk: i really think they are, but if i understand them right, they already tried the leon2 and it took too long to set it up correctly.18:05
GeneralStupidolofk: compared to nios II18:05
GeneralStupidolofk: and we are more focussed on the zedboards18:06
olofkIn terms of raw processing power, it makes no sense to compete with the hard ARM cores18:06
olofkOpenRISC has its advantage in that it's portable between FPGA vendors and can be used for ASIC without paying license fees18:07
olofkThe open source aspect also allows people to make changes to its internals18:08
olofkLike adding a kickass FPU :)18:08
bandvigGeneralStupid: please, see my adventure with subscription on opencore's lists I described to olofk. I had quite similar process with creating accounts on OpenCores site. As a result I've two OpenCores accounts :)18:11
GeneralStupidok so is coming?18:11
olofkTime to break FuseSoC!18:12
bandvigwell, let's come back to pipelines...18:13
GeneralStupidthe idea is, getting a lot of pre configured openrisc's in one program?18:18
stekernbandvig: btw, when/if you feel that your fpu work is ready to be merged in to master, give us a ping18:33
stekernI still haven't got around to take a closer look at it, just some overview, but if it's detached enough I don't see a reason why you can't continue with it in master branch18:34
bandvigstekern: it is already available in  I think I'll create pull request soon18:36
stekernyes, I know. ;) and that sounds great18:38
olofkall right then. Most cores in orpsoc-cores should work now18:43
olofkMake sure to do a 'fusesoc update'18:44
olofkOr perhaps just a git pull in orpsoc-cores. Don't really remember what fusesoc update does18:44
olofkGeneralStupid: It would be great if you could remove your changes to orpsoc-cores as well as the files in the tar ball I linked to yesterday, update orpsoc-cores and see if it works now18:54
stekernolofk: time to give your updated build instructions for or1k-elf-gcc another round...18:55
olofkstekern: What changed?18:55
olofkI've been planning to do that anyway to point to the newlib tarball instead of git18:56
stekernnothing, I'm sitting at my workstation now and realised that my or1k-elf- toolchain is 'ancient'18:57
stekernand contains blueCmd's atomic implementation with syscalls...18:57
olofkah :)18:58
bandvigstekern: let's say it is mostly detached and concentrated in the pfpu32 sub-folder, however I also made some additions in cappuccino's "decode", "decode_execute", "execute", "execute_ctrl", "ctrl" and created additional connections through the whole cappuccino pipe.19:03
stekernbandvig: that's fine, as long as the changes are no-ops when the fpu is disabled19:05
stekernor at least doesn't cause any additional logic nor critical paths19:05
bandvigstekern: yes, all of them are placed under FEATURE_FPU, I've just mentioned about it to clarify the reason I wasn't hasty to push FPU into master branch. Well, lets back to pipeline discussion.19:08
bandvigI don't see a problem with sharing GPR among Integer/FP units. I think there is not enough parallelism in cappuccino pipe. Let's compare: ...19:11
bandvig(1 of 3) cappuccino: pc->fetch->decode->execute(with stall)->ctrl/mem->wb19:12
bandvig(2 of 3) ba25: pc->fetch->decode->parallel(integer/fp/load_store/ctrl WITH out of order completion)->wb19:12
bandvig(3 of 3) my proposal: pc->fetch->decode->parallel(integer/fp/load_store/ctrl WITHOUT out of order completion)->wb19:12
stekernwell, if you are doing it in parallel, it'll be out of order, no?19:16
stekernI mean, the writes to the rf will not be in order if you're going to allow instructions coming after a FPU instruction to finish before the FPU instruction19:19
stekernolofk: for some reason when I try to wget the newlib tarball, it fails19:21
olofkstekern: Maybe I should have tested that first19:21
stekernit claims the file is not available, but if I go to the ftp and download it manually the file is there19:21
olofkI'll revert the change19:21
stekernand you are missing the 'tar zxvf' instruction19:22
stekernhow do I extract a tar.gz?19:22
stekernah! you have an extra 2 in the filename19:22
olofkI reverted it19:24
olofkI haven't tried building from the tar ball after all19:24
stekernwell I'm doing it now...19:25
olofkWell, you know what to do if it works :)19:25
bandvigI'm planning to organize, say, "limited" parallelism... For example, "Integer pipelined MUL" potentially could be paralleled with L/S and FPU-op, but instruction issue logic has to take into account the length of each unit, so all multi-cycle issued instructions must reach WB in issue order19:25
stekernbandvig: oh, ok. I/we have a version of the integer mul that does that19:27
stekerni.e. the mul is pipelined along the pipeline and its result is ready in wb stage19:28
stekernso it's done in 'parallel' with whatever happens to be in the pipeline19:29
stekerndo I understand you correctly that you want to expand that, and stall the pipeline from wb stage until the FPU instruction is ready there?19:30
stekernthat way you could have several slow instructions working in parallel in execute and ctrl stage19:31
stekernfor instance another FPU instruction or a cache refill19:32
bandvigto implement pipelined mul in cappuccino pipe it was necessary to create ... how to say ... glue logic that looks like crutches. I want to re-design the pipe to an ideology which is able to include more sophisticated operations in a systematic (and I believe simpler and more straightforward) way. Let me clarify...19:40
bandvigLet's say FPU-MUL takes 6 clocks and IntMUL takes 3 clocks. No dependency. FPU-MUL is issued first. Issue logic waits 4 clocks and issues IntMUL. FPU-MUL and IntMUL reach WB in issue order, however they operate in parallel during 2 clocks.19:45
stekernok, that's basically what I described, but with the added restriction that you need to know the number of clocks the following instruction(s) will take19:49
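The FPU-MUL / IntMUL example above can be checked with a small scheduling model. This is a sketch under stated assumptions, not bandvig's actual issue logic: the function names are made up, each unit is assumed fully pipelined with a fixed latency known at issue time, there are no data dependencies, and an instruction issued at cycle `t` with latency `n` is taken to reach WB at cycle `t + n`.

```python
def issue_schedule(latencies):
    """Return [(issue_cycle, wb_cycle), ...] for a single-issue pipe in which
    results must reach writeback in issue order.

    Each instruction is delayed just long enough that it issues at least one
    cycle after its predecessor and reaches WB after its predecessor does.
    """
    schedule = []
    prev_issue = prev_wb = None
    for lat in latencies:
        if lat < 1:
            raise ValueError("latency must be at least 1 cycle")
        if prev_issue is None:
            issue = 0
        else:
            # Single issue: not before prev_issue + 1.
            # In-order WB: issue + lat must be at least prev_wb + 1.
            issue = max(prev_issue + 1, prev_wb + 1 - lat)
        wb = issue + lat
        schedule.append((issue, wb))
        prev_issue, prev_wb = issue, wb
    return schedule


def overlap_cycles(a, b):
    """Number of cycles during which two scheduled instructions both execute."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


if __name__ == "__main__":
    fpu_mul, int_mul = issue_schedule([6, 3])   # latencies from the log
    print("FPU-MUL issue/wb:", fpu_mul)
    print("IntMUL  issue/wb:", int_mul)
    print("cycles in parallel:", overlap_cycles(fpu_mul, int_mul))
```

Under these assumptions the model gives the same numbers as the log: IntMUL is held back 4 clocks, the two operations overlap for 2 clocks, and they reach WB in issue order. It also makes stekern's point visible: the `lat` of the following instruction has to be known before it can be issued.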
bandvigI think the BA25-like architecture is more suitable for such an approach than cappuccino19:50
rschmidlinstekern, I have dropped the git diff log in my google drive again and shared it with you.20:00
rschmidlinstekern, in case you find the time of course.20:01
bandvigA question to all. :)  As I'm not a coffee expert :) which code name is suitable for such a pipe? ... after some reading of wikipedia... Cortado? Latte? Something else?20:02
olofkIrish coffee would make sense if it's an out-of-order pipeline :)20:03
stekernbandvig: how decoupled are the FPU instructions, can you issue several FPU instructions at the same time?20:04
stekernrschmidlin: I'll take a look20:06
stekernbandvig: because, I think what you propose will have most benefit for FPU instructions, given that basically in the current state, only the integer div is such an instruction that stalls the pipeline and the number of instructions are known20:07
bandvig(1) It should be a single issue pipe. (2) FpMUL, FpADD, FpSUB, FpCompare, FpToInt, IntToFp are pipelined completely. They can be paralleled in the described sense. FpDiv could be paralleled with almost all of them excluding FpMUL (they share the multiplier).20:12
stekernI still think letting the instructions propagate into wb stage and stall from there (but unfinished instructions in execute and mem/ctrl can continue until they are finished) might be better than to depend on the number of cycles of the following instructions20:13
stekernstalling from wb stage is hairy though, especially in cappuccino since it's not designed for that, so you're right that it'll be hard to do it there20:14
stekernbut all that is of course under the assumption that there are no dependencies, which there will be20:15
olofkAwesome! I can build bitstreams for the Lattice iCE40 devices now with FuseSoC. Too bad I don't have any actual devices to try this with :)20:17
olofkAnyone got an iCE40 around?20:18
stekernrschmidlin: ok, that was less intrusive than you made it sound20:18
stekernolofk: newlib from tar compiled and installed at least20:20
bandvigAnd yes, as I wrote to the list, I'm not interested in yet another super small micro-controller. I'm interested in a CPU with performance level close to ARM Cortex-A8/A9 including quite powerful FPU32/64 arithmetic, but not in fancy DSP-like instructions (I prefer to build convolution engines in hardware). I treat FPGA mostly as a prototyping platform.20:21
olofkstekern: It would be great if you could compile and run a program that prints more than 13 characters too20:21
stekernolofk: who would need that many characters?20:28
olofkYeah I agree. Hello world!\n is 13. I've never felt the need to write anything more than that20:32
bandvigstekern: as far as I understand, your approach is closer to out of order completion with some how imitation of reorder buffer. I think, in the approach the issuing, for example, two consecutive independent FpMUL will require more complex control inside FPU-MUL pipe, while with my approach the complexity is concentrated in issue logic. As raiment it requires a priori knowledge about units architecture.20:33
bandvig" As payment ... " of course20:36
stekernI think the main win I'm after would be to be able to parallelize cache refills with alu instructions20:38
stekernor non-cached loads20:39
stekernbut you could already achieve that by not stalling the start of a multicycle instructions in execute stage if it doesn't depend on the result of the load as things stand today in cappuccino20:40
stekernunless the load comes after the multicycle instruction, that's where there would be a win20:41
bandvigstekern: as far as I understand, even if there isn't a dependency on the load result, your proposal still requires a kind of reorder buffer because the register file update must be performed in order. In fact, as I don't want to implement a reorder buffer (at least at this step), my scheme has got some drawbacks compared to out-of-order-completion + reorder buffer. In case of cache miss (un-cached access) I plan to stall issue logic and other pipes t21:11
bandvigbus "ACK".21:11
stekernno, what I proposed was to wait in writeback stage, leaving mem stage to be able to perform timeconsuming work while waiting21:16
stekernif mem stage finishes before the multicycle instruction in wb, it stalls until the wb instruction has finished21:17
stekernpretty much like the pipelined mul works, but with the addition that it can stall too21:17
stekern(the pipelined mul will always be finished when it reaches wb)21:18
stekernI guess another approach (or way to explain what I mean) would be to add a separate 'multicycle execute stage', located after the mem stage21:20
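The stall-in-writeback idea can be sketched in the same spirit. This is a rough model under loud assumptions, not a description of cappuccino or of any existing RTL: the function name is hypothetical, one instruction is issued per cycle regardless of latency, there are no data dependencies, and back-pressure from the finite number of pipeline stages is ignored. An instruction whose result is ready early simply waits until everything ahead of it has retired.

```python
def retire_in_order(latencies):
    """Return [(issue, ready, retire), ...] for a pipe that issues one
    instruction per cycle and stalls at writeback instead of at issue.

    ready  = cycle in which the result is available
    retire = cycle in which it is written back, never before an older
             instruction has retired (in-order register file update).
    """
    timeline = []
    prev_retire = -1
    for issue, lat in enumerate(latencies):
        if lat < 1:
            raise ValueError("latency must be at least 1 cycle")
        ready = issue + lat
        retire = max(ready, prev_retire + 1)   # wait behind older instructions
        timeline.append((issue, ready, retire))
        prev_retire = retire
    return timeline


if __name__ == "__main__":
    # Same example latencies as in the log: a 6-clock op followed by a 3-clock op.
    for issue, ready, retire in retire_in_order([6, 3]):
        print(f"issue={issue} ready={ready} retire={retire} "
              f"waits={retire - ready}")
```

In this simplified model the issue logic needs no knowledge of unit latencies, which is the trade-off being debated: the waiting moves from the issue stage to the end of the pipe, and the hardware has to hold finished results there until the older instruction completes.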
bandvigstekern: Thanks for discussion. As I'm really want to sleep :) I'm going to think on your proposal tomorrow. Lets see soon.21:25
stekernbandvig: ditto. I'm not trying to steer you away from your ideas, just trying to ponder different angles of it ;)21:26
stekernbut basically what I said about separate FPU and mul regs applies, that's how other architectures solved the problem. That way the multicycle instructions can 'fork off' and don't have to bother with dependencies (other than of their own type)21:28
stekernthat's not an option for us though, and using the GPRs have some nice properties as well21:29
bandvigbe sure, I perceive your ideas exactly in that way. And I'm always open to alternative ideas.21:29
stekernolofk: I can print more than 13 characters21:58
olofkstekern: I say that's roughly 100% code coverage23:06
--- Log closed Sun Mar 01 00:00:02 2015
