--- Log opened Sat Feb 28 00:00:00 2015
stekern | rschmidlin: thanks, I'll take a look at that. | 04:13 |
---|---|---|
stekern | you mentioned you had to hack up some always @* too, do you have a diff with that? | 04:13 |
stekern | at least the first multi source complaint is assigned in such always @* statement | 04:16 |
stekern | bah, what good is notes when you can't find where you put them? | 04:47 |
GeneralStupid | i have one question which drives me crazy :) | 09:44 |
GeneralStupid | now i have built the OpenRISC, (ok need testing etc... but dont talk about that), then i want to add an IP Slave... How do i do this? | 09:44 |
GeneralStupid | Do i have to add the verilog files to Quartus? | 09:45 |
GeneralStupid | And, is it possible to mix verilog and vhdl? | 09:45 |
olofk | GeneralStupid: Yes, it's possible to mix verilog and vhdl when you build for an FPGA target. It's more tricky for simulation. The vendors charge an insane amount of money for an extra mixed-language license | 09:51 |
olofk | What kind of core do you want to add? | 09:51 |
GeneralStupid | olofk: i would like to add a custom ipslave for calculating something i already have in hardware and software. A gammatone filter for example. | 09:53 |
olofk | Is it an existing core, or are you planning to write it? | 09:55 |
GeneralStupid | partially existing, but i have to write it for my job | 09:56 |
stekern | fun, my uart driver works with or1ksim and verilator sim, but not in hw | 09:56 |
stekern | bet it's the baud div reg that doesn't get written with the right value | 09:56 |
olofk | stekern: uart16550? | 09:57 |
GeneralStupid | olofk: i hope i dont have to simulate everything together... | 09:58 |
GeneralStupid | i also like the Qsys, Nios II stuff in theory. But 400$ is a lot of money :-D | 10:00 |
stekern | olofk: yes | 10:59 |
olofk | I'm doing a tagged new release of the core on github now | 11:02 |
stekern | cool | 11:02 |
stekern | but this is a sw bug | 11:02 |
olofk | Right now I really regret having patches automatically applied | 11:02 |
olofk | Ah right. | 11:02 |
olofk | Did you look at the newlib code? | 11:02 |
stekern | turns out that there's uart output before the init code is run | 11:03 |
olofk | oops | 11:03 |
stekern | so, since the baudrate div reg is 0, there's no uart transmit | 11:03 |
stekern | ...but verilator and or1ksim don't care about the baudrate div reg, so it happens to work there | 11:04 |
stekern | no, I didn't look at the newlib code, I used a driver from the omap3 platform | 11:04 |
olofk | Ah right. That's the benefit of using a standard interface (in theory :)) | 11:05 |
stekern | hah, it was even worse... it doesn't even call the init function from anywhere... | 11:11 |
olofk | oh. | 11:11 |
stekern | still not working, but I think that's due to the first problem... | 11:16 |
olofk | Doing some fixes for the patch handling in FuseSoC. Just sucks that I can't really test it :) | 11:17 |
stekern | bah, now the uart's working, but there's other problems | 11:33 |
bandvig | Hello, all! What is going on with OpenRISC.NET? It looks like it is down. | 15:42 |
juliusb | bandvig: yeah it's been down for a couple of weeks now I think. olofk had a plan I believe | 16:30 |
bandvig | juliusb: thanks | 16:53 |
bandvig | well... let me announce that I've finished single precision FPU for cappuccino pipeline | 17:00 |
bandvig | I'm planning to write a more detailed description of the achievements on the openrisc.net mailing list as soon as OpenRISC.NET is up | 17:00 |
bandvig | The new FPU implementation is (let say) pipeline ready... | 17:02 |
bandvig | That means: | 17:05 |
bandvig | (1) it is potentially expandable to ability to execute most of FP instructions (without data dependency) one by one... | 17:06 |
bandvig | (2) it is potentially expandable to ability to execute most of FP instructions in parallel with integer and load/store ones (again in case of data independence) | 17:08 |
bandvig | However, it looks like cappuccino pipe isn't very suitable to implement these features. | 17:10 |
bandvig | So, currently each FP instruction stalls pipe till it finishes (analogue to other multi-clock instructions like integer division) | 17:11 |
bandvig | As declared in ORCONF-2014 materials, there is a plan to implement more complex pipeline (name proposal: lungo) with multi-issue and out of order execution... | 17:17 |
bandvig | It is good ambition. However, personally I would like to start with a simpler step, namely to implement a pipe similar to Beyond Semiconductor's BA25... | 17:21 |
olofk | bandvig: That's awesome! | 17:27 |
bandvig | Initially even without out of order completion, but with the possibility to execute FP/Integer/Load-Store instructions in parallel in case of data independence | 17:27 |
olofk | We still have the mailing list at lists.opencores.org | 17:27 |
olofk | I guess this is one of those rare situations where we are actually helped by the dual lists :) | 17:28 |
olofk | and I'll take care of openrisc.net in a few days | 17:28 |
bandvig | olofk: I even have two OpenCores accounts. It was strange, but it took me quite a long way to get a correct registration (with some exchange of letters with the OpenCores admins). In the end, after several iterations, I didn't succeed in subscribing to lists.opencores.org. :( So, I'm looking forward to openrisc.net | 17:42 |
GeneralStupid | whats the problem with opencores? | 17:43 |
GeneralStupid | i thought openrisc is the initial project ?! | 17:43 |
olofk | bandvig: Aha. I thought you could subscribe to the mailing list without an account | 17:43 |
olofk | GeneralStupid: Long story with different people wanting different things some years ago | 17:44 |
GeneralStupid | olofk: ohh :( i have to say opencores is a bit confusing for me. but i like the idea of it | 17:45 |
olofk | GeneralStupid: Yes. I think that's one thing we can all agree on :) | 17:46 |
GeneralStupid | i think there should be more guidance | 17:51 |
GeneralStupid | that whole stuff is really complex... and IMHO most people just want to use that stuff for their own developments and not become experts in it first | 17:52 |
olofk | GeneralStupid: Yes, it is a very steep learning curve, and we have known about this problem for years. It is getting a bit more straight-forward every year but we still got a long way to go | 17:54 |
stekern | bandvig: the problem is that the fpu use the normal regs | 17:55 |
stekern | (that has some nice properties too though) | 17:55 |
stekern | so, unless you can finish the FPU operation before wb, I'm not sure there is much benefit in not stalling already in execute | 17:57 |
GeneralStupid | olofk: i think fusesoc is the right way. | 17:57 |
olofk | GeneralStupid: I hope so too, but it is also intended to be quite low-level and I would love if someone did some GUI or CLI on top of it to make certain operations easier | 17:58 |
GeneralStupid | olofk: whats now needed is some tool like Qsys ... or something which can manage "easily" IPSlaves or create custom IP Slaves (with pre defined interface) | 17:59 |
GeneralStupid | olofk: yes. :) | 17:59 |
olofk | I have been experimenting with IP-XACT and Kactus2 for top-level generation. Hope to push some of the results soonish | 17:59 |
olofk | That would cover some of the areas that qsys do | 17:59 |
GeneralStupid | olofk: i am working at a big institute... and the work which i want to do is very interesting for the professor - if i do it well, probably i can do my doctoral thesis with him... | 18:00 |
olofk | But using standards and open source tools instead of proprietary vendor-lock-in crap | 18:00 |
bandvig | olofk: I even tried to subscribe to the opencores lists without an opencores account. I sent a subscription request through the web form, but didn't get any e-mail response. I tried again, but without success, as the subscription engine reported that such an account already exists. I tried to send a post and it was rejected. I stopped my experiments with the lists on opencores. Now I don't even remember exactly my account for the list (but I' able to login in Op | 18:00 |
GeneralStupid | olofk: so iam very interested and i have to take time for that... | 18:00 |
GeneralStupid | olofk: probably i could try write something with Qt ... | 18:00 |
olofk | GeneralStupid: That's great. We have had quite a few doctor grade students coming in here to help out. It's been very good | 18:01 |
GeneralStupid | olofk: we are using MUCH proprietary stuff... The work i want to do is standard at our institute (IMS) but they wanted to see if it is working with open source too... | 18:01 |
olofk | bandvig: Ahh. right. That's probably because they auto-created a mailing list account from your opencores account. Yeah. That sucks, and it's not easy to get help :/ | 18:02 |
olofk | GeneralStupid: Well, we are happy to help if they are interested in learning more about the open source options | 18:03 |
GeneralStupid | olofk: i really think they are, but if i understand them right, they already tried the leon2 and it took too long to set it up correctly. | 18:05 |
GeneralStupid | olofk: compared to nios II | 18:05 |
GeneralStupid | olofk: and we are more focussed on the zedboards | 18:06 |
olofk | In terms of raw processing power, it makes no sense to compete with the hard ARM cores | 18:06 |
olofk | OpenRISC has its advantage in that it's portable between FPGA vendors and can be used for ASIC without paying license fees | 18:07 |
olofk | The open source aspect also allows people to make changes to its internals | 18:08 |
olofk | Like adding a kickass FPU :) | 18:08 |
bandvig | GeneralStupid: please, see my adventure with subscription on opencore's lists I described to olofk. I had quite similar process with creating accounts on OpenCores site. As a result I've two OpenCores accounts :) | 18:11 |
GeneralStupid | ok so openrisc.org is coming? | 18:11 |
olofk | Time to break FuseSoC! | 18:12 |
bandvig | well, let's come back to pipelines... | 18:13 |
GeneralStupid | the idea is, getting a lot of pre configured openrisc's in one program? | 18:18 |
stekern | bandvig: btw, when/if you feel that your fpu work is ready to be merged in to master, give us a ping | 18:33 |
stekern | I still haven't got around to take a closer look at it, just some overview, but if it's detached enough I don't see a reason why you can't continue with it in master branch | 18:34 |
bandvig | stekern: it is already available in https://github.com/openrisc/mor1kx/tree/withfpu I think I'll create pull request soon | 18:36 |
stekern | yes, I know. ;) and that sounds great | 18:38 |
olofk | all right then. Most cores in orpsoc-cores should work now | 18:43 |
olofk | Make sure to do a 'fusesoc update' | 18:44 |
olofk | Or perhaps just a git pull in orpsoc-cores. Don't really remember what fusesoc update does | 18:44 |
olofk | GeneralStupid: It would be great if you could remove your changes to orpsoc-cores as well as the files in the tar ball I linked to yesterday, update orpsoc-cores and see if it works now | 18:54 |
stekern | olofk: time to give your updated build instructions for or1k-elf-gcc another round... | 18:55 |
olofk | stekern: What changed? | 18:55 |
olofk | I've been planning to do that anyway to point to the newlib tarball instead of git | 18:56 |
stekern | nothing, I'm sitting at my workstation now and realised that my or1k-elf- toolchain is 'ancient' | 18:57 |
stekern | and contains blueCmd's atomic implementation with syscalls... | 18:57 |
olofk | ah :) | 18:58 |
bandvig | stekern: let's say it is mostly detached and concentrated in the pfpu32 sub-folder, however I also made some additions in cappuccino's "decode", "decode_execute", "execute", "execute_ctrl", "ctrl" and created additional connections through the whole cappuccino pipe. | 19:03 |
stekern | bandvig: that's fine, as long as the changes are no-ops when the fpu is disabled | 19:05 |
stekern | or at least doesn't cause any additional logic nor critical paths | 19:05 |
bandvig | stekern: yes, all of them are placed under FEATURE_FPU, I've just mentioned about it to clarify the reason I wasn't hasty to push FPU into master branch. Well, lets back to pipeline discussion. | 19:08 |
bandvig | I don't see a problem with sharing GPR among Integer/FP units. I think there is not enough parallelism in cappuccino pipe. Let's compare: ... | 19:11 |
bandvig | (1 of 3) cappuccino: pc->fetch->decode->execute(with stall)->ctrl/mem->wb | 19:12 |
bandvig | (2 of 3) ba25: pc->fetch->decode->parallel(integer/fp/load_store/ctrl WITH out of order completion)->wb | 19:12 |
bandvig | (3 of 3) my proposal: pc->fetch->decode->parallel(integer/fp/load_store/ctrl WITHOUT out of order completion)->wb | 19:12 |
stekern | well, if you are doing it in parallel, it'll be out of order, no? | 19:16 |
stekern | I mean, the writes to the rf will not be in order if you're going to allow instructions coming after a FPU instruction to finish before the FPU instruction | 19:19 |
stekern | olofk: for some reason when I try to wget the newlib tarball, it fails | 19:21 |
olofk | stekern: Maybe I should have tested that first | 19:21 |
stekern | it claims the file is not available, but if I go to the ftp and download it manually the file is there | 19:21 |
olofk | I'll revert the change | 19:21 |
stekern | and you are missing the 'tar zxvf' instruction | 19:22 |
olofk | oh... | 19:22 |
stekern | how do I extract a tar.gz? | 19:22 |
stekern | ah! you have an extra 2 in the filename | 19:22 |
olofk | I reverted it | 19:24 |
olofk | I haven't tried building from the tar ball after all | 19:24 |
stekern | well I'm doing it now... | 19:25 |
olofk | cool | 19:25 |
olofk | Well, you know what to do if it works :) | 19:25 |
bandvig | I'm planning to organize, say, "limited" parallelism... For example, "Integer pipelined MUL" potentially could be paralleled with L/S and FPU-op, but the instruction issue logic has to take into account the length of each unit, so all multi-cycle issued instructions must reach WB in issue order | 19:25 |
stekern | bandvig: oh, ok. I/we have a version of the integer mul that does that | 19:27 |
stekern | i.e. the mul is pipelined along the pipeline and its result is ready in wb stage | 19:28 |
stekern | so it's done in 'parallel' with whatever happens to be in the pipeline | 19:29 |
stekern | do I understand you correctly that you want to expand that, and stall the pipeline from wb stage until the FPU instruction is ready there? | 19:30 |
stekern | that way you could have several slow instructions working in parallel in execute and ctrl stage | 19:31 |
stekern | for instance another FPU instruction or a cache refill | 19:32 |
bandvig | to implement the pipelined mul in the cappuccino pipe it was necessary to create ... how to say ... glue logic that looks like crutches. I want to re-design the pipe around an ideology which is able to include more sophisticated operations in a systematic (and I believe simpler and more straightforward) way. Let me clarify... | 19:40 |
bandvig | Let's say FPU-MUL takes 6 clocks and IntMUL takes 3 clocks. No dependency. FPU-MUL is issued first. Issue logic waits 4 clocks and issues IntMUL. FPU-MUL and IntMUL reach WB in issue order, however they operate in parallel during 2 clocks. | 19:45 |
stekern | ok, that's basically what I described, but with the added restriction that you need to know the number of clocks the following instruction(s) will take | 19:49 |
bandvig | I think the BA25-like architecture is more suitable for such an approach than cappuccino | 19:50 |
rschmidlin | stekern, I have dropped the git diff log in my google drive again and shared it with you. | 20:00 |
rschmidlin | stekern, in case you find the time of course. | 20:01 |
bandvig | A question to all. :) As I'm not a coffee expert :) which code name is suitable for such a pipe? ... after some reading of wikipedia... Cortado? Latte? Something else? | 20:02 |
olofk | Irish coffee would make sense if it's an out-of-order pipeline :) | 20:03 |
stekern | bandvig: how decoupled are the FPU instructions, can you issue several FPU instructions at the same time? | 20:04 |
stekern | rschmidlin: I'll take a look | 20:06 |
stekern | bandvig: because, I think what you propose will have most benefit for FPU instructions, given that basically in the current state, only the integer div is such an instruction that stalls the pipeline and the number of instructions are known | 20:07 |
bandvig | (1) It should be a single issue pipe. (2) FpMUL, FpADD, FpSUB, FpCompare, FpToInt, IntToFp are pipelined completely. They can be paralleled in the described sense. FpDiv could be paralleled with almost all of them excluding FpMUL (they share the multiplier). | 20:12 |
stekern | I still think letting the instructions propagate into wb stage and stall from there (but unfinished instructions in execute and mem/ctrl can continue until they are finished) might be better than to depend on the number of cycles of the following instructions | 20:13 |
stekern | stalling from wb stage is hairy though, especially in cappuccino since it's not designed for that, so you're right that it'll be hard to do it there | 20:14 |
stekern | but all that is of course under the assumption that there are no dependencies, which there will be | 20:15 |
olofk | Awesome! I can build bitstreams for the Lattice iCE40 devices now with FuseSoC. Too bad I don't have any actual devices to try this with :) | 20:17 |
olofk | Anyone got an iCE40 around? | 20:18 |
stekern | rschmidlin: ok, that was less intrusive than you made it sound | 20:18 |
stekern | olofk: newlib from tar compiled and installed at least | 20:20 |
bandvig | And yes, as I wrote on the openrisc.net list, I'm not interested in yet another super small micro-controller. I'm interested in a CPU with a performance level close to ARM Cortex-A8/A9 including quite powerful FPU32/64 arithmetic, but not in fancy DSP-like instructions (I prefer to build convolution engines in hardware). I treat FPGA mostly as a prototyping platform. | 20:21 |
olofk | stekern: It would be great if you could compile and run a program that prints more than 13 characters too | 20:21 |
stekern | olofk: who would need that many characters? | 20:28 |
olofk | Yeah I agree. Hello world!\n is 13. I've never felt the need to write anything more than that | 20:32 |
bandvig | stekern: as far as I understand, your approach is closer to out of order completion with some kind of imitation of a reorder buffer. I think that in this approach issuing, for example, two consecutive independent FpMULs will require more complex control inside the FPU-MUL pipe, while with my approach the complexity is concentrated in the issue logic. As raiment it requires a priori knowledge about units architecture. | 20:33 |
bandvig | " As payment ... " of course | 20:36 |
stekern | I think the main win I'm after would be to be able to parallelize cache refills with alu instructions | 20:38 |
stekern | or non-cached loads | 20:39 |
stekern | but you could already achieve that by not stalling the start of a multicycle instructions in execute stage if it doesn't depend on the result of the load as things stand today in cappuccino | 20:40 |
stekern | unless the load comes after the multicycle instruction, that's where there would be a win | 20:41 |
bandvig | stekern: as far as I understand, even if there isn't a dependency on the load result, your proposal anyhow requires a kind of reorder buffer because the register file update must be performed in order. In fact, as I don't want to implement a reorder buffer (at least at this step), my scheme has some drawbacks compared to out-of-order completion + reorder buffer. In case of a cache miss (un-cached access) I plan to stall issue logic and other pipes t | 21:11 |
bandvig | bus "ACK". | 21:11 |
stekern | no, what I proposed was to wait in writeback stage, leaving mem stage to be able to perform timeconsuming work while waiting | 21:16 |
stekern | if mem stage finishes before the multicycle instruction in wb, it stalls until the wb instruction has finished | 21:17 |
stekern | pretty much like the pipelined mul works, but with the addition that it can stall too | 21:17 |
stekern | (the pipelined mul will always be finished when it reaches wb) | 21:18 |
stekern | I guess another approach (or way to explain what I mean) would be to add a separate 'multicycle execute stage', located after the mem stage | 21:20 |
bandvig | stekern: Thanks for the discussion. As I really want to sleep :) I'm going to think about your proposal tomorrow. See you soon. | 21:25 |
stekern | bandvig: ditto. I'm not trying to steer you away from your ideas either, just trying to ponder different angles of it ;) | 21:26 |
stekern | but basically what I said about separate FPU and mul regs applies, that's how other architectures solved the problem. That way the multicycle instructions can 'fork off' and don't have to bother with dependencies (other than of their own type) | 21:28 |
stekern | that's not an option for us though, and using the GPRs have some nice properties as well | 21:29 |
bandvig | be sure, I perceive your ideas exactly that way. And I'm always open to alternative ideas. | 21:29 |
stekern | olofk: I can print more than 13 characters | 21:58 |
olofk | stekern: I say that's roughly 100% code coverage | 23:06 |
--- Log closed Sun Mar 01 00:00:02 2015 |
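
Note on bandvig's 19:45 example (FPU-MUL takes 6 clocks, IntMUL takes 3, writeback in issue order): the issue timing can be sketched with a small model. This is only an illustration, not mor1kx code. The function names are hypothetical, and the model assumes one issue per clock, fixed latencies known to the issue logic, no data dependencies, and at most one writeback per clock.

```python
def schedule_in_order_wb(latencies):
    """Return the earliest issue cycle for each instruction.

    Assumptions (not taken from mor1kx): single issue, fixed known
    latencies, no data dependencies, one writeback per clock, and
    writebacks must happen in issue order.
    """
    issue_cycles = []
    prev_issue = -1   # cycle in which the previous instruction was issued
    prev_wb = -1      # cycle in which the previous instruction wrote back
    for lat in latencies:
        # Issue no earlier than one clock after the previous issue, and
        # late enough that this writeback lands after the previous one.
        issue = max(prev_issue + 1, prev_wb + 1 - lat)
        issue_cycles.append(issue)
        prev_issue = issue
        prev_wb = issue + lat
    return issue_cycles


def overlap_clocks(issue_a, lat_a, issue_b, lat_b):
    """Number of clocks during which both instructions are executing."""
    start = max(issue_a, issue_b)
    end = min(issue_a + lat_a, issue_b + lat_b)
    return max(0, end - start)


# FPU-MUL (6 clocks) issued first, then IntMUL (3 clocks).
issues = schedule_in_order_wb([6, 3])
print(issues)                                      # [0, 4]
print(overlap_clocks(issues[0], 6, issues[1], 3))  # 2
```

Under these assumptions the model reproduces the numbers from the discussion: IntMUL is held back 4 clocks and the two units run in parallel for 2 clocks. It also shows the cost stekern points out at 19:49, namely that the issue logic needs to know every latency up front.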
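
Note on stekern's UART bug (11:03 to 11:11): the reason it only showed up in hardware can be illustrated with a toy model. This is a sketch under stated assumptions, not the uart16550 register map and not or1ksim or verilator code. `ToyUart`, `honors_divisor` and the divisor value 27 are all hypothetical.

```python
class ToyUart:
    """Toy UART: only models whether a zero baud divisor blocks transmit."""

    def __init__(self, honors_divisor):
        # Simulator-like models ignore the divisor; hardware does not.
        self.honors_divisor = honors_divisor
        self.divisor = 0      # divisor is 0 until init code writes it
        self.sent = []

    def set_divisor(self, value):
        self.divisor = value

    def putc(self, ch):
        if self.honors_divisor and self.divisor == 0:
            return            # no baud clock, nothing is transmitted
        self.sent.append(ch)


def run(uart, call_init):
    """Print "hi", optionally running the (hypothetical) init first."""
    if call_init:
        uart.set_divisor(27)  # arbitrary non-zero value for illustration
    for ch in "hi":
        uart.putc(ch)
    return "".join(uart.sent)


print(repr(run(ToyUart(honors_divisor=False), call_init=False)))  # 'hi'
print(repr(run(ToyUart(honors_divisor=True), call_init=False)))   # ''
print(repr(run(ToyUart(honors_divisor=True), call_init=True)))    # 'hi'
```

The first case corresponds to the simulators, where output appears even though init never ran. The second is the hardware symptom, and the third is the behaviour once the init function is actually called.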