--- Log opened Thu Mar 28 00:00:50 2013 | ||
stekern | glowplug: no, RISC architectures seldom do | 03:16 |
---|---|---|
glowplug | Interesting. I've been doing nonstop research into various processor architectures. Very fascinating stuff. | 03:19 |
glowplug | I wonder if microcode is an easy-ish method for obtaining more single-threaded performance without deviating from a small simple instruction set. | 03:19 |
glowplug | If the goal is to have every instruction complete in one cycle then microcode would allow complex operations to execute in one cycle but not be hardwired instructions. | 03:26 |
stekern | hmm, if they would complete in one cycle, then there would not be much microcode in there to sequentially execute, no? | 03:28 |
stekern | or what did you mean? | 03:28 |
glowplug | For example there is no way you guys are going to hardwire the Sin operation into the CPU. That is not RISC like at all. | 03:29 |
glowplug | But microprogramming would allow a Sin operation to execute in a single cycle. | 03:30 |
stekern | first sentence: I agree. | 03:30 |
glowplug | Instruction level parallelism, I think, is what I'm describing. | 03:31 |
stekern | second sentence: doesn't make sense | 03:31 |
stekern | if a microprogram only takes one cycle to execute, then it's just one instruction = hardwired | 03:32 |
glowplug | One cycle to reprogram the core with every subsequent operation taking one cycle to complete. | 03:33 |
glowplug | An initial delay followed by a series of parallel instruction operations that completes a more complex operation in a single cycle. | 03:33 |
stekern | but there are already instructions that take more than one cycle to execute, mul, div etc | 03:34 |
glowplug | Have you guys thought about trying to get those down to 1-cycle? | 03:34 |
stekern | 1-cycle div is going to be _huge_ | 03:34 |
glowplug | But consider this. If you have a general parallel instruction pipeline that can execute via microcode you could load div microcode in one cycle, followed by every subsequent div operation completing in just 1 cycle. | 03:36 |
glowplug | That same space could be shared by various other complex operations. Like multiply, sin etc. | 03:36 |
stekern | one thing not so nice with or1k is that div and mul don't have separate registers (mips for example does) | 03:36 |
glowplug | Serialized single core applications would execute very quickly (Linux, general apps etc.). | 03:37 |
stekern | glowplug: yes, of course, that's what pipelining is all about, but you have to consider hazards | 03:37 |
glowplug | Aren't hazards more or less avoidable with instruction level parallelism vs. other higher levels of parallelism? | 03:38 |
stekern | I think I'm just getting confused by your use of the word 'microcode', you are basically just speaking about serial operations. Just like we already do for div | 03:38 |
stekern | and about not stalling the pipeline when executing them | 03:39 |
glowplug | From what I understand microcode is a set of machine instructions stored in CPU memory that is loaded on demand to program gate logic. | 03:39 |
glowplug | For example the PS2 graphics engine has a co-processor that loads microcode many times per frame to process different instructions in fewer cycles until different code is loaded for other operations. | 03:40 |
glowplug | With 3 CPU's it outperformed desktop GPU's with 48+. | 03:40 |
glowplug | Even more interesting is that those cores could process normal instructions, they were not dedicated graphics pipelines. | 03:41 |
stekern | if you need the result from the serialized operation in the instruction after it, then you can't execute that instruction | 03:41 |
stekern | that's a hazard | 03:41 |
glowplug | So the entire parallel instruction unit would be frozen with the main CPU forced to process the current operation then the parallel unit could proceed with the data it needs. | 03:44 |
glowplug | Here is an example of what I'm describing (sort of). | 03:46 |
glowplug | http://www.icsa.inf.ed.ac.uk/cgi-bin/hase/emma-m.pl?arch2-t.html,arch2-f.html,menu2.html | 03:46 |
stekern | yes | 03:46 |
stekern | but it still has nothing to do with microcode vs not microcode | 03:46 |
glowplug | They claim here that mul,div are still multicycle but I'm pretty sure those operations are accelerated and that theoretically single cycle is possible. | 03:46 |
stekern | it's perfectly normal to do what you are describing (for example for mul and div) in a RISC processor, the very first MIPS machines did it | 03:48 |
stekern | but they still didn't use any microcode | 03:48 |
stekern | (and still do) | 03:48 |
stekern | MIPS has separate registers for mul operations, thus making the hazard logic more lightweight | 03:49 |
glowplug | Absolutely I realize that RISC CPU's traditionally have hardwired mul and div logic. What I'm describing is more like the link above where instead of hardwiring those operations they are programmable. | 03:50 |
glowplug | Which means anything could be placed in that "hardwired" space and benefit from the speedup without greatly increasing gate count. | 03:51 |
stekern | i.e. the mul unit works in parallel with all other instructions, it only stalls the pipeline if the program is trying to read the mul result before the mul has completed | 03:51 |
stekern | glowplug: yeah, that could make sense | 03:52 |
glowplug | I see what you are saying also. If you have one programmable parallel instruction pipeline and it's set to div and you receive another complex operation you have to wait a cycle for programming. | 03:53 |
stekern | so you basically have a co-processor that is tightly coupled with the cpu | 03:53 |
glowplug | Maybe that is why the PS2 core has 3 CPU's. | 03:53 |
glowplug | Absolutely. A programmable parallel instruction pipeline coprocessor. | 03:54 |
glowplug | It gets requests for complex instructions, wires itself to execute those in as few cycles as possible, completes them, and outputs to a shared memory. | 03:54 |
glowplug | Everything "simple" would be done traditionally by the RISC CPU. | 03:55 |
glowplug | And as far as I can tell such a co-processor would still qualify as being RISC because it doesn't operate on complex instructions but runs many hardwired reconfigurable parallel simple instructions for a given logic. | 03:56 |
glowplug | The EMMA-2 is an extreme example though since it implements all operations, even add and sub, in microcode. | 03:59 |
glowplug | I don't think that's really necessary. | 04:00 |
glowplug | At any rate I will do a ton more reading. I'm extremely interested in the usage of programmable machine instructions. =) | 04:11 |
glowplug | On a somewhat related note. Have you seen this project? | 04:12 |
glowplug | http://panda.dei.polimi.it/?page_id=31 | 04:12 |
mor1kx | [mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/d10778603ac39e37f694c232e0d25e11a6545e56 | 04:15 |
mor1kx | mor1kx/master d107786 Stefan Kristiansson: ctrl: avoid using blocking assignment in a synchronous block... | 04:15 |
stekern | no, I'll take a look at it | 04:16 |
glowplug | It is apparently a very old project but the first release was 2012. It looks like it translates regular C into RTL. Including high level constructs. | 04:17 |
glowplug | Not really sure if I dig their logo though. O.o | 04:20 |
glowplug | So the 4-stage pipeline that you're using is technically superscalar, correct? That's what you were referring to earlier? | 04:44 |
stekern | glowplug: which 4-stage pipeline? | 04:53 |
stekern | all our implementations are single issue | 04:54 |
glowplug | It looks like I was way way off on the wrong course with microcode. The only advantage it gives is that if you have a physical CPU bug you can fix it (i.e. only applies to ASIC chips). | 04:59 |
glowplug | So you are currently avoiding a superscalar implementation because of the pitfalls you mentioned earlier (hazards etc.). | 05:00 |
stekern | that's a problem you would have to solve if you implemented a multi-issue pipeline, yes | 05:01 |
glowplug | Without knowing it I was trying to describe a microprogrammed superscalar cpu... which would apparently be extremely bad. Haha | 05:04 |
stekern | what you described was a pipelined co-processor tightly coupled with the main cpu | 05:05 |
stekern | that doesn't make it superscalar | 05:06 |
glowplug | I think that's what I was shooting for though: instruction level parallelism on top of being machine code programmable. | 05:06 |
glowplug | *A multi issue pipeline | 05:06 |
stekern | for it to be multi-issue you would need to feed it two (or more) instructions at the same time | 05:08 |
glowplug | Exactly! | 05:08 |
stekern | that's not what you described | 05:08 |
stekern | but maybe what you meant ;) | 05:08 |
glowplug | It's definitely what I meant. =) | 05:08 |
glowplug | To push many simple instructions into a multi issue pipeline. | 05:09 |
stekern | but then the main cpu kind of would need to be multi-issue as well | 05:09 |
glowplug | With the control logic of those instructions being programmed by microcode. | 05:09 |
glowplug | It would because there would be really no way of preventing hazards in such a system. | 05:10 |
stekern | just a note, you have hazards in a single-issue pipeline as well | 05:11 |
glowplug | It would just be exacerbated by being multi-issue. To an extreme degree of complexity. | 05:11 |
glowplug | You would think that increasing serial performance would be relatively easy. Not so! Haha | 05:12 |
glowplug | I'm off to bed. 8) | 05:16 |
stekern | I'm off to work =) | 05:17 |
stekern | juliusb: I can probably figure this out myself, but if you know off the top of your head what has broken 'make or1200-mmu.dis' in the sim/or1200 directory, give me a heads-up | 11:10 |
stekern | ../../../lib/include/cpu-utils.h:5:27: fatal error: mor1kx-utils.h: No such file or directory # include "mor1kx-utils.h" | 11:13 |
stekern | is the error | 11:13 |
-!- stekern_ is now known as stekern | 12:46 | |
juliusb | stekern: hmmm, not sure. If you're building in mor1kx-dev-env then I'm not 100% sure what is going on there. I do recall changing a lot of stuff to try and make it processor agnostic | 13:57 |
juliusb | so you'll see cpu-utils.h pulling in mor1kx or or1200-utils.h depending on what is defined | 13:58 |
juliusb | stekern: I think it's a mistake on my part | 14:54 |
juliusb | in sw/lib/include/cpu-utils.h we have the #if OPENRISC_CPU_TYPE==mor1kx | 14:54 |
juliusb | but OPENRISC_CPU_TYPE isn't defined anywhere | 14:54 |
juliusb | but we add -DOPENRISC_CPU_DRIVER=$(OPENRISC_CPU_DRIVER) to CFLAGS in the sw/Makefile.inc file | 14:56 |
juliusb | so cpu-utils.h should probably test OPENRISC_CPU_DRIVER not OPENRISC_CPU_TYPE | 14:56 |
juliusb | I'm not sure how that ever worked! | 14:56 |
juliusb | clearly it's not relied upon much :P | 14:56 |
juliusb | this is cool: http://pastie.org/7152651 | 17:04 |
juliusb | a gcc test | 17:04 |
juliusb | but it does a trampoline | 17:05 |
juliusb | it puts some code in stack space | 17:05 |
juliusb | and executes it | 17:05 |
juliusb | so, somewhere in our GCC port it knows how to mangle together a little set of instructions with their immediates to branch somewhere | 17:05 |
juliusb | I think, at least | 17:05 |
juliusb | oh that was compiled with delay slots | 17:08 |
juliusb | (obviously) | 17:08 |
juliusb | ok i'm off for the weekend | 17:08 |
juliusb | will be lurking via the log webpage :) | 17:08 |
-!- asm_ is now known as asm | 18:59 | |
--- Log closed Fri Mar 29 00:00:51 2013 |
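
Note on the hazard stekern describes at 03:41 and 03:51: the sketch below is a toy cycle counter for a single-issue pipeline with one multi-cycle mul unit running beside it. It is not taken from or1200 or mor1kx. The `Insn` layout, the `run_cycles` name and the 3-cycle `MUL_LATENCY` are all assumptions made for illustration. It models only the two cases raised in the conversation: an instruction that reads the mul result too early, and a second mul arriving while the unit is busy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not the or1200/mor1kx pipeline. */
enum { NO_REG = -1 };
#define MUL_LATENCY 3   /* assumed: mul result readable 3 cycles after issue */

typedef struct {
    int is_mul;     /* 1 = goes to the multi-cycle mul unit */
    int dst;        /* destination register number */
    int src1, src2; /* source register numbers, NO_REG if unused */
} Insn;

/* Returns the cycle count after the last instruction has issued.
 * One instruction issues per cycle unless it has to stall. */
int run_cycles(const Insn *prog, size_t n)
{
    assert(prog != NULL || n == 0);

    int cycle = 0;
    int mul_dst = NO_REG;   /* register the in-flight mul will write */
    int mul_ready_at = 0;   /* cycle at which that register is readable */

    for (size_t i = 0; i < n; i++) {
        const Insn *in = &prog[i];
        int busy = (mul_dst != NO_REG) && (cycle < mul_ready_at);
        /* Read-after-write hazard: a source is the pending mul result. */
        int reads_mul = busy && (in->src1 == mul_dst || in->src2 == mul_dst);
        /* Structural hazard: a second mul needs the busy unit. */
        int needs_unit = busy && in->is_mul;

        if (reads_mul || needs_unit)
            cycle = mul_ready_at;   /* stall until the mul completes */

        if (in->is_mul) {
            mul_dst = in->dst;
            mul_ready_at = cycle + MUL_LATENCY;
        }
        cycle++;                    /* this instruction issues */
    }
    return cycle;
}
```

Under these assumptions, a mul followed immediately by an add that reads its result takes 4 cycles, and so does the same pair with two independent adds scheduled in between. The stall only costs time when nothing independent is available to fill it, which is the point made at 03:51.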
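
Note on the `cpu-utils.h` test juliusb quotes at 14:54: in a C `#if` expression, any identifier that is not a defined macro is replaced by 0. With `OPENRISC_CPU_TYPE` undefined and `mor1kx` a bare word, the test compares 0 with 0 and is always true, which would explain both why it appeared to work for mor1kx and why the or1200 build pulls in `mor1kx-utils.h`. The same rule means that switching to `OPENRISC_CPU_DRIVER` alone would still compare bare words. The sketch below shows one possible way around it. The `CPU_ID_*` macros and the function names are hypothetical and are not from mor1kx-dev-env.

```c
#include <assert.h>

/* Hypothetical numeric IDs, one per supported core name. */
#define CPU_ID_or1200 1
#define CPU_ID_mor1kx 2

/* Stand-in for -DOPENRISC_CPU_DRIVER=or1200 coming from CFLAGS. */
#ifndef OPENRISC_CPU_DRIVER
#define OPENRISC_CPU_DRIVER or1200
#endif

/* Two-step paste so the argument is expanded before ## is applied. */
#define CPU_ID_PASTE(name) CPU_ID_##name
#define CPU_ID(name) CPU_ID_PASTE(name)

/* Test as quoted in the log: both sides are undefined identifiers,
 * so the preprocessor evaluates 0 == 0. */
#if OPENRISC_CPU_TYPE == mor1kx
#define BROKEN_PICKS_MOR1KX 1
#else
#define BROKEN_PICKS_MOR1KX 0
#endif

/* Sketch of a working test: compare numeric IDs instead of bare names. */
#if CPU_ID(OPENRISC_CPU_DRIVER) == CPU_ID_mor1kx
#define FIXED_PICKS_MOR1KX 1
#else
#define FIXED_PICKS_MOR1KX 0
#endif

int broken_test_picks_mor1kx(void) { return BROKEN_PICKS_MOR1KX; }
int fixed_test_picks_mor1kx(void)  { return FIXED_PICKS_MOR1KX; }

/* With the driver set to or1200, only the fixed test gets it right. */
void self_check(void)
{
    assert(broken_test_picks_mor1kx() == 1);
    assert(fixed_test_picks_mor1kx() == 0);
}
```

Building with GCC's `-Wundef` makes the preprocessor warn about undefined identifiers in `#if`, which would flag the original test.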