IRC logs for #openrisc Thursday, 2013-03-28

--- Log opened Thu Mar 28 00:00:50 2013
stekernglowplug: no, RISC architectures seldom do03:16
glowplugInteresting.  I've been doing nonstop research into various processor architectures.  Very fascinating stuff.03:19
glowplugI wonder if microcode is an easy-ish method for obtaining more single-threaded performance without deviating from a small simple instruction set.03:19
glowplugIf the goal is to have every instruction complete in one cycle then microcode would allow complex operations to execute in one cycle but not be hardwired instructions.03:26
stekernhmm, if they would complete in one cycle, then there would not be much microcode in there to sequentially execute, no?03:28
stekernor what did you mean?03:28
glowplugFor example there is no way you guys are going to hardwire the Sin operation into the CPU.  That is not RISC like at all.03:29
glowplugBut microprogramming would allow a Sin operation to execute in a single cycle.03:30
stekernfirst sentence: I agree.03:30
glowplugInstruction level parallelism, I think, is what I'm describing.03:31
stekernsecond sentence: doesn't make sense03:31
stekernif a microprogram only takes one cycle to execute, then it's just one instruction = hardwired03:32
glowplugOne cycle to reprogram the core with every subsequent operation taking one cycle to complete.03:33
glowplugAn initial delay followed by a series of parallel instruction operations that completes a more complex operation in a single cycle.03:33
stekernbut there are already instructions that take more than one cycle to execute, mul, div etc03:34
glowplugHave you guys thought about trying to get those down to 1-cycle?03:34
stekern1-cycle div is going to be _huge_03:34
glowplugBut consider this.  If you have a general parallel instruction pipeline that can execute via microcode you could load div microcode in one cycle, followed by every subsequent div operation completing in just 1 cycle.03:36
glowplugThat same space could be shared by various other complex operations.  Like multiply, sin etc.03:36
stekernone thing not so nice with or1k is that div and mul don't have separate registers (mips for example does)03:36
glowplugSerialized single core applications would execute very quickly (Linux, general apps etc.).03:37
stekernglowplug: yes, of course, that's what pipelining is all about, but you have to consider hazards03:37
glowplugAren't hazards more or less avoidable with instruction level parallelism vs other higher levels of parallelism?03:38
stekernI think I'm just getting confused by your use of the word 'microcode', you are basically just speaking about serial operations. Just like we already do for div03:38
stekernand about not stalling the pipeline when executing them03:39
glowplugFrom what I understand microcode is a set of machine instructions stored in CPU memory that is loaded on demand to program gate logic.03:39
glowplugFor example the PS2 graphics engine has a co-processor that loads microcode many times per frame to process different instructions in fewer cycles until different code is loaded for other operations.03:40
glowplugWith 3 CPUs it outperformed desktop GPUs with 48+.03:40
glowplugEven more interesting is that those cores could process normal instructions, they were not dedicated graphics pipelines.03:41
stekernif you need the result from the serialized operation in the instruction after it, then you can't execute that instruction03:41
stekernthat's a hazard03:41
glowplugSo the entire parallel instruction unit would be frozen with the main CPU forced to process the current operation then the parallel unit could proceed with the data it needs.03:44
glowplugHere is an example of what I'm describing (sort of).03:46
glowplughttp://www.icsa.inf.ed.ac.uk/cgi-bin/hase/emma-m.pl?arch2-t.html,arch2-f.html,menu2.html 03:46
stekernyes03:46
stekernbut it still has nothing to do with microcode vs not microcode03:46
glowplugThey claim here that mul,div are still multicycle but I'm pretty sure those operations are accelerated and that theoretically single cycle is possible.03:46
stekernit's perfectly normal to do what you are describing (for example, for mul and div) in a RISC processor, the very first MIPS machines did it03:48
stekernbut they still didn't use any microcode03:48
stekern(and still do)03:48
stekernMIPS has separate registers for mul operations, thus making the hazard logic more lightweight03:49
glowplugAbsolutely I realize that RISC CPUs traditionally have hardwired mul and div logic.  What I'm describing is more like the link above where instead of hardwiring those operations they are programmable.03:50
glowplugWhich means anything could be placed in that "hardwired" space and benefit from the speedup without greatly increasing gate count.03:51
stekerni.e. the mul unit works in parallel with all other instructions, it only stalls the pipeline if the program is trying to read the mul result before the mul has completed03:51
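
The behaviour stekern describes here, a multi-cycle unit that runs alongside the pipeline and stalls it only when a later instruction reads a result that is not ready yet, can be sketched as a tiny register scoreboard. This is an illustrative model only, not code from or1200 or mor1kx; the names `scoreboard_t`, `sb_issue_long_op`, `sb_must_stall` and `sb_tick`, and the 32-register file size, are assumptions made for the example.

```c
#include <stdbool.h>

#define NUM_REGS 32  /* assumed register file size for this sketch */

/* For each register: how many cycles remain until a pending
 * multi-cycle result (e.g. from mul or div) is written back. */
typedef struct {
    int busy_cycles[NUM_REGS];
} scoreboard_t;

/* Mark every register as ready. */
void sb_init(scoreboard_t *sb) {
    for (int i = 0; i < NUM_REGS; i++)
        sb->busy_cycles[i] = 0;
}

/* A long operation was issued; its destination rd is not ready
 * for `latency` more cycles. The pipeline itself keeps running. */
void sb_issue_long_op(scoreboard_t *sb, int rd, int latency) {
    sb->busy_cycles[rd] = latency;
}

/* Read-after-write check: stall only if a source register of the
 * next instruction is still waiting for a pending result. */
bool sb_must_stall(const scoreboard_t *sb, int ra, int rb) {
    return sb->busy_cycles[ra] > 0 || sb->busy_cycles[rb] > 0;
}

/* Advance one clock cycle. */
void sb_tick(scoreboard_t *sb) {
    for (int i = 0; i < NUM_REGS; i++)
        if (sb->busy_cycles[i] > 0)
            sb->busy_cycles[i]--;
}
```

With a 3-cycle mul writing r3, an instruction reading r1 and r2 proceeds immediately, while one reading r3 stalls until three ticks have passed. Dedicated result registers, as stekern mentions for MIPS, would shrink this check to the few instructions that read them.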
stekernglowplug: yeah, that could make sense03:52
glowplugI see what you are saying also.  If you have one programmable parallel instruction pipeline and it's set to div and you receive another complex operation you have to wait a cycle for programming.03:53
stekernso you basically have a co-processor that is tightly coupled with the cpu03:53
glowplugMaybe that is why the PS2 core has 3 CPUs.03:53
glowplugAbsolutely.  A programmable parallel instruction pipeline coprocessor.03:54
glowplugIt gets requests for complex instructions, wires itself to execute those in as few cycles as possible, completes them, and outputs to a shared memory.03:54
glowplugEverything "simple" would be done traditionally by the RISC CPU.03:55
glowplugAnd as far as I can tell such a co-processor would still qualify as being RISC because it doesn't operate on complex instructions but runs many hardwired reconfigurable parallel simple instructions for a given logic.03:56
glowplugThe EMMA-2 is an extreme example though since it implements all operations, even add and sub, in microcode.03:59
glowplugI don't think that's really necessary.04:00
glowplugAt any rate I will do a ton more reading.  I'm extremely interested in the usage of programmable machine instructions.  =)04:11
glowplugOn a somewhat related note. Have you seen this project?04:12
glowplughttp://panda.dei.polimi.it/?page_id=31 04:12
mor1kx[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/d10778603ac39e37f694c232e0d25e11a6545e56 04:15
mor1kxmor1kx/master d107786 Stefan Kristiansson: ctrl: avoid using blocking assignment in a synchronous block...04:15
stekernno, I'll take a look at it04:16
glowplugIt is apparently a very old project but the first release was 2012.  It looks like it translates regular C into RTL.  Including high level constructs.04:17
glowplugNot really sure if I dig their logo though.  O.o04:20
glowplugSo the 4-stage pipeline that you're using is technically superscalar, correct?  That's what you were referring to earlier?04:44
stekernglowplug: which 4-stage pipeline?04:53
stekernall our implementations are single issue04:54
glowplugIt looks like I was way way off on the wrong course with microcode.  The only advantage it gives is that if you have a physical CPU bug you can fix it (i.e. only applies to ASIC chips).04:59
glowplugSo you are currently avoiding a superscalar implementation because of the pitfalls you mentioned earlier (hazards etc.).05:00
stekernthat's a problem you would have to solve if you implemented a multi-issue pipeline, yes05:01
glowplugWithout knowing it I was trying to describe a microprogrammed superscalar cpu... which would apparently be extremely bad.  Haha05:04
stekernwhat you described was a pipelined co-processor tightly coupled with the main cpu05:05
stekernthat doesn't make it superscalar05:06
glowplugI think that's what I was shooting for though: instruction level parallelism on top of being machine code programmable.05:06
glowplug*A multi issue pipeline05:06
stekernfor it to be multi-issue you would need to feed it two (or more) instructions at the same time05:08
glowplugExactly!05:08
stekernthat's not what you described05:08
stekernbut maybe what you meant ;)05:08
glowplugIt's definitely what I meant.  =)05:08
glowplugTo push many simple instructions into a multi issue pipeline.05:09
stekernbut then the main cpu kind of would need to be multi-issue as well05:09
glowplugWith the control logic of those instructions being programmed by microcode.05:09
glowplugIt would because there would be really no way of preventing hazards in such a system.05:10
stekernjust a note, you have hazards in a single-issue pipeline as well05:11
glowplugIt would just be exacerbated by being multi-issue. To an extreme degree of complexity.05:11
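
To make the multi-issue point concrete: feeding two instructions into the pipeline in the same cycle means checking them against each other as well as against older instructions still in flight. The sketch below is a deliberately minimal pairing check under stated assumptions. The type `insn_t` and the function `can_dual_issue` are hypothetical names, each instruction is reduced to one destination and two source registers, and structural hazards, branches and memory ordering are ignored.

```c
#include <stdbool.h>

/* Simplified instruction: destination rd (or -1 if it writes no
 * register) and two source registers ra and rb. */
typedef struct {
    int rd;
    int ra;
    int rb;
} insn_t;

/* Can `second` issue in the same cycle as `first`?
 * Not if it reads the register `first` writes (read-after-write),
 * and not if both write the same register (write-after-write). */
bool can_dual_issue(insn_t first, insn_t second) {
    bool raw = first.rd >= 0 &&
               (second.ra == first.rd || second.rb == first.rd);
    bool waw = first.rd >= 0 && second.rd == first.rd;
    return !raw && !waw;
}
```

A single-issue pipeline never needs this same-cycle comparison, but as stekern notes it still has hazards between instructions in different stages. Widening the issue adds checks between every pair of instructions issued together, which is where the extra complexity glowplug mentions comes from.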
glowplugYou would think that increasing serial performance would be relatively easy.  Not so!  Haha05:12
glowplugI'm off to bed.  8)05:16
stekernI'm off to work =)05:17
stekernjuliusb: I can probably figure this out myself, but if you know off the top of your head what has made doing 'make or1200-mmu.dis' in the sim/or1200 directory broken, give me a heads-up11:10
stekern../../../lib/include/cpu-utils.h:5:27: fatal error: mor1kx-utils.h: No such file or directory # include "mor1kx-utils.h"11:13
stekernis the error11:13
-!- stekern_ is now known as stekern12:46
juliusbstekern: hmmm, not sure. If you're building in mor1kx-dev-env then I'm not 100% sure what is going on there. I do recall changing a lot of stuff to try and make it processor agnostic13:57
juliusbso you'll see cpu-utils.h pulling in mor1kx or or1200-utils.h depending on what is defined13:58
juliusbstekern: I think it's a mistake on my part14:54
juliusbin sw/lib/include/cpu-utils.h we have the #if OPENRISC_CPU_TYPE==mor1kx 14:54
juliusbbut OPENRISC_CPU_TYPE isn't defined anywhere14:54
juliusbbut we add -DOPENRISC_CPU_DRIVER=$(OPENRISC_CPU_DRIVER) to CFLAGS in the sw/Makefile.inc file14:56
juliusbso cpu-utils.h should probably test OPENRISC_CPU_DRIVER not OPENRISC_CPU_TYPE14:56
juliusbI'm not sure how that ever worked!14:56
juliusbclearly it's not relied upon much :P14:56
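
One detail that may explain the symptom: inside `#if`, the C preprocessor replaces any identifier that is not a defined macro with 0. A test like `#if OPENRISC_CPU_TYPE==mor1kx` therefore compares 0 with 0 and is always true when neither name is a macro, which would be consistent with the or1200 build trying to include mor1kx-utils.h. The same pitfall would remain after switching to `OPENRISC_CPU_DRIVER` if its value is a bare token such as `or1200`, since that token also evaluates to 0. The sketch below shows one possible workaround using token pasting. It is not the actual contents of cpu-utils.h; the `CPU_ID_*` macros and both helper functions are hypothetical names, and the `#define` of `OPENRISC_CPU_DRIVER` stands in for the `-D` flag in CFLAGS.

```c
#include <string.h>

/* Stand-in for -DOPENRISC_CPU_DRIVER=or1200 on the command line. */
#ifndef OPENRISC_CPU_DRIVER
#define OPENRISC_CPU_DRIVER or1200
#endif

/* Hypothetical numeric IDs so the comparison has real values. */
#define CPU_ID_mor1kx 1
#define CPU_ID_or1200 2

/* Two-level macro: the argument is expanded first, then pasted. */
#define CPU_ID_PASTE(name) CPU_ID_##name
#define CPU_ID(name) CPU_ID_PASTE(name)

/* Naive test: both identifiers are undefined, so this is 0 == 0. */
#if OPENRISC_CPU_TYPE == mor1kx
#define NAIVE_SELECTS_MOR1KX 1
#else
#define NAIVE_SELECTS_MOR1KX 0
#endif

/* Workaround: compare numeric IDs derived from the driver name. */
#if CPU_ID(OPENRISC_CPU_DRIVER) == CPU_ID_mor1kx
#define SELECTED_UTILS_HEADER "mor1kx-utils.h"
#elif CPU_ID(OPENRISC_CPU_DRIVER) == CPU_ID_or1200
#define SELECTED_UTILS_HEADER "or1200-utils.h"
#else
#error "unknown OPENRISC_CPU_DRIVER"
#endif

/* Expose the preprocessor decisions so they can be inspected. */
int naive_selects_mor1kx(void) { return NAIVE_SELECTS_MOR1KX; }
const char *selected_utils_header(void) { return SELECTED_UTILS_HEADER; }
```

Building with GCC's `-Wundef` flags the naive comparison, because it warns when an undefined identifier is evaluated in an `#if`.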
juliusbthis is cool: http://pastie.org/7152651 17:04
juliusba gcc test17:04
juliusbbut it does a trampoline17:05
juliusbit puts some code in stack space17:05
juliusband executes it17:05
juliusbso, somewhere in our GCC port it knows how to mangle together a little set of instructions with their immediates to branch somewhere17:05
juliusbI think, at least17:05
juliusboh that was compiled with delay slots17:08
juliusb(obviously)17:08
juliusbok i'm off for the weekend17:08
juliusbwill be lurking via the log webpage :)17:08
-!- asm_ is now known as asm18:59
--- Log closed Fri Mar 29 00:00:51 2013
