IRC logs for #openrisc Thursday, 2013-03-28

--- Log opened Thu Mar 28 00:00:50 2013
stekern	glowplug: no, RISC architectures seldom do	03:16
glowplug	Interesting. I've been doing nonstop research into various processor architectures. Very fascinating stuff.	03:19
glowplug	I wonder if microcode is an easy-ish method for obtaining more single-threaded performance without deviating from a small simple instruction set.	03:19
glowplug	If the goal is to have every instruction complete in one cycle then microcode would allow complex operations to execute in one cycle but not be hardwired instructions.	03:26
stekern	hmm, if they would complete in one cycle, then there would not be much microcode in there to sequentially execute, no?	03:28
stekern	or what did you mean?	03:28
glowplug	For example there is no way you guys are going to hardwire the Sin operation into the CPU. That is not RISC-like at all.	03:29
glowplug	But microprogramming would allow a Sin operation to execute in a single cycle.	03:30
stekern	first sentence: I agree.	03:30
glowplug	Instruction-level parallelism, I think, is what I'm describing.	03:31
stekern	second sentence: doesn't make sense	03:31
stekern	if a microprogram only takes one cycle to execute, then it's just one instruction = hardwired	03:32
glowplug	One cycle to reprogram the core with every subsequent operation taking one cycle to complete.	03:33
glowplug	An initial delay followed by a series of parallel instruction operations that completes a more complex operation in a single cycle.	03:33
stekern	but there are already instructions that take more than one cycle to execute, mul, div etc	03:34
glowplug	Have you guys thought about trying to get those down to 1-cycle?	03:34
stekern	1-cycle div is going to be _huge_	03:34
glowplug	But consider this. If you have a general parallel instruction pipeline that can execute via microcode you could load div microcode in one cycle, followed by every subsequent div operation completing in just 1 cycle.	03:36
glowplug	That same space could be shared by various other complex operations. Like multiply, sin, etc.	03:36
stekern	one thing not so nice with or1k is that div and mul don't have separate registers (mips for example does)	03:36
glowplug	Serialized single core applications would execute very quickly (Linux, general apps, etc.).	03:37
stekern	glowplug: yes, of course, that's what pipelining is all about, but you have to consider hazards	03:37
glowplug	Aren't hazards more or less avoidable with instruction-level parallelism vs. other higher levels of parallelism?	03:38
stekern	I think I'm just getting confused by your use of the word 'microcode', you are basically just speaking about serial operations. Just like we already do for div	03:38
stekern	and about not stalling the pipeline when executing them	03:39
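The "serial operations" stekern mentions for div can be pictured as a bit-serial divider. The sketch below is a generic restoring division in C that produces one quotient bit per loop iteration, with each iteration standing in for one clock cycle. It is an illustration only, not the actual or1200 or mor1kx divider, and the names serial_div_t and serial_divu32 are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Result of a hypothetical bit-serial unsigned divider. */
typedef struct {
    uint32_t quotient;
    uint32_t remainder;
    unsigned cycles;   /* loop iterations, standing in for clock cycles */
} serial_div_t;

/* Restoring division: one quotient bit per iteration ("cycle").
 * The divisor must be non-zero. */
serial_div_t serial_divu32(uint32_t dividend, uint32_t divisor)
{
    serial_div_t r = {0, 0, 0};
    uint64_t rem = 0;  /* 64-bit so the shift below cannot overflow */

    assert(divisor != 0);

    for (int bit = 31; bit >= 0; bit--) {
        /* Shift the next dividend bit into the partial remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1u);
        /* Subtract the divisor if it fits; that sets this quotient bit. */
        if (rem >= divisor) {
            rem -= divisor;
            r.quotient |= (uint32_t)1 << bit;
        }
        r.cycles++;
    }
    r.remainder = (uint32_t)rem;
    return r;
}
```

A single-cycle divider would have to unroll all 32 compare-and-subtract steps into one combinational path, which is roughly why a 1-cycle div ends up so large.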
glowplug	From what I understand microcode is a set of machine instructions stored in CPU memory that is loaded on demand to program gate logic.	03:39
glowplug	For example the PS2 graphics engine has a co-processor that loads microcode many times per frame to process different instructions in fewer cycles until different code is loaded for other operations.	03:40
glowplug	With 3 CPUs it outperformed desktop GPUs with 48+.	03:40
glowplug	Even more interesting is that those cores could process normal instructions, they were not dedicated graphics pipelines.	03:41
stekern	if you need the result from the serialized operation in the instruction after it, then you can't execute that instruction	03:41
stekern	that's a hazard	03:41
glowplug	So the entire parallel instruction unit would be frozen with the main CPU forced to process the current operation then the parallel unit could proceed with the data it needs.	03:44
glowplug	Here is an example of what I'm describing (sort of).	03:46
glowplug	http://www.icsa.inf.ed.ac.uk/cgi-bin/hase/emma-m.pl?arch2-t.html,arch2-f.html,menu2.html	03:46
stekern	yes	03:46
stekern	but it still has nothing to do with microcode vs not microcode	03:46
glowplug	They claim here that mul, div are still multicycle but I'm pretty sure those operations are accelerated and that theoretically single cycle is possible.	03:46
stekern	it's perfectly normal to do what you are describing (for example for mul and div) in a RISC processor, the very first MIPS machines did it	03:48
stekern	but they still didn't use any microcode	03:48
stekern	(and still do)	03:48
stekern	MIPS has separate registers for mul operations, thus making the hazard logic more lightweight	03:49
glowplug	Absolutely I realize that RISC CPUs traditionally have hardwired mul and div logic. What I'm describing is more like the link above where instead of hardwiring those operations they are programmable.	03:50
glowplug	Which means anything could be placed in that "hardwired" space and benefit from the speedup without greatly increasing gate count.	03:51
stekern	i.e. the mul unit works in parallel with all other instructions, it only stalls the pipeline if the program is trying to read the mul result before the mul has completed	03:51
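The stall behaviour described here (the mul unit runs alongside other instructions and the pipeline only waits when something reads the mul result too early) can be sketched as a tiny scoreboard model. This is a simplified illustration, not how or1200 or mor1kx is implemented: the 3-cycle mul latency, the insn_t layout and count_stalls are all assumptions made for the example, and it only models read-after-write hazards on a single-issue, in-order pipeline with full forwarding.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS    32
#define MUL_LATENCY 3   /* assumed latency, for illustration only */

typedef enum { OP_ALU, OP_MUL } op_t;

typedef struct {
    op_t op;
    int  rd;        /* destination register */
    int  ra, rb;    /* source registers */
} insn_t;

/* Returns the number of stall cycles inserted for this instruction
 * sequence: an instruction waits only if one of its sources is still
 * being produced by an earlier, longer-running instruction. */
unsigned count_stalls(const insn_t *prog, size_t n)
{
    unsigned ready[NUM_REGS] = {0}; /* cycle at which each register is available */
    unsigned cycle = 0;             /* earliest cycle the next instruction can issue */
    unsigned stalls = 0;

    for (size_t i = 0; i < n; i++) {
        assert(prog[i].rd >= 0 && prog[i].rd < NUM_REGS);
        assert(prog[i].ra >= 0 && prog[i].ra < NUM_REGS);
        assert(prog[i].rb >= 0 && prog[i].rb < NUM_REGS);

        /* Latest cycle at which either source operand becomes available. */
        unsigned need = ready[prog[i].ra];
        if (ready[prog[i].rb] > need)
            need = ready[prog[i].rb];

        /* Read-after-write hazard: hold the pipeline until the value exists. */
        if (need > cycle) {
            stalls += need - cycle;
            cycle = need;
        }

        ready[prog[i].rd] = cycle + (prog[i].op == OP_MUL ? MUL_LATENCY : 1);
        cycle++;    /* single issue: one instruction per cycle at best */
    }
    return stalls;
}
```

In this model a mul followed immediately by an instruction that uses its result costs stall cycles, while the same mul followed by enough independent instructions costs none, which is the scheduling freedom a separate mul unit provides.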
stekern	glowplug: yeah, that could make sense	03:52
glowplug	I see what you are saying also. If you have one programmable parallel instruction pipeline and it's set to div and you receive another complex operation you have to wait a cycle for programming.	03:53
stekern	so you basically have a co-processor that is tightly coupled with the cpu	03:53
glowplug	Maybe that is why the PS2 core has 3 CPUs.	03:53
glowplug	Absolutely. A programmable parallel instruction pipeline coprocessor.	03:54
glowplug	It gets requests for complex instructions, wires itself to execute those in as few cycles as possible, completes them, and outputs to a shared memory.	03:54
glowplug	Everything "simple" would be done traditionally by the RISC CPU.	03:55
glowplug	And as far as I can tell such a co-processor would still qualify as being RISC because it doesn't operate on complex instructions but runs many hardwired reconfigurable parallel simple instructions for a given logic.	03:56
glowplug	The EMMA-2 is an extreme example though since it implements all operations, even add and sub, in microcode.	03:59
glowplug	I don't think that's really necessary.	04:00
glowplug	At any rate I will do a ton more reading. I'm extremely interested in the usage of programmable machine instructions. =)	04:11
glowplug	On a somewhat related note. Have you seen this project?	04:12
glowplug	http://panda.dei.polimi.it/?page_id=31	04:12
mor1kx	[mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/d10778603ac39e37f694c232e0d25e11a6545e56	04:15
mor1kx	mor1kx/master d107786 Stefan Kristiansson: ctrl: avoid using blocking assignment in a synchronous block...	04:15
stekern	no, I'll take a look at it	04:16
glowplug	It is apparently a very old project but the first release was 2012. It looks like it translates regular C into RTL. Including high level constructs.	04:17
glowplug	Not really sure if I dig their logo though. O.o	04:20
glowplug	So the 4-stage pipeline that you're using is technically superscalar, correct? That's what you were referring to earlier?	04:44
stekern	glowplug: which 4-stage pipeline?	04:53
stekern	all our implementations are single issue	04:54
glowplug	It looks like I was way way off on the wrong course with microcode. The only advantage it gives is that if you have a physical CPU bug you can fix it (i.e. only applies to ASIC chips).	04:59
glowplug	So you are currently avoiding a superscalar implementation because of the pitfalls you mentioned earlier (hazards, etc.).	05:00
stekern	that's a problem you would have to solve if you'd implemented a multi-issue pipeline, yes	05:01
glowplug	Without knowing it I was trying to describe a microprogrammed superscalar cpu... which would apparently be extremely bad. Haha	05:04
stekern	what you described was a pipelined co-processor tightly coupled with the main cpu	05:05
stekern	that doesn't make it superscalar	05:06
glowplug	I think that's what I was shooting for though: instruction-level parallelism on top of being machine code programmable.	05:06
glowplug	*A multi issue pipeline	05:06
stekern	for it to be multi-issue you would need to feed it two (or more) instructions at the same time	05:08
glowplug	Exactly!	05:08
stekern	that's not what you described	05:08
stekern	but maybe what you meant ;)	05:08
glowplug	It's definitely what I meant. =)	05:08
glowplug	To push many simple instructions into a multi issue pipeline.	05:09
stekern	but then the main cpu kind of would need to be multi-issue as well	05:09
glowplug	With the control logic of those instructions being programmed by microcode.	05:09
glowplug	It would because there would be really no way of preventing hazards in such a system.	05:10
stekern	just a note, you have hazards in a single-issue pipeline as well	05:11
glowplug	It would just be exacerbated by being multi-issue. To an extreme degree of complexity.	05:11
glowplug	You would think that increasing serial performance would be relatively easy. Not so! Haha	05:12
glowplug	I'm off to bed. 8)	05:16
stekern	I'm off to work =)	05:17
stekern	juliusb: I can probably figure this out myself, but if you know off the top of your head what has made doing 'make or1200-mmu.dis' in the sim/or1200 directory broken, give me a heads-up	11:10
stekern	../../../lib/include/cpu-utils.h:5:27: fatal error: mor1kx-utils.h: No such file or directory # include "mor1kx-utils.h"	11:13
stekern	is the error	11:13
-!- stekern_ is now known as stekern		12:46
juliusb	stekern: hmmm, not sure. If you're building in mor1kx-dev-env then I'm not 100% sure what is going on there. I do recall changing a lot of stuff to try and make it processor agnostic	13:57
juliusb	so you'll see cpu-utils.h pulling in mor1kx or or1200-utils.h depending on what is defined	13:58
juliusb	stekern: I think it's a mistake on my part	14:54
juliusb	in sw/lib/include/cpu-utils.h we have the #if OPENRISC_CPU_TYPE==mor1kx	14:54
juliusb	but OPENRISC_CPU_TYPE isn't defined anywhere	14:54
juliusb	but we add -DOPENRISC_CPU_DRIVER=$(OPENRISC_CPU_DRIVER) to CFLAGS in the sw/Makefile.inc file	14:56
juliusb	so cpu-utils.h should probably test OPENRISC_CPU_DRIVER not OPENRISC_CPU_TYPE	14:56
juliusb	I'm not sure how that ever worked!	14:56
juliusb	clearly it's not relied upon much :P	14:56
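A likely reason the test "worked" at all: in a C #if, any identifier that is not a defined macro evaluates to 0, so with OPENRISC_CPU_TYPE and mor1kx both undefined the condition becomes 0 == 0 and is always true. Switching to OPENRISC_CPU_DRIVER has the same pitfall unless the names being compared expand to distinct values. The sketch below shows one possible way to do that with numeric IDs and token pasting. The CPU_ID_* macros, SELECTED_CPU_ID and NAIVE_TEST_IS_TRUE are hypothetical names for this example and are not part of the real cpu-utils.h; it also assumes OPENRISC_CPU_DRIVER is passed as a bare word such as or1200.

```c
#include <assert.h>

/* Hypothetical numeric IDs; not part of the real headers. */
#define CPU_ID_mor1kx 1
#define CPU_ID_or1200 2

/* Two-level paste so OPENRISC_CPU_DRIVER is expanded before ## is applied. */
#define CPU_ID_PASTE(name) CPU_ID_##name
#define CPU_ID(name)       CPU_ID_PASTE(name)

/* Normally supplied on the command line: -DOPENRISC_CPU_DRIVER=or1200 */
#ifndef OPENRISC_CPU_DRIVER
#define OPENRISC_CPU_DRIVER or1200
#endif

#define SELECTED_CPU_ID CPU_ID(OPENRISC_CPU_DRIVER)

/* A real header would #include the matching *-utils.h in each branch. */
#if SELECTED_CPU_ID == CPU_ID_mor1kx
#define CPU_UTILS_HEADER "mor1kx-utils.h"
#elif SELECTED_CPU_ID == CPU_ID_or1200
#define CPU_UTILS_HEADER "or1200-utils.h"
#else
#error "unknown OPENRISC_CPU_DRIVER"
#endif

/* The pitfall: neither identifier below is a macro, so this is 0 == 0. */
#if UNDEFINED_CPU_TYPE == mor1kx
#define NAIVE_TEST_IS_TRUE 1
#else
#define NAIVE_TEST_IS_TRUE 0
#endif

/* Compile-time check that the naive comparison really is always true. */
static_assert(NAIVE_TEST_IS_TRUE == 1,
              "undefined identifiers compare equal in #if");
```

With this scheme an unrecognised driver name pastes to an undefined CPU_ID_* macro, evaluates to 0, matches neither branch and hits the #error instead of silently picking the mor1kx header.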
juliusb	this is cool: http://pastie.org/7152651	17:04
juliusb	a gcc test	17:04
juliusb	but it does a trampoline	17:05
juliusb	it puts some code in stack space	17:05
juliusb	and executes it	17:05
juliusb	so, somewhere in our GCC port it knows how to mangle together a little set of instructions with their immediates to branch somewhere	17:05
juliusb	I think, at least	17:05
juliusb	oh that was compiled with delay slots	17:08
juliusb	(obviously)	17:08
juliusb	ok i'm off for the weekend	17:08
juliusb	will be lurking via the log webpage :)	17:08
-!- asm_ is now known as asm		18:59
--- Log closed Fri Mar 29 00:00:51 2013
