IRC logs for #openrisc Saturday, 2016-07-16

--- Log opened Sat Jul 16 00:00:41 2016
kc5tja	ZipCPU: I think I fixed the ISE problem. Looks like the distribution was missing a libQt_Network.so dependency.	00:29
kc5tja	Crap! It still doesn't launch from the GUI though. :(	00:31
kc5tja	This is really quite frustrating.	00:32
kc5tja	AH HA!! I had to source a shell script before launching the ISE editor. Would have been nice if Xilinx had told me this before!!	00:36
ZipCPU\|Laptop	kc5tja: Okay, that one I could've told you. I've got a script I use to start up ISE and that's most (all) of the script: run the shell script, then ISE.	08:13
ZipCPU	Okay, so this is crazy--I've debugged computers for more than 30 years, and I've never seen this bug pattern before:	08:58
ZipCPU	1) Sometimes the function prolog allocates space on the stack, sometimes it doesn't.	08:58
ZipCPU	2) A small irrelevant change to the program can keep this from happening. (A heisenbug!)	08:59
ZipCPU	3) If you load a program filled with nothing but NOOP's, the problem is guaranteed. (as long as you don't make the irrelevant change ...)	08:59
ZipCPU	At this point, I think I have a cache bug--where the NOOP (the last program) is getting run rather than the new one.	08:59
ZipCPU	I've just never seen this pattern before.	08:59
Laksen	Anyone here know of any nice pretty generic instruction fetchers written in verilog?	16:12
ZipCPU\|Laptop	Laksen: I just finished a lot of work on a fairly generic instruction fetcher, written in Verilog.	16:37
ZipCPU\|Laptop	I'm not sure if it is "nice" and "pretty" enough for you, but it is (currently) fully functional.	16:38
ZipCPU\|Laptop	Barring the last few modifications, 1) it can run with a 200MHz clock on an Artix-7, 2) it combines the cache with the instruction fetch, and so 3) it can support early branching with only a single stall cycle when jumping to somewhere in the cache.	16:40
Laksen	Functional is a lot better than not functional :)	16:40
Laksen	Can I have a look?	16:40
ZipCPU\|Laptop	You can find it in the xula25soc project on open cores, I just checked my work in. The particular fetch you are looking for can be found in trunk/rtl/pfcache.v	16:40
ZipCPU\|Laptop	Oops ... better make that trunk/rtl/cpu/pfcache.v.	16:41
ZipCPU\|Laptop	There's a less traditional pre-fetch cache in there as well, representing my first attempt at building such. That one is called 'pipefetch.v'.	16:41
ZipCPU\|Laptop	Pipefetch works by trying to maintain a window in memory around the program counter. Jumps outside the window reset the window starting from the new location.	16:42
ZipCPU\|Laptop	Needless to say, I abandoned pipefetch for the better performance of pfcache, still ... it's a unique approach.	16:43
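
The window idea described for pipefetch can be sketched as a small behavioural model. This is only an illustration of "keep a window around the program counter, reset it on a jump outside"; it is not the code in pipefetch.v. The class name `WindowFetch`, the window size of 8 words, and the rule of sliding once the PC passes the window's midpoint are all assumptions made for the example.

```python
class WindowFetch:
    """Toy model of a sliding-window instruction prefetch (hypothetical names)."""

    def __init__(self, memory, size=8):
        self.memory = memory   # list of instruction words, indexed by address
        self.size = size       # number of words the window can hold (assumed)
        self.base = None       # address of the first word in the window
        self.resets = 0        # how many times the window was restarted

    def fetch(self, pc):
        """Return (instruction, hit) for the word at address pc."""
        hit = self.base is not None and self.base <= pc < self.base + self.size
        if not hit:
            # Jump outside the window: restart the window at the new location.
            self.base = pc
            self.resets += 1
        elif pc - self.base > self.size // 2:
            # Sequential progress: slide forward so the window stays around pc.
            self.base = pc - self.size // 2
        return self.memory[pc], hit


if __name__ == "__main__":
    fetcher = WindowFetch(list(range(100, 164)))
    for pc in [0, 1, 2, 3, 4, 5, 6, 3, 40]:
        print(pc, fetcher.fetch(pc))
```

In this model a short backward jump (a tight loop) still lands inside the window, while a far jump pays for a reset.
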
Laksen	I'll have a look. I was thinking of a similar approach, but I just want to get the stuff running for now :)	16:44
ZipCPU\|Laptop	Are you using a wishbone bus?	16:45
Laksen	No, AXI	16:45
ZipCPU\|Laptop	Well that will be one difference.	16:45
ZipCPU\|Laptop	Another may have to do with instruction width. This prefetch cache was designed for 32-bit instructions.	16:45
Laksen	A small adapter should be fine. I don't care too much about the latency	16:46
Laksen	It's for a bog standard risc-v so 32bit is perfect	16:46
ZipCPU\|Laptop	You ... <GASP> ... don't care about <gulp> latency? ;) This whole approach was built to cut my latency down. <Grin>	16:46
Laksen	At some point it becomes a concern, but this is just a fun vacation project to investigate extreme pipelining :P	16:47
ZipCPU\|Laptop	Really? Sounds cool! ... how extreme are we talking about?	16:48
Laksen	Got my ALU ready which can almost run at >500 MHz on an Artix 7	16:48
Laksen	64 bit	16:48
ZipCPU\|Laptop	Gosh, it took me a bit to get my 32-bit ALU able to run at 200MHz on an Artix-7--and you are headed for 500MHz??	16:49
Laksen	It synthesizes at 480 MHz where all the IO are tied directly to IOB's (giving an extra 0.8 ns delay)	16:50
Laksen	8 pipeline cycles though... so bad code will not be fast at all	16:50
Laksen	200 MHz, is that a single cycle pipeline?	16:51
ZipCPU\|Laptop	Laksen: Sorry to run off so quickly and unannounced--the dogs blessed the floor, and the basement staircase started flooding, and ...	17:17
ZipCPU\|Laptop	Life is now good again.	17:17
ZipCPU\|Laptop	200MHz is not a single cycle pipeline. 200MHz was going to be a 9-stage pipeline. How you get up from that speed to 400+MHz I don't know.	17:18
ZipCPU\|Laptop	This is my first attempt at a "high speed" FPGA design, so ... I'm learning a lot in the process about what high speed requires.	17:18
Laksen	Ah okay	17:19
Laksen	The ALU alone in my design is 8 stages. So in the end it'll probably be 8+fetch+decode+opfetch+mem(n)	17:20
ZipCPU\|Laptop	Okay, so I'm two stages for the ALU, unless the instruction requires a multiply--that will take longer.	17:21
Laksen	Each ALU stage does a 9-bit add, and a single shift. Besides that I've spread out all the different logic operations over the different ALU stages	17:21
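
A rough sketch of how a wide add can be spread over pipeline stages. The log only says each stage does a 9-bit add; the split used here (eight 8-bit chunks, each producing a 9-bit result of 8 sum bits plus a carry out) is an assumption, and the shift and logic operations are left out. The function name `staged_add` is made up for the example.

```python
STAGES = 8        # assumed: one chunk per ALU stage
CHUNK_BITS = 8    # assumed: 8 data bits per stage, 9-bit result with carry
CHUNK_MASK = (1 << CHUNK_BITS) - 1
MASK64 = (1 << 64) - 1


def staged_add(a, b):
    """Add two 64-bit values one chunk per stage; return (sum, carry_out)."""
    result = 0
    carry = 0
    for stage in range(STAGES):
        shift = stage * CHUNK_BITS
        a_chunk = (a >> shift) & CHUNK_MASK
        b_chunk = (b >> shift) & CHUNK_MASK
        # At most 255 + 255 + 1 = 511, so the partial sum fits in 9 bits.
        partial = a_chunk + b_chunk + carry
        result |= (partial & CHUNK_MASK) << shift
        carry = partial >> CHUNK_BITS   # handed to the next stage
    return result, carry


if __name__ == "__main__":
    print(staged_add(0x00FF, 0x0001))
```

In hardware each loop iteration would be a separate registered stage, so only a short carry chain sits between flip-flops; the cost is that a result takes all the stages to appear.
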
ZipCPU\|Laptop	How are you handling pipeline conflict detection?	17:21
ZipCPU\|Laptop	Sorry, "pipeline hazard" detection--just remembered the proper term.	17:21
Laksen	I keep a tally by ORing one-hots of all output registers in flight. Any that conflict will stall the pipeline. So nothing fancy	17:22
Laksen	Simple forwarding for the end of the ALU stage	17:22
ZipCPU\|Laptop	What if two instructions both use the same register as an output, but no inputs use that register?	17:23
Laksen	No problem in that case	17:23
Laksen	Oh wait. That's actually a problem I don't handle	17:23
Laksen	Thanks for asking :P	17:23
ZipCPU\|Laptop	Sure! That's one of the approaches I have been considering, and the problem I mentioned is one I'm ... struggling with.	17:25
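
The tally described above, and the case ZipCPU raised, can be sketched with plain bitmasks. This is a minimal model, assuming each in-flight instruction contributes a one-hot mask of its destination register. The function names and the `check_waw` flag are hypothetical, and stalling on a write-after-write match is just one possible way to handle the case, not something either speaker settled on.

```python
def one_hot(reg):
    """One-hot mask for register number reg."""
    return 1 << reg


def busy_mask(in_flight_dests):
    """OR together the one-hot masks of all destinations still in flight."""
    mask = 0
    for reg in in_flight_dests:
        mask |= one_hot(reg)
    return mask


def must_stall(in_flight_dests, sources, dest, check_waw=True):
    """True if the next instruction conflicts with anything in flight."""
    wanted = 0
    for reg in sources:
        wanted |= one_hot(reg)          # read-after-write check
    if check_waw and dest is not None:
        wanted |= one_hot(dest)         # write-after-write check
    return (busy_mask(in_flight_dests) & wanted) != 0


if __name__ == "__main__":
    # r5 is still being written; the next instruction writes r5 but reads r1, r2.
    print(must_stall([5], [1, 2], 5, check_waw=False))
    print(must_stall([5], [1, 2], 5))
```

With only the source check, two in-flight writers to the same register pass unnoticed; including the destination in the comparison catches them at the price of extra stalls.
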
Laksen	I've been dreaming many years of solving this problem programmatically	17:26
ZipCPU\|Laptop	You mean in software?? As in, in the compiler?	17:26
Laksen	Doing dynamic compilation of a binary into RTL, specifically for processing pipelines	17:27
ZipCPU\|Laptop	By "dynamic compilation", are you referring to instruction reordering inside the CPU?	17:28
Laksen	Basically write a program in a high-level language that describes all paths through a CPU, and then execute that program symbolically	17:28
Laksen	Where you create a bunch of mappings between registers and IO, memory and register ports	17:28
ZipCPU\|Laptop	I'm not sure I follow ...	17:29
ZipCPU\|Laptop	Is there a paper describing your approach?	17:29
Laksen	Let me find an example	17:29
Laksen	No	17:29
Laksen	It's a novel methodology but I worked with this a lot on my master's thesis, just in the wrong direction :)	17:30
ZipCPU\|Laptop	Are you working from within Academia?	17:31
Laksen	Not any longer	17:31
Laksen	This is just sparetime work :)	17:31
Laksen	Here's an example: http://pastebin.com/4mg5jBRt	17:31
Laksen	It might help the understanding that this is a basic RISC-V emulator	17:32
Laksen	The language it's written in doesn't matter. In fact this is written for a pascal compiler that compiles to Risc-V	17:33
Laksen	But that doesn't matter	17:33
Laksen	All that matters is that it's symbolically executed	17:33
Laksen	The code in the bottom is the initialization. It starts up a clocked task that's assumed to run once per clock	17:33
Laksen	And finish at some point	17:33
Laksen	Memories(2D) and registers(1D) are created before that	17:34
Laksen	Memories and registers can be accessed by reads or writes	17:34
Laksen	At a low level in the symbolic execution those are performed by system calls, so they are easy to figure out	17:35
Laksen	Conditional branches are used to propagate information about when those are performed	17:36
ZipCPU\|Laptop	Okay, so ... if this is a basic emulator, ... why would you need a Verilog prefetch?	17:36
ZipCPU\|Laptop	(Just curious ...)	17:36
Laksen	So for example register storages have an attached condition based on the path through the program that store took.	17:36
Laksen	Ohh. This is an entirely different project :P	17:36
Laksen	Sorry, just spilling my brain here :P	17:37
ZipCPU\|Laptop	Oh ... Ok. You had me confused.	17:37
ZipCPU\|Laptop	Something about a "RISC-V emulator" and "> 400 MHz" just ... didn't quite add up. ;)	17:38
Laksen	Well I get too enthusiastic about dynamic recompilation and automatic pipeline construction sometimes :\|	17:38
Laksen	But the pipeline is real though, very simple :) http://pastebin.com/6K8761tu	17:39
Laksen	Don't know yet what the registerfile accesses will be, but I think it can run far above 500 MHz if those don't slow it down	17:40
ZipCPU\|Laptop	On an FPGA, or in dedicated (ASIC) hardware?	17:41
Laksen	Aiming for Artix 7	17:41
ZipCPU\|Laptop	Will you publish your results anywhere?	17:41
kc5tja	Meanwhile, I'm having an impossible condition: a boolean expression where all inputs are well defined, yet Verilog insists the result is 'x'. >:(	17:42
Laksen	Sure	17:42
ZipCPU\|Laptop	I'd love to read about it.	17:42
ZipCPU\|Laptop	Hello, kc5tja, welcome back.	17:42
ZipCPU\|Laptop	kc5tja: Have you tried running your code through Verilator?	17:42
Laksen	Or XST. IVerilog and Yosys both accepted my old code, but the xilinx synthesizer threw a synthesis time error	17:43
kc5tja	No, largely because Verilator confuses me to no end.	17:43
ZipCPU\|Laptop	To Verilate, just do "verilator -cc toplevelverilog.v".	17:44
ZipCPU\|Laptop	I'm not necessarily going to recommend going farther than that, but Verilator does include some tremendous code checking capabilities that have found bugs ISE and Vivado have let slip.	17:45
Laksen	ZipCPU\|Laptop, by the way, which WB interface is your pfcache using?	17:45
Laksen	B3/B4 pipeline/no pipeline?	17:46
ZipCPU\|Laptop	B4, pipelined.	17:46
ZipCPU\|Laptop	You gotta do pipelined--that way you get one access per clock. Otherwise, you've crippled your bus.	17:46
ZipCPU\|Laptop	Just ... let the user beware ... you can't cross devices.	17:46
Laksen	I agree, but I got to say I like the crispiness of AXI a lot more	17:47
Laksen	There are too many loose ends in Wishbone :/	17:48
ZipCPU\|Laptop	I haven't used AXI that much. How is it better (worse)?	17:48
Laksen	In AXI it's always pipelined	17:48
Laksen	The transactions are so easy to understand, because it's all built on handshaking on 5 channels	17:48
Laksen	Bursts are optional, but are handled precisely the same. Transactions are layered on top	17:49
kc5tja	Verilator won't even compile my code; I'm apparently much too modern for it at Verilog 1995.	17:49
ZipCPU\|Laptop	kc5tja: Not likely. You might wish to take a closer look at what it complains about.	17:49
Laksen	Can you pastebin the problematic code?	17:49
ZipCPU\|Laptop	I'd love to take a look myself.	17:50
kc5tja	It tells me quite explicitly that Verilog 1995 keyword is not supported. :)	17:54
kc5tja	In this case, wait().	17:54
kc5tja	https://gist.github.com/sam-falvo/71139ddfc4e9b80c47e3fcce18e1f500	17:56
Laksen	Why not just do a @(posedge clk_o); @(negedge clk_o);	17:58
Laksen	Never heard about the wait keyword before	17:58
ZipCPU\|Laptop	Is it synthesizable Verilog?	17:59
Laksen	No	17:59
Laksen	Or maybe the problem is that x is non-zero	17:59
kc5tja	Now Verilator tells me unexpected @.	17:59
Laksen	So the condition will always be true after startup	17:59
kc5tja	x means 'undefined' or 'unknown.'	18:00
Laksen	@(posedge clk_o); should be a perfectly valid statement	18:00
kc5tja	Which is hogwash, since all of the term's inputs are well defined.	18:00
kc5tja	Nope. Verilator doesn't like it.	18:00
kc5tja	No change in behavior in iverilog.	18:01
ZipCPU\|Laptop	"always @(posedge clk_o) story_o <= story;" is what you want.	18:01
Laksen	Not really	18:01
ZipCPU\|Laptop	No?	18:01
Laksen	It should work just fine as is	18:01
Laksen	I use that stuff all the time	18:01
kc5tja	I was hoping to avoid this, but I think I need to throw this into Xilinx ISE to see what it thinks, and let me run a simulation there.	18:02
Laksen	Ah	18:03
Laksen	You have a bunch of errors on line 55-60	18:03
Laksen	Iverilog complains about those	18:04
ZipCPU\|Laptop	Some parentheses would fix those easily.	18:04
kc5tja	My version of iverilog does not.	18:04
Laksen	No	18:04
Laksen	state_o doesn't exist in the file	18:04
Laksen	Implicit declaration	18:05
kc5tja	What options do you provide to make iverilog detect these errors? Mine literally is silent about them.	18:05
Laksen	I use a compiled version from the source repository	18:05
kc5tja	I'm at 0.9.7	18:06
Laksen	I'm at 11.0 (devel)	18:06
Laksen	I can't remember why I needed the upgrade, but it's way better	18:07
Laksen	Supports Verilog 2012 even	18:07
kc5tja	Thank you!	18:07
Laksen	Oh right. It was because it had support for the $fatal function	18:07
kc5tja	I passed (on a whim) -Wall and it found the defect.	18:07
Laksen	Very nice for makefile testbenches :)	18:07
kc5tja	OK, I got basic instruction fetching implemented.	20:39
kc5tja	Next step, illegal instruction trap.	20:39
kc5tja	Took longer than I expected; but, it at least is working and my basic design is known to not be fantasy.	20:40
kc5tja	That was easier than I'd ever expected.	21:12
kc5tja	Well, that's quite frustrating.	21:55
kc5tja	iverilog needs qualification for a module's ports (e.g., input foo; wire foo;), while Xilinx will treat this as an error.	21:55
ZipCPU	olofk: If you are interested in a TCP version of a simulated UART, my code is posted in OpenCores, xula25soc, trunk/bench/cpp. You'll want the two files, uartsim.cpp and uartsim.h.	22:45
ZipCPU	They'll take as inputs the UART transmit from the FPGA, and send the results to a TCP port (if anyone's connected to it). Characters sent on that port to the simulator will be turned into UART wires on the receive, and so it works.	22:46
ZipCPU	The only minor difficulty might be the form of the setup word--telling it the baud rate, number of bits per symbol, parity information, etc.	22:46
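
The wire-to-character step of such a UART simulator can be sketched as follows. This is not the code in uartsim.cpp: it assumes 8N1 framing (one start bit, eight data bits LSB first, one stop bit, idle high), leaves out the TCP side entirely, and replaces the setup word with a single made-up parameter, `clocks_per_baud`. The function names are hypothetical too.

```python
def encode_8n1(byte, clocks_per_baud):
    """Return per-clock samples of the wire for one byte (start, 8 data, stop)."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    samples = []
    for bit in bits:
        samples.extend([bit] * clocks_per_baud)
    return samples


def decode_8n1(samples, clocks_per_baud):
    """Recover bytes from per-clock samples of an idle-high 8N1 UART wire."""
    out = bytearray()
    i = 0
    while i < len(samples):
        if samples[i] == 1:
            i += 1                       # wire is idle, keep looking
            continue
        centre = i + clocks_per_baud // 2        # middle of the start bit
        stop = centre + 9 * clocks_per_baud      # middle of the stop bit
        if stop >= len(samples):
            break                        # incomplete frame at end of capture
        value = 0
        for bit in range(8):
            value |= samples[centre + (bit + 1) * clocks_per_baud] << bit
        if samples[stop] == 1:           # drop frames with a bad stop bit
            out.append(value)
        i = stop + 1
    return bytes(out)


if __name__ == "__main__":
    wire = [1] * 3 + encode_8n1(ord("O"), 4) + encode_8n1(ord("K"), 4) + [1] * 5
    print(decode_8n1(wire, 4))
```

A simulator built this way would call the decoder's logic once per simulated clock and forward each completed byte to whatever is listening; the receive direction is the same idea run in reverse with `encode_8n1`.
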
--- Log closed Sun Jul 17 00:00:42 2016
