--- Log opened Wed Jun 18 00:00:39 2014 | ||
stekern | is there really no way to 'go to next/prev named marker' in gtkwave? | 04:48 |
---|---|---|
-!- Netsplit *.net <-> *.split quits: stekern, heroux | 06:19 | |
-!- Netsplit over, joins: heroux, stekern | 06:24 | |
_franck__ | after a ctrl+c in or1ksim, can we continue the simulation where we stopped ? | 07:47 |
wallento | stekern: nope, EDGE was default, created a PR to change this | 08:22 |
mor1kx | [mor1kx] wallento opened pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/13 | 08:22 |
wallento | bwah, whats that? :-D | 08:22 |
olofk | wallento: You didn't know that stekern works for NSA and monitors all our activity? | 08:27 |
olofk | stekern: I find it extremely awkward to navigate in gtkwave coming from modelsim | 08:38 |
stekern | wallento: ok, it should be LEVEL I think - iow, will pull that in asap ;) | 09:04 |
stekern | now when I start to think about it, what did we need the or1200-compliant version in mor1kx for in the first place? | 09:06 |
mor1kx | [mor1kx] skristiansson closed pull request #13: PIC: Make LEVEL triggered default (master...master) https://github.com/openrisc/mor1kx/pull/13 | 09:09 |
stekern | I think it's time to release a mor1kx v2 soon | 09:10 |
stekern | there are a lot of nice new features in that, and I'd like to reserve multicore stuff for v3 | 09:12 |
stekern | hmm, I think I've fixed the bug in mor1kx for the profiling, but openocd is still unhappy. it claims the range is 0-0 | 09:19 |
stekern | _franck__: look what i found: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l3641 | 09:29 |
_franck__ | cool, we can have our own profile function: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l1227 | 09:32 |
stekern | yup, that'd make much more sense, since afaik, it's spec-kosher to access SPRs over the DU port even when the cpu is running | 09:34 |
stekern | (do you know if that works on or1200?) | 09:34 |
stekern | does it work on mor1kx? ;) | 09:34 |
_franck__ | I don't know (for both) | 09:35 |
stekern | I don't see why it shouldn't, but I have never tested | 09:35 |
stekern | can you manually read npc when the target isn't halted in openocd? | 09:35 |
_franck__ | I don't think so | 09:35 |
_franck__ | It says "target not halted" | 09:35 |
stekern | right, that's what i remembered | 09:36 |
_franck__ | at least AFAIR | 09:36 |
stekern | anyway, I want to make mor1kx work with the halt/resume first anyway, since it's obviously a good way to expose stall related bugs | 09:37 |
stekern | looks like this change isn't working: http://openocd.zylin.com/#/c/2168/2/src/target/openrisc/or1k.c,cm | 09:48 |
stekern | because, I get all zero's when reading from 'pc' here: http://sourceforge.net/p/openocd/code/ci/master/tree/src/target/target.c#l1812 | 09:49 |
stekern | but if I change that to npc, I get sensible values | 09:49 |
_franck__ | I suggested this hack if order to not modify the non specific code (I didn't see we can have our own profiling) | 09:56 |
_franck__ | you can try this: https://github.com/fjullien/openOCD/commit/86ec95584aa239b0bb7fe8881a6582006487d31b | 09:57 |
_franck__ | (compile tested) | 09:57 |
_franck__ | s/if/in | 09:57 |
stekern | ok, will do that when I've confirmed that my current changes work | 09:58 |
stekern | *changes to mor1kx | 09:58 |
stekern | seems like they do | 09:58 |
stekern | ...almost... | 09:59 |
juliusb | olofk: I agree, coming from Cadence Simvision, too, the GTKWave UI seems poorly arranged, but it works | 10:01 |
stekern | it works for a (longer) while, but then it ends up either in alignment or bus error exception | 10:02 |
stekern | I'm more used to gtkwave, so navigating in modelsim is painful for me | 10:03 |
juliusb | mor1kx v2 hey? does that mean we should drink a beer? | 10:03 |
stekern | two, since it's v2! ;) | 10:03 |
juliusb | oh, i'm looking forward to v3 already :) | 10:03 |
stekern | yeah, I figured it might be a good motivator to do releases more often | 10:04 |
juliusb | well, considering wallento and I will go and talk about it in a few weeks in Hamburg, it might be a good time | 10:04 |
juliusb | ( to do both the release and drink 2 beers!) | 10:04 |
juliusb | pity you can't join stekern :-/ | 10:04 |
stekern | yup :( | 10:05 |
juliusb | so I came across something interesting recently (despite it being around for about a year) | 10:05 |
juliusb | you guys seen the RISC-V work? | 10:05 |
juliusb | it appears somebody is doing the or2k work for us | 10:05 |
stekern | yes, but I'm not sure it's *so* interesting | 10:06 |
juliusb | just under a different name | 10:06 |
juliusb | there's a lot of detail yet to be released I think, the 16-bit ISA for instance, but they go through and first of all point out why they didn't want to use OR1K to do the project, despite it being ideologically aligned | 10:07 |
juliusb | it was only technical reasons | 10:07 |
juliusb | so, it appears they've fixed those technical things (no branch delay slot, conditional things done only by comparing register values not flags) | 10:07 |
juliusb | which, I'll grant you, isn't revolutionary, but they've basically taken OR1K, fixed it, and appear to be doing stuff | 10:08 |
juliusb | I dunno, I just thought it was interesting in the light of recent OR2K discussion (OK, it consisted of just 3 posts) | 10:09 |
stekern | yeah, but it still all boils down to yet-another-open-source-risc | 10:11 |
stekern | by that logic, or1k isn't so interesting either. and to be fair, as an ISA it isn't. | 10:16 |
stekern | what imo makes or1k stand out among the other yet-another-open-source-riscs is that the community is completely transparent, not driven by any entity, but still has quite strong software/toolchain support. | 10:18 |
* juliusb agrees | 10:18 | |
juliusb | so it turns out that they got in touch with me to ask me exactly that - why or1k works | 10:21 |
juliusb | it was a bit coincidental, I got in touch with the guy who's in charge of the doing the 16-bit ISA stuff and asked when it might come out and whether there's any Verilog in the open | 10:21 |
juliusb | and then later that day I went out for a beer with some of the bods involved here in Cambridge who seemed to be wanting to understand the experience of the OpenRISC project because it's kind-of what they want to do | 10:23 |
stekern | I think one of the keys is to scratch this kind of work-flow: "Initial versions of all of these have been developed or are under active development. This material is to be made available under open-source licenses." | 10:25 |
stekern | it's the typical cathedral type of development that most open source software projects moved away from | 10:25 |
stekern | (and that we certainly don't employ) | 10:26 |
juliusb | I agree. They are a bit stuck though, and I think it's a hard place to be - they have a team of academics who are all very talented, and they have money to do 28nm ASICs and their plan long term, I guess, is to release everything they can in the open to allow people to download the RTL of the chips they hope to eventually provide on low-cost dev boards | 10:28 |
juliusb | but, as you say, it's a cathedral type development until that time where they do release everything | 10:28 |
juliusb | it possibly could be open right now | 10:28 |
juliusb | they weren't sure though whether they should go and be transparent from the beginning or whether after things have been achieved, then they open it up | 10:29 |
juliusb | they do have chips, actually, it's all pretty far along AFAICT | 10:29 |
juliusb | but yes, it's just another mostly in-house thing which has no appeal to a collaborative community | 10:30 |
juliusb | but what is in the open is a good spec | 10:30 |
juliusb | and tool chains and sims | 10:30 |
juliusb | and is, as far as I can tell, the spiritual successor to or1k | 10:30 |
stekern | yeah, all open source hardware is of course a step forward | 10:31 |
juliusb | so, you don't need their implementations - Beyond Semi are also doing OR1K-compliant implementations but who cares right? what they did was a base of good work on which some even better work was done | 10:31 |
stekern | you're right, doing a semi-good implementation shouldn't be too hard, I did the eco32 in ~2 months | 10:32 |
juliusb | what I'm thinking is that if they put out a good spec with the 16-bit ISA and then the MMU, cache, interrupt controller specs, and it fits, then that'd be pretty interesting | 10:32 |
juliusb | because it then ticks all of the boxes of or2k | 10:33 |
juliusb | a spec unencumbered by some bloody proprietary company which is competitive | 10:33 |
juliusb | actually, one of the Cambridge guys involved did stuff with the Raspberry Pi, and I think maybe it's a response to everyone who said "ok, it's nice the board and software are open source, how about the chip now?" | 10:34 |
juliusb | anyway, another architecture raises the discussion we had at orconf about diversion of a limited resource | 10:35 |
stekern | right, but I guess if they wouldn't release it, their resources would have been tied to an in-house version | 10:37 |
juliusb | ?? | 10:39 |
juliusb | you mean if it's a proprietary spec then that's more boring than one where it's open and others can make better use of any product or chip those processors appear in? | 10:40 |
stekern | I mean, chances are that they wouldn't come running contributing to the openrisc project if they weren't doing their own open source cpu architecture | 10:40 |
juliusb | ah sure | 10:41 |
juliusb | well, they talk about why they didn't pick OR1K for this in the spec, but they wanted something along the same lines in terms of openness and applicability to industry and academia | 10:42 |
juliusb | implementations are one thing, but tool chains, linux ports, etc. you'd be crazy to do your own one of those and not release that to the great wide world | 10:43 |
stekern | yes, but it's still funny, they've already released that, why not the rtl at the same time? | 10:52 |
stekern | I mean it's like it's some mentality that software code and rtl code should be treated differently for some reason | 10:58 |
juliusb | actually, they have released RTL | 11:04 |
juliusb | there's a synthesisable core written in some language called Chisel | 11:04 |
juliusb | it wasn't obvious to me, but apparently another guy involved with it wrote a tool which converts it into Verilog or C++ | 11:05 |
juliusb | (Verilator getting left out of the action it seems hehe) | 11:05 |
juliusb | but I'm not sure if that's the one they refer to as being synthesisable at 1.5 GHz and on par with an ARM Cortex A5 | 11:05 |
stekern | ah, ok | 11:06 |
juliusb | referred to here: http://www.hotchips.org/wp-content/uploads/hc_archives/hc25/HC25-posters/HC25.26.p70-RISC-V-Warterman-UCB.pdf | 11:06 |
juliusb | oh, I didn't find this the other day, it's the proposed 16-bit ISA extensions presented in the guy's masters: http://www.eecs.berkeley.edu/~krste/papers/waterman-ms.pdf | 11:08 |
juliusb | some lunchtime reading :) | 11:09 |
juliusb | i mean, this looks a lot like what we were doing for or2k | 11:09 |
stekern | ~krste, that looks like something that could have been my user name ;) | 11:14 |
stekern | would be interesting to hear sb0's input on Chisel | 11:20 |
stekern | since it's a contender to what he is doing | 11:21 |
juliusb | ya, I can imagine so | 12:22 |
juliusb | so one other thing I mentioned to the RISC-V guys at that meeting was FuseSoC and that it's the panacea for all your open source hardware collation needs (the paid version, at least) | 12:24 |
juliusb | I urged them not to re-invent the wheel there | 12:25 |
stekern | ah, that'd be nice if they'd use that | 12:27 |
olofk | Yes, a new ISA is one thing, but it would be nice if people got better at reusing the other stuff | 13:54 |
olofk | Will they use wishbone, AXI or something completely different? | 13:54 |
juliusb | Not sure. AXI maybe? They seemed to think the spec was freely available for use | 14:04 |
olofk | Would be interesting to know. AXI is probably a wiser choice for a modern arch | 14:05 |
olofk | About chisel and migen, there are plenty of new languages that use verilog as an intermediate format. It makes sense as it's a well-supported language, but I don't think verilog is very good if you view it as an intermediate language for autogenerated code as it's still a full-blown language | 14:07 |
olofk | I think it would make sense to choose a small subset of verilog that could be intended as a target for autogenerated code | 14:09 |
olofk | So if you are making a high-level language, you will only generate code for that verilog subset | 14:11 |
olofk | And if you're making an EDA tool (simulator/synthesis/whatever...) you would only need to implement support for that small subset | 14:11 |
olofk | It would probably make it easier for all the tools to handle code in the same way | 14:12 |
olofk | juliusb: I think axi is ok to use as long as you don't claim to be AXI-compatible | 14:15 |
juliusb | Ha | 14:16 |
ysionneau | off topic question: what would be the reason for doing a software assisted TLB? (like MIPS afaik) instead of a hardware page-tree walker | 14:16 |
ysionneau | beside design simplicity? | 14:16 |
juliusb | olofk: Isn't that the case in the synthesisable subset of the language (where you target just that subset for synthesisable RTL in your higher-level-lang-to-vlog tool)? | 14:17 |
stekern | ysionneau: I don't know if there are any other reasons? | 14:17 |
juliusb | ysionneau: besides that, probably none? | 14:17 |
ysionneau | ok :) | 14:17 |
juliusb | you might like more places where bugs can creep in? | 14:17 |
ysionneau | so hardware assisted TLB is always better for performance, no tradeoff | 14:18 |
ysionneau | thanks :) | 14:18 |
stekern | pretty much, but the win isn't necessarily always so huge | 14:19 |
stekern | so the hardware bloat might not be worth it | 14:19 |
stekern | and if you get some nice critical paths in the extra hw, then it *might* actually be worse for performance | 14:19 |
ysionneau | you still save the exception, the register saving and all the assembly code that does the page table lookup, and register restore | 14:20 |
ysionneau | yep so it could become the critical path | 14:20 |
stekern | well, you can design your ABI so you don't need to do any register saving | 14:21 |
stekern | and the assembly code is not necessarily many lines of code | 14:21 |
ysionneau | yes, but depending on the OS you can need quite a bunch of assembly | 14:21 |
ysionneau | for a minimal tlb update code I guess that in fact keeping 1 or 2 registers for kernel use is enough | 14:23 |
ysionneau | depending on your ISA | 14:23 |
stekern | it's two memory accesses and a bunch of shifts and ands | 14:24 |
ysionneau | yes | 14:25 |
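The refill cost stekern describes (two memory accesses plus a few shifts and ands) can be sketched as a two-level table walk. This is only an illustration, not code from Linux or mor1kx: the 8 KiB page size, the 8/11-bit index split, the dict standing in for memory, and every name below are assumptions.

```python
PAGE_SHIFT = 13                      # assumed 8 KiB pages
L2_BITS = 11                         # assumed width of the second-level index
L1_SHIFT = PAGE_SHIFT + L2_BITS      # the remaining upper bits index level 1
L2_MASK = (1 << L2_BITS) - 1
WORD = 4                             # bytes per table entry

def tlb_refill_lookup(mem, pgd_base, vaddr):
    """Return the page table entry for vaddr, or None if a level is unmapped."""
    l1_index = vaddr >> L1_SHIFT
    l2_base = mem.get(pgd_base + l1_index * WORD)    # memory access 1
    if l2_base is None:
        return None
    l2_index = (vaddr >> PAGE_SHIFT) & L2_MASK
    return mem.get(l2_base + l2_index * WORD)        # memory access 2

# Hypothetical tables: one mapping for the page containing 0x12345678.
mem = {}
pgd_base = 0x1000
l2_table = 0x4000
vaddr = 0x12345678
mem[pgd_base + (vaddr >> L1_SHIFT) * WORD] = l2_table
mem[l2_table + ((vaddr >> PAGE_SHIFT) & L2_MASK) * WORD] = 0xABCDE000

pte = tlb_refill_lookup(mem, pgd_base, vaddr)
```

A hardware walker would perform the same two reads from a small state machine instead of from exception-handler code.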
olofk | juliusb: Yes, the synthesisable subset is a good point... but is there actually a strictly defined subset? And I still would say it's a bit too large for this purpose | 14:25 |
ysionneau | stekern: do you know how Linux handles the tlb miss when the kernel tries to access a user space pointer ? (like in syscalls) | 14:27 |
olofk | ysionneau: I think it kills a kitten every time you get a tlb miss | 14:27 |
olofk | Linux is evil | 14:27 |
ysionneau | woa :' | 14:27 |
ysionneau | not this one I hope: http://catoverflow.com/cats/4fFPNUg.gif | 14:28 |
olofk | But they might have fixed this in some of the last releases :) | 14:28 |
ysionneau | *pfew* | 14:28 |
ysionneau | thanks *whatevergod* | 14:28 |
stekern | ysionneau: the same way as any tlb miss? | 14:29 |
ysionneau | stekern: ah maybe this question is nonsense for Linux since I think syscalls are done with the process memory mapping enabled | 14:29 |
ysionneau | I'm not sure how it is for NetBSD | 14:29 |
ysionneau | I think that as soon as you enter kernel mode you use the kernel's page table | 14:30 |
ysionneau | then you don't have the user space page table anymore, or not as easy to access | 14:30 |
juliusb | I think if you're doing a pure area comparison, though, it depends how expensive your RAM which is storing the CPU code is, versus the hardware area of say, 100 flops (to store state) and 300 combinatorial elements? in ASIC that's like maybe 1.5k gates, depending on the combinatorial stuff. If the MMU is at best 100 32-bit instructions that's 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win | 14:31 |
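The arithmetic behind juliusb's estimate can be written out as follows. The figures are the rough guesses from the message itself (about 1.5k gates for the hardware walker, 100 instructions of 32 bits, 2 to 6 gates per stored bit), not measurements, and the variable names are purely illustrative.

```python
# Rough guesses taken from the message above, not measured values.
hw_walker_gates = 1500            # ~100 flops + ~300 combinatorial elements
refill_instructions = 100         # assumed upper bound on the refill handler
bits_per_instruction = 32

storage_bits = refill_instructions * bits_per_instruction

# Assumed storage cost range: "a couple of gates per bit" up to 6 for SRAM.
storage_gates_low = storage_bits * 2
storage_gates_high = storage_bits * 6

print(storage_bits, storage_gates_low, storage_gates_high)
print("hardware walker smaller:", hw_walker_gates < storage_gates_low)
```

Under these guesses, even the cheapest storage estimate is several times larger than the hardware walker, which is the point being made.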
* ysionneau would need to dig on this one | 14:31 | |
juliusb | olofk: for synthesis, yes | 14:31 |
juliusb | olofk: but admittedly even that's pretty big | 14:31 |
ysionneau | juliusb: I think your sentence got cut | 14:32 |
ysionneau | " of gates per bit t" | 14:32 |
stekern | hmm, afaik, in linux the kernel page tables are in all the user space page tables | 14:32 |
ysionneau | yes | 14:32 |
ysionneau | the kernel is mapped in the user space vm space | 14:32 |
juliusb | ... 3200 synchronous elements which may be as small as a couple of gates per bit to 6 gates per bit for SRAM, so it's pretty easy for HW to win | 14:32 |
ysionneau | so I guess the same vm space is kept during the syscall | 14:32 |
stekern | juliusb: right, if you're using onchip SRAM/ROM for the insn code, then a hw filler (at least in terms of area) is probably always worth considering | 14:33 |
stekern | I never thought about it in that way ;) | 14:34 |
ysionneau | juliusb: I was comparing pure hw (no instructions, just a state machine) to pure software, with tlb refill code being in cache or sdram | 14:34 |
ysionneau | or maybe I should have mentioned it :p | 14:34 |
ysionneau | I never thought about embedding the tlb refill code in some SRAM | 14:34 |
juliusb | ysionneau: my estimate of 100 flops and ~1000 gates (transistors) was for the HW state machine, the pure HW | 14:35 |
ysionneau | ah ok | 14:35 |
juliusb | that's 32 32-bit values and 4 bits for state | 14:35 |
juliusb | err | 14:35 |
juliusb | s/that's 32/that's 3/ | 14:35 |
ysionneau | not sure I understand what those 32-bit values would be | 14:36 |
juliusb | couple of addresses and some data | 14:37 |
juliusb | I'm just guessing here, I don't know how it works but you basically need a state machine to walk a table which exists in memory, remember a couple of things, and read and write some stuff, so you'll probably want a couple of pointers and a data-holder guy | 14:38 |
ysionneau | yes | 14:38 |
ysionneau | and some logic to access the bus toward sdram | 14:38 |
ysionneau | ok got it | 14:39 |
stekern | well, this is how it works: https://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_dmmu.v#L216 | 14:39 |
stekern | ;) | 14:39 |
ysionneau | :D | 14:41 |
ysionneau | I should have a deeper look at this code indeed | 14:42 |
ysionneau | I only had a very quick one so far | 14:42 |
juliusb | only 2 32-bit registers then :) | 14:43 |
juliusb | oh, no, 4 | 14:43 |
juliusb | err 3? | 14:43 |
juliusb | I dunno :) | 14:43 |
juliusb | not many | 14:43 |
wkoszek | How offended will you guys be if I post 1 message on the mailing list about 2 openings Xilinx has in my group? | 19:17 |
stekern | wkoszek: given that you are "known to the project" and you ask in advance, I wouldn't at least be very offended ;) | 19:25 |
wkoszek | stekern: Sounds good. | 19:27 |
stekern | I think I finally got the profiling bug sorted out | 20:41 |
stekern | ...not quite yet :( | 21:20 |
lvcargnini | hello, please I would like some clarifications regarding the Or1K architecture ? | 22:21 |
poke53281 | Hi, lvcargnini. What is the question? | 22:45 |
lvcargnini | Hi, I have a doubt regarding the 64b OR1K | 22:46 |
lvcargnini | like ok, instructions are 32b | 22:46 |
lvcargnini | but, when I read the word, from L1, a 64b word | 22:46 |
poke53281 | Yes, instructions have a size of 4 byte. | 22:46 |
lvcargnini | how should I map that ? {32'b0,instruction} or {instruction,32'b0} | 22:47 |
lvcargnini | or the compiler is packing as two instruction in one word ? {inst1,inst2} | 22:47 |
poke53281 | Hehe, good question. Easy answer, you can't do it. | 22:47 |
lvcargnini | is to help me understand how to map the signals during decode stage | 22:48 |
lvcargnini | what do you mean can't ? | 22:48 |
lvcargnini | Oo | 22:48 |
poke53281 | Did you check the instruction set. There is no "lwz 0x12345678" for loading a word | 22:49 |
poke53281 | the address of a memory location must be stored in a register. | 22:50 |
poke53281 | and then you can add an offset to that value in the lwz instruction. | 22:50 |
lvcargnini | ok, well lwa loads a word (so 64b) | 22:51 |
poke53281 | so, in order to load from a random address you have to combine a few commands. Wait ... | 22:51 |
lvcargnini | ld loads double word | 22:52 |
lvcargnini | lbz maybe ? | 22:52 |
lvcargnini | sorry lhz | 22:53 |
lvcargnini | maybe | 22:53 |
lvcargnini | but still it is the interpretation of the "lhz 0x121212" in the asm that is the issue for me, in the RTL | 22:54 |
poke53281 | l.movhi r3,0x1234 | 22:57 |
poke53281 | l.ori r3,r3,0x5678 | 22:57 |
poke53281 | l.sw 0(r3),r2 | 22:57 |
poke53281 | in 32 Bit, to write to a random 32 bit word you have to use three instructions. | 22:57 |
poke53281 | in this case I write the contents of r2 to the address 0x12345678 | 22:58 |
poke53281 | There must be a similar way in the 64-Bit cpu. | 22:58 |
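The three-instruction sequence above can be modelled in a few lines to show how the address is split into two 16-bit halves. This is a simplified 32-bit model written for illustration; the helper names and the dict standing in for memory are assumptions, not part of any OpenRISC tool.

```python
MASK32 = 0xFFFFFFFF

def l_movhi(imm16):
    """Model of l.movhi: zero-extended 16-bit immediate shifted left by 16."""
    return ((imm16 & 0xFFFF) << 16) & MASK32

def l_ori(ra, imm16):
    """Model of l.ori: OR rA with a zero-extended 16-bit immediate."""
    return (ra | (imm16 & 0xFFFF)) & MASK32

def l_sw(mem, offset, ra, rb):
    """Model of l.sw: store rB as a 32-bit word at address rA + offset."""
    mem[(ra + offset) & MASK32] = rb & MASK32

def store_word(mem, addr, value):
    """Mirror of the sequence: movhi r3,hi ; ori r3,r3,lo ; sw 0(r3),r2."""
    r2 = value
    r3 = l_movhi(addr >> 16)          # upper half, e.g. 0x1234
    r3 = l_ori(r3, addr & 0xFFFF)     # lower half, e.g. 0x5678
    l_sw(mem, 0, r3, r2)
    return r3

mem = {}
r3 = store_word(mem, 0x12345678, 0xDEADBEEF)
```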
lvcargnini | ok, the problem , for me, is how the l.movhi is translated to binary ? for a memory of 64b ? also, after that since I'm using 64b registers how to interpret them ? my initial question for you | 23:00 |
lvcargnini | 4-bytes instructions in a 8-byte wide register | 23:00 |
lvcargnini | poke53281, thanks for the help so far | 23:01 |
lvcargnini | I'll keep asking it, until someone can help me figure this out, on SPARC-v9 for example is the same as in SPARC-v8, but is in format {instruction,32'b0} | 23:02 |
lvcargnini | to fit the 64b register | 23:02 |
poke53281 | umhh, ok. Good question. I don't know. | 23:04 |
poke53281 | probably you have to use l.movhi and then a shift left. | 23:04 |
poke53281 | But, maybe I don't understand the real question. | 23:05 |
poke53281 | according to the specification, l.movhi is also doing a shift left of 16. "rD[63:0]←extz(Immediate) << 16" | 23:06 |
poke53281 | it's the same opcode and behaves exactly the same for 32-Bit and 64-Bit. | 23:07 |
poke53281 | So you need probably 5 4-byte instructions to fill a register with a random value. | 23:07 |
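One way to build an arbitrary 64-bit value from 16-bit immediates is sketched below. It assumes the semantics quoted above for l.movhi (zero-extend, shift left by 16), a zero-extending l.ori, and l.slli as a plain left shift. This particular sequence uses six instructions; shorter ones may exist for specific values, and a real compiler might load the constant from memory instead. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def build_constant64(value):
    """Return a list of (mnemonic, operand) pairs that builds value in one register."""
    c3, c2, c1, c0 = [(value >> s) & 0xFFFF for s in (48, 32, 16, 0)]
    return [
        ("l.movhi", c3),   # bits 63..48, placed at 31..16 for now
        ("l.ori", c2),     # bits 47..32, placed at 15..0 for now
        ("l.slli", 16),
        ("l.ori", c1),     # bits 31..16
        ("l.slli", 16),
        ("l.ori", c0),     # bits 15..0
    ]

def run(program):
    """Execute the sequence on a single 64-bit register model."""
    r = 0
    for op, arg in program:
        if op == "l.movhi":
            r = (arg << 16) & MASK64
        elif op == "l.ori":
            r = (r | arg) & MASK64
        elif op == "l.slli":
            r = (r << arg) & MASK64
    return r

program = build_constant64(0x0123456789ABCDEF)
result = run(program)
```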
poke53281 | Or better, you load the register value from a memory address, maybe from the stack. | 23:08 |
poke53281 | If I still didn't answer your question, you should wait a little bit, until Stekern and others are available. | 23:08 |
poke53281 | Ahh, I read your email. | 23:16 |
poke53281 | I still don't really understand your question. But I guess that the compiler will produce in principle the same binary code as for 32 Bit. | 23:22 |
poke53281 | I don't know if anything else would make sense. | 23:22 |
poke53281 | 18 60 12 34 l.movhi r3,0x1234 | 23:25 |
poke53281 | a8 63 56 78 l.ori r3,r3,0x5678 | 23:25 |
poke53281 | d7 e1 0f f8 l.sw -8(r1),r1 | 23:25 |
poke53281 | this is the way you should pack it into memory. | 23:26 |
poke53281 | d4 03 10 00 l.sw 0(r3),r2 | 23:26 |
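The byte sequences above can be reproduced with a small encoder, which also shows what a 64-bit wide fetch would see. The field layout below was chosen to match the hex bytes quoted in the log and should be checked against the architecture manual. The big-endian byte order and the 8-byte aligned fetch are assumptions about the implementation being discussed, under which two consecutive instructions appear as {inst1, inst2} in one 64-bit word.

```python
import struct

def enc_movhi(rd, k):
    """l.movhi rD,K: opcode 0x06, rD in bits 25..21, K in bits 15..0."""
    return (0x06 << 26) | (rd << 21) | (k & 0xFFFF)

def enc_ori(rd, ra, k):
    """l.ori rD,rA,K: opcode 0x2A, rD in 25..21, rA in 20..16, K in 15..0."""
    return (0x2A << 26) | (rd << 21) | (ra << 16) | (k & 0xFFFF)

def enc_sw(offset, ra, rb):
    """l.sw I(rA),rB: opcode 0x35, with the 16-bit offset split in two fields."""
    imm = offset & 0xFFFF
    return (0x35 << 26) | ((imm >> 11) << 21) | (ra << 16) | (rb << 11) | (imm & 0x7FF)

program = [
    enc_movhi(3, 0x1234),     # 18 60 12 34
    enc_ori(3, 3, 0x5678),    # a8 63 56 78
    enc_sw(-8, 1, 1),         # d7 e1 0f f8
    enc_sw(0, 3, 2),          # d4 03 10 00
]

# Instructions stay 4 bytes each and are stored back to back, big-endian.
image = b"".join(struct.pack(">I", word) for word in program)

# A 64-bit fetch then returns two instructions per word.
fetch_words = [int.from_bytes(image[i:i + 8], "big") for i in range(0, len(image), 8)]

def split_fetch(word64):
    """Return (instruction at lower address, instruction at higher address)."""
    return word64 >> 32, word64 & 0xFFFFFFFF
```

Under these assumptions the decoder would pick the upper or lower half of the fetched word using bit 2 of the program counter rather than padding each instruction with 32 zero bits.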
--- Log closed Thu Jun 19 00:00:40 2014 |