--- Log opened Thu Jun 19 00:00:40 2014 | ||
-!- lvcargnini is now known as Guest30747 | 03:45 | |
Guest30747 | poke53281, thanks, and sorry: my connection dropped, which is why it took me so long to answer | 05:21 |
Guest30747 | the question is: the instructions are 32b (understood), but I'm implementing a 64b machine, so how will a register be filled when it holds a 32b instruction? | 05:22 |
stekern | I don't understand the question | 05:39 |
stekern | how does the number of bits in the instruction have anything (or at least much) to do with the register size? | 05:40 |
stekern | the same way as you would fill 32-bit registers, but in some cases you'll need some extra operations | 05:40 |
Guest30747 | stekern, to explain: you have 64b registers in your architecture, but your instruction is 32b long, so do I interpret 64b words fetched from the cache and place them into a | 06:05 |
Guest30747 | 64b register, then how do I interpret that to decode the opcode from the 64b? | 06:05 |
Guest30747 | as {32'b0,instr} or {instr,32'b0} | 06:06 |
Guest30747 | to decode it. | 06:06 |
stekern | hmm, I think you have to read up a bit more on the internals of RISC machines | 06:06 |
stekern | if I'm not completely misunderstanding what you ask... | 06:06 |
Guest30747 | stekern, please elaborate on your conclusion; take mips64 as an example | 06:06 |
stekern | first, explain why you think that the instruction length is related to the register size | 06:07 |
Guest30747 | in mips64 you also have 64b regs, but instructions are 32b | 06:07 |
Guest30747 | the solution in this case is two instructions per word to fit one register | 06:07 |
Guest30747 | I don't think that way | 06:08 |
stekern | the thing I don't understand in your question is, why do you speak about 'fitting instructions into registers'? | 06:08 |
Guest30747 | the point is my architecture will have a 64-bit word size, so I'll handle it using 64b regs; when fetching an instruction aligned to 8 bytes, how do I interpret the 64b to decode it | 06:09 |
stekern | the instruction size is still 32-bit | 06:09 |
Guest30747 | I know | 06:09 |
stekern | heh... so what is the question then? ;) | 06:10 |
Guest30747 | but when you fetch from iL1 you are fetching 64b | 06:10 |
stekern | why would you do that? | 06:10 |
Guest30747 | Why wouldn't I ? | 06:10 |
Guest30747 | you are currently fetching 32b in a 32b arch | 06:11 |
Guest30747 | why not fetch 64b in a 64b, to put in layaman terms | 06:11 |
Guest30747 | sorry layman | 06:11 |
stekern | I'm fetching 32-bit, because the instructions are 32-bit | 06:11 |
Guest30747 | ... | 06:11 |
stekern | if they still are 32-bit, why fetch 64-bit? | 06:11 |
stekern | (you *could* fetch 64-bit from the icache, but the reasons to do that could be applied to the 32-bit versions as well, so let's not go there) | 06:12 |
Guest30747 | I understand you are assuming a 32b arch and that the instruction is 32b; now assume you are compiling code for 64b, so all the word representations are 64b | 06:13 |
Guest30747 | so your memory will be aligned in 8-bytes, once you fetch your cache lines, like in 4 or 8 words of 8-bytes each | 06:14 |
stekern | I'm assuming an or1k 64-bit implementation, with 32-bit instructions, as described in the arch manual | 06:14 |
Guest30747 | stekern, that is clear for me too | 06:14 |
Guest30747 | I understood it, so please how would you implement your HDL | 06:15 |
stekern | ... | 06:15 |
Guest30747 | for the decoding phase, assuming you compiled it for 64b, having the 32b instructions similar to MIPS architecture as example | 06:15 |
stekern | for the fetch/icache unit? | 06:15 |
stekern | exactly the same as in a 32-bit implementation | 06:16 |
Guest30747 | datapath decode after fetching lines of 4*8-bytes | 06:16 |
Guest30747 | again I agree with yuo instructions are 4-bytes | 06:16 |
Guest30747 | sorry you | 06:16 |
Guest30747 | that is the point my word now has 64b, not 32b anymore | 06:17 |
Guest30747 | how to correctly parse the 64 | 06:17 |
Guest30747 | b | 06:17 |
stekern | ...again, no, they are *not* 64-bit, they are 32-bit | 06:17 |
stekern | remember that instruction and data paths are separated | 06:18 |
stekern | your data path will of course be 64-bit | 06:19 |
stekern | your instruction path not | 06:19 |
Guest30747 | separated memory space, so basically you are pointing out that I have to have two address spaces (dimensions) for data and inst in the same architecture | 06:19 |
stekern | yes | 06:20 |
Guest30747 | but that is messy | 06:20 |
stekern | in what way? | 06:20 |
Guest30747 | for example, first in the compiler implementation; second, I have to have different implementations for icache and dcache, which will create different latencies for L1 | 06:22 |
Guest30747 | in IC design | 06:22 |
Guest30747 | a line of 16 bytes doesn't have the same latency as a 32 bytes line, after routing and placement | 06:23 |
stekern | what you just said makes no sense at all | 06:23 |
olofk | Guest30747: Different latencies in the caches will probably not be an issue. You could even make both datapaths wider to the memory. Make them 256 bits for example | 06:23 |
olofk | You don't need to have your datapath to the memories the same width as your instructions or data | 06:24 |
Guest30747 | I know | 06:24 |
Guest30747 | I want | 06:25 |
Guest30747 | would like | 06:25 |
olofk | :) | 06:25 |
Guest30747 | to make the floorplanning smooth | 06:25 |
Guest30747 | stekern, I'm thinking about a silicon, not FPGA, implementation | 06:25 |
Guest30747 | and yes, it makes a difference in the DFM process once you are routing it and performing STA | 06:26 |
stekern | how would that be related? | 06:27 |
Guest30747 | let me compile two memory banks I'll pass you the data | 06:27 |
stekern | ...they are completely different paths inside the cpu | 06:28 |
stekern | what you're saying still doesn't make sense, doesn't matter if it's FPGA or ASIC | 06:29 |
olofk | Guest30747: You could probably make your icache and dcache identical if you want that, even if instructions and data are different sizes | 06:31 |
olofk | If that's the big problem. But I'm not sure I have understood completely | 06:32 |
Guest30747 | olofk, that was my original intention, assuming {inst,32'b0} or {32'b0,inst}, I thought the compiler would generate the object code this way, SPARC style or MIPS64 style {inst,inst} | 06:33 |
olofk | If you're making the RTL you would have to decide for yourself how you want it | 06:34 |
stekern | the instructions will of course be 4-byte aligned... | 06:35 |
Guest30747 | http://pastebin.com/gcwpb8hH | 06:38 |
Guest30747 | for 32b | 06:38 |
Guest30747 | now assuming 64b | 06:38 |
Guest30747 | http://pastebin.com/EvEHNCSr | 06:40 |
Guest30747 | look the surfaces are different | 06:40 |
olofk | So make them both 64 bit then | 06:40 |
Guest30747 | yes but my doubt was regarding how to decode in 64b, since I had no idea what the compiler will generate | 06:40 |
olofk | And use LSB of the address to select which half of the word you want to use | 06:41 |
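A minimal sketch of olofk's suggestion, written in Python rather than HDL: treat every icache read as a 64-bit word and use one address bit to pick the 32-bit half. Several things here are assumptions not stated in the log: addresses are byte addresses, so the selecting bit is bit 2 (the lowest bit of the 32-bit word index); the layout is big-endian, so the instruction at the lower address sits in the upper half; and the function name and example values are hypothetical.

```python
def select_instruction(fetched_word: int, pc: int) -> int:
    """Pick the 32-bit instruction at byte address `pc` out of a 64-bit fetch."""
    if pc % 4 != 0:
        raise ValueError("instructions are 4-byte aligned")
    # Bit 2 of the byte address says which half of the 8-byte word is wanted.
    wants_lower_address = ((pc >> 2) & 1) == 0
    if wants_lower_address:
        # Assumed big-endian layout: the lower address is the upper half.
        return (fetched_word >> 32) & 0xFFFFFFFF
    return fetched_word & 0xFFFFFFFF


# Two hypothetical 32-bit instruction words packed into one 64-bit fetch.
word = 0x1500000044004800
first = select_instruction(word, 0x100)
second = select_instruction(word, 0x104)
```

With this in front of the decoder, the decode stage still only ever sees a 32-bit value, which is the point stekern keeps returning to.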
stekern | well... it will absolutely certainly not insert empty 4-byte data at each instruction | 06:41 |
olofk | No, I thought that sounds crazy as well. Do really other arches do that? | 06:42 |
Guest30747 | do you guys have a sample of a binary after compiling it for 64b ? | 06:42 |
stekern | we don't have toolchain support for 64-bit | 06:43 |
stekern | ...well, we have *some* support for it in parts of our toolchains, but it's not complete nor usable as of today | 06:43 |
Guest30747 | olofk, no, SPARC just assumes everything into 64b, MIPS loads two instructions instead of one for each load | 06:44 |
stekern | what does "assumes everything into 64b" mean? | 06:44 |
Guest30747 | stekern, oooooo ok, this gave something extra to think | 06:44 |
olofk | Guest30747: But loading two instructions is just dual issue, isn't it? | 06:45 |
stekern | and how do you know that MIPS does that, I would think that's a highly implementation specific detail | 06:45 |
Guest30747 | 64b word size, loads of data or instructions are 64b aligned | 06:45 |
stekern | I would be surprised if MIPS made such data publicly available | 06:45 |
Guest30747 | because I have the MIPS64 specs, yes it is very specific | 06:45 |
Guest30747 | but you can find that in old implementations like R4000, initial 64b | 06:46 |
olofk | So back to the original question, we haven't really discussed that for or1k AFAIR, but packing the instructions tight is what makes most sense to me | 06:48 |
olofk | It might even be defined that way in the spec. Not sure | 06:48 |
stekern | well... if it's important to you to have the caches emit 64-bit (I still don't get the reasoning, but), do that then | 06:49 |
Guest30747 | ok, well I was concerned mostly because of the compiler output; since it doesn't have support for it yet, I'll have to drop it for now, because it is still unclear. The problem is if I make something and it is later decided another way | 06:51 |
stekern | you will just discard half of the data in a single issue implementation... | 06:51 |
Guest30747 | yep | 06:51 |
stekern | sounds silly | 06:51 |
Guest30747 | silly drop or align in 64b single issue ? | 06:52 |
stekern | well, I can promise you that the data will be 'packed' | 06:52 |
Guest30747 | stekern, Yep that is clear | 06:52 |
olofk | But you only need a 32->64-bit conversion outside of the icache so that the CPU sees a 32 bit port | 06:52 |
Guest30747 | my only concern was the instruction for decoding | 06:52 |
stekern | start hacking away and you will realise that your concerns are misguided... | 06:53 |
Guest30747 | stekern, I cannot afford the time to be misguided now, reason why I'm trying to be assured before committing to its design | 06:54 |
stekern | think about this way, you can run a 32-bit program on the 64-bit implementation | 06:54 |
stekern | the instructions will be laid out in exactly the same way | 06:55 |
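A small sketch of what "packed" and "laid out the same way" mean for a binary image, assuming big-endian 4-byte instruction words stored back to back with no padding. The instruction values and helper names are hypothetical. The same bytes are read once through a 32-bit port and once through a 64-bit port that is split afterwards, and both give the same instruction stream.

```python
import struct

# Hypothetical 32-bit instruction words.
program = [0x15000000, 0x44004800, 0x15000000, 0x15000000]

# Packed image: 4 bytes per instruction, back to back, no padding words.
image = b"".join(struct.pack(">I", insn) for insn in program)


def fetch32(image: bytes, pc: int) -> int:
    """Read one instruction through a 32-bit wide port."""
    return struct.unpack_from(">I", image, pc)[0]


def fetch64_then_split(image: bytes, pc: int) -> int:
    """Read an aligned 64-bit word, then keep the half that `pc` points at."""
    word = struct.unpack_from(">Q", image, pc & ~7)[0]
    shift = 0 if pc & 4 else 32
    return (word >> shift) & 0xFFFFFFFF


via_32 = [fetch32(image, pc) for pc in range(0, len(image), 4)]
via_64 = [fetch64_then_split(image, pc) for pc in range(0, len(image), 4)]
```

Under these assumptions the fetch width is an implementation choice; the binary itself does not change.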
Guest30747 | yes, but the encoded ABI is in 32, the LD, makes the loading and properly maps the sign extension for the 64ba | 06:56 |
stekern | mmm, and that's related to icache/fetch/decode how? | 06:56 |
Guest30747 | stekern, you are right, it is not clear to me yet how the compiler would generate the objects, nor how the loader would link a 32b for a 64b on-the-fly | 06:57 |
Guest30747 | to fit | 06:57 |
Guest30747 | so I can keep my arch aligned in 64/32b for data and inst, for data it prefixes 0x00000000 | 06:58 |
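For the data side, a hedged sketch of the two usual ways a 32-bit value can end up in a 64-bit register: zero extension (the "prefix 0x00000000" case) and sign extension. Which one applies depends on the load instruction used; the helper names below are hypothetical and this is plain Python, not a model of any particular core.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def zero_extend32(value: int) -> int:
    """Upper 32 bits of the 64-bit result are always zero."""
    return value & MASK32


def sign_extend32(value: int) -> int:
    """Upper 32 bits of the 64-bit result copy bit 31 of the input."""
    value &= MASK32
    if value & 0x80000000:
        value |= 0xFFFFFFFF00000000
    return value & MASK64


zero_case = zero_extend32(0x80000001)
sign_case = sign_extend32(0x80000001)
```

None of this touches instruction fetch or decode; it only concerns how loaded data fills a register.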
Guest30747 | either way, I'll have to scrub some bits to see the outcome and make my decision | 06:59 |
olofk | gtg | 07:00 |
stekern | don't think so much about the compilers, think more of it in terms of a binary just | 07:00 |
Guest30747 | thanks olofk | 07:00 |
Guest30747 | thanks stekern | 07:00 |
Guest30747 | I'll try | 07:00 |
Guest30747 | 8-) | 07:01 |
olofk | Don't ask what the compiler can do for you, but what you can do for the compiler ;) | 07:01 |
Guest30747 | haha ok olofk | 07:01 |
stekern | you don't have to worry too much about how the compiler will align things, you already have in the spec how the different data sizes need to be aligned | 07:02 |
stekern | i.e. l.lb/l.sb 1-byte, l.lh/l.sh 2-byte, l.lw/l.sw 4-byte and l.ld/l.sd 8-byte | 07:03 |
stekern | just implement them like that and you'll be fine | 07:03 |
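The alignment rule stekern lists can be sketched as a lookup plus a modulo check. The table keys are the shorthand mnemonics as they are written in the log; the function name is hypothetical, and what a real core does on a misaligned access is not covered here.

```python
# Access size in bytes for each load/store pair, as listed in the log.
ACCESS_SIZE = {
    "l.lb": 1, "l.sb": 1,
    "l.lh": 2, "l.sh": 2,
    "l.lw": 4, "l.sw": 4,
    "l.ld": 8, "l.sd": 8,
}


def is_aligned(mnemonic: str, address: int) -> bool:
    """True if `address` is a multiple of the access size for `mnemonic`."""
    return address % ACCESS_SIZE[mnemonic] == 0


byte_ok = is_aligned("l.lb", 0x1003)   # any address works for a byte access
word_bad = is_aligned("l.lw", 0x1002)  # not a multiple of 4
dword_ok = is_aligned("l.ld", 0x1008)  # multiple of 8
```

The rule depends only on the access size, not on the register width, which is why it reads the same for a 32-bit and a 64-bit implementation.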
Guest30747 | yes, but my concern was how the l.lb is aligned in its memory for a 64b wordsize Or1k | 07:03 |
stekern | the same as for a 32-bit | 07:04 |
Guest30747 | yes, but I was thinking in 64b regs aligned with 64b, wordsize for the icache too | 07:04 |
Guest30747 | my root of problem | 07:05 |
stekern | yes, but I think those are all just some misunderstandings on your part... | 07:07 |
Guest30747 | thanks guys for your time and help | 07:07 |
Guest30747 | stekern, probably | 07:07 |
stekern | I might be wrong of course ;) | 07:07 |
Guest30747 | the main reason is I probably misunderstood something plus there is a lack of documentation, and some clarity regarding the 64b arch | 07:08 |
Guest30747 | 8-) | 07:09 |
Guest30747 | thanks stekern | 07:09 |
stekern | no problems, it was an interesting discussion | 07:12 |
stekern | ...that might lead to some better documentation at some point ;) | 07:12 |
Guest30747 | I hope ;-) | 07:12 |
stekern | I'm interested in doing a 64-bit version of mor1kx as well at some point | 07:12 |
Guest30747 | b~d | 07:12 |
Guest30747 | I was looking into it to modify it | 07:13 |
Guest30747 | but it is extensive code, so it seemed easier to do it all from scratch to be sure I did it right | 07:13 |
stekern | I think it wouldn't be too much work to do it actually | 07:14 |
stekern | modify it I mean | 07:14 |
stekern | a lot can be used as is | 07:14 |
Guest30747 | well, it can | 07:15 |
Guest30747 | could | 07:16 |
stekern | _franck__: I tested your or1k-profiling patch, with some modifications, it works | 08:11 |
_franck__ | great. What did you change ? I think we need to add a keep_alive() call | 08:13 |
stekern | http://pastie.org/9304431 | 08:13 |
stekern | I think the timekeeping is off too, but I didn't look closer at that | 08:14 |
stekern | I did 'profile 100 gmon.out', but it didn't feel like 100 sec | 08:15 |
_franck__ | maybe it's because of that: if (sample_count >= max_num_samples || ..... | 08:16 |
_franck__ | let's see what it does ;) | 08:17 |
stekern | it gave a better result than the stall/resume method | 08:17 |
stekern | instead of the empty loop function, the uart_put function was the one where it spent more time | 08:18 |
stekern | (since of course the uart keeps on sending stuff even when the cpu is stalled) | 08:18 |
olofk | wkoszek: I didn't realize you were working for Xilinx. Could you pleeeeeease make sure that your coworkers fix the forced line breaks in the ngdbuild/map/par/trce/bitgen logs? :) | 09:33 |
poke53281 | http://www.theregister.co.uk/2014/06/18/intel_fpga_custom_chip/ | 17:16 |
poke53281 | This is a nice development. If this is successful we will probably also see it in consumer versions in a few years. | 17:17 |
ysionneau | reminds me of those Intel atom soc packaged with an altera fpga | 17:27 |
-!- Netsplit *.net <-> *.split quits: poke53281, veprbl, fotis2, slp``` | 18:08 | |
-!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2 | 18:10 | |
-!- Netsplit *.net <-> *.split quits: poke53281, veprbl, slp```, fotis2 | 18:15 | |
-!- Netsplit over, joins: veprbl, poke53281, slp```, fotis2 | 18:16 | |
poke53281 | Yes, but they were not successful as far as I know. Don't know why. | 18:33 |
poke53281 | I hope that this development goes on. | 18:33 |
wkoszek | olofk: I don't work for the tools team. | 21:05 |
wkoszek | olofk: File a ticket and wait :) | 21:05 |
--- Log closed Fri Jun 20 00:00:42 2014 |