--- Log opened Fri May 09 00:00:40 2014 | ||
wallento | stekern: I think the first thing is to get re-entrancy into head.S etc. (as in libgloss) and to remove the fixed memory addresses. I think the SMP itself is only a few functions one needs to implement per architecture | 06:01 |
---|---|---|
stekern | wallento: yes, I think so too. I've got some ideas for head.S, but I want to explore them before making any statements about it. | 06:27 |
stekern | wallento: but.. the *very* first step is to make your multicore-demo boot Linux with *one* core ;) | 06:53 |
blueCmd | olofk: hah, we have a saying at work "nobody wants to be in the critical path" :) | 06:53 |
blueCmd | stekern: pff, you're not thinking large enough! | 06:53 |
blueCmd | if you get it to run on multiple, surely it's trivial to run on one? ;) | 06:53 |
stekern | true, only problem is that it doesn't run on multiple, thus it might not be trivial to run on one ;) | 06:54 |
stekern | (but it's probably something minor that is preventing it, so in practice, yes it should be trivial to make it run on one of the cores) | 06:55 |
stekern | I already found a mor1kx bug in the process though, ibus cyc&stb might stay asserted even though rst is asserted | 06:56 |
stekern | and that brings up another interesting question, what kind of mechanism should there be to enable/disable cores? | 06:58 |
mor1kx | [mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/e0e2f058e3ebba40a9a0231c5f54aa1d6b04bb74 | 07:01 |
mor1kx | mor1kx/master e0e2f05 Stefan Kristiansson: cappuccino/fetch: deassert ibus_req on rst | 07:01 |
pgavin | stekern: SPR space? | 07:08 |
pgavin | maybe an SPR to enable a core, and another to disable? | 07:09 |
stekern | pgavin: to the enable/disable question? yes | 07:09 |
pgavin | or one spr that's just a bitmask | 07:09 |
pgavin | do you already have an SPR defined? | 07:10 |
stekern | no, I pressed enter prematurely... =) | 07:10 |
pgavin | :) | 07:11 |
pgavin | also inter-core interrupts might be useful | 07:11 |
pgavin | not necessary, I suppose, can just use polling | 07:11 |
pgavin | the intercore interrupt can be reused to enable the cores tho | 07:12 |
stekern | yes, that's an option, but then there's a lot of smaller implementation details, like should that spr be 'global', is it only one 'master' cpu that can read/write it. etc | 07:12 |
pgavin | yeah | 07:12 |
pgavin | if you use an interrupt it should be easy to make global | 07:13 |
pgavin | there doesn't need to be any higher level state I don't think | 07:13 |
stekern | I think I need to read up on inter-core interrupts | 07:15 |
stekern | (and a fun fact related to that, google gives results about 'coitus interruptus' when searching for it) ;) | 07:16 |
pgavin | lol | 07:16 |
stekern | spellchecking gone terribly wrong - intercore, not intercourse | 07:21 |
LoneTech | stekern: another irrelevant fact - that's what Onan did, per the bible text. so the word onani is misdefined. | 07:37 |
stekern | LoneTech: ah, interesting - I didn't know that | 07:49 |
stekern | wallento: I'm not sure I understand how the bus address matching here works: https://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/mor1kx-dualcore/rtl/verilog/wb_bus_b3.v#L366 | 08:05 |
stekern | for the memory, S1_RANGE_WIDTH = 29 and S1_RANGE_MATCH = 0x1f800000 | 08:07 |
stekern | right? | 08:07 |
stekern | that'd make: assign s_select[1] = (m_bus[31:3] == 29'h1f800000); | 08:09 |
stekern | how does that work? | 08:09 |
stekern | it doesn't, because S0 is the mem, not S1... | 08:22 |
wallento | memory at S0 is 0x0000000 to 0x7FFFFF | 08:25 |
wallento | uart at S1 is 0xff80000 to 0xff80007 | 08:26 |
stekern | yeah, I think I got it now | 08:27 |
wallento | RANGE_WIDTH is number of bits from MSB that must match | 08:27 |
wallento | and RANGE_MATCH the corresponding value | 08:27 |
stekern | I mixed the slaves up, that was what threw me off | 08:27 |
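
A minimal sketch of the address matching wallento describes above, assuming a 32-bit bus: a slave is selected when the top RANGE_WIDTH bits of the address equal RANGE_MATCH. The function name and the two example windows are illustrative assumptions, not values taken from wb_bus_b3.v; the memory window is derived from the 0x0000000 to 0x7FFFFF range quoted above, and the 8-byte peripheral window at 0x90000000 is hypothetical.

```python
ADDR_WIDTH = 32  # assumed bus address width


def slave_selected(addr, range_width, range_match):
    """Return True when the top `range_width` bits of `addr` equal `range_match`."""
    return (addr >> (ADDR_WIDTH - range_width)) == range_match


# Memory at 0x00000000..0x007FFFFF: the 9 upper bits must all be zero.
MEM = (9, 0x000)
# Hypothetical 8-byte peripheral window at 0x90000000: the 29 upper bits must match.
PERIPH = (29, 0x90000000 >> 3)

print(slave_selected(0x007FFFFF, *MEM))     # last byte of the memory window
print(slave_selected(0x00800000, *MEM))     # first byte past the memory window
print(slave_selected(0x90000007, *PERIPH))  # last byte of the peripheral window
```

A wider RANGE_WIDTH means a smaller window: 29 matched bits leave 3 address bits free, which gives an 8-byte range.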
wallento | if only verilog had variable width parameter arrays.. | 08:28 |
wallento | :) | 08:28 |
stekern | yes, but I think olofk's wb_intercon_gen is the next best thing ;) | 08:28 |
wallento | i like self-contained versions more :) | 08:29 |
wallento | btw: bus hold and bus hold ack are for multilevel cache coherency | 08:29 |
wallento | so not to get confused | 08:29 |
wallento | L2 needs to stop bus arbitration to inject invalidations | 08:29 |
stekern | what do you mean by self-contained? | 08:30 |
wallento | no generation scripts | 08:32 |
wallento | plain verilog | 08:32 |
stekern | ah, ok. in principle, me too. but some tasks are just too mind-numbing to do manually | 08:33 |
stekern | bus interconnect is one of those things | 08:33 |
stekern | wallento: why did you put the uart at 0xff80000 btw? | 08:56 |
wallento | very good question | 08:56 |
stekern | not that you're not allowed to ;) | 08:57 |
wallento | because I was allowed to maybe :-D | 08:57 |
stekern | but that was the root to my confusion with the slave mixup, I expected the uart to be at 0x90000000 | 08:57 |
wallento | it may be some copy&paste error | 08:58 |
wallento | I did not work on UART at all until now | 08:58 |
wallento | i will push the updated one | 08:58 |
wallento | how large is the memory space of UART? | 08:58 |
wallento | 0x10? | 08:58 |
stekern | size=32 is what we have in the de0_nano port | 09:00 |
stekern | I can't remember how many regs there actually is | 09:00 |
wallento | i gave it 2^14, that should work for the moment ;) | 09:01 |
stekern | ;) | 09:01 |
LoneTech | the 8250 had 7 registers. I would not be surprised to see 8 8-bit registers spread over 32 bytes just because the bus is 32 bit | 09:01 |
stekern | the only downside with doing it like that is that out-of-bounds accesses are not caught | 09:01 |
stekern | oh, reminds me, the uart is 8-bit, not 32-bit | 09:02 |
wallento | https://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/mor1kx-dualcore/rtl/verilog/orpsoc_top.v | 09:02 |
stekern | does your bus-thingy handle that? | 09:04 |
stekern | "handle that" as in: https://github.com/wallento/orpsoc-cores/blob/multicore-demo/systems/de0_nano/rtl/verilog/orpsoc_top.v#L788 | 09:08 |
stekern | wb_intercon_gen inserts that wb_data_resize automatically nowadays, that's why it can't be seen in the mor1kx-generic top module. | 09:08 |
olofk | I just picked up my parallella | 09:43 |
olofk | I'll put it on the SoCKIT in the garage for now | 09:43 |
pgavin | stekern: I think you may have told me already, but has or1k-tests been checked on or1knd yet? | 10:08 |
stekern | yes, at least parts of it been used to test the mor1kx pronto-espresso implementation | 10:09 |
stekern | +have | 10:09 |
pgavin | hm | 10:09 |
pgavin | it seems sfbf.S has an infinite loop | 10:10 |
pgavin | in nodelay mode that is | 10:10 |
stekern | there might be tests that are delay-slotty though | 10:10 |
pgavin | ok | 10:10 |
pgavin | there's an add in the delay slot | 10:10 |
pgavin | so the loop counter isn't being incremented :) | 10:10 |
pgavin | I'll add the macros | 10:11 |
stekern | sfbf sounds like it should be delay-slot agnostic, but obviously isn't | 10:11 |
stekern | wait, I'll dig out the list of tests that the pronto was running | 10:11 |
pgavin | line 71 | 10:11 |
pgavin | is where the error is | 10:12 |
stekern | https://github.com/juliusbaxter/mor1kx-dev-env/blob/master/boards/generic/mor1kx-prontoespresso/sim/bin/Makefile#L58 | 10:12 |
stekern | sfbf isn't in that list though | 10:12 |
pgavin | ok | 10:12 |
stekern | only sf | 10:12 |
stekern | but if you fix it, feel free to push the fix directly to or1k-tests | 10:12 |
pgavin | ok | 10:12 |
pgavin | will do | 10:12 |
stekern | I'll add you to the repo | 10:12 |
pgavin | k | 10:12 |
stekern | ok, I see there are some tests in sfbf that are strictly testing delay slot behaviour | 10:16 |
stekern | perhaps we want to just ifdef that whole part away | 10:17 |
pgavin | ok | 10:18 |
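
The infinite loop pgavin describes can be reproduced in a toy model. This is only a sketch under assumptions: the three-instruction loop in the docstring is a hypothetical stand-in, not the actual code at line 71 of sfbf.S. It shows why a counter increment placed in the delay slot never runs on a no-delay-slot core while the branch is taken.

```python
def run_loop(delay_slot, limit=3, max_steps=100):
    """Toy model of:  loop: l.sfne r1,limit ; l.bf loop ; l.addi r1,r1,1

    Returns the final counter, or None if the loop does not terminate
    within max_steps iterations.
    """
    counter = 0
    for _ in range(max_steps):
        flag = counter != limit      # l.sfne sets the flag
        if delay_slot:
            counter += 1             # the slot instruction runs even when l.bf is taken
            if not flag:
                return counter
        elif not flag:
            return counter + 1       # fall through into l.addi, then leave the loop
        # no delay slot and branch taken: l.addi is skipped, the counter is stuck
    return None


print(run_loop(delay_slot=True))   # terminates
print(run_loop(delay_slot=False))  # None: the counter never changes
```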
pgavin | seems the .nodelay directive isn't set by default for or1knd-elf-asm | 10:20 |
pgavin | I thought I had made it do that already :/ | 10:20 |
pgavin | but if or1knd-elf-asm has .nodelay by default then it can't be turned off | 10:22 |
pgavin | so it can't be used for both architectures | 10:22 |
pgavin | which means all the .S files need to include or1k-asm.S, so set .nodelay manually | 10:23 |
pgavin | the problem is that ld checks all the objects to make sure the .nodelay setting is consistent | 10:24 |
stekern | is that the only purpose of that? | 10:39 |
stekern | or does the .nodelay thing do something more than that? | 10:40 |
pgavin | that's it | 10:40 |
pgavin | gcc emits it automatically | 10:40 |
pgavin | but then if you hand code you have to use it | 10:41 |
pgavin | maybe it was a bad idea | 10:41 |
pgavin | I had planned on going further with it | 10:41 |
stekern | maybe, but I think we actually *should* use it in bfd too... | 10:41 |
stekern | so, I think the intent is good ;) | 10:41 |
pgavin | well, I think the code that checks it is in bfd | 10:41 |
stekern | I mean, in .plt and such | 10:42 |
pgavin | ah | 10:42 |
stekern | I can't remember, I might have done that delay slot agnostic though | 10:42 |
pgavin | what I never got around to implementing (which actually shouldn't be hard) is to let delay-compat objects be linked with either delay or nodelay objects, and the result would be delay or nodelay as appropriate | 10:43 |
stekern | but, my point was more - there might be more usages for it | 10:43 |
pgavin | right | 10:43 |
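
The linking rule pgavin outlines (delay-compat objects may be combined with either delay or nodelay objects, and the result takes whichever mode is present) could look roughly like this. It is a sketch of the described behaviour only; the function name and mode strings are made up for illustration, and pgavin says above that this was never implemented in ld/bfd.

```python
def link_mode(object_modes):
    """Combine per-object modes ("delay", "nodelay", "compat") into one result.

    "compat" objects adopt whatever the other objects use; mixing "delay"
    with "nodelay" is rejected, mirroring the consistency check described above.
    """
    result = "compat"
    for mode in object_modes:
        if mode not in ("delay", "nodelay", "compat"):
            raise ValueError(f"unknown mode: {mode}")
        if mode == "compat":
            continue
        if result == "compat":
            result = mode
        elif result != mode:
            raise ValueError("cannot link delay and nodelay objects together")
    return result


print(link_mode(["compat", "nodelay", "compat"]))
```

If every input is delay-compat, the output stays delay-compat, so the property is preserved across partial links.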
stekern | yeah, PIC is broken for the no-delay case: https://github.com/openrisc/or1k-src/blob/or1k/bfd/elf32-or1k.c#L38 | 10:45 |
stekern | so, we do *need* the .nodelay to implement support for that | 10:45 |
pgavin | ah | 10:46 |
pgavin | yep | 10:46 |
pgavin | line 50 also | 10:47 |
pgavin | other ones already have nops | 10:47 |
stekern | yup, but they are there for padding | 10:48 |
stekern | I remember now, I considered making it delay-slot agnostic, but that would have meant one word larger plt entries | 10:48 |
pgavin | right | 10:49 |
pgavin | and slower in the delay-slot case | 10:49 |
stekern | and it's not possible to get the got pointer acquiring to work for the compat case either, so you got to choose one of the archs for PIC | 10:50 |
wallento | stekern: No, the data resizer is also missing in my top | 10:50 |
stekern | wallento: I figured, I've already added it to my local copy | 10:51 |
stekern | it's booting Linux now ;) | 10:51 |
wallento | cool | 10:52 |
wallento | on a single core and the other one halting? | 10:52 |
stekern | yes, on a single core | 10:52 |
stekern | figured it'd be healthy to have that case working before I start to poke around the Linux src | 10:53 |
stekern | =) | 10:53 |
pgavin | stekern: ok, so delay-compat should be deprecated? | 10:53 |
LoneTech | so, are you aiming at coprthr-style support processor, smp booting, or cpu hotplug? :) | 10:53 |
olofk | stekern, pgavin : Is there really any benefit in doing compile-time detection of delay slots for the asm test cases? | 10:53 |
stekern | pgavin: should or could? I think it has usecases for baremetal | 10:54 |
pgavin | olofk: delay slot testing? | 10:54 |
pgavin | stekern: ok | 10:54 |
olofk | pgavin: Well, that's a good use case :) | 10:54 |
stekern | or, what I'm trying to say, I think there's usecases for it. Unfortunately it will not work together with PIC | 10:55 |
pgavin | stekern: right | 10:55 |
olofk | But how many of the tests are really delay slot tests? Could we split those out and make two versions of them? | 10:55 |
pgavin | probably | 10:56 |
stekern | LoneTech: ehm, you tell us? ;) | 10:59 |
LoneTech | I really don't know which would be hardest. coprthr allows starting processors without having them run the OS | 11:00 |
LoneTech | cpu hotplug allows requesting them at runtime, but is otherwise probably not all that different from smp startup | 11:00 |
LoneTech | and I haven't mucked about with any of them | 11:01 |
stekern | yeah, I'm in the usual "know-nothing-eager-to-learn" position | 11:01 |
pgavin | shouldn't a misaligned PC in l.jr generate an instruction alignment exception and not a bus error? | 11:54 |
pgavin | or1ksim uses an alignment exception | 11:55 |
pgavin | stekern: I guess the mor1kx uses a bus error? | 11:56 |
pgavin | stekern: nevermind, this comment is incorrect :) | 12:06 |
pgavin | or1k-insnfetcherror.S says it generates an alignment error | 12:06 |
pgavin | I suppose that 0xee00000000 generates a bus error | 12:06 |
LoneTech | there's something I don't get in the PLT routines | 12:42 |
LoneTech | looking at http://opencores.org/or1k/OpenRISC_PIC 4-word version .pltN, reloc_offset is a value we expect the resolver to have a use for | 12:43 |
LoneTech | but the PLT entry is already a lookup that would've gone in a register, isn't it? | 12:44 |
LoneTech | if that register is fixed, that and the GOT pointer should be all the information the resolver needs | 12:45 |
LoneTech | (to look up which function to look up, that is) | 12:45 |
LoneTech | it seems the function of jumping through the PLT makes more sense for register starved setups | 12:49 |
stekern | LoneTech: yes, if I understand your question right, I too asked the same question when I implemented the PIC support. And AFAICT, the benefit for us is that it allows for lazy relocations | 12:52 |
LoneTech | but you still could. the lazy relocation relies on being able to identify the routine to look up; it could do so as long as the GOT entry address is in a known register | 12:57 |
LoneTech | you could just move the translation from pltN to reloc_offsetN into plt0 afaict | 12:59 |
stekern | LoneTech: ok, I clearly misunderstood your question then ;) Let me think about that a bit more then | 13:09 |
stekern | pgavin: I couldn't figure out if you answered your own question, but in either case, mor1kx generates an ibus_align exception on unaligned jumps | 13:13 |
pgavin | stekern: yes, I answered my own :) | 13:21 |
pgavin | the comment in the file was incorrect, I've fixed the comment :) | 13:21 |
pgavin | I also added test lists and configs for or1ksim | 13:22 |
pgavin | you can see them here: https://github.com/pgavin/or1k-tests/commits/master | 13:22 |
stekern | LoneTech: I still don't quite understand what you mean, what would your plt0 look like then? | 13:23 |
stekern | pgavin: lgtm, feel free to push those to openrisc/or1k-tests | 13:25 |
pgavin | kk | 13:26 |
LoneTech | sorry, I haven't thought this through entirely. Looks like the value (nameN) I thought to leave in a register is stored in an instruction offset right now, doesn't get put in a register.. but I think just putting that in a register means you don't need .pltN for varying N | 13:30 |
LoneTech | I have to go now, though. will consider more later | 13:32 |
_franck__ | stekern: if you have some spare time, could you give uboot a try ? I have ethoc working under Linux but not under u-boot. I did not investigate. I just cloned the upstream repo and change the clock frequency for my board. | 15:35 |
stekern | let's see if _franck__ reads the backlog on the web... you either need this http://git.openrisc.net/cgit.cgi/stefan/u-boot/commit/?id=c7845df64f7df75dc3d46e2f6385c0d901f9d416 or blueCmd's struct padding patch for gcc | 16:01 |
stekern | https://github.com/bluecmd/or1k-gcc/commit/5043af9d3876eed42dfca706bc023131a519746b | 16:03 |
blueCmd | I'm gonna start working on the atomic builtins now stekern | 16:22 |
blueCmd | and rebase my patches onto openrisc/or1k-gcc | 16:31 |
stekern | \o/ | 16:37 |
stekern | | | 16:37 |
stekern | \ | 16:38 |
stekern | heh.. fail | 16:38 |
blueCmd | / \ | 16:48 |
blueCmd | I'm going to begin by rebasing the latest gcc as well | 16:48 |
blueCmd | *sigh* nah, I won't do that _just_ now | 16:54 |
blueCmd | I'll do that when I want to resolve merge conflicts | 16:54 |
stekern | haha | 16:55 |
blueCmd | it was only 2 months ago, how many arches could possibly have been introduced that are between the letters n and p? | 16:56 |
blueCmd | apparently >0 since there were conflicts | 16:57 |
olofk | I read an article today about Linux ports that started with: "It's not every day Linux is ported to a new architecture", and all I could think was, yes, it's almost every fucking day we read about a linux port to a new arch | 17:13 |
olofk | And there must be even more arches in GCC | 17:13 |
blueCmd | olofk: yes | 17:15 |
stekern | not to mention all the out of tree ones... | 17:15 |
blueCmd | stekern: like or1k you mean? | 17:18 |
blueCmd | ;) | 17:18 |
stekern | yup, or eco32 for both linux and gcc | 17:21 |
stekern | or lm32 for linux | 17:21 |
stekern | blueCmd: why are you rebasing btw? | 17:25 |
stekern | (I'm not speaking about your patches | 17:27 |
stekern | that you applied upon or1k-gcc, that's clear) | 17:28 |
blueCmd | stekern: it's nice to not fall behind | 17:28 |
stekern | but upstream | 17:28 |
stekern | yes, but why do you _rebase_ | 17:28 |
stekern | why not just pull? | 17:28 |
blueCmd | because I hate the merge commits | 17:28 |
blueCmd | I think they are messy and hard to keep track of | 17:28 |
stekern | oh... but it | 17:29 |
* _franck_ has read the backlog | 17:29 | |
stekern | *damn you enter key* | 17:29 |
stekern | 's going to make it hard for you, you break the link between how upstream and our tree looks | 17:30 |
stekern | + you get weird looking commits like this: https://github.com/openrisc/or1k-gcc/commit/2135e320e5779bd5fa9fdae5fc8ce97072a3e721 | 17:30 |
stekern | where it looks like you are the committer of the upstream patches | 17:30 |
blueCmd | hm | 17:31 |
stekern | in other words, learn to live with the merge commits ;) | 17:31 |
blueCmd | right, I actually thought it was the other way around | 17:31 |
blueCmd | but sure, I'll try to use merge instead | 17:32 |
blueCmd | for those things | 17:32 |
blueCmd | stekern: so. for atomics - limitations. | 17:40 |
blueCmd | operations on 1, 2, 4 and 8 | 17:40 |
blueCmd | are all those possible? | 17:41 |
blueCmd | 1-4 i have no problems seeing as the lock is on the word or bigger, but I don't know if you settled on word or cache line in the end | 17:41 |
stekern | word | 17:46 |
blueCmd | stekern is becoming so gangsta | 17:53 |
olofk | :) | 17:53 |
blueCmd | stekern: so, implementing for 8 bytes wouldn't be feasible I guess | 17:55 |
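
To illustrate why 1-, 2- and 4-byte atomics fit a word-granular lock while 8 bytes do not: a sub-word update can be expressed as a read-modify-write of the containing 32-bit word. The sketch below is an assumption-laden model, not blueCmd's GCC patch; the helper names are hypothetical and big-endian byte order is assumed.

```python
WORD_BYTES = 4  # the lock granularity stekern settled on is one word


def fits_in_word(offset, size):
    """True if a naturally aligned `size`-byte access at byte `offset` lies in one word."""
    return size in (1, 2, 4) and offset % size == 0 and offset + size <= WORD_BYTES


def merge_subword(word, offset, size, value):
    """Return `word` with `size` bytes at byte `offset` replaced by `value` (big-endian)."""
    if not fits_in_word(offset, size):
        raise ValueError("access does not fit in a single word")
    shift = (WORD_BYTES - offset - size) * 8
    mask = ((1 << (size * 8)) - 1) << shift
    return (word & ~mask & 0xFFFFFFFF) | ((value << shift) & mask)


print(hex(merge_subword(0x11223344, 2, 2, 0xBEEF)))
print(fits_in_word(0, 8))  # an 8-byte operand spans two words
```

In a real implementation the merged word would be written back inside the word-sized atomic sequence and retried on failure.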
stekern | haha, took a second or two to get that 'gangsta' reference | 18:41 |
olofk | stekern: a second? :) 19:53 < blueCmd> stekern is becoming so gangsta 20:41 < stekern> haha, took a second or two to get that 'gangsta' reference | 18:43 |
stekern | olofk: contrary to your belief, I'm not reading every single line here in real time ;) | 18:45 |
olofk | WHAT?!?!??! | 18:49 |
blueCmd | stekern: making good progress, I think I can make everything work very easily | 19:05 |
blueCmd | stekern: http://2e7b3d66b5d1dfcc.paste.se/ - that is my WIP | 19:06 |
blueCmd | and it works as in that it replaces __atomic_ builtins and so on with the expand, quite neat | 19:07 |
blueCmd | currently the lower part throws: "wrong number of alternatives in the output template" in my face when I compile it. gonna go out and have a bite and think about it | 19:07 |
_franck_ | stekern: (ethoc) must be something else, I don't receive the ARP reply (requests are ok, my PC sends a reply but uboot doesn't see it) | 20:20 |
_franck_ | ...and it works with barebox | 20:48 |
pgavin | stekern: I think I got gcc to generate some better code now | 22:06 |
pgavin | e.g. there's a load-use delay when possible | 22:06 |
pgavin | and a delay between l.sf* and l.bf | 22:06 |
pgavin | there's a new problem now though... it's separating l.sf* and l.bf at the expense of other things | 22:33 |
pgavin | like it will push the l.sf all the way back to the PC following a load that produces one of its inputs | 22:34 |
--- Log closed Sat May 10 00:00:42 2014 |