--- Log opened Sat Nov 01 00:00:03 2014 | ||
poke53281 | stekern: I am puzzled | 01:04 |
---|---|---|
poke53281 | How does the ompic work? | 01:05 |
poke53281 | Finally I compiled a smp kernel. | 01:06 |
poke53281 | It boots, no problem. Just one core yet. | 01:06 |
poke53281 | But the ompic is never used | 01:06 |
poke53281 | Do I have to change the interrupt-parent in the dts file? | 01:07 |
poke53281 | good, smp is implemented. But I get a black screen :( | 06:26 |
poke53281 | no black screen anymore. But it seems that the second core is not started. | 06:39 |
stekern | poke53281: there's an example .dts here: http://git.openrisc.net/cgit.cgi/stefan/linux/tree/arch/openrisc/boot/dts/sockit_smp.dts?h=smp | 07:31 |
stekern | (if you hadn't figured the dts problem out already) | 07:31 |
stekern | time to fix this atomic instruction emulation I think | 07:54 |
stekern | hmm, looking at the MIPS emulation of ll/sc, I can't seem to see them actually emulating the link | 09:58 |
stekern | ah, no, now I see it | 09:59 |
mor1kx | [mor1kx] bandvig pushed 1 new commit to withfpu: https://github.com/openrisc/mor1kx/commit/1abd30b22264b03d1d77df835a5c19b907712228 | 11:45 |
mor1kx | mor1kx/withfpu 1abd30b Andrey Bacherov: re-factoring floating point adder | 11:45 |
poke53281 | stekern: Yes, I read the dts files. But I don't know how ompic and the normal pic is related to each other. | 14:35 |
poke53281 | Anyhow: I get the kernel message "Brought up 2 CPUs" now. :) | 14:35 |
poke53281 | But then it hangs. | 14:36 |
poke53281 | And finally, he uses the ompic | 14:55 |
stekern | currently, both are active and not related in any other way than that you have to connect the irq output of the ipi part of the ompic to an irq line of the pic, since the "pic parts" are not yet implemented in ompic | 15:17 |
stekern | and you have to connect it to a line that's non-maskable, since otherwise you will not have it unmasked on all cpus | 15:18 |
poke53281 | obviously I don't have a clue how an interrupt controller works on smp machines. | 16:55 |
poke53281 | ompic_raise_softirq checks for 10ms whether an irq is pending on the destination CPU. | 17:00 |
poke53281 | then he sets the control register of the source cpu to 0x40010000 | 17:02 |
poke53281 | irq=0 dst_cpu=1 and irq_gen | 17:03 |
poke53281 | then he sets the ctrl of cpu 1 to irq=0 dst_cpu=0 and irq_gen. | 17:03 |
poke53281 | so if I raise an interrupt (Exception int) how do I decide which cpu? | 17:11 |
poke53281 | are hardware (device) interrupts connected to the pic or to the ompic? | 17:14 |
poke53281 | Interrupt 1 is non-maskable by default. The same is true for interrupt 0. | 18:28 |
stekern | peripheral interrupts are directly connected to the pic. so far, ompic only handles IPI | 19:58 |
poke53281 | and which core should be triggered if an interrupt is raised? | 19:59 |
poke53281 | all? | 19:59 |
poke53281 | dependent on the IEE Flag? | 19:59 |
stekern | no, only the core that is assigned to the interrupt | 20:00 |
stekern | there's no support for migrating irq | 20:00 |
stekern | s between cores (yet) | 20:00 |
poke53281 | but interrupt 1 is non-masked. That means, it is triggered for all cores? | 20:01 |
stekern | yes, but you have num_cores irq lines coming out of the ipi part of ompic | 20:01 |
stekern | each core is connected to a separate line | 20:02 |
stekern | and the irq initiator cpu selects which other cpu should be interrupted | 20:02 |
poke53281 | sorry, for slow physicists please. Let's say my uart triggers irq no. 2. Is the ompic involved in this irq? | 20:04 |
poke53281 | each core has a separate line. This I understand. And how do I figure out, which core is responsible for which irq? Does the PICMR have anything to do with it? | 20:06 |
poke53281 | Each core has a separate PIC MR and SR register? | 20:07 |
poke53281 | At the moment I have the following: A pic interrupt sets the SR registers of all cores. Clear clears all SR registers. | 20:15 |
poke53281 | Of course only the core with the correct mask in the MR register will handle the interrupt in the end. | 20:15 |
poke53281 | But that would mean, that the ompic is the only one which can trigger single cores? But then it shouldn't trigger irq 1 of the pic. | 20:19 |
stekern | poke53281: ok, let me try to start over ;) | 20:53 |
stekern | 1) ompic currently has one function, to implement ipi (inter processor interrupts). Long term, I want to add more functionality to it, and not use the builtin PIC. | 20:55 |
stekern | 2) due to 1), for a uart interrupt, ompic is not involved. It's just a 'normal' peripheral interrupt. | 20:56 |
stekern | as you say, the ipi part of ompic is the only thing that can trigger irqs on a single cpu (if we disregard the masking) | 20:58 |
stekern | and the ipi interrupts are different from the normal peripheral interrupts, each ipi interrupt has an own signal for each cpu, while the peripherals connect the same signal to all the cpus | 20:59 |
stekern | https://github.com/skristiansson/orpsoc-cores/blob/multicore/systems/sockit-multicore/rtl/verilog/orpsoc_top.v#L1811 | 21:00 |
stekern | might make it more clear | 21:01 |
poke53281 | so if the ompic non-maskable interrupt is triggered (int 1) it should only trigger the core (set PIC SR) that is defined by the ompic. All other interrupts should set the PIC SR register of all cores? | 21:02 |
stekern | right | 21:02 |
poke53281 | and clear should behave the same. | 21:04 |
poke53281 | Ok, so the next question is then, how the ompic works. | 21:04 |
stekern | the code is here: https://github.com/skristiansson/orpsoc-cores/blob/multicore/cores/ompic/rtl/verilog/ipi.v | 21:05 |
stekern | ;) | 21:05 |
poke53281 | Read STAT of CPU 0 | 21:05 |
poke53281 | Write CTRL of CPU 1 : dstcpu=0 irqno=0 flags=1 (GEN) | 21:05 |
stekern | it's basically just an array of registers | 21:06 |
poke53281 | Hmm, I can try to understand it. | 21:06 |
poke53281 | So if I write the control of cpu 1 does that mean that I trigger the line of cpu 1 or do I trigger the line of dstcpu? | 21:07 |
stekern | you trigger the line of dstcpu | 21:08 |
stekern | if you write the control of cpu1, that means you should be cpu1 | 21:08 |
stekern | there's really nothing enforcing it, but that's how it's supposed to be used | 21:09 |
poke53281 | Ahh, Ok. In principle I can use any control register. At least for my emulation. | 21:09 |
poke53281 | Because the data is no longer used in the end. | 21:10 |
stekern | hmm, how do you mean? | 21:10 |
poke53281 | It just tells me, which cpu triggers the soft IRQ. But this information is never used. | 21:10 |
stekern | ah, you might be right. I just figured it might be interesting to know the source cpu when I designed it. | 21:11 |
poke53281 | Ok, finally I understand :) | 21:13 |
poke53281 | In principle one control register might be sufficient. | 21:13 |
stekern | but you still need to have a control register per cpu | 21:13 |
poke53281 | ? | 21:14 |
poke53281 | Well, two cpus could write at the same time. | 21:14 |
poke53281 | But we have a lock there. | 21:14 |
stekern | since several cpus might want to send at the same time | 21:14 |
stekern | but the lock is only there to protect against re-entrance from the same core | 21:14 |
poke53281 | Ahh, Ok. | 21:15 |
poke53281 | But if we would use a global lock, then one CTRL register would be sufficient. | 21:15 |
stekern | then you would need to lock until the other cpu has handled the irq | 21:16 |
stekern | ...which I don't think will work | 21:17 |
poke53281 | well, this delay with 10000 retries is a little bit weird. | 21:17 |
stekern | it only does it if we're still waiting for a receiver to handle an earlier ipi request | 21:18 |
poke53281 | I can work this way. | 21:18 |
poke53281 | And what happens if there are three requests. | 21:19 |
stekern | ? | 21:19 |
stekern | there can't be three requests to the same destination | 21:19 |
poke53281 | One is acknowledged immediately, so the second core can send the IRQ_GEN right in time. 10ms should be more than enough. | 21:20 |
poke53281 | Two is the limit? | 21:20 |
stekern | number two will be waiting in the retry loop for the first to be handled, the third will be spinning in the spinlock waiting for the lock to open | 21:21 |
stekern | how do you know that the first is acknowledged immediately? | 21:22 |
stekern | (I assume you meant cleared from the receiver side by that) | 21:23 |
poke53281 | well, under the assumption, that the core has nothing else to do. | 21:23 |
stekern | it might have interrupts turned off ;) | 21:23 |
poke53281 | Ok, I think you are right. Those are tiny details which would fail horribly if done wrong. | 21:24 |
stekern | IIRC I think I saw some time outs when I had it set at 1000, so I raised it to 10 000 | 21:25 |
poke53281 | There must be a better way for the timeouts. | 21:27 |
poke53281 | ompic_ipi_lock? Is this a global variable. Or several global variables for each core? | 21:28 |
poke53281 | Is ompic_ipi_lock one global variable that is the same for each core? | 21:30 |
poke53281 | and _irqsave means that the interrupts are enabled during the waiting time in the spinlock? | 21:31 |
stekern | well, the loop could be an infinite loop, the timeout is just there to make a failure evident | 21:32 |
stekern | irqsave means interrupts are *disabled* | 21:35 |
stekern | they need to be disabled, because ompic_raise_softirq() can be called from an interrupt context | 21:37 |
stekern | which would mean that you'd have a deadlock on one core trying to obtain a lock that it's holding | 21:37 |
stekern | locks are (of course) not per-cpu, it's a global 'variable' | 21:38 |
stekern | this is the actual implementation: http://git.openrisc.net/cgit.cgi/stefan/linux/tree/arch/openrisc/include/asm/spinlock.h?h=smp#n32 | 21:39 |
stekern | so the actual lock works by splitting up a 32-bit variable in two 16-bit fields, one next field and one owner field | 21:40 |
stekern | ticket spinlocks work like a ticket system when you're queuing at a bank | 21:42 |
stekern | so at the beginning, there's a number 0 at the ticket machine and a number 0 at the display | 21:44 |
stekern | the first one to arrive takes ticket 0 and gets served. and at the ticket machine there'll be ticket 1 waiting for the next to arrive | 21:44 |
poke53281 | I see | 21:45 |
stekern | when the first cpu is done, it unlocks the lock by setting the 'display' to 1 | 21:45 |
stekern | the benefit is of course when you have several cores waiting for the lock, they then get served in the order they arrived | 21:48 |
poke53281 | That makes sense. | 21:49 |
poke53281 | Ok, I try to implement the ompic now. | 21:49 |
poke53281 | *YES* | 21:57 |
poke53281 | 18 more kernel messages before it stops. | 21:58 |
poke53281 | Ok, next problem: Timer :) | 22:03 |
poke53281 | http://pastie.org/9690400 | 22:06 |
stekern | nice ;) | 22:14 |
poke53281 | At the moment the kernel just stops at random positions. But my TTCRs are most of the time out of sync. | 22:16 |
poke53281 | TTCRs must always be perfectly synchronous. | 22:17 |
poke53281 | So, is it also true with TTMRs? | 22:18 |
poke53281 | So, the 100Hz tick is synchronous to all cores? | 22:18 |
stekern | no | 22:19 |
stekern | the capture interrupts are set independently on each core, but the timer reference has to be in sync | 22:20 |
stekern | the tick timer is used as two different 'devices' - a clock_event_device and a clocksource | 22:23 |
poke53281 | But I should get at least a shell with out of sync timers? | 22:23 |
poke53281 | Because, this is the only thing, that is wrong currently. | 22:24 |
stekern | nah, if the timers are out of sync, it might think that the time reference is something (that it got from another core), then setup the next event based on this reference | 22:24 |
stekern | so, the correct way to fix this is to have a globally accessible timer with per-cpu capture interrupts | 22:27 |
poke53281 | yes, and this timer is currently always zero in my implementation. :) | 22:27 |
poke53281 | this global timer I mean. | 22:28 |
stekern | but currently, the kernel code kind of expects that the built-in timer is there, so it needs some work | 22:28 |
stekern | the openrisc kernel code | 22:28 |
stekern | I'm tired and can't think straight how C sign extension and integer promotion works, will this sign extend properly? | 22:30 |
stekern | long imm; imm = (short)insn; | 22:31 |
stekern | where the imm is in [15:0] of insn | 22:31 |
poke53281 | I think so | 22:33 |
stekern | yeah, I think that's right too | 22:38 |
poke53281 | timer in sync now | 22:53 |
poke53281 | one kernel message before the shell I get an OMPIC timed out message. | 22:56 |
stekern | ouch | 22:56 |
poke53281 | or before he starts init I mean. | 22:56 |
poke53281 | http://jor1k.com/jor1k/ | 22:57 |
poke53281 | So close | 22:58 |
stekern | it panics here | 22:59 |
stekern | can I scroll up in the console somehow? | 22:59 |
poke53281 | Yes, after a while | 22:59 |
poke53281 | No | 22:59 |
stekern | great ;) | 22:59 |
poke53281 | I know this problem :) | 22:59 |
poke53281 | try it again. My solution to the problem | 23:03 |
poke53281 | self-detected stall on CPU 0 | 23:03 |
poke53281 | I know your next question. | 23:06 |
stekern | well, that is because the ompic failed | 23:06 |
stekern | do a quick test and increase that timeout (or remove it completely) | 23:07 |
poke53281 | your next question is, what the pc of cpu 0 is and in what function it fails :) | 23:07 |
stekern | I think I have rudimentary l.lwa/l.swa emulation now | 23:09 |
stekern | let's see if it works | 23:10 |
poke53281 | ? for smp ? | 23:10 |
stekern | I'm not really breaking the link on context switch yet, but it should at least do something | 23:10 |
stekern | no, for up | 23:10 |
stekern | just so you can run userspace code that contains l.lwa/l.swa without crashing on implementations that lack them | 23:11 |
poke53281 | Ok, I understand. You mean an unknown instruction exception is triggered and lwa and swa are emulated in the kernel. | 23:13 |
stekern | now I just have the problem that I don't think I have an or1ksim without them | 23:13 |
stekern | yes, exactly | 23:13 |
stekern | oh, right... calculating the return pc is going to be mighty fun... | 23:23 |
stekern | let's pretend that you can't put l.lwa and l.swa in delay slots to begin with, then it's easy | 23:30 |
poke53281 | :) | 23:31 |
poke53281 | Let's pretend, that removing the delay slots in all (important) OpenRISC related code would take less than one day. | 23:32 |
poke53281 | Increased the number of retries by a factor of 100. No luck. | 23:34 |
stekern | well, the biggest issue, the toolchain is mostly covered | 23:36 |
stekern | kernel and then libc shouldn't be insane amount of work | 23:36 |
stekern | will take more than a day for sure ;) | 23:37 |
stekern | when I have a delay-slot-less cappuccino I'll probably do that | 23:38 |
poke53281 | Oh damn. Found the error. Stupid poke | 23:40 |
stekern | looks like the emulation works now | 23:46 |
stekern | emulation of l.lwa and l.swa in the kernel doesn't seem to play well though | 23:54 |
poke53281 | Well, that's important :) | 23:55 |
poke53281 | http://jor1k.com/jor1k/ | 23:55 |
stekern | but maybe that's because I don't actually clear the flag ever | 23:55 |
poke53281 | Only a tiny little bit is missing | 23:55 |
stekern | what's that? | 23:56 |
poke53281 | Sometimes the init routine runs, but bash is not starting. | 23:56 |
poke53281 | So, he gets even an IP from the relay. | 23:57 |
stekern | ah, it started here ;) | 23:57 |
stekern | don't you implement the 'new' version registers? | 23:57 |
poke53281 | Indeed | 23:57 |
poke53281 | no | 23:58 |
stekern | also, are you actually emulating caches? | 23:58 |
poke53281 | No | 23:58 |
stekern | you could save a couple of milliseconds in the boot by claiming not to have them | 23:59 |
poke53281 | I didn't touch the version or upr register since the first version. | 23:59 |
poke53281 | of jor1k | 23:59 |
stekern | but, way cool progress so far! | 23:59 |
--- Log closed Sun Nov 02 00:00:05 2014 |
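
Editor's sketch of the IPI register behaviour discussed between 17:00 and 21:09: each CPU writes its own control register, and the write raises the IPI line of the destination CPU named in the word, not the line of the writer. The IRQ_GEN bit position and the destination shift are read off the 0x40010000 value quoted in the log (irq=0, dst_cpu=1, irq_gen). The field widths, the `NUM_CPUS` value, and every name below (`ipi_model`, `ipi_ctrl_word`, `ipi_write_ctrl`, `ipi_clear`) are assumptions made for illustration and are not taken from ipi.v or the kernel driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CPUS       2           /* assumption: dual-core system */
#define CTRL_IRQ_GEN   (1u << 30)  /* consistent with 0x40010000 in the log */
#define CTRL_DST_SHIFT 16          /* consistent with dst_cpu=1 -> 0x00010000 */
#define CTRL_DST_MASK  0x3fffu     /* assumption: width of the dst_cpu field */
#define CTRL_DATA_MASK 0xffffu     /* assumption: low half carries the irq number */

/* Hypothetical emulator-side model: "basically just an array of registers". */
struct ipi_model {
    uint32_t ctrl[NUM_CPUS];      /* last control word written by each source CPU */
    bool     irq_line[NUM_CPUS];  /* one separate IPI line per core */
};

/* Build the control word a source CPU would write to interrupt dst_cpu. */
static uint32_t ipi_ctrl_word(unsigned dst_cpu, unsigned irq)
{
    return CTRL_IRQ_GEN
         | ((dst_cpu & CTRL_DST_MASK) << CTRL_DST_SHIFT)
         | (irq & CTRL_DATA_MASK);
}

/* A write to the control register of src_cpu raises the line of dst_cpu. */
static void ipi_write_ctrl(struct ipi_model *m, unsigned src_cpu, uint32_t val)
{
    assert(src_cpu < NUM_CPUS);
    m->ctrl[src_cpu] = val;
    if (val & CTRL_IRQ_GEN) {
        unsigned dst = (val >> CTRL_DST_SHIFT) & CTRL_DST_MASK;
        if (dst < NUM_CPUS)
            m->irq_line[dst] = true;
    }
}

/* The receiving CPU clears its own line once it has handled the IPI. */
static void ipi_clear(struct ipi_model *m, unsigned cpu)
{
    assert(cpu < NUM_CPUS);
    m->irq_line[cpu] = false;
}
```

Keeping one control register per source CPU, as stekern argues at 21:13, lets several cores send at the same time without a global lock held until the receiver has handled the interrupt.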
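
Editor's sketch of the ticket spinlock explained between 21:40 and 21:48: one 32-bit word split into a 16-bit next field (the ticket machine) and a 16-bit owner field (the display). This is not the code from the linked spinlock.h. It uses C11 atomics rather than OpenRISC instructions, and the placement of next in the upper half, along with all names (`ticket_lock_t`, `ticket_lock`, `ticket_unlock`, `ticket_next`, `ticket_owner`), is an assumption made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TICKET_SHIFT 16  /* assumption: 'next' in bits [31:16], 'owner' in [15:0] */

typedef struct {
    _Atomic uint32_t slock;  /* both 16-bit fields live in one global word */
} ticket_lock_t;

static uint16_t ticket_next(ticket_lock_t *l)
{
    return (uint16_t)(atomic_load(&l->slock) >> TICKET_SHIFT);
}

static uint16_t ticket_owner(ticket_lock_t *l)
{
    return (uint16_t)atomic_load(&l->slock);
}

static void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket: atomically bump 'next' and keep the previous value. */
    uint32_t old = atomic_fetch_add(&l->slock, 1u << TICKET_SHIFT);
    uint16_t my_ticket = (uint16_t)(old >> TICKET_SHIFT);

    /* Wait until the display shows our number. */
    while (ticket_owner(l) != my_ticket)
        ;  /* spin */
}

static void ticket_unlock(ticket_lock_t *l)
{
    uint32_t old = atomic_load(&l->slock);
    uint32_t new_val;

    /* Unlocking a lock nobody holds would be a bug in the caller. */
    assert((uint16_t)(old >> TICKET_SHIFT) != (uint16_t)old);

    /* Advance 'owner' by one without letting a wrap carry into 'next'. */
    do {
        new_val = (old & 0xffff0000u) | ((old + 1u) & 0xffffu);
    } while (!atomic_compare_exchange_weak(&l->slock, &old, new_val));
}
```

Because waiters are served in ticket order, cores get the lock in the order they arrived, which is the benefit stekern points out at 21:48. A kernel `_irqsave` variant would additionally disable local interrupts before taking the ticket; that part is not modelled here.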