--- Log opened Mon Jul 07 00:00:08 2014 | ||
dalias | stekern, any test results? :) | 03:08 |
---|---|---|
dalias | stekern, i take it r10 is the thread-pointer register for the or1k abi? | 03:14 |
stekern | dalias: not yet, I was busy with other stuff in my TODO pipeline yesterday | 03:29 |
dalias | no problem | 03:30 |
stekern | yes, r10 is the thread-pointer register | 03:30 |
dalias | ah r11 is return value. that's what was confusing me :) | 03:30 |
dalias | your thread asm commits look right then, from a non-or1k-expert perspective anyway | 03:31 |
stekern | yup, abi is: r1=sp, r2=fp, r3-r8=arg, r9=lr, r10=tp, r11 (and r12)=return values, r16=GOT pointer | 03:32 |
dalias | *nod* i found the docs now | 03:41 |
stekern | great, thanks for taking a look at it | 03:42 |
dalias | btw is the stuff about variadic arguments never being in registers still true? | 03:42 |
stekern | yes | 03:43 |
dalias | fun :) | 03:43 |
stekern | it sucks | 03:43 |
dalias | it's more efficient, but hell when idiots call variadic functions without prototypes | 03:43 |
dalias | (e.g. most hello world programs) | 03:43 |
stekern | we've had a couple of discussions about maybe changing it, but breaking the ABI isn't all that fun ;) | 03:44 |
dalias | are stack slots left for the args that were passed in registers, like mips abi? | 03:44 |
dalias | that's really ugly and inefficient (and error-prone when writing asm by hand), but it would allow you to solve the variadic problem cleanly | 03:45 |
stekern | we had some problems with gcc calling variadic functions with different prototypes | 03:45 |
stekern | no, the stack slots start at the beginning of the stack | 03:46 |
stekern | that came out weird | 03:47 |
dalias | so calling a function with <7 args has no requirement for the caller to do anything with the stack? | 03:47 |
stekern | I mean, we don't reserve stack slots for the reg args | 03:47 |
stekern | right | 03:47 |
dalias | ok | 03:47 |
dalias | if you did it the mips way, gcc could just store the arguments both on the stack AND in registers when the prototype was missing | 03:48 |
stekern | yeah, I know. but if we'd change it, I'd rather just change it to treat variadic functions like normal functions. | 03:49 |
stekern | I don't like that reserve stack slots for regs ABI, I've been dabbling with another open source cpu that does that (eco32) and I found it annoying. | 03:51 |
dalias | oh of course | 03:51 |
dalias | the mips way is idiotic and defeats the purpose of pass-by-register | 03:51 |
dalias | because you still have to do nasty stack manipulations to make a call | 03:51 |
dalias | i just meant if it were already that way, you could work around the problem without changing the ABI | 03:52 |
stekern | ah, yeah | 03:53 |
dalias | btw i wouldn't bother changing it. x86_64 already has subtle breakage if you call variadic functions as non-variadic | 03:53 |
dalias | they require %al to have the # of sse arg registers used | 03:53 |
dalias | and if %al contains junk, the callee will jump to random addresses due to the way the prologue code works..... | 03:54 |
dalias | gcc always zeros %al when you call a function with no prototype at all | 03:54 |
dalias | but we've found broken code (iirc in glib) that called variadic functions (open) via a non-variadic function pointer | 03:54 |
dalias | and of course this broke horribly | 03:55 |
dalias | so imo such code just needs to be caught and fixed | 03:55 |
stekern | mmm, I think we've pretty much decided to not change it, and instead go on a crusade to find offenders | 03:55 |
stekern | dalias: gcc used to do that too... | 03:55 |
dalias | do what? | 03:55 |
dalias | calling variadic functions via non-variadic function pointers? | 03:56 |
stekern | ah, didn't read what you wrote properly. not exactly that but almost. | 03:56 |
stekern | gcc defined the function as void func(int, ...){} in one file and then declared it as void func(int, int, int); in another | 03:58 |
stekern | but, that's fixed now | 03:58 |
dalias | ah | 03:59 |
dalias | yeah that's basically the same | 03:59 |
stekern | there's other projects out there with bugs like that (directfb iirc) | 03:59 |
dalias | well like i said they're very dangerous on x86_64 | 03:59 |
dalias | so they need to be fixed | 03:59 |
stekern | yup | 04:00 |
dalias | on x86_64, gcc really generates variadic prologue that jumps to a negative offset based on scaling %al | 04:00 |
dalias | so if the caller does not know it's variadic, %al contains junk and you jump to random code that happens to be before the start of the function | 04:01 |
dalias | this should probably be considered a security bug in gcc | 04:01 |
dalias | because they could easily avoid it by applying a single AND instruction to %al to bound its range | 04:01 |
dalias | does the musl port have all of the port files now, and just need checking/testing? | 04:16 |
dalias | or are there still missing things? | 04:16 |
stekern | dalias: I think so, unless I missed something | 05:01 |
stekern | there's still the inefficient setjmp implementation | 05:02 |
dalias | *nod* | 05:10 |
dalias | if some basic testing shows it working, i think we can commit upstream and do further work in the main tree | 05:11 |
stekern | sounds good to me | 05:13 |
dalias | btw i have in mind some 'gratuitous' renamings of some files that i kinda want to do, which affect ports. not sure whether to do these before or after committing or1k. | 05:18 |
dalias | things like clone.[cs] -> __clone.[cs], syscall.[cs] -> __syscall.[cs], etc. | 05:18 |
dalias | (part of the goal being consistency; the other part being avoiding having multiple files with the same name in libc.a, which makes debugging confusing) | 05:18 |
stekern | dalias: makes sense, syncing that out of tree wouldn't be very painful though, so do as you feel fit | 05:59 |
maxpaln | Hi - hope you are all well!! | 08:43 |
maxpaln | Some good news - I have the ORSOC working on our latest silicon - ECP5 :-) | 08:43 |
maxpaln | and our factory seems to be taking significant interest in promoting the solution too :-) | 08:43 |
maxpaln | However, in more serious news - I have a problem trying to run or32-elf-sim on a Linux kernel I have built | 08:44 |
maxpaln | I am getting a 'KERNEL: Bus error (SIGBUS) 0xbc058004' error | 08:44 |
maxpaln | I suspect this is something to do with the settings in my dts file (I have gone back and reused a known good .config file to build the kernel but I am still getting the error) | 08:45 |
maxpaln | Does anyone have a bit of knowledge about the Linux boot process to point me in the right direction for my debug... | 08:46 |
maxpaln | I'll share a link to the linux boot log ... just a second. | 08:46 |
maxpaln | Here it is | 08:48 |
maxpaln | http://pastie.org/9363247 | 08:48 |
maxpaln | And in case it is helpful, here is the dts file | 08:49 |
maxpaln | http://pastie.org/9363254 | 08:50 |
stekern | wb maxpaln | 08:55 |
stekern | long time no see, and good news | 08:56 |
stekern | it's trying to access some device that you don't have in your SoC (well, virtual SoC, since it's or1ksim) | 08:58 |
maxpaln | it is a long time - I've been debugging our new silicon with the customer I am developing the ORSOC solution for | 08:58 |
maxpaln | it has been a long 3 months!! | 08:58 |
stekern | orPsoc, orsoc is still a company ;) | 08:59 |
maxpaln | thanks for your tip - so, without knowing too much about the linux boot process | 08:59 |
maxpaln | Ah, well pointed out! | 08:59 |
maxpaln | :-) | 08:59 |
stekern | my guess is the spi stuff, IIRC there's no spi emulation in or1ksim | 08:59 |
maxpaln | Actually, that is a good point - I had noticed the references to both (orsoc and ORPSOC) and had never really put two and two together | 09:00 |
maxpaln | So I should look at the SPI stuff in the DTS file? | 09:00 |
stekern | try to just comment it out and see if the kernel boots then | 09:00 |
maxpaln | ok, good tip | 09:00 |
maxpaln | nope, that's not it | 09:17 |
maxpaln | out of interest, I am assuming or1ksim is using the DTS file at arch/openrisc/boot/dts/or1ksim.dts - is that correct? | 09:17 |
maxpaln | if not I've been editing the wrong file | 09:18 |
maxpaln | hmmm, this is confusing | 09:19 |
maxpaln | I went back to what I thought was a working .config and .dts file - but the crash still occurs | 09:19 |
stekern | it depends on your config, and since you are (most likely) using a builtin dts, you need to recompile the kernel after you change things in the .dts | 09:20 |
maxpaln | aha - there you go! | 09:20 |
maxpaln | back to school again | 09:20 |
maxpaln | so, change the dts, recompile Linux, then sim | 09:20 |
maxpaln | got it | 09:20 |
ysionneau | 10:43 < maxpaln> Some good news - I have the ORSOC working on our latest silicon - ECP5 :-) < you made an ASIC out of openrisc core? | 09:25 |
stekern | ysionneau: it's a Lattice FPGA | 09:25 |
ysionneau | ah no I got it, you work for lattice ? | 09:25 |
maxpaln | Yes, that's correct | 09:26 |
ysionneau | and you just tested orpsoc on the latest fpga silicon | 09:26 |
ysionneau | ok :) | 09:26 |
ysionneau | cool | 09:26 |
maxpaln | yes, although there was a fair bit of work involved in getting the DDR3 memory controller to work | 09:26 |
ysionneau | maxpaln: is there still development on LatticeMico32 CPU at lattice? | 09:26 |
ysionneau | just out of curiosity | 09:26 |
maxpaln | yes, although mainly maintenance rather than active development | 09:31 |
maxpaln | it is still part of many of our solutions - | 09:31 |
maxpaln | when we need a processor the Mico32 is the one we use - but the roadmap, which did include things like multi-cores, broader feature set etc. has been scaled back | 09:32 |
ysionneau | ah too bad | 09:33 |
maxpaln | well, the processor is very stable and was developed to a very useable level | 09:33 |
maxpaln | But the direction for this sort of area is very much up in the air at the moment | 09:33 |
maxpaln | The SOC devices from Altera and Xilinx are interesting - but customer feedback is mixed. | 09:34 |
blueCmd | I'm going to build a wb-slave <-> wb-master bridge over 8 pins to carry wishbone over to another FPGA | 09:34 |
blueCmd | will I have a bad time? | 09:34 |
maxpaln | How will you be getting from one FPGA to another - ribbon cable or on the PCB? The phase relationship on the clocks between the two FPGAs will need to be managed, it will also vary over PVT | 09:39 |
maxpaln | meanwhile, on my linux problem - I am still getting the crash, | 09:40 |
maxpaln | As a sanity check I went back to what I believe to be a known good pair of DTS file and .config - but the crash still occurs | 09:40 |
maxpaln | so I am wondering what other variables exist that I have somehow broken | 09:41 |
maxpaln | the or1ksim command line includes a reference to a .cfg file: | 09:42 |
maxpaln | or32-elf-sim -f arch/openrisc/or1ksim.cfg vmlinux | 09:42 |
blueCmd | maxpaln: ribbon cable if I can get that to work | 09:42 |
maxpaln | I don't remember creating (or editing) this file - and it is dated from December of last year | 09:42 |
maxpaln | could there be something out of date or somehow incompatible here? | 09:43 |
blueCmd | or32-elf-sim is a weird name | 09:43 |
maxpaln | really? I found it in the docs somewhere | 09:43 |
blueCmd | maybe that is how it's supposed to be named, but it's always just 'sim' for me | 09:43 |
maxpaln | ah, maybe you have an alias | 09:43 |
blueCmd | no, it's in /usr/local/bin/sim | 09:44 |
blueCmd | ./configure on or1ksim repository will net you a 'sim' binary | 09:44 |
stekern | blueCmd: you can configure or1ksim with a --target option | 09:44 |
blueCmd | stekern: ah | 09:44 |
maxpaln | yeah, this definitely worked before so I don't think I have anything misconfigured | 09:45 |
maxpaln | but I take your point | 09:45 |
blueCmd | what is the crash? | 09:46 |
stekern | maxpaln: can you show your 'known-to-be-good' .dts | 09:46 |
maxpaln | blueCmd: the latest one is here: | 09:47 |
maxpaln | http://pastie.org/9363489 | 09:47 |
maxpaln | the latest dts I am using is here: | 09:47 |
blueCmd | maxpaln: actually I'm thinking of using a lattice FPGA to solve my wb issues. my scenario is that I have an FPGA board with only 8 GPIOs. I need to extend it to include sdcard, ethernet and RTC, and possibly more. So I'm thinking of having a ribbon cable to a lattice FPGA and let it be a 'wb expander' | 09:47 |
maxpaln | dts file http://pastie.org/9363494 | 09:48 |
blueCmd | maxpaln: check the memory size in or1ksim.cfg I would say | 09:48 |
stekern | but you still have the spi and flash stuff in there | 09:48 |
stekern | remove everything below the enet0 node | 09:49 |
maxpaln | blueCmd: The ribbon cable won't be a problem per-se but between the FPGA IOs, the PCB trace, the connectors and the cable itself you'll want to keep the clock rate pretty low - in the order of single figure MHz is a good plan to start with. | 09:49 |
maxpaln | Ah, yes - I removed the SPI stuff, tested it and saw the fail, then reverted to the 'known good one' | 09:50 |
maxpaln | let me remove the SPI stuff again and rerun | 09:50 |
maxpaln | blueCmd: that is assuming you will be sending the clock as one of the 8 wires | 09:51 |
maxpaln | stekern: - yes, that is it! :-) | 09:52 |
maxpaln | so, let me now try the 'new' dts without the SPI | 09:52 |
ysionneau | \o/ | 09:52 |
blueCmd | maxpaln: right, I'm thinking of sending a single ended clock yes. | 09:54 |
maxpaln | blueCmd: OK, so the bus will be bi-directional but only the master will send the clock. If that is right the biggest delay will be when the Slave sends data back to the master: the clock used by the Slave will be phase shifted from the Master, then the data arriving at the Master will be phase shifted from the Slave. This combined phase shift or delay will be your biggest factor in | 09:57 |
maxpaln | determining throughput | 09:57 |
maxpaln | (and probably your biggest headache in terms of getting a reliable interface) | 09:59 |
blueCmd | maxpaln: what if I made a small board with an FPGA that plugged straight on the pin header and then used a real XCVR to transfer the data over a sata cable or something? would that be significantly better? | 10:01 |
blueCmd | probably going to go with the ribbon first, but I'm trying to figure out my alternatives | 10:01 |
maxpaln | well, the signal quality is only part of the issue - and slowing the clock would probably negate those effects. A pure PCB based solution would help here - it would allow you to get to higher data rates. | 10:02 |
maxpaln | but I would say you need to give consideration to what is essentially the PHY layer of the interface | 10:03 |
maxpaln | There are not many source synchronous bi-directional busses - probably for the above reason. | 10:03 |
stekern | sb0: how do I generate the misoc timing report? | 10:03 |
maxpaln | It would be more common for the Transmit device to send a clock with the data - in your case you only have one clock master, and the clock relationship between the slave and master will be unknown. | 10:04 |
blueCmd | maxpaln: I can make it MISO/MOSI style if that helps me increase the clock rate, but then I'm down to 3 pins in each direction | 10:05 |
stekern | or rather, can I get it to generate it, or do I have to run that Xilinx tool (whatever its name was now again) manually? | 10:05 |
stekern | trce I think | 10:05 |
maxpaln | yeah, I was thinking about that - 3-pins either direction is dull, but easier. | 10:05 |
maxpaln | what about making the whole bus bi-directional | 10:06 |
maxpaln | that way both slave and master can send the clock | 10:06 |
blueCmd | stekern: in vivado you do 'check_timing' and/or 'report_timing_summary' | 10:06 |
blueCmd | maybe grep for that in the code | 10:06 |
blueCmd | maxpaln: hm, interesting | 10:07 |
maxpaln | blueCmd: it would be usual in this sort of scenario to either send the data centre aligned to the clock, or send it edge aligned as normal and in the receiving device clock the data in on the opposite (probably falling) edge of the clock. This would give you half a clock cycle of margin to allow for the ribbon cable or PCB delays. | 10:12 |
blueCmd | maxpaln: so why cant I just clock twice as fast for the same effect? signal quality? | 10:17 |
blueCmd | basically what you're proposing is DDR right? | 10:18 |
maxpaln | Not really - you are clocking at a single data rate | 10:18 |
maxpaln | think of it the other way around, you clock the data out on the rising edge of the clock - so the data gets sent aligned to the rising edge of the clock | 10:19 |
maxpaln | In the receiving device you also clock the data on the rising edge of the clock | 10:19 |
maxpaln | In this scenario, there is virtually no margin for the receiving device | 10:20 |
maxpaln | the data transition will occur virtually simultaneously to the clock edge | 10:20 |
maxpaln | setup and hold times will almost certainly be violated | 10:21 |
maxpaln | this can readily be solved by clocking the data into the device on the falling edge | 10:21 |
maxpaln | now you have half a clock cycle of margin to meet setup and hold times | 10:21 |
maxpaln | you can then clock the data on the rising edge of the second register inside the receiving device - you would need to add a half clock cycle constraint to this path to ensure timing is met | 10:22 |
maxpaln | conversely, you can send the data aligned to the centre of the clock - then the receiving device can carry on as normal and use the rising edge of the clock to latch the incoming data. | 10:23 |
maxpaln | this sort of thing is at least part art and not 100% science | 10:24 |
maxpaln | you can calculate the clock to out of the transmit device (the FPGA tool will tell you this in the timing report) and the input setup of the receiving device. | 10:25 |
maxpaln | this will give you part of the delay - but the time to get across everything in between depends on a lot of factors. | 10:25 |
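
The margin argument above (same-edge capture versus opposite-edge capture) can be sketched numerically. This is only a rough model under stated assumptions: each bit becomes valid at the receiver a fixed `skew_ns` after the launching clock edge and stays valid for one period. The function name and all the numbers are hypothetical, not taken from any datasheet or timing report.

```python
def capture_slack(period_ns, skew_ns, t_su_ns, t_h_ns, sample_phase):
    """Return (setup_slack, hold_slack) in ns for one data bit.

    Simplified model (an assumption, not a real timing analysis):
    the bit becomes valid at the receiver pin `skew_ns` after the
    launching clock edge and stays valid for one full period.
    The receiver samples `sample_phase` periods after the launching
    edge (0.5 = falling edge, 1.0 = next rising edge).
    """
    sample_time = sample_phase * period_ns
    # Data must be stable t_su before the sampling edge.
    setup_slack = sample_time - skew_ns - t_su_ns
    # Data must stay stable t_h after the sampling edge.
    hold_slack = (skew_ns + period_ns) - sample_time - t_h_ns
    return setup_slack, hold_slack


if __name__ == "__main__":
    # Hypothetical numbers: 5 MHz clock, 0.2 ns data-to-clock skew,
    # 1.0 ns setup and 0.5 ns hold requirement at the receiver.
    for name, phase in (("rising", 1.0), ("falling", 0.5)):
        setup, hold = capture_slack(200.0, 0.2, 1.0, 0.5, phase)
        print(f"{name:8s} setup {setup:7.1f} ns, hold {hold:7.1f} ns")
```

With edge-aligned data and near-zero skew, the rising-edge case ends up with negative hold slack, while the falling-edge case splits the period roughly in half between setup and hold. Cable and connector delay would enter through `skew_ns`, which is the part that has to be measured or estimated.
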
maxpaln | One to send your way now: the or1ksim of the new dts without SPI works fine | 10:34 |
maxpaln | :-) | 10:34 |
maxpaln | I have loaded this into HW to try and get some UART output during the Linux boot - | 10:35 |
maxpaln | but the boot fails - I get epcr asserted during the instruction read process | 10:35 |
maxpaln | the weird thing is that I don't get a corresponding wishbone err signal asserted at the same time | 10:35 |
maxpaln | so I am not sure what is causing the exception | 10:36 |
maxpaln | any thoughts on how to debug? | 10:36 |
maxpaln | as a bit of background, I have a simple 'Hello World' program that sends data out over the UART to my PC | 10:36 |
maxpaln | this works fine - so I know the processor is working, or at least in this simple test it works | 10:37 |
maxpaln | and the UART successfully sends data | 10:37 |
maxpaln | when I boot from Linux I am using the embedded logic analyser to monitor the EPCR signal and the wishbone error signals at the top level - | 10:37 |
maxpaln | oh, hang-on might have a clue | 10:38 |
maxpaln | after the exception it jumps to address 0xA00 | 10:38 |
maxpaln | aha - this is a I-TLB Miss | 10:39 |
maxpaln | [My run of progress might have stalled] | 10:40 |
stekern | itlb misses are fairly normal ;) | 10:41 |
ysionneau | itlb miss raise an exception? I thought there was a hardware page-tree walker :o | 10:41 |
maxpaln | ah, ok :-) | 10:41 |
* ysionneau should one day read the openrisc manual | 10:41 | |
maxpaln | :-) I am reading it now | 10:41 |
stekern | ysionneau: it's optional | 10:42 |
ysionneau | ah ok | 10:42 |
maxpaln | so, in my case: the assertion of the EPCR for the ITLB miss | 10:42 |
stekern | and or1200 (which maxpaln is using, right?) doesn't have a hw tlb refiller even | 10:42 |
maxpaln | is this a terminal thing? | 10:43 |
stekern | and the default mor1kx setup has it disabled too | 10:43 |
ysionneau | ah or1200 right | 10:43 |
ysionneau | ok | 10:43 |
maxpaln | Yes, OR1200 for me - I guess what I'd like to know is whether this would prevent Linux from booting | 10:43 |
maxpaln | or is the cause elsewhere | 10:43 |
stekern | and there is no support for hw tlb refill in vanilla openrisc Linux | 10:44 |
stekern | maxpaln: it wouldn't, you're problem is most likely somewhere else | 10:44 |
stekern | s/you're/your | 10:44 |
maxpaln | ok, thanks - although I feel I am back looking for the needle again :-0 | 10:45 |
maxpaln | one question, how does the or1ksim process know about the HW configuration of the processor - i.e. what capabilities exist in the actual Verilog of the processor? | 10:45 |
stekern | it doesn't | 10:46 |
maxpaln | ah, ok | 10:46 |
stekern | you have to configure it manually (with the or1ksim.cfg file) | 10:46 |
maxpaln | aha - ok | 10:46 |
maxpaln | so maybe I'll check my or1ksim.cfg to ensure it matches HW - maybe there is a clue there | 10:47 |
blueCmd | maxpaln: thanks for the explanation, makes sense. | 10:53 |
maxpaln | blueCmd: No probs! | 10:55 |
maxpaln | Ok, I have my confused hat on now - | 11:09 |
maxpaln | I am triggering on a memory instruction bus read from address 0x100 - my assumption is this should happen once when the ROM bootloader jumps there after loading from SPI | 11:10 |
maxpaln | But I am seeing this event happen every 5 seconds or so | 11:11 |
maxpaln | I am guessing that somewhere during the Linux boot sequence the processor somehow manages to get reset and start loading from address 0x100 | 11:12 |
maxpaln | I don't get any UART output so the kernel didn't get that far... | 11:13 |
maxpaln | I don't get any wishbone error signals asserted (I am also looking for this event in the ILA) | 11:14 |
mor1kx | [mor1kx] skristiansson pushed 1 new commit to master: https://github.com/openrisc/mor1kx/commit/fe350a8b3d5acfd2a3e55320ae55d970b000619c | 11:14 |
mor1kx | mor1kx/master fe350a8 Stefan Kristiansson: alu: add alternative non-stall "PIPELINED" mul impementation... | 11:14 |
stekern | ^ gives about a 4% boost in coremark benchmarks (1.86 vs 1.79 coremark/MHz) | 11:15 |
ysionneau | nice :) | 11:16 |
stekern | + it takes less resources | 11:16 |
stekern | con: timing goes down the drain a bit :( | 11:16 |
stekern | have to take a closer look at that later, maybe it's solvable | 11:17 |
maxpaln | ok, need to think about this one - it appears that when the ROM causes the processor jump to address 0x100 (after loading from SPI Flash into RAM) the processor just starts loading the data from SPI into RAM again. It is as if the instructions at RAM address 0x100 now contain the ROM bootloader code. Very odd!! | 11:30 |
ysionneau | maybe a cache issue? | 11:31 |
ysionneau | try invalidating the I cache before jumping to RAM? | 11:31 |
maxpaln | what's the best way to do that? | 11:32 |
maxpaln | sounds plausible though - the last time I had Linux running was when I was using my old DDR3 controller which (due to some very dodgy wishbone coding) didn't support wrap-bursts. In turn this required me to disable the caches. | 11:34 |
maxpaln | So I have never had Linux running with caches enabled. | 11:34 |
maxpaln | I wouldn't know how to invalidate the cache though - bit of a newbie... | 11:35 |
ysionneau | I don't know much about openrisc sorry that was just a "try this maybe" advice :) | 11:35 |
maxpaln | :-) fair enough | 11:35 |
ysionneau | but from the description you are giving it sounded like an I-cache issue | 11:35 |
maxpaln | fair enough | 11:35 |
ysionneau | maybe have a look at 9.2.7 Instruction Cache Block Invalidate | 11:36 |
maxpaln | :-) - I was looking through the docs and had got to 8.2.x ... good spot... | 11:36 |
stekern | easier is probably to just disable cache before jumping to RAM | 11:37 |
stekern | on reset it should be disabled too | 11:38 |
ysionneau | there is no instruction to flush all the cache? | 11:38 |
stekern | and Linux invalidates the caches before turning them on | 11:38 |
ysionneau | you need to loop and invalidate line by line? | 11:38 |
stekern | yes | 11:38 |
ysionneau | ok | 11:38 |
maxpaln | how do you disable the cache dynamically like that? | 11:39 |
stekern | I rather have that then (only) a flush all | 11:39 |
stekern | and flush all is more expensive in terms of hw | 11:39 |
maxpaln | aha, it's a bit in the supervision register | 11:39 |
stekern | s/then/than | 11:40 |
ysionneau | stekern: sure you need a FSM for that :/ | 11:40 |
stekern | yup, and if you lack more precise invalidating/flush, it's a performance killer | 11:41 |
ysionneau | for DMA and stuff like that yes | 11:44 |
ysionneau | :/ | 11:44 |
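
The "loop and invalidate line by line" approach discussed above can be sketched as an address generator. This is a host-side illustration only: the cache geometry is hypothetical, and on real hardware each generated address would be written to the instruction cache block invalidate register (believed to be the SPR described in section 9.2.7 of the architecture manual, which should be checked) rather than printed.

```python
def icache_line_addresses(cache_size, line_size, base=0):
    """Yield one address per cache line, covering the whole cache once.

    cache_size and line_size are in bytes. The values used below are
    hypothetical and must match the actual core configuration.
    """
    if line_size <= 0 or cache_size % line_size != 0:
        raise ValueError("cache size must be a multiple of the line size")
    for offset in range(0, cache_size, line_size):
        yield base + offset


if __name__ == "__main__":
    # Hypothetical geometry: 8 KiB instruction cache with 16-byte lines.
    addrs = list(icache_line_addresses(8 * 1024, 16))
    print(len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

The per-line loop is what makes precise invalidation possible: the same generator can cover only a DMA buffer's address range by passing a different `base` and size.
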
chan1 | hello, I hope somebody could briefly consider what's wrong with my compiling uClibc and give me some advice. | 13:00 |
chan1 | I am following http://www.openrisc.net/toolchain-build.html (Building by Hand! part) and have error during 'Compile uClibc' | 13:00 |
chan1 | I configured for or32 and defconfig and ran make with PREFIX=${SYSROOT} but it gives me an error below | 13:00 |
chan1 | ------------------------------ | 13:00 |
chan1 | CC libpthread/linuxthreads.old/attr.o | 13:00 |
chan1 | In file included from libpthread/linuxthreads.old/internals.h:30:0, | 13:00 |
chan1 | from libpthread/linuxthreads.old/attr.c:26: | 13:00 |
chan1 | ./libpthread/linuxthreads.old/sysdeps/or32/pt-machine.h: In function 'testandset': | 13:00 |
chan1 | ./libpthread/linuxthreads.old/sysdeps/or32/pt-machine.h:41:8: error: '__NR_or1k_atomic' undeclared (first use in this function) | 13:00 |
chan1 | ./libpthread/linuxthreads.old/sysdeps/or32/pt-machine.h:41:8: note: each undeclared identifier is reported only once for each function it appears in | 13:00 |
chan1 | In file included from libpthread/linuxthreads.old/../linuxthreads.old_db/proc_service.h:20:0, | 13:00 |
chan1 | from libpthread/linuxthreads.old/../linuxthreads.old_db/thread_dbP.h:9, | 13:00 |
chan1 | from libpthread/linuxthreads.old/internals.h:32, | 13:00 |
chan1 | from libpthread/linuxthreads.old/attr.c:26: | 13:00 |
maxpaln | ok, more clues | 13:08 |
maxpaln | it seems that during the boot from RAM the processor jumps back to the ROM code | 13:09 |
maxpaln | The sequence transitions as expected: Reset to ROM address, load from SPI Flash to RAM, jump to address 0x100 and boot processor, | 13:10 |
maxpaln | however, after about a 10th of the Linux code the processor jumps back to the ROM | 13:11 |
maxpaln | address | 13:11 |
maxpaln | it coincides with an exception that EPCR reports at address 0xC0004B14 | 13:12 |
maxpaln | something strange is happening here.... | 13:18 |
stekern | chan1: I don't know what's causing your issue, but you might want to consider using the more up-to-date or1k- toolchain | 13:22 |
maxpaln | stekern: agreed - in fact once I am able to deliver a working Linux build to my customer I will be making it a priority to get up to date with the latest stable build of everything. | 13:22 |
maxpaln | Having come so far (and frankly so close) on the current versions, I am hoping it won't be too difficult to get something running. | 13:23 |
maxpaln | but as always, the final step is the hardest... | 13:23 |
stekern | chan1: http://opencores.org/or1k/OpenRISC_GNU_tool_chain#Linux_.28uClibc.29_toolchain_.28or1k-linux-uclibc.29 | 13:26 |
stekern | maxpaln: I wasn't speaking to you ;) | 13:26 |
maxpaln | stekern: ah, I missed that - although your advice is still valid :-) | 13:27 |
stekern | yup, but out of place for your current issue =P | 13:27 |
maxpaln | :-) | 13:27 |
maxpaln | ok, in general terms, is there any reason (other than a reset) for the processor to jump back to the boot address in the ROM code? | 13:29 |
stekern | maxpaln: what is at 0xc0004b14 | 13:29 |
maxpaln | I am looking but not seeing one | 13:29 |
maxpaln | nothing! I don't have any peripherals at 0xC..... | 13:29 |
maxpaln | I thought that might be a reserved address for cache or something | 13:29 |
sb0 | stekern, manually with trce | 13:35 |
maxpaln | Ok, I have just noticed that after the exception the Instruction address jumps to 0xF0000B00 | 13:36 |
maxpaln | Looking at the ROM code it uses a narrow address bus so it just sees 0x00 | 13:38 |
maxpaln | but the real question is why the exception causes a jump to an address at 0xF.... | 13:38 |
maxpaln | if this was meant to be 0x00000B00 then this would be a Range exception - | 13:39 |
maxpaln | but the question about 0xF... still stands | 13:39 |
stekern | is the EPH bit set? | 13:40 |
stekern | sb0: ok, I figured that when I couldn't find any trace of it in the source | 13:41 |
maxpaln | Hmmm, I'm not capturing the Supervision Register right now - I'll need to add it to the ILA | 13:42 |
maxpaln | ah, I see your thinking though | 13:43 |
maxpaln | (he says catching up with RTFM'ing :-) ) | 13:43 |
maxpaln | I will start a new FPGA build to capture the supervision register - (any clues where I might find it) | 13:44 |
stekern | in or1200? no idea | 13:44 |
maxpaln | yeah, just searching there now - I'll find it soon enough | 13:45 |
maxpaln | In case I need to find it again in the future: or1200_sprs.v | 13:48 |
maxpaln | well, that has confirmed that EPH is set- in fact it gets set just before EPCR changes to 0xC0004B14. | 14:26 |
maxpaln | Here is a screen grab of the EPCR and SR bits from the logic analyser | 14:29 |
maxpaln | https://www.dropbox.com/s/dfilok36dg6hf3n/epcr_and_sr_reveal.jpg | 14:29 |
maxpaln | So I have somehow triggered a range exception - that has in turn set the EPH bit! | 14:30 |
maxpaln | I should go to Las Vegas - I've managed to get a full set! | 14:37 |
maxpaln | Every bit of SR is set at the point of the EPCR changing - although DSX and OV are only asserted for what appears to be one or two clock cycles. | 14:39 |
maxpaln | So I have some sort of range exception but while processing the exception the EPH flag causes the processor to execute ROM code instead of the actual exception code. | 14:40 |
maxpaln | [really confused now] | 14:40 |
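
A small decoder can help when reading SR and exception addresses off a logic analyser capture like the one above. This is a sketch, assuming the SR bit positions and vector offsets from the OpenRISC 1000 architecture manual as best recalled here (they should be checked against the manual and or1200_sprs.v); the function names are made up for illustration.

```python
# Assumed SR bit positions (OpenRISC 1000 architecture manual).
SR_BITS = {
    0: "SM", 1: "TEE", 2: "IEE", 3: "DCE", 4: "ICE", 5: "DME", 6: "IME",
    7: "LEE", 8: "CE", 9: "F", 10: "CY", 11: "OV", 12: "OVE", 13: "DSX",
    14: "EPH", 15: "FO", 16: "SUMRA",
}

# Assumed exception vector offsets.
VECTORS = {
    0x100: "reset", 0x200: "bus error", 0x300: "data page fault",
    0x400: "instruction page fault", 0x500: "tick timer",
    0x600: "alignment", 0x700: "illegal instruction",
    0x800: "external interrupt", 0x900: "D-TLB miss",
    0xA00: "I-TLB miss", 0xB00: "range", 0xC00: "system call",
    0xD00: "floating point", 0xE00: "trap",
}

EPH_BIT = 14


def decode_sr(sr):
    """Return the names of all known SR flags set in `sr`, lowest bit first."""
    return [name for bit, name in sorted(SR_BITS.items()) if sr & (1 << bit)]


def vector_address(offset, sr):
    """Address the core fetches from for the exception at `offset`.

    With EPH set the vectors are assumed to live at 0xF0000000 + offset,
    otherwise at 0x00000000 + offset.
    """
    base = 0xF0000000 if sr & (1 << EPH_BIT) else 0x00000000
    return base + offset


if __name__ == "__main__":
    sr = (1 << 11) | (1 << 14)  # hypothetical capture: OV and EPH set
    print(decode_sr(sr), VECTORS[0xB00], hex(vector_address(0xB00, sr)))
```

Under these assumptions, a range exception taken while EPH is set lands at 0xF0000B00, which matches the jump address seen on the instruction bus.
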
blueCmd | maxpaln: having a look at the board it looks like I can group at least 6 of the 8 pins in 3 pcs of diff pairs | 15:23 |
blueCmd | would that help? | 15:24 |
maxpaln | Differential pairs would help avoid additive noise to the signals as they transition through the ribbon cable or PCB. But I think this isn't going to be your main concern - | 15:25 |
maxpaln | once you can cleanly latch the data into the receiving device you'll be in safe territory | 15:25 |
maxpaln | For example, in the Lattice SW (IPexpress) you can create an IO interface that will provide you with an edge aligned or centre aligned clock/data relationship. I suspect the other FPGA tools provide the same. This makes your life easier since you don't need to design the interface yourself - but it has the disadvantage of being device specific... | 15:27 |
blueCmd | maxpaln: what is it called for lattice? | 15:33 |
maxpaln | the tool? IPexpress - you can find it from the Tools menu within Diamond or from the Accessories subfolder on the Start Menu | 15:33 |
blueCmd | maxpaln: nah, the construct | 15:35 |
maxpaln | the construct? | 15:39 |
blueCmd | for clock centered alignment | 15:39 |
blueCmd | i.e how would you do it for lattice? | 15:39 |
maxpaln | ah, you need to create it within the IPexpress tool - you specify the parameters of the interface (bus width, frequency, IO standard etc.) | 15:40 |
maxpaln | it generates you a VHDL or Verilog module that you instantiate in your design. | 15:40 |
maxpaln | When you select SDR from the Architecture_Modules/IO folder you can choose 'clock inversion' to use the opposite edge of the system clock. | 15:41 |
stekern | maxpaln: I think you have a wild write to SR at 0xc0004b14 (or the instruction before that), my guess is that the reason you get the range exception is that the OV bit is getting set. | 15:43 |
stekern | by that wild write I mean | 15:44 |
maxpaln | Hmmm - I was just reading through the exception model to familiarise myself with it | 15:44 |
maxpaln | So, I am assuming address 0xC... is the cache or some other internal CPU feature, since I don't have an explicit peripheral at that address. | 15:45 |
maxpaln | weirdly, immediately prior to the exception the CPU is accessing the memory data bus not the exception bus | 15:46 |
maxpaln | s/exception/instruction/ | 15:46 |
maxpaln | but assuming your hunch is right (and I think you have a 100% hit rate so far - so it's a good assumption) | 15:47 |
maxpaln | I am still not sure how to fix this - if fix is the right word | 15:47 |
maxpaln | Ok, so the overflow flag is a fairly regular occurrence - or at least shouldn't be fatal. | 15:52 |
maxpaln | but why EPH gets set is a mystery... | 15:52 |
blueCmd | maxpaln: Bidirectional diff. clk, 2x bidirectional DDR, master/slave select pin controlled by master, high impedance on not-active side. | 15:53 |
blueCmd | would that work? | 15:54 |
stekern | maxpaln: 0xc... is a virtual address | 15:54 |
stekern | Linux uses that for the kernel | 15:54 |
blueCmd | (sorry for picking your brains on this, I'm quite fascinated by this :P) | 15:54 |
maxpaln | stekern: ah, ok | 15:54 |
maxpaln | blueCmd: no worries at all | 15:54 |
stekern | maxpaln: EPH is getting set by that wild write | 15:54 |
maxpaln | by *wild* do you mean wrong | 15:55 |
stekern | you have to backtrack why garbage is getting written to it | 15:55 |
stekern | yes, wrong ;) | 15:55 |
maxpaln | ah, ok | 15:55 |
maxpaln | :-) | 15:55 |
maxpaln | back in the room | 15:55 |
maxpaln | blueCmd: I was thinking that a select pin would be useful - it would allow the master to terminate a transaction if the Slave managed to get locked up too. | 15:56 |
maxpaln | blueCmd: I am not sure you need a differential clock, it could be useful but it depends on what data rate you plan on trying to achieve. DDR on the data IO will help a lot if you only have 2 data bits (can't you increase this - I only count 5 pins in your description). But remember it reduces your timing margin significantly | 15:59 |
stekern | maxpaln: to look up what 0xc0004b14 does, you can disas vmlinux and search for that address | 15:59 |
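A minimal sketch of the lookup stekern describes, assuming the disassembly has already been saved as text in the usual `objdump -d` layout (a `<symbol>:` header line followed by `address: bytes mnemonic` lines). The function name `find_instruction` and the sample text are hypothetical; in particular the symbol start address in the sample is made up for illustration.

```python
import re

# Header line such as "c0004b0c <arch_local_irq_restore>:"
SYMBOL_RE = re.compile(r"^([0-9a-f]+) <([^>]+)>:\s*$")
# Instruction line such as "c0004b14:  c0 00 18 11  l.mtspr r0,r3,0x11"
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+(.*)$")


def find_instruction(disassembly, address):
    """Return (enclosing_symbol, instruction_text) for an address, or None."""
    current_symbol = None
    for line in disassembly.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            current_symbol = header.group(2)
            continue
        insn = INSN_RE.match(line)
        if insn and int(insn.group(1), 16) == address:
            return current_symbol, insn.group(2).strip()
    return None


# Hypothetical excerpt: only the c0004b14 line comes from the log,
# the header address is an assumption for the example.
SAMPLE = """\
c0004b0c <arch_local_irq_restore>:
c0004b14:\tc0 00 18 11 \tl.mtspr r0,r3,0x11
"""

print(find_instruction(SAMPLE, 0xC0004B14))
```

Tracking the most recent header line is what recovers the function name, which is usually the first thing needed when backtracking a bad write.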
maxpaln | securely latching the data from one device to the other is your primary challenge - technically, it's not difficult. But if you get ambitious with the data rates the setup and hold margins get really difficult to meet | 16:00 |
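To make the setup and hold point concrete, here is a rough sketch of how the valid data window shrinks with clock rate. The setup, hold and skew figures are placeholder assumptions, not values for any particular FPGA or cable, and the function names are hypothetical.

```python
def bit_window_ns(clock_mhz, ddr=True):
    """Time each data bit is held on the wire, in nanoseconds."""
    period_ns = 1000.0 / clock_mhz
    # DDR transfers on both clock edges, so each bit only gets half a period.
    return period_ns / 2 if ddr else period_ns


def slack_ns(clock_mhz, setup_ns, hold_ns, skew_ns, ddr=True):
    """Margin left after setup, hold and clock/data skew are subtracted."""
    return bit_window_ns(clock_mhz, ddr) - (setup_ns + hold_ns + skew_ns)


# Assumed (hypothetical) receiver and cable numbers, for illustration only.
SETUP_NS, HOLD_NS, SKEW_NS = 1.0, 0.5, 2.0

for mhz in (5, 20, 40, 100):
    margin = slack_ns(mhz, SETUP_NS, HOLD_NS, SKEW_NS)
    print(f"{mhz:>3} MHz DDR: window {bit_window_ns(mhz):6.1f} ns, slack {margin:6.1f} ns")
```

With these assumed numbers the slack is large at a few MHz and becomes thin near 100 MHz DDR, which is the trade-off being described.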
maxpaln | stekern: good suggestion | 16:03 |
blueCmd | maxpaln: system clock is 80 MHz, I want to transfer control command (read/write & byte select 3 bits?), address (worst case 32 bits), and data (worst case 32 bits). I mean, using 1 MHz that will take some time. | 16:03 |
blueCmd | so if I can get up to 2 digit MHz that would be awesome | 16:03 |
maxpaln | blueCmd: 2 digit MHz won't be an issue - 100 MHz DDR is probably your theoretical limit for something handcrafted like this | 16:04 |
maxpaln | I would expect you to be able to get to double figures MHz through a ribbon cable. | 16:04 |
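A back-of-the-envelope estimate of the transaction time for the link discussed here, assuming the worst case blueCmd lists (3 control bits, 32 address bits, 32 data bits) sent over 2 data lines. Protocol overhead such as bus turnaround or select-pin handshaking is ignored, and `transfer_time_us` is a hypothetical helper name.

```python
import math


def transfer_time_us(clock_mhz, payload_bits=3 + 32 + 32, data_lines=2, ddr=True):
    """Microseconds needed to shift one transaction across the link."""
    # DDR moves one bit per line on each clock edge, SDR one bit per cycle.
    bits_per_clock = data_lines * (2 if ddr else 1)
    clocks = math.ceil(payload_bits / bits_per_clock)
    return clocks / clock_mhz


for mhz in (1, 5, 20, 40):
    print(f"{mhz:>3} MHz: {transfer_time_us(mhz):.3f} us per transaction")
```

Under these assumptions one transaction needs 17 clocks with DDR, so the time per access scales directly with the link clock.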
blueCmd | cool! even single ended? | 16:04 |
maxpaln | But running slowly to prove comms and cranking up the data rate is the way forward | 16:05 |
maxpaln | yeah, single ended is fine for signals that slow | 16:05 |
blueCmd | yes, ofc. But I don't want to settle on a path "everybody" knows will only do 1 MHz | 16:05 |
blueCmd | but yes, I can try to do this without diff pairs and see where it gets me and take it from there. | 16:06 |
maxpaln | you can help by setting the IO standard to something like LVCMOS1V2 - a smaller swing will help in various ways. But the eval boards you are using will probably dictate the IO voltage unless they (kindly) provide jumpers to select IO voltage. | 16:06 |
blueCmd | yes, the lines have 3.3V pullup | 16:06 |
maxpaln | OK, but even 3V3 is fine at several 10s of MHz. | 16:07 |
maxpaln | We do have customers running 32-bit busses at 3V3 150 MHz but it starts to have lots of noise issues | 16:07 |
maxpaln | we have an SSO tool to help debug - but at your data rates you'll be fine. | 16:07 |
maxpaln | I would start at something like 5 MHz - establish a working baseline then up the speed. | 16:08 |
maxpaln | I would expect you to get something like 20 or even 40 MHz fine. | 16:08 |
blueCmd | ha, I'm super-excited - long time ago I felt like this! thanks :) | 16:08 |
blueCmd | long time since I felt like this* | 16:08 |
blueCmd | Now I just need my ribbon cable to arrive from china | 16:09 |
maxpaln | keep the ribbon cable short too - that will help | 16:09 |
blueCmd | I can set up nice models and simulate this | 16:09 |
blueCmd | I bought this one: http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=151146723432&ssPageName=ADME:L:OC:US:3160 | 16:10 |
maxpaln | OK, if you're having problems one of those connectors should be moveable to reduce the length | 16:13 |
blueCmd | I have some physical placement constraints as well | 16:14 |
blueCmd | so I cannot make it super-short | 16:15 |
maxpaln | Several inches is fine - | 16:15 |
maxpaln | You'll probably be ok with the cable as it is - when things stop working as you increase the clock rate you can try reducing the cable length a bit. But your hard disk works on ribbon cable like this (although the IDE controller standard probably went through several iterations to get to that level of robustness :-) ) | 16:17 |
blueCmd | hah yes | 16:17 |
maxpaln | stekern: objdump on the vmlinux binary at address c0004b14: | 16:18 |
maxpaln | c0004b14:c0 00 18 11 l.mtspr r0,r3,0x11 | 16:18 |
maxpaln | so Hex 11 is OR'ed with GPR3 and written to GPR0 | 16:19 |
maxpaln | it is part of a section called <arch_local_irq_restore> | 16:19 |
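For reference, a small sketch that decodes the instruction word from the dump. The field layout and the SR bit positions are taken from my reading of the OpenRISC 1000 architecture manual and should be checked against it. Under that reading, `l.mtspr rA,rB,K` writes rB to the SPR at address (rA | K), so with r0 holding zero this instruction writes r3 to SPR 0x11, which is group 0 register 17 (SR). That matches stekern's description of a write to SR. The helper names are hypothetical.

```python
L_MTSPR_OPCODE = 0x30  # top six bits of the instruction word

# SR bit positions as listed in the architecture manual (assumed here).
SR_BITS = {
    0: "SM", 1: "TEE", 2: "IEE", 3: "DCE", 4: "ICE", 5: "DME", 6: "IME",
    7: "LEE", 8: "CE", 9: "F", 10: "CY", 11: "OV", 12: "OVE", 13: "DSX",
    14: "EPH", 15: "FO", 16: "SUMRA",
}


def decode_mtspr(word):
    """Split an l.mtspr instruction word into (rA, rB, K)."""
    if (word >> 26) & 0x3F != L_MTSPR_OPCODE:
        raise ValueError(f"not an l.mtspr instruction: {word:#010x}")
    ra = (word >> 16) & 0x1F
    rb = (word >> 11) & 0x1F
    # The 16-bit immediate is split: K[15:11] in bits 25..21, K[10:0] in bits 10..0.
    k = (((word >> 21) & 0x1F) << 11) | (word & 0x7FF)
    return ra, rb, k


def spr_group_and_index(spr_addr):
    """SPR addresses are group (upper 5 bits) and register index (lower 11 bits)."""
    return spr_addr >> 11, spr_addr & 0x7FF


def sr_flags(value):
    """Names of the SR flag bits that are set in a value."""
    return [name for bit, name in sorted(SR_BITS.items()) if value & (1 << bit)]


ra, rb, k = decode_mtspr(0xC0001811)
print(f"l.mtspr r{ra},r{rb},{k:#x} -> SPR group/index {spr_group_and_index(k)}")
print(sr_flags((1 << 11) | (1 << 14)))
```

Passing the value that r3 holds at the time of the write to `sr_flags` shows which flags, such as OV or EPH, that write would set.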
maxpaln | the dump is here in case you feel like extending your charity a little further (I seriously don't know if I could debug this without your help - many thanks indeed!) | 16:23 |
maxpaln | https://www.dropbox.com/s/i0nsyxjs1v8j76i/vmlinux.dmp | 16:23 |
maxpaln | ok, well my time on this has come to an end today - I am out at a customer tomorrow so I will be back on it on wednesday! | 16:33 |
maxpaln | Since this is an IRQ related thing, I am wondering if there is something odd about (a) the interrupt setup in the HW or (b) the interrupt setup in the Linux Kernel | 16:33 |
maxpaln | or maybe some conflict between the two | 16:34 |
maxpaln | or maybe none of the above!!! | 16:34 |
dalias | stekern, how hard is it to get a working or1k toolchain and emulator environment set up? i'm wondering in case anyone on our side is willing to do some testing (either now, or ongoing regression testing later) | 20:21 |
stekern | dalias: I'm probably biased =) but not very hard | 20:37 |
stekern | I've had some problems building a toolchain against musl though, I haven't investigated further but I had to disable libquadmath in the second gcc stage to get it to build | 20:40 |
stekern | dalias: while you're around anyway, I'm getting this error in libc-test: fcntl.c:29: error: 'O_TTY_INIT' undeclared (first use in this function) | 20:42 |
stekern | anyway, to get a working toolchain, upstream binutils can be used and our gcc port is here: https://github.com/openrisc/or1k-gcc | 21:04 |
stekern | on the emulator side, there's openrisc support in qemu and also a custom functional simulator | 21:06 |
stekern | I'm a bit vague, since I'm not sure what kind of simulator environment you are looking for | 21:07 |
dalias | stekern, yes some of the libc-test FAILs are to be expected | 21:19 |
dalias | they're open issues in musl and/or the compiler | 21:20 |
dalias | re: building toolchain i mainly meant to ask: are patches to the official gcc and binutils needed or do they have upstream or1k support? | 21:20 |
stekern | right, so vanilla binutils from binutils-gdb git should be fine, but our gcc is not upstream | 21:24 |
stekern | dalias: this is what I got when I ran libc-test in a chrooted environment with qemu-user: http://pastie.org/9365444 | 22:04 |
dalias | stekern, qemu-user is going to fail most of the thread tests | 22:23 |
olofk | Ah crap. I missed maxpaln. If he comes back, could someone invite him to orconf? I've been meaning to do that, but haven't got any contact information | 22:26 |
dalias | stekern, that looks like just the build part | 22:26 |
dalias | not the runtime part | 22:26 |
olofk | blueCmd: You might have some luck looking at ohwr.org. I know that they have done some work on sending wishbone busses across different FPGAs | 22:27 |
olofk | There's even a linux driver for external wishbone busses. I think Alessandro Rubini did that for the CERN guys, but I don't know exactly what it does | 22:28 |
olofk | Might be some kind of wishbone over USB actually | 22:29 |
olofk | Anyway. Got to sleep now | 22:29 |
olofk | stekern: Very happy to see your fusesoc pull request. I'll take a look at it tomorrow | 22:30 |
blueCmd | olofk: ah, http://www.ohwr.org/projects/wb-serializer-core something like that is what I would need I guess, but my needs are much simpler and I don't have the luxury of high speed serdes | 23:12 |
--- Log closed Tue Jul 08 00:00:09 2014 |