IRC logs for #openrisc Wednesday, 2014-07-16

--- Log opened Wed Jul 16 00:00:21 2014
dalias	stekern, malloc-brk-fail is known to fail for static	02:31
stekern	dalias: oh, actually, it's only the static case that fails	02:38
stekern	what's the reason for that?	02:38
dalias	it's a limitation in __simple_malloc that gets used because there's no realloc or free	02:57
stekern	ah	03:02
stekern	did you see the note about microblaze bits/stat.h btw?	03:03
dalias	when we made malloc work without brk, i forgot to do __simple_malloc too. but it's unlikely that static programs would have a brk issue anyway	03:03
dalias	i might fix it later anyway tho	03:03
dalias	yes i saw that	03:03
dalias	but i think microblaze is working	03:03
dalias	are you sure it uses asm-generic?	03:03
dalias	i'll check it...	03:04
stekern	if this doesn't mean it does, then I'm confused: http://lxr.free-electrons.com/source/arch/microblaze/include/uapi/asm/stat.h	03:05
dalias	hm	03:06
dalias	the test for microblaze passes...	03:22
dalias	ahh i suspect the test does not catch the breakage	03:23
stekern	which test? it was sem_open that broke for me	03:24
dalias	src/functional/stat.c	03:24
stekern	all offsets except st_ino are correct	03:24
dalias	hmm it looks like sem_open probably succeeds or fails at random depending on junk with the wrong st_ino offset	03:26
dalias	i can't get it to fail on microblaze with qemu user here	03:26
stekern	interesting, I guess I was lucky that it failed for me then ;)	03:30
dalias	:)	03:31
stekern	what host are you on?	03:31
dalias	?	03:35
stekern	nah, that can't be it... I was thinking the stat conversion in qemu-user would make the data always correct	03:35
stekern	but I looked up how it's done, and it clears the target stat struct first	03:36
stekern	...so there shouldn't even be junk there...	03:36
stekern	ah, actually... I was looking at the wrong place, microblaze qemu-user copies st_ino to both offsets...	03:42
dalias	oh? haha	03:43
dalias	why?	03:43
stekern	http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall.c;h=a50229d0d72fc68966515fcf2bc308b833a3c032;hb=HEAD#l4949	03:46
stekern	http://git.qemu.org/?p=qemu.git;a=blob;f=linux-user/syscall_defs.h;h=c9e6323905486452f518102bf40ba73143c9d601;hb=HEAD#l1469	03:46
stekern	no idea	03:47
dalias	.....	03:48
dalias	qemu has it wrong	03:48
dalias	they truncate the earlier one to 32 bits :/	03:48
dalias	despite it apparently being the correct one	03:48
dalias	this looks really bad	03:50
stekern	yeah, wonder where that has even come from?	03:51
dalias	no idea	03:55
dalias	emailing several lists about it	03:57
dalias	shall i cc you?	03:57
stekern	out of curiosity, I looked in the kernel's git log, and the kernel stat64 has never looked like that	03:58
stekern	sure, it'd be interesting to hear if there is any explanation to it	03:58
dalias	what address?	03:58
stekern	stefan.kristiansson@saunalahti.fi	03:59
dalias	i think we need to add an arch-specific __stat_fixup function...	04:00
dalias	if nothing else, mips seems to still be broken	04:00
dalias	mips idiotically has 32-bit dev_t still	04:00
dalias	and there's padding for plenty more	04:00
dalias	but the padding is situated such that when we define it as 64-bit in userspace, the lo/hi halves are backwards on big endian	04:01
dalias	maybe there'll be a way to work around this nasty microblaze qemu/kernel mismatch too with such a function	04:02
dalias	tho i doubt it	04:02
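
The big-endian dev_t problem dalias describes above can be illustrated with a small sketch. It assumes, as the discussion suggests, that the kernel stores a 32-bit device number at the start of an 8-byte slot and leaves the rest as zero padding, while userspace reads the whole slot as one 64-bit value. The helper names (`kernel_store_dev32_be`, `user_load_dev64_be`, `stat_fixup_dev`) are hypothetical and are not musl's actual `__stat_fixup`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a big-endian kernel with a 32-bit dev_t:
 * the value occupies the first 4 bytes of the slot, the rest is padding. */
static void kernel_store_dev32_be(unsigned char field[8], uint32_t dev)
{
    memset(field, 0, 8);
    field[0] = (unsigned char)(dev >> 24);
    field[1] = (unsigned char)(dev >> 16);
    field[2] = (unsigned char)(dev >> 8);
    field[3] = (unsigned char)dev;
}

/* Userspace view: the same 8 bytes read as one big-endian 64-bit dev_t. */
static uint64_t user_load_dev64_be(const unsigned char field[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | field[i];
    return v;
}

/* Hypothetical fixup: the real value landed in the high half, so shift
 * it back down. The low half must be the untouched zero padding. */
static uint64_t stat_fixup_dev(uint64_t raw)
{
    assert((raw & 0xffffffffu) == 0);
    return raw >> 32;
}
```

With a device number of 0x0803, the raw 64-bit read yields 0x0000080300000000, and the fixup recovers 0x0803. On a little-endian layout the value would already sit in the low half, which is why the mismatch only shows up on big endian.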
dalias	stekern, i think the broken files on our side came from arm	04:08
dalias	which is where many/most ports were initially forked from	04:08
dalias	(for arm, this stat.h is right)	04:09
stekern	ah, right. the initial commit of stat.h for microblaze is identical to the arm one	04:11
stekern	the math errors are because I don't have a fenv implementation	04:18
stekern	microblaze qemu-user was actually correct prior to this 'fix'... http://git.qemu.org/?p=qemu.git;a=commitdiff;h=a523eb06ec3fb2f4f4f4d362bb23704811d11379	05:01
maxpaln	Life is so much easier with console output :-)	09:10
maxpaln	is there a safe way to write to the console during the boot sequence - printk() seems to cause exceptions depending on where it sits.	09:10
stekern	maxpaln: printk should be safe in most places	10:14
chan1	hello, someone please help!	12:24
chan1	I was following http://openrisc.net/toolchain-build.html.	12:24
chan1	built the toolchain easy way,	12:24
chan1	then, am I supposed to go directly to building Busybox? (skipping install linux headers, stage 2 gcc, and uClibc)	12:24
chan1	That's what I did and I have an error building busybox.	12:24
stekern	chan1: http://juliusbaxter.net/openrisc-irc/%23openrisc.2014-07-07.log.html#t13:22	12:27
maxpaln	stekern: Taken to using pr_info() - it seems safer, although I have to be honest, I am using it blindly - it was the way I output to the UART when debugging the Ethernet PHY drivers several months ago.	12:27
stekern	maxpaln: pr_info() is just a wrapper to printk	12:28
stekern	http://lxr.free-electrons.com/source/include/linux/printk.h#L226	12:28
maxpaln	Oh, odd - it seems to cause fewer exceptions! Oh well....	12:28
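
As a rough userspace illustration of stekern's point that pr_info() adds nothing beyond a log-level prefix, the sketch below mocks the macro. The macro shape follows include/linux/printk.h, but this `printk` is a stand-in that formats into a buffer, and the `KERN_INFO` string here is a placeholder rather than the kernel's real encoding. `##__VA_ARGS__` is a GNU extension supported by gcc and clang.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define KERN_INFO "<6>"      /* placeholder prefix, an assumption for this mock */
#define pr_fmt(fmt) fmt      /* default: leave the format string unchanged */

static char log_buf[256];    /* stand-in for the console */

/* Mock printk: formats into log_buf instead of writing to a UART. */
static int printk(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(log_buf, sizeof log_buf, fmt, ap);
    va_end(ap);
    assert(n >= 0);
    return n;
}

/* pr_info is purely a macro: string-literal concatenation adds the prefix,
 * then control passes straight to printk. */
#define pr_info(fmt, ...) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
```

Calling `pr_info("uart ready %d\n", 1)` here expands to `printk("<6>" "uart ready %d\n", 1)`, so both calls take the same code path and any difference in behaviour has to come from elsewhere.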
maxpaln	out of interest, I get periodic I-TLB Miss exceptions - I am not paying particular attention to these at the moment as they get handled safely. But are they indicative of deeper issues?	12:29
stekern	maxpaln: I would guess it's just old mr Heisenbug that is visiting you ;)	12:29
maxpaln	:-)	12:29
stekern	TLB-miss exceptions are perfectly normal, and completely expected	12:30
maxpaln	excellent - finally a break! :-)	12:30
stekern	the TLB is just a pagetable cache, and the TLB-miss happens when the cache doesn't contain the pagetable entry for the requested address	12:30
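
The "pagetable cache" description can be sketched as a toy direct-mapped TLB with a software refill. Everything here is illustrative: the page size, the number of sets, the flat `page_table` array and the `translate` function are assumptions made for the example, not the or1k MMU or the Linux miss handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 13     /* illustrative page size of 8 KiB */
#define TLB_SETS   64     /* illustrative number of TLB entries */
#define PT_ENTRIES 1024   /* size of the toy flat page table */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number cached in this slot */
    uint32_t pfn;   /* physical frame number it maps to */
};

static struct tlb_entry tlb[TLB_SETS];
static uint32_t page_table[PT_ENTRIES];   /* toy mapping: vpn -> pfn */
static unsigned tlb_misses;

/* Translate a virtual address, refilling the TLB from the page table
 * when the cached slot does not hold the requested page. */
static uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    assert(vpn < PT_ENTRIES);

    struct tlb_entry *e = &tlb[vpn % TLB_SETS];
    if (!e->valid || e->vpn != vpn) {
        /* TLB miss: normal and expected, handled by reloading the entry. */
        tlb_misses++;
        e->valid = true;
        e->vpn = vpn;
        e->pfn = page_table[vpn];
    }
    return (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}
```

The first access to a page misses and fills the slot, later accesses to the same page hit, and a different page that maps to the same slot evicts it and misses again. Periodic misses therefore say nothing about correctness on their own.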
chan1	stekern:oh thank you :-)	12:35
stekern	you're welcome	12:37
maxpaln	stekern: that's what I figured - but nice to have an assumption confirmed every now and again :-)	12:45
chan1	stekern : sorry but the link for the precompiled binary 1.0rc1 for CentOS-5.5 x86_64 doesn't work. I'll try installing from source. (someone please check why the link is dead..)	12:55
chan1	I mean in this page http://opencores.org/or1k/OpenRISC_GNU_tool_chain	12:55
stekern	chan1: I was pointing you to: http://opencores.org/or1k/OpenRISC_GNU_tool_chain#Linux_.28uClibc.29_toolchain_.28or1k-linux-uclibc.29	13:16
stekern	not the old precompiled toolchain	13:16
maxpaln	I am getting a little stuck - Linux is pausing during boot after the 'Mount-cache hash table entries: 1024' message	13:26
maxpaln	tracing through I can see it gets stuck during the initialisation of something (not sure what the actual construct is) - the function call stack looks like this:	13:28
maxpaln	start_kernel->rest_init->kernel_init->kernel_init_freeable->do_pre_smp_initcalls->do_one_initcall	13:31
maxpaln	It enters this function twice - the first time to execute spawn_ksoftirqd() from address c01c17e0 - this one completes fine	13:32
maxpaln	the second time it executes init_workqueues() from c01c1f38	13:33
maxpaln	tracing a combination of debug printk's and watching the instructions in HW I have traced the code as far as init_worker_pool() - unfortunately adding printk's into this function seems to hang the Linux boot earlier so I am back to tracing through HW	13:34
maxpaln	Does anyone have a bird's-eye view on this - what is the kernel actually doing at this stage?	13:35
maxpaln	It would be useful to have a broader appreciation as it might point me at the root cause a little quicker than my current strategy	13:35
stekern	maxpaln, presumably there is still some hw bug that causes this, right?	13:49
stekern	and you have random crashes when you add debug printks	13:50
stekern	it's not fun, but the way I'd move forward in such cases is to take a build where it crashes in a non-subtle way and start inspecting that from the hardware side	13:51
maxpaln	ah, so you think the crash from printk is causing this - interesting	13:52
maxpaln	or at least - the boot hang with printk is indicative of a HW bug	13:53
maxpaln	interesting, hadn't thought of that	13:53
maxpaln	I agree the problem is likely to be HW	13:53
maxpaln	but it could also be that I haven't correctly configured the Linux kernel per the HW	13:53
maxpaln	I am pretty happy the memory controller is doing the right thing now	13:54
maxpaln	and the basic ORPSOC is the same as the one I have previously had working on the ECP3 silicon (predecessor to the current silicon I am using)	13:54
maxpaln	but there have been numerous minor changes	13:55
maxpaln	I can print to the UART during boot - I am using that a lot at the moment	13:55
maxpaln	I am getting hangs when placing the printk's at certain points -	13:56
maxpaln	Inspecting on the HW side is straightforward - I am tracing through the instructions in HW, comparing against the disassembled kernel and using printks as a guide. It's working so far	13:56
stekern	what are the certain points?	14:02
stekern	and, does the same kernel work in or1ksim?	14:03
maxpaln	yep, it simulates in or1ksim	14:20
maxpaln	I haven't really been paying attention to the points at which the printk causes the boot to hang	14:20
maxpaln	I hadn't really made the connection when it last happened	14:20
maxpaln	I just assumed there was a genuine reason why printk wouldn't work during some functions	14:21
maxpaln	I will try and add one to init_workqueues() - I have traced the code as far as here, I think this caused the problem last time I tried it.	14:22
maxpaln	hmmm, that seems to work	14:30
maxpaln	i'll return to using the printk's and take note of the location that causes problems when it arises again	14:31
maxpaln	ok, I need to pop out for an hour or so - but I have tracked the system hang to init_workqueues - the following call never gets returned:	14:38
maxpaln	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,	14:38
maxpaln	WQ_UNBOUND_MAX_ACTIVE);	14:38
maxpaln	I am not sure I understand what this code is doing - will look a little later when I get back.	14:38
rah	http://hardware.slashdot.org/story/14/07/16/1218238/sricambridge-opens-cheri-secure-processor-design	14:42
olofk	rah: I think some of the guys from that project were at last year's orconf	15:54
rah	is that good because these people with money are paying attention to openrisc, or bad because they went on and developed their own core anyway? :-)	16:07
stekern	dalias: from my point of view the or1k port is pretty much in shape to be merged, how do you want me to move forward with it? Should I post a patch to the musl ml?	17:44
stekern	I've already squashed the commits together and added a quite descriptive commit message: https://github.com/skristiansson/musl-or1k/commit/a937ef3a8e4dac07fbd4e7e7777aaa0552780dc0	17:45
dalias	i'll take a look	17:49
dalias	you can go ahead and post to the mailing list tho if you like	17:49
stekern	great	17:49
stekern	sure	17:49
blueCmd_	olofk: stekern: http://bluecmd.github.io/	19:44
blueCmd_	I'm playing with jekyll and github pages. the way it works is that you have a git repository with static files that is then used to generate a static webpage that github (or another provider) use	19:45
blueCmd_	pros: it's git, so we can accept pull requests and stuff like that. cons: harder to edit	19:45
blueCmd_	pro: it's not opencores.net and it's not a wiki	19:51
blueCmd_	hm, who has all this money? http://opencores.org/donation	19:55
stekern	http://opencores.org/donation,faq	19:56
stekern	"then the money will used to upgrade the server hardware and Opencores"	19:56
stekern	haven't you noticed the vast improvements?	19:57
blueCmd_	stekern: ah. I guess the downtime is due to all the upgrades they are making	19:57
stekern	must be	19:57
blueCmd_	stekern: do you know if someone owns OpenCores as a trademark?	20:00
blueCmd_	it says on the website that it's registered trademark, but I don't believe that	20:01
stekern	I have no idea	20:06
dalias	stekern, does or1k have fpu and fenv that will eventually be supported? or is it all soft-float?	21:26
stekern	dalias: it does, but not all implementations have support for it	21:29
dalias	i see	21:29
dalias	so are there separate hard and soft float abis?	21:29
dalias	or is the calling convention the same either way?	21:29
stekern	yes, same calling convention	21:30
dalias	nice	21:30
stekern	there are no separate fpu regs and so on	21:30
dalias	very very nice	21:30
dalias	so we don't need two separate abi variant subarchs for that	21:30
stekern	right	21:30
dalias	btw what about endianness? just one or both?	21:31
stekern	well, in theory, the architecture is bi-endian. but in practice, there are no little-endian implementations	21:32
dalias	i see	21:33
stekern	and while there is some little-endian support in the toolchains, it's far from complete	21:33
dalias	so for now we can just treat it as fixed big-endian	21:33
dalias	if little is needed later it can be added as the non-default subarch	21:33
stekern	yup	21:33
dalias	so it looks like there's no need for any subarchs right now	21:33
dalias	that makes adding it nice and clean :)	21:33
dalias	how was the jmp_buf size issue handled?	21:41
dalias	stekern, also, in __unmapself...	21:42
dalias	you don't load any args for the syscalls. i'm guessing the args are already in the right registers for munmap	21:42
dalias	but for SYS_exit, the arg should be 0	21:43
dalias	actually maybe it doesn't matter	21:43
dalias	i don't think this code is reachable if the exiting thread is the last thread	21:43
dalias	and the exit code to SYS_exit is only seen for the last exiting thread	21:43
stekern	dalias: hmm, yes. looking at the other archs, it is a bit of a mix whether 0 is loaded as the arg	21:52
stekern	regarding jmp_buf size, blueCmd_ and I discussed that and decided to change glibc to reflect what musl does	21:59
blueCmd_	(y)	22:00
stekern	the glibc port is still in an experimental state	22:00
blueCmd_	I was very hard to convince	22:00
blueCmd_	at this point I think musl might be more stable	22:00
dalias	:)	22:07
dalias	stekern, why are the SYS_ macros not in the same order as the __NR_ ones?	22:30
blueCmd_	http://bluecmd.github.io/or1k.html - that turned out quite nice for a simple ODT -> HTML	23:30
--- Log closed Thu Jul 17 00:00:23 2014