IRC logs for #openrisc Friday, 2013-09-20

--- Log opened Fri Sep 20 00:00:04 2013
poke53281At least a little progress03:35
stekernoh, a good old sp trashing... sounds like fun!03:36
poke53281The kernel jumps back to the process after a timer interrupt to a wrong address.03:36
stekerndo I remember correctly that you've tried in or1ksim and got the same kind of crashes there, right?03:38
poke53281The output you see is my own little equivalent of strace.03:38
stekernnice ;)03:39
stekernbut you have pinpointed that r1 is correct before the timer interrupt?03:39
poke53281Well, all registers are wrong. It aborts after one error.03:40
poke53281I save the registers values and check when I am going back via the rfe opcode.03:40
stekernah, well, if something in timer exception trashes r103:40
stekernyou'll get wrong values in all registers on context restore03:41
poke53281Yes, and the wrong pc03:41
stekernso, that's why I'm only asking about r1 ;)03:41
poke53281Ok, the output is a little bit misleading. Everything is wrong.03:42
poke53281all registers and the pc. There is nothing else that could be wrong.03:42
poke53281So not only r103:42
poke53281I can check the mapping at that point03:43
stekernyes, but... trash r1 => everything wrong03:43
stekernon context restore03:43
stekernso r1 is the interesting part03:43
stekernall else is just collateral damage ;)03:44
poke53281the same happens in pid 0 and pid 1 after the init process is started. But I think this is normal. Has something to do with fork.03:47
stekernbut wait... what do you mean by wrong? isn't task switch triggered by the timer?03:47
poke53281Probably there is something additional like the task_id. But I cannot find that info.03:47
poke53281Yes, it is. And the rfe command is the last command in kernel mode after handling the timer interrupt.03:49
poke53281It should jump back to the position where the timer occurred. But it didn't happen.03:49
poke53281The timer interrupt before that worked as expected. Exception at 0x00085250 and return at 0x0008524C03:50
poke53281possibly a delayed instruction03:51
poke53281So I have not seen any tasks in xorg-server03:56
poke53281And I have not seen any exception before that that stops at this position. So I can exclude any task-switches.03:57
poke53281And the program crashes a few hundred commands later.03:57
stekernyeah, something is clearly wrong with where it jumps03:58
stekernI'm just trying to pinpoint where it has gone wrong03:58
poke53281Wait I give you something to play with04:06
stekernis it christmas already? ;)04:07
poke53281A puzzle. Find the needle in the haystack which is permutating if you have excluded 90% of the haystack.04:09
poke53281And the needle has the same color as the hay.04:09
stekernat least there is a needle04:09
poke53281Xfbdev &04:10
poke53281export DISPLAY=:004:10
poke53281look at the console output04:10
poke53281It is a safer, deoptimized version of the emulator. Use chrome if possible.04:11
poke53281To exclude errors of the emulator04:11
poke53281Probably you have to clear the console after each command. Otherwise the console could get stuck.04:12
poke53281cat /proc/49/maps could be also useful04:13
stekernyes... I got one eye and now the browser window is non-responsive04:13
poke53281Yes, look in the console. It stopped for some reason.04:14
poke53281I think an unaligned access is the final reason.04:14
poke53281The output is so big that you should clear the console after "Xfbdev &"04:15
stekernno... there's no output other than "normal" output04:15
poke5328163: _libc_read04:15
poke5328164: _libc_write04:15
poke532811068: poll04:15
poke53281154: setpgid04:15
poke532811067: libc_select04:16
poke53281169: gettimeofday04:16
poke53281134: rt_sigaction04:16
poke53281221: execve04:16
poke53281214: brk04:16
stekern"Warning: X locale modifiers not supported, using default" is the last output04:16
poke53281Yes, I removed the files. Not enough space in the image.04:17
stekernyeah, I'm not so worried about that message, just to illustrate what I meant by "normal" ;)04:17
poke53281I can put them in. But you can ignore the warning message of Xfbdev. Most of them are because of missing files and missing command line04:17
poke53281X is already drawing the image. He is in libpixman and in the right function. So what should happen?04:19
stekernit shouldn't crash at least04:21
stekernbut... what am I supposed to see in console?04:22
poke53281Maybe it wouldn't if I handled the unaligned access. As I said, I get all kinds of errors with different clients.04:22
poke53281Javascript console04:22
poke53281Development tools04:22
stekernhaha, ok04:23
poke53281what browser do you use04:23
poke53281You get at least 1MB of text with all syscalls and exceptions and all pids04:23
poke53281Have to leave for 30 minutes.04:24
stekernI thought you meant Linux console04:24
stekernwhat is at 0x0016980C?04:48
stekernI think the 'r does match' might not be a failure04:49
stekernand what is at 0x30073448?04:50
poke53281one moment04:56
poke53281both pid 49?04:57
poke53281or higher04:58
stekernyes pid 49 it seems to be05:00
stekernthis is right before it crashes05:00
poke53281169800:       84 21 ff f4     l.lwz r1,-12(r1)05:02
poke53281  169804:       44 00 48 00     l.jr r905:02
poke53281  169808:       86 01 ff f8     l.lwz r16,-8(r1)05:02
poke53281  16980c:       d7 e1 4f fc     l.sw -4(r1),r905:02
poke53281  169810:       d7 e1 87 f8     l.sw -8(r1),r1605:02
poke53281  169814:       04 00 00 02     l.jal 16981c <OsCleanup+0x54>05:02
poke53281  169818:       1a 00 00 05     l.movhi r16,0x505:02
poke53281  16981c:       aa 10 14 fc     l.ori r16,r16,0x14fc05:02
poke53281  169820:       e2 10 48 00     l.add r16,r16,r905:02
poke53281so somewhere in the middle of the function OsCleanup05:02
poke53281the other one is a shared library.05:03
stekernlet me elaborate what I'm after. there are a lot of 'r does not match' in the trace, "all" of them come after an rfe to 0x0016980C, then there's a syscall 139 and the rfe after that goes back to where the original exception occurred05:03
stekernI think this is normal05:03
poke53281after is wrong05:05
poke53281they are checked directly at the rfe command05:05
stekernI'm not following05:06
poke53281" after an rfe to 0x0016980C" <- stekern05:06
poke53281So they are written after the rfe command but the check is directly at the position of the rfe command.05:07
stekernah, yes, but I meant "after" as in after in the trace output05:07
poke53281The registers are saved at the syscall or at the exception05:07
poke53281So what you see in the console is only what happens in user mode.05:09
poke53281or at the context switch05:09
stekernyes, but then it might do the _work_pending dance in the middle of an interrupt/exception05:11
poke53281exactly. Somewhere in the kernel05:11
poke53281"/a.out" is a tiny test program I wrote05:13
poke53281run it with "/a.out /usr/lib/*"05:14
poke53281it loads all files and compares mmap with fread of all files.05:14
poke53281And it does not unmap.05:14
poke53281I have found no problem with this tool. So it is not so easy.05:15
poke53281I am very that the other address is somewhere in libpixman.05:17
poke53281very sure ...05:17
poke53281Yep, it is05:20
poke53281  653e8:       00 00 00 4f     l.j 65524 <pixman_rasterize_edges+0xd860>05:24
poke53281   653ec:       d4 03 28 00     l.sw 0(r3),r505:24
poke53281   653f0:       10 00 00 4d     l.bf 65524 <pixman_rasterize_edges+0xd860>05:24
poke53281   653f4:       e1 a5 63 06     l.mul r13,r5,r1205:24
poke53281   653f8:       e0 a5 43 06     l.mul r5,r5,r805:24
poke53281 65448:       86 23 00 00     l.lwz r17,0(r3)05:24
poke5328165448 is the correct address05:24
poke53281So I think I will watch the addresses where the registers are saved.05:28
poke53281Either the content will change, or the pointer or the mapping05:29
poke53281And I can print the exceptions inside the kernel and so on ...05:32
poke53281Enough to do05:32
poke53281According to this answer the pid is fine. So this is no other thread with the same pid.06:12
olofk_franck_: I think we could have a problem here. It looks to me like patch 0000 doesn't really change the line endings06:37
olofkIf I run dos2unix on mt48*.v first, I can apply the patches manually without problems06:38
olofk(including patch 0000)06:38
olofkNot really sure how to handle this07:02
olofkI got three ideas07:02
olofk1. Implement support in .core files for running commands at different stages (in this case, run dos2unix on the files when they have been downloaded to the cache). This functionality will have to be added sooner or later anyway, but I wanted to wait for a while07:03
olofk2. Keep the Windows line endings and make patches that are compatible with this. Not sure if this is doable actually, but it might work with the --binary switch in patch07:04
olofk3. Keep the damn file local. Feels like the most hacky solution, and I really don't want to carry files with weird licenses in the orpsoc-cores tree07:05
olofkoh. and alternative 4. Fix the problem by making the patch actually change the line endings07:05
stekernpoke53281: I'm still curious about the 0x16980c though, why is it always going there?08:00
stekernthis yocto stuff is just annoying10:10
stekern...perhaps partly because I don't know what I'm doing10:19
stekernit could be useful for building an openrisc bsp though10:21
jeremybennettis OK or just slow?10:40
jeremybennettjust slow it seems...10:41
stekernjeremybennett: I noticed it being slow in the morning10:45
jeremybennettI've just updated the schedule for ORCONF with the new talks we've got. I'll post the updates to the mailing lists.11:17
olofkGreat. It's fun to see an increased interest this year11:20
PowermaniacAnyone awake/there in here?11:45
jeremybennettPowermaniac: yes11:45
PowermaniacAhh good!11:46
PowermaniacMaybe you can help11:46
PowermaniacSo now I'm slowly beginning to understand more11:46
PowermaniacAnd I was wondering if I could synthesize the OR1200 to look at basically.11:46
jeremybennettWell it is synthesizable.11:47
PowermaniacAs in from what I understand you can synthesize verilog HDL code into RTL schematics?11:47
jeremybennettIf you come along to ORCONF, there is a workshop on bringing up the ORPSoCv3 on a DE0-nano.11:47
jeremybennett(which includes the OR1200 core)11:47
PowermaniacWhere is ORCONF?11:47
jeremybennettCambridge, UK11:47
PowermaniacYeah that probably won't be possible. I'm beyond broke.11:48
PowermaniacUnless it will be streamed or something?11:48
PowermaniacAlso I'm not actually trying to put it on an FPGA...11:49
PowermaniacI just want to look at the verilog code and then the schematics as from what I understand I can turn that verilog code into schematics to look at.11:49
PowermaniacI'm just sort of skipping ahead where I'm at in the book (Digital Design by Morris Mano) so I have something to look at.11:49
PowermaniacIf that makes sense?11:50
olofkPowermaniac: I think you underestimate how complex the schematics would be at register level from synthesized or120011:51
olofkA single submodule of or1200 might be a better starting point. For example, the alu or the cache controller11:52
PowermaniacOkay, sure11:52
PowermaniacGives me a place to start11:52
olofkThe easiest way to look at a synthesized netlist is probably to install some FPGA vendors toolchain and run a module (or the whole or1200) through synthesis11:53
olofkThat would give you a schematic representation of what it might look like11:53
olofkBoth Altera and Xilinx have free basic versions of their toolchains11:54
Powermaniac_Odd how IRC randomly disconnects on me, only IRC though nothing else...11:56
olofkPowermaniac_: Did you get my messages, or were you disconnected before that?12:03
Powermaniac_This was the last one I got: <olofk> The easiest way to look at a synthesized netlist is probably to install some FPGA vendors toolchain and run a module (or the whole or1200) through synthesis12:04
olofk11:53 < olofk> That would give you a schematic representation of what it might look like12:05
olofk11:54 < olofk> Both Altera and Xilinx have free basic versions of their toolchains12:05
Powermaniac_Trying to find a guide, so once I know where to find the verilog code of one of the parts of the OR1200 then I can turn it into a schematic12:06
olofkThey are available in SVN12:07
Powermaniac_Also could I use QFlow as my synthesizer?
olofkThat might work. I haven't used it though, so I can't say for sure if it works and if you get an RTL schematic out of it12:08
Powermaniac_Okay fair enough.12:08
Powermaniac_Also is this the code I'm looking for:
Powermaniac_Currently on Windows need to swap over to Debian12:09
Powermaniac_As from what I gather it will make the process easier12:09
stekernPowermaniac_: yes, that's or1200's alu12:13
olofkPowermaniac_: I made you a PDF of or1200's alu using the default parameters synthesized with XST
Powermaniac_olofk: Thanks!12:14
olofkI don't wish to discourage you from doing digital electronics (it's a lot of fun), but I think you should know that there is a reason why people aren't doing this kind of thing by hand anymore12:15
olofkThis is just one of the submodules in one of the cores that you use to build one of the chips of your computer12:15
Powermaniac_I wonder if it's possible to get a bigger company to create completely open source processors?12:16
Powermaniac_Like Adafruit or Arduino12:16
Powermaniac_Is it actually possible to synthesize the entire OR1200 as that must be huge then12:17
olofkSynthesize just means that you take the HDL code and translate it into a netlist (a representation of the schematics). That is done by a computer program and is very possible to do :)12:18
olofkBut you never use that as a basis to build things with discrete components12:18
Powermaniac_Oh okay(?)12:18
Powermaniac_Well yeah I was sort of planning to build it with discrete components. But also use certain chips I can easily get the schematics for of what's inside. So less discrete components are needed12:19
olofkThat netlist is just consumed by other tools that create the layout for an ASIC or an FPGA12:20
Powermaniac_And make it on loads of breadboards you put into a slotted box, or something.12:20
Powermaniac_So how do you learn enough to actually be able to contribute to something this huge?12:21
olofkYou learn the theories of digital electronics, and then get good at writing Verilog or VHDL code12:21
olofkIf you desperately want to build a CPU on a breadboard, there are plenty of people who have done that with 4-bit or perhaps even 8-bit CPUs. They are quite large and complex, and a 32-bit RISC CPU like OpenRISC would be out of the question to design that way12:22
stekern...starting crazy projects building multistage RISC machines out of discrete components on breadboards you do *after* that12:22
Powermaniac_Ahh, stepping stones...Probably a good idea, I like having grand ideas and trying to go straight for them...12:23
Powermaniac_Doesn't really work out that well though...12:24
Powermaniac_What do you guys think about Moore's Law and the fact Intel is worrying about it already. And the idea of spintronics or quantum computers or something else taking silicon's place like graphene?12:25
olofkHere's a design for a 4-bit CPU if you would like to try something like that
Powermaniac_Wait what do you guys do jobs wise to be able to be knowledgeable about designing the OR1200?12:28
Powermaniac_If you don't mind me asking that is...12:29
olofkI work as a Digital Design Engineer. I guess most people here work with embedded systems or FPGA/ASIC12:29
Powermaniac_Oh cool12:29
Powermaniac_Have you heard of the Neurosynaptic chip made by IBM? What do you think of that?12:29
olofkNever heard of it12:30
Powermaniac_As I'm also a technological singularity nut and well some people think hardware is also blocking our path to the technological singularity as computers or CPUs rather don't work anything like the brain12:30
Powermaniac_Is there any talk about designing processors more like brains in the digital design area?12:35
Powermaniac_Area I hope to get into actually, first I need to get into electronic engineering and from there I can get into computational neuroscience12:37
Powermaniac_Mainly just so I can try and make the technological singularity happen, or well that is the plan anyway12:37
Powermaniac_Anyway thanks guys for the help once again!13:02
olofkNo problems. Hope you make it into EE13:02
jeremybennettPowermaniac_: We won't be streaming ORCONF, but all being well, the talks will be videoed.13:14
Powermaniac_jeremybennett: Sweet thanks!13:14
jeremybennettolofk: Do you know why is so slow today?14:47
poke53281stekern: Not sure. I will find out15:17
stekernthis yocto stuff is driving me crazy, all I want is for it to build a native toolchain and put it into my image...15:18
stekernstarts to feel like just building the whole image by hand would be faster and less frustrating15:19
poke53281Found the position where the saved registers are overridden.18:20
poke53281Somewhere inside the signal handler.18:21
stekernpoke53281: hmm, ok r2 == regs in that function18:41
stekernI assume setup_rt_frame is inlined there too18:42
stekernand it's actually in that that the bad stuff happens18:42
stekernbut is it really "bad"?18:49
poke53281No I think the function is Ok.18:51
poke53281The bad stuff is happening somewhere else18:51
poke53281But it is not a position where a saved register is changed willingly.18:52
poke53281That's important18:53
stekernyeah, as I said earlier, the register overwriting is probably fine, in the general case18:53
stekernanswer to what you wrote two lines up18:54
stekernhow do you mean "But it is not a position where a saved register is changed willingly."18:54
poke53281During the timer exception around 9000 assembler instructions are executed. No exceptions.18:54
poke53281not intended .18:55
poke53281With "saved register" I mean the saved registers for the user process.18:56
poke53281Everything that could cause the problem happens within 9000 instructions18:58
poke53281r2 points directly to the same virtual memory address as r1 when the registers are saved. No page table change is done during that time.19:04
poke53281and the problem occurs when 95% of the whole exception has already been executed.19:07
stekerndid you change the opencore5?19:11
stekernfunny, I get different result now than during the day (different machine)19:12
poke53281Depends on what you are doing before.19:13
poke53281But it is not machine dependent. I am working also on different machines.19:13
stekernnow I got a similar crash as earlier19:14
poke53281I would say it is the same as yesterday19:15
stekernbut, everything up to Line 12 looks ok, no?19:16
poke53281no, it has found the register mismatches but continues executing.19:18
poke53281line no 10 seems to be a syscall executed on the stack frame.19:19
poke53281In principle everything below line 10 you can remove, because it is meaningless.19:20
stekernare you sure?19:23
poke53281Well, the error already occurred. The exception returned at the wrong address with wrong register entries. What do you expect?19:24
stekernbut isn't that the work of the signal handler?19:25
poke53281I don't check the registers of syscalls. They are changed of course.19:25
stekernI'm not sure, I might add ;)19:25
poke53281I am checking this. At the moment a call graph of this exception would be nice.19:27
poke53281It is definitely executing the signal_handler and copying data from user space.19:29
poke53281which is strange for a normal timer exception.19:29
stekernbut if there are signals pending19:29
stekernit's not so strange I think19:30
poke53281Yes, maybe a timer from X11.19:30
poke53281I was surprised that X11 seems not to have threads running.19:31
poke53281Well I can put some debug messages in the signal handler.19:32
poke53281I wish I could use better debug techniques.19:34
stekerndebugging on hardware is even more fun, then you get a window of ~2000 clock cycles to look at various signals ;)19:35
poke53281signal 1419:41
poke53281That makes sense.19:43
poke53281Maybe I can write a small little program to test this signal.19:44
poke53281int main() { alarm(1); while(1){} return 0; }19:50
poke53281Works without problems19:50
poke53281but the signal is not used. But before I continue ... lunch19:53
poke53281This code seems to partly reproduce the problem. Not crashing but the kernel has exactly the same behavior.21:19
poke53281Slowly I begin to understand.21:41
poke53281how this works21:41
poke53281I think you are right stekern.21:48
poke53281The error happens later, after the signal21:54
poke53281Returning back after the signal is the problem.21:55
poke53281but for some reason the addresses I get with objdump are wrong, or shifted by a few bytes.21:56
poke53281The context restore after a signal does not work. Most of the registers are wrong.23:11
poke53281Correction, only r12-r14 are wrong23:35
poke53281I think I have found the problem. It is in syscall_resume_userspace in entry.S23:52
poke53281This code is not valid for a sigreturn.23:52
poke53281because we have to restore all registers23:53
poke53281Maybe we need a switch there23:55
--- Log closed Sat Sep 21 00:00:05 2013
