--- Log opened Wed Sep 14 00:00:11 2016 | ||
-!- LoneTech_ is now known as LoneTech | 03:53 | |
shorne | olofk: agree, I think it's better that our tree not be so different from upstream. I'm looking into how others manage the patches | 06:53 |
---|---|---|
olofk | Anyone got a good idea for an FPGA board with GigE and a couple of user-accessible I/O that can handle ~100MHz LVDS | 07:14 |
olofk | Small. Credit-card size would be nice | 07:14 |
olofk | Hmm... come to think of it, a microZed might work, but I would like something that is easier to solder than the Samtec connectors | 07:15 |
olofk | DE0-Nano-SoC might work | 07:18 |
shorne | ah, this one is a bit too big and not GigE https://www.xilinx.com/products/boards-and-kits/arty.html | 07:50 |
olofk | Yeah, I thought about Arty too, but need faster ethernet | 07:52 |
shorne | I see some virtex 7 boards with gigE, looks really expensive | 07:53 |
shorne | just searching xilinx site | 07:53 |
shorne | http://www.xilinx.com/products/boards-and-kits/1-3bwl52.html | 07:56 |
shorne | this looks nice | 07:56 |
shorne | http://www.xilinx.com/products/boards-and-kits/1-hlf2sm.html - this looks ok too, $99 | 07:57 |
shorne | actually snowleo looks really nice | 07:57 |
olofk | We got some people from Trenz coming to orconf actually | 07:59 |
shorne | oh, nice | 08:00 |
El | Is there a tutorial or documentation to tell me how to create a .jic file from a .sof and a .elf file and program it onto a de0_nano? I can load a bare-metal file onto my de0_nano using OpenOCD/GDB (from https://github.com/embecosm/chiphack/wiki/OpenRISC-SoC-Practical-Session-Instructions) but can't find how to load this onto the EPCS64 to boot the OR1K and app on power-up. | 11:12 |
shorne | stekern: I have tested that 'minimum patch set of fixes' https://github.com/stffrdhrn/linux/wiki/commit-batches | 11:15 |
shorne | the one I listed here openrisc-fixes | 11:15 |
shorne | Branch is here, https://github.com/stffrdhrn/linux/tree/openrisc-fixes-4.8 | 11:16 |
shorne | I even created a signed tag, https://github.com/stffrdhrn/linux/releases/tag/or-tag-test | 11:19 |
shorne | Anyway, it runs on de0 nano, so I guess it's good :) | 11:19 |
shorne | The initramfs stuff I split out to another directory, need to put some docs and make it a project | 11:20 |
shorne | but it seems to work fine not being in the kernel | 11:20 |
olofk | Oh, missed the guy asking about jic files | 11:54 |
olofk | If he comes back, direct him to me or to the mailing list | 11:54 |
olofk | shorne: Cool. Good job | 11:55 |
olofk | Can you build initramfs out-of-tree btw? | 11:55 |
stekern | shorne: this: "pick 09fc079 openrisc: set OUTPUT_FORMAT to elf32-or1k" | 11:55 |
stekern | is superseded by openrisc: Support both old (or32) and new (or1k) toolchain | 11:55 |
olofk | Also, is it possible to use external dts files? I'd like to keep them close to the FuseSoC systems, and eventually be able to create them on the fly matching the hw configuration | 12:13 |
stekern | at least if you're not building them into the kernel it's possible | 12:34 |
stekern | building into the kernel kind of defeats the "one kernel to rule them all" | 12:35 |
olofk | Yeah. True | 13:49 |
olofk | But I can't come up with any practical way to use them separately for or1k | 13:50 |
olofk | We could of course store the kernel in SPI Flash on a board, load the dtb (dtb is a compiled dts, right?) and set r3 | 13:51 |
olofk | But that is just more hassle than loading a custom kernel | 13:51 |
olofk | And you still need to remember to enable all the kernel options you potentially want | 13:52 |
olofk | It would of course make more sense if we wanted to provide an image with kernel+rootfs that can be used for multiple boards | 13:55 |
olofk | shorne: That would be a good end use-case for the stuff you've been talking about today | 13:55 |
olofk | Can u-boot take a kernel and a device tree from an SPI flash and boot? | 14:04 |
ZipCPU | olofk: stekern: stekern did some wonderful work optimizing the strcpy and strcmp library functions for aligned string accesses. These optimized versions have not made it into the newlib library for or1k. Are there any plans to import them? | 14:24 |
stekern | olofk: yes | 14:31 |
ZipCPU | stekern: Were your updated versions assembly optimized, or does there exist C code for them? | 14:34 |
stekern | it wasn't me that did them, it was olofk | 14:38 |
ZipCPU | So ... I should pick on olofk then? :D | 14:39 |
olofk | ZipCPU: I based it on a C algorithm that I found in some other arch that was originally from some guy at Intel I think | 14:49 |
olofk | But I hand-wrote them in asm to take advantage of delay slots and such | 14:50 |
ZipCPU | That's kind of what I thought. I'm hoping to have a copy that works within newlib. I suppose I could disassemble stekern's executables to reverse engineer them and rebuild them, or I might ask you kindly if you have any plans to update those libraries soon. ;) ? | 14:51 |
olofk | ZipCPU: Do you mean that stekern's binaries have optimized strcpy/strcmp routines? | 14:53 |
ZipCPU | Yes. | 14:53 |
olofk | Hmm.. are they built for bare-metal or Linux? | 14:54 |
ZipCPU | newlib, so probably bare-metal. | 14:54 |
olofk | Well, then they must be using the generic newlib routines, I guess | 14:54 |
olofk | No optimized versions at all | 14:54 |
ZipCPU | Actually, I don't even know that it was newlib. I'm just assuming bare-metal and newlib. | 14:54 |
olofk | Well, that's safe to assume | 14:55 |
ZipCPU | No, they were definitely optimized. I disassembled them and had a peek. | 14:55 |
olofk | Do you have the binaries somewhere? | 14:55 |
olofk | Because stekern and I were (I think) just talking about the optimized memset I wrote for Linux | 14:55 |
olofk | Which shouldn't affect strcmp or strcpy, and definitely not anything in newlib | 14:56 |
ZipCPU | Yes, I still have the binaries. They are his binaries of the dhrystone algorithm. Executables that would run on de0 or atlys. | 14:57 |
olofk | Found them in the OpenRISC documentation database | 14:58 |
olofk | I looked at the one in dhry-de0.dis, but I don't have a clue if they look optimized :) | 15:01 |
ZipCPU | Heheh ... | 15:01 |
ZipCPU | dhry-de0.dis is very nicely optimized, or at least the two functions I looked at are--strcpy and strcmp | 15:02 |
ZipCPU | Both very nicely check for aligned accesses first, and then (if aligned) operate on 4-bytes at a time. | 15:02 |
ZipCPU | The speed up is ... very valuable. | 15:03 |
ZipCPU | Let's see ... I annotated the disassembly of the memcpy function. | 15:04 |
ZipCPU | There was also some loop unrolling taking place there too. | 15:05 |
SMDhome | coremark is still running on icarus sim. 70+ hours! | 15:09 |
ZipCPU | olofk: Here's an annotated disassembly of the memcpy function: https://justpaste.it/yci0 | 15:11 |
_franck_ | I started to port this one to or1k assembly some time ago: http://git.musl-libc.org/cgit/musl/tree/src/string/memcpy.c | 15:13 |
_franck_ | never finished | 15:13 |
olofk | ZipCPU: Looking at the newlib code, they use optimized C algorithms | 15:28 |
olofk | So it's a generic newlib thing | 15:29 |
ZipCPU | Then something's still not right. 'Cause with the newlib build I have, or1k got a very poor dhrystone score, and because | 15:29 |
ZipCPU | the disassemblies didn't match at all. | 15:30 |
olofk | Ha! But my hand-optimized memset for linux is half the size of the one newlib produces :) | 15:31 |
olofk | ZipCPU: That's strange | 15:31 |
olofk | Maybe there is something with the newlib build options. In the source code they use #if defined(PREFER_SIZE_OVER_SPEED) || defined(__OPTIMIZE_SIZE__) | 15:32 |
olofk | So it could be that your newlib is built to optimize for size | 15:32 |
olofk | Otherwise I don't know | 15:34 |
olofk | What do your disassembled functions look like? | 15:34 |
ZipCPU | Hmm ... let me go dig into how to adjust those optimization flags, and see if I can get a different number then. | 15:39 |
olofk | ZipCPU: Do you use -Osomething when you compile? | 15:45 |
ZipCPU | I found the default "-g -O2", and just changed it to "-O3" ... still haven't found how to set the right preference flags. | 15:46 |
ZipCPU | No ... that wasn't the difference. | 16:21 |
olofk | :/ | 16:22 |
ZipCPU | I'm getting almost identical scores to what I was getting before. | 16:22 |
ZipCPU | Could it be that mor1kx-generic is somehow ... poorly optimized? | 16:22 |
ZipCPU | I'm now using -O3 and I've verified that the strcmp, memcpy, and strcpy calls are all the optimized newlib versions | 16:26 |
ZipCPU | I'm still getting scores less than half of what stekern has posted some time ago. | 16:28 |
olofk | ZipCPU: Could be | 16:34 |
olofk | Not sure how cache size, store buffer, mul/div implementations are set up | 16:34 |
ZipCPU | mul/div shouldn't have any effect. | 16:35 |
ZipCPU | Cache size and store buffer ... that I don't know. | 16:35 |
ZipCPU | (Okay, mul/div will have some effect--shouldn't be this much ...) | 16:35 |
olofk | Store buffer seems to be enabled by default | 16:40 |
olofk | I and D caches are enabled | 16:41 |
olofk | There is only a serial divider, which is optimized by default | 16:43 |
olofk | s/optimized/enabled | 16:43 |
olofk | No idea then | 16:44 |
olofk | Oh, I missed that El guy again | 17:06 |
shorne | ZipCPU: there are 2 commits here for optimized routines. https://github.com/stffrdhrn/linux/wiki/commit-batches | 18:32 |
shorne | pick eb6b230 openrisc: Add optimized memcpy routine | 18:32 |
ZipCPU | shorne: Thanks! | 18:32 |
shorne | pick a728fc8 openrisc: Add optimized memset | 18:33 |
shorne | one from me one from olofk | 18:34 |
shorne | _franck_: you can look at my memcpy routine, I also sent it to the kernel list and got some response on it | 18:34 |
shorne | olofk: for out-of-kernel dts, I'll have a look, it looks like most architectures maintain it in the kernel though | 18:57 |
shorne | stekern: about or32, or1k compile output, thanks, I did seem to remember that, but didn't apply it, will fix | 18:58 |
--- Log closed Thu Sep 15 00:00:12 2016 |
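
Editor's note: ZipCPU describes strcpy and strcmp routines that first check for aligned accesses and then, if aligned, operate on 4 bytes at a time. The C below is a minimal sketch of that general technique only. It is not the code from dhry-de0.dis, newlib, or the Linux patches mentioned in the log. The names `sketch_strcpy`, `sketch_strcmp`, `has_zero_byte` and `both_aligned` are hypothetical, and the zero-byte test is the well-known bit trick rather than anything confirmed by the discussion. Like most word-at-a-time string code, it may read up to 3 bytes past the terminator inside the same aligned word, which is assumed to be harmless on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes in w is 0x00 (classic bit trick). */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* True if both pointers are 4-byte aligned. */
static int both_aligned(const void *a, const void *b)
{
    return (((uintptr_t)a | (uintptr_t)b) & (sizeof(uint32_t) - 1)) == 0;
}

/* Hypothetical strcpy: word loop when aligned, byte loop otherwise. */
char *sketch_strcpy(char *dst, const char *src)
{
    char *d = dst;

    if (both_aligned(d, src)) {
        uint32_t w;
        for (;;) {
            memcpy(&w, src, sizeof w);   /* one aligned word load */
            if (has_zero_byte(w))
                break;                   /* terminator is in this word */
            memcpy(d, &w, sizeof w);     /* one aligned word store */
            d += sizeof w;
            src += sizeof w;
        }
    }
    /* Byte-wise tail (or the whole string if unaligned). */
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Hypothetical strcmp: skip equal words that contain no terminator. */
int sketch_strcmp(const char *a, const char *b)
{
    if (both_aligned(a, b)) {
        uint32_t wa, wb;
        for (;;) {
            memcpy(&wa, a, sizeof wa);
            memcpy(&wb, b, sizeof wb);
            if (wa != wb || has_zero_byte(wa))
                break;                   /* difference or end is in this word */
            a += sizeof wa;
            b += sizeof wb;
        }
    }
    /* Resolve the final word (or everything, if unaligned) byte by byte. */
    while (*a != '\0' && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

The word loads and stores go through `memcpy` so the sketch stays clear of strict-aliasing problems; an optimizing compiler is expected to turn each into a single load or store. The speed-up ZipCPU mentions comes from touching memory once per four characters instead of once per character.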