--- Log opened Thu Oct 16 00:00:39 2014 | ||
stekern | olofk: the compiler knows nothing about caches | 03:12 |
stekern | you have to manually invalidate the cache line before you read it | 03:15 |
stekern | if you make the assumption that you always will have write-through caches, you don't need to do anything special on writes | 03:16 |
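A minimal sketch of the read-side rule above, assuming a 16-byte D-cache line and a hypothetical per-line `invalidate_dcache_line()` hook. On real hardware that hook would be an SPR write or a newlib helper; here it is a host-side stub that only counts calls, so the sketch runs anywhere. The function rounds the buffer start down to a line boundary and invalidates every line that overlaps the buffer. With write-through caches, writes need no matching step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed D-cache line size in bytes; check the actual core config. */
#define DCACHE_LINE_SIZE 16u

/* Hypothetical per-line invalidate hook. On hardware this would write
 * the line address to the D-cache block invalidate SPR (or call a
 * newlib helper); here it only counts calls so the sketch runs on a host. */
static unsigned invalidated_lines;

static void invalidate_dcache_line(uintptr_t addr)
{
    (void)addr;
    invalidated_lines++;
}

/* Invalidate every cache line overlapping [buf, buf + len) so the next
 * loads miss and fetch what the DMA engine wrote to RAM. */
static void invalidate_range(const volatile void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return;
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(DCACHE_LINE_SIZE - 1u);
    uintptr_t end = (uintptr_t)buf + len;
    for (uintptr_t a = start; a < end; a += DCACHE_LINE_SIZE)
        invalidate_dcache_line(a);
}
```

For example, a 40-byte buffer starting at 0x1004 touches the lines at 0x1000, 0x1010 and 0x1020, so three invalidates are issued.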
-!- FreezingAlt is now known as FreezingCold | 06:16 | |
olofk | Yes, I'll try to invalidate the cache before reading the data. Is there some newlib function for that, or should I do some asm? | 07:08 |
olofk | Also, wallento, stekern, how does the snooping stuff work? I guess that this is one of the situations where that could be used, right? | 07:09 |
olofk | *((volatile uint8_t *)(0x91000000)) = (amplitude >> 30); | 07:31 |
olofk | What is wrong with that line? If I'm doing a 30-bit right-shift, I expect the value to be at most 3 | 07:31 |
olofk | But in my output I see that all bits can be set | 07:32 |
poke53282 | is amplitude an int or an unsigned int? | 08:13 |
poke53282 | if it is an unsigned int you are right. | 08:13 |
poke53282 | olofk: If not, then you do a signed right shift | 08:14 |
poke53282 | and all bits can be set. | 08:15 |
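The difference poke53282 describes can be shown with two otherwise identical helpers (the names are illustrative). The signed case relies on GCC's arithmetic right shift; C itself leaves right-shifting a negative value implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* amplitude declared uint32_t: logical shift, zeros come in from the
 * left, so the result is always 0..3. */
static uint8_t top_bits_unsigned(uint32_t amplitude)
{
    return (uint8_t)(amplitude >> 30);
}

/* amplitude declared int32_t: for negative values GCC shifts in copies
 * of the sign bit (arithmetic shift), giving -2 or -1, which truncate
 * to 0xFE or 0xFF when stored to the 8-bit register. */
static uint8_t top_bits_signed(int32_t amplitude)
{
    return (uint8_t)(amplitude >> 30);
}
```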
poke53282 | opt/cross/or1k-linux-musl/lib/gcc/or1k-linux-musl/4.9.1/crtbeginS.o: In function `deregister_tm_clones': | 08:19 |
poke53282 | crtstuff.c:(.text+0x48): relocation truncated to fit: R_OR1K_GOT16 against undefined symbol `_ITM_deregisterTMCloneTable' | 08:19 |
poke53282 | I guess this is a problem in binutils. | 08:19 |
olofk | poke53282: Yeah, I thought of sign extension as well. amplitude is uint32_t, so I think that should be ok | 08:22 |
olofk | This is my whole algorithm http://1c42a31d698cec4d.paste.se/ | 08:23 |
olofk | rx_buf is an array of uint32_t and every element consists of two packed sign-extended 12-bit values | 08:24 |
poke53282 | abs should be a noop in your code | 08:30 |
poke53282 | ahh, sry, q and i are 16 bit | 08:31 |
poke53282 | no wait. This is the assignment. They are indeed noops, because sample is 32 Bit. | 08:34 |
olofk | hmm | 08:35 |
olofk | hmm.. I need to get the values into a 16 bit signed int _before_ I do abs. Is that what you mean? | 08:36 |
poke53282 | Well, actually there is no abs(short int) | 08:36 |
poke53282 | So C will promote your 16-bit values to 32-bit int. | 08:37 |
olofk | With the top half always == 0 ? | 08:37 |
poke53282 | No, this must be sign extended by the compiler | 08:37 |
olofk | Whoops. Found a typo in the verilog code. Getting a lot more consistent results now | 08:38 |
poke53282 | there is no abs(unsigned int) | 08:38 |
olofk | Ahh.. right. i and q shouldn't be signed | 08:38 |
poke53282 | abs((short int)((sample >> 16) & 0xffff)) | 08:40 |
poke53282 | this should work for example. | 08:40 |
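A sketch of unpacking one `rx_buf` word along the lines poke53282 suggests. It assumes, as olofk describes, that each 16-bit half holds a 12-bit sample already sign-extended to 16 bits; putting I in the upper half is a guess, and `as_int16()`/`unpack_iq()` are illustrative names. The explicit two's-complement conversion stands in for the `(short int)` cast, whose result for values above 0x7FFF is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reinterpret the low 16 bits as two's complement without relying on
 * the implementation-defined unsigned-to-signed conversion. */
static int16_t as_int16(uint32_t field)
{
    field &= 0xFFFFu;
    return (int16_t)(field >= 0x8000u ? (int32_t)field - 0x10000
                                      : (int32_t)field);
}

/* Split one sample word into I (assumed upper half) and Q (lower half)
 * magnitudes. abs() takes an int: the int16_t argument is promoted with
 * sign extension, so negative samples stay negative until abs(). */
static void unpack_iq(uint32_t sample, uint16_t *i_mag, uint16_t *q_mag)
{
    *i_mag = (uint16_t)abs(as_int16(sample >> 16));
    *q_mag = (uint16_t)abs(as_int16(sample));
}
```

Storing the magnitudes in `uint16_t` matches the conclusion in the log that i and q should be unsigned once abs() has been applied.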
olofk | And then i and q should be uint16 instead of int16 | 08:40 |
poke53282 | I guess so. | 08:41 |
olofk | This is probably the most advanced math I've done since I started working :) | 08:42 |
poke53282 | It depends on what you want to do with this code: ((i>>1)+q) | 08:42 |
olofk | It's hypotenuse approximation I found online | 08:42 |
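The original paste is no longer available, so the following is only an assumed form: the widely used "alpha max plus beta min" estimate with alpha = 1 and beta = 1/2, which matches the shape of `((i>>1)+q)` when `i` is the smaller magnitude. It avoids a square root at the cost of overestimating the true hypotenuse by up to roughly 12% (worst case when the smaller component is half the larger).

```c
#include <assert.h>
#include <stdint.h>

/* sqrt(i^2 + q^2) ~= max(|i|, |q|) + min(|i|, |q|) / 2.
 * Inputs are magnitudes, i.e. already passed through abs(). */
static uint32_t approx_hypot(uint16_t i_mag, uint16_t q_mag)
{
    uint32_t hi = i_mag > q_mag ? i_mag : q_mag;
    uint32_t lo = i_mag > q_mag ? q_mag : i_mag;
    return hi + (lo >> 1);
}
```

For (3, 4) this gives 4 + 1 = 5, which happens to be exact.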
poke53282 | In math such problems usually never arise. You have integers from -inf..inf and that's it. | 08:43 |
olofk | So many integers to choose from :) | 08:43 |
poke53282 | Yes, wait until we have 128 Bit CPUs. :) | 08:51 |
stekern | olofk: there's this https://github.com/openrisc/or1k-src/blob/or1k/newlib/libc/machine/or1k/or1k-support-asm.S#L285 | 09:02 |
stekern | which is completely misnamed, since it does an invalidate and not a flush | 09:02 |
stekern | but it's an invalidate that you want, not a flush | 09:04 |
stekern | with write-through caches, they do the same thing though | 09:04 |
olofk | Can I just call it from C with or1k_dcache_flush() ? | 09:07 |
stekern | yes | 09:07 |
olofk | Should we take the opportunity to fix stuff like this before we're pushing newlib? | 09:07 |
stekern | yes | 09:07 |
olofk | wallento: You heard the man | 09:08 |
stekern | and I would not use that function in places where you indeed intend to invalidate and not flush | 09:08 |
olofk | Not sure what I want | 09:09 |
olofk | Well, I know that I want the CPU to actually read the stuff from RAM | 09:09 |
stekern | because it's 1) confusing 2) going to be painful if the function gets fixed | 09:09 |
stekern | flush = write what you have in cache to ram and make the cache line invalid | 09:10 |
stekern | invalidate = make the cache line invalid | 09:10 |
stekern | you definitely don't want flush, since you then overwrite what's in RAM with your stale cache data | 09:10 |
olofk | Ah ok. But then you need to know beforehand if there are things in the cache that are supposed to be written | 09:10 |
stekern | in case of DMA, the write side should always flush, the read side invalidate | 09:11 |
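The flush/invalidate distinction, and why the read side must not flush, can be illustrated with a toy single-line cache model. This is a host-side simulation written for illustration, not mor1kx behaviour. The `dirty` flag only ever gets set in a write-back cache, which is why flush and invalidate do the same thing with the write-through caches discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one cache line caching one RAM word. */
struct line {
    bool valid;
    bool dirty;   /* only a write-back cache ever sets this */
    uint32_t data;
};

/* Invalidate: drop the cached copy; RAM is untouched. */
static void invalidate(struct line *l)
{
    l->valid = false;
    l->dirty = false;
}

/* Flush: write a dirty copy back to RAM, then drop it. */
static void flush(struct line *l, uint32_t *ram)
{
    if (l->valid && l->dirty)
        *ram = l->data;
    invalidate(l);
}

/* CPU load: a hit returns cached data, a miss refills from RAM. */
static uint32_t load(struct line *l, const uint32_t *ram)
{
    if (!l->valid) {
        l->data = *ram;
        l->valid = true;
        l->dirty = false;
    }
    return l->data;
}

/* CPU store in a write-back cache: RAM only changes on a later flush. */
static void store_wb(struct line *l, uint32_t v)
{
    l->data = v;
    l->valid = true;
    l->dirty = true;
}
```

On the read side, if the DMA engine has put 0x55 in RAM while the cache still holds a dirty 0xAA, `invalidate()` followed by `load()` returns 0x55, whereas `flush()` first writes 0xAA over the DMA data. On the write side, a `store_wb()` is invisible to the DMA engine until `flush()` pushes it to RAM.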
olofk | hmm | 09:12 |
olofk | I get it now | 09:12 |
olofk | flush, because otherwise the DMA component will read the old values from RAM | 09:12 |
olofk | I might add an extra CPU to this system. In that case I would need a snoop unit (correct?). Should I use that to track DMA as well in that case? | 09:14 |
stekern | I don't, since all generic software we have (read e.g. Linux) has to handle DMA coherency separately anyway | 09:18 |
stekern | I think wallento1 has DMA connected to the snoop unit in his systems though | 09:19 |
wallento1 | yes, normally you also snoop dma access | 09:21 |
wallento1 | that's the reason snooping is not really multi-core but multi-master | 09:22 |
stekern | "normally" in what sense? | 09:22 |
wallento1 | only if you are sure that DMA is not to shared address spaces | 09:22 |
wallento1 | or it's single-use addresses | 09:22 |
wallento1 | i think in some network processors they do it that way | 09:23 |
stekern | I've got an impression that in generic multicore systems it's not done, and you have to handle the coherency manually | 09:23 |
wallento1 | explicitly handling it, or not caching the areas DMA accesses | 09:23 |
wallento1 | mmh, I am not sure, I think the ARM stuff is also coherent | 09:23 |
wallento1 | i hate this wallento1 | 09:24 |
wallento1 | ah, logged in twice.. | 09:24 |
wallento | does mor1kx support non-cached pages? | 09:26 |
stekern | I'm not sure, but I thought typical ARM SoCs don't do snooping on DMA | 09:27 |
stekern | yes, non-cached pages are supported (i.e. the CI bit is implemented) | 09:27 |
stekern | that's how DMA coherency works in openrisc Linux | 09:27 |
stekern | ...and since my main interest was to run SMP Linux, it wouldn't be of much benefit to add DMA accesses to the snooping | 09:29 |
wallento | This might be interesting in this context: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0228a/index.html | 09:32 |
wallento | i think it strongly depends on how you use the data. If it is single-use ethernet frames, you may not want the coherency overhead | 09:34 |
wallento | on which platform are you working, olofk? | 09:34 |
stekern | yes, that's an interesting page. I guess the answer is "it depends" | 09:40 |
stekern | but in either case, as said, you can use snooping regardless of if you have one or several CPUs | 09:42 |
stekern | using snooping has the advantage that you get more fine-grained invalidate (if you have more than 1-way caches) + you save the overhead of manually doing it | 09:44 |
olofk | wallento: Platform as in OS, FPGA device, board...? | 09:47 |
wallento | system and FPGA | 09:47 |
wallento | is it in fusesoc already? | 09:47 |
olofk | Nope. Not yet | 09:48 |
olofk | wallento: This platform https://github.com/myriadrf/myriadrf-boards/blob/master/de0nano-interface/docs/Myriad-RF%20Development%20Kit_1.0r6.pdf | 09:49 |
olofk | I'm building an OpenRISC-based SoC that can transmit and receive samples. It's going to be ported to a newly developed board after I got the prototype running | 09:50 |
blueCmd | does anyone know if poke managed to get regression tests up and running and published? | 10:55 |
olofk | wallento: Here's some PCB porn for the new board https://www.dropbox.com/s/2awyo0xt6ivn4jx/IMG_2644.JPG?dl=0 | 10:59 |
wallento | nice one | 11:05 |
olofk | Gigabit Ethernet, HDMI, USB3 slave, 2 separate DDR3 memories, 2xUSB2 hosts, FMC connector. (Trying to stir up some interest as you can see :)) | 11:07 |
stekern | usb hosts are interesting | 11:09 |
stekern | blueCmd: as far as I know, at least up and running | 11:15 |
olofk | stekern: Hosts are routed via an FTDI Vinculum-II ASIC, so they are not connected directly to the FPGA | 11:16 |
olofk | Meant to be used for keyboard, Mice and stuff | 11:17 |
olofk | But I think you can actually hook up a USB Disk to them as well | 11:17 |
olofk | And it should support the USB Pet Rock | 11:17 |
stekern | blueCmd: jeremybennett also got some sort of contact with Jungsook (linkedin connection acceptance or something like that) | 11:18 |
stekern | as long as it supports USB Pet Rock | 11:18 |
blueCmd | stekern: cool! | 11:26 |
olofk | blueCmd: Too bad you couldn't make it to Munich. We missed you | 11:33 |
jeremybennett | blueCmd: Yes - Jung Sook responded to my LinkedIn connection. I'll see if I can get an answer about her FP changes. | 11:52 |
stekern | +1 what olofk said | 12:45 |
blueCmd | olofk: yeah, I had some other stuff to hack on that was more urgent - I would have loved to come. | 12:47 |
blueCmd | olofk: the other love of my life, Dreamhack is soon so there's a lot of effort going into that | 12:50 |
stekern | blueCmd: I found out that my mum attended the last Dreamhack | 12:52 |
stekern | obviously she's more a hardcore geek than I ;) | 12:54 |
blueCmd | yep! | 12:54 |
stekern | ..or she just has a job that made her attend | 12:55 |
blueCmd | what does she do? | 12:55 |
stekern | something with giving out money to organizations | 12:56 |
stekern | I'm not entirely sure ;) | 12:57 |
stekern | but it's mutual, she says that I "do something with computers and electronics" | 12:57 |
blueCmd | which is correct | 12:59 |
maxpaln | howdy all - proper SW question coming up (so bear with me - this isn't my strongest area :-) ) | 13:02 |
-!- rah_ is now known as rah | 13:05 | |
maxpaln | we are trying to assess performance on a memcpy. The way it is currently implemented (possibly by the C code or more likely by the assembler as interpreted by the compiler) is on a byte-by-byte basis. So copying a block of, say, 1KB requires a 1000x loop that reads a byte and writes it somewhere else. | 13:05 |
maxpaln | Due to the inefficient nature of DDR3 memories for such byte-wise tasks, this is taking a long time. It would be MUCH better if this could be initiated as a burst transaction over the wishbone bus. | 13:05 |
maxpaln | Does anyone have any thoughts/experience of getting the compiler to implement memory accesses as burst rather than single accesses? | 13:06 |
maxpaln | is it even possible? | 13:07 |
blueCmd | that's an interesting problem | 13:08 |
_franck__ | you could use a DMA controller, ask olofk | 13:08 |
blueCmd | (which isn't what you want to hear :P) | 13:08 |
poke53281 | blueCmd: http://juliusbaxter.net/openrisc-irc/%23openrisc.2014-10-01.log.html | 13:09 |
_franck__ | maxpaln: are you talking about memcpy in Linux or in libc? | 13:09 |
maxpaln | from linux | 13:10 |
_franck__ | at least we could optimize it to use 32-bit words when possible, that's on my TODO list | 13:11 |
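A sketch of the word-at-a-time idea _franck__ mentions: copy single bytes until the pointers are 4-byte aligned, move whole 32-bit words, then finish with bytes. It only takes the word path when source and destination share the same alignment modulo 4. `memcpy_words` is a hypothetical name, not the Linux or libc implementation, and the `may_alias` typedef is a GCC/Clang extension used to keep the word accesses legal under strict aliasing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: lets 32-bit accesses alias byte buffers. */
typedef uint32_t __attribute__((__may_alias__)) word_alias_t;

/* Hypothetical word-at-a-time copy: 32-bit loads/stores when dst and
 * src have the same alignment modulo 4, bytes otherwise. */
static void *memcpy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(n == 0 || (dst != NULL && src != NULL));

    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        /* Copy single bytes until both pointers are word aligned. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }
        /* Bulk of the copy: one 32-bit load and store per 4 bytes. */
        word_alias_t *dw = (word_alias_t *)(void *)d;
        const word_alias_t *sw = (const word_alias_t *)(const void *)s;
        for (; n >= 4; n -= 4)
            *dw++ = *sw++;
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }
    /* Tail, or the whole copy if the alignments differ. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

This cuts the number of bus accesses for an aligned 1KB copy by roughly a factor of four, but each access is still a single transfer; it does not by itself turn the copy into Wishbone bursts.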
maxpaln | _franck__: yep, a DMA controller is really what is needed but one isn't implemented at the moment, at least not one that Linux can use. | 13:11 |
_franck__ | but I don't think we could use bursts. I think it could be possible if we had a write-back data cache | 13:12 |
_franck__ | I think olofk and stekern have something working as a DMA controller. And stekern might have a Linux driver | 13:12 |
maxpaln | mmm, sounds interesting... | 13:13 |
blueCmd | poke53281: cool thanks | 13:13 |
blueCmd | poke53281: quite horrible results though | 13:13 |
poke53281 | yes, the ones from musl are better. but it looks like my script was not working correctly. | 13:14 |
poke53281 | so, these .gcds errors are not related to openrisc | 13:14 |
poke53281 | .gcda I mean | 13:15 |
poke53281 | and the shared library problem was weird | 13:16 |
poke53281 | the question is whether I have to compile libc after the stage 2 gcc compilation. | 13:17 |
blueCmd | nah | 13:18 |
blueCmd | shouldn't need that | 13:18 |
poke53281 | it found the correct file but refused to load it as a shared library. | 13:19 |
poke53281 | after reading the first 512 bytes | 13:20 |
poke53281 | blueCmd: you never had this shared library problem? | 13:32 |
poke53281 | qemu-user with the built sysroot with your scripts | 13:32 |
poke53281 | just try to compile with -lm and try to run with qemu-user in your sysroot | 13:34 |
blueCmd | poke53281: I didn't run with qemu-user I think | 13:48 |
poke53281 | well, I thought you did, after you put it in your debian readme. | 13:53 |
poke53281 | but it should not matter | 13:54 |
stekern | maxpaln: yes, I've been using olofk's wb_streamer (https://github.com/olofk/wb_streamer/) to get DMA functionality together with an i2s core | 13:57 |
stekern | and I have a work-in-progress driver for it: http://git.openrisc.net/cgit.cgi/stefan/linux/tree/drivers/dma/wb_streamer-dma.c?h=smp | 13:58 |
maxpaln | stekern: looks interesting - where can I find the wb_streamer verilog? | 14:00 |
stekern | in the first github link | 14:00 |
maxpaln | D'oh - missed it at the top of the file: https://github.com/olofk/wb_streamer | 14:00 |
stekern | I've only tested the writer part so far, but olofk claims that the reader should be working now too | 14:01 |
olofk | stekern, maxpaln : Yes, it seems to be working quite well | 14:02 |
stekern | I think there might be some slight work in the driver left for that still, but I'm hoping to attend to that in the next week or so | 14:02 |
olofk | I've been diffing the corresponding writer/reader files of the streamer, and they are extremely similar. Thinking of refactoring it slightly to be able to share even more code | 14:03 |
stekern | regardless of that, we should optimize copy to/from user too | 14:13 |
juliusb | some slides from ORCONF have been put up at orconf.org | 14:15 |
poke53281 | great, but why dropbox? does orconf.org run on a 1MB hard drive? | 14:17 |
simoncook | no, GitHub, which I think has a 1GB repository limit (thus a 1GB website limit?) | 14:19 |
juliusb | i dunno, just chose it because it was the simplest I thought | 14:19 |
juliusb | i have no idea how much space there is on that git repo | 14:19 |
maxpaln | ok, so apart from understanding how to use the wb_streamer :-) this looks like it could be a good solution - is there an example SOC with it implemented? (I'm guessing not but it's worth asking :-) ) | 14:20 |
blueCmd | poke53281: I used or1ksim when testing gcc, but qemu-user for a lot of other things | 14:23 |
blueCmd | TBH I don't recall details how I did stuff | 14:24 |
blueCmd | I'm bad in that way | 14:24 |
wallento | juliusb: it's 1 gig, but not a hard limit, just read it a few secs ago | 14:27 |
maxpaln | juliusb: looks like you have a new employer: http://www.design-reuse.com/news/35676/qualcomm-csr-acquisition.html | 14:41 |
juliusb | maxpaln: indeed | 15:52 |
juliusb | well, not yet, it'll take about a year to finalise | 15:52 |
maxpaln | yeah - these things always take forever. But that's a pretty big deal - Qualcomm have some real cutting edge stuff and they seem to really know how to work the technology. I'll be interested to see what magic they have you working on - although it'll be several years before there's a product or anything as real as that :-) | 15:59 |
stekern | maxpaln: you can take a look at the sockit-multicore system in my orpsoc-cores | 16:05 |
stekern | it's in the multicore branch | 16:06 |
stekern | i'd paste a link if this phone allowed me ;) | 16:10 |
maxpaln | :-) - I'll find it | 16:13 |
juliusb | maxpaln: well, we'll see what they make of us :) I think there's an opportunity for them to branch out into the sort of things we do, maybe some cool (buzzword alert) synergies | 16:14 |
juliusb | maxpaln: it was you who came to ORCONF and presented for Lattice right? | 16:15 |
maxpaln | aw, there'll be plenty of synergies. They'll no doubt be leveraging any low hanging fruit to implement a paradigm shift in technology for a solid return on investment. | 16:16 |
maxpaln | [too much?] | 16:16 |
juliusb | :) | 16:17 |
juliusb | you didn't have enough references to IoT in there | 16:18 |
maxpaln | AH, damn it - my buzzword lexicon needs to be brought into the 21st century. I'm putting together a framework for offshoring this stuff soon anyway :-) | 16:22 |
maxpaln | stekern: I found this one (https://github.com/openrisc/orpsoc-cores/tree/master/systems/sockit) but it doesn't seem to use the wb_streamer. Did I miss it? | 16:22 |
maxpaln | juliusb: yes, it was me at ORCONF :-) | 16:23 |
maxpaln | just sent you my slides. | 16:27 |
juliusb | ah cool, good to put irc handles to faces | 16:27 |
juliusb | and thanks | 16:27 |
mor1kx | [mor1kx] skristiansson pushed 8 new commits to master: https://github.com/openrisc/mor1kx/compare/64651c8af488...e27b9e6d1afd | 20:44 |
mor1kx | mor1kx/master 1ed93f7 Stefan Wallentowitz: Add snoop port | 20:44 |
mor1kx | mor1kx/master 0be97e7 Stefan Wallentowitz: DCache: Coherency... | 20:44 |
mor1kx | mor1kx/master dba9361 Stefan Kristiansson: cappuccino: check atomic_reserve at end of storebuffer... | 20:44 |
stekern | ok, all multicore stuff merged into master | 20:45 |
imphil_ | stekern, nice! thanks | 20:49 |
--- Log closed Fri Oct 17 00:00:41 2014 |
Generated by irclog2html.py 2.15.2 by Marius Gedminas - find it at mg.pov.lt!