IRC logs for #openrisc Thursday, 2014-10-16

--- Log opened Thu Oct 16 00:00:39 2014
stekernolofk: the compiler knows nothing about caches03:12
stekernyou have to manually invalidate the cache line before you read it03:15
stekernif you make the assumption that you always will have write-through caches, you don't need to do anything special on writes03:16
-!- FreezingAlt is now known as FreezingCold06:16
olofkYes, I'll try to invalidate the cache before reading the data. Is there some newlib function for that, or should I do some asm?07:08
olofkAlso, wallento, stekern, how does the snooping stuff work? I guess that this is one of the situations where that could be used, right?07:09
olofk  *((volatile uint8_t *)(0x91000000)) = (amplitude >> 30);07:31
olofkWhat is wrong with that line? If I'm doing a 30-bit right-shift, I expect the value to be at most 307:31
olofkBut in my output I see that all bits can be set07:32
poke53282is amplitude an int or an unsigned int?08:13
poke53282if it is an unsigned int you are right.08:13
poke53282olofk: If not, then you do a signed shift right08:14
poke53282and all bits can be set.08:15
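The signed versus unsigned shift that poke53282 describes can be sketched in a few lines of C. The function names are hypothetical and only the shift itself comes from the log. Right-shifting a negative signed value is implementation-defined in C; the sketch assumes a compiler such as GCC that shifts in copies of the sign bit.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned operand: zeros are shifted in, so the result is always 0..3. */
uint8_t top_bits_unsigned(uint32_t amplitude)
{
    return (uint8_t)(amplitude >> 30);
}

/* Signed operand: for negative values the result is implementation-defined.
 * Assumption: the compiler does an arithmetic shift (GCC does), so the
 * sign bit is replicated and every bit of the truncated byte can be set. */
uint8_t top_bits_signed(int32_t amplitude)
{
    return (uint8_t)(amplitude >> 30);
}
```

With an arithmetic shift, a negative input gives -1 or -2 before the truncation to 8 bits, which matches the "all bits can be set" symptom.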
poke53282opt/cross/or1k-linux-musl/lib/gcc/or1k-linux-musl/4.9.1/crtbeginS.o: In function `deregister_tm_clones':08:19
poke53282crtstuff.c:(.text+0x48): relocation truncated to fit: R_OR1K_GOT16 against undefined symbol `_ITM_deregisterTMCloneTable'08:19
poke53282I guess this is a problem in binutils.08:19
olofkpoke53282: Yeah, I thought of sign extension as well. amplitude is uint32_t, so I think that should be ok08:22
olofkThis is my whole algorithm
olofkrx_buf is an array of uint32_t and every element consists of two packed sign-extended 12-bit values08:24
poke53282abs should be a noop in your code08:30
poke53282ahh, sry, q and i are 16 bit08:31
poke53282no wait. This is the assignment. They are indeed noops, because sample is 32 Bit.08:34
olofkhmm.. I need to get the values into a 16 bit signed int _before_ I do abs. Is that what you mean?08:36
poke53282Well, actually there is no abs(short int)08:36
poke53282So C will transform your 16 bit values into 32 bit.08:37
olofkWith the top half always == 0 ?08:37
poke53282No, this must be sign extended by the compiler08:37
olofkWhoops. Found a typo in the verilog code. Getting a lot more consistent results now08:38
poke53282there is no abs(unsigned int)08:38
olofkAhh.. right. i and q shouldn't be signed08:38
poke53282abs((short int)((sample >> 16) & 0xffff))08:40
poke53282this should work for example.08:40
olofkAnd then i and q should be uint16 instead of int1608:40
poke53282I guess so.08:41
olofkThis is probably the most advanced math I've done since I started working :)08:42
poke53282It depends on what you want to do with this code: ((i>>1)+q)08:42
olofkIt's hypotenuse approximation I found online08:42
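A minimal sketch of the unpack, abs and approximate steps discussed above. olofk's actual code is not shown in the log, so the layout (I in the upper half, Q in the lower half) and the function name approx_magnitude are assumptions, and the formula is the common "larger value plus half the smaller" approximation rather than whatever olofk found online. The narrowing casts to int16_t rely on the usual two's-complement wraparound, as GCC provides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: I in bits 31..16, Q in bits 15..0, each 12-bit value
 * already sign-extended to 16 bits by the hardware. */
uint32_t approx_magnitude(uint32_t sample)
{
    /* Get each half into a signed 16-bit variable before calling abs(),
     * so that the promotion to int sign-extends it. */
    int16_t i = (int16_t)(sample >> 16);
    int16_t q = (int16_t)(sample & 0xffff);

    int ai = abs(i);
    int aq = abs(q);
    int hi = ai > aq ? ai : aq;
    int lo = ai > aq ? aq : ai;

    /* Approximates sqrt(i*i + q*q) as max + min/2. */
    return (uint32_t)(hi + (lo >> 1));
}
```

Calling abs() directly on the unsigned 32-bit sample would be a no-op for these values, which is the problem poke53282 points out.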
poke53282In math such problems usually never arise. You have integers from -inf..inf and that's it.08:43
olofkSo many integers to choose from :)08:43
poke53282Yes, wait until we have 128 Bit CPUs. :)08:51
stekernolofk: there's this
stekernwhich is completely misnamed, since it does an invalidate and not a flush09:02
stekernbut it's an invalidate that you want, not a flush09:04
stekernwith write-through caches, they do the same thing though09:04
olofkCan I just call it from C with or1k_dcache_flush() ?09:07
olofkShould we take the opportunity to fix stuff like this before we're pushing newlib?09:07
olofkwallento: You heard the man09:08
stekernand I would not use that function in places where you indeed intend to invalidate and not flush09:08
olofkNot sure what I want09:09
olofkWell, I know that I want the CPU to actually read the stuff from RAM09:09
stekernbecause it's 1) confusing 2) going to be painful if the function gets fixed09:09
stekernflush = write what you have in cache to ram and make the cache line invalid09:10
stekerninvalidate = make the cache line invalid09:10
stekernyou definitely don't want flush, since you then overwrite what's in RAM with your stale cache data09:10
olofkAh ok. But then you need to know beforehand if there are things in the cache that are supposed to be written09:10
stekernin case of DMA, the write side should always flush, the read side invalidate09:11
olofkI get it now09:12
olofkflush, because otherwise the DMA component will read the old values from RAM09:12
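The flush and invalidate definitions stekern gives can be illustrated with a toy model of one RAM word behind one write-back cache line. This is a host-side sketch, not OpenRISC or newlib code; every name in it (toy_mem, cpu_read, dma_write and so on) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: a single RAM word with a single write-back cache line. */
typedef struct {
    uint32_t ram;    /* what a DMA master sees */
    uint32_t line;   /* what the CPU sees while the line is valid */
    bool valid;
    bool dirty;
} toy_mem;

uint32_t cpu_read(toy_mem *m)
{
    if (!m->valid) {             /* miss: refill the line from RAM */
        m->line = m->ram;
        m->valid = true;
        m->dirty = false;
    }
    return m->line;
}

void cpu_write(toy_mem *m, uint32_t v)
{
    m->line = v;                 /* write-back: RAM is not updated yet */
    m->valid = true;
    m->dirty = true;
}

/* flush = write what is in the cache to RAM and make the line invalid */
void flush(toy_mem *m)
{
    if (m->valid && m->dirty)
        m->ram = m->line;
    m->valid = false;
    m->dirty = false;
}

/* invalidate = make the line invalid, dropping its contents */
void invalidate(toy_mem *m)
{
    m->valid = false;
    m->dirty = false;
}

void dma_write(toy_mem *m, uint32_t v) { m->ram = v; }
uint32_t dma_read(const toy_mem *m)    { return m->ram; }
```

In this model the CPU flushes before a DMA master reads a buffer and invalidates before it reads a buffer the DMA master has filled. Flushing a dirty line on the receive side would overwrite the fresh DMA data in RAM.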
olofkI might add an extra CPU to this system. In that case I would need a snoop unit (correct?). Should I use that to track DMA as well in that case?09:14
stekernI don't, since all generic software we have (read e.g. Linux) has to handle DMA coherency separately anyway09:18
stekernI think wallento1 has DMA connected to the snoop unit in his systems though09:19
wallento1yes, normally you also snoop dma access09:21
wallento1thats the reason snooping is not really multi-core but multi-master09:22
stekern"normally" in what sense?09:22
wallento1only if you are sure that DMA is not to shared address spaces09:22
wallento1or its single use addresses09:22
wallento1i think in some network processors they do it that way09:23
stekernI've got an impression that in generic multicore systems it's not done, and you have to handle the coherency manually09:23
wallento1explicitly handling it or don't cache those areas DMA accesses09:23
wallento1mmh, I am not sure, I think the ARM stuff is also coherent09:23
wallento1i hate this wallento109:24
wallento1ah, logged in twice..09:24
wallentodoes mor1kx support non-cached pages?09:26
stekernI'm not sure, but I thought typical ARM SoCs don't do snooping on DMA09:27
stekernyes, non-cached pages are supported (i.e. the CI bit is implemented)09:27
stekernthat's how DMA coherency works in openrisc Linux09:27
stekern...and since my main interest was to run SMP Linux, it'd be of not much benefit to add DMA accesses to the snooping09:29
wallentoThis might be interesting in this context:
wallentoi think it strongly depends on how you use the data. If it is single-use ethernet frames, you may not want the coherency overhead09:34
wallentoon which platform are you working, olofk?09:34
stekernyes, that's an interesting page. I guess the answer is "it depends"09:40
stekernbut in either case, as said, you can use snooping regardless of if you have one or several CPUs09:42
stekernusing snooping has the advantage that you get more fine-grained invalidate (if you have more than 1-way caches) + you save the overhead of manually doing it09:44
olofkwallento: Platform as in OS, FPGA device, board...?09:47
wallentosystem and FPGA09:47
wallentois it in fusesoc already?09:47
olofkNope. Not yet09:48
olofkwallento: This platform
olofkI'm building an OpenRISC-based SoC that can transmit and receive samples. It's going to be ported to a newly developed board after I got the prototype running09:50
blueCmddoes anyone know if poke managed to get regression tests up and running and published?10:55
olofkwallento: Here's some PCB porn for the new board
wallentonice one11:05
olofkGigabit Ethernet, HDMI, USB3 slave, 2 separate DDR3 memories, 2xUSB2 hosts, FMC connector. (Trying to stir up some interest as you can see :))11:07
stekernusb hosts are interesting11:09
stekernblueCmd: as far as I know, at least up and running11:15
olofkstekern: Hosts are routed via an FTDI Vinculum-II ASIC, so they are not connected directly to the FPGA11:16
olofkMeant to be used for keyboard, Mice and stuff11:17
olofkBut I think you can actually hook up a USB Disk to them as well11:17
olofkAnd it should support the USB Pet Rock11:17
stekernblueCmd: jeremybennett also got some sort of contact with Jungsook (linkedin connection acceptance or something like that)11:18
stekernas long as it supports USB Pet Rock11:18
blueCmdstekern: cool!11:26
olofkblueCmd: Too bad you couldn't make it to Munich. We missed you11:33
jeremybennettblueCmd: Yes - Jung Sook responded to my LinkedIn connection. I'll see if I can get an answer about her FP changes.11:52
stekern+1 what olofk said12:45
blueCmdolofk: yeah, I had some other stuff to hack on that was more urgent - I would have loved to come.12:47
blueCmdolofk: the other love of my life, Dreamhack is soon so there's a lot of effort going into that12:50
stekernblueCmd: I found out that my mum attended the last Dreamhack12:52
stekernobviously she's more a hardcore geek than I ;)12:54
stekern..or she just has a job that made her attend12:55
blueCmdwhat does she do?12:55
stekernsomething with giving out money to organizations12:56
stekernI'm not entirely sure ;)12:57
stekernbut it's mutual, she says that I "do something with computers and electronics"12:57
blueCmdwhich is correct12:59
maxpalnhowdy all - proper SW question coming up (so bear with me - this isn't my strongest area :-) )13:02
-!- rah_ is now known as rah13:05
maxpalnwe are trying to assess performance on a memcpy. The way it is currently implemented (possibly by the C code or more likely by the assembler as interpreted by the compiler) is on a word-by-word basis. So copying a block of, say, 1KB, requires a 1000x loop that reads a byte and writes it somewhere else.13:05
maxpalnDue to the inefficient nature of DDR3 memories for such byte-wise tasks, this is taking a long time. It would be MUCH better if this could be initiated as a burst transaction over the wishbone bus.13:05
maxpalnDoes anyone have any thoughts/experience of handling the compiler to implement memory accesses in burst/single accesses?13:06
maxpalnis it even possible?13:07
blueCmdthat's an interesting problem13:08
_franck__you could use a DMA controller, ask olofk13:08
blueCmd(which isn't what you want to hear :P)13:08
_franck__maxpaln: are you talking about memcpy from Linux or from the libc ?13:09
maxpalnfrom linux13:10
_franck__at least we could optimize it to use 32-bit words when possible, that's on my TODO list13:11
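A minimal sketch of the word-wise copy _franck__ mentions: bytes until the pointers are 4-byte aligned, then 32-bit words, then a byte tail. This is not the OpenRISC Linux memcpy. The name copy_words and the word_t typedef are hypothetical, and the may_alias attribute assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type allowed to alias other types. */
typedef uint32_t __attribute__((may_alias)) word_t;

void *copy_words(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Word copies are only possible when both pointers have the same
     * offset within a 32-bit word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3) == 0) {
        /* Head: single bytes until both pointers are word aligned. */
        while (n && ((uintptr_t)d & 3)) {
            *d++ = *s++;
            n--;
        }
        /* Body: one 32-bit load and one 32-bit store per iteration. */
        while (n >= 4) {
            *(word_t *)d = *(const word_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Tail, or the whole buffer if the alignments differ. */
    while (n--)
        *d++ = *s++;

    return dst;
}
```

This cuts the number of bus accesses by roughly four for aligned buffers, but each access is still a single transfer and not a burst.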
maxpaln_franck__: yep, a DMA controller is really what is needed but one isn't implemented at the moment, at least not one that Linux can use.13:11
_franck__but I don't think we could use burst. I think it could be possible if we had a write back data cache13:12
_franck__I think olofk and stekern have something working as a DMA controller. And stekern might have a Linux driver13:12
maxpalnmmm, sounds interesting...13:13
blueCmdpoke53281: cool thanks13:13
blueCmdpoke53281: quite horrible results though13:13
poke53281yes, the ones from musl are better. but it looks like my script was not working correctly.13:14
poke53281so, these .gcds errors are not related to openrisc13:14
poke53281.gcda I mean13:15
poke53281and the shared library problem was weird13:16
poke53281the question is if i have to compile libc after the stage 2 gcc compilation.13:17
blueCmdshouldn't need that13:18
poke53281he found the correct file but declined it to load as shared library.13:19
poke53281after reading the first 512 bytes13:20
poke53281blueCmd: you never had this shared library problem?13:32
poke53281qemu-user with the built sysroot with your scripts13:32
poke53281just try to compile with -lm and try to run with qemu-user in your sysroot13:34
blueCmdpoke53281: I didn't run with qemu-user I think13:48
poke53281well, I thought you did, after you put it in your debian readme.13:53
poke53281but it should not matter13:54
stekernmaxpaln: yes, I've been using olofk's wb_streamer ( to get DMA functionality together with an i2s core13:57
stekernand I have a work-in-progress driver for it:
maxpalnstekern: looks interesting - where can I find the wb_streamer verilog?14:00
stekernin the first github link14:00
maxpalnD'oh - missed it at the top of the file:
stekernI've only tested the writer part so far, but olofk claims that the reader should be working now too14:01
olofkstekern, maxpaln : Yes, it seems to be working quite well14:02
stekernI think there might be some slight work in the driver left for that still, but I'm hoping to attend to that in the next week or so14:02
olofkI've been diffing the corresponding writer/reader files of the streamer, and they are extremely similar. Thinking of refactoring it slightly to be able to share even more code14:03
stekernregardless of that, we should optimize copy to/from user too14:13
juliusbsome slides from ORCONF have been put up at orconf.org14:15
poke53281great, but why dropbox? does run on a 1MB hard drive?14:17
simoncookno, GitHub, which I think has a 1GB repository limit, (thus 1GB website limit?)14:19
juliusbi dunno, just chose it because it was the simplest I thought14:19
juliusbi have no idea how much space there is on that git repo14:19
maxpalnok, so apart from understanding how to use the wb_streamer :-) this looks like it could be a good solution - is there an example SOC with it implemented? (I'm guessing not but it's worth asking :-) )14:20
blueCmdpoke53281: I used or1ksim when testing gcc, but qemu-user for a lot of other things14:23
blueCmdTBH I don't recall details how I did stuff14:24
blueCmdI'm bad in that way14:24
wallentojuliusb: its 1 gig, but not hard, just read it a few secs ago14:27
maxpalnjuliusb: looks like you have a new employer:
juliusbmaxpaln: indeed 15:52
juliusbwell, not yet, it'll take about a year to finalise15:52
maxpalnyeah - these things always take forever. But that's a pretty big deal - Qualcomm have some real cutting-edge stuff and they seem to really know how to work the technology. I'll be interested to see what magic they have you working - although it'll be several years before there's a product or anything as real as that :-)15:59
stekernmaxpaln: you can take a look at the sockit-multicore system in my orpsoc-cores16:05
stekernit's in the multicore branch16:06
stekerni'd paste a link if this phone allowed me ;)16:10
maxpaln:-) - I'll find it16:13
juliusbmaxpaln: well, we'll see what they make of us :) I think there's an opportunity for them to branch out into the sort of things we do, maybe some cool (buzzword alert) synergies16:14
juliusbmaxpaln: it was you who came to ORCONF and presented for Lattice right?16:15
maxpalnaw, there'll be plenty of synergies. They'll no doubt be leveraging any low hanging fruit to implement a paradigm shift in technology for a solid return on investment.16:16
maxpaln[too much?]16:16
juliusbyou didn't have enough references to IoT in there16:18
maxpalnAH, damn it - my buzzword lexicon needs to be brought into the 21st century. I'm putting together a framework for offshoring this stuff soon anyway :-)16:22
maxpalnstekern: I found this one ( but it doesn't seem to use the wb_streamer. Did I miss it?16:22
maxpalnjuliusb: yes, it was me at ORCONF :-)16:23
maxpalnjust sent you my slides.16:27
juliusbah cool, good to put irc handles to faces16:27
juliusband thanks16:27
mor1kx[mor1kx] skristiansson pushed 8 new commits to master:
mor1kxmor1kx/master 1ed93f7 Stefan Wallentowitz: Add snoop port20:44
mor1kxmor1kx/master 0be97e7 Stefan Wallentowitz: DCache: Coherency...20:44
mor1kxmor1kx/master dba9361 Stefan Kristiansson: cappuccino: check atomic_reserve at end of storebuffer...20:44
stekernok, all multicore stuff merged into master20:45
imphil_stekern, nice! thanks20:49
--- Log closed Fri Oct 17 00:00:41 2014
