IRC logs for #openrisc Saturday, 2013-03-16

stekernglowplug: what do you mean by optional vs integral lsu?05:59
stekernbut to go back to your question about blockram speed vs system memory, that depends of course on what kind of system memory you have, but let's assume we are speaking SDRAM06:00
stekerneven if you run that SDRAM at 100MHz, you have a 2 clock cycle latency on random accesses (+ you have to break for refreshes)06:02
stekernplus, you are using that memory for both your instructions and data, so they have to wait for accesses from the other bus06:03
glowplugI wasn't sure if the LSU was a module that could be turned on and off for cappuccino but after reading more I realized that it was integral to it and not present in the other pipeline implementations.06:24
glowplugThe memory would be DDR.  A board with two DDR chips could be used for caching data and instructions separately.06:25
glowplugThere is an extremely interesting project that I discovered today where a small group is working on DIY semiconductor fabrication.  They have 20 micron lithography which can be scaled down to 405nm cheaply.  The first mor1kx ASIC could be homemade on a 405nm process in someone's garage. Pretty exciting stuff. =)06:38
glowplugThey have an IRC channel #homecmos.06:41
glowplugI think communication between the two groups is probably important given that each group is solving a separate half of the same problem.  The democratization of technology.06:41
stekernall pipelines have lsus06:44
stekernbut only cappuccino has a separate pipeline stage for the data memory accesses06:45
glowplugI think the first thing I'm going to do when I understand the HDL is to help write better documentation for mortals.  8)06:50
stekernto understand the basics of a risc pipeline, I find this pretty helpful: http://en.wikipedia.org/wiki/Classic_RISC_pipeline06:53
glowplugI read that one many times.  =)06:54
glowplugUnfortunately I'm very new to Verilog so it's hard to apply the high level concepts with the code.  Getting there though (slowly).06:54
glowplugIf you don't mind me asking where did you learn chip design?07:07
stekernI studied electrical engineering, there I was introduced to vhdl07:09
stekernthen I didn't do any hdl design for a couple of years, and felt I needed to dust off the knowledge, so I bought a spartan 3 dev board and designed a subtractive synthesizer07:10
glowplugAt a University?07:10
stekernyes07:11
glowplugWhy is it that Universities all teach VHDL over Verilog?  Haha07:11
stekernbecause I lived in Sweden ;)07:12
glowplugThat is really great though.  I've never attended a University but hopefully I can catch up and contribute.  8)07:12
stekerneurope = vhdl, states = verilog (roughly)07:12
stekernopen source hdl tends to be verilog, since there is better open source tool support for it (simulators and such)07:13
glowplugI live in Metro Detroit in Michigan, USA.  =)07:13
glowplugIt doesn't sound like you guys will be coming to the States anytime soon.  I would attend if you had the 2013 conference here though!07:15
stekernI wouldn't mind having a conference in the states, if someone paid my flight ticket ;)07:22
stekernthere are others involved in the project from the states07:23
glowplugReally?  I thought everyone was in Europe.  Very cool.07:23
stekernpeter gavin, who has done massive work on the toolchain, is for instance07:39
glowplugDoes he ever come into the IRC channel?07:47
stekernyes, but it was a while since he was active07:51
mor1kx[mor1kx] skristiansson pushed 6 new commits to master: https://github.com/openrisc/mor1kx/compare/ea9ec64ae27e...e11294b50bd816:03
mor1kxmor1kx/master 26e8358 Stefan Kristiansson: cappuccino/execute_ctrl: allow mfspr to stall the pipeline16:03
mor1kxmor1kx/master 5f8cb0f Stefan Kristiansson: cappuccino/rf: fix typo in comment16:03
mor1kxmor1kx/master 6a2745c Stefan Kristiansson: cappuccino/execute_ctrl: whitespace cleanup16:03
juliusbstekern: I like your move icache into fetch work17:22
juliusbthat makes a lot of sense17:22
stekernyeah, in hindsight it was a mistake to keep the hierarchy flat like that17:53
stekernthe mmus are in the fetch and lsu too17:54
stekernouch... seems I have created an insanely slow path to the debug system now...17:59
stekernand just when I was so happy that I just had fixed the single stepping that I had broken :(18:06
stekernlet's see if I can pinpoint what it is18:07
stekernone thing I have done is to flush the tail of the pipeline, since we are taking the exceptions at the end of it.18:08
stekernbut I think that a du_stall should not flush the tail, only up until execute18:09
stekernbut having the du_stall connected there might be the cause of the slow path18:10
stekern...nope, it's something else18:11
stekernjuliusb: I'm going to make you a sad panda, looks like it's your range or overflow stuff that is causing the insane slow path18:30
stekerncan't say I understand how that can make a path from dcache to the debug interface though18:32
stekernI just tried disabling both in mor1kx.v, I'll try to see which one of them is the guilty one18:34
stekern(both as in OVERFLOW and ADDC)18:35
stekernso my comment about range was probably wrong, it's OVERFLOW or ADDC18:37
stekernmy guess is addc, since that is the only thing that I can come up with having anything in common with the debug unit and dcache, the address18:39
juliusbHmm OK18:43
juliusbmaybe addressing into the debug SPRs?18:43
stekernit goes through the wb arbiter18:44
juliusb??!18:45
juliusbpastie the report?18:45
stekernyeah, I will, just got to run the synthesis again :)18:45
juliusbnps18:45
juliusbare you building for the DE0?18:45
stekernjupp18:45
stekernit's no crisis though, they are easy to disable with the parameter18:46
juliusbnice18:46
juliusbwe're using that for the chiphack so I'll have to take your build system for it :)18:46
juliusbyeah I agree18:46
juliusbit's no biggie18:46
stekernhave it run at 30 MHz with ADDC or 80 MHz without ;)18:47
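For scale, the two clock rates quoted here can be turned into clock periods, which is the time budget the slowest path has to fit into. A minimal Python sketch; the helper name `period_ns` is made up for illustration and is not part of any mor1kx or vendor tooling:

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in MHz."""
    return 1000.0 / freq_mhz


# Figures quoted in the log: 30 MHz with the feature enabled, 80 MHz without.
slow = period_ns(30.0)
fast = period_ns(80.0)
print(f"30 MHz -> {slow:.1f} ns, 80 MHz -> {fast:.1f} ns, difference ~{slow - fast:.1f} ns")
```

Under this reading, the slow path costs roughly 20 ns of extra delay per cycle compared with the build where the feature is disabled.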
juliusbit's kinda useless unless you want it (C compiler doesn't emit etc.)18:47
stekernchances are that it'll solve itself when I get to reworking the dcache18:47
juliusb!!18:47
juliusbI'd like to do some research on the espressos18:47
juliusbare you building in orpsoc or mor1kx-dev-env?18:49
juliusbor c)18:49
juliusb :)18:49
stekernorpsoc18:52
stekernactually, it looks like the RANGE feature is creating a long path too (sorry about being confusing, it was FEATURE_RANGE and FEATURE_ADDC that I disabled)18:55
stekernjuliusb: http://git.openrisc.net/cgit.cgi/stefan/orpsoc/ <- that's my build system18:55
stekernjust copy a fresher mor1kx in there18:55
stekernthe range feature creates a long path through immu, dmmu and icache though18:58
stekernlet's concentrate on the ADDC then, I guess the range path can be because it's hooked up to the exceptions18:59
juliusbyes, but we should perhaps register it or something19:02
juliusb(the range exception)19:02
juliusbthat's easy to do - out of the ALU19:02
juliusbor maybe in ctrl19:02
stekernshouldn't it already be registered from the alu?19:04
stekernhmm... enabling ADDC without RANGE doesn't bring out that long path to debug if anymore...19:07
stekernso I guess it's the RANGE feature that is the problem after all19:08
juliusbok cool19:16
juliusbno not sure we're registering it19:16
juliusbyou got a logic overhead number out of interest (for disabling RANGE?)19:16
juliusbwell, for enabling it19:16
juliusbwhat's design size diff?19:16
stekernhrrm, not sure19:17
stekernI got the signaltap in there too, so probably should rip that out and test before doing anything drastic19:17
juliusbah no probs19:18
juliusbjust out of curiosity19:18
juliusbI will look at it later19:18
stekernhttp://pastie.org/pastes/6574031/text19:19
stekernthat's the debug if path anyways19:19
stekernbut I think that is one of the slower/longer paths, so perhaps it's just getting pushed up by the range exception19:20
juliusbwtf19:20
juliusb25ns of that path is outside of mor1kx19:21
juliusbodd....19:21
juliusband this is only being enabled when you have RANGE going?19:21
stekernjupp19:21
juliusbwow, OK19:21
juliusbthat makes no sense to me at the moment19:22
juliusbi have to head out for a little bit, should be back in an hour or so19:22
stekernnps, I'll be on and off here (as always =P)19:22
juliusbvery good :)19:23
stekernthat path doesn't make sense, but I know why it happens19:23
stekernthe root problem is the cache, where the wb-bus ack is combinatorially connected to the ack of the dcache (which of course looks at the addresses to generate the ack)19:24
stekernin practice, they are never active at the same time, but it's damn hard for the synthesizer to know that19:25
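Why the tool reports such a path can be illustrated with a toy model: static timing analysis takes the longest structural path through combinational logic, whether or not the signals on it can ever be active together. The sketch below is only that, a toy. The signal names loosely follow the discussion, but the graph and every delay value are hypothetical and are not taken from the mor1kx RTL or from any timing report.

```python
# Hypothetical combinational delays in ns between signals (illustrative only).
EDGES = {
    "wb_arbiter": [("wb_ack", 6.0)],
    "wb_ack": [("dcache_ack", 1.0)],      # the combinational ack coupling
    "lsu_adr": [("tag_compare", 4.0)],
    "tag_compare": [("dcache_ack", 2.0)],
    "dcache_ack": [("lsu_valid", 1.0)],
    "lsu_valid": [("execute_valid", 1.5)],
}


def worst_delay(edges, start, false_edges=frozenset()):
    """Longest delay from `start` through an acyclic graph, skipping false edges."""
    best = 0.0
    for nxt, delay in edges.get(start, []):
        if (start, nxt) in false_edges:
            continue  # edge declared a false path, so it is not timed
        best = max(best, delay + worst_delay(edges, nxt, false_edges))
    return best


# Every structural path is timed unless the tool is told otherwise.
print(worst_delay(EDGES, "wb_arbiter"))
# Declaring the ack coupling a false path shortens the reported path.
print(worst_delay(EDGES, "wb_arbiter", frozenset({("wb_ack", "dcache_ack")})))
```

In this toy graph the path from the arbiter runs all the way to `execute_valid` until the coupling edge is excluded, which mirrors the complaint that the synthesizer cannot know the two acks are never active together.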
juliusbso false path19:25
juliusbbut... what do you mean address?19:25
stekernthe lsu address19:27
juliusbah right19:27
stekernis compared to what comes out of the caches tagram19:27
juliusbyep, but how is the overflow stuff playing with that?19:28
juliusbi thought we just generated a signal based on it19:28
stekernmy guess it's not related to that, just that it creates a (different) critical path which the router resolves nicely, but forces this path to be long and slow19:29
stekernI've seen that happen before19:30
juliusbohh OK19:33
juliusb:(19:33
juliusbbut we're not really adding that much logic with the overflow thing19:33
juliusbi'd be very interested to see areas of overflow vs no-overflow19:34
juliusbbut I can play with that19:34
juliusbanyway19:34
juliusbreally away for a bit19:34
stekernwell, you have it connected to execute_valid19:43
stekernthat's a "heavy" signal19:43
stekernand the lsu_valid is connected to it19:43
stekern(which in turn is connected to the dcache ack)19:43
stekernso I guess that path makes sense then19:43
stekernso, yeah, the solution is probably to move that logic out of ctrl into execute_ctrl and generate the (registered) exception signal in there19:47
stekernwhy is it connected to execute_valid anyway? when the overflow_set/clear signals are in control stage?19:48
juliusbwhich file? in ctrl?20:40
stekernyes20:41
stekernlooks like the execute_valid is the problem, having RANGE enabled and commenting out that execute_valid makes the long path go away20:42
juliusbit's not:?!20:42
juliusbI can't see where/how20:43
juliusbit only goes to the range exception signal20:44
stekernyeah, exactly20:44
stekernthat's what I'm speaking about20:44
stekernhttps://github.com/openrisc/mor1kx/blob/master/rtl/verilog/mor1kx_ctrl_cappuccino.v#L31420:46
juliusbohhh, AND with it?!20:46
stekernmmm20:46
stekernconnected to it with an AND =P20:47
stekernheh, maybe my way of saying that was confusing20:47
juliusbok... but20:48
juliusbI don't get it20:48
juliusbwe're reading it there20:48
stekernbut the question remains, why is it anded with the ctrl_overflow_set_i?20:48
juliusbi guess if execute valid is controlled by the LSU20:48
juliusbumm20:49
juliusbyes I don't know :)20:49
stekernctrl_overflow_set_i is in control stage and execute_valid_i is in execute20:49
juliusbmaybe it's something I forgot to remove from the espresso and pronto20:49
juliusbyes20:49
juliusbit's probably wrong then!20:49
juliusbwell20:49
juliusbti'20:49
juliusbit's from the execute_ctrl module20:49
juliusboverflow is registered in there20:49
stekernmmm, and the execute_valid is implied by the latching with padv_i20:50
juliusbahh, execute_valid is combinatorial20:53
juliusbnot registered in execute_ctrl_capp20:54
juliusbbut then the overflow indication will come a cycle later20:54
juliusbit's not like that in espressos20:54
stekernbut that's all cool, all exceptions are handled in ctrl stage20:54
juliusbok20:54
juliusbbut, we can then break the path can't we maybe?20:54
stekernyeah, to me it looks like that execute_valid_i could just be removed and it should work as intended20:55
juliusbI just picked a signal in cappuccino which I think was right20:55
juliusbyes I think so20:55
juliusbbecause we register it based on padv in the execute_ctrl stage20:56
juliusbs/stage/block/20:56
juliusblet me check this :)20:56
stekernyes, and that's basically how all exceptions are handled20:56
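The argument that the extra AND with execute_valid_i is redundant rests on the signal already being captured under padv. A simplified Python model of a register with a padv enable is sketched below; the class name `PadvRegister` and the input sequence are invented for illustration and this is not the mor1kx RTL.

```python
class PadvRegister:
    """Toy model of a flop that captures its input only when padv is high."""

    def __init__(self):
        self.q = 0

    def clock(self, d, padv):
        """Advance one clock edge and return the value visible after it."""
        if padv:
            self.q = d
        return self.q


reg = PadvRegister()
# (overflow_set, padv) per cycle: the stage stalls in the second cycle
# and advances again in the third.
inputs = [(0, 1), (1, 0), (1, 1), (0, 1)]
trace = [reg.clock(d, padv) for d, padv in inputs]
print(trace)  # [0, 0, 1, 0]
```

The registered value only changes on cycles where the pipeline advances, so the downstream stage sees the overflow one cycle later and only for an instruction that actually moved forward.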
stekernyou designed it that way ;)20:57
juliusbI suspect, though, we only want that range guy to pulse for a sec20:57
juliusbso it may get stuck20:57
stekernnaah, when the exception hits SR will be "overwritten"20:58
juliusbnope. worked fine20:58
juliusbor1k-ov passes anyway20:59
juliusbrunning or1k-ovcy21:00
juliusb(it takes a while)21:00
juliusbwill see how it goes21:00
juliusb(if the verilator model built for cappuccino it'd be a lot quicker :P)21:00
stekernor1k-cy didn't pass before at least21:00
juliusboh, or1k-cy shouldn't be affected by this21:00
stekernyeah, I'm gonna fix that, the problem with the dcache and wb-ack is what makes those loops there too (I think)21:01
juliusbodd that it didn't pass though21:01
juliusbok I just saw an overflow exception go through in ovcy, so I think this is behaving fine21:05
juliusbdo you want to check that in?21:05
stekernyeah, sure, if you're not up to it =P21:08
juliusbah yeah I'm just not exactly sat in front of the machine at present21:19
juliusbso before we forget, I reckon it's a good thing21:19
stekernok, consider it done21:23
stekernI'll queue it up among my millions of other commits21:24
stekernthe execute_valid is a bit tricky in cappuccino btw, because it can depend on two instructions, one in execute and one in ctrl stage.21:28
stekernso it should only go high when both of them have completed21:29
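The condition described in these two lines can be written down as a one-line predicate. This is only a sketch of the stated rule; the real signal is derived from several stall sources, and the function and argument names here are hypothetical.

```python
def execute_valid(execute_done: bool, ctrl_done: bool) -> bool:
    """High only when both the execute-stage and ctrl-stage instructions are done."""
    return execute_done and ctrl_done


# Print the full truth table for the two inputs.
for e in (False, True):
    for c in (False, True):
        print(e, c, execute_valid(e, c))
```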
juliusbah cool21:45
juliusbor1k-cy passes here man21:45
juliusbon capp21:45
juliusbim in front of a PC now21:45
stekernok, got to recheck that then22:25
stekernI'm pretty close to having queued up all the dmmu work now22:26
stekerns/dmmu/mmu22:26
stekernit's basically just dropping in the mmu stuff and the signals related to that left22:27
stekernup until now I've only merged "bugfixes" =P22:27
stekernincluding basically rewriting the fetch-stage (again)22:29
stekernwhy is that such a pain...22:30
stekern?22:30
juliusbhaha22:32
juliusbyeah, it's a fairly important bit of the thing22:33
stekernoh no... now I broke the debugexcept test again...22:33
juliusb:-/22:35
juliusbwell I wrote a test that breaks espresso :)22:37
juliusbhttps://github.com/juliusbaxter/mor1kx-dev-env/blob/master/sw/tests/or1k/sim/or1k-intmulticycle.S22:38
stekernbut that's more positive, that's finding already existing bugs22:38
stekernI'm just creating new ones...22:38
stekern:(22:38
juliusbhehe22:39
stekernat least my small incremental changes strategy is paying off, I found the cause for it within minutes :)22:39
juliusbyes, that is good22:39
juliusbi like your strategy actually22:40
juliusbdo the work in a big lump22:40
juliusbthen progressively verify as you check in22:40
juliusbI like writing these asynchronous-exceptions-while-we're-doing-some-work tests22:41
juliusbfeels like a real world application to me22:41
juliusbanother real-world application - FreeRTOS22:41
juliusbyou know it's the second most used RTOS in the industry, second only to in-house written ones22:41
juliusbI saw that on eetimes recently and was surprised22:42
juliusbso figured I should get a nice little demo app for OpenRISC together22:42
juliusband it's what I intend to demo at the chip hack22:42
juliusbthere's a port22:42
juliusbbut it's only ever been run on or1ksim22:42
juliusbso needs some massaging22:42
stekernat our company, uCos is by far the most used22:42
juliusbyeah uCos isn't that popular surprisingly22:42
juliusbhttp://eetimes.com/ContentEETimes/Images/news/20130227pcEmbeddedSurvey841.jpg22:43
juliusbsorry I was wrong22:43
juliusbandroid is second now22:43
juliusbbut that's for serious systems22:43
juliusbfor deeply embedded freertos surely gets used second most22:43
glowplugHow many run the RTOS on the FPGA though?  Normally the RTOS runs on a uC and communicates with an FPGA if necessary correct?22:43
juliusbglowplug: that's probably very true22:44
juliusbbut why not put it all on the FPGA? save yourself another IC22:44
juliusb(i'll give you some reasons, though - power, cost :/ )22:44
juliusbeCos is nothing, I'm surprised by that22:45
glowplugMaybe we are 5 years away from DIY 405nm ASICs that can run mor1kx.  Until then using OpenRISC for RTOS would be a $100+ FPGA board.  Companies are short sighted and will take the $5 i.mx233 instead.22:45
stekernyeah, if you only count RTOS it's FreeRTOS, VxWorks, ucOS22:45
juliusbmicrium lost 4% last year!22:45
juliusbThreadX I thought was massive, too, but it's not so big22:45
juliusbFreeRTOS rules them all :)22:45
glowplugThere are no cost savings.  ICs with PC power cost ~$5 now.22:45
juliusbwe use all home-grown stuff on our chips22:46
juliusbnothing commodity22:46
juliusbno real reason for it, just it was done that way in the beginning and they've continued with it22:46
juliusbbut, apparently that's not too uncommon22:46
glowplugThen you add a cheap FPGA to your design for $5 (versus $50).  That is what most companies will do (and as far as I know, are doing).22:46
stekernglowplug: not if they need to have an FPGA anyway22:46
juliusbi was surprised to see that, too. I knew nothing about the amount of use RTOSes get in industry until I saw that chart22:46
glowplugThe systems that I know of have a uC and FPGA on a single board.22:47
glowplugMaybe that is unique to motion control / communications.22:47
stekernglowplug: I actually had motion control/fieldbus systems in mind22:47
glowplugThe raw performance of OpenRISC on a $100 FPGA is roughly 10% that of a ~$5 32-bit uC.22:47
stekernthat's what I see most of at dayjob22:47
juliusbSure, but if you can have spare capacity on your FPGA, and can deal with a cut in the MIPS you'll get from running an RTOS on a softcore, then you can save yourself money by buying only a FPGA + RAM instead of FPGA + uC + RAM22:48
juliusbanyway22:48
juliusbthis is my dream for April22:48
juliusbFreeRTOS on mor1kx on de0 nano22:48
juliusbit shall be22:48
glowplugI agree with all of that except the money savings part.  I don't think 11k LE's would ever be spare space for cost sensitive operations.22:48
glowplugFor non cost sensitive operations they need the MIPS.  So its uC anyways.22:48
juliusbthe cost sensitive, it depends on what the cost of the FPGA to fit your soft-core is vs the cost for a uC + stuff it requires22:49
juliusbthe incremental cost, I mean, if you need to go one model up or something22:50
glowplugThere are many many factors.  Too many.22:50
glowplugBut in my opinion for a project to benefit from an OpenRISC core on an FPGA a lot of factors need to fall into place perfectly.22:50
stekernlook at this for example: http://industrial.softing.com/en/products/functionality/software/protocol-stacks-for-fpgas/offering-for-altera-fpgas/offering-for-altera-fpgas.html22:51
stekernin right quantities, large FPGAs can become _very_ cheap22:52
glowplugNIOS II softcores occupy ~1k LE's22:52
glowplug11 times less than ORPSoC.22:52
stekernbut Nios II is just the cpu22:52
stekernorpsoc is a whole soc22:53
glowplugThat is with peripherals.22:53
glowplugLet me see if I can find the PDF I had.22:53
glowplugwww.altera.com/literature/ds/ds_nios2_perf.pdf22:55
glowplugIt looks like you are right.  It says 1,230 LE's for the NIOS II core.  RAM controller is 3,100.22:56
glowplugNo wait.  Cyclone IV GX LE usage 360.22:57
glowplugI'm not really sure what the deal is with the Cyclone V LE usages.  But the Cyclone IV with RAM controller is ~$2100 LE's.22:58
glowplug*~210022:58
stekernno, it says 1810 for only the cpu with 512 byte cache and no hw multiplier22:58
stekernon a IV GX22:59
glowplugI agree that the NIOS II isn't comparable to OR1200.22:59
glowplugBut I think the systems that utilize NIOS II do so because the LE requirement is very low making the overall hardware cost very low.22:59
stekernno, they do so because altera has been successful with their tools around it + with their pr23:01
glowplugHaha23:01
glowplugThat may very well be the case.23:01
stekern1810 isn't "very low"23:01
stekernit's good, but it shouldn't be completely impossible to achieve23:02
glowplugIf that is achievable then I totally agree with the cost benefits.23:02
glowplugBecause such a small softcore is usable in ~$15 FPGA boards in volume.23:02
glowplugThe problem then becomes what you mentioned.  Tools and PR / advertising.23:03
stekernI know that at least cyclone III 25k devices become dirt cheap in volume23:03
stekernthere isn't really a connection between LEs and price when you go up in volume, there are certain devices that are cheap, and some are not23:04
stekernyou'd easily fit orpsoc on one of those ;)23:05
glowplugI plan on using mor1kx for my own motion control projects.  But that is because I see the immense importance of open / distributed design.  Large companies aren't as forward-looking.  =)23:05
stekernI agree23:05
glowplugThat is true at volume.  The problem though becomes obtaining that volume yourselves.  Because nobody else is going to financially jump before you guys do.23:06
glowplugSo to offer devices at those prices you probably have to already have them.23:06
glowplugAre you familiar with this company?23:08
glowplughttp://www.mesanet.com/fpgacardinfo.html23:08
stekernnope23:09
glowplugThey are extremely popular in motion control and industrial networking.23:09
glowplugAll of their products are impressive.  But most notably they offer softcore motor control that can be loaded onto the FPGAs and can drive eight motors simultaneously with encoder feedback.23:10
stekernnice23:11
glowplugMesa is probably who you would be competing with mostly.23:11
glowplugIf you decided to design such a device.23:11
stekernmost of the motion control I've seen is on the client side of fieldbuses23:12
stekernlike ethercat, powerlink (and the more exotic macro)23:13
glowplugThe Mesa devices can be client side.  And even communicate with EtherCAT.23:14
stekernwhat I meant was that I haven't been involved with the actual motor control, only interfacing with the fieldbus as a client23:19
glowplugOhh I see.23:19
glowplugThis is something that I would like to achieve myself.  Six axis bldc motor controller with closed loop encoder feedback on a single FPGA.  Mesa offers it but what fun is that?23:20
glowplugI think if you could offer such a thing at ~$150 they would sell.23:21
stekernI agree23:21
glowplugWell let's get on it then!  Haha23:21
glowplugMesa offers the IP for SoftDMC but obviously it's a proprietary license.23:22
glowplugIf I can build the Quandary FPGA board for ~$40 each and we can figure out the motion control IP then I think that such a product is possible.23:24
glowplugI found a site that will prototype the 6-layer boards for Quandary at $40 each with a $275 minimum order.  Cheaper than China.23:25
glowplugIn volume order the boards can be made for ~$3 each.  But the $40 boards can be used for prototyping.23:26
glowplugAt any rate when I get some boards made I will send you guys some after I verify that they actually work.  =)23:37
juliusbstekern: ah that bug in the new test wasn't a bug in the HW, it was a bug in the test23:44
