--- Log opened Fri Nov 14 00:00:27 2014 | ||
poke53282 | The last two hours I tried to figure out the speed benefit of dynamic recompilation in Javascript. In my example it is a factor of five in Firefox, but just a factor of two in Chrome and IE. Let's see. | 04:45 |
---|---|---|
stekern | poke53282: my boys are complaining about chrome being slow when they play some games, so they tend to use firefox for that | 06:58 |
stekern | I've currently strayed into a non-openrisc project... porting scummvm to windows phone 8.1 | 06:59 |
poke53282 | :) Yes. I think Chrome is only good for non-optimized code. The first few months when I started working on jor1k Chrome was always the fastest. For the past 1.5 years, this has no longer been the case. | 07:17 |
poke53282 | ScummVM doesn't exist for Windows Phone yet? | 07:17 |
poke53282 | I thought ScummVM was ported to every device. Like Doom :) | 07:18 |
stekern | nope, it's not ported. a big factor why, is that there's no SDL 1.2 port for windows phone | 07:51 |
stekern | there's a 2.0 one, but that's highly experimental, and since scummvm uses 1.2, it doesn't help much | 07:51 |
stekern | the port I'm working on doesn't use SDL at all, so it's quite a bit of work | 07:52 |
olofk | stekern: The ScummVM port for Sailfish is SDL2-based | 07:56 |
olofk | Sailfish runs Wayland | 07:56 |
olofk | So that's why they need SDL2 | 07:56 |
olofk | So there must be an SDL2-port of ScummVM available somewhere I guess | 07:57 |
olofk | If you want to cheat :) | 07:57 |
poke53282 | Maybe it is easier to port Dosbox | 08:06 |
poke53282 | The lesson I learned today: Optimizing the top 20 hot spots (~200 instructions) in the Linux kernel doesn't change anything. Instead, managing the heuristics to find those hot spots increases the loading time by 50% :( | 08:24 |
olofk | poke53282: I did an analysis of the most executed functions during linux boot a while ago | 08:26 |
poke53282 | At the moment I make heuristics from one jump instruction (including l.bf) to the next. A code block averages 6 instructions. But probably I have to optimize on the function level in the end. | 08:28 |
olofk | IIRC memcpy and memset were the biggest contributors. I did some experiments to optimize that, and by stealing the memset code from Microblaze, it dropped down from the top 10 | 08:28 |
olofk | I wonder if I still have the results somewhere | 08:30 |
poke53282 | yes, our memcpy is a bit ... let's say simple. | 08:31 |
olofk | We could say that it's optimized for code size :) | 08:31 |
olofk | Can you use DMA for memcpy easily under Linux btw? | 08:32 |
poke53282 | exactly :) | 08:32 |
olofk | I mean if you have a DMA component that you can use. We have one in verilog now | 08:32 |
poke53282 | Not sure. Does it make sense? Probably only for sizes > 1MB. | 08:32 |
olofk | True. I haven't got a clue how large the common memcpy is | 08:33 |
poke53282 | The overhead for < 10kB is probably too much. | 08:33 |
olofk | As long as you can let the CPU do other things in the meantime, I think that the overhead would be quite small. Setting up the DMA only requires a few instructions, but the code paths might be longer | 08:34 |
olofk | Do SDRAM and DDR2 use the same commands? I'm getting a bit interested in reusing the state machine from wb_sdram_ctrl in my DDR2 controller | 08:36 |
poke53282 | Interrupt handling, scheduling. All that is overhead. | 08:37 |
poke53282 | I don't have a clue about those memory chips. | 08:37 |
olofk | Yeah, it's probably more overhead than I first thought, but still interesting to know if the Linux kernel can be set up to use it for larger blocks | 08:38 |
poke53282 | By the way. At the moment my code tries to find hotspots and replaces those spots with some compiled Javascript lines. For example like this: http://pastie.org/9718518 | 08:38 |
poke53282 | http://stackoverflow.com/questions/25521422/dma-memcpy-operation-in-linux | 08:39 |
poke53282 | This might include the answer | 08:40 |
olofk | poke53282: What was the original instruction sequence that you replaced? | 08:44 |
olofk | And DMA for memcpy seems not worth the trouble, even though I think that our break-even block size would be much smaller than 1MB since we have smaller caches than an x86 | 08:45 |
poke53282 | http://pastie.org/9718539 | 08:45 |
poke53282 | I already found an error :) | 08:45 |
poke53282 | This is the work of around 5 hours. So still a lot to do. And tons of ways to optimize. But maybe we might get Descent working with a good framerate :) | 08:49 |
olofk | Woohoo!!! Keep working :) | 08:51 |
poke53282 | But I hope that the browsers will survive such torture. With the current approach I have to recompile 10000-20000 functions just to boot the kernel. | 08:51 |
poke53282 | blocks, not functions. | 08:52 |
poke53282 | But I have enough ideas to prevent such cases. | 08:52 |
poke53282 | Let's see | 08:52 |
poke53282 | mv poke /home/bed && sleep 28800 | 08:54 |
stekern | olofk: hmm, can't seem to find that sailfish port | 09:03 |
stekern | but it's probably still more viable to do a 'pure' port, since the existing wp SDL port is experimental | 09:04 |
stekern | and besides that, SDL wouldn't help me with what I'm currently busy with, file system operations | 09:12 |
olofk | stekern: Here's the scummvm port for Sailfish if you decide to switch over to SDL2 https://build.merproject.org/package/show?package=scummvm&project=nemo%3Adevel%3Aapps | 11:09 |
olofk | _florent_, ysionneau : I'm reading the DFI spec now, and I get the impression that only the Read and Write Interfaces need two phases, and that I can use a single phase for the Control Interface. Any thoughts? | 11:12 |
olofk | I'm thinking that signals such as address, ras, cas, we will be the same in both phases anyway | 11:14 |
_florent_ | hi olofk, in fact you have to: | 11:32 |
_florent_ | - choose wrcmdphase and wrphase according to your write latency | 11:32 |
_florent_ | - choose rdcmdphase and rdphase according to your read latency | 11:32 |
_florent_ | - PRECHARGE, ACTIVATE or REFRESH can be done on any phase | 11:32 |
_florent_ | (Hope I'm not mixing things between DDR, DDR2, DDR3...) | 11:33 |
olofk | ah ok. | 11:40 |
olofk | I see it now. From the spec : "The PHY must be able to accept a command on all phases to be DFI compliant. If the MC is only using certain phases, the | 11:41 |
olofk | PHY must be appropriately connected to properly interpret the command stream" | 11:41 |
-!- rah_ is now known as rah | 11:48 | |
stekern | olofk: so how do I get the sources from that? | 13:33 |
stekern | nevermind, I found it | 13:42 |
stekern | ok, it's based on some random commit and no commit history... | 14:06 |
stekern | I think I'll try to compile the SDL 2.0 port and build against that and see what happens | 14:07 |
--- Log closed Sat Nov 15 00:00:29 2014 |
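
The hot-spot recompilation poke53282 describes (blocks running from one jump instruction to the next, with the hot ones swapped for generated Javascript) could look roughly like the sketch below. It is not the jor1k code from the pastie links. The toy instruction set (`addi`, `add`, `bnez`, `halt`), the `HOT_THRESHOLD` value, and all function names are assumptions made up for illustration, and real OpenRISC decoding is left out entirely.

```javascript
// Toy ISA (hypothetical, not OpenRISC): addi, add, bnez (branch if non-zero), halt.
const HOT_THRESHOLD = 3; // assumed value; a real emulator would tune this

// A basic block ends at the first branch or halt at or after pc.
function blockEnd(program, pc) {
  let end = pc;
  while (program[end].op !== "bnez" && program[end].op !== "halt") end++;
  return end;
}

// Interpret a single instruction; returns the next pc, or -1 on halt.
function step(program, pc, r) {
  const ins = program[pc];
  switch (ins.op) {
    case "addi": r[ins.rd] = (r[ins.ra] + ins.imm) | 0; return pc + 1;
    case "add":  r[ins.rd] = (r[ins.ra] + r[ins.rb]) | 0; return pc + 1;
    case "bnez": return r[ins.ra] !== 0 ? ins.target : pc + 1;
    case "halt": return -1;
    default: throw new Error(`unknown op ${ins.op}`);
  }
}

// Translate one basic block into a Javascript function (r) => next pc.
function compileBlock(program, pc) {
  const end = blockEnd(program, pc);
  const lines = [];
  for (let i = pc; i <= end; i++) {
    const ins = program[i];
    switch (ins.op) {
      case "addi": lines.push(`r[${ins.rd}] = (r[${ins.ra}] + ${ins.imm}) | 0;`); break;
      case "add":  lines.push(`r[${ins.rd}] = (r[${ins.ra}] + r[${ins.rb}]) | 0;`); break;
      case "bnez": lines.push(`return r[${ins.ra}] !== 0 ? ${ins.target} : ${i + 1};`); break;
      case "halt": lines.push("return -1;"); break;
      default: throw new Error(`unknown op ${ins.op}`);
    }
  }
  return new Function("r", lines.join("\n"));
}

// Run the program, counting block entries and compiling blocks that get hot.
function run(program, r) {
  const counts = new Map();   // block start pc -> times entered
  const compiled = new Map(); // block start pc -> generated function
  let pc = 0;
  while (pc !== -1) {
    const fn = compiled.get(pc);
    if (fn) { pc = fn(r); continue; }
    const n = (counts.get(pc) || 0) + 1;
    counts.set(pc, n);
    if (n >= HOT_THRESHOLD) compiled.set(pc, compileBlock(program, pc));
    // Not compiled when this entry started: interpret up to the block's end.
    const end = blockEnd(program, pc);
    let cur = pc;
    for (;;) {
      const next = step(program, cur, r);
      if (cur === end) { pc = next; break; }
      cur = next;
    }
  }
  return { r, compiledBlocks: [...compiled.keys()] };
}

// Demo: sum 5+4+3+2+1 into r2, using r1 as the loop counter (r0 stays 0).
const demo = [
  { op: "addi", rd: 1, ra: 0, imm: 5 },
  { op: "addi", rd: 2, ra: 0, imm: 0 },
  { op: "add",  rd: 2, ra: 2, rb: 1 },   // loop body starts here (pc 2)
  { op: "addi", rd: 1, ra: 1, imm: -1 },
  { op: "bnez", ra: 1, target: 2 },
  { op: "halt" },
];
const result = run(demo, new Int32Array(4));
console.log(result.r[2], result.compiledBlocks);
```

In the demo only the loop block starting at pc 2 is entered often enough to be compiled, and `r2` ends up as 15. The per-block `Map` lookups on every block entry are the kind of bookkeeping cost poke53282 mentions: with blocks averaging about 6 instructions, the heuristics can easily cost more than the compiled code saves.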