# Fractals/Computer graphic techniques/2D/optimisation

"Your current attempt runs in about O(N^2). So even if you had enough memory, it won't finish in your lifetime." Alexander Yee^{[1]}

"premature optimization is the root of all evil" - Donald Knuth in paper "Structured Programming With Go To Statements"

"good algorithms beat optimized code" Bruce Dawson ( Fractal extreme )

# Software optimisation edit

- Profiling code
- benchmarking
^{[2]}

# How to ask about optimisation ? edit

- "If you've actually profiled the code, have specific snippets so that everyone can run the same code to see its performance, and you have this library published somewhere where it can be seen (GitHub, Bitbucket, etc.), then I don't necessarily see a problem with it going on Code Review. Including the results from your profiler of choice with identified bottlenecks would go a long way towards keeping it on-topic.
- If you're just starting the code but have profiled a small amount of code that exhibits the aberrant performance, then asking it on Stack Overflow would be acceptable. Including the results from your profiler of choice with identified bottlenecks would go a long way towards keeping it on-topic.
- If you're asking for people to just help you optimize the code, don't ask it anywhere. Getting a code dump and being asked to find the performance bottleneck is only something I do professionally, not on my free time." (Makoto)
^{[3]}

See also : code review^{[4]}

## Inner loop edit

- inner loop in wikipedia
^{[5]}

## Optimisation of numerical computations edit

### number type edit

- double is faster then float
- "In my tests (with perturbation) in Fraktaler-3, x87 long double is about 4.2x slower than double on AMD 2700X CPU. floatexp (float + int32) on AMD RX 580 GPU is about 2.3x faster than x87 long double on AMD 2700X CPU, so depending on hardware there might not be any point to x87 long double..."fractalforums.org: from-java-to-cplusplus

### Distance edit

- Distance approximation
- Koebe 1/4 theorem
^{[6]}

### Decomposition edit

- quadtree decomposition

"I did a web-based quad-tree Mandelbrot once.^{[7]}I labelled the quadrants (eg a,b,c,d) the tile files were in directories (something like a.png b.png a/a.png a/b.png a/c/a/b.png), along with an sqlite database that said which tiles were boring (all same colour), along with their period (0 for 100% exterior, 1 for 100% inside the main cardioid, 2 for 100% inside the main circle, etc). The client was really simple, just a web page with tile paths; the server would do a redirect to the relevant boring tile 0.png 1.png 2.png etc if any parent of the tile was boring, otherwise it'd try to serve the tile.png. Meanwhile there was another process that just generated the tiles and populated the database. I can't remember if I made 404 tile not found errors bump the priority of the tile for the renderer, but maybe. Eventually i abandoned the project because recalculating from scratch is typically faster and more flexible for shallow zooms, and for deep zooms the storage is prohibitive." Claude Heiland-Allen^{[8]}

## Symmetry edit

### Mirror symmetry edit

Parameter plane and the Mandelbrot set is symmetric with respect to the x-axis in the plane :^{[9]}

colour(x,y) = colour(x,-y)

Here is a C code that shows how to divide symmetric image (with axis in the middle of its height) into 3 parts :

- axis of symmetry
- 2 symmetric parts : above and below axis

```
// fill array using mirror symmetry of image
// uses global var : ...
int FillArray(unsigned char data[] )
{
unsigned char Color; // gray from 0 to 255
// draw axis of symmetry
iy = iyAxisOfSymmetry;
for(ix=ixMin;ix<=ixMax;++ix) PlotPoint(ix, iy, GiveColor(ix, iy));
// draw symmetric parts : above and below axis
for(iyAbove = iyAboveMin; iyAbove<=iyAboveMax; ++iyAbove)
for(ix=ixMin; ix<=ixMax; ++ix)
{ // above axis compute color and save it to the array
iy = iyAxisOfSymmetry + iyAbove;
Color = GiveColor(ix, iy);
PlotPoint(ix, iy, Color );
// below the axis only copy Color the same as above without computing it
PlotPoint(ix, iyAxisOfSymmetry - iyAbove , Color );
}
return 0;
}
```

See also Pascal code ( Delphi) for general case ^{[10]}

### Rotational symmetry edit

Dynamical plane and the Julia set have rotational symmetry. It can be used to speed up computing :

```
// fill array using symmetry of image
// uses global var : ...
int FillArraySymmetric(unsigned char data[] )
{
unsigned char Color; // gray from 0 to 255
printf("axis of symmetry \n");
iy = iyAxisOfSymmetry;
for(ix=ixMin;ix<=ixMax;++ix) PlotPoint(ix, iy, GiveColor(ix, iy));
// above and below axis
for(iyAbove = iyAboveMin; iyAbove<=iyAboveMax; ++iyAbove)
{printf(" %d from %d\n", iyAbove, iyAboveMax); //info
for(ix=ixMin; ix<=ixMax; ++ix)
{ // above axis compute color and save it to the array
iy = iyAxisOfSymmetry + iyAbove;
Color = GiveColor(ix, iy);
PlotPoint(ix, iy, Color );
// below the axis only copy Color the same as above without computing it
PlotPoint(ixMax-ix, iyAxisOfSymmetry - iyAbove , Color );
}
}
return 0;
}
```

## Complex numbers edit

Computations without explicit definition of complex numbers or without complex type are usually faster.

## Parallel computing edit

In Mandelbrot and Julia sets each point/pixel is calculated independently, so it can be easly divided into an smaller tasks and carried out simultaneously.^{[11]}

Types of parallel computing : Wikipedia description^{[12]}

- Multi-core (computing)
- Parallella and OpenCl
^{[13]} - Intel MIC
^{[14]} - MPI
^{[15]} - OpenMP
^{[16]}^{[17]}^{[18]}

- Parallella and OpenCl
- General-purpose computing on graphics processing units (GPGPU):
- OpenCl
^{[19]} - CUDA
^{[20]} - WebGl
^{[21]}^{[22]}

- OpenCl

### Vectorisation edit

#### C edit

- gcc vector extension
^{[23]}^{[24]}^{[25]}^{[26]} - Intel® Advanced Vector Extensions
^{[27]}^{[28]}

#### c++ edit

- Mandelbrot in Eigen - Explicit vectorization

#### R edit

Vectorisation ^{[29]}^{[30]}

Here is R code of above image with comments:^{[31]}

## based on the R code by Jarek Tuszynski ## http://commons.wikimedia.org/wiki/File:Mandelbrot_Creation_Animation_%28800x600%29.gif iMax=20; # number of frames and iterations ## world coordinate ( float ) cxmin = -2.2; cxmax = 1.0; cymax = 1.2; cymin = -1.2; ## screen coordinate ( integer) dx=800; dy=600 ; # define grid size ## sequences of values = rows and columns cxseq = seq(cxmin, cxmax, length.out=dx); cyseq = seq(cymin, cymax, length.out=dy); ## sequences of values = rows * columns cxrseq = rep(cxseq, each = dy); cyrseq = rep(cyseq, dx); C = complex( real=cxrseq, imag = cyrseq); # complex vector C = matrix(C,dy,dx) # convert from vector to matrix # Now C is a matrix of c points coordinates (cx,cy) # allocate memory for all the frames F = array(0, dim = c(dy,dx,iMax)) # dy*dx*iMax array filled with zeros Z = 0 # initialize Z to zero for (i in 1:iMax) # perform iMax iterations { Z = Z^2+C # iteration; uses n point to compute n+1 point F[,,i] = exp(-abs(Z)) # an omitted index is used to represent an entire column or row # store magnitude of the complex number, scale it using an exponential function to emphasize fine detail, } # color and save F array to the file library(caTools) # load library with write.gif function # mapped to color palette jetColors . Dark red is a very low number, dark blue is a very high number jetColors = colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", "yellow", "#FF7F00", "red", "#7F0000")) write.gif(F, "Mandelbrot.gif", col=jetColors, delay=100, transparent=0)

### Multicore edit

"I always create as many worker threads as I have cores. More than that and your system spends too much time task switching" - Duncan C^{[32]}^{[33]}

Using OpenMP :

```
// fill array using symmetry of image
// uses global var : ...
int FillArraySymmetric(unsigned char data[] )
{
unsigned char Color; // gray from 0 to 255
printf("axis of symmetry \n");
iy = iyAxisOfSymmetry;
#pragma omp parallel for schedule(dynamic) private(ix,Color) shared(ixMin,ixMax, iyAxisOfSymmetry)
for(ix=ixMin;ix<=ixMax;++ix) PlotPoint(ix, iy, GiveColor(ix, iy));
/*
The use of ‘shared(variable, variable2) specifies that these variables should be shared among all the threads.
The use of ‘private(variable, variable2)’ specifies that these variables should have a separate instance in each thread.
*/
#pragma omp parallel for schedule(dynamic) private(iyAbove,ix,iy,Color) shared(iyAboveMin, iyAboveMax,ixMin,ixMax, iyAxisOfSymmetry)
// above and below axis
for(iyAbove = iyAboveMin; iyAbove<=iyAboveMax; ++iyAbove)
{printf(" %d from %d\n", iyAbove, iyAboveMax); //info
for(ix=ixMin; ix<=ixMax; ++ix)
{ // above axis compute color and save it to the array
iy = iyAxisOfSymmetry + iyAbove;
Color = GiveColor(ix, iy);
PlotPoint(ix, iy, Color );
// below the axis only copy Color the same as above without computing it
PlotPoint(ixMax-ix, iyAxisOfSymmetry - iyAbove , Color );
}
}
return 0;
}
```

#### Cell edit

"The Sony PlayStation 3 is one of the cheapest parallel computers available on the consumer market."^{[34]}^{[35]}

#### GPGPU edit

- CUDA
- Yellow Dog Linux for CUDA
^{[36]}

- Yellow Dog Linux for CUDA
- par4all
^{[37]}

- OpenCl
- GLSL - Tools for GLSL editing
- Fragmentarium : playground for working with GPU (GLSL) based pixel graphics built in C++, OpenGL/GLSL, and Qt 4.
- shader editor using kickjs
- GLSL Sandbox by Ricardo Cabello Miguel using three.js
- jsfiddle and example
- WebGl : GLSL thru html. It is a 3D graphics API for JavaScript based on the OpenGL ES 2.0 API, that developers can use to create fully 3D web apps.
- shadertoy by Inigo Quilez, Pol Jeremias : a browser based Fragment Shader editor for 2D effects with it.
- Chrome Experiments
- WebGL playground - new is not working !!
- examples :

Canvas : list of supporting browsers

## Effects edit

Effects on 2 core CPU ( with 1 thread per core )

Method | time [minutes] | relative speed |
---|---|---|

CPU & no optimisation | 80 | 1 |

CPU & symmetry | 39 | 2 |

CPU & symmetry & OpenMP | 24 | 3 |

GPU | ? ( to do ) |

GPU should be:

- 62 time faster than CPU
^{[38]} - 760 times faster
^{[39]}

Perturbation algorithm with series approximation ( 100 times faster on zoom levels around e100! ) ^{[40]}

Using WinXP Intel 2.9 GHz CPU (1 CPU used) with a GTX 480 GPU I get the following using 1000x1000 plot with 1000 max iterations :^{[41]}

type | time | description |
---|---|---|

gpu | 0.07s | gpu is a pure CUDA solution on the GPU |

gpuarray | 3.4s | gpuarray uses a numpy-like CUDA wrapper in Python on the GPU |

numpy | 43.4s | numpy is a pure Numpy (C-based) solution on the CPU |

python (serial) | 1605.6s | python is a pure Python solution on the CPU with numpy arrays |

# References edit

- ↑ stackoverflow question: what-is-the-fastest-way-to-calculate-e-to-2-trillion-digits
- ↑ The Computer Language Benchmarks Game
- ↑ Stack overflow meta : Is it okay to ask code optimization help?
- ↑ stack exchanfge : codereview
- ↑ Inner loop in wikipedia
- ↑ Adaptive super-sampling using distance estimate by Claude Heiland-Allen
- ↑ Distance estimation by Claude Heiland-Allen
- ↑ fractalforums.org: creating-super-resolution-fractal-images-really-fast-with-quad-trees
- ↑ wikipedia : Reflection symmetry
- ↑ Mirror symmetry around x axis at fraktal.republika.pl
- ↑ embarrassingly parallel problem
- ↑ wikipedia : Parallel_computers - Classes_of_parallel_computers
- ↑ Mandelbrot using OpenCl and Parallella
- ↑ wikipedia : Intel MIC
- ↑ parallel mandelbrot set (C Code with Message Passing Interface (MPI) library) by Omar U. Florez
- ↑ Guide into OpenMP: Easy multithreading programming for C++ by Joel Yliluoma
- ↑ Parallel Mandelbrot with OpenMP by dcfrogle
- ↑ claudiusmaximus : exponential mapping and openmp
- ↑ Part 1: OpenCL™ – Portable Parallelism By manythreads
- ↑ A Mandelbrot Set on the GPU in Matlab by Loren Shure
- ↑ progressive-julia-fractal using webgl by Felix Woitzel
- ↑ glsl sandbox at heroku.com
- ↑ gcc : Vector Extensions
- ↑ First code example using gcc vector support by bert hubert
- ↑ Mandelbrot calculation using SIMD by EJRH Edmund Horner
- ↑ The Computer Language Benchmarks Game
- ↑ Intel Advanced Vector Extensions
- ↑ ICC and Mandelbrot by Tim Horton
- ↑ Vectorization (parallel computing)
- ↑ Matlab - Code Vectorization Guide
- ↑ mandelbrot set in r by J.D. Long
- ↑ /programming/what-is-the-general-approach-to-threading-for-2d-plotting/ Fractal forums discussion : What is the general approach to threading for 2d plotting
- ↑ fractalforums discussion : most-powerful-computer-possible-for-a-reasonable-price/?PHPSESSID=7db41da33f499eb25ef85f46866438f2
- ↑ Algorithms for Specific Hardware -Mathematical Image Analysis Group Saarland University, Germany
- ↑ Jenks Parallel Programming Blog
- ↑ Programming high-performance applications on the Cell BE processor, Part 1: An introduction to Linux on the PLAYSTATION 3
- ↑ par4all
- ↑ MandelCPU vs MandelGPU Written by David Bucciarelli
- ↑ mathworks : llustrating-three-approaches-to-gpu-computing-the-mandelbrot-set
- ↑ Kalles Fraktaler 2 by Bernard Geiger
- ↑ Mandelbrot calculate using GPU, Serial numpy and faster numpy Use to show the speed difference between CPU and GPU calculations by Ian Ozsvald ianozsvald.com July 2010