byRo
an Englishman in Brazil

Posts: 138
Filters: 8
Maybe Vladimir is going to find a way to justify this, but from where I'm sitting this looks very, very wrong. :evil:

We all know that Blur components aren't the speediest, but I have been noticing that, even when composing filters that don't use them, certain configurations just seem to grind to a complete halt!
I just made one using only the "faster" components (Tile, Blend, Offset), which I gave up on after an hour of calculating. The difference was that it had four stages in cascade.

Always one for a bit of investigative fun, I made a test filter (see below). This one does use blur, so I could time it properly.

I ran the filter with just one blur component, then added another in cascade, then finally a third (as in the image attached).

For one blur component the filter took 14 seconds to finish (yes, my machine is that slow).

When adding a second blur stage, it should take the output from the first and do a new blur, i.e. it should take another 14 seconds (or less). No, it took 50 seconds.

With the third stage, my simple filter theory says total time should be around 40 seconds - but no, this simple little filter took 235 seconds (just shy of 4 minutes).

These results indicate, to me, that the Blur output is not being cached at all - at a guess I'd say it's doing something like this:

1) Blur 1;
2) Blur 1; For Blur 2: do Blur 1 again, now do Blur 2;
3) Blur 1; For Blur 2: do Blur 1 again, now do Blur 2; For Blur 3: do Blur 2 again (For Blur 2: do Blur 1 again, now do Blur 2), now do Blur 3... and some more.
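Just to put my guess into something countable, here's a tiny back-of-the-envelope sketch in Python (purely my own guess at the behaviour, nothing to do with Filter Forge's actual internals), counting blur passes the same way as the list above:

# Toy model of my guess above -- hypothetical, not Filter Forge's real evaluator.
def uncached_passes(n_stages):
    """Blur passes executed when every stage rebuilds its inputs from scratch."""
    total = 0
    for stage in range(1, n_stages + 1):
        # producing stage N's output re-runs stages 1..N all over again
        total += stage
    return total

for n in (1, 2, 3):
    print(n, "stage(s):", n, "passes if cached,", uncached_passes(n), "if not")

Even this crude model grows faster than linearly, and my actual timings (14s, 50s, 235s) grow faster still, so each repeated pass probably also costs more than the first.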

It's going to be sad if the FFFolks say that's the way it's got to be, 'coz this will severely limit filter complexity.



_________________________________
My favourite question is "Why?".
My second favourite is "Why not?"
uberzev
not lyftzev

Posts: 1890
Filters: 36
Yeah, that's weird!
onyXMaster
Filter Forge, Inc.
Posts: 350
The short answer:
Try enabling seamless mode; all blur-based components are a whole lot faster that way and won't exhibit this "strange" behavior.

The long answer:
First, about caches -- a single Blur component has _three_ internal caches, so it's much more complex than it looks on the surface.

In seamless mode, everything is okay: we know the region bounds (which are effectively equal to the seamless wrapping region), so (if we're not talking about rotated motion blur) we can effectively lock everything inside a region that is known in advance.

Now, when seamless is disabled, things get a lot more complex.
Imagine you take a part of an image sized (W)75 x (H)75 pixels (real sizes for a 600x600 image). Imagine that the blur radius (R) is 10 and the image size is 600x600, with the Size slider set to 600.
Question -- how many samples of input data do you need to calculate that part? The most obvious answer (W*H = 75*75 = 5625) is completely wrong. The absolute minimum is (Size * R / 100 * 2 + W + 2) * (Size * R / 100 * 2 + W + 2), which equals (600 * 10 / 100 * 2 + 75 + 2) * (600 * 10 / 100 * 2 + 75 + 2) = 38809. So with the specified parameters we need (38809 / 5625 = 6.8993(7)) ~6.9 times as many input samples as there are pixels in the block. So _without_ a cache, each "layer" of blur of radius 10 on the default image would be _at least_ seven times slower (actually even more).

The practical minimum also has to take into account some internal implementation quirks which make non-seamless blur possible at all (don't forget that you can ask for blurred data that lies outside your image by using Offset or any other distorter, so we need to be able to create it on demand, along with caching) -- and the practical minimum is even bigger than you might imagine, roughly 14 times larger than the original image for the specified parameters.
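If you want to check the arithmetic yourself, the minimum-samples formula above is trivial to poke at -- here's a throwaway Python snippet (just the formula restated, not our actual renderer code):

# Throwaway arithmetic for the minimum-samples formula above
# (not renderer code; the block is assumed square, W = H).
def min_input_samples(block, size, radius_pct):
    span = size * radius_pct / 100 * 2 + block + 2
    return span * span

block, size, radius = 75, 600, 10
needed = min_input_samples(block, size, radius)
print(needed)                    # 38809.0
print(needed / (block * block))  # ~6.9x the block's own pixel count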

The good thing is that I spent the last week and a half optimizing cache cell grid alignment along with cell size calculation, which led to a dramatic (30-290%) decrease in excessive sampling while still being viable for heavily supersampled filters. Along with some SSE2-based optimizations done about a month before, some blur-based filters are seeing more than 3x rendering time improvements, and the larger the radius, the better the gains generally are (the percentage gain decreases, but absolute values steadily improve).

The conclusion: Non-seamless bitmap-based effects are very complex to implement efficiently. This is the reason that typical texture generators either don't have blur and the like at all, or have it implemented terribly slowly, in a brute-force way. Filter Forge uses a novel approach based on deferred calculation along with caching, which is also multithreading-capable (I don't know of any similar algorithm which supports multithreaded processing of the same image efficiently). While this approach allows us to perform non-seamless blurs at acceptable speed, it has its drawbacks, which are difficult to overcome, and while we (I, really) put a considerable amount of effort into improving the performance of bitmap-based components (specifically blur), there is no silver bullet for all cases of blur -- its performance is highly dependent on the performance of the underlying tree, the radius, the number of blur "layers", etc.

General performance recommendation: If you have a CPU slower than a P4 3 GHz or an AMD 3000+, buy a faster one (preferably multicore, they are cheap now). Gains from dual-core CPUs are really close to 2x, with 1.87x being the average for most filters. If you're low on memory (<512 MB), get another stick of RAM; going to 1 GB or even 2 GB will help.
onyXMaster
Filter Forge, Inc.
Posts: 350
And please, post the offending filters here. We're especially interested in those that do not use Blur, Motion Blur, Sharpen or High Pass.
byRo
an Englishman in Brazil

Posts: 138
Filters: 8
Quote
onyXMaster wrote:
The short answer: Try enabling seamless mode; all blur-based components are a whole lot faster that way and won't exhibit this "strange" behavior.
Aha! Short answer = quick fix. :D

Yes, that is a LOT quicker, but (there's always a but) it seems that enabling seamless is a "user-side" option and not a filter parameter.
In other words, if (just as an example :| ) I make a three-stage blur filter, there is no way I can force its use only in seamless mode - it would require "user" intervention to select the option.

..or am I missing something?

Rô
(I'm still digesting the long answer 8) )
_________________________________
My favourite question is "Why?".
My second favourite is "Why not?"
byRo
an Englishman in Brazil

Posts: 138
Filters: 8
About the long answer:

I can see from your explanation (and in practice :D ) that the "seamless" blur is a lot quicker.
What I still don't quite get is why the execution time builds up (exponentially?) with each stage.

If I get this right, you are saying that the final image is calculated one "tile" at a time and is not cached as a whole. So when we do a blur we are fetching "pixels" from outside the available tile cache.

If that is true, then it would explain why the attached filter brings my machine to a halt. :cry:

The filter mixes up samples from all over the input image - in this case to detect a "white point". So for every output tile (pixel?!) the whole filter has to be recalculated for the entire image.

Rô

Mix.ffxml
_________________________________
My favourite question is "Why?".
My second favourite is "Why not?"
onyXMaster
Filter Forge, Inc.
Posts: 350
The final image is calculated one "tile" (block is the correct internal term) at a time and is stored in a cache cell. There is no "whole" in the non-seamless case; all the tiles that are needed are created, stored and destroyed (based on an LRU cache) automatically, so you only have something close to the "whole" after the image has been rendered.

The execution time builds up because if the first blur (600x600 image, radius 10) needs 600*600*k input pixels, where k > 1 and is close to 12, the second blur needs 600*600*(k+k2) pixels, where k2 is also close to 12, and so on. I have already worked on reducing the generic "k" value for the most common cases, and this leads to significant improvements, but it cannot be brought below 7 for radius 10. To calculate a blur you need more pixels than you can see in the output (that's obvious). To calculate a blur over a blur, you need even more, and so on...
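To put rough numbers on that build-up, here is a quick Python sketch (the k values are the approximate ones quoted above, not measured figures from the renderer):

# Rough model of the per-layer sample build-up described above
# (k values are approximations, not measured renderer figures).
IMAGE_PIXELS = 600 * 600
K = 12   # ~12x input samples per blur layer at radius 10

total_k = 0
for layer in (1, 2, 3):
    total_k += K   # k, then k + k2, then k + k2 + k3, ...
    print(layer, "blur layer(s): ~", IMAGE_PIXELS * total_k, "input samples")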

Also, I'll look into the problem in the provided filter. :)
onyXMaster
Filter Forge, Inc.
Posts: 350
Well, I took a quick look at the filter and I see your point -- my home machine, while being very far from "slow" (Athlon 64 X2 3800+, 2 GB RAM), is crawling on this filter.

The next release will contain (already implemented) improved handling of large component trees, which will make editing such filters less painful -- it will switch to such filters much faster, enter the editor a lot faster and even render them faster.

Unfortunately, since I don't have fresh sources right here (I'm at home) and the source control server is inaccessible to me right now, I'm not declaring the problem solved just yet; I will try to determine the source of the slowdown when I get to the office and have some time to spend on this (approximately next Wednesday).
byRo
an Englishman in Brazil

Posts: 138
Filters: 8
onyXMaster, thank you for taking the time to share such interesting replies.

Can't wait for the next release. :D

Rô
_________________________________
My favourite question is "Why?".
My second favourite is "Why not?"
