Filter Forge - The DOs and DON'Ts of Filter Construction


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Very nice contribution, Sphinx. Thank you! :beer:

One never stops learning! :)

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 16, 2007 10:41 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

I'm wondering whether we should start setting up broader categories for the entries. We've already got several entries on 'Blur' and 'Noise' for example...

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 16, 2007 10:44 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Yeah, that might be a good idea.. I have one more entry on blur, high passes and sharpens brewing..

Btw. how about merging in the "Optimizing Filters for Speed" section?

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 16, 2007 10:54 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sphinx. wrote: Yeah, that might be a good idea.. I have one more entry on blur, high passes and sharpens brewing..

Alright, I'll reorganize the entries once that second entry of yours has landed. ;)

Quote
Sphinx. wrote: Btw. how about merging in the "Optimizing Filters for Speed" section?

Yup. That might be a good idea, since it's also about efficiency!

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 16, 2007 11:00 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Alright, I'll reorganize the entries once that second entry of yours has landed.

Having second thoughts on that entry - it's about using offsets to construct a low radius blur as an alternative to using the blur/highpass/sharpen component for low radius filtering (as shown, once we have blur, we also have High Pass and Sharpen).. but maybe I should just check that in as a snippet instead.. I think that is better.. so just go ahead with the reorganization :)

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 16, 2007 11:17 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Reorganized! :)

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 16, 2007 12:45 pm


Sjeiti

sock puppet

Posts: 722
Filters: 71

As Sphinxmorpher mentioned... it might be good to merge the 'Optimizing Filters for Speed' since all speed optimizations could be considered as DO's.
Then again it might be good to have a separate page with speed optimizations (you might not want to know about all the DON'Ts when you're just looking for speed).
There are probably lots of entries that would fall under multiple categories. For instance the Channels would also perfectly fit under Hints, tips and tricks as well as Optimizing Filters for Speed.
In such cases maybe the best practice would be to put it into the page it would best fit into and cross-reference to it from other pages. So maybe Channels in 'Optimizing Filters for Speed' would simply have a header and a link to the Channels entry in 'The DOs and DON'Ts of Filter Construction'.

Posted: December 16, 2007 2:17 pm


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

An interesting side-effect of collecting random blips of FF wisdom for writing this article appears to be that I'm checking out all my filters to see whether I could optimize them further. During this process I'm finding that - here and there - I haven't always exactly practised what I preached! ;)

Looks like I'll be updating some filters soon... :D

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 16, 2007 5:54 pm


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Sphinx. wrote: Well, I would like to hear an "official" statement from Vlad about the buffering method, because it essentially opposes the procedural sampling concept

Yes, we render a bitmap version of the source input, then blur it. There's a short article in the Help on this -- it is linked from help articles of Blur, HighPass etc.

Also, I believe I explained the sampling architecture in my earlier forum posts. Try a keyword search for 'sampling' and 'sample'.

Posted: December 17, 2007 9:27 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Sphinx. wrote: Seems like it's a generic mechanism: last coordinate and sample is cached at the output, and if next sample request has the same coordinate, it will return the cached sample, and implicitly if the coordinate doesn't match, a new sample will be calculated and saved as the new cached sample..

Yes, this is correct.

Posted: December 17, 2007 9:31 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Sphinx. wrote: I'll do some testing and reading

There's an option 'Show Elapsed Rendering Time' in Tools > Options -- use it to measure the rendering time.

Posted: December 17, 2007 9:33 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Vladimir Golovin wrote: Yes, we render a bitmap version of the source input, then blur it. There's a short article in the Help on this -- it is linked from help articles of Blur, HighPass etc.

Oh, it's not so much the functionality of the bitmap-based components I'm asking about - it's the way I utilize that rasterization process as a way of buffering the input and thereby reducing the rendering time. Look at this filter and the comments for it. If you don't fancy this approach, I don't want to promote it further :)

Quote
Vladimir Golovin wrote: There's an option 'Show Elapsed Rendering Time' in Tools > Options -- use it to measure the rendering time.

Ah.. great! I missed that option. Btw. I know this has been up for discussion before.. but some sort of rendering "report" would be really great - this report could describe where the performance culprits are - perhaps an option we can enable in filter construction mode, which then adds some sort of rendering time percentage to each module (including its subtree)...

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 17, 2007 9:59 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sphinx. wrote: but some sort of rendering "report" would be really great - this report could describe where the performance culprits are - perhaps an option we can enable in filter construction mode, which then adds some sort of rendering time percentage to each module (including its subtree)...

Did you check this?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 17, 2007 10:06 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Did you check this?

Yes, but the idea is different and more complicated there, and also the sample count may not be exact in expressing how fast a filter is - it's the calculation of the sample that matters..

It should be easier to sum up performance percentages from subtrees simply by timing the individual components. Let's say we have a simple filter with three equally fast Perlins connected to the surface result; then each component would have the timing labels:

result: 100%
perlin 1: 33.33%
perlin 2: 33.33%
perlin 3: 33.33%

or perhaps more realistic:

result: 100%
perlin 1: 30%
perlin 2: 30%
perlin 3: 30%

Which tells us that the result alone is responsible for 10% of the overall rendering time (100% minus 3 x 30%).
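The labelling idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Filter Forge code: the `inclusive` and `children` tables and the `self_time` helper are hypothetical names, and the numbers are the "more realistic" labels from the list above.

```python
# Hypothetical inclusive timings: share of total render time, subtree included.
inclusive = {"result": 100.0, "perlin 1": 30.0, "perlin 2": 30.0, "perlin 3": 30.0}

# Hypothetical wiring: which components feed directly into which.
children = {"result": ["perlin 1", "perlin 2", "perlin 3"]}


def self_time(name):
    """Time spent in the component itself, excluding its sources."""
    return inclusive[name] - sum(inclusive[c] for c in children.get(name, []))


for name in inclusive:
    print(f"{name}: inclusive {inclusive[name]:.1f}%, self {self_time(name):.1f}%")
```

A component whose own share is large compared to its sources would be the first place to look when optimizing.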

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 17, 2007 10:17 am


Sjeiti

sock puppet

Posts: 722
Filters: 71

Hey Crapadilla, Sphinx... maybe you could check this out...

Since that channels trick is pretty cool I thought I'd apply it in Rotated polaroid since that one uses three rotations (offset) that could be combined into one.

The weird thing is that my render time went up by ~175% after I applied the channels thing. So sometimes three offsets are faster than one offset plus some assemble/disassemble RGB.

Posted: December 18, 2007 2:53 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

hmm.. looking at that filter now.. are you talking about the cluster that creates the red and blue fringe artifacts?

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 18, 2007 4:13 am


Sjeiti

sock puppet

Posts: 722
Filters: 71

no wait... attached
in the original the actual rotation occurs 4 times
in this version (with help from the channels) it rotates twice

Rotated Polaroid VI.ffxml

Posted: December 18, 2007 4:28 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sjeiti wrote: The weird thing is that my render time went up by ~175% after I applied the channels thing. So sometimes three offsets are faster than one offset plus some assemble/disassemble RGB.

I have revisited some of my older filters and played around with the 'Channels' trick as well. Apparently it doesn't necessarily speed things up in all cases, so be careful! I'll have to experiment some more on this phenomenon.

It appears you'll have to take into account the speed of the component you wish to concatenate. Offsets are 'ultra-fast' while the Channel operations are just 'fast'. ;)

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 18, 2007 4:51 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

I also noted some cases where it's slowing things down.. I think we should add a note about that, i.e. that you have to check whether the method is actually doing any good - the cost of setting up the channel combining, the intermediate combined processing, and finally the post channel extraction must not be higher than the alternative uncombined solution

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 18, 2007 5:17 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

So, did anyone toil through all that wiki rambling so far? Any critique, comments, suggestions, wishes, hate-mail? ;)

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 18, 2007 6:00 pm


Kraellin

Kraellin

Posts: 12749
Filters: 99

you lost me at the second post, though i did go look at one of the wiki pages of yours, dilla. looked good, but i just go to sleep when you start talking technical. i still have no idea how a blur speeds things up, though i notice you've been revising quite a few filters lately, so, at least you learned something ;)

If wishes were horses... there'd be a whole lot of horse crap to clean up!

Craig

Posted: December 19, 2007 3:16 pm


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Kraellin wrote: you lost me at the second post

Which one?

Quote
Kraellin wrote: but i just go to sleep when you start talking technical

Well, I guess that can't be helped then. The wiki article is about efficiency considerations, which will always be technical in nature.

Actually, I was hoping the articles would convey a glimpse of the 'mindset' that one needs to develop as a filter author, an attention to certain details. It's always a balancing act between the technical on the one hand and the artistic on the other...

Quote
Kraellin wrote: i still have no idea how a blur speeds things up

Me neither, I'm afraid. ;)

Quote
Kraellin wrote: though i notice you've been revising quite a few filters lately, so, at least you learned something

Just optimizations mostly, but I will do a few complete overhauls here and there.

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 4:37 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Quote Kraellin wrote: i still have no idea how a blur speeds things up Me neither, I'm afraid.

You should read the description and comments for this filter then, and if the method is still unclear after reading through that, just ask (in the comment thread please) :)

The method will not be described in the DOs and DON'Ts before I know what Vlad/FF thinks about it.

Quote
Crapadilla wrote: Quote Sphinx. wrote: Seems like it's a generic mechanism: That is a question for Vlad or a programmer to answer. Sadly, there is not much info on FF's sample architecture available, so I'm just deducing how all this might work from the help file, Vlad's postings, etc. ...

Vlad has answered.. and it seems we need to revisit the stuff about the sample cache system. For example using two duplicated components (with same settings) can be faster than using multiple outbound connections from one, if one or more (but not all) of the incoming coordinates have been changed by offsets or distortions (remember that the coordinate flow is "backward", i.e. going in through the outputs, which then returns a sample based on this).

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 6:04 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Vladimir Golovin wrote: Sphinx. wrote: Seems like it's a generic mechanism: last coordinate and sample is cached at the output, and if next sample request has the same coordinate, it will return the cached sample, and implicitly if the coordinate doesn't match, a new sample will be calculated and saved as the new cached sample.. Yes, this is correct.

My question would be: If indeed the coordinate doesn't match, do the calculations for the new sample draw upon the sample cache to retrieve their data, or does the original component that the cache was derived from get reevaluated?

Quote
Sphinx. wrote: For example using two duplicated components (with same settings) can be faster than using multiple outbound connections from one, if one or more (but not all) of the incoming coordinates have been changed by offsets or distortions (remember that the coordinate flow is "backward", i.e. going in through the outputs, which then returns a sample based on this).

Sphinx,

I'll have to admit I fail to understand how this could be the case (and what you mean by "backward" for that matter). If you ask me, this would go against the logic of the sample caching system, at least the way I understand it.

An example: If you have a high-detail worley noise with six outgoing connections, the noise gets calculated once. All six subsequent operations on this noise would then draw upon one-and-the-same preprocessed sample cache, making things much speedier. However, if there were six duplicate worleys instead, it is logical that FF would need to prepare six noises instead of one. Consequently, I would dare speculate that multiple connections are always faster than duplicates.

Vlad,

would you shed some light on this?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 6:26 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Crapadilla wrote: My question would be: If indeed the coordinate doesn't match, do the calculations for the new sample draw upon the sample cache to retrieve their data, or does the original component that the cache was derived from get reevaluated?

Yes -- if the sample coordinates differ from the coordinates of the cached sample, the current cache is discarded, the component is evaluated, and the result of this evaluation is stored in the cache.

Quote
Crapadilla wrote: An example: If you have a high-detail worley noise with six outgoing connections, the noise gets calculated once. All six subsequent operations on this noise would then draw upon one-and-the-same preprocessed sample cache, making things much speedier.

Yes, this is correct as long as these six components request samples with the same coordinates.

Quote
Crapadilla wrote: However, if there were six duplicate worleys instead, it is logical that FF would need to prepare six noises instead of one.

Exactly.

Quote
Crapadilla wrote: Consequently, I would dare speculate that multiple connections are always faster than duplicates.

Not always. Consider the case you described, a Worley Noise with six outbound connections, where each connection is a distorter such as Offset or Noise Distortion with different parameters. This way, each of the six connections will request samples from the Worley at different coordinates, thus invalidating the sample cache.

Posted: December 20, 2007 7:16 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: An example: If you have a high-detail worley noise with six outgoing connections, the noise gets calculated once. All six subsequent operations on this noise would then draw upon one-and-the-same preprocessed sample cache, making things much speedier.(...)

This is correct when the coordinates received from the outbound connections are all the same, but if the coordinates are different, it will cause a new sample to be calculated, and as a generic behaviour the last calculated sample will always be stored in the cache - that's what Vlad confirmed.

Quote
Crapadilla wrote: I'll have to admit I fail to understand how this could be the case (and what you mean by "backward" for that matter). If you ask me, this would go against the logic of the sample caching system, at least the way I understand it.

Yes, you are right, it seems counterintuitive, but let me explain how I understand it:

Let's first get perfectly clear on the difference between sample flow and coordinate flow. Each module can, very simplified, be thought of as a function that produces a sample from a coordinate (x, y). In other words, a module takes a coordinate and gives a sample, in that order! We can then say that the sample flow goes from component to result, but the coordinate flow goes from result to component, because the coordinates are used to calculate the samples.
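A minimal Python sketch of this "coordinate in, sample out" view may help. It is not how Filter Forge is implemented; `perlin_stub`, `offset` and `result` are hypothetical stand-ins, and the noise function is just an arbitrary pure function of the coordinate.

```python
import math


def perlin_stub(x, y):
    # Stand-in for a noise component: any pure function of the coordinate.
    return math.sin(x * 12.9898 + y * 78.233)


def offset(source, dx, dy):
    # A coordinate-altering component: it shifts the incoming coordinate
    # and only then asks its source for a sample.
    def sample(x, y):
        return source(x + dx, y + dy)
    return sample


def result(source, width, height):
    # The result drives everything: it sends one coordinate per pixel
    # "backward" into the graph and collects the samples that come back.
    return [[source(x, y) for x in range(width)] for y in range(height)]


image = result(offset(perlin_stub, 1, 0), 2, 2)
```

Here the result asks for pixel (0, 0), the offset turns that into (1, 0), and only then does the noise function run, so the coordinate travels from result to source and the sample travels back.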

That FF uses a procedural sampling approach means that your filter will be executed for each pixel in the final image. If the image you are rendering is 2x2 pixels (and no AA enabled), the filter executes 2x2 times. For the first pixel, the result component will request a sample from the connected component, let's say a perlin noise, by feeding it the coordinate (0, 0). When the perlin noise has calculated the sample value, it stores the sample in a local variable/cache along with the coordinate that produced it.

Remember that this cache at all times holds only one sample: the last calculated. This goes on for the rest of the pixels, (0, 1), (1, 0) and (1, 1). So in a simple setup like this: [PERLIN]--->[RESULT], it is evident that the sample cache system can't improve anything, since the perlin only receives a given coordinate one time per pixel - we need several outbound connections for this system to do any good.

So if there are six outbound connections from the perlin to a multiblend, which then goes to the result, the actual sample is only calculated once in the perlin, because the following five requests from the multiblend match the coordinate associated with the perlin cached sample.
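The single-entry cache described here can be modelled roughly as follows. This is a sketch under assumptions, not FF's actual code: `CachedComponent` and `multiblend` are hypothetical names, the "noise" is an arbitrary function, and the blend simply averages its inputs.

```python
class CachedComponent:
    """A component that remembers only its last coordinate and sample."""

    def __init__(self, func):
        self.func = func
        self.cached_coord = None
        self.cached_sample = None
        self.evaluations = 0  # how often the real calculation ran

    def sample(self, x, y):
        if self.cached_coord == (x, y):
            return self.cached_sample  # cache hit: no recalculation
        self.evaluations += 1
        self.cached_sample = self.func(x, y)
        self.cached_coord = (x, y)
        return self.cached_sample


def multiblend(sources):
    # Hypothetical blend: requests one sample per input, same coordinate.
    def sample(x, y):
        values = [s.sample(x, y) for s in sources]
        return sum(values) / len(values)
    return sample


perlin = CachedComponent(lambda x, y: ((x * 31 + y * 17) % 7) / 7.0)
blend = multiblend([perlin] * 6)  # six outbound connections from one perlin

for y in range(2):
    for x in range(2):
        blend(x, y)

print(perlin.evaluations)  # 4: one calculation per pixel, not six
```

With 2x2 pixels and six connections there are 24 sample requests, but only the first request per pixel reaches the calculation.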

If we have an offset in between the result and the multiblend, which offsets by 1 horizontally, then we would have this change in coordinate flow:

[RESULT]--(0, 0)-->[OFFSET X + 1]--(1, 0)-->[MULTIBLEND]-->[PERLIN]

And this means that the perlin would cache the sample for the coordinate (1, 0) and not (0, 0). Again we have no problem if there are many connections between the perlin and multiblend, because all coordinates have been offset by the same value.

The problem starts showing if we move the offset in between the multiblend and perlin for only some of the connections, because then the perlin would receive a mix of the coordinate (0, 0) for those connections without offset, and (1, 0) for those going through the offset. How bad the cache flushing will be depends on the actual setup.. The example filter I posted earlier in the thread shows how to construct a bad case of constant sample cache flushing.
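To see why duplicates can win in this situation, here is a small Python model of one pixel. It rests on assumptions that are not confirmed for FF: the six connections are requested in alternating order (direct, offset, direct, ...), each offset connection is modelled as a plain coordinate shift without a cache of its own, and `CachedSource` and `render_pixel` are hypothetical names.

```python
class CachedSource:
    """Source with a single-entry cache; only counts real calculations."""

    def __init__(self):
        self.cached_coord = None
        self.evaluations = 0

    def sample(self, x, y):
        if self.cached_coord != (x, y):
            self.evaluations += 1      # cache flushed, new sample needed
            self.cached_coord = (x, y)
        return 0.5                     # the value itself is irrelevant here


def render_pixel(connections, x=0, y=0):
    # Each connection is (source, horizontal offset applied on the way in).
    for source, dx in connections:
        source.sample(x + dx, y)


# One shared perlin: direct and offset requests alternate.
shared = CachedSource()
render_pixel([(shared, 0), (shared, 1)] * 3)

# Two duplicates: one serves the direct connections, one the offset ones.
direct_copy, offset_copy = CachedSource(), CachedSource()
render_pixel([(direct_copy, 0), (offset_copy, 1)] * 3)

print(shared.evaluations)                                # 6
print(direct_copy.evaluations + offset_copy.evaluations)  # 2
```

In this worst case the shared source is recalculated on every request, while the two duplicates each calculate once. Whether a real filter behaves like this depends on the evaluation order, so measuring the render time remains the only reliable check.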

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 7:25 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Vladimir Golovin wrote: Not always. Consider the case you described, a Worley Noise with six outbound connections, where each connection is a distorter such as Offset or Noise Distortion with different parameters. This way, each of the six connections will request samples from the Worley at different coordinates, thus invalidating the sample cache.

So in that particular case, it would actually be irrelevant whether you had one worley with multiple connections or six duplicates. That's interesting.

Quote
Vladimir Golovin wrote: Yes, this is correct as long as these six components request samples with the same coordinates.

Very interesting! So which components apart from Offsets and Noise Distortions actually DO modify sample coordinates? I'm guessing pattern components with the jumble fill mode active, and maybe Kaleidoscopes?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 7:29 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Actually there is a solution that could free us from worrying about this at all - an improvement to the sample cache system:

Any component that changes coordinates should attach an ID number to its sources, making the sources spawn a new sample cache reserved for incoming requests with that ID :)

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 7:35 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Thanks Sphinx, that lit the light bulb. The whole thing is getting much clearer now... :D

Quote
Sphinx. wrote: The example filter I posted earlier in the thread shows how to construct a bad case of constant sample cache flushing.

Now I finally understand what exactly you were testing with that filter: Cache Flushing! :)

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 7:49 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Very interesting! So which components apart from Offsets and Noise Distortions actually DO modify sample coordinates? I'm guessing pattern components with the jumble fill mode active, and maybe Kaleidoscopes?

Yep, and as mentioned also Refraction. Also I think the bitmap-based components will change the source's cached samples, as the bitmap-based components buffer a large tile of samples, but it might not be as frequent though - I know too little about the actual blur rasterization to tell, but I think it's split up into small cells of samples for optimization reasons..

The elevation gradient will also change the coordinates for its gradient source, as only the top row is used from the gradient source..

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 7:56 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Thanks Sphinx, that lit the light bulb. The whole thing is getting much clearer now...

LOL! Looking back at my posts I can now see that I'm not as clear in explaining things as I'd like to think.. hehe - also my "danglish" could be a problem (I'm Danish and my English is way too influenced by Danish sentence constructions.. sigh)

Quote
Sphinx. wrote: Actually there is a solution that could free us from worrying about this at all - an improvement to the sample cache system: Any component that changes coordinates should attach an ID number to its sources, making the sources spawn a new sample cache reserved for incoming requests with that ID

Speaking of unclarity, let me just give this one another go ;)

As we are discussing now, the cache system seems to work best with unaltered coordinates, i.e. (0, 0) should stay (0, 0) throughout the flow for optimal chance of hitting a valid cached sample.

What I'm then proposing is that components that change the coordinate should also change some sort of CacheID that all their sources will receive along with the new coordinate. Using this ID the components can write the sample to another cache index.

This mechanism should effectively ensure that caches don't get flushed because one outbound connection is different than the others.
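A rough Python sketch of the CacheID proposal, purely as an illustration of the idea and not of anything FF does: `IndexedCacheSource` and `Offset` are hypothetical classes, each coordinate-changing component is assumed to carry its own fixed ID, and a dictionary stands in for the cache array.

```python
class IndexedCacheSource:
    """Source that keeps one cached coordinate per cache ID."""

    def __init__(self):
        self.slots = {}        # cache_id -> last coordinate seen for that ID
        self.evaluations = 0

    def sample(self, x, y, cache_id=0):
        if self.slots.get(cache_id) != (x, y):
            self.evaluations += 1          # miss in this ID's slot only
            self.slots[cache_id] = (x, y)
        return 0.5                         # the value itself is irrelevant


class Offset:
    """Coordinate-changing component that tags its requests with its own ID."""

    def __init__(self, source, dx, cache_id):
        self.source, self.dx, self.cache_id = source, dx, cache_id

    def sample(self, x, y, cache_id=0):
        # Shift the coordinate and switch the source to a separate cache slot.
        return self.source.sample(x + self.dx, y, self.cache_id)


perlin = IndexedCacheSource()
shifted = Offset(perlin, dx=1, cache_id=1)

# Alternate direct and offset requests for the pixel (0, 0).
for _ in range(3):
    perlin.sample(0, 0)      # uses slot 0
    shifted.sample(0, 0)     # uses slot 1 with coordinate (1, 0)

print(perlin.evaluations)  # 2 instead of 6 with a single shared slot
```

The comparison per request stays the same; only the slot being compared against changes.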

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 8:11 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Sphinx. wrote: also my "danglish" could be a problem (I'm Danish and my English is way too influenced by Danish sentence constructions

I don't think this is the case :)

Actually, when I visited Denmark, I was surprised by the fact that many people there speak English, and do this quite well -- especially young people.

Posted: December 20, 2007 8:36 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Question: In case of a component having multiple (map) inputs, can we predict the order in which these inputs will be evaluated? In the case of a Multiblend we'd have 14 map inputs, for example. Are these processed from the top input (i.e. "Layer 7") to the bottom input (i.e. "Opacity 1")? From what I could glean from Sphinx' sample cache test filter, I'm guessing this actually is the case.

Knowing this "order of processing" would allow us to predict when sample cache flushing would occur, wouldn't it?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 9:38 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Knowing this "order of processing" would allow us to predict when sample cache flushing would occur, wouldn't it?

Yeah, definitely - actually I'm experimenting with that just now, but it is getting really complicated, especially when you have "interbranch" connections (i.e. connections between two major branches/clusters of components) - predicting in which order things are executed is not very easy, and to make things even messier it seems that for some cases it doesn't matter which of two connections is executed first, and therefore the execution order is the order in which you actually connected "parallel" components.. ouch, my analysis skills start to dissolve here :|

.. Vlad.. how about some indexed sample caching like proposed ;)

Btw. another option for optimizing the coordinate "flow" would be to do a "test run" and for components with many outbound connections then sort the execution order so that sample requests belonging to a given coordinate are executed in sequence (it is probably more complicated than the indexed solution, but it should be slightly more memory and performance efficient). Here's the idea:

A component with several outbound connections, as it is now, could receive the following sequence of coordinates (from different offsets and so on) in a given branched filter construction:

(0, 0) - first sample
(10, 3) - cache flush/new sample
(0, 0) - cache flush/new sample
(10, 3) - cache flush/new sample
(7, 3) - cache flush/new sample
(0, 0) - cache flush/new sample

"recording" these changes in coordinate flow in a test run
would then allow the optimizer to sort the execution order so that we'd have the following order instead:

(0, 0) - first sample
(0, 0) - cache match
(0, 0) - cache match
(10, 3) - cache flush/new sample
(10, 3) - cache match
(7, 3) - cache flush/new sample

effectively reducing the cache flushes to a minimum :)
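The effect of such a reordering can be counted with a short Python sketch. It only models a single-entry cache fed with the two coordinate sequences listed above; `count_evaluations` and `group_by_first_seen` are hypothetical helpers, not part of any FF optimizer.

```python
def count_evaluations(requests):
    """Count how often a single-entry cache has to calculate a new sample."""
    cached = None
    evaluations = 0
    for coord in requests:
        if coord != cached:
            evaluations += 1   # first sample or cache flush
            cached = coord
    return evaluations


def group_by_first_seen(requests):
    """Reorder requests so equal coordinates are adjacent (stable grouping)."""
    first_seen = {}
    for coord in requests:
        first_seen.setdefault(coord, len(first_seen))
    return sorted(requests, key=lambda coord: first_seen[coord])


recorded = [(0, 0), (10, 3), (0, 0), (10, 3), (7, 3), (0, 0)]
reordered = group_by_first_seen(recorded)

print(count_evaluations(recorded))   # 6
print(reordered)
print(count_evaluations(reordered))  # 3
```

Three is the minimum for this sequence, since it contains three distinct coordinates.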

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 9:55 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sphinx. wrote: Ok check out this test filter.. [...] on my single core nutcracker, the duplicated chain is faster.. how does it perform on your CPU octopus?

Same here. And it shows drastic speed differences (28 sec VS. 49 sec).

Wow. If I get this right, it appears that - on the non-duplicated branch - both the Perlin Noise's and the Offset's sample caches get flushed alternately for each of the 28 Multiblend inputs, requiring each cache to be rebuilt 14 times!?!?

On the duplicated branch however, the two sample caches are created once and then stay valid for the whole evaluation... Very intriguing!

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 10:14 am


Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sphinx. wrote: Vlad.. how about some indexed sample caching like proposed ;)

... or how about a 'console' that logs sample cache building and flushing on a per-component basis?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 10:22 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Sphinx, the sample-based architecture already comes with inherent computational burden, which becomes especially apparent on simple filters -- in such cases the time spent on the sampling process is greater than the time spent on evaluation of the component's output. That's why we wanted to keep the sampling cache layer as thin and as fast as possible -- the current implementation has only two conditional jumps (one for each coordinate), and, in the majority of cases, only one jump is executed.

Posted: December 20, 2007 10:23 am


Vladimir Golovin
Administrator

Posts: 3446
Filters: 55

Quote
Crapadilla wrote: ... or how about a 'console' that logs sample cache building and flushing on a per-component basis?

This would require at least one conditional jump in the innermost cycle, which we definitely wouldn't like -- it will slow the rendering down for everyone, even for people who have the console turned off.

Posted: December 20, 2007 10:28 am


Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: Same here. And it shows drastic speed differences (28 sec VS. 49 sec).

Ok! that's pretty significant - here the numbers are 1:34 for the duplicated and 2:44 for the non-dup. So there is definitely something to gain by keeping this fact in mind when hooking up components in a complex system

Quote
Vladimir Golovin wrote: Sphinx, the sample-based architecture already comes with inherent computational burden, which becomes especially apparent on simple filters -- in such cases the time spent on the sampling process is greater than the time spent on evaluation of the component's output. That's why we wanted to keep the sampling cache layer as thin and as fast as possible -- the current implementation has only two conditional jumps (one for each coordinate), and, in the majority of cases, only one jump is executed.

Yep, I fully understand that. But the ID/indexed solution would not change the execution speed of the evaluation much (perhaps not at all), since the ID can simply be thought of as a pointer to where the cached coordinate should be fetched from when comparing.

Let's say the unchanged coordinates have cache index = 0 by default, i.e. the first cache item in the component's internal cache array. This index remains in use until we reach a coordinate-altering component (e.g. Offset). The Offset then increases the index to 1, so that all its connected sources use a new index when storing and comparing cached samples along the Offset's line of sample requests.

So as you can see, it's not some extreme "search through a list of cached samples" solution, but a simple one that just switches to a new memory location. The conditionals remain the same.
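
A rough Python sketch of the indexed cache proposal described above. Everything here is hypothetical: the Node, Noise and Offset classes, the fixed number of slots and the request order are assumptions made for illustration, and the proposal itself was never confirmed as something Filter Forge implements.

```python
class Node:
    """Hypothetical component with a small array of cache slots."""

    def __init__(self, src=None, slots=4):
        self.src = src
        self.keys = [None] * slots    # cached coordinates, one per slot
        self.values = [None] * slots
        self.evals = 0                # how often compute() actually ran

    def sample(self, x, y, idx=0):
        # idx only selects the slot; the comparison itself is unchanged.
        if self.keys[idx] != (x, y):
            self.values[idx] = self.compute(x, y, idx)
            self.keys[idx] = (x, y)
            self.evals += 1
        return self.values[idx]

    def compute(self, x, y, idx):
        return self.src.sample(x, y, idx)  # default: pass the request on


class Noise(Node):
    def compute(self, x, y, idx):
        return ((x * 31 + y * 17) % 256) / 255.0  # stand-in for Perlin


class Offset(Node):
    def __init__(self, src, dx, dy):
        super().__init__(src)
        self.dx, self.dy = dx, dy

    def compute(self, x, y, idx):
        # Coordinate-altering component: upstream requests use the next slot.
        return self.src.sample(x + self.dx, y + self.dy, idx + 1)


noise = Noise()
off_a, off_b = Offset(noise, 3, 0), Offset(noise, 3, 0)
for y in range(8):
    for x in range(8):
        off_a.sample(x, y)   # shifted request, lands in slot 1
        noise.sample(x, y)   # direct request, lands in slot 0
        off_b.sample(x, y)   # shifted again: slot 1 still holds it
        noise.sample(x, y)   # direct again: slot 0 still holds it
```

In this toy setup the noise is evaluated twice per pixel, because the shifted and unshifted requests no longer evict each other. The sketch ignores what happens when chained Offsets run past the last slot.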

Oh, by the way: ditch that test run proposal. I was too fast there; it's only in rare cases that the offset remains constant for the whole rendering smile;)

Oops..

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 10:42 am

Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Sphinx. wrote:
Quote
Crapadilla wrote: Same here. And it shows drastic speed differences (28 sec VS. 49 sec).

OK, that's pretty significant. Here the numbers are 1:34 for the duplicated branch and 2:44 for the non-duplicated one. So there is definitely something to gain by keeping this in mind when hooking up components in a complex system.

Inspired by this indexed sample cache proposal, I just made an interesting observation!
Try inserting an Invert component (disable invert) between the Perlin and the Offset in the non-duplicated branch smile;)

It rendered in 1:36, which is very close to the duplicated branch. So it seems that the Invert's sample cache takes over somehow. Can you confirm that it works?

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 10:56 am

Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Sphinx. wrote: Try inserting an Invert component (disable invert)

Both branches render in 28-29 sec now! smile:eek:

So, should we conclude this means that each and every component always maintains its own sample cache, regardless of the number of outgoing connections?

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 11:47 am

Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

Quote
Crapadilla wrote: So, should we conclude this means that each and every component always maintains its own sample cache, regardless of the number of outgoing connections?

Yeah, every component has its own output sample cache, but it holds only one sample: the last one rendered/requested. So if you have connections with different coordinates interleaved (like in my example filter), it causes constant recalculation (and cache flushing). By introducing a new intermediate component, like the Invert, you also add a new cache, very much like the indexed cache idea.

I think what is happening is that the Invert draws once from the Perlin, and from then on the Invert's own sample cache matches all the subsequent Offset requests, effectively minimizing the flushes. I think this is more elegant than the duplicated component solution smile:)

Since every component has its own cache, the "problem" is not more prominent than it is. Still, there are quite a few situations where you could spare a sample calculation here and there by throwing in an Invert (with "invert" unchecked) and then dragging the connections for a certain coordinate branch from that one.
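
The effect can be reproduced in a toy model. The Python sketch below assumes a filter layout that is not necessarily the example filter from this thread: two Offset components with identical settings read the same noise source, and their requests are interleaved with direct reads. The Node, Noise and Offset classes and the count_noise_evals function are hypothetical names; a plain Node stands in for the Invert with "invert" unchecked.

```python
class Node:
    """Hypothetical component caching only the last requested sample."""

    def __init__(self, src=None):
        self.src = src
        self.key = None
        self.value = None
        self.evals = 0  # how often compute() actually ran

    def sample(self, x, y):
        if self.key != (x, y):          # miss: recompute and overwrite
            self.value = self.compute(x, y)
            self.key = (x, y)
            self.evals += 1
        return self.value

    def compute(self, x, y):
        return self.src.sample(x, y)    # pass-through, like a disabled Invert


class Noise(Node):
    def compute(self, x, y):
        return ((x * 31 + y * 17) % 256) / 255.0  # stand-in for Perlin


class Offset(Node):
    def __init__(self, src, dx, dy):
        super().__init__(src)
        self.dx, self.dy = dx, dy

    def compute(self, x, y):
        return self.src.sample(x + self.dx, y + self.dy)


def count_noise_evals(use_pass_through, width=8, height=8):
    noise = Noise()
    # Optionally route the shifted branch through an extra cached node.
    branch = Node(noise) if use_pass_through else noise
    off_a, off_b = Offset(branch, 3, 0), Offset(branch, 3, 0)
    for y in range(height):
        for x in range(width):
            off_a.sample(x, y)   # shifted request
            noise.sample(x, y)   # direct request
            off_b.sample(x, y)   # shifted request again
            noise.sample(x, y)   # direct request again
    return noise.evals


without_invert = count_noise_evals(False)
with_invert = count_noise_evals(True)
```

In this model the pass-through node halves the number of noise evaluations, because the second shifted request is answered from the intermediate cache and never reaches the noise source.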

This is quite interesting, and I definitely have to look at some of my filters in this regard.
Another related thing: IIRC the height tree draws more samples per output sample than the other surface inputs. Is that right? If so, there could be some very significant optimizations from separating the connections to components shared between the height tree input and the other surface inputs. I haven't tested this, though.

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 12:33 pm

onyXMaster

Filter Forge, Inc.

Posts: 350

Sphinx, you're correct: the Height input in both the root component (in surface filter mode) and the Refraction component takes three input samples for one output sample.
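
As a rough illustration of why a height input might need three samples, here is a Python sketch that estimates a slope by forward differences. Which three samples Filter Forge actually takes is not stated in this thread; the layout (center, one step in x, one step in y), the step size and the names slope_from_height and CountingHeight are assumptions.

```python
def slope_from_height(height, x, y, step=1.0):
    """Estimate the slope at (x, y) from three height samples.

    The forward-difference layout is an assumption for this sketch.
    """
    h0 = height(x, y)            # sample 1: the point itself
    hx = height(x + step, y)     # sample 2: one step along x
    hy = height(x, y + step)     # sample 3: one step along y
    return (hx - h0) / step, (hy - h0) / step


class CountingHeight:
    """Hypothetical height source that counts how often it is sampled."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, y):
        self.calls += 1
        return 0.5 * x + 0.25 * y   # a simple tilted plane


height = CountingHeight()
slope = slope_from_height(height, 2.0, 3.0)
```

Each output sample costs three requests at three different coordinates here, so a component shared with another surface input would see its single-entry cache overwritten more often.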

Offtopic:
"...idx remains in use until we reach a coordinate altering component (e.g. offset)"

This rings a bell, Vlad, remember that dreaded is_distorter_present flag? smile:)

Posted: December 20, 2007 2:19 pm

Crapadilla

lvl 52 Filter Weaver and Official "Filter Forge Seer"

Posts: 4365
Filters: 65

Quote
Crapadilla wrote: So which components apart from Offsets and Noise Distortions actually DO modify sample coordinates? I'm guessing pattern components with the jumble fill mode active, and maybe Kaleidoscopes?

Quote

Sphinx. wrote:
Yep, and as mentioned, also Refraction. I also think the bitmap-based components will change the source's cached samples, as they buffer a large tile of samples, though it might not happen as frequently. I know too little about the actual blur rasterization to tell, but I think it's split up into small cells of samples for optimization reasons.

The elevation gradient will also change the coordinates for its gradient source, as only the top row is used from the gradient source..

Vlad, could you give us an official confirmation on this list?

* Offset
* Noise Distortion
* Bricks (jumble)
* Pavements (jumble)
* Tiles (jumble)
* Kaleidoscope
* Refraction
* Bitmap-based components (?)
* Elevation Gradient

--- Crapadilla says: "Damn you, stupid redundant feature requests!" ;)

Posted: December 20, 2007 4:47 pm

Sphinx.

Filter Optimizer

Posts: 1750
Filters: 39

There is one more, I think: Worley noises (Solid Fill Mode).

Also I'm really not sure about the elevation gradient - could be that the upper row of the gradient input is rasterized somehow..

Njyldgarkn init/prepare process x no. of render blocks x no. of passes

Posted: December 20, 2007 4:59 pm
