[Lumiera] Lumiera threads (was: parallel2012 Conference in Karlsruhe)

Ichthyostega prg at ichthyostega.de
Sun Dec 25 04:06:53 CET 2011

On 23.12.2011 15:05, Benny Lyons wrote:
> I was wondering how Lumiera implements threads.  Unavoidable for a project 
> like Lumiera.  That much overused euphemism 'scalability' often means 'get 
> your threads right, otherwise the application will be a CPU hog; or won't 
> properly run at all!'

Hi Benny,

for Lumiera we intend to do pretty much the obvious thing, given the current
hardware and expected further development: for the stuff where performance
really matters, we create small quasi atomic jobs which get scheduled, using
a thread pool of a size in accordance with the real possibilities for
parallelism on the given system (# of cores). The ultimate goal is that running
jobs should never block. Thus there will be a second kind of pseudo-job, which
cares for preparing the IO, so the actually working jobs become activated only
when the input data is already mapped into memory.

One of the absolutely fundamental ideas of Lumiera is that the system shall
adjust and throttle its load, especially the IO bandwidth, to create an optimal
load, while not driving the system into "saturation".

> I haven't really seen any concurrency stuff in the source: although I haven't
> looked closely enough.  Has such stuff been implemented per thread basis; or
> is there an API?

A basic thread pool has been in place for quite some time. Michael Ploujnikov did a
huge proportion of that work. Christian started working on the scheduler shortly
before the "woodwork interruption". We've already integrated the conventional
pthreads (used by the GUI and at places in the session) as a special case into
that "lumiera threads" framework. But we don't have a scheduler interface yet (I
am currently working towards a high-level interface on my side). Surprisingly
enough, this hasn't been much of a problem, since for me it turned out that the
major challenge is to translate a quite elaborate "high-level-model" (oriented
towards the requirements of professional editing) into a graph of simple nodes,
which can be processed easily through such quasi atomic jobs.

My starting point was that several of the practical impediments we suffer
from when using Cinelerra for professional work are due to Cinelerra's session
model being coupled too tightly to the engine. Thus Lumiera takes the opposite
approach: trying to separate both, so the session can be optimised for human
workflow, and the engine for efficient processing.

> I've very recently been looking into OpenMP...

Libraries of that kind are very frequently proposed for use by Lumiera
(similarly, stuff for using the GPU). This is almost a FAQ.

There is one huge problem with such libraries, which people tend to overlook.
Lumiera isn't intended to run in some research department to do number crunching.
It's not so much a technical, but a social problem: The existing plug-ins for
processing video and audio with all the surrounding ecosystems. We're perfectly
aware that it's completely out of reach for a project like Lumiera just to
define a new shiny plug-in standard. We're roughly 10 years too late for that.
There is ffmpeg, ladspa, gstreamer, mplayer and vlc, plus several more.

Thus we rather focus on getting the typical execution situation of such an
already existing plug-in into line with our threadpool architecture: We will
care for preparing the input data, and then feed the already filled input
buffers to the external processing function of the plug-in running embedded into
a "job". Of course, supporting a (very popular) framework like gstreamer
(I fear we can't avoid that) will create a lot of headaches with this approach,
while something like Gavl / Gmerlin can be expected to play quite well with our

It is likely that we'll write a very small number of absolutely basic stuff
ourselves (like a video fader, video overlayer, mask, colour adjust tool or
sound fader). But for anything beyond that, we should rather use the existing
code and expertise.

The combination of these two design decisions explains why such a great deal of
work goes into measures to get the complexity under control and keep all of
that metadata processing out of the actual engine. When the playback or render
is started, the scheduler needs to be fed just with simple jobs, and we have
to care beforehand to get the right number of channels to the right number of
buffers, that the buffer sizes match, that each job uses the right processing
function from some plug-in and so on. I wouldn't be surprised if, for the final
app, 80% of the source base is either related to GUI and workflow or does just
metadata processing and transforming, with only a small performance critical
core for the actual processing. But if we succeed with getting that complexity
under control and build that architecture, then we'll be able just to integrate
stuff like GPU based processing as if it was just yet another external library
and plug-in system.
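
As a rough illustration of "caring beforehand", the following C++ sketch checks channel counts and buffer sizes once, before rendering, and hands the scheduler a job that only calls a processing function on pre-wired buffers. All names (ProcessFn, NodeSpec, RenderJob, prepareJob, halfGain) are hypothetical, and the function-pointer signature is an assumption standing in for whatever a real plug-in API exposes.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Assumed shape of an external processing function (hypothetical signature).
using ProcessFn = void (*)(const float* const* in, float* const* out,
                           std::size_t channels, std::size_t samples);

struct Buffer { std::vector<float> data; };  // one channel of samples

// Metadata describing a node; known before playback or render starts.
struct NodeSpec {
    ProcessFn process;
    std::size_t channels;
    std::size_t samples;
};

// What the scheduler gets: no metadata decisions are left at this point.
struct RenderJob {
    ProcessFn process;
    std::vector<const float*> in;
    std::vector<float*> out;
    std::size_t samples;

    void operator()() const { process(in.data(), out.data(), in.size(), samples); }
};

// Performs all checks up front; yields no job if anything does not match.
// The buffers must outlive the returned job, which stores raw pointers.
std::optional<RenderJob> prepareJob(const NodeSpec& spec,
                                    const std::vector<Buffer>& in,
                                    std::vector<Buffer>& out) {
    if (!spec.process) return std::nullopt;
    if (in.size() != spec.channels || out.size() != spec.channels)
        return std::nullopt;

    RenderJob job{spec.process, {}, {}, spec.samples};
    for (std::size_t c = 0; c < spec.channels; ++c) {
        if (in[c].data.size() != spec.samples || out[c].data.size() != spec.samples)
            return std::nullopt;
        job.in.push_back(in[c].data.data());
        job.out.push_back(out[c].data.data());
    }
    return job;
}

// Stand-in for a plug-in's processing function: halves the signal level.
void halfGain(const float* const* in, float* const* out,
              std::size_t channels, std::size_t samples) {
    for (std::size_t c = 0; c < channels; ++c)
        for (std::size_t s = 0; s < samples; ++s)
            out[c][s] = 0.5f * in[c][s];
}
```

With a spec such as NodeSpec{halfGain, 2, 4} and two matching four-sample buffers on each side, prepareJob() yields a job which can be invoked without further checks. A mismatched channel count yields no job at all, so the error shows up before the engine ever runs.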

