Wednesday, September 30, 2009

Pimp my Nepomuk

From time to time people show up on the #akonadi IRC channel and complain about Akonadi's dependency on Nepomuk, because Nepomuk is so slow and takes 100% CPU when they start their desktop. The 'bad' news is that we won't drop the dependency on Nepomuk, because it is a technology no other desktop environment provides so far and it gives us all the nice features we want to have (from a Pimster's point of view ;))

So what about the good news, you might ask? Well, for those of you who have Nepomuk running at 100% CPU usage: we can make Nepomuk a lot faster!

When you execute the following command in a shell

qdbus org.kde.NepomukStorage /nepomukstorage usedSopranoBackend

you will see either 'redland' or 'sesame2' as output.
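If you want to wrap that check in a small script, something like the following minimal sketch could work. It assumes qdbus is on your PATH and Nepomuk is running; the function name describe_backend is purely hypothetical and not part of Nepomuk or Soprano.

```shell
#!/bin/sh
# describe_backend is a hypothetical helper (not part of Nepomuk):
# it maps the backend name reported by Nepomuk to a short hint.
describe_backend() {
    case "$1" in
        sesame2) echo "sesame2: the faster Java based backend is in use" ;;
        redland) echo "redland: the slower backend is in use, consider installing sesame2" ;;
        *)       echo "unknown backend: $1" ;;
    esac
}

# Query the running Nepomuk storage service only if qdbus is available.
if command -v qdbus >/dev/null 2>&1; then
    backend=$(qdbus org.kde.NepomukStorage /nepomukstorage usedSopranoBackend 2>/dev/null) \
        || backend="unavailable"
    describe_backend "$backend"
fi
```

If the storage service is not running, the qdbus call fails and the script simply reports an unknown backend.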

If your Nepomuk uses Sesame2 and is still slow, your hardware must be really old (and I mean really, really old ;)); it works fine here on a laptop with 500MHz and 256MB RAM...
For most of you who see high CPU usage from Nepomuk, however, the Redland backend is in use. So what is this backend thingy about? Nepomuk is a KDE-specific high-level API for Soprano, which itself is a Qt-only storage solution for RDF (semantic) data. But Soprano doesn't implement the actual storage; it just forwards this task to its backends, of which two stable ones currently exist: the C++ based Redland and the Java based Sesame2. The reason for the high CPU usage is that the Redland backend is much, much slower than the Sesame2 backend!

The careful reader might now wonder, "Why is the C++ based backend slower than the Java based one? Isn't Java always slower than C++?" Not in this case: storing and querying RDF data involves many clever algorithms, so the C++ backend gets bitten by complexity theory here. The Sesame2 backend, in contrast, has implemented many optimizations at the algorithm level and beats Redland performance-wise.

So as strange as it sounds, to get a faster Nepomuk you have to install Java.

Yeah, I know, many people will scream now "WTF, why do I have to install this Java crap?!?" Well, you may actually have it installed already (OpenOffice, for example, brings along a JRE), and to be honest, Java's reputation is worse than the current implementations actually are. But back to pimping up Nepomuk...

If you use the KDE packages from a distribution, check whether a package named soprano-backend-sesame exists and install it. If no such package is available and none of the other Soprano-related packages contains the file
$KDEDIR/lib/soprano/libsoprano_sesame2backend.so, bug your distributor to create one or do it yourself and publish it ;)

For the brave guys who use KDE compiled from source, you need the following things:

  • The Java development package (under Debian sun-java6-jdk)
  • The Java runtime package (under Debian sun-java6-jre)

Now make sure that your compiler can find the development files by adjusting the LD_LIBRARY_PATH to contain the Java library directory:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/jvm/java-6-sun/jre/lib/i386:/usr/lib/jvm/java-6-sun/jre/lib/i386/client/


(these are the paths needed under Debian; your paths might look a bit different).
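Since the paths differ between systems, here is a rough sketch for locating them. It assumes your JVMs live below /usr/lib/jvm (the Debian location used above); the helper name jvm_lib_dirs and the JVM_ROOT variable are made up for this example. Note that it only lists directories that contain libjvm.so, so you may still have to add the parent library directory by hand, as in the export line above.

```shell
#!/bin/sh
# jvm_lib_dirs is a hypothetical helper: it prints every directory below
# the given root that contains libjvm.so, joined with ':'.
jvm_lib_dirs() {
    find "$1" -name libjvm.so 2>/dev/null \
        | while read -r lib; do dirname "$lib"; done \
        | sort -u \
        | paste -s -d: -
}

# JVM_ROOT is an assumption; adjust it for your system.
JVM_ROOT=${JVM_ROOT:-/usr/lib/jvm}
dirs=$(jvm_lib_dirs "$JVM_ROOT")
if [ -n "$dirs" ]; then
    # Print a suggestion instead of changing the environment directly.
    echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:$dirs"
fi
```

The script only prints a suggested export line, so you can review it before pasting it into your shell.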

Then just recompile Soprano from kdesupport (remember to clear the CMake cache first). CMake's status report should say whether it has found the Java headers and libraries; if it has not, check your LD_LIBRARY_PATH variable again. If everything compiles fine, after a 'make install' you should have the file
$KDEDIR/lib/soprano/libsoprano_sesame2backend.so
available, and after a restart of Nepomuk the qdbus command listed above should now return 'sesame2'. And as you may already have noticed, Nepomuk doesn't run at 100% CPU usage any more ;)
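
A quick way to check for the plugin file after the install could look like the sketch below. The helper name has_sesame2_plugin is hypothetical, and the fallback prefix /usr is only an assumption for systems where KDEDIR is not set; your library directory may also be named differently.

```shell
#!/bin/sh
# has_sesame2_plugin is a hypothetical helper: it succeeds if the Sesame2
# backend plugin exists below the given KDE installation prefix.
has_sesame2_plugin() {
    [ -f "$1/lib/soprano/libsoprano_sesame2backend.so" ]
}

# Fall back to /usr when KDEDIR is unset (an assumption, adjust as needed).
if has_sesame2_plugin "${KDEDIR:-/usr}"; then
    echo "Sesame2 backend plugin found"
else
    echo "Sesame2 backend plugin missing, check the CMake status report"
fi
```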

I hope this blog helps to reduce the prejudices against Nepomuk, because this software really rocks! Let's hope the distributors will finally ship sesame2 based Soprano packages!!!

28 comments:

Michael said...

I've been using Nepomuk with the sesame2 backend for a while now, but it frequently uses 100% CPU...

Emil said...

And, any update on the Virtuoso backend? Any time estimate for when Soprano is going to be released with it?

simca said...

This whole Nepomuk thing seems like a solution looking for problems. Also, if I have the sesame2 backend and it still uses the redland one, how do I switch backends?

tokoe said...

@michael Is it nepomuk that uses 100% CPU usage or the nepomukindexing service?

@emil No updates so far. The Virtuoso guys are still working on a simplified version of their server that we could use for the desktop

@simca Try to set the LD_LIBRARY_PATH as written in the blog. And no, this is not a solution looking for a problem. We Akonadi developers have a problem and want to use Nepomuk as the solution :)

Dion Moult said...

I was already using the Sesame2 backend but stopped using Nepomuk as it used to annoy me greatly when indexing when I was not idle - basically eating my share of the computer when I was trying to use it.

I think what's more important is to create an easy-to-follow set of case studies to show exactly what Nepomuk, Strigi, Soprano, and Akonadi are.

Kevin said...

I think when building kdesupport one also has to set JAVA_HOME.

I set it to
JAVA_HOME=/usr/lib/jvm/java-6-openjdk
(on Debian/Unstable)

Andras said...

You can read about Akonadi in Tom Albers' posts from the past few days. About Nepomuk, you can read the article on Userbase: http://userbase.kde.org/Nepomuk

George Hron said...

Is there any way to configure/disable Nepomuk indexing? On a little old machine, it can be painful if I have a lot of data.

shamaz said...

You might convince more people by advising to use openjdk6 (which is completely opensource) instead of sun-jdk.
Anyway, it's always nice to see someone admitting java can be fast ;)

Tom said...

Why can't the indexer just wait for the user to be idle? I don't mind the indexing when I am typing or reading a webpage.

But indexing while I try to start apps or move stuff on my disk is so unacceptable it makes me want to kill it instantly. There is no excuse, there just isn't.

albbas said...

On Kubuntu Karmic, I had to stop the nepomukservice, and edit .kde/share/config/nepomukserverrc to contain this:
[Basic Settings]
Configured repositories=main
Start Nepomuk=true

[Service-nepomukstrigiservice]
autostart=true

[main Settings]
Storage Dir[$e]=$HOME/.kde4/share/apps/nepomuk/repository/main/
Used Soprano Backend=sesame2
index version=2

After that I ran the nepomukserver manually in a shell, and the server seems to run well now...
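
To see which backend such a config file asks for without opening it in an editor, a sketch like the following might help. The helper name configured_backend is hypothetical, and the file location is taken from the comment above; it may differ on your distribution (for example ~/.kde4).

```shell
#!/bin/sh
# configured_backend is a hypothetical helper: it prints the value of the
# "Used Soprano Backend" key from the [main Settings] group of the given file.
configured_backend() {
    awk -F= '
        /^\[/ { in_main = ($0 == "[main Settings]") }
        in_main && $1 == "Used Soprano Backend" { print $2 }
    ' "$1"
}

# Path as mentioned above; an assumption for other distributions.
rc="$HOME/.kde/share/config/nepomukserverrc"
if [ -f "$rc" ]; then
    configured_backend "$rc"
fi
```

Keep in mind that this only shows what is configured; the qdbus command from the blog post tells you what the running server actually uses.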

Luke Plant said...

I'm afraid you are living in cloud cuckoo land:

1) I'm using sesame2 on a very modern machine, with 3 GB RAM and a 2 GHz dual-core CPU, and Nepomuk brings my machine to a crawl when indexing is on. I cannot use it. Maybe it's because I have lots of files (probably 400,000). Nepomuk already has a repository of 2 GB on my machine. I don't care what the reason is though, it's completely unusable. It's so unresponsive that with indexing on it did not even respond to the dbus command you mentioned. It took me several more minutes to kill all processes etc. before I could finally get an answer out of it. I agree with Tom completely.

2) Which nice features does Nepomuk bring us that we want? ABSOLUTELY NONE as far as I can see. Sorry for shouting, but KDE developers seem to have completely stopped listening to even their most loyal users.

Does Nepomuk bring us decent desktop search? No (nothing approaching a decent GUI, and indexing is too painful to turn on). Does it bring us fast e-mail search? No. Those are the features I want, and Nepomuk does not provide them. "They will come in 4.X"? Whatever, I'm not gonna be a sucker for that any more.

Given its huge disadvantages, it would have to revolutionise my desktop to be worth it. But if I turn Nepomuk off, I lose absolutely nothing. Can you explain to me any way that Nepomuk even slightly improves my desktop experience? Please ditch Akonadi and Nepomuk.

Balinares said...

The issues of perceived slowness with Java have more often to do with its often humongous memory consumption, I found. That's JIT compilation for you.

As for Nepomuk I suspect that acceptance would be significantly higher if it was off by default.

I know, I know. Bear with me for a moment.

The problem with Nepomuk is that to the averagely uninformed and uncaring user, which I believe is whom you should target, it's this bizarre thing that serves no clear useful purpose and just sits there glomping resources. (And the wonky name doesn't help any on that count.)

Now, let's imagine that Nepomuk is off by default. When you fire up Dolphin, in lieu of the Nepo panel, you get a pretty pushbutton that reads "Turn on search engine" or something friendly like this. You click it, and the panel smoothly changes into the usual Nepo panel. With, perhaps, an additional field letting you know if the search engine is updating its index, and something clickable to pause it or stop that entirely.

That way your user knows what's happening and WHY it's happening. Also your user can feel in control of it happening or not.

And the advanced user who just doesn't want anything to do with Nepomuk can easily ignore it.

What do you think?

illissius said...

Yeah, as far as I know these days Java is just as fast as native code (sometimes faster, sometimes slower) in terms of CPU, it 'just' gobbles a lot of memory due to the VM and all.

Cyrille Berger said...

And to make it clear, if many distributions don't adopt sesame2, it's because it doesn't come with a build system. You can either read the source, or use the binary provided by the project. And it's not how many distributions work, they work by building the software themselves from the source. As long as sesame2 doesn't provide that build system (or that no one else does it) sesame2 won't be accepted by distributions.

Sinok said...

Well, sesame2 comes with a standard Java build system, the Ant and Maven combination being a really standard way to build Java projects and libraries. Ant has been here for ages and Maven really established itself during the last few years. Moreover, both are open source and came from the Apache incubator. So the problem isn't that they aren't free or are unknown.

To my mind, the problem rather comes from distributions lacking java skills/background.

Till said...

I think those who complain here about slowness and an unresponsive system are hit by Strigi's awful resource management. There is a bug report about this, but there seems to be no attention paid to it yet:
https://bugs.kde.org/show_bug.cgi?id=196402

For now, what I would recommend is disabling Strigi file indexing.

I do hope that this gets fixed, though. Personally, I use tracker for desktop search. ^^

Kevin said...

Does it bring us fast e-mail search?

Man, why didn't we think about that?

Or searching in huge addressbooks, or searching for files originating from emails (attachments) or grouping contacts of the same person from different sources into meta contacts?

Oh, wait! We did.

Michael said...

The process which steals my CPU time is "nepomukstub nepomukstorage"

tokoe said...

@DionMoult This slowness is caused by Strigi which is not smart enough to figure out when to start indexing. That is no problem of Nepomuk.

@GeorgeHron To disable Strigi, go to systemsettings -> Advanced -> Desktop Search -> Uncheck the 'Strigi File Indexer' checkbox

@LukePlant Sorry, but I can't take your comments seriously; in nearly every comment (also on other blogs) you are complaining about something and mostly bashing Nepomuk without any valid arguments. Could it be you have some personal problems instead of computer problems? /me stops wasting his time with answering LukePlant

@Balinares The problem is that Akonadi needs Nepomuk as its search engine and therefore can't wait until the user decides to start Nepomuk...

@CyrilleBerger Uhh? But the sesame2 jar is shipped with Soprano from kdesupport, there is no need for an additional distro specific package!

@Michael Could you send a mail to trueg@kde.org, please, and ask him what the reason could be? This shouldn't really happen

lefty.crupps said...

> basically eating my share of the
> computer when I was trying to use it.

Yes, the indexing should happen only when the computer has been idle for X minutes, such as when the screen saver starts.

> Now, let's imagine that Nepomuk is
> off by default. When you fire up
> Dolphin, in lieu of the Nepo panel,
> you get a pretty pushbutton that
> reads "Turn on search engine" or
> something friendly like this. You
> click it, and the panel smoothly
> changes into the usual Nepo panel.
> With, perhaps, an additional field
> letting you know if the search engine
> is updating its index, and something
> clickable to pause it or stop that
> entirely.

I like that idea.

I have Nepomuk turned on on one of my machines and off on another, and I really don't use it anywhere and I am not sure what it is meant to do. Nepomuk, Strigi, Akonadi: what are these and why don't they just have one name? :D

Kraplax said...

Hello,
I'm using the Arch Linux distro. When I heard on IRC that there's an even faster backend for Soprano, Virtuoso, I decided to give it a spin. I checked the AUR (Arch User Repository) and there it was: soprano-virtuoso-svn. I tried searching for sesame, but there was nothing.
When I built soprano-virtuoso-svn (packages from AUR aren't actually packages, but sources + rules to build them) I got a note about both the sesame and virtuoso backends being built!
So I checked that libsoprano_virtuosobackend.so and libsoprano_sesame2backend.so are in my /var/lib/soprano/, so I guess the build process went fine. And even if not, I'd have been warned.
So I quit the KDE session, switched to tty1, changed the nepomukserverrc config so that it has sesame2 instead of redland, and logged back in.
It didn't work - Nepomuk was running, but the config still had redland :(
What am I doing wrong? Should those variables (the LD_LIBRARY_PATH) be exported only during the build process, or while Nepomuk is in use too?

Luke Plant said...

I'm sorry for what seems like constant bashing - but it's not an accurate perception.

A grand total of two of my KDE blog posts have been negative about KDE, with lots of others that are positive and helpful. I have posted at most a handful of negative comments on other people's blogs, but all with good reason. (Lots of people have agreed with my recent criticisms)

I spend many, many hours filing bugs and trying to improve KDE in ways that I can. I test beta versions where I am able, despite the instability it brings my desktop.

So it is not fair to accuse me of trolling.

I have been forced to be more vocal in my criticisms because I love KDE, and I don't want to see it die. Using recent KDE has become very painful. Unfortunately, KDE developers seem to ignore the possibility that these criticisms might have anything in them. It seems you too are ignoring the possibility that Nepomuk might have some serious performance problems.

Sorry again for my negative tone. It is just very frustrating that the KDE community is not able to respond well to valid criticism.

Nicolas said...

In Kubuntu, sesame2 does not work even though a package is provided. The bug is known, but I don't think it will be fixed any time soon. If you want sesame2 to work, you need to add the library directory of the JRE you are using to /etc/ld.so.conf
If you don't, your backend will always be redland.

Alex said...

@Nicolas

A symbolic link to $JRE_HOME/lib/i386/client/libjvm.so in /usr/lib also helps.

Alex said...

BTW, Redland is written in C, not C++.

Shane said...

Nepomuk really slowed my KDE4 desktop. It wasn't as much the high CPU usage, but the frequent and persistent chugging away of my hard disk. And my machine is a Sempron 3000 with 768MB RAM. Yes it was using sesame2 according to nepomukserverrc. Also this was on Arch + KDEmod.

So not a lot going for Nepomuk in my eyes. But what I still don't get is... what exactly does it do? What feature does it provide that would justify a churning hard drive every time I move a folder?

KevinCoonanMD said...

I cannot believe that in this day and age people still think that Java is slow. The first few editions (years and years ago) were terrible, esp. w/ graphics. Sun/Oracle still has some things to learn about the GUI (SWT is much better, but is still GTK+ based rather than Qt, so it always looks a little funny).

If you look at the benchmarks, use Eclipse, etc. I think you will be quite surprised that on production applications (or anything after the first start up / run) Java is as fast as C++, and the biggest difference in performance is...SURPRISE...the skill of the developer who wrote the code.