The complete tutorial for RDF data ingestion in Virtuoso

Here are all the steps needed to perform a clean RDF ingestion into Virtuoso:

  1. Tune the Virtuoso configuration (the virtuoso.ini file in your database directory, database/ by default) as described in http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtRDFPerformanceTuning. In most cases you only need to adjust the NumberOfBuffers and MaxDirtyBuffers parameters to match the amount of RAM on your machine.
  2. Copy your dataset to a directory on the machine that hosts Virtuoso.
  3. Add that directory to the DirsAllowed parameter in virtuoso.ini (paths are separated by commas), then restart Virtuoso so the change takes effect.
    e.g. DirsAllowed = ., /home/someone/stuff, /home/me/dataset
  4. Open an isql connection with the command isql <HOST>[:<PORT>] <LOGIN> <PASSWORD> (the default credentials are dba dba, and the default port is 1111).
    e.g. isql localhost dba dba
    You should now be at the isql command-line interface (the SQL> prompt).
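  5. Register the files to load: ld_dir('<source-directory>', '<file name pattern>', '<graph iri>');
    e.g. to load all the gzipped files in a directory:
    ld_dir('/home/me/dataset', '*.gz', 'http://my.default.graph');
    To load a single file, pass its directory as the first argument and its file name as the pattern.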
  6. set isolation='uncommitted';
  7. rdf_loader_run();
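Steps 1 and 3 both edit the [Parameters] section of virtuoso.ini. The Python sketch below is a hypothetical helper, not part of Virtuoso: it assumes the commonly quoted rule of thumb of giving about 2/3 of free RAM to 8 KB buffers, with MaxDirtyBuffers at about 3/4 of NumberOfBuffers, and it only prints values and rebuilds a DirsAllowed string without touching the file. Compare its output with the tuning page linked in step 1 before copying anything into virtuoso.ini.

```python
# Hypothetical helpers for steps 1 and 3: they compute virtuoso.ini values
# and print them; they never modify virtuoso.ini themselves.

# Assumption (rule of thumb, check the tuning docs): ~8 KB per buffer,
# ~2/3 of free RAM for buffers, MaxDirtyBuffers ~3/4 of NumberOfBuffers.
BYTES_PER_BUFFER = 8000


def suggest_buffers(free_ram_bytes: int) -> tuple[int, int]:
    """Return (NumberOfBuffers, MaxDirtyBuffers) for the given free RAM."""
    number_of_buffers = free_ram_bytes * 2 // 3 // BYTES_PER_BUFFER
    max_dirty_buffers = number_of_buffers * 3 // 4
    return number_of_buffers, max_dirty_buffers


def add_allowed_dir(dirs_allowed: str, new_dir: str) -> str:
    """Append new_dir to a comma-separated DirsAllowed value if it is missing."""
    dirs = [d.strip() for d in dirs_allowed.split(",") if d.strip()]
    if new_dir not in dirs:
        dirs.append(new_dir)
    return ", ".join(dirs)


if __name__ == "__main__":
    buffers, dirty = suggest_buffers(8 * 1024**3)  # e.g. 8 GiB of free RAM
    print("[Parameters]")
    print(f"NumberOfBuffers = {buffers}")
    print(f"MaxDirtyBuffers = {dirty}")
    print("DirsAllowed =", add_allowed_dir("., /home/someone/stuff", "/home/me/dataset"))
```

With 8 GiB of free RAM this suggests NumberOfBuffers = 715827 and MaxDirtyBuffers = 536870. As with DirsAllowed, Virtuoso only picks up the new values after a restart.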
Note that you can run several loader sessions in parallel (each one runs rdf_loader_run(); in its own isql connection), but be careful: every session is I/O-intensive.
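Steps 4 to 7 can also be scripted, which makes it easy to start several loader sessions at once. This is a minimal sketch, assuming that the Virtuoso isql client is on PATH as isql (some Linux packages install it as isql-vt) and that it runs the statements passed in an exec="..." argument. The helper names, the one-loader-per-2.5-cores starting point, and the final checkpoint; (recommended by the Virtuoso bulk-loader documentation once loading is done) are additions that are not part of the steps above.

```python
import subprocess
from typing import List

# Assumptions: the Virtuoso isql client is on PATH as "isql" (it may be
# "isql-vt" on some distributions) and runs the statements given in exec=...
ISQL = "isql"


def sql_str(value: str) -> str:
    """Quote a value as an SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def isql_argv(statements: str, host: str = "localhost:1111",
              user: str = "dba", password: str = "dba") -> List[str]:
    """argv for one isql call (step 4) that runs `statements` and exits."""
    return [ISQL, host, user, password, f"exec={statements}"]


def register_files(directory: str, pattern: str, graph: str) -> str:
    """Step 5: the ld_dir call that queues matching files in load_list."""
    return f"ld_dir({sql_str(directory)}, {sql_str(pattern)}, {sql_str(graph)});"


# Steps 6 and 7: the statements of one loader session.
LOADER_SESSION = "set isolation='uncommitted'; rdf_loader_run();"


def suggested_loaders(cores: int) -> int:
    """Hypothetical starting point: about one loader per 2.5 CPU cores."""
    return max(1, int(cores / 2.5))


def bulk_load(directory: str, pattern: str, graph: str, loaders: int) -> None:
    """Queue the files, run `loaders` sessions in parallel, then checkpoint."""
    subprocess.run(isql_argv(register_files(directory, pattern, graph)), check=True)
    # Each loader is a separate isql process; all of them pull from load_list.
    procs = [subprocess.Popen(isql_argv(LOADER_SESSION)) for _ in range(loaders)]
    failed = [p.args for p in procs if p.wait() != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} loader session(s) failed")
    # Checkpoint after every session has finished so the loaded data is
    # written to the database files.
    subprocess.run(isql_argv("checkpoint;"), check=True)
```

For example, bulk_load('/home/me/dataset', '*.gz', 'http://my.default.graph', suggested_loaders(8)) would register the gzipped files and run three loaders. Lower the loader count if the disks become the bottleneck.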

Okay, the ingestion has started! Now, how can you monitor it?

  1. Open another isql connection
  2. set isolation='uncommitted';
  3. The files queued for ingestion are listed in the SQL table load_list. Check the following columns:
    • ll_state
      • 0 means the file has not been processed yet
      • 1 means it is being processed
      • 2 means it has been processed
    • ll_error (meaningful only if ll_state=2)
      • NULL means the file was ingested successfully; otherwise it contains the error code and message

Therefore:

  • How many files have been processed?
    select count(1) from load_list where ll_state=2;
  • How many files are left?
    select count(1) from load_list where ll_state=0;
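The queries above can be rolled into a single progress report. This hypothetical Python helper assumes you have already fetched the rows with select ll_file, ll_state, ll_error from load_list; through a client of your choice (isql output, ODBC, and so on); it counts files per ll_state and collects the processed files whose ll_error is not NULL.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# One load_list row: (ll_file, ll_state, ll_error)
Row = Tuple[str, int, Optional[str]]

STATE_NAMES = {0: "pending", 1: "in_progress", 2: "done"}


def summarize(rows: Iterable[Row]) -> dict:
    """Count files per ll_state and list the processed files that failed."""
    counts: Counter = Counter()
    failed: List[Tuple[str, str]] = []
    for ll_file, ll_state, ll_error in rows:
        counts[STATE_NAMES.get(ll_state, "unknown")] += 1
        # ll_error is only meaningful once a file is processed (state 2)
        if ll_state == 2 and ll_error is not None:
            failed.append((ll_file, ll_error))
    total = sum(counts.values())
    return {
        "total": total,
        "pending": counts["pending"],
        "in_progress": counts["in_progress"],
        "done": counts["done"],
        "failed": failed,
        "percent_done": 100.0 * counts["done"] / total if total else 0.0,
    }
```

Note that "done" also counts failed files, since their ll_state is 2 as well; the failed list (equivalently, select ll_file, ll_error from load_list where ll_error is not null;) tells you which files need fixing and reloading.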

If you need to stop the ingestion for some reason:

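  • rdf_load_stop(); (soft stop: running loaders finish the file they are working on and then exit)
  • txn_killall(1); (brute-force stop: aborts all running transactions)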
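If a soft stop takes too long, you can fall back to the brute-force one automatically. The sketch below is hypothetical: run_sql stands for any callable you provide that executes one statement and returns the first value of the result (for example, a wrapper around isql or an ODBC cursor), and it assumes that after rdf_load_stop(); the running loaders finish their current file, so the number of load_list rows with ll_state=1 eventually drops to zero.

```python
import time
from typing import Callable, Optional


def stop_loading(run_sql: Callable[[str], Optional[int]],
                 timeout_s: float = 600.0,
                 poll_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Soft-stop the loaders; escalate to txn_killall(1) after timeout_s.

    Returns "soft" if no file was left in progress in time, else "forced".
    """
    run_sql("rdf_load_stop();")  # soft stop: no new files are picked up
    deadline = clock() + timeout_s
    while clock() < deadline:
        in_progress = run_sql("select count(1) from load_list where ll_state=1;")
        if not in_progress:
            return "soft"
        sleep(poll_s)
    run_sql("txn_killall(1);")  # brute-force stop
    return "forced"
```

After a forced stop, files may be left with ll_state=1; they can be requeued as described in the section on restarting an ingestion.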

What to do if you get into a Virtuoso deadlock, i.e. everything is blocked and the server is not responding?

  1. Open a new isql connection and run status(); if it hangs without printing anything, congratulations! You are in a deadlock.
  2. Press Ctrl-C in all your isql connections
  3. Check with the ps command that no isql processes are still running (e.g. ps aux | grep isql)
  4. sudo killall -9 virtuoso-t (the Virtuoso server binary is usually named virtuoso-t; use the name of your server process)
  5. Remove the virtuoso.lck file from your database directory
  6. Restart the server
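Steps 1 and 5 can be made a little safer in a script. In this hypothetical sketch, responds() treats a probe command that does not finish within a timeout as a hang (for step 1 the probe would be an isql call running status();, assumed here to be accepted via exec=), and remove_stale_lock() deletes virtuoso.lck only when a caller-supplied is_running() check reports that no server process is left.

```python
import os
import subprocess
from typing import Callable, Sequence


def responds(argv: Sequence[str], timeout_s: float = 30.0) -> bool:
    """Step 1: run a probe; a timeout counts as a hang, any finished run as alive."""
    try:
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:  # subprocess.run kills the probe first
        return False
    return True


def remove_stale_lock(db_dir: str, is_running: Callable[[], bool]) -> bool:
    """Step 5: delete virtuoso.lck, but only once no server process is left."""
    if is_running():
        return False
    lock = os.path.join(db_dir, "virtuoso.lck")
    if os.path.exists(lock):
        os.remove(lock)
        return True
    return False


# Example probe for step 1 (assumed isql syntax, not run here):
# responds(["isql", "localhost:1111", "dba", "dba", "exec=status();"])
```

Passing is_running in, rather than hard-coding a ps call, keeps the process check explicit; on Linux it could wrap something like pgrep.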

How to restart an ingestion that stopped unexpectedly

There will be at least one load_list record with ll_state=1 (the file that was being processed when the load stopped): if you set it back to 0, it will be reprocessed.

  1. Open an isql connection
  2. update load_list set ll_state=0 where ll_state=1;
  3. set isolation='uncommitted';
  4. rdf_loader_run();
    It should continue where it left off.
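The resume procedure can be scripted in the same way. This minimal sketch assumes that isql is on PATH and accepts several statements through exec="..."; the helper names are hypothetical, and the explicit commit work; after the update is a precaution not present in the steps above.

```python
import subprocess
from typing import List

# Step 2, plus an explicit commit as a precaution (assumption).
RESET_SQL = "update load_list set ll_state=0 where ll_state=1; commit work;"
# Steps 3 and 4.
RUN_SQL = "set isolation='uncommitted'; rdf_loader_run();"


def resume_argvs(host: str = "localhost:1111", user: str = "dba",
                 password: str = "dba", isql: str = "isql") -> List[List[str]]:
    """isql calls that requeue stuck files and then restart the loader."""
    base = [isql, host, user, password]
    return [base + [f"exec={RESET_SQL}"], base + [f"exec={RUN_SQL}"]]


def resume(**kwargs: str) -> None:
    """Run the two calls in order; the second blocks until loading ends."""
    for argv in resume_argvs(**kwargs):
        subprocess.run(argv, check=True)
```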

A big thanks goes to Patrick van Kleef from OpenLink for the support.
