WebStar - Data Sets

Version 16 by markar
on Jul 23, 2009 15:06.

compared with
Current by markar
on Aug 26, 2009 11:38.

This dataset spans a number of big news events (the Olympics; both US presidential nominating conventions; the beginnings of the financial crisis; ...) as well as everything else you might expect to find posted to blogs." [ICWSM09|http://www.icwsm.org/2009/data/] | compressed XML (tar.gz) | [HTTPS|https://webstar.deri.ie/datasets/icwsm2009/] | NO |
| SINDICE-DUMP | This data set is a Sindice dump from March 2009. Most of the Sindice dump can also be found in the BTC09 data set. | [NQ format|http://sw.deri.org/2008/07/n-quads/] \- 100GB uncompressed, 7.5GB gz | [HTTPS|https://webstar.deri.ie/datasets/sindice/] | NO |
| boards.ie \\ | Ten years of discussions from the Irish forum site [boards.ie|http://boards.ie], from 1998 to 2008. Transformed from the SIOC format. Covers all forums, threads, users and posts; a [general description of the data|http://wiki.sioc-project.org/index.php/Data/Boards.ie/Structure] is available. There is also a [graphical representation|https://dev.deri.ie/confluence/download/attachments/16777326/boards.ie-schema.png] of the schema and some simple [example queries|https://dev.deri.ie/confluence/download/attachments/16777326/boards.ie-exmpl-queries.txt]. Raw data and SPARQL access are available only after agreeing to a [license|https://dev.deri.ie/confluence/download/attachments/16777326/boards.ie-license.txt]; please contact Marcel Karnstedt from the UIMR unit. \\ | RDF/XML \\
N-Triples \\ | \-\- \\ | YES \\ |
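
Since the SINDICE-DUMP is listed at 100GB uncompressed, it is probably easier to stream the gzipped file than to unpack it first. The following is a minimal sketch of that idea, not an official tool for these data sets. It assumes the dump is a single gzip file with one statement per line; the function names and the file name in the usage note are hypothetical.

```python
import gzip
from typing import Iterator


def iter_statements(path: str) -> Iterator[str]:
    """Yield each non-empty, non-comment line of a gzipped dump.

    Assumption: one statement per line, as in line-based formats
    such as N-Triples and N-Quads. The file is read as a stream,
    so it is never held in memory as a whole.
    """
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith("#"):
                yield line


def count_statements(path: str) -> int:
    """Count the statements in a gzipped dump without unpacking it."""
    return sum(1 for _ in iter_statements(path))
```

For example, count_statements("sindice-dump.nq.gz") would return the number of statements in a local copy saved under that (hypothetical) file name.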