Difference between revisions of "Workdocumentation 2022-08-19"

From BITPlan ceur-ws Wiki
Revision as of 07:32, 19 August 2022

Participants

  • Wolfgang

Agenda

  • dblp

dblp

Import RDF Dump to QLever (39 min)

See Workdocumentation_2022-08-16#on_RWTH_Aachen_DBIS_i5_server for the preparations.

Steps with QLever Control script

Download and Indexing

wf@confident:/hd/torterra/dblp2022-08$ . ../qlever/qlever-control/qlever dblp

QLEVER CONFIG

Checking your PATH ...
Added the directory "/hd/torterra/qlever/qlever-control" to your PATH

Setting up bash autocompletion ...
Done, number of completions: 35

Creating new Qleverfile ...
Copied pre-configured Qleverfile for "dblp" into current directory.

Setup is complete
Type "qlever" and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
"qlever index show"). Typing "qlever" without arguments gives some basic help
and pointers for further help. Edit your local "Qleverfile" to change settings.

wf@confident:/hd/torterra/dblp2022-08$ qlever get-data

This is the "qlever" script, call without argument for help

Executing "get-data":

wget -nc -O dblp.nt.gz https://dblp.org/rdf/dblp.nt.gz

Getting data using GET_DATA_CMD from Qleverfile ...

--2022-08-19 07:16:17--  https://dblp.org/rdf/dblp.nt.gz
Resolving dblp.org (dblp.org)... 192.76.146.204
Connecting to dblp.org (dblp.org)|192.76.146.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2793364255 (2.6G) [application/x-gzip]
Saving to: ‘dblp.nt.gz’

dblp.nt.gz          100%[===================>]   2.60G  43.3MB/s    in 64s     

2022-08-19 07:17:21 (41.8 MB/s) - ‘dblp.nt.gz’ saved [2793364255/2793364255]


wf@confident:/hd/torterra/dblp2022-08$ qlever index

This is the "qlever" script, call without argument for help

Executing "index":

bash -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"

bash: IndexBuilderMain: command not found

wf@confident:/hd/torterra/dblp2022-08$ 
Max RAM usage: 0.0 GB
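The "command not found" error above means that IndexBuilderMain is not on the PATH of the host. A minimal sketch of how one could check this up front, before deciding whether USE_DOCKER needs to be switched on in the Qleverfile. The helper name needs_docker is hypothetical and not part of the qlever script:

```python
import shutil


def needs_docker(binary: str = "IndexBuilderMain") -> bool:
    """Return True if the given binary is not found on the PATH.

    Hypothetical helper: the qlever script itself does not offer this
    check, it only illustrates the decision taken in this log.
    """
    return shutil.which(binary) is None


if __name__ == "__main__":
    if needs_docker():
        print("IndexBuilderMain not found: set USE_DOCKER = true in the Qleverfile")
    else:
        print("IndexBuilderMain found: a native index build should work")
```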

wf@confident:/hd/torterra/dblp2022-08$ ls
Qleverfile  dblp.index-log.txt  dblp.nt.gz  dblp.settings.json
wf@confident:/hd/torterra/dblp2022-08$ vi Qleverfile 
# in the Qleverfile set USE_DOCKER = true, since IndexBuilderMain is not installed on the host
wf@confident:/hd/torterra/dblp2022-08$ qlever index

This is the "qlever" script, call without argument for help

Executing "index":

docker run -it --rm -u 1001:1001 -v /hd/torterra/dblp2022-08:/index -w /index --entrypoint bash --name qlever.dblp.index-build adfreiburg/qlever -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"

2022-08-19 05:19:12.735	- INFO:  QLever IndexBuilder, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda
2022-08-19 05:19:12.736	- INFO:  You specified the input format: TTL
2022-08-19 05:19:12.737	- INFO:  Locale was not specified in settings file, default is en_US
2022-08-19 05:19:12.737	- INFO:  You specified "locale = en_US" and "ignore-punctuation = 0"
2022-08-19 05:19:12.738	- INFO:  You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2022-08-19 05:19:12.738	- INFO:  You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2022-08-19 05:19:12.738	- INFO:  Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-08-19 05:19:12.738	- INFO:  Processing input triples from /dev/stdin ...
2022-08-19 05:31:18.190	- INFO:  Triples converted: 100,000,000
2022-08-19 05:31:36.447	- INFO:  Triples converted: 200,000,000
2022-08-19 05:31:48.312	- INFO:  Done, total number of triples converted: 268,701,236
2022-08-19 05:31:48.318	- INFO:  Building prefix tree from internal vocabulary ...
2022-08-19 05:32:32.605	- INFO:  Computing maximally compressing prefixes (greedy algorithm) ...
2022-08-19 05:33:59.130	- INFO:  Reduction of size of internal vocabulary: 24%
2022-08-19 05:34:02.208	- INFO:  Writing compressed vocabulary to disk ...
2022-08-19 05:35:42.396	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:37:03.671	- INFO:  Statistics for PSO: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.674	- INFO:  Statistics for POS: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.675	- INFO:  Exchanging multiplicities for PSO and POS ...
2022-08-19 05:37:03.675	- INFO:  Writing meta data for PSO and POS ...
2022-08-19 05:37:08.712	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:38:11.124	- INFO:  Statistics for SPO: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124	- INFO:  Statistics for SOP: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124	- INFO:  Exchanging multiplicities for SPO and SOP ...
2022-08-19 05:38:21.281	- INFO:  Writing meta data for SPO and SOP ...
2022-08-19 05:38:21.385	- INFO:  Number of distinct patterns: 1,276
2022-08-19 05:38:21.385	- INFO:  Number of subjects with pattern: 44,834,357 [all]
2022-08-19 05:38:21.385	- INFO:  Total number of distinct subject-predicate pairs: 228,395,931
2022-08-19 05:38:21.385	- INFO:  Average number of predicates per subject: 5.1
2022-08-19 05:38:21.389	- INFO:  Average number of subjects per predicate: 3,625,332
2022-08-19 05:38:28.373	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:39:29.422	- INFO:  Statistics for OSP: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423	- INFO:  Statistics for OPS: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423	- INFO:  Exchanging multiplicities for OSP and OPS ...
2022-08-19 05:39:48.764	- INFO:  Writing meta data for OSP and OPS ...
2022-08-19 05:39:48.946	- INFO:  Index build completed
2022-08-19 05:39:49.086	- INFO:  
2022-08-19 05:39:49.086	- INFO:  Adding text index ...
2022-08-19 05:39:49.086	- INFO:  Considering each literal as a text record
2022-08-19 05:39:49.099	- INFO:  The git hash used to build this index was "406ddab3953b604f7f37e83307b8c3db5a3c04dd"
2022-08-19 05:39:49.100	- INFO:  Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-19 05:39:58.361	- INFO:  Done, number of words: 92,096,717
2022-08-19 05:39:58.361	- INFO:  Building text vocabulary ...
2022-08-19 05:41:07.506	- INFO:  Writing vocabulary to file dblp.text.vocabulary ...
2022-08-19 05:41:07.592	- INFO:  Done, number of words: 9,463,510
2022-08-19 05:41:07.896	- INFO:  Building the half-inverted index lists ...
2022-08-19 05:46:10.425	- WARN:  Entity from text not in KB: "James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419."
2022-08-19 05:47:50.808	- WARN:  Entity from text not in KB: "Natasha Dobrinen: James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419. (2014)"
2022-08-19 05:49:30.949	- WARN:  Entity from text not in KB: "Tony Owen: Numerical Recipes Book (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 1990, 759 pages including index (£30.00 hdb).Numerical Recipes Diskette (PASCAL) version 2.0 by William H. Press, et al. Cambridge University Press, Cambridge, 03 1990 (£21.50).Numerical Recipes Example Handbook (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 09 1990, 223 pages including index of demonstrated procedures (£19·50, hdb).Numerical Recipes Example Diskette (PASCAL) version 2.0 by William H. Press et al. Cambridge University Press, Cambridge, 02 1990 (£21.50).Numerical Recipes Routines and Examples in Basic by Julian C. Sprott Cambridge University Press, Cambridge (paperback), 1991, 398 pages including index of programs (£19.50; pbk).Numerical Recipes Diskette Basic version 1.0 by Julian C. Sprott Cambridge University Press, Cambridge, 1991 (£21.50). (1992)"
2022-08-19 05:50:15.628	- WARN:  Number of mentions of entities not found in the vocabulary: 3
2022-08-19 05:55:07.011	- INFO:  Statistics for text index: #records = 32,052,337, #words = 256,962,549, #entities = 32,052,337, #blocks = 32,279,050
2022-08-19 05:55:12.745	- INFO:  Text index build completed
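
To see where the indexing time goes, the timestamps of the log can be subtracted from each other. The following is a minimal sketch that uses four lines copied from the log above as milestones. The function names are my own, and the choice of milestones is an assumption about which phases are of interest:

```python
from datetime import datetime

# Four milestone lines copied from the index log above
# (the timestamp is followed by a tab character).
LOG_LINES = [
    "2022-08-19 05:19:12.738\t- INFO:  Processing input triples from /dev/stdin ...",
    "2022-08-19 05:31:48.312\t- INFO:  Done, total number of triples converted: 268,701,236",
    "2022-08-19 05:39:48.946\t- INFO:  Index build completed",
    "2022-08-19 05:55:12.745\t- INFO:  Text index build completed",
]


def parse_line(line):
    """Split a log line into (timestamp, message)."""
    stamp, _, rest = line.partition("\t- ")
    when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S.%f")
    # Drop the log level ("INFO:", "WARN:") in front of the message.
    message = rest.split(":", 1)[1].strip()
    return when, message


def phase_durations(lines):
    """Return (message, seconds since the previous milestone) pairs."""
    events = [parse_line(line) for line in lines]
    return [
        (message, (when - previous).total_seconds())
        for (previous, _), (when, message) in zip(events, events[1:])
    ]


if __name__ == "__main__":
    for message, seconds in phase_durations(LOG_LINES):
        print(f"{seconds / 60:6.1f} min  until: {message}")
```

With these milestones, parsing the triples takes about 12.6 minutes, building the permutations about 8 minutes and the text index about 15.4 minutes, roughly 36 minutes in total. The 39 minutes in the heading presumably also include the download.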

Server Start

qlever start

This is the "qlever" script, call without argument for help

Executing "start":

docker run -d --restart unless-stopped -u 1001:1001 -it -v /hd/torterra/qlever/dblp:/index -p 7015:7015 -w /index --entrypoint bash --name qlever.dblp adfreiburg/qlever -c "ServerMain -i dblp -j 8 -p 7015 -m 20 -c 5 -e 1 -k 100 -a \"dblp_620614028\" -t > dblp.server-log.txt" > /dev/null

Starting the QLever server in the background and waiting until it's ready (Ctrl+C will not kill it) ...

2022-08-19 06:02:25.290	- INFO:  QLever Server, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda
2022-08-19 06:02:25.294	- INFO:  Initializing server ...
2022-08-19 06:02:25.297	- INFO:  The git hash used to build this index was "406ddab3953b604f7f37e83307b8c3db5a3c04dd"
2022-08-19 06:02:25.298	- INFO:  Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-19 06:02:33.264	- INFO:  Done, number of words: 92,096,717
2022-08-19 06:02:33.266	- INFO:  Registered PSO permutation: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 06:02:33.267	- INFO:  Registered POS permutation: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 06:02:33.268	- INFO:  Registered OPS permutation: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 06:02:33.269	- INFO:  Registered OSP permutation: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 06:02:33.270	- INFO:  Registered SPO permutation: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 06:02:33.270	- INFO:  Registered SOP permutation: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 06:02:33.270	- INFO:  Reading patterns from file dblp.index.patterns ...
2022-08-19 06:02:34.049	- INFO:  Reading vocabulary from file dblp.text.vocabulary ...
2022-08-19 06:02:34.424	- INFO:  Done, number of words: 9,463,510
2022-08-19 06:02:34.424	- INFO:  Reading metadata from file dblp.text.index ...
2022-08-19 06:02:36.068	- INFO:  Registered text index: #records = 32,052,337, #words = 256,962,549, #entities = 32,052,337, #blocks = 32,279,050
2022-08-19 06:02:36.232	- INFO:  Sorting random result tables to estimate the sorting performance of this machine ...
2022-08-19 06:02:37.124	- INFO:  Access token for restricted API calls is "****"
2022-08-19 06:02:37.124	- INFO:  The server is ready, listening for requests on port 7015 ...
2022-08-19 06:02:37.438	- INFO:  
2022-08-19 06:02:37.438	- INFO:  Request received via GET, no content type specified
2022-08-19 06:02:37.438	- INFO:  Alive check with message "from the qlever script"
2022-08-19 06:02:37.451	- INFO:  
2022-08-19 06:02:37.451	- INFO:  Request received via GET, no content type specified
2022-08-19 06:02:37.451	- INFO:  Setting index description to: "RDF from https://dblp.org/rdf/dblp.nt.gz, version from 19.08.2022 01:33"
2022-08-19 06:02:37.463	- INFO:  
2022-08-19 06:02:37.463	- INFO:  Request received via GET, no content type specified
2022-08-19 06:02:37.463	- INFO:  Setting text description to: "All literals, search with FILTER CONTAINS(?var, "...")"

Test Queries

CEUR-WS Papercount

query

PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT (COUNT(?paper) as ?count)
WHERE { 
    ?proceeding dblp:publishedIn "CEUR Workshop Proceedings".
    ?paper dblp:publishedAsPartOf ?proceeding.
}

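try it! (https://qlever.cs.uni-freiburg.de/dblp?query=PREFIX%20dblp%3A%20%3Chttps%3A//dblp.org/rdf/schema%23%3E%0APREFIX%20xsd%3A%20%3Chttp%3A//www.w3.org/2001/XMLSchema%23%3E%0ASELECT%20%28COUNT%28%3Fpaper%29%20as%20%3Fcount%29%0AWHERE%20%7B%20%0A%20%20%20%20%3Fproceeding%20dblp%3ApublishedIn%20%22CEUR%20Workshop%20Proceedings%22.%0A%20%20%20%20%3Fpaper%20dblp%3ApublishedAsPartOf%20%3Fproceeding.%0A%7D)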

result

count
45158
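
The same query could also be sent to the locally started server from a script. The following is a minimal sketch under several assumptions: the server from the "Server Start" step is reachable at http://localhost:7015, it accepts the query as a `query` parameter of a GET request, and it answers in the W3C SPARQL JSON results format when asked for `application/sparql-results+json`. The sample response is written by hand for illustration and is not captured server output. Only the count 45158 is taken from the result above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the server started above is reachable on this host and port.
ENDPOINT = "http://localhost:7015"

QUERY = """PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT (COUNT(?paper) AS ?count)
WHERE {
    ?proceeding dblp:publishedIn "CEUR Workshop Proceedings".
    ?paper dblp:publishedAsPartOf ?proceeding.
}"""

# Hand-written example in the SPARQL JSON results format (illustration only).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["count"]},
    "results": {"bindings": [
        {"count": {"type": "literal", "value": "45158"}}
    ]},
})


def build_request_url(endpoint, query):
    """Append the URL-encoded query to the endpoint."""
    return endpoint + "?" + urlencode({"query": query})


def extract_count(results_json):
    """Read the ?count value from the first result row."""
    document = json.loads(results_json)
    first_row = document["results"]["bindings"][0]
    return int(first_row["count"]["value"])


def fetch_count(endpoint=ENDPOINT, query=QUERY):
    """Send the query over HTTP; needs a running server, so not called here."""
    request = Request(
        build_request_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return extract_count(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, QUERY))
    print(extract_count(SAMPLE_RESPONSE))
```

Separating the URL building and the result parsing from the network call keeps both parts checkable without a running server.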