Difference between revisions of "Workdocumentation 2022-08-19"

From BITPlan ceur-ws Wiki
* https://github.com/ad-freiburg/qlever/issues/739
 

Revision as of 07:59, 19 August 2022

Participants

  • Wolfgang

Agenda

  • dblp

dblp

Import RDF Dump to QLever (39 min)

Steps with QLever Control script

wf@confident:/hd/torterra/dblp2022-08$ . ../qlever/qlever-control/qlever dblp

QLEVER CONFIG

Checking your PATH ...
Added the directory "/hd/torterra/qlever/qlever-control" to your PATH

Setting up bash autocompletion ...
Done, number of completions: 35

Creating new Qleverfile ...
Copied pre-configured Qleverfile for "dblp" into current directory.

Setup is complete
Type "qlever" and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
"qlever index show"). Typing "qlever" without arguments gives some basic help
and pointers for further help. Edit your local "Qleverfile" to change settings.

wf@confident:/hd/torterra/dblp2022-08$ qlever get-data

This is the "qlever" script, call without argument for help

Executing "get-data":

wget -nc -O dblp.nt.gz https://dblp.org/rdf/dblp.nt.gz

Getting data using GET_DATA_CMD from Qleverfile ...

--2022-08-19 07:16:17--  https://dblp.org/rdf/dblp.nt.gz
Resolving dblp.org (dblp.org)... 192.76.146.204
Connecting to dblp.org (dblp.org)|192.76.146.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2793364255 (2.6G) [application/x-gzip]
Saving to: ‘dblp.nt.gz’

dblp.nt.gz          100%[===================>]   2.60G  43.3MB/s    in 64s     

2022-08-19 07:17:21 (41.8 MB/s) - ‘dblp.nt.gz’ saved [2793364255/2793364255]


wf@confident:/hd/torterra/dblp2022-08$ qlever index

This is the "qlever" script, call without argument for help

Executing "index":

bash -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"

bash: IndexBuilderMain: command not found

wf@confident:/hd/torterra/dblp2022-08$ 
Max RAM usage: 0.0 GB

wf@confident:/hd/torterra/dblp2022-08$ ls
Qleverfile  dblp.index-log.txt  dblp.nt.gz  dblp.settings.json
wf@confident:/hd/torterra/dblp2022-08$ vi Qleverfile 
# edit Qleverfile: set USE_DOCKER = true (IndexBuilderMain is not on the host PATH)
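
The first "qlever index" run failed because IndexBuilderMain was not found on the host, which is why USE_DOCKER was switched to true. A pre-flight check along these lines could catch that before starting a run. This is only a sketch, assuming a POSIX shell; have_cmd and index_mode are hypothetical helper names and are not part of the qlever script.

```shell
# Return success if the named command can be found by the current shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Print "native" if the index builder binary is available on the PATH,
# otherwise "docker". The binary name defaults to IndexBuilderMain, the
# name that appears in the command printed by "qlever index".
index_mode() {
  if have_cmd "${1:-IndexBuilderMain}"; then
    echo native
  else
    echo docker
  fi
}
```

Calling index_mode without arguments on the host used here would be expected to print "docker", which corresponds to setting USE_DOCKER = true in the Qleverfile.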
wf@confident:/hd/torterra/dblp2022-08$ qlever index

This is the "qlever" script, call without argument for help

Executing "index":

docker run -it --rm -u 1001:1001 -v /hd/torterra/dblp2022-08:/index -w /index --entrypoint bash --name qlever.dblp.index-build adfreiburg/qlever -c "zcat dblp.nt.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --words-from-literals | tee dblp.index-log.txt"

2022-08-19 05:19:12.735	- INFO:  QLever IndexBuilder, compiled on Mon Aug 15 05:40:57 UTC 2022 using git hash 406dda
2022-08-19 05:19:12.736	- INFO:  You specified the input format: TTL
2022-08-19 05:19:12.737	- INFO:  Locale was not specified in settings file, default is en_US
2022-08-19 05:19:12.737	- INFO:  You specified "locale = en_US" and "ignore-punctuation = 0"
2022-08-19 05:19:12.738	- INFO:  You specified "ascii-prefixes-only = true", which enables faster parsing for well-behaved TTL files
2022-08-19 05:19:12.738	- INFO:  You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2022-08-19 05:19:12.738	- INFO:  Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-08-19 05:19:12.738	- INFO:  Processing input triples from /dev/stdin ...
2022-08-19 05:31:18.190	- INFO:  Triples converted: 100,000,000
2022-08-19 05:31:36.447	- INFO:  Triples converted: 200,000,000
2022-08-19 05:31:48.312	- INFO:  Done, total number of triples converted: 268,701,236
2022-08-19 05:31:48.318	- INFO:  Building prefix tree from internal vocabulary ...
2022-08-19 05:32:32.605	- INFO:  Computing maximally compressing prefixes (greedy algorithm) ...
2022-08-19 05:33:59.130	- INFO:  Reduction of size of internal vocabulary: 24%
2022-08-19 05:34:02.208	- INFO:  Writing compressed vocabulary to disk ...
2022-08-19 05:35:42.396	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:37:03.671	- INFO:  Statistics for PSO: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.674	- INFO:  Statistics for POS: #relations = 65, #blocks = 542, #triples = 268,672,977
2022-08-19 05:37:03.675	- INFO:  Exchanging multiplicities for PSO and POS ...
2022-08-19 05:37:03.675	- INFO:  Writing meta data for PSO and POS ...
2022-08-19 05:37:08.712	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:38:11.124	- INFO:  Statistics for SPO: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124	- INFO:  Statistics for SOP: #relations = 44,834,357, #blocks = 342, #triples = 268,672,977
2022-08-19 05:38:11.124	- INFO:  Exchanging multiplicities for SPO and SOP ...
2022-08-19 05:38:21.281	- INFO:  Writing meta data for SPO and SOP ...
2022-08-19 05:38:21.385	- INFO:  Number of distinct patterns: 1,276
2022-08-19 05:38:21.385	- INFO:  Number of subjects with pattern: 44,834,357 [all]
2022-08-19 05:38:21.385	- INFO:  Total number of distinct subject-predicate pairs: 228,395,931
2022-08-19 05:38:21.385	- INFO:  Average number of predicates per subject: 5.1
2022-08-19 05:38:21.389	- INFO:  Average number of subjects per predicate: 3,625,332
2022-08-19 05:38:28.373	- INFO:  Creating a pair of index permutations ... 
2022-08-19 05:39:29.422	- INFO:  Statistics for OSP: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423	- INFO:  Statistics for OPS: #relations = 85,894,696, #blocks = 435, #triples = 268,672,977
2022-08-19 05:39:29.423	- INFO:  Exchanging multiplicities for OSP and OPS ...
2022-08-19 05:39:48.764	- INFO:  Writing meta data for OSP and OPS ...
2022-08-19 05:39:48.946	- INFO:  Index build completed
2022-08-19 05:39:49.086	- INFO:  
2022-08-19 05:39:49.086	- INFO:  Adding text index ...
2022-08-19 05:39:49.086	- INFO:  Considering each literal as a text record
2022-08-19 05:39:49.099	- INFO:  The git hash used to build this index was "406ddab3953b604f7f37e83307b8c3db5a3c04dd"
2022-08-19 05:39:49.100	- INFO:  Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-19 05:39:58.361	- INFO:  Done, number of words: 92,096,717
2022-08-19 05:39:58.361	- INFO:  Building text vocabulary ...
2022-08-19 05:41:07.506	- INFO:  Writing vocabulary to file dblp.text.vocabulary ...
2022-08-19 05:41:07.592	- INFO:  Done, number of words: 9,463,510
2022-08-19 05:41:07.896	- INFO:  Building the half-inverted index lists ...
2022-08-19 05:46:10.425	- WARN:  Entity from text not in KB: "James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419."
2022-08-19 05:47:50.808	- WARN:  Entity from text not in KB: "Natasha Dobrinen: James Cummings and Ernest Schimmerling, editors. Lecture Note Series of the London Mathematical Society, vol. 406. Cambridge University Press, New York, xi + 419 pp. - Paul B. Larson, Peter Lumsdaine, and Yimu Yin. An introduction to Pmax forcing. pp. 5-23. - Simon Thomas and Scott Schneider. Countable Borel equivalence relations. pp. 25-62. - Ilijas Farah and Eric Wofsey. Set theory and operator algebras. pp. 63-119. - Justin Moore and David Milovich. A tutorial on set mapping reflection. pp. 121-144. - Vladimir G. Pestov and Aleksandra Kwiatkowska. An introduction to hyperlinear and sofic groups. pp. 145-185. - Itay Neeman and Spencer Unger. Aronszajn trees and the SCH. pp. 187-206. - Todd Eisworth, Justin Tatch Moore, and David Milovich. Iterated forcing and the Continuum Hypothesis. pp. 207-244. - Moti Gitik and Spencer Unger. Short extender forcing. pp. 245-263. - Alexander S. Kechris and Robin D. Tucker-Drob. The complexity of classification problems in ergodic theory. pp. 265-299. - Menachem Magidor and Chris Lambie-Hanson. On the strengths and weaknesses of weak squares. pp. 301-330. - Boban Veličković and Giorgio Venturi. Proper forcing remastered. pp. 331-362. - Asger ToÖrnquist and Martino Lupini. Set theory and von Neumann algebras. pp. 363-396. - W. Hugh Woodin, Jacob Davis, and Daniel RodrÍguez. The HOD dichotomy. pp. 397-419. (2014)"
2022-08-19 05:49:30.949	- WARN:  Entity from text not in KB: "Tony Owen: Numerical Recipes Book (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 1990, 759 pages including index (£30.00 hdb).Numerical Recipes Diskette (PASCAL) version 2.0 by William H. Press, et al. Cambridge University Press, Cambridge, 03 1990 (£21.50).Numerical Recipes Example Handbook (PASCAL) by William H. Press, Brian P. Flannery, Saul A. Teukolsky and William T. Vetterling Cambridge University Press, Cambridge, 09 1990, 223 pages including index of demonstrated procedures (£19·50, hdb).Numerical Recipes Example Diskette (PASCAL) version 2.0 by William H. Press et al. Cambridge University Press, Cambridge, 02 1990 (£21.50).Numerical Recipes Routines and Examples in Basic by Julian C. Sprott Cambridge University Press, Cambridge (paperback), 1991, 398 pages including index of programs (£19.50; pbk).Numerical Recipes Diskette Basic version 1.0 by Julian C. Sprott Cambridge University Press, Cambridge, 1991 (£21.50). (1992)"
2022-08-19 05:50:15.628	- WARN:  Number of mentions of entities not found in the vocabulary: 3
2022-08-19 05:55:07.011	- INFO:  Statistics for text index: #records = 32,052,337, #words = 256,962,549, #entities = 32,052,337, #blocks = 32,279,050
2022-08-19 05:55:12.745	- INFO:  Text index build completed
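
The log timestamps allow a quick check of the build duration: the first entry is at 05:19:12 and the last at 05:55:12, so the RDF index plus the text index took about 36 minutes. The following is a minimal sketch of that calculation, assuming POSIX sh and awk, that each relevant log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, and that the run does not cross midnight. log_elapsed_seconds is a hypothetical helper name, not part of QLever.

```shell
# Print the number of seconds between the first and the last timestamped
# line of a log read from stdin. Lines without a leading timestamp are
# ignored. Fractions of a second are truncated.
log_elapsed_seconds() {
  awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      split($2, t, ":")                       # t[1]=HH, t[2]=MM, t[3]=SS.mmm
      s = t[1] * 3600 + t[2] * 60 + int(t[3])
      if (!seen) { first = s; seen = 1 }      # remember the first timestamp
      last = s                                # keep updating the last one
    }
    END { if (seen) print last - first }
  '
}
```

For example, "log_elapsed_seconds < dblp.index-log.txt" would print the span in seconds for the log file written by the tee in the index command.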