Difference between revisions of "Workdocumentation 2022-08-23"
Revision as of 08:18, 24 August 2022
Participants
Agenda
new server
New Server
- Mount two hard disks
- Create the "wikidata" group
- Install Docker according to https://docs.docker.com/engine/install/ubuntu/
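
The three preparation steps can be checked afterwards with a small script. This is only a sketch: the group name "wikidata" comes from the notes, while the mount point /hd/mantax is an assumption inferred from the shell prompts further down, and the helper names are made up for illustration. Creating the group itself would typically be done with `sudo groupadd wikidata`.

```shell
#!/usr/bin/env bash
# Sketch: verify the server preparation steps (read-only checks, no sudo needed).
# Assumption: /hd/mantax is one of the mounted disks (inferred from the prompts).

# Succeeds if a group with the given name is known to the system.
group_exists() {
  getent group "$1" > /dev/null
}

# Succeeds if the given directory is a mount point.
is_mounted() {
  mountpoint -q "$1"
}

# Succeeds if the docker client is on the PATH.
docker_installed() {
  command -v docker > /dev/null
}

# Runs a check and prints "ok" or "missing" together with a label.
# $1 = label, remaining arguments = the check command to run
report() {
  local label="$1"; shift
  if "$@"; then echo "ok:      $label"; else echo "missing: $label"; fi
}

report 'group "wikidata"' group_exists wikidata
report 'mount /hd/mantax' is_mounted /hd/mantax
report 'docker client'    docker_installed
```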
qlever dblp
clone qlever-control
wf@wikidata:/hd/mantax/qlever$ git clone https://github.com/ad-freiburg/qlever-control
Cloning into 'qlever-control'...
remote: Enumerating objects: 399, done.
remote: Counting objects: 100% (239/239), done.
remote: Compressing objects: 100% (150/150), done.
remote: Total 399 (delta 94), reused 211 (delta 88), pack-reused 160
Receiving objects: 100% (399/399), 125.08 KiB | 7.36 MiB/s, done.
Resolving deltas: 100% (149/149), done.
qlever dblp
wf@wikidata:/hd/mantax/qlever$ mkdir dblp
wf@wikidata:/hd/mantax/qlever$ cd dblp
wf@wikidata:/hd/mantax/qlever/dblp$ . ../qlever-control/qlever dblp
QLEVER CONFIG
Checking your PATH ...
Added the directory "/hd/mantax/qlever/qlever-control" to your PATH
Setting up bash autocompletion ...
Done, number of completions: 35
Creating new Qleverfile ...
No pre-configuration name specified (as argument of ". qlever"). Copied default
Qleverfile to current directory, please edit and check.
Setup is complete
Type qlever and use autocompletion to see which actions are available. Add a
"show" in the end to see what an action does without executing it (for example,
qlever index show). Edit your local Qleverfile to change settings. A typical
sequence of actions if you have used a preconfigured Qleverfile is:
qlever get-data
qlever index
qlever start
qlever example-query
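
The sequence printed by the setup can also be scripted. The sketch below first previews each action with the "show" suffix mentioned in the output above and then runs the actions in order, stopping at the first failure. The function names and the QLEVER_CMD variable are hypothetical helpers for this example and are not part of qlever-control.

```shell
#!/usr/bin/env bash
# Sketch: preview and run a sequence of qlever actions.
# QLEVER_CMD is a hypothetical override, e.g. QLEVER_CMD=echo for a dry run
# on a machine where the qlever script is not on the PATH.
QLEVER_CMD="${QLEVER_CMD:-qlever}"

# Previews the given actions by appending "show" to each of them.
preview_actions() {
  local action
  for action in "$@"; do
    "$QLEVER_CMD" "$action" show || return 1
  done
}

# Runs the given actions in order and stops at the first failure.
run_actions() {
  local action
  for action in "$@"; do
    "$QLEVER_CMD" "$action" || return 1
  done
}

# Usage (in a directory with a configured Qleverfile):
#   preview_actions get-data index start
#   run_actions get-data index start example-query
```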
qlever dblp get-data
qlever get-data
This is the "qlever" script, call without argument for help
Executing "get-data":
wget -nc -O dblp.ttl.gz https://dblp.org/rdf/dblp.ttl.gz
Getting data using GET_DATA_CMD from Qleverfile ...
--2022-08-24 08:42:22-- https://dblp.org/rdf/dblp.ttl.gz
Resolving dblp.org (dblp.org)... 192.76.146.204
Connecting to dblp.org (dblp.org)|192.76.146.204|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1068155173 (1019M) [application/x-gzip]
Saving to: ‘dblp.ttl.gz’
dblp.ttl.gz 100%[===================>] 1019M 39.2MB/s in 30s
2022-08-24 08:42:52 (33.8 MB/s) - ‘dblp.ttl.gz’ saved [1068155173/1068155173]
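
Before indexing, the download can be sanity-checked against the byte count that wget reported. A minimal sketch, assuming the helper names below (they are made up for this example); the expected size of 1068155173 bytes applies only to the dump fetched here and changes with every new dblp release.

```shell
#!/usr/bin/env bash
# Sketch: check a downloaded gzip file against an expected size.

# Prints the size of a file in bytes.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# Succeeds if the file has exactly the expected number of bytes.
size_matches() {
  [ "$(file_size "$1")" = "$2" ]
}

# Succeeds if the size matches and gzip can read the whole archive without errors.
verify_download() {
  size_matches "$1" "$2" && gzip -t "$1"
}

# Usage with the values from the wget output:
#   verify_download dblp.ttl.gz 1068155173 && echo "download looks intact"
```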
qlever dblp index
qlever index
This is the "qlever" script, call without argument for help
Executing "index":
docker run -it --rm -u 10000:10000 -v /hd/mantax/qlever/dblp:/index -w /index --entrypoint bash --name qlever.dblp.index-build adfreiburg/qlever -c "zcat dblp.ttl.gz | IndexBuilderMain -F ttl -f - -i dblp -s dblp.settings.json --text-words-from-literals | tee dblp.index-log.txt"
Unable to find image 'adfreiburg/qlever:latest' locally
latest: Pulling from adfreiburg/qlever
125a6e411906: Pull complete
7c46a5754a97: Pull complete
ac188e4fc015: Pull complete
e574534926c8: Pull complete
e013d756805d: Pull complete
e839a7c7b682: Pull complete
072cb0a0501c: Pull complete
a4a3efb708f4: Pull complete
Digest: sha256:23ddc354c1b85d85ce56e0659875b7c9a89bc4c6778f3d7e91140d882c681bec
Status: Downloaded newer image for adfreiburg/qlever:latest
2022-08-24 06:50:21.033 - INFO: QLever IndexBuilder, compiled on Wed Aug 24 00:11:41 UTC 2022 using git hash 3d1a56
2022-08-24 06:50:21.034 - INFO: You specified the input format: TTL
2022-08-24 06:50:21.034 - INFO: Locale was not specified in settings file, default is en_US
2022-08-24 06:50:21.034 - INFO: You specified "locale = en_US" and "ignore-punctuation = 0"
2022-08-24 06:50:21.035 - INFO: You specified "num-triples-per-batch = 5,000,000", choose a lower value if the index builder runs out of memory
2022-08-24 06:50:21.035 - INFO: Integers that cannot be represented by QLever will throw an exception (this is the default behavior)
2022-08-24 06:50:21.035 - INFO: Processing input triples from /dev/stdin ...
2022-08-24 06:51:50.419 - INFO: Input triples processed: 100,000,000
2022-08-24 06:53:13.171 - INFO: Input triples processed: 200,000,000
2022-08-24 06:56:32.749 - INFO: Triples converted: 200,000,000
2022-08-24 06:56:42.036 - INFO: Done, total number of triples converted: 264,910,951
2022-08-24 06:56:42.038 - INFO: Building prefix tree from internal vocabulary ...
2022-08-24 06:57:07.519 - INFO: Computing maximally compressing prefixes (greedy algorithm) ...
2022-08-24 06:58:08.866 - INFO: Reduction of size of internal vocabulary: 29%
2022-08-24 06:58:11.061 - INFO: Writing compressed vocabulary to disk ...
2022-08-24 06:58:42.020 - INFO: Creating a pair of index permutations ...
2022-08-24 06:59:54.815 - INFO: Statistics for PSO: #relations = 65, #blocks = 523, #triples = 259,133,822
2022-08-24 06:59:54.815 - INFO: Statistics for POS: #relations = 65, #blocks = 523, #triples = 259,133,822
2022-08-24 06:59:54.815 - INFO: Exchanging multiplicities for PSO and POS ...
2022-08-24 06:59:54.815 - INFO: Writing meta data for PSO and POS ...
2022-08-24 06:59:58.757 - INFO: Creating a pair of index permutations ...
2022-08-24 07:00:52.468 - INFO: Statistics for SPO: #relations = 44,889,872, #blocks = 330, #triples = 259,133,822
2022-08-24 07:00:52.468 - INFO: Statistics for SOP: #relations = 44,889,872, #blocks = 330, #triples = 259,133,822
2022-08-24 07:00:52.468 - INFO: Exchanging multiplicities for SPO and SOP ...
2022-08-24 07:01:00.962 - INFO: Writing meta data for SPO and SOP ...
2022-08-24 07:01:01.102 - INFO: Number of distinct patterns: 1,278
2022-08-24 07:01:01.102 - INFO: Number of subjects with pattern: 44,889,872 [all]
2022-08-24 07:01:01.102 - INFO: Total number of distinct subject-predicate pairs: 208,837,094
2022-08-24 07:01:01.102 - INFO: Average number of predicates per subject: 4.7
2022-08-24 07:01:01.102 - INFO: Average number of subjects per predicate: 3,314,875
2022-08-24 07:01:05.633 - INFO: Creating a pair of index permutations ...
2022-08-24 07:01:55.634 - INFO: Statistics for OSP: #relations = 85,991,087, #blocks = 417, #triples = 259,133,822
2022-08-24 07:01:55.634 - INFO: Statistics for OPS: #relations = 85,991,087, #blocks = 417, #triples = 259,133,822
2022-08-24 07:01:55.634 - INFO: Exchanging multiplicities for OSP and OPS ...
2022-08-24 07:02:13.771 - INFO: Writing meta data for OSP and OPS ...
2022-08-24 07:02:13.938 - INFO: Index build completed
2022-08-24 07:02:14.108 - INFO:
2022-08-24 07:02:14.108 - INFO: Adding text index ...
2022-08-24 07:02:14.108 - INFO: Considering each literal as a text record
2022-08-24 07:02:14.109 - INFO: The git hash used to build this index was 3d1a56
2022-08-24 07:02:14.109 - INFO: Reading vocabulary from file dblp.vocabulary.internal ...
2022-08-24 07:02:15.628 - INFO: Done, number of words: 92,198,728
2022-08-24 07:02:15.628 - INFO: Number of words in external vocabulary: 3
2022-08-24 07:02:15.628 - INFO: Building text vocabulary ...
2022-08-24 07:02:54.328 - INFO: Writing vocabulary to file dblp.text.vocabulary ...
2022-08-24 07:02:54.404 - INFO: Done, number of words: 9,473,027
2022-08-24 07:02:54.445 - INFO: Building the half-inverted index lists ...
2022-08-24 07:11:33.076 - INFO: Statistics for text index: #records = 32,082,807, #words = 257,295,941, #entities = 32,082,807, #blocks = 32,309,703
2022-08-24 07:11:34.814 - INFO: Text index build completed
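
The log written to dblp.index-log.txt can be summarized with a few lines of shell. The sketch below assumes that the first and last line of the file start with a timestamp in the format shown above and that GNU date is available; the function names are made up for this example. For the run above it would report roughly 21 minutes (06:50:21 to 07:11:34) and 264910951 converted triples.

```shell
#!/usr/bin/env bash
# Sketch: summarize a QLever index log such as dblp.index-log.txt.
# Assumes lines start with "YYYY-MM-DD HH:MM:SS.mmm" and that GNU date is installed.

# Prints the epoch seconds of the timestamp at the start of a log line.
# The fractional part of the seconds is dropped.
line_epoch() {
  local ts
  ts=$(printf '%s\n' "$1" | awk '{print $1 " " $2}')
  date -u -d "${ts%.*}" +%s
}

# Prints the whole seconds between the first and the last line of the log file.
log_elapsed_seconds() {
  local first last
  first=$(head -n 1 "$1")
  last=$(tail -n 1 "$1")
  echo $(( $(line_epoch "$last") - $(line_epoch "$first") ))
}

# Prints the total number of converted triples without thousands separators.
# Carriage returns are removed because the log was written from a docker -it session.
triples_converted() {
  grep 'total number of triples converted' "$1" | sed 's/.*: //' | tr -d ',\r'
}

# Usage:
#   log_elapsed_seconds dblp.index-log.txt
#   triples_converted dblp.index-log.txt
```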