Editor Extraction and Reconciliation
Contributor: Tim Holzheim
Latest revision as of 10:07, 10 March 2023
Editor Extraction
- Covered volumes: 1-3354
- Extraction is optimized for volumes 600 and later
- 11764 editor records extracted
- For 228 volumes, no editors could be extracted
Reconciliation
dblp reconciliation
Volume Editors of CEUR-WS in dblp
PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT DISTINCT ?vol_number
  (GROUP_CONCAT(DISTINCT ?name; separator="|") as ?names)
  (GROUP_CONCAT(DISTINCT ?dblp_id; separator="|") as ?concat_dblp_id)
WHERE {
  ?volume dblp:publishedIn "CEUR Workshop Proceedings" ;
          dblp:publishedInSeries "CEUR Workshop Proceedings" ;
          dblp:publishedInSeriesVolume ?vol_number;
          dblp:hasSignature ?editors.
  ?editors dblp:signatureDblpName ?name ;
           dblp:signatureCreator ?dblp_id ;
           dblp:signatureOrdinal ?editor_ordinal ;
           dblp:signaturePublication ?dblp_publication_id ;
           a dblp:EditorSignature.
}
GROUP BY ?vol_number
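The query returns one row per volume, with the editor names and dblp ids concatenated by `|`. A minimal sketch of splitting such rows back into lists, assuming the result rows are already available as dictionaries keyed by the variable names of the SELECT clause. The helper name `split_rows` and the sample row are hypothetical and not part of the original tooling.

```python
def split_rows(rows):
    """Map each volume number to its editor names and dblp ids.

    `rows` is assumed to be a list of dicts with the keys
    vol_number, names and concat_dblp_id, as selected by the query.
    """
    editors_by_volume = {}
    for row in rows:
        # An empty GROUP_CONCAT result is an empty string, so drop empty parts.
        names = [n for n in row["names"].split("|") if n]
        dblp_ids = [i for i in row["concat_dblp_id"].split("|") if i]
        editors_by_volume[row["vol_number"]] = {
            "names": names,
            "dblp_ids": dblp_ids,
        }
    return editors_by_volume


# Made-up sample row, for illustration only.
sample = [{
    "vol_number": "1",
    "names": "A. Editor|B. Editor",
    "concat_dblp_id": "https://example.org/pid/a|https://example.org/pid/b",
}]
print(split_rows(sample)["1"]["names"])
```

SPARQL does not define an order for GROUP_CONCAT, so the two lists should not be paired by position. If name and id must stay together, they would have to be concatenated into one value per editor inside the query.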
Volume Editors of CEUR-WS in dblp with identifiers
PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX litre: <http://purl.org/spar/literal/>
SELECT DISTINCT
(group_concat(DISTINCT ?nameVar;separator='|') as ?name)
(group_concat(DISTINCT ?homepageVar;separator='|') as ?homepage)
(group_concat(DISTINCT ?affiliationVar;separator='|') as ?affiliation)
(group_concat(DISTINCT ?dblpVar;separator='|') as ?dblp)
(group_concat(DISTINCT ?wikidataVar;separator='|') as ?wikidata)
(group_concat(DISTINCT ?orcidVar;separator='|') as ?orcid)
(group_concat(DISTINCT ?googleScholarVar;separator='|') as ?googleScholar)
(group_concat(DISTINCT ?acmVar;separator='|') as ?acm)
(group_concat(DISTINCT ?twitterVar;separator='|') as ?twitter)
(group_concat(DISTINCT ?githubVar;separator='|') as ?github)
(group_concat(DISTINCT ?viafVar;separator='|') as ?viaf)
(group_concat(DISTINCT ?scigraphVar;separator='|') as ?scigraph)
(group_concat(DISTINCT ?zbmathVar;separator='|') as ?zbmath)
(group_concat(DISTINCT ?researchGateVar;separator='|') as ?researchGate)
(group_concat(DISTINCT ?mathGenealogyVar;separator='|') as ?mathGenealogy)
(group_concat(DISTINCT ?locVar;separator='|') as ?loc)
(group_concat(DISTINCT ?linkedinVar;separator='|') as ?linkedin)
(group_concat(DISTINCT ?lattesVar;separator='|') as ?lattes)
(group_concat(DISTINCT ?isniVar;separator='|') as ?isni)
(group_concat(DISTINCT ?ieeeVar;separator='|') as ?ieee)
(group_concat(DISTINCT ?geprisVar;separator='|') as ?gepris)
(group_concat(DISTINCT ?gndVar;separator='|') as ?gnd)
WHERE{
?proceeding dblp:publishedIn "CEUR Workshop Proceedings";
dblp:publishedInSeriesVolume ?volume;
dblp:editedBy ?editor.
?editor dblp:primaryCreatorName ?nameVar.
OPTIONAL{?editor dblp:primaryHomepage ?homepageVar.}
OPTIONAL{?editor dblp:primaryAffiliation ?affiliationVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?dblp_blank.
?dblp_blank datacite:usesIdentifierScheme datacite:dblp;
litre:hasLiteralValue ?dblpVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?wikidata_blank.
?wikidata_blank datacite:usesIdentifierScheme datacite:wikidata;
litre:hasLiteralValue ?wikidataVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?orcid_blank.
?orcid_blank datacite:usesIdentifierScheme datacite:orcid;
litre:hasLiteralValue ?orcidVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?googleScholar_blank.
?googleScholar_blank datacite:usesIdentifierScheme datacite:google-scholar;
litre:hasLiteralValue ?googleScholarVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?acm_blank.
?acm_blank datacite:usesIdentifierScheme datacite:acm;
litre:hasLiteralValue ?acmVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?twitter_blank.
?twitter_blank datacite:usesIdentifierScheme datacite:twitter;
litre:hasLiteralValue ?twitterVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?github_blank.
?github_blank datacite:usesIdentifierScheme datacite:github;
litre:hasLiteralValue ?githubVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?viaf_blank.
?viaf_blank datacite:usesIdentifierScheme datacite:viaf;
litre:hasLiteralValue ?viafVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?scigraph_blank.
?scigraph_blank datacite:usesIdentifierScheme datacite:scigraph;
litre:hasLiteralValue ?scigraphVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?zbmath_blank.
?zbmath_blank datacite:usesIdentifierScheme datacite:zbmath;
litre:hasLiteralValue ?zbmathVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?researchGate_blank.
?researchGate_blank datacite:usesIdentifierScheme datacite:research-gate;
litre:hasLiteralValue ?researchGateVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?mathGenealogy_blank.
?mathGenealogy_blank datacite:usesIdentifierScheme datacite:math-genealogy;
litre:hasLiteralValue ?mathGenealogyVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?loc_blank.
?loc_blank datacite:usesIdentifierScheme datacite:loc;
litre:hasLiteralValue ?locVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?linkedin_blank.
?linkedin_blank datacite:usesIdentifierScheme datacite:linkedin;
litre:hasLiteralValue ?linkedinVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?lattes_blank.
?lattes_blank datacite:usesIdentifierScheme datacite:lattes;
litre:hasLiteralValue ?lattesVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?isni_blank.
?isni_blank datacite:usesIdentifierScheme datacite:isni;
litre:hasLiteralValue ?isniVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?ieee_blank.
?ieee_blank datacite:usesIdentifierScheme datacite:ieee;
litre:hasLiteralValue ?ieeeVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?gepris_blank.
?gepris_blank datacite:usesIdentifierScheme datacite:gepris;
litre:hasLiteralValue ?geprisVar.}
OPTIONAL{
?editor datacite:hasIdentifier ?gnd_blank.
?gnd_blank datacite:usesIdentifierScheme datacite:gnd;
litre:hasLiteralValue ?gndVar.}
}
GROUP BY ?editor
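The query above repeats the same SELECT expression and OPTIONAL pattern for every identifier scheme. How the original query was produced is not stated; the following is only a sketch of how such repetitive parts could be generated from a list of schemes. The names `SCHEMES`, `select_line` and `optional_block` are hypothetical; the variable stems and scheme names are copied from the query.

```python
# (variable stem, local name of the datacite identifier scheme), as used in the query above.
SCHEMES = [
    ("dblp", "dblp"), ("wikidata", "wikidata"), ("orcid", "orcid"),
    ("googleScholar", "google-scholar"), ("acm", "acm"), ("twitter", "twitter"),
    ("github", "github"), ("viaf", "viaf"), ("scigraph", "scigraph"),
    ("zbmath", "zbmath"), ("researchGate", "research-gate"),
    ("mathGenealogy", "math-genealogy"), ("loc", "loc"), ("linkedin", "linkedin"),
    ("lattes", "lattes"), ("isni", "isni"), ("ieee", "ieee"),
    ("gepris", "gepris"), ("gnd", "gnd"),
]


def select_line(stem):
    """Aggregate all values of one identifier into a '|'-separated column."""
    return f"(group_concat(DISTINCT ?{stem}Var;separator='|') as ?{stem})"


def optional_block(stem, scheme):
    """OPTIONAL pattern that binds ?<stem>Var if the editor has that identifier."""
    return (
        "OPTIONAL{\n"
        f"?editor datacite:hasIdentifier ?{stem}_blank.\n"
        f"?{stem}_blank datacite:usesIdentifierScheme datacite:{scheme};\n"
        f"litre:hasLiteralValue ?{stem}Var.}}"
    )


# Assemble the repetitive parts of the SELECT and WHERE clauses.
select_part = "\n".join(select_line(stem) for stem, _ in SCHEMES)
where_part = "\n".join(optional_block(stem, scheme) for stem, scheme in SCHEMES)
print(optional_block("orcid", "orcid"))
```

Keeping the schemes in one list means a new identifier only has to be added in a single place.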
Comparing Extracted and dblp Editors
- Editor-by-volume comparison
  - For 2233 volumes, the extracted editors match the dblp editors
  - 807 volumes with extracted editors are missing in dblp
  - For 27 volumes, more editors were extracted than dblp lists
  - For 387 volumes, dblp lists more editors than we could extract
- 9321 out of 11764 editor records (79.23%) can be reconciled
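The per-volume categories above can be sketched as a small classification function. The source does not state how names are compared, so the normalization used here (lower case, collapsed whitespace) and the extra category for equal counts with different names are assumptions; all function and category names are hypothetical.

```python
def normalize(name):
    # Assumption: names are compared ignoring case and repeated whitespace.
    return " ".join(name.lower().split())


def classify_volume(extracted, dblp):
    """Compare the extracted editor names of one volume with those in dblp.

    `dblp` is None when the volume is not present in dblp at all.
    """
    if dblp is None:
        return "missing_in_dblp"
    if {normalize(n) for n in extracted} == {normalize(n) for n in dblp}:
        return "match"
    if len(extracted) > len(dblp):
        return "more_extracted"
    if len(extracted) < len(dblp):
        return "more_in_dblp"
    # Not a category from the list above: same count, different names.
    return "same_count_different_names"


def reconciled_share(reconciled, total):
    """Share of reconciled editor records in percent, rounded to two decimals."""
    return round(100 * reconciled / total, 2)


print(classify_volume(["Ada  Example"], ["ada example"]))  # match
print(reconciled_share(9321, 11764))  # 79.23
```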
Wikidata Reconciliation
The identifiers queried from dblp are used to find the corresponding Wikidata entry.
Current strategy:
- Input
  - List of the different identifiers known for an editor
- Output
  - SPARQL query
Example:
- Input:
  - homepage: http://www.stefandecker.org
  - gnd id: 173443443
  - dblp author id: d/StefanDecker
- Output:
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?personLabel
WHERE
{
  {OPTIONAL{ ?person wdt:P856 <http://www.stefandecker.org>.} } # homepage
  UNION
  {OPTIONAL{ ?person wdt:P227 "173443443".} } # gnd
  UNION
  {OPTIONAL{ ?person wdt:P2456 "d/StefanDecker".} } # dblp
  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")
}
Depending on the available identifiers, the query is adjusted by adding the corresponding OPTIONAL clauses.
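A minimal sketch of such a query builder, assuming the identifiers arrive as a dictionary. It is not the original implementation: the function name `build_wikidata_query` and the mapping are hypothetical, the mapping only covers the three properties from the example, and the sketch emits plain UNION branches instead of the OPTIONAL wrappers shown above, since a UNION already lets each identifier match independently.

```python
# Wikidata properties taken from the example query above.
PROPERTY_BY_IDENTIFIER = {
    "homepage": "P856",  # matched as an IRI
    "gnd": "P227",
    "dblp": "P2456",
}


def quote_literal(value):
    """Wrap a value as a SPARQL string literal, escaping backslash and quote."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_wikidata_query(identifiers):
    """Build a query that finds persons matching any of the given identifiers."""
    branches = []
    for kind, value in identifiers.items():
        prop = PROPERTY_BY_IDENTIFIER.get(kind)
        if prop is None or not value:
            continue  # skip identifiers without a known property
        # The homepage is an IRI; all other identifiers are string literals.
        obj = f"<{value}>" if kind == "homepage" else quote_literal(value)
        branches.append(f"  {{ ?person wdt:{prop} {obj}. }}")
    if not branches:
        raise ValueError("no usable identifier given")
    union = "\n  UNION\n".join(branches)
    return (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?person ?personLabel\n"
        "WHERE {\n"
        f"{union}\n"
        '  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")\n'
        "}"
    )


print(build_wikidata_query({
    "homepage": "http://www.stefandecker.org",
    "gnd": "173443443",
    "dblp": "d/StefanDecker",
}))
```

Further identifiers would be supported by adding their Wikidata property to the mapping.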
Running these queries for all 4942 editors known to dblp yields:
- 1467 editors were found in Wikidata
- 62 editor records in dblp have a conflict with Wikidata
- 3413 dblp editor records were not found in Wikidata
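The three outcomes can be sketched as a classification of the query result per editor. The source does not define what counts as a conflict, so the rule below is an assumption: a conflict is either several distinct Wikidata entities matching the identifiers, or a match that differs from the Wikidata id recorded in dblp. All names and the placeholder ids are hypothetical.

```python
from collections import Counter


def classify_editor(dblp_wikidata_id, query_hits):
    """Classify one dblp editor given the Wikidata entities the query returned.

    `dblp_wikidata_id` is the Wikidata id recorded in dblp, or None.
    `query_hits` is an iterable of entity ids returned by the query.
    """
    hits = set(query_hits)
    if not hits:
        return "not_found"
    if len(hits) > 1:
        return "conflict"  # assumption: identifiers point to different entities
    if dblp_wikidata_id and dblp_wikidata_id not in hits:
        return "conflict"  # assumption: dblp and the query disagree
    return "found"


# Placeholder ids, for illustration only.
results = Counter(
    classify_editor(known, hits)
    for known, hits in [
        (None, ["entity-a"]),
        ("entity-a", ["entity-b"]),
        (None, []),
    ]
)
print(dict(results))

# The reported counts add up to the number of editors queried.
assert 1467 + 62 + 3413 == 4942
```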
The figure below shows the distribution of the available identifiers across the three categories: identified, conflict, and unknown.
[[File:editors_wikidata_reconciliation.png|800px]]