Difference between revisions of "Editor Extraction and Reconciliation"

From BITPlan ceur-ws Wiki

Revision as of 10:47, 9 March 2023

Editor Extraction

  • covered volumes 1-3354
    • optimized for volumes 600+
  • 11764 editor records
  • for 228 volumes no editors could be extracted

Volume editor distribution.png

Reconciliation

dblp reconciliation

Volume Editors of CEUR-WS in dblp

PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT DISTINCT ?vol_number 
   (GROUP_CONCAT(DISTINCT ?name; separator="|") as ?names) 
   (GROUP_CONCAT(DISTINCT ?dblp_id; separator="|") as ?concat_dblp_id)
WHERE {
  ?volume dblp:publishedIn "CEUR Workshop Proceedings" ;
    dblp:publishedInSeries "CEUR Workshop Proceedings" ;
    dblp:publishedInSeriesVolume ?vol_number;
    dblp:hasSignature ?editors.
    ?editors dblp:signatureDblpName ?name ;
        dblp:signatureCreator ?dblp_id ;
        dblp:signatureOrdinal ?editor_ordinal ;
        dblp:signaturePublication ?dblp_publication_id ;
        a dblp:EditorSignature.
}
GROUP BY  ?vol_number

Volume Editors of CEUR-WS in dblp with identifiers

PREFIX datacite: <http://purl.org/spar/datacite/>
PREFIX dblp: <https://dblp.org/rdf/schema#>
PREFIX litre: <http://purl.org/spar/literal/>
SELECT DISTINCT 
	(group_concat(DISTINCT ?nameVar;separator='|') as ?name) 
	(group_concat(DISTINCT ?homepageVar;separator='|') as ?homepage)
	(group_concat(DISTINCT ?affiliationVar;separator='|') as ?affiliation)
	(group_concat(DISTINCT ?dblpVar;separator='|') as ?dblp)
	(group_concat(DISTINCT ?wikidataVar;separator='|') as ?wikidata)
	(group_concat(DISTINCT ?orcidVar;separator='|') as ?orcid)
	(group_concat(DISTINCT ?googleScholarVar;separator='|') as ?googleScholar)
	(group_concat(DISTINCT ?acmVar;separator='|') as ?acm)
	(group_concat(DISTINCT ?twitterVar;separator='|') as ?twitter)
	(group_concat(DISTINCT ?githubVar;separator='|') as ?github)
	(group_concat(DISTINCT ?viafVar;separator='|') as ?viaf)
	(group_concat(DISTINCT ?scigraphVar;separator='|') as ?scigraph)
	(group_concat(DISTINCT ?zbmathVar;separator='|') as ?zbmath)
	(group_concat(DISTINCT ?researchGateVar;separator='|') as ?researchGate)
	(group_concat(DISTINCT ?mathGenealogyVar;separator='|') as ?mathGenealogy)
	(group_concat(DISTINCT ?locVar;separator='|') as ?loc)
	(group_concat(DISTINCT ?linkedinVar;separator='|') as ?linkedin)
	(group_concat(DISTINCT ?lattesVar;separator='|') as ?lattes)
	(group_concat(DISTINCT ?isniVar;separator='|') as ?isni)
	(group_concat(DISTINCT ?ieeeVar;separator='|') as ?ieee)
	(group_concat(DISTINCT ?geprisVar;separator='|') as ?gepris)
	(group_concat(DISTINCT ?gndVar;separator='|') as ?gnd)
WHERE{
	?proceeding dblp:publishedIn "CEUR Workshop Proceedings";
		dblp:publishedInSeriesVolume ?volume;
		dblp:editedBy ?editor.
	?editor dblp:primaryCreatorName ?nameVar.
	OPTIONAL{?editor dblp:primaryHomepage ?homepageVar.}
	OPTIONAL{?editor dblp:primaryAffiliation ?affiliationVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?dblp_blank.
		?dblp_blank datacite:usesIdentifierScheme datacite:dblp;
		litre:hasLiteralValue ?dblpVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?wikidata_blank.
		?wikidata_blank datacite:usesIdentifierScheme datacite:wikidata;
		litre:hasLiteralValue ?wikidataVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?orcid_blank.
		?orcid_blank datacite:usesIdentifierScheme datacite:orcid;
		litre:hasLiteralValue ?orcidVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?googleScholar_blank.
		?googleScholar_blank datacite:usesIdentifierScheme datacite:google-scholar;
		litre:hasLiteralValue ?googleScholarVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?acm_blank.
		?acm_blank datacite:usesIdentifierScheme datacite:acm;
		litre:hasLiteralValue ?acmVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?twitter_blank.
		?twitter_blank datacite:usesIdentifierScheme datacite:twitter;
		litre:hasLiteralValue ?twitterVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?github_blank.
		?github_blank datacite:usesIdentifierScheme datacite:github;
		litre:hasLiteralValue ?githubVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?viaf_blank.
		?viaf_blank datacite:usesIdentifierScheme datacite:viaf;
		litre:hasLiteralValue ?viafVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?scigraph_blank.
		?scigraph_blank datacite:usesIdentifierScheme datacite:scigraph;
		litre:hasLiteralValue ?scigraphVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?zbmath_blank.
		?zbmath_blank datacite:usesIdentifierScheme datacite:zbmath;
		litre:hasLiteralValue ?zbmathVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?researchGate_blank.
		?researchGate_blank datacite:usesIdentifierScheme datacite:research-gate;
		litre:hasLiteralValue ?researchGateVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?mathGenealogy_blank.
		?mathGenealogy_blank datacite:usesIdentifierScheme datacite:math-genealogy;
		litre:hasLiteralValue ?mathGenealogyVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?loc_blank.
		?loc_blank datacite:usesIdentifierScheme datacite:loc;
		litre:hasLiteralValue ?locVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?linkedin_blank.
		?linkedin_blank datacite:usesIdentifierScheme datacite:linkedin;
		litre:hasLiteralValue ?linkedinVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?lattes_blank.
		?lattes_blank datacite:usesIdentifierScheme datacite:lattes;
		litre:hasLiteralValue ?lattesVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?isni_blank.
		?isni_blank datacite:usesIdentifierScheme datacite:isni;
		litre:hasLiteralValue ?isniVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?ieee_blank.
		?ieee_blank datacite:usesIdentifierScheme datacite:ieee;
		litre:hasLiteralValue ?ieeeVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gepris_blank.
		?gepris_blank datacite:usesIdentifierScheme datacite:gepris;
		litre:hasLiteralValue ?geprisVar.}
	OPTIONAL{
		?editor datacite:hasIdentifier ?gnd_blank.
		?gnd_blank datacite:usesIdentifierScheme datacite:gnd;
		litre:hasLiteralValue ?gndVar.}
}
GROUP BY ?editor
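
Each column of the result is a GROUP_CONCAT string with "|" as separator, so a row has to be split again before the identifiers can be used. A minimal Python sketch of that step, assuming each row arrives as a plain dict of strings; the function name split_concatenated and the second homepage in the sample row are hypothetical and only serve the illustration:

```python
from typing import Dict, List


def split_concatenated(row: Dict[str, str], separator: str = "|") -> Dict[str, List[str]]:
    """Turn each GROUP_CONCAT string of one result row into a list of values."""
    values_by_column = {}
    for column, concatenated in row.items():
        # An OPTIONAL pattern that never matched yields an empty string,
        # which is mapped to an empty list here.
        values_by_column[column] = [v for v in concatenated.split(separator) if v]
    return values_by_column


# Hypothetical row; the second homepage is made up to show the splitting.
row = {
    "name": "Stefan Decker",
    "homepage": "http://www.stefandecker.org|http://example.org/decker",
    "gnd": "173443443",
    "orcid": "",
}
editor = split_concatenated(row)
```

The sketch assumes that no identifier value itself contains the separator character.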

Comparing Extracted and dblp Editors

  • editor by volume comparison
    • for 2233 volumes the extracted editors match the dblp editors
    • 807 volumes with extracted editors are missing in dblp
    • for 27 volumes more editors were extracted than dblp lists
    • for 387 volumes dblp lists more editors than we could extract
  • 9321 out of 11764 editor records (79.23%) can be reconciled
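
The per-volume comparison can be sketched in Python as follows. The page does not state how a match is decided, so this sketch assumes the simplest criterion, comparing the number of editors per volume. The function names and category labels are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def classify_volume(extracted: List[str], dblp: Optional[List[str]]) -> str:
    """Assign one volume to a comparison category (assumption: by editor count)."""
    if dblp is None:
        return "missing in dblp"
    if len(extracted) == len(dblp):
        return "match"
    if len(extracted) > len(dblp):
        return "more extracted"
    return "more in dblp"


def summarize(extracted_by_volume: Dict[int, List[str]],
              dblp_by_volume: Dict[int, List[str]]) -> Counter:
    """Count the volumes per category over all volumes with extracted editors."""
    return Counter(
        classify_volume(editors, dblp_by_volume.get(volume))
        for volume, editors in extracted_by_volume.items()
    )


# Share of reconciled editor records, using the figures reported above.
reconciled_share = round(100 * 9321 / 11764, 2)
```

A stricter variant would compare normalized editor names instead of counts; which of the two the reported numbers are based on is not documented here.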

Wikidata Reconciliation

The identifiers queried from dblp are used to find the corresponding Wikidata entry.

Current strategy:

Input
List of the different identifiers that are known about an editor
Output
SPARQL query

Example:

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?person ?personLabel
WHERE
{
  {OPTIONAL{ ?person wdt:P856 <http://www.stefandecker.org>.} }
  UNION
  {OPTIONAL{ ?person wdt:P227 "173443443".} } # gnd
  UNION
  {OPTIONAL{ ?person wdt:P2456 "d/StefanDecker".} } # dblp
  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")
}

Depending on the available identifiers, the query is adjusted by adding the corresponding OPTIONAL clauses.
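
How such a query could be assembled from the known identifiers is sketched below in Python. This is an illustration, not the implementation used for the results on this page. The function name build_query and the mapping WIKIDATA_PROPERTIES are hypothetical, and the mapping covers only the three properties that appear in the example query. The generated text mirrors the shape of the example.

```python
from typing import Dict

# Identifier name -> Wikidata property, limited to the properties
# used in the example query (hypothetical helper mapping).
WIKIDATA_PROPERTIES = {"homepage": "P856", "gnd": "P227", "dblp": "P2456"}


def build_query(identifiers: Dict[str, str]) -> str:
    """Build one SPARQL query with a UNION branch per known identifier."""
    branches = []
    for name, value in identifiers.items():
        prop = WIKIDATA_PROPERTIES.get(name)
        if prop is None or not value:
            continue  # skip unsupported or empty identifiers
        # Homepages are IRIs, all other identifiers are string literals.
        obj = f"<{value}>" if name == "homepage" else f'"{value}"'
        branches.append(f"  {{OPTIONAL{{ ?person wdt:{prop} {obj}.}} }} # {name}")
    if not branches:
        raise ValueError("no supported identifier given")
    header = (
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?person ?personLabel\nWHERE\n{\n"
    )
    footer = '\n  ?person rdfs:label ?personLabel. FILTER(lang(?personLabel)="en")\n}'
    return header + "\n  UNION\n".join(branches) + footer


query = build_query({
    "homepage": "http://www.stefandecker.org",
    "gnd": "173443443",
    "dblp": "d/StefanDecker",
})
```

Values are inserted without escaping, so the sketch assumes identifiers that contain neither quotes nor angle brackets.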

Running these queries for all 4942 editors known to dblp, we get:

  • 1467 editors were found in Wikidata
  • 62 editor records in dblp have a conflict with Wikidata
  • 3413 dblp editor records were not found in Wikidata
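
As a quick consistency check, the three counts add up to the 4942 editors, and their shares can be computed with a few lines of Python. The category labels in the dict are shorthand chosen for this sketch.

```python
# Counts as reported in the list above.
counts = {"found": 1467, "conflict": 62, "not found": 3413}
total = sum(counts.values())

# Percentage of all dblp editors per category, rounded to two decimals.
shares = {category: round(100 * n / total, 2) for category, n in counts.items()}

for category, share in shares.items():
    print(f"{category}: {counts[category]} ({share}%)")
```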

The figure below shows the distribution of the available identifiers across the three categories: identified, conflict, and unknown.

Editors wikidata reconciliation.png