Semantic web

Advanced

The Semantic Web is a set of W3C standards for bringing graph data and knowledge graphs to the web. Linked data is well suited to integrating data from heterogeneous sources and to situations where the data model changes constantly. Both conditions are inherent to the web, which is a distributed system.

The building block of the Semantic Web is the Resource Description Framework (RDF). Resources are denoted by Internationalized Resource Identifiers (IRIs, a generalization of URLs) and become related to each other when sets of triples are defined. A triple consists of a subject, a predicate and an object; all three can be resources, and the object may also be a literal value such as a string or a number. A set of triples thus defines a graph. The W3C also defines a query language for RDF graphs, SPARQL.
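As a rough illustration of the triple model, the sketch below stores a graph as a plain Python set of tuples and answers a single triple pattern, which is the basic building block of a SPARQL query. It deliberately uses no RDF library; the `example.org` namespace, the sample data and the `match` helper are assumptions made up for this example.

```python
# Illustrative IRIs; the example.org namespace is made up for this sketch.
EX = "http://example.org/"
FOAF = "http://xmlns.com/foaf/0.1/"

# A graph is just a set of (subject, predicate, object) triples.
graph = {
    (EX + "alice", FOAF + "knows", EX + "bob"),
    (EX + "bob", FOAF + "knows", EX + "carol"),
    (EX + "alice", FOAF + "name", "Alice"),  # the object here is a literal
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard,
    playing the role of a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Roughly equivalent to: SELECT ?who WHERE { ex:alice foaf:knows ?who }
known_by_alice = [o for _, _, o in match(graph, s=EX + "alice", p=FOAF + "knows")]
print(known_by_alice)
```

A real application would normally rely on an RDF library and a SPARQL engine rather than hand-written matching; the point here is only that a graph is nothing more than a set of triples.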

RDF graphs by themselves carry no meaning: they are just a bunch of related identifiers. The W3C therefore also published a set of standards that give this graph data meaning, effectively turning it into knowledge graphs. They range from the simplest, RDF Schema (RDFS), to the general-purpose Web Ontology Language (OWL), which is based on description logics, and include several specialized vocabularies such as DCAT.
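To make "gaining meaning" a little more concrete, the sketch below applies two RDFS entailment rules (transitivity of `rdfs:subClassOf`, and inheritance of `rdf:type` along it) to a tiny graph. It is a naive fixed-point loop in plain Python, not how a production reasoner works; the `example.org` classes and the `infer_types` helper are assumptions invented for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/"  # made-up namespace for this sketch

graph = {
    (EX + "Dog", RDFS_SUBCLASS_OF, EX + "Mammal"),
    (EX + "Mammal", RDFS_SUBCLASS_OF, EX + "Animal"),
    (EX + "rex", RDF_TYPE, EX + "Dog"),
}


def infer_types(graph):
    """Apply two RDFS entailment rules until no new triple appears."""
    inferred = set(graph)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Only combine with "o subClassOf o2" statements.
                if p2 != RDFS_SUBCLASS_OF or o != s2:
                    continue
                if p == RDFS_SUBCLASS_OF:
                    # Subclass transitivity (rule rdfs11).
                    new.add((s, RDFS_SUBCLASS_OF, o2))
                elif p == RDF_TYPE:
                    # Instances inherit the superclass (rule rdfs9).
                    new.add((s, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


closure = infer_types(graph)
print((EX + "rex", RDF_TYPE, EX + "Animal") in closure)
```

Nothing in the input states that `rex` is an animal; that fact follows only from the schema, which is the sense in which RDFS adds meaning to otherwise bare identifiers.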

I have worked on both data engineering and software development projects involving semantic web technologies. The data engineering projects involved building ETL pipelines to transform tabular data into a knowledge graph based on an OWL ontology (UrWerk project), as well as the reverse: transforming semantic web vocabularies to JSON in order to index them in Elasticsearch (EOSC-Pillar project).
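The general shape of a tabular-to-graph transformation step can be sketched as follows. This is a generic illustration, not the actual pipeline from either project: the table, its columns, the `example.org` namespace, the `Part` class and the `rows_to_triples` helper are all hypothetical.

```python
import csv
import io

EX = "http://example.org/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical tabular input; in practice this would be a file or a database table.
TABLE = """id,name,material
p1,Gear,steel
p2,Spring,brass
"""


def rows_to_triples(text):
    """Map each row to an individual of the class ex:Part,
    emitting one extra triple per non-id column."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "part/" + row["id"]
        triples.append((subject, RDF_TYPE, EX + "Part"))
        for column, value in row.items():
            if column != "id":
                # Column names become predicates, cell values become literals.
                triples.append((subject, EX + column, value))
    return triples


triples = rows_to_triples(TABLE)
for triple in triples:
    print(triple)
```

In a real pipeline the classes and predicates would come from the target ontology instead of being derived from column names, and the resulting triples would be loaded into a triple store.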

On the software engineering side, I took over the development of SimPhoNy, a Python framework designed to manipulate knowledge graphs and, more importantly, to provide a knowledge-graph-based interface to simulation engines, data repositories and databases.