Research results
Faculty habilitation of SAÏS Fatiha
Group : Large-scale Heterogeneous DAta and Knowledge

Knowledge Graph Refinement: Link Detection, Link Invalidation, Key Discovery and Data Enrichment

Defended on 20/06/2019

Abstract :
This habilitation thesis outlines some of the methods and tools resulting from my research activities over the last ten years, as well as my scientific projects for the near future. These methods and tools have been developed for knowledge graph refinement in the context of the Web of Data. We are experiencing an unprecedented production of resources published as Linked Open Data (LOD). This has led to the creation of knowledge graphs (KGs) containing billions of RDF (Resource Description Framework) triples, such as DBpedia, YAGO and Wikidata on the academic side, and the Google Knowledge Graph, the eBay Knowledge Graph and the Facebook Graph on the commercial side. However, building knowledge graphs while ensuring their completeness and correctness is a challenging endeavour. My research contributions address several aspects of this problem. First, for the identity link invalidation problem, we developed two main approaches: one relies on the semantics of ontology axioms to detect inconsistencies in the KGs, and the other relies on the network structure of identity links to assign an error degree to every identity link in the LOD. Second, in the setting of scientific KGs, we defined a generic approach for detecting contextual identity links, which represent a weak identity relation between entities that is valid in an explicit context expressed as a sub-part of the ontology. This approach helps overcome the strict semantics of the owl:sameAs predicate, which is not required in all application domains. Third, we proposed a data fusion approach that aggregates data coming from different sources and computes a unique representation for a set of given linked entities. Furthermore, to deal with missing values, we developed an approach that relies on data linking and case-based reasoning to predict them.
Finally, to enrich the conceptual level of KGs with new key axioms, which are particularly important for detecting identity links, we defined three efficient methods: KD2R for discovering exact keys, SAKey for discovering n-almost keys, and VICKEY for discovering conditional keys. All three methods first compute the maximal non-keys and then derive the minimal keys from them, applying several strategies to prune the search space.
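
The "maximal non-keys first, then minimal keys" idea can be illustrated with a deliberately small sketch. It is not KD2R, SAKey or VICKEY: it assumes flat, complete, single-valued records rather than RDF data, it enumerates candidates by brute force with no pruning, and every name in it (`maximal_non_keys`, `minimal_keys`, the sample `records`) is hypothetical.

```python
from itertools import combinations


def maximal_non_keys(records, attributes):
    """Return the maximal attribute sets on which two distinct records agree."""
    agreements = set()
    for a, b in combinations(records, 2):
        # Attributes on which this pair of records has identical values.
        agreements.add(frozenset(p for p in attributes if a[p] == b[p]))
    # Keep only the sets that are not strictly contained in another one.
    return {s for s in agreements if not any(s < t for t in agreements)}


def minimal_keys(records, attributes):
    """Derive minimal keys as minimal sets that hit every non-key complement."""
    all_attrs = frozenset(attributes)
    complements = [all_attrs - nk for nk in maximal_non_keys(records, attributes)]
    keys = []
    # Enumerate candidates by increasing size so supersets of keys are skipped.
    for size in range(1, len(attributes) + 1):
        for candidate in combinations(sorted(attributes), size):
            cand = frozenset(candidate)
            if any(k <= cand for k in keys):
                continue  # not minimal: contains an already found key
            # A key must differ from every maximal non-key on some attribute.
            if all(cand & c for c in complements):
                keys.append(cand)
    return keys


# Hypothetical sample data, for illustration only.
attributes = ["name", "city", "email"]
records = [
    {"name": "Ann", "city": "Paris", "email": "a@example.org"},
    {"name": "Ann", "city": "Lyon", "email": "b@example.org"},
    {"name": "Bob", "city": "Paris", "email": "c@example.org"},
]

non_keys = maximal_non_keys(records, attributes)
keys = minimal_keys(records, attributes)
```

On this sample, `name` and `city` are each shared by two records, so they are the maximal non-keys, and the derived minimal keys are `{email}` and `{city, name}`. If two records were identical on all attributes, the sketch would return no key at all.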

Overall, these approaches have been developed in collaboration with several fellow researchers, in the setting of several PhD theses, post-docs and master's theses, some of them in the context of ANR, CNRS and industrial research projects involving different organizations and companies, such as INRA, INA, ABES, IGN and Thalès.

Ph.D. dissertations & Faculty habilitations
CAUSAL LEARNING FOR DIAGNOSTIC SUPPORT


CAUSAL UNCERTAINTY QUANTIFICATION UNDER PARTIAL KNOWLEDGE AND LOW DATA REGIMES


MICRO VISUALIZATIONS: DESIGN AND ANALYSIS OF VISUALIZATIONS FOR SMALL DISPLAY SPACES
The topic of this habilitation is the study of very small data visualizations, or micro visualizations, in display contexts that can dedicate only minimal rendering space to data representations. For several years, together with my collaborators, I have been studying human perception of, interaction with, and analysis with micro visualizations in multiple contexts. In this document I bring together three of my research streams related to micro visualizations: data glyphs, where my joint research focused on studying the perception of small-multiple micro visualizations; word-scale visualizations, where my joint research focused on small visualizations embedded in text documents; and small mobile data visualizations for smartwatches or fitness trackers. I consider these types of small visualizations together under the umbrella term "micro visualizations". Micro visualizations are useful in multiple visualization contexts, and I have been working towards a better understanding of the complexities involved in designing and using them. Here, I define the term micro visualization, summarize my own and other past research and design guidelines, and outline several design spaces for different types of micro visualizations, based on some of the work I have been involved in since my PhD.