An Introduction to Ontology Engineering

2 First order logic and automated reasoning in a nutshell

🧭 Overview

🧠 One-sentence thesis

This excerpt is a table of contents and preface material that does not contain substantive technical content about first-order logic or automated reasoning.

📌 Key points (3–5)

  • The excerpt consists entirely of a table of contents listing chapters and sections on ontology engineering topics (Description Logics, OWL 2, ontology development methods, etc.).
  • The preface sections describe the textbook's evolution from version 1 to version 1.5, including additions of exercises, tutorials, and a new chapter on modularisation.
  • The intended audience is advanced undergraduate and early postgraduate computer science students, with assumed background in UML and databases but not necessarily deep logic or complexity theory.
  • The material originated from blog posts (2009) and evolved through multiple course iterations in Italy, Cuba, and South Africa before becoming this textbook.
  • No actual technical content about first-order logic, automated reasoning, Description Logics semantics, or reasoning services is present in the excerpt—only references to where such content appears in the full book.

📚 What the excerpt contains

📑 Table of contents structure

The excerpt shows a detailed table of contents covering:

  • Part I topics: Description Logics (constructors, semantics, important DLs like ALC and SROIQ), OWL 2 (features, profiles, syntaxes, complexity), reasoning services and tableau techniques.
  • Part II topics: Ontology development methodologies (macro and micro-level), quality improvement methods (logic-based, philosophy-based like OntoClean, heuristics like OOPS!), top-down development (foundational ontologies, mereology), bottom-up development (from databases, spreadsheets, thesauri, text processing, design patterns).
  • Part III topics: Advanced topics including Ontology-Based Data Access (OBDA), multilingual ontologies, verbalisation, uncertainty/vagueness (fuzzy and rough ontologies), temporal ontologies, and modularisation.

📖 Appendices and supporting material

  • Tutorials on OntoClean in OWL and OBDA systems.
  • Practical and project assignments for developing domain ontologies.
  • Technical appendices: OWL 2 Profiles features, complexity recap, answers to selected exercises.
  • Bibliography and author information.

🔄 Textbook evolution and version changes

📈 Version 1.5 additions

The preface details what version 1.5 adds compared to version 1:

  • About 10% more exercises in Chapters 2–9.
  • A new preliminary Chapter 11 on ontology modularisation.
  • A new section (§9.1.3) on challenges for multilingualism.
  • Two new tutorials (OntoClean and OBDA) in the appendix.
  • More answers to selected exercises.
  • Corrections of typos and copyright-related figure adjustments.
  • Total increase: 36 pages.

🌐 Accompanying materials

The textbook website (https://people.cs.uct.ac.za/~mkeet/OEbook/) provides:

  • Slides in multiple formats (pdf, LaTeX source, ppt).
  • Ontologies for tutorials and exercises.
  • Additional software for exercises.
  • Accessibility instructions for visually impaired users, particularly for screen reader training to handle Description Logics symbols.

🎓 Intended audience and pedagogical approach

👥 Target audience

The textbook is aimed at advanced undergraduate and early postgraduate students in computer science.

Assumed background:

  • Familiarity with UML class diagrams and databases.
  • Not assumed: solid background in logic, reasoning, and computational complexity (a gentle introduction is provided).

Other audiences:

  • Philosophers and domain experts may find sections of interest but may prefer to work through chapters in a different order.

📚 Course design philosophy

  • Designed for a semester-long course: each chapter can be covered in approximately one week.
  • Core material is contained within each chapter, not requiring extensive external reading.
  • For undergraduate level: in-text citations may be ignored.
  • For postgraduate level: reading 1–3 scientific papers per chapter is recommended for more detail.
  • In-text references help students begin reading scientific papers when working on assignments.

🔍 Scope and balance

The author acknowledges that ontology engineering is still an active research field, so some basics may change. The textbook aims to:

  • Strike a balance between topics and depth in the first two blocks.
  • Allow flexibility in course programmes so instructors can emphasise certain topics.
  • Provide an introduction that is comprehensive enough for a semester but not overwhelming.

Don't confuse: This is an introductory textbook, not a handbook, conference proceedings, or a book promoting one specific ontology or methodology.

📜 Historical development and licensing

🕰️ Content origins

The textbook evolved through multiple iterations:

  1. 2009: Started as blog posts for the European Masters in Computational Logic's Semantic Web Technologies course at Free University of Bozen-Bolzano, Italy (2009/2010).
    • Original goal: generate and facilitate online discussions (which "failed miserably," though posts were visited often).
  2. 2010: Reworked into short syllabi for courses at University of Havana, University of Computer Science (Cuba), and Masters Ontology Winter School 2010 (South Africa).
  3. 2010–2015: Further developed into lecture notes for COMP718/720 at University of KwaZulu-Natal and Ontology Engineering honours course at University of Cape Town, South Africa.
  4. Version 1: All chapters updated, new material added, course-specific data removed.
  5. Version 1.5: Current version with additions described above.

⚖️ Licensing

  • The 2015 lecture notes were released under a CC BY-NC-SA (Creative Commons Attribution-NonCommercial-ShareAlike) licence.
  • This textbook retains the same CC BY-NC-SA licence.

🙏 Acknowledgments

Contributors to version 1.5 additions include:

  • Former and current students: Zubeida Khan, Zola Mahlaza, Frances Gillis-Webber, Michael Harrison, Toky Raboanary, Joan Byamugisha.
  • Grant from the "Digital Open textbooks for Development" (DOT4D) Project.
  • Constructive feedback on ontologies from Ludger Jansen.

📍 Publication details

  • Version 1 was published in print by the non-profit publisher College Publications and remains current and relevant.
  • Version 1 pdf remains available as OEbookV1.pdf.
  • Version 1.5 completed in Cape Town, South Africa, February 2020.

⚠️ Important note

This excerpt does not contain the actual technical content of Chapter 2 ("First order logic and automated reasoning in a nutshell"). It only lists where such content appears in the full textbook structure. To learn about first-order logic, automated reasoning, Description Logics, reasoning services, or tableau techniques, you would need to consult the actual chapter content, not this table of contents and preface material.

3 Description Logics

🧭 Overview

🧠 One-sentence thesis

Description Logics form a family of decidable fragments of first-order logic that underpin the Web Ontology Language OWL and enable automated reasoning over ontologies.

📌 Key points (3–5)

  • What Description Logics are: a family of languages that are decidable fragments of FOL (first-order logic), forming the logical foundation for most versions of OWL.
  • Why they exist: they provide a balance between expressiveness and computational tractability for ontology representation and reasoning.
  • Connection to OWL: Description Logics lie at the basis of most 'species' of the World Wide Web consortium's standardised Web Ontology Language OWL.
  • Reasoning support: tableau reasoning methods are adapted from FOL to work specifically in the Description Logic setting.
  • Common confusion: Description Logics are not the same as full first-order logic—they are restricted fragments designed to be decidable, meaning reasoning tasks are guaranteed to terminate.

🎯 Purpose and positioning

🎯 Why Description Logics matter

Description Logics serve as the theoretical foundation for practical ontology languages used in real-world applications. The excerpt positions them as:

  • A bridge between formal logic theory (FOL) and practical ontology engineering (OWL)
  • A solution to the computability problems inherent in full first-order logic
  • The basis for automated reasoning services in ontology development

🔗 Relationship to other topics

The excerpt places Description Logics in a specific learning sequence:

  • Builds on: First-order predicate logic and automated reasoning (Chapter 2)
  • Leads to: The Web Ontology Language OWL and automated reasoning (Chapter 4)
  • Enables: Practical ontology development and reasoning services

🧩 Core characteristics

🧩 Decidable fragments of FOL

Description Logics are a family of languages that are decidable fragments of FOL.

  • What "decidable" means: reasoning tasks are guaranteed to terminate with an answer (unlike full FOL where some queries may run forever)
  • What "fragments" means: Description Logics use only a subset of FOL's expressive power, restricting what can be stated in exchange for computational guarantees
  • Why this trade-off matters: it makes automated reasoning practical for real-world ontology applications

Don't confuse: Description Logics are not "simplified logic"—they are carefully designed subsets that balance expressiveness with computational feasibility.

👨‍👩‍👧‍👦 A family of languages

The excerpt describes Description Logics as "a family of languages," not a single language:

  • Different Description Logic languages offer different trade-offs between expressiveness and computational complexity
  • This variety allows ontology developers to choose the right logic for their specific needs
  • The family relationship means they share common foundations and reasoning approaches

🔧 Reasoning adaptation

🔧 Tableau reasoning in Description Logics

The excerpt mentions that "tableau reasoning returns and is adapted to the DL setting":

  • Foundation: Tableau reasoning methods introduced for FOL (Chapter 2) are reused
  • Adaptation: These methods are modified to work specifically with Description Logic languages
  • Purpose: To provide automated reasoning services over ontologies written in Description Logics

Example: The same tableau approach used for general logical reasoning is tailored to handle the specific constructs and restrictions present in Description Logic languages.
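To make the branching idea concrete, here is a toy tableau-style satisfiability checker for propositional formulas, a deliberately simplified sketch: real DL tableaux add rules for existential and universal restrictions, roles, and blocking.

```python
# Toy tableau-style satisfiability check for propositional formulas in
# negation normal form (negation applied only to variables).
# Formulas are nested tuples:
#   ("var", "p"), ("not", ("var", "p")), ("and", f, g), ("or", f, g)

def satisfiable(branch):
    """Expand one branch; True if some fully expanded branch is clash-free."""
    for i, f in enumerate(branch):
        rest = branch[:i] + branch[i + 1:]
        if f[0] == "and":                       # deterministic rule: add both conjuncts
            return satisfiable(rest + [f[1], f[2]])
        if f[0] == "or":                        # branching rule: try each disjunct
            return satisfiable(rest + [f[1]]) or satisfiable(rest + [f[2]])
    pos = {f[1] for f in branch if f[0] == "var"}
    neg = {f[1][1] for f in branch if f[0] == "not"}
    return not (pos & neg)                      # clash iff some p and not-p co-occur

p, q = ("var", "p"), ("var", "q")
print(satisfiable([("and", p, ("not", p))]))             # False: every branch clashes
print(satisfiable([("and", ("or", p, q), ("not", p))]))  # True: the q-branch is open
```

The `or` rule is what makes tableaux non-deterministic: the procedure succeeds if at least one branch can be completed without a clash.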

⚙️ Enabling automated reasoning

Description Logics are designed to support automated reasoning, which is essential for:

  • Checking consistency of ontologies
  • Classifying concepts and relationships
  • Answering queries over knowledge bases
  • Supporting ontology development and validation

🌐 Connection to OWL

🌐 Foundation for OWL

The excerpt states that Description Logics "lie at the basis of most 'species' of the World Wide Web consortium's standardised Web Ontology Language OWL":

  • OWL varieties: Different versions or "species" of OWL are based on different Description Logics
  • Standardization: OWL is a W3C (World Wide Web Consortium) standard, giving Description Logics practical importance
  • Implementation path: Understanding Description Logics is necessary for working effectively with OWL

| Aspect | Description Logics | OWL |
| --- | --- | --- |
| Nature | Formal logical languages (theory) | Standardised ontology language (practice) |
| Purpose | Provide decidable reasoning | Enable web-based knowledge representation |
| Relationship | Theoretical foundation | Practical implementation |

Don't confuse: Description Logics are the underlying formal languages; OWL is the standardised syntax and semantics built on top of them for practical use.

📚 Learning context

📚 Prerequisites and follow-up

The excerpt indicates Chapter 3 (Description Logics) requires understanding of:

  • First-order predicate logic basics
  • Model-theoretic semantics
  • Principles of reasoning over logical theories
  • Tableau reasoning fundamentals

After studying Description Logics, students should be prepared to:

  • Understand OWL language features and their logical basis
  • Appreciate trade-offs between expressiveness and computational complexity
  • Use automated reasoning services effectively

🎓 Pedagogical approach

The excerpt describes the chapter as providing "a gentle introduction to the basics of Description Logics," suggesting:

  • The material is made accessible to students new to the topic
  • Focus is on foundational concepts rather than advanced technical details
  • The goal is to prepare students for practical ontology engineering work
4 The Web Ontology Language OWL

🧭 Overview

🧠 One-sentence thesis

An ontology is a logic-based, machine-processable artefact that represents domain knowledge in multiple formats for different users, with OWL being the most widely-used serialization standard.

📌 Key points (3–5)

  • What an ontology looks like: the same knowledge can be rendered in multiple formats—first-order logic, Description Logic, natural language, graphical diagrams, and machine-processable serializations like OWL.
  • Machine-processable core: despite user-friendly renderings, an ontology must have a format that faithfully adheres to logic and can be processed by computers; OWL's RDF/XML format is the required standard.
  • Ontology vs similar artefacts: ontologies are application-independent and domain-focused (reusable across applications), unlike conceptual data models (application-specific) and databases (which lack explicit knowledge representation and automated reasoning).
  • Common confusion: just because something is represented in OWL does not make it an ontology (e.g., a thesaurus or ER diagram translated to OWL), and conversely, ontologies can exist in other logic languages besides OWL.
  • Definition debate: there is no unanimously agreed-upon definition; definitions range from vague ("specification of a conceptualization") to restrictive ("equivalent to a Description Logic knowledge base"), with ongoing philosophical and technical debates.

🎨 Multiple representations of the same knowledge

🧮 Logic-based formats

The excerpt uses the African Wildlife Ontology (AWO) example to show different formal representations of the same fact: "all lions eat herbivores, and they also eat some impalas."

First-order predicate logic:

  • For all x, if x is a Lion, then for all y, if x eats y then y is a Herbivore, AND there exists some z such that x eats z and z is an Impala.
  • Mathematicians may prefer this notation.

Description Logic:

  • Lion is a subclass of "eats only Herbivore" AND "eats some Impala."
  • Uses different symbols but represents the same logical constraint.
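
Written out, the two renderings above are:

```latex
% FOL rendering of "all lions eat herbivores, and they also eat some impalas"
\forall x\,\bigl(\mathrm{Lion}(x) \rightarrow
  \forall y\,(\mathrm{eats}(x,y) \rightarrow \mathrm{Herbivore}(y)) \land
  \exists z\,(\mathrm{eats}(x,z) \land \mathrm{Impala}(z))\bigr)

% The same axiom in Description Logic notation
\mathrm{Lion} \sqsubseteq \forall \mathrm{eats}.\mathrm{Herbivore}
  \sqcap \exists \mathrm{eats}.\mathrm{Impala}
```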

📝 User-friendly renderings

Natural language rendering: "Each lion eats only herbivore and eats some Impala"

  • The universal quantifiers (for all) are verbalized as "Each" (over the subject) and "only" (over the role filler).
  • The existential quantifier (there exists) becomes "some."
  • The conjunction (AND) becomes "and."

Why this matters: domain experts typically prefer pseudo-natural language over formal logic notation.

🖼️ Graphical representations

The excerpt mentions two graphical options:

  • OntoGraf plugin rendering in the Protégé ontology development environment.
  • UML Class Diagram style notation.

Don't confuse: graphical renderings are "more or less precise" in showing knowledge—they are approximations for human understanding, not the authoritative machine-processable format.

💾 Machine-processable serialization

🌐 Web Ontology Language (OWL)

Serialization: a representation of the ontology into a text file that is easily computer-processable.

  • Most widely-used serialization: OWL.
  • Required format: RDF/XML.
  • Core principle: an ontology is an engineering artefact that must have a machine-processable format that faithfully adheres to the logic.

🔧 OWL RDF/XML syntax mapping

The excerpt shows how logical operators map to OWL tags:

| Logic notation | OWL RDF/XML tag | Meaning |
| --- | --- | --- |
| ∀ (for all) | owl:allValuesFrom | Universal quantifier |
| ∃ (exists) | owl:someValuesFrom | Existential quantifier |
| Subclassing (→ or ⊑) | rdfs:subClassOf | Class hierarchy |

Example from the lion class:

  • <owl:allValuesFrom rdf:resource="&AWO;herbivore"/> means lions eat only herbivores.
  • <owl:someValuesFrom rdf:resource="&AWO.owl;Impala"/> means lions eat at least some impala.
  • <rdfs:subClassOf rdf:resource="&AWO;animal"/> means lions are a subclass of animals.
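
The fragments above can be assembled into a parseable snippet. The sketch below substitutes a hypothetical `http://example.org/AWO#` namespace for the book's `&AWO;` entity and extracts the universal restriction with Python's standard library:

```python
import xml.etree.ElementTree as ET

RDF  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL  = "http://www.w3.org/2002/07/owl#"
AWO  = "http://example.org/AWO#"   # hypothetical stand-in for the &AWO; entity

snippet = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <owl:Class rdf:about="{AWO}lion">
    <rdfs:subClassOf rdf:resource="{AWO}animal"/>
    <rdfs:subClassOf>
      <owl:Restriction>
        <owl:onProperty rdf:resource="{AWO}eats"/>
        <owl:allValuesFrom rdf:resource="{AWO}herbivore"/>
      </owl:Restriction>
    </rdfs:subClassOf>
  </owl:Class>
</rdf:RDF>
"""

root = ET.fromstring(snippet)
restriction = root.find(f".//{{{OWL}}}Restriction")
filler = restriction.find(f"{{{OWL}}}allValuesFrom").get(f"{{{RDF}}}resource")
print(filler)  # filler class of the universal ('eats only ...') restriction
```

This is only a structural illustration of the tag nesting; a real OWL toolkit (not the XML parser used here) is what interprets these tags as logical axioms.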

🛠️ Ontology development environments (ODEs)

Key point: You typically will not have to write an ontology in RDF/XML format manually.

  • ODEs like Protégé render the ontology graphically, textually, or with a logic view.
  • Computer scientists may design tools that process or modify ontology files, but tool development toolkits and APIs cover many tasks.
  • The excerpt includes a screenshot showing the lion example in Protégé's interface.

🔍 Distinguishing ontologies from similar artefacts

📊 Ontologies vs conceptual data models

Key distinction:

| Aspect | Conceptual data models (EER, UML) | Ontologies |
| --- | --- | --- |
| Scope | Application-specific | Application-independent |
| Focus | Implementation-independent representation of data for a prospective application | Domain representation reusable by multiple applications |
| Formalization | More about drawing boxes and lines informally | Normally formalized in a logic language |

Implication: This distinction leads to further differences in content, usage, and purpose (to be covered in later chapters).

🗄️ Ontologies vs relational databases

Ontologies (as knowledge bases) differ from RDBMSs in several ways:

  • Explicit knowledge representation: ontologies include rules explicitly.
  • Automated reasoning: goes beyond plain queries to infer implicit knowledge and detect inconsistencies.
  • Open World Assumption: ontologies usually operate under OWA, whereas relational databases use the Closed World Assumption (CWA).
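
The OWA/CWA contrast can be sketched in a few lines. The facts and the three-valued `owa_holds` below are purely illustrative, not an actual reasoner API:

```python
# Hypothetical asserted facts about what eats what.
eats = {("lion", "impala"), ("lion", "gazelle")}

def cwa_holds(fact):
    """Closed World (databases): anything not asserted is taken to be false."""
    return fact in eats

def owa_holds(fact, known_false=frozenset()):
    """Open World (ontologies): absence of a fact means 'unknown', not 'false'."""
    if fact in eats:
        return True
    if fact in known_false:
        return False
    return None  # unknown: the fact is neither asserted nor refuted

print(cwa_holds(("lion", "grass")))  # False: CWA assumes the data is complete
print(owa_holds(("lion", "grass")))  # None: OWA withholds judgement
```

The same absent tuple yields a definite "no" under CWA but only "unknown" under OWA, which is why query answers over ontologies can differ from SQL answers over the same data.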

Don't confuse: The excerpt notes these differences will be revisited in a later chapter.

⚠️ Common confusion: OWL ≠ ontology

The excerpt emphasizes a critical distinction:

  • Just because something is in OWL does not make it an ontology: translating a thesaurus or ER diagram into OWL creates a "lightweight ontology" or "application ontology" by virtue of the format, but this blurs important distinctions.
  • Ontologies can exist in other formats: OBO format, Common Logic, etc.—representation in OWL is not a requirement.

Why this matters: The blurring of distinctions between different artefacts is problematic for various reasons (discussed in later chapters).

📖 Defining "ontology": the evolution and debate

🎯 Gruber's definition (most quoted but problematic)

Definition 1.1: An ontology is a specification of a conceptualization.

Problems:

  • Uses two nebulous terms ("conceptualization" and "specification") to describe a third.
  • Does not clarify what either term means exactly.
  • You may see this quote especially in older scientific literature, but it has been superseded.

🔄 Studer, Benjamins, Fensel refinement

Definition 1.2: An ontology is a formal, explicit specification of a shared conceptualization.

Improvements: Adds "formal," "explicit," and "shared."

Remaining questions:

  • What is a "conceptualization"?
  • What is a "formal, explicit specification"?
  • Why and how "shared"? (Is agreement between two people enough, or does it need group support?)

🏛️ Guarino's comprehensive definition

Definition 1.3: An ontology is a logical theory accounting for the intended meaning of a formal vocabulary, i.e., its ontological commitment to a particular conceptualization of the world. The intended models of a logical language using such a vocabulary are constrained by its ontological commitment. An ontology indirectly reflects this commitment (and the underlying conceptualization) by approximating these intended models.

Characteristics:

  • Landmark paper from 1998, revisited in 2009.
  • More precise and comprehensive than earlier definitions.
  • Still not free of debate and "a bit of a mouthful."

🌐 W3C OWL developers' definition

Definition 1.4: An ontology is equivalent to a Description Logic knowledge base.

Issues with this definition:

  1. Unduly restrictive: It is possible to have an ontology in another logic language (OBO format, Common Logic, etc.).
  2. Creates false positives: Formalizing a thesaurus or translating an EER/UML model into OWL would make them "ontologies" by this definition, despite important differences.

Current tendency: In the context of the Semantic Web (the most prominent application area), the tendency is toward ontologies being equivalent to a logical theory, particularly a Description Logic knowledge base—but this remains contentious.

🎓 Philosophical foundations

📚 Why philosophy matters

The term "ontology" is taken from philosophy, where it has a millennia-old history.

Key distinction:

  • Ontology (capital 'O'): the philosophical notion; does not have a plural.
  • ontology (lowercase 'o'): the computing artefact; can be plural (ontologies).

Why philosophers' scrutiny matters: Insights from philosophy are used when developing good ontologies, and definitions must be philosophically sound.

Note: The excerpt mentions this topic but indicates the discussion is orthogonal to the definition game and continues beyond what is shown.

5 Methods and Methodologies

🧭 Overview

🧠 One-sentence thesis

The excerpt argues that an ontology is not simply any knowledge representation in OWL or Description Logics, but must be distinguished from thesauri, conceptual data models, and other artefacts based on what it represents and how it is structured, not merely the formalism used.

📌 Key points (3–5)

  • The formalism trap: just because something is represented in OWL does not make it an ontology; conversely, ontologies can exist in languages other than OWL.
  • Common confusion: thesauri translated into OWL and ER/UML diagrams formalised in OWL are often mislabeled as ontologies, but they differ in purpose and structure.
  • Philosophical foundations: the term "ontology" comes from philosophy (Ontology with capital O), and philosophical debates about representing reality vs. conceptualisation affect ontology content.
  • Quality dimensions: ontologies can be good, less good, bad, or worse based on precision (how much unintended content) and coverage (how much intended content is captured).
  • Primary use case: ontologies were first proposed to solve data integration problems by providing a common vocabulary at a higher abstraction level than conceptual data models.

🔤 The formalism vs. content distinction

🔤 Why OWL ≠ ontology

The excerpt warns against equating representation language with ontology status:

  • Formalising a thesaurus in OWL produces a "lightweight ontology" (e.g., NCI thesaurus as cancer 'ontology').
  • Translating an EER or UML conceptual data model into OWL creates an 'application ontology' or 'operational ontology' by virtue of the formalism.
  • The problem: this blurs important distinctions between different types of artefacts.

The blurring of distinctions between different artefacts is problematic for various reasons.

🌐 Current Semantic Web tendency

  • In the Semantic Web context, there is a tendency to treat ontologies as equivalent to logical theories, particularly Description Logics knowledge bases.
  • Ontologists "frown" when someone calls 'a thesaurus in OWL' or 'an ER diagram in OWL' an ontology.
  • Key principle: representation in OWL does not make something an ontology; representation in another language does not prevent something from being an ontology.

Don't confuse: the language used (OWL, Common Logic, OBO format, etc.) with the nature of the artefact itself.

🏛️ Philosophical foundations

🏛️ Why philosophy matters

The term 'ontology' in computer science is borrowed from philosophy:

  • Philosophy has a millennia-old history with the term.
  • Philosophical insights are used when developing good ontologies.
  • Notation: "Ontology" with capital O refers to the philosophical notion and has no plural.

🌍 Reality vs. conceptualisation debate

Two contrasting views on what ontologies represent:

| View | What it represents | Example domains |
| --- | --- | --- |
| Empiricist Doctrine | Actually existing entities in the real world; mind-independent | Jacaranda tree, HIV infection (these exist regardless of human thought) |
| Conceptualist view | Concepts as psychological or abstract formal entities; mind-dependent | Phlogiston, Unicorn (exist only in outdated theories or stories) |

🧬 Practical implications

  • Medical/scientific domains: representing reality matters—wrong representations lead to wrong inferences and harmful treatments (e.g., malaria infections).
  • Other domains: the distinction may not matter as much; some domains allow representing things that don't exist in reality.
  • Historical note: these debates were commonplace 10-15 years ago but have quieted down recently.

🔮 The Universalist Doctrine

Universals: "a class of mind independent entities, usually contrasted with individuals, postulated to ground and explain relations of qualitative identity and resemblance among individuals."

  • General scientific terms (like HIV infection) are understood as referring directly to universals.
  • Individuals are similar by virtue of sharing universals.
  • Philosophical disagreement: whether universals exist and what kind of things they are leads into metaphysics.
  • Practical stance: metaphysics may not be crucial for building information system ontologies (e.g., developing a virus ontology while adhering to empiricist doctrine).

🎭 Other philosophical contributions

Philosophy helps with modeling decisions:

  • Distinguishing what you are vs. the role(s) you play.
  • Distinguishing participating in an event vs. being part of an event.
  • Clarifying assumptions: is a vase and the clay it's made of the same thing or two different things?

✅ Ontology quality dimensions

✅ Syntax and logical errors

Ontology errors parallel software code errors:

| Software code | Ontology equivalent |
| --- | --- |
| Does not compile | Syntax violation |
| Bugs (runtime errors) | Conflicting constraints (e.g., a class cannot have instances) |
| Semantic errors (wrong logic) | Logically correct but unintended (e.g., Student as subclass of Table) |

🍏 Ontological constraints example

The green apples case illustrates structural choices:

  • Option 1: Apples that have the attribute green.
  • Option 2: Green objects that have an apple-shape.
  • Logic doesn't distinguish these, but ontologically they differ.
  • Why Option 1 is better: Apple carries an identity condition (it's a 'sortal'—you can identify the object), whereas Green does not (it's a value of the attribute hasColor).

Don't confuse: logical equivalence with ontological appropriateness.

📊 Precision and coverage framework

The excerpt presents four quality levels based on two dimensions:

| Quality level | Precision | Coverage | Description |
| --- | --- | --- | --- |
| Good ontology | High | Maximum | Represents what's intended with minimal extra content |
| Less good ontology | Low | Maximum | Contains all intended content but also much unintended content |
| Bad ontology | Maximum | Limited | Very precise but missing content it should have |
| Worse ontology | Low | Limited | Contains unintended content AND missing intended content |

  • Precision: how close the represented content is to the intention (avoiding unintended content).
  • Coverage: how much of the intended content is captured.
  • Factors affecting quality: both the representation language and good modeling practices.

Example: If you want to represent African Wildlife, a good ontology captures all relevant wildlife concepts without including irrelevant concepts; a bad ontology might miss key species or include unrelated concepts.
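
As a back-of-the-envelope illustration (the term sets are made up), the two dimensions can be computed as set overlaps:

```python
# Hypothetical term sets: what we intend to represent vs. what the ontology contains.
domain   = {"lion", "impala", "herbivore", "carnivore", "acacia"}   # intended content
ontology = {"lion", "impala", "herbivore", "student_table"}         # represented content

precision = len(domain & ontology) / len(ontology)  # low if much unintended content
coverage  = len(domain & ontology) / len(domain)    # low if intended content is missing
print(precision, coverage)  # 0.75 0.6
```

Here the stray `student_table` term drags precision down, while the two missing wildlife terms limit coverage.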

🔗 Primary application: data integration

🔗 The integration problem

Ontologies were first proposed to solve data integration issues:

  • They provide a common vocabulary for applications.
  • They operate at one level of abstraction higher than conceptual data models (EER diagrams, UML Class Diagrams).
  • Over the years, ontologies have been used for other purposes beyond integration.

🗂️ Schema-based data integration

The excerpt describes a three-layer architecture:

  1. Implementation layer (bottom): different implementations—relational databases, object-oriented software (C++ application).
  2. Conceptual model layer (center): conceptual data models tailored to each application (EER, ORM, UML diagrams with different attribute names and datatypes).
  3. Ontology layer (top): provides the shared common vocabulary and constraints that hold across applications.

Example scenario from Figure 1.5:

  • Different databases store flower information with different schemas (Bloem/Lengte/Kleur vs. Flower/Height/Colour vs. color:String/height:inch).
  • The ontology layer (showing Flower, Colour, ColourRegion, Pantone, Height concepts) provides the common vocabulary to integrate these disparate representations.

🔄 Data-based integration

The excerpt mentions a second integration scenario (Figure 1.6) involving data-based integration, though details are not provided in this excerpt.

Key distinction: ontologies serve as the interoperability layer, not just another database schema or conceptual model.

6 Top-Down Ontology Development

🧭 Overview

🧠 One-sentence thesis

Ontologies provide a shared vocabulary at a higher abstraction level than conceptual data models, enabling integration of disparate information systems by mapping their different representations to common agreed-upon concepts.

📌 Key points (3–5)

  • Primary purpose: Ontologies solve data integration problems by providing common vocabulary at one level of abstraction higher than conceptual data models (EER, UML, etc.).
  • Two integration approaches: schema-based integration (mapping conceptual models to ontology) vs. data-level integration (annotating individual database tuples with ontology terms).
  • What ontologies capture: the underlying agreed-upon meaning that different implementations represent differently (e.g., "Flower" in EER, "Bloem" in ORM, "Flower" in UML all map to the same ontology concept).
  • Common confusion: Ontologies are not just another database schema—they sit above conceptual models and provide mappings between different representations, including generic relations like dependency/inherence.
  • Quality spectrum: Good ontologies balance what you want to represent (subject domain) with what the language can represent; bad ontologies have poor overlap between these circles.

🎯 What makes a good ontology

🎯 The representation challenge

Good ontologies balance the subject domain (what you want to represent) with what you can represent using the ontology language.

  • The excerpt uses a visual metaphor: pink circle = subject domain (e.g., African Wildlife), green circle = what's in the ontology (e.g., AWO).
  • Good: large overlap between what you want to represent and what you do represent.
  • Less good: partial overlap, some domain concepts missing or some ontology content outside the domain.
  • Bad/Worse: minimal or no overlap between intended domain and actual representation.
  • This depends on both the language capabilities and good modelling practices (addressed in Block I and Block II).

🔗 Schema-based data integration

🔗 The legacy systems problem

Real-world scenario: multiple databases on the same topic need to be combined.

Common triggers:

  • University mergers (two student databases → one unified database)
  • Corporate mergers and acquisitions
  • E-Government initiatives (integrating citizen services)
  • Healthcare electronic health records (laboratory + doctor databases)

The challenge: Each system has its own physical schema, relational model, conceptual model, and possibly different modelling languages (EER, ORM, UML).

🌸 How ontology-driven mapping works

The excerpt provides a flower shop example with three systems:

| System | Conceptual model | Flower representation | Colour representation |
|---|---|---|---|
| Database 1 | EER (bubble notation) | Entity: Flower | Attribute with Pantone values |
| Database 2 | ORM | Entity: Bloem | Unary predicate Kleur + colour region (real datatype) |
| Application | UML Class Diagram | Class: Flower | Attribute: Color (String, Pantone) |

What the ontology does:

  • Maps all "Flower"/"Bloem" representations → single ontology concept Flower
  • Maps all "Colour"/"Kleur"/"Color" representations → single ontology concept Colour
  • Handles different data types: ColourRegion (physical region in spectrum) vs. PantoneSystem (abstract encoding) both map to ontology regions

🔗 Generic relations in ontologies

Ontologies provide more than just concept names—they capture fundamental relationships:

qt relationship: between enduring objects (like flowers) and their qualities (like colour)

ql relationship: from qualities to value regions (qualia)

Why this matters: Conceptual models can name relationships arbitrarily (e.g., "heeftKleur" = "hasColour" in Dutch), but ontologies specify the type of relation.

Example: The colour of a specific flower depends on that flower's existence—if the flower doesn't exist, that colour instance doesn't exist either. This is dependency or inherence, which the ontology makes explicit.

Don't confuse: Ontology relationships are not just renamed database foreign keys—they capture ontological dependencies that hold across all implementations.

🛠️ What ontologies enable

The excerpt emphasizes that establishing ontology mappings is "the crucial step" in data integration:

  • After mappings are defined, the rest becomes "largely an engineering exercise"
  • The ontology is linked to a foundational ontology (DOLCE in the example), where Flower is a subclass of Non-Agentive Physical Object
  • This provides deeper semantic grounding beyond what any single conceptual model captures

📊 Data-level integration

📊 The molecular biology approach

Context: Domain experts in molecular biology needed urgent, practical data integration solutions.

Their innovation: Instead of schema-level mapping, annotate individual database records (tuples, even individual cells) with ontology terms.

Key difference from schema-based:

  • Schema-based: maps conceptual models within an organization's RDBMSs
  • Data-level: enables interoperability at the instance level across multiple databases over the Internet

🧬 Structured controlled vocabularies

The excerpt introduces a lighter-weight approach:

Structured controlled vocabularies: lightweight ontologies used for data-level annotation

How it works (illustrated with KEGG and InterPro databases):

  • Multiple databases exist independently (KEGG, InterPro)
  • Each database tuple is annotated with terms from a shared ontology (Gene Ontology in the example)
  • Example: KEGG entry K01834 is linked to Gene Ontology term GO:0004619
  • Web-based interfaces display these annotations, enabling cross-database queries

Why lightweight: Domain experts prioritized quick deployment over formal ontological rigor—the goal was practical interoperability, not complete semantic integration.

Don't confuse: Data-level integration doesn't replace the databases or their schemas; it adds a layer of shared terminology that links equivalent or related entries across systems.
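The linking mechanism can be sketched in a few lines of Python. The record shapes below are assumptions for illustration; only the identifiers and the GO term come from the example.

```python
# Hypothetical records from two independent databases, each annotated
# with Gene Ontology (GO) term identifiers (data-level integration).
kegg = [
    {"id": "K01834", "name": "gpmA", "go": {"GO:0004619"}},
]
interpro = [
    {"id": "IPR005995", "go": {"GO:0004619"}},
]

def linked_entries(db_a, db_b):
    """Pair up records that share at least one GO annotation."""
    return [
        (a["id"], b["id"], a["go"] & b["go"])
        for a in db_a
        for b in db_b
        if a["go"] & b["go"]
    ]

print(linked_entries(kegg, interpro))
# [('K01834', 'IPR005995', {'GO:0004619'})]
```

The databases themselves stay untouched; the shared annotation is what makes the cross-database link computable.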

7 Bottom-Up Ontology Development

🧭 Overview

🧠 One-sentence thesis

Ontologies enable data integration by providing shared vocabularies and semantic links that allow distinct databases to interoperate at both the schema and instance levels.

📌 Key points (3–5)

  • Schema-level integration: ontologies provide generic relations and concepts that map different database schemas to a common conceptual model.
  • Data-level integration: structured controlled vocabularies allow tuple-by-tuple annotation across multiple databases, creating entity-level linking through shared identifiers.
  • Common confusion: schema-level vs. data-level integration—schema-level maps database structures to shared concepts; data-level annotates individual records with ontology terms.
  • Prevention approach: generating new conceptual models from ontologies ensures interoperability upfront by reusing shared vocabulary.
  • Real-world impact: thousands of databases use ontologies for integration, and some journals require ontology terms in publications for discoverability.

🔗 Schema-level data integration

🔗 What schema-level integration means

Schema-level integration: mapping different database schemas to a common ontology that provides generic relations and concepts.

  • The ontology acts as a bridge between different database structures.
  • It provides the shared conceptual vocabulary that both databases can reference.
  • Example: the excerpt mentions a qt relationship between enduring objects (like flowers) and their qualities (like color), and a ql relation to value regions.

🏗️ How it works

  • The ontology defines generic relationships that can be reused across databases.
  • In the flower example, the ontology captures that a specific color instance depends on the particular flower—if the flower doesn't exist, that color instance doesn't exist either.
  • The excerpt notes this approach links to foundational ontologies like DOLCE, where concepts are classified (e.g., as subclasses of Non-Agentive Physical Object).
  • Key point: establishing these links is "the crucial step" in data integration; the rest becomes an engineering exercise.

🧬 Data-level integration with controlled vocabularies

🧬 What data-level integration means

Data-level integration: interoperability at the instance-level, tuple-by-tuple or cell-by-cell, using lightweight ontologies or structured controlled vocabularies.

  • This approach emerged from molecular biology domain experts who needed a quick, practical solution.
  • Instead of integrating database schemas, it annotates individual data records with shared ontology terms.
  • Works across multiple databases over the Internet, not just within a single organization.

🔬 The Gene Ontology example

The excerpt provides a detailed illustration using KEGG and InterPro databases:

| Database | Record ID | Ontology annotation | Physical location |
|---|---|---|---|
| KEGG | K01834 (gpmA) | GO:0004619 | Japan |
| InterPro | IPR005995 | GO:0004619 | USA |

How the linking works:

  • Each database has a tuple with its own identifier scheme and attributes.
  • Both tuples are annotated with the same Gene Ontology term: GO:0004619 (Phosphoglycerate Mutase Activity).
  • By asserting they relate to the same GO term, they create entity-level linking.
  • The GO is a structured controlled vocabulary containing over 40,000 concepts agreed upon by domain experts.

🌐 Network effect

  • Ontology fields are hyperlinked in database web interfaces.
  • Users can browse from database to database through ontology terms.
  • The result appears as "one vast network of knowledge" rather than separate databases.
  • Thousands of databases are connected this way, with the ontology serving as the essential ingredient.

📄 Adoption in scientific publishing

  • Some scientific journals require authors to use ontology terms when writing about discoveries.
  • This makes papers about the same entity easier to find.
  • Don't confuse: the ontology doesn't replace the databases; it provides the shared semantic layer that makes them interoperable.

🛡️ Preventing interoperability problems

🛡️ Proactive approach

Prevention approach: generating conceptual models for new applications based on knowledge represented in the ontology.

  • Instead of fixing integration problems after they occur, this approach prevents them upfront.
  • Similar to Enterprise Models in information system design.
  • Interoperability is guaranteed from the start because elements in new conceptual data models are already shared through the ontology.

🎓 How reuse works

The excerpt illustrates with a university example:

Shared elements across systems:

  • Concepts: Student, Course
  • Relation: enrols (between Student and Course)

System-specific variations:

  • University 1: students must register for 1–6 courses to count as a student
  • University 2: students can be registered for zero courses and still count as a student

Key insight:

  • The ontology provides the shared common vocabulary (Student, Course, enrols).
  • Each system can add its own constraints, data types, and attributes.
  • Example: one system might define att1: String, att2: Integer while another uses att1: String, att2: Real.
  • The semantics of the core concepts remain shared, ensuring interoperability.

🔄 Generation process

  • New conceptual data models (e.g., UML Class Diagrams) are generated from the ontology.
  • The ontology defines the basic structure (classes A, B, and relation R).
  • Individual applications refine constraints (e.g., cardinality: 1..* vs *) or add attributes.
  • All applications agree on the fundamental meaning of the shared elements.
8 Ontology-Based Data Access

🧭 Overview

🧠 One-sentence thesis

Ontology-Based Data Access (OBDA) enables domain experts to query integrated data sources using familiar ontology terms without needing to know SQL or understand how the data is physically stored.

📌 Key points (3–5)

  • What OBDA solves: allows non-technical domain experts to access federated data sources through an ontology layer, eliminating the need for SQL knowledge or system administrators.
  • How it works: mappings link ontology elements (classes, relations, attributes) to underlying data sources, translating user queries automatically.
  • Architecture layers: data sources at the bottom, federation/mapping layer in the middle, ontology at the top for user interaction.
  • Real-world application: demonstrated in digital humanities research (e.g., Roman food distribution systems) where historians query archaeological data using domain concepts.
  • Common confusion: OBDA is not just a query interface—it's a complete integration solution that federates multiple heterogeneous data sources while hiding technical complexity.

🏛️ The digital humanities use case

🏺 The Roman food distribution problem

Historians and anthropologists studying the Mediterranean basin 2000 years ago face a data integration challenge:

  • Food was stored in amphorae (pots) with engravings containing information about who, what, where, etc.
  • This data has been documented and stored across multiple databases and resources.
  • No single resource contains all data points needed for comprehensive research.
  • Researchers need to combine data to understand food trading systems but lack technical SQL skills.

🎯 Why traditional approaches fail

  • Humanities researchers are not familiar with writing SQL queries.
  • They would become dependent on system administrators for data access.
  • Page-long SQL queries would be needed to retrieve information across federated sources.
  • Inflexibility prevents researchers from conducting exploratory analysis independently.

🏗️ OBDA architecture and components

📊 The four-layer structure

The OBDA system (illustrated in the EPnet example) consists of:

| Layer | Description | Purpose |
|---|---|---|
| Data sources | Typically relational databases and RDF triple stores | Physical storage of actual data |
| Federation engine | Operates at physical/relational schema layer | Provides unified interface to heterogeneous sources |
| Mappings | Linking elements from ontology to queries over data sources | Translates between conceptual and physical levels |
| Ontology | Logic-based conceptual data model | User-facing vocabulary and query interface |

🔗 How mappings work

Mappings: assertions that link elements in the ontology to queries over the data source(s).

  • Each ontology element (class, relation, attribute) is connected to corresponding database queries.
  • The OBDA system automatically translates ontology-based queries into appropriate SQL.
  • Users never see or write the underlying database queries.
  • Example: the ontology term "Inscription" maps to specific tables/columns in the federated databases.
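A minimal sketch of the mapping idea in Python, with invented table and column names (the real EPnet mappings are far richer and use a dedicated mapping language):

```python
# Toy OBDA-style mappings: each ontology element is associated with a
# SQL query over the underlying sources. All table and column names
# here are hypothetical, for illustration only.
mappings = {
    "Inscription": "SELECT ins_id FROM inscriptions",
    "foundIn":     "SELECT ins_id, city_name FROM inscriptions "
                   "JOIN sites ON inscriptions.site_id = sites.id",
}

def rewrite(ontology_term):
    """Return the SQL behind an ontology term; the user never sees it."""
    return mappings[ontology_term]

print(rewrite("Inscription"))  # the user only typed the ontology term
```

An OBDA system composes such per-element queries to answer a whole conjunctive query over the ontology vocabulary.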

🔍 Querying with OBDA

💬 Query example in practice

The excerpt provides a concrete example query:

  • User's conceptual query: "Retrieve inscriptions on amphorae found in the city of 'Mainz' containing the text 'PNN'."
  • What the user uses: only ontology terms like Inscription, Amphora, City, found in, inscribed on, plus value constraints (like "PNN").
  • What the system does: takes care of translating this into appropriate database queries and returning results.
  • What the user doesn't need: knowledge of table names, join conditions, SQL syntax, or which database contains what data.

🎓 Benefits for domain experts

  • Researchers query using their own domain vocabulary.
  • No dependency on technical staff for routine queries.
  • Flexibility to explore data according to research questions.
  • Integration happens transparently behind the scenes.

🔧 Technical foundation

🛠️ Maturing technology

  • OBDA technologies have been maturing in recent years.
  • The approach is becoming practical for real-world applications.
  • More details about the EPnet system are available in referenced literature [CLM+16].
  • Technical details are covered in Chapter 8 of the source material.

🌉 Bridging conceptual and physical

OBDA serves as a bridge:

  • Conceptual level: domain experts think in terms of their subject matter (inscriptions, amphorae, cities).
  • Physical level: data is stored in relational databases with technical schemas.
  • The gap: OBDA mappings and federation eliminate the need for users to understand the physical level.
  • Don't confuse: this is not just a simplified query interface—it's a complete data integration solution with semantic mediation.

📚 Context within ontology applications

🎯 OBDA as one of many ontology uses

The excerpt positions OBDA among other ontology applications:

  • e-Learning (adaptive content delivery)
  • Question answering (Watson system)
  • Semantic scientific workflows
  • Digital humanities (the OBDA focus)

🔄 Relationship to data integration

  • OBDA addresses the data access and integration problem mentioned earlier in the source.
  • It builds on the general principle that ontologies provide shared vocabulary for interoperability.
  • Unlike schema-level integration, OBDA maintains separate data sources while providing unified access.

First Order Logic and Automated Reasoning in a Nutshell

🧭 Overview

🧠 One-sentence thesis

First order logic provides the foundational language for ontology representation, and understanding its syntax, semantics, and automated reasoning principles is essential before learning how to build ontologies.

📌 Key points (3–5)

  • Why logic matters for ontologies: logic-based languages underpin formal ontology representation, so understanding logic helps with both comprehension and modeling.
  • What this chapter covers: First Order Predicate Logic (FOL) basics—syntax, semantics (formal meaning), and automated reasoning principles.
  • Two main sections: Section 2.1 refreshes FOL syntax and semantics; Section 2.2 introduces automated reasoning with tableau examples.
  • Target audience note: mathematics students may skim Section 2.1 but should still engage with Section 2.2, as automated reasoning is typically not covered in standard logic courses.
  • Important clarification: logic is not the study of truth, but rather the study of valid inference and formal representation.

📖 Chapter structure and prerequisites

🎯 Learning path guidance

The chapter acknowledges a pedagogical tension:

  • More modeling foundations might be useful before diving into representation.
  • But one also needs to understand the representation language itself.
  • The solution: provide logic foundations first to support better understanding of ontologies and ontology engineering.

📚 Background assumptions

  • Comprehensive FOL introductions exist elsewhere (e.g., [Hed04] reference).
  • This chapter provides a "nutshell" refresher, not a complete course.
  • Readers with mathematics logic background can skim Section 2.1.
  • Automated reasoning (Section 2.2) is essential for all readers regardless of background.

🔗 Connection to ontology engineering

Understanding logic-based ontology languages helps with:

  • Comprehending existing ontologies.
  • Formalizing concepts one wants to represent.
  • Understanding ontology engineering processes better.
  • Working with automated reasoning tools.

Note: The excerpt for Chapter 2 ends at the beginning of Section 2.1, so detailed content about FOL syntax, semantics, and automated reasoning is not included in this source material. The overview captures the chapter's stated purpose and structure based on the introductory paragraphs provided.

9 Ontologies and natural languages

🧭 Overview

🧠 One-sentence thesis

Ontologies are used worldwide in systems that operate in many different languages, making the interaction between natural language and ontologies an important consideration in ontology engineering.

📌 Key points (3–5)

  • Global usage: Ontologies are deployed throughout the world, not just in English-speaking contexts.
  • Language diversity: Not all ontology-driven information systems operate in English, creating a need to understand how natural language interacts with ontologies.
  • Multilingual requirements: The interaction between natural language and ontologies must be addressed to support international and multilingual applications.
  • Placement in learning path: This topic appears in Block III (advanced topics) and requires understanding of both Block I (logic foundations) and Block II (ontology development) as prerequisites.

🌍 The multilingual reality of ontologies

🌍 Why language matters in ontology engineering

The excerpt explicitly states that "Ontologies are used throughout the world, and not all systems are in English."

  • This observation highlights a practical constraint: ontology engineering cannot assume a single-language environment.
  • Systems built in different countries and regions need to work with their local languages.
  • The field must address how ontologies—which formalize knowledge—interact with the diverse ways humans express concepts in different natural languages.

🔗 Natural language and ontology interaction

The interaction of natural language with ontologies is explored in Chapter 9.

  • This is presented as a distinct area of study within ontology engineering.
  • The interaction is bidirectional: natural language can inform ontology development, and ontologies can support natural language processing tasks.
  • Example: An ontology-driven information system deployed in Spain would need to handle Spanish terminology and linguistic structures, not just translate English terms.

📚 Context within the textbook structure

📚 Placement in the curriculum

Chapter 9 appears in Block III, which covers advanced topics that deepen and extend foundational material.

| Block | Content | Relationship to Chapter 9 |
|---|---|---|
| Block I | Logic foundations (FOL, Description Logics, OWL) | Prerequisite |
| Block II | Ontology development (top-down, reuse, methodologies) | Prerequisite |
| Block III | Advanced topics (querying, natural language, extensions) | Contains Chapter 9 |

🧩 Prerequisites and dependencies

  • Both Block I and Block II are prerequisites for Block III material.
  • Chapters within Block III can be studied "in order of preference, or just a subset thereof."
  • This suggests that natural language interaction with ontologies builds on understanding both the formal logic underpinnings and the practical development methods.

🎯 Learning flexibility

The textbook allows readers to:

  • Study Block III chapters in any order based on interest.
  • Focus on specific advanced topics rather than covering all of them.
  • Choose natural language interaction as an area of deeper exploration if it aligns with their goals.

Don't confuse: The flexibility in studying Block III chapters does not mean the prerequisites can be skipped—understanding the foundations from Blocks I and II remains necessary.

🔍 Broader implications

🔍 International deployment considerations

The brief mention of this topic signals several practical concerns:

  • Localization: Ontology-driven systems need to work in the user's native language.
  • Cross-cultural knowledge representation: Concepts may be expressed or categorized differently across languages and cultures.
  • Terminology management: Maintaining consistency when the same ontology supports multiple languages.

Example: An e-learning system (mentioned earlier in the excerpt) deployed internationally would need to present learning objects and navigate ontology concepts in multiple languages while maintaining semantic consistency.

🔍 Connection to other use cases

The excerpt's earlier examples implicitly touch on language issues:

  • Question answering systems (like Watson) need natural language processing to parse questions and find answers.
  • Digital humanities projects may deal with historical texts in ancient or multiple languages.
  • Gene Ontology and biological ontologies need to support international research communities.

Note: While these connections are implicit in the excerpt's structure, Chapter 9 specifically addresses the natural language dimension that underlies many of these applications.

10 Advanced Modelling with Additional Language Features

🧭 Overview

🧠 One-sentence thesis

First-order logic provides the formal foundation for ontology languages by offering unambiguous syntax and model-theoretic semantics that enable automated reasoning over explicitly represented knowledge to derive implicit conclusions.

📌 Key points (3–5)

  • What logic studies: not truth itself, but the relationship between the truth of one statement and another—if one statement is true/false, what does that imply about others?
  • Syntax vs semantics: syntax defines what symbols and constructs are allowed in the language; semantics defines what those symbols and sentences actually mean through structures and interpretations.
  • Theory and models: a theory is a consistent set of sentences; a model is a structure that makes all sentences in a theory true; they form an interplay between formal sentences and the domain of objects.
  • Common confusion: free vs bound variables—a sentence must have no free variables (all variables must be quantified); formulas may contain free variables but sentences cannot.
  • Why reasoning matters: automated reasoning algorithms (like tableaux) can infer implicit knowledge from what has been explicitly represented, scaling beyond manual truth tables.

🔤 Syntax: Building blocks of first-order logic

🔤 The alphabet and lexicon

First-order logic (FOL) uses a controlled vocabulary:

  • Connectives: ¬ (not), → (implies), ↔ (if and only if), ∧ (and), ∨ (or), plus parentheses
  • Quantifiers: ∀ (for all / universal) and ∃ (there exists / existential)
  • Variables: x, y, z, ... ranging over individual objects
  • Constants: a, b, c, ... representing specific elements
  • Functions: f, g, h, ... with arguments, as in f(x₁, ..., xₙ)
  • Relations: R, S, ... with associated arity (number of arguments)

🧱 Terms, formulas, and sentences

Term: Every variable and constant is a term; if f is an m-ary function and t₁, ..., tₘ are terms, then f(t₁, ..., tₘ) is also a term.

Atomic formula: Has the form t₁ = t₂ or R(t₁, ..., tₙ) where R is an n-ary relation and t₁, ..., tₙ are terms.

Formula: Constructed from atomic formulas by repeated application of negation (¬φ), conjunction (φ ∧ ψ), and existential quantification (∃xφ).

Sentence: A formula having no free variables (all variables are bound to quantifiers).

Don't confuse: A formula may have free variables, but a sentence must not—every variable must be quantified.
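The formula/sentence distinction can be made concrete with a small Python sketch of a formula syntax tree and a free-variable computation (the encoding is illustrative, not a full FOL parser):

```python
# A tiny AST for FOL formulas, enough to check the formula/sentence
# distinction: a sentence is a formula with no free variables.
from dataclasses import dataclass

@dataclass
class Var:
    name: str

@dataclass
class Atom:          # R(t1, ..., tn); variable arguments only, for brevity
    rel: str
    args: tuple

@dataclass
class Not:
    sub: object

@dataclass
class And:
    left: object
    right: object

@dataclass
class Exists:
    var: Var
    body: object

def free_vars(phi):
    """Collect the variables of phi not bound by any quantifier."""
    if isinstance(phi, Atom):
        return {a.name for a in phi.args if isinstance(a, Var)}
    if isinstance(phi, Not):
        return free_vars(phi.sub)
    if isinstance(phi, And):
        return free_vars(phi.left) | free_vars(phi.right)
    if isinstance(phi, Exists):
        return free_vars(phi.body) - {phi.var.name}
    return set()

x = Var("x")
formula = Atom("Student", (x,))              # free x: a formula, not a sentence
sentence = Exists(x, Atom("Student", (x,)))  # x is bound: a sentence
print(free_vars(formula), free_vars(sentence))  # {'x'} set()
```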

🗣️ Natural language to FOL translation patterns

| Natural language pattern | FOL pattern | Example |
|---|---|---|
| "Each/All X is Y" | ∀x(X(x) → Y(x)) | "All animals are organisms" |
| "There exists/are X" | ∃x X(x) | "Aliens exist" |
| "There are X that are Y" | ∃x(X(x) ∧ Y(x)) | "There are books that are heavy" |
| "Each X has at least one Y" | ∀x(X(x) → ∃y R(x,y)) | "Each student is registered for a degree" |

Key insight: Universal quantification (∀) typically pairs with implication (→); existential quantification (∃) typically pairs with conjunction (∧).

Example: "Each animal is an organism" becomes ∀x(Animal(x) → Organism(x))—the universal "each" maps to ∀, and "is a" maps to →.

🎯 Semantics: What the symbols mean

🏗️ Structures and interpretations

Vocabulary V: A set of function, relation, and constant symbols.

V-structure: Consists of a non-empty underlying set Δ (the domain of objects) along with an interpretation of V that assigns elements of Δ to constants, functions from Δⁿ to Δ for n-ary functions, and subsets of Δⁿ for n-ary relations.

In plain language: A structure connects the abstract symbols in your vocabulary to actual objects and relationships in some domain.

🔗 Models and theories

M models φ (written M ⊨ φ): The sentence φ is true with respect to structure M.

Theory of M (T(M)): The set of all V-sentences φ such that M ⊨ φ.

Model of Γ: A V-structure that models each sentence in a set of sentences Γ; the class of all models is denoted M(Γ).

Complete V-theory: A set of V-sentences Γ where, for any V-sentence φ, either φ or ¬φ is in Γ (but not both).

Theory: A consistent set of sentences (no contradiction can be derived).

The interplay: You can go from structures to theories (what sentences are true in this structure?) or from theories to models (what structures make these sentences true?).

📊 Example: Students and degree programmes

Consider a conceptual model with:

  • Vocabulary: {attends, Student, DegreeProgramme}
  • Sentences:
    • ∀x,y(attends(x,y) → Student(x) ∧ DegreeProgramme(y)) — "attends relates students to programmes"
    • ∀x(Student(x) → ∃=¹y attends(x,y)) — "each student attends exactly one programme"
  • Underlying set Δ: {John, Mary, Fabio, Claudio, Markus, Inge, ComputerScience, Biology, Design}
  • Interpretation: Maps John, Mary, etc. to Student; ComputerScience, Biology, Design to DegreeProgramme; and the pairs to attends

This structure does not contradict the constraints, so it is a model of the theory.
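A brute-force check of the cardinality constraint against such a structure, in Python. The attends pairs below are assumed for illustration; the excerpt lists only the domain elements.

```python
# Verifying ∀x(Student(x) → ∃=1 y attends(x,y)) over a finite structure.
students = {"John", "Mary", "Fabio", "Claudio", "Markus", "Inge"}
attends = {
    ("John", "ComputerScience"), ("Mary", "Biology"), ("Fabio", "Design"),
    ("Claudio", "ComputerScience"), ("Markus", "Biology"), ("Inge", "Design"),
}

def satisfies_exactly_one(students, attends):
    """Each student must occur in exactly one attends pair."""
    return all(
        sum(1 for (s, _p) in attends if s == student) == 1
        for student in students
    )

print(satisfies_exactly_one(students, attends))  # True
```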

🔄 Logical equivalences and transformations

🔄 Key equivalence patterns

The excerpt lists many equivalences; here are the most important for ontology work:

| Equivalence type | Pattern | Informal reading |
|---|---|---|
| De Morgan | ¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ | "The negation of a conjunction is the disjunction of the negations" |
| Implication | φ → ψ ≡ ¬φ ∨ ψ | "An implication can be rewritten as a disjunction" |
| Quantifier negation | ¬∀x.φ ≡ ∃x.¬φ | "Not all x satisfy φ" means "some x does not satisfy φ" |
| Quantifier negation | ¬∃x.φ ≡ ∀x.¬φ | "No x satisfies φ" means "every x fails to satisfy φ" |
| Quantifier distribution | ∀x.φ ∧ ∀x.ψ ≡ ∀x.(φ ∧ ψ) | "All satisfy φ and all satisfy ψ" means "all satisfy both" |

Why these matter: These equivalences allow you to transform sentences into different forms, which is essential for automated reasoning algorithms.
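These equivalences can be verified mechanically. A short Python sketch checks two of them over all truth assignments, and a quantifier equivalence over a small finite domain:

```python
from itertools import product

# Propositional equivalences: check every truth assignment.
for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))  # De Morgan
    assert ((not p) or q) == (q if p else True)     # φ → ψ ≡ ¬φ ∨ ψ

# Quantifier negation over a finite domain: ¬∀x.φ ≡ ∃x.¬φ
domain = [0, 1, 2]
phi = lambda x: x < 2
assert (not all(phi(x) for x in domain)) == any(not phi(x) for x in domain)
print("all checked equivalences hold")
```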

🖼️ Translating diagrams to FOL

Conceptual diagrams (like UML class diagrams or ORM models) can be systematically translated:

  • Classes/entity types → unary predicates
  • Associations/relationships → binary (or n-ary) relations
  • Multiplicity constraints → quantifications (e.g., a mandatory 1..* adds an ∃; an unconstrained * is covered by the ∀-typing of the association)
  • Subclassing → implication (∀x(Subclass(x) → Superclass(x)))
  • Disjointness → ∀x(A(x) ∧ B(x) → ⊥) where ⊥ is always false
  • Completeness → ∀x(Super(x) → Sub₁(x) ∨ Sub₂(x) ∨ ...)

Example: "Animal has exactly 4 limbs" becomes ∀x(Animal(x) → ∃=⁴y(part(x,y) ∧ Limb(y)))

Don't confuse: The diagram is "syntactic sugar"—it's more accessible to non-logicians, but the FOL sentences are the precise formal representation.

🤖 Automated reasoning foundations

🤖 What reasoning is about

Automated reasoning: Computing systems that automate the ability to make inferences by designing a formal language in which a problem's assumptions and conclusion can be written and providing correct algorithms to solve the problem efficiently.

Key distinction: Logic does not study whether statements are true in reality, but whether the truth of one statement follows from others—if the premises are true, must the conclusion be true?

🔍 Why not truth tables?

Truth tables work for small problems but are computationally too costly for many sentences. Automated reasoning techniques (like tableaux) provide algorithms that scale better.

What reasoning enables:

  • Infer implicit knowledge from explicit representations
  • Check whether a knowledge base is satisfiable (consistent)
  • Determine whether a formula is valid
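A naive truth-table satisfiability check in Python makes the cost explicit: the enumeration grows as 2ⁿ in the number of variables, which is exactly why it does not scale.

```python
from itertools import product

def satisfiable(formula, variables):
    """Brute-force SAT: try all 2**len(variables) truth assignments."""
    return any(
        formula(dict(zip(variables, values)))
        for values in product([True, False], repeat=len(variables))
    )

# (p ∨ q) ∧ ¬p is satisfiable (take p = False, q = True).
f = lambda v: (v["p"] or v["q"]) and not v["p"]
print(satisfiable(f, ["p", "q"]))  # True

# p ∧ ¬p is a contradiction: no assignment works.
print(satisfiable(lambda v: v["p"] and not v["p"], ["p"]))  # False
```

Tableau-based methods avoid this blind enumeration by decomposing the formula and closing contradictory branches early.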

🧩 Consistency and satisfiability

Consistent: A set of sentences Γ is consistent if no contradiction can be derived from Γ.

Why this matters: Before reasoning, you need to know your theory doesn't contain contradictions—otherwise, anything can be derived (principle of explosion).

The excerpt notes that the tableau method is the main proof technique for ontology reasoning, though details are deferred to later sections.


Description Logics and First Order Logic Reasoning

11 Ontology modularisation

🧭 Overview

🧠 One-sentence thesis

Automated reasoning uses formal logic languages and algorithms to mechanically derive implicit knowledge from explicit axioms, enabling applications from hardware verification to ontology classification while balancing expressiveness against computational tractability.

📌 Key points (3–5)

  • What automated reasoning does: takes a formal theory (axioms in a logic language) and applies inference rules to derive conclusions that are implicit in the premises, checking consistency and computing classifications.
  • Three reasoning modes: deduction derives what is already entailed; abduction guesses explanations for observations; induction generalizes from examples (each has different guarantees about correctness).
  • Soundness vs completeness trade-off: a sound algorithm never derives false conclusions from true premises; a complete algorithm finds all entailments—both properties are essential for reliable reasoning.
  • Common confusion—expressiveness vs decidability: richer languages (more features like probabilities, temporal logic) can represent more domain knowledge, but often become undecidable or computationally expensive; Description Logics sacrifice some expressiveness to guarantee termination.
  • Tableaux proof technique: the dominant method for DL/OWL reasoners works by trying to build a model for the negated formula—if all branches clash (contradict), the original formula is valid.

🧩 Core logic concepts

🧩 Theory and structure

Theory: a set of sentences (formulas with no free variables) in a logic language that do not contradict each other and admit a model.

  • A theory consists of a vocabulary (predicates like Student, attends) and sentences (axioms like "every Student attends exactly one DegreeProgramme").
  • A structure (model) assigns real-world objects to the vocabulary: a non-empty set Δ of objects (e.g., {John, Mary, ComputerScience}) and an interpretation mapping instances to predicates.
  • Example: the conceptual data model in Example 2.2 has vocabulary {attends, Student, DegreeProgramme}, two sentences (typing constraint and cardinality constraint), and a structure with six students and three degree programmes that satisfies both constraints.

🔄 Equivalences for formula manipulation

The excerpt lists many logical equivalences used to rewrite formulas:

| Category | Example | Informal reading |
|---|---|---|
| Commutativity | φ ∧ ψ ≡ ψ ∧ φ | Order doesn't matter for AND/OR |
| De Morgan | ¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ | Negation of conjunction becomes disjunction of negations |
| Implication | φ → ψ ≡ ¬φ ∨ ψ | Implication can be rewritten as disjunction |
| Quantifiers | ¬∀x.φ ≡ ∃x.¬φ | "Not all" means "some are not" |
| Quantifiers | ∀x.φ ∧ ∀x.ψ ≡ ∀x.(φ ∧ ψ) | Universal quantifier distributes over conjunction |
  • Why they matter: these equivalences are used to push negations inside (Negation Normal Form) before applying tableau rules.
  • Don't confuse: propositional equivalences (up to quantifiers) hold in both propositional and first-order logic; quantifier equivalences only apply to first-order logic.
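The propositional equivalences can be verified mechanically by enumerating truth assignments; the quantifier equivalences cannot be checked this way, which mirrors the propositional/first-order distinction above. A minimal sketch, with an illustrative helper `equivalent`:

```python
from itertools import product

# Two propositional formulas are equivalent iff they agree on every
# truth assignment; enumerate all assignments and compare.

def equivalent(f, g, n_vars):
    return all(f(*vals) == g(*vals)
               for vals in product([False, True], repeat=n_vars))

# Commutativity: p ∧ q ≡ q ∧ p
assert equivalent(lambda p, q: p and q, lambda p, q: q and p, 2)

# De Morgan: ¬(p ∧ q) ≡ ¬p ∨ ¬q
assert equivalent(lambda p, q: not (p and q),
                  lambda p, q: (not p) or (not q), 2)

# Implication: p → q ≡ ¬p ∨ q   (p → q encoded as "q if p else True")
assert equivalent(lambda p, q: (q if p else True),
                  lambda p, q: (not p) or q, 2)

print("all three equivalences hold on every assignment")
```

This brute-force check is exactly the truth-table method the excerpt later calls too costly at scale, but for two or three variables it is instant.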

🤖 Automated reasoning essentials

🎯 Four essential components

The excerpt identifies four ingredients for automated reasoning:

  1. Class of problems: what the software must solve (e.g., check theory consistency, compute classification hierarchy).
  2. Formal language: how to represent domain knowledge (e.g., cardinality constraints, probabilities, temporal relations).
  3. Computation method: how the program computes the solution (e.g., natural deduction, resolution, tableaux).
  4. Efficiency strategy: how to do it efficiently (constrain language complexity, optimize algorithms, or both).

🔑 Soundness and completeness

Two critical properties for any reasoning calculus:

Completeness: if Γ ⊨ φ then Γ ⊢ φ (if φ is entailed, the algorithm can derive it).

Soundness: if Γ ⊢ φ then Γ ⊨ φ (if the algorithm derives φ, then φ is truly entailed).

  • Incomplete algorithm: misses some valid entailments (some results are missing).
  • Unsound algorithm: can derive false conclusions from true premises (even worse—undermines trust).
  • Example: tableau reasoning for Description Logics is both sound and complete, guaranteeing correct and exhaustive results.
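The difference between the two properties can be felt with a toy prover. The sketch below is a hypothetical procedure, not one from the book: it applies only modus ponens to definite facts, so it is sound (everything it derives is entailed) but incomplete (it cannot reason by cases):

```python
# A deliberately incomplete (but sound) "prover": repeatedly apply
# modus ponens to atomic facts until nothing new is derivable.

def modus_ponens_closure(facts, rules):
    """Close a set of atoms under rules given as (premise, conclusion) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

rules = [("p", "r"), ("q", "r")]

# Sound: with the definite fact p, r really is entailed, and it is derived.
assert "r" in modus_ponens_closure({"p"}, rules)

# Incomplete: {p ∨ q, p → r, q → r} ⊨ r, but given only the disjunction
# (no definite fact) this procedure derives nothing and misses r.
assert "r" not in modus_ponens_closure(set(), rules)
print("sound on what it derives, yet incomplete")
```

An unsound prover would be worse still: it could place atoms in `derived` that are not entailed at all, so no derived result could be trusted.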

🛠️ Real-world purposes and limitations

Applications mentioned:

  • Hardware/software verification (Intel Pentium floating-point bug cost $500 million; chips now routinely verified before production).
  • Scheduling problems (course/lecturer/time optimization, reduced from a summer of manual work to a fraction of the time).
  • Scientific discovery (protein phosphatase classification: reasoner matched human experts, refined classifications, identified novel enzyme types).

Limitations:

  • Computational complexity of the language and reasoning services.
  • Trade-off: Description Logics are decidable fragments of FOL (guaranteed to terminate) but less expressive—modellers criticize the loss of features needed to represent domains adequately.

🧰 Tool landscape

Different tools for different purposes:

  • FOL/higher-order theorem provers (Prover9, MACE4, Vampire, HOL4): general logic proving.
  • SAT solvers (GRASP, Satz): check whether a theory has a model.
  • Constraint programming (ECLiPSe): scheduling, soft constraints.
  • DL reasoners (FaCT++, RacerPro, HermiT, CEL, QuOnto): OWL ontology reasoning (satisfiability, consistency, classification).
  • Inductive logic programming (PROGOL, Aleph): learning logic programs from examples.

🔀 Three reasoning modes

🔀 Deduction: deriving what's already there

Deduction: ascertaining if a theory T entails an axiom α not explicitly asserted (written T ⊨ α), by repeatedly applying deduction rules.

  • How it works: either construct a step-by-step proof forward from premises (natural deduction) or prove indirectly that T ∪ {¬α} leads to a contradiction (resolution, tableaux).
  • Example: given "each Arachnid has exactly 8 legs" and "each Tarantula is an Arachnid," deduce "each Tarantula has 8 legs."
  • Not truly novel: strictly speaking, deduction only reveals what was already implicit—but with large theories, implications are hard to foresee, so deductions may feel novel to domain experts.
  • Don't confuse with abduction/induction: deduction guarantees correctness (the conclusion is logically entailed); the other two modes "guess" knowledge not already in the theory.
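The arachnid deduction can be sketched as a walk up a subclass chain; the dictionaries below are an illustrative encoding, not the book's notation:

```python
# Toy deduction: "each Tarantula has 8 legs" is never asserted, only
# entailed, and is derived by chaining subclass axioms upward.

subclass_of = {"Tarantula": "Arachnid"}   # each Tarantula is an Arachnid
legs = {"Arachnid": 8}                    # each Arachnid has exactly 8 legs

def derived_legs(cls):
    """Walk up the subclass chain until a leg count is asserted."""
    while cls is not None:
        if cls in legs:
            return legs[cls]
        cls = subclass_of.get(cls)
    return None

print(derived_legs("Tarantula"))  # 8, although no axiom states it directly
```

With only two axioms the entailment is obvious; with thousands of axioms and long chains, the same mechanical derivation is exactly what feels "novel" to a domain expert.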

🔍 Abduction: guessing explanations

Abduction: inferring a as an explanation of b—given observations, a domain theory, and candidate explanations, find which explanations make the observations follow from the theory.

  • Requirements: the combination of theory + explanation must be consistent; the observations must follow from this combination.
  • Use case: fault detection—given system knowledge and a defective state, find the likely fault.
  • Techniques: sequent calculus, belief revision, probabilistic abductive reasoning, Bayesian networks.
  • Scientist appeal: could automate hypothesis generation from facts, but abduction is far less widespread in automated reasoning than deduction.
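The two requirements (consistency of theory + explanation, and the observation following from the combination) can be checked by brute force in the propositional case. A hypothetical fault-detection sketch; the atom names are illustrative:

```python
from itertools import product

# Brute-force propositional abduction: accept an explanation when
# theory + explanation is consistent AND the observation holds in
# every model of the combination.

ATOMS = ["blown_fuse", "no_light"]

def models(constraints):
    """All truth assignments (as dicts) satisfying every constraint."""
    out = []
    for vals in product([False, True], repeat=len(ATOMS)):
        a = dict(zip(ATOMS, vals))
        if all(c(a) for c in constraints):
            out.append(a)
    return out

theory = [lambda a: (not a["blown_fuse"]) or a["no_light"]]  # fuse → no light

def observation(a):
    return a["no_light"]

def explains(explanation):
    ms = models(theory + [explanation])
    return bool(ms) and all(observation(a) for a in ms)  # consistent + entails

print(explains(lambda a: a["blown_fuse"]))      # True: explains the dark room
print(explains(lambda a: not a["blown_fuse"]))  # False: observation not entailed
```

Real abductive reasoners replace this enumeration with sequent calculi or probabilistic machinery, but the acceptance test is the same shape.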

📊 Induction: generalizing from instances

Induction: generalizing toward a conclusion based on individuals—the conclusion is not a logical consequence of the premise but has a degree of support.

  • Key difference: premises can be true while the conclusion is false (unlike deduction).
  • Two flavors:
    • Statistical syllogism: "95% of bacteria acquire genes through horizontal transfer; S. aureus is a bacterium → probability 95% that S. aureus acquires genes this way."
    • Analogy: Tibbles is a cat with tail, four legs, furry; Tib has four legs and is furry → induce Tib is a cat (but Tib might be a cheetah).
  • Don't confuse with deduction: by deductive reasoning, Tib would not be classified as a cat (only as a superclass like Feliformia if that superclass declares four legs + furry).
  • Machine learning connection: inductive logic programming takes positive examples + negative examples + background knowledge → derive a hypothesized logic program that entails all positives and no negatives.
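The analogy flavour can be caricatured as feature overlap; the 0.5 threshold below is a purely hypothetical choice, which is exactly why the conclusion carries only a degree of support:

```python
# Analogy as feature overlap: an inductive guess, not a deduction.
# Feature sets follow the Tibbles/Tib example from the text.

tibbles = {"tail", "four_legs", "furry"}   # a known cat
tib = {"four_legs", "furry"}               # the individual to classify

overlap = len(tib & tibbles) / len(tibbles)
print(round(overlap, 2))                   # 0.67: the degree of support

# Induce "Tib is a cat" if support passes the (hypothetical) threshold;
# the premises can be true while this conclusion is false (Tib may be
# a cheetah with the same two features).
is_cat_guess = overlap >= 0.5
print(is_cat_guess)  # True
```

A deductive classifier would refuse this step: four legs plus furriness only licenses membership in whatever superclass declares exactly those features.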

🌲 Tableaux proof technique

🌲 What tableaux does

Tableau: a sound and complete procedure that decides satisfiability by exhaustively checking the existence of a model.

  • Core idea: φ ⊨ ψ if and only if φ ∧ ¬ψ is NOT satisfiable—if satisfiable, we found a counterexample.
  • Method: decompose the formula top-down, trying to build a model; if all branches clash (contradict), no model exists.
  • Example: to prove T entails some axiom, show that T ∪ {negation of the axiom} is unsatisfiable.

🔧 Tableaux procedure (four steps)

Step 1: Negation Normal Form

  • Push all negations inside using equivalences (e.g., ¬(φ ∧ ψ) becomes ¬φ ∨ ¬ψ; ¬∀x.φ becomes ∃x.¬φ).

Step 2: Apply completion rules

Four rules decompose the formula:

  • Rule 2a (conjunction, φ ∧ ψ): add both φ and ψ to the same branch.
  • Rule 2b (disjunction, φ ∨ ψ): split into two branches, one with φ and one with ψ (non-deterministic).
  • Rule 2c (universal, ∀x.φ): substitute x with every term in the tableau; keep ∀x.φ for later instantiations.
  • Rule 2d (existential, ∃x.φ): substitute x with a new Skolem constant a.

Step 3: Continue until termination

  • Apply rules until either:
    • (a) Every branch has a clash (two opposite literals, e.g., p and ¬p), or
    • (b) A completed branch exists where no more rules apply.

Step 4: Determine outcome

  • All branches clash → φ ∧ ¬ψ is NOT satisfiable → original φ ⊨ ψ is valid.
  • A completed branch exists → found a model for φ ∧ ¬ψ → found a counterexample → φ does not entail ψ.
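The four steps can be implemented directly for propositional logic (rules 2a and 2b only, since the quantifier rules 2c/2d need terms and Skolem constants). A minimal sketch, with formulas encoded as nested tuples:

```python
# Propositional tableau: ("atom", "p"), ("not", f), ("and", f, g), ("or", f, g).

def nnf(f):
    """Step 1: push negations inward (Negation Normal Form)."""
    op = f[0]
    if op == "atom":
        return f
    if op in ("and", "or"):
        return (op, nnf(f[1]), nnf(f[2]))
    g = f[1]                                   # op == "not"
    if g[0] == "atom":
        return f
    if g[0] == "not":
        return nnf(g[1])                       # double negation
    flip = "or" if g[0] == "and" else "and"    # De Morgan
    return (flip, nnf(("not", g[1])), nnf(("not", g[2])))

def satisfiable(branch):
    """Steps 2-4: expand; True iff some completed branch has no clash."""
    todo, literals = list(branch), []
    while todo:
        f = todo.pop()
        if f[0] == "atom" or (f[0] == "not" and f[1][0] == "atom"):
            literals.append(f)
        elif f[0] == "and":                    # rule 2a: both parts, one branch
            todo += [f[1], f[2]]
        else:                                  # rule 2b: two branches
            rest = todo + literals
            return satisfiable(rest + [f[1]]) or satisfiable(rest + [f[2]])
    pos = {l[1] for l in literals if l[0] == "atom"}
    neg = {l[1][1] for l in literals if l[0] == "not"}
    return not (pos & neg)                     # open iff no clash p, ¬p

def entails(phi, psi):
    """phi ⊨ psi iff phi ∧ ¬psi is unsatisfiable."""
    return not satisfiable([nnf(phi), nnf(("not", psi))])

p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
# The distribution example: p ∨ (q ∧ r) ⊨ (p ∨ q) ∧ (p ∨ r)
print(entails(("or", p, ("and", q, r)),
              ("and", ("or", p, q), ("or", p, r))))  # True
```

`entails` is exactly the reduction from the core idea above: the entailment holds iff no clash-free completed branch exists for the conjunction with the negated conclusion.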

📝 Worked example: reflexive + asymmetric relation

Example 2.3 proves that a relation R cannot be both reflexive (∀x R(x,x)) and asymmetric (∀x,y (R(x,y) → ¬R(y,x))).

  • Setup: Theory T asserts both reflexivity and asymmetry of R; the tableau shows that every branch of T closes, i.e., T is unsatisfiable, so the two properties cannot hold together.
  • Rewrite asymmetry: ∀x,y (R(x,y) → ¬R(y,x)) becomes ∀x,y (¬R(x,y) ∨ ¬R(y,x)) using the implication equivalence.
  • Negate the claim: the claim itself is a negation ("R is not both reflexive and asymmetric"), so adding its negation to the tableau restores, by double negation, the conjunction of the two properties.
  • Tableau steps (Figure 2.3):
    1. Start with three axioms.
    2. Apply rule 2d (existential) and 2c (universal).
    3. Apply rule 2b (disjunction) to create branches.
    4. All branches clash → unsatisfiable → original is valid.
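Figure 2.3 itself is not reproduced in these notes; the following is a plausible reconstruction of the closing tableau, instantiating the universals with a fresh constant a (domains are non-empty):

```latex
\begin{align*}
1.\quad & \forall x\, R(x,x) && \text{reflexivity}\\
2.\quad & \forall x \forall y\, \bigl(\neg R(x,y) \lor \neg R(y,x)\bigr) && \text{asymmetry, rewritten}\\
3.\quad & R(a,a) && \text{rule 2c on 1, term } a\\
4.\quad & \neg R(a,a) \lor \neg R(a,a) && \text{rule 2c on 2, } x \mapsto a,\ y \mapsto a\\
5.\quad & \neg R(a,a) && \text{rule 2b on 4 (both branches identical)}\\
        & \text{clash between 3 and 5:} && \text{every branch closes, theory unsatisfiable}
\end{align*}
```

The clash is forced on every branch because the disjunction in line 4 has identical disjuncts, so there is no open branch to escape to.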

Don't confuse: if you check a single formula (not a negated entailment), a completed branch means satisfiable (found a model); all clashes mean unsatisfiable (contradiction).

🧮 Propositional example

Example 2.4 (Figures 2.4–2.5) proves ((p ∨ (q ∧ r)) → ((p ∨ q) ∧ (p ∨ r))) is valid using 19 steps with only rules 2a (conjunction) and 2b (disjunction)—no quantifiers because propositional logic has no variables.

  • All branches eventually clash → formula is a tautology.

🔬 Validity, satisfiability, and contradictions

🔬 Three key outcomes

Valid formula (tautology): holds under every assignment; denoted ⊨ φ.

Satisfiable formula: holds under some assignment.

Unsatisfiable formula (contradiction): holds under no assignment.

  • Example: ∃x(p(x) ∧ ¬q(x)) ∧ ∀y(¬p(y) ∨ q(y)) is unsatisfiable (the excerpt states this but does not show the proof).
  • Why truth tables are impractical: computationally too costly for many sentences; tableaux and other techniques scale better.
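The claimed unsatisfiability can at least be sanity-checked by enumerating all interpretations over small finite domains. A finite check is not a proof for arbitrary domains, though here the second conjunct, read as ∀y(p(y) → q(y)), directly contradicts the witness demanded by the first, so no model exists at any size:

```python
from itertools import product

# Enumerate every interpretation of p and q over domains of size n and
# test ∃x(p(x) ∧ ¬q(x)) ∧ ∀y(¬p(y) ∨ q(y)).

def has_model(n):
    domain = range(n)
    for pv in product([False, True], repeat=n):
        for qv in product([False, True], repeat=n):
            p = dict(zip(domain, pv))
            q = dict(zip(domain, qv))
            exists = any(p[x] and not q[x] for x in domain)
            forall = all((not p[y]) or q[y] for y in domain)
            if exists and forall:
                return True
    return False

print([has_model(n) for n in (1, 2, 3)])  # [False, False, False]
```

The cost of this enumeration grows as 4^n interpretations per domain size, which is a small-scale illustration of why truth-table-style methods do not scale.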

📐 Relationship between concepts

  • To check if φ entails ψ: prove φ ∧ ¬ψ is unsatisfiable.
  • If φ ∧ ¬ψ is satisfiable, we found a counterexample (an assignment where φ is true but ψ is false).

🎓 Exercises and practice

The excerpt includes several exercises (omitted here for brevity):

  • Translating natural language to FOL and vice versa.
  • Formalizing graph properties.
  • Proving unsatisfiability using tableaux (e.g., Exercise 2.4b, Exercise 2.5).
  • Practicing with pizza ontology (disjointness, necessary/sufficient conditions, theory merging).

Key takeaway from exercises: automated reasoning is not just theoretical—it requires hands-on practice with formula manipulation, equivalences, and proof construction.