A stand-off annotation schema for modelling ideas, arguments, and their discursive contexts in textual sources. Built for researchers studying the history of ideas, philosophy, science, and technology.
TheSu XML (Thesis-Support XML) lets researchers annotate declarative statements (theses) and link them to their surrounding discourse contexts (supports), such as argumentative justifications, explanatory reformulations, and framing introductions. A distinctive feature is its support for comparative analysis: by linking multiple statements that express the same core idea to a shared abstract proposition, researchers can track how that idea varies in detail, presentation, and use across sources and contexts.
Why Does It Matter?
Historical texts contain complex webs of ideas, arguments, and rhetorical structures. These relationships are hard to track across multiple sources with traditional methods. TheSu XML provides a systematic, computational approach to mapping discourse structures, supporting both detailed analysis of single passages and pattern discovery across a corpus.
Who Is It For?
TheSu XML is tailored for researchers in:
History of ideas
History of philosophy
History of science and technology
Philology and textual analysis
Digital humanities
Whether analysing a single philosophical dialogue or comparing ideas across centuries, TheSu XML provides the structure needed for systematic, reproducible research.
Current Development Context
TheSu XML is currently being developed and refined as part of an FWO-funded research project at KU Leuven (2024-2027), which uses lead and lead white in Greco-Roman sources as a case study.
Note: Most examples on this website are drawn from ancient Greco-Roman sources on lead and lead white, reflecting the current research focus. They include passages from Plutarch, Plato, Dioscorides, Theophrastus, and other ancient authors.
𓍿𓊃𓋭
Research Possibilities
History of Ideas
Track Ideas Across Sources
Determine how claims function within arguments: for instance, whether a statement like "the cosmos is rationally arranged" rests on analogical reasoning or on other types of argument. TheSu XML enables systematic comparison of ideas across multiple works, revealing how argumentative strategies vary and how intellectual traditions develop over time.
History of Science
Compare Procedural Knowledge
Identify which aspects of technical procedures remain stable across centuries and which transform as knowledge is transmitted—for instance, comparing how distillation or metallurgical processes are described across different sources and time periods. TheSu XML enables step-by-step comparison of procedures, revealing patterns of continuity and change that illuminate how practical knowledge was preserved, modified, and adapted over time.
𓋭𓊃𓅱𓀜𓀀
Key Features
Theses
Model Declarative Statements
A thesis is a declarative statement conveyed explicitly or implicitly by a source. TheSu XML enables annotation of such statements, enriched with details including speaker attribution, thematic classification, formal structure (such as embedded analogies or aetiological explanations), and links to abstract propositions for cross-source comparison.
Each thesis can be marked as explicit (directly stated in the text) or implicit (reconstructed from context), and can also carry keywords. Theses can represent factual claims, philosophical statements, procedural descriptions, or causal explanations. They form the foundation of discourse analysis, enabling systematic cataloguing and indexing of ideas across sources.
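As a rough illustration of the information a thesis annotation carries, the sketch below models three of the examples given in this section as plain Python records. It is not the TheSu XML schema: the class name Thesis, the identifiers t1 to t3, and every field name are hypothetical, chosen only to mirror the properties described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Thesis:
    """Hypothetical record for one thesis annotation (not the TheSu XML schema)."""
    thesis_id: str                      # illustrative identifier
    paraphrase: str                     # the declarative statement
    source: str                         # author or work the text span belongs to
    explicit: bool = True               # False when reconstructed from context
    speaker: Optional[str] = None
    macro_themes: List[str] = field(default_factory=list)
    micro_themes: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    formal_structure: List[str] = field(default_factory=list)


theses = [
    Thesis("t1", "Lead white is the most cooling of deadly drugs",
           source="Plutarch", macro_themes=["physical"],
           micro_themes=["chemistry", "chromatology"]),
    Thesis("t2", "If someone dyes anything a certain colour, the colour of the "
                 "thing that has been dyed is not the same as that of what is "
                 "present to it",
           source="Plato, Lysis", speaker="Socrates",
           keywords=["colour", "smear"], formal_structure=["analogy"]),
    Thesis("t3", "What is neither good nor bad becomes bad when bad is present",
           source="Plato, Lysis", explicit=False),
]


def implicit_theses(items: List[Thesis]) -> List[Thesis]:
    """Return the theses that were reconstructed rather than directly stated."""
    return [t for t in items if not t.explicit]


def with_keyword(items: List[Thesis], keyword: str) -> List[Thesis]:
    """Return the theses indexed under a given keyword."""
    return [t for t in items if keyword in t.keywords]


print([t.thesis_id for t in implicit_theses(theses)])
print([t.thesis_id for t in with_keyword(theses, "colour")])
```

Keeping the explicit/implicit flag and the keywords as separate fields is what makes this kind of cataloguing and indexing query straightforward.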
Examples
Chemical/Medical Example:
"Lead white is the most cooling of deadly drugs" — an explicit thesis from Plutarch, annotated with macro-theme "physical", micro-themes "chemistry" and "chromatology", speaker attribution, and linked to its source text span.
Philosophical Example:
"If someone dyes anything a certain colour, the colour of the thing that has been dyed is not the same as that of what is present to it" — an explicit thesis from Plato's Lysis, attributed to Socrates, with keywords for "colour" and "smear", and formal structure noting an embedded analogy.
Implicit Thesis:
"What is neither good nor bad becomes bad when bad is present" — an implicit thesis reconstructed from rhetorical questions in Plato's Lysis, marked as implicit because it is not directly stated but inferred from the text's meaning.
Procedural Thesis:
"To produce lead white, place lead in vinegar, wait for it to acquire thickness, then scrape off the white substance" — a procedural thesis describing a technical process, annotated with formal structure indicating it contains a sequence of phases: Phase 1 (placing lead in vinegar), Phase 2 (waiting for thickness, with duration specified), and Phase 3 (scraping off the white substance), each annotated with details on ingredients, duration, and required repetitions.
Supports
Map Discursive Contexts
Supports are textual elements that employ or target theses, modelling functional relationships between discourse components. Each support can have one or more functions: argumentative (justifications for acceptance or rejection), expository (explanations or clarifications), expansive (elaborations or excursuses), or contextualising (framing or background information). When multiple functions apply, they are ranked to indicate the primary function.
Supports are also classified by their form, such as particular illustrations, affirmations, questions, analogies, or definitions. They can include discourse markers (like "because", "for example", "namely") that signal their function, and are attributed to speakers just like theses. Supports can be explicit (directly stated) or implicit (reconstructed from context), and can themselves employ other theses as premises, enabling reconstruction of complex argumentative structures.
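The ranking of functions can be pictured with a small Python sketch built on the support examples in this section. Everything here is an assumption made for illustration: the class name Support, its field names, and the spelling of the function labels are not taken from the schema. The ordered list simply places the primary function first.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four functions named above; the string spellings are an assumption.
KNOWN_FUNCTIONS = ("argumentative", "expository", "expansive", "contextualising")


@dataclass
class Support:
    """Hypothetical record for one support annotation (not the TheSu XML schema)."""
    text: str
    functions: List[str]                 # ranked: primary function first
    form: Optional[str] = None           # e.g. "particular illustration", "question"
    marker: Optional[str] = None         # discourse marker, if any
    speaker: Optional[str] = None
    explicit: bool = True
    targets: List[str] = field(default_factory=list)  # theses the support targets

    def __post_init__(self) -> None:
        # Reject supports without a function or with an unrecognised label.
        if not self.functions:
            raise ValueError("a support needs at least one function")
        unknown = [f for f in self.functions if f not in KNOWN_FUNCTIONS]
        if unknown:
            raise ValueError(f"unknown function(s): {unknown}")

    @property
    def primary_function(self) -> str:
        """The highest-ranked function of this support."""
        return self.functions[0]


supports = [
    Support("For example, when lead white is applied as a hair dye",
            functions=["expository"], form="particular illustration",
            marker="for example"),
    Support("Because lead can produce the most cooling of deadly drugs",
            functions=["argumentative"], marker="because",
            targets=["Lead is among the naturally cold substances"]),
    Support("What do you mean?", functions=["contextualising"],
            form="question", speaker="Menexenus"),
]

for s in supports:
    print(s.primary_function, "|", s.text)
```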
Examples
Expository Support:
"For example, when lead white is applied as a hair dye" — this support clarifies the abstract claim "If someone dyes anything a certain colour, the colour of the thing that has been dyed is not the same as that of what is present to it" by providing a concrete illustration, helping readers understand the main point through a specific example.
Argumentative Support:
"Because lead can produce the most cooling of deadly drugs" — this support provides a reason that justifies the claim "Lead is among the naturally cold substances", functioning as a premise that supports this conclusion about lead's properties.
Contextualising Support:
"What do you mean?" — Menexenus's question in Plato's dialogue that frames the discourse, helping readers understand the context and purpose of the lead white examples that follow in the conversation.
Propositions
Enable Cross-Source Comparison
Propositions are abstract ideas that multiple theses represent. Unlike theses, which are linked to specific text spans in sources, propositions exist independently as abstract concepts. They enable comparative analysis by linking multiple statements that express essentially the same core idea, even when those statements appear in different sources, contexts, or time periods.
Each proposition includes a paraphrase describing the abstract idea, and can be classified thematically with macro-themes and micro-themes, just like theses. Propositions can also contain sequences for procedural ideas (such as abstract recipe models). By linking theses to propositions, researchers can track how ideas vary in their details, presentation, use, or support throughout a text or corpus, revealing patterns of intellectual transmission and transformation.
Examples
Chemical/Medical Example:
The proposition "Lead white is a cooling substance" links multiple theses from different sources: Plutarch's statement that "Lead white is the most cooling of deadly drugs" and Dioscorides's descriptions of lead white's cooling properties. This enables comparison of how the same core idea appears and is supported across different texts.
Property Example:
The proposition "Lead white is white in colour" links theses from Plato's discussion of lead white's whiteness and Dioscorides's remark that properly produced lead white "comes about white and effective", showing how the same property claim appears in different contexts—philosophical discourse versus technical description.
Procedural Proposition:
The proposition "To produce lead white: place lead over vinegar, wait for thickness, scrape off the white substance" serves as an abstract recipe model. Multiple recipe theses from Theophrastus, Dioscorides, and other sources can be linked to this proposition, enabling step-by-step comparison of how the same process is described differently across sources.
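The linking described above amounts to grouping source-bound theses under a shared abstract idea. A minimal Python sketch, using only the two non-procedural examples above, might look as follows. The identifiers p1 and p2, the variable names, and the tuple layout are hypothetical and are not part of the schema.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Abstract propositions, keyed by an illustrative identifier.
propositions: Dict[str, str] = {
    "p1": "Lead white is a cooling substance",
    "p2": "Lead white is white in colour",
}

# Each thesis is reduced here to (source, proposition id it is linked to).
thesis_links: List[Tuple[str, str]] = [
    ("Plutarch", "p1"),
    ("Dioscorides", "p1"),
    ("Plato", "p2"),
    ("Dioscorides", "p2"),
]


def sources_by_proposition(links: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group the sources of linked theses under each proposition id."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source, prop_id in links:
        if source not in grouped[prop_id]:   # list each source once
            grouped[prop_id].append(source)
    return dict(grouped)


for prop_id, sources in sources_by_proposition(thesis_links).items():
    print(f"{propositions[prop_id]}: {', '.join(sources)}")
```

Because the proposition is stored once and theses only point to it, adding a further source means adding one more link rather than restating the idea.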
Sequences
Analyse Procedural Knowledge
Sequences are detailed breakdowns of processes that are nested within theses—think of them as the step-by-step instructions or chronological events that a thesis describes. When a thesis contains a recipe, procedure, or chain of events, sequences allow you to break it down into individual phases, each annotated with specific details such as ingredients, objects, duration, repetitions, and conditions.
Variant recipes or procedures can be mapped to their corresponding primary processes, enabling step-by-step comparison of how different sources describe the same process: which steps are shared, which are altered, and which are extended or omitted.
Examples
Recipe Sequence:
The thesis "To produce lead white, place lead in vinegar, wait for it to acquire thickness, then scrape off the white substance" contains a sequence with multiple phases: (1) placing lead over vinegar in closed jars, (2) waiting for the lead to acquire thickness, (3) opening the jars, (4) scraping away the substance that has formed on the lead's surface (repeating until the lead is consumed), (5) grinding what has been scraped away, (6) filtering off what is being ground, and (7) obtaining lead white as the final precipitate. Each phase is annotated with ingredients (lead, vinegar), instruments (jars), objects (the substance formed), and repetition details (scraping continues until the lead is consumed).
Variant Recipe Mapping:
Different sources describe lead white production with variations: Theophrastus's recipe uses closed jars and scraping, while Dioscorides's recipe uses a lattice of reeds and includes additional steps like decanting and sun-drying. Sequences allow these variant recipes to be mapped to a primary process, showing which phases are shared (placing lead with vinegar, waiting for transformation), which are altered (jar method vs. lattice method), and which are extended (Dioscorides adds decanting and multiple sifting phases).
Phase Details:
Within a sequence, each phase can include detailed annotations: objects are marked as ingredients or instruments (vinegar as ingredient, jars as instruments), transformations are tracked (lead becomes thickened substance), and temporal information is recorded (waiting phases, repetition conditions like "until consumed completely"). This granular annotation enables precise comparison of how different sources describe the same procedural steps.
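The phase annotations and the variant mapping described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the schema itself: the class names Phase and PhaseLink, their fields, and the relation labels "shared", "altered", and "extended" are hypothetical. The mapping for Dioscorides is deliberately partial and contains only the relations named in the examples above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Phase:
    """One step of a sequence; field names are hypothetical."""
    label: str
    ingredients: List[str] = field(default_factory=list)
    instruments: List[str] = field(default_factory=list)
    repeat_until: Optional[str] = None   # repetition condition, if any


# Phases of the jar-based recipe, as listed in the Recipe Sequence example.
jar_recipe = [
    Phase("place lead over vinegar in closed jars",
          ingredients=["lead", "vinegar"], instruments=["jars"]),
    Phase("wait for the lead to acquire thickness"),
    Phase("open the jars", instruments=["jars"]),
    Phase("scrape away the substance formed on the lead",
          repeat_until="the lead is consumed"),
    Phase("grind what has been scraped away"),
    Phase("filter off what is being ground"),
    Phase("obtain lead white as the final precipitate"),
]


@dataclass(frozen=True)
class PhaseLink:
    """Maps one phase of a variant recipe onto the primary process."""
    variant_phase: str
    primary_phase: Optional[str]   # None when the variant adds a step
    relation: str                  # assumed labels: shared / altered / extended


# Partial mapping for the lattice-based variant (only what the text names).
variant_links = [
    PhaseLink("placing lead with vinegar", "placing lead with vinegar", "shared"),
    PhaseLink("waiting for transformation", "waiting for transformation", "shared"),
    PhaseLink("lattice method", "jar method", "altered"),
    PhaseLink("decanting", None, "extended"),
    PhaseLink("sifting", None, "extended"),
    PhaseLink("sun-drying", None, "extended"),
]


def group_by_relation(links: List[PhaseLink]) -> Dict[str, List[str]]:
    """Collect variant phases under the relation they bear to the primary process."""
    grouped: Dict[str, List[str]] = {}
    for link in links:
        grouped.setdefault(link.relation, []).append(link.variant_phase)
    return grouped


print([p.label for p in jar_recipe if p.repeat_until])
print(group_by_relation(variant_links))
```

Omitted phases are not computed here, because the examples above do not list the variant recipe in full.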
Get Started
Ready to explore TheSu XML? Learn more about how it works, access the schema documentation, or view visualisations.