# CSV annotation prototype with SysML v2 and SysIDE

This proposal illustrates the use of prototype software developed at Sensmetry, based on our (more mature) SysML v2 offering.

The core concept of the proposal is to “invert” the problem of describing CSV columns. Rather than annotating or mapping columns to information, the user describes a model (in the SysML v2 sense of the word) and then annotates certain model elements as consisting of instances found in one or more CSV files.

Schematically, rather than annotating the four columns of

```
international title, director, country, copyright year
Stalker, Tarkovsky, USSR, 1979
Solaris, Soderbergh, USA, 2002
..., ..., ..., ...
```

to indicate their meaning, the user describes the contents (in simplified notation):

```
films : Film {
    directed_by : Director { name }
    country_of_origin : Country { name }
    year : Year
}
```

and then adds information to indicate how to find instances of each in the dataset.

```
films : Film {
    @columns("international title", "copyright year")
    name @columns("international title")
    directed_by : Director {
        @columns("director")
        name @columns("director")
    }
    country_of_origin : Country {
        @columns("country")
        name @columns("country")
    }
    year : Year @columns("copyright year")
}
```

The above says that individual films correspond to unique combinations of values in the "international title" and "copyright year" columns, where the former contributes the "name" and the latter contributes the "year". Directors correspond to unique values of the "director" column, and co-occurrence on the same row links a film to the corresponding director through the "directed_by" feature.
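To make the "unique values" and "co-occurrence" reading concrete, here is a minimal Python sketch of how such annotations could be interpreted. It is not the Sensmetry tooling: the function name `extract_instances` and the in-memory data layout are assumptions made for illustration, and the third data row (Tarkovsky's 1972 Solaris) is added here only to show why the film key uses two columns.

```python
import csv
import io

# Sample data from the text, plus one added row (Solaris, 1972) for illustration.
SAMPLE = """international title, director, country, copyright year
Stalker, Tarkovsky, USSR, 1979
Solaris, Soderbergh, USA, 2002
Solaris, Tarkovsky, USSR, 1972
"""


def extract_instances(text):
    """Hypothetical reading of the @columns annotations shown above."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    films = {}           # keyed by (international title, copyright year)
    directors = {}       # keyed by director
    directed_by = set()  # (film key, director key) pairs from row co-occurrence
    for row in reader:
        film_key = (row["international title"], row["copyright year"])
        films[film_key] = {"name": film_key[0], "year": film_key[1]}
        director_key = row["director"]
        directors[director_key] = {"name": director_key}
        directed_by.add((film_key, director_key))
    return films, directors, directed_by


films, directors, directed_by = extract_instances(SAMPLE)
print(len(films), "films,", len(directors), "directors")
```

With a single-column key on "international title", the two Solaris rows would collapse into one film; the two-column key keeps them apart while still recognising Tarkovsky as a single director.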
Linking to semantic web or ontology-type resources is then achieved by annotating model elements. An annotation can indicate that instance membership in the model implies class membership in a resource identified by a certain IRI, that a feature relating two instances implies a predicate (identified by a certain IRI), or that an instance is identical to some resource whose IRI can be derived from features of the instance.

This proposal implements the above idea, using SysML v2 (textual syntax) as the modelling language, SysIDE Editor (free and Open Source) or SysIDE Modeller for authoring, and custom tooling built on top of SysIDE Automator for processing the user-authored model into useful derived artefacts.

The tooling currently produces two primary derived artefacts from the user model: documentation describing the columns, and a materialised RDF (Turtle) file with all the assertions implied by the user model and the provided CSV file.
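As a rough illustration of what materialising such assertions might look like, the following Python sketch turns a handful of film and director instances into Turtle triples. Everything specific here is an assumption: the `http://example.org/` namespace, the `Film` class and `directedBy` predicate IRIs, and the rule for deriving instance IRIs from key features are hypothetical placeholders, not the output format of the actual tooling.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, for illustration only

# Instances and links as they might be extracted from the sample CSV.
films = {("Stalker", "1979"), ("Solaris", "2002")}
directed_by = {
    (("Stalker", "1979"), "Tarkovsky"),
    (("Solaris", "2002"), "Soderbergh"),
}


def iri(kind, *parts):
    """Derive an instance IRI from its key features (assumed minting rule)."""
    slug = "_".join(quote(part, safe="") for part in parts)
    return f"<{BASE}{kind}/{slug}>"


def to_turtle(films, directed_by):
    lines = []
    # Instance membership in the model implies class membership.
    for title, year in sorted(films):
        lines.append(f"{iri('film', title, year)} a <{BASE}Film> .")
    # A feature relating two instances implies a predicate.
    for (title, year), director in sorted(directed_by):
        subject = iri("film", title, year)
        obj = iri("director", director)
        lines.append(f"{subject} <{BASE}directedBy> {obj} .")
    return "\n".join(lines)


turtle = to_turtle(films, directed_by)
print(turtle)
```

In practice the class and predicate IRIs would come from the annotations in the user model rather than being hard-coded, and an RDF library could handle serialisation and escaping.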