# Semantic Mapping Format for CSV-to-JSON Transformation This document defines a **declarative mapping format** that enables semantically rich transformation of flat CSV data into structured JSON representations aligned with the goals of the **OMG Semantic Augmentation Challenge**. Rather than relying on procedural logic, this format allows you to express **conceptual model alignment**, **identifier binding**, and **value coercion** in a compact, portable way that supports reuse and automation. --- ## Why this Format? This mapping syntax is designed to: - Link flat tabular data to **nested semantic structures** - Support **identifier-based merging and grouping** - Define **schema-aware coercion** (string, number, boolean, date) - Enable downstream use in JSON-LD, JSON Schema, and linked data - Be interpretable by both humans and machines - Allow alignment with **ontology-based models** (e.g. FIBO) --- ## Mapping Format Overview The mapping is a JSON object where: - **Keys** are **semantic paths** describing the structure of the output. - **Values** describe how to extract or compute values from CSV rows. Each mapping entry looks like this: ```json "path.to.target": { "column": "CSV_COLUMN_NAME", "type": "string" // or number, boolean, date } ``` Or for constant values: ```json "path.to.constant": { "constant": "Institution" } ``` --- ## Path Syntax The path syntax supports: | Pattern | Purpose | |-------------------------------|---------------------------------------------------------------| | `object.property` | Set a nested property | | `array[false]` | Always append to the array | | `array[$.key == @COLUMN]` | Find and reuse existing array items, or create if no match | | `map[{@COLUMN}]` | Use a row value as a key in an object map. Create or match | --- ## Value Resolution Each mapping value supports: | Field | Meaning | |-----------|----------------------------------------------------------| | `column` | The CSV column name | | `type` | `"string"`, `"number"`, `"boolean"`, `"date"` — coercion | | `constant`| A literal value | ## Data Type Coercion The mapping format supports coercing CSV strings into strongly typed JSON values using a `"type"` field. This ensures compatibility with schema validation, semantic tooling, and downstream processing. Each type is interpreted as follows: ### `"string"` Default behavior — no coercion is applied: { "column": "NAME", "type": "string" } --- ### `"number"` - Trims leading zeros (e.g. `"007"` → `7`) - Parses as a floating-point number - Returns `undefined` for empty strings or `"NULL"` { "column": "AGE", "type": "number" } --- ### `"boolean"` - Accepts the following truthy values: `"TRUE"`, `"1"` - Accepts the following falsy values: `"FALSE"`, `"0"` - Comparison is case-insensitive and trims whitespace - Invalid or empty values return `undefined` { "column": "ENABLED", "type": "boolean" } --- ### `"date"` Supports two formats: 1. **Excel-style serial numbers** (e.g. `45401` → `"2024-05-04"`) 2. **Plain date strings** (e.g. `"01/01/1890"`) Output is always ISO-8601: `"YYYY-MM-DD"` { "column": "ESTYMD", "type": "date" } --- ### `"constant"` You can skip `column` entirely and use a constant: { "constant": "Branch" } This is useful for setting fixed `"type"` values or flags within the output. --- ### Undefined and Null Handling If a value is empty (`''`) or the string `"NULL"`, it is treated as `undefined` and excluded from the output entirely. --- ## Strategic benefits This format has a number of strategic benefits: - Captures semantic **structure and identity** - Aligns flat data to **knowledge graph–ready** shapes - Integrates cleanly with **JSON-LD contexts** and vocabularies - Enables validation via **JSON Schema** - Facilitates automation in **ETL, modeling, and governance** pipelines By using declarative mappings and semantic paths, this approach turns traditional tabular data into structured, interoperable knowledge assets. --- ## Example This input CSV ```csv ID,NAME,AGE,ACTIVE,START 001,Alice,30,TRUE,44561 002,Bob,25,false,01/01/2020 003,Carol,NULL,,NULL ``` When converted with this mapping: ```json { "type[]": { "constant": "Document" }, "people[{@ID}].id": { "column": "ID", "type": "string" }, "people[{@ID}].name": { "column": "NAME", "type": "string" }, "people[{@ID}].age": { "column": "AGE", "type": "number" }, "people[{@ID}].active": { "column": "ACTIVE", "type": "boolean" }, "people[{@ID}].startDate": { "column": "START", "type": "date" } } ``` Produces this JSON: ```json { "type": [ "Document" ], "people": { "001": { "id": "001", "name": "Alice", "age": 30, "active": true, "startDate": "2021-12-31" }, "002": { "id": "002", "name": "Bob", "age": 25, "active": false, "startDate": "2019-12-31" }, "003": { "id": "003", "name": "Carol" } } } ``` This mapping: - Uses the syntax people[{@ID}] to produce a map where each key is the row’s ID, and the value is the corresponding person object. - Assigns structured fields such as `id`, `name`, `age`, `active`, and `startDate` using the corresponding CSV columns - Applies type coercion: - `"age"` is parsed as a number - `"active"` is interpreted as a boolean (e.g. `"TRUE"` → `true`, `"false"` → `false`) - `"startDate"` is parsed as a date, supporting both Excel serial numbers and human-readable formats - Skips any fields with `"NULL"` or empty values to avoid populating invalid or meaningless data - Annotates the document with a top-level `type` of `"Document"` via a constant assignment Observations: - "type" is a constant applied at the root. - "people" is keyed by the value of ID. - "age", "active", and "startDate" are omitted for Carol because they are "NULL" or blank and thus coerced to undefined. - Excel serial 44561 (row 1) becomes "2021-12-31". - "01/01/2020" string is parsed as "2020-01-01" in ISO format. - "false" (case-insensitive) becomes boolean false.