EEGUnity Kernel Tutorial: Rich Metadata, `misc`, `stim`, and Annotations#

This tutorial explains how to use EEGUnity kernels to inject dataset-specific metadata and channels in memory.

1. Design Principle#

EEGUnity treats locator metadata as the source of truth:

format_channel_names() standardizes locator channels as channel_type:channel_name.
get_data_row() uses locator metadata to overwrite raw metadata at load time.
Kernels are applied after locator-driven metadata patching.

This lets you maintain metadata on the fly, at load time, without modifying the source files.

2. What a Kernel Can Do#

A kernel can:

add or update raw.info["description"]
add or adjust multiple misc channels
add or adjust multiple stim channels
add or update annotations
build a raw from scratch for files that EEGUnity’s parser cannot read (see Section 7)

Standard kernel interface#

class SomeKernel:
    KERNEL_ID: str = "my-kernel-v1"

    def apply(self, udataset, raw, row):
        # raw is a loaded mne.io.BaseRaw; row is the locator pandas.Series
        ...
        return raw

KERNEL = SomeKernel()

apply() is called for every file whose Completeness Check is not Unavailable.
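
As an illustration of the first capability above, the sketch below merges new fields into an existing description string instead of overwriting it. It assumes the description holds a JSON object with an `eegunity_description` section, as in the Section 7 example. `merge_description` and the `free_text` fallback key are hypothetical names chosen for this sketch, not part of EEGUnity.

```python
from __future__ import annotations

import json


def merge_description(existing: str | None, updates: dict) -> str:
    """Return a JSON description string with `updates` merged in.

    Hypothetical helper: assumes the description stores a JSON object
    with an "eegunity_description" section.
    """
    payload: dict = {}
    if existing:
        try:
            parsed = json.loads(existing)
        except ValueError:
            parsed = None  # existing text is not JSON
        if isinstance(parsed, dict):
            payload = parsed
        else:
            # Keep the old text under an illustrative key rather than drop it.
            payload = {"free_text": existing}

    section = payload.get("eegunity_description")
    if not isinstance(section, dict):
        section = {}
    section.update(updates)
    payload["eegunity_description"] = section
    return json.dumps(payload)
```

Inside `apply()`, this could be used as `raw.info["description"] = merge_description(raw.info["description"], {"amplifier": "unknown"})`.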

3. Annotation vs `misc` vs `stim`#

Use these three mechanisms for different semantics:

Annotations: text labels mapped to time segments (onset, duration, description).
misc channels: continuous values over time (for example probability density, reaction-time trajectory).
stim channels: integer event codes over time (for example class sequence 1/2/3).

To attach a single scalar value to one segment, write that value into a misc channel over the samples the segment covers.
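
A minimal NumPy sketch of the segment-filling idea follows. `fill_segment` is a hypothetical helper, and the choice of NaN for samples outside the segment is an assumption; 0 or another sentinel may suit a given dataset better.

```python
import numpy as np


def fill_segment(n_times, sfreq, onset_s, duration_s, value, background=np.nan):
    """Return a 1D array holding `value` over one segment, `background` elsewhere.

    Hypothetical helper; the NaN background is an assumption of this sketch.
    """
    data = np.full(n_times, background, dtype=float)
    # Convert the segment boundaries from seconds to sample indices.
    start = int(round(onset_s * sfreq))
    stop = int(round((onset_s + duration_s) * sfreq))
    # Clip to the recording so out-of-range segments cannot wrap around.
    start = max(start, 0)
    stop = min(stop, n_times)
    data[start:stop] = value
    return data


# A reaction time of 0.42 s attached to a 2 s trial starting at 1 s (100 Hz).
rt = fill_segment(n_times=500, sfreq=100.0, onset_s=1.0, duration_s=2.0, value=0.42)
```

The resulting array has length `n_times`, so it can be appended as a misc channel with a helper such as `add_channel` from Section 4.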

4. Example Kernel with Multiple `misc` and `stim` Channels#

from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import mne


def add_channel(raw: mne.io.BaseRaw, ch_name: str, ch_type: str, values: np.ndarray) -> mne.io.BaseRaw:
    """Append one channel to raw with explicit MNE channel type."""
    if values.ndim != 1:
        raise ValueError("values must be a 1D array")
    if values.shape[0] != raw.n_times:
        raise ValueError("values length must equal raw.n_times")

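    # add_channels() requires preloaded data; load_data() does nothing if already loaded.
    raw.load_data()
    info = mne.create_info([ch_name], sfreq=raw.info["sfreq"], ch_types=[ch_type])
    ch_raw = mne.io.RawArray(values[np.newaxis, :], info, verbose=False)
    raw.add_channels([ch_raw], force_update_info=True)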
    return raw


@dataclass
class ExampleKernel:
    KERNEL_ID: str = "example_rich_meta"

    def apply(self, udataset, raw: mne.io.BaseRaw, row):
        n = raw.n_times

        # misc channels (continuous signals)
        prob_density = np.linspace(0.1, 0.9, n, dtype=float)
        reaction_time = np.full(n, 0.42, dtype=float)
        raw = add_channel(raw, "prob_density", "misc", prob_density)
        raw = add_channel(raw, "reaction_time", "misc", reaction_time)

        # stim channels (integer codes, stored as float like all MNE raw data)
        task_code = np.zeros(n, dtype=float)
        task_code[n // 4: n // 2] = 1
        task_code[n // 2: 3 * n // 4] = 2
        task_code[3 * n // 4:] = 3

        stage_code = np.zeros(n, dtype=float)
        stage_code[n // 3: 2 * n // 3] = 7

        raw = add_channel(raw, "task_code", "stim", task_code)
        raw = add_channel(raw, "stage_code", "stim", stage_code)

        # annotation segments (text semantics)
        # Note: set_annotations() replaces any annotations already on raw.
        ann = mne.Annotations(
            onset=[0.0, raw.times[n // 2]],
            duration=[2.0, 2.0],
            description=["trial_start", "feedback"],
        )
        raw.set_annotations(ann)

        return raw


KERNEL = ExampleKernel()
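
To check that a stim channel built this way carries the intended sequence, the code onsets can be recovered from the array. The following is a NumPy-only sketch. `code_onsets` is a hypothetical helper, and on a real Raw object `mne.find_events` serves a similar purpose.

```python
import numpy as np


def code_onsets(codes):
    """Return (sample_index, code) pairs where a nonzero code begins."""
    codes = np.asarray(codes).astype(int)
    if codes.size == 0:
        return []
    # Compare each sample with the one before it (the first is compared with 0).
    previous = np.concatenate(([0], codes[:-1]))
    starts = np.flatnonzero((codes != previous) & (codes != 0))
    return [(int(i), int(codes[i])) for i in starts]


# Same layout as task_code in the example kernel, with n = 1000 samples.
n = 1000
task_code = np.zeros(n, dtype=float)
task_code[n // 4: n // 2] = 1
task_code[n // 2: 3 * n // 4] = 2
task_code[3 * n // 4:] = 3

events = code_onsets(task_code)
```

With these inputs, `events` holds one entry per code change, at samples 250, 500, and 750.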

5. Binding and Running#

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    kernel_spec=r"path/to/example_kernel.py",
)

# Parser path
raw0 = ud.eeg_parser.get_data(0)

# Batch path (kernel is also applied when loading row data in batch methods)
ud.eeg_batch.get_file_hashes(data_stream=True)

6. Channel Type Compatibility#

EEGUnity's standard prefixes are lowercase and MNE-style (eeg, eog, emg, ecg, meg, stim, misc, bio). Locator entries may also use explicit MNE channel type strings, for example:

seeg:LA1
ecog:G1
dbs:DBS1
fnirs_od:S1_D1_760
pupil:pupil_left
misc:prob_density
stim:task_code

Legacy uppercase prefixes (EEG, EOG, EMG, ECG, STIM, Unknown) are accepted for backward compatibility.
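
The channel_type:channel_name convention can be sketched as a small parser. The helper names below are hypothetical and the snippet does not reproduce EEGUnity's own validation. It only splits an entry and reports whether the prefix is one of the standard or legacy prefixes listed above.

```python
from __future__ import annotations

# Prefix lists copied from the text above.
STANDARD_PREFIXES = frozenset({"eeg", "eog", "emg", "ecg", "meg", "stim", "misc", "bio"})
LEGACY_PREFIXES = frozenset({"EEG", "EOG", "EMG", "ECG", "STIM", "Unknown"})


def split_locator_channel(entry: str) -> tuple[str, str]:
    """Split 'channel_type:channel_name' at the first colon (hypothetical helper)."""
    prefix, sep, name = entry.partition(":")
    if not sep or not prefix or not name:
        raise ValueError(f"expected 'channel_type:channel_name', got {entry!r}")
    return prefix, name


def prefix_kind(prefix: str) -> str:
    """Classify a prefix as 'standard', 'legacy', or 'other' (hypothetical helper)."""
    if prefix in STANDARD_PREFIXES:
        return "standard"
    if prefix in LEGACY_PREFIXES:
        return "legacy"
    # For example an explicit MNE type such as seeg or ecog; not validated here.
    return "other"


ch_type, ch_name = split_locator_channel("fnirs_od:S1_D1_760")
```

Splitting at the first colon keeps any later colons inside the channel name.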

7. Extended Interface: Handling Unavailable Files#

EEGUnity marks files as Completeness Check = Unavailable when its built-in parser cannot determine the sampling rate (e.g., headerless CSV files, proprietary binary formats). By default, kernels are not called for Unavailable files.

For datasets where EEGUnity cannot parse the file format at all, a kernel can opt in to build the raw from scratch by implementing the extended interface:

| Attribute / Method | Required | Description |
| --- | --- | --- |
| `HANDLES_UNAVAILABLE = True` | yes | Opt-in flag. Must be set to `True`. |
| `load(self, row) -> BaseRaw \| None` | yes | Called first for Unavailable files. Build and return an `mne.io.RawArray` from the source file. Return `None` to skip this file. |
| `apply(self, udataset, raw, row)` | yes (same as always) | Called after `load()` completes, with the raw returned by `load()`. Use this for annotation injection and metadata enrichment, exactly as for Completed files. |

Call sequence for Unavailable files#

kernel.load(row)          →  raw   (format parsing, build RawArray)
kernel.apply(ud, raw, row) →  raw   (enrichment: annotations, description, …)

For Completed files the call sequence is unchanged:

EEGUnity parser           →  raw   (standard MNE loader)
kernel.apply(ud, raw, row) →  raw   (enrichment)
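
The two call sequences can be summarised in a small dispatcher. This is an illustrative sketch of the documented order, not EEGUnity's actual implementation. `load_with_kernel`, `parser_load`, and the toy classes are hypothetical, and the row is assumed to store the literal string "Unavailable" in its Completeness Check field.

```python
from __future__ import annotations

from typing import Any, Callable, Mapping


def load_with_kernel(
    kernel: Any,
    udataset: Any,
    row: Mapping[str, Any],
    parser_load: Callable[[Mapping[str, Any]], Any],
) -> Any:
    """Follow the documented call order; return a raw object, or None if skipped."""
    if row["Completeness Check"] == "Unavailable":
        # Only kernels that opt in are called for Unavailable files.
        if kernel is None or not getattr(kernel, "HANDLES_UNAVAILABLE", False):
            return None
        raw = kernel.load(row)
        if raw is None:  # the kernel chose to skip this file
            return None
    else:
        raw = parser_load(row)  # stand-in for the built-in parser
    if kernel is not None:
        raw = kernel.apply(udataset, raw, row)
    return raw


class ToyKernel:
    HANDLES_UNAVAILABLE = True

    def load(self, row):
        return ["loaded-by-kernel"]

    def apply(self, udataset, raw, row):
        return raw + ["enriched"]


def toy_parser(row):
    return ["loaded-by-parser"]


unavailable = load_with_kernel(
    ToyKernel(), None, {"Completeness Check": "Unavailable"}, toy_parser
)
completed = load_with_kernel(
    ToyKernel(), None, {"Completeness Check": "Completed"}, toy_parser
)
```

Lists stand in for raw objects here so that the order of the steps is visible in the result.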

Example: headerless CSV dataset#

from __future__ import annotations
import json
from dataclasses import dataclass

import mne
import numpy as np
import pandas as pd


_SFREQ = 2048.0
_CH_NAMES = ["EEG1", "EEG2"]


@dataclass
class HeaderlessCSVKernel:
    KERNEL_ID: str = "headerless-csv-v1"
    HANDLES_UNAVAILABLE: bool = True   # opt in

    def load(self, row) -> mne.io.BaseRaw | None:
        """Build a RawArray from a headerless CSV file."""
        file_path = row["File Path"]
        if not file_path.endswith(".csv"):
            return None  # skip non-CSV files silently

        # Read EEG columns (0-indexed: columns 1 and 2)
        df = pd.read_csv(file_path, header=None, usecols=[1, 2])
        # MNE expects EEG data in volts; rescale here if the file stores other units.
        eeg = df.to_numpy(dtype=float).T          # (n_ch, n_samples)
        info = mne.create_info(_CH_NAMES, sfreq=_SFREQ, ch_types=["eeg", "eeg"])
        return mne.io.RawArray(eeg, info, verbose=False)

    def apply(self, udataset, raw: mne.io.BaseRaw, row) -> mne.io.BaseRaw:
        """Inject metadata and annotations into the loaded raw."""
        raw.info["description"] = json.dumps({
            "eegunity_description": {
                "amplifier": "unknown", "cap": "unknown",
                "age": "unknown", "sex": "unknown", "handedness": "unknown",
            }
        })
        # … add annotations here …
        return raw


KERNEL = HeaderlessCSVKernel()

Backward compatibility#

Kernels that do not set HANDLES_UNAVAILABLE = True are never called for Unavailable files, so behaviour is identical to what it was before this interface was added. Existing kernels require no changes.

8. Recommended Practice#

Use annotations for semantic event intervals.
Use stim for integer-coded sequences.
Use misc for continuous-valued signals.
Keep kernel logic dataset-specific and deterministic.
For Unavailable-file support, put raw construction in load() and keep annotation and metadata logic in apply(), so that both code paths share the same enrichment step.
