EEGUnity Kernel Tutorial: Rich Metadata, `misc`, `stim`, and Annotations

This tutorial explains how to use EEGUnity kernels to inject dataset-specific metadata and channels in memory.

1. Design Principle

EEGUnity treats locator metadata as the source of truth:

format_channel_names() standardizes locator channels as channel_type:channel_name.
get_data_row() uses locator metadata to overwrite raw metadata at load time.
Kernels are applied after locator-driven metadata patching.

Because this all happens in memory at load time, metadata can be maintained without modifying the source files.

2. What a Kernel Can Do

A kernel can:

add or update raw.info["description"]
add or adjust multiple misc channels
add or adjust multiple stim channels
add or update annotations

Kernel interface:

class SomeKernel:
    def apply(self, udataset, raw, row):
        ...
        return raw

KERNEL = SomeKernel()
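
As a minimal sketch of the first capability listed above, the kernel below appends a JSON payload to raw.info["description"]. The class name, KERNEL_ID value, and the contents of EXTRA are illustrative assumptions, not part of EEGUnity. The kernel only relies on raw.info behaving like a mapping.

```python
import json


class DescriptionKernel:
    """Hypothetical kernel that records dataset-level facts in raw.info["description"]."""

    KERNEL_ID = "example_description"  # illustrative identifier

    # Illustrative values; replace them with facts about the actual dataset.
    EXTRA = {"paradigm": "oddball", "reference": "linked mastoids"}

    def apply(self, udataset, raw, row):
        # Keep any description that is already present and append a JSON payload.
        existing = raw.info.get("description") or ""
        payload = json.dumps(self.EXTRA, sort_keys=True)
        raw.info["description"] = f"{existing} | {payload}" if existing else payload
        return raw


KERNEL = DescriptionKernel()
```

Storing the payload as sorted JSON keeps the description machine-readable and produces the same string on every load.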

3. Annotation vs `misc` vs `stim`

Use these three mechanisms for different semantics:

Annotations: text labels mapped to time segments (onset, duration, description).
misc channels: continuous values over time (for example, a probability density or a reaction-time trajectory).
stim channels: integer event codes over time (for example, a class sequence 1/2/3).

To attach a single scalar value to one segment, write that value into a misc channel over the samples the segment covers.
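
The segment-fill idea can be sketched with plain NumPy. The helper name fill_segment, the 100 Hz sampling rate, and the use of NaN to mark samples without a value are assumptions made for illustration. Onsets are taken to be seconds from the first sample of the recording.

```python
import numpy as np


def fill_segment(values, sfreq, onset, duration, scalar):
    """Write `scalar` into the samples covered by [onset, onset + duration) seconds."""
    start = max(int(round(onset * sfreq)), 0)
    stop = min(int(round((onset + duration) * sfreq)), values.shape[0])
    values[start:stop] = scalar  # modifies the array in place
    return values


sfreq = 100.0                           # assumed sampling rate in Hz
reaction_time = np.full(1000, np.nan)   # 10 s of samples; NaN marks "no value"
fill_segment(reaction_time, sfreq, onset=2.0, duration=1.5, scalar=0.42)
```

The resulting array has the length the misc channel needs and can be attached like any other continuous signal. If NaN values are a problem for downstream processing, a sentinel such as 0.0 may be a better background value.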

4. Example Kernel with Multiple `misc` and `stim` Channels

from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import mne


def add_channel(raw: mne.io.BaseRaw, ch_name: str, ch_type: str, values: np.ndarray) -> mne.io.BaseRaw:
    """Append one channel to raw (in place) with an explicit MNE channel type."""
    if values.ndim != 1:
        raise ValueError("values must be a 1D array")
    if values.shape[0] != raw.n_times:
        raise ValueError("values length must equal raw.n_times")

    # add_channels() requires preloaded data; load_data() does nothing if already loaded
    raw.load_data(verbose=False)
    info = mne.create_info([ch_name], sfreq=raw.info["sfreq"], ch_types=[ch_type])
    # RawArray expects data of shape (n_channels, n_times)
    ch_raw = mne.io.RawArray(values[np.newaxis, :], info, verbose=False)
    raw.add_channels([ch_raw], force_update_info=True)
    return raw


@dataclass
class ExampleKernel:
    KERNEL_ID: str = "example_rich_meta"

    def apply(self, udataset, raw: mne.io.BaseRaw, row):
        n = raw.n_times

        # misc channels (continuous signals)
        prob_density = np.linspace(0.1, 0.9, n, dtype=float)
        reaction_time = np.full(n, 0.42, dtype=float)
        raw = add_channel(raw, "prob_density", "misc", prob_density)
        raw = add_channel(raw, "reaction_time", "misc", reaction_time)

        # stim channels (integer codes; MNE stores channel data as float64)
        task_code = np.zeros(n, dtype=float)
        task_code[n // 4: n // 2] = 1
        task_code[n // 2: 3 * n // 4] = 2
        task_code[3 * n // 4:] = 3

        stage_code = np.zeros(n, dtype=float)
        stage_code[n // 3: 2 * n // 3] = 7

        raw = add_channel(raw, "task_code", "stim", task_code)
        raw = add_channel(raw, "stage_code", "stim", stage_code)

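        # annotation segments (text semantics); onset and duration are in seconds
        ann = mne.Annotations(
            onset=[0.0, raw.times[n // 2]],
            duration=[2.0, 2.0],
            description=["trial_start", "feedback"],
        )
        # set_annotations() replaces any annotations already attached to raw
        raw.set_annotations(ann)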

        return raw


KERNEL = ExampleKernel()
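
Before attaching a stim array, it can help to check which events it encodes. The sketch below lists the sample index at which each non-zero code begins, using the same task_code layout as the example kernel on a short array. The helper name code_onsets is hypothetical. On a loaded Raw object, MNE's mne.find_events serves a similar purpose.

```python
import numpy as np


def code_onsets(codes):
    """Return (sample_index, code) pairs where a non-zero code begins."""
    codes = np.asarray(codes).astype(int)
    previous = np.concatenate(([0], codes[:-1]))  # code at the preceding sample
    starts = np.flatnonzero((codes != previous) & (codes != 0))
    return [(int(i), int(codes[i])) for i in starts]


n = 8
task_code = np.zeros(n)
task_code[n // 4: n // 2] = 1
task_code[n // 2: 3 * n // 4] = 2
task_code[3 * n // 4:] = 3
print(code_onsets(task_code))  # [(2, 1), (4, 2), (6, 3)]
```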

5. Binding and Running

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    kernel_spec=r"path/to/example_kernel.py",
)

# Parser path
raw0 = ud.eeg_parser.get_data(0)

# Batch path (kernel is also applied when loading row data in batch methods)
ud.eeg_batch.get_file_hashes(data_stream=True)
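
Before passing a file to kernel_spec, it may be worth checking that the file imports cleanly and exposes a module-level KERNEL with an apply method. The loader below is a standalone sketch built on the standard library. It is not EEGUnity's own loading code, and the helper name load_kernel and the demo kernel are assumptions.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_kernel(path):
    """Import a kernel file and return its module-level KERNEL object."""
    path = Path(path)
    name = f"_kernel_check_{path.stem}"  # hypothetical module name for this check
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register first so dataclasses can resolve the module
    spec.loader.exec_module(module)
    kernel = getattr(module, "KERNEL", None)
    if not callable(getattr(kernel, "apply", None)):
        raise TypeError(f"{path} must define KERNEL with an apply() method")
    return kernel


# Demo with a throwaway kernel file; in practice, pass the path given to kernel_spec.
source = (
    "class DemoKernel:\n"
    "    KERNEL_ID = 'demo'\n"
    "    def apply(self, udataset, raw, row):\n"
    "        return raw\n"
    "KERNEL = DemoKernel()\n"
)
with tempfile.TemporaryDirectory() as tmp:
    kernel_path = Path(tmp) / "demo_kernel.py"
    kernel_path.write_text(source)
    kernel = load_kernel(kernel_path)

print(kernel.KERNEL_ID)  # demo
```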

6. Channel Type Compatibility

EEGUnity's standard prefixes are lowercase and MNE-style (eeg, eog, emg, ecg, meg, stim, misc, bio). Locator entries may also use explicit MNE channel type strings, for example:

seeg:LA1
ecog:G1
dbs:DBS1
fnirs_od:S1_D1_760
pupil:pupil_left
misc:prob_density
stim:task_code

Legacy uppercase prefixes (EEG, EOG, EMG, ECG, STIM, Unknown) are accepted for backward compatibility.
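
When a kernel needs to inspect locator entries itself, the channel_type:channel_name convention can be split as sketched below. The helper name split_locator_channel is hypothetical and the code only illustrates the naming convention. How EEGUnity maps the legacy Unknown prefix internally is not covered here.

```python
# Standard lowercase prefixes as listed in this section.
STANDARD_PREFIXES = {"eeg", "eog", "emg", "ecg", "meg", "stim", "misc", "bio"}


def split_locator_channel(entry):
    """Split 'channel_type:channel_name' and lowercase the type prefix."""
    prefix, sep, name = entry.partition(":")  # split on the first colon only
    if not sep or not name:
        raise ValueError(f"expected 'channel_type:channel_name', got {entry!r}")
    return prefix.strip().lower(), name.strip()


for entry in ["EEG:Cz", "seeg:LA1", "fnirs_od:S1_D1_760", "Unknown:X1"]:
    ch_type, ch_name = split_locator_channel(entry)
    # The last column shows whether the prefix is one of the standard ones.
    print(ch_type, ch_name, ch_type in STANDARD_PREFIXES)
```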

7. Recommended Practice

Use annotations for semantic event intervals.
Use stim for integer-coded sequences.
Use misc for continuous values.
Keep kernel logic dataset-specific and deterministic.
