EEGUnity Kernel Tutorial: Rich Metadata, misc, stim, and Annotations#

This tutorial explains how to use EEGUnity kernels to inject dataset-specific metadata and channels in memory.

1. Design Principle#

EEGUnity keeps locator metadata as source of truth:

  • format_channel_names() standardizes locator channels as channel_type:channel_name.

  • get_data_row() uses locator metadata to overwrite raw metadata at load time.

  • Kernels are applied after locator-driven metadata patching.

This allows online metadata maintenance without modifying source files.

2. What a Kernel Can Do#

A kernel can:

  • add or update raw.info["description"]

  • add or adjust multiple misc channels

  • add or adjust multiple stim channels

  • add/update annotations

  • build a raw from scratch for files that EEGUnity’s parser cannot read (see Section 7)

Standard kernel interface#

class SomeKernel:
    KERNEL_ID: str = "my-kernel-v1"

    def apply(self, udataset, raw, row):
        # raw is a loaded mne.io.BaseRaw; row is the locator pandas.Series
        ...
        return raw

KERNEL = SomeKernel()

apply() is called for every file whose Completeness Check is not Unavailable.

3. Annotation vs misc vs stim#

Use these three mechanisms for different semantics:

  • Annotations: text labels mapped to time segments (onset, duration, description).

  • misc channels: continuous values over time (for example probability density, reaction-time trajectory).

  • stim channels: integer event codes over time (for example class sequence 1/2/3).

For a single scalar value for one segment, fill the covered segment in a misc channel.

4. Example Kernel with Multiple misc and stim Channels#

from __future__ import annotations
from dataclasses import dataclass
import numpy as np
import mne


def add_channel(raw: mne.io.BaseRaw, ch_name: str, ch_type: str, values: np.ndarray) -> mne.io.BaseRaw:
    """Append one channel to raw with explicit MNE channel type."""
    if values.ndim != 1:
        raise ValueError("values must be a 1D array")
    if values.shape[0] != raw.n_times:
        raise ValueError("values length must equal raw.n_times")

    info = mne.create_info([ch_name], sfreq=raw.info["sfreq"], ch_types=[ch_type])
    ch_raw = mne.io.RawArray(values[np.newaxis, :], info, verbose=False)
    raw.add_channels([ch_raw], force_update_info=True)
    return raw


@dataclass
class ExampleKernel:
    KERNEL_ID: str = "example_rich_meta"

    def apply(self, udataset, raw: mne.io.BaseRaw, row):
        n = raw.n_times

        # misc channels (continuous signals)
        prob_density = np.linspace(0.1, 0.9, n, dtype=float)
        reaction_time = np.full(n, 0.42, dtype=float)
        raw = add_channel(raw, "prob_density", "misc", prob_density)
        raw = add_channel(raw, "reaction_time", "misc", reaction_time)

        # stim channels (integer codes)
        task_code = np.zeros(n, dtype=float)
        task_code[n // 4: n // 2] = 1
        task_code[n // 2: 3 * n // 4] = 2
        task_code[3 * n // 4:] = 3

        stage_code = np.zeros(n, dtype=float)
        stage_code[n // 3: 2 * n // 3] = 7

        raw = add_channel(raw, "task_code", "stim", task_code)
        raw = add_channel(raw, "stage_code", "stim", stage_code)

        # annotation segments (text semantics)
        ann = mne.Annotations(
            onset=[0.0, raw.times[n // 2]],
            duration=[2.0, 2.0],
            description=["trial_start", "feedback"],
        )
        raw.set_annotations(ann)

        return raw


KERNEL = ExampleKernel()

5. Binding and Running#

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path=r"path/to/dataset",
    domain_tag="my_dataset",
    kernel_spec=r"path/to/example_kernel.py",
)

# Parser path
raw0 = ud.eeg_parser.get_data(0)

# Batch path (kernel is also applied when loading row data in batch methods)
ud.eeg_batch.get_file_hashes(data_stream=True)

6. Channel Type Compatibility#

EEGUnity standard prefixes are lowercase MNE-style (eeg, eog, emg, ecg, meg, stim, misc, bio) and it also accepts explicit MNE channel type strings in locator entries, for example:

  • seeg:LA1

  • ecog:G1

  • dbs:DBS1

  • fnirs_od:S1_D1_760

  • pupil:pupil_left

  • misc:prob_density

  • stim:task_code

Legacy uppercase prefixes (EEG, EOG, EMG, ECG, STIM, Unknown) are accepted for backward compatibility.

7. Extended Interface: Handling Unavailable Files#

EEGUnity marks files as Completeness Check = Unavailable when its built-in parser cannot determine the sampling rate (e.g., headerless CSV files, proprietary binary formats). By default, kernels are not called for Unavailable files.

For datasets where EEGUnity cannot parse the file format at all, a kernel can opt in to build the raw from scratch by implementing the extended interface:

Attribute / Method

Required

Description

HANDLES_UNAVAILABLE = True

yes

Opt-in flag. Must be set to True.

load(self, row) -> BaseRaw | None

yes

Called first for Unavailable files. Build and return a mne.io.RawArray from the raw file. Return None to skip this file.

apply(self, udataset, raw, row)

yes (same as always)

Called after load() completes, with the raw returned by load(). Use this for annotation injection and metadata enrichment — same as for Completed files.

Call sequence for Unavailable files#

kernel.load(row)          →  raw   (format parsing, build RawArray)
kernel.apply(ud, raw, row) →  raw   (enrichment: annotations, description, …)

For Completed files the call sequence is unchanged:

EEGUnity parser           →  raw   (standard MNE loader)
kernel.apply(ud, raw, row) →  raw   (enrichment)

Example: headerless CSV dataset#

from __future__ import annotations
import json
from dataclasses import dataclass

import mne
import numpy as np
import pandas as pd


_SFREQ = 2048.0
_CH_NAMES = ["EEG1", "EEG2"]


@dataclass
class HeaderlessCSVKernel:
    KERNEL_ID: str = "headerless-csv-v1"
    HANDLES_UNAVAILABLE: bool = True   # opt in

    def load(self, row) -> mne.io.BaseRaw | None:
        """Build a RawArray from a headerless CSV file."""
        file_path = row["File Path"]
        if not file_path.endswith(".csv"):
            return None  # skip non-CSV files silently

        # Read EEG columns (0-indexed: columns 1 and 2)
        df = pd.read_csv(file_path, header=None, usecols=[1, 2])
        eeg = df.to_numpy(dtype=float).T          # (n_ch, n_samples)
        info = mne.create_info(_CH_NAMES, sfreq=_SFREQ, ch_types=["eeg", "eeg"])
        return mne.io.RawArray(eeg, info, verbose=False)

    def apply(self, udataset, raw: mne.io.BaseRaw, row) -> mne.io.BaseRaw:
        """Inject metadata and annotations into the loaded raw."""
        raw.info["description"] = json.dumps({
            "eegunity_description": {
                "amplifier": "unknown", "cap": "unknown",
                "age": "unknown", "sex": "unknown", "handedness": "unknown",
            }
        })
        # … add annotations here …
        return raw


KERNEL = HeaderlessCSVKernel()

Backward compatibility#

Kernels that do not set HANDLES_UNAVAILABLE = True are never called for Unavailable files — behaviour is identical to before this interface was added. Existing kernels require no changes.