EEGUnity Kernels: Dataset-Specific In-Memory Preprocessing

1. Introduction

EEGUnity provides a unified interface for parsing, preprocessing, and managing EEG datasets. However, many public datasets are not fully standardized:

Event markers may be stored in separate .mat, .tsv, or .csv files.
Subject metadata may exist in independent tables.
Channel naming conventions may vary across releases.
Folder structures may differ between mirrors or versions.

To address this variability without duplicating EEG data, EEGUnity introduces the concept of external kernels.

A kernel is a dataset-specific, in-memory preprocessing plugin that runs automatically when data is read.

2. Why Use Kernels?

Traditional workflows often:

Load raw data
Run dataset-specific preprocessing scripts
Export a new standardized dataset copy

This approach duplicates EEG arrays and complicates maintenance.

Kernels solve this by:

Running at read time
Updating mne.io.Raw objects in memory
Attaching metadata and annotations dynamically
Leaving the original dataset untouched

3. How Kernels Work

To bind a kernel, pass kernel_spec when constructing the dataset:

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path="/data/openneuro/ds005505",
    domain_tag="openneuro_ds005505",
    kernel_spec="/abs/path/openneuro_ds005505_kernel"
)

raw = ud.eeg_parser.get_data(0)

Internally:

EEGUnity loads the Raw object.
The external kernel is loaded dynamically.
The system calls:

kernel.apply(udataset, raw, row)

If the kernel fails, EEGUnity emits a warning and returns the unmodified raw.
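
The read-time flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not EEGUnity's actual implementation: apply_kernel_safely and the two toy kernels are hypothetical names, and a plain dict stands in for the Raw object.

```python
import warnings


def apply_kernel_safely(kernel, udataset, raw, row):
    """Call kernel.apply and fall back to the original raw on failure.

    Hypothetical helper: it mirrors the documented behavior (warn, then
    return raw), not EEGUnity's internal code.
    """
    try:
        return kernel.apply(udataset, raw, row)
    except Exception as exc:  # any kernel error is downgraded to a warning
        warnings.warn(f"Kernel failed, returning unmodified raw: {exc}")
        return raw


class TaggingKernel:
    """Toy kernel that succeeds and marks the object it receives."""

    def apply(self, udataset, raw, row):
        raw["tagged"] = True
        return raw


class BrokenKernel:
    """Toy kernel that always fails, e.g. because a sidecar file is missing."""

    def apply(self, udataset, raw, row):
        raise ValueError("missing events file")
```

Note that in this sketch a kernel that mutates raw before raising would leave those partial changes in place; a kernel can avoid that by collecting its changes first and applying them at the end of apply.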

4. Kernel File Requirements

Each kernel file must:

Be a single Python module
Define exactly one object named:

KERNEL = YourKernelClass()

Implement:

apply(udataset, raw, row) -> raw

One file equals one kernel. No suffix such as :KERNEL is required.

Valid kernel specifications:

File path (extension optional): "/abs/path/figshare_largemi_kernel"
Module import path: "my_private_kernels.figshare_largemi_kernel"
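
One way such a specification could be resolved is sketched below using the standard importlib module. The function name load_kernel and its validation rules are assumptions made for illustration; EEGUnity's own loader may differ.

```python
import importlib
import importlib.util
from pathlib import Path


def load_kernel(kernel_spec: str):
    """Resolve a file path or dotted module path to its KERNEL object.

    Hypothetical helper, not part of the EEGUnity API.
    """
    path = Path(kernel_spec)
    if path.suffix != ".py":
        # The extension is optional, so try the spec with ".py" appended.
        path = path.with_name(path.name + ".py")
    if path.is_file():
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Not a file on disk: treat the spec as an importable module path.
        module = importlib.import_module(kernel_spec)
    kernel = getattr(module, "KERNEL", None)
    if kernel is None or not callable(getattr(kernel, "apply", None)):
        raise ValueError(f"{kernel_spec!r} does not define a KERNEL with apply()")
    return kernel
```

With this sketch, load_kernel("/abs/path/figshare_largemi_kernel") and load_kernel("/abs/path/figshare_largemi_kernel.py") would resolve to the same file.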

5. Recommended Naming Convention

Kernel names should reflect the dataset source:

figshare_xxxx
openneuro_ds005505
kaggle_xxxx
bcic_iv_2a

Inside the kernel class:

KERNEL_ID = "figshare_largemi"

6. Kernel Interface Specification

Required structure:

class SomeKernel:
    def apply(self, udataset, raw, row):
        ...
        return raw

KERNEL = SomeKernel()

Parameters:

udataset: dataset-level context
raw: loaded MNE Raw object
row: locator row (contains "File Path")

Return the modified raw object.

7. Determining Dataset Root

If instantiated with dataset_path, use it directly.

If instantiated with locator_path only:

Use udataset.get_shared_attr()["dataset_path"] if available.
Otherwise, compute the common path prefix of all File Path entries.
Fall back to the directory of row["File Path"].
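
The fallback order above might look like this inside a kernel. It is a sketch under assumptions: resolve_dataset_root is a hypothetical helper, get_shared_attr() is assumed to return a dict-like object (or None), and all_file_paths is a hypothetical argument holding the locator's File Path column, since how a kernel obtains that column is not specified here.

```python
import os


def resolve_dataset_root(udataset, row, all_file_paths):
    """Pick a dataset root using the documented fallback order.

    Hypothetical helper; all_file_paths is an assumed list of the
    locator's "File Path" values.
    """
    shared = udataset.get_shared_attr() or {}
    root = shared.get("dataset_path")
    if root:
        return root
    try:
        # Deepest directory shared by every file in the locator.
        dirs = [os.path.dirname(os.path.abspath(p)) for p in all_file_paths]
        if dirs:
            return os.path.commonpath(dirs)
    except ValueError:
        pass  # e.g. paths on different drives have no common prefix
    # Last resort: the directory containing the current file.
    return os.path.dirname(os.path.abspath(row["File Path"]))
```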

8. Writing Robust Kernels

To support dataset variants:

Avoid hardcoded paths
Search recursively for participants or event files
Tolerate alternate column names
Handle missing metadata gracefully
Avoid assuming fixed folder structures

Focus on robust logic rather than defensive wrappers: if apply raises, EEGUnity emits a warning and returns the unmodified raw.
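
As an illustration of these guidelines, the sketch below searches recursively for a participants table, accepts several spellings of the subject column, and returns an empty result when nothing usable is found. The helper names and the column aliases are hypothetical; real datasets may need different ones.

```python
import csv
from pathlib import Path

# Hypothetical aliases; real datasets may use other spellings.
SUBJECT_COLUMNS = ("participant_id", "subject_id", "subject")


def find_participants_file(dataset_root):
    """Return the first participants table found under the root, or None."""
    root = Path(dataset_root)
    for pattern in ("participants.tsv", "participants.csv"):
        matches = sorted(root.rglob(pattern))  # recursive, no fixed layout
        if matches:
            return matches[0]
    return None


def load_subject_metadata(dataset_root, subject_id):
    """Return the metadata row for subject_id, or {} if none is found."""
    table = find_participants_file(dataset_root)
    if table is None:
        return {}  # missing metadata is not an error
    delimiter = "\t" if table.suffix == ".tsv" else ","
    with table.open(newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle, delimiter=delimiter):
            for column in SUBJECT_COLUMNS:
                if record.get(column) == subject_id:
                    return dict(record)
    return {}
```

Returning an empty dict keeps apply on its normal path, so the kernel can still attach whatever other metadata it found.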

9. Minimal Kernel Template

from __future__ import annotations
import json
from dataclasses import dataclass
import mne

@dataclass
class ExampleKernel:
    KERNEL_ID: str = "source_name"

    def apply(self, udataset, raw: mne.io.BaseRaw, row) -> mne.io.BaseRaw:
        description_dict = {
            "original_description": raw.info.get("description", ""),
            "eegunity_description": {
                "source_name": self.KERNEL_ID
            },
        }
        raw.info["description"] = json.dumps(description_dict)
        return raw

KERNEL = ExampleKernel()

10. Summary

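Because kernels modify data only in memory and leave the original dataset untouched, EEGUnity stays lightweight, avoids the licensing issues that come with exporting standardized dataset copies, and supports diverse datasets through dynamic preprocessing.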