How to Speed Up EEGUnity with Built-in Parallelism#

1. Introduction#

EEGUnity supports parallel execution through:

Global worker count: UnifiedDataset(..., num_workers=...)
Batch backend selection: batch_process(..., execution_mode=...)

This tutorial explains when to use thread mode, process mode, or sequential mode.

2. Basic Configuration#

from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path="your_dataset_root",
    domain_tag="your_domain_tag",
    num_workers=8,
)

num_workers=0 means sequential execution.
num_workers>0 enables parallel paths where supported.

3. Execution Modes in `batch_process`#

EEGBatch.batch_process supports:

execution_mode='thread': recommended for I/O-bound tasks
execution_mode='process': recommended for CPU-bound tasks
execution_mode=None: force sequential execution

Example:

results = ud.eeg_batch.batch_process(
    con_func=lambda row: row["Completeness Check"] == "Completed",
    app_func=lambda row: row["File Path"],
    is_patch=False,
    result_type="value",
    execution_mode="thread",
)

4. Where Parallelism Is Commonly Applied#

4.1 Parser Stage#

When loading from dataset_path, parser steps can use num_workers, including:

standard file scanning
MAT/CSV parsing
HDF5 EEGLAB .set fallback parsing
BrainVision .vhdr sidecar fallback parsing
WFDB header parsing

4.2 Batch Stage#

Many heavy eeg_batch methods now choose backend explicitly, for example:

filtering/resampling/normalization pipelines
quality scoring
hash calculation
epoch workflows

Some write-heavy paths stay sequential intentionally to keep output consistency.

5. Choosing Worker Count#

General guidance:

Start at os.cpu_count()
Reduce if memory pressure is high
Increase slightly for storage/network bound workloads
Benchmark on your own dataset

import os
from eegunity import UnifiedDataset

ud = UnifiedDataset(
    dataset_path="your_dataset_root",
    domain_tag="your_domain_tag",
    num_workers=os.cpu_count(),
)

6. Practical Tips#

Use execution_mode='process' for CPU-heavy transforms.
Use execution_mode='thread' for file I/O aggregation.
Set num_workers=0 during debugging for deterministic reproduction.
Avoid nesting your own thread/process pools around EEGUnity batch calls.

7. Summary#

Use num_workers to define available parallel workers and execution_mode to choose the right backend for each workload.