How to Speed Up EEGUnity with Built-in Parallelism#
1. Introduction#
EEGUnity supports parallel execution through:
Global worker count:
UnifiedDataset(..., num_workers=...)Batch backend selection:
batch_process(..., execution_mode=...)
This tutorial explains when to use thread mode, process mode, or sequential mode.
2. Basic Configuration#
from eegunity import UnifiedDataset
ud = UnifiedDataset(
dataset_path="your_dataset_root",
domain_tag="your_domain_tag",
num_workers=8,
)
num_workers=0means sequential execution.num_workers>0enables parallel paths where supported.
3. Execution Modes in batch_process#
EEGBatch.batch_process supports:
execution_mode='thread': recommended for I/O-bound tasksexecution_mode='process': recommended for CPU-bound tasksexecution_mode=None: force sequential execution
Example:
results = ud.eeg_batch.batch_process(
con_func=lambda row: row["Completeness Check"] == "Completed",
app_func=lambda row: row["File Path"],
is_patch=False,
result_type="value",
execution_mode="thread",
)
4. Where Parallelism Is Commonly Applied#
4.1 Parser Stage#
When loading from dataset_path, parser steps can use num_workers, including:
standard file scanning
MAT/CSV parsing
HDF5 EEGLAB
.setfallback parsingBrainVision
.vhdrsidecar fallback parsingWFDB header parsing
4.2 Batch Stage#
Many heavy eeg_batch methods now choose backend explicitly, for example:
filtering/resampling/normalization pipelines
quality scoring
hash calculation
epoch workflows
Some write-heavy paths stay sequential intentionally to keep output consistency.
5. Choosing Worker Count#
General guidance:
Start at
os.cpu_count()Reduce if memory pressure is high
Increase slightly for storage/network bound workloads
Benchmark on your own dataset
import os
from eegunity import UnifiedDataset
ud = UnifiedDataset(
dataset_path="your_dataset_root",
domain_tag="your_domain_tag",
num_workers=os.cpu_count(),
)
6. Practical Tips#
Use
execution_mode='process'for CPU-heavy transforms.Use
execution_mode='thread'for file I/O aggregation.Set
num_workers=0during debugging for deterministic reproduction.Avoid nesting your own thread/process pools around EEGUnity batch calls.
7. Summary#
Use num_workers to define available parallel workers and execution_mode to choose the right backend for each workload.