The alexber.utils.files module provides helpers for working with files produced by concurrent workers.
join_files(f)
Joins multiple partial files into a single output file.
When multiple threads or processes write output to separate files (for example report_0.csv, report_1.csv, report_2.csv), join_files collects all files that match the pattern <stem>_*<suffix> in the same directory as f and concatenates them into f.
from alexber.utils.files import join_files
from pathlib import Path
# Partial files produced by workers:
# output/results_0.txt
# output/results_1.txt
# output/results_2.txt
join_files(Path('output/results.txt'))
# output/results.txt now contains the concatenated content of all three files
| Parameter | Type | Description |
|---|---|---|
| f | str or Path | The path to the output file. Also serves as the glob pattern anchor: all files in the same directory whose name matches <stem>_*<suffix> are joined into this file. |
The output file is always opened in write mode ('w'), so any pre-existing content is overwritten. The partial files are not deleted after joining.
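The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's actual implementation: the name `join_files_sketch` is hypothetical, and sorting the partial files by name is an assumption made here, since this page does not state the order in which `join_files` concatenates them.

```python
from pathlib import Path


def join_files_sketch(f):
    """Concatenate files matching <stem>_*<suffix> next to f into f (illustrative only)."""
    target = Path(f)
    pattern = f"{target.stem}_*{target.suffix}"
    # Sorting is an assumption of this sketch, chosen for reproducible output.
    parts = sorted(target.parent.glob(pattern))
    # Write mode ('w') truncates any pre-existing content in the target.
    with open(target, "w") as out:
        for part in parts:
            with open(part) as src:
                out.write(src.read())
    # The partial files are left in place, mirroring the documented behaviour.
    return parts
```

Note that the output file itself (for example results.txt) does not match results_*.txt, so it is never read as one of its own inputs.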
Usage example
The following example shows a typical producer/consumer pattern where each worker writes to its own numbered file, and the main thread joins them at the end.
Workers write to separate partial files
import threading
from pathlib import Path
OUTPUT = Path('output/report.csv')
OUTPUT.parent.mkdir(parents=True, exist_ok=True)
def worker(index, rows):
    partial = OUTPUT.parent / f"{OUTPUT.stem}_{index}{OUTPUT.suffix}"
    with open(partial, 'w') as fh:
        for row in rows:
            fh.write(','.join(row) + '\n')
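# data_chunks is assumed to be defined elsewhere: one list of rows per worker,
# where each row is a list of strings
threads = [
    threading.Thread(target=worker, args=(i, chunk))
    for i, chunk in enumerate(data_chunks)
]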
for t in threads:
t.start()
for t in threads:
t.join()
Join the partial files into the final output
from alexber.utils.files import join_files
join_files(OUTPUT)
# output/report.csv now contains all rows from all workers
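Because the partial files are not deleted after joining, a cleanup step may be wanted once the final output has been verified. The helper below is a hedged sketch using only `pathlib`. `remove_partials` is a hypothetical name and is not part of `alexber.utils.files`.

```python
from pathlib import Path


def remove_partials(f):
    """Delete files matching <stem>_*<suffix> next to f and return the removed paths."""
    target = Path(f)
    pattern = f"{target.stem}_*{target.suffix}"
    removed = []
    # Materialise the matches first so the directory is not modified mid-iteration.
    for part in list(target.parent.glob(pattern)):
        part.unlink()
        removed.append(part)
    return removed
```

Calling `remove_partials(OUTPUT)` after `join_files(OUTPUT)` would leave only output/report.csv in place, assuming no unrelated files in that directory happen to match the same pattern.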
The glob pattern used internally is {stem}_*{suffix}, so partial files must follow the naming convention <base>_<anything><ext>. Valid names include report_0.csv, report_worker-a.csv, and report_2024-01-01.csv.
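To check which file names a given output path would pick up, the pattern can be built by hand and tested against candidate names. This sketch uses `fnmatch` as a stand-in for the directory glob; the candidate names are illustrative, and the pattern construction is an assumption based on the convention stated above.

```python
from fnmatch import fnmatch
from pathlib import Path

target = Path("output/report.csv")
pattern = f"{target.stem}_*{target.suffix}"  # "report_*.csv"

candidates = [
    "report_0.csv",
    "report_worker-a.csv",
    "report_2024-01-01.csv",
    "report.csv",    # the output file itself: no "_<anything>" part
    "report-0.csv",  # wrong separator
    "report_0.txt",  # wrong extension
]

# Keep only the names that follow the <base>_<anything><ext> convention.
matches = [name for name in candidates if fnmatch(name, pattern)]
print(matches)
```

Only the first three names match. The output file, the name with a hyphen separator, and the name with a different extension are all skipped.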