File utilities - AlexBerUtils

The alexber.utils.files module provides helpers for working with files produced by concurrent workers.

`join_files(f)`

Joins multiple partial files into a single output file. When several threads or processes each write to their own file (for example `report_0.csv`, `report_1.csv`, `report_2.csv`), `join_files` collects every file in the same directory as `f` whose name matches `<stem>_*<suffix>` and concatenates them into `f`.

from alexber.utils.files import join_files
from pathlib import Path

# Partial files produced by workers:
#   output/results_0.txt
#   output/results_1.txt
#   output/results_2.txt

join_files(Path('output/results.txt'))

# output/results.txt now contains the concatenated content of all three files

| Parameter | Type | Description |
|---|---|---|
| `f` | `str` or `Path` | The path to the output file. Also serves as the glob pattern anchor: all files in the same directory whose name matches `<stem>_*<suffix>` are joined into this file. |

The output file is always opened in write mode ('w'), so any pre-existing content is overwritten. The partial files are not deleted after joining.
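
The behavior described above can be approximated in a few lines. The following is a minimal sketch, not the library's actual source: the name `join_files_sketch`, the lexicographic ordering of the partial files, and the text-mode copy are all assumptions made for illustration.

```python
import shutil
from pathlib import Path


def join_files_sketch(f):
    """Hypothetical stand-in for join_files; not the alexber.utils implementation."""
    target = Path(f)
    # Build the documented pattern, e.g. 'results_*.txt' for 'results.txt'.
    pattern = f"{target.stem}_*{target.suffix}"
    # Assumption: partial files are joined in sorted (lexicographic) name order.
    partials = sorted(target.parent.glob(pattern))
    # 'w' mode truncates any existing output, matching the documented behavior.
    with open(target, "w") as out:
        for partial in partials:
            with open(partial) as src:
                shutil.copyfileobj(src, out)
    # The partial files are left in place; return them for inspection.
    return partials
```

With lexicographic sorting, `results_10.txt` would come before `results_2.txt`. If the order of the joined content matters, zero-padding the index (`results_02.txt`) sidesteps that in this sketch. The order used by the real `join_files` is not stated here and should be checked against the library.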

Usage example

The following example shows a typical pattern for concurrent workers: each worker writes to its own numbered file, and the main thread joins the files once every worker has finished.

Workers write to separate partial files

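import threading
from pathlib import Path

OUTPUT = Path('output/report.csv')
OUTPUT.parent.mkdir(parents=True, exist_ok=True)

# Sample input for illustration only: one list of rows per worker.
data_chunks = [
    [['id', 'name'], ['1', 'alice']],
    [['2', 'bob']],
]

def worker(index, rows):
    # Each worker writes only to its own file, e.g. output/report_0.csv
    partial = OUTPUT.parent / f"{OUTPUT.stem}_{index}{OUTPUT.suffix}"
    with open(partial, 'w') as fh:
        for row in rows:
            fh.write(','.join(row) + '\n')

threads = [
    threading.Thread(target=worker, args=(i, chunk))
    for i, chunk in enumerate(data_chunks)
]
for t in threads:
    t.start()
# Wait for every worker to finish before joining the partial files.
for t in threads:
    t.join()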

Join the partial files into the final output

from alexber.utils.files import join_files

join_files(OUTPUT)
# output/report.csv now contains all rows from all workers
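
Because `join_files` leaves the partial files in place, a cleanup step is often wanted once the join has succeeded. The helper below is a hypothetical sketch (`remove_partials` is not part of `alexber.utils`); it assumes the same `<stem>_*<suffix>` naming convention and uses only `pathlib`.

```python
from pathlib import Path


def remove_partials(output):
    """Delete files matching <stem>_*<suffix> next to `output` (hypothetical helper)."""
    output = Path(output)
    pattern = f"{output.stem}_*{output.suffix}"
    removed = []
    for partial in sorted(output.parent.glob(pattern)):
        partial.unlink()  # remove the partial file from disk
        removed.append(partial.name)
    return removed
```

Calling `remove_partials(OUTPUT)` after `join_files(OUTPUT)` would delete `report_0.csv`, `report_1.csv`, and so on, while leaving `report.csv` untouched. Note that it removes every file matching the pattern, including unrelated ones such as `report_final.csv`, so it is best used in a directory dedicated to the workers' output.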

The glob pattern used internally is `{stem}_*{suffix}`, so partial files must follow the naming convention `<base>_<anything><ext>`, for example `report_0.csv`, `report_worker-a.csv`, or `report_2024-01-01.csv`.
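
To check ahead of time which file names fit the convention, the pattern can be rebuilt and tested against candidate names. This is an illustrative sketch: `partial_pattern` is a hypothetical helper, and `fnmatch.fnmatchcase` from the standard library is used as an approximation of glob matching on a single file name.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def partial_pattern(output):
    """Build the <stem>_*<suffix> pattern for an output path (hypothetical helper)."""
    output = Path(output)
    return f"{output.stem}_*{output.suffix}"


pattern = partial_pattern("output/report.csv")

candidates = [
    "report_0.csv",           # matches: numeric index
    "report_worker-a.csv",    # matches: any text after the underscore
    "report_2024-01-01.csv",  # matches: date-style tag
    "report.csv",             # no match: this is the output file itself
    "report_0.txt",           # no match: different suffix
    "reports_0.csv",          # no match: stem differs
    "summary_report_0.csv",   # no match: name must start with the stem
]

# Map each candidate name to True/False depending on whether it fits the pattern.
matches = {name: fnmatchcase(name, pattern) for name in candidates}
```

The output file itself never matches its own pattern, because its name has no underscore between the stem and the suffix.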
