
Filesystem Walker

Random file system navigator.

Classes:

Name              Description
FSWalker          Abstract file system walker.
FSEntry           Lightweight wrapper for os.DirEntry that keeps only the path, the name (split into stem and extension), and the size.
FSPachinkoPin     Represents a 'pin' on the Pachinko board.
PachinkoFSWalker  Simulates a Pachinko machine.

FSWalker() dataclass

Bases: ABC

Abstract file system walker.

Methods:

Name   Description
reset  Reset the walker for a new batch.
walk   Generate candidate files, yielding one FSEntry at a time.

reset() -> None abstractmethod

Reset the walker for a new batch.

walk() -> Iterator[FSEntry] abstractmethod

Generate candidate files, yielding one FSEntry at a time.
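To illustrate the contract a subclass has to fulfil, here is a minimal sketch of a concrete walker. `SortedWalker` is hypothetical and not part of this module: it walks deterministically rather than randomly, and it yields plain path strings instead of FSEntry objects so the example stays self-contained. The base class below only mirrors the two documented abstract methods.

```python
import os
from abc import ABC, abstractmethod
from collections.abc import Iterator
from dataclasses import dataclass, field


@dataclass
class FSWalker(ABC):
    """Mirror of the documented abstract interface (sketch)."""

    @abstractmethod
    def reset(self) -> None:
        """Reset the walker for a new batch."""

    @abstractmethod
    def walk(self) -> Iterator[str]:
        """Generate candidate files (paths here, FSEntry in the real API)."""


@dataclass
class SortedWalker(FSWalker):
    """Hypothetical walker: yields every file under root in sorted order."""

    root: str
    _seen: set[str] = field(default_factory=set)

    def reset(self) -> None:
        # Forget everything yielded during the previous batch.
        self._seen.clear()

    def walk(self) -> Iterator[str]:
        for dirpath, dirnames, filenames in os.walk(self.root):
            dirnames.sort()  # in-place sort makes os.walk descend in a fixed order
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                if path not in self._seen:  # never repeat a file within a batch
                    self._seen.add(path)
                    yield path
```

Because FSWalker keeps its abstract methods, instantiating it directly raises TypeError; only subclasses that implement both reset() and walk() can be created.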

FSEntry(path: str, stem: str, ext: str, size: int) dataclass

Lightweight wrapper for os.DirEntry that keeps only the path, the name (split into stem and extension), and the size.

Methods:

Name           Description
from_direntry  Create a lightweight FSEntry from an os.DirEntry.
__hash__       Return the hash based on the file path.
__fspath__     Return the file system path representation.

from_direntry(e: os.DirEntry) -> FSEntry classmethod

Create a lightweight FSEntry from an os.DirEntry.

__hash__() -> int

Return the hash based on the file path.

__fspath__() -> str

Return the file system path representation.
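A minimal sketch of how FSEntry could be implemented. It assumes that stem and ext come from os.path.splitext on the entry name and that size comes from DirEntry.stat(); the actual module may derive these fields differently.

```python
import os
from dataclasses import dataclass


@dataclass
class FSEntry:
    path: str
    stem: str
    ext: str
    size: int

    @classmethod
    def from_direntry(cls, e: os.DirEntry) -> "FSEntry":
        # Assumption: split the name with splitext and take the size from stat().
        stem, ext = os.path.splitext(e.name)
        return cls(path=e.path, stem=stem, ext=ext, size=e.stat().st_size)

    def __hash__(self) -> int:
        # Hash on the path only; dataclass keeps an explicitly defined __hash__.
        return hash(self.path)

    def __fspath__(self) -> str:
        # Lets an entry be passed directly to open(), os.stat(), pathlib, etc.
        return self.path
```

Because the hash depends only on the path, equal entries always hash alike, and a set of entries collapses duplicates of the same file.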

FSPachinkoPin(path: str, subdirs: list[str] = list(), files: list[FSEntry] = list(), is_scanned: bool = False, is_exhausted: bool = False) dataclass

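Represents a 'pin' on the Pachinko board: one directory, together with its cached subdirectories and files and flags recording whether it has been scanned and whether it is exhausted.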
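The `list()` defaults in the signature most likely correspond to per-instance defaults created with dataclasses.field(default_factory=list). A sketch under that assumption; here files holds plain paths instead of FSEntry objects to stay self-contained, and the comments on the two flags describe an assumed meaning.

```python
from dataclasses import dataclass, field


@dataclass
class FSPachinkoPin:
    path: str
    # default_factory gives every pin its own fresh list.
    subdirs: list[str] = field(default_factory=list)  # child directory paths
    files: list[str] = field(default_factory=list)    # FSEntry objects in the real API
    is_scanned: bool = False    # has the OS been queried for this directory yet?
    is_exhausted: bool = False  # assumed: nothing selectable remains at or below this pin
```

Writing `subdirs: list[str] = []` instead would make the dataclass decorator raise ValueError, because a shared mutable default would leak state between pins.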

PachinkoFSWalker(root: str, quota: DiversityQuota, validator: FileValidator, rng: Random, should_follow_symlink: bool, board: dict[str, FSPachinkoPin] = dict()) dataclass

Bases: FSWalker

Simulates a Pachinko machine.

For every file needed, the walker 'drops' a search cursor (a ball) from the root directory. The ball bounces randomly down directory paths until it settles on a file.
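The drop-and-bounce idea can be simulated on a small in-memory tree. This is a toy sketch, not the module's code: the TREE dictionary, the fair coin between descending and picking a file, and restarting from the root after hitting a dead end are all assumptions, and the real walker scans directories lazily and applies its quota and validator, which are omitted here.

```python
from __future__ import annotations

import random
from collections.abc import Iterator

# Hypothetical in-memory board: directory -> (subdirectories, files).
TREE: dict[str, tuple[list[str], list[str]]] = {
    "/": (["/a", "/b"], []),
    "/a": ([], ["/a/1.txt", "/a/2.txt"]),
    "/b": (["/b/c"], ["/b/3.txt"]),
    "/b/c": ([], ["/b/c/4.txt"]),
}


def drop(rng: random.Random, taken: set[str], exhausted: set[str]) -> str | None:
    """Drop one ball from the root; return the file it settles on, or None."""
    node = "/"
    while node not in exhausted:
        subdirs = [d for d in TREE[node][0] if d not in exhausted]
        files = [f for f in TREE[node][1] if f not in taken]
        if not subdirs and not files:
            exhausted.add(node)  # dead end: no ball will bounce here again
            node = "/"           # assumption: restart the ball from the top
            continue
        # Assumption: fair coin when both descending and picking are possible.
        if subdirs and (not files or rng.random() < 0.5):
            node = rng.choice(subdirs)  # bounce into a child directory
        else:
            pick = rng.choice(files)    # settle on a file in this directory
            taken.add(pick)
            return pick
    return None  # the root itself is exhausted: the board is empty


def walk(seed: int = 0) -> Iterator[str]:
    """Keep dropping balls until drop() reports an empty board."""
    rng, taken, exhausted = random.Random(seed), set(), set()
    while (pick := drop(rng, taken, exhausted)) is not None:
        yield pick
```

Every file is eventually yielded exactly once, because a directory is only marked exhausted when nothing selectable remains beneath it.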

Methods:

Name               Description
__post_init__      Initialize the board with the root pin.
reset              Reset the walker and quota for a new batch.
walk               Continuously drop balls until the board is empty.
drop               Drop a ball from the root.
get_valid_subdirs  Get valid subdirectories for a given pin.
mark_exhausted     Mark a pin and all its subdirs as exhausted.
should_descend     Decide whether to descend into a subdir or select a file.
scan               Read a folder from the OS file system, only the first time a ball hits it.

__post_init__() -> None

Initialize the board with the root pin.

reset() -> None

Reset the walker and quota for a new batch.

walk() -> Iterator[FSEntry]

Continuously drop balls until the board is empty.

drop() -> FSEntry | None

Drop a ball from the root.

get_valid_subdirs(pin: FSPachinkoPin) -> list[str]

Get valid subdirectories for a given pin.
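A sketch of the filtering get_valid_subdirs might perform. The board is passed explicitly here instead of being read from the walker, `Pin` is a hypothetical stand-in for FSPachinkoPin, and the rule that a child with no pin on the board yet is still valid is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Pin:  # hypothetical stand-in for FSPachinkoPin
    path: str
    subdirs: list[str] = field(default_factory=list)
    is_exhausted: bool = False


def get_valid_subdirs(board: dict[str, Pin], pin: Pin) -> list[str]:
    """Return the children of pin that a ball may still bounce into."""
    # Assumption: unvisited children (no pin yet) are valid; only
    # pins explicitly marked exhausted are skipped.
    return [d for d in pin.subdirs if d not in board or not board[d].is_exhausted]
```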

mark_exhausted(pin: FSPachinkoPin) -> None

Mark a pin and all its subdirs as exhausted.
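One way to mark a pin and everything beneath it is an explicit stack, which avoids Python's recursion limit on very deep trees. A sketch, assuming the board is a dict from path to pin as in the PachinkoFSWalker signature; `Pin` is a hypothetical stand-in for FSPachinkoPin, and children that were never scanned have no pin to mark.

```python
from dataclasses import dataclass, field


@dataclass
class Pin:  # hypothetical stand-in for FSPachinkoPin
    path: str
    subdirs: list[str] = field(default_factory=list)
    is_exhausted: bool = False


def mark_exhausted(board: dict[str, Pin], pin: Pin) -> None:
    """Mark pin and every descendant pin on the board as exhausted."""
    stack = [pin]
    while stack:
        current = stack.pop()
        current.is_exhausted = True
        for d in current.subdirs:
            child = board.get(d)  # unscanned children have no pin yet
            if child is not None and not child.is_exhausted:
                stack.append(child)
```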

should_descend(*, has_subdirs: bool, has_files: bool) -> bool

Decide whether to descend into a subdir or select a file.
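A sketch of the decision rule, keeping the documented keyword-only flags. The real method presumably uses the walker's own rng; here it is passed in explicitly, and the descend probability `p_descend` is a hypothetical parameter, not part of the documented signature.

```python
import random


def should_descend(
    rng: random.Random,
    *,
    has_subdirs: bool,
    has_files: bool,
    p_descend: float = 0.5,  # hypothetical tuning knob
) -> bool:
    """Return True to bounce into a subdirectory, False to pick a file here."""
    if not has_subdirs:
        return False  # nowhere to go (the caller handles "no files either")
    if not has_files:
        return True   # nothing to pick here, so the ball must keep falling
    return rng.random() < p_descend  # both possible: weighted coin flip
```

A lower p_descend favors files near the root; a higher one pushes balls deeper into the tree.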

scan(pin: FSPachinkoPin) -> None

Read the pin's folder from the OS file system, doing so only the first time a ball hits that folder.
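A sketch of lazy scanning with os.scandir. `Pin` is a hypothetical stand-in for FSPachinkoPin that stores file paths instead of FSEntry objects; treating unreadable or vanished folders as empty is an assumption, and the real method also has the validator and quota to consult.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Pin:  # hypothetical stand-in for FSPachinkoPin
    path: str
    subdirs: list[str] = field(default_factory=list)
    files: list[str] = field(default_factory=list)
    is_scanned: bool = False


def scan(pin: Pin, *, should_follow_symlink: bool = False) -> None:
    """Populate pin from the OS on the first visit; later calls do nothing."""
    if pin.is_scanned:
        return  # already cached: no further system calls
    try:
        with os.scandir(pin.path) as it:
            for e in it:
                # With should_follow_symlink=False, symlinks are neither
                # descended into nor selected.
                if e.is_dir(follow_symlinks=should_follow_symlink):
                    pin.subdirs.append(e.path)
                elif e.is_file(follow_symlinks=should_follow_symlink):
                    pin.files.append(e.path)
    except OSError:
        pass  # assumption: unreadable or vanished folders count as empty
    pin.is_scanned = True
```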