Launching the RFI-File-Monitor
Windows
After installing the Monitor with the Anaconda package manager, you should see an entry for it in the Start menu. Click it to launch the program.
Linux and macOS
Assuming you installed the Monitor using Anaconda, you will first need to activate its conda environment, after which you can launch the executable:

```
conda activate rfi-file-monitor
rfi-file-monitor
```
Do not attempt to bypass activating the environment by using the absolute path to the rfi-file-monitor executable: this will result in GLib-related run-time errors!
Understanding the RFI-File-Monitor
Before going into a step-by-step guide on how to use this software package, it is probably best to share some of the concepts and terms it is built around to avoid any misunderstandings later on.
The engine is the main driver of activity: it monitors a provider for two kinds of changes:
- Newly created files
- Modified files
The provider can be a directory on the local filesystem, an S3 bucket, a file, …, depending on which type of engine was selected for the session.
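
As a rough illustration of what a directory-based engine does conceptually, the following Python sketch polls a directory for those same two kinds of changes. It is purely illustrative: the function name and the polling approach are assumptions for this sketch, not how the actual engines are implemented.

```python
import time
from pathlib import Path

def poll_directory(path: Path, interval: float = 1.0) -> None:
    """Illustrative sketch only: report new and modified files in a
    directory by comparing modification times between polls."""
    seen: dict[Path, float] = {}
    while True:
        for entry in path.iterdir():
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime
            if entry not in seen:
                print(f"new file: {entry}")        # engine reports a created file
            elif mtime > seen[entry]:
                print(f"modified file: {entry}")   # engine reports a modified file
            seen[entry] = mtime
        time.sleep(interval)
```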
Whenever a new file is detected by the engine, a corresponding `File` object will be instantiated with its status set to `Created`. This object is then added to the queue manager that is associated with the session.
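
In code, one could picture the `File` object roughly as follows. This is a sketch: apart from `File` and the status names, the field and class names are assumptions, not the package's actual layout.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class FileStatus(Enum):
    CREATED = auto()
    SAVED = auto()
    QUEUED = auto()
    RUNNING = auto()
    SUCCESS = auto()
    FAILURE = auto()

@dataclass
class File:
    path: str                                     # location within the provider
    status: FileStatus = FileStatus.CREATED       # every new file starts as Created
    metadata: dict = field(default_factory=dict)  # filled in by operations later
```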
`File`s with `Created` status are effectively ignored by the queue manager until the engine detects that they have been saved or modified: at this point the queue manager will be instructed to change the `File` status to `Saved`, which is when things become interesting! After a configurable amount of time, and assuming no further modifications were detected for the file, the `File` will be queued for processing, corresponding to a change of status to `Queued`.
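
This "quiet period" logic could be sketched as follows (illustrative only; `GRACE_PERIOD` stands in for the configurable delay):

```python
import time

GRACE_PERIOD = 5.0  # seconds; stands in for the configurable delay

def maybe_queue(file, last_saved: float) -> bool:
    """Promote a Saved file to Queued once it has gone GRACE_PERIOD
    seconds without further modifications. Sketch only."""
    if file.status == "Saved" and time.monotonic() - last_saved >= GRACE_PERIOD:
        file.status = "Queued"
        return True
    return False
```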
At this point the queue manager checks whether the necessary resources to start the processing pipeline for the `File` are available: one thread is used per pipeline, with the total number of usable threads configurable in the queue manager settings. If the outcome of the check is positive, the pipeline will be started and the `File` status will be set to `Running`.
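
A minimal sketch of that resource check, using a thread pool whose size plays the role of the configurable thread count:

```python
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

MAX_WORKERS = 4  # stands in for the thread count from the queue manager settings

executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

def run_pipeline(file):
    file.status = "Running"   # a worker thread has picked the file up
    # ... the pipeline's operations would run here ...
    file.status = "Success"

file = SimpleNamespace(path="/tmp/example.dat", status="Queued")
# the task waits in the pool's queue until one of the threads is free
future = executor.submit(run_pipeline, file)
future.result()
```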
Processing pipelines consist of a user-defined list of operations. Each operation works directly on the `File` itself, not on any new files that may have been produced by a preceding operation! It is possible, though, to pass information (metadata) down a `File`'s pipeline to influence the behavior of subsequent operations. In fact, one can make an operation explicitly depend on another one, producing a run-time error when this condition is not met.
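
Two hypothetical operations illustrating both mechanisms: the first leaves metadata behind, the second explicitly depends on it and fails at run time otherwise. The operation names and the metadata key are invented for this sketch and do not correspond to the package's actual operations.

```python
import hashlib

class OperationError(RuntimeError):
    """Raised when a declared dependency between operations is not met."""

def checksum_operation(file):
    # Works on the monitored file itself and passes information
    # (metadata) down the pipeline for subsequent operations.
    with open(file.path, "rb") as fh:
        file.metadata["sha256"] = hashlib.sha256(fh.read()).hexdigest()

def upload_operation(file):
    # Explicitly depends on checksum_operation having run first:
    # a missing checksum is a run-time error.
    if "sha256" not in file.metadata:
        raise OperationError("upload_operation requires checksum_operation")
    ...  # upload file.path together with its checksum

pipeline = [checksum_operation, upload_operation]
```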
The `File` status will be set to `Success` or `Failure` at the end of the pipeline, depending on its outcome. It should be noted that these statuses do not necessarily represent the final state of the `File` object. Indeed, it is quite possible that the engine detects further modifications of the observed file, leading to its status being reset to `Saved`, which will eventually lead to the `File` being requeued for processing…
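
Put together, the status lifecycle allows the following transitions (a sketch mirroring the flowchart below):

```python
# Success and Failure are not terminal: a further modification resets
# the file to Saved and the whole cycle repeats.
TRANSITIONS = {
    "Created": {"Saved"},
    "Saved":   {"Queued"},
    "Queued":  {"Running"},
    "Running": {"Success", "Failure"},
    "Success": {"Saved"},
    "Failure": {"Saved"},
}
```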
This process can be visualized using the following flowchart:
File types
The `File` objects that were introduced in the previous section are in fact instances of a base class with the same name. Several implementations of this base class exist in this software package, and it should not be hard to write your own. Important here is that the objects created by a particular engine are always instances of the same class, derived from `File`: the exported filetype. Operations, however, can be written to support as many filetypes as necessary.
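
In rough code terms, the relationship could look like this (a sketch; the attribute names `exported_filetype` and `supported_filetypes` are assumptions, not the package's actual API):

```python
class File:
    """Stand-in for the package's File base class."""
    def __init__(self, path: str):
        self.path = path
        self.metadata: dict = {}

class RegularFile(File):
    """A plain file on the local filesystem."""

class S3Object(File):
    """An object stored in an S3 bucket."""

# An engine always produces instances of a single File subclass...
class FilesMonitorEngine:
    exported_filetype = RegularFile

# ...whereas an operation may declare support for several filetypes.
class DummyOperation:
    supported_filetypes = (RegularFile, S3Object)
```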
The following table summarizes the currently available file types, as well as the engines and operations that support them:
| Filetype | Supporting Engines | Supporting Operations |
|---|---|---|
| RegularFile | Files Monitor, Temporary File Generator | Dropbox Uploader, Dummy operation, Local copier, S3 Uploader, SciCataloguer, SFTP Uploader |
| Directory | Directories Monitor | Directory Compressor, Dropbox Uploader, Dummy operation, Local copier, S3 Uploader, SciCataloguer, SFTP Uploader |
| URL | URLs from File Loader | Dummy operation, URL Downloader |
| S3Object | AWS S3 Bucket Monitor, Ceph S3 Bucket Monitor | Dummy operation, S3 Copier, S3 Downloader |