Note: this document is out-of-date as it documents the runner without an
online reduce phase.

runner.mk API
-------------

runner.mk has a fairly standard API for calling backend processes written in
any language that has the following basic I/O facilities:

  a) - read input from a pipe (worker only)
  b) - accept at least one command-line argument (worker only)
  c) - write multiple files to the filesystem (worker only)
  d) - read from the filesystem (reducer only)
  e) - write to stdout and stderr (reducer only)
  f) - read environment variables (reducer only)

Note that seeking within files is not necessary. This allows languages like
awk and sed to process extremely large files in parallel.

worker
------

The worker accepts its input data on STDIN, processes it, and dumps the
results to the reports hard-coded as AGGREGATES in runner.mk. For the
WideFinder 2 project, these are currently:

  uhits
  ubytes
  s404s
  clients
  refs

The first command-line argument is the $PREFIX of the output files. runner.mk
will always supply this argument, but it may be omitted when developing or
debugging.

It is important that you DO NOT use a slash ('/') to join $PREFIX with its
partial path. Thus, the output file for each report is named as follows
(shell notation):

  ${PREFIX}uhits
  ${PREFIX}ubytes
  ${PREFIX}s404s
  ${PREFIX}clients
  ${PREFIX}refs

Directory creation is not necessary, as runner.mk should create the
directory. $PREFIX as provided by runner.mk is currently the partition name.

The actual content of the report files created by the worker depends on what
your reducer implementation is capable of reading :)

reducer
-------

The reducer reads a list of files from the command line and produces output
only for the report given to it. runner.mk provides the reducer with all the
files it needs to read on the command line, so no globbing is ever necessary.
Thus, a separate reducer runs for each report type:

  uhits
  ubytes
  s404s
  clients
  refs

The reducer reads the files named on the command line, writes the
human-readable report to stdout, and writes the total count of records read
to stderr. It may (and is recommended to) read the "$LABEL" environment
variable, which describes the report.

Both worker and reducer should be stand-alone processes capable of running in
parallel with other processes of the same type.
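
The worker and reducer contracts described above can be sketched roughly as
follows in Python. This is only an illustration, not code from runner.mk: the
function names run_worker, run_reducer and extract are hypothetical, the
tab-separated "key<TAB>count" file format is an assumption (the format is left
to the implementer), and "records read" is assumed here to mean lines read
from the partial files.

```python
import os
import sys
from collections import Counter

# Report names as listed in this document for WideFinder 2.
REPORTS = ["uhits", "ubytes", "s404s", "clients", "refs"]


def extract(line):
    """Hypothetical placeholder: yield (report, key, amount) for one line.

    Real log parsing is project-specific; this stand-in only counts the
    first whitespace-separated field under the "clients" report.
    """
    fields = line.split()
    if fields:
        yield "clients", fields[0], 1


def run_worker(stream, prefix=""):
    """Read records from stream and write one partial file per report."""
    totals = {name: Counter() for name in REPORTS}
    for line in stream:
        for report, key, amount in extract(line):
            totals[report][key] += amount
    for name in REPORTS:
        # Plain concatenation, not os.path.join: no '/' between the
        # prefix and the report name.
        with open(prefix + name, "w") as out:
            for key, amount in totals[name].items():
                out.write(f"{key}\t{amount}\n")


def run_reducer(paths, out=sys.stdout, err=sys.stderr):
    """Merge the partial files of ONE report type.

    Writes the human-readable report to out and the number of records
    read to err (assumed: one record per line of the partial files).
    """
    merged = Counter()
    records = 0
    for path in paths:
        with open(path) as partial:
            for line in partial:
                key, amount = line.rstrip("\n").rsplit("\t", 1)
                merged[key] += int(amount)
                records += 1
    # LABEL is optional; the fallback text is an assumption.
    print(os.environ.get("LABEL", "report"), file=out)
    for key, amount in merged.most_common():
        print(f"{amount}\t{key}", file=out)
    print(records, file=err)
    return merged, records


# In separate stand-alone scripts, the entry points might look like:
#   run_worker(sys.stdin, sys.argv[1] if len(sys.argv) > 1 else "")
#   run_reducer(sys.argv[1:])
```

Keeping the two halves as separate stand-alone scripts matches the
requirement that many workers, and one reducer per report type, can run in
parallel without knowing about each other.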