Nowcast Configuration File
TODO
Logging Configuration
TODO
# Logging system configuration
logging:
  version: 1
  disable_existing_loggers: False
  formatters:
    simple:
      format: '%(asctime)s %(levelname)s [%(name)s] %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      level: DEBUG
      formatter: simple
      stream: ext://sys.stdout
  root:
    level: DEBUG
    handlers:
      - console
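The logging section is structured to obey the configuration dictionary schema of the Python logging module, so it can be applied directly with logging.config.dictConfig(). Here is a minimal sketch of doing that; the nowcast.yaml file name is an assumption for illustration:

import logging.config

import yaml  # PyYAML

# Parse the config file and hand its logging section to dictConfig()
with open('nowcast.yaml') as f:
    config = yaml.safe_load(f)
logging.config.dictConfig(config['logging'])

logging.getLogger(__name__).info('logging is configured')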
Rotating Log Files and Long-running Processes
All logging handlers that are configured to use logging.handlers.RotatingFileHandler receive special processing during the logging setup in the long-running manager, message_broker, and scheduler processes. In those processes, the logging.handlers.RotatingFileHandler is replaced by a logging.handlers.WatchedFileHandler. That enables those processes to detect when the nemo_nowcast.workers.rotate_logs worker rotates the log files so that they start writing to the new log files.
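The substitution amounts to editing the handler entries of the parsed logging section before it is passed to logging.config.dictConfig(). This is a sketch of the idea, not the framework's actual code:

def replace_rotating_handlers(logging_config):
    """Swap RotatingFileHandler entries for WatchedFileHandler so that a
    long-running process follows log files that are rotated externally.

    Call on the parsed logging section before logging.config.dictConfig().
    """
    for handler in logging_config.get('handlers', {}).values():
        if handler.get('class') == 'logging.handlers.RotatingFileHandler':
            handler['class'] = 'logging.handlers.WatchedFileHandler'
            # WatchedFileHandler accepts no rotation arguments
            handler.pop('backupCount', None)
            handler.pop('maxBytes', None)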
Distributed Logging
Distributed logging is intended for use in nowcast systems that have workers running on different hosts than the manager. For example, all of the pre- and post-processing workers run on one machine but the NEMO runs are executed on a different computer server or cloud platform. In such a system, all of the elements (message broker, manager, workers, scheduler) publish their log messages to network sockets. The Log Aggregator process subscribes to those sockets and processes the log messages as they are received.
Here is an example logging configuration for distributed logging:
# Distributed logging system configuration
logging:
  aggregator:
    version: 1
    disable_existing_loggers: False
    formatters:
      simple:
        format: '%(asctime)s %(levelname)s [%(logger_name)s] %(message)s'
    handlers:
      info_text:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: $(NOWCAST.ENV.NOWCAST_LOGS)/nowcast.log
        backupCount: 7
      debug_text:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: simple
        filename: $(NOWCAST.ENV.NOWCAST_LOGS)/nowcast.debug.log
        backupCount: 7
    root:
      level: DEBUG
      handlers:
        - info_text
        - debug_text
  publisher:
    version: 1
    disable_existing_loggers: False
    formatters:
      simple:
        format: '%(asctime)s %(levelname)s [%(name)s] %(message)s'
    handlers:
      console:
        class: logging.StreamHandler
        # Level 100 disables console logging.
        # Use worker --debug flag to enable console logging.
        level: 100
        formatter: simple
        stream: ext://sys.stdout
      zmq_pub:
        class: zmq.log.handlers.PUBHandler
        level: DEBUG
        formatter: simple
    root:
      level: DEBUG
      handlers:
        - console
        - zmq_pub
The aggregator section provides the logging configuration that is used by the Log Aggregator, typically to write log files on disk. The publisher section provides the logging configuration that is used by all of the other elements of the nowcast system. Those elements publish log messages on network ports defined in the zmq section of the config file (see below). The log aggregator subscribes to all of those ports.
The aggregator and publisher sections are structured so that they can be read as Python dict objects that obey the Configuration dictionary schema defined in the Python logging module.
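To make the aggregator's role concrete, here is a sketch of a subscriber loop. It assumes that publishers use zmq.log.handlers.PUBHandler with the publishing process's name as the root topic (so topics look like manager.INFO), and that the aggregator's own logging has already been configured from the aggregator section; the port numbers are taken from the zmq example below:

import logging

import zmq

context = zmq.Context()
socket = context.socket(zmq.SUB)
for port in (4345, 4346, 4347, 4350, 4351, 4352):
    socket.connect(f'tcp://localhost:{port}')
socket.setsockopt_string(zmq.SUBSCRIBE, '')  # subscribe to all topics

logger = logging.getLogger()
while True:
    topic, message = socket.recv_multipart()
    publisher, _, level_name = topic.decode().partition('.')
    level = getattr(logging, level_name, logging.INFO)
    # extra supplies the %(logger_name)s field used in the formatter
    logger.log(level, message.decode().rstrip(),
               extra={'logger_name': publisher})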
Important things to note in the aggregator section:

- The use of %(logger_name)s in the format string. This is done so that the name of the process that published the log message will appear instead of log_aggregator, which is what happens if %(name)s is used.
- The use of logging.handlers.RotatingFileHandler logging handlers with backupCount values set so that the log files don't grow without limit. Use the rotate_logs Worker to trigger rotation of the log files at an appropriate point in the daily automation cycle.
- The use of $(NOWCAST.ENV.NOWCAST_LOGS) in the log filename paths. Doing so allows the directory in which the log files are stored to be defined in the NOWCAST_LOGS environment variable (a sketch of the substitution follows this list). That avoids having to hard-code the log files directory path in multiple places in both the Nowcast Configuration File and the supervisord configuration file (see Nowcast Process Management) and risking the two getting out of sync.
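The $(NOWCAST.ENV.NOWCAST_LOGS) placeholder implies a substitution like the following; the regex and function name are illustrative, not the framework's actual implementation:

import os
import re

def expand_env_placeholders(value):
    """Replace $(NOWCAST.ENV.VARNAME) with the value of $VARNAME."""
    return re.sub(
        r'\$\(NOWCAST\.ENV\.(\w+)\)',
        lambda match: os.environ[match.group(1)],
        value,
    )

# With NOWCAST_LOGS=/var/log/nowcast:
# expand_env_placeholders('$(NOWCAST.ENV.NOWCAST_LOGS)/nowcast.log')
# returns '/var/log/nowcast/nowcast.log'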
In the publisher section, note that the logging handler used to publish log messages to the network sockets is zmq.log.handlers.PUBHandler.
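For orientation, this is roughly what attaching a PUBHandler to a process's root logger looks like; the port number and root topic are illustrative:

import logging

from zmq.log.handlers import PUBHandler

handler = PUBHandler('tcp://*:4346')  # binds a PUB socket on port 4346
handler.setRootTopic('manager')       # topic prefix seen by the aggregator
handler.setFormatter(
    logging.Formatter('%(asctime)s %(levelname)s [%(name)s] %(message)s'))

root = logging.getLogger()
root.setLevel(logging.DEBUG)
root.addHandler(handler)
root.info('log messages now go out on port 4346')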
The network ports that the logging sockets are bound to are defined in the zmq section of the config file:
# Message system
zmq:
  host: localhost
  ports:
    # traffic between manager and message broker
    manager: 4343
    # traffic between workers and message broker
    workers: 4344
    # pub/sub logging traffic for log aggregator
    logging:
      message_broker: 4345
      manager: 4346
      scheduler: 4347
      workers: [4350, 4351, 4352]
      # **host:port pairs in lists must be quoted to protect : characters**
      make_live_ocean_files: 'salish.eos.ubc.ca:4357'
      run_NEMO: ['salish.eos.ubc.ca:4354', '210.15.47.113:4354']
      watch_NEMO:
        - 'salish.eos.ubc.ca:4356'
        - '210.15.47.113:4356'
In this example the message broker, manager, scheduler, and most workers run on the local host, but the make_live_ocean_files worker runs on a remote host, salish.eos.ubc.ca, and the run_NEMO and watch_NEMO workers run on 2 different remote hosts, salish.eos.ubc.ca and 210.15.47.113. Note that the instances of the run_NEMO and watch_NEMO workers must use the same port numbers.
The run_NEMO and watch_NEMO keys show 2 different YAML syntaxes for lists.
Each process that publishes log messages must do so on a unique network port. The value associated with the workers key is a list of ports for workers running on the local host to use. There should be enough ports in the list to ensure that all workers that run concurrently are able to find a port; a nemo_nowcast.worker.WorkerError exception is raised if all of the ports in the list are found to be in use when a worker starts up.
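The port search can be pictured as a loop like this one; the function is a sketch, and the framework raises nemo_nowcast.worker.WorkerError rather than OSError:

import zmq
from zmq.log.handlers import PUBHandler

def bind_first_free_port(ports, host='*'):
    """Return a PUBHandler bound to the first free port in ports."""
    for port in ports:
        try:
            return PUBHandler(f'tcp://{host}:{port}')
        except zmq.ZMQError:
            continue  # port already in use; try the next one
    raise OSError(f'all logging ports in use: {ports}')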
Note
It is necessary to ensure that the appropriate firewall rules are in place to allow traffic to pass between the machines on which remote workers are running and the machine that hosts the log aggregator via the logging port(s).
Since manager/worker communication and distributed logging both use ZeroMQ ports, it is crucial to ensure that all port numbers used are unique.
System State Checklist Logging
The system state checklist maintained by the Manager is written to disk as serialized YAML every time it is updated, in the file given by the checklist file configuration key. By convention, that file is $NOWCAST_LOGS/nowcast_checklist.yaml.
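The serialization itself is ordinary PyYAML output; a sketch, with an illustrative checklist payload:

import os

import yaml

checklist = {'sleepyhead': 'sleep worker slept well'}  # illustrative content
checklist_file = os.path.join(
    os.environ['NOWCAST_LOGS'], 'nowcast_checklist.yaml')
with open(checklist_file, 'wt') as f:
    yaml.safe_dump(checklist, f)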
It is also possible to add logging configuration to the system so that the checklist is logged to another file just before it is cleared by the clear_checklist Worker. Doing so preserves the checklist from previous days' operations. To enable checklist logging it is necessary to add a checklist logging handler to the logging configuration, and to register a logger for the checklist.
For systems that use local filesystem logging, that is accomplished by adding a checklist section to the logging: handlers: configuration section:
logging:
  ...
  handlers:
    ...
    checklist:
      class: logging.handlers.RotatingFileHandler
      level: INFO
      formatter: simple
      filename: $(NOWCAST.ENV.NOWCAST_LOGS)/checklist.log
      backupCount: 7
The checklist logger is registered by adding a logging: loggers: checklist: section:
logging:
  ...
  loggers:
    checklist:
      qualname: checklist
      level: INFO
      propagate: False
      handlers:
        - checklist
These examples set up a RotatingFileHandler for the checklist that writes it to the $NOWCAST_LOGS/checklist.log file and retains the previous 7 versions of that file when the log files are rotated.
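Once the logger is registered, writing the checklist to it is straightforward; this sketch uses pprint because the checklist is logged as a pretty-printed representation of a Python dictionary (see Special Message Types below):

import logging
import pprint

checklist = {'sleepyhead': 'sleep worker slept well'}  # illustrative content
logging.getLogger('checklist').info(
    'checklist:\n%s', pprint.pformat(checklist))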
For systems that use Distributed Logging, similar configuration sections are required, but they are added to the logging: publisher: configuration:
logging:
  ...
  publisher:
    ...
    handlers:
      ...
      checklist:
        class: logging.handlers.RotatingFileHandler
        level: INFO
        formatter: simple
        filename: $(NOWCAST.ENV.NOWCAST_LOGS)/checklist.log
        backupCount: 7
    ...
    loggers:
      checklist:
        qualname: checklist
        level: INFO
        propagate: False
        handlers:
          - checklist
ZeroMQ Server and Ports
TODO
# Message system
zmq:
  host: localhost
  ports:
    # traffic between manager and message broker
    manager: 4343
    # traffic between workers and message broker
    workers: 4344
Message Registry
TODO
message registry:
  # Message types that the manager process can send and their meanings
  # Don't change this section without making corresponding changes in
  # the nemo_nowcast.manager module of the NEMO_Nowcast package.
  manager:
    ack: message acknowledged
    unregistered worker: ERROR - message received from unregistered worker
    unregistered message type: ERROR - unregistered message type received from worker
    no after_worker function: ERROR - after_worker function not found in next_workers module

  # Module from which to load :py:func:`after_<worker_name>` functions
  # that provide lists of workers to launch when :kbd:`worker_name` finishes
  next workers module: nowcast.next_workers

  workers:
    # Worker module name
    sleep:
      # The key in the system checklist that the manager maintains that is to
      # be used to hold message payload information provided by the
      # :kbd:`sleep` worker
      checklist key: sleepyhead
      # Message types that the :kbd:`sleep` worker can send and their meanings
      success: sleep worker slept well
      failure: sleep worker slept badly
      crash: sleep worker crashed
    awaken:
      checklist key: sleepyhead
      success: awaken worker awoke - where's the coffee?
      failure: awaken worker failed to awake
      crash: awaken worker crashed
Most messages are handled by the Manager by passing them to the after_worker_name() function in the next_workers module given by the next workers module key. For example, when the manager receives a message with the type success from the sleep worker, it calls the nowcast.next_workers.after_sleep() function with the message.
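Such an after_sleep() function might look like the sketch below. The (msg, config, checklist) signature and the NextWorker class follow the NEMO_Nowcast framework, but treat the details as illustrative:

from nemo_nowcast import NextWorker

def after_sleep(msg, config, checklist):
    """Return a list of workers to launch when the sleep worker finishes."""
    next_workers = {
        'crash': [],
        'failure': [],
        'success': [NextWorker('nowcast.workers.awaken')],
    }
    return next_workers[msg.type]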
Special Message Types
There are several special message types that are handled differently by the manager:
- The clear checklist message that is sent by the nemo_nowcast.workers.clear_checklist worker causes the system state checklist to be written to a log file, then clears it. The clear_checklist worker is typically run once per nowcast cycle (e.g. daily) at the end of processing, just before rotating the log files via the nemo_nowcast.workers.rotate_logs worker. The log file that the checklist is written to is given by the handlers.checklist.filename key in the Logging Configuration section of the config file. The checklist is written as a pretty-printed representation of a Python dictionary.
- A need message is expected to have a system state checklist key as its payload. The manager handles need messages by returning an ack message with the requested section of the checklist as its payload.
Scheduled Workers
The optional scheduled workers section specifies a list of workers that the Scheduler should launch, when to launch them, and what command-line options (if any) to use for the launches. The period between the scheduler's system clock checks is hard-coded to 60 seconds.
Note
Scheduled launching of workers is intended for use only in special cases in which a worker’s launch time depends on factors outside of the nowcast system (such as the availability of atmospheric forcing model product files).
The first choice for launching workers should be by the manager process in response to system state events (via the Next Workers Module).
Example scheduled workers configuration section:
# Workers scheduled to run at specific times
scheduled workers:
  # Worker module name (fully qualified, dotted notation)
  - nowcast.workers.download_weather:
      # Time period for worker launch repetition
      every: day
      # Time at which to launch the worker
      # (quotes are required to ensure that time is interpreted as a string)
      at: '05:15'
      # Optional command-line options for the worker
      # (quotes are necessary to force interpretation as a string)
      cmd line opts: '12'
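The launch semantics of that configuration can be read as equivalent to the sketch below, written with the schedule package; whether the scheduler uses that package internally is an implementation detail, and the interpreter and config file paths are assumptions for illustration:

import subprocess
import time

import schedule

def launch_download_weather():
    # Workers are launched as: python -m <module> <config file> [options]
    subprocess.Popen(
        ['/nowcast-env/bin/python', '-m', 'nowcast.workers.download_weather',
         'nowcast.yaml', '12'])

schedule.every().day.at('05:15').do(launch_download_weather)

while True:
    schedule.run_pending()
    time.sleep(60)  # the hard-coded clock-check period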
Example Nowcast Configuration File
Here is the complete example nowcast configuration YAML file that is discussed in the sections above:
# Example system configuration file for a NEMO_Nowcast framework system

# System status checklist file
checklist file: $(NOWCAST.ENV.NOWCAST_LOGS)/nowcast_checklist.yaml

# Python interpreter in environment with all dependencies installed
# Used to launch workers
python: $(NOWCAST.ENV.NOWCAST_ENV)/bin/python

# Logging system configuration
logging:
  version: 1
  disable_existing_loggers: False
  formatters:
    simple:
      format: '%(asctime)s %(levelname)s [%(name)s] %(message)s'
  handlers:
    console:
      class: logging.StreamHandler
      level: DEBUG
      formatter: simple
      stream: ext://sys.stdout
  root:
    level: DEBUG
    handlers:
      - console

# Message system
zmq:
  host: localhost
  ports:
    # traffic between manager and message broker
    manager: 4343
    # traffic between workers and message broker
    workers: 4344

message registry:
  # Message types that the manager process can send and their meanings
  # Don't change this section without making corresponding changes in
  # the nemo_nowcast.manager module of the NEMO_Nowcast package.
  manager:
    ack: message acknowledged
    checklist cleared: system checklist cleared
    unregistered worker: ERROR - message received from unregistered worker
    unregistered message type: ERROR - unregistered message type received from worker
    no after_worker function: ERROR - after_worker function not found in next_workers module

  # Module from which to load :py:func:`after_<worker_name>` functions
  # that provide lists of workers to launch when :kbd:`worker_name` finishes
  next workers module: nowcast.next_workers

  workers:
    # Worker module name
    sleep:
      # The key in the system checklist that the manager maintains that is to
      # be used to hold message payload information provided by the
      # :kbd:`sleep` worker
      checklist key: sleepyhead
      # Message types that the :kbd:`sleep` worker can send and their meanings
      success: sleep worker slept well
      failure: sleep worker slept badly
      crash: sleep worker crashed