An alternative option may be based on Spark. A Spark-based project is detailed here.
If corporate policies do not prevent us from installing software,
the most obvious solution may be based on Ansible,
which offers idempotent task execution and an agentless architecture.
https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_strategies.html
The following ArchiMate diagram shows how our needs, at the top, map to the tech options, at the bottom, and which options are adopted and which are ruled out:
a map-reduce work schedule, assuming 7 map-tasks and 4 nodes:
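As a rough illustration of the schedule above, the sketch below assigns 7 map tasks to 4 nodes by always giving the next task to the node that frees up first. The equal one-minute task duration and the function name `schedule` are assumptions made for the example; the real orchestrator acquires nodes at run time through a semaphore rather than from a precomputed plan.

```python
import heapq

def schedule(num_tasks, num_nodes, task_minutes=1.0):
    """Greedy plan: each task goes to the node that becomes free first."""
    # Heap of (time at which the node is free, node number).
    free_at = [(0.0, node) for node in range(1, num_nodes + 1)]
    heapq.heapify(free_at)
    plan = []
    for task in range(1, num_tasks + 1):
        start, node = heapq.heappop(free_at)
        end = start + task_minutes  # assumed: every task takes the same time
        plan.append((task, node, start, end))
        heapq.heappush(free_at, (end, node))
    return plan

plan = schedule(num_tasks=7, num_nodes=4)
makespan = max(end for _, _, _, end in plan)
for task, node, start, end in plan:
    print(f"TASK{task:03d} on HOST{node:03d}: minute {start:.0f} to {end:.0f}")
print(f"map phase length: {makespan:.0f} minutes")
```

Under these assumptions the map phase runs in two waves: four tasks in the first minute and the remaining three in the second.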
E-R diagram of the entities used by the orchestrator (GNU make):
sequence diagram of the actions performed by each map task:
NB: again, nothing needs to be deployed on the worker machines, except for the input and output data files.
deployment diagram:
The Makefile that defines the steps and the dependencies of the whole process is available at this link:
https://github.com/a-moscatelli/home/blob/main/am-wiki-assets/mapreducewin/Makefile
Based on the Makefile above, we just run:
set PARALLEL_DEGREE=3
make install
make initialize
make -j %PARALLEL_DEGREE% all
The activity log of a run with five 1-minute map jobs and 3 nodes is shown below:
23:24:56.16 EXEC_ initialize
23:25:01.78 BEGIN TASK001.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:01.84 BEGIN TASK002.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:01.86 BEGIN TASK003.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:05.97 END__ TASK001.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:06.05 BEGIN TASK004.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:06.11 END__ TASK002.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:06.13 END__ TASK003.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:06.19 BEGIN TASK005.step1.workerAcquisitionLoopFinished.mkcontrol
23:25:06.28 BEGIN TASK001.step2.inputFilePushedToWorker.mkcontrol HOST003
23:25:06.46 BEGIN TASK002.step2.inputFilePushedToWorker.mkcontrol HOST002
23:25:06.64 BEGIN TASK003.step2.inputFilePushedToWorker.mkcontrol HOST001
23:25:06.87 BEGIN TASK001.step3.submittedToWorker.mkcontrol HOST003
23:25:07.40 BEGIN TASK002.step3.submittedToWorker.mkcontrol HOST002
23:25:07.91 BEGIN TASK003.step3.submittedToWorker.mkcontrol HOST001
23:26:09.53 END__ TASK001.step4.workerCompletionCheckLoopFinished.mkcontrol HOST003
23:26:09.67 END__ TASK002.step4.workerCompletionCheckLoopFinished.mkcontrol HOST002
23:26:09.82 END__ TASK003.step4.workerCompletionCheckLoopFinished.mkcontrol HOST001
23:26:15.17 END__ TASK004.step1.workerAcquisitionLoopFinished.mkcontrol
23:26:15.28 BEGIN TASK004.step2.inputFilePushedToWorker.mkcontrol HOST003
23:26:15.30 END__ TASK005.step1.workerAcquisitionLoopFinished.mkcontrol
23:26:15.44 BEGIN TASK005.step2.inputFilePushedToWorker.mkcontrol HOST002
23:26:15.51 BEGIN TASK004.step3.submittedToWorker.mkcontrol HOST003
23:26:15.65 BEGIN TASK005.step3.submittedToWorker.mkcontrol HOST002
23:27:16.85 END__ TASK004.step4.workerCompletionCheckLoopFinished.mkcontrol HOST003
23:27:16.97 END__ TASK005.step4.workerCompletionCheckLoopFinished.mkcontrol HOST002
23:27:17.00 BEGIN reduce
23:27:17.08 END__ reduce
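The timings in this log can be checked with a small script. The sketch below parses a few lines copied verbatim from the log and computes the total wall-clock time, plus the time between submission (step3 BEGIN) and detected completion (step4 END) for two tasks. Treating that interval as the job run time is an interpretation of the step names, not something the Makefile is known to define; `parse_line` is a hypothetical helper.

```python
from datetime import datetime

# Lines copied from the activity log above (excerpt).
LOG_EXCERPT = """\
23:24:56.16 EXEC_ initialize
23:25:06.87 BEGIN TASK001.step3.submittedToWorker.mkcontrol HOST003
23:26:09.53 END__ TASK001.step4.workerCompletionCheckLoopFinished.mkcontrol HOST003
23:26:15.51 BEGIN TASK004.step3.submittedToWorker.mkcontrol HOST003
23:27:16.85 END__ TASK004.step4.workerCompletionCheckLoopFinished.mkcontrol HOST003
23:27:17.08 END__ reduce
"""

def parse_line(line):
    """Split a log line into (timestamp, event, target, host or None)."""
    fields = line.split()
    stamp = datetime.strptime(fields[0], "%H:%M:%S.%f")
    host = fields[3] if len(fields) > 3 else None
    return stamp, fields[1], fields[2], host

entries = [parse_line(line) for line in LOG_EXCERPT.splitlines() if line.strip()]

# Wall-clock time from the first to the last entry of the excerpt.
total_seconds = (entries[-1][0] - entries[0][0]).total_seconds()

# Seconds between submission and detected completion, per task.
submitted = {}
run_seconds = {}
for stamp, event, target, host in entries:
    task = target.split(".")[0]
    if event == "BEGIN" and ".step3." in target:
        submitted[task] = stamp
    elif event == "END__" and ".step4." in target:
        run_seconds[task] = (stamp - submitted[task]).total_seconds()

print(f"total: {total_seconds:.2f} s")
for task, seconds in sorted(run_seconds.items()):
    print(f"{task}: {seconds:.2f} s")
```

The whole run takes about 141 seconds, which matches two waves of roughly one-minute jobs plus orchestration overhead.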
links:
The semaphore, required to ensure that only one process at a time can acquire a worker node, is based on the success or failure of a file rename.
This option does not cope well with a lock holder that dies before it releases the lock: the lock then stays held and the worker node remains unavailable to the other tasks.
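A rename-based semaphore of this kind can be sketched as follows. This is a minimal illustration, not the logic of the actual Makefile: the marker file names (`HOST1.free`, `HOST1.busy.TASK4`) and the functions `acquire` and `release` are hypothetical. The idea is that renaming the "free" marker succeeds for exactly one process; every other process finds the source file gone and the rename fails.

```python
import os
import tempfile

def acquire(lock_dir, host, task):
    """Try to take `host` for `task`; return True only if the rename succeeded."""
    free_marker = os.path.join(lock_dir, f"{host}.free")
    busy_marker = os.path.join(lock_dir, f"{host}.busy.{task}")
    try:
        os.rename(free_marker, busy_marker)
        return True
    except FileNotFoundError:
        # Another process renamed the marker first: the host is taken.
        return False

def release(lock_dir, host, task):
    """Give the host back by renaming the busy marker to the free marker."""
    free_marker = os.path.join(lock_dir, f"{host}.free")
    busy_marker = os.path.join(lock_dir, f"{host}.busy.{task}")
    os.rename(busy_marker, free_marker)

with tempfile.TemporaryDirectory() as lock_dir:
    # One free marker per worker node (hypothetical naming).
    open(os.path.join(lock_dir, "HOST1.free"), "w").close()
    first = acquire(lock_dir, "HOST1", "TASK4")   # lock taken
    second = acquire(lock_dir, "HOST1", "TASK5")  # competing attempt
    release(lock_dir, "HOST1", "TASK4")
    third = acquire(lock_dir, "HOST1", "TASK5")   # succeeds after release
    print(first, second, third)
```

If the process holding `HOST1.busy.TASK4` dies before calling `release`, the free marker never reappears, which is the weakness described above.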
A more robust solution can be implemented with Redis, where a lock key can be given an expiry time:
#docker
image: "redis:6.0.9"
Windows client:
https://github.com/microsoftarchive/redis/releases
example of a working script:
set REDISCLI=.\Redis-x64-3.0.504\redis-cli.exe
set REDISSERVERHOST=DESKTOP-B12345T
set REDIS_CALL=%REDISCLI% -h %REDISSERVERHOST% -p 6379
%REDIS_CALL% PING
rem > PONG
echo %ERRORLEVEL%
rem > 0
set autoexpire_seconds=600
set key=HOST1
set val=TASK4
rem simulation of a lock acquisition:
%REDIS_CALL% SET %key% %val% EX %autoexpire_seconds% NX
rem > OK
rem simulation of a competing concurrent lock acquisition:
%REDIS_CALL% SET %key% %val% EX %autoexpire_seconds% NX
rem > (nil)
%REDIS_CALL% KEYS "*"
rem > 1) "HOST1"
rem simulation of a lock release:
%REDIS_CALL% DEL %key%
rem > (integer) 1
%REDIS_CALL% KEYS "*"
rem > (empty list or set)
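Note: %ERRORLEVEL% is 0 after each CLI call above, including the failed lock acquisition, so a script has to inspect the command output (OK versus (nil)) rather than the exit code to tell whether the lock was acquired.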
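To show why the expiry matters, here is a small in-memory stand-in that mimics the semantics of `SET key value EX seconds NX` and `DEL` used in the script. It is not a Redis client and makes no network calls; the class `ExpiringLockTable` and its methods are hypothetical names, and the clock is injectable so that the expiry can be simulated without waiting.

```python
import time

class ExpiringLockTable:
    """In-memory stand-in for Redis `SET key val EX seconds NX` and `DEL`."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def _purge(self, key):
        # Drop the key if its expiry time has passed.
        entry = self._entries.get(key)
        if entry is not None and entry[1] <= self._clock():
            del self._entries[key]

    def set_nx_ex(self, key, value, seconds):
        """Store the key only if absent or expired; True means lock acquired."""
        self._purge(key)
        if key in self._entries:
            return False
        self._entries[key] = (value, self._clock() + seconds)
        return True

    def delete(self, key):
        """Remove the key; return the number of keys removed (0 or 1)."""
        self._purge(key)
        return 1 if self._entries.pop(key, None) is not None else 0

now = [0.0]  # simulated clock, in seconds
table = ExpiringLockTable(clock=lambda: now[0])

acquired = table.set_nx_ex("HOST1", "TASK4", 600)   # lock acquisition
competing = table.set_nx_ex("HOST1", "TASK5", 600)  # refused while the lock is held
now[0] = 601.0                                      # holder died; the lock expires
after_expiry = table.set_nx_ex("HOST1", "TASK5", 600)
released = table.delete("HOST1")                    # lock release
print(acquired, competing, after_expiry, released)
```

Unlike the file-rename semaphore, a lock whose holder never releases it becomes available again once the expiry time has elapsed. The expiry should therefore be longer than the longest expected task, otherwise a node could be handed to a second task while the first is still running.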