distributedR_start() problem
Hello all,
I've installed the 0.5 version of DistributedR on CentOS 6.5.
I have one Master and four Workers installed as VMware VMs.
Starting DistributedR in Single Machine Mode works.
Starting DistributedR in Multiple Machine Mode fails.
The output of R is :
> distributedR_start(cluster_conf="/home/jin/dR/cluster.xml", log=3)
Workers registered - 0/4. Wait upto 60 seconds.
Shutdown complete
Error in value[[3L]](cond) : No workers are registered
The Master log is :
2014-Apr-13 10:37:13.029405 [INFOR] Master node is listening at 8989 port.
2014-Apr-13 10:37:13.029981 [INFOR] Resource Manager Created
2014-Apr-13 10:37:13.030012 [INFOR] Master Initialization done
2014-Apr-13 10:37:14.306944 [INFOR] Master awaiting HELLO handshaking with Workers.
2014-Apr-13 10:38:14.307571 [ERROR] No workers are registered
2014-Apr-13 10:38:14.310596 [DEBUG] Sending Shutdown message to Workers.
2014-Apr-13 10:38:15.315530 [DEBUG] Killed Master Message Handler
2014-Apr-13 10:38:15.315784 [DEBUG] Killed Scheduler
2014-Apr-13 10:38:15.315899 [INFOR] distributedR shutdown complete.
The Worker log is :
2014-Apr-13 10:37:14.119284 [INFOR] Starting worker.
2014-Apr-13 10:37:14.121764 [INFOR] Creating Executors in Worker
2014-Apr-13 10:37:14.122062 [INFOR] Created new Executor 0 with Process ID 2978
2014-Apr-13 10:37:14.122525 [INFOR] Created new Executor 1 with Process ID 2979
2014-Apr-13 10:37:14.123041 [INFOR] Created new Executor 2 with Process ID 2980
2014-Apr-13 10:37:14.123560 [INFOR] Created new Executor 3 with Process ID 2981
2014-Apr-13 10:37:14.129698 [INFOR] Created HandleRequest threads to listen requests from Master
2014-Apr-13 10:37:14.129752 [INFOR] Worker centos2.localdomain:9090 with 4 executors and 1804647628 Shared Memory
2014-Apr-13 10:37:14.176329 [INFOR] Creating a connection for handshake with master centos1:8989
2014-Apr-13 10:37:14.176521 [INFOR] Worker opened connection to Master at centos1:8989
2014-Apr-13 10:37:14.176618 [INFOR] Sending reply with worker info: centos2 9090
2014-Apr-13 10:37:14.176776 [INFOR] HELLO Handshaking reply sent to Master. Master centos1:8989 registered with Worker
2014-Apr-13 10:37:14.228034 [DEBUG] Connected to master at tcp://centos1:8989
2014-Apr-13 10:39:44.195226 [INFOR] Master node is detected to be down. Shutdown worker : elapsed time since last heartbeat: 150
2014-Apr-13 10:39:44.195568 [INFOR] Worker Shutdown triggered.
2014-Apr-13 10:39:44.195850 [DEBUG] Total MB fetched: 0.00 MB
Total fetch time: 0.00 s
Total MB sent: 0.00 MB
Total send time: 0.00 s
Total cc time: 0.00 s
2014-Apr-13 10:39:44.195931 [DEBUG] PrestoWorker shutdown - joining threads
2014-Apr-13 10:39:44.197261 [DEBUG] PrestoWorker shutdown - joining threads for 0:0
2014-Apr-13 10:39:44.197416 [DEBUG] PrestoWorker shutdown - joining threads for 0:1
2014-Apr-13 10:39:44.197485 [DEBUG] PrestoWorker shutdown - joining threads for 0:2
2014-Apr-13 10:39:44.197538 [DEBUG] PrestoWorker shutdown - joining threads for 0:3
2014-Apr-13 10:39:44.197842 [DEBUG] PrestoWorker shutdown - joining threads for 1:0
2014-Apr-13 10:39:44.198115 [DEBUG] PrestoWorker shutdown - joining threads for 2:0
The cluster.xml file is :
<MasterConfig>
<ServerInfo>
<Hostname>centos1</Hostname>
<Port>8989</Port>
</ServerInfo>
<Workers>
<Worker>
<Hostname>centos2</Hostname>
<Port>9090</Port>
<Executors>4</Executors>
<SharedMemory>0</SharedMemory>
</Worker>
<Worker>
<Hostname>centos3</Hostname>
<Port>9090</Port>
<Executors>4</Executors>
<SharedMemory>0</SharedMemory>
</Worker>
<Worker>
<Hostname>centos4</Hostname>
<Port>9090</Port>
<Executors>4</Executors>
<SharedMemory>0</SharedMemory>
</Worker>
<Worker>
<Hostname>centos5</Hostname>
<Port>9090</Port>
<Executors>4</Executors>
<SharedMemory>0</SharedMemory>
</Worker>
</Workers>
</MasterConfig>
Thank you for your help,
Jin
Log in first, then try to start distributedR in multi-node mode.
Thank you for your answers.
I have promptless, passwordless ssh access between any pair of machines (for example, centos1 to centos2).
I also have promptless, passwordless ssh access to 127.0.0.1.
I have turned off iptables on all 5 machines.
I've noticed in the logs that the worker node seems to send its HELLO handshake to the master before the master starts waiting for it.
Best,
Jin
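The timing observation above can be checked against the two log excerpts. Here is a minimal Python sketch, assuming both logs use the timestamp layout shown in the post and that the clocks on centos1 and centos2 are synchronized (the thread does not confirm this). `parse_ts` is a hypothetical helper, not part of DistributedR.

```python
from datetime import datetime

# Timestamp layout used in the Master and Worker logs quoted above.
# %b matches "Apr" under an English/C locale.
LOG_TS_FORMAT = "%Y-%b-%d %H:%M:%S.%f"

def parse_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-Mon-DD HH:MM:SS.ffffff' of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TS_FORMAT)

# Lines copied from the Worker log and the Master log in the original post.
worker_hello = parse_ts(
    "2014-Apr-13 10:37:14.176776 [INFOR] HELLO Handshaking reply sent to Master."
)
master_wait = parse_ts(
    "2014-Apr-13 10:37:14.306944 [INFOR] Master awaiting HELLO handshaking with Workers."
)

# Positive gap: the worker logged its HELLO before the master logged "awaiting".
gap = master_wait - worker_hello
print(f"Worker HELLO precedes master 'awaiting' message by {gap.total_seconds():.6f} s")
```

The gap comes out at roughly 0.13 s. Note that the Master log already reports the port as listening at 10:37:13.029405, more than a second before the worker's HELLO, so this gap only shows when the "awaiting" message was logged. It does not necessarily show when the socket began accepting connections.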
If iptables is disabled, can you please check the SELinux setup as well? Based on the log file, it looks like a worker cannot make a connection to the master.
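One way to narrow this down is to test plain TCP reachability of the endpoints listed in cluster.xml, independently of DistributedR. The following is a rough Python sketch, not a DistributedR tool: `read_endpoints` and `can_connect` are hypothetical helper names, and the XML layout is assumed to match the file in the original post.

```python
import socket
import xml.etree.ElementTree as ET
from typing import List, Tuple

def read_endpoints(xml_text: str) -> List[Tuple[str, int]]:
    """Return (hostname, port) for the master and every worker in a cluster.xml."""
    root = ET.fromstring(xml_text)
    nodes = [root.find("ServerInfo")] + root.findall("./Workers/Worker")
    return [
        (node.findtext("Hostname").strip(), int(node.findtext("Port")))
        for node in nodes
        if node is not None
    ]

def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False

# Example usage (path taken from the original post):
# with open("/home/jin/dR/cluster.xml") as f:
#     for host, port in read_endpoints(f.read()):
#         print(host, port, can_connect(host, port))
```

A port only accepts connections while its process is running, so the check is meaningful only during the 60-second window in which `distributedR_start()` is waiting. Running it from a worker against centos1:8989 and from the master against each worker's port 9090 tests both directions. A `False` in one direction only would suggest a filtering or name-resolution problem on that path.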