Fix failing StressTests.failoverOnClusterChangeTest
closes #135
@nbandener I know you said we could skip it for now, but by that point I already had an idea of what could be causing this problem.
It looks like the watch state returned for the same watch can differ from time to time, depending on which node handles the request; presumably, a read served by the replica shard can still return an older, not-yet-refreshed version of the state document.
Logs from a test execution on my local environment (the test failed; the _node and _shard fields were added for testing purposes):
The last response received in the awaitAssert("Watch did not get assigned a node"), i.e. the response that made the test conclude that a node had been assigned to the watch:
{"_tenant":"_main","actions":{},"last_execution":null,"last_status":null,"node":"master_2","_node":"NnGNXwoyTa2HeYTfXGBoog","_shard":"0"}
Result of GET /_cat/shards/.eliatra_internal__alerting_plus_watches_state?v=true&h=index,shard,prirep,state,docs,store,ip,node,id:
index shard prirep state docs store ip node id
.eliatra_internal__alerting_plus_watches_state 0 r STARTED 1 3.4kb 127.0.0.1 data_1 bwagU_pyRNiK_xdScr_eOg
.eliatra_internal__alerting_plus_watches_state 0 p STARTED 1 9.6kb 127.0.0.1 data_0 NnGNXwoyTa2HeYTfXGBoog
The response received when the test then tried to assign the node name to the watchRunsOnNode variable. According to the shard listing above, NnGNXwoyTa2HeYTfXGBoog is the primary on data_0 and bwagU_pyRNiK_xdScr_eOg is the replica on data_1, so this second response came from the replica, which still shows the state from before the node assignment:
{"_tenant":"_main","actions":{},"last_execution":null,"last_status":null,"node":null,"_node":"bwagU_pyRNiK_xdScr_eOg","_shard":"0"}
I also noticed that the WatchStateIndexWriter.putAll method does not set a RefreshPolicy. Should we set one so that the updated watch state becomes visible to searches immediately?
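If we decide to do that, the change I have in mind is roughly the following. This is only a minimal sketch, not the actual WatchStateIndexWriter code: I am assuming that putAll builds a single bulk request against the state index through the node Client, and the class, method, and field names below are made up for illustration; only RefreshPolicy, BulkRequest, IndexRequest, and Client are real OpenSearch APIs.

```java
import org.opensearch.action.bulk.BulkRequest;
import org.opensearch.action.bulk.BulkResponse;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.support.WriteRequest.RefreshPolicy;
import org.opensearch.client.Client;

import java.util.Map;

public class WatchStateWriterSketch {

    private final Client client;
    private final String stateIndexName; // e.g. .eliatra_internal__alerting_plus_watches_state

    public WatchStateWriterSketch(Client client, String stateIndexName) {
        this.client = client;
        this.stateIndexName = stateIndexName;
    }

    /**
     * Writes all watch states in a single bulk request. Setting a refresh policy here
     * makes the updated documents visible to searches on all shard copies before the
     * call returns, so a follow-up read served by a replica no longer sees the
     * pre-update state.
     */
    public BulkResponse putAll(Map<String, Map<String, Object>> watchStatesById) {
        BulkRequest bulkRequest = new BulkRequest()
                // IMMEDIATE forces a refresh of the affected shards right after the write;
                // WAIT_UNTIL would instead block until a refresh makes the change visible.
                .setRefreshPolicy(RefreshPolicy.IMMEDIATE);

        for (Map.Entry<String, Map<String, Object>> entry : watchStatesById.entrySet()) {
            bulkRequest.add(new IndexRequest(stateIndexName)
                    .id(entry.getKey())
                    .source(entry.getValue()));
        }

        return client.bulk(bulkRequest).actionGet();
    }
}
```

RefreshPolicy.WAIT_UNTIL would be a cheaper alternative under sustained write load, since it waits for a refresh to happen instead of forcing one per bulk request; for the flaky test either should be enough.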