
Fix failing StressTests.failoverOnClusterChangeTest

Kacper Trochimiak requested to merge stress-test-failures into main

closes #135

@nbandener I know you said we could skip it for now, but by that point I already had an idea of what might be causing this problem.

It looks like the returned watch state may occasionally differ, depending on which node handles the request.

Logs from a test execution on my local env (the test failed; the _node and _shard fields were added for testing purposes):

Last response received in the awaitAssert("Watch did not get assigned a node"), which determines whether a node was assigned to the watch:

{"_tenant":"_main","actions":{},"last_execution":null,"last_status":null,"node":"master_2","_node":"NnGNXwoyTa2HeYTfXGBoog","_shard":"0"}

Result of GET /_cat/shards/.eliatra_internal__alerting_plus_watches_state?v=true&h=index,shard,prirep,state,docs,store,ip,node,id:

index                                          shard prirep state   docs store ip        node   id
.eliatra_internal__alerting_plus_watches_state 0     r      STARTED    1 3.4kb 127.0.0.1 data_1 bwagU_pyRNiK_xdScr_eOg
.eliatra_internal__alerting_plus_watches_state 0     p      STARTED    1 9.6kb 127.0.0.1 data_0 NnGNXwoyTa2HeYTfXGBoog

Response received when the test tried to assign the node name to the watchRunsOnNode variable:

{"_tenant":"_main","actions":{},"last_execution":null,"last_status":null,"node":null,"_node":"bwagU_pyRNiK_xdScr_eOg","_shard":"0"}

I also noticed that the WatchStateIndexWriter.putAll method does not set a RefreshPolicy. Should we set it so that the updated watch state becomes visible immediately? A rough sketch of what I mean is below.
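Sketch only, since I'm not describing the actual shape of putAll here: watchStates (id to JSON source), indexName and the surrounding method are placeholders. The point is just attaching a refresh policy to the bulk write that persists the watch states.

```java
import java.util.Map;

import org.opensearch.action.bulk.BulkRequest;
import org.opensearch.action.index.IndexRequest;
import org.opensearch.action.support.WriteRequest;
import org.opensearch.client.Client;
import org.opensearch.common.xcontent.XContentType;

// Placeholder version of a bulk writer for watch states.
// IMMEDIATE refreshes the affected shard copies right away;
// WAIT_UNTIL would instead block until the change is searchable.
void putAll(Client client, String indexName, Map<String, String> watchStates) {
    BulkRequest bulkRequest = new BulkRequest()
            .setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);

    for (Map.Entry<String, String> entry : watchStates.entrySet()) {
        bulkRequest.add(new IndexRequest(indexName)
                .id(entry.getKey())
                .source(entry.getValue(), XContentType.JSON));
    }

    client.bulk(bulkRequest).actionGet();
}
```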
