Class NonBlockingCoordinator

  • All Implemented Interfaces:
    ChannelInterceptor, Heartbeat, MembershipListener

    public class NonBlockingCoordinator
    extends ChannelInterceptorBase

    Title: Auto merging leader election algorithm

    Description: Implementation of a simple coordinator algorithm that not only selects a coordinator, it also merges groups automatically when members are discovered that weren't part of the

    This algorithm is non blocking meaning it allows for transactions while the coordination phase is going on

    This implementation is based on a home brewed algorithm that uses the AbsoluteOrder of a membership to pass a token ring of the current membership.
    This is not the same as just using AbsoluteOrder! Consider the following scenario:
    Nodes, A,B,C,D,E on a network, in that priority. AbsoluteOrder will only work if all nodes are receiving pings from all the other nodes. meaning, that node{i} receives pings from node{all}-node{i}
    but the following could happen if a multicast problem occurs. A has members {B,C,D}
    B has members {A,C}
    C has members {D,E}
    D has members {A,B,C,E}
    E has members {A,C,D}
    Because the default Tribes membership implementation, relies on the multicast packets to arrive at all nodes correctly, there is nothing guaranteeing that it will.

    To best explain how this algorithm works, lets take the above example: For simplicity we assume that a send operation is O(1) for all nodes, although this algorithm will work where messages overlap, as they all depend on absolute order
    Scenario 1: A,B,C,D,E all come online at the same time Eval phase, A thinks of itself as leader, B thinks of A as leader, C thinks of itself as leader, D,E think of A as leader
    Token phase:
    (1) A sends out a message X{A-ldr, A-src, mbrs-A,B,C,D} to B where X is the id for the message(and the view)
    (1) C sends out a message Y{C-ldr, C-src, mbrs-C,D,E} to D where Y is the id for the message(and the view)
    (2) B receives X{A-ldr, A-src, mbrs-A,B,C,D}, sends X{A-ldr, A-src, mbrs-A,B,C,D} to C
    (2) D receives Y{C-ldr, C-src, mbrs-C,D,E} D is aware of A,B, sends Y{A-ldr, C-src, mbrs-A,B,C,D,E} to E
    (3) C receives X{A-ldr, A-src, mbrs-A,B,C,D}, sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to D
    (3) E receives Y{A-ldr, C-src, mbrs-A,B,C,D,E} sends Y{A-ldr, C-src, mbrs-A,B,C,D,E} to A
    (4) D receives X{A-ldr, A-src, mbrs-A,B,C,D,E} sends sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to A
    (4) A receives Y{A-ldr, C-src, mbrs-A,B,C,D,E}, holds the message, add E to its list of members
    (5) A receives X{A-ldr, A-src, mbrs-A,B,C,D,E}
    At this point, the state looks like
    A - {A-ldr, mbrs-A,B,C,D,E, id=X}
    B - {A-ldr, mbrs-A,B,C,D, id=X}
    C - {A-ldr, mbrs-A,B,C,D,E, id=X}
    D - {A-ldr, mbrs-A,B,C,D,E, id=X}
    E - {A-ldr, mbrs-A,B,C,D,E, id=Y}

    A message doesn't stop until it reaches its original sender, unless its dropped by a higher leader. As you can see, E still thinks the viewId=Y, which is not correct. But at this point we have arrived at the same membership and all nodes are informed of each other.
    To synchronize the rest we simply perform the following check at A when A receives X:
    Original X{A-ldr, A-src, mbrs-A,B,C,D} == Arrived X{A-ldr, A-src, mbrs-A,B,C,D,E}
    Since the condition is false, A, will resend the token, and A sends X{A-ldr, A-src, mbrs-A,B,C,D,E} to B When A receives X again, the token is complete.
    Optionally, A can send a message X{A-ldr, A-src, mbrs-A,B,C,D,E confirmed} to A,B,C,D,E who then install and accept the view.

    Lets assume that C1 arrives, C1 has lower priority than C, but higher priority than D.
    Lets also assume that C1 sees the following view {B,D,E}
    C1 waits for a token to arrive. When the token arrives, the same scenario as above will happen.
    In the scenario where C1 sees {D,E} and A,B,C cannot see C1, no token will ever arrive.
    In this case, C1 sends a Z{C1-ldr, C1-src, mbrs-C1,D,E} to D
    D receives Z{C1-ldr, C1-src, mbrs-C1,D,E} and sends Z{A-ldr, C1-src, mbrs-A,B,C,C1,D,E} to E
    E receives Z{A-ldr, C1-src, mbrs-A,B,C,C1,D,E} and sends it to A
    A sends Z{A-ldr, A-src, mbrs-A,B,C,C1,D,E} to B and the chain continues until A receives the token again. At that time A optionally sends out Z{A-ldr, A-src, mbrs-A,B,C,C1,D,E, confirmed} to A,B,C,C1,D,E

    To ensure that the view gets implemented at all nodes at the same time, A will send out a VIEW_CONF message, this is the 'confirmed' message that is optional above.

    Ideally, the interceptor below this one would be the TcpFailureDetector to ensure correct memberships

    The example above, of course can be simplified with a finite statemachine:
    But I suck at writing state machines, my head gets all confused. One day I will document this algorithm though.
    Maybe I'll do a state diagram :)

    State Diagrams

    Initiate an election

    Receive an election message

    Version:
    1.0