Introduction
In distributed systems, seamless coordination among nodes is essential. The OpenClovis Group Membership Service (GMS) extends the SA Forum Cluster Membership Service (CLM) to manage group memberships, elect leaders, and enable efficient communication. This article explores GMS’s key features and how it enhances system reliability and fault tolerance.
What is the Group Membership Service (GMS)?
The Group Membership Service (GMS) is the OpenClovis implementation of the SA Forum-compliant Cluster Membership Service (CLM). In addition to cluster membership services, it provides leader election to choose the Active System Controller and its deputy. It also extends the SA Forum CLM with a generalized group membership service, which enables software processes to form groups.
Features:
The OpenClovis Group Membership Service (GMS) provides the following services:
- CLM Functionality:
- Cluster Membership Services: A cluster consists of a set of nodes, each with a unique node name. The Cluster Membership Service maintains the membership as nodes join and leave the cluster, and lets application processes register a callback function to receive notifications of membership changes as they occur. These functions are available on every node in the cluster. If the service detects serious communication problems between a node and the rest of the cluster, such as intermittent or completely lost communication, and reports through its API functions that the node has left the cluster, it must trigger a fail-fast mechanism on that node, such as a reboot, to isolate it from the remaining cluster. If a node leaves the cluster through an administrative action directed at the Cluster Membership Service, the service must inform its client processes of the membership change through its API functions.
- Additional services of GMS: GMS forms a cluster from nodes that run the SAFplus Platform and have unique attributes such as node name and node ID. It runs a leader election across the nodes in the cluster and elects a leader and a deputy, which act as the Active System Controller and Standby System Controller for that SAFplus Platform cluster. Any application or OpenClovis SAFplus Platform service can register with GMS to track changes in the cluster, such as node joins, node leaves, and leadership changes.
- Multicast Group Functionality: GMS with Intelligent Object Communication (IOC) provides multicast messaging capability. GMS provides APIs for user applications to form a multicast group and use IOC multicast capabilities to send messages to the registered users in a given multicast group. GMS also provides APIs to track the membership of each multicast group for member and non-member applications.
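To make the membership-tracking and leader-election behavior described above more concrete, here is a minimal in-memory sketch in C. It is not the GMS or SA Forum API: every demo_* name is hypothetical, and the election rule (highest node ID becomes leader, second highest becomes deputy) is an assumption chosen only for illustration, since this article does not describe the rule GMS actually applies.

```c
/* Illustrative model only: none of these names belong to GMS or SA Forum. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_NODES 8
#define DEMO_NO_NODE   0u  /* assumption: 0 is never a valid node ID */

typedef void (*demo_track_cb)(uint32_t leader, uint32_t deputy,
                              size_t count, void *cookie);

typedef struct {
    uint32_t      node_id[DEMO_MAX_NODES]; /* IDs of current cluster members */
    size_t        count;
    uint32_t      leader;                  /* stands in for the Active System Controller */
    uint32_t      deputy;                  /* stands in for the Standby System Controller */
    demo_track_cb on_change;               /* registered tracking callback, may be NULL */
    void         *cookie;
} demo_cluster_t;

/* Assumed rule: highest node ID leads, second highest is deputy.
 * Notifies the registered tracker after every re-election. */
static void demo_elect(demo_cluster_t *c)
{
    uint32_t first = DEMO_NO_NODE, second = DEMO_NO_NODE;
    for (size_t i = 0; i < c->count; i++) {
        uint32_t id = c->node_id[i];
        if (id > first)       { second = first; first = id; }
        else if (id > second) { second = id; }
    }
    c->leader = first;
    c->deputy = second;
    if (c->on_change)
        c->on_change(c->leader, c->deputy, c->count, c->cookie);
}

/* Returns 0 on success, -1 if the view is full or the ID is invalid or present. */
static int demo_node_join(demo_cluster_t *c, uint32_t id)
{
    assert(c != NULL);
    if (id == DEMO_NO_NODE || c->count == DEMO_MAX_NODES) return -1;
    for (size_t i = 0; i < c->count; i++)
        if (c->node_id[i] == id) return -1;
    c->node_id[c->count++] = id;
    demo_elect(c);
    return 0;
}

/* Returns 0 on success, -1 if the node is not a member. */
static int demo_node_leave(demo_cluster_t *c, uint32_t id)
{
    assert(c != NULL);
    for (size_t i = 0; i < c->count; i++) {
        if (c->node_id[i] == id) {
            c->node_id[i] = c->node_id[--c->count]; /* swap-remove */
            demo_elect(c);
            return 0;
        }
    }
    return -1;
}

/* Example tracker: counts how many change notifications were delivered. */
static void demo_count_changes(uint32_t leader, uint32_t deputy,
                               size_t count, void *cookie)
{
    (void)leader; (void)deputy; (void)count;
    ++*(int *)cookie;
}
```

In this model, a caller would zero-initialize a demo_cluster_t, set on_change to a tracker such as demo_count_changes, and then call demo_node_join and demo_node_leave. Each successful call re-runs the election and fires the callback, which mirrors the node join, node leave, and leadership change notifications described above.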
Key APIs:
- Group Management: Group Membership Service (GMS) exposes APIs that enable you to manage multicast groups. Group Management involves:
- Creating/Deleting Groups: You can create a multicast group using the clGmsGroupCreate API. GMS generates and returns a unique group ID. Currently, all GMS groups are IOC multicast groups. GMS allocates a unique IOCMulticastAddress (MA) to each group. You can delete a group using the clGmsGroupDestroy API.
- Joining Groups: A user application or a node can join a multicast group using the clGmsGroupJoin API. By joining a group, a node becomes a member of that group. Components running on different nodes can join the same group. Each component is assigned a member ID based on its compId (component ID). A component can receive messages sent to a group only if it is a member of that group. A component can leave the group using the clGmsGroupLeave API. After the component leaves, GMS ensures that it no longer receives messages sent to the group.
- Tracking Groups: GMS lets processes retrieve information about the nodes in a group and about the group membership. The clGmsGroupTrack API retrieves the status of a group and the list of nodes within it. All members are informed whenever a member joins or leaves the group. A non-member can also track events in the group, such as a node failure or a service failure. You can retrieve information about a group, such as the IOC multicast address created by GMS and the number of members in the group, using the clGmsGetGroupInfo API.
- Updating Groups: If a component or a node fails, GMS deregisters that component or node from the IOCMulticastAddress group. It ensures that the failed components are removed from the group and communicates to all the other members about the status of the group.
- Group Addressing and Multicasting using Intelligent Object Communication (IOC): IOC provides the basic messaging and distribution mechanism for communication between components, whether they run on the same node or on different nodes. If the destination address is a logical address, IOC sends the message to the corresponding physical address. If the destination address is a broadcast address, IOC sends the message to all existing nodes. Together with the TIPC Linux kernel module, IOC also provides multicast capability. An application that wants to start a group creates and joins it using the clIocMulticastRegister API, forming the multicast address with the CL_IOC_MULTICAST_ADDRESS_FORM macro. Once the group is created, any other component interested in the group can join it using the same API. When a component leaves a group, IOC disassociates the member from the multicast (group) address. Both group members and non-members can send IOC multicast messages: a component sends a message to a group by passing the group's multicast address as the destination address to the clIocSend API.
- Broadcasting using Remote Method Dispatch (RMD): A broadcast message can be sent synchronously to a multicast group using the clRmdWithMsg API, which is part of the RMD library. This API takes as a parameter the IOC address where the function is exposed. RMD also ensures at-most-once delivery to all members of a multicast group.
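The group lifecycle described above (create, join, leave, removal on failure, and multicast to current members) can be pictured with a small in-memory sketch in C. It is a model only and does not call GMS, IOC, or RMD: every demo_* name is hypothetical, the sequential group ID is an assumption, and the real clGms* and clIoc* signatures should be taken from the OpenClovis API reference instead.

```c
/* Illustrative model only: the demo_* names are hypothetical, not GMS or IOC APIs. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_MEMBERS 16

typedef struct {
    uint32_t group_id;                    /* unique ID handed out at creation */
    uint32_t member_id[DEMO_MAX_MEMBERS]; /* member IDs, taken from component IDs */
    size_t   count;
} demo_group_t;

typedef void (*demo_deliver_fn)(uint32_t member_id, const char *msg, void *cookie);

static uint32_t demo_next_group_id = 1;   /* assumption: IDs are simply sequential */

/* Create: initialize an empty group and return its freshly assigned ID. */
static uint32_t demo_group_create(demo_group_t *g)
{
    assert(g != NULL);
    g->group_id = demo_next_group_id++;
    g->count = 0;
    return g->group_id;
}

/* Returns 1 if comp_id is currently a member, 0 otherwise. */
static int demo_group_is_member(const demo_group_t *g, uint32_t comp_id)
{
    for (size_t i = 0; i < g->count; i++)
        if (g->member_id[i] == comp_id) return 1;
    return 0;
}

/* Join: returns 0 on success, -1 if the group is full or comp_id is already a member. */
static int demo_group_join(demo_group_t *g, uint32_t comp_id)
{
    if (g->count == DEMO_MAX_MEMBERS || demo_group_is_member(g, comp_id))
        return -1;
    g->member_id[g->count++] = comp_id;
    return 0;
}

/* Leave: also models removal of a failed component.
 * Returns 0 on success, -1 if comp_id is not a member. */
static int demo_group_leave(demo_group_t *g, uint32_t comp_id)
{
    for (size_t i = 0; i < g->count; i++) {
        if (g->member_id[i] == comp_id) {
            g->member_id[i] = g->member_id[--g->count]; /* swap-remove */
            return 0;
        }
    }
    return -1;
}

/* Multicast: hand msg to every current member and return the delivery count.
 * The sender does not have to be a member of the group. */
static size_t demo_group_send(const demo_group_t *g, demo_deliver_fn deliver,
                              const char *msg, void *cookie)
{
    for (size_t i = 0; i < g->count; i++)
        deliver(g->member_id[i], msg, cookie);
    return g->count;
}

/* Example delivery sink: sums the IDs of the members that received a message. */
static void demo_sum_receivers(uint32_t member_id, const char *msg, void *cookie)
{
    (void)msg;
    *(uint32_t *)cookie += member_id;
}
```

The point of the sketch is the invariant stated in the list above: a message sent to the group reaches exactly the components that are members at send time, so a component that has left or been removed after a failure receives nothing further.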
Conclusion
The OpenClovis GMS supports high availability in distributed systems through automated leader election, fail-fast mechanisms, and multicast communication. Its API set simplifies group management and failure handling, helping system designers and developers build robust, highly available distributed systems. To learn more about the APIs and features of GMS, or for other support, send email to support@openclovis.org.
