Part 6 of my extensive examination of data structures is now available. Part 6 provides a look at sets. Sets are a construct used to represent an unordered collection of items, and are common in mathematics for describing properties of numbers. Interestingly, programming languages like Pascal (and its derivatives, Oberon and Modula) provided a set construct in the language. In fact, there is even a programming language called SETL that offers sets as first-class citizens in the language, just like strings, integers, and Booleans in C# or VB.NET.
Following a discussion of sets, along with examining how to create a Pascal-like set data structure, the article turns to a more common need - efficiently maintaining a collection of disjoint sets. A collection of sets are said to be disjoint if there are no similar members between the sets. There are many real-world situations where you might need to maintain disjoint sets. For example, consider that someone gave you a list of people in a town, where each townsperson provided their father's name (assume no two people share the same name). You are then tasked with determining who in the town is related to one another. That is, you should be able to efficiently answer the question, “Is Tim related to Ed?” The approach to take here is to group the townspeople into disjoint sets, where each disjoint set represents a family tree. You can then determine if any two individuals are related by determining if they are members in the same set. This article discusses two data structures that can be used to represent disjoint sets, and how their structure affects the efficiency of those operations being performed upon the sets.