Difference of Sets – Python

In Python, a set is a built-in, unordered, mutable, iterable collection data type that can store heterogeneous elements, as long as they are hashable. A set does not allow duplicate elements. Sets support mathematical operations such as union, intersection, difference, and symmetric difference, as well as subset and superset checks.

In this article, we will learn about the difference operation on sets in Python.

Difference of Sets

Difference of Sets is the operation that finds the elements of one set that are missing from another. It is a fundamental concept in mathematical set theory and in programming. The operation produces a new result set that contains only the elements of Set_1 which are not present in Set_2.

Mathematical Definition

In mathematics, the difference of sets is denoted as A − B or A ∖ B. This operation results in a new set containing all the elements that are in set A but not in set B. Let's see an example:

A = {10, 20, 30, 40} and B = {30, 40, 50, 60}

The difference A − B is: A − B = {10, 20}. This is because 10 and 20 are in Set A but not in Set B.

In the reverse direction, the difference B − A is: B − A = {50, 60}. This is because 50 and 60 are in Set B but not in Set A.
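The same example can be checked directly in Python. This is a minimal sketch that reuses the names A and B from the example and the - operator described later in this article; sorted() is used only to print the elements in a predictable order, since sets are unordered.

```python
# Sets from the mathematical example above
A = {10, 20, 30, 40}
B = {30, 40, 50, 60}

a_minus_b = A - B   # elements in A that are not in B
b_minus_a = B - A   # elements in B that are not in A

print(sorted(a_minus_b))   # [10, 20]
print(sorted(b_minus_a))   # [50, 60]
```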

Properties for Difference of Sets

  • Definition: the difference of two sets A and B (denoted by A − B) consists of all elements that are in Set A but not in Set B.
  • Non-commutative: in general, A − B ≠ B − A, so the order of the operands matters.
  • Identity: A − ∅ = A. The difference between Set A and an empty set is Set A itself.
  • Empty Set: ∅ − A = ∅. The difference between an empty set and Set A is the empty set.
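These properties can be verified with a few comparisons. The sketch below reuses the sets A and B from the earlier example; the variable name empty is chosen here only for illustration.

```python
A = {10, 20, 30, 40}
B = {30, 40, 50, 60}
empty = set()   # note: {} creates an empty dict, not an empty set

print(A - B == B - A)       # False, the operation is not commutative
print(A - empty == A)       # True, identity property
print(empty - A == empty)   # True, empty set property
```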

Python provides multiple built-in ways to perform the difference operation between sets.

Difference of Sets – using difference_update() method

  • The difference_update() method does not return a result set; it returns None.
  • set_1.difference_update(set_2) updates set_1 in place so that it keeps only the elements which are in set_1 but not in set_2.
# using difference_update() method
print("using difference_update() method")

set_1 = {'Python','AWS','ML','Java','Azure'}
set_2 = {'Python', 'AWS', 'NumPy', 'Pandas', 'Matplotlib'}

set_difference = set_1.difference_update(set_2)    # set_1 difference set_2

print(set_1)
print(set_2)
print(set_difference)

In this program, we have defined two sets referenced by the variables ‘set_1’ and ‘set_2’. We call the built-in difference_update() method on the first set, ‘set_1’, and pass the second set as an argument. The method returns None, i.e. it does not return a result set with the difference. Instead, ‘set_1’ itself is updated to contain only the elements which are in set_1 but not in set_2. The returned value assigned to the variable ‘set_difference’ is therefore None. At the end we print both sets and the returned value.

Program Output

using difference_update() method
{'Azure', 'Java', 'ML'}                 # set_1 updated with difference of elements
{'Python', 'Pandas', 'Matplotlib', 'NumPy', 'AWS'}
None          # None returned

From the program output, the elements of ‘set_1’ are updated with the result of the difference operation. Now ‘set_1’ contains only the three elements which were in set_1 but not in set_2. The difference_update() method did not return any value, so None was printed. The order of elements in the printed sets may vary between runs, because sets are unordered.
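Python also offers the augmented assignment operator -= as an in-place counterpart of difference_update() when both operands are sets. The sketch below is an addition to the article's examples; the name alias is introduced only to show that the original set object is modified rather than replaced.

```python
set_1 = {'Python', 'AWS', 'ML', 'Java', 'Azure'}
set_2 = {'Python', 'AWS', 'NumPy', 'Pandas', 'Matplotlib'}

alias = set_1      # second reference to the same set object
set_1 -= set_2     # in-place difference, like set_1.difference_update(set_2)

print(sorted(set_1))    # ['Azure', 'Java', 'ML']
print(alias is set_1)   # True: the same object was updated in place
```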

Difference of Sets – using difference() method

  • Syntax – set_1.difference(set_2) returns a new result set with the elements which are in set_1 but not in set_2.
  • The reverse operation works the same way: set_2.difference(set_1) returns a new result set with the elements which are in set_2 but not in set_1.
# using difference() method
print("using difference() method")

set_1 = {'Python','AWS','ML','Java','Azure'}
set_2 = {'Python', 'AWS', 'NumPy', 'Pandas', 'Matplotlib'}

set1_difference_set2 = set_1.difference(set_2)       # set_1 difference set_2
set2_difference_set1 = set_2.difference(set_1)       # set_2 difference set_1

print(set_1)
print(set_2)
print(set1_difference_set2)
print(set2_difference_set1)

In this program, we have defined two sets referenced by the variables ‘set_1’ and ‘set_2’. We call the built-in difference() method on the first set, ‘set_1’, and pass the second set, ‘set_2’, as an argument. The method returns the result set ‘set1_difference_set2’ with the elements which are in set_1 but not in set_2. The second call swaps the roles of the two sets and returns ‘set2_difference_set1’, the elements which are in set_2 but not in set_1.

Program Output

using difference() method
{'ML', 'Java', 'Python', 'AWS', 'Azure'}                          # original set_1
{'Python', 'AWS', 'Pandas', 'Matplotlib', 'NumPy'}    # original set_2
{'ML', 'Java', 'Azure'}                           # result-set set1_difference_set2 = set_1.difference(set_2)
{'Matplotlib', 'NumPy', 'Pandas'}      # result-set set2_difference_set1 = set_2.difference(set_1)

From the program output, there is no change in the elements of ‘set_1’ and ‘set_2’. Each difference operation returned a new result set, which contains the elements that were present in the first set but not in the second set.

Difference of Sets – using minus (-) operator

  • Syntax – set_1 - set_2
# using negative (-) operator
print("using negative (-) operator")

set_1 = {'Python','AWS','ML','Java','Azure'}
set_2 = {'Python', 'AWS', 'NumPy', 'Pandas', 'Matplotlib'}

set1_difference_set2 = set_1 - set_2                 # set_1 difference set_2
set2_difference_set1 = set_2 - set_1                 # set_2 difference set_1

print(set_1)
print(set_2)
print(set1_difference_set2)
print(set2_difference_set1)

We have defined two sets referenced by the variables ‘set_1’ and ‘set_2’.

In the first case, we apply the minus (-) operator to ‘set_1’ and ‘set_2’, in that order. So, we get the elements which are in set_1 but not in set_2.

In the second case, we apply the minus (-) operator to ‘set_2’ and ‘set_1’, in that order. So, we get the elements which are in set_2 but not in set_1.

Program Output

using negative (-) operator
{'ML', 'Java', 'Python', 'AWS', 'Azure'}                         # original set_1
{'Python', 'AWS', 'Pandas', 'Matplotlib', 'NumPy'}    # original set_2
{'ML', 'Java', 'Azure'}                                # result-set set1_difference_set2 = set_1 - set_2
{'Matplotlib', 'NumPy', 'Pandas'}           # result-set set2_difference_set1 = set_2 - set_1

After the difference operation, we got a new result set in both cases. Each result set contains the elements which were present in the first operand but not in the second, and the original sets are unchanged.

Summary

In this article we learned about the difference operation on sets in Python. The following approaches were explored: the difference_update() method, the difference() method, and the - operator.

Code – Github Repository

All code snippets and programs for this article and for the Python tutorial can be accessed from the GitHub repository – Comments and Docstring in Python.

Interview Questions & Answers

Q: What is the time complexity of difference() method in Python?

The time complexity of set_1.difference(set_2) is O(len(set_1)) on average, since the operation iterates over the first set and checks whether each element is present in the other set. This is possible because sets use a hash-table-based implementation, so each membership check takes O(1) time on average.
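To make the cost visible, the loop below computes the same result by hand. It is an illustrative sketch only, not CPython's actual implementation, and the function name manual_difference is hypothetical.

```python
def manual_difference(first, second):
    """Return the elements of `first` that are not in `second` (illustration only)."""
    result = set()
    for item in first:            # one pass over the first set: len(first) steps
        if item not in second:    # hash-based membership test, O(1) on average
            result.add(item)
    return result

set_1 = {16, 24, 38, 34, 35}
set_2 = {24, 35}

print(manual_difference(set_1, set_2) == set_1.difference(set_2))   # True
```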

Q: What is the difference between the difference() method and the - operator in Python sets?

The difference() method and the - operator give the same result when all operands are sets: the elements of the first set that are not in any of the others. They differ in syntax and in what they accept as input.

The difference() method can accept multiple sets (or other iterables) as arguments, returning the difference between the first set and all the others.

set_1 = {16, 24, 38, 34, 35}
set_2 = {24, 35}
set_3 = {34, 35}
difference_set = set_1.difference(set_2, set_3)
print(difference_set)     # elements in difference_set => {16, 38}

- operator – This is a binary operator, so each use takes exactly two operands, and both must be sets (or frozensets). It cannot accept multiple sets in a single call, but it can be chained on the resulting set, as in set_1 - set_2 - set_3

set_1 = {16, 24, 38, 34, 35}
set_2 = {24, 35}
set_3 = {34, 35}
difference_set = set_1 - set_2 - set_3
print(difference_set)     # elements in difference_set => {16, 38}
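One more practical distinction concerns the argument types: difference() accepts any iterable, while the - operator raises a TypeError when the right-hand operand is not a set or frozenset. The sketch below uses a hypothetical list named numbers_list to show both behaviours.

```python
set_1 = {16, 24, 38, 34, 35}
numbers_list = [24, 35]    # a list, not a set

# The method accepts any iterable
print(sorted(set_1.difference(numbers_list)))   # [16, 34, 38]

# The operator requires a set or frozenset on both sides
try:
    set_1 - numbers_list
except TypeError as error:
    print("TypeError:", error)
```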

Q: Can the difference() method be used on frozen sets?

Yes, the difference() method can be used on frozensets. A frozenset is an immutable version of a set. Calling difference() on a frozenset does not modify the original frozenset; it returns a new frozenset with the computed difference.

frozenset_1 = frozenset([16, 24, 38, 34, 35])
frozenset_2 = frozenset([24, 35])
difference_frozenset = frozenset_1.difference(frozenset_2)
print(difference_frozenset)      # output difference_frozenset => frozenset({16, 34, 38})
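Sets and frozensets can also be mixed in a difference operation, in which case the result takes the type of the left operand. Because frozensets are immutable, they do not provide difference_update(). A short sketch, with variable names chosen here for illustration:

```python
frozen = frozenset([16, 24, 38, 34, 35])
regular = {24, 35}

result_1 = frozen - regular    # left operand is a frozenset
result_2 = regular - frozen    # left operand is a set

print(type(result_1).__name__)                # frozenset
print(type(result_2).__name__)                # set
print(hasattr(frozen, 'difference_update'))   # False
```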

Q: What are real-world use cases of the set difference operation in Python?

  • Finding unique items – Given two sets of data (e.g., customer emails or unique IDs), we can find which data points appear in only one dataset by using the difference.
  • Filtering data – If we need to exclude certain items from a dataset, set differences can be useful.
  • Detecting changes – In version control systems or inventory management, set differences help find items that have been added or removed between two states.
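As an illustration of the change-detection use case, the sketch below compares two inventory snapshots. The item names and the variables yesterday and today are made up for this example.

```python
# Hypothetical inventory snapshots
yesterday = {'keyboard', 'mouse', 'monitor', 'webcam'}
today = {'keyboard', 'mouse', 'headset', 'webcam', 'dock'}

added = today - yesterday      # items present now but not before
removed = yesterday - today    # items present before but not now

print(sorted(added))     # ['dock', 'headset']
print(sorted(removed))   # ['monitor']
```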
