Thursday, 10 February 2022

Memory Management of List Vs Tuples

 


In python, we have more than one data structure type, to save items in an ordered way. Why do we need different types and how it is different? When to use them?. In this blog we will look into all these answers, and have an understanding of each of its the commonalities and need of two different data structure types of List and Tuples. Also how the memory is managed for both of these types.

Note: Below points are referenced for CPython Implementation


Common Between List and Tuple

  1. Both are sequence data type

  2. Each item stored in a list can be of any data type

  3. Elements can be accessed by indexing and slicing

  4. Both allow duplicates


Let's see how each are different with examples.


List

Below are some highly talked notes about Lists.

  1. Mutable in Nature

  2. Sortable


Syntaxlist(<values>) or [<values>]


Example:

mylist = ["apple", "banana", "cherry", 1]


Definition

Define and usage of the list. As said list can be used to save any different types of object. Like below.

Source:

# example of list
list1 = [1, 2, "three", 4, 5]
print(type(list1))
print(list1)

Output:

<class 'list'>

[1, 2, 'three', 4, 5]


Slicing

Sample ways to access the contents from the list.

Source:

# access elements using index ; first 2 indexes
print(list1[0:2])

Output:

[1, 2]

[4, 5]


Mutable

We can edit the values in the list.

Source:

# list are mutable, which means we can change the value of the list
list1[0] = 100
print(type(list1))
print(list1)

Output:

<class 'list'>

[100, 2, 'three', 4, 5]


Memory Allocation
Now we have come to see the crux of this blog, how the memory is managed while storing the items in the list. First we will see how much currently the memory is allocated, later how each time the size changes when new items are allocated.

Let's check much currently the memory is allocated.

Source:

# check the memory allocated
import sys
print(sys.getsizeof(list1))

Output:

96

Common function to see the how much memory is allocated before and after values append.

# appending the new item
def append_into_list(value):
print("address: ", id(list1))
print("before size of list: ", sys.getsizeof(list1))
list1.append(value)
print("updated list: ", list1)
print("address remains the same: ", id(list1))
print("after sizes of list: ", sys.getsizeof(list1))
print("")

Please closely observe the size and memory address of the list before and post update.

Note: In the below scenario the memory address didn't change, but it will always not be the case

# lets see after adding couple of more values the list size
append_into_list(6)
append_into_list(7)
append_into_list(8)
append_into_list(9)
append_into_list(10)
append_into_list(11)
append_into_list(12)

Output:

address: 140509666477824

before size of list: 96

updated list: [100, 2, 'three', 4, 5, 6]

address remains the same: 140509666477824

after size of list: 128


address: 140509666477824

before size of list: 128

updated list: [100, 2, 'three', 4, 5, 6, 7]

address remains the same: 140509666477824

after size of list: 128


address: 140509666477824

before size of list: 128

updated list: [100, 2, 'three', 4, 5, 6, 7, 8]

address remains the same: 140509666477824

after size of list: 128


address: 140509666477824

before size of list: 128

updated list: [100, 2, 'three', 4, 5, 6, 7, 8, 9]

address remains the same: 140509666477824

after size of list: 128


address: 140509666477824

before size of list: 128

updated list: [100, 2, 'three', 4, 5, 6, 7, 8, 9, 10]

address remains the same: 140509666477824

after size of list: 192


address: 140509666477824

before size of list: 192

updated list: [100, 2, 'three', 4, 5, 6, 7, 8, 9, 10, 11]

address remains the same: 140509666477824

after size of list: 192


address: 140509666477824

before size of list: 192

updated list: [100, 2, 'three', 4, 5, 6, 7, 8, 9, 10, 11, 12]

address remains the same: 140509666477824

after size of list: 192

As you could see the size of list expanded from 96 to 128, but for next couple of items it didn't change whereas it stayed there for sometime. And then the size got expanded to 192. The reason in Cpython the memory is "preallocated" in chunks before hand. The reason is it avoids making frequent heavy system calls. As well if you see the over allocation is not static its mild and linear.

The reallocation happens to extend the current memory needed.

So when you have huge array in need, and the realloc is not having so much space. It will go ahead and create new memory and copy, this will be an very expensive operation.
To avoid this we can preallocate required memory.

Please find the below the code snippet of C implementation of List. And also the pictorial representation.

C Source - from CPython:

#### Cpython : https://github.com/python/cpython/blob/master/Objects/listobject.c
'''
/* This over-allocates proportional to the list size, making room
* for additional growth. The over-allocation is mild, but is
* enough to give linear-time amortized behavior over a long
* sequence of appends() in the presence of a poorly-performing
* system realloc().
* Add padding to make the allocated size multiple of 4.
* The growth pattern is: 0, 4, 8, 16, 24, 32, 40, 52, 64, 76, 88, 120, 160 ...
* Note: new_allocated won't overflow because the largest possible value
* is PY_SSIZE_T_MAX * (9 / 8) + 6 which always fits in a size_t.
*/
new_allocated = ((size_t)newsize + (newsize >> 3) + 6) & ~(size_t)3;
/* Do not overallocate if the new size is closer to overalocated size
* than to the old size.
*/
'''
### eg: if the current size is 24
### ((size_t)newsize + (newsize >> 3) + 6) == increases the one eighth which is 15 so 24+3 => 24+6 = 30
### ~(size_t)3 = -4
### 30 & -4 = 32


### Note: The calculations vary based on the size of the object used in the list


Figure 1 : Memory allocation in list

Credits : https://www.laurentluce.com/images/blog/list/list_insert.png


Empty List

When an empty list is created and how its address is pointed. It will always pointing to different address. And it will hold preallocated memory as well.

Source:

# When creating two empty list it will be point different address space
a = []
b = []
if a is not b :
print("A and B are not mapped to same address")
print("address of a: ", id(a))
print("address of b: ", id(b))

print("A size of list: ", sys.getsizeof(a))
print("B size of list: ", sys.getsizeof(b))

Output:

A and B are not mapped to same address

address of a: 140509666702080

address of b: 140509666510912

A size of list: 56

B size of list: 56


Removal and Insertion

When we perform removal, the allocated memory will shrink without changing the address of the variable. While performing insert, the allocated memory will expand and the address might get changed as well.

Source:

# pop an element from the between of the array
print("address before change: ", id(list1))
print("Remove element from list", list1.pop(2))
print("Size of list after removal: ", sys.getsizeof(list1))
print("Remove element from list", list1.pop(2))
print("Size of list after removal: ", sys.getsizeof(list1))
print("Remove element from list", list1.pop(2))
print("Size of list after removal: ", sys.getsizeof(list1))
print("Remove element from list", list1.pop(2))
print("Size of list after removal: ", sys.getsizeof(list1))
print("Remove element from list", list1.pop(2))
print("Size of list after removal: ", sys.getsizeof(list1))
print(list1)
print("address after change: ", id(list1))

Output:

address before change: 140509666477824

Remove element from list three

Size of list after removal: 192

Remove element from list 4

Size of list after removal: 192

Remove element from list 5

Size of list after removal: 192

Remove element from list 6

Size of list after removal: 192

Remove element from list 7

Size of list after removal: 136

[100, 2, 8, 9, 10, 11, 12]

address after change: 140509666477824

Source:

# pop an element from the between of the array
print("address before change: ", id(list1))
list1.insert(3, 30)
print(list1)
print("address after change: ", id(list1))

Output:

address before change: 140509666477824

[100, 2, 8, 30, 9, 10, 11, 12]

address after change: 140509666477824


Sortable

While performing the sort operation, the address of the list doesn't get changed before and after sort operation.

Source:

# sort the array
print("address before change: ", id(list1))
list1.sort()
print(list1)
print("address after change: ", id(list1))

Output:

# Note : even after sort the address remains the same

address before change: 140509666477824

[2, 8, 9, 10, 11, 12, 30, 100]

address after change: 140509666477824


List sort vs sorted function

.sort() and sorted() can provide exactly the sort order you need if you use them properly with both the reverse and key optional keyword arguments. Both have very different characteristics when it comes to output and in-place modifications.

Syntax: .sort() and sorted()

Example: .sort()

>>> values_to_sort = [5, 2, 6, 1]
>>> # Sort the list and assign to new variable
>>> sorted_values = values_to_sort.sort()
>>> print(sorted_values) # will not return anything rather perform inplace sort
None
>>> print(values_to_sort) # original list will be sorted
[1, 2, 5, 6]

Example: sorted()

>>> names_with_case = ['harry', 'Suzy', 'al', 'Mark']
>>> sorted_values = sorted(names_with_case, reverse=True) # order of sort mentioned
>>> print(sorted_values)
['harry', 'al', 'Suzy', 'Mark']
>>> names_with_case # original list will be unaffected
['harry', 'Suzy', 'al', 'Mark']


User List

User list is similar to user string, instead of string uses list class implementation to edit lists features. It acts as a wrapper class around the List objects.

Syntaxcollections.UserList([list])

Example:

from collections import UserList
L = [1, 2, 3, 4]
# Creating a userlist
userL = UserList(L)
print(userL.data) # [1, 2, 3, 4]
# Creating empty userlist
userL = UserList()
print(userL.data) # []

# Creating a List where
# deletion is not allowed
class MyList(UserList):

# Function to stop deletion
# from List
def remove(self, s = None):
raise RuntimeError("Deletion not allowed")
# Function to stop pop from
# List
def pop(self, s = None):
raise RuntimeError("Deletion not allowed")


# Driver's code
L = MyList([1, 2, 3, 4])
# Inserting to List"
L.append(5)
print("After Insertion")
print(L) # [1, 2, 3, 4, 5]
# Deleting From List
L.remove() # Exception will be raised here: Traceback ... RuntimeError: Deletion not allowed


List comprehension

List comprehension offers a shorter syntax when you want to create a new list based on the values of an existing list. List comprehensions are faster than loops, but avoid to use when nested for readability

Syntaxnewlist = [expression for item in iterable if condition == True]

Example:

# get the list of fruites which contains 'a' and change the case to upper
fruits = ["apple", "banana", "cherry", "kiwi", "mango"]
newlist = [x.upper() for x in fruits if "a" in x]
print(newlist) # ['APPLE', 'BANANA', 'MANGO']


Performance

  • Append – O(1)

    • a = [1, 2, 3]

    • b = [4, 5, 6]

    • a.append(b)

      • a = [1, 2, 3, [4, 5, 6]]

  • Extend – O(k)

    • a = [1, 2, 3]

    • b = [4, 5, 6]

    • a.extend(b)

      • a = [1, 2, 3, 4, 5, 6]

  • Pop last element - O(1)

  • Pop anywhere else - O(n) worst case

  • Insert - O(n) worst case

    • will insert the element in the passed index

  • Index - O(1)

  • Slice - O(k)

  • in Operator – O(n)

    • find the element in the array

  • Sort - O(nlogn) - tim sort is used

n = number of elements

k = k'th index

1 = order of 1


Tuples

Let's observe how the tuples are defined, and how it differs in allocation of memory compared to list.

  1. Immutable in nature - Once defined the values cannot be changed.

  2. Tuple consumes less memory as compared to the list

Syntaxtuple(<values>) or (<values>)


Example:

mytuple = ("apple", "banana", "cherry", 1)


Definition

Quick example to define a tuple.

Source:

# example of tuple
tuple1 = ('one', 'two', 'three')
print(type(tuple1))
print(tuple1)

Output:

<class 'tuple'>

('one', 'two', 'three')


Slicing

Accessing tuple is as same as list.

Source:

# access elements using index ; first 2 indexes
print(tuple1[0:2])

# access the last index of the tuple
print(tuple1[-1])

Output:

('one', 'two')

three


Immutable

Tuples are immutable in nature, saying we cannot change the values of the created tuple.

Changing the single value

Find the error while trying to change the value of the tuple

Source:

# tuple are immutable
tuple1[0] = "cherry"
print(type(tuple1))
print(tuple1)

Output:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-a780f574006b> in <module>
      1 # tuple are immutable
----> 2 tuple1[0] = "cherry"
      3 print(type(tuple1))
      4 print(tuple1)

TypeError: 'tuple' object does not support item assignment

Replacing tuple to new tuple

We can override the existing tuple to new tuple, the address will also be overridden.

Source:

# but we can assign the tuple1 variable to any other value also another tuple
print("address before the change: ", id(tuple1))
tuple1 = (1, 2, 3, 4)
print(type(tuple1))
print(tuple1)
print("address after the change: ", id(tuple1))

Output:

address before the change: 140509666846336

<class 'tuple'>

(1, 2, 3, 4)

address after the change: 140509667383792

Changing the list inside tuple

We know that the tuple can as well hold any values. Now we have tried to save list inside tuple. And try editing its value. Can we edit ? Will it change the list ??

Lets see the answer below:

Source:

t = (1, 2, [10, 20, 30])
print(id(t), t)
t[2] += [40, 50]

Output:

140509666702784 (1, 2, [10, 20, 30])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-cc42d1fec8cf> in <module>
      1 t = (1, 2, [10, 20, 30])
      2 print(id(t), t)
----> 3 t[2] += [40, 50]

TypeError: 'tuple' object does not support item assignment

It has clearly thrown error, so it should have not updated the values as well.

Source:
print(id(t), t)
Output: 140509666702784 (1, 2, [10, 20, 30, 40, 50])

But if you see the values would be appended. This is an edge case where python behaves strangely. So, putting mutable items in tuples is not a good idea.


Memory Allocation

Check the memory allocated, it uses only required memory. They are not over-allocated as they are not resizable.

Source:
print(sys.getsizeof(tuple1))
Output: 72


Reuse Memory

To reduce memory fragmentation and speed up allocations, Python reuses old tuples.

If a tuple no longer needed and has less than 20 items.

Instead of deleting it permanently Python moves it to a free list and later uses it.

Note: Though its always not certain, the same memory will be used.

Source:

a = (1,2)
print(id(a))
del a
b = (1,2)
print(id(b))
Output:

140509665739520

140509665739520


Empty Tuple

When creating two empty tuple it will be point same address space.

Empty tuple acts as a singleton, that is, there is always only one tuple with a length of zero.

When creating an empty tuple Python points to already preallocated one.

In such way that any empty tuple has the same address in the memory.

This is possible because tuples are immutable and sometimes saves a lot of memory.

Source:

a = ()
b = ()
if a is b :
print("A and B are not mapped to same address")
print("address of a: ", id(a))
print("address of b: ", id(b))

print("A size of list: ", sys.getsizeof(a))
print("B size of list: ", sys.getsizeof(b))
Output:

A and B are not mapped to same address

address of a: 140509600395328

address of b: 140509600395328

A size of list: 40

B size of list: 40


Removal and Insertion

We cannot update the existing tuple but we can create new tuple with it, it will be copied into new address.

Source:

tuple1 = ('one', 'two', 'three')
print(id(tuple1), tuple1)
print()

tuple2 = ('1', '2', '3')
print(id(tuple2), tuple2)
print()

tuple3 = tuple1 + tuple2
print(id(tuple3), tuple3)
print()
Output:

140509667635968 ('one', 'two', 'three')

140509666902208 ('1', '2', '3')

140509666693664 ('one', 'two', 'three', '1', '2', '3')


Sort

Yes as said tuples are immutable, we cannot implicitly sort tuple.

But we can make use of sorted function to sort tuple.

Note: It will return as list.

Source:

sorted_tuple1 = sorted(tuple1)
print(id(sorted_tuple1), type(sorted_tuple1), sorted_tuple1)
Output:

140509667589312 <class 'list'> ['one', 'three', 'two']


Named Tuple

The namedtuple and normal type uses exactly same amount of memory. Because the field names are stored in the class

So we can either use tuple or named tuple, it will be the same. And named tuple will increase the readability of the program as well. Bonus!!

Named Tuples are not changeable.


Syntaxnamedtuple(<name>, <values>)


Example:

from collections import namedtuple
# Declaring namedtuple()
Student = namedtuple('Student', ['name', 'age', 'DOB'])
# Adding values
S = Student('Nandini', '19', '2541997')
# Access using index
print("The Student age using index is : ", end="")
print(S[1])
# Access using name
print("The Student name using keyname is : ", end="")
print(S.name)


Memory Allocation Source:

from collections import namedtuple
City = namedtuple("City", "name country status")
chennai = City("Chennai", "India", "Red")
print(chennai)
print("Size of named tuple: ", sys.getsizeof(chennai))

normaltuple = ("Chennai", "India", "Red")
print(normaltuple)
print("Size of normaltuple tuple: ", sys.getsizeof(normaltuple))
Output:

City(name='Chennai', country='India', status='Red')

Size of named tuple: 64

('Chennai', 'India', 'Red')

Size of normaltuple tuple: 64


Performance

  • Index - O(1)

  • Slice - O(k)

  • in Operator - O(n)

n = number of elements

k = k'th index

1 = order of 1

Note: As Tuples are immutable it has only limited features


Conclusion

Use list when:

  • When the collection needs to be changed constantly.

Use tuples when:

  • Working with arguments.

  • Returning 2 or more items from a function.

  • Iterating over dictionary's key-value pairs.

  • Using string formatting.

  • Lists are more complex to implement, preferring tuple saves memory and time (list uses 3000+ lines of code while tuple needs only 1000+ lines of C code)


Reference

https://www.laurentluce.com/posts/python-list-implementation/
https://github.com/python/cpython/blob/master/Objects/listobject.c
Fluent Python, 2nd Edition [Book] from O'Rielly
https://cdn.educba.com/academy/wp-content/uploads/2019/06/Python-Tuple-vs-List.jpg

Original Blog Posted in OSFY
https://www.opensourceforu.com/2021/05/memory-management-in-lists-and-tuples/
For further research and updates maintaining the blog here.


No comments:

Post a Comment

Scarcity Brings Efficiency: Python RAM Optimization

  In today’s world, with the abundance of RAM available, we rarely think about optimizing our code. But sooner or later, we hit the limits a...