
Understanding Python Multithreading and Multiprocessing by visualization

For the visualization method itself, please refer to this article; here I only present the results and my findings. The code is available on my GitHub.

Eric Tsai
5 min read · Mar 14, 2021


Why do we need to understand parallel processing?

I think the most direct reason is that it lets us make full use of the computational resources on our machines, which reduces the overall processing time. But not every task is suitable for parallel processing. Even when it is feasible, each way of parallel processing has its own right time to apply.

The principle of parallel processing

In Python, there are two ways to run operations in parallel: multithreading and multiprocessing. I drew a diagram to show roughly how each of them works.

The critical difference between multithreading and multiprocessing

Different ways of increasing computing power

Multiprocessing starts several processes, which the operating system can schedule on different CPU cores. Multithreading creates several threads inside a single process to get a similar effect, although in the standard CPython build the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time.

Creating a process is relatively time-consuming, whereas creating a thread is cheap.

Each process gets its own memory space, so data has to be copied or sent to it. Threads do not get their own; they all share the memory space of their process.
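The memory difference is easy to see in a few lines. The following is only a minimal sketch, not code from my experiments; the names `shared_items`, `append_item`, `run_in_thread` and `run_in_process` are made up for illustration.

```python
import multiprocessing as mp
import threading

# Module-level list: it lives in the memory of whichever process imports this module.
shared_items = []


def append_item(value):
    """Append to the module-level list of the process that runs this function."""
    shared_items.append(value)


def run_in_thread(value):
    """Run append_item in a thread of the current process and wait for it."""
    t = threading.Thread(target=append_item, args=(value,))
    t.start()
    t.join()


def run_in_process(value):
    """Run append_item in a separate child process and wait for it."""
    p = mp.Process(target=append_item, args=(value,))
    p.start()
    p.join()


if __name__ == "__main__":
    run_in_thread("from thread")
    print(shared_items)  # the thread's append is visible in the parent

    run_in_process("from process")
    print(shared_items)  # unchanged: the child appended to its own copy
```

The thread writes into the same list the main program reads, while the child process only changes its own copy. That is why data must be sent back explicitly (for example through a return value or a queue) when you use processes.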

Let me show you some cases.

I evaluate each case on two questions. First, can multithreading or multiprocessing reduce the computation time at all? Second, which of the two reduces the run time more? The answers should help you understand when to apply multithreading and when to apply multiprocessing.

API Calls (download task):

Both multithreading and multiprocessing save time here. Their performance is almost equal, but multithreading does not need to create extra processes and so uses less memory. Multithreading is therefore the better choice.
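A rough sketch of this kind of task with a thread pool might look like the following. It does not touch the network: `fake_api_call` is a hypothetical stand-in that just sleeps, the way a thread would wait on a real response, and `fetch_all` is an illustrative name as well.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(item_id, latency=0.05):
    """Stand-in for a network request: the thread only waits, as it would on a socket."""
    time.sleep(latency)
    return {"id": item_id, "status": "ok"}


def fetch_all(item_ids, max_workers=8):
    """Run the calls on a thread pool; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_api_call, item_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    results = fetch_all(range(16))
    print(f"{len(results)} calls in {time.perf_counter() - start:.2f}s")
```

Because each call spends its time waiting rather than computing, the waits overlap and the total time should be well below the sum of the individual latencies.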

IO Heavy (write a big string into a file)

Multithreading is better than multiprocessing here. The threads appear to run in parallel without waiting for one another, which is consistent with CPython releasing the GIL while a thread is blocked on I/O.
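As an illustration of this case, here is a small sketch that writes several large strings to files from a thread pool. The helper names (`write_big_string`, `write_many`), the file names and the sizes are assumptions chosen for the example, not the exact setup of my test.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_big_string(path, n_chars=1_000_000):
    """Write n_chars ASCII characters to path and return the file size in bytes."""
    with open(path, "w", encoding="ascii") as f:
        f.write("x" * n_chars)
    return os.path.getsize(path)


def write_many(directory, n_files=4, n_chars=1_000_000):
    """Write n_files files concurrently, one thread per file."""
    paths = [os.path.join(directory, f"out_{i}.txt") for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=n_files) as pool:
        return list(pool.map(lambda p: write_big_string(p, n_chars), paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_many(tmp))
```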

NumPy Functions

In this case, both multithreading and multiprocessing reduce the run time. With multiprocessing, we find that process creation becomes quite time-consuming when the NumPy array is enormous. So, judged by total run time, multithreading beats multiprocessing for both NumPy addition and dot product. When there are many tasks, though, I think multiprocessing might be better.

1. NumPy Addition

2. Dot Product

CPU Intensive

In CPU-intensive cases, letting several processors run simultaneously improves speed. Even though you need to create many processes, multiprocessing is still better than multithreading. Of course, this assumes that memory capacity is not a constraint in your computing environment.
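A CPU-bound job of this kind could be sketched as below. The workload (`count_primes`, a deliberately slow trial-division prime counter) and the wrapper `count_primes_parallel` are hypothetical examples, not the task I measured.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count primes below limit by trial division (deliberately CPU-bound)."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, max_workers=None):
    """Run one count_primes call per limit, spread over worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # Four independent jobs; each one can run on its own core.
    print(count_primes_parallel([20_000] * 4))
```

The function handed to the pool is defined at module level so that it can be pickled and sent to the worker processes, and the `if __name__ == "__main__":` guard keeps the workers from re-running the script body on platforms that start processes by spawning.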

Conclusions

Sometimes you can't easily tell which type your case is. In that situation, quickly run a small sample and see which form of parallel processing works better. I hope you can benefit from this article.
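One possible way to run such a sample is a small timing helper that executes the same function on both kinds of pool. This is a sketch under assumptions: `sample_task` is a placeholder workload and `time_executor` is a name I made up; swap in a small slice of your real job.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sample_task(n):
    """Placeholder workload; replace it with a small slice of your real job."""
    return sum(i * i for i in range(n))


def time_executor(executor_cls, fn, inputs, max_workers=4):
    """Return (elapsed_seconds, results) for running fn over inputs on the given pool."""
    start = time.perf_counter()
    with executor_cls(max_workers=max_workers) as pool:
        results = list(pool.map(fn, inputs))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    inputs = [200_000] * 8
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        elapsed, _ = time_executor(cls, sample_task, inputs)
        print(f"{cls.__name__}: {elapsed:.3f}s")
```

The timings depend on your machine and on how large the sample is, so treat a single run as a hint rather than a verdict.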

Finally, I want to recommend an article that uses good diagrams to show how multithreading and multiprocessing work. The complete code is on my GitHub. Thanks.
