Vectorization of calculations for code optimization in the Python programming language
Keywords:
vectorization, Python optimization, data processing, performance improvement, code readability, software development, missing data handling
Abstract
Purpose. This study explores vectorization as an engineering technique for improving the performance and readability of Python code, particularly in data processing tasks. We demonstrate its benefits through practical examples involving the handling of missing data.
Design / Method / Approach. We performed a comparative analysis of loop-based and vectorized implementations. Specifically, we developed two versions of a function that identifies the columns of a dataset that contain missing values, tested both on two real-world datasets, and compared their execution time and code readability.
Findings. Vectorization yielded substantial performance improvements, reducing execution time by a factor of several hundred compared with the traditional loop-based implementation. The vectorized code was also more compact, which makes it easier to read and maintain.
Theoretical Implications. Vectorization provides a higher level of abstraction for operations on data structures. It lets developers focus on algorithmic logic rather than on managing iterative control structures, and it contributes to the broader discussion of computational efficiency in Python.
Practical Implications. For data engineers and analysts, vectorization is a highly effective way to optimize Python code. It significantly accelerates data-intensive tasks such as missing data imputation, data analysis, and machine learning, which makes it an essential tool for productivity in data-driven environments.
Originality / Value. This study presents a practical approach to optimizing Python code through vectorization. It is valuable for professionals seeking to make their workflows more efficient.
Research Limitations / Future Research. This research is limited by its focus on a single problem: missing data imputation. Future research should extend the scope to other computational areas, such as image processing and simulation modeling, or examine vectorization combined with Just-In-Time (JIT) compilation, using tools like Numba, to further boost Python's performance.
Paper Type. Practitioner Paper.
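The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the early `break` in the loop version, and the toy DataFrame are assumptions made for illustration, and pandas is assumed as the data library. Both functions return the names of the columns that contain at least one missing value.

```python
import numpy as np
import pandas as pd


def columns_with_missing_loop(df: pd.DataFrame) -> list:
    """Loop-based version: inspect the cells of each column one at a time."""
    result = []
    for col in df.columns:
        for value in df[col]:
            if pd.isna(value):
                result.append(col)
                break  # one missing value is enough to flag this column
    return result


def columns_with_missing_vectorized(df: pd.DataFrame) -> list:
    """Vectorized version: build one boolean mask, then reduce it per column."""
    return df.columns[df.isna().any()].tolist()


# Hypothetical toy data, not one of the datasets used in the study.
df = pd.DataFrame({
    "age": [63, 37, 41, 56],
    "chol": [233.0, np.nan, 204.0, 236.0],
    "thal": ["fixed", "normal", None, "normal"],
})

loop_result = columns_with_missing_loop(df)
vectorized_result = columns_with_missing_vectorized(df)
print(loop_result, vectorized_result)
```

In the vectorized version the iteration happens inside pandas rather than in the Python interpreter, which is where a speed-up on large tables would be expected to come from. To measure it on real data, each function can be wrapped in `timeit.timeit` from the standard library.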
References
Turner-Trauring, I. (2023, January). How vectorization speeds up your Python code. Hyphenated Enterprises LLC. https://pythonspeed.com/articles/vectorization-python/
Zemlianyi, O., & Baibuz, O. (2024). Methods for imputing missing values in ischemic heart disease data [in Ukrainian]. System Technologies, 2(151), 33–49. https://doi.org/10.34185/1562-9945-2-151-2024-04
Janosi, A., Steinbrunn, W., Pfisterer, M., & Detrano, R. (1988). Heart Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X
NHLBI. (2024). Framingham Heart Study-Cohort (FHS-Cohort). National Heart, Lung, and Blood Institute. https://biolincc.nhlbi.nih.gov/studies/framcohort/
License
Copyright (c) 2024 Oleksii Zemlianyi, Oleh Baibuz (Author)
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in the journal Challenges and Issues of Modern Science are licensed under the Creative Commons Attribution 4.0 International (CC BY) license. This means that you are free to:
- Share, copy, and redistribute the article in any medium or format
- Adapt, remix, transform, and build upon the article
as long as you provide appropriate credit to the original work, include the authors' names, article title, journal name, and indicate that the work is licensed under CC BY. Any use of the material should not imply endorsement by the authors or the journal.