Vectorization of Computations for Optimizing Code in the Python Programming Language

Oleksii Zemlianyi; Oleh Baibuz

Authors

Oleksii Zemlianyi, Oles Honchar Dnipro National University
https://orcid.org/0009-0001-6157-8725
Oleh Baibuz, Oles Honchar Dnipro National University
https://orcid.org/0000-0001-7489-6952

Keywords:

vectorization, Python optimization, data processing, performance improvement, code readability, software development, missing data handling

Abstract

Purpose. This study explores vectorization as an engineering technique for improving the performance and readability of Python code, particularly in data processing tasks. We demonstrate its benefits through practical examples involving the handling of missing data.

Design / Method / Approach. We performed a comparative analysis of loop-based and vectorized implementations. Specifically, we developed two versions of a function that identifies the columns of a dataset containing missing values, tested both on two real-world datasets, and compared their execution time and code readability.

Findings. Vectorization yielded substantial performance improvements, cutting execution time by a factor of several hundred compared with the traditional loop-based approach. The vectorized code was also more compact, which made it easier to read and maintain.

Theoretical Implications. Vectorization provides a higher level of abstraction for operations on data structures. It lets developers focus on algorithmic logic rather than on managing iterative control structures, and so contributes to the broader discussion of computational efficiency in Python.

Practical Implications. For data engineers and analysts, vectorization is a highly effective way to optimize Python code. It significantly accelerates data-intensive tasks such as missing data imputation, data analysis, and machine learning, which makes it an essential tool for productivity in data-driven environments.

Originality / Value. This study presents a practical approach to optimizing Python code through vectorization. It is valuable for professionals seeking to make their workflows more efficient.

Research Limitations / Future Research. The study is limited by its focus on a single problem: missing data imputation. Future research should extend the scope to other computational areas, such as image processing and simulation modeling, or examine vectorization combined with Just-In-Time (JIT) compilation, using tools like Numba, to further boost Python's performance.

Paper Type. Practitioner Paper.
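The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the use of pandas, and the small synthetic table are assumptions made here for illustration only. Both functions return the names of the columns that contain at least one missing value, one by explicit iteration and one by a whole-table operation.

```python
import numpy as np
import pandas as pd


def columns_with_missing_loop(df: pd.DataFrame) -> list:
    """Loop-based version (hypothetical name): inspect cells one at a time."""
    result = []
    for column in df.columns:
        for value in df[column]:
            if pd.isna(value):
                result.append(column)
                break  # one missing value is enough to flag this column
    return result


def columns_with_missing_vectorized(df: pd.DataFrame) -> list:
    """Vectorized version (hypothetical name): build one boolean mask for
    the whole table, then reduce it column by column."""
    mask = df.isna().any(axis=0)  # True for each column with a missing value
    return df.columns[mask.to_numpy()].tolist()


# Synthetic table invented for this sketch; it is not the paper's data.
df = pd.DataFrame({
    "age": [63, 37, np.nan, 56],
    "chol": [233, 250, 204, 236],
    "thal": [np.nan, 3.0, 3.0, np.nan],
})

loop_result = columns_with_missing_loop(df)
vectorized_result = columns_with_missing_vectorized(df)
print(loop_result, vectorized_result)
```

On a table this small the two versions run in comparable time; the speedup reported in the abstract refers to the authors' own implementations and datasets, and any measurement of this sketch (for example with `timeit`) would depend on the table size and the share of missing values.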

PURL: https://purl.org/cims/2403.017

References

Turner-Trauring, I. (2023, January). How vectorization speeds up your Python code. Hyphenated Enterprises LLC. https://pythonspeed.com/articles/vectorization-python/

Zemlianyi, O., & Baibuz, O. (2024). Methods for imputing missing values in coronary heart disease data [in Ukrainian]. System Technologies, 2(151), 33–49. https://doi.org/10.34185/1562-9945-2-151-2024-04

Janosi, A., Steinbrunn, W., Pfisterer, M., & Detrano, R. (1988). Heart Disease. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X

NHLBI. (2024). Framingham Heart Study-Cohort (FHS-Cohort). National Heart, Lung, and Blood Institute. https://biolincc.nhlbi.nih.gov/studies/framcohort/