Performance testing of an exact algorithm for the Subset Sum Problem on different personal computers
Keywords:
subset sum problem, parallel algorithm, GPU, performanceAbstract
This paper investigates the performance of an exact algorithm for the NP-complete Subset Sum Problem on different personal computers. The Subset Sum Problem asks whether there exists a subset of a given set of integers that sums up to a specified target value. Finding exact solutions for large instances is computationally challenging due to the exponential growth of possible subsets. The proposed algorithm employs a parallel backtracking approach, consisting of a breadth-first search phase executed on the CPU and a depth-first search phase executed on the GPU. The breadth-first phase starts with an empty subset, iteratively adding elements while storing intermediate subsets and sums in a queue. Branch pruning rules discard unproductive subsets. When a depth limit is reached, the queue is transferred to the GPU memory, and multiple threads perform depth-first searches in parallel on different subtrees, using a stack for intermediate subsets and sums. The depth-first phase follows the same pruning logic. It terminates when all threads complete their work or a solution is found. The algorithm was implemented in C# using the ILGPU library, which provides high-level abstractions for GPU programming. Performance tests were conducted on three personal computers with different CPU and GPU configurations: Intel Core i5-8250U with Nvidia GeForce MX150, AMD Ryzen 7 5800H with Nvidia GeForce GTX 1650, and AMD Ryzen 5 7600 with Nvidia GeForce RTX 4070 Ti SUPER. Results demonstrate significant speedups achieved by GPU parallelization compared to sequential CPU execution. For an input set of 1000 elements, the fastest time of 567ms was obtained on the RTX 4070 Ti SUPER with 16384 threads, which is almost 12 times faster than the same computer's CPU execution and 43 times faster than the weakest tested system. Optimal thread counts vary for different input sizes and GPUs. The massive parallelism, specialized architecture, and high memory bandwidth of GPUs contribute to their efficiency for this highly parallelizable problem. The proposed algorithm can be used as a building block for solving other related problems.
Downloads
References
Skiena S. S. The Algorithm Design Manual. 2nd ed. London : Springer, 2008. 730 p.
Downloads
Published
Issue
Section
License
Copyright (c) 2024 Михайло Ленський, Ганна Михальчук (Автор)
This work is licensed under a Creative Commons Attribution 4.0 International License.
All articles published in the journal Challenges and Issues of Modern Science are licensed under the Creative Commons Attribution 4.0 International (CC BY) license. This means that you are free to:
- Share, copy, and redistribute the article in any medium or format
- Adapt, remix, transform, and build upon the article
as long as you provide appropriate credit to the original work, include the authors' names, article title, journal name, and indicate that the work is licensed under CC BY. Any use of the material should not imply endorsement by the authors or the journal.