PyFET: Forensically Equivalent Transformation for Python Binary Decompilation

Ali Ahad, Chijung Jung, Ammar Askar, Doowon Kim, Taesoo Kim, and Yonghwi Kwon

Abstract
Decompilation is a crucial capability in forensic analysis, facilitating analysis of unknown binaries. The recent rise of Python malware has brought attention to Python decompilers that aim to obtain source code representation from a Python binary. However, Python decompilers fail to handle various binaries, limiting their capabilities in forensic analysis. This paper proposes a novel solution that transforms a decompilation error-inducing Python binary into a decompilable binary. Our key intuition is that we can resolve the decompilation errors by transforming error-inducing code blocks in the input binary into another form. The core of our approach is the concept of Forensically Equivalent Transformation (FET), which allows non-semantic preserving transformation in the context of forensic analysis. We carefully define the FETs to minimize their undesirable consequences while fixing various error-inducing instructions that are difficult to solve when preserving the exact semantics. We evaluate the prototype of our approach with 17,117 real-world Python malware samples causing decompilation errors in five popular decompilers. It successfully identifies and fixes 77,022 errors. Our approach also handles anti-analysis techniques, including opcode remapping, and helps migrate Python 3.9 binaries to 3.8 binaries.

Reference
Ali Ahad, Chijung Jung, Ammar Askar, Doowon Kim, Taesoo Kim, and Yonghwi Kwon. 2023. PyFET: Forensically equivalent transformation for Python binary decompilation. In Proceedings of the 2023 44th IEEE Symposium on Security and Privacy. S&P.

Downloads