In September 2022, Trellix published a report on a vulnerability in the tarfile module, which is part of the standard library of the Python programming language and is available to anyone. The vulnerability allows an arbitrary file to be written to an arbitrary folder on the hard drive, and in some cases it also allows malicious code execution. What makes this study noteworthy is that the problem in tarfile was discovered in August 2007, just over 15 years ago! But back then it wasn’t considered dangerous. Let’s find out why it wasn’t, and what problems Python developers and their users could face as a result.
Tarfile in detail
The tarfile module contains code for working with tar archives. This format is widely used in Unix-like operating systems and traces its history all the way back to 1979. Tar is a simple way to pack a large number of files and folders into a single file. Initially it was used for writing backups to magnetic tape. Nowadays, tar archives can also be compressed, although this is optional. The tarfile module is responsible for creating and unpacking such archives, and Python developers use it as a ready-made tool for such tasks.
The vulnerability in tarfile is quite simple. It was described exhaustively in the original bug report from August 2007. It’s not even a vulnerability as such; it’s just that tarfile recreates the exact folder structure contained in the archive when it’s unpacked. This includes cases when the file name in the archive is something like “../../../../../etc/passwd”. If you unpack such an archive as a system administrator, the passwd file is not written to the directory where the archive itself is located. Instead, by following the ../ elements in the path, the unpacker first climbs up to the root directory and then overwrites the passwd file in the /etc directory. In Linux, this means overwriting the file that holds the account data of all system users.
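The path arithmetic described above can be illustrated with a short sketch. It builds a tiny archive in memory and only computes where its single member would land; nothing is extracted or written to disk. The helper names (build_archive, resolved_targets) and the destination folder /home/user/downloads are assumptions made for this illustration, and posixpath is used so the result is the same on any operating system.

```python
import io
import posixpath
import tarfile


def build_archive(member_name, payload):
    """Create an in-memory tar archive containing a single file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def resolved_targets(archive_bytes, destination):
    """Return the path each member would resolve to, without extracting."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as archive:
        return [
            # Join the destination with the stored name, then collapse "..".
            posixpath.normpath(posixpath.join(destination, member.name))
            for member in archive.getmembers()
        ]


# Hypothetical destination folder; the member name is the one from the text.
archive_bytes = build_archive("../../../../../etc/passwd", b"demo")
targets = resolved_targets(archive_bytes, "/home/user/downloads")
print(targets)
```

The stored name contains more ../ steps than the destination has parent folders, so the resolved path ends up at /etc/passwd rather than anywhere under the download folder.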
The danger here is that the user of a program that employs the tarfile module cannot know how unpacking what looks like an ordinary archive will end. Nothing unusual may happen, or some files may appear in an unexpected place, or some user files might get overwritten. The author of the bug report points out that the same problem existed in the tar archiver itself, where it was fixed back in 2001, more than 20 years ago. But in tarfile the vulnerability was never closed.
A 15-year wait
Following a discussion of the potential bug in 2007, it was decided… to do nothing, for two reasons. First, such file processing is in full compliance with the POSIX standard (we can confirm that). Second, “there is no possibility of exploitation in practice.” A warning in the user guide advising against using tarfile to unpack files from untrusted sources was considered sufficient.
This assessment was proven false in 2022, when Trellix showed that exploitation in practice is more than possible. And not only for writing data wherever you like, but also for running arbitrary code. Recall that this is a library for programmers; that is, the possibility of an attack depends on the specific software in which the tarfile module is used. Trellix gave two examples.
The first is Universal Radio Hacker, a program for analyzing unknown wireless protocols. The program saves data in the form of projects, which consist of multiple tar-packed files. The researchers demonstrated how an attempt to open a specially prepared archive results in an executable file being written to the Windows autorun directory, so the next time the system is rebooted, this code is executed. The vulnerability is not tied to a single operating system either; it can be exploited on different platforms.
The second example shown in the video is slightly more complicated. The Spyder IDE stores data in tar archives. When importing this data, the researchers first repeated the experiment with planting a file in the system, but then did something cooler: they set up arbitrary code to run at the next startup of Spyder. The end result of this experiment was a request to execute arbitrary code, this time with system administrator privileges.
Unpredictable consequences
This story of the 15-year bug illustrates once again that you should never underestimate vulnerabilities that allow writing data anywhere – even if it’s done by the book and the exploitation paths aren’t obvious.
Tarfile is part of the standard Python library and can be found in almost any Linux-based system (among others). However, the danger lies in the use of a specific vulnerable function. Generally speaking, any project developed in Python that employs the tarfile module is potentially vulnerable. From the end user’s point of view, it’s a tricky situation: they may be running a potentially vulnerable program and not even know that it uses tar. Kaspersky experts recommend that you:
- limit the processing of files from untrusted sources;
- execute third-party programs with minimal privileges to minimize attack opportunities;
- audit software used on the most critical systems to identify those that use the vulnerable function.
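For Python code that is available as source, the audit step can be partly automated. The sketch below is one possible approach, not a complete scanner: it assumes the functions of interest are the tarfile extraction methods extract() and extractall(), and it flags any method call with those names. The helper name find_extraction_calls is hypothetical, and because the check is purely name-based it will also report unrelated objects that happen to have methods with the same names (zipfile archives, for example).

```python
import ast

# Assumption: these are the method names worth reviewing by hand.
SUSPECT_METHODS = {"extract", "extractall"}


def find_extraction_calls(source, filename="<string>"):
    """Return sorted (line number, method name) pairs for suspect calls."""
    findings = []
    tree = ast.parse(source, filename=filename)
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr in SUSPECT_METHODS
        ):
            findings.append((node.lineno, node.func.attr))
    return sorted(findings)


# A made-up snippet standing in for a file under review.
sample = '''
import tarfile
with tarfile.open("project.tar") as archive:
    archive.extractall("workdir")
'''
findings = find_extraction_calls(sample)
print(findings)
```

Each reported line still needs a manual look to decide whether the archive being unpacked can come from an untrusted source.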
For developers, this problem is a reason to audit their own code, find calls to the vulnerable function, and amend them accordingly.
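One way such an amendment might look is sketched below. This is an illustration under stated assumptions rather than an official fix: the names safe_extract, UnsafeArchiveError and make_archive are invented for this example, and the policy of refusing every symbolic and hard link is a deliberately strict choice. The idea is to resolve each member’s final path before unpacking and to refuse the whole archive if any member would land outside the destination folder.

```python
import io
import os
import tarfile
import tempfile


class UnsafeArchiveError(Exception):
    """Raised when a member would be written outside the destination."""


def safe_extract(archive, destination):
    """Extract only if every member stays inside the destination folder."""
    root = os.path.realpath(destination)
    for member in archive.getmembers():
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            raise UnsafeArchiveError(f"blocked path: {member.name}")
        if member.issym() or member.islnk():
            # Links can point outside the destination; this sketch refuses them.
            raise UnsafeArchiveError(f"blocked link: {member.name}")
    if hasattr(tarfile, "data_filter"):
        # Newer Python releases ship built-in extraction filters; use them too.
        archive.extractall(root, filter="data")
    else:
        archive.extractall(root)


def make_archive(member_name):
    """Build a one-file in-memory archive for the demonstration below."""
    payload = b"demo"
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


with tempfile.TemporaryDirectory() as workdir:
    # An ordinary archive is unpacked as usual.
    with tarfile.open(fileobj=make_archive("notes/readme.txt")) as archive:
        safe_extract(archive, workdir)
    benign_ok = os.path.isfile(os.path.join(workdir, "notes", "readme.txt"))

    # An archive with a ../ member is rejected before anything is written.
    try:
        with tarfile.open(fileobj=make_archive("../escape.txt")) as archive:
            safe_extract(archive, workdir)
        blocked = False
    except UnsafeArchiveError:
        blocked = True
```

Python 3.12 added a filter argument to the tarfile extraction methods, and filter="data" rejects members that would be written outside the destination. Where that argument is available it is likely the simpler option; the manual check above is mainly useful on interpreters that lack it.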