Improving GPT-2 Throughput for Lossless Text Compression


The Ohio State University


Compression helps manage the enormous volume of data (hundreds of millions of terabytes) generated daily. Better models of the data allow redundant data to be compressed at higher rates. Given large language models' (LLMs') impressive performance at modeling text, they seem well suited to lossless text compression. We implement a lossless text compressor that pairs GPT-2 with arithmetic coding. A naive GPT-2-based compressor is slow: it compresses 200-500 bytes per second on a GPU, compared to nearly two million bytes per second for 7-Zip, a general-purpose compressor. Although such a compressor's outputs are about 30% smaller than 7-Zip's, the poor compression speed limits its practicality. We therefore investigate how various LLM optimizations affect our compressor's speed and compression performance. We achieve a 1.5x speedup without significant degradation in compression, and over a 2x speedup if we accept some loss of compression performance. Additionally, we find that increasing model size improves compression more than increasing context size, and that distilled models compress better than pruned models.
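To illustrate the core idea of pairing a predictive model with arithmetic coding, here is a minimal sketch using exact rational arithmetic. The `model` function is a hypothetical toy stand-in for the predictor; in the actual system the per-symbol distribution would come from GPT-2's softmax over its token vocabulary, and a production coder would use fixed-precision integer intervals with bit renormalization rather than `Fraction`.

```python
from fractions import Fraction

def encode(symbols, probs_fn):
    """Arithmetic-encode a symbol sequence; probs_fn(context) returns
    a list of (symbol, probability) pairs summing to 1."""
    low, high = Fraction(0), Fraction(1)
    for i, s in enumerate(symbols):
        width = high - low
        cum = Fraction(0)
        for sym, p in probs_fn(symbols[:i]):
            if sym == s:
                # Narrow the interval to this symbol's sub-interval.
                high = low + width * (cum + p)
                low = low + width * cum
                break
            cum += p
    return (low + high) / 2  # any value in [low, high) identifies the message

def decode(value, n, probs_fn):
    """Recover n symbols by replaying the same model and locating
    which sub-interval contains the encoded value."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(n):
        width = high - low
        cum = Fraction(0)
        for sym, p in probs_fn(out):
            lo2 = low + width * cum
            hi2 = lo2 + width * p
            if lo2 <= value < hi2:
                out.append(sym)
                low, high = lo2, hi2
                break
            cum += p
    return out

def model(ctx):
    # Hypothetical toy predictor standing in for GPT-2's next-token
    # distribution: prefers alternating symbols.
    if ctx and ctx[-1] == "a":
        return [("a", Fraction(1, 4)), ("b", Fraction(3, 4))]
    return [("a", Fraction(3, 4)), ("b", Fraction(1, 4))]

msg = list("aababba")
code = encode(msg, model)
assert decode(code, len(msg), model) == msg
```

The key property this sketch demonstrates is that the decoder needs no side information beyond the symbol count: because it runs the identical model on the identical context at every step, it reconstructs the same intervals the encoder used. This is also why the model must be deterministic, and why every encoding step costs a full model forward pass, which is the source of the 200-500 bytes-per-second bottleneck discussed above.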



large language models, lossless text compression, transformer acceleration