Analyzing the Performance of Block-Splitting in LLVM Fingerprinting


  • Bill Mahoney University of Nebraska at Omaha, Nebraska, USA
  • Philip Sigillito University of Nebraska at Omaha, Nebraska, USA
  • Jeff Smolinski University of Nebraska at Omaha, Nebraska, USA
  • Todd McDonald University of Nebraska at Omaha, Nebraska, USA
  • George Grispos University of Nebraska at Omaha, Nebraska, USA



security and privacy, application fingerprinting, steganography


This paper builds upon previous work reported at the 2021 ICCWS concerning Executable Steganography and software intellectual property protection via fingerprinting. Software fingerprinting hides a unique identifier in the binary program artifact so that proof of ownership can be established if the artifact turns up elsewhere. In our previous work, it was noted that “fingerprints are a special case of watermarks, with the difference being that each fingerprint is unique to each copy of a program”. This prior work emphasized making the fingerprint independent of the machine architecture; that is, performing the operations on an intermediate representation (IR). LLVM, a compiler “middleware” language that is converted into machine code in a later step, was used as the target IR. Both a static fingerprinting method, where the serial number or data is embedded and visible by inspection, and a dynamic method, where the code must be executed, were explored. The dynamic method incurs overhead only if the proof code is executed and has minimal impact otherwise. The static fingerprint, however, was accomplished by shuffling the order of basic blocks in the software in a manner that represents the serial number data, and this would affect both the execution speed and the program size. This paper reports on subsequent research to increase the quantity of data that can be encoded by rearranging the blocks in a program: splitting blocks into smaller fragments increases the number of blocks, which allows more potential orderings and therefore more data. The contributions of the current work are twofold. First, the experimental infrastructure has been refined so that the fingerprinting actions take place within the compiler itself rather than in an external LLVM parser. Second, code has been introduced to impose an upper bound on the size of a block and to split blocks that exceed this bound. We evaluate the overhead and performance of the block-splitting method and report that the increases attributable to it are negligible.
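
To make the capacity argument concrete, the following is a minimal sketch of how a serial number could be represented as an ordering of blocks, and of why splitting blocks raises the number of bits available. It is not the encoding used in this work, which the abstract does not specify: the sketch assumes a factorial-number-system (Lehmer code) mapping, and the names capacity_bits, encode, decode, and split_sizes are hypothetical. It also ignores constraints that a real compiler pass would face, such as keeping a function's entry block first.

```python
import math

def capacity_bits(num_blocks):
    """Whole bits encodable by ordering num_blocks blocks: floor(log2(n!))."""
    return math.factorial(num_blocks).bit_length() - 1

def encode(serial, num_blocks):
    """Map an integer in [0, n!) to an ordering of block indices (assumed Lehmer code)."""
    if not 0 <= serial < math.factorial(num_blocks):
        raise ValueError("serial does not fit in the available orderings")
    remaining = list(range(num_blocks))
    order = []
    for i in range(num_blocks, 0, -1):
        # Each factorial-base digit selects one of the blocks not yet placed.
        digit, serial = divmod(serial, math.factorial(i - 1))
        order.append(remaining.pop(digit))
    return order

def decode(order):
    """Recover the integer from an ordering produced by encode()."""
    remaining = sorted(order)
    serial = 0
    for i, block in enumerate(order):
        digit = remaining.index(block)
        remaining.pop(digit)
        serial += digit * math.factorial(len(order) - 1 - i)
    return serial

def split_sizes(block_len, max_len):
    """Fragment sizes after splitting one block so that no fragment exceeds max_len."""
    full, rest = divmod(block_len, max_len)
    return [max_len] * full + ([rest] if rest else [])

# Hypothetical function with three blocks of 10, 3, and 7 instructions.
blocks = [10, 3, 7]
fragments = [size for b in blocks for size in split_sizes(b, 4)]
bits_before = capacity_bits(len(blocks))     # 3 blocks, 3! = 6 orderings
bits_after = capacity_bits(len(fragments))   # 6 fragments, 6! = 720 orderings
```

In this toy case an upper bound of four instructions per block turns three blocks into six fragments, raising the capacity from 2 bits to 9 bits. The sketch does not model the extra branches that splitting introduces, which are the source of the overhead evaluated in this work.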