Abstract
This article demonstrates how to crack Java executables by disassembling their bytecode. Disassembling Java bytecode means translating it back into a human-readable form: a listing of JVM instructions or, with a decompiler, approximate Java source code. Disassembling is an inherent issue in the software industry, causing revenue loss through software piracy. Security engineers counter it with techniques such as software watermarking and code obfuscation. A large portion of this paper is dedicated to tactics commonly considered reverse engineering. The methods presented here, however, are intended for professional software developers, and each technique is demonstrated on a custom-built application. We do not encourage any kind of malicious hacking by presenting this article; in fact, its contents help developers pinpoint vulnerabilities in their source code and learn various methods to shield their intellectual property from reversing. We shall walk through the process of disassembling to obtain sensitive information from compiled code and to crack a Java executable without the original source code.
Prerequisite
I presume that the reader has a thorough understanding of programming, debugging and compiling Java on platforms such as Linux and Windows, as well as working knowledge of JVM internals. In addition, the following tools are required for bytecode reverse engineering:
- JDK toolkit (javac, javap)
- Eclipse
- JVM
- JAD
Java Bytecode
Engineers usually construct software in a high-level language such as Java, which is comprehensible to them but cannot be executed by the machine directly. This textual form of a program, known as source code, must be converted into a form the computer can execute. Java source code is compiled into an intermediate language known as Java bytecode, which is not executed directly by the CPU but by a Java Virtual Machine (JVM). Compilation, in general, is the act of transforming a high-level language into a low-level language such as machine code or bytecode. We do not need to understand Java bytecode to write Java programs, but doing so can assist debugging and help improve performance and memory consumption.
The JVM is essentially a simple stack-based machine whose runtime data areas include the Java stacks, the heap, the pc registers, the method area and the native method stacks. An advantage of the virtual machine architecture is portability: any machine that implements the Java Virtual Machine specification can execute Java bytecode, hence the slogan "Write once, run anywhere". Java bytecode is not tied to the Java language, and many compilers and other tools produce it, such as the Eclipse compiler, NetBeans and the Jasmin bytecode assembler. Another advantage of the JVM is the runtime type safety of programs. The Java Virtual Machine specification defines the required behavior of a JVM but does not prescribe implementation details. Therefore, implementations can be designed in various ways for diverse platforms as long as they adhere to the specification.
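To make the stack-based model concrete, here is a tiny method together with the instructions javac typically emits for it, as javap -c would display them. The listing in the comments reflects what current javac versions produce for a static method; treat it as illustrative, since exact output can vary between compilers. The class name StackAdd is hypothetical.

```java
public class StackAdd {

    // For this static method, javac typically emits:
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b) onto the operand stack
    //   iadd      // pop both ints, push their sum
    //   ireturn   // pop the sum and return it to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

No registers hold a or b during the addition: every operand travels through the operand stack, which is what makes bytecode compact and easy to disassemble.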
Sample Cracked Application
The following Java console application, “LoginTest”, was developed to demonstrate Java bytecode disassembling. It authenticates users through a simple user name and password mechanism. Suppose we obtained this application from another source as an unregistered user; obviously, we do not possess its source code. As a result, we do not have the valid user name and password, which are provided only to registered users, so we cannot log in.
Even without the source code or the login credentials, we can still log in by disassembling the application's bytecode, which exposes sensitive information related to the user login.
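The article does not publish the source of LoginTest, so the following is a hypothetical reconstruction for readers who want to reproduce the walkthrough. It is only assumed to be consistent with what the disassembly exposes later: the member signatures reported by javap and the hard-coded strings ajay and test. The prompts, the "Login failed" message and the Console/Scanner input handling are my own assumptions.

```java
import java.io.Console;
import java.util.Arrays;
import java.util.Scanner;

// HYPOTHETICAL reconstruction of LoginTest; the original source is not available.
public class LoginTest {

    public static void main(String[] args) {
        Console console = System.console();
        String user;
        char[] password;
        if (console != null) {
            user = console.readLine("Username: ");
            password = console.readPassword("Password: ");
        } else {
            // Fallback when no console is attached (for example, inside an IDE).
            Scanner in = new Scanner(System.in);
            System.out.print("Username: ");
            user = in.nextLine();
            System.out.print("Password: ");
            password = in.nextLine().toCharArray();
        }
        System.out.println(verify(user, password) ? "Login success" : "Login failed");
        if (password != null) {
            Arrays.fill(password, ' '); // wipe the typed password from memory
        }
    }

    // The weakness under study: both credentials are compiled into the class
    // file as string constants, which javap -c reveals as ldc operands.
    static boolean verify(String user, char[] password) {
        return "ajay".equals(user) && Arrays.equals(password, "test".toCharArray());
    }
}
```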
Disassemble Bytecode
Disassembling is the reverse of compiling: it transforms a low-level language back into a higher-level, human-readable form, and it is practical for Java because bytecode has a standard, well-documented structure. A disassembler such as javap renders bytecode as readable instructions, while a decompiler such as JAD goes further and regenerates approximate Java source code. We typically run a disassembler on bytecode just as a compiler is run to produce bytecode from source code. Disassembling reveals the implementation logic in the absence of documentation and source code, which is why vendors often explicitly prohibit disassembling and reverse engineering in their license agreements. Here are some reasons to decompile:
- Fixing critical bugs in the software for which no source code exists.
- Troubleshooting software or a JAR that lacks proper documentation.
- Recovering the source code that was accidentally lost.
- Learning the implementation of a mechanism.
- Learning to protect your code from reverse engineering.
Disassembling Java bytecode is quite simple, far less complex than disassembling a native C/C++ binary. The first step is to compile the *.java source file with the javac utility, which produces a *.class file containing the bytecode. Next, using javap, a utility provided in the JDK, we can disassemble the bytecode from the corresponding *.class file. javap writes its output to the console; it can be redirected to a file (for example, one with a *.bc extension) for later study.
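The javap step can also be driven from Java. On JDK 9 and later, javap is exposed through the java.util.spi.ToolProvider service, so the hypothetical helper below runs it in-process and saves the listing to a .bc file, mirroring the redirection just described. It assumes a full JDK (the tool is absent from JRE-only runtimes); the class name JavapDump and the output file name javap-output.bc are illustrative.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.spi.ToolProvider;

public class JavapDump {

    // Runs javap in-process with the given arguments and returns its standard output.
    static String disassemble(String... javapArgs) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; run this on a full JDK"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        int exitCode = javap.run(outWriter, errWriter, javapArgs);
        outWriter.flush();
        errWriter.flush();
        if (exitCode != 0) {
            throw new IllegalStateException("javap exited with " + exitCode + ": " + err);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Defaults to a JDK class so the sketch runs even without LoginTest.class.
        String target = args.length > 0 ? args[0] : "java.lang.Object";
        String listing = disassemble("-c", target);
        Path file = Path.of("javap-output.bc");
        Files.writeString(file, listing);
        System.out.println("Wrote " + listing.length() + " characters to " + file);
    }
}
```

Run from the directory that holds LoginTest.class, passing LoginTest as the argument; elsewhere, javap's -cp option would be needed to locate the class.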
Simply opening a *.class file does not give us access to the implementation logic. If we open the generated class file in Notepad or any other text editor after compiling with javac, we find strange, largely incomprehensible data, because a class file is a binary format rather than text. The following figure displays the .class file data:
Since opening the class file in a text editor is not useful, we can turn to the WinHex hex editor, which shows the raw bytes in hexadecimal alongside any readable strings the application uses. Although WinHex can reveal sensitive information in a Java application, this approach is laborious: unless we know which hex bytes correspond to which instructions, we cannot extract much information.
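The bytes a hex editor such as WinHex displays begin with a fixed header: every class file starts with the magic number 0xCAFEBABE, followed by the minor and major version numbers as 2-byte big-endian values. The sketch below prints the first bytes in hex, the way a hex editor row would, and decodes that header. The class name ClassHeader and its helpers are illustrative; reading its own bytes through getResourceAsStream assumes the class was loaded from a directory or JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClassHeader {

    // Formats the first n bytes as uppercase hex pairs, like one row of a hex editor.
    static String hex(byte[] bytes, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < Math.min(n, bytes.length); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("%02X", bytes[i] & 0xFF));
        }
        return sb.toString();
    }

    // Reads the big-endian unsigned 2-byte value at the given offset.
    static int u2(byte[] b, int off) {
        return ((b[off] & 0xFF) << 8) | (b[off + 1] & 0xFF);
    }

    // Checks the magic number and decodes major.minor from the 8-byte header.
    static String describe(byte[] b) {
        if (b.length < 8) throw new IllegalArgumentException("too short to be a class file");
        long magic = ((long) u2(b, 0) << 16) | u2(b, 2);
        if (magic != 0xCAFEBABEL) throw new IllegalArgumentException("not a class file");
        return "class file version " + u2(b, 6) + "." + u2(b, 4);
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes;
        if (args.length > 0) {
            bytes = Files.readAllBytes(Path.of(args[0])); // e.g. LoginTest.class
        } else {
            try (InputStream in = ClassHeader.class.getResourceAsStream("ClassHeader.class")) {
                if (in == null) throw new IOException("pass a .class file path as an argument");
                bytes = in.readAllBytes();
            }
        }
        System.out.println(hex(bytes, 16));
        System.out.println(describe(bytes));
    }
}
```

For a class compiled for Java 8, the first row starts CA FE BA BE 00 00 00 34, and describe reports version 52.0.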
Reversing Bytecode
Disassembling the bytecode of a Java application is relatively easy compared with other binaries. The javap utility that ships with the JDK plays a significant role in disassembling Java bytecode and in revealing sensitive information. It accepts a class name (or the path of a *.class file) as an argument, as in the following:
Drive:\> javap LoginTest
Once this command is issued, javap prints the structure of the class. Note, however, that by default it displays only the signatures of the class members, not their implementation:
Compiled from "LoginTest.java"
public class LoginTest {
  public LoginTest();
  public static void main(java.lang.String[]);
  static boolean verify(java.lang.String, char[]);
}
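For comparison, a similar method listing can be produced with the reflection API. This is a different technique from javap: reflection loads the class into the running JVM (and Class.forName also runs its static initializers), whereas javap only reads the class file. The nested Sample class is a hypothetical stand-in mirroring the verify signature above; the class name MemberLister and the javap-like output format are my own choices, and only methods (not constructors) are listed.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class MemberLister {

    // Hypothetical stand-in with the same verify signature that javap reported.
    static class Sample {
        static boolean verify(String user, char[] password) {
            return false;
        }
    }

    // Builds javap-style signature lines for the methods a class declares.
    static List<String> signatures(Class<?> c) {
        List<String> result = new ArrayList<>();
        for (Method m : c.getDeclaredMethods()) {
            if (m.isSynthetic()) continue; // skip compiler-generated helpers
            String params = Arrays.stream(m.getParameterTypes())
                    .map(Class::getTypeName)
                    .collect(Collectors.joining(", "));
            String mods = Modifier.toString(m.getModifiers());
            result.add((mods.isEmpty() ? "" : mods + " ")
                    + m.getReturnType().getTypeName() + " " + m.getName() + "(" + params + ");");
        }
        result.sort(null); // declared-method order is unspecified, so sort for stable output
        return result;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> target = args.length > 0 ? Class.forName(args[0]) : Sample.class;
        signatures(target).forEach(System.out::println);
    }
}
```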
The -c switch makes javap also print the bytecode instructions of every method, exposing the complete implementation logic of the class, as in the following:
Drive:\> javap -c LoginTest
This command dumps the bytecode of the program in the form of opcode mnemonics. The meaning of each instruction used in this program is explained in a later section of this paper. I have highlighted the important section from which we can obtain critical information.
From line 62 of the listing, we can conclude that the login mechanism is implemented by a method called verify, which checks whether the user entered the correct username and password. If the credentials are correct, a "Login success" message is displayed; otherwise, the login fails:
However, this alone does not reveal the username and password. If we analyze the instructions of the verify method, we can determine that the username and password are hard-coded in the code itself, as highlighted in the colored box in the following:
We finally conclude that this program accepts ajay as the username and test as the password, the string constants loaded by the ldc instructions.
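The strings pushed by ldc live in the class file's constant pool, which is why hard-coded credentials are so easy to recover. The sketch below walks the constant pool and returns every CONSTANT_String entry, using the tag values and entry sizes defined in the JVM specification's class file format. It relies on DataInputStream.readUTF, which reads exactly the 2-byte length plus modified UTF-8 layout of a CONSTANT_Utf8 entry. The class name ConstantStrings and the default file name LoginTest.class are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ConstantStrings {

    // Returns the values of all CONSTANT_String entries (the string literals ldc pushes).
    static List<String> stringConstants(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) throw new IllegalArgumentException("not a class file");
            in.readUnsignedShort();                 // minor_version
            in.readUnsignedShort();                 // major_version
            int count = in.readUnsignedShort();     // constant_pool_count; valid indices are 1..count-1
            String[] utf8 = new String[count];
            List<Integer> stringRefs = new ArrayList<>();
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                 // CONSTANT_Utf8
                    case 8: stringRefs.add(in.readUnsignedShort()); break; // CONSTANT_String -> Utf8 index
                    case 7: case 16: case 19: case 20: in.skipBytes(2); break; // Class, MethodType, Module, Package
                    case 15: in.skipBytes(3); break;                       // MethodHandle
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.skipBytes(4); break;                            // 4-byte entries
                    case 5: case 6: in.skipBytes(8); i++; break;           // Long/Double take two slots
                    default: throw new IllegalArgumentException("unknown constant tag " + tag);
                }
            }
            List<String> result = new ArrayList<>();
            for (int ref : stringRefs) result.add(utf8[ref]); // resolve after the loop: refs may point forward
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = Files.readAllBytes(Path.of(args.length > 0 ? args[0] : "LoginTest.class"));
        stringConstants(bytes).forEach(System.out::println);
    }
}
```

Against a class like the hypothetical LoginTest, the output would include ajay and test among its other literals, without executing a single instruction.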
Now launch the application once again and enter the obtained credentials. Bingo! We have successfully subverted the login authentication mechanism without having the source code, as in the following:
Byte Code Instruction Specification
Like assembly language, Java's machine-level representation consists of bytecode opcodes that form the instructions the JVM executes on any platform. Each opcode is one byte long, so there can be at most 256 distinct opcodes; the JVM specification currently assigns about 200 of them, each with its own mnemonic. Java bytecode instructions fall into these major categories:
- Load and store
- Method invocation and return
- Control transfer
- Arithmetical operation
- Type conversion
- Object manipulation
- Operand stack management
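To tie these categories to real code, the methods below each exercise several of them. The comments name the key instructions javac typically emits for each line, as javap -c would display them; this is illustrative, and exact sequences may differ between compiler versions. The class and method names are hypothetical.

```java
public class OpcodeCategories {

    static int sum(int[] values) {
        int total = 0;           // load and store: iconst_0, istore_1
        for (int v : values) {   // control transfer: if_icmpge, goto
            total += v;          // arithmetic: iadd
        }
        return total;            // method return: ireturn
    }

    static long widen(int i) {
        return i;                // type conversion: i2l, then lreturn
    }

    static String greet(String name) {
        // object manipulation: new; operand stack management: dup;
        // method invocation: invokespecial (constructor), invokevirtual (append, toString)
        return new StringBuilder("Hello, ").append(name).toString();
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));
        System.out.println(widen(7));
        System.out.println(greet("ajay"));
    }
}
```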
We shall discuss only the opcode instructions used in the previous Java binary. The following table lists each opcode with its meaning and corresponding hex value:
[Table: Java opcodes used in LoginTest, with columns Java Opcode, Meaning and Hex Value]
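To connect mnemonics with their hex values, here is a minimal sketch of what a disassembler such as javap does at its core: look up each opcode byte, then consume its operand bytes. The opcode values come from the JVM specification; the subset, the class name MiniDisassembler and the output format (which only approximates javap's) are my own choices. It assumes the raw bytes of a method body have already been extracted from the Code attribute of a class file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class MiniDisassembler {

    // A small subset of the JVM opcode table: opcode value -> mnemonic.
    private static final Map<Integer, String> MNEMONICS = Map.ofEntries(
            Map.entry(0x03, "iconst_0"), Map.entry(0x04, "iconst_1"),
            Map.entry(0x12, "ldc"), Map.entry(0x1a, "iload_0"), Map.entry(0x1b, "iload_1"),
            Map.entry(0x2a, "aload_0"), Map.entry(0x2b, "aload_1"), Map.entry(0x4c, "astore_1"),
            Map.entry(0x59, "dup"), Map.entry(0x60, "iadd"),
            Map.entry(0x99, "ifeq"), Map.entry(0x9a, "ifne"), Map.entry(0xa7, "goto"),
            Map.entry(0xac, "ireturn"), Map.entry(0xb0, "areturn"), Map.entry(0xb1, "return"),
            Map.entry(0xb2, "getstatic"), Map.entry(0xb6, "invokevirtual"),
            Map.entry(0xb7, "invokespecial"), Map.entry(0xb8, "invokestatic"),
            Map.entry(0xbb, "new"));

    // Opcodes followed by a signed 2-byte branch offset.
    private static final Set<Integer> BRANCHES = Set.of(0x99, 0x9a, 0xa7);
    // Opcodes followed by an unsigned 2-byte constant-pool index.
    private static final Set<Integer> CP_INDEX_2 = Set.of(0xb2, 0xb6, 0xb7, 0xb8, 0xbb);

    // Decodes raw method bytecode into javap-like lines such as "0: iload_0".
    static List<String> disassemble(byte[] code) {
        List<String> lines = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            String name = MNEMONICS.get(opcode);
            if (name == null) {
                throw new IllegalArgumentException(
                        String.format("opcode 0x%02x at %d is not in this subset", opcode, pc));
            }
            int operands = opcode == 0x12 ? 1
                    : (BRANCHES.contains(opcode) || CP_INDEX_2.contains(opcode)) ? 2 : 0;
            if (pc + operands >= code.length) {
                throw new IllegalArgumentException("truncated operands at " + pc);
            }
            String line = pc + ": " + name;
            if (opcode == 0x12) {
                line += " #" + (code[pc + 1] & 0xFF);            // ldc: 1-byte constant-pool index
            } else if (operands == 2) {
                int raw = ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
                line += BRANCHES.contains(opcode)
                        ? " " + (pc + (short) raw)               // branch target = pc + signed offset
                        : " #" + raw;                            // 2-byte constant-pool index
            }
            lines.add(line);
            pc += 1 + operands;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Body of: static int add(int a, int b) { return a + b; }
        byte[] code = {0x1a, 0x1b, 0x60, (byte) 0xac};
        disassemble(code).forEach(System.out::println);
    }
}
```

A real disassembler would cover the full opcode table, including variable-length instructions such as tableswitch and wide, and would resolve constant-pool indices to names the way javap does.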
In-Brief
This paper explained how to disassemble Java bytecode to reveal sensitive information when the source code of a Java binary is unavailable, and how to perform such reverse engineering with JDK utilities. It also covered the importance of bytecode disassembling and JVM internals in the context of reverse engineering, and explained the meaning of the essential opcodes in detail. Finally, we subverted the login authentication of a live Java console application by applying disassembling tactics. In a forthcoming paper, we shall explain how to patch Java bytecode in the context of reverse engineering.