Glimpse - A PHP Sandbox for the Deobfuscation and Dissection of Web Malcode and Remote Access Trojans

by Peter Wrench


Research Goals

This project had two primary objectives. The first was the development of a sandbox-based component capable of safely executing and dissecting potentially malicious PHP code. The sandbox was designed to mimic a vulnerable host and allow the code to run as it usually would, but without negatively affecting the machine on which it runs. The purpose of creating such a system was to analyse the behaviour of web shell scripts and identify any potentially malicious actions that they might undertake. As such, it forms the dynamic component of the full shell analysis system - information about the shell's functioning is extracted at runtime. The sandbox component logs calls to functions that could be exploited by an attacker and makes the user aware of such calls by reporting where in the code they were made.
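The call-logging idea can be sketched in a few lines. What follows is only a Python analogy, not Glimpse's PHP implementation: it assumes a hypothetical watchlist of function names (borrowed from PHP functions that web shells commonly abuse) and binds each name to a stub that records the call and the line it came from instead of performing it. The names WATCHLIST, make_stub and run_logged are invented for illustration, and emptying __builtins__ does not turn exec() into a real security boundary.

```python
import inspect

# Hypothetical watchlist; the names mirror PHP functions often abused by web shells.
WATCHLIST = ("system", "shell_exec", "mail", "fopen")


def make_stub(name, log):
    """Return a harmless stand-in that records the call instead of performing it."""
    def stub(*args):
        # The caller's frame gives the line in the analysed script that made the call.
        line = inspect.currentframe().f_back.f_lineno
        log.append({"function": name, "line": line, "args": args})
        return None
    return stub


def run_logged(script):
    """Run `script` with every watchlisted name bound to a logging stub."""
    log = []
    namespace = {name: make_stub(name, log) for name in WATCHLIST}
    # Hide the real builtins from the script. This is an illustration only,
    # NOT a security boundary: never run genuinely hostile code this way.
    namespace["__builtins__"] = {}
    exec(compile(script, "<sample>", "exec"), namespace)
    return log


if __name__ == "__main__":
    sample = 'cmd = "id"\nsystem(cmd)\nmail("a@example.com", "hi")\n'
    for entry in run_logged(sample):
        print(entry["function"], "called on line", entry["line"], "with", entry["args"])
```

For the sample script above, the log holds two entries: system on line 2 and mail on line 3. Neither call has any effect on the host, because only the stubs ever run.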

The second major goal was the development of an auxiliary component for performing normalisation and deobfuscation of input code prior to execution in the sandbox environment. Code normalisation is the process of altering the format of a script to promote readability and understanding, while deobfuscation is the process of revealing code that has been deliberately disguised. The decoder component was designed to analyse code for syntactic structures and functions that are typically associated with code obfuscation (such as eval() and preg_replace()) and replace these with the code they were intended to disguise. In addition to this functionality, the decoder is capable of extracting useful information in the form of variable names, URLs and email addresses from PHP scripts.
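As a rough illustration of the decoding step, the sketch below peels one common obfuscation idiom, eval(base64_decode('...')) with an optional gzinflate() layer, and then pulls variable names, URLs and email addresses out of the result. It is a minimal sketch under stated assumptions rather than the decoder described above: the regular expressions and the names WRAPPER, unwrap and extract are hypothetical, preg_replace()-based obfuscation is not handled, and the payload must be a literal string.

```python
import base64
import binascii
import re
import zlib

# Matches eval(base64_decode('...')); and eval(gzinflate(base64_decode('...')));
WRAPPER = re.compile(
    r"eval\s*\(\s*(?P<inflate>gzinflate\s*\(\s*)?"
    r"base64_decode\s*\(\s*(?P<q>['\"])(?P<data>[A-Za-z0-9+/=]+)(?P=q)\s*\)"
    r"(?(inflate)\s*\))\s*\)\s*;?",
    re.IGNORECASE,
)

# Deliberately simple patterns; real-world extraction needs more care.
URL = re.compile(r"https?://[^\s'\"<>)]+")
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
VARIABLE = re.compile(r"\$[A-Za-z_]\w*")


def unwrap(code, max_layers=20):
    """Repeatedly replace decodable eval() wrappers with the code they hide."""
    def decode(match):
        try:
            payload = base64.b64decode(match.group("data"))
            if match.group("inflate"):
                # PHP's gzinflate() reads raw DEFLATE data, hence wbits=-15.
                payload = zlib.decompress(payload, -15)
        except (binascii.Error, zlib.error):
            return match.group(0)  # leave undecodable wrappers untouched
        return payload.decode("utf-8", errors="replace")

    for _ in range(max_layers):
        decoded = WRAPPER.sub(decode, code)
        if decoded == code:  # nothing left to peel
            break
        code = decoded
    return code


def extract(code):
    """Collect variable names, URLs and email addresses found in the code."""
    return {
        "variables": sorted(set(VARIABLE.findall(code))),
        "urls": sorted(set(URL.findall(code))),
        "emails": sorted(set(EMAIL.findall(code))),
    }


if __name__ == "__main__":
    hidden = "$to = 'admin@example.com'; mail($to, 'x', 'http://example.com/c.txt');"
    sample = "<?php eval(base64_decode('%s')); ?>" % base64.b64encode(hidden.encode()).decode()
    clear = unwrap(sample)
    print(clear)
    print(extract(clear))
```

The loop matters because obfuscated shells often nest several encoding layers. The max_layers cap is an arbitrary safeguard so that a malformed input cannot keep the loop running.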

The project was intended as a proof of concept rather than as a system ready for deployment in a production environment. As such, the focus was on showing that a dynamic, sandbox-based approach to malware analysis is a viable (and even desirable) option, especially when combined with information gained from traditional static analysis techniques.