IUST Reverse Engineering Research Laboratory
Welcome to the Iran University of Science and Technology (IUST) Reverse Engineering Research Laboratory. The laboratory is led by Dr. Saeed Parsa associate professor of computer engineering. This section briefly introduces the laboratory reserach areas and topics.
The main research areas of the lab are:
- Software testing
- Software engineering
- Reverse engineering
Ph.D. Assistant professor - Member of academic staff of university of Gilan
Thesis Title: Statistical Latent Fault Localization Considering Program Structure and Fault-proneness Analysis
Software debugging process is one of the most difficult, tedious, and time-consuming steps of software development. In this regard, several automated techniques have been developed to reduce the burden on the developer during debugging. Conducted research in recent years has shown that statistical techniques in many cases perform better than other techniques in terms of the amount of code the developer must examine to locate the fault. However, these techniques are still faced with major limitations. Statistical methods of automatic fault localization are biased by data collected from different executions of the program. This biasness could result in unstable statistical models which may vary dependent on the test data provided for trial executions of the program. These methods consider an equal fault-proneness likelihood for different portions of the code; whereas, the location of an entity inside the program and the fault-proneness of the programming structures could have strong impact on the suspiciousness assessment of the program elements. Moreover, statistical methods attempt to find the failure-correlated statements, while the fault localization is a causal problem and a statistical causal method is required. The overall goal of this thesis is to apply a statistical causal analysis combined with program analysis and consider the program structure and static fault-proneness likelihood of statements while locating the causes of failures. In this regard, in the first phase of the dissertation, two new methods, so-called FPA-FL and Inforence, are proposed which are able to efficiently locate program faults by taking into account the static structure of the program and the static fault-proneness likelihoods of statements. Both methods use program static structure as a roadmap in order to avoid building a blind and inaccurate model which solely relies on dynamic runtime data. FPA-FL incorporates the static fault-proneness analysis into statistical fault localization and is capable of locating multiple-bugs effectively. Inforence also employs a statistical causal analysis and attempts to estimate the failure-causing effect of statements, by taking into account the program structure in the form of a causal model. In the second phase of the dissertation, we investigate the methods of reducing the dependence of statistical fault localization on data obtained from test executions. In this regard, first, a probabilistic method based on program slicing is proposed to identify and handle the coincidentally correct test cases, in both single and multiple-bug settings. Since it is impossible to accurately recognize the coincidentally correct tests, a new method based on cooperative game theory is presented that is able to effectively diminish the negative impact of coincidentally correct tests on fault localization effectiveness and can pinpoint the failure causes in existence of these tests. Finally, we have also proposed a novel statistical technique for automatic test case generation, Bayes-TDG, to assist the fault localization model in finding the location of unknown faults. The new test case generation scheme is based on mapping program control flow graph to a probabilistic network. By making inferences on the constructed network, new parameter values are generated to traverse different prime paths in a given program. To achieve failure-detection effectiveness, we propose a path selection strategy that works based on the predicted outcome of generated test cases. So, we mitigate the need for a human oracle, and the generated test suite could be directly used in fault localization. To verify the effectiveness of proposed methods, we provide the results of our experiments with different subject programs, containing seeded and real faults. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multiple-fault programs. The experimental results prove the outperformance of our proposed methods compared to the state-of-the-art techniques.
Keywords: semantic fault, fault localization, test data generation, statistical analysis, regression, cooperative game theory, slicing, coincidental correct test case, the static fault-proneness likelihood
Kernel-based Detection of Coincidentally Correct Test Cases to Improve Fault Localization Effectiveness – 2018
arXiv preprint arXiv:1803.09226
Bayes-TDG: effective test data generation using Bayesian belief network: toward failure-detection effectiveness and maximum coverage – 2018
FPA-FL: Incorporating static fault-proneness analysis into statistical fault localization – 2018
Journal of Systems and Software 136, 39-58
A program slicing-based method for effective detection of coincidentally correct test cases – 2018
Inforence: Effective Fault Localization Based on Information-Theoretic Analysis and Statistical Causal Inference – 2017
arXiv preprint arXiv:1712.03361
Mehdi Sakhai Nia
Ph.D. Assistant professor - Member of academic staff of Buali Sina university
Thesis Title: Software Latent Fault Localization using Statistical Methods
The aim of this thesis is to estimate worst-case execution time of iterative program codes. This estimation is very important to determine the state of software execution that makes the worst use of resources. Especially in real-time embedded systems, knowing Worst-Case Exaction Time (WCET) of tasks is necessary for task scheduling. Finding the number of loop iterations and depth of recursive calls and their dependence on the program input are the main challenges for WCET estimation. The number of loop iterations depends on the execution paths within the loop body. The selected path for execution depends on the conditions within the loop body, affected by variables whose values are altered within the loop body. Using symbolic analysis, for each execution path the execution condition is extracted within the loop body by combination of the branch conditions based on the values of variable at the loop entry point. Symbolic expression representing change in the value of each variable on an execution path within the loop body is extracted via symbolic analysis on that path. One problem is to determine the value of variables at loop entry point affecting execution paths within the loop body. With respect to this problem, a parametric function is presented to determine the number of loop iterations. The parameters of this function are the variables affecting the number of loop iterations. It is clear that the execution time of programs can be varied depending on the execution conditions and program inputs. The loop WCET can be derived based on determining the maximum number of iterations and the WCET of each execution path by an integer linear programming solver. The maximum number of possible program states at a point can be a safe estimate of the number of iterations for that point. Surely, the number of iterations will be close to actual value by reasonable limitation of the possible program states. Evaluation results showed that the proposed approach has a lower computational complexity and higher accuracy in comparison with other approaches especially those that estimate the WCET for multi-path loops.
Keywords: loop bound analysis, worst-case execution time, timing analysis, real-time systems
A XML-Based Representation of Timing Information for WCET Analysis – 2014
in Journal of mathematics and computer science 8 (2014), 205-214
Modeling flow information of loops using compositional condition of controls – 2015
The Journal of Supercomputing71(2): 508-536
PLEA: Parametric loop bound estimation in WCET analysis -2016
Turkish Journal of Electrical Engineering & Computer Sciences 24 (4), 2135-2149.
Ph.D. Assistant professor - Member of academic staff of Islamic Azad University, South Tehran Branch
Thesis Title: Nested Loop Parallelization Considering Data Locality Improvement
Data locality improvement and nested loops parallelization are two complementary and competing approaches for optimizing loop nests that constitute a large portion of computation times in scientific and engineering programs. Effective parallelization techniques try to distribute the computation and necessary data across different processors, whereas data locality targets at placing data on the same processor. Therefore, data locality improvement and parallelization may demand different loop transformations. Then, an integrated approach that combines these two can generate much better results than each individual approach. While there are effective methods for each one of these, prior studies have paid less attention to address these two simultaneously. This thesis proposes a unified approach that integrates these two techniques to obtain an appropriate locality conscious loop transformation to partition the loop iteration space into outer parallel tillable loops. The approach is based on the polyhedral model to achieve a multidimensional affine scheduling as a transformation that result the largest groups of tillable loops with maximum coarse-grain parallelism, as far as possible. Furthermore, tiles will be scheduled on processor cores to exploit maximum data reuse through scheduling tiles with high volume of data sharing on the same core consecutively or on different cores with shared cache at around the same time. Our experimental results, demonstrates the effectiveness of our proposed algorithms in optimizing different computational programs.
Keywords: Nested loops parallelization, loop tiling, data locality, parallel computing
Nested-Loops Tiling for Parallelization and Locality Optimization – 2017
Computing and Informatics 36(3): 566-596 (2017)
Locality-Conscious Nested-Loops Parallelization – 2014
ETRI Journal, Volume 36, Number 1, 71(2), 124-133.
Mojtaba Vahidi Asl
Ph.D. Assistant professor - Member of academic staff of Shahid Beheshti University
Thesis Title: Software Latent Fault Localization using Statistical Methods
The aim of this thesis is to introduce a new technique for latent fault localization in software programs. To this end, the execution data of a subject program in various failing and passing runs are collected. Analyzing the data using proposed statistical methods, the location of latent faults are identified. Due to very wide variety of latent semantic faults, we have followed a stepwise approach. To achieve this, after identification and categorization of semantic faults, in the first step, three new techniques, so called NN-Graph, Stat-Slice and Hierarchy-Debug are introduced, each of them are capable to detect some specific types of faults. In the third step, a new technique, so called CERM-fl, is introduced. The basis of CERM-fl is a novel cause-effect regression model which preserves cause-effect chains of variables during the construction of the linear model of program entities. These chains are reported to programmer, as failure context for efficient fault localization. The new method uses program static structure as its roadmap in order to avoid building a blind and inaccurate model which only relies on dynamic runtime data. The new method considers the fault-prone pieces of code while building the regression model and is capable to find multiple faults in programs. An extension of CERM-fl, the so-called GCERM-fl، is also designed on graph a data input which is capable to find combinatorial faults affecting control dependence of program. In third step of our approach, we have also invented a novel statistical technique for automatic test case generation to assist the fault localization model in finding the location of unknown faults. The new test case generation scheme is based on mapping program control flow graph into a probabilistic network. By making inferences on the constructed network, new parameter values are generated to traverse different prime paths in a given program. The new discovered path data are fed into the CERM-fl model to locate previously unknown faults, in an iterative procedure.
A large number of experiments is done on standard benchmark programs and the experimental results, support our claim regarding the performance of the proposed techniques in finding latent fault locations in programs.
Keywords: semantic faults, fault localization, automatic test data generation, statistical methods, regression.
Statistical Based Slicing Method for Prioritizing Program Fault Relevant Statements – 2015
Computing and Informatics 34 (4), 823-857
Hierarchy-Debug: a scalable statistical technique for fault localization – 2014
Software Quality Journal 22 (3), 427-466
Somayeh Arabi Naraee
Ph.D. Assistant professor - Member of academic staff of Kharazme university
Thesis Title: Design and Development of Models Dedicated to Predict Software Failures at Run Time
This thesis presents a new online failure prediction model to monitor the behavior of software systems. The novelty of the proposed approach is the use of a classifier with a customized kernel function to accelerate the early detection of bugs before they can cause the program to fail. The new kernel function is built based on a novel sequence-matching technique to measure the similarities between passing and failing executions, represented as sequences of program predicates. The kernel classifier constructs a hyper-plane that optimally divides the program execution space into two regions of failing and passing executions. The hyper-plane could be further applied to detect the symptoms of failure during program execution.
This kernel classifier predicts the termination state of the program execution paths as failing or passing. This could be achieved by mapping each execution path as a vector into a feature space whose dimensions represent common sub-paths amongst failing and passing execution paths. The main dilemma is the size of feature space, affecting the speed of the classifier. One factor affecting the size of the space is the length of the features assigned to the space dimensions. The length of the features, which are the common sub-paths amongst execution paths, is affected by repeated patterns in program executions. Replacing the consecutively repeated patterns with only a single iteration in execution paths reduces the size of the common sub-paths. The number of dimensions could be reduced by removing the dimensions which have projections onto others. This thesis proposes two kernels which implicitly measure the similarity amongst execution paths in a feature space with reduced dimensionality.
Our experiments, conducted on some real software systems with different metrics, demonstrate the ability of the proposed method in early bug detection with a small overhead in program execution time compared with related approaches. The proposed failure prediction model improves the speed of the predictions while preserving accuracy.
Keywords: Online Monitoring, Early Failure Prediction, Customized Kernel Function, Kernel- Based Classification, Feature Space and Dimensionality Reduction.
Early Failure Prediction in Software Programs: Dimensionality Reduction Kernel – 2016
Computing and Informatics 35(5): 1110-1140
Software fault localization using elastic net: A new statistical approach – 2009
IET Software 6(1): 61-73
Ph.D. Associate Professor – Member and academic staff of Azad University of Mahshar
Thesis Title: Game theoretic scheduling and Task Graph Pre-Scheduling
Pre-scheduling algorithms are mainly aimed at modifying the structure of task graphs to gain optimal scheduling. Task graph scheduling is a multi-objective and NP-hard problem. In this thesis a new pre-scheduling algorithm to execute tasks on homogeneous multi-processors systems is proposed. Basically, scheduling algorithms are targeted to balance the two parameters of time and energy consumption. These two parameters are up to a certain limit in contrast with each other and improvement of one causes reduction in the other one. The limit is to allocate as many processors as there are concurrent tasks in the task graph. Above this limit, practically increasing the energy consumption or in other words, increasing the number of processors is at the same direction of increasing execution time of the task graph. The problem is to achieve the trade-off between these two parameters. In the proposed algorithm after the upper limit of the number of processors is obtained the number of processors and the execution time of the task graph are optimized through computing the trade-off between these two parameters.
In the proposed algorithm after the upper limit for the number of processors is detected, based on the game theory, the suitable number of processors for scheduling the task graph is computed. In this way, the processors are used optimally and the idle time of the processors is reduced. To achieve this, each level of the task graph is considered as a player who is aimed at parallel processing in such a way that all independent tasks are executed in parallel. The idea of Nash equilibrium in game theory is mainly applied to compute the appropriate number of processors in such a way that the idle time of the processors is reduced while their processing power is increased. Also, considering the interdependencies and communication costs, the tasks are merged as their earliest start time is reduced to its possible minimum amount. In this way, the length of the critical path of the task graph is reduced while the degree of parallelism is increased and ultimately the completion time of the task graph is reduced. To apply on the game theory each task is considered as a player who attempts to reduce its earliest start time as much as possible. Using the idea of Nash equilibrium in game theory the most suitable merge of each task with subset of its parents is computed. The proposed algorithm firstly applies pre-scheduling and transfers its results to the scheduling stage as the optimal scheduling is obtained. Our experimental result on a number of known benchmark graphs demonstrates the effect of our proposed algorithm on the improvement of scheduling algorithms.
Keywords: pre-scheduling, scheduling, game theory, Nash equilibrium, task graph.
Task graph pre-scheduling, using Nash equilibrium in game theory – 2013
The Journal of Supercomputing 64(1): 177-203(2013)
Pre-scheduling and Scheduling of Task Graph on Homogeneous Multiprocessor Systems – 2013
Journal of Advances in Computer Research 4 (1), 13-29
Freshteh Azadi Prand
Ph.D. Assistant professor - Member of academic staff of Alameh Tabatabi university
Thesis Title: Game theoretic super scheduling and resource management in Grid-federation environment
Grid computing is a term referring to the federation of computer resources from multiple administrative domains to reach a common computational goal. Grid federation, a type of Grid network, is a cooperative society of autonomous and independent cluster networks which cooperate, through exchange and share of resources, with the aim of obtaining better utilization. Since cluster resource managers within Grid federation environments are autonomous and self-interested, resource sharing may lead to the tragedy of commons. Therefore, the mechanism applied for resource scheduling and allocation mechanism should motivate the cluster resource managers within grid federation environment to share their resources. Also, it should confute them not to request above and beyond need. To achieve this, economical mechanism for scheduling and management of resources could be used. Those groups of economical mechanisms are favorite, which their application guarantees the truth revelation of the available and required resources maximize the utility of the resource managers. Our proposed solution for these difficulties is to apply both the market equilibrium and Bayesian Nash equilibrium, at the same time. The aim has been to propose a mechanism guarantying the maximum utilization of any grid manager in its resource trades provided that the truth revelation of the available and required resources is announced. The application of such a mechanism made feasible the efficient allocation of resources. The other difficulty threatening logical architecture and the efficient management of the grid resources is not to provide services in accordance with the initial schedules. Since there is no central control on cluster managers, the violent resources should be detected and sacked from the grid federation environment. However, to apply users’ information the correctness of their information should be evaluated. In this thesis a new approach to evaluate the correctness of the users’ reports about the quality of the services and the use of the reports to score and finally eliminate the violent resource managers is presented. Due to the large scale and distribution of grid federation environments, resources and their communication links are not very reliable. Different mapping of resources to the tasks results in different value for reliability of tasks set. Therefore, efficient allocation of resources requires exact evaluation of reliability of the tasks. In this thesis to evaluate the reliability of a task the reliability of its underlying resources is considered. This may lead to improvement of the economical scheduling of resource utilization.
Keywords: super-scheduling, resource management, Walrasian equilibrium, incentive compatible, credibility, reliability
A Micro-economics based resource allocation in Grid-Federation environment – 2011
Cluster Computing 14(4) 433-444
Cooperative decision making in a knowledge grid environment – 2007
Future Generation Comp. Syst. 23(8): 932-938
Ph.D. Associate professor - Member of academic staff of Shiraz University of Technology
Thesis Title: Automatic Distribution of Sequential Object-Oriented Programs Over a Dedicated Multi-Computer
With the increasing popularity of using clusters and networks of low cost computers in solving computationally intensive problems, there has been a great demand for system and application software that can provide transparent and efficient utilization of the multiple machines in a distributed system. To achieve this, in this dissertation, a novel approach called POREL (Performance-Oriented Re-modularization of Legacy programs), based on a software re-modularization technique, is proposed. POREL modularizes and distributes legacy object oriented programs over a cluster, automatically.
The main objective of program distribution in POREL is to increase the program performance in terms of speedup as much as possible. The resultant distributed program will be a set of modules obtained based on a high performance computational model. These modules can communicate with each other over a cluster. Prediction of the maximum number of required computational nodes in the cluster, the best distribution architecture for the resultant distributed program, optimizing the amount of concurrency in the distributed code and platform independent communication among the modules using a novel layered architecture, are some of the characteristics of the POREL approach.
The POERL approach not only can be applied to speed up the scientific and computationally intensive programs but it also could be applied to improve the performance of the legacy object oriented application.
In POREL, at first, the program is analyzed using an optimization algorithm to work out the best order of the program statements which may result in the highest concurrency amongst the caller and the called methods. Then, a performance evaluation function is extracted from the program call flow graph and is used to find the best re-modularization of the program classes from the point of view. The program re-modularization is performed in the DAGC environment which is a flexible clustering environment. The POREL runtime system supports the asynchronous inter-module communications over the distributed program. A structural and behavioral formal specification is used to prove this equivalence, The usefulness of the PROEL approach is demonstrated, using several real case studies,
Keywords: Cluster Computing, Re-Modularization, Software Clustering, Performance
Genetic Clustering with Constraints – 2007
Journal of Research and Practice in Information Technology 37 (1), 127
Design and Implementation of a framework for auto Automatic Modularization of Software Systems – 2005
The Journal of Supercomputing 32 (1), 71-94
A new encoding scheme and a framework to investigate genetic clustering algorithms – 2005
Journal of Research and Practice in Information Technology 37 (1), 127
Ph.D. Associate professor - Member of academic staff of university of Tabriz
Thesis Title: Multi-Dimensional Loop Parallelization Using an Evolutionary Approach
The need for high computational power in most of the scientific computations is so much that most of the conventional computers are not responsive. Therefore, the use of a number of processors at a same time was initially recommended. In this respect, multiprocessor machines under the nomination of supercomputers were presented. The most common use of supercomputers is to run programs which are developed with high expenses and their sequential executions are lengthy. The existence of such costly sequential programs and also the difficulty of writing, understanding and maintenance of parallel algorithms have motivated the use of parallelizing compilers to automatically exploit the inherent parallelism in sequential code and automatically translate it into a corresponding parallel code. It is a difficult task to design and implement parallelizing compilers, because translation of a sequential code into a corresponding high performance parallel code and scheduling of the parallel code to run on parallel processors all lead to optimization problems in different aspects.
One of the major issues in the design of parallelizing compilers is to parallelize nested loops which highly affect the performance of scientific computations. Detection of dependencies amongst nested loops iterations in sequential code and parallel execution of the independent iterations could be a vital step towards accelerating the execution of the resultant parallel code. On the other hand, the large number of dependencies and the existence of irregular iteration space for a loop complicate the construction of the corresponding parallel loop. To resolve the difficulty, the dependency vectors are defined in terms of a relatively small set of basic dependence vectors. In this way, the loop iterations space is converted into a uniform iterations space. Partitioning or in the other words tiling of the uniform iterations space with the aim of creating tiles with minimum dependencies and interrupt-less execution on parallel processors, is a clustering problem in Cartesian space. Tiling of multi-dimensional irregular iteration spaces is a very complicated problem. One of the major achievements of this research is to solve this problem. The problem of detecting tiles which could be executed in parallel is solved by applying a wave-fronts approach. Here major issues with applying a wave-front approach are to detect tiles on a same wave-front and tiles on the boundaries of irregular multi-dimensional iteration spaces and also transformation of coordinates in between a tiled space and its corresponding iteration space and vice versa. Another achievement of this research is to resolve these issues. Parallel execution of the tiles residing on a same wave-front and on the other hand assignment of dependent tiles on consecutive wave-fronts to a same processor is a considerable point while scheduling the tiles to execute on a predefined number of parallel processors. In this respect, a new extension of an approach called Block is presented to schedule multi-dimensional tiles.
Keywords: parallelizing compilers, sequential nested loops, data dependence analysis, unionization, loop tiling, parallel loop generation, loop scheduling, evolutionary approach.
Thesis Title: . . .
Thesis Abstract: . . .
Thesis Title: The generation of protection code for self-defense against attacks
Defense against software reverse engineering is very important. The reverse engineering of commercial software generates a lot of financial loss to the software developer. So the software must have a solution to defend itself.
Self-defense can also be a malware defense against antivirus. When a malware attacks an antivirus, this is a kind of self-defense. When a malware wipes itself off, it’s another form of self-defense. Even a minor change to the malware code that is done to prevent detection by antivirus is also a kind of self-defense. This kind of self-defense includes code obfuscation, polymorphism, and encryption. Malicious codes do this by using Packers, rootkits and hiding in the system.
Another way to implement self-protection is to create a patch-resistant programs.Creating patch-resistant programs make it difficult for an attacker to easily change it. This mechanism includes passive strategies such as code obfuscation and active strategies aimed at changing the function of the program or not running it in the event of a program change. The patch-resistant must be in the software domain and include common aspects of avoiding the copy.
In this thesis, a protector system will be designed and implemented in the Windows operating system to defend executable files against above issues.
Keywords: Packer, Protector, Self-Defense, Reverse Engineering, Malware.
M.Sc. (Adiban Institute of Higher Education)
Thesis Title: …