Hybrid Machine Learning-Based Approaches for Feature and Overfitting Reduction to Model Intrusion Patterns

F. Ahmadi Abkenari, A. Milani Fard, S. Khanchi.
Journal Paper Journal of Cybersecurity and Privacy, no. 3: 544-557, 2023.

Abstract

An intrusion detection system (IDS), whether as a device or software-based agent, plays a significant role in networks and systems security by continuously monitoring traffic behaviour to detect malicious activities. The literature includes IDSs that leverage models trained to detect known attack behaviours. However, such models suffer from low accuracy or high overfitting. This work aims to enhance the performance of the IDS by making a model based on the observed traffic via applying different single and ensemble classifiers and lowering the classifier’s overfitting on a reduced set of features. We implement various feature reduction techniques, including Linear Regression, LASSO, Random Forest, Boruta, and autoencoders on the CSE-CIC-IDS2018 dataset to provide a training set for classifiers, including Decision Tree, Naïve Bayes, neural networks, Random Forest, and XGBoost. Our experiments show that the Decision Tree classifier on autoencoders-based reduced sets of features yields the lowest overfitting among other combinations.

BibTeX

@article{AhmadiAbkenari2023,
title={Hybrid Machine Learning-Based Approaches for Feature and Overfitting Reduction to Model Intrusion Patterns},
author={Ahmadi Abkenari, Fatemeh and Milani Fard, Amin and Khanchi, Sara},
journal={Journal of Cybersecurity and Privacy},
volume={3},
pages={544--557},
year={2023},
publisher={MDPI}
}

The State of Ethereum Smart Contracts Security: Vulnerabilities, Countermeasures, and Tool Support

H. Zhou, A. Milani Fard, A. Makanju.
Journal Paper Journal of Cybersecurity and Privacy, no. 2: 358-378, 2022.

Abstract

Smart contracts are self-executing programs that run on the blockchain and make it possible for peers to enforce agreements without a third-party guarantee. The smart contract on Ethereum is the fundamental element of decentralized finance with billions of US dollars in value. Smart contracts cannot be changed after deployment and hence the code needs to be verified for potential vulnerabilities. However, smart contracts are far from secure, and attacks exploiting vulnerabilities have led to losses valued in the millions. In this work, we explore the current state of smart contracts security, prevalent vulnerabilities, and security-analysis tool support, by reviewing the latest advances and research published in the past five years. We study 13 vulnerabilities in Ethereum smart contracts and their countermeasures, and investigate nine security-analysis tools. Our findings indicate that a uniform set of smart contract vulnerability definitions does not exist in research work and bugs pertaining to the same mechanisms sometimes appear with different names. This inconsistency makes it difficult to identify, categorize, and analyze vulnerabilities. We explain some safeguarding approaches and best practices. However, as technology improves, new vulnerabilities may emerge. Regarding tool support, SmartCheck, DefectChecker, contractWard, and sFuzz are better choices for broader vulnerability coverage, whereas tools such as NPChecker, MadMax, Osiris, and Sereum target specific categories of vulnerabilities when required. While contractWard is relatively fast and more accurate, it can only detect pre-defined vulnerabilities. NPChecker is slower but can find new vulnerability patterns.

BibTeX

@article{Zhou2022,
title={The State of Ethereum Smart Contracts Security: Vulnerabilities, Countermeasures, and Tool Support},
author={Haozhe Zhou and Amin Milani Fard and Adetokunbo Makanju},
journal={Journal of Cybersecurity and Privacy},
volume={2},
pages={358--378},
year={2022},
publisher={MDPI}
}

From Text Representation to Financial Market Prediction: A Literature Review

S. Anbaee Farimani, M. Vafaei Jahan, A. Milani Fard
Journal Paper Information, 13, no. 10: 466, 2022.

Abstract

News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advanced methods in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for the investors’ behavior analysis. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and a market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to a heterogeneous information fusion for the investors’ behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).

BibTeX

@article{farimani2022information,
title={From Text Representation to Financial Market Prediction: A Literature Review},
author={Saeede Anbaee Farimani and Majid Vafaei Jahan and Amin Milani Fard},
journal={Information},
volume={13},
number={10},
article-number={466},
year={2022},
url={https://www.mdpi.com/2078-2489/13/10/466},
issn={2078-2489}
}

Investigating the informativeness of technical indicators and news sentiment in financial market price prediction

S. Anbaee Farimani, M. Vafaei Jahan, A. Milani Fard, S. R. K. Tabbakh
Journal Paper Knowledge-Based Systems, ISSN 0950-7051, Elsevier, 2022.

Abstract

Real-time market prediction tool tracking public opinion in specialized newsgroups and informative market data persuades investors of financial markets. Previous works mainly used lexicon-based sentiment analysis for financial markets prediction, while recently proposed transformer-based sentiment analysis promises good results for cross-domain sentiment analysis. This work considers temporal relationships between consecutive snapshots of informative market data and mood time series for market price prediction. We calculate the sentiment mood time series via the probability distribution of news embedding generated through a BERT-based transformer language model fine-tuned for financial domain sentiment analysis. We then use a deep recurrent neural network for feature extraction followed by a dense layer for price regression. We implemented our approach as an open-source API for real-time price regression. We build a corpus of financial news related to currency pairs in foreign exchange and cryptocurrency markets. We further augment our model with informative technical indicators and news sentiment scores aligned based on news release timestamp. Results of our experiments show significant error reduction compared to the baselines. Our Financial News and Financial Sentiment Analysis RESTful APIs are available for public use.

BibTeX

@article{farimani2022investigating,
title={Investigating the informativeness of technical indicators and news sentiment in financial market price prediction},
author={Saeede Anbaee Farimani and Majid Vafaei Jahan and Amin Milani Fard and Seyed Reza Kamel Tabbakh},
journal={Knowledge-Based Systems},
year={2022},
publisher={Elsevier}
}

Vulnerability Analysis of Similar Code

A. Piran, C. Chang, A. Milani Fard
Conference Papers The 21st IEEE International Conference on Software Quality, Reliability and Security (QRS), 2021

Abstract

Studying frequent code vulnerabilities in similar code, such as clones, near-duplicates, forked projects, or libraries, can help in the automated detection of security flaws during the software development process. In this work, we conduct an empirical study on vulnerabilities in C/C++ code to characterize security flaws and find out if the same vulnerabilities exist in applications that share similar code or have the same business logic/domain. We analyze a code vulnerability dataset including 315 projects with 3284 security issues in 10,880 functions. Our results show that vulnerable functions in 35% of the most occurring CWEs (software weaknesses types) have similar code, and 23% of projects with the same domain/category have the same vulnerabilities. We observe that the most prevalent vulnerabilities in similar code are Use After Free, Improper Access Control, Cryptographic Issues, 7PK - Security Features, Double Free, Cross-site Scripting, and Divide By Zero. These vulnerabilities are, however, less frequent compared to other CWEs across all subjects. Our results suggest that automated vulnerability detection tools that work based on code similarity or abstract patterns can be tailored more towards certain CWEs.

BibTeX

@inproceedings{QRS2021,
title={Vulnerability Analysis of Similar Code},
author={Piran, Azin and Chang, Che-Pin and Milani Fard, Amin},
booktitle={Proceedings of the 21st IEEE International Conference on Software Quality, Reliability and Security (QRS)},
year={2021}
}

Leveraging Latent Economic Concepts and Sentiments in the News for Market Prediction

S. Anbaee Farimani, M. Vafaei Jahan, A. Milani Fard, G. Haffari
Conference Papers The 8th IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2021

Abstract

Most of the existing news-based market prediction techniques disregard conceptual and emotional relations in the news stream. In this work, we consider the conceptual relationship between news documents using contextualized latent concept modeling as well as leveraging news sentiment and technical indicators. We present our approach as an open-source RESTFul API. We build a corpus of financial news related to currency pairs in the Foreign Exchange and Cryptocurrencies markets. Next, we apply BERT-based embedding to generate word vectors, cluster the vectors to create latent economic concepts, and propose a document representation based on the distribution of words on these concepts as well as news sentiment. We use a recurrent convolutional neural network to jointly use BERT-based text representation and technical indicators embedding for market time series prediction. We further augment our model with technical indicators using another recurrent layer. The experimental results show the superiority of our method compared to the baselines. Our MarketNews dataset, news crawler, and MarketPredict APIs are available for public use.

BibTeX

@inproceedings{farimani2021leveraging,
title={Leveraging Latent Economic Concepts and Sentiments in the News for Market Prediction},
author={Farimani, Saeede Anbaee and Jahan, Majid Vafaei and Fard, Amin Milani and Haffari, Gholamreza},
booktitle={2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)},
pages={1--10},
year={2021},
organization={IEEE}
}

Using Market Indicators to Eliminate Local Trends for Financial Time Series Cross-Correlation Analysis

Z. Alamatian, M. Vafaei Jahan, A. Milani Fard
Conference Papers The 34th Canadian Conference on Artificial Intelligence (Canadian AI), 2021

Abstract

Multifractal detrended cross-correlation analysis (MFDCCA) is largely used to analyze non-stationary financial time series. Existing methods for such analysis utilize the time series itself as the detrending function with a polynomial. We propose a technique for a more accurate removal of local trends, called indicator-based MFDCCA (IMFDCCA), which leverages market technical indicators to better determine correlations between financial time series. We evaluated our method on pair trading in the Foreign Exchange Market (Forex) and our results show that the proposed IMFDCCA compared to the MFDCCA reduces the RMSE for the Hurst exponent estimation by 30%.

BibTeX

@inproceedings{alamatian2021,
title={Using Market Indicators to Eliminate Local Trends for Financial Time Series Cross-Correlation Analysis},
doi = {10.21428/594757db.8d600287},
author={Alamatian, Zohreh and Vafaei Jahan, Majid and Milani Fard, Amin},
booktitle={Proceedings of the 34th Canadian Conference on Artificial Intelligence (Canadian AI)},
url = {https://caiac.pubpub.org/pub/jtymmz06},
year={2021}
}

Pandemic Programming: How COVID-19 affects software developers and how their organizations can help

P. Ralph, S. Baltes, G. Adisaputri, R. Torkar, V. Kovalenko, M. Kalinowski, N. Novielli, S. Yoo, X. Devroey, X. Tan, M. Zhou, B. Turhan, R. Hoda, H. Hata, G. Robles, A. Milani Fard, R. Alkadhi
Journal Paper Empirical Software Engineering, 25:4927–4961, Springer US, 2020.

** Selected in the Journal-First track of the 43rd International Conference on Software Engineering (ICSE), 2021 **

Abstract

Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Findings include: (1) developers' wellbeing and productivity are suffering; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity; (4) women, parents and people with disabilities may be disproportionately affected. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.

BibTeX

@article{ralph2020pandemic,
title={Pandemic Programming: How COVID-19 affects software developers and how their organizations can help},
author={Paul Ralph and Sebastian Baltes and Gianisa Adisaputri and Richard Torkar and Vladimir Kovalenko and Marcos Kalinowski and Nicole Novielli and Shin Yoo and Xavier Devroey and Xin Tan and Minghui Zhou and Burak Turhan and Rashina Hoda and Hideaki Hata and Gregorio Robles and Amin Milani Fard and Rana Alkadhi},
journal={Empirical Software Engineering},
volume={25},
pages={4927--4961},
year={2020},
publisher={Springer US}
}

SIMBA: An Efficient Simulator for Blockchain Applications

S. M. Fattahi, A. Makanju, A. Milani Fard
Conference Papers The 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 2020

Abstract

Predicting the performance of a blockchain application during the design phase is difficult and evaluation after it is built could be expensive. The ability to simulate a blockchain network during the design stage in order to evaluate it is therefore a necessity. In this paper, we present a simulator for blockchain applications, called SIMBA (SIMulator for Blockchain Applications). SIMBA extends an existing simulator by adding the Merkle tree feature to blockchain nodes to improve efficiency and allowing more realistic evaluations not possible with the base tool to be performed. Results of our experiments show that the inclusion of Merkle trees has a high impact of up to 30 times reduction in the verification time of block transactions without an impact on block propagation delay. Since block verification is a critical part of the computational load of nodes on the network, this performance improvement significantly affects the overall performance of each node and consequently the entire network.
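The role a Merkle tree plays in transaction verification can be illustrated with a minimal, generic sketch. This is not SIMBA's implementation: the function names, the use of a single SHA-256 pass, and the rule of duplicating the last node on odd-sized levels are all assumptions made for illustration. The point it shows is that checking one transaction against a block's root needs only a logarithmic number of sibling hashes rather than every transaction in the block.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _pad(level: list) -> list:
    # Assumption: odd-sized levels duplicate their last node.
    return level + [level[-1]] if len(level) % 2 == 1 else level


def merkle_root(leaves: list) -> bytes:
    """Hash pairs of nodes level by level until a single root remains."""
    if not leaves:
        raise ValueError("at least one transaction is required")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _pad(level)
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list, index: int) -> list:
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    if not 0 <= index < len(leaves):
        raise IndexError("transaction index out of range")
    level = [sha256(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        level = _pad(level)
        sibling = index ^ 1  # partner of this node within its pair
        proof.append((level[sibling], sibling < index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list, root: bytes) -> bool:
    """Recompute the path to the root using only the sibling hashes in the proof."""
    node = sha256(leaf)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root
```

For example, with five hypothetical transactions `txs = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]`, calling `verify(txs[2], merkle_proof(txs, 2), merkle_root(txs))` checks membership using three sibling hashes.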

BibTeX

@inproceedings{SIMBA2020,
author = {Fattahi, Seyed Mehdi and Makanju, Adetokunbo and Milani Fard, Amin},
title = {{SIMBA}: An Efficient Simulator for Blockchain Applications},
booktitle = {Proceedings of the International Conference on Dependable Systems and Networks (DSN)},
publisher = {IEEE},
year = {2020}
}

Relationship Prediction in Dynamic Heterogeneous Information Networks

** Best Paper Award **

A. Milani Fard, E. Bagheri, K. Wang
Conference Papers The 41st European Conference on Information Retrieval (ECIR), 2019

Abstract

Most real-world information networks, such as social networks, are heterogeneous and as such, relationships in these networks can be of different types and hence carry differing semantics. Therefore techniques for link prediction in homogeneous networks cannot be directly applied on heterogeneous ones. On the other hand, works that investigate link prediction in heterogeneous networks do not necessarily consider network dynamism in sequential time intervals. In this work we propose a technique that leverages a combination of latent and topological features to predict a target relationship between two nodes in a dynamic heterogeneous information network. Our technique, called MetaDynaMix, effectively combines meta path-based topology features and inferred latent features that incorporate temporal network changes in order to capture network (1) heterogeneity and (2) temporal evolution, when making link predictions. Our experiment results on two real-world datasets show statistically significant improvement over AUCROC and prediction accuracy compared to the state of the art techniques.

BibTeX

@inproceedings{amin:ecir19,
author = {Milani Fard, Amin and Bagheri, Ebrahim and Wang, Ke},
title = {Relationship Prediction in Dynamic Heterogeneous Information Networks},
booktitle = {Proceedings of the European Conference on Information Retrieval (ECIR)},
publisher = {Springer},
pages = {12 pages},
year = {2019}
}

JavaScript: The (Un)covered Parts

** Best Paper Award Nominee **

A. Milani Fard, A. Mesbah
Conference Papers The 10th IEEE International Conference on Software Testing, Verification and Validation (ICST), 2017

Abstract

Testing JavaScript code is important. JavaScript has grown to be among the most popular programming languages and it is extensively used to create web applications both on the client and server. We present the first empirical study of JavaScript tests to characterize their prevalence, quality metrics (e.g. code coverage), and shortcomings. We perform our study across a representative corpus of 373 JavaScript projects, with over 5.4 million lines of JavaScript code. Our results show that 22% of the studied subjects do not have test code. About 40% of projects with JavaScript at client-side do not have a test, while this is only about 3% for the purely server-side JavaScript projects. Also tests for server-side code have high quality (in terms of code coverage, test code ratio, test commit ratio, and average number of assertions per test), while tests for client-side code have moderate to low quality. In general, tests written in Mocha, Tape, Tap, and Nodeunit frameworks have high quality and those written without using any framework have low quality. We scrutinize the (un)covered parts of the code under test to find out root causes for the uncovered code. Our results show that JavaScript tests lack proper coverage for event-dependent callbacks (36%), asynchronous callbacks (53%), and DOM-related code (63%). We believe that it is worthwhile for the developer and research community to focus on testing techniques and tools to achieve better coverage for difficult to cover JavaScript code.

BibTeX

@inproceedings{amin:icst17,
author = {Milani Fard, Amin and Mesbah, Ali},
title = {JavaScript: The (Un)covered Parts},
booktitle = {Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST)},
publisher = {IEEE},
pages = {11 pages},
year = {2017}
}

Directed Test Generation and Analysis for Web Applications

A. Milani Fard
Thesis Ph.D. Thesis, University of British Columbia (UBC), January 2017

Abstract

The advent of web technologies has led to the proliferation of modern web applications with enhanced user interaction and client-side execution. JavaScript (the most widely used programming language) is extensively used to build responsive modern web applications. The event-driven and dynamic nature of JavaScript, and its interaction with the Document Object Model (DOM), make it challenging to understand and test effectively. The ultimate goal of this thesis is to improve the quality of web applications through automated testing and maintenance. The work presented in this dissertation has focused on advancing the state-of-the-art in testing and maintaining web applications by proposing a new set of techniques and tools. We proposed (1) a feedback-directed exploration technique and a tool to cover a subset of the state-space of a given web application; the exploration is guided towards achieving higher functionality, navigational, and page structural coverage while reducing the test model size, (2) a technique and a tool to generate UI tests using existing tests; it mines the existing test suite to infer a model of the covered DOM states and event-based transitions including input values and assertions; it then expands the inferred model by exploring alternative paths and generates assertions for the new states; finally it generates a new test suite from the extended model, (3) the first empirical study on JavaScript tests to characterize their prevalence and quality metrics, and to find out root causes for the uncovered (missed) parts of the code under test, (4) a DOM-based JavaScript test fixture generation technique and a tool, which is based on dynamic symbolic execution; it guides the execution through different branches of a function by producing expected DOM instances, (5) a technique and a tool to detect JavaScript code smells using static and dynamic analysis. We evaluated the presented techniques by conducting various empirical studies and comparisons. The evaluation results point to the effectiveness of the proposed techniques in terms of fault detection capability and code coverage for test generation, and in terms of accuracy for code smell detection.

BibTeX

@phdthesis{amin:thesis2017,
author = {Milani Fard, Amin},
title = {Directed test generation and analysis for web applications},
series={Electronic Theses and Dissertations (ETDs) 2008+},
url={https://open.library.ubc.ca/cIRcle/collections/24/items/1.0340953},
DOI={10.14288/1.0340953},
school={University of British Columbia},
year={2017},
month={Jan},
collection={Electronic Theses and Dissertations (ETDs) 2008+}
}

Generating Fixtures for JavaScript Unit Testing

A. Milani Fard, A. Mesbah, E. Wohlstadter
Conference Papers The 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2015
** Editor Select paper in IEEE Internet Computing, March 2016 **

Abstract

In today's web applications, JavaScript code interacts with the Document Object Model (DOM) at runtime. This runtime interaction between JavaScript and the DOM is error-prone and challenging to test. In order to unit test a JavaScript function that has read/write DOM operations, a DOM instance has to be provided as a test fixture. This DOM fixture needs to be in the exact structure expected by the function under test. Otherwise, the test case can terminate prematurely due to a null exception. Generating these fixtures is challenging due to the dynamic nature of JavaScript and the hierarchical structure of the DOM. We present an automated technique, based on concolic execution, which generates test fixtures for unit testing JavaScript functions. Our approach is implemented in a tool called ConFix. Our empirical evaluation shows that ConFix can effectively generate tests that cover DOM-dependent paths. We also find that ConFix yields considerably higher coverage compared to an existing JavaScript input generation technique.

BibTeX

@inproceedings{amin:ase15,
author = {Milani Fard, Amin and Mesbah, Ali and Wohlstadter, Eric},
title = {Generating Fixtures for JavaScript Unit Testing},
booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE)},
publisher = {ACM},
pages = {190--200},
year = {2015}
}

An Empirical Study of Bugs in Test Code

A. Vahabzadeh, A. Milani Fard, A. Mesbah
Conference Papers The 31st IEEE International Conference on Software Maintenance and Evolution (ICSME), 2015

Abstract

Testing aims at detecting (regression) bugs in production code. However, testing code is just as likely to contain bugs as the code it tests. Buggy test cases can silently miss bugs in the production code or loudly ring false alarms when the production code is correct. We present the first empirical study of bugs in test code to characterize their prevalence and root cause categories. We mine the bug repositories and version control systems of 211 Apache Software Foundation (ASF) projects and find 5,556 test-related bug reports. We (1) compare properties of test bugs with production bugs, such as active time and fixing effort needed, (2) qualitatively study 443 randomly sampled test bug reports in detail and categorize them based on their impact and root causes, (3) run FindBugs on the test code of the latest version of the projects to discover potential (undiscovered) bugs. Our results show that (1) around half of all the projects had bugs in their test code; (2) the majority of test bugs are false alarms, i.e., test fails while the production code is correct, while a minority of these bugs result in silent horrors, i.e., test passes while the production code is incorrect; (3) incorrect and missing assertions are the dominant root cause of silent horror bugs; (4) semantic (25%), flaky (21%), environment-related (18%) bugs are the dominant root cause categories of false alarms; (5) the majority of false alarm bugs happen in the exercise portion of the tests, and (6) developers contribute more actively to fixing test bugs and test bugs are fixed sooner compared to production bugs.

BibTeX

@inproceedings{arash:icsme15,
author = {Vahabzadeh, Arash and Milani Fard, Amin and Mesbah, Ali},
title = {An Empirical Study of Bugs in Test Code},
booktitle = {Proceedings of the International Conference on Software Maintenance and Evolution (ICSME)},
publisher = {IEEE Computer Society},
pages = {101--110},
year = {2015}
}

Neighborhood Randomization for Link Privacy in Social Network Analysis

A. Milani Fard, K. Wang
Journal Paper The World Wide Web Journal (WWW), 24 pages, Springer US, January 2015, Volume 18, Issue 1, pp 9-32.

Abstract

Social network analysis has many important applications but it depends on sharing and publishing the underlying graph. Link privacy requires limiting the ability of an adversary to infer the presence of a sensitive link between two individuals in the published social network graph. A standard technique for achieving link privacy is to probabilistically randomize a link over the space for node pairs. A major drawback of such graph-wise randomization is that it ignores the structural proximity of nodes, thus, alters considerably the structure of social networks and distorts the accuracy of social network analysis. To address this problem, we propose a structure-aware randomization scheme, called neighborhood randomization. This scheme models a social network as a directed graph and probabilistically randomizes the destination of a link within a local neighborhood. By confining the randomization to a local neighborhood, this scheme drastically reduces the distortion to the graph structure yet hides a sensitive link. The trade-off between privacy and utility is dictated by the retention probability of a destination and by the size of the randomization neighborhood. We conduct extensive experiments to evaluate this trade-off using real life social network data.
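The idea of confining randomization to a local neighborhood can be sketched as follows. This is an illustrative reading of the abstract, not the paper's algorithm: the definition of the neighborhood as all nodes within `radius` hops of the link's source, the uniform choice of a replacement destination, and the names `neighborhood` and `randomize_links` are assumptions. The two knobs mirror the trade-off named above: `retention_prob` (how often the true destination is kept) and `radius` (how large the randomization neighborhood is).

```python
import random
from collections import deque


def neighborhood(graph, source, radius):
    """Nodes reachable from `source` in at most `radius` hops, excluding `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the neighborhood boundary
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(source)
    return sorted(seen)


def randomize_links(graph, retention_prob, radius, rng=None):
    """Keep each link's source; keep its destination with probability
    `retention_prob`, otherwise swap in a node from the source's neighborhood."""
    rng = rng or random.Random()
    published = {u: [] for u in graph}
    for u, destinations in graph.items():
        candidates = neighborhood(graph, u, radius)
        for v in destinations:
            if rng.random() < retention_prob or not candidates:
                published[u].append(v)  # true destination retained
            else:
                published[u].append(rng.choice(candidates))  # local replacement
    return published
```

Because every replacement comes from nearby nodes, each source keeps its out-degree and its links stay within the same region of the graph; a smaller `radius` or a larger `retention_prob` preserves more structure at the cost of weaker link hiding.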

BibTeX

@article{amin:wwwj,
title={Neighborhood randomization for link privacy in social network analysis},
author={Milani Fard, Amin and Wang, Ke},
journal={World Wide Web},
volume={18},
number={1},
pages={9--32},
year={2015},
publisher={Springer US}
}

Leveraging Existing Tests in Automated Test Generation for Web Applications

A. Milani Fard, M. Mirzaaghaei, A. Mesbah
Conference Papers The 29th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2014

Abstract

To test web applications, developers currently write test cases in frameworks such as Selenium. On the other hand, most web test generation techniques rely on a crawler to explore the dynamic states of the application. The first approach requires much manual effort, but benefits from the domain knowledge of the developer writing the test cases. The second one is automated and systematic, but lacks the domain knowledge required to be as effective. We believe combining the two can be advantageous. In this paper, we propose to (1) mine the human knowledge present in the form of input values, event sequences, and assertions, in the human-written test suites, (2) combine that inferred knowledge with the power of automated crawling, and (3) extend the test suite for uncovered/unchecked portions of the web application under test. Our approach is implemented in a tool called Testilizer. An evaluation of our approach indicates that Testilizer (1) outperforms a random test generator, and (2) on average, can generate test suites with improvements of up to 150 percent in fault detection rate and up to 30 percent in code coverage, compared to the original test suite.

BibTeX

@inproceedings{amin:ase14,
author = {Milani Fard, Amin and Mirzaaghaei, Mehdi and Mesbah, Ali},
title = {Leveraging Existing Tests in Automated Test Generation for Web Applications},
booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE)},
publisher = {ACM},
pages = {67--78},
year = {2014}
}

Feedback-Directed Exploration of Web Applications to Derive Test Models

A. Milani Fard, A. Mesbah
Conference Papers The 24th IEEE International Symposium on Software Reliability Engineering (ISSRE), 2013

Abstract

Dynamic exploration techniques play a significant role in automated web application testing and analysis. However, a general web application crawler that exhaustively explores the states can become mired in limited specific regions of the web application, yielding poor functionality coverage. In this paper, we propose a feedback-directed web application exploration technique to derive test models. While exploring, our approach dynamically measures and applies a combination of code coverage impact, navigational diversity, and structural diversity, to decide a-priori (1) which state should be expanded, and (2) which event should be exercised next to maximize the overall coverage, while minimizing the size of the test model. Our approach is implemented in a tool called FeedEx. We have empirically evaluated the efficacy of FeedEx using six web applications. The results show that our technique is successful in yielding higher coverage while reducing the size of the test model, compared to classical exhaustive techniques such as depth-first, breadth-first, and random exploration.

BibTeX

@inproceedings{amin:issre13,
author = {Milani Fard, Amin and Mesbah, Ali},
title = {Feedback-directed Exploration of Web Applications to Derive Test Models},
booktitle = {Proceedings of the International Symposium on Software Reliability Engineering (ISSRE)},
publisher = {IEEE Computer Society},
pages = {278--287},
year = {2013}
}

JSNose: Detecting JavaScript Code Smells

** Most Influential Paper Award **

A. Milani Fard, A. Mesbah
Conference Papers The 13th IEEE International Conference on Source Code Analysis and Manipulation (SCAM), 2013

Abstract

JavaScript is a powerful and flexible prototype-based scripting language that is increasingly used by developers to create interactive web applications. The language is interpreted, dynamic, weakly-typed, and has first-class functions. In addition, it interacts with other web languages such as CSS and HTML at runtime. All these characteristics make JavaScript code particularly error-prone and challenging to write and maintain. Code smells are patterns in the source code that can adversely influence program comprehension and maintainability of the program in the long term. We propose a set of 13 JavaScript code smells, collected from various developer resources. We present a JavaScript code smell detection technique called JSNose. Our metric-based approach combines static and dynamic analysis to detect smells in client-side code. This automated technique can help developers to spot code that could benefit from refactoring. We evaluate the smell finding capabilities of our technique through an empirical study. By analyzing 11 web applications, we investigate which smells detected by JSNose are more prevalent.

BibTeX

@inproceedings{amin:scam13,
author = {Milani Fard, Amin and Mesbah, Ali},
title = {{JSNose}: Detecting {JavaScript} Code Smells},
booktitle = {Proceedings of the International Conference on Source Code Analysis and Manipulation (SCAM)},
publisher = {IEEE Computer Society},
pages = {116--125},
year = {2013}
}

Limiting Link Disclosure in Social Network Analysis through Subgraph-Wise Perturbation

A. Milani Fard, K. Wang, P. S. Yu
Conference Papers The 15th International Conference on Extending Database Technology (EDBT), 2012

Abstract

Link disclosure between two individuals in a social network could be a privacy breach. To limit link disclosure, previous works modeled a social network as an undirected graph and randomized a link over the entire domain of links, which leads to considerable structural distortion to the graph. In this work, we address this issue in two steps. First, we model a social network as a directed graph and randomize the destination of a link while keeping the source of a link intact. The randomization ensures that, if the prior belief about the destination of a link is bounded by some threshold, the posterior belief, given the published graph, is no more than another threshold. Then, we further reduce structural distortion by a subgraph-wise perturbation in which the given graph is partitioned into several subgraphs and randomization of destination nodes is performed within each subgraph. The benefit of subgraph-wise perturbation is that it retains a destination node with a higher retention probability and replaces a destination node with a node from a local neighborhood. We study the trade-off of utility and privacy of subgraph-wise perturbation.

BibTeX

@inproceedings{amin:edbt12,
author = {Milani Fard, Amin and Wang, Ke and Yu, Philip S.},
title = {Limiting Link Disclosure in Social Network Analysis Through Subgraph-wise Perturbation},
booktitle = {Proceedings of the International Conference on Extending Database Technology (EDBT)},
year = {2012},
pages = {109--119},
publisher = {ACM}
}

Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques

A. Milani Fard
Technical Report The Computing Research Repository (CoRR), 2012

Abstract

Releasing Web query logs, which contain valuable information for research or marketing, can breach the privacy of search engine users. Therefore, rendering query logs so as to limit linking a query to an individual, while preserving the data's usefulness for analysis, is an important research problem. This survey provides an overview and discussion of recent studies in this direction.

BibTeX

@article{amin:corr12,
author = {Milani Fard, Amin},
title = {Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques},
journal = {Computing Research Repository (CoRR)},
volume = {abs/1211.2354},
year = {2012}
}

Clustering-based Web Query Log Anonymization

A. Milani Fard
Thesis M.Sc. Thesis, Simon Fraser University (SFU), November 2010

Abstract

Web query log data contain information that can be very useful in research or marketing; however, releasing such data can seriously breach the privacy of search engine users. These privacy concerns go far beyond the identifying information in a query, such as name and address, that can refer to a particular individual. It has been shown that even non-identifying personal data can be combined with external publicly available information to pinpoint an individual, as happened after the AOL query log release in 2006. In this work we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.

BibTeX

@mastersthesis{amin:thesis2010,
author = {Milani Fard, Amin},
title = {Clustering-based Web Query Log Anonymization},
series={Electronic Theses and Dissertations (ETDs) 2008+},
url={http://summit.sfu.ca/item/12814},
school={Simon Fraser University},
year={2010},
month={Nov},
collection={Electronic Theses and Dissertations (ETDs) 2008+}
}

An Effective Clustering Approach to Web Query Log Anonymization

A. Milani Fard, K. Wang
Conference Papers The 5th International Conference on Security and Cryptography (SECRYPT), 2010

Abstract

Web query log data contain information useful to research; however, release of such data can re-identify the search engine users issuing the queries. These privacy concerns go far beyond removing explicitly identifying information such as name and address, since non-identifying personal data can be combined with publicly available information to pinpoint an individual. In this work we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.

BibTeX

@inproceedings{amin:secrypt10,
title={An effective clustering approach to web query log anonymization},
author={Milani Fard, Amin and Wang, Ke},
booktitle={Proceedings of the International Conference on Security and Cryptography (SECRYPT)},
pages={109--119},
year={2010},
organization={IEEE}
}

Collaborative Mining in Multiple Social Networks Data for Criminal Group Discovery

A. Milani Fard, M. Ester
Conference Papers Symposium on Social Computing Applications, IEEE International Conference on Social Computing (SocialCom), 2009

Abstract

The hidden knowledge in social network data can be regarded as an important resource for criminal investigations, helping to uncover the structure and organization of a criminal network. However, such network-based analysis has not been studied in an applied way and remains mostly a manual process. To assist inspectors and intelligence agencies in discovering this knowledge, we define a new problem and propose a framework for automated network data analysis and deduction from multiple social networks by converting them to a transaction dataset and applying association mining and statistical methods. By applying a game theory concept in a multi-agent model, we try to design a policy for knowledge discovery and inference fusion. This approach enables police stations to build and deploy P2P applications through a unified medium for finding relationships among criminals and identifying suspicious individuals.

Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data

A. Milani Fard
Book Chapter in Data Mining and Multiagent Integration, ISBN 978-1-4419-0521-5, Springer US, 2009

Abstract

Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach in a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game theory concept in a multi-agent model, we try to design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system in operation, a sample multi-expert system has also been developed.

BibTeX

@Inbook{Fard2009,
author="Milani Fard, Amin",
editor="Cao, Longbing",
title="Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data",
bookTitle="Data Mining and Multi-agent Integration",
year="2009",
publisher="Springer US",
address="Boston, MA",
pages="279--290",
isbn="978-1-4419-0522-2",
doi="10.1007/978-1-4419-0522-2_19",
url="https://doi.org/10.1007/978-1-4419-0522-2_19"
}

Evolutionary Query Optimization for Heterogeneous Distributed Database Systems

R. Ghaemi, A. Milani Fard, H. Tabatabaee, and M. Sadeghizadeh
Journal Paper International Journal of World Academy of Science, Engineering and Technology, V. 19, pp 43-49, 2008

Finding Optimal Grid Dimension for Partitioning Linguistic Variables of Fuzzy Concepts

M. A. Rigi, A. Milani Fard, M.-R. Akbarzadeh-T.
Journal Paper International Journal of Mathematics and Computer Science, ISSN 1814-0424, Volume 3, no. 2, 2008

Agent based Grid Data Mining using Game Theory and Soft Computing

A. Milani Fard
Thesis B.Sc. Thesis, Ferdowsi University of Mashhad (FUM), September 2007

Abstract

In this thesis, the knowledge discovery process is investigated, including current algorithms, methods, and architectures. A new multi-agent system is then designed and developed for a grid knowledge mining process. Game theory and soft computing approaches are also applied to improve knowledge mining and representation. The project also aims at developing a high-performance search engine based on grid technology.

BibTeX

@misc{fard2007intelligent,
title={Intelligent Agent based Grid Data Mining using Game Theory and Soft Computing},
author={Milani Fard, Amin},
year={2007}
}

Game Theory based Data Mining Technique for Strategy Making of a Soccer Simulation Coach Agent

A. Milani Fard, V. Salmani, M. Naghibzadeh, S. Khajouie Nejad, H. Ahmadi
Conference Papers The 6th International Conference on Information Systems Technology and its Applications (ISTA), 2007

Quick Grammar Type Recognition: Concepts and Techniques

A. Milani Fard, A. Deldari, H. Deldari
Conference Papers International Conference on Compilers, Related Technologies and Applications (CoRTA), 2007

Microorganism DNA Pattern Search in a Multi-agent Genomic Engine Framework

M. Mohebbi, M.-R. Akbarzadeh-T., A. Milani Fard
Journal Paper World Applied Sciences Journal, V. 2, N. 6, 2007. Also a poster at the International Conference on Bioinformatics (InCoB), 2007

Multi-agent Data Fusion Architecture for Intelligent Web Information Retrieval

A. Milani Fard, M. Kahani, R. Ghaemi, H. Tabatabaee
Journal Paper International Journal of Intelligent Systems and Technologies, V. 2, N. 3, 2007

Kavosh: An Intelligent Neuro-Fuzzy Search Engine

A. Milani Fard, R. Ghaemi, M.-R. Akbarzadeh-T., H. Akbari
Conference Papers 7th IEEE International Conference on Intelligent Systems Design and Application (ISDA), 2007

Fuzzy Adaptive Resonance Theory for Content-Based Data Retrieval

A. Milani Fard, H. Akbari, M.-R. Akbarzadeh-T.
Conference Papers IEEE International Conference on Innovations in Information Technology (IIT), 2006

A New Genetic Algorithm Approach for Secure JPEG Steganography

A. Milani Fard, M.-R. Akbarzadeh-T., F. Varasteh-A.
Conference Papers IEEE International Conference on Engineering of Intelligent Systems (ICEIS), 2006