I conduct (multi/inter)disciplinary research in Software Engineering, Data Science, and Cybersecurity.

While rapid advances in Large Language Models (LLMs) have made the deployment of automation agents such as AutoGPT and Open Interpreter increasingly feasible, these agents also introduce new security challenges. We contribute to the field of agentic AI by proposing a context-aware, LLM-based safety evaluator that assesses the security implications of actions and instructions generated by LLM-based automation agents before they are executed in real environments. This approach does not require an expensive sandbox, prevents possible system damage from execution, and gathers additional runtime information for risk assessment. Our evaluator uses a semi-emulator tool designed for local, real-time use. Experiments show that environmental feedback from read-only actions can help the safety evaluator generate more accurate risk descriptions.
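A minimal sketch of the "execute read-only, assess the rest" idea follows. It is not the paper's evaluator: the LLM is replaced by fixed rule tables (`READ_ONLY`, `HIGH_RISK`, both hypothetical), and `run` is an injected function so the example never touches a real shell.

```python
import shlex
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical policy tables; the paper's evaluator uses an LLM, not fixed lists.
READ_ONLY = {"ls", "cat", "pwd", "whoami", "df", "stat", "head", "wc"}
HIGH_RISK = {"rm", "dd", "mkfs", "chmod", "chown", "shutdown"}
SHELL_META = set("|&;<>$`")


@dataclass
class Assessment:
    command: str
    executed: bool
    risk: str
    context: List[str] = field(default_factory=list)


def tokens(command: str) -> List[str]:
    try:
        return shlex.split(command)
    except ValueError:  # unbalanced quotes: treat as unparseable
        return []


def is_read_only(command: str) -> bool:
    # Conservative: any shell metacharacter (pipes, redirects, chaining) disqualifies
    if any(ch in SHELL_META for ch in command):
        return False
    toks = tokens(command)
    return bool(toks) and toks[0] in READ_ONLY


def evaluate(command: str, run: Callable[[str], str], probes: List[str]) -> Assessment:
    if is_read_only(command):
        # Read-only actions are executed for real; their output becomes context
        return Assessment(command, True, "low", [run(command)])
    # Otherwise the action is NOT executed; read-only probes gather runtime context
    context = [run(p) for p in probes if is_read_only(p)]
    toks = tokens(command)
    risk = "high" if not toks or toks[0] in HIGH_RISK else "medium"
    return Assessment(command, False, risk, context)
```

In a real deployment `run` might wrap `subprocess.run(..., capture_output=True, text=True)`, and the collected `context` would be passed to the LLM that writes the risk description.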
Decision support systems use LLM embeddings to convert market data into actionable trading insights. However, current financial prediction models often overlook valuable information in short-term intervals (e.g., 4 hours) nested within longer ones (e.g., a day). News disseminated within these shorter periods significantly impacts market movements, even over multiple days. This study aims to determine the effectiveness of incorporating fine-grained information into market prediction models that traditionally rely on coarse-grained data. We design a neural network that simultaneously attends to important news and influential indicators in short-term time slices while also benefiting from market data available in long-term timeframes. Building on advances in contrastive learning-based NLP, we use the Angle-optimized Embedding (AoE) sentence transformer for news representation, which generates discriminative embeddings by leveraging an angle-optimized loss. In addition, to tackle the problem of non-stationary series regression, we employ reversible instance normalization. Comparative results against baseline methods on Forex and cryptocurrency markets demonstrate the superiority of the proposed method. Our ablation studies show that simultaneously using financial market data in both fine-grained hourly and coarse-grained daily time slices improves prediction accuracy by up to 60%. Additionally, using the AoE method to generate informative vector representations of news documents outperformed other embeddings by up to 9.5%.
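Reversible instance normalization can be sketched in a few lines of NumPy. This is a generic illustration of the technique, not the paper's code: each input window is normalized with its own mean and standard deviation, the forecaster (omitted here) works in that normalized space, and its output is mapped back with the same statistics. The affine parameters `gamma` and `beta` would be learned in practice; here they are fixed.

```python
import numpy as np


class RevIN:
    """Reversible instance normalization (sketch; no training loop)."""

    def __init__(self, num_features, eps=1e-5):
        self.eps = eps
        self.gamma = np.ones(num_features)   # learnable affine weight in practice
        self.beta = np.zeros(num_features)   # learnable affine bias in practice

    def normalize(self, x):
        # x: (batch, time, features); statistics are per instance and per feature
        self.mean = x.mean(axis=1, keepdims=True)
        self.std = np.sqrt(x.var(axis=1, keepdims=True) + self.eps)
        return (x - self.mean) / self.std * self.gamma + self.beta

    def denormalize(self, y):
        # Undo the affine transform, then restore each instance's own scale and level
        return (y - self.beta) / self.gamma * self.std + self.mean
```

Because the statistics are stored per instance, a model trained on one price regime can be applied to windows at a very different price level.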
In this work, we categorize a financial time series into a number of subseries with similar behavior to increase prediction accuracy by learning the subseries category. We create a deep learning model for each category based on the attention mechanism to predict its next step. Due to the limited amount of cryptocurrency data for training models, if the number of categories increases, the amount of training data for each model will decrease, and some complex models will not be trained well due to the large number of parameters. To overcome this challenge, we propose to combine the time series data of other cryptocurrencies to increase the amount of data for each category, thus increasing the precision of the models corresponding to each category.
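One way to categorize subseries by behavior is to cut the series into windows, z-normalize each window so only its shape matters, and cluster the shapes. The sketch below assumes k-means with a deterministic farthest-point initialization; the paper does not state that this is its categorization method, so treat `kmeans` and `znorm` as illustrative stand-ins.

```python
import numpy as np


def windows(series, width):
    s = np.asarray(series, dtype=float)
    return np.array([s[i:i + width] for i in range(len(s) - width + 1)])


def znorm(W):
    # Per-window z-normalization so clusters capture shape, not price level
    mu = W.mean(axis=1, keepdims=True)
    sd = W.std(axis=1, keepdims=True) + 1e-8
    return (W - mu) / sd


def kmeans(X, k, iters=50):
    # Deterministic farthest-point initialization, then Lloyd iterations
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

Each resulting label then selects which per-category forecasting model is trained on (and later applied to) that window.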
While Local Differential Privacy (LDP) offers strong privacy guarantees for IoT data collection, users often struggle to understand its implications and control their privacy settings. This paper presents a user-centric approach to implementing LDP in smart home environments, focusing on voice command privacy. We analyze privacy control patterns across major smart home platforms and propose a novel interface that translates complex LDP parameters into four intuitive privacy levels. The interface combines visual controls with concrete examples showing how privacy transformations affect voice commands. By mapping mathematical privacy parameters to user-friendly settings while maintaining theoretical guarantees, our approach explores making differential privacy more accessible in IoT environments. We validated our design through a usability study to understand its strengths in accessibility and key areas for refinement.
                                                        @article{Li2025,
                                                            title={Enhancing User Experience with Visual Controls for Local Differential Privacy},
                                                            author={Li, Xueting and Dong, Shiyao and Milani Fard, Amin},
                                                            journal={Journal of Cybersecurity and Privacy},
                                                            volume={5},
                                                            pages={36},
                                                            year={2025},
                                                            publisher={MDPI}
                                                            }
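To make the "four privacy levels" idea concrete, the sketch below maps level names to ε values and applies generalized randomized response, a standard LDP mechanism, to a categorical voice-command label. The level names, the ε values, and the command categories are all hypothetical; the paper's actual mapping may differ.

```python
import math
import random
from collections import Counter

# Hypothetical mapping of four user-facing levels to epsilon (smaller = more private)
LEVELS = {"basic": 4.0, "balanced": 2.0, "strong": 1.0, "maximum": 0.5}


def keep_probability(epsilon, k):
    # Generalized randomized response: report truth with p = e^eps / (e^eps + k - 1)
    return math.exp(epsilon) / (math.exp(epsilon) + k - 1)


def perturb(value, domain, epsilon, rng):
    p = keep_probability(epsilon, len(domain))
    if rng.random() < p:
        return value
    # Otherwise report one of the other k - 1 values uniformly at random
    return rng.choice([v for v in domain if v != value])


def estimate(reports, domain, epsilon):
    # Unbiased frequency estimate: (observed - q) / (p - q), q = (1 - p) / (k - 1)
    k, n = len(domain), len(reports)
    p = keep_probability(epsilon, k)
    q = (1 - p) / (k - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}
```

The ratio p / q equals e^ε, which is what gives the ε-LDP guarantee; an interface can therefore show users the keep probability for their chosen level instead of the raw ε.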
                                                        
Digital fingerprints have brought great convenience and benefits to many online businesses. However, they pose a significant threat to the privacy and security of ordinary users. In this paper, we investigate the effectiveness of current anti-tracking methods against digital fingerprinting and design a browser extension that can effectively resist fingerprinting and record websites' collection of fingerprint-related information.
                                                            @article{lin2025browser,
                                                            author={Lin, Kaitong and Cao, Huazhu and Milani Fard, Amin},
                                                            title={Browser Fingerprint Detection and Anti-Tracking},
                                                            journal={arXiv preprint arXiv:2502.14326},
                                                            year = {2025}
                                                            }
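A browser extension would intercept fingerprinting APIs inside the page in JavaScript; the sketch below only illustrates the recording and flagging side, analyzing a hypothetical log of `(script_url, api_name)` events. The API set and the threshold of three distinct APIs are assumptions, not values from the paper.

```python
from collections import defaultdict

# Hypothetical set of browser APIs commonly associated with fingerprinting
FINGERPRINT_APIS = {
    "HTMLCanvasElement.toDataURL",
    "CanvasRenderingContext2D.getImageData",
    "WebGLRenderingContext.getParameter",
    "OfflineAudioContext.startRendering",
    "navigator.hardwareConcurrency",
    "navigator.plugins",
    "screen.colorDepth",
}


def flag_fingerprinters(events, threshold=3):
    """events: iterable of (script_url, api_name) pairs recorded by the extension."""
    seen = defaultdict(set)
    for script, api in events:
        if api in FINGERPRINT_APIS:
            seen[script].add(api)
    # A script touching many distinct fingerprinting surfaces is flagged
    return {s: sorted(apis) for s, apis in seen.items() if len(apis) >= threshold}
```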
                                                    
Organizing and managing cryptocurrency portfolios and making decisions on transactions are crucial in this market. Optimal selection of assets is one of the main challenges and requires accurate prediction of cryptocurrency prices. In this work, we categorize the financial time series into several subseries with similar behavior to increase prediction accuracy by learning each subseries category. For each category, we create a deep learning model based on the attention mechanism to predict the next step of each subseries. Due to the limited amount of cryptocurrency data for training models, as the number of categories increases, the amount of training data for each model decreases, and some complex models with many parameters will not be trained well. To overcome this challenge, we propose to combine the time series data of other cryptocurrencies to increase the amount of data for each category, hence increasing the accuracy of the models corresponding to each category.
                                                            @article{peik2024leveraging,
                                                            author={Peik, Arash and Zare Chahooki, Mohammad Ali and Milani Fard, Amin and Agha Sarram, Mehdi},
                                                            title={Leveraging Time Series Categorization and Temporal Fusion Transformers to Improve Cryptocurrency Price Forecasting},
                                                            journal={arXiv preprint arXiv:2412.14529},
                                                            year      = {2024}
                                                            }
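Pooling data from several cryptocurrencies only works if windows from coins with very different price levels are comparable. The sketch below assumes category centers already exist (for example from clustering z-normalized windows), assigns every window of every coin to its nearest center, and normalizes the next-step target with the window's own statistics. The function name and normalization choice are illustrative assumptions.

```python
import numpy as np
from collections import defaultdict


def pool_by_category(series_by_coin, centers, width):
    # category -> (list of normalized windows, list of normalized next-step targets)
    pools = defaultdict(lambda: ([], []))
    for coin, series in series_by_coin.items():
        s = np.asarray(series, dtype=float)
        for i in range(len(s) - width):
            w = s[i:i + width]
            mu, sd = w.mean(), w.std() + 1e-8
            z = (w - mu) / sd
            cat = int(np.argmin(((centers - z) ** 2).sum(axis=1)))
            pools[cat][0].append(z)
            pools[cat][1].append((s[i + width] - mu) / sd)  # scale-free target
    return {c: (np.array(X), np.array(y)) for c, (X, y) in pools.items()}
```

With this representation, a rising window from a coin priced in the tens and one priced in the tens of thousands contribute identical training examples to the same category.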
                                                    
JavaScript has consistently been among the most popular programming languages over the past decade. However, its dynamic, weakly typed, and asynchronous nature can make it challenging for developers without in-depth knowledge of the language to write maintainable code. Consequently, many JavaScript applications tend to contain code smells that adversely influence program comprehension, maintenance, and debugging. Given the widespread use of JavaScript, code security is an important matter. While JavaScript code smells and detection techniques have been studied in the past, current work on security smells for JavaScript is scarce. Security code smells are coding patterns indicative of potential vulnerabilities or security weaknesses. Identifying security code smells can help developers focus on areas where additional security measures may be needed. We present a set of 24 JavaScript security code smells, map them to possible security weaknesses defined by the Common Weakness Enumeration (CWE), explain possible refactorings, and describe our detection mechanism. We implement our security code smell detection on top of an existing open-source tool that was proposed to detect general code smells in JavaScript.
                                                            @article{2024characterizing,
                                                            author={Kambhampati, Vikas and Mohammed, Nehaz Hussain and Milani Fard, Amin},
                                                            title={Characterizing JavaScript Security Code Smells},
                                                            journal={arXiv preprint arXiv:2411.19358},
                                                            year      = {2024}
                                                            }
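The smell-to-CWE mapping can be illustrated with a deliberately simple line-based scanner. The four rules below are hypothetical examples rather than the paper's 24 smells, and regular expressions are a crude stand-in for the AST-based analysis an existing smell-detection tool would perform.

```python
import re

# Hypothetical rules: (pattern, smell name, CWE identifier)
RULES = [
    (re.compile(r"\beval\s*\("), "eval call", "CWE-95"),
    (re.compile(r"\bnew\s+Function\s*\("), "Function constructor", "CWE-95"),
    (re.compile(r"\.innerHTML\s*=(?!=)"), "innerHTML assignment", "CWE-79"),
    (re.compile(r"\bMath\.random\s*\("), "non-cryptographic PRNG", "CWE-338"),
]


def scan(source):
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, smell, cwe in RULES:
            if pattern.search(line):
                findings.append((lineno, smell, cwe))
    return findings
```

The negative lookahead `(?!=)` keeps comparisons such as `el.innerHTML == ''` from being reported as assignments.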
                                                    
Investors’ trading behavior is influenced by multiple modes of information, such as technical analysis, news dissemination, and sentiment, which results in the non-stationary behavior of financial time series. With advances in deep learning, a growing number of studies consider temporal relationships in each data mode and apply heterogeneous data fusion techniques for market prediction. While net price change prediction is helpful for investors, most previous deep learning models only predict the up/down trend of price, because the non-stationary behavior of price time series degrades regression performance. In this work, we present an adaptive model for price regression that learns interdependencies between the distribution of multimode data and the amount of price change around an average price for snapshots of the system. We use news content, the mood in specialized newsgroups, and technical indicators for data representation. Investors can absorb different news topics, also known as modalities, at different diffusion speeds; hence, we use a concept-based news representation method that reflects news topics in a news vector. Our model also considers the positive/negative mood in specialized newsgroups and technical indicators. To capture complex temporal characteristics in the distribution of economic concepts in the news sequence, we use a recurrent convolutional neural network, and we use additional recurrent layers to perceive changes in technical indicators and in the mood of specialized newsgroups. In the fusion layer, our model learns to normalize data points based on their estimated distribution and the importance weight of each data mode to handle multimodality challenges. To overcome the non-stationary behavior of price, we let the network learn how to drift the predicted values around the average price of that packet. Our experiments demonstrate a significant 40.11% error reduction compared to the baselines. We also discuss the adaptability and price prediction capability of our proposed approach.
                                                        @article{farimani2024adaptive,
                                                            title={An Adaptive Multimodal Learning Model for Financial Market Price Prediction},
                                                            author={Anbaee Farimani, Saeede and Jahan, Majid Vafaei and Milani Fard, Amin},
                                                            journal={IEEE Access},
                                                            volume={12},
                                                            pages={121846--121863},
                                                            year={2024},
                                                            doi={10.1109/ACCESS.2024.3441029},
                                                            publisher={IEEE}
                                                            }
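The idea of predicting a drift around an average price, rather than the raw price, can be shown with a linear least-squares model standing in for the paper's neural network. Features are each window's deviations from its mean, the target is the next price minus that mean, and the prediction is the mean plus the learned offset. The function names are hypothetical.

```python
import numpy as np


def offset_dataset(prices, width):
    # Features: window deviations from the window mean; target: next price minus that mean
    p = np.asarray(prices, dtype=float)
    X, y = [], []
    for i in range(len(p) - width):
        w = p[i:i + width]
        X.append(w - w.mean())
        y.append(p[i + width] - w.mean())
    return np.array(X), np.array(y)


def fit_linear(X, y):
    A = np.hstack([X, np.ones((len(X), 1))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_next(coef, window):
    w = np.asarray(window, dtype=float)
    offset = np.append(w - w.mean(), 1.0) @ coef
    return w.mean() + offset  # drift the prediction around the window's average price
```

Because neither features nor target depend on the absolute price level, a model fitted around a price of 100 still produces sensible predictions for windows around 1000.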
                                                        
An intrusion detection system (IDS), whether a device or a software-based agent, plays a significant role in network and system security by continuously monitoring traffic behaviour to detect malicious activities. The literature includes IDSs that leverage models trained to detect known attack behaviours. However, such models can suffer from low accuracy or high overfitting. This work aims to enhance IDS performance by building a model from observed traffic, applying different single and ensemble classifiers, and reducing the classifiers’ overfitting on a reduced set of features. We implement various feature reduction techniques, including Linear Regression, LASSO, Random Forest, Boruta, and autoencoders, on the CSE-CIC-IDS2018 dataset to provide training sets for classifiers including Decision Tree, Naïve Bayes, neural networks, Random Forest, and XGBoost. Our experiments show that the Decision Tree classifier on autoencoder-based reduced feature sets yields the lowest overfitting among all combinations.
                                                        @article{AhmadiAbkenari2023,
                                                            title={Hybrid Machine Learning-Based Approaches for Feature and Overfitting Reduction to Model Intrusion Patterns},
                                                            author={Ahmadi Abkenari, Fatemeh and Milani Fard, Amin and Khanchi, Sara},
                                                            journal={Journal of Cybersecurity and Privacy},
                                                            volume={3},
                                                            pages={544--557},
                                                            year={2023},
                                                            publisher={MDPI}
                                                            }
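The evaluation loop (reduce features, train a classifier, compare training and test accuracy) can be sketched without any ML library. Here an absolute-correlation filter stands in for LASSO, Random Forest, Boruta, or autoencoders, and a depth-1 decision tree (a stump) stands in for the full classifiers; both substitutions are assumptions made to keep the example short. Overfitting is measured as the train minus test accuracy gap.

```python
import numpy as np


def select_features(X, y, k):
    # Rank features by |Pearson correlation| with the label (simple filter stand-in)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    score = np.abs(Xc.T @ yc) / denom
    return np.argsort(score)[::-1][:k]


def fit_stump(X, y):
    best = (-1.0, 0, 0.0, 1)  # (accuracy, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = (pol * (X[:, j] - t) > 0).astype(int)
                acc = (pred == y).mean()
                if acc > best[0]:
                    best = (acc, j, t, pol)
    return best[1:]


def predict_stump(stump, X):
    j, t, pol = stump
    return (pol * (X[:, j] - t) > 0).astype(int)


def overfitting_gap(stump, Xtr, ytr, Xte, yte):
    acc = lambda X, y: float((predict_stump(stump, X) == y).mean())
    return acc(Xtr, ytr) - acc(Xte, yte)
```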
                                                        
Smart contracts are self-executing programs that run on the blockchain and make it possible for peers to enforce agreements without a third-party guarantee. Smart contracts on Ethereum are the fundamental building blocks of decentralized finance, which holds billions of US dollars in value. Smart contracts cannot be changed after deployment, and hence their code needs to be verified for potential vulnerabilities. However, smart contracts are far from secure, and attacks exploiting their vulnerabilities have led to losses valued in the millions. In this work, we explore the current state of smart contract security, prevalent vulnerabilities, and security-analysis tool support by reviewing the latest advancements and research published in the past five years. We study 13 vulnerabilities in Ethereum smart contracts and their countermeasures, and investigate nine security-analysis tools. Our findings indicate that a uniform set of smart contract vulnerability definitions does not exist in research work, and bugs pertaining to the same mechanisms sometimes appear under different names. This inconsistency makes it difficult to identify, categorize, and analyze vulnerabilities. We explain some safeguarding approaches and best practices. However, as technology improves, new vulnerabilities may emerge. Regarding tool support, SmartCheck, DefectChecker, contractWard, and sFuzz are better choices in terms of vulnerability coverage, whereas tools such as NPChecker, MadMax, Osiris, and Sereum target specific categories of vulnerabilities if required. While contractWard is relatively fast and more accurate, it can only detect predefined vulnerabilities. NPChecker is slower but can find new vulnerability patterns.
                                                        @article{Zhou2022,
                                                            title={The State of Ethereum Smart Contracts Security: Vulnerabilities, Countermeasures, and Tool Support},
                                                            author={Haozhe Zhou and Amin Milani Fard and Adetokunbo Makanju},
                                                            journal={Journal of Cybersecurity and Privacy},
                                                            volume={2},
                                                            pages={358--378},
                                                            year={2022},
                                                            publisher={MDPI}
                                                            }
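Reentrancy is a classic example of a smart contract vulnerability whose countermeasure is easy to state. The Python simulation below (not Solidity, and not taken from the paper) models a contract that pays out through an external callback. In the vulnerable ordering the payout happens before the balance is cleared, so the attacker's callback can re-enter `withdraw`; the safe ordering follows the checks-effects-interactions pattern.

```python
class Bank:
    """Toy model of a contract holding per-user balances."""

    def __init__(self, safe):
        self.safe = safe
        self.balances = {}
        self.reserve = 0

    def deposit(self, user, amount):
        self.balances[user] = self.balances.get(user, 0) + amount
        self.reserve += amount

    def withdraw(self, user, receive):
        amount = self.balances.get(user, 0)
        if amount == 0 or self.reserve < amount:
            return
        if self.safe:
            # Checks-effects-interactions: update state before the external call
            self.balances[user] = 0
            self.reserve -= amount
            receive(amount)
        else:
            # Vulnerable ordering: external call happens before the balance is cleared
            self.reserve -= amount
            receive(amount)
            self.balances[user] = 0


class Attacker:
    def __init__(self, bank, max_depth=20):
        self.bank, self.stolen, self.depth, self.max_depth = bank, 0, 0, max_depth

    def receive(self, amount):
        self.stolen += amount
        if self.depth < self.max_depth:  # re-enter withdraw from inside the payout
            self.depth += 1
            self.bank.withdraw("attacker", self.receive)


def run(safe):
    bank = Bank(safe)
    bank.deposit("honest", 90)
    bank.deposit("attacker", 10)
    attacker = Attacker(bank)
    bank.withdraw("attacker", attacker.receive)
    return attacker.stolen, bank.reserve
```

With the vulnerable ordering the attacker's 10-unit deposit drains the whole 100-unit reserve; with the safe ordering the re-entrant call sees a zero balance and stops.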
                                                        
News dissemination in social media causes fluctuations in financial markets. (Scope) Recent advances in deep learning-based natural language processing have shown promising results in financial market analysis. However, understanding how to leverage large amounts of textual data alongside financial market information is important for analyzing investor behavior. In this study, we review over 150 publications in the field of behavioral finance that jointly investigated natural language processing (NLP) approaches and market data analysis for financial decision support. This work differs from other reviews by focusing on applied publications in computer science and artificial intelligence that contributed to heterogeneous information fusion for investor behavior analysis. (Goal) We study various text representation methods, sentiment analysis, and information retrieval methods from heterogeneous data sources. (Findings) We present current and future research directions in text mining and deep learning for correlation analysis, forecasting, and recommendation systems in financial markets, such as stocks, cryptocurrencies, and Forex (Foreign Exchange Market).
                                                        @article{farimani2022information,
                                                            title={From Text Representation to Financial Market Prediction: A Literature Review},
                                                            author={Saeede Anbaee Farimani and Majid Vafaei Jahan and Amin Milani Fard},
                                                            journal={Information},
                                                            year={2022},
                                                            volume={13},
                                                            number={10},
                                                            article-number={466},
                                                            url={https://www.mdpi.com/2078-2489/13/10/466},
                                                            issn={2078-2489}
                                                            }
                                                        
A real-time market prediction tool that tracks public opinion in specialized newsgroups together with informative market data is valuable to investors in financial markets. Previous works mainly used lexicon-based sentiment analysis for financial market prediction, while recently proposed transformer-based sentiment analysis models promise good results for cross-domain sentiment analysis. This work considers temporal relationships between consecutive snapshots of informative market data and mood time series for market price prediction. We calculate the sentiment mood time series from the probability distribution of news embeddings generated by a BERT-based transformer language model fine-tuned for financial-domain sentiment analysis. We then use a deep recurrent neural network for feature extraction, followed by a dense layer for price regression. We implemented our approach as an open-source API for real-time price regression. We build a corpus of financial news related to currency pairs in the foreign exchange and cryptocurrency markets. We further augment our model with informative technical indicators and news sentiment scores aligned by news release timestamp. Results of our experiments show significant error reduction compared to the baselines. Our Financial News and Financial Sentiment Analysis RESTful APIs are available for public use.
                                                        @article{farimani2022investigating,
                                                            title={Investigating the informativeness of technical indicators and news sentiment in financial market price prediction},
                                                            author={Saeede Anbaee Farimani and Majid Vafaei Jahan and Amin Milani Fard and Seyed Reza Kamel Tabbakh},
                                                            journal={Knowledge-Based Systems},
                                                            year={2022},
                                                            publisher={Elsevier}
                                                            }
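A mood time series can be built by bucketing news items by release time and aggregating the classifier's class probabilities in each bucket. The sketch assumes each item carries `positive`/`negative` probabilities from a financial sentiment model, defines hourly mood as the mean of `positive - negative`, and fills hours without news with a neutral 0.0; all three choices are assumptions for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def mood_series(news, start, end, step=timedelta(hours=1)):
    """news: iterable of (timestamp, {"positive": p, "negative": q, ...})."""
    buckets = defaultdict(list)
    for ts, probs in news:
        if start <= ts < end:
            idx = int((ts - start) / step)  # align by news release timestamp
            buckets[idx].append(probs["positive"] - probs["negative"])
    n = int((end - start) / step)
    # Hours without news are treated as neutral (0.0), a hypothetical choice
    return [sum(buckets[i]) / len(buckets[i]) if buckets[i] else 0.0 for i in range(n)]
```

The resulting list lines up hour by hour with technical-indicator snapshots, so both can be fed to the same recurrent model.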
                                                        
Studying frequent code vulnerabilities in similar code, such as clones, near-duplicates, forked projects, or libraries, can help in the automated detection of security flaws during the software development process. In this work, we conduct an empirical study on vulnerabilities in C/C++ code to characterize security flaws and find out whether the same vulnerabilities exist in applications that share similar code or have the same business logic/domain. We analyze a code vulnerability dataset including 315 projects with 3,284 security issues in 10,880 functions. Our results show that vulnerable functions in 35% of the most frequently occurring CWEs (software weakness types) have similar code, and 23% of projects in the same domain/category have the same vulnerabilities. We observe that the most prevalent vulnerabilities in similar code are Use After Free, Improper Access Control, Cryptographic Issues, 7PK - Security Features, Double Free, Cross-site Scripting, and Divide By Zero. These vulnerabilities are, however, less frequent than other CWEs across all subjects. Our results suggest that automated vulnerability detection tools based on code similarity or abstract patterns can be tailored more towards certain CWEs.
                                                        @inproceedings{QRS2021,
                                                            title={Vulnerability Analysis of Similar Code},
                                                            author={Piran, Azin and Chang, Che-Pin and Milani Fard, Amin},
                                                            booktitle={Proceedings of the 21st IEEE International Conference on Software Quality, Reliability and Security (QRS)},
                                                            year={2021}
                                                        }
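A lightweight way to find candidate similar functions is token-set Jaccard similarity. The sketch below is an assumption for illustration, not the paper's method; it shows how two functions that differ only in identifier names score as near-duplicates, after which their CWE labels could be compared.

```python
import re
from itertools import combinations

# Identifiers, integer literals, or any single non-space character
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokens(code):
    return set(TOKEN.findall(code))


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def similar_pairs(functions, threshold=0.7):
    """functions: dict mapping a function name to its source code."""
    return [(x, y) for (x, cx), (y, cy) in combinations(functions.items(), 2)
            if jaccard(cx, cy) >= threshold]
```

Set-based similarity ignores token order, so real clone detectors usually add sequence or AST information; the threshold of 0.7 is also a hypothetical value.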
                                                        
Most existing news-based market prediction techniques disregard conceptual and emotional relations in the news stream. In this work, we consider the conceptual relationship between news documents using contextualized latent concept modeling, as well as leveraging news sentiment and technical indicators. We present our approach as an open-source RESTful API. We build a corpus of financial news related to currency pairs in the Foreign Exchange and Cryptocurrency markets. Next, we apply BERT-based embedding to generate word vectors, cluster the vectors to create latent economic concepts, and propose a document representation based on the distribution of words over these concepts as well as news sentiment. We use a recurrent convolutional neural network to jointly process BERT-based text representations and technical indicator embeddings for market time series prediction. We further augment our model with technical indicators using another recurrent layer. The experimental results show the superiority of our method compared to the baselines. Our MarketNews dataset, news crawler, and MarketPredict APIs are available for public use.
                                                        @inproceedings{farimani2021leveraging,
                                                            title={Leveraging Latent Economic Concepts and Sentiments in the News for Market Prediction},
                                                            author={Anbaee Farimani, Saeede and Jahan, Majid Vafaei and Milani Fard, Amin and Haffari, Gholamreza},
                                                            booktitle={2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)},
                                                            pages={1--10},
                                                            year={2021},
                                                            organization={IEEE}
                                                        }
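The concept-based document representation can be sketched as follows, assuming the concept centroids were already obtained by clustering contextual word vectors (for example with k-means). Each word in a document is assigned to its nearest concept, the document becomes the normalized histogram over concepts, and a sentiment score is appended. The function names and the Euclidean distance are illustrative assumptions.

```python
import numpy as np


def concept_distribution(word_vecs, centroids):
    # Assign each word vector to its nearest concept centroid (Euclidean distance)
    d = ((word_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    counts = np.bincount(d.argmin(axis=1), minlength=len(centroids))
    return counts / counts.sum()


def document_vector(word_vecs, centroids, sentiment):
    # Concept distribution followed by a scalar news sentiment score
    return np.concatenate([concept_distribution(word_vecs, centroids), [sentiment]])
```

A sequence of such vectors, one per news snapshot, is the kind of input a recurrent convolutional network could consume alongside technical indicators.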
Multifractal detrended cross-correlation analysis (MFDCCA) is widely used to analyze non-stationary financial time series. Existing methods for such analysis detrend each segment by fitting a polynomial to the time series itself. We propose a technique for more accurate removal of local trends, called indicator-based MFDCCA (IMFDCCA), which leverages market technical indicators to better determine correlations between financial time series. We evaluated our method on pair trading in the Foreign Exchange Market (Forex), and our results show that, compared to MFDCCA, the proposed IMFDCCA reduces the RMSE of Hurst exponent estimation by 30%.
                                                        @inproceedings{alamatian2021,
                                                            title={Using Market Indicators to Eliminate Local Trends for Financial Time Series Cross-Correlation Analysis},
                                                            doi = {10.21428/594757db.8d600287},
                                                            author={Alamatian, Zohreh and Vafaei Jahan, Majid and Milani Fard, Amin},
                                                            booktitle={Proceedings of the 34th Canadian Conference on Artificial Intelligence (Canadian AI)},
                                                            url = {https://caiac.pubpub.org/pub/jtymmz06},
                                                            year={2021}
                                                        }
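The baseline that IMFDCCA improves on can be sketched as standard MFDCCA with polynomial detrending: build the profiles of both series, split them into segments of size s, remove a fitted polynomial in each segment, average the residual covariances into a q-th order fluctuation function, and read the generalized Hurst exponent from the log-log slope. The `detrend` parameter marks where an indicator-based trend would be substituted; the paper's indicator-based detrending itself is not reproduced here, and only forward segments are used for brevity.

```python
import numpy as np


def profile(x):
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - x.mean())


def poly_detrend(seg, order=1):
    # Classic MFDCCA: subtract a polynomial fitted to the segment itself
    t = np.arange(len(seg))
    return seg - np.polyval(np.polyfit(t, seg, order), t)


def fluctuation(x, y, s, q, detrend=poly_detrend):
    X, Y = profile(x), profile(y)
    f2 = []
    for v in range(len(X) // s):
        sl = slice(v * s, (v + 1) * s)
        f2.append(np.mean(detrend(X[sl]) * detrend(Y[sl])))  # residual covariance
    f2 = np.abs(np.array(f2))
    if q == 0:
        return np.exp(0.5 * np.mean(np.log(f2)))
    return np.mean(f2 ** (q / 2)) ** (1 / q)


def hurst(x, y, q=2, scales=(16, 32, 64, 128, 256), detrend=poly_detrend):
    F = [fluctuation(x, y, s, q, detrend) for s in scales]
    slope, _ = np.polyfit(np.log(scales), np.log(F), 1)  # F_q(s) ~ s^h(q)
    return slope
```

For uncorrelated noise the estimated exponent is close to 0.5, and for a random walk it is close to 1.5, which gives a quick sanity check of any alternative `detrend` function.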
                                                        
** Selected in the Journal-First track of the 43rd International Conference on Software Engineering (ICSE), 2021 **
Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Findings include: (1) developers' wellbeing and productivity are suffering; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity; (4) women, parents and people with disabilities may be disproportionately affected. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.
                                                        @article{ralph2020pandemic,
                                                            title={Pandemic Programming: How COVID-19 affects software developers and how their organizations can help},
                                                            author={Paul Ralph and Sebastian Baltes and Gianisa Adisaputri and Richard Torkar and Vladimir Kovalenko and Marcos Kalinowski and Nicole Novielli and Shin Yoo and Xavier Devroey and Xin Tan and Minghui Zhou and Burak Turhan and Rashina Hoda and Hideaki Hata and Gregorio Robles and Amin Milani Fard and Rana Alkadhi},
                                                            journal={Empirical Software Engineering},
                                                            year={2020},
                                                            publisher={Springer US}
                                                            }
                                                        
Predicting the performance of a blockchain application during the design phase is difficult, and evaluating it after it is built can be expensive. The ability to simulate a blockchain network during the design stage in order to evaluate it is therefore a necessity. In this paper, we present a simulator for blockchain applications called SIMBA (SIMulator for Blockchain Applications). SIMBA extends an existing simulator by adding Merkle trees to blockchain nodes, which improves efficiency and enables more realistic evaluations that are not possible with the base tool. Results of our experiments show that including Merkle trees reduces the verification time of block transactions by up to 30 times without affecting block propagation delay. Since block verification is a critical part of the computational load of nodes on the network, this performance improvement significantly affects the overall performance of each node and, consequently, of the entire network.
                                                         @inproceedings{SIMBA2020,
                                                            author = {Fattahi, Seyed Mehdi and Makanju, Adetokunbo and Milani Fard, Amin},
                                                            title = {{SIMBA}: An Efficient Simulator for Blockchain Applications},
                                                            booktitle = {Proceedings of the International Conference on Dependable Systems and Networks (DSN)},
                                                            publisher = {IEEE},
                                                            year = {2020}
                                                            }
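A Merkle tree lets a node check that a transaction belongs to a block using only a logarithmic number of hashes instead of rehashing every transaction. The sketch below uses single SHA-256 from `hashlib` and duplicates the last node on odd-length levels (the Bitcoin convention, although Bitcoin itself uses double SHA-256); it is a generic illustration, not SIMBA's implementation, and assumes at least one leaf.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate last node on odd-length levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels  # levels[-1][0] is the Merkle root


def merkle_proof(levels, index):
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        # A node without a sibling was paired with itself when building the tree
        proof.append((level[sibling] if sibling < len(level) else level[index], index % 2))
        index //= 2
    return proof


def verify(leaf, proof, root):
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root
```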
                                                        
** Best Paper Award **
Most real-world information networks, such as social networks, are heterogeneous; relationships in these networks can be of different types and hence carry different semantics. Therefore, techniques for link prediction in homogeneous networks cannot be directly applied to heterogeneous ones. On the other hand, works that investigate link prediction in heterogeneous networks do not necessarily consider network dynamism over sequential time intervals. In this work, we propose a technique that leverages a combination of latent and topological features to predict a target relationship between two nodes in a dynamic heterogeneous information network. Our technique, called MetaDynaMix, effectively combines meta path-based topological features and inferred latent features that incorporate temporal network changes, in order to capture network (1) heterogeneity and (2) temporal evolution when making link predictions. Our experimental results on two real-world datasets show statistically significant improvements in AUC-ROC and prediction accuracy compared to state-of-the-art techniques.
                                                        @inproceedings{amin:ecir19,
                                                        author = {Milani Fard, Amin and Bagheri, Ebrahim and Wang, Ke},
                                                        title = {Relationship Prediction in Dynamic Heterogeneous Information Networks},
                                                        booktitle = {Proceedings of the European Conference on Information Retrieval (ECIR)},
                                                        publisher = {Springer},
                                                        pages = {12 pages},
                                                        year = {2019}
                                                        }
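As a hedged illustration of the topological half of this approach only, a widely used meta path feature is the number of path instances between two nodes. For a meta path such as Author-Paper-Author, that count is the product of the relation adjacency matrices along the path. Computing it once per time snapshot gives a simple series that exposes temporal change. The latent-feature part of MetaDynaMix is not shown, and the feature definitions, function names, and toy data are assumptions rather than the paper's method.

```python
from typing import Dict, List, Tuple

Matrix = List[List[int]]
Relations = Dict[Tuple[str, str], Matrix]  # (source type, target type) -> 0/1 adjacency


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Integer matrix product of an n x m and an m x p matrix."""
    if not a or not b or len(a[0]) != len(b):
        raise ValueError("incompatible shapes")
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def _step(relations: Relations, src: str, dst: str) -> Matrix:
    """Adjacency for src -> dst, transposing the stored dst -> src relation if needed."""
    if (src, dst) in relations:
        return relations[(src, dst)]
    return transpose(relations[(dst, src)])


def meta_path_counts(relations: Relations, path: List[str]) -> Matrix:
    """Path-instance counts between all end-node pairs along a meta path such as ["A", "P", "A"]."""
    if len(path) < 2:
        raise ValueError("a meta path needs at least two node types")
    result = _step(relations, path[0], path[1])
    for src, dst in zip(path[1:], path[2:]):
        result = matmul(result, _step(relations, src, dst))
    return result


def pair_series(snapshots: List[Relations], path: List[str], i: int, j: int) -> List[int]:
    """Meta-path count for the node pair (i, j) in each time snapshot."""
    return [meta_path_counts(rel, path)[i][j] for rel in snapshots]


# Toy snapshot: authors a0..a2 and papers p0, p1; a0 wrote both papers.
writes = [[1, 1],
          [1, 0],
          [0, 1]]
print(meta_path_counts({("A", "P"): writes}, ["A", "P", "A"]))
# [[2, 1, 1], [1, 1, 0], [1, 0, 1]]
```

In the toy graph, authors a1 and a2 share no paper, so their Author-Paper-Author count is 0; if a later snapshot adds a shared paper, pair_series reports the change for that pair.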
** Best Paper Award Nominee **
Testing JavaScript code is important. JavaScript has grown to be among the most popular programming languages, and it is extensively used to create web applications on both the client and the server. We present the first empirical study of JavaScript tests to characterize their prevalence, quality metrics (e.g., code coverage), and shortcomings. We perform our study across a representative corpus of 373 JavaScript projects, with over 5.4 million lines of JavaScript code. Our results show that 22% of the studied subjects do not have test code. About 40% of projects with client-side JavaScript do not have tests, while this is only about 3% for purely server-side JavaScript projects. Also, tests for server-side code are of high quality (in terms of code coverage, test code ratio, test commit ratio, and average number of assertions per test), while tests for client-side code are of moderate to low quality. In general, tests written with the Mocha, Tape, Tap, and Nodeunit frameworks are of high quality, and those written without any framework are of low quality. We scrutinize the (un)covered parts of the code under test to identify the root causes of uncovered code. Our results show that JavaScript tests lack proper coverage for event-dependent callbacks (36%), asynchronous callbacks (53%), and DOM-related code (63%). We believe that it is worthwhile for the developer and research communities to focus on testing techniques and tools that achieve better coverage for hard-to-cover JavaScript code.
                                                        @inproceedings{amin:icst17,
                                                        author = {Milani Fard, Amin and Mesbah, Ali},
                                                        title = {JavaScript: The (Un)covered Parts},
                                                        booktitle = {Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST)},
                                                        publisher = {IEEE},
                                                        pages = {11 pages},
                                                        year = {2017}
                                                        }
The advent of web technologies has led to the proliferation of modern web applications with enhanced user interaction and client-side execution. JavaScript (one of the most widely used programming languages) is extensively used to build responsive modern web applications. The event-driven and dynamic nature of JavaScript, and its interaction with the Document Object Model (DOM), make it challenging to understand and test effectively. The ultimate goal of this thesis is to improve the quality of web applications through automated testing and maintenance. The work presented in this dissertation has focused on advancing the state of the art in testing and maintaining web applications by proposing a new set of techniques and tools. We proposed (1) a feedback-directed exploration technique and a tool to cover a subset of the state space of a given web application; the exploration is guided towards achieving higher functionality, navigational, and page structural coverage while reducing the test model size; (2) a technique and a tool to generate UI tests using existing tests; it mines the existing test suite to infer a model of the covered DOM states and event-based transitions, including input values and assertions, then expands the inferred model by exploring alternative paths and generates assertions for the new states, and finally generates a new test suite from the extended model; (3) the first empirical study on JavaScript tests to characterize their prevalence and quality metrics, and to find the root causes of the uncovered (missed) parts of the code under test; (4) a DOM-based JavaScript test fixture generation technique and a tool based on dynamic symbolic execution, which guides execution through different branches of a function by producing the expected DOM instances; and (5) a technique and a tool to detect JavaScript code smells using static and dynamic analysis. We evaluated the presented techniques by conducting various empirical studies and comparisons.
The evaluation results point to the effectiveness of the proposed techniques in terms of fault detection capability and code coverage for test generation, and in terms of accuracy for code smell detection.
                                                        @phdthesis{amin:thesis2017,
                                                        author = {Milani Fard, Amin},
                                                        title = {Directed test generation and analysis for web applications},
                                                        series={Electronic Theses and Dissertations (ETDs) 2008+},
                                                        url={https://open.library.ubc.ca/cIRcle/collections/24/items/1.0340953},
                                                        DOI={10.14288/1.0340953},
                                                        school={University of British Columbia},
                                                        year={2017},
                                                        month={Jan},
                                                        collection={Electronic Theses and Dissertations (ETDs) 2008+}
                                                        }
In today's web applications, JavaScript code interacts with the Document Object Model (DOM) at runtime. This runtime interaction between JavaScript and the DOM is error-prone and challenging to test. In order to unit test a JavaScript function that has read/write DOM operations, a DOM instance has to be provided as a test fixture. This DOM fixture needs to be in the exact structure expected by the function under test. Otherwise, the test case can terminate prematurely due to a null exception. Generating these fixtures is challenging due to the dynamic nature of JavaScript and the hierarchical structure of the DOM. We present an automated technique, based on concolic execution, which generates test fixtures for unit testing JavaScript functions. Our approach is implemented in a tool called ConFix. Our empirical evaluation shows that ConFix can effectively generate tests that cover DOM-dependent paths. We also find that ConFix yields considerably higher coverage compared to an existing JavaScript input generation technique.
                                                        @inproceedings{amin:ase15,
                                                        author = {Milani Fard, Amin and Mesbah, Ali and Wohlstadter, Eric},
                                                        title = {Generating Fixtures for JavaScript Unit Testing},
                                                        booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE)},
                                                        publisher = {ACM},
                                                        pages = {190--200},
                                                        year = {2015}
                                                        }
                                                    
Testing aims at detecting (regression) bugs in production code. However, test code is just as likely to contain bugs as the code it tests. Buggy test cases can silently miss bugs in the production code or loudly ring false alarms when the production code is correct. We present the first empirical study of bugs in test code to characterize their prevalence and root cause categories. We mine the bug repositories and version control systems of 211 Apache Software Foundation (ASF) projects and find 5,556 test-related bug reports. We (1) compare properties of test bugs with production bugs, such as active time and fixing effort needed, (2) qualitatively study 443 randomly sampled test bug reports in detail and categorize them based on their impact and root causes, and (3) run FindBugs on the test code of the latest version of the projects to discover potential (undiscovered) bugs. Our results show that (1) around half of all the projects had bugs in their test code; (2) the majority of test bugs are false alarms, i.e., the test fails while the production code is correct, while a minority of these bugs result in silent horrors, i.e., the test passes while the production code is incorrect; (3) incorrect and missing assertions are the dominant root cause of silent horror bugs; (4) semantic (25%), flaky (21%), and environment-related (18%) bugs are the dominant root cause categories of false alarms; (5) the majority of false alarm bugs happen in the exercise portion of the tests; and (6) developers contribute more actively to fixing test bugs, and test bugs are fixed sooner than production bugs.
                                                         @inproceedings{arash:icsme15,
                                                            author = {Vahabzadeh, Arash and Milani Fard, Amin and Mesbah, Ali},
                                                            title = {An Empirical Study of Bugs in Test Code},
                                                            booktitle = {Proceedings of the International Conference on Software Maintenance and Evolution (ICSME)},
                                                            publisher = {IEEE Computer Society},
                                                            pages = {101--110},
                                                            year = {2015}
                                                            }
                                                        
Social network analysis has many important applications, but it depends on sharing and publishing the underlying graph. Link privacy requires limiting the ability of an adversary to infer the presence of a sensitive link between two individuals in the published social network graph. A standard technique for achieving link privacy is to probabilistically randomize a link over the space of node pairs. A major drawback of such graph-wise randomization is that it ignores the structural proximity of nodes and thus considerably alters the structure of social networks and distorts the accuracy of social network analysis. To address this problem, we propose a structure-aware randomization scheme called neighborhood randomization. This scheme models a social network as a directed graph and probabilistically randomizes the destination of a link within a local neighborhood. By confining the randomization to a local neighborhood, this scheme drastically reduces the distortion to the graph structure while still hiding sensitive links. The trade-off between privacy and utility is dictated by the retention probability of a destination and by the size of the randomization neighborhood. We conduct extensive experiments to evaluate this trade-off using real-life social network data.
                                                        @article{amin:wwwj,
                                                            title={Neighborhood randomization for link privacy in social network analysis},
                                                            author={Milani Fard, Amin and Wang, Ke},
                                                            journal={World Wide Web},
                                                            volume={18},
                                                            number={1},
                                                            pages={9--32},
                                                            year={2015},
                                                            publisher={Springer US}
                                                            }
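The abstract does not spell out how the local neighborhood is defined, so the sketch below assumes it is the set of nodes within k hops of the original destination (ignoring edge direction). It keeps every link's source, retains the destination with probability p_retain, and otherwise redraws the destination uniformly from that neighborhood minus the source. This is a minimal illustration under those assumptions, not the paper's algorithm; all names are hypothetical.

```python
import random
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]  # (source, destination) of a directed link


def k_hop_neighborhood(adj: Dict[int, Set[int]], start: int, k: int) -> Set[int]:
    """Nodes within k hops of start (BFS over the undirected view), start included."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == k:
            continue
        for nbr in adj.get(node, set()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return seen


def neighborhood_randomize(edges: List[Edge], p_retain: float, k: int,
                           rng: random.Random) -> List[Edge]:
    """Keep each link's source; keep its destination with probability p_retain,
    otherwise redraw it uniformly from the destination's k-hop neighborhood."""
    adj: Dict[int, Set[int]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    published: List[Edge] = []
    for u, v in edges:
        if rng.random() < p_retain:
            published.append((u, v))
            continue
        # The original destination stays a candidate; the source is excluded.
        candidates = sorted(k_hop_neighborhood(adj, v, k) - {u}) or [v]
        published.append((u, rng.choice(candidates)))
    return published


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]
print(neighborhood_randomize(edges, p_retain=0.6, k=1, rng=random.Random(42)))
```

Raising p_retain or shrinking k keeps the published graph closer to the original (more utility), while lowering p_retain or growing k spreads a hidden destination over more candidates.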
                                                        
To test web applications, developers currently write test cases in frameworks such as Selenium. On the other hand, most web test generation techniques rely on a crawler to explore the dynamic states of the application. The first approach requires considerable manual effort but benefits from the domain knowledge of the developer writing the test cases. The second is automated and systematic but lacks the domain knowledge required to be as effective. We believe combining the two can be advantageous. In this paper, we propose to (1) mine the human knowledge present in human-written test suites in the form of input values, event sequences, and assertions, (2) combine that inferred knowledge with the power of automated crawling, and (3) extend the test suite for uncovered/unchecked portions of the web application under test. Our approach is implemented in a tool called Testilizer. An evaluation of our approach indicates that Testilizer (1) outperforms a random test generator, and (2) on average, can generate test suites with improvements of up to 150 percent in fault detection rate and up to 30 percent in code coverage, compared to the original test suite.
                                                          @inproceedings{amin:ase14,
                                                            author = {Milani Fard, Amin and Mirzaaghaei, Mehdi and Mesbah, Ali},
                                                            title = {Leveraging Existing Tests in Automated Test Generation for Web Applications},
                                                            booktitle = {Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE)},
                                                            publisher = {ACM},
                                                            pages = {67--78},
                                                            year = {2014}
                                                            }
                                                    
Dynamic exploration techniques play a significant role in automated web application testing and analysis. However, a general web application crawler that exhaustively explores the states can become mired in limited specific regions of the web application, yielding poor functionality coverage. In this paper, we propose a feedback-directed web application exploration technique to derive test models. While exploring, our approach dynamically measures and applies a combination of code coverage impact, navigational diversity, and structural diversity to decide a priori (1) which state should be expanded and (2) which event should be exercised next, so as to maximize the overall coverage while minimizing the size of the test model. Our approach is implemented in a tool called FeedEx. We have empirically evaluated the efficacy of FeedEx using six web applications. The results show that our technique yields higher coverage while reducing the size of the test model, compared to classical exhaustive techniques such as depth-first, breadth-first, and random exploration.
                                                            @inproceedings{amin:issre13,
                                                                author = {Milani Fard, Amin and Mesbah, Ali},
                                                                title = {Feedback-directed Exploration of Web Applications to Derive Test Models},
                                                                booktitle = {Proceedings of the International Symposium on Software Reliability Engineering (ISSRE)},
                                                                publisher = {IEEE Computer Society},
                                                                pages = {278--287},
                                                                year = {2013}
                                                                }
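To make the feedback-directed idea concrete, here is a minimal sketch of choosing which state to expand next by a weighted sum of the three signals named in the abstract. The concrete definitions are hypothetical stand-ins, not FeedEx's actual metrics: coverage impact is newly covered lines over total lines, navigational diversity is the share of unexercised events, and structural diversity is one minus the highest Jaccard similarity between DOM tag sets and already-expanded states. The weights and State fields are likewise assumptions, and event selection is not shown.

```python
from typing import FrozenSet, NamedTuple, Sequence, Tuple


class State(NamedTuple):
    name: str
    dom_tags: FrozenSet[str]   # simplified DOM fingerprint (hypothetical)
    new_lines: int             # lines newly covered when the state was reached
    events_total: int          # candidate events found in the state
    events_unexplored: int     # events not yet exercised


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def structural_diversity(s: State, expanded: Sequence[State]) -> float:
    """1 minus the highest DOM similarity to any already-expanded state."""
    if not expanded:
        return 1.0
    return 1.0 - max(jaccard(s.dom_tags, e.dom_tags) for e in expanded)


def navigational_diversity(s: State) -> float:
    """Share of the state's events that have not been exercised yet."""
    return s.events_unexplored / s.events_total if s.events_total else 0.0


def score(s: State, expanded: Sequence[State], total_lines: int,
          weights: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted combination of coverage impact and the two diversity signals."""
    w_cov, w_nav, w_struct = weights
    coverage = s.new_lines / total_lines if total_lines else 0.0
    return (w_cov * coverage
            + w_nav * navigational_diversity(s)
            + w_struct * structural_diversity(s, expanded))


def pick_next(frontier: Sequence[State], expanded: Sequence[State], total_lines: int) -> State:
    """Choose the frontier state with the highest combined score."""
    return max(frontier, key=lambda s: score(s, expanded, total_lines))


expanded = [State("index", frozenset({"div", "ul", "li"}), 0, 3, 0)]
frontier = [State("list", frozenset({"div", "ul", "li"}), 10, 4, 4),
            State("form", frozenset({"div", "form", "input"}), 20, 2, 1)]
print(pick_next(frontier, expanded, total_lines=100).name)  # form
```

Here the form state wins (0.425 versus 0.30) because its DOM differs from the already-expanded index page and it adds more coverage, even though the list state has a larger share of unexercised events.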
                                                    
** Most Influential Paper Award **
JavaScript is a powerful and flexible prototype-based scripting language that is increasingly used by developers to create interactive web applications. The language is interpreted, dynamic, and weakly typed, and has first-class functions. In addition, it interacts with other web languages such as CSS and HTML at runtime. All these characteristics make JavaScript code particularly error-prone and challenging to write and maintain. Code smells are patterns in the source code that can adversely influence program comprehension and the long-term maintainability of the program. We propose a set of 13 JavaScript code smells, collected from various developer resources. We present a JavaScript code smell detection technique called JSNose. Our metric-based approach combines static and dynamic analysis to detect smells in client-side code. This automated technique can help developers spot code that could benefit from refactoring. We evaluate the smell-finding capabilities of our technique through an empirical study. By analyzing 11 web applications, we investigate which smells detected by JSNose are more prevalent.
                                                        @inproceedings{amin:scam13,
                                                            author = {Milani Fard, Amin and Mesbah, Ali},
                                                            title = {{JSNose}: Detecting {JavaScript} Code Smells},
                                                            booktitle = {Proceedings of the International Conference on Source Code Analysis and Manipulation (SCAM)},
                                                            publisher = {IEEE Computer Society},
                                                            pages = {116--125},
                                                            year = {2013}
                                                            }
                                                    
Link disclosure between two individuals in a social network could be a privacy breach. To limit link disclosure, previous work modeled a social network as an undirected graph and randomized a link over the entire domain of links, which leads to considerable structural distortion of the graph. In this work, we address this issue in two steps. First, we model a social network as a directed graph and randomize the destination of a link while keeping the source of the link intact. The randomization ensures that, if the prior belief about the destination of a link is bounded by some threshold, the posterior belief, given the published graph, is no more than another threshold. Then, we further reduce structural distortion by a subgraph-wise perturbation in which the given graph is partitioned into several subgraphs and randomization of destination nodes is performed within each subgraph. The benefit of subgraph-wise perturbation is that it retains a destination node with a higher retention probability and replaces a destination node with a node from a local neighborhood. We study the trade-off between utility and privacy of subgraph-wise perturbation.
                                                        @inproceedings{amin:edbt12,
                                                        author = {Milani Fard, Amin and Wang, Ke and Yu, Philip S.},
                                                        title = {Limiting Link Disclosure in Social Network Analysis Through Subgraph-wise Perturbation},
                                                        booktitle = {Proceedings of the International Conference on Extending Database Technology (EDBT)},
                                                        year = {2012},
                                                        pages = {109--119},
                                                        publisher = {ACM}
                                                        }
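The prior/posterior guarantee described above can be made concrete with a small calculation. The model here is our own simplified assumption, not the paper's exact formulation: a destination v is kept with retention probability p and otherwise redrawn uniformly from m candidate nodes that include v, and every competing destination is assumed to share the same m candidates. Bayes' rule then gives the adversary's posterior, and requiring posterior <= rho2 at the worst-case prior rho1 yields a cap on p. The function names are hypothetical.

```python
def posterior_belief(prior: float, p_retain: float, m: int) -> float:
    """P(true destination is v | published destination is v) under the assumed model."""
    if not (0.0 <= prior <= 1.0 and 0.0 <= p_retain <= 1.0 and m >= 1):
        raise ValueError("invalid parameters")
    hit = p_retain + (1.0 - p_retain) / m   # P(publish v | true destination v)
    miss = (1.0 - p_retain) / m             # P(publish v | true destination w != v)
    evidence = prior * hit + (1.0 - prior) * miss
    return prior * hit / evidence if evidence > 0 else 0.0


def max_retention(rho1: float, rho2: float, m: int) -> float:
    """Largest p_retain such that prior <= rho1 implies posterior <= rho2.

    The posterior grows with the prior, so it suffices to check prior = rho1.
    posterior <= rho2  <=>  hit / miss <= gamma, with
    gamma = rho2 (1 - rho1) / (rho1 (1 - rho2)) and hit / miss = 1 + p m / (1 - p),
    which rearranges to p <= (gamma - 1) / (m + gamma - 1).
    """
    if not (0.0 < rho1 < rho2 < 1.0 and m >= 1):
        raise ValueError("need 0 < rho1 < rho2 < 1 and m >= 1")
    gamma = rho2 * (1.0 - rho1) / (rho1 * (1.0 - rho2))
    return (gamma - 1.0) / (m + gamma - 1.0)


print(round(max_retention(0.1, 0.5, 10), 3))   # 0.444
print(round(max_retention(0.1, 0.5, 100), 3))  # 0.074
```

Under this simplified model, a smaller candidate set (m = 10 versus m = 100) admits a much higher retention probability for the same (rho1, rho2) bound, which is consistent with the point that perturbing within smaller subgraphs lets destinations be retained with higher probability.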
                                                    
Releasing web query logs, which contain valuable information for research or marketing, can breach the privacy of search engine users. Therefore, transforming query logs to limit the linking of a query to an individual, while preserving the data's usefulness for analysis, is an important research problem. This survey provides an overview and discussion of recent studies in this direction.
                                                            @article{amin:corr12,
                                                            author    = {Milani Fard, Amin},
                                                            title     = {Privacy Preserving Web Query Log Publishing: A Survey on Anonymization Techniques},
                                                            journal   = {Computing Research Repository (CoRR)},
                                                            volume    = {abs/1211.2354},
                                                            year      = {2012}
                                                            }
                                                    
Web query log data contain information that can be very useful in research or marketing; however, releasing such data can seriously breach the privacy of search engine users. These privacy concerns go far beyond the identifying information in a query, such as name and address, which can refer to a particular individual. It has been shown that even non-identifying personal data can be combined with publicly available external information to pinpoint an individual, as happened after the release of the AOL query logs in 2006. In this work, we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
                                                        @phdthesis{amin:thesis2010,
                                                            author = {Milani Fard, Amin},
                                                            title = {Clustering-based Web Query Log Anonymization},
                                                            series={Electronic Theses and Dissertations (ETDs) 2008+},
                                                            url={http://summit.sfu.ca/item/12814},
                                                            school={Simon Fraser University},
                                                            year={2010},
                                                            month={Nov},
                                                            collection={Electronic Theses and Dissertations (ETDs) 2008+}
                                                        }
Web query log data contain information useful to research; however, releasing such data can re-identify the search engine users issuing the queries. These privacy concerns go far beyond removing explicitly identifying information such as name and address, since non-identifying personal data can be combined with publicly available information to pinpoint an individual. In this work, we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
                                                        @inproceedings{amin:secrypt10,
                                                            title={An effective clustering approach to web query log anonymization},
                                                            author={Milani Fard, Amin and Wang, Ke},
                                                            booktitle={Proceedings of the International Conference on Security and Cryptography (SECRYPT)},
                                                            pages={109--119},
                                                            year={2010},
                                                            organization={IEEE}
                                                            }
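The clustering and generalization steps of the paper are not reproduced here. As a hedged stand-in, the sketch below greedily clusters query-term sets (transactions) by Jaccard similarity into groups of at least k, then publishes each transaction as the items shared by its whole cluster. This swaps in item suppression for generalization and treats the entire transaction as the quasi-identifier, so every published record is identical to at least k-1 others. The utility metric is not modeled, and all names and data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List

Txn = FrozenSet[str]  # one user's set of query terms


def jaccard(a: Txn, b: Txn) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_transactions(txns: List[Txn], k: int) -> List[List[int]]:
    """Greedily group transaction indices into clusters of at least k by Jaccard similarity."""
    if k < 1 or len(txns) < k:
        raise ValueError("need k >= 1 and at least k transactions")
    remaining = list(range(len(txns)))
    clusters: List[List[int]] = []
    while len(remaining) >= k:
        seed = remaining.pop(0)
        remaining.sort(key=lambda i: jaccard(txns[seed], txns[i]), reverse=True)
        clusters.append([seed] + remaining[:k - 1])
        remaining = remaining[k - 1:]
    if remaining:  # fewer than k left over: fold them into the last cluster
        clusters[-1].extend(remaining)
    return clusters


def anonymize(txns: List[Txn], k: int) -> List[Txn]:
    """Publish each transaction as the items shared by its whole cluster (suppression)."""
    published: List[Txn] = [frozenset()] * len(txns)
    for cluster in cluster_transactions(txns, k):
        common = frozenset.intersection(*(txns[i] for i in cluster))
        for i in cluster:
            published[i] = common
    return published


def is_k_anonymous(published: List[Txn], k: int) -> bool:
    """Every published transaction is identical to at least k-1 others."""
    return all(count >= k for count in Counter(published).values())


logs = [frozenset({"flu", "fever", "clinic"}), frozenset({"flu", "fever"}),
        frozenset({"jobs", "resume"}), frozenset({"jobs", "resume", "interview"})]
out = anonymize(logs, k=2)
print(is_k_anonymous(logs, 2), is_k_anonymous(out, 2))  # False True
```

The two health-related and two job-related query sets each collapse to their shared terms, so every published record appears at least twice. Suppression can discard many terms, which is the kind of utility loss a more careful generalization step aims to reduce.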
                                                    
The hidden knowledge in social network data can be regarded as an important resource for criminal investigations, as it can help uncover the structure and organization of a criminal network. However, such network-based analysis has not been studied in an applied way and remains mostly a manual process. To assist inspectors and intelligence agencies in discovering this knowledge, we define a new problem and propose a framework for automated network data analysis and deduction from multiple social networks by converting them to a transaction dataset and applying association mining and statistical methods. By applying a game theory concept in a multi-agent model, we design a policy for knowledge discovery and inference fusion. This approach enables police stations to build and deploy P2P applications through a unified medium for finding relationships among criminals and identifying suspicious individuals.
Knowledge extraction from distributed database systems has been investigated during the past decade in order to analyze billions of information records. In this work, a competitive deduction approach in a heterogeneous data grid environment is proposed using classic data mining and statistical methods. By applying a game theory concept in a multi-agent model, we tried to design a policy for hierarchical knowledge discovery and inference fusion. To demonstrate the system in operation, a sample multi-expert system has also been developed.
                                                        @Inbook{Fard2009,
                                                        author="Fard, Amin Milani",
                                                        editor="Cao, Longbing",
                                                        title="Competitive-Cooperative Automated Reasoning from Distributed and Multiple Source of Data",
                                                        bookTitle="Data Mining and Multi-agent Integration",
                                                        year="2009",
                                                        publisher="Springer US",
                                                        address="Boston, MA",
                                                        pages="279--290",
                                                        isbn="978-1-4419-0522-2",
                                                        doi="10.1007/978-1-4419-0522-2_19",
                                                        url="https://doi.org/10.1007/978-1-4419-0522-2_19"
                                                        }
                                                        
In this thesis, the knowledge discovery process has been investigated, including current algorithms, methods, and different architectures. Then a new multi-agent system has been designed and developed for a grid knowledge mining process. Game theory and soft computing approaches are also applied to our method to improve knowledge mining and representation. The project also aims at developing a high-performance search engine based on grid technology.
                                                        @misc{fard2007intelligent,
                                                          title={Intelligent Agent based Grid Data Mining using Game Theory and Soft Computing},
                                                          author={Milani Fard, Amin},
                                                          year={2007}
                                                        }