
PhD Defense: Machine Learning Techniques for Suspicious Transaction Detection and Analysis

  • Speaker: Ramiro Daniel Camino

  • Location: LU

Please click on this link to both register and connect on the day of the event. 

Members of the defense committee:

  • A-Prof. Dr Raphaël Frank, University of Luxembourg, Chairman
  • A-Prof. Dr Djamila Aouada, University of Luxembourg, Deputy Chairman
  • A-Prof. Dr Radu State, University of Luxembourg, Supervisor
  • Prof. Dr Diego Fernandez Slezak, University of Buenos Aires, Argentina, Member
  • Dr Christian Hammerschmidt, Delft University of Technology, Netherlands, Member

Financial services must monitor their transactions to avoid being used for money laundering and to combat the financing of terrorism.

Originally, organizations in charge of fraud regulation were only concerned with financial institutions such as banks, but nowadays Fintech companies, online businesses, and platforms involving virtual assets can also be affected by similar criminal schemes.

Regardless of the differences between the aforementioned entities, malicious activities affecting them share many common patterns.

This dissertation has three goals: first, to compile and compare existing studies that apply machine learning to the detection and analysis of suspicious transactions; second, to synthesize the methodologies identified in those studies into an organized approach for tackling different use cases; and third, to assess the applicability of deep generative models for enhancing existing solutions.

In the first part of the thesis, we propose an unsupervised methodology for the detection of suspicious transactions applied to two case studies: one related to transactions from a money remittance network, and the other related to a novel payment network based on distributed ledger technologies.

Anomaly detection algorithms are applied to rank user accounts based on recency, frequency and monetary features. The results are manually validated by domain experts, confirming known scenarios and finding unexpected new cases.
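The announcement does not say which anomaly detection algorithms were used, so the following is only a minimal sketch of the general idea, assuming transactions arrive as hypothetical `(account_id, transaction_date, amount)` tuples. It computes recency, frequency and monetary (RFM) features per account and ranks accounts by a simple robust z-score (median and MAD). This scoring is a stand-in chosen for brevity, not the detectors evaluated in the thesis.

```python
from datetime import date
from statistics import median

def rfm_features(transactions, reference_date):
    """Compute (recency, frequency, monetary) per account from
    (account_id, transaction_date, amount) tuples. Recency is the number of
    days between the account's last transaction and reference_date."""
    last_seen, count, total = {}, {}, {}
    for account, day, amount in transactions:
        if account not in last_seen or day > last_seen[account]:
            last_seen[account] = day
        count[account] = count.get(account, 0) + 1
        total[account] = total.get(account, 0.0) + amount
    return {
        acc: ((reference_date - last_seen[acc]).days, count[acc], total[acc])
        for acc in last_seen
    }

def robust_z(values):
    """Median/MAD z-scores; a zero MAD is replaced by 1.0 to avoid dividing by zero."""
    med = median(values)
    mad = median([abs(v - med) for v in values]) or 1.0
    return [(v - med) / (1.4826 * mad) for v in values]

def rank_accounts(features):
    """Rank accounts from most to least anomalous by the sum of the absolute
    robust z-scores of their three RFM features."""
    accounts = list(features)
    columns = list(zip(*(features[a] for a in accounts)))
    z_columns = [robust_z(list(col)) for col in columns]
    scores = {acc: sum(abs(z[i]) for z in z_columns) for i, acc in enumerate(accounts)}
    return sorted(accounts, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical toy data: account "d" sends many large transfers at once.
    txs = [
        ("a", date(2020, 1, 1), 100.0), ("a", date(2020, 1, 15), 120.0),
        ("b", date(2020, 1, 10), 90.0), ("b", date(2020, 1, 20), 110.0),
    ] + [("d", date(2020, 1, 31), 1000.0)] * 5
    print(rank_accounts(rfm_features(txs, date(2020, 2, 1))))
```

The ranked list is what a domain expert would then review from the top down; any real detector (for example an isolation forest or a density-based method) could replace `rank_accounts` while keeping the same RFM inputs.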

In the second part, an analogous analysis is carried out using supervised methods, together with a case study in which Ethereum smart contracts are classified as honeypots or non-honeypots.

Features are extracted from the source code, the transaction data, and a characterization of the flow of funds. The proposed classification models generalized well to unseen honeypot instances and techniques, and allowed us to characterize previously unknown techniques.
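One way to test whether a classifier generalizes to *unseen honeypot techniques* is to hold out every honeypot that uses a given technique during training. The sketch below assumes that protocol; it is not taken from the thesis. The `Contract` record, the three feature groups passed to `build_features`, and the technique labels are all hypothetical, and every honeypot is assumed to carry a technique label.

```python
from collections import namedtuple

# Hypothetical record: a contract's merged feature dict, its label, and,
# for honeypots, the technique it uses (None for non-honeypots).
Contract = namedtuple("Contract", ["address", "features", "is_honeypot", "technique"])

def build_features(source_code, transactions, fund_flow):
    """Merge the three hypothetical feature groups into one flat dict,
    prefixing each key with its origin to avoid name collisions."""
    merged = {}
    for prefix, group in (("src", source_code), ("tx", transactions), ("flow", fund_flow)):
        for name, value in group.items():
            merged[f"{prefix}_{name}"] = value
    return merged

def leave_one_technique_out(contracts):
    """Yield (held_out_technique, train, test) splits. All honeypots using
    the held-out technique go to the test set, so the classifier never sees
    that technique during training; everything else stays in training."""
    techniques = sorted({c.technique for c in contracts if c.is_honeypot})
    for held_out in techniques:
        train = [c for c in contracts if c.technique != held_out]
        test = [c for c in contracts if c.technique == held_out]
        yield held_out, train, test
```

Because each test split contains only honeypots, the natural metric per split is the fraction of held-out honeypots that are detected; false positives on non-honeypots would need a separate hold-out.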

In the third part, we analyze the challenges that tabular data, the type of data used to represent financial transactions in the previous two parts, poses for deep generative models. We then propose a new model architecture that adapts state-of-the-art methods to output multiple variables drawn from distributions of mixed types. Additionally, we extend the evaluation metrics used in the literature to this multi-output setting and show empirically that our approach outperforms the existing methods.
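To make the idea of a multi-output generator concrete, here is a minimal sketch, assuming the generator's last layer produces one flat vector that is then split into one block per variable. Each categorical block gets its own softmax, so each variable is a separate distribution over its categories, and each numerical value gets a sigmoid (assuming numerical columns are scaled to [0, 1]). The `LAYOUT` and its variable names are hypothetical. Plain softmax is used here for simplicity; GAN-based variants often use a Gumbel-Softmax relaxation instead, and the exact choice in the thesis is not stated here.

```python
import math

# Hypothetical layout of a mixed-type record: (variable_name, kind, width).
LAYOUT = [("country", "categorical", 3), ("amount", "numerical", 1), ("channel", "categorical", 2)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Numerically stable logistic function for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def multi_output(raw, layout):
    """Split a flat generator output into one block per variable and apply a
    type-appropriate activation to each block."""
    if len(raw) != sum(width for _, _, width in layout):
        raise ValueError("raw output does not match layout width")
    out, i = {}, 0
    for name, kind, width in layout:
        block = raw[i:i + width]
        if kind == "categorical":
            out[name] = softmax(block)
        else:
            out[name] = [sigmoid(v) for v in block]
        i += width
    return out
```

Compared with a single softmax over all one-hot columns, the per-variable split guarantees that each categorical variable receives exactly one probability distribution, which is what a mixed-type record requires.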

Finally, in the last part, we extend the work from the third part by applying the presented models to enhance the classification tasks from the second part, which commonly exhibit severe class imbalance. We introduce a multi-input architecture that complements our previously proposed multi-output architecture. We compare three techniques for sampling from deep generative models under a clear and fair large-scale experimental protocol, and include tools for visual analysis.
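The simplest way a generative model can help an imbalanced classifier is to top up the minority class with synthetic rows before training. The three sampling techniques compared in the thesis are not described here, so this minimal sketch shows only that basic rebalancing step. `sample_minority` is a hypothetical callable standing in for a deep generative model trained on the minority class, and `gaussian_sampler` is a toy replacement that fits an independent normal per feature, used only so the example runs.

```python
import random
from collections import Counter

def rebalance(X, y, minority_label, sample_minority, target_ratio=1.0):
    """Append synthetic minority rows until the minority class has
    target_ratio times as many rows as all other classes combined.
    Apply this to the training split only, never to the test split."""
    counts = Counter(y)
    n_min = counts[minority_label]
    n_rest = len(y) - n_min
    n_needed = max(0, int(target_ratio * n_rest) - n_min)
    synthetic = sample_minority(n_needed)
    return X + synthetic, y + [minority_label] * len(synthetic)

def gaussian_sampler(rows, seed=0):
    """Toy stand-in for a deep generative model: draws each feature
    independently from a normal distribution fitted to the minority rows."""
    rng = random.Random(seed)
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]

    def sample(n):
        return [[rng.gauss(m, s) for m, s in zip(means, stds)] for _ in range(n)]

    return sample
```

A real deep generative model would also capture correlations between features and handle categorical columns, which this independent-normal toy deliberately ignores.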

We show that general machine learning detection and visualization techniques can be very helpful in addressing the many challenges of the fraud detection domain. In particular, deep generative models can add value to the classification task given the imbalanced nature of the fraudulent class, at the cost of additional implementation effort and time complexity. Promising future applications of deep generative models include missing data imputation and the sharing of synthetic data, or of data generators, while preserving privacy constraints.