Event

PhD Defense: Creating better ground truth to further understand Android malware

  • Speaker: Médéric Hurier

  • Location

    Room E004, JFK Building, 29 Avenue J.F. Kennedy, L-1855 Kirchberg

    LU

Members of the defense committee:

  • Dr Jacques Klein, University of Luxembourg, Chairman
  • A-prof. Dr Jean-François Lalande, Centrale Supélec, Deputy Chairman
  • Prof. Dr Yves Le Traon, University of Luxembourg, Member
  • Dr Tegawendé F. Bissyandé, University of Luxembourg, Member
  • Dr Damien Octeau, Google Inc, Member

Abstract:

Mobile applications are essential for interacting with technology and other people.

With more than 2 billion devices deployed all over the world, Android offers a thriving ecosystem by making the work of thousands of developers accessible on digital marketplaces such as Google Play.

Nevertheless, the success of Android also exposes millions of users to malware authors who seek to siphon private information and hijack mobile devices for their own benefit.

To fight against the proliferation of Android malware, the security community embraced machine learning, a branch of artificial intelligence that powers a new generation of detection systems.

Machine learning algorithms, however, require a substantial number of qualified samples to learn the classification rules enforced by security experts.

Unfortunately, malware ground truths are notoriously hard to construct due to the inherent complexity of Android applications and the global lack of public information about malware.

In a context where both information and human resources are limited, the security community needs new approaches that help practitioners accurately define Android malware, automate classification decisions, and improve the understanding of Android malware.

This dissertation proposes three solutions to assist with the creation of malware ground truths.

The first contribution is STASE, an analytical framework that qualifies the composition of malware ground truths.

STASE uses nine metrics to review the information shared by antivirus products, in order to support the reproducibility of research experiments and detect potential biases.

This dissertation reports the results of STASE against three typical settings and suggests additional recommendations for designing experiments based on Android malware.

The second contribution is EUPHONY, a heuristic system built to unify family clusters belonging to malware ground truths.

EUPHONY exploits the co-occurrence of malware labels in antivirus reports to study the relationships between Android applications, and proposes a single family name per sample to facilitate malware experiments.

This dissertation evaluates EUPHONY on well-known malware ground truths to assess the precision of our approach and produce a large dataset of malware tags for the research community.
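
The idea of unifying family names through label co-occurrence can be illustrated with a minimal sketch. This is not EUPHONY's actual algorithm: it assumes that family tokens have already been extracted from each antivirus label, and the function name `unify_families`, the `min_cooccurrence` threshold, and the sample data are all hypothetical. Labels that appear together on at least `min_cooccurrence` samples are treated as aliases of one family, and each sample receives the most common canonical name among its labels.

```python
from collections import Counter
from itertools import combinations


def unify_families(reports, min_cooccurrence=2):
    """Map each sample to one family name (illustrative sketch only).

    reports: dict of sample id -> list of family labels, one per antivirus.
    """
    label_counts = Counter()
    pair_counts = Counter()
    for labels in reports.values():
        label_counts.update(labels)
        # Count each unordered pair of distinct labels once per sample.
        pair_counts.update(combinations(sorted(set(labels)), 2))

    # Union-find: merge labels that co-occur often enough.
    parent = {label: label for label in label_counts}

    def find(label):
        while parent[label] != label:
            parent[label] = parent[parent[label]]  # path halving
            label = parent[label]
        return label

    for (a, b), count in pair_counts.items():
        if count >= min_cooccurrence:
            parent[find(a)] = find(b)

    # The canonical name of a cluster is its most frequent label
    # (ties are broken alphabetically, an arbitrary choice).
    clusters = {}
    for label in label_counts:
        clusters.setdefault(find(label), []).append(label)
    canonical = {}
    for members in clusters.values():
        name = max(members, key=lambda m: (label_counts[m], m))
        for member in members:
            canonical[member] = name

    # Each sample gets the canonical name most of its labels agree on.
    result = {}
    for sample, labels in reports.items():
        votes = Counter(canonical[label] for label in labels)
        result[sample] = max(votes, key=lambda n: (votes[n], n)) if votes else None
    return result


# Hypothetical antivirus family labels for five applications.
reports = {
    "app1": ["droidkungfu", "kungfu", "droidkungfu"],
    "app2": ["kungfu", "droidkungfu"],
    "app3": ["plankton", "plankton", "apperhand"],
    "app4": ["apperhand", "plankton"],
    "app5": ["droidkungfu"],
}
families = unify_families(reports)
```

With this toy input, the aliases in each pair co-occur on two samples, so they collapse into a single family name per sample. A real system would also have to parse raw antivirus label strings and handle generic tokens, which this sketch leaves out.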

The third contribution is AP-GRAPH, a knowledge database for dissecting the characteristics of malware ground truths.

AP-GRAPH leverages the results of EUPHONY and static analysis to index artefacts that are highly correlated with malware activities and recommend the inspection of the most suspicious components.

This dissertation explores the set of artefacts retrieved by AP-GRAPH from popular malware families to track down their correlation and their evolution compared to other malware populations.