Other Research

In our lab, we are open to conducting a range of research that doesn’t fit into our otherwise broad portfolio. Here you will find information about publications from these projects.

Blockchain

Bitcoin’s success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin’s initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In our research,1 we have answered four common questions regarding blockchain technology:

  1. What exactly is blockchain technology?
  2. What capabilities does it provide?
  3. What are good applications for blockchain technology?
  4. How does it relate to other distributed technologies (e.g., distributed databases)?

Our finding show that Blockchain technology is most appropriate under three conditions: (a) a need for shared governance and operation (i.e., not trusted central parties can conduct any of these responsibilities), (b) auditable state, and (c) resilience to data loss.

We are currently exploring applications for blockchain technology in two areas. First, distributed document management for health care. Second, supporting multi-organizational humanitarian aid and disaster relief. In both these situations, there is no central authority to handle governance and operation, but there is a need for organizations to work with each other. Our research aims to build tools that will support both of these use cases, including in situations with limited Internet connectivity—situations which are critical for both applications, but which are not currently well-supported by existing blockchain techniques.

Secure Software Development

Many of today’s software products are insecure. Often, security is ignored as companies race to be first-to-market. Subsequent attempts to bolt security onto existing products face many challenges, frequently leaving residual vulnerabilities. The current paradigm for developing secure software is failing, and it is essential to explore alternatives.

To make progress, it is important to first understand the drawbacks of the current secure software development paradigm—i.e., implementing security on an application-by-application basis. In this model, each application needs to be architected with security in mind, and many—if not most—application developers must then correctly implement the relevant security features. Unfortunately, there is a significant lack of developers trained in cybersecurity, meaning that the architecture and implementation are both likely to have security flaws. Moreover, there is no indication that the number of cybersecurity-trained developers will ever scale up sufficiently to support the ever-increasing need for new applications.

Regardless of the specific reason, the result of the current paradigm is thousands, if not tens-of-thousands, of applications with broken and outdated security. To address these issues, we are researching alternative software development paradigms. For example, we are exploring a paradigm where instead of explicitly implementing security primitives into individual applications, they are instead implemented at global control points that are then responsible for layering security on top of all (unmodified) applications. We have successfully used this paradigm to improve the security of TLS (TrustBase2) and secure Email (MessageGuard3).


Publications

Journals and Magazines

Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.
Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.

Conferences

Abstract:  Modern Websites rely on various client-side web resources, such as JavaScript libraries, to provide end-users with rich and interactive web experiences. Unfortunately, anecdotal evidence shows that improperly managed client-side resources could open up attack surfaces that adversaries can exploit. However, there is still a lack of a comprehensive understanding of the updating practices among web developers and the potential impact of inaccuracies in Common Vulnerabilities and Exposures (CVE) information on the security of the web ecosystem. In this paper, we conduct a longitudinal (four-year) measurement study of the security practices and implications on client-side resources (e.g., JavaScript libraries and Adobe Flash) across the Web. Specifically, we first collect a large-scale dataset of 157.2M webpages of Alexa Top 1M websites for four years in the wild. Analyzing the dataset, we find an average of 41.2% of websites (in each year of the four years) carry at least one vulnerable client-side resource (e.g., JavaScript or Adobe Flash). We also reveal that vulnerable JavaScript library versions are frequently observed in the wild, suggesting a concerning level of lagging update practice in the wild. On average, we observe 531.2 days with 25,337 websites of the window of vulnerability due to the unpatched client-side resources from the release of security patches. Furthermore, we manually investigate the fidelity of CVE (Common Vulnerabilities and Exposures) reports on client-side resources, leveraging PoC (Proof of Concept) code. We find that 13 CVE reports (out of 27) have incorrect vulnerable version information, which may impact security-related tasks such as security updates.
Abstract:  Decompilation is a crucial capability in forensic analysis, facilitating analysis of unknown binaries. The recentrise of Python malware has brought attention to Python decompilers that aim to obtain source code representation from a Python binary. However, Python decompilers fail to handle various binaries, limiting their capabilities in forensic analysis. This paper proposes a novel solution that transforms a decompilation error-inducing Python binary into a decompilable binary. Our key intuition is that we can resolve the decompilation errors by transforming error-inducing code blocks in the input binary into another form. The core of our approach is the concept of Forensically Equivalent Transformation (FET) which allows non-semantic preserving transformation in the context of forensic analysis. We carefully define the FETs to minimize their undesirable consequences while fixing various error-inducing instructions that are difficult to solve when preserving the exact semantics. We evaluate the prototype of our approach with 17,117 real-world Python malware samples causing decompilation errors in five popular decompilers. It successfully identifies and fixes 77,022 errors. Our approach also handles anti-analysis techniques, including opcode remapping, and helps migrate Python 3.9 binaries to 3.8 binaries.
Abstract:  The packet stream analysis is essential for the early identification of attack connections while in progress, enabling timely responses to protect system resources. However, there are several challenges for implementing effective analysis, including out-of-order packet sequences introduced due to network dynamics and class imbalance with a small fraction of attack connections available to characterize. To overcome these challenges, we present two deep sequence models: (i) a bidirectional recurrent structure designed for resilience to out-of-order packets, and (ii) a pre-training-enabled sequence-to-sequence structure designed for better dealing with unbalanced class distributions using self-supervised learning. We evaluate the presented models using a real network dataset created from month-long real traffic traces collected from backbone links with the associated intrusion log. The experimental results support the feasibility of the presented models with up to 94.8% in F1 score with the first five packets (k=5), outperforming baseline deep learning models.
Abstract:  Software systems may contain critical program components such as patented program logic or sensitive data. When those components are reverse-engineered by adversaries, it can cause significantly damage (e.g., financial loss or operational failures). While protecting critical program components (e.g., code or data) in software systems is of utmost importance, existing approaches, unfortunately, have two major weaknesses: (1) they can be reverse-engineered via various program analysis techniques and (2) when an adversary obtains a legitimate-looking critical program component, he or she can be sure that it is genuine. In this paper, we propose Ambitr, a novel technique that hides critical program components. The core of Ambitr is Ambiguous Translator that can generate the critical program components when the input is a correct secret key. The translator is ambiguous as it can accept any inputs and produces a number of legitimate-looking outputs, making it difficult to know whether an input is correct secret key or not. The executions of the translator when it processes the correct secret key and other inputs are also indistinguishable, making the analysis inconclusive. Our evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr. We also demonstrate that manual analysis of Ambitr is extremely challenging.
Abstract:  Server-side malware is one of the prevalent threats that can affect a large number of clients who visit the compromised server. In this paper, we propose Dazzle-attack, a new advanced server-side attack that is resilient to forensic analysis such as reverse-engineering. Dazzleattack retrieves typical (and non-suspicious) contents from benign and uncompromised websites to avoid detection and mislead the investigation to erroneously associate the attacks with benign websites. Dazzleattack leverages a specialized state-machine that accepts any inputs and produces outputs with respect to the inputs, which substantially enlarges the input-output space and makes reverse-engineering effort significantly difficult. We develop a prototype of Dazzle-attack and conduct empirical evaluation of Dazzle-attack to show that it imposes significant challenges to forensic analysis.
Abstract:  This research explores the possibility of a new anti-analysis technique, carefully designed to attack weaknesses of the existing program analysis approaches. It encodes a program code snippet to hide, and its decoding process is implemented by a sophisticated state machine that produces multiple outputs depending on inputs. The key idea of the proposed technique is to ambiguously decode the program code, resulting in multiple decoded code snippets that are challenging to distinguish from each other. Our approach is stealthier than previous similar approaches as its execution does not exhibit different behaviors between when it decodes correctly or incorrectly. This paper also presents analyses of weaknesses of existing techniques and discusses potential improvements. We implement and evaluate the proof of concept approach, and our preliminary results show that the proposed technique imposes various new unique challenges to the program analysis technique.
Abstract:  Outlier detection has been shown to be a promising machine learning technique for a diverse array of fields and problem areas. However, traditional, supervised outlier detection is not well suited for problems such as network intrusion detection, where proper labelled data is scarce. This has created a focus on extending these approaches to be unsupervised, removing the need for explicit labels, but at a cost of poorer performance compared to their supervised counterparts. Recent work has explored ways of making up for this, such as creating ensembles of diverse models, or even diverse learning algorithms, to jointly classify data. While using unsupervised, heterogeneous ensembles of learning algorithms has been proposed as a viable next step for research, the implications of how these ensembles are built and used has not been explored.

Technical Reports

Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.