Other Research

In our lab, we are open to conducting a range of research that doesn’t fit into our otherwise broad portfolio. Here you will find information about publications from these projects.

Blockchain

Bitcoin’s success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin’s initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In our research,1 we have answered four common questions regarding blockchain technology:

  1. What exactly is blockchain technology?
  2. What capabilities does it provide?
  3. What are good applications for blockchain technology?
  4. How does it relate to other distributed technologies (e.g., distributed databases)?

Our finding show that Blockchain technology is most appropriate under three conditions: (a) a need for shared governance and operation (i.e., not trusted central parties can conduct any of these responsibilities), (b) auditable state, and (c) resilience to data loss.

We are currently exploring applications for blockchain technology in two areas. First, distributed document management for health care. Second, supporting multi-organizational humanitarian aid and disaster relief. In both these situations, there is no central authority to handle governance and operation, but there is a need for organizations to work with each other. Our research aims to build tools that will support both of these use cases, including in situations with limited Internet connectivity—situations which are critical for both applications, but which are not currently well-supported by existing blockchain techniques.

Secure Software Development

Many of today’s software products are insecure. Often, security is ignored as companies race to be first-to-market. Subsequent attempts to bolt security onto existing products face many challenges, frequently leaving residual vulnerabilities. The current paradigm for developing secure software is failing, and it is essential to explore alternatives.

To make progress, it is important to first understand the drawbacks of the current secure software development paradigm—i.e., implementing security on an application-by-application basis. In this model, each application needs to be architected with security in mind, and many—if not most—application developers must then correctly implement the relevant security features. Unfortunately, there is a significant lack of developers trained in cybersecurity, meaning that the architecture and implementation are both likely to have security flaws. Moreover, there is no indication that the number of cybersecurity-trained developers will ever scale up sufficiently to support the ever-increasing need for new applications.

Regardless of the specific reason, the result of the current paradigm is thousands, if not tens-of-thousands, of applications with broken and outdated security. To address these issues, we are researching alternative software development paradigms. For example, we are exploring a paradigm where instead of explicitly implementing security primitives into individual applications, they are instead implemented at global control points that are then responsible for layering security on top of all (unmodified) applications. We have successfully used this paradigm to improve the security of TLS (TrustBase2) and secure Email (MessageGuard3).


Publications

Journals and Magazines

Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.
Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.

Conferences

Abstract:  Decompilation is a crucial capability in forensic analysis, facilitating analysis of unknown binaries. The recentrise of Python malware has brought attention to Python decompilers that aim to obtain source code representation from a Python binary. However, Python decompilers fail to handle various binaries, limiting their capabilities in forensic analysis. This paper proposes a novel solution that transforms a decompilation error-inducing Python binary into a decompilable binary. Our key intuition is that we can resolve the decompilation errors by transforming error-inducing code blocks in the input binary into another form. The core of our approach is the concept of Forensically Equivalent Transformation (FET) which allows non-semantic preserving transformation in the context of forensic analysis. We carefully define the FETs to minimize their undesirable consequences while fixing various error-inducing instructions that are difficult to solve when preserving the exact semantics. We evaluate the prototype of our approach with 17,117 real-world Python malware samples causing decompilation errors in five popular decompilers. It successfully identifies and fixes 77,022 errors. Our approach also handles anti-analysis techniques, including opcode remapping, and helps migrate Python 3.9 binaries to 3.8 binaries.
Abstract:  The packet stream analysis is essential for the early identification of attack connections while in progress, enabling timely responses to protect system resources. However, there are several challenges for implementing effective analysis, including out-of-order packet sequences introduced due to network dynamics and class imbalance with a small fraction of attack connections available to characterize. To overcome these challenges, we present two deep sequence models: (i) a bidirectional recurrent structure designed for resilience to out-of-order packets, and (ii) a pre-training-enabled sequence-to-sequence structure designed for better dealing with unbalanced class distributions using self-supervised learning. We evaluate the presented models using a real network dataset created from month-long real traffic traces collected from backbone links with the associated intrusion log. The experimental results support the feasibility of the presented models with up to 94.8% in F1 score with the first five packets (k=5), outperforming baseline deep learning models.
Abstract:  Software systems may contain critical program components such as patented program logic or sensitive data. When those components are reverse-engineered by adversaries, it can cause significantly damage (e.g., financial loss or operational failures). While protecting critical program components (e.g., code or data) in software systems is of utmost importance, existing approaches, unfortunately, have two major weaknesses: (1) they can be reverse-engineered via various program analysis techniques and (2) when an adversary obtains a legitimate-looking critical program component, he or she can be sure that it is genuine. In this paper, we propose Ambitr, a novel technique that hides critical program components. The core of Ambitr is Ambiguous Translator that can generate the critical program components when the input is a correct secret key. The translator is ambiguous as it can accept any inputs and produces a number of legitimate-looking outputs, making it difficult to know whether an input is correct secret key or not. The executions of the translator when it processes the correct secret key and other inputs are also indistinguishable, making the analysis inconclusive. Our evaluation results show that static, dynamic and symbolic analysis techniques fail to identify the hidden information in Ambitr. We also demonstrate that manual analysis of Ambitr is extremely challenging.
Abstract:  Server-side malware is one of the prevalent threats that can affect a large number of clients who visit the compromised server. In this paper, we propose Dazzle-attack, a new advanced server-side attack that is resilient to forensic analysis such as reverse-engineering. Dazzleattack retrieves typical (and non-suspicious) contents from benign and uncompromised websites to avoid detection and mislead the investigation to erroneously associate the attacks with benign websites. Dazzleattack leverages a specialized state-machine that accepts any inputs and produces outputs with respect to the inputs, which substantially enlarges the input-output space and makes reverse-engineering effort significantly difficult. We develop a prototype of Dazzle-attack and conduct empirical evaluation of Dazzle-attack to show that it imposes significant challenges to forensic analysis.
Abstract:  This research explores the possibility of a new anti-analysis technique, carefully designed to attack weaknesses of the existing program analysis approaches. It encodes a program code snippet to hide, and its decoding process is implemented by a sophisticated state machine that produces multiple outputs depending on inputs. The key idea of the proposed technique is to ambiguously decode the program code, resulting in multiple decoded code snippets that are challenging to distinguish from each other. Our approach is stealthier than previous similar approaches as its execution does not exhibit different behaviors between when it decodes correctly or incorrectly. This paper also presents analyses of weaknesses of existing techniques and discusses potential improvements. We implement and evaluate the proof of concept approach, and our preliminary results show that the proposed technique imposes various new unique challenges to the program analysis technique.
Abstract:  Outlier detection has been shown to be a promising machine learning technique for a diverse array of fields and problem areas. However, traditional, supervised outlier detection is not well suited for problems such as network intrusion detection, where proper labelled data is scarce. This has created a focus on extending these approaches to be unsupervised, removing the need for explicit labels, but at a cost of poorer performance compared to their supervised counterparts. Recent work has explored ways of making up for this, such as creating ensembles of diverse models, or even diverse learning algorithms, to jointly classify data. While using unsupervised, heterogeneous ensembles of learning algorithms has been proposed as a viable next step for research, the implications of how these ensembles are built and used has not been explored.
Abstract:  Cloud-hosted databases have many compelling benefits, including high availability, flexible resource allocation, and resiliency to attack, but it requires that cloud tenants cede control of their data to the cloud provider. In this paper, we describe Proactively-secure Accumulo with Cryptographic Enforcement (PACE), a client-side library that cryptographically protects a tenant's data, returning control of that data to the tenant. PACE is a drop-in replacement for Accumulo's APIs and works with Accumulo's row-level security model. We evaluate the performance of PACE, discussing the impact of encryption and signatures on operation throughput.
Abstract:  The current state of certificate-based authentication is messy, with broken authentication in applications and proxies, along with serious flaws in the CA system. To solve these problems, we design TrustBase, an architecture that provides certificate-based authentication as an operating system service, with system administrator control over authentication policy. TrustBase transparently enforces best practices for certificate validation on all applications, while also providing a variety of authentication services to strengthen the CA system. We describe a research prototype of TrustBase for Linux, which uses a loadable kernel module to intercept traffic in the socket layer, then consults a user-space policy engine to evaluate certificate validity using a variety of plugins. We evaluate the security of TrustBase, including a threat analysis, application coverage, and hardening of the Linux prototype. We also describe prototypes of TrustBase for Android and Windows, illustrating the generality of our approach. We show that TrustBase has negligible overhead and universal compatibility with applications. We demonstrate its utility by describing eight authentication services that extend CA hardening to all applications.
Abstract:  Developing secure software is inherently difficult, and is further hampered by a rush to market, the lack of cybersecurity-trained architects and developers, and the difficulty of identifying flaws and deploying mitigations. To address these problems, we advocate for an alternative paradigm-layering security onto applications from global control points, such as the browser, operating system, or network. This approach adds security to existing applications, relieving developers of this burden. The benefits of this paradigm are three-fold-(1) increased correctness in the implementation of security features, (2) coverage for all software, even non-maintained legacy software, and (3) more rapid and consistent deployment of threat mitigations and new security features. To demonstrate these benefits, we describe three concrete instantiations of this paradigm- MessageGuard, a system that layers end-to-end encryption in the browser; TrustBase, a system that layers authentication in the operating system; and software-defined perimeter, which layers access control at network middleboxes.
Abstract:  Potentially dangerous cryptography errors are well-documented in many applications. Conventional wisdom suggests that many of these errors are caused by cryptographic Application Programming Interfaces (APIs) that are too complicated, have insecure defaults, or are poorly documented. To address this problem, researchers have created several cryptographic libraries that they claim are more usable, however, none of these libraries have been empirically evaluated for their ability to promote more secure development. This paper is the first to examine both how and why the design and resulting usability of different cryptographic libraries affects the security of code written with them, with the goal of understanding how to build effective future libraries. We conducted a controlled experiment in which 256 Python developers recruited from GitHub attempt common tasks involving symmetric and asymmetric cryptography using one of five different APIs. We examine their resulting code for functional correctness and security, and compare their results to their self-reported sentiment about their assigned library. Our results suggest that while APIs designed for simplicity can provide security benefits - reducing the decision space, as expected, prevents choice of insecure parameters - simplicity is not enough. Poor documentation, missing code examples, and a lack of auxiliary features such as secure key storage, caused even participants assigned to simplified libraries to struggle with both basic functional correctness and security. Surprisingly, the availability of comprehensive documentation and easy-to-use code examples seems to compensate for more complicated APIs in terms of functionally correct results and participant reactions, however, this did not extend to security results. We find it particularly concerning that for about 20% of functionally correct tasks, across libraries, participants believed their code was secure when it was not. Our results suggest that while new cryptographic libraries that want to promote effective security should offer a simple, convenient interface, this is not enough: they should also, and perhaps more importantly, ensure support for a broad range of common tasks and provide accessible documentation with secure, easy-to-use code examples.
Abstract:  Vulnerabilities in Android code—including but not limited to insecure data storage, unprotected inter-component communication, broken TLS implementations, and violations of least privilege—have enabled real-world privacy leaks and motivated research cataloguing their prevalence and impact. Researchers have speculated that appification promotes security problems, as it increasingly allows inexperienced laymen to develop complex and sensitive apps. Anecdotally, Internet resources such as Stack Overflow are blamed for promoting insecure solutions that are naively copy-pasted by inexperienced developers. In this paper, we for the first time systematically analyzed how the use of information resources impacts code security. We first surveyed 295 app developers who have published in the Google Play market concerning how they use resources to solve security-related problems. Based on the survey results, we conducted a lab study with 54 Android developers (students and professionals), in which participants wrote security-and privacy-relevant code under time constraints. The participants were assigned to one of four conditions: free choice of resources, Stack Overflow only, official Android documentation only, or books only. Those participants who were allowed to use only Stack Overflow produced significantly less secure code than those using, the official Android documentation or books, while participants using the official Android documentation produced significantly less functional code than those using Stack Overflow. To assess the quality of Stack Overflow as a resource, we surveyed the 139 threads our participants accessed during the study, finding that only 25% of them were helpful in solving the assigned tasks and only 17% of them contained secure code snippets. In order to obtain ground truth concerning the prevalence of the secure and insecure code our participants wrote in the lab study, we statically analyzed a random sample of 200,000 apps from Google Play, finding that 93.6% of the apps used at least one of the API calls our participants used during our study. We also found that many of the security errors made by our participants also appear in the wild, possibly also originating in the use of Stack Overflow to solve programming problems. Taken together, our results confirm that API documentation is secure but hard to use, while informal documentation such as Stack Overflow is more accessible but often leads to insecurity. Given time constraints and economic pressures, we can expect that Android developers will continue to choose those resources that are easiest to use, therefore, our results firmly establish the need for secure-but-usable documentation.

Technical Reports

Abstract:  Bitcoin's success has led to significant interest in its underlying components, particularly blockchain technology. Over 10 years after Bitcoin's initial release, the community still suffers from a lack of clarity regarding what properties defines blockchain technology, its relationship to similar technologies, and which of its proposed use-cases are tenable and which are little more than hype. In this paper we answer four common questions regarding blockchain technology: (1) what exactly is blockchain technology, (2) what capabilities does it provide, and (3) what are good applications for blockchain technology, and (4) how does it relate to other distributed technologies (e.g., distributed databases). We accomplish this goal by using grounded theory (a structured approach to gathering and analyzing qualitative data) to thoroughly analyze a large corpus of literature on blockchain technology. This method enables us to answer the above questions while limiting researcher bias, separating thought leadership from peddled hype and identifying open research questions related to blockchain technology. The audience for this paper is broad as it aims to help researchers in a variety of areas come to a better understanding of blockchain technology and identify whether it may be of use in their own research.