Spring 2024 Seminars

Spring 2024

Organizers: Laura Israel, Grant VanDomelen

Dynamix: a DSL for semantics definition via CPS transformation

Speaker: Andrew Tolmach

Abstract

Continuation-passing style (CPS) is a restricted form of lambda calculus in which all function applications are in tail position; instead of returning, a function passes its result to a continuation parameter. Its uniform treatment of all kinds of control flow make CPS useful as a compiler intermediate representation. In this work, we show how the semantics of a source language can be conveniently expressed by specifying how to transform that language into CPS. Since the CPS transformation algorithm is notoriously hard to read, we propose Dynamix, a DSL for writing such transformations. Dynamix exists in several incarnations, including a prototype integrated into the Spoofax Language Designer/s Workbench, and a stand-alone embedded DSL in Haskell. We argue that, with appropriate surface syntax, Dynamix specifications are almost as easy to read as definitional interpreters, while allowing more straightforward and efficiently compilable treatment of non-standard control flow. This is joint work with the late Eelco Visser, Casper Bach Poulsen, Thijs Molendijk, and Ruben van Baarle at TU Delft.

Beyond Relooper: recursive translation of unstructured control flow to structured control flow (functional pearl)

Speaker: Ian Kariniemi

Abstract

In many compilers, control flow is represented using an arbitrary directed graph. But in some interesting target languages, including JavaScript and WebAssembly, intraprocedural control flow can be expressed only in structured ways, using loops, conditionals, and multilevel breaks or exits. As was shown by Peterson, Kasami, and Tokura in 1973, such structured control flow can be obtained by translating arbitrary control flow. The translation uses two standard analyses, but as published, it takes three passes—which may explain why it was overlooked by Emscripten, a popular compiler from C to JavaScript. By tweaking the analyses and by applying fundamental ideas from functional programming (recursive functions and immutable abstract-syntax trees), the translation, along with a couple of code improvements, can be implemented in a single pass. This new implementation is slated to be added to the Glasgow Haskell Compiler. Its single-pass translation, its immutable representation, and its use of dominator trees make it much easier to reason about than the original translation.

Semantic-type-guided Bug Finding

Speaker: Earl Wu

Abstract

In recent years, there has been an increasing interest in tools that establish incorrectness rather than correctness of program properties. In this work we build on this approach by developing a novel methodology for proving incorrectness of semantic typing properties of functional programs, extending the incorrectness approach to the model theory of functional program typing. We define a semantic type refuter which refutes semantic typing for a simple functional language. We prove our refuter is co-recursively enumerable, and that it is sound and complete with respect to a semantic typing notion. An initial implementation is described which uses symbolic evaluation to efficiently find type errors over a functional language with a rich type system.

EverParse: Hardening critical attack surfaces with formally proven message parsers from data format specifications

Speaker: Tahina Ramananandro

Hi, my name is Tahina Ramananandro and I am a Principal Research Software Development Engineer at Microsoft Research in Redmond, Washington, USA. My research interests cover programming languages, compilers, and formal verification. In the past few years, I have been mostly working on formal verification of parsers for communication protocols with binary data formats, and their integration into critical industrial systems. Prior to landing at Microsoft, I defended my PhD in 2012 at Université Paris Diderot on formal verification of compilers for C++ multiple inheritance, directed by Xavier Leroy. You can read more on my corporate webpage: https://www.microsoft.com/en-us/research/people/taramana/

Abstract

Software security exploits often begin with an attacker providing an unexpected input to a program, causing it to misbehave in a way that allows the attacker to gain access to a critical system. Such attacks are mostly due to binary data parsers and validators generally being written by hand in unverified languages. To prevent such attacks, we have been designing EverParse, a framework to produce data validators formally verified for memory safety, functional correctness with respect to the data format, and other security properties such as absence of double-fetches for concurrent settings. In this talk, we will present two components of EverParse: 1. EverParse3D, a fully automatic generator producing efficient, verified data validators from data-dependent format descriptions via partial evaluation with zero user proof effort. EverParse3D has been in use in the Windows kernel since 2021, as part of the Microsoft Hyper-V network virtualization stack to filter malformed packets away from hosts and guests alike. 2. However, so far user effort has still been needed to write data format specifications. Fortunately, with the advent of AI, we believe that this part can also be automated. In this talk, we will then describe our approach, 3DGen, leveraging AI to produce data format specifications from IETF RFC standard documents, with a feedback loop including both AI-based and symbolic test case generation.

Static prediction of parallel computation graphs

Speaker: Stefan Muller

Stefan Muller is an Assistant Professor at the Illinois Institute of Technology working on the design of type systems and static analyses for safe and efficient parallel programming. Before joining Illinois Tech in 2020, he completed a PhD and postdoc at Carnegie Mellon University, working under Umut Acar and Jan Hoffmann.

Abstract

Many algorithms for analyzing parallel programs, for example to detect deadlocks or data races or to calculate the execution cost, are based on a model variously known as a cost graph, computation graph, or dependency graph, which captures the parallel structure of threads in a program. In modern parallel programs, computation graphs are highly dynamic and depend greatly on the program inputs and execution details. As such, most analyses that use these graphs are either dynamic analyses or are specialized static analyses that gather a subset of dependency information for a specific purpose. In this talk, I'll briefly discuss my work on graph types, which compactly represent all of the graphs that could arise from program execution and can be inferred from a parallel program using a graph type system. The graph type system introduces unique names for vertices in the graph; uniqueness is ensured using an affine type system. I will also discuss recent work on extending the graph type system to handle data structures containing, or built using, futures, which requires indexing types in the graph type system with corecursive structures of vertex names.

Automatic Verification of a Lambda Calculus

Speaker: John Wiegley

The presenter is John Wiegley, a long-time fan of the Haskell and Coq languages, and more recently blockchain and smart contract systems. He has worked at both Dfinity and Kadena building blockchains in Haskell and Rust, and has worked on formal verification of those systems at both companies. His background is mostly in C++ compilers, but in the last decade also programming language theory and proof systems, which have definitely become his favorite arena to work in.

Abstract

Pact is a smart contract language used by the Kadena blockchain, but it can be viewed simply as a lambda calculus enriched with database and security operations over a public state. At Kadena we offer an automated verification utility based on Z3, which at present walks a parsed term and constructs logical statements via the Haskell SBV library. As an attempt to simplify and improve this tool, we are presently rewriting it to use a type-indexed term construction, originally developed and proven in Coq, which is then used to calculate a weakest precondition to be rendered into SMT-LIB2 using the Z3 C API.