The Essence of BlueSpec — A Core Language for Rule-Based Hardware Design (PLDI 2020 poster)

Abstract

Bluespec, a high-hevel HDL, offers a simple concurrency model that enables functional reasoning without compromising performance.

Unfortunately, its cost model is hard to formalize: performance depends on user hints and static analysis of conflicts within a design.

We present Kôika, a Bluespec derivative that gives direct control over scheduling decisions that determine performance, while using dynamic analysis to avoid concurrency anomalies.

Our implementation includes formal semantics, mechanized theorems, and a verified compiler.

Overview

Decomposition of a Kôika program into rules and a scheduler — State function = Rules + Explicit schedule

Contributions

Core calculus for rule-based language amenable to formal reasoning about functionality and performance
Cycle-accurate operational semantics
Proof of one-rule-at-a-time abstraction
Full Coq mechanization of the semantics
Verified compilation to RTL
Performance case-study of a pipelined processor

Motivation

HW spec = Functional spec (ORAAT) + Performance spec (Not in previous work)

in → f → FIFO queue → g → out

Functional spec: out = g(f(in))
Performance spec: out[t + 2] = g(f(in[t]))

Precise semantics allows for performance proofs

Our design

Kôika’s semantics are written in Coq, as an interpreter. Kôika programs, are written in a custom language embedded inside of Coq. Users can write specifications of high-level properties and prove them against the semantics. The compiler, also written in Coq, allows us to translate our programs into RTL circuits. It is verified, so we now know that any properties proven about the original programs carry to the compiled circuits, including timing properties.

The Kôika EDSL

rule divide =
  let v = r.rd_0() in
  if iseven(v) then
    r.wr_0(v >> 1)

rule multiply =
  let v = r.rd_1() in
  if isodd(v) then
    r.wr_1(3 * v + 1)

schedule collatz = [divide; multiply]

rule swap =
  s.wr_0(r.rd_0());
  r.wr_0(s.rd_0())

rule dyn_abort =
  if r.rd_0() == 0 then
    t.wr_0(0b1);
  if s.rd_0() == 0 then
    t.wr_0(0b1)

Core language

Registers r
Variables x
External functions f
Constants b
Ports p ∈ {0, 1}
Actions a ⩴
- $b
- $x
- $f($a, …, $a)
- skip
- let $x := $a in $a
- $r.rd_$p()
- $r.wr_$p($a)
- if $a then $a else $a
- …

Semantics

$$ \deduction[\text{RD}{_0}] {\logWrite{1}{r}{*} \notin L \hypsep \logWrite{0}{r}{*} \notin L} {\judgeExec{\sgaRead{0}{r}}{\logAppend{\logRead{0}{r}}{\ell}}{\mathcal{R}\left[r\right]}} $$ $$ \deduction[\text{WR}{_1}] {\judgeExec{a}{\ell^\prime}{v} \hypsep \logWrite{1}{r}{*} \notin L \app \ell^\prime } {\judgeExec{\sgaWrite{1}{r}{a}}{\logAppend{\logWrite{1}{r}{v}}{\ell^\prime}}{\textsf{tt}}} $$

Rich specifications serve as a contract between hardware designers and compiler writers and enable verified designs and verified compilation.

One-rule-at-a-time abstraction

Theorem OneRuleAtATime init schedule:
  exists rules ⊂ schedule,
    interp_s init schedule =
    foldl interp_r init rules.

∈

Valid behaviors under one-rule-at-a-time semantics

ORAAT property enables local, modular reasoning Proof mechanization enables semantics exploration

Compilation to RTL

Sequential semantics
↓
Concurrent RTL with deferred checks

Verified circuit optimizations (constant propagation, mux elimination, partial evaluation) reduce the overhead of dynamic scheduling & conflict resolution.

Compiler correctness

Theorem CompilerCorrectness init schedule:
  interp_s   init schedule =
  interp_rtl init (compile schedule).

≡

Output and downstream synthesis

Fragment of YoSys output on the Collatz example

Kôika outputs code in a safe subset of Verilog.
Downstream tools perform further optimizations and can generate FPGA and ASICs designs.

Hardware artifacts so far

Arithmetic-pipeline performance proof (on paper)
Kôika port of a simple BSV RISCV core (most of RV32i&e, critical path and area overhead ≤ 10%)

Next steps

Kôika-level simulation, leveraging high-level structure for performance
Verification of performance (pipelining behavior) and timing properties of the RISCV core
Multi-core systems & enclaves, with proofs of safety from timing side-channels
Further language design, including native modularity