Why Are There No Machine Instruction Decoders Written Using Parser Generators?

My question is about a rather unusual usage of a parser generators. While the process of decoding machine instructions in software has a lot in common with “regular” input languages parsing, I have found no examples or attempts to implement such decoder in LEX/YACC, ANTLR or another parser generator. At the same time, I am aware about several specialized DSL translators tailored for creation of decoders, and none of them, to my knowledge, is based on a parser generator.

The second step of an machine interpreter operation, after fetching machine code from memory, is to decode it: a routine that accepts a machine word of given width and parses it into an intermediate representation that facilitates consequent simulation of the instruction’s semantics. Usually such IR contains aggregated information about instruction’s function (opcode) and arguments (operands). A similar task is disassembling - translating a machine word into a string of human readable text.

At decoding, a typical machine instruction of RISC kind (MIPS, ARM, SPARC, Alpha etc) is split into several bit fields. Values of bitfields are tested against a set of checks to determine whether instruction is known and what arguments are present. Position of some fields may depend on values found in other fields, so the decoding is usually done sequentially from the first bit to the last. Machine encodings are guaranteed to be prefix code, so no ambiguous situation may arise.

Decoding of CISC instructions (such as IA32) is somewhat (to my experience, significantly) harder, as they present variable length of instructions, but in the beginning it the same task.

No surprise that writing a decoder by hand is a tedious and ungrateful task. Several translators have been written [1-4] in order to simplify this and adjacent tasks. All of them accept a machine language description in certain DSLs and create code capable, besides other things, to decode that language.

To my knowledge, none of these tools are based on parser generators. But what they do is mainly parsing. I find this strange. I am looking for projects that can disprove my observation. Alternatively, I am trying to formulate a logical explanation to support it. So far I was able to imagine two possibilities:

1. Machine language parsing is really an unsuitable task for general purpose parser generators, such as ANTLR, due to “low-levelness” of the task. Indeed, I cannot imagine a lexer for machine code producing other tokens than ‘0’ and ‘1’. A scanner using just these two tokens might turn out to be clumsy or gigantic, or inefficient.

2. Folks working in system simulation field are not aware about parser generators or are too conservative to use it, sticking to their tools. The very idea of trying to create a machine decoder in ANTLR or other tool only recently visited my mind. Until then I’ve always resorted to manually written decoders or ad-hoc translators.

[1] Oliver Schliebusch, Andreas Hoffmann, Achim Nohl, Gunnar Braun, and Heinrich Meyr. Architecture implementation using the machine description language LISA. In: ASPDAC 02 Proceedings of the 2002 conference on Asia South Pacific design automationVLSI Design (2002), pp. 239–244.

[2] G Hadjiyiannis, S Hanono, and S Devadas. ISDL: An Instruction Set Description Language For Retargetability. In: Proceedings of the 34th Design Automation Conference (1997), pp. 299–302.

[3] Fredrik Larsson, Peter Magnusson, and Bengt Werner. SimGen: Development of Efficient Instruction Set Simulators. SICSTechnical Report R97:03.

[4] Norman Ramsey and Mary F. Fernandez. The New Jersey Machine-Code Toolkit. In Proceedings of USENIX 1995 Technical Conference


Written by Grigory Rechistov in Uncategorized on 17.03.2015. Tags: antlr, decoder, parser generators, yacc,


Copyright © 2019 Grigory Rechistov