Reshaping Narrow Law and Art: Interpreters

Over a year ago I posted about integrated development environments (IDEs), and mentioned in passing that interpreters were

An "interpreter" is a program that executes another program line by line, rather than translating it into assembly or machine code. When an interpreter runs a program, it responds to errors differently than would a computer running the compiled version of the same program. Since the compiled program is merely a translation, an error would potentially cause the computer running the program to crash. An interpreter can stop and send an error message.

That was, of course, profoundly inadequate. I'd like to post just a little bit more about what interpreters do.

Strictly speaking, any programming language may be implemented with either an interpreter or a compiler, although some exceptions may apply. Disagreement exists over whether Java is an interpreted language or a compiled language, with some (Lowe, p.11) saying it is compiled, and others (Cohn, et al.) saying it is interpreted. Perl may be implemented with a compiler; so may PHP, Lisp, and Python. I don't pretend to be an authority on the subject, but there's a basic concept in logic that the statement "All x are y" is false if even a single x that is not y can be found. I am vulnerable here on the grounds that disagreement may exist over whether a given thing is a compiler or an interpreter.

There are several reasons why a language may be implemented with an interpreter rather than a compiler. First, the obvious reason is that you may want a development environment that allows you to debug the program. With a compiled program, you may simply get a message, "Runtime error." It might tell you more, but a really sophisticated interpreter can help you find the actual spot where the error occurred, and even help correct your spelling. Since the compiler's function is to translate the entire program "in one shot," as it were, into machine code that the computer can run, debugging with a true compiler is a little like finding a needle in a haystack.
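To make the debugging point concrete, here is a minimal sketch in Python of an interpreter that runs a program one line at a time and reports the spot where things went wrong. The toy language (lines of the form "name = term + term", with integers and variable names as terms) and the function name run are my own assumptions for illustration, not any real interpreter.

```python
def run(source):
    """Execute a toy program line by line.

    Returns (variables, error), where error is None on success or a
    message naming the line at which execution stopped.
    """
    variables = {}
    for line_number, line in enumerate(source.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            # A statement must look like "name = term + term + ..."
            target, expression = line.split("=", 1)
            total = 0
            for term in expression.split("+"):
                term = term.strip()
                if term.isdigit():
                    total += int(term)
                else:
                    total += variables[term]  # KeyError if never assigned
            variables[target.strip()] = total
        except KeyError as missing:
            return variables, f"line {line_number}: undefined variable {missing.args[0]!r}"
        except ValueError:
            # Raised by the unpacking above when the line has no "="
            return variables, f"line {line_number}: expected 'name = expression'"
    return variables, None


program = "cy = 10\ncx = cy + 324\ncz = cw + 1\n"
variables, error = run(program)
print(variables)
print(error)
```

Because the interpreter still holds the source in hand when the error occurs, it can stop at the third line, keep the values computed so far, and say which name it did not recognize.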

Another reason is that an interpreter may be easier to update and to experiment with. Ruby was developed with certain syntax innovations (the "principle of least surprise," or POLS), and of course it was a lot easier to create an interpreter that could run an increasing number of commands than a compiler with a fully revised complement of libraries, ported to each specific model of microprocessor. Also, a compiler generates machine code: the ones and zeros that the microprocessor actually understands. In contrast, an interpreter can be written entirely in a high-level programming language like C, without any knowledge of machine code.
________________________________________________
How do Interpreters/Compilers Work?

There are several similarities between compilers and interpreters at the operational level. The code that is sent to the compiler/interpreter for execution is called the source file; sometimes, programs written explicitly for use with an interpreter are called scripts. Both interpreters and compilers include a scanner and a lexer. The scanner module reads the source file one character at a time. The lexer module divides the source file into tiny chunks of one or more characters, called tokens, and specifies the token type to which each belongs; the effect is rather like diagramming a sentence. Suppose the source file is as follows:

cx = cy + 324;
print "value of cx is ", cx;

The lexer would produce this:

cx  --> Identifier (variable)
=  --> Symbol (assignment operator)
cy  --> Identifier (variable)
+ --> Symbol (addition operator)
324 --> Numeric constant (integer)
; --> Symbol (end of statement)
print --> Identifier (keyword)
"value of cx is " --> String constant
, --> Symbol (string concatenation operator)
cx --> Identifier (variable)
; --> Symbol (end of statement)

The ability of the lexer to do this depends on the ability of the scanner to document exactly where each token occurs in the source file, and its ability to scan backwards and forwards. Sometimes the precise meaning of a character depends on its position with respect to its neighbors. For example, operators may contain more than a single character (e.g., <= as opposed to <). The lexer may have to pass a message to the scanner to back up and check the identity of neighboring characters.
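The scanning and lexing steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name tokenize, the token type names, and the particular set of operators are hypothetical, and the scanner and lexer are folded into one loop for brevity. Each token is recorded with the offset where it starts, and the two-character check shows the "look at the neighboring character" step.

```python
KEYWORDS = {"print"}
TWO_CHAR_OPERATORS = {"<=", ">=", "==", "!="}
ONE_CHAR_SYMBOLS = set("=+-*/<>;,")


def tokenize(source):
    """Split source text into (token_type, text, start_offset) tuples."""
    tokens = []
    position = 0
    while position < len(source):
        char = source[position]
        if char.isspace():
            position += 1  # whitespace only separates tokens
        elif char.isalpha() or char == "_":
            start = position
            while position < len(source) and (
                source[position].isalnum() or source[position] == "_"
            ):
                position += 1
            text = source[start:position]
            kind = "keyword" if text in KEYWORDS else "identifier"
            tokens.append((kind, text, start))
        elif char.isdigit():
            start = position
            while position < len(source) and source[position].isdigit():
                position += 1
            tokens.append(("integer", source[start:position], start))
        elif char == '"':
            end = source.find('"', position + 1)
            if end == -1:
                raise SyntaxError(f"unterminated string at offset {position}")
            tokens.append(("string", source[position:end + 1], position))
            position = end + 1
        elif source[position:position + 2] in TWO_CHAR_OPERATORS:
            # Peek at the neighbor so "<=" is one token, not "<" then "="
            tokens.append(("symbol", source[position:position + 2], position))
            position += 2
        elif char in ONE_CHAR_SYMBOLS:
            tokens.append(("symbol", char, position))
            position += 1
        else:
            raise SyntaxError(f"unexpected character {char!r} at offset {position}")
    return tokens


source = 'cx = cy + 324;\nprint "value of cx is ", cx;'
for kind, text, offset in tokenize(source):
    print(f"{text} --> {kind} (offset {offset})")
```

Unlike the listing above, this sketch gives keywords their own token type rather than treating print as a kind of identifier; either choice works as long as the parser knows which one to expect.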

The parser receives the tokens and token types from the lexer and applies the syntax of the language. The parser actually requests the tokens one at a time, assesses whether each is appropriate with respect to the syntax of the language, and sometimes demands additional information from the lexer module.

Parser: Give me the next token
Lexer: Next token is "cx" which is a variable.
Parser: Ok, I have "cx" as a declared integer variable. Give me next token
Lexer: Next token is "=", the assignment operator.
Parser: Ok, the program wants me to assign something to "cx". Next token
  Lexer: The next token is "cy" which is a variable.
  Parser: Ok, I know "cy" is an integer variable. Next token please
  Lexer: The next token is '+', which is an addition operator.
  Parser: Ok, so I need to add something to the value in "cy". Next token please.
      Lexer: The next token is "324", which is an integer.
      Parser: Ok, both "cy" and "324" are integers, so I can add them. Next token please:
      Lexer: The next token is ";" which is end of statement.
  Parser: Ok, I will evaluate "cy + 324" and get the answer
Parser: I'll take the answer from "cy + 324" and assign it to "cx"

Indents are used here to indicate a subroutine. This illustrates what the interpreter/compiler must do in order to add cy and 324. If the parser gets a token that violates the syntax, it will stop processing and send an error message.
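The conversation above can be sketched as a small recursive-descent parser in Python. Everything here is an assumption made for illustration: the Parser class, its method names, and the tuple shapes for tokens and tree nodes are hypothetical, and it handles only statements of the form "identifier = operand + operand ... ;". It also skips the type checks mentioned in the dialogue (whether cy is a declared integer) and only checks syntax.

```python
class Parser:
    """Pulls (token_type, text) pairs one at a time and builds a tree."""

    def __init__(self, tokens):
        self.tokens = iter(tokens)
        self.current = None
        self.advance()

    def advance(self):
        # "Give me the next token"; ("end", "") marks the end of input
        self.current = next(self.tokens, ("end", ""))

    def expect(self, kind, text=None):
        actual_kind, actual_text = self.current
        if actual_kind != kind or (text is not None and actual_text != text):
            raise SyntaxError(f"expected {text or kind}, got {actual_text!r}")
        self.advance()
        return actual_text

    def parse_assignment(self):
        target = self.expect("identifier")
        self.expect("symbol", "=")
        value = self.parse_sum()  # the indented "subroutine" in the dialogue
        self.expect("symbol", ";")
        return ("assign", target, value)

    def parse_sum(self):
        node = self.parse_operand()
        while self.current[0] == "symbol" and self.current[1] == "+":
            self.advance()
            node = ("add", node, self.parse_operand())
        return node

    def parse_operand(self):
        kind, text = self.current
        if kind == "identifier":
            self.advance()
            return ("variable", text)
        if kind == "integer":
            self.advance()
            return ("integer", int(text))
        raise SyntaxError(f"expected a variable or integer, got {text!r}")


tokens = [
    ("identifier", "cx"), ("symbol", "="), ("identifier", "cy"),
    ("symbol", "+"), ("integer", "324"), ("symbol", ";"),
]
tree = Parser(tokens).parse_assignment()
print(tree)
```

The nested calls mirror the indents in the dialogue: parse_assignment hands off to parse_sum, which hands off to parse_operand, and a token that does not fit the syntax raises an error instead of producing a tree.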

The next module is the Interpreter, which actually executes the code, or, with compilers, the Code Generator, which translates it. With interpreters (as opposed to compilers), execution is sometimes part of the parser; in other designs the parser converts the statements into bytecode (i.e., an intermediate language), which is then executed or passed off to a compiler. In the case of the compiler itself, the code generator produces machine code that can be executed by the microprocessor.
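The difference between executing a statement directly and generating code for it can be shown with one parsed statement handled both ways. This Python sketch rests on my own assumptions: the tree is written as nested tuples, and the instruction names (PUSH, LOAD, ADD, STORE) belong to a made-up stack machine, not to any real bytecode or processor.

```python
def execute(node, variables):
    """Interpreter style: walk the tree and carry out each step at once."""
    kind = node[0]
    if kind == "integer":
        return node[1]
    if kind == "variable":
        return variables[node[1]]
    if kind == "add":
        return execute(node[1], variables) + execute(node[2], variables)
    if kind == "assign":
        variables[node[1]] = execute(node[2], variables)
        return variables[node[1]]
    raise ValueError(f"unknown node type {kind!r}")


def generate(node):
    """Code generator style: emit instructions to be run later."""
    kind = node[0]
    if kind == "integer":
        return [("PUSH", node[1])]
    if kind == "variable":
        return [("LOAD", node[1])]
    if kind == "add":
        return generate(node[1]) + generate(node[2]) + [("ADD", None)]
    if kind == "assign":
        return generate(node[2]) + [("STORE", node[1])]
    raise ValueError(f"unknown node type {kind!r}")


def run_instructions(instructions, variables):
    """A tiny stack machine standing in for whatever runs the generated code."""
    stack = []
    for operation, argument in instructions:
        if operation == "PUSH":
            stack.append(argument)
        elif operation == "LOAD":
            stack.append(variables[argument])
        elif operation == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif operation == "STORE":
            variables[argument] = stack.pop()
        else:
            raise ValueError(f"unknown instruction {operation!r}")


# Tree for: cx = cy + 324;
tree = ("assign", "cx", ("add", ("variable", "cy"), ("integer", 324)))

direct = {"cy": 10}
execute(tree, direct)

code = generate(tree)
deferred = {"cy": 10}
run_instructions(code, deferred)

print(code)
print(direct, deferred)
```

Both routes leave cx holding the same value. The generated list is the part a real compiler would translate into machine code, or a bytecode interpreter would run on its own virtual machine.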

(Special thanks to Scorpions4ever)

SOURCES & ADDITIONAL READING: Wikipedia, "Interpreter (computing)" and "Interpreted language"

BOOKS: Doug Lowe, Java for Dummies, Wiley Publishing (2005); Cohn, et al., Java Developer's Reference (1996)


10 July 2007
