Description
This project consists of writing a Lexical Analyzer for a subset of the Ada programming language. The Lexical Analyzer is to be a module written in the language of your choice that exports the following:
procedure GetNextToken
global variables:
    Token
    Lexeme
    Value {for integer tokens}
    ValueR {for real tokens}
    Literal {for strings}
The following are the reserved words in the language (may be upper or lower case):
BEGIN, MODULE, CONSTANT, PROCEDURE, IS, IF, THEN, ELSE,
ELSIF, WHILE, LOOP, FLOAT, INTEGER, CHAR, GET, PUT, END.
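Because reserved words may be written in either case, one possible approach is to normalize each lexeme before looking it up. The following is a minimal sketch in Python (the assignment allows any language); the names RESERVED_WORDS and is_reserved are illustrative and not part of the required interface.

```python
# The 17 reserved words listed above, stored in upper case.
RESERVED_WORDS = frozenset({
    "BEGIN", "MODULE", "CONSTANT", "PROCEDURE", "IS", "IF", "THEN", "ELSE",
    "ELSIF", "WHILE", "LOOP", "FLOAT", "INTEGER", "CHAR", "GET", "PUT", "END",
})


def is_reserved(lexeme: str) -> bool:
    """Return True if lexeme is a reserved word, ignoring case."""
    return lexeme.upper() in RESERVED_WORDS
```

With this sketch, is_reserved("begin") and is_reserved("Begin") are both true, while is_reserved("counter") is false.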
The notation for specifying tokens is as follows:
Comments begin with the symbol -- and continue to the end of the line. Comments may appear after any token.
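Comment handling can be sketched as a helper that trims a source line before tokens are extracted. This Python sketch assumes the comment marker is the two-character sequence -- used by standard Ada, and it assumes that a marker appearing inside a string literal is not a comment. The function name strip_comment is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Return line with any trailing comment removed.

    Assumption: a "--" inside a double-quoted string literal is ordinary
    text, not the start of a comment.
    """
    in_string = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Toggle string state on every double quote.
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            return line[:i]
    return line
```

A real scanner would more likely skip comments while reading characters one at a time; the line-based form is only meant to show the rule.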
Blanks between tokens are optional, with the exception of reserved words. Reserved words must be separated by blanks, newlines, the beginning of the program or the final semicolon.
Token id for identifiers matches a letter followed by letters, underscores, and/or digits, with a maximum length of 17 characters. Ada identifiers are not case sensitive.
letter -> [a-zA-Z]
digit -> [0-9]
underscore -> _
id -> letter(letter | digit | underscore )*
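The id rule translates directly into a regular expression. The sketch below is one way to apply it in Python. The specification does not say what to do with an identifier longer than 17 characters, so raising an error here is an assumption, and the names ID_PATTERN, MAX_ID_LENGTH, and match_identifier are illustrative.

```python
import re

MAX_ID_LENGTH = 17
# id -> letter (letter | digit | underscore)*
ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def match_identifier(text: str, pos: int = 0):
    """Return the identifier lexeme starting at pos, or None if none starts there.

    Assumption: an identifier longer than MAX_ID_LENGTH is reported as an error.
    """
    m = ID_PATTERN.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    if len(lexeme) > MAX_ID_LENGTH:
        raise ValueError(f"identifier too long: {lexeme}")
    return lexeme
```

Since identifiers are not case sensitive, a caller may also want to compare lexemes in a single case, for example with lexeme.upper().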
Token num matches unsigned integers or real numbers and has attribute Value for integers and ValueR for real numbers.
digits -> digit digit*
optional_fraction -> . digits | ε
num -> digits optional_fraction
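The num rule can be sketched the same way. In this Python example the presence of a fraction decides whether the attribute would go in Value (an integer) or ValueR (a real). The function name match_number and the returned pair are assumptions made for illustration only.

```python
import re

# num -> digits optional_fraction, where optional_fraction is ". digits" or empty
NUM_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?")


def match_number(text: str, pos: int = 0):
    """Return (lexeme, value) for the number starting at pos, or None.

    value is an int when there is no fraction (attribute Value) and a
    float when there is one (attribute ValueR).
    """
    m = NUM_PATTERN.match(text, pos)
    if m is None:
        return None
    lexeme = m.group()
    if m.group(1) is not None:
        return lexeme, float(lexeme)
    return lexeme, int(lexeme)
```

Note that a period not followed by a digit is left in the input, so in "7." only "7" is consumed and the period becomes a separate symbol.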
String literals begin with a " and end with a " and should be stored in the Literal variable. Strings must begin and end on the same line.
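A string literal can be recognized by searching for the closing quote on the same line. The Python sketch below assumes the scanner works one line at a time, so a missing closing quote is reported as an error. The specification does not mention embedded quote characters, so they are not handled here. The name match_string is hypothetical.

```python
def match_string(line: str, pos: int = 0):
    """Return (lexeme, literal) for the string literal starting at pos.

    Returns None if no literal starts at pos. Raises ValueError if the
    closing quote is missing, since strings must end on the same line.
    """
    if pos >= len(line) or line[pos] != '"':
        return None
    end = line.find('"', pos + 1)
    if end == -1:
        raise ValueError("unterminated string literal")
    # The lexeme keeps the quotes; the literal is the text between them.
    return line[pos:end + 1], line[pos + 1:end]
```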
The relational operators (Token relop) are:
=, /=, <, <=, >, >=
The addops are: +, -, and the word "or".
The mulops are: *, /, rem, mod, and the word "and".
The assignop is: :=
The following symbols are also allowed in the language:
( ) , : ; . "
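Several symbols share a first character (for example : and :=, or / and /=), so a scanner normally tries the longer symbol first. The Python sketch below shows one way to do that with a lookup table. The token category names for the punctuation symbols (lparen, colon, and so on) are assumptions, as is the function name match_symbol.

```python
# Symbol lexeme -> token category. Punctuation category names are illustrative.
SYMBOLS = {
    ":=": "assignop",
    "/=": "relop", "<=": "relop", ">=": "relop",
    "=": "relop", "<": "relop", ">": "relop",
    "+": "addop", "-": "addop",
    "*": "mulop", "/": "mulop",
    "(": "lparen", ")": "rparen", ",": "comma",
    ":": "colon", ";": "semicolon", ".": "period", '"': "quote",
}


def match_symbol(text: str, pos: int = 0):
    """Return (lexeme, category) for the symbol at pos, preferring the
    two-character form, or None if no symbol starts there."""
    for length in (2, 1):
        candidate = text[pos:pos + length]
        if candidate in SYMBOLS:
            return candidate, SYMBOLS[candidate]
    return None
```

The word operators (or, rem, mod, and) are not in this table because they look like identifiers and are easier to classify after an identifier has been scanned.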
The Ada subset has the following rules:
Parameterless procedure declarations start the program. Procedures are begun with the reserved word PROCEDURE followed by an id, the word IS, then a semicolon.
The body of a procedure starts with the reserved word BEGIN and terminates with the reserved word END followed by the name of the procedure and a semicolon.
The tokens for each possible symbol (or type of symbol) should be declared as an enumerated data type.
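As an illustration of the enumerated type, the Python sketch below builds one enumeration holding the reserved words plus the other token kinds, and classifies a word-shaped lexeme. The member names other than the reserved words (ID, NUM, EOF, UNKNOWN, and so on) and the helper classify_word are assumptions, not names required by the assignment.

```python
from enum import Enum

RESERVED = [
    "BEGIN", "MODULE", "CONSTANT", "PROCEDURE", "IS", "IF", "THEN", "ELSE",
    "ELSIF", "WHILE", "LOOP", "FLOAT", "INTEGER", "CHAR", "GET", "PUT", "END",
]
# Hypothetical names for the remaining token kinds.
OTHER = [
    "ID", "NUM", "LITERAL", "RELOP", "ADDOP", "MULOP", "ASSIGNOP",
    "LPAREN", "RPAREN", "COMMA", "COLON", "SEMICOLON", "PERIOD", "QUOTE",
    "EOF", "UNKNOWN",
]
Token = Enum("Token", RESERVED + OTHER)

# Operators that are spelled like identifiers.
WORD_OPERATORS = {
    "OR": Token.ADDOP, "REM": Token.MULOP, "MOD": Token.MULOP, "AND": Token.MULOP,
}


def classify_word(lexeme: str) -> Token:
    """Map a letter-initial lexeme to a reserved word, word operator, or ID."""
    upper = lexeme.upper()
    if upper in RESERVED:
        return Token[upper]
    return WORD_OPERATORS.get(upper, Token.ID)
```

For example, classify_word("begin") gives Token.BEGIN, classify_word("Mod") gives Token.MULOP, and classify_word("total") gives Token.ID.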
To test your project, write a short program that imports (uses) module LexicalAnalyzer to read a source program and output the tokens encountered and the associated attributes (lexeme for identifiers and reserved words, the numeric value for token num, and the symbol itself for all others).
Source code for this and all other assignments must be submitted in a single zip file to the appropriate D2L dropbox on or before the due date.