It’s sometimes useful to make a little language for a simple problem. We are making a simple language that lets us play with strings, and for this assignment we are building a lexical analyzer for it.
Here are the lexical rules for the language:
- The language has identifiers. An identifier starts with a letter and is followed by zero or more letters.
- The language has string constants. A string constant is a sequence of characters, all on one line, enclosed in double quotes.
- The language has two operators: + for set union and ^ for set intersection.
- The language has four keywords: “set”, “print”, “search” and “for”.
- Statements in the language end in a semicolon.
- The language supports parentheses.
White space is used to separate tokens and lines for readability.
A comment begins with two slashes (//) and ends at a newline.
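As a rough illustration of the identifier and keyword rules, the sketch below reads the longest run of letters from a stream and then checks it against the four keywords. The names WordKind, scanWord, classifyWord and classifyFirstWord are made up for this example and are not taken from p2lex.h. It also assumes that keywords are lowercase and case-sensitive, which the rules above do not state.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical categories for this sketch only; p2lex.h defines the real tokens.
enum class WordKind { Identifier, Keyword, NotAWord };

// Read the longest run of letters at the front of the stream.
// Returns an empty string if the next character is not a letter.
std::string scanWord(std::istream* in) {
    assert(in != nullptr);
    const int eof = std::istream::traits_type::eof();
    std::string word;
    int c;
    while ((c = in->peek()) != eof && std::isalpha(c)) {
        word += static_cast<char>(in->get());
    }
    return word;
}

// Classify a word produced by scanWord.
// Assumption: keywords are lowercase and matched case-sensitively.
WordKind classifyWord(const std::string& word) {
    if (word.empty()) return WordKind::NotAWord;
    if (word == "set" || word == "print" || word == "search" || word == "for") {
        return WordKind::Keyword;
    }
    return WordKind::Identifier;
}

// Convenience wrapper: classify the first word of a piece of text.
WordKind classifyFirstWord(const std::string& text) {
    std::istringstream in(text);
    return classifyWord(scanWord(&in));
}
```

Because scanWord takes the longest run of letters before the keyword check, an input such as printer comes out as one identifier rather than the keyword print followed by er.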
The lexical analyzer is to be implemented in a C++ function. The function will be passed a pointer to an input stream to read from. It will need to return the token that has been recognized and the lexeme for that token.
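One possible shape for such a function is sketched below: the token is the return value and the lexeme comes back through a reference parameter. This is only an assumption about the interface. The enum SketchToken, its SK_ names, and the function nextPunct are hypothetical stand-ins, and the sketch handles only the single-character tokens (the two operators, the semicolon, and the parentheses).

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>

// Stand-in token type for this sketch; the real values live in p2lex.h.
enum SketchToken { SK_PLUS, SK_CARET, SK_SEMI, SK_LPAREN, SK_RPAREN, SK_ERR, SK_DONE };

// Hypothetical interface: read one single-character token from *in,
// return its token value, and store the matched text in lexeme.
SketchToken nextPunct(std::istream* in, std::string& lexeme) {
    assert(in != nullptr);
    lexeme.clear();
    int c = in->get();
    if (c == std::istream::traits_type::eof()) return SK_DONE;  // end of input
    lexeme = static_cast<char>(c);
    switch (c) {
        case '+': return SK_PLUS;    // set union
        case '^': return SK_CARET;   // set intersection
        case ';': return SK_SEMI;
        case '(': return SK_LPAREN;
        case ')': return SK_RPAREN;
        default:  return SK_ERR;     // anything else is not handled here
    }
}
```

A caller would pass the address of a stream, for example nextPunct(&std::cin, lexeme), and then inspect both the returned token and the lexeme string.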
The unique values for each of the tokens that you must recognize are defined in the header file p2lex.h, which is on the course website.
The lexical analyzer must ignore white space and comments, treating white space only as a separator between tokens. The program should maintain an external integer named “linenum”, which should be incremented whenever the lexer sees a newline.
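A minimal sketch of the skipping step, under some assumptions: linenum is defined in the same file so the example compiles on its own (other files would declare it with extern int linenum;), the counter starts at 0, and the helper name skipBlanksAndComments is invented for this example. The newline that ends a comment is left in the stream so that it is counted by the same code that counts every other newline.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test input

// The line counter named in the assignment. Defined here so the sketch
// stands alone; the starting value of 0 is an assumption.
int linenum = 0;

// Skip white space and // comments, incrementing linenum for every newline
// consumed. Returns the next significant character without consuming it,
// or the end-of-file value if the input is exhausted.
int skipBlanksAndComments(std::istream* in) {
    assert(in != nullptr);
    const int eof = std::istream::traits_type::eof();
    for (;;) {
        int c = in->peek();
        if (c == eof) return eof;
        if (c == '\n') { ++linenum; in->get(); continue; }
        if (std::isspace(c)) { in->get(); continue; }
        if (c == '/') {
            in->get();
            if (in->peek() == '/') {
                // Comment: discard up to, but not including, the newline.
                while ((c = in->peek()) != eof && c != '\n') in->get();
                continue;
            }
            in->putback('/');  // a lone slash is not a comment
            return '/';
        }
        return c;
    }
}
```

The caller can then decide what to do with the returned character: a letter starts an identifier or keyword, a double quote starts a string constant, and so on.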
A lexical error should cause the token ERR to be returned. An end of file should cause DONE to be returned.
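To make the error and end-of-file cases concrete, here is a sketch of scanning a string constant. It assumes the stream is already positioned at the start of a token, that the quotes are kept in the lexeme, and that a string broken by a newline or by end of file counts as a lexical error. DemoToken, its DEMO_ names, and scanString are hypothetical; the real ERR and DONE values come from p2lex.h.

```cpp
#include <cassert>
#include <istream>
#include <sstream>   // std::istringstream, handy for feeding test input
#include <string>

// Stand-ins for the STRING-like, ERR and DONE tokens; not the p2lex.h values.
enum DemoToken { DEMO_STRING, DEMO_ERR, DEMO_DONE };

// Scan one string constant from *in. The lexeme keeps both quotes
// (an assumption; the assignment does not say either way).
DemoToken scanString(std::istream* in, std::string& lexeme) {
    assert(in != nullptr);
    const int eof = std::istream::traits_type::eof();
    lexeme.clear();
    int c = in->get();
    if (c == eof) return DEMO_DONE;          // nothing left to read
    lexeme += static_cast<char>(c);
    if (c != '"') return DEMO_ERR;           // not the start of a string
    while ((c = in->get()) != eof) {
        if (c == '\n') {
            in->putback('\n');               // leave it for the line counter
            return DEMO_ERR;                 // strings must stay on one line
        }
        lexeme += static_cast<char>(c);
        if (c == '"') return DEMO_STRING;    // closing quote found
    }
    return DEMO_ERR;                         // end of file inside the string
}
```

Note the difference between the two end-of-file cases: end of file before any token starts is reported as done, while end of file in the middle of a string is reported as an error.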
You MUST use the p2lex.h header file. You may not change it.