Lex Error Handling
Handling error conditions in Lex rather than Yacc?

Suppose I have a lex regular expression like

    [aA][0-9]{2,2}[pP][sS][nN]?    { return TOKEN; }

If a user enters A75PsN or A75PS, it will match. But if a user enters something like A75PKN, I would like it to raise an error and say "Character K not recognized, expecting S".

What I am doing right now is writing it like this:

    let [a-zA-Z]
    num [0-9]
    {let}{num}{2,2}{let}{2,3}

and then essentially re-lexing the string in Yacc so that I can produce meaningful error messages. How can I get around this? The only thing I can think of is to use named groups?

(asked Aug 14 '09 by DevDevDev; edited May 6 '13 by lesmana)

Accepted answer:

Wow! Interesting scheme.

If you're going to detect that in the lexical analyzer, you would need a catch-all rule that deals with 'any otherwise unrecognized string' and produces an error message. Determining that it was the K that caused the trouble is going to be hell.

    [^aA][0-9]{2,2}[pP][sS][nN]?    { report_error(); return ERROR; }
    [aA][0-9]{2,2}[^pP][sS][nN]?    { report_error(); return ERROR; }
    [aA][0-9]{2,2}[pP][^sS][nN]?    { report_error(); return ERROR; }
    [aA][0-9]{2,2}[pP][sS][^nN]     { report_error(); return ERROR; }

Note the placing of the carets, and the absence of the question mark!

Dealing with non-digits, or too many digits, or too few digits: urgh!

Generally, you would be better off recognizing all 'identifiers' and then validating which ones are OK:

    [a-zA-Z][0-9]{2,2}[a-zA-Z]{2,5}    { return validate_id_string(); }

Choose your poison what yo
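The accepted answer does not say what validate_id_string() does. The C function below, validate_id (a hypothetical name), is a minimal sketch of such a check. It assumes the token shape from the question: A, two digits, P, S and an optional N, all matched case-insensitively. It walks the matched text one character at a time, so it can name the offending character and the one it expected. That produces exactly the kind of message the question asks for.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed token shape from the question: A, digit, digit, P, S, optional N.
   '#' stands for "any digit"; letters are matched case-insensitively. */
static const char SHAPE[] = "A##PSN";
enum { REQUIRED_LEN = 5, MAX_LEN = 6 };

/* Hypothetical stand-in for the answer's validate_id_string().
   Returns 1 if text has the expected shape; otherwise returns 0 and writes a
   diagnostic such as "Character K not recognized, expecting S" into msg. */
int validate_id(const char *text, char *msg, size_t msglen)
{
    size_t len = strlen(text);

    /* Check each position against the shape, stopping at the first mismatch. */
    for (size_t i = 0; i < len && i < MAX_LEN; i++) {
        unsigned char c = (unsigned char)text[i];
        char expect = SHAPE[i];
        int ok = (expect == '#') ? isdigit(c) != 0 : toupper(c) == expect;

        if (!ok) {
            if (expect == '#')
                snprintf(msg, msglen,
                         "Character %c not recognized, expecting a digit", text[i]);
            else
                snprintf(msg, msglen,
                         "Character %c not recognized, expecting %c", text[i], expect);
            return 0;
        }
    }
    if (len < REQUIRED_LEN) {
        snprintf(msg, msglen, "Identifier %s is too short", text);
        return 0;
    }
    if (len > MAX_LEN) {
        snprintf(msg, msglen, "Unexpected trailing character %c", text[MAX_LEN]);
        return 0;
    }
    return 1;
}
```

In the scanner, the answer's broad rule would pass yytext to this function. On success it would return TOKEN; otherwise it would print the message and return ERROR. TOKEN and ERROR are the token names used in the question and answer. This design keeps the regular expression simple and moves the position-by-position diagnosis into ordinary C. That is easier to extend than a family of negated-class rules.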
Source: http://stackoverflow.com/questions/1280606/handling-error-conditions-in-lex-rather-than-yacc

Yacc Error Messages

A nice compiler gives the user meaningful error messages. For example, not much information is conveyed by the following message:

    syntax error

If we track the line number in lex, then we can at least give the user a line number:

    #include <stdio.h>

    extern int yylineno;   /* maintained by the scanner; with flex, use %option yylineno */

    void yyerror(const char *s)
    {
        fprintf(stderr, "line %d: %s\n", yylineno, s);
    }

When yacc discovers a parsing error, the default action is to call yyerror and then return from yyparse with a return value of one. A more graceful action flushes the input stream to a statement delimiter and continues to scan:

    stmt:
          ';'
        | expr ';'
        | PRINT expr ';'
        | VARIABLE '=' expr ';'
        | WHILE '(' expr ')' stmt
        | IF '(' expr ')' stmt %prec IFX
        | IF '(' expr ')' stmt ELSE stmt
        | '{' stmt_list '}'
        | error ';'
        | error '}'
        ;

The error token is a special feature of yacc that will match all input until the token following error is found. In this example, when yacc detects an error in a statement, it calls yyerror, flushes input up to the next semicolon or brace, and resumes scanning.

Source: http://epaperpress.com/lexandyacc/err.html
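The effect of the error ';' and error '}' rules can be illustrated outside yacc with a hand-written simulation. This is a simplified sketch, not yacc's algorithm. A real LALR parser also pops states off its stack until it finds one that can shift the error token. It also suppresses further messages until three tokens have been shifted successfully.

The toy language (statements of the form NAME '=' NUMBER ';'), the Token type and the functions report() and parse_statements() are all hypothetical. On a bad statement, the code prints a message in the same "line %d: %s" form as the yyerror above. It then discards tokens up to and including the next ';' or '}', and carries on.

```c
#include <stdio.h>

/* Hypothetical token codes, numbered above the single-character range
   so that characters such as ';' can stand for themselves, as in yacc. */
enum { NAME = 258, NUMBER, END };

typedef struct {
    int kind;   /* NAME, NUMBER, END, or a literal character such as ';' */
    int line;   /* line on which the token appeared, like yylineno */
} Token;

/* Same message format as the tutorial's yyerror, with the line passed in. */
static void report(int line, const char *s)
{
    fprintf(stderr, "line %d: %s\n", line, s);
}

/* Parses statements of the form NAME '=' NUMBER ';' until END. After a
   malformed statement it reports the error, discards tokens up to and
   including the next ';' or '}', and resumes. Returns the number of errors
   and stores the number of well-formed statements in *good. */
int parse_statements(const Token *t, int *good)
{
    static const int expected[] = { NAME, '=', NUMBER, ';' };
    int i = 0, errors = 0;

    *good = 0;
    while (t[i].kind != END) {
        int k = 0;
        while (k < 4 && t[i + k].kind == expected[k])
            k++;                        /* stops at END, which never matches */
        if (k == 4) {
            (*good)++;
            i += 4;
            continue;
        }
        errors++;
        report(t[i + k].line, "syntax error");
        i += k;                         /* skip the part that did match */
        while (t[i].kind != END && t[i].kind != ';' && t[i].kind != '}')
            i++;                        /* flush input to the delimiter */
        if (t[i].kind != END)
            i++;                        /* consume the ';' or '}' itself */
    }
    return errors;
}
```

Consider a token array in which line 2 holds x = = ;. The function reports "line 2: syntax error", skips to that line's semicolon, and still counts the well-formed statements on lines 1 and 3.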
...Linux or Unix, this useful book explains how to use flex and bison to solve your problems quickly. flex & bison is the long-awaited sequel to the classic O'Reilly book, lex & yacc. In the nearly two decades since the original book was published, the flex and bison utilities have proven to be more reliable and more powerful than the original Unix tools. flex & bison covers the same core functionality vital to Linux and Unix program development, along with several important new topics. You'll find revised tutorials for novices and references for advanced users, as well as an explanation of each utility's basic usage and simple, standalone applications you can create with them. With flex & bison, you'll discover the wide range of uses these flexible tools offer.

The previous chapters discussed techniques for finding errors within bison grammars. In this chapter, we turn our attention to the other side of error detection: how the parser and lexical analyzer detect errors. This chapter presents some techniques to incorporate error detection and reporting into a parser. We'll make a modified version of the SQL parser from "Parsing SQL" that demonstrates them.

Bison provides the error token and the yyerror() routine, which are typically sufficient for early versions of a tool. However, as any program begins to mature, especially a programming tool, it becomes important to provide better error recovery, which allows errors to be detected in later portions of the file. It also becomes important to provide better error reporting.

Error Reporting

Error reporting should give as much detail about the error as possible. By default, bison reports only that it found a syntax error and then stops parsing. In our examples, we used yylineno to report the line number. This provides the location of the error, but it does not report any other errors within the file or where in the specified line the error occurs. The bison locations feature, described later in this chapter, is an easy way to pinpoint the location of an error, down to the exact line and character numbers. In our example, we print out the locations, but precise location information would also allow a visual interface to highlight the relevant text.

It is often useful to categorize the possible errors, perhaps building an array of error types and defining symbolic constants to identify the errors. For example, in many languages a common erro

Source: http://archive.oreilly.com/pub/a/linux/excerpts/9780596155971/error-reporting-recovery.html
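The excerpt suggests categorizing errors with an array of error types and symbolic constants, and it stresses precise locations. The sketch below combines the two. The category names and messages are invented for illustration. The Location struct copies the four fields of bison's default YYLTYPE (first_line, first_column, last_line, last_column). It is declared by hand here so the fragment compiles without a generated parser. In a real bison parser, the location of a rule's symbol is available as @n inside actions.

```c
#include <stdio.h>

/* Hypothetical error categories; each constant indexes the message table. */
enum ErrorCode {
    ERR_SYNTAX,
    ERR_UNTERMINATED_STRING,
    ERR_UNKNOWN_CHAR,
    ERR_COUNT                 /* number of categories; keep last */
};

static const char *const error_messages[ERR_COUNT] = {
    [ERR_SYNTAX]              = "syntax error",
    [ERR_UNTERMINATED_STRING] = "unterminated string",
    [ERR_UNKNOWN_CHAR]        = "unrecognized character",
};

/* Same four fields as bison's default YYLTYPE, declared here so the
   sketch does not need a generated parser. */
typedef struct {
    int first_line, first_column;
    int last_line, last_column;
} Location;

/* How many errors of each category have been seen. */
static int error_counts[ERR_COUNT];

/* Formats "line.col-line.col: message" into buf; returns snprintf's result. */
int format_error(char *buf, size_t n, enum ErrorCode code, Location loc)
{
    return snprintf(buf, n, "%d.%d-%d.%d: %s",
                    loc.first_line, loc.first_column,
                    loc.last_line, loc.last_column,
                    error_messages[code]);
}

/* Prints the formatted error to stderr and updates the per-category count. */
void record_error(enum ErrorCode code, Location loc)
{
    char buf[128];

    format_error(buf, sizeof buf, code, loc);
    fprintf(stderr, "%s\n", buf);
    error_counts[code]++;
}
```

Every category has its own constant. A driver can therefore print a per-category summary from error_counts at the end of a run, or map categories to different severities, without parsing message text.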
...can be thought of as defining an "input language" which it accepts. An input language may be as complex as a programming language, or as simple as a sequence of numbers. Unfortunately, usual input facilities are limited, difficult to use, and often are lax about checking their inputs for validity.

Yacc provides a general tool for describing the input to a computer program. The Yacc user specifies the structures of his input, together with code to be invoked as each such structure is recognized. Yacc turns such a specification into a subroutine that handles the input process; frequently, it is convenient and appropriate to have most of the flow of control in the user's application handled by this subroutine.

The input subroutine produced by Yacc calls a user-supplied routine to return the next basic input item. Thus, the user can specify his input in terms of individual input characters, or in terms of higher level constructs such as names and numbers. The user-supplied routine may also handle idiomatic features such as comment and continuation conventions, which typically defy easy grammatical specification.

Yacc is written in portable C. The class of specifications accepted is a very general one: LALR(1) grammars with disambiguating rules.

In addition to compilers for C, APL, Pascal, RATFOR, etc., Yacc has also been used for less conventional languages, including a phototypesetter language, several desk calculator languages, a document retrieval system, and a Fortran debugging system.

0: Introduction

Yacc provides a general tool for imposing structure on the input to a computer program. The Yacc user prepares a specification of the input process. This includes rules describing the input structure, code to be invoked when these rules are recognized, and a low-level routine to do the basic input. Yacc then generates a function to control the input process. This function, called a parser, calls the user-supplied low-level input routine (the lexical analyzer) to pick up the basic items (called tokens) from the input stream. These tokens are organized according to the input structure rules, called grammar rules. When one of these rules has been recognized, the user code supplied for this rule, an action, is invoked. Actions can return values and make use of the values of other actions.

Yacc is written in a portable dialect of C[1], and the actions and the output subroutine are in C as well. Moreover, many of the syntactic conventions of Yacc follow C.

The heart of the input specification is a collection of grammar rules. Each rule describes an allowable structure and gives it a name. Fo

Source: http://dinosaur.compilertools.net/yacc/
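The paper divides the work so that the parser repeatedly asks a user-supplied routine for the next token. A small hand-written lexical analyzer in the style yacc expects can illustrate this. It returns an int token code, passes a token's value through a global variable (yylval in yacc), and returns 0 at end of input, which is yacc's end marker.

Several names and conventions are assumptions made to keep the sketch self-contained. These include the token codes NUMBER and NAME, the '#'-to-end-of-line comment convention, the in-memory string input, and the names lex, lex_init and lexval.

```c
#include <ctype.h>

/* Hypothetical token codes, kept above the single-character range
   so that characters such as '+' can be returned as themselves. */
enum { NUMBER = 258, NAME };

static const char *lex_pos;   /* text still to be scanned */
int lexval;                   /* value of the last NUMBER (yacc uses yylval) */

void lex_init(const char *text) { lex_pos = text; }

/* Returns the next token code, or 0 at end of input. */
int lex(void)
{
    for (;;) {
        while (isspace((unsigned char)*lex_pos))
            lex_pos++;                         /* skip blanks and newlines */
        if (*lex_pos == '#') {                 /* comment: skip to end of line */
            while (*lex_pos != '\0' && *lex_pos != '\n')
                lex_pos++;
            continue;
        }
        break;
    }
    if (*lex_pos == '\0')
        return 0;
    if (isdigit((unsigned char)*lex_pos)) {    /* number: accumulate its value */
        lexval = 0;
        while (isdigit((unsigned char)*lex_pos))
            lexval = lexval * 10 + (*lex_pos++ - '0');
        return NUMBER;
    }
    if (isalpha((unsigned char)*lex_pos)) {    /* name: letters then letters/digits */
        while (isalnum((unsigned char)*lex_pos))
            lex_pos++;
        return NAME;
    }
    return (unsigned char)*lex_pos++;          /* any other single character */
}
```

A yacc-generated parser would call such a routine (named yylex there) each time it needs a token. Comments never reach the grammar because the lexer discards them, which is the kind of convention the paper says "typically defy easy grammatical specification". In this sketch, NAME tokens do not carry their text; a real lexer would also pass the name through the value variable.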