Programming computers used to be harder. Don’t get us wrong — today, people tend to solve harder problems with computers, but the fundamental act of programming is easier. We have high-level languages, toolkits, and even help from our operating systems. Most people never have to figure out how to directly read from a disk drive, deblock the data into records, and perform multiplication using nothing but shifts and adds. While that’s a good thing, sometimes it is good to study the basics. That was [gnebehay’s] thought when his university studies were too high level, so he decided to write an arithmetic expression parser in Python. It came out in about 100 lines of code.
Interpreting math expressions is one of those things that seems simple until you get into it. The first problem is correctly lexing the input — a term that means splitting into tokens. For a human, it seems simple that 5-3 is three tokens, {5, -, and 3} and that’s easy to figure out. But what about 5+-3? That’s also three tokens: {5,+,-3}. Tricky.
Precedence is the other problem. If you look at 5+3*2, you should remember that the answer is 11 or 5+(3*2). However, other than by convention, it is equally valid to consider the answer is 16, or (5+3)*2. Another convention is left association so that 7-4+2 is 5 instead of 1. As it turns out, [gnebehay’s] parser currently spits out 1 for that expression, but he’s promised to fix it soon.
While it is interesting to read the code, the real value is the readme file which documents the creation of the parser and some of the theory behind it. This sort of thing used to be a staple of computer science classes, but today you are less likely to encounter it as classes focus on higher-level constructs.
You probably won’t use this code for anything. You rarely need to parse just a math expression and even if you did, there are many tools to help do that now. But you might just learn something about how interpreters and compilers digest text and garner meaning.
Today, you would probably build a parser with ANTLR or some similar tool. While 100 lines seems small, we’ve seen tiny languages that are smaller.
No comments:
Post a Comment