CITS5501 lab 6 (week 7) – grammars and syntax-based testing
It is strongly suggested you complete the recommended readings for weeks 1-6 before attempting this lab/workshop.
A very common place to use grammars is when parsing command-line arguments to applications. As an example, we’ll look at a simplified version of the command-line arguments for Docker, software which allows programs to be run in a lightweight virtual Linux environment.
For instance, suppose I use the Ubuntu distribution of Linux as the main operating system on my computer, but am working on a team which deploys software to servers running MySQL on the Fedora distribution. Docker allows me to easily run MySQL in a lightweight virtual environment which mimics the Fedora distribution.
Take a look at the documentation page for Docker which explains the command-line arguments that can be given to the Docker executable:
The page shows a typical way of documenting a command-line program:
Usage: docker [OPTIONS] COMMAND [ARG...] A self-sufficient runtime for containers. Options: --config string Location of client config files (default "/root/.docker") -c, --context string Name of the context to use to connect to the daemon (overrides DOCKER_HOST env var and default context set with "docker context use") -D, --debug Enable debug mode --help Print usage -H, --host value Daemon socket(s) to connect to (default [])
[... remaining documentation snipped ...]
The syntax used is a little like EBNF, with a few changes:1
[ ]
”) are used to indicate optional
elements (whereas EBNF would use the question mark,
“?
”)...
”) means some element can be repeated
(whereas EBNF would use the plus sign, “+
”)On Linux or MacOS, you can see additional examples of this
documentation style by typing man ls
into a terminal
window. On Windows, enter “cmd.exe” into the search box in the taskbar,
run cmd.exe
, and type dir /?
into the terminal
window.
The following grammar represents a simplified, very small subset of these command-line arguments:
<invocation> ::= "docker " ( "--verbose " )? <subcommand> <image>
<subcommand> ::= "pull " | "run " | "build "
<image> ::= "debian" | "ubuntu" | "fedora"
The question mark indicates an optional element.
In your browser, visit the BNF Playground webpage at https://bnfplayground.pauliankline.com, and type the grammar into the box labelled “Enter your BNF (or EBNF) below”.
Click the button labelled “Compile BNF”. This checks that our grammar follows the rules for BNF or EBNF, analyses it, and converts it into data structures which can be used to build:
In the box labelled “Test a string here!”, type (don’t paste!):
docker pull debian
You’ll see that the box is initially red (indicating the string is not recognized as being in the language we have defined), and then turns green (once it is recognized).
Exercise:
Now try hitting the button labelled “Generate random
<invocation>
” several times, and see what strings are
produced.
Note that the BNF differs slightly from the less formal version we have seen in class, in that it requires spaces be explicitly inserted.
Exercise:
Now put the original grammar back in and compile it again.
Exercise:
ubuntu
” is chosen twice as often as the others?Suppose we wanted to write code ourselves to parse the command-line arguments of a program according to a grammar like this.
For small programs, the simplest way is usally is to implement a
hand-written parser. For an example of what this looks
like, take a look at the parseArgs()
method of the
MyDockerLiteProgram
class in the code for this
workshop.
In the parseArgs()
method, we manually work our way
through the ArrayList
of arguments; after each argument is
verified as valid, we remove it by calling args.remove(0)
and move onto the next argument.
This approach is fine if we only have a small number of arguments to verify, and if they can only be supplied in a fixed order. But for applications with many arguments, and where the arguments can be supplied in flexible order, it’s usual to make use of a command-line parsing library.
Some examples of such libraries are:
These libraries allows us to more easily specify a grammar for validating command-line arguments; as a by-product, they also allow us to automatically generate documentation for our application (of the kind we saw for Docker in section A).
Challenge exercise:
If you are already familiar with the idea of parsing command-line
arguments and would like a challenge: read through the documentation for
either Apache Commons CLI or Picocli, and amend our
MyDockerLiteProgram
class to use one of these libraries to
parse its command-line arguments.
Recall that when testing something that can be represented as a grammar, there are different levels of coverage we might aim to achieve.
Exercise:
Exercise:
Looking at the original grammar – consider how many tests we would need to write if we wanted to get the following sorts of coverage for this grammar.
What would be the minimum number of tests needed? How many tests do you think would be most appropriate?
Exercise:
A recognizer for a language can be regarded as a program which takes in a string, and gives back a boolean saying whether the string is in the language or not.
validArgumentsParseOk
in
the MyDockerLiteProgramTest
class. Adapt this to create a
new test for the test case you described in problem (a).Exercise:
Consider the following grammar:
<list> ::= "1" | "0" <list>
Try entering it into the BNF playground, generating some random strings, and seeing what strings it recognizes.
<list>
non-terminal itself refers to the
<list>
non-terminal. Can you think of an equivalent
grammar – that is, one that defines exactly the same language – which
instead uses asterisks?There is no formal name for this syntax, but it is documented at https://man7.org/linux/man-pages/man7/man-pages.7.html if you are interested (look under “Sections within a manual page” / “Synopsis”) and at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html.↩︎