Franck Pommereau - Blog - Randomized C assessments with AMC

In a previous post I’ve explained how I randomise an AMC exam by generating LaTeX source to be included by AMC during the compilation. Here, I show how I reuse this idea to assess students understanding of a randomised C program.

With respect to the previous post, there are two issues to be solved:

How to generate a program that is different for every student
How to ensure that the questions are consistent with the generate program

To solve (1), I’ll use the tool badass; for (2), I’ll systematically run the generated programs to extract their exact behaviour.

Generating randomised programs

Let’s consider a simple algorithm that counts the number of values in an array A that are also in an array B:

typedef struct {
  unsigned int len;
  int *val;
} Tab;

int count (Tab A, Tab B) {
  int C = 0;
  int I, J, F;
  for (I=0; I<A.len; I++) {
    F = 0;
    for (J=0; J<B.len; J++) {
      if (A.val[I] == B.val[J]) {
        F = 1;
        break;
      }
    }
    if (F) {
      C++;
    }
  }
  return C;
}

With slight modifications, we can compute the smallest value that is in A but not in B (I’ll use only small values hence the use of 999 instead of INT_MAX):

int sum (Tab A, Tab B) {
  int C = 999;
  int I, J, F;
  for (I=0; I<A.len; I++) {
    F = 0;
    for (J=0; J<B.len; J++) {
      if (A.val[I] == B.val[J]) {
        F = 1;
        break;
      }
    }
    if ((!F) && A.val[I] < C) {
      C = A.val[I];
    }
  }
  return C;
}

It is easy to see that we can derive a range of similar algorithms that consist in searching all the values within an array, that belong (or not) to another array, and process each such value (by counting, summing, taking the min/max, etc.).

So we can define a parametric version of our algorithm:

// PARAM may swap A and B, or not
int FORALLIN (PARAM(Tab A,Tab B)) {
  int C = INIT; // initial value of C depends on the algorithm
  int I, J, F;
  for (I=0; I<A.len; I++) {
    F = FALSE; // we may use 0 or 1...
    for (J=0; J<B.len; J++) {
      if (A.val[I] == B.val[J]) {
        F = TRUE; // ..and here 1 or 0
        break;
      }
    }
    if (COND(F,C,A.val[I])) { // the condition depends on the algorithm...
      // ...as well as how each value is used
      PROCESS(C,A.val[I]);
    }
  }
  return C;
}

To generate instances of such a generic algorithm, I’ve developed module badass.p5 (Python-powered pre-pre-processor). To use it, first pip install not-so-badass. badass itself is a more general tool that I’ll use in next post to automatically assess C code for programming assessments. badass.p5 is a pre-processor that prepares code to be run through cpp, its goal is to process the generic parts of the code in order to avoid showing them to the students while preserving the cpp directives. In particular, it allows to generate random identifiers, or random values for test variables.

I’ll show and comment excerpts of the source code that can be found here. First, we generate macro PARAM that will swap or not the parameters of the function.

//let swap = randint(0, 1)
//if swap
  //let PARAM(x,y) = "y, x"
//else
  //let PARAM(x,y) = "x, y"
//end if

This //let, //if, //else and //end if are C comments that are at the same time p5 directives, similar respectively to cpp’s #define, #if, #else, and #endif. The main difference is that expressions here are Python expressions. So we define a variable swap that is either 0 or 1, and, depending on its value, we define macro PARAM accordingly. (Note that we can indent these directives because the lines will completely disappear in the output.)

Then, we chose what the algorithm shall do with the selected values, and chose INIT and PROCESS accordingly:

//let comp = choice("min", "max", "sum", "num")
//if comp == "min"
  //let INIT = 999
  //let PROCESS(c,v) = "c = v"
//else if comp == "max"
  //let INIT = 0
  //let PROCESS(c,v) = "c = v"
//else if comp == "sum"
  //let INIT = 0
  //let PROCESS(c,v) = "c += v"
//else if comp == "num"
  //let PROCESS(c,v) = "c++"
  //let INIT = 0
//end if

Now we chose the values for FALSE and TRUE:

//let FALSE = randint(0, 1)
//let TRUE = 1 - FALSE

We also chose whether the condition about F will be negated of not, and we define COND accordingly (choosing among equivalent conditions whenever possible):

//let NOT = randint(0, 1)
//if NOT
  //if comp == "min"
    //let COND(f,c,v) = choice("(!f) && (v < c)", "(!f) && (c > v)", "(v < c) && (!f)", "(c > v) && (!f)")
  //else if comp == "max"
    //let COND(f,c,v) = choice("(!f) && (c < v)", "(!f) && (v > c)", "(c < v) && (!f)", "(v > c) && (!f)")
  //else
    //let COND(f,c,v) = "!f"
  //end if
//else
  //if comp == "min"
    //let COND(f,c,v) = choice("f && (v < c)", "f && (c > v)", "(v < c) && f", "(c > v) && f")
  //else if comp == "max"
    //let COND(f,c,v) = choice("f && (c < v)", "f && (v > c)", "(c < v) && f", "(v > c) && f")
  //else
    //let COND(f,c,v) = "f"
  //end if
//end if

Finally, we compute a last value join that tells us whether we select from the intersection of the two arrays or their difference. This actually depends on FALSE and NOT as follows:

//let join = "sub" if FALSE ^ NOT else "and"

To further randomise the algorithm, we can chose random identifiers (p5 will chose names from the English dictionnary that are not C keywords, nor one of our //leted names, nor offending words) and shuffle the declarations:

//let FORALLIN = randname()
//let A = randname()
//let B = randname()
//let C = randname()
//let I = randname()
//let J = randname()
//let F = randname()
int FORALLIN (PARAM(Tab A,Tab B)) {
  //shuffle
  int C = INIT;
  int I;
  int J;
  int F;
  //end shuffle
  /* the rest is a above */

Finally, we define a main function to test the algorithm on random data:

{% raw %}
//let U = randname()
//let TABU_LEN = randint(5, 10)
//let TABU_VAL = randint(1, 10, len=TABU_LEN, fmt=None)
//let TABU = f"{{.len={TABU_LEN}, .val=malloc({TABU_LEN} * sizeof(int))}}"
//let V = randname()
//let TABV_LEN = randint(5, 10)
//let TABV_VAL = randint(1, 10, len=TABV_LEN, fmt=None)
//let TABV = f"{{.len={TABV_LEN}, .val=malloc({TABV_LEN} * sizeof(int))}}"
int main () {
  //shuffle
  Tab U = TABU;
  Tab V = TABV;
  //end shuffle
  //for IDX, VAL in enumerate(TABU_VAL)
  U.val[IDX] = VAL;
  //end for
  //for IDX, VAL in enumerate(TABV_VAL)
  V.val[IDX] = VAL;
  //end for
  printf("%i\n", FORALLIN(U, V));
  //shuffle
  free(U.val);
  free(V.val);
  //end shuffle
}
{% endraw %}

We have now randint that is used with a parameter len in which case it returns an array. This array is usually rendered as a C literal but with fmt=None we get a Python list instead, so we can iterate on its values latter on (see //for loops).

With this program saved as algo.c, you can run badass p5 algo.c to generate a randomised instance. The problem with this command is that (1) we get the resulting code but not the value of the random parameters, but they are required to generate questions about the code; and (2) the pseudo-random number generator being seeded with a fixed constant, we’ll generate always the same program (this latter remark gives me the idea to add an option in badass CLI).

Retrieving the parameters to generate questions

To retrieve and use the random parameters, we use badass.p5 as a library, see gen/cgen.py for the full code.

First, we define a class Exo that extends p5.PrePreProcessor. It’s constructor fixes the path of the source code and retrieves the three parameters needed to know what the code does (in attribute self.top.cppenv):

import tempfile, subprocess, pathlib, random
import badass.p5 as p5

class Exo (p5.PrePreProcessor) :
    def __init__ (self, path="algo.c") :
        super().__init__(path)
        self.swap = self.top.cppenv["swap"]
        self.join = self.top.cppenv["join"]
        self.comp = self.top.cppenv["comp"]

Then, we define a method run to compile the code, run it, and retrieve its standard output (attribute self.cpp is the source code fully processed):

    def run (self) :
        with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8", suffix=".c") as tmp :
            tmp.write(self.cpp)
            tmp.flush()
            subprocess.check_output(["gcc", tmp.name])
        return subprocess.run(["./a.out"],
                              capture_output=True,
                              encoding="utf-8").stdout.strip()

Now, we define a method that computes the expected result of the generic code according to its parameters swap, join, and comp, it will be used to check that the generated programs are correct, it will also be used to compute the function result on varied parameters:

    def compute (self, swap, join, comp) :
        U, V = self.top.cppenv["TABU_VAL"], self.top.cppenv["TABV_VAL"]
        if swap :
            U, V = V, U
        if join == "and" :
            T = [u for u in U if u in V]
        elif join == "sub" :
            T = [u for u in U if u not in V]
        else :
            assert False, "unexpected case"
        if comp == "min" :
            return min(T, default=999)
        elif comp == "max" :
            return max(T, default=0)
        elif comp == "sum" :
            return sum(T, 0)
        elif comp == "num" :
            return len(T)
        else :
            assert False, "unexpected case"

To ask students what their algorithms do, we also need a method that enumerates all the possible variants for our generic algorithm:

{% raw %}
    def algos (self, tex=False) :
        _comp = {"min" : "the smallest value that is",
                 "max" : "the largest value that is",
                 "sum" : "the sum of the values that are",
                 "num" : "the number of values that are"}
        _join = {"sub" : "in {X} but not in {Y}",
                 "and" : "both in {X} and in {Y}"}
        for swap in (0, 1) :
            X, Y = self.top.cppenv["A"], self.top.cppenv["B"]
            if swap :
                X, Y = Y, X
            if tex :
                X = fr"\texttt{{{X}}}"
                Y = fr"\texttt{{{Y}}}"
            for join in ("sub", "and") :
                for comp in ("min", "max", "sum", "num") :
                    where = _join[join].format(X=X, Y=Y)
                    good = (swap, join, comp) == (self.swap, self.join, self.comp)
                    value = self.compute(swap, join, comp)
                    if good :
                        assert int(self.run()) == value, "inconsistent value"
                    yield good, value, f"{_comp[comp]} {where}"
{% endraw %}

Parameter tex allows to chose whether to output text as LaTeX code or plain text (default). Essentially, we enumerate all the possible values for the generic parameters swap, join and comp, and for each combination yield a tuple with:

A Boolean that is True iff current combination is that of this instance of Exo
The value expected from a run of the function with these parameters
A sentence that describes in English what this variant actually computes

Note the assert int(self.run()) == value, "inconsistent value" that checks whether the actual code provides the expected result.

I’ve also defined a method png that prints the generated code as a PNG image, the purpose is to include it in the exam to forbid students to copy/paste the code in order to see the result. I want them to read the code and reason about it, not to run it cluelessly. Of course, they can still type the code, but this will take quite a long time which will be better spent reading and understanding (the exam being passed in limited time).

Finally, a method tex outputs three questions, two of which are the same, I’ll comment this below. Instead of showing the code that is quite simple, let me show you the result for the first instance (with repetitive parts snipped):

{% raw %}
\expandafter\def\csname cpp:swap\endcsname{1}
%% <snip> %%
\expandafter\def\csname cpp:TABV\endcsname{{.len=8, .val=malloc(8 * sizeof(int))}}

\def\C#1{\csname cpp:#1\endcsname}

\element{COMP1}{
  \begin{questionmult}{C1}
    What does function \C{FORALLIN} computes?
    \AMCBoxedAnswers
    \begin{reponses}
      %% <snip> %%
      \correctchoice{the sum of the values that are in \texttt{lazied} but not in \texttt{piecing}}
      \wrongchoice{the number of values that are in \texttt{lazied} but not in \texttt{piecing}}
      %% <snip> %%
    \end{reponses}
  \end{questionmult}
  \begin{questionmult}{C2}
    What does the program prints?
    \begin{multicols}{4}
      \AMCBoxedAnswers
      \begin{reponses}
      %% <snip> %%
        \correctchoice{9}
        \wrongchoice{10}
      %% <snip> %%
      \end{reponses}
    \end{multicols}
  \end{questionmult}
  \begin{questionmult}{C3}
    What does the program prints?
    \AMCnumericChoices{9}{digits=3,vertical=true,scoreexact=1,scoreapprox=0,borderwidth=1pt,sign=false}
  \end{questionmult}
}
{% endraw %}

To start with, we define all the constants generated by \\let directives, and a macro \C to retrieve them. Then note that we generate groups COMP1, COMP2, etc., one for each exam, inside these groups there are three questions C1, C2, and C3, with the same identifier for every exam. To generate C1 we just iterate over the triples yield by method algos, doing so, we also collect all the return values from all the algorithms, as well as the correct answer for the specific variant we are currently processing. These collected values allow to generate question C2, in order to guarantee we don’t have duplicated values, we collect them in a set. So, if several variants generate the same value, even if it’s the correct one, there will be no duplicated choices in the question. This is something especially difficult to guarantee when randomisation is made directly within LaTeX, and I’ve had such problems in some exams before, which motivated the switch to Python. Question C3 is the same as question C2 but using an AMCnumericChoices to get the answer from students. I recommend this method over a list of choices as in question C2 because it forces the students to think about the question and build an answer, while choosing in a list may lead them to test the proposed values one by one, which is less interesting in our case. (It may be more interesting in other cases, so both methods may be useful depending on the situation.)

The last step is to generate all the files before to compile the exam with AMC:

if __name__ == "__main__" :
    import tqdm
    inc = pathlib.Path("../inc")
    inc.mkdir(exist_ok=True)
    for num in tqdm.tqdm(range(1, 11)) :
        e = Exo()
        with open(inc / f"C{num}.tex", "w") as out :
            e.tex(num, out)
        with open(inc / f"C{num}.c", "w") as out :
            e.save(out)
        e.png(inc / f"C{num}.png", 2)

Note that additionally to the PNG and TeX file, we also save a C file. This will be the same source code as we’ve seen before except that all the parameters will be fixed using //let directives, and //shuffle, //if, and //for directives will be fully processed. So, using for instance Exo("inc/C1.c") results in the same object as the first instance we have generated in the loop just above. As explained in the previous post, this may be crucial if we need to fix something after the exam is passed.

All done

In a next post, I’ll show how to assess code written by students in an automated way, which is what badass was actually made for.