In the first stage, a likelihood is assigned to each category for every field of the new form. Together, these likelihoods form a matrix with one row per field and one column per category.
Since each field can belong to only one category, each row of this matrix is normalised so that its entries sum to 1.0.
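The first stage and the row normalisation might be sketched as follows. The function name `build_likelihood_matrix`, the dictionary input format and the single fallback value `default_p` for categories without any evidence are assumptions made for this illustration. They are not taken from the described system.

```python
from typing import Dict, List


def build_likelihood_matrix(
    field_scores: List[Dict[str, float]],
    categories: List[str],
    default_p: float,
) -> List[List[float]]:
    """Hypothetical helper: one row per field, one column per category.

    field_scores[i] maps category names to the likelihoods found for field i;
    categories without an entry fall back to the (assumed) default probability.
    """
    matrix = []
    for scores in field_scores:
        row = [scores.get(cat, default_p) for cat in categories]
        total = sum(row)
        if total <= 0:
            raise ValueError("a row needs at least one positive likelihood")
        # A field has exactly one category, so its row is normalised to sum to 1.0.
        matrix.append([value / total for value in row])
    return matrix


categories = ["name", "date", "amount"]
fields = [{"name": 0.8}, {"date": 0.6, "amount": 0.2}]
matrix = build_likelihood_matrix(fields, categories, default_p=0.1)
```

Before normalisation, the second field's row [0.1, 0.6, 0.2] sums to 0.9. After normalisation it becomes roughly [0.11, 0.67, 0.22].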
This matrix must then be compared with the forms already stored in the Database. To do so, the entries of each column are summed. The resulting vector has n dimensions, one per category, and can be compared directly with other key-vectors as described in Section 5.3.1.
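The column summation itself is short. The sketch below starts from a matrix whose rows are already normalised. Section 5.3.1's comparison measure is not reproduced here, so cosine similarity serves purely as a stand-in. The stored key-vectors `invoice` and `receipt` are invented for illustration.

```python
import math
from typing import Dict, List


def key_vector(matrix: List[List[float]]) -> List[float]:
    """Sum each column of a field-by-category matrix into an n-dimensional vector."""
    n = len(matrix[0]) if matrix else 0
    return [sum(row[j] for row in matrix) for j in range(n)]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Stand-in similarity measure; not necessarily the one from Section 5.3.1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Two fields, three categories; each row already sums to 1.0.
matrix = [[0.8, 0.1, 0.1], [0.1, 0.7, 0.2]]
query = key_vector(matrix)  # approximately [0.9, 0.8, 0.3]

# Hypothetical key-vectors of forms already stored in the Database.
stored: Dict[str, List[float]] = {"invoice": [1.0, 1.0, 0.0], "receipt": [0.0, 1.0, 1.0]}
best = max(stored, key=lambda name: cosine_similarity(query, stored[name]))
```

Every row sums to 1.0, so the entries of the key-vector always add up to the number of fields (here 2). Forms with the same number of fields therefore produce vectors of comparable magnitude.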
A proper definition of the default probability used in the first stage is essential for this comparison. If the default likelihoods are set too high, the comparison becomes biased towards vectors with many entries. Conversely, a pessimistic initialisation favours short vectors.
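A small experiment shows the effect of the default probability. The setting is hypothetical. A form has three fields, and each is recognised as a different one of six categories with likelihood 0.9. Every other category receives the default probability `default_p`. An entry of the key-vector counts as significant if it exceeds an arbitrarily chosen threshold of 0.25. All of these values are assumptions chosen for the illustration.

```python
from typing import List


def key_vector_for(
    recognised: List[int], n_categories: int, score: float, default_p: float
) -> List[float]:
    """Key-vector of a form whose i-th field was recognised as category recognised[i]."""
    totals = [0.0] * n_categories
    for cat in recognised:
        # Recognised category gets `score`; all others get the default probability.
        row = [default_p] * n_categories
        row[cat] = score
        row_sum = sum(row)
        # Normalise the row to 1.0 and add it to the column sums.
        for j, value in enumerate(row):
            totals[j] += value / row_sum
    return totals


def significant_entries(vector: List[float], threshold: float = 0.25) -> int:
    """Count entries above an (assumed) significance threshold."""
    return sum(1 for v in vector if v > threshold)


for default_p in (0.01, 0.3):
    vec = key_vector_for([0, 1, 2], n_categories=6, score=0.9, default_p=default_p)
    print(default_p, significant_entries(vec))
# prints "0.01 3", then "0.3 6"
```

With the pessimistic default, only the three recognised categories stand out, which gives a short vector. With the optimistic default, all six entries exceed the threshold, so the form would resemble stored key-vectors with many entries.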
Experiments have shown, however, that this method of comparing forms that have not yet been categorised with existing entries in the Database is stable. Any misinterpretations can be accounted for and corrected in a subsequent step.