The topics in this third task are patent images representing flow-charts. Participants in this task will be asked to extract the information in these images and return it in a predefined textual format.
MT - META : refers to meta information about the flow chart.
MT Title : title of the chart, in double quotes - optional
MT NO : number of nodes in the flow chart - required
MT DE : number of directed edges in the flow chart - required
MT UE : number of undirected edges in the flow chart - required
NO - NODE: the line starting with NO describes a node in the flow chart. Each such line must contain:
NO : identifier of the line describing a flow chart's node. Required
id : integer : identifier of the node in the flow chart. Required
node-type : a keyword describing the type (shape) of the node. Possible values: oval, rectangle, double-rectangle, parallelogram, diamond, circle, point, no-box, cylinder, unknown. Required
text : a string between double quotes : the text appearing in the flow chart's node. Optional
DE - DIRECTED EDGE, UE - UNDIRECTED EDGE : the line starting with DE or UE describes an edge in the flow chart. Each such line must contain:
DE | UE : identifier of the line describing a flow chart's edge. Required
start-node : integer : the identifier of the flow chart node where the edge has its starting point. Must occur on a line with the 'NO' identifier. Required
end-node : integer : the identifier of the flow chart node where the edge has its ending point. Must occur on a line with the 'NO' identifier. Required
type : a keyword describing the edge. May take one of the following values: plain, dotted, wiggly, unknown. Required
text : a string between double quotes : the text attached to the edge. Optional
CO - COMMENT : denotes a line that contains comments.
MT Title "Fig.7"
MT NO 10
MT DE 5
MT UE 4
CO ====== Now comes the list of nodes ======
CO === identifier type text ============
NO 1 oval "BEGIN"
NO 2 rectangle "RECEIVE AND DIGITIZE IMAGE"
NO 3 rectangle "DISPLAY IMAGE AND SELECT CHART"
NO 4 rectangle "DESIGNATE APPROXIMATE POSITIONS"
NO 5 rectangle "RECOGNIZE GRAPHICAL OBJECT AND OUTPUT"
NO 6 oval "BEGIN"
NO 7 no-box "80"
NO 8 no-box "82"
NO 9 no-box "84"
NO 10 no-box "86"
CO ========== Here come the edges ===========
CO === start-node end-node type text ========
DE 1 2 plain ""
DE 2 3 plain ""
UE 2 7 wiggly ""
DE 3 4 plain ""
UE 3 8 wiggly ""
DE 4 5 plain ""
UE 4 9 wiggly ""
DE 5 6 plain ""
UE 5 10 wiggly ""
CO ======== THIS IS IT =======================
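A minimal Python sketch of a reader for this format may help when checking a result file before submission. It assumes one record per line and plain double-quoted text without escape sequences; the function names `parse_flowchart` and `validate` and the dictionary layout are illustrative choices, not part of the task definition.

```python
import shlex

NODE_TYPES = {"oval", "rectangle", "double-rectangle", "parallelogram",
              "diamond", "circle", "point", "no-box", "cylinder", "unknown"}
EDGE_TYPES = {"plain", "dotted", "wiggly", "unknown"}


def parse_flowchart(text):
    """Parse flowchart records (assumed: one record per line) into a dict."""
    chart = {"meta": {}, "nodes": {}, "edges": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("CO"):
            continue  # blank lines and comments carry no data
        fields = shlex.split(line)  # honours the double-quoted text field
        tag = fields[0]
        if tag == "MT":
            key, value = fields[1], fields[2]
            chart["meta"][key] = value if key == "Title" else int(value)
        elif tag == "NO":
            chart["nodes"][int(fields[1])] = {
                "type": fields[2],
                "text": fields[3] if len(fields) > 3 else "",
            }
        elif tag in ("DE", "UE"):
            chart["edges"].append({
                "kind": tag,
                "start": int(fields[1]),
                "end": int(fields[2]),
                "type": fields[3],
                "text": fields[4] if len(fields) > 4 else "",
            })
        else:
            raise ValueError(f"unknown record type: {tag!r}")
    return chart


def validate(chart):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    meta, nodes, edges = chart["meta"], chart["nodes"], chart["edges"]
    counts = {
        "NO": len(nodes),
        "DE": sum(e["kind"] == "DE" for e in edges),
        "UE": sum(e["kind"] == "UE" for e in edges),
    }
    for key, actual in counts.items():
        if meta.get(key) != actual:
            problems.append(f"MT {key} is {meta.get(key)}, found {actual}")
    for node_id, node in nodes.items():
        if node["type"] not in NODE_TYPES:
            problems.append(f"node {node_id}: bad type {node['type']!r}")
    for edge in edges:
        if edge["type"] not in EDGE_TYPES:
            problems.append(f"edge {edge['start']}-{edge['end']}: bad type")
        for endpoint in (edge["start"], edge["end"]):
            if endpoint not in nodes:
                problems.append(f"edge refers to undeclared node {endpoint}")
    return problems


SAMPLE = '''MT Title "Fig.7"
MT NO 3
MT DE 1
MT UE 1
CO === nodes ===
NO 1 oval "BEGIN"
NO 2 rectangle "RECEIVE AND DIGITIZE IMAGE"
NO 7 no-box "80"
CO === edges ===
DE 1 2 plain ""
UE 2 7 wiggly ""
'''

chart = parse_flowchart(SAMPLE)
problems = validate(chart)
```

Here `validate` checks the three required MT counts against the records actually present, the node and edge type keywords, and that every edge endpoint is declared on a NO line.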
The main evaluation measure will be the graph distance metric based on the maximum common subgraph, mcs (see Bunke and Shearer, 1998, and Wallis et al., 2001).
The distance between the topic flowchart Ft and the submitted flowchart Fs is computed as:
where |.| denotes the size of the flowchart/graph. In our case, the size of the flowchart is the number of edges plus the number of nodes.
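For orientation, the two cited papers define mcs-based distances that differ only in the denominator: Bunke and Shearer (1998) divide |mcs| by max(|Ft|, |Fs|), while Wallis et al. (2001) divide it by |Ft| + |Fs| - |mcs|. The Python sketch below computes both from precomputed sizes. Which variant the official evaluation applies is not assumed here, and finding the mcs itself is a separate problem that the sketch leaves out; the function names are illustrative.

```python
def flowchart_size(num_nodes, num_edges):
    """Size of a flowchart as defined above: nodes plus edges."""
    return num_nodes + num_edges


def distance_max(size_t, size_s, size_mcs):
    """Bunke and Shearer (1998): 1 - |mcs| / max(|Ft|, |Fs|)."""
    denominator = max(size_t, size_s)
    if denominator == 0:
        return 0.0  # two empty graphs: treated as identical (assumption)
    return 1.0 - size_mcs / denominator


def distance_union(size_t, size_s, size_mcs):
    """Wallis et al. (2001): 1 - |mcs| / (|Ft| + |Fs| - |mcs|)."""
    denominator = size_t + size_s - size_mcs
    if denominator == 0:
        return 0.0  # two empty graphs: treated as identical (assumption)
    return 1.0 - size_mcs / denominator


# Hypothetical case: the topic has 10 nodes and 9 edges; the submission
# found 8 nodes and 6 edges, all of which match the topic.
size_topic = flowchart_size(10, 9)
size_submitted = flowchart_size(8, 6)
size_common = size_submitted

d_max = distance_max(size_topic, size_submitted, size_common)
d_union = distance_union(size_topic, size_submitted, size_common)
```

Both distances are 0 for identical flowcharts and 1 when the two graphs share nothing.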
The distance between the topic and the submitted flowcharts will be computed at three levels:
The so-called wiggly edges are the only part of the result which involves a bit of semantics. Wiggly edges connect the nodes of the flowchart with their labels. After intense discussions, we have decided that rather than have you discard them, they should be returned. After all, if you can discard them, it means you have already identified them as being of the wiggly type.
The semantic part comes into play here, because wiggly edges are not always wiggly. In the vast majority of cases, they are, but in some cases they are simple straight lines. In such cases you still have to return them as wiggly, because they are not actually part of the flowchart, but they are annotations. One way to determine that they are annotations is to observe that one of their nodes does not have a frame, or is just a number or short out-of-vocabulary term. This is unfortunately not a rule that will always apply.
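The hint above can be written down as a rough check. This is only a sketch of the stated heuristic, under the assumption that nodes are held as dictionaries with `type` and `text` keys (a hypothetical layout); the "short out-of-vocabulary term" case is approximated by a purely numeric label with an optional trailing letter. As noted, the rule will not always hold.

```python
import re

# Assumed pattern for reference labels such as "80" or "12a".
LABEL_PATTERN = re.compile(r"\d+[A-Za-z]?")


def looks_like_label(node):
    """True if the node has no frame or its text is just a short number."""
    text = node.get("text", "").strip()
    return node.get("type") == "no-box" or LABEL_PATTERN.fullmatch(text) is not None


def looks_like_annotation_edge(start_node, end_node):
    """Heuristic: an edge is an annotation if either endpoint looks like a label."""
    return looks_like_label(start_node) or looks_like_label(end_node)


step = {"type": "rectangle", "text": "RECEIVE AND DIGITIZE IMAGE"}
label = {"type": "no-box", "text": "80"}
begin = {"type": "oval", "text": "BEGIN"}

is_annotation = looks_like_annotation_edge(step, label)
is_flow = not looks_like_annotation_edge(begin, step)
```

An edge flagged this way would be returned as a UE line of type wiggly even when it is drawn as a straight line.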
There are 10 types of nodes available.
As you may observe, the difference between a rectangle, an oval and a diamond will often be determined by its corners. Many ovals are actually rectangles with rounded corners. Some diamonds will be hexagons (rectangles with cut corners).
For questions and anything else regarding this task, please contact Mihai Lupu (lupu at ifs.tuwien.ac.at) or Florina Piroi (piroi at ifs.tuwien.ac.at).