
Flowchart recognition task 2013


The topics in this third task are patent images representing flowcharts. Participants will be asked to extract the information in these images and return it in a predefined textual format.

As a rule, the text file will contain structural information. The goal is to capture as much of the information present in the image as possible, so that it can be processed further for the purposes of patent search. You will generally not be asked to make inferences about the nature of the nodes or edges. Nodes are simply represented by their geometrical shape: rectangle, circle, parallelogram, etc. (see the list below). Edges are likewise just plain or dotted, with one special type: wiggly, which denotes the link between an actual node and its label, as seen in the images below.

Input

TIFF images containing flowcharts.

Unlike last year, we no longer filter out flowcharts that contain additional information. In particular, a set of nodes is relatively often enclosed in a meta-node to denote a specific part of the flowchart. Below are some examples. Here is a small set of 17 images with features different from what you might have seen last year.




Output

A text file describing the flowchart. The format is essentially the same as last year's, except that some extra information is required:
  • Node coordinates: the coordinates of the center of each node. They will not be used for evaluation, but will help us create better visualization tools.
  • Meta-node information: an additional type of line, listing the meta-nodes, if any exist in the graph.

The text file is a sequence of lines, each line prefixed with a mark to identify the information on the line, as follows:

MT - META : refers to meta information about the flow chart.
MT Title : title of the chart, in double quotes - optional
MT NO : number of nodes in the flow chart - required
MT DE : number of directed edges in the flow chart - required
MT UE : number of undirected edges in the flow chart - required
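Since the MT counts must agree with the lines that follow, it is safer to derive them from the extracted graph than to track them by hand. A minimal sketch, assuming the recognizer yields a list of node ids and a list of (kind, start, end) edge tuples where kind is "DE" or "UE" (a hypothetical representation; `mt_header` is likewise a hypothetical helper, not part of the task definition). In the example file on this page, the meta-node (MN line) is not included in the MT NO count, so the sketch counts only NO nodes.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical edge representation: (kind, start_id, end_id), kind is "DE" or "UE".
Edge = Tuple[str, int, int]


def mt_header(node_ids: Iterable[int], edges: Iterable[Edge],
              title: Optional[str] = None) -> List[str]:
    """Build the MT lines from the extracted graph (meta-nodes are not counted)."""
    node_ids = list(node_ids)
    edges = list(edges)
    for kind, _, _ in edges:
        if kind not in ("DE", "UE"):
            raise ValueError(f"unknown edge kind: {kind!r}")
    lines = []
    if title is not None:  # MT Title is optional
        lines.append(f'MT Title "{title}"')
    lines.append(f"MT NO {len(node_ids)}")
    lines.append(f"MT DE {sum(1 for kind, _, _ in edges if kind == 'DE')}")
    lines.append(f"MT UE {sum(1 for kind, _, _ in edges if kind == 'UE')}")
    return lines


if __name__ == "__main__":
    # Three nodes, two directed edges and one undirected (wiggly) edge.
    print("\n".join(mt_header([1, 2, 3], [("DE", 1, 2), ("DE", 2, 3), ("UE", 1, 3)])))
```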

NO - NODE: the line starting with NO describes a node in the flow chart. Each such line must contain:
NO : identifier of the line describing a flow chart's node. Required
id : integer : identifier of the node in the flow chart. Required
node-type : a keyword describing the type (shape) of the node. Possible values: oval, rectangle, double-rectangle, parallelogram, diamond, circle, point, no-box, cylinder, unknown. Required
text : a string between double quotes : the text appearing in the flow chart's node. Optional
location : a pair of integers, enclosed in () and separated by a comma, identifying the center of the node with respect to the top left corner of the image. Optional
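A NO line can be serialized directly from the fields above. The sketch below assumes (hypothetically) that the recognizer reports an axis-aligned bounding box for each node and takes the integer midpoint of that box as the node's center; `format_node` and `bbox_center` are illustrative names, not an official tool. The specification does not say how a double quote inside the node text should be escaped, so the sketch simply rejects such text.

```python
from typing import Optional, Tuple

# The node-type keywords allowed by the specification.
NODE_TYPES = {"oval", "rectangle", "double-rectangle", "parallelogram", "diamond",
              "circle", "point", "no-box", "cylinder", "unknown"}


def bbox_center(x0: int, y0: int, x1: int, y1: int) -> Tuple[int, int]:
    """Integer center of a bounding box, in pixels from the top left corner."""
    return ((x0 + x1) // 2, (y0 + y1) // 2)


def format_node(node_id: int, node_type: str, text: Optional[str] = None,
                location: Optional[Tuple[int, int]] = None) -> str:
    """Serialize one line: NO id node-type ["text"] [(x,y)]."""
    if node_type not in NODE_TYPES:
        raise ValueError(f"invalid node-type: {node_type!r}")
    parts = ["NO", str(node_id), node_type]
    if text is not None:
        # Assumption: quote escaping is unspecified, so refuse embedded quotes.
        if '"' in text:
            raise ValueError("node text must not contain double quotes")
        parts.append(f'"{text}"')
    if location is not None:
        x, y = location
        parts.append(f"({x},{y})")  # no space after the comma, as in the example
    return " ".join(parts)


if __name__ == "__main__":
    print(format_node(7, "no-box", "80", bbox_center(160, 150, 200, 170)))
```

Keeping the keyword set in one place means an unexpected shape label from the recognizer fails loudly instead of producing an invalid line; mapping doubtful shapes to `unknown` before calling the function is one option.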

DE - DIRECTED EDGE, UE - UNDIRECTED EDGE: the line starting with DE or UE describes an edge in the flow chart. Each such line must contain:
DE | UE : identifier of the line describing a flow chart's edge. Required
start-node : integer : the identifier of the flow chart node where the edge has its starting point. Must occur on a line with the 'NO' or 'MN' identifier. Required
end-node : integer : the identifier of the flow chart node where the edge has its ending point. Must occur on a line with the 'NO' or 'MN' identifier. Required
type : a keyword describing the edge. May take one of the following values: plain, dotted, wiggly, unknown. Required
text : a string between double quotes : the text attached to the edge. Optional
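For reading the files back (for instance, the training text files), an edge line can be matched with a single regular expression. This is a minimal sketch, assuming fields are separated by whitespace and that the optional text contains no embedded double quotes; `parse_edge` and `EdgeLine` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

EDGE_TYPES = ("plain", "dotted", "wiggly", "unknown")

# DE|UE start-node end-node type ["text"]
EDGE_RE = re.compile(
    r'^(DE|UE)\s+(\d+)\s+(\d+)\s+(' + "|".join(EDGE_TYPES) + r')(?:\s+"([^"]*)")?\s*$'
)


class EdgeLine(NamedTuple):
    directed: bool          # True for DE, False for UE
    start: int
    end: int
    edge_type: str          # one of EDGE_TYPES
    text: Optional[str]     # None when the optional text field is absent


def parse_edge(line: str) -> EdgeLine:
    """Parse one DE/UE line; raise ValueError if it does not match the format."""
    m = EDGE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"malformed edge line: {line!r}")
    kind, start, end, edge_type, text = m.groups()
    return EdgeLine(kind == "DE", int(start), int(end), edge_type, text)


if __name__ == "__main__":
    print(parse_edge('UE 2 7 wiggly ""'))
```

The sketch distinguishes an absent text field (`None`) from an empty one (`""`), since the example file writes `""` explicitly.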

CO - COMMENT : denotes a line that contains comments.
MN - META-NODE: the line starting with MN describes a meta node in the flow chart. Each such line must contain:
MN : identifier of the line describing a flow chart's meta-node. Required
id : integer : identifier of the meta-node in the flow chart. MUST be different from all NO ids. Required
nodelist : a comma-separated, [] enclosed list of node identifiers (these must match existing nodes on NO lines) which are enclosed in this meta-node. Required
text : a string between double quotes : the text attached to the meta-node. Optional
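An MN line differs from the other line types mainly in its bracketed node list. A minimal parsing sketch, assuming the list is non-empty, may contain spaces after the commas, and that the optional text has no embedded double quotes; `parse_metanode` and `MetaNode` are hypothetical names.

```python
import re
from typing import List, NamedTuple, Optional

# MN id [n1,n2,...] ["text"]
MN_RE = re.compile(
    r'^MN\s+(\d+)\s+\[\s*(\d+(?:\s*,\s*\d+)*)\s*\](?:\s+"([^"]*)")?\s*$'
)


class MetaNode(NamedTuple):
    node_id: int
    members: List[int]      # ids of the NO nodes enclosed in this meta-node
    text: Optional[str]


def parse_metanode(line: str) -> MetaNode:
    """Parse one MN line; raise ValueError if it does not match the format."""
    m = MN_RE.match(line.strip())
    if m is None:
        raise ValueError(f"malformed MN line: {line!r}")
    node_id, members, text = m.groups()
    # int() tolerates the surrounding spaces left by split(",").
    return MetaNode(int(node_id), [int(x) for x in members.split(",")], text)


if __name__ == "__main__":
    print(parse_metanode('MN 11 [2,3] "fake metanode" '))
```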

Example of output file:

MT Title "Fig.7"
MT NO 10
MT DE 5
MT UE 4
CO ====== Now comes the list of nodes ======
CO === identifier  type  text   ============
NO 1 oval "BEGIN" (100,100)
NO 2 rectangle "RECEIVE AND DIGITIZE IMAGE" (100,150)
NO 3 rectangle "DISPLAY IMAGE AND SELECT CHART" (100,200)
NO 4 rectangle "DESIGNATE APPROXIMATE POSITIONS" (100,250)
NO 5 rectangle "RECOGNIZE GRAPHICAL OBJECT AND OUTPUT" (100,300)
NO 6 oval "END" (100,350)
NO 7 no-box "80" (180,160)
NO 8 no-box "82" (180,210)
NO 9 no-box "84" (180,260)
NO 10 no-box "86" (180,310)
CO ========== Here come the edges ===========
CO === start-node end-node type text ========
DE 1 2 plain ""
DE 2 3 plain ""
UE 2 7 wiggly ""
DE 3 4 plain ""
UE 3 8 wiggly ""
DE 4 5 plain ""
UE 4 9 wiggly ""
DE 5 6 plain ""
UE 5 10 wiggly ""
CO ======== A METANODE =======================
MN 11 [2,3] "fake metanode" 
CO ======== THIS IS IT =======================
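The specification imposes several rules that span more than one line: the MT counts must match the number of NO, DE and UE lines, edge endpoints must appear on a NO or MN line, meta-node ids must differ from NO ids, and meta-node members must be NO nodes. The following is a minimal consistency checker for these rules, offered as a sketch rather than an official validator; `check_flowchart` is a hypothetical name, and it does not check per-field details such as node-type keywords or quoting. Following the rule that members must be NO nodes, a meta-node listing another meta-node is reported as a problem.

```python
import re
from collections import Counter
from typing import List

MN_RE = re.compile(r"^MN\s+(\d+)\s+\[([^\]]*)\]")


def check_flowchart(text: str) -> List[str]:
    """Return cross-line consistency problems; an empty list means none found."""
    problems: List[str] = []
    declared = {}           # values from MT NO / MT DE / MT UE
    counts = Counter()      # number of NO / DE / UE lines actually present
    node_ids = set()        # ids from NO lines
    meta = {}               # MN id -> list of member node ids
    edges = []              # (start, end) pairs from DE / UE lines

    for raw in text.splitlines():
        parts = raw.split()
        if not parts or parts[0] == "CO":
            continue  # skip blank lines and comments
        tag = parts[0]
        try:
            if tag == "MT":
                if len(parts) == 3 and parts[1] in ("NO", "DE", "UE"):
                    declared[parts[1]] = int(parts[2])
            elif tag == "NO":
                node_id = int(parts[1])
                if node_id in node_ids:
                    problems.append(f"duplicate NO id {node_id}")
                node_ids.add(node_id)
                counts["NO"] += 1
            elif tag in ("DE", "UE"):
                edges.append((int(parts[1]), int(parts[2])))
                counts[tag] += 1
            elif tag == "MN":
                m = MN_RE.match(raw.strip())
                if m is None:
                    raise ValueError("bad MN line")
                members = [int(x) for x in m.group(2).split(",") if x.strip()]
                meta[int(m.group(1))] = members
            else:
                problems.append(f"unknown line prefix {tag!r}")
        except (IndexError, ValueError):
            problems.append(f"malformed line: {raw!r}")

    # The MT counts must agree with the lines actually present.
    for key in ("NO", "DE", "UE"):
        if key not in declared:
            problems.append(f"missing MT {key}")
        elif declared[key] != counts[key]:
            problems.append(f"MT {key} is {declared[key]} but found {counts[key]}")

    # Meta-node ids must differ from NO ids; members must be existing NO nodes.
    for mid, members in meta.items():
        if mid in node_ids:
            problems.append(f"MN id {mid} reuses a NO id")
        for n in members:
            if n not in node_ids:
                problems.append(f"MN {mid} lists unknown node {n}")

    # Edge endpoints must appear on a NO or MN line.
    for start, end in edges:
        for n in (start, end):
            if n not in node_ids and n not in meta:
                problems.append(f"edge {start}->{end} uses unknown node {n}")
    return problems
```

Run on the example file above, this sketch should report no problems: the MT counts (10 nodes, 5 directed and 4 undirected edges) match the lines, and meta-node 11 encloses existing nodes 2 and 3.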


Details

Evaluation

By and large, the evaluation will be similar to last year's: we look at the final result rather than at the location and tagging of specific image segments. We have also investigated a way to evaluate semi-automatically, which we will publish soon.
Please see the page from last year for details on node types and other potential issues.

Training data

Here you have a set of 50 images containing flowcharts, and here you have the corresponding text files. Here you have the 100 images from last year, and here are the corresponding text files.


Test data

The set of test topics contains 747 black and white images. You can download it here.


Contact

For questions and anything else regarding this task, please contact Mihai Lupu (lupu at ifs.tuwien.ac.at) or Florina Piroi (piroi at ifs.tuwien.ac.at)
