Thursday, February 25, 2010

Characteristics of an analyst when evaluating DFD quality

We shall first discuss what a Data Flow Diagram is.

When it comes to conveying how data flows through systems (and how that data is transformed in the process), data flow diagrams (DFDs) are the method of choice over technical descriptions for three principal reasons.

1. DFDs are easier for both technical and non-technical audiences to understand

2. DFDs can provide a high-level system overview, complete with boundaries and connections to other systems

3. DFDs can provide a detailed representation of system components

DFDs help system designers and others during the initial analysis stages visualize a current system or one that may be necessary to meet new requirements. Systems analysts prefer working with DFDs, particularly when they require a clear understanding of the boundary between existing systems and postulated systems. DFDs represent the following:

1. External devices sending and receiving data

2. Processes that change that data

3. Data flows themselves

4. Data storage locations

The hierarchical DFD typically consists of a top-level diagram (Level 0) underlain by cascading lower-level diagrams (Level 1, Level 2, …) that represent different parts of the system.


Defining DFD Components

DFDs consist of four basic components that illustrate how data flows in a system: entity, process, data store, and data flow.

Entity

An entity is the source or destination of data; it represents a person, organization, or system outside the context of the system being modeled. Entities either provide data to the system (referred to as a source) or receive data from it (referred to as a sink). Entities are often represented as rectangles (a diagonal line across the right-hand corner means that this entity is represented somewhere else in the DFD). Entities are also referred to as agents, terminators, or sources/sinks.

Process

The process is the manipulation or work that transforms data: performing computations, making decisions (logic flow), or directing data flows based on business rules. In other words, a process receives input and generates some output. Process names usually pair a simple verb with a data flow name, such as “Submit Payment” or “Get Invoice,” and describe the transformation, which can be performed by people or machines. Processes can be drawn as circles or segmented rectangles on a DFD, and include a process name and process number.

Data Store

A data store is where data is held between processes for later retrieval by the same process or another one. Files and tables are considered data stores. Data store names are simple but meaningful, usually plural nouns such as “customers,” “orders,” and “products.” A data store is usually drawn as a rectangle with the right-hand side missing, labeled with the name of the data storage area it represents, though different notations do exist.

Data Flow

Data flow is the movement of data among entities, processes, and data stores. Data flow portrays the interface between the components of the DFD. Each flow in a DFD is named to reflect the nature of the data it carries (these names should also be unique within a specific DFD). A data flow is represented by an arrow annotated with the data name.
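To make the four components concrete, here is a minimal sketch in Python of how a DFD might be represented as data. The class and field names are my own illustration, not part of any standard notation or tool.

    from dataclasses import dataclass, field
    from typing import List

    # A node is any of the three "box" components: entity, process, or store.
    @dataclass(frozen=True)
    class Node:
        name: str
        kind: str  # "entity", "process", or "store"

    # A data flow is a labeled, directed connection between two nodes.
    @dataclass(frozen=True)
    class Flow:
        label: str  # name of the data carried, e.g. "payment details"
        source: Node
        target: Node

    @dataclass
    class DFD:
        nodes: List[Node] = field(default_factory=list)
        flows: List[Flow] = field(default_factory=list)

    # Example: a Customer entity sends payment details to a "Submit Payment"
    # process, which records the result in an "Orders" data store.
    customer = Node("Customer", "entity")
    submit = Node("1.0 Submit Payment", "process")
    orders = Node("Orders", "store")
    diagram = DFD(
        nodes=[customer, submit, orders],
        flows=[Flow("payment details", customer, submit),
               Flow("order record", submit, orders)],
    )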

To examine a data flow diagram properly, an analyst must know the standards against which DFD quality is evaluated.

Evaluating DFD Quality

  • Readable

-your data flow diagram must be readable so that your audience can understand its contents and what it is meant to convey.

  • Internally consistent
-a number of rules and guidelines help ensure that the data flow diagram is consistent with the other system models -- the entity-relationship diagram, the state-transition diagram, the data dictionary, and the process specification. There are also guidelines to ensure that the DFD itself is internally consistent.

The major consistency guidelines are these (a sketch that checks them mechanically follows the list):

*Avoid infinite sinks, bubbles that have inputs but no outputs. These are also known by systems analysts as “black holes,” in an analogy to stars whose gravitational field is so strong that not even light can escape.


*Avoid spontaneous generation bubbles; bubbles that have outputs but no inputs are suspicious, and generally incorrect. One plausible example of an output-only bubble is a random-number generator, but it is hard to imagine any other reasonable example.


*Beware of unlabeled flows and unlabeled processes. This is usually an indication of sloppiness, but it may mask an even deeper error: sometimes the systems analyst neglects to label a flow or a process because he or she simply cannot think of a reasonable name. In the case of an unlabeled flow, it may mean that several unrelated elementary data items have been arbitrarily packaged together; in the case of an unlabeled process, it may mean that the systems analyst was so confused that he or she drew a disguised flowchart instead of a dataflow diagram.


*Beware of read-only or write-only stores. This guideline is analogous to the guideline about input-only and output-only processes; a typical store should have both inputs and outputs. The only exception to this guideline is the external store, a store that serves as an interface between the system and some external terminator.
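These guidelines are mechanical enough to check with a short program. The following is a rough sketch, assuming the diagram is held as simple tuples rather than any particular tool's format; the function name and warning wording are my own.

    def check_consistency(nodes, flows):
        """Flag black holes, miracles, unlabeled elements, and one-way stores.

        nodes: list of (name, kind) tuples, kind in {"entity", "process", "store"}
        flows: list of (label, source_name, target_name) tuples, where the
               source and target names refer to declared nodes
        """
        warnings = []
        inputs = {name: 0 for name, _ in nodes}
        outputs = {name: 0 for name, _ in nodes}
        for label, src, dst in flows:
            outputs[src] += 1
            inputs[dst] += 1
            if not label:
                warnings.append(f"unlabeled flow from {src} to {dst}")
        for name, kind in nodes:
            if not name:
                warnings.append(f"unlabeled {kind}")
            if kind == "process" and inputs[name] and not outputs[name]:
                warnings.append(f"black hole (infinite sink): {name}")
            if kind == "process" and outputs[name] and not inputs[name]:
                warnings.append(f"miracle (spontaneous generation): {name}")
            # External stores shared with another system are a legitimate
            # exception that this simple check does not model.
            if kind == "store" and inputs[name] and not outputs[name]:
                warnings.append(f"write-only store: {name}")
            if kind == "store" and outputs[name] and not inputs[name]:
                warnings.append(f"read-only store: {name}")
        return warnings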

  • Accurately represents system requirements
  • Reduces information overload: the rule of 6 +/- 3 (a sketch of this check follows the list)
*A single DFD should have no more than 6 +/- 3 processes
*No more than 6 +/- 3 data flows should enter or leave a process or data store on a single DFD
  • Minimizes the required number of interfaces
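As a rough sketch of applying the 6 +/- 3 rule automatically, using the same tuple representation as the earlier sketch (the function name is mine, and the threshold of 9 is simply 6 + 3):

    def check_overload(nodes, flows, limit=9):
        """Apply the rule of 6 +/- 3: at most `limit` processes on one DFD
        and at most `limit` flows touching any one process or data store."""
        warnings = []
        processes = [name for name, kind in nodes if kind == "process"]
        if len(processes) > limit:
            warnings.append(f"{len(processes)} processes on one DFD (max {limit})")
        for name, kind in nodes:
            if kind in ("process", "store"):
                degree = sum(1 for _, src, dst in flows if name in (src, dst))
                if degree > limit:
                    warnings.append(f"{degree} flows touch {name} (max {limit})")
        return warnings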

Data Flow Consistency Problems

  • Differences in data flow content between a process and its process decomposition

-we want balancing: equivalence of data content between the data flows entering and leaving a process and the data flows entering and leaving its decomposition (a sketch of this check follows the list)

  • Data outflows without corresponding inflows
  • Data inflows without corresponding outflows
  • Results in unbalanced DFDs
  • Black hole - a process with input that is never used to produce a data output
  • Miracle - a process with a data output that is created out of nothing (i.e., it “miraculously appears”)
  • Most CASE tools perform data flow consistency checking

*Black hole and miracle problems apply to both processes and data stores
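Balancing can also be checked mechanically by comparing the data flow names at the two levels. A minimal sketch, assuming each flow is identified by its label; the function name is my own, not a CASE tool API:

    def check_balancing(parent_flows, child_flows):
        """Compare the flow labels on a parent process with the flow labels
        that cross the boundary of its child (decomposition) diagram.
        Both arguments are sets of data flow labels."""
        warnings = []
        for label in parent_flows - child_flows:
            warnings.append(f"flow {label!r} on parent missing from decomposition")
        for label in child_flows - parent_flows:
            warnings.append(f"flow {label!r} in decomposition missing from parent")
        return warnings

    # Example: the parent process shows flows {"order", "invoice"}, but its
    # Level 1 decomposition only shows "order" crossing the boundary.
    print(check_balancing({"order", "invoice"}, {"order"}))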


Consistency Rules

  • All data that flows into a process must:
*Flow out of the process, or
*Be used to generate data that flows out of the process

  • All data that flows out of a process must:
*Have flowed into the process, or
*Have been generated from data that flowed into the process
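These two rules operate on individual data elements, not just whole flows, so checking them requires knowing which inputs each output is generated from. A rough sketch for a single process; the representation and argument names are illustrative assumptions:

    def check_element_use(in_elements, out_elements, derivations):
        """Element-level consistency for one process.

        in_elements / out_elements: sets of data element names entering
        and leaving the process.
        derivations: dict mapping each derived output element to the set
        of input elements it is generated from.
        """
        used = set().union(*derivations.values()) if derivations else set()
        warnings = []
        # Rule 1: every input must flow out or be used to generate output.
        for elem in in_elements - out_elements - used:
            warnings.append(f"input {elem!r} is never used")
        # Rule 2: every output must have flowed in or be derived from inputs.
        for elem in out_elements - in_elements - set(derivations):
            warnings.append(f"output {elem!r} is generated from nothing")
        return warnings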

Documentation of DFD Components

  • Lowest level processes need to be described in detail
  • Data flow contents need to be described
  • Data stores need to be described in terms of data elements
  • Each data element needs to be described
  • Various options for process definition exist

Some Guidelines about Valid and Non-Valid Data Flows


  • Before embarking on developing your own data flow diagram, there are some general guidelines you should be aware of.
  • Data stores are storage areas and are static or passive; therefore, having data flow directly from one data store to another doesn't make sense because neither could initiate the communication.
  • Data stores maintain data in an internal format, while entities represent people or systems external to the system. Because data from entities may not be syntactically correct or consistent, it is not a good idea to have a data flow directly between a data store and an entity, regardless of direction.
  • Data flow between entities would be difficult because it would be impossible for the system to know about any communication between them. The only type of communication that can be modeled is that which the system is expected to know or react to.
  • Processes on DFDs have no memory, so it would not make sense to show data flows between two asynchronous processes (between two processes that may or may not be active simultaneously) because they may respond to different external events.

Therefore, data flow should only occur in the following scenarios (a structural check is sketched after the list):

· Between a process and an entity (in either direction)

· Between a process and a data store (in either direction)

· Between two processes that can only run simultaneously
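These three legal patterns reduce to a lookup on the kinds of a flow's two endpoints. A sketch under the same tuple representation as before; note that whether two processes can actually run simultaneously is a timing question a purely structural check cannot decide:

    # Legal endpoint pairs for a data flow. Process-to-process flows are
    # only valid when both processes run simultaneously, which this
    # structural check cannot verify.
    VALID_PAIRS = {
        ("entity", "process"), ("process", "entity"),
        ("store", "process"), ("process", "store"),
        ("process", "process"),
    }

    def check_flow_endpoints(flows, kind_of):
        """flows: list of (label, source, target) tuples.
        kind_of: dict mapping a node name to "entity", "process", or "store".
        """
        return [f"invalid flow {label!r}: {src} -> {dst}"
                for label, src, dst in flows
                if (kind_of[src], kind_of[dst]) not in VALID_PAIRS]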

Here are a few other guidelines on developing DFDs:

· Data that travel together should be in the same data flow

· Data should be sent only to the processes that need the data

· A data store within a DFD usually needs to have an input data flow

· Watch for Black Holes: a process with only input data flows

· Watch for Miracles: a process with only output flows

· Watch for Gray Holes: insufficient inputs to produce the needed output

· A process with a single input or output may or may not be partitioned enough

· Never label a process with an IF-THEN statement

· Never show time dependency directly on a DFD (a process begins to perform tasks as soon as it receives the necessary input data flows)

Data flow diagramming is a highly effective technique for showing the flow of information through a system. DFDs are used in the preliminary stages of systems analysis to help understand the current system and to represent a required system. The DFDs themselves represent external entities sending and receiving information (entities), the processes that change information (processes), the information flows themselves (data flows), and where information is stored (data stores).

DFDs are a form of information development, and as such provide key insight into how information is transformed as it passes through a system. Having the skills to develop DFDs from functional specs and being able to interpret them is a value-add skill set that is well within the domain of technical communications.

References:

  • http://www.stc.org/confproceed/2000/PDFs/00098.PDF


Thursday, February 4, 2010

PRE-ENROLLMENT SYSTEM OF USEP

The actor there, the student, is basically a high-school graduate vying to enroll in the university.