Error Detection and Correction Coding
In this chapter, we discuss several well-known error detection and error correction techniques. We first present some basic concepts and terminology of coding, and towards the end of the chapter we present some error correction methods.
4.1 Transmission Errors
During the transmission of data bits, errors are introduced by external phenomena such as radiation or by noise originating from other parts of the chip. These errors are divided into two main classes:
Content Errors
Flow Integrity Errors
4.1.1 Content Errors
Content errors are errors in the content of a message; for example, a “1” may be received as a “0” and vice versa. Such errors creep into the system due to corruption of the electrical signal in the transmission medium.
4.1.2 Flow Integrity Errors
Flow integrity errors are missing packets of data; for example, a data packet may be lost due to network congestion, delivered to a wrong destination, or dropped by the network.
Due to the sensitive nature of NoC systems, some measures are introduced into a network communication system to reduce data communication errors. These measures include the following:
Additional bits, known as check bits, included in the data to detect content errors
Mechanisms to correct the data
Recovery of lost data packets
4.2 Error Detection and Correction Coding
For the detection and correction of errors, some additional bits are added to the data block. As these extra bits carry no useful information for the user, they are sometimes called redundant bits. The check bits are chosen in a way that enables the system to detect and correct errors. The process of appending the check bits is known as coding.
The following terms are kept in view when designing coding mechanisms:
Check bits are added alongside a block of data bits
The data block together with its check bits is known as a code word
The Hamming distance between two words is the number of bit positions in which they differ. For example, the distance between the two words shown in Fig 4.1 is 2 because two bits have distinct values.
1 1 1 1 1
0 1 1 1 0
Fig 4.1 Hamming distance
The total number of 1s in a code word is termed its weight. For example, the code word 10000001 has weight 2. (A small sketch of both measures follows this list.)
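As a small illustration of the last two terms, the sketch below (the function names are ours) computes the Hamming distance between the words of Fig 4.1 and the weight of the code word 10000001.

def hamming_distance(a: str, b: str) -> int:
    # Number of bit positions in which two equal-length words differ.
    assert len(a) == len(b), "words must have the same length"
    return sum(x != y for x, y in zip(a, b))

def weight(code_word: str) -> int:
    # Number of 1s in a code word.
    return code_word.count("1")

print(hamming_distance("11111", "01110"))   # 2, as in Fig 4.1
print(weight("10000001"))                   # 2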
4.2.1 Detection of Errors
After transmitting some data, there is always a possibility of receiving few errors in the received bits. The receiver node is equipped with such mechanisms which can detect errors.
Table 4.1 shows the Hamming distance between transmitted and received code words.

Table 4.1 Hamming distance between transmitted and received code words

Transmitted code word    Received code word    Number of errors    Distance
1111                     1110                  1                   1
11111                    11100                 2                   2
111111                   111000                3                   3
4.2.2 Error Correction
After an error has been detected, the following approaches are mainly used to correct it:
Reverse Error Correction (REC)
Forward Error Correction (FEC)
4.2.2.1 Reverse Error Correction
Reverse error correction is also known as automatic repeat request (ARQ); in this method a retransmission of the data is required [62].
4.2.2.2 Forward Error Correction
In this technique retransmission of data is not required; check bits transmitted together with the data are used to correct errors without the need for retransmission. [62]
Along with the techniques mentioned above, hybrid approaches are also used that try to combine the best properties of ARQ and FEC [23, 24, 25].
Generally, a code's error detection capability is larger than its error correction capability. Thus, if the same detection/correction rate is targeted, the same code's detection capability can be relied on under worse error conditions than its correction capability. Under a uniform noise distribution the probability of a link error depends on the link voltage; therefore, a lower link voltage can be used when errors only need to be detected than when they must be corrected. As a result, error detection combined with ARQ can be more power efficient than FEC [23], especially when dynamic voltage scaling is applied [63]. The drawback of ARQ is the retransmission latency. Furthermore, the number of retransmissions depends on error conditions; in persistent noise environments, a large number of retransmissions may result, making ARQ less energy-efficient than FEC. The FEC approach has a fixed, guaranteed throughput, which is important for many applications.
More importantly, the ARQ method fails in the presence of permanent errors (retransmission is only useful for avoiding transient errors). FEC codes can detect and correct permanent errors, but each permanent error will reduce the code’s capability to tolerate transient or intermittent errors. FEC codes which can detect and correct multiple errors (e.g. BCH codes) have large power and area overheads.
The realization of the retransmission request and the retransmission itself has its overheads. In the stop-and-wait method [62] the receiver informs the transmitter after each transmission whether it succeeded and whether a retransmission is required. This results in latency overhead and reduced throughput. The go-back-N method [62] does not stop after each transmission but continues until an error is detected; at this point the receiver informs the transmitter, which starts the transmission again from the word where the error occurred. The overhead of this method is the additional area and power for storing past data words in the transmitter, as well as the latency and throughput penalty of restarting the transmission. The third ARQ method is selective repeat [62], which is similar to go-back-N, but now only the erroneous word is retransmitted. This approach has area and power overhead for storing words not only at the transmitter but also at the receiver. On the other hand, its delay and throughput penalties are the lowest of the ARQ approaches.
ARQ fits well with asynchronous signaling, where handshaking is used to signal a successful transmission. Since an acknowledgment signal is in all cases transmitted from receiver to transmitter, it can be extended to also carry information about the status of the transmission. This is achieved, for instance, by sending either an acknowledgement or a negative acknowledgement: the former indicates a correct transmission, while the latter is a request for a retransmission.
The timing signals in asynchronous links are crucial for the correct operation of the link, and therefore they have to be protected against errors. For this purpose triple modular redundancy (TMR) can be used as proposed in [62]. Furthermore, the three instances of each control signal can be physically dispersed to maintain the tolerance against burst errors. TMR in on-chip signaling means actually the utilization of a repetition code, where the voter is the decoder.
FEC, on the other hand, suits synchronous signaling well. Both target high throughput without any backward signaling. Synchronous signaling is based on tight timing constraints, and forward error correction can sometimes be used to loosen these constraints: the code can correct errors caused by some signals arriving late under the worst-case conditions to which the clock frequency would otherwise have to be matched.
4.3 Error Detection Methods
Some of the popular error detection methods are:
Parity Checking
Checksum Error Detection
Cyclic Redundancy Check
Each of above methods has its advantages and limitations as we shall discuss in the following sections.
4.3.1 Parity Checking
Parity checking technique is further divided into two categories.
Single Parity
Double Parity
In single parity checking an additional bit, known as the parity bit, is added to the data word. This bit is chosen so that the total number of ones in the word becomes even or odd. If the total number of ones in the data word together with the parity bit is even, the scheme is known as even parity; otherwise it is odd parity. The following figure illustrates the concept of even and odd parity more clearly.
Figure 4.1 Even and odd parity bits (a parity bit P is placed alongside the data word; with even parity the total number of ones, including P, is even, while with odd parity it is odd).
After the occurrence of a single error, or any odd number of errors, during transmission, the parity of the code word changes. The parity of the code word is checked at the receiving end, and a violation of the parity rule indicates errors somewhere in the code word. Figure 4.2 explains this concept more clearly.
Figure 4.2 Error detection by a change in parity (the transmitted code word has even parity; a single error makes the received parity odd, so the error is detected, whereas a double error restores even parity and goes undetected).
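As an illustrative sketch (the function names are ours, not a standard API), the following Python code appends an even-parity bit and shows why a single error is detected while a double error slips through:

def add_even_parity(data: str) -> str:
    # Append a parity bit so that the total number of 1s becomes even.
    return data + str(data.count("1") % 2)

def parity_ok(code_word: str) -> bool:
    # Even parity holds if the code word contains an even number of 1s.
    return code_word.count("1") % 2 == 0

code = add_even_parity("1110")   # "11101": three 1s in the data, so the parity bit is 1
single_err = "01101"             # first bit flipped: an odd number of 1s remains
double_err = "00101"             # first two bits flipped: parity is even again
print(parity_ok(code))           # True  (no error)
print(parity_ok(single_err))     # False (single error detected)
print(parity_ok(double_err))     # True  (double error goes undetected)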
A problem with this technique is that a double error, or any even number of errors, goes undetected because the resulting parity of the code word does not change. Thus single parity checking has its limitations: it is not suitable for multiple errors, and it cannot locate the erroneous bit. To overcome these problems and to find the location of the erroneous bit, the double parity checking technique is used.
In the double parity checking method the data is arranged into rows and columns in the form of a matrix, and a parity bit is calculated for each row and each column. After transmission, the row and column parities are calculated again, and in case of a parity violation the erroneous bit can be located and corrected. Figure 4.3 illustrates this concept more clearly.
No errors                   Correctable single-bit error
1 0 1 0 1 1                 1 0 1 0 1 1
1 1 1 1 0 0                 1 0 1 1 0 0   <- row parity error
0 1 1 1 0 1                 0 1 1 1 0 1
0 0 1 0 1 0                 0 0 1 0 1 0
(in the right-hand matrix the second bit of the second row has flipped from 1 to 0, so the second row and the second column both fail their parity checks; the erroneous bit lies at their intersection)
Figure 4.3 Two dimensional even parity
Two-dimensional parity can also detect but not correct any combination of two errors in a packet.
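A sketch of two-dimensional even parity is shown below, reusing the data of Figure 4.3; the helper names are ours. A single flipped bit is located as the intersection of the failing row and the failing column.

def two_d_parity(rows):
    # Append an even-parity bit to each row, then an even-parity row over the columns.
    row_par = [r + [sum(r) % 2] for r in rows]
    col_par = [sum(col) % 2 for col in zip(*row_par)]
    return row_par + [col_par]

def locate_single_error(matrix):
    # Return (row, column) of a single-bit error, or None if every parity check holds.
    bad_rows = [i for i, r in enumerate(matrix) if sum(r) % 2]
    bad_cols = [j for j, c in enumerate(zip(*matrix)) if sum(c) % 2]
    if bad_rows and bad_cols:
        return bad_rows[0], bad_cols[0]
    return None

data = [[1, 0, 1, 0, 1],
        [1, 1, 1, 1, 0],
        [0, 1, 1, 1, 0]]
encoded = two_d_parity(data)         # reproduces the error-free matrix of Figure 4.3
encoded[1][2] ^= 1                   # inject a single-bit error
print(locate_single_error(encoded))  # (1, 2): flipping this bit back corrects the error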
4.3.2 Checksum Error Detection
In the checksum error detection method, the d bits of data are treated as a sequence of k-bit integers. One simple checksum method is to sum these k-bit integers and use the resulting sum as the error detection bits. Example 4.1 below shows how the checksum is calculated.
Example 4.1
Find the checksum of the message.
10100101 00100110 11100010 01010101
10101010 11001100 00100100
Solution:
Carries               1 1 1
                    1 1 1 1 1 1

                    1 0 1 0 0 1 0 1
                    0 0 1 0 0 1 1 0
                    1 1 1 0 0 0 1 0
Data bytes          0 1 0 1 0 1 0 1
                    1 0 1 0 1 0 1 0
                    0 0 1 0 0 1 0 0
                    1 1 0 0 1 1 0 0

Checksum byte       1 0 0 1 1 1 0 0
After transmitting the data bytes, the checksum is also transmitted. The checksum is regenerated at the receiving end and errors show up as a different checksum. Further simplification is possible by transmitting the 2’s complement of the checksum in place of the checksum itself. The receiver in this case accumulates all the bytes including the 2’s complement of the checksum. If there is no error, the contents of the accumulator should be zero after accumulation of the 2’s complement of the checksum byte.
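A small Python sketch of this checksum follows. Consistent with the worked example, it assumes that carries out of the eighth bit are discarded (i.e. the bytes are summed modulo 256); the function names are ours.

def checksum8(data: bytes) -> int:
    # Simple checksum of Example 4.1: sum of the bytes modulo 256.
    return sum(data) % 256

def checksum8_complement(data: bytes) -> int:
    # 2's complement of the 8-bit checksum, transmitted instead of the sum itself.
    return (-checksum8(data)) % 256

message = bytes([0b10100101, 0b00100110, 0b11100010, 0b01010101,
                 0b10101010, 0b11001100, 0b00100100])
print(format(checksum8(message), "08b"))            # 10011100, the checksum byte above

# The receiver accumulates every byte, including the transmitted 2's complement:
received = message + bytes([checksum8_complement(message)])
print(checksum8(received))                          # 0 if no errors occurred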
4.3.3 Cyclic Redundancy Check
Cyclic redundancy check (CRC) codes are very powerful and are now used most frequently for error detection and correction purposes. CRC codes are also known as polynomial codes. They provide a better measure of protection at a lower level of redundancy and can be fairly easily implemented in hardware or software.
A CRC code word of length N with an m-bit data word is referred to as an (N, m) cyclic code and contains (N - m) check bits. These check bits are generated by modulo-2 division. The dividend is the data word followed by n = N - m zeros, and the divisor is a special binary word of length n + 1. The CRC code word is formed by modulo-2 addition of the remainder so obtained and the dividend. This procedure is illustrated in Example 4.2.
Example 4.2
Generate the CRC code for the data word 110101010 using the divisor 10101.
Solution:
Data word 110101010
Divisor 10101

            111000111          (quotient)
10101 ) 1101010100000          (data word followed by four zeros)
        10101
         11111
         10101
          10100
          10101
              11000
              10101
               11010
               10101
                11110
                10101
                 1011          (remainder)

1101010100000 + 0000000001011 = 1101010101011   (modulo-2 addition of dividend and remainder)

Code Word 1101010101011
In the above example, note that the CRC code word consists of the data word followed by the remainder. The code word so generated is completely divisible by the divisor because it is the modulo-2 difference of the dividend and the remainder. Thus, when the code word is divided by the same divisor at the receiving end, a non-zero remainder indicates errors in the transmission of the code word.
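The modulo-2 division of Example 4.2 can be sketched in a few lines of Python; the helper names are illustrative, not a standard API.

def mod2_div(dividend: str, divisor: str) -> str:
    # Bitwise modulo-2 (XOR) long division; returns the remainder.
    bits = list(dividend)
    for i in range(len(bits) - len(divisor) + 1):
        if bits[i] == "1":
            for j, d in enumerate(divisor):
                bits[i + j] = str(int(bits[i + j]) ^ int(d))
    return "".join(bits[-(len(divisor) - 1):])

def crc_encode(data: str, divisor: str) -> str:
    # Code word = data word followed by the remainder of (data word + n zeros) / divisor.
    n = len(divisor) - 1
    return data + mod2_div(data + "0" * n, divisor)

code_word = crc_encode("110101010", "10101")
print(code_word)                     # 1101010101011, as in Example 4.2
print(mod2_div(code_word, "10101"))  # 0000: a non-zero remainder would signal an error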
4.3.3.1 Undetected Errors in CRC
CRC codes cannot detect all types of errors. The probability of error detection and the types of errors that can be detected depend on the choice of the divisor. If the number of check bits in the CRC code is n, the probabilities of error detection for various types of errors are as given below [25]:
Single error: 100%
Two-bit errors: 100%
Odd number of bits in error: 100%
Error bursts of length < n + 1: 100%
Error bursts of length = n + 1: 1 - (0.5)^(n-1)
Error bursts of length > n + 1: 1 - (0.5)^n
4.4 Error Correction Methods
After an error has been detected, the following approaches are mainly used to correct it:
Forward Error Correction
Reverse Error Correction
4.4.1 Forward Error Correction Methods
In this technique retransmission of data is not required; check bits transmitted together with the data are used to correct errors without the need for retransmission. [24]
Most popular coding techniques used for FEC are given below
Block parity
Hamming code
Convolutional code
4.4.1.1 Block Parity
The concept of block parity checking is to detect and correct single errors. The data block is arranged in a rectangular matrix form and two sets of parity bits are generated, namely
Longitudinal Redundancy Check (LRC)
Vertical Redundancy Check (VRC)
The LRC is the parity bit generated over the rows of bits, and the VRC is the parity bit associated with each character code. The LRC is appended to the end of a data block; bit 8 of the LRC represents the VRC of the other 7 bits of the LRC. In Figure 4.4 even parity is used for both LRC and VRC.
1 1 1 0 1 0 1 0 1
1 1 0 0 0 0 0 1 1
0 1 1 0 1 1 1 0 1
0 1 1 0 0 0 0 0 0
0 0 0 1 1 1 0 1 0
0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 0
1 1 0 0 0 1 1 1 1
(the last column holds the even parity bits of each row, the VRC; the last row holds the even parity bits of each column, the LRC)
Figure 4.4 Vertical and longitudinal parity check bits.
Any single-bit error causes a parity failure in one of the rows and in one of the columns; the bit common to that row and column is the bit in error. Multiple errors in rows and columns can be detected but cannot be corrected, as the bits in error cannot be located.
4.4.1.2 Hamming Code
This is a single error correcting code devised by Hamming. In this code there are multiple parity bits in a code word: bit positions 1, 2, 4, 8, 16, 32, … are reserved for the parity bits, and the remaining bit positions carry the data bits. The number of parity bits required for correcting single-bit errors depends on the length of the code word. A code word of length n contains m parity bits, where m is the smallest integer satisfying the condition [26]:
2^m ≥ n + 1
For example, a code word of length n = 12 requires m = 4 parity bits, since 2^4 = 16 ≥ 13.
Figure 4.5 shows the layout of the Hamming code, where P stands for a parity bit and D stands for a data bit.
P1  P2  D  P4  D  D  D  P8  D  D  ...
Figure 4.5 Location of parity bits in the Hamming code
Calculating the Hamming Code
Each parity bit is computed over a specific subset of bit positions: the parity bit at position 1, 2, 4, 8, … covers exactly those bit positions whose binary representation contains that power of two, so the value of every check bit depends on the data bits already placed in the word.
Finding and fixing a bad bit
The Hamming code is an error detecting and correcting code: a single corrupt bit can be detected and corrected at the destination itself. Suppose that the word received is 011100101110 instead of the correct word 011100101010. The receiver recomputes the parity checks to figure out the incorrect bit: the position numbers covered by the failing parity checks are added together, and the result gives the location of the bad bit.
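This check-and-add procedure can be sketched compactly. For a Hamming code with parity bits at positions 1, 2, 4, 8, … and even parity, the XOR of the (1-indexed) positions that hold a 1 is zero for a valid code word and equals the position of the bad bit when a single error has occurred. The helper below is an illustrative sketch, not a general library routine.

def hamming_syndrome(word: str) -> int:
    # XOR together the 1-indexed positions holding a 1.
    # 0 means no single-bit error; otherwise the value is the error position.
    syndrome = 0
    for pos, bit in enumerate(word, start=1):
        if bit == "1":
            syndrome ^= pos
    return syndrome

received = "011100101110"                  # the corrupted word from the text
pos = hamming_syndrome(received)
print(pos)                                 # 10: bit position 10 is wrong
corrected = list(received)
corrected[pos - 1] = "1" if corrected[pos - 1] == "0" else "0"   # flip the bad bit
print("".join(corrected))                  # 011100101010, the correct word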
4.4.1.3 Convolutional Code
Convolutional codes are used to achieve reliable data transfer and are employed extensively in numerous applications including digital video, radio, mobile communication and satellite communication.
Convolutional codes are generated over a span of data bits, e.g., a convolutional code of constraint length 3 is generated bit by bit always using the “last three data bits”.
Figure 4.6 shows a simple convolutional encoder consisting of a shift register with three stages and XOR gates which generate two output bits for each input bit.
Figure 4.6 Simple convolutional encoder (the input data bits feed a three-stage shift register, and two XOR networks produce Output1 and Output2 for each input bit).
A convolutional encoder is a finite state machine. The state transition diagram of this encoder is shown in Figure 4.7. Each circle in the diagram represents a state of the encoder, namely the contents of the two leftmost stages of the shift register; there are four possible states: 00, 01, 10, 11. Arrows represent the state transitions for an input bit, which can be 0 or 1.
Figure 4.7 State transition diagram of the encoder (the four states 00, 01, 10 and 11 are drawn as circles; each arrow is labelled input/output, for example 1/11).
The label on each arrow shows the input data bit that causes the transition and the corresponding output bits. As an example, suppose the initial state of the encoder is 00 and the input data sequence is 1011. The corresponding output sequence of the encoder will then be 11010010.
An alternative way to represent the states is the trellis diagram. Here the four states are drawn as four levels, and the arrows represent state transitions as in the state transition diagram; the labels on the arrows indicate the output. By convention, a “0” input is represented as an upward transition and a “1” input as a downward transition. Any state diagram can be converted into a trellis diagram.
Figure 4.8 Trellis diagram of the convolutional encoder shown in Figure 4.6 [64]
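The encoder can also be expressed as a short Python function. The exact tap connections of Figure 4.6 are not recoverable from the text, so the generator taps below (101 for Output1 and 111 for Output2) are an assumption, chosen only because they reproduce the output sequence 11010010 quoted above for the input 1011; a different wiring of the XOR gates would give different outputs.

def conv_encode(bits: str, g1=(1, 0, 1), g2=(1, 1, 1)) -> str:
    # Rate-1/2, constraint-length-3 convolutional encoder.
    # The register window is [current bit, previous bit, bit before that].
    state = [0, 0]                          # the two most recent past bits, newest first
    out = []
    for b in bits:
        reg = [int(b)] + state              # constraint length 3
        out.append(str(sum(r * g for r, g in zip(reg, g1)) % 2))   # Output1
        out.append(str(sum(r * g for r, g in zip(reg, g2)) % 2))   # Output2
        state = [int(b)] + state[:1]        # shift the register
    return "".join(out)

print(conv_encode("1011"))                  # 11010010, the example sequence in the text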
4.4.2 Reverse Error Correction Methods
We have seen some of the methods of forward error correction, but reverse error correction is more economical than forward error correction in terms of the number of check bits. Therefore, error detection methods are usually implemented with an error correction mechanism in which the receiver requests the sender to retransmit a code word received with errors. The basic mechanisms of reverse error correction are:
Stop and Wait
Go back N
Selective Retransmission
4.4.2.1 Stop and Wait
In this scheme, the sending end transmits one block of data at a time and then waits for an acknowledgement from the receiver. If the receiver detects any error in the data block, it sends a request for retransmission in the form of a negative acknowledgement. If there is no error, the receiver sends a positive acknowledgement, in which case the sending end transmits the next block of data. Figure 4.7 illustrates the mechanism.
Figure 4.7 Reverse error correction by the stop-and-wait mechanism (the sender transmits a data block with check bits and waits; a positive acknowledgement releases the next block, while a negative acknowledgement triggers a retransmission of the same block).
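A toy simulation of this exchange is sketched below; the function and the channel model are illustrative assumptions, not part of any real protocol stack.

import random

def stop_and_wait_send(blocks, transmit, max_tries=10):
    # Send one block at a time; retransmit until a positive acknowledgement arrives.
    retransmissions = 0
    for block in blocks:
        for attempt in range(max_tries):
            if transmit(block):             # True models a positive acknowledgement
                break                       # move on to the next block
            retransmissions += 1            # negative acknowledgement: send it again
    return retransmissions

noisy_link = lambda block: random.random() > 0.2    # roughly 20% of transmissions fail
print(stop_and_wait_send(["B0", "B1", "B2"], noisy_link))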
4.4.2.2 Go back N (GBN)
In this mechanism all the data blocks are numbered (sequence number) and the sending end keeps transmitting the data blocks with check bits. The sender is allowed to transmit multiple packets (when available) without waiting for any acknowledgement, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline.
Whenever the receiver detects an error in a block, it sends a retransmission request indicating the sequence number of the data block received with errors. The sending end then retransmits all the data blocks from the requested data block onwards.
Figure 4.8 shows the sender's view of the sequence numbers in Go Back N. Define base as the sequence number of the oldest unacknowledged packet and nextseqnum as the smallest unused sequence number, i.e. the sequence number of the next packet to be sent. Under this terminology the Go Back N sequence-number space can be partitioned into the following portions:
Sequence numbers in the interval [0, base - 1] correspond to packets that have already been transmitted and acknowledged.
The interval [base, nextseqnum -1] corresponds to packets that have been sent but not yet acknowledged.
Sequence numbers in the interval [nextseqnum, base + N - 1] can be used for packets that can be sent immediately as soon as data arrives.
Sequence numbers greater than or equal to base +N cannot be used until an unacknowledged packet currently in the pipeline (specifically, the packet with sequence number base) has been acknowledged.
Figure 4.8 The sender's view of the sequence numbers in Go Back N (from left to right: already ACK'd, sent but not yet ACK'd, usable but not yet sent, and not usable; base marks the start of the window of size N and nextseqnum the next packet to be sent).
Sender and receiver operations are quite simple in the Go Back N mechanism. The sender sends packets within the window in sequence and waits for acknowledgements for all the sent packets. In case of data corruption, all packets from the corrupted one onwards are retransmitted.
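The window bookkeeping described above can be made concrete with a small sketch; the function name and the example values (base = 3, nextseqnum = 6, N = 5) are illustrative assumptions only.

def gbn_intervals(base: int, nextseqnum: int, N: int):
    # Partition of the sequence-number space as seen by a Go Back N sender.
    return {
        "already ACK'd":        range(0, base),
        "sent, not yet ACK'd":  range(base, nextseqnum),
        "usable, not yet sent": range(nextseqnum, base + N),
        "not usable":           f">= {base + N}",
    }

for name, seqs in gbn_intervals(3, 6, 5).items():
    print(name, list(seqs) if isinstance(seqs, range) else seqs)
# already ACK'd [0, 1, 2]
# sent, not yet ACK'd [3, 4, 5]
# usable, not yet sent [6, 7]
# not usable >= 8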
4.4.2.3 Selective Retransmission
In the GBN protocol poor channel utilization is one of the biggest problems: a single error forces the retransmission of a large number of packets. To avoid this waste, a technique known as “Selective Retransmission” (selective repeat, SR) is widely used.
As the name suggests, SR protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that were received in error. The SR receiver accepts correctly received packets whether or not they arrive in order, and requests selective retransmission only of the data blocks containing errors. On receipt of such a request, the sending end retransmits those data blocks, skips the blocks already delivered correctly, and continues with the next data block.
Figure 4.9 below gives a brief overview of the sender's and receiver's views of the sequence numbers.
Figure 4.9 Sender and receiver views of the sequence numbers in selective repeat (each side maintains a window of size N; the sender's window starts at the send base, with the next sequence number marking the next packet to send, and the receiver's window starts at the receive base).
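The difference in retransmission cost between Go Back N and selective repeat can be illustrated with a deliberately simplified sketch. It assumes that all later packets of the stream were already in flight when an error was detected and that every retransmission succeeds; real protocols are windowed and more subtle.

def retransmissions_gbn(error_seqs, total):
    # Go Back N resends the erroneous packet and everything after it.
    pending, count = set(error_seqs), 0
    for seq in range(total):
        if seq in pending:
            pending.discard(seq)            # assume the retry succeeds
            count += total - seq            # resend packets seq .. total-1
    return count

def retransmissions_sr(error_seqs, total):
    # Selective repeat resends only the packets received in error.
    return len(error_seqs)

print(retransmissions_gbn({2, 5}, 8))       # 9: packets 2..7, then 5..7 again
print(retransmissions_sr({2, 5}, 8))        # 2: only packets 2 and 5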