Classic IC Design Interview Questions 5
The term “delta-delay” exists only in VHDL, not in Verilog. However, the concepts explained here apply to both Verilog and VHDL.
"Delta delay" refers to an infinitesimally small amount of delay. With delta-delays, the simulation time does NOT advance.
In VHDL, when a signal is assigned, the values on the RHS of the assignment are read immediately, but the value on the LHS is updated only after a delta-delay. However, if the assignment is to a variable, the LHS is also updated immediately.
Let's look at the following example excerpt (from within a "process" block):
a <= b;
x <= a;
Here, "b" is sampled immediately, but the value of "a" is updated "slightly" later. This "slightly" is within the same time-stamp, but after a delta-delay.
By the time the next statement is executed, "a" has still not been updated. Hence, "x" gets the old value of "a".
On the other hand, if "a" and "x" were variables, and the assignments were of the form:
a := b;
x := a;
"x" would have gotten the new value of "a" - which is "b".
In Verilog, the same concept exists in the form of Non-Blocking Assignments (where the LHS is updated after an infinitesimally small delay, within the same time step) and Blocking Assignments (where the assignments are immediate).
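The difference between deferred and immediate updates can be sketched with a small Python model. This is only an illustration of the scheduling idea, not how any real simulator is implemented; the function names and the `pending` dictionary are assumptions made for this sketch.

```python
# Toy model of one process execution followed by one delta cycle.

def run_with_signals(a, b, x):
    """Model `a <= b; x <= a;` (VHDL signals, or Verilog non-blocking)."""
    signals = {"a": a, "b": b, "x": x}
    pending = {}
    # Right-hand sides are read now; left-hand sides are only scheduled.
    pending["a"] = signals["b"]
    pending["x"] = signals["a"]   # still the old value of a
    # One delta later, at the same simulation time, the updates are applied.
    signals.update(pending)
    return signals

def run_with_variables(a, b, x):
    """Model `a := b; x := a;` (VHDL variables, or Verilog blocking)."""
    a = b      # a changes immediately
    x = a      # so x sees the new a
    return {"a": a, "b": b, "x": x}

print(run_with_signals(a=0, b=1, x=0))    # {'a': 1, 'b': 1, 'x': 0}
print(run_with_variables(a=0, b=1, x=0))  # {'a': 1, 'b': 1, 'x': 1}
```

In the signal case "x" ends up with the old value of "a"; in the variable case it ends up with the value of "b".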
These delta-delays have an impact on simulation races and, if used effectively, can help get rid of them.
A simulation race is something that affects only digital simulation, i.e. the simulation of digital electronic devices.
Although a simulation race can occur for several reasons, in this article we examine only the race due to “delta-delay”.
Besides “delta-delay”, another factor that causes these specific races is the behaviour of certain devices (say, flops), wherein a value is sampled once, and that sampled value continues to determine the behaviour for some time.
Let's consider the following circuit (a flop with data input “D”, clock input “C” and output “Q”):
At the positive edge of “C”, the flop samples the value of “D”, and that value is transferred onto “Q”. This sampled value of “D” remains on “Q” until the next positive edge of “C”. Suppose that, just at the positive edge of “C”, “D” itself is also changing from “0” to “1”. Because of the sequential nature of the simulator, it has to pick up the transitions on C and D one at a time. It can consider the change in “C” either before or after the change in “D”. If the change in “C” is considered before the change in “D”, then “Q” becomes “0”; otherwise “Q” becomes “1”.
Thus, it is unpredictable whether Q goes to “0” or to “1”. Clearly, this unpredictability (or race) occurs when a change in “D” occurs at the same time as the change in “C”.
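The dependence on processing order can be reproduced with a few lines of Python. This is a minimal sketch of a rising-edge flop fed with a list of same-time events; the function name `simulate_flop` and the event format are assumptions made for illustration.

```python
def simulate_flop(events, c=0, d=0, q=0):
    """Apply (signal, new_value) events one at a time to a rising-edge D flop."""
    for name, value in events:
        if name == "C":
            rising = (c == 0 and value == 1)
            c = value
            if rising:
                q = d        # sample whatever D holds at this instant
        elif name == "D":
            d = value
    return q

# Both events carry the same time-stamp; only the processing order differs.
clock_first = simulate_flop([("C", 1), ("D", 1)])
data_first = simulate_flop([("D", 1), ("C", 1)])
print(clock_first, data_first)  # 0 1
```

The same pair of events gives Q = 0 or Q = 1 depending purely on which one the simulator happens to process first.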
Usually, in a full-timing gate-level simulation, such situations are not very likely. However, in RTL simulations these situations are quite likely unless appropriate care is taken. The main reason why RTL encounters these situations quite often is that RTL typically does not have any explicit delay associated with it.
Let's consider the following circuit, as given in the following figure (two flops, F1 and F2, clocked from the same source “C”, with Q1 driving D2):
Now, flop F2 sees its clock edge (on “C2”) at more or less the same time as “C” sees a positive edge. Simultaneously, F1 also sees the clock edge, which causes Q1 to change. This change is visible on D2. Because neither of the two paths (C --> C2) and (C --> F1 --> Q1 --> D2) has any delay assigned, the changes on D2 and C2 could arrive at the same time. This causes a “race” on F2, because the behaviour of F2 depends on whether the change in D2 is considered before or after the change in C2.
The “race” can be avoided if there is some determinism in the relative “delay”s on the two paths. Usually, it is desired that the edge on “C2” arrives before the change on “D2”; otherwise there might be a feedthrough, where F2 captures the new value of Q1 on the same clock edge.
Hence, in order to avoid a “race”, there are fundamentally two requirements:
a) There has to be predictability in the relative delay on the two paths.
b) The delay on the “D2” path should always be more than the delay on the “C2” path.
There are various ways of achieving the above requirements. We will examine some of these ways.
- Depend on Delta Delay: Since the “D2” path will have at least some assignments, there will be some delta-delay, which will cause the “D2” path to be longer than the “C2” path.
However, there is a potential risk with this approach. The “C2” path might also have some assignment, as represented by the cloud in the “C2” path. This assignment could be there to model some gating or some other circuit. In such a case, the delta-delays on both paths might be the same (thus once again bringing in unpredictability), or the number of deltas on the C2 path could even exceed that on the D2 path (thus causing feedthroughs). These are real risks for shift-register-like structures, where there is pretty much no logic (or assignment) between Q1 and D2.
- Balance Deltas: In order to get around the above issue, some designers “balance” the number of deltas on the “clock” paths. What this means is that the number of deltas from “C” to “C1” equals the number of deltas from “C” to “C2”. Since there is at least one more delta from “C1” to “D2”, this ensures that the deltas in the “D2” path always exceed the deltas in the “C2” path.
However, it's not an easy job to keep counting the number of deltas across all clock paths. Even if one employs a tool to do this, actually going in and inserting additional deltas (where needed) could be a painful task. So, some designers instead count the total deltas along the two complete paths (i.e. C --> D2 and C --> C2), rather than balancing the deltas along the clock paths only. As long as the number of deltas along the D2 path is greater than the number of deltas along the C2 path, the problem is resolved. This results in fewer places where additional deltas need to be inserted in the data paths.
This still has a problem in mixed-language designs if the signals involved cross language boundaries. Since Verilog does not have the concept of delta-delays, different simulators might count deltas differently, at least in the Verilog portions and along the language boundaries. So, in some simulators the deltas might be more in the “D2” path, while in other simulators that might not be the case.
Thus, once again, there is unpredictability across simulators.
- Explicit Delays: Another approach could be to explicitly insert a “very small” delay in the “D2” path. There are a few things that one has to be careful about when doing this.
It needs to be ensured that each “data” path has a small delay. This delay should be small enough that it can never be comparable to the clock period. As technology nodes improve, clock frequencies keep increasing, meaning that the clock period decreases. Thus, what was a non-risky delay value (in the data path) for one technology node could be a matter of concern a few years down the line. This runs counter to the current trend of “Soft IPs”, wherein the same RTL is used across many different technology nodes.
Also, even if a small delay is inserted in the “data” path, if the same small delay value is inserted across many components (assignments) on the data path, then the cumulative delay could be high enough to come close to the clock period.
Another minor point to be taken care of is that synthesis tools are going to ignore the delay assignments. So, the delay value should be chosen such that the "correct" functionality does not really depend on the actual value of the delay.
Thus, the need is:
- For a delay value (on the data path) which is small enough to NEVER be at risk of coming anywhere close to the clock period.
- But which is predictably higher than zero.
- And NO delay on the clock path.
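The trade-offs between the three approaches above can be compared with a rough Python sketch. It assumes that a simulator orders events first by time and then by delta count, and it reduces each path to a single (time, delta) arrival. The function names, the picosecond units and all numbers are hypothetical and chosen only for illustration.

```python
def f2_capture(clock_deltas, data_deltas, data_delay_ps=0):
    """Classify what F2 captures for a clock edge on C at time 0.

    Each arrival is a (time, delta) pair. Tuples compare time first and
    delta count second, mirroring the assumed event ordering.
    """
    c2_arrival = (0, clock_deltas)
    d2_arrival = (data_delay_ps, data_deltas)
    if d2_arrival > c2_arrival:
        return "old value"       # D2 changes after C2: intended behaviour
    if d2_arrival < c2_arrival:
        return "feedthrough"     # D2 changes before C2
    return "race"                # same time and same delta

def total_data_delay_ps(per_assignment_delay_ps, assignments):
    """Cumulative explicit delay along a data path."""
    return per_assignment_delay_ps * assignments

print(f2_capture(clock_deltas=0, data_deltas=1))   # old value
print(f2_capture(clock_deltas=1, data_deltas=1))   # race
print(f2_capture(clock_deltas=2, data_deltas=1))   # feedthrough
print(f2_capture(clock_deltas=2, data_deltas=1, data_delay_ps=100))  # old value

clock_period_ps = 2500  # hypothetical clock period
print(total_data_delay_ps(100, 20) < clock_period_ps)  # True
print(total_data_delay_ps(100, 30) < clock_period_ps)  # False
```

An explicit delay on the data path wins regardless of the delta counts, but the last line shows how the same per-assignment delay can exceed the clock period once enough assignments are chained.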
In the world of Verilog, these requirements are easily met through proper usage of Non-Blocking Assignments (NBAs) and Blocking Assignments (BAs).
Since NBAs have an infinitesimally small delay (similar to delta-delays), while BAs don't, we fundamentally want to ensure that the “data” path has at least one NBA, and the clock path has NO NBAs.
While writing code, it might be difficult to keep track of which component (or signal) is going to end up in the data path, and which one in the clock path. Hence, following a very simple discipline should solve the problem:
All register assignments should use an NBA, and all combinatorial assignments should use a BA.
So, in the path to “D2”, the C1 --> Q1 path will be an NBA. Just one NBA in the path is good enough to ensure predictability, as long as there is no NBA (or any other delay) in the “C2” path.
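Why a single NBA in the data path is enough can be sketched with a two-phase Python model of the two-flop circuit: every flop first samples its input, and only afterwards are the outputs updated. This is a simplified picture of non-blocking behaviour, not a description of a real Verilog scheduler; the function name `clock_edge_nba` and the `order` argument are assumptions made for this sketch.

```python
def clock_edge_nba(d1, q1, q2, order):
    """Two back-to-back flops with non-blocking style updates (D2 tied to Q1)."""
    scheduled = {}
    # Phase 1: each flop samples its input; no output changes yet.
    for flop in order:
        if flop == "F1":
            scheduled["q1"] = d1
        elif flop == "F2":
            scheduled["q2"] = q1   # Q1 has not been updated yet
    # Phase 2: all scheduled updates take effect together.
    q1 = scheduled.get("q1", q1)
    q2 = scheduled.get("q2", q2)
    return q1, q2

# Before the edge: D1 = 1, Q1 = 0, Q2 = 0.
print(clock_edge_nba(1, 0, 0, order=["F1", "F2"]))  # (1, 0)
print(clock_edge_nba(1, 0, 0, order=["F2", "F1"]))  # (1, 0)
```

Both evaluation orders give the same result, with F2 capturing the old value of Q1, so the outcome no longer depends on the simulator's choice.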
In the world of VHDL, such a simple discipline does not work. The closest model of an NBA (from a delta-delay perspective) is a “signal”. The alternative way of assigning values is through variables, which don't have delta-delays and hence are analogous to BAs. However, in VHDL all inter-process communication can take place only through signals. Variables cannot be used for inter-process communication. Hence, even combinatorial devices have to be modeled using “signals”, which bring deltas into the picture. Hence, in VHDL this problem of “race” has to be managed through either delta-balancing or explicit delay assignment.
By the way, it has been observed that some High Level Synthesis tools now put NBAs even on combinatorial signals (in Verilog). This is done in order to maintain exact equivalence with the corresponding VHDL, where these things were modeled as signals rather than variables. In such cases, the tools need to take extra care that only a minimum number of combinatorial signals are assigned using NBAs, rather than putting NBAs everywhere. Typically, these are signals which cross “process” ("always" block) boundaries. Secondly, these signals should never lie on the clock path. Otherwise, there would be an NBA on the clock path as well, and the same issue of unpredictability would remain.
Even if deltas are taken care of in RTL, some other minor issues might arise as the design moves to the gate level. As the clock-tree network is created, the number of deltas in the clock tree is very likely to change, due to the placement of several buffers. This change in the number of deltas could again change the simulation behaviour of the gate-level netlist, causing a mismatch with the RTL simulation results. Typically, these situations are not encountered very often, because:
- Not many people do gate-level simulation in VHDL. Also, as part of clock-tree balancing, more or less the same number of components (buffers) might be inserted in each of the clock paths, thus keeping them balanced. However, if one is actually using VHDL gate-level simulation, one might need to count the deltas (as in RTL) to be really sure.
- For Verilog, there is a much simpler solution to this. The gate-level libraries of the cells are modeled using the same basic discipline: BA for combinatorial cells, and NBA for sequential cells. This once again ensures that there is at least one NBA in the data path.
- Those who do gate-level simulation might also turn on full timing. If timing is ON, then the actual delays along the two paths are expected to be different anyway. If the timings on the two paths are not different, this would get flagged as a setup/hold violation. When the actual timing is different, delta-delays once again cannot cause a race.
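The last point can be illustrated with a small hold-style check in Python. It is only a sketch of the idea that, with real delays, the relationship between the C2 and D2 arrivals is either safely separated or reported as a violation. The function name, the picosecond units and the numbers are all hypothetical.

```python
def check_f2_timing(clock_arrival_ps, data_arrival_ps, hold_ps):
    """Report whether D2 changes at least hold_ps after the edge on C2."""
    slack = data_arrival_ps - clock_arrival_ps - hold_ps
    return "ok" if slack >= 0 else "hold violation"

# Hypothetical arrival times for the same launching clock edge.
print(check_f2_timing(clock_arrival_ps=120, data_arrival_ps=300, hold_ps=50))  # ok
print(check_f2_timing(clock_arrival_ps=120, data_arrival_ps=120, hold_ps=50))  # hold violation
```

When the two arrivals coincide, the check fails instead of leaving the outcome to the simulator's event ordering.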
Acknowledgements: Thanks to Olivier Boraud of Texas Instruments (o-boraud -at- ti -dot- com), Gunther Siegel (Gunther.Siegel -at- esterel-technologies -dot- com), Michael Buchholz(Michael.Buchholz -at- esterel-technologies -dot- com) and Olivier Allemandi of Esterel (Olivier.Allemandi -at- esterel-technologies -dot- com) for their help on this article.
Thanks for sharing~~