
www.Usenet.com
"A.y" wrote:
> in one line what is the general procedure to extract the best
> performance(important)
> as well as runtime from the tool in case of high density designs ?
Hard to do this in one line! :-)
Performance
---------------------------------------
First, what is "performance"? For some designs it is about placing the
required functionality in the smallest and least expensive device. For
others it is purely about speed. And, of course, there's the middle ground,
where give and take is king.
I'll take "performance" to mean speed, MHz, clock rate.
In general terms, I think you start exploring "best performance" when
you've hit a limit somewhere. For example, if you are doing a design on a
VII running at 20MHz there's probably little need to waste any time doing
anything beyond specifying a PERIOD constraint for good measure. As the clock rates
increase and the complexity of the design grows you may hit spots where the
"insert a coin and pull the lever" path simply does not work.
Should you hit that wall, the first place to look is your HDL. There
are ways to describe circuits that simply don't translate into an
implementation that will run very fast at all. Search the NG and 'net for
HDL coding style references.
The prior step was about HDL coding styles that don't synthesise very
well, not about choosing a particular solution or circuit, if you will.
Once you get decent synthesis, if performance isn't sufficient the next
question is: is the chosen way to solve the problem the best one from a
performance (speed) standpoint? For example, if you are trying to add a
couple dozen values, a pipelined adder will clock substantially faster than
a parallel (purely combinatorial) adder, at the expense of latency.
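To make the clock-rate/latency trade-off concrete, here is a small cycle-level Python model (not HDL, and not from the original post) of a pipelined binary adder tree. A register follows every adder level, so only one adder sits between flip-flops. An unpipelined tree would instead have ceil(log2 N) adders in series. The class name and the 24-input example are illustrative assumptions.

```python
import math
from typing import List, Optional


def tree_depth(n_inputs: int) -> int:
    """Adder levels in a balanced binary tree = adders in series if unpipelined."""
    return math.ceil(math.log2(n_inputs))


class PipelinedAdderTree:
    """Cycle-level model with a register stage after every adder level."""

    def __init__(self, n_inputs: int):
        if n_inputs < 2:
            raise ValueError("need at least two inputs")
        self.depth = tree_depth(n_inputs)
        # stages[k] = partial sums held in the registers after level k (None = no valid data)
        self.stages: List[Optional[List[int]]] = [None] * self.depth

    @staticmethod
    def _add_level(values: List[int]) -> List[int]:
        # One level of adders: sum neighbouring pairs; an odd leftover is just registered.
        out = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            out.append(values[-1])
        return out

    def clock(self, inputs: Optional[List[int]]) -> Optional[int]:
        """One clock edge. Returns the output register's value after the edge."""
        # Update from the last stage backwards so each stage reads its
        # predecessor's value from before this edge, like real flip-flops.
        for k in range(self.depth - 1, 0, -1):
            prev = self.stages[k - 1]
            self.stages[k] = self._add_level(prev) if prev is not None else None
        self.stages[0] = self._add_level(inputs) if inputs is not None else None
        last = self.stages[-1]
        return last[0] if last is not None else None


if __name__ == "__main__":
    n = 24  # "a couple dozen values"
    tree = PipelinedAdderTree(n)
    frames = [list(range(k, k + n)) for k in range(4)]
    outputs = [tree.clock(frames[c] if c < len(frames) else None)
               for c in range(len(frames) + tree.depth)]
    print("adders per clock: 1 (pipelined) vs", tree.depth, "(unpipelined)")
    print("outputs:", outputs)
```

With 24 inputs the tree has 5 levels. The pipelined version accepts a new set of values every clock and delivers one sum per clock once the five register stages have filled.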
For a new design the above two steps would probably be swapped as you'd
want to zero-in on a good approach to solving a problem first and then make
sure that the HDL implements it efficiently. For an existing design that
needs fixing, you may have to take them in the sequence I presented.
If you did the above and still can't achieve the performance requirement
for your design you need to go in at a different level.
There are simple things you can do that might make a big difference, in
no particular order:
- Do you have any FFs that should be in the IOBs?
- Increase tool effort levels
- Over-constrain the PERIOD specification
- Identify false paths and "TIG" (timing ignore) them
- Identify multi-cycle paths and constrain them accordingly
- Going far/wide/fast? Can you insert additional FFs in the path?
- Consider device-specific resources (example: use registered
multipliers)
- Can you fold combinatorial logic into fast embedded ROM LUTs?
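Several items in the list above (over-constraining PERIOD, TIG-ing false paths, multi-cycle paths) change how much time a path is allowed. The following Python sketch is a toy slack calculation, not the behaviour of any real timing analyser. The path names, delays, the 150 MHz target and the 10% over-constraint margin are all made-up assumptions. The usual rationale for over-constraining is to give the tools a tighter target than the real requirement. Sign-off is still against the real period.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TimingPath:
    name: str
    delay_ns: float            # logic + routing delay (illustrative number)
    ignored: bool = False      # false path, e.g. one you have "TIG"-ed
    cycles: int = 1            # multi-cycle path: allowed this many clock periods


def period_ns(target_mhz: float, overconstrain: float = 0.0) -> float:
    """Clock period for the constraint, tightened by a fractional margin."""
    return (1000.0 / target_mhz) * (1.0 - overconstrain)


def path_slacks(paths: List[TimingPath], target_mhz: float,
                overconstrain: float = 0.0) -> Dict[str, Optional[float]]:
    """Slack per path in ns (negative = failing); None = not analysed."""
    period = period_ns(target_mhz, overconstrain)
    return {p.name: None if p.ignored else p.cycles * period - p.delay_ns
            for p in paths}


def meets_timing(slacks: Dict[str, Optional[float]]) -> bool:
    return all(s is None or s >= 0.0 for s in slacks.values())


if __name__ == "__main__":
    paths = [
        TimingPath("fsm_to_fsm", 5.9),
        TimingPath("cfg_reg_to_core", 12.0, ignored=True),   # static config: false path
        TimingPath("slow_accum", 11.0, cycles=2),            # enabled every 2nd clock
    ]
    real = path_slacks(paths, 150.0)
    tight = path_slacks(paths, 150.0, overconstrain=0.10)
    print("real requirement:", real, meets_timing(real))
    print("10% over-constrained:", tight, meets_timing(tight))
```

Without the TIG and the two-cycle allowance, the last two paths would fail at 150 MHz. With them, all analysed paths have positive slack.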
Beyond this you have to get into RPM/floorplanning mode. I think I can
say that RPMs (relationally placed macros) are the better way to go. Area
constraints can be problematic and, in an evolving design, there can be a
bit of a chicken-and-egg scenario. Hierarchically built RPMs done in HDL
are, of course, the best approach from many standpoints.
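Hierarchical relative placement composes well, and a conceptual Python model (not Xilinx syntax) shows why. Each macro places its cells and child macros at (x, y) offsets from its own origin. As a result, placing the top-level macro anywhere moves the whole structure rigidly. The coordinate scheme, the `Macro` class and the 4-bit example are illustrative assumptions. Real RLOC constraints have their own syntax and rules, described in the vendor documentation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Offset = Tuple[int, int]


@dataclass
class Macro:
    """A relatively placed group: leaf cells and child macros, each at an offset."""
    name: str
    cells: Dict[str, Offset] = field(default_factory=dict)
    children: List[Tuple["Macro", Offset]] = field(default_factory=list)


def flatten(macro: Macro, origin: Offset = (0, 0), prefix: str = "") -> Dict[str, Offset]:
    """Absolute (x, y) of every leaf cell when the macro's origin sits at `origin`."""
    ox, oy = origin
    path = prefix + macro.name + "/"
    placed = {path + cell: (ox + dx, oy + dy) for cell, (dx, dy) in macro.cells.items()}
    for child, (cx, cy) in macro.children:
        # A child's origin is relative to its parent's origin.
        placed.update(flatten(child, (ox + cx, oy + cy), path))
    return placed


if __name__ == "__main__":
    # Hypothetical 4-bit structure: one bit slice per row, LUT and FF side by side.
    bits = [(Macro(f"bit{i}", cells={"lut": (0, 0), "ff": (1, 0)}), (0, i)) for i in range(4)]
    add4 = Macro("add4", children=bits)
    for cell, xy in flatten(add4, origin=(10, 20)).items():
        print(cell, xy)
```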
The subject of RPMs is wide and deep. In order to maximize performance
you need to acquire a full understanding of the routing resources and how to
use them. Just because the layout looks good on the screen doesn't mean
that it will run the fastest. Many, many hours (days, months, years?) of
work are required in order to fully understand this topic.
<business hat on>
Depending on your design's constraints you might be better advised to
move up to a faster device rather than undertake layout at this level.
Purists might cringe, but I think that most will agree that sometimes it is
better to spend more money on a chip and get the design out the door than to
try to optimize a design to death just for the sake of superb engineering.
<business hat off>
There's also the idea of using FPGAs properly. Something I see come up
frequently is wide, parallel, slow designs that consume lots of chip
resources (both logic and routing). The significance of this in terms of
design performance (speed) is that these highly parallel structures eat up
routing and push and shove other modules within the chip. This can
complicate routing to the point that it compromises performance. Many of
these wide/parallel/slow designs can be changed to serialized high-speed
designs that take advantage of the fact that FPGAs can run so darn fast. Do
that and you save lots of resources for fast parallel logic. It's amazing to
me how many people never take advantage of this wonderful trick, which is
basically free. An FF can run at 5MHz just as well as at 200MHz; running it
slowly might just be a total waste.
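A rough Python behavioural model (a sketch, not HDL) illustrates the serialization trick. It compares N accumulators updated in the same slow clock with a single accumulator that runs N times faster on a serialized sample stream. In the serial version, each channel's running sum lives in a small memory addressed by a channel counter. The function names and the 16-channel/5 MHz figures are hypothetical.

```python
from typing import List


def parallel_accumulate(frames: List[List[int]]) -> List[int]:
    """Wide/slow version: N accumulators, all updated in the same slow clock."""
    acc = [0] * len(frames[0])              # N registers and N adders
    for frame in frames:                    # one slow clock per frame
        acc = [a + x for a, x in zip(acc, frame)]
    return acc


def serial_accumulate(stream: List[int], n_channels: int) -> List[int]:
    """Narrow/fast version: one adder, time-shared across channels.

    `stream` carries the same samples interleaved ch0, ch1, ..., ch(N-1), ch0, ...
    Per-channel sums live in a small N-word memory addressed by a channel counter.
    """
    state = [0] * n_channels                # N-word memory instead of N registers
    ch = 0                                  # channel counter = memory address
    for sample in stream:                   # one fast clock per sample
        state[ch] += sample                 # the single shared adder
        ch = (ch + 1) % n_channels
    return state


def fast_clock_mhz(n_channels: int, slow_mhz: float) -> float:
    """Clock the serialized version needs to keep the same throughput."""
    return n_channels * slow_mhz


if __name__ == "__main__":
    frames = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
    stream = [x for frame in frames for x in frame]   # serialize: interleave channels
    print(parallel_accumulate(frames))     # [111, 222, 333, 444]
    print(serial_accumulate(stream, 4))    # [111, 222, 333, 444], with one adder
    print(fast_clock_mhz(16, 5.0), "MHz")  # 80.0 MHz
```

Both versions produce the same sums. The serialized one trades N adders and N registers for one adder, a counter and a small memory, at the cost of an N-times-faster clock.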
Tool runtime (process optimization)
---------------------------------------
This is simpler ... and not. You have five choices:
1- Simulation
2- Simulation
3- Simulation
4- Modular design
5- Incremental design
The first three look very similar. There's a message here: don't try
to use synthesis and hardware to verify logical design and algorithms. That
is very time consuming. Simulation is orders of magnitude faster. Go to
hardware when you know that the design (or the design's components) works in
simulation. A full simulation isn't required; sometimes you can use test
vectors to check modules in isolation from other parts of the design.
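The module-in-isolation idea can be sketched in Python. In practice this job belongs to an HDL testbench; the Python version only shows the shape of the check. A behavioural model of a bit-serial adder (a hypothetical example module chosen here) is driven with corner-case and seeded random test vectors and compared against a trivially correct reference.

```python
import random
from typing import Callable, List, Tuple

Vector = Tuple[int, int, int]  # (a, b, width)


def bit_serial_add(a: int, b: int, width: int) -> int:
    """Model of a bit-serial adder: one full adder and a carry FF, LSB first,
    one bit per clock. The result wraps modulo 2**width."""
    carry = 0                                   # carry FF, cleared at word start
    result = 0
    for i in range(width):                      # one clock per bit
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i          # sum bit
        carry = (x & y) | (carry & (x ^ y))     # carry out, registered for next bit
    return result


def reference_add(a: int, b: int, width: int) -> int:
    return (a + b) % (1 << width)


def make_vectors(width: int, n_random: int, seed: int = 1) -> List[Vector]:
    top = (1 << width) - 1
    corners = [(0, 0), (top, 1), (top, top), (1, top), (0x55 & top, 0xAA & top)]
    rng = random.Random(seed)                   # fixed seed: repeatable vectors
    rand = [(rng.randint(0, top), rng.randint(0, top)) for _ in range(n_random)]
    return [(a, b, width) for a, b in corners + rand]


def run_vectors(dut: Callable[..., int], ref: Callable[..., int],
                vectors: List[Vector]) -> List[Tuple[Vector, int, int]]:
    """Apply every vector to both models and return the mismatches."""
    return [(v, dut(*v), ref(*v)) for v in vectors if dut(*v) != ref(*v)]


if __name__ == "__main__":
    mismatches = run_vectors(bit_serial_add, reference_add, make_vectors(8, 1000))
    print("mismatches:", mismatches)
```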
Modular design is more appropriately used within the context of a team.
Incremental design works for single-developer mode. The idea is simple:
limit time-consuming processing to modules that have changed since the last
run. The runtime improvement can be dramatic.
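The core of the incremental idea can be sketched in Python: fingerprint each module's source and rerun the expensive step only where the fingerprint changed. This is an illustration under stated assumptions, not how any vendor's incremental flow works internally. `implement` is a hypothetical stand-in for per-module synthesis and implementation. The sketch also ignores inter-module dependencies and constraint changes, which a real flow has to account for.

```python
import hashlib
from typing import Callable, Dict, List


def fingerprint(source: str) -> str:
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def changed_modules(sources: Dict[str, str], last_run: Dict[str, str]) -> List[str]:
    """Modules that are new or whose source differs from the last run."""
    return sorted(name for name, src in sources.items()
                  if last_run.get(name) != fingerprint(src))


def incremental_run(sources: Dict[str, str], last_run: Dict[str, str],
                    implement: Callable[[str], None]) -> Dict[str, str]:
    """Run the expensive step only for changed modules; return the new fingerprints.

    Unchanged modules are assumed to keep their previous results.
    """
    for name in changed_modules(sources, last_run):
        implement(name)
    return {name: fingerprint(src) for name, src in sources.items()}


if __name__ == "__main__":
    rev1 = {"uart": "uart v1", "fir": "fir v1", "ctrl": "ctrl v1"}
    state = incremental_run(rev1, {}, lambda m: print("implementing", m))
    rev2 = dict(rev1, fir="fir v2")   # edit one module
    state = incremental_run(rev2, state, lambda m: print("implementing", m))
```

The first run implements all three modules. After `fir` is edited, only `fir` is processed again.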
However (there's always one of those), it is best to start a project in one
of these modes as opposed to trying to convert an existing project. There
are very specific inter-module (and other) requirements that must be taken
into account. Check the manuals for further detail here.
Hope this helps. Sorry that I couldn't fit it in one line. Not that I
tried.
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Martin Euredjian
To send private email:
[EMAIL PROTECTED]
where
"0_0_0_0_" = "martineu"