XCC Feature Summary
The centerpiece of the Xtensa compiler tool chain is the Xtensa C/C++ Compiler (XCC). XCC is an advanced, optimizing compiler that implements inter-procedural optimization and techniques such as software loop pipelining to extract instruction-level parallelism.
XCC delivers 15-20% better average performance than GCC on general C code. When compiling for minimum code size, it also produces code up to 10% smaller than GCC.
The XCC compiler uses a GNU compiler front-end with a customized code generation back-end targeting the compact 24-/16-bit Xtensa Instruction Set Architecture (ISA). The XCC compiler is automatically updated by the TIE Compiler to support the designer-defined instruction extensions written in the TIE language. Thus, the compiler does instruction scheduling for the TIE instructions specified by the designer and also does register allocation for the user-defined register files. The XCC compiler additionally supports Tensilica’s Flexible Length Instruction Xtensions (FLIX) and automatically extracts parallelism in the C/C++ code, packing up to 15 independent operations (limited only by opcode availability) into 32- or 64-bit VLIW instruction bundles.
Based on industry standard benchmarks, the XCC compiler generates the highest code density when compared to compilers for other 32-bit RISC architectures. To increase code execution speed and reduce code size, XCC employs sophisticated multi-level optimizations such as function inlining, software pipelining, static single assignment (SSA) optimizations, and other code generation techniques.
XCC optimizes an application’s critical performance hotspots for speed and optimizes the remainder of the application for code size using an important optimization technique known as feedback-directed optimization. Feedback-directed optimization is a two-step process where code is instrumented on the first pass of compilation, so that when it is subsequently executed using a representative input data set, it produces a file containing profiling and branching information. On the second pass, this profiling information is used to optimize application code to further reduce branch delays, improve function inlining, and reduce stack operations.
XCC also supports feedback-directed optimization on a hardware target: the instrumented code runs on the user’s target hardware platform, which contains an Xtensa implementation in an FPGA or in silicon. The profiling information is gathered in the same way and is used by XCC to further optimize the code. Hardware feedback-directed optimization is particularly useful because the application runs in a realistic environment that can include actual hardware peripherals and memories rather than simulation models.
Interprocedural analysis is another optimization method implemented in XCC. It looks globally across all the files of an application at link time. This global view examines relationships across function calls and enables optimizations that cannot be achieved by optimizing locally within an expression or procedure. Interprocedural analysis eliminates unneeded computations, improves function inlining, and performs alias analysis that less sophisticated optimization techniques may not perform.
Additionally, XCC is fully aware of user-defined instructions generated by Tensilica’s Instruction Extension (TIE) compiler, which automatically updates XCC with knowledge of those instructions. The TIE compiler modifies and updates XCC without any user intervention.
Demonstrated Superior Performance
The EEMBC benchmarks provide independently certified proof of the benefits of the XCC compiler. They show that the XCC compiler generates more efficient code than its RISC-based competitors, as the EEMBC results below illustrate. The “Office Automation” benchmark compares performance and code density on various processors, and all results are “Out-of-the-Box” results.
This is a direct architectural comparison using the EEMBC benchmark applications, as no TIE instructions are included in the Tensilica code size metrics. The results show that Tensilica’s XCC compiler is extremely efficient compared with compilers for other popular RISC CPUs. User-defined TIE instructions would increase code density further by combining many standard RISC operations into single instructions.
| Raw Code Size (bytes) | 4,912 | 5,908 | 13,780 | 18,540 |
| Relative Code Size (other/XCC) | 1.00 | 1.20 | 2.80 | 3.77 |
High-performance optimizing compiler delivers much better code density than compilers for other RISC architectures.
- Graph coloring register allocator supports live range splitting, homing, and rematerialization
- Integrated instruction scheduler
- SSA-based global optimizer
- Inliner
- Loop nest optimizer