猿代码 — 科研/AI模型/高性能计算
0

GCC loop

摘要: -fforward-propagatePerform a forward propagation pass on RTL. The pass tries to combine two instructions and checks if the result can be simplified. If loop unrolling is active, two passes are perform ...

-fforward-propagate

Perform a forward propagation pass on RTL. The pass tries to combine two instructions and checks if the result can be simplified. If loop unrolling is active, two passes are performed and the second is scheduled after loop unrolling.

This option is enabled by default at optimization levels -O1-O2-O3-Os.



-faggressive-loop-optimizations

This option tells the loop optimizer to use language constraints to derive bounds for the number of iterations of a loop. This assumes that loop code does not invoke undefined behavior by for example causing signed integer overflows or out-of-bound array accesses. The bounds for the number of iterations of a loop are used to guide loop unrolling and peeling and loop exit test optimizations. This option is enabled by default.

-floop-unroll-and-jam

Apply unroll and jam transformations on feasible loops. In a loop nest this unrolls the outer loop by some factor and fuses the resulting multiple inner loops. This flag is enabled by default at -O3. It is also enabled by -fprofile-use and -fauto-profile.

-ftree-loop-im

Perform loop invariant motion on trees. This pass moves only invariants that are hard to handle at RTL level (function calls, operations that expand to nontrivial sequences of insns). With -funswitch-loops it also moves operands of conditions that are invariant out of the loop, so that we can use just trivial invariantness analysis in loop unswitching. The pass also includes store motion.


-ftree-loop-ivcanon

Create a canonical counter for number of iterations in loops for which determining number of iterations requires complicated analysis. Later optimizations then may determine the number easily. Useful especially in connection with unrolling.

-fsplit-ivs-in-unroller

Enables expression of values of induction variables in later iterations of the unrolled loop using the value in the first iteration. This breaks long dependency chains, thus improving efficiency of the scheduling passes.

A combination of -fweb and CSE is often sufficient to obtain the same effect. However, that is not reliable in cases where the loop body is more complicated than a single basic block. It also does not work at all on some architectures due to restrictions in the CSE pass.

This optimization is enabled by default.


-fvariable-expansion-in-unroller

With this option, the compiler creates multiple copies of some local variables when unrolling a loop, which can result in superior code.

This optimization is enabled by default for PowerPC targets, but disabled by default otherwise.


-fweb

Constructs webs as commonly used for register allocation purposes and assign each web individual pseudo register. This allows the register allocation pass to operate on pseudos directly, but also strengthens several other optimization passes, such as CSE, loop optimizer and trivial dead code remover. It can, however, make debugging impossible, since variables no longer stay in a “home register”.

Enabled by default with -funroll-loops.


-fprofile-use=path,这个是啥意思?

Enable profile feedback-directed optimizations, and the following optimizations, many of which are generally profitable only with profile feedback available:

-fbranch-probabilities  -fprofile-values
-funroll-loops  -fpeel-loops  -ftracer  -fvpt
-finline-functions  -fipa-cp  -fipa-cp-clone  -fipa-bit-cp
-fpredictive-commoning  -fsplit-loops  -funswitch-loops
-fgcse-after-reload  -ftree-loop-vectorize  -ftree-slp-vectorize
-fvect-cost-model=dynamic  -ftree-loop-distribute-patterns
-fprofile-reorder-functions

Before you can use this option, you must first generate profiling information. See Program Instrumentation Options, for information about the -fprofile-generate option.

By default, GCC emits an error message if the feedback profiles do not match the source code. This error can be turned into a warning by using -Wno-error=coverage-mismatch. Note this may result in poorly optimized code. Additionally, by default, GCC also emits a warning message if the feedback profiles do not exist (see -Wmissing-profile).

If path is specified, GCC looks at the path to find the profile feedback data files. See -fprofile-dir.

-fauto-profile
-fauto-profile=path

Enable sampling-based feedback-directed optimizations, and the following optimizations, many of which are generally profitable only with profile feedback available:

-fbranch-probabilities  -fprofile-values
-funroll-loops  -fpeel-loops  -ftracer  -fvpt
-finline-functions  -fipa-cp  -fipa-cp-clone  -fipa-bit-cp
-fpredictive-commoning  -fsplit-loops  -funswitch-loops
-fgcse-after-reload  -ftree-loop-vectorize  -ftree-slp-vectorize
-fvect-cost-model=dynamic  -ftree-loop-distribute-patterns
-fprofile-correction

path is the name of a file containing AutoFDO profile information. If omitted, it defaults to fbdata.afdo in the current directory.

Producing an AutoFDO profile data file requires running your program with the perf utility on a supported GNU/Linux target system. For more information, see https://perf.wiki.kernel.org/.


E.g.

perf record -e br_inst_retired:near_taken -b -o perf.data \
    -- your_program

Then use the create_gcov tool to convert the raw profile data to a format that can be used by GCC.  You must also supply the unstripped binary for your program to this tool. See https://github.com/google/autofdo.


E.g.

create_gcov --binary=your_program.unstripped --profile=perf.data \
    --gcov=profile.afdo

The following options control compiler behavior regarding floating-point arithmetic. These options trade off between speed and correctness. All must be specifically enabled.


-frename-registers

Attempt to avoid false dependencies in scheduled code by making use of registers left over after register allocation. This optimization most benefits processors with lots of registers. Depending on the debug information format adopted by the target, however, it can make debugging impossible, since variables no longer stay in a “home register”.

Enabled by default with -funroll-loops.

-funroll-loops

Unroll loops whose number of iterations can be determined at compile time or upon entry to the loop. -funroll-loops implies -frerun-cse-after-loop-fweb and -frename-registers. It also turns on complete loop peeling (i.e. complete removal of loops with a small constant number of iterations). This option makes code larger, and may or may not make it run faster.

Enabled by -fprofile-use and -fauto-profile.


-funroll-all-loops

Unroll all loops, even if their number of iterations is uncertain when the loop is entered. This usually makes programs run more slowly. -funroll-all-loops implies the same options as -funroll-loops.









































































































上一篇:GCC -Ofast下一篇:GCC源码下载地址

说点什么...

已有0条评论

最新评论...

本文作者
2024-2-7 23:16
  • 0
    粉丝
  • 193
    阅读
  • 0
    回复
资讯幻灯片
热门评论
热门专题
排行榜
Copyright   ©2015-2023   猿代码-超算人才智造局 高性能计算|并行计算|人工智能      ( 京ICP备2021026424号-2 )