Shared implementation for block frequency analysis. More...

#include "llvm/Analysis/BlockFrequencyInfo.h"

Inheritance diagram for llvm::BlockFrequencyInfoImpl< BT >:

Collaboration diagram for llvm::BlockFrequencyInfoImpl< BT >:

Public Member Functions
	BlockFrequencyInfoImpl ()=default

const FunctionT *	getFunction () const

void	calculate (const FunctionT &F, const BranchProbabilityInfoT &BPI, const LoopInfoT &LI)

BlockFrequency	getBlockFreq (const BlockT *BB) const

Optional< uint64_t >	getBlockProfileCount (const Function &F, const BlockT *BB) const

Optional< uint64_t >	getProfileCountFromFreq (const Function &F, uint64_t Freq) const

bool	isIrrLoopHeader (const BlockT *BB)

void	setBlockFreq (const BlockT *BB, uint64_t Freq)

Scaled64	getFloatingBlockFreq (const BlockT *BB) const

const BranchProbabilityInfoT &	getBPI () const

raw_ostream &	print (raw_ostream &OS) const override
	Print the frequencies for the current function. More...

raw_ostream &	printBlockFreq (raw_ostream &OS, const BlockT *BB) const

Friends
struct	bfi_detail::BlockEdgesAdder< BT >

Detailed Description

template<class BT>
class llvm::BlockFrequencyInfoImpl< BT >

Shared implementation for block frequency analysis.

This is a shared implementation of BlockFrequencyInfo and MachineBlockFrequencyInfo, and calculates the relative frequencies of blocks.

LoopInfo defines a loop as a "non-trivial" SCC dominated by a single block, which is called the header. A given loop, L, can have sub-loops, which are loops within the subgraph of L that exclude its header. (A "trivial" SCC consists of a single block that does not have a self-edge.)

In addition to loops, this algorithm has limited support for irreducible SCCs, which are SCCs with multiple entry blocks. Irreducible SCCs are discovered on they fly, and modelled as loops with multiple headers.

The headers of irreducible sub-SCCs consist of its entry blocks and all nodes that are targets of a backedge within it (excluding backedges within true sub-loops). Block frequency calculations act as if a block is inserted that intercepts all the edges to the headers. All backedges and entries point to this block. Its successors are the headers, which split the frequency evenly.

This algorithm leverages BlockMass and ScaledNumber to maintain precision, separates mass distribution from loop scaling, and dithers to eliminate probability mass loss.

The implementation is split between BlockFrequencyInfoImpl, which knows the type of graph being modelled (BasicBlock vs. MachineBasicBlock), and BlockFrequencyInfoImplBase, which doesn't. The base class uses BlockNode, a wrapper around a uint32_t. BlockNode is numbered from 0 in reverse-post order. This gives two advantages: it's easy to compare the relative ordering of two nodes, and maps keyed on BlockT can be represented by vectors.

This algorithm is O(V+E), unless there is irreducible control flow, in which case it's O(V*E) in the worst case.

These are the main stages:

0. Reverse post-order traversal (initializeRPOT()).

Run a single post-order traversal and save it (in reverse) in RPOT. All other stages make use of this ordering. Save a lookup from BlockT to BlockNode (the index into RPOT) in Nodes.

Loop initialization (initializeLoops()).

Translate LoopInfo/MachineLoopInfo into a form suitable for the rest of the algorithm. In particular, store the immediate members of each loop in reverse post-order.
Calculate mass and scale in loops (computeMassInLoops()).

For each loop (bottom-up), distribute mass through the DAG resulting from ignoring backedges and treating sub-loops as a single pseudo-node. Track the backedge mass distributed to the loop header, and use it to calculate the loop scale (number of loop iterations). Immediate members that represent sub-loops will already have been visited and packaged into a pseudo-node.

Distributing mass in a loop is a reverse-post-order traversal through the loop. Start by assigning full mass to the Loop header. For each node in the loop:
```
- Fetch and categorize the weight distribution for its successors.
  If this is a packaged-subloop, the weight distribution is stored
  in \a LoopData::Exits.  Otherwise, fetch it from
  BranchProbabilityInfo.

- Each successor is categorized as \a Weight::Local, a local edge
  within the current loop, \a Weight::Backedge, a backedge to the
  loop header, or \a Weight::Exit, any successor outside the loop.
  The weight, the successor, and its category are stored in \a
  Distribution.  There can be multiple edges to each successor.

- If there's a backedge to a non-header, there's an irreducible SCC.
  The usual flow is temporarily aborted.  \a
  computeIrreducibleMass() finds the irreducible SCCs within the
  loop, packages them up, and restarts the flow.

- Normalize the distribution:  scale weights down so that their sum
  is 32-bits, and coalesce multiple edges to the same node.

- Distribute the mass accordingly, dithering to minimize mass loss,
  as described in \a distributeMass().
```
In the case of irreducible loops, instead of a single loop header, there will be several. The computation of backedge masses is similar but instead of having a single backedge mass, there will be one backedge per loop header. In these cases, each backedge will carry a mass proportional to the edge weights along the corresponding path.

At the end of propagation, the full mass assigned to the loop will be distributed among the loop headers proportionally according to the mass flowing through their backedges.

Finally, calculate the loop scale from the accumulated backedge mass.
Distribute mass in the function (computeMassInFunction()).

Finally, distribute mass through the DAG resulting from packaging all loops in the function. This uses the same algorithm as distributing mass in a loop, except that there are no exit or backedge edges.
Unpackage loops (unwrapLoops()).

Initialize each block's frequency to a floating point representation of its mass.

Visit loops top-down, scaling the frequencies of its immediate members by the loop's pseudo-node's frequency.
Convert frequencies to a 64-bit range (finalizeMetrics()).

Using the min and max frequencies as a guide, translate floating point frequencies to an appropriate range in uint64_t.

It has some known flaws.

The model of irreducible control flow is a rough approximation.

Modelling irreducible control flow exactly involves setting up and solving a group of infinite geometric series. Such precision is unlikely to be worthwhile, since most of our algorithms give up on irreducible control flow anyway.

Nevertheless, we might find that we need to get closer. Here's a sort of TODO list for the model with diminishing returns, to be completed as necessary.
- The headers for the LoopData representing an irreducible SCC include non-entry blocks. When these extra blocks exist, they indicate a self-contained irreducible sub-SCC. We could treat them as sub-loops, rather than arbitrarily shoving the problematic blocks into the headers of the main irreducible SCC.
- Entry frequencies are assumed to be evenly split between the headers of a given irreducible SCC, which is the only option if we need to compute mass in the SCC before its parent loop. Instead, we could partially compute mass in the parent loop, and stop when we get to the SCC. Here, we have the correct ratio of entry masses, which we can use to adjust their relative frequencies. Compute mass in the SCC, and then continue propagation in the parent.
- We can propagate mass iteratively through the SCC, for some fixed number of iterations. Each iteration starts by assigning the entry blocks their backedge mass from the prior iteration. The final mass for each block (and each exit, and the total backedge mass used for computing loop scale) is the sum of all iterations. (Running this until fixed point would "solve" the geometric series by simulation.)

Definition at line 32 of file BlockFrequencyInfo.h.

Constructor & Destructor Documentation

◆ BlockFrequencyInfoImpl()

template<class BT>

llvm::BlockFrequencyInfoImpl< BT >::BlockFrequencyInfoImpl ( )

default

Member Function Documentation

◆ calculate()

template<class BT >

void llvm::BlockFrequencyInfoImpl< BT >::calculate	(	const FunctionT &	F,
		const BranchProbabilityInfoT &	BPI,
		const LoopInfoT &	LI
	)

Definition at line 1019 of file BlockFrequencyInfoImpl.h.

◆ getBlockFreq()

template<class BT>

BlockFrequency llvm::BlockFrequencyInfoImpl< BT >::getBlockFreq ( const BlockT * BB ) const

inline

Definition at line 971 of file BlockFrequencyInfoImpl.h.

◆ getBlockProfileCount()

template<class BT>

Optional<uint64_t> llvm::BlockFrequencyInfoImpl< BT >::getBlockProfileCount	(	const Function &	F,
		const BlockT *	BB
	)		const

inline

Definition at line 975 of file BlockFrequencyInfoImpl.h.

◆ getBPI()

template<class BT>

const BranchProbabilityInfoT& llvm::BlockFrequencyInfoImpl< BT >::getBPI ( ) const

inline

Definition at line 995 of file BlockFrequencyInfoImpl.h.

◆ getFloatingBlockFreq()

template<class BT>

Scaled64 llvm::BlockFrequencyInfoImpl< BT >::getFloatingBlockFreq ( const BlockT * BB ) const

inline

Definition at line 991 of file BlockFrequencyInfoImpl.h.

◆ getFunction()

template<class BT>

const FunctionT* llvm::BlockFrequencyInfoImpl< BT >::getFunction ( ) const

inline

Definition at line 964 of file BlockFrequencyInfoImpl.h.

◆ getProfileCountFromFreq()

template<class BT>

Optional<uint64_t> llvm::BlockFrequencyInfoImpl< BT >::getProfileCountFromFreq	(	const Function &	F,
		uint64_t	Freq
	)		const

inline

Definition at line 980 of file BlockFrequencyInfoImpl.h.

◆ isIrrLoopHeader()

template<class BT>

bool llvm::BlockFrequencyInfoImpl< BT >::isIrrLoopHeader ( const BlockT * BB )

inline

Definition at line 985 of file BlockFrequencyInfoImpl.h.

◆ print()

template<class BT >

raw_ostream & llvm::BlockFrequencyInfoImpl< BT >::print ( raw_ostream & OS ) const

overridevirtual

Print the frequencies for the current function.

Prints the frequencies for the blocks in the current function.

Blocks are printed in the natural iteration order of the function, rather than reverse post-order. This provides two advantages: writing -analyze tests is easier (since blocks come out in source order), and even unreachable blocks are printed.

BlockFrequencyInfoImplBase::print() only knows reverse post-order, so we need to override it here.

Reimplemented from llvm::BlockFrequencyInfoImplBase.

Definition at line 1337 of file BlockFrequencyInfoImpl.h.

◆ printBlockFreq()

template<class BT>

raw_ostream& llvm::BlockFrequencyInfoImpl< BT >::printBlockFreq	(	raw_ostream &	OS,
		const BlockT *	BB
	)		const

inline

Definition at line 1013 of file BlockFrequencyInfoImpl.h.

◆ setBlockFreq()

template<class BT >

void llvm::BlockFrequencyInfoImpl< BT >::setBlockFreq	(	const BlockT *	BB,
		uint64_t	Freq
	)

Definition at line 1048 of file BlockFrequencyInfoImpl.h.

Friends And Related Function Documentation

◆ bfi_detail::BlockEdgesAdder< BT >

template<class BT>

friend struct bfi_detail::BlockEdgesAdder< BT >

friend

Definition at line 845 of file BlockFrequencyInfoImpl.h.

The documentation for this class was generated from the following files:

include/llvm/Analysis/BlockFrequencyInfo.h
include/llvm/Analysis/BlockFrequencyInfoImpl.h

Public Member Functions

Friends

Detailed Description

template<class BT> class llvm::BlockFrequencyInfoImpl< BT >

Constructor & Destructor Documentation

◆ BlockFrequencyInfoImpl()

Member Function Documentation

◆ calculate()

◆ getBlockFreq()

◆ getBlockProfileCount()

◆ getBPI()

◆ getFloatingBlockFreq()

◆ getFunction()

◆ getProfileCountFromFreq()

◆ isIrrLoopHeader()

◆ print()

◆ printBlockFreq()

◆ setBlockFreq()

Friends And Related Function Documentation

◆ bfi_detail::BlockEdgesAdder< BT >

template<class BT>
class llvm::BlockFrequencyInfoImpl< BT >