/* Basic block reordering routines for the GNU compiler.
   Copyright (C) 2000-2023 Free Software Foundation, Inc.

This file is part of GCC.

GCC is free software; you can redistribute it and/or modify it
under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.

GCC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
License for more details.

You should have received a copy of the GNU General Public License
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */

/* This file contains the "reorder blocks" pass, which changes the control
   flow of a function to encounter fewer branches; the "partition blocks"
   pass, which divides the basic blocks into "hot" and "cold" partitions,
   which are kept separate; and the "duplicate computed gotos" pass, which
   duplicates blocks ending in an indirect jump.

   There are two algorithms for "reorder blocks": the "simple" algorithm,
   which just rearranges blocks, trying to minimize the number of executed
   unconditional branches; and the "software trace cache" algorithm, which
   also copies code, and in general tries a lot harder to have long linear
   pieces of machine code executed.  This algorithm is described next.  */

/* This (greedy) algorithm constructs traces in several rounds.
   The construction starts from "seeds".  The seed for the first round
   is the entry point of the function.  When there is more than one seed,
   the one with the lowest key in the heap is selected first (see bb_to_key).
   Then the algorithm repeatedly adds the most probable successor to the end
   of a trace.  Finally it connects the traces.

   There are two parameters: Branch Threshold and Exec Threshold.
   If the probability of an edge to a successor of the current basic block is
   lower than Branch Threshold, or its count is lower than Exec Threshold,
   then the successor will be a seed in one of the next rounds.
   Each round has these parameters lower than the previous one.
   The last round has to have these parameters set to zero so that the
   remaining blocks are picked up.
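
   As a worked example (the probability is invented, the thresholds are the
   ones defined below): with branch_threshold[] = {400, 200, 100, 0, 0}, in
   per mille of REG_BR_PROB_BASE, an edge with probability 0.3, i.e. 300 per
   mille, fails the round-1 test (300 < 400), so its destination is pushed
   back as a seed for a later round; in round 2 it qualifies (300 >= 200)
   and the successor may be appended to a trace.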

   The algorithm selects the most probable successor from all unvisited
   successors and successors that have been added to this trace.
   The other successors (those that have not been "sent" to the next round)
   will be other seeds for this round, and the secondary traces will start
   from them.
   If the successor has not been visited in this trace, it is added to the
   trace (however, there is some heuristic for simple branches).
   If the successor has been visited in this trace, a loop has been found.
   If the loop has many iterations, the loop is rotated so that the source
   block of the most probable edge going out of the loop is the last block
   of the trace.
   If the loop has few iterations and there is no edge from the last block of
   the loop going out of the loop, the loop header is duplicated.
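
   For instance (a hypothetical loop, sketched only to illustrate the
   rotation): suppose a trace contains the loop H -> A -> B -> H, entered at
   the header H, and the most probable exit edge leaves from A.  After
   rotation the tail of the trace reads ... B H A, so the block A sourcing
   the likely exit edge is last and the exit can become a fallthru when the
   traces are connected.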

   When connecting traces, the algorithm first checks whether there is an edge
   from the last block of a trace to the first block of another trace.
   When there are still some unconnected traces it checks whether there exists
   a basic block BB such that BB is a successor of the last block of a trace
   and BB is a predecessor of the first block of another trace.  In this case,
   BB is duplicated, added at the end of the first trace and the traces are
   connected through it.
   The remaining traces are simply concatenated, so there will be a jump to
   the beginning of each such trace.
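
   As a small sketch (block names are hypothetical): if trace T1 ends in X,
   trace T2 starts at Z, and there is no edge X -> Z, but edges X -> Y and
   Y -> Z exist for some block Y in the middle of a third trace, then a copy
   Y' of Y is appended after X and T1 falls through Y' into T2.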

   The above description is for the full algorithm, which is used when the
   function is optimized for speed.  When the function is optimized for size,
   in order to reduce long jumps and connect more fallthru edges, the
   algorithm is modified as follows:
   (1) Break long traces into short ones.  A trace is broken at a block that
   has multiple predecessors/successors during trace discovery.  When
   connecting traces, only connect Trace n with Trace n + 1.  This change
   reduces most long jumps compared with the above algorithm.
   (2) Ignore the edge probability and count for fallthru edges.
   (3) Keep the original order of blocks when there is no chance to fall
   through.  We rely on the results of cfg_cleanup.

   To implement the change for code size optimization, the block's index is
   selected as the key and all traces are found in one round.
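
   For example (again only illustrative): if trace discovery in the single
   round produces Trace 1 = {1, 2} and Trace 2 = {3, 4}, keyed by block
   index, then connecting only Trace n with Trace n + 1 yields the layout
   1 2 3 4, which stays close to the original order and avoids the long
   jumps that a more aggressive reordering such as 1 2 4 ... 3 could
   introduce.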

   References:

   "Software Trace Cache"
   A. Ramirez, J. Larriba-Pey, C. Navarro, J. Torrellas and M. Valero; 1999
   http://citeseer.nj.nec.com/15361.html

*/

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "target.h"
#include "rtl.h"
#include "tree.h"
#include "cfghooks.h"
#include "df.h"
#include "memmodel.h"
#include "optabs.h"
#include "regs.h"
#include "emit-rtl.h"
#include "output.h"
#include "expr.h"
#include "tree-pass.h"
#include "cfgrtl.h"
#include "cfganal.h"
#include "cfgbuild.h"
#include "cfgcleanup.h"
#include "bb-reorder.h"
#include "except.h"
#include "alloc-pool.h"
#include "fibonacci_heap.h"
#include "stringpool.h"
#include "attribs.h"
#include "common/common-target.h"

/* The number of rounds.  In most cases there will only be 4 rounds, but
   when partitioning hot and cold basic blocks into separate sections of
   the object file there will be an extra round.  */
#define N_ROUNDS 5

struct target_bb_reorder default_target_bb_reorder;
#if SWITCHABLE_TARGET
struct target_bb_reorder *this_target_bb_reorder = &default_target_bb_reorder;
#endif

#define uncond_jump_length \
  (this_target_bb_reorder->x_uncond_jump_length)

/* Branch thresholds in thousandths (per mille) of the REG_BR_PROB_BASE.  */
static const int branch_threshold[N_ROUNDS] = {400, 200, 100, 0, 0};

/* Exec thresholds in thousandths (per mille) of the count of bb 0.  */
static const int exec_threshold[N_ROUNDS] = {500, 200, 50, 0, 0};

/* If edge count is lower than DUPLICATION_THRESHOLD per mille of the entry
   block, the edge destination is not duplicated while connecting traces.  */
#define DUPLICATION_THRESHOLD 100
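
/* For example (hypothetical numbers): if the maximum entry-block count is
   10000, connect_traces computes a duplication cutoff of
   10000 * DUPLICATION_THRESHOLD / 1000 = 1000, so a candidate block is only
   duplicated when the edge reaching it is executed at least that often
   (or when the block is tiny; see copy_bb_p).  */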

typedef fibonacci_heap <long, basic_block_def> bb_heap_t;
typedef fibonacci_node <long, basic_block_def> bb_heap_node_t;

/* Structure to hold needed information for each basic block.  */
struct bbro_basic_block_data
{
  /* Which trace is the bb start of (-1 means it is not a start of any).  */
  int start_of_trace;

  /* Which trace is the bb end of (-1 means it is not an end of any).  */
  int end_of_trace;

  /* Which trace is the bb in?  */
  int in_trace;

  /* Which trace was this bb visited in?  */
  int visited;

  /* Cached maximum frequency of interesting incoming edges.
     Minus one means not yet computed.  */
  int priority;

  /* Which heap is BB in (if any)?  */
  bb_heap_t *heap;

  /* Which heap node is BB in (if any)?  */
  bb_heap_node_t *node;
};

/* The current size of the following dynamic array.  */
static int array_size;

/* The array which holds needed information for basic blocks.  */
static bbro_basic_block_data *bbd;

/* To avoid frequent reallocation the size of arrays is greater than needed,
   the number of elements is (not less than) 1.25 * size_wanted.  */
#define GET_ARRAY_SIZE(X) ((((X) / 4) + 1) * 5)
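
/* E.g. GET_ARRAY_SIZE (100) = ((100 / 4) + 1) * 5 = 130, which is at least
   1.25 * 100; the "+ 1" keeps the result strictly above the requested size
   even when X is not divisible by 4.  */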

/* Free the memory and set the pointer to NULL.  */
#define FREE(P) (gcc_assert (P), free (P), P = 0)

/* Structure for holding information about a trace.  */
struct trace
{
  /* First and last basic block of the trace.  */
  basic_block first, last;

  /* The round of the STC creation which this trace was found in.  */
  int round;

  /* The length (i.e. the number of basic blocks) of the trace.  */
  int length;
};

/* Maximum count of one of the entry blocks.  */
static profile_count max_entry_count;

/* Local function prototypes.  */
static void find_traces_1_round (int, profile_count, struct trace *, int *,
                                 int, bb_heap_t **, int);
static basic_block copy_bb (basic_block, edge, basic_block, int);
static long bb_to_key (basic_block);
static bool better_edge_p (const_basic_block, const_edge, profile_probability,
                           profile_count, profile_probability, profile_count,
                           const_edge);
static bool copy_bb_p (const_basic_block, int);

/* Return the trace number in which BB was visited.  */

static int
bb_visited_trace (const_basic_block bb)
{
  gcc_assert (bb->index < array_size);
  return bbd[bb->index].visited;
}

/* This function marks BB that it was visited in trace number TRACE.  */

static void
mark_bb_visited (basic_block bb, int trace)
{
  bbd[bb->index].visited = trace;
  if (bbd[bb->index].heap)
    {
      bbd[bb->index].heap->delete_node (bbd[bb->index].node);
      bbd[bb->index].heap = NULL;
      bbd[bb->index].node = NULL;
    }
}

/* Check to see if bb should be pushed into the next round of trace
   collections or not.  There are two reasons for pushing the block
   forward: (1) the block is cold, we are doing partitioning, and there
   will be another round (cold partition blocks are not supposed to be
   collected into traces until the very last round); or (2) there will
   be another round, and the basic block is not "hot enough" for the
   current round of trace collection.  */

static bool
push_to_next_round_p (const_basic_block bb, int round, int number_of_rounds,
                      profile_count count_th)
{
  bool there_exists_another_round;
  bool block_not_hot_enough;

  there_exists_another_round = round < number_of_rounds - 1;

  block_not_hot_enough = (bb->count < count_th
                          || probably_never_executed_bb_p (cfun, bb));

  if (there_exists_another_round
      && block_not_hot_enough)
    return true;
  else
    return false;
}

/* Find the traces for Software Trace Cache.  Chain each trace through
   RBI()->next.  Store the number of traces to N_TRACES and description of
   traces to TRACES.  */

static void
find_traces (int *n_traces, struct trace *traces)
{
  int i;
  int number_of_rounds;
  edge e;
  edge_iterator ei;
  bb_heap_t *heap = new bb_heap_t (LONG_MIN);

  /* Add one extra round of trace collection when partitioning hot/cold
     basic blocks into separate sections.  The last round is for all the
     cold blocks (and ONLY the cold blocks).  */

  number_of_rounds = N_ROUNDS - 1;

  /* Insert entry points of function into heap.  */
  max_entry_count = profile_count::zero ();
  FOR_EACH_EDGE (e, ei, ENTRY_BLOCK_PTR_FOR_FN (cfun)->succs)
    {
      bbd[e->dest->index].heap = heap;
      bbd[e->dest->index].node = heap->insert (bb_to_key (e->dest), e->dest);
      if (e->dest->count > max_entry_count)
        max_entry_count = e->dest->count;
    }

  /* Find the traces.  */
  for (i = 0; i < number_of_rounds; i++)
    {
      profile_count count_threshold;

      if (dump_file)
        fprintf (dump_file, "STC - round %d\n", i + 1);

      count_threshold = max_entry_count.apply_scale (exec_threshold[i], 1000);

      find_traces_1_round (REG_BR_PROB_BASE * branch_threshold[i] / 1000,
                           count_threshold, traces, n_traces, i, &heap,
                           number_of_rounds);
    }
  delete heap;

  if (dump_file)
    {
      for (i = 0; i < *n_traces; i++)
        {
          basic_block bb;
          fprintf (dump_file, "Trace %d (round %d): ", i + 1,
                   traces[i].round + 1);
          for (bb = traces[i].first;
               bb != traces[i].last;
               bb = (basic_block) bb->aux)
            {
              fprintf (dump_file, "%d [", bb->index);
              bb->count.dump (dump_file);
              fprintf (dump_file, "] ");
            }
          fprintf (dump_file, "%d [", bb->index);
          bb->count.dump (dump_file);
          fprintf (dump_file, "]\n");
        }
      fflush (dump_file);
    }
}

/* Rotate loop whose back edge is BACK_EDGE in the tail of trace TRACE
   (with sequential number TRACE_N).  */

static basic_block
rotate_loop (edge back_edge, struct trace *trace, int trace_n)
{
  basic_block bb;

  /* Information about the best end (end after rotation) of the loop.  */
  basic_block best_bb = NULL;
  edge best_edge = NULL;
  profile_count best_count = profile_count::uninitialized ();
  /* The best edge is preferred when its destination is not visited yet
     or is a start block of some trace.  */
  bool is_preferred = false;

  /* Find the most frequent edge that goes out from current trace.  */
  bb = back_edge->dest;
  do
    {
      edge e;
      edge_iterator ei;

      FOR_EACH_EDGE (e, ei, bb->succs)
        if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
            && bb_visited_trace (e->dest) != trace_n
            && (e->flags & EDGE_CAN_FALLTHRU)
            && !(e->flags & EDGE_COMPLEX))
          {
            if (is_preferred)
              {
                /* The best edge is preferred.  */
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is also preferred.  */
                    if (e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
            else
              {
                if (!bb_visited_trace (e->dest)
                    || bbd[e->dest->index].start_of_trace >= 0)
                  {
                    /* The current edge E is preferred.  */
                    is_preferred = true;
                    best_count = e->count ();
                    best_edge = e;
                    best_bb = bb;
                  }
                else
                  {
                    if (!best_edge || e->count () > best_count)
                      {
                        best_count = e->count ();
                        best_edge = e;
                        best_bb = bb;
                      }
                  }
              }
          }
      bb = (basic_block) bb->aux;
    }
  while (bb != back_edge->dest);

  if (best_bb)
    {
      /* Rotate the loop so that the BEST_EDGE goes out from the last block of
         the trace.  */
      if (back_edge->dest == trace->first)
        {
          trace->first = (basic_block) best_bb->aux;
        }
      else
        {
          basic_block prev_bb;

          for (prev_bb = trace->first;
               prev_bb->aux != back_edge->dest;
               prev_bb = (basic_block) prev_bb->aux)
            ;
          prev_bb->aux = best_bb->aux;

          /* Try to get rid of uncond jump to cond jump.  */
          if (single_succ_p (prev_bb))
            {
              basic_block header = single_succ (prev_bb);

              /* Duplicate HEADER if it is a small block containing cond jump
                 in the end.  */
              if (any_condjump_p (BB_END (header)) && copy_bb_p (header, 0)
                  && !CROSSING_JUMP_P (BB_END (header)))
                copy_bb (header, single_succ_edge (prev_bb), prev_bb, trace_n);
            }
        }
    }
  else
    {
      /* We have not found a suitable loop tail so do no rotation.  */
      best_bb = back_edge->src;
    }
  best_bb->aux = NULL;
  return best_bb;
}
439 | |
440 | /* One round of finding traces. Find traces for BRANCH_TH and EXEC_TH i.e. do |
441 | not include basic blocks whose probability is lower than BRANCH_TH or whose |
442 | count is lower than EXEC_TH into traces (or whose count is lower than |
443 | COUNT_TH). Store the new traces into TRACES and modify the number of |
444 | traces *N_TRACES. Set the round (which the trace belongs to) to ROUND. |
445 | The function expects starting basic blocks to be in *HEAP and will delete |
446 | *HEAP and store starting points for the next round into new *HEAP. */ |
447 | |
448 | static void |
449 | find_traces_1_round (int branch_th, profile_count count_th, |
450 | struct trace *traces, int *n_traces, int round, |
451 | bb_heap_t **heap, int number_of_rounds) |
452 | { |
453 | /* Heap for discarded basic blocks which are possible starting points for |
454 | the next round. */ |
455 | bb_heap_t *new_heap = new bb_heap_t (LONG_MIN); |
456 | bool for_size = optimize_function_for_size_p (cfun); |
457 | |
458 | while (!(*heap)->empty ()) |
459 | { |
460 | basic_block bb; |
461 | struct trace *trace; |
462 | edge best_edge, e; |
463 | long key; |
464 | edge_iterator ei; |
465 | |
466 | bb = (*heap)->extract_min (); |
467 | bbd[bb->index].heap = NULL; |
468 | bbd[bb->index].node = NULL; |
469 | |
470 | if (dump_file) |
471 | fprintf (stream: dump_file, format: "Getting bb %d\n" , bb->index); |
472 | |
473 | /* If the BB's count is too low, send BB to the next round. When |
474 | partitioning hot/cold blocks into separate sections, make sure all |
475 | the cold blocks (and ONLY the cold blocks) go into the (extra) final |
476 | round. When optimizing for size, do not push to next round. */ |
477 | |
478 | if (!for_size |
479 | && push_to_next_round_p (bb, round, number_of_rounds, |
480 | count_th)) |
481 | { |
482 | int key = bb_to_key (bb); |
483 | bbd[bb->index].heap = new_heap; |
484 | bbd[bb->index].node = new_heap->insert (key, data: bb); |
485 | |
486 | if (dump_file) |
487 | fprintf (stream: dump_file, |
488 | format: " Possible start point of next round: %d (key: %d)\n" , |
489 | bb->index, key); |
490 | continue; |
491 | } |
492 | |
493 | trace = traces + *n_traces; |
494 | trace->first = bb; |
495 | trace->round = round; |
496 | trace->length = 0; |
497 | bbd[bb->index].in_trace = *n_traces; |
498 | (*n_traces)++; |
499 | |
500 | do |
501 | { |
502 | bool ends_in_call; |
503 | |
504 | /* The probability and count of the best edge. */ |
505 | profile_probability best_prob = profile_probability::uninitialized (); |
506 | profile_count best_count = profile_count::uninitialized (); |
507 | |
508 | best_edge = NULL; |
509 | mark_bb_visited (bb, trace: *n_traces); |
510 | trace->length++; |
511 | |
512 | if (dump_file) |
513 | fprintf (stream: dump_file, format: "Basic block %d was visited in trace %d\n" , |
514 | bb->index, *n_traces); |
515 | |
516 | ends_in_call = block_ends_with_call_p (bb); |
517 | |
518 | /* Select the successor that will be placed after BB. */ |
519 | FOR_EACH_EDGE (e, ei, bb->succs) |
520 | { |
521 | gcc_assert (!(e->flags & EDGE_FAKE)); |
522 | |
523 | if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) |
524 | continue; |
525 | |
526 | if (bb_visited_trace (bb: e->dest) |
527 | && bb_visited_trace (bb: e->dest) != *n_traces) |
528 | continue; |
529 | |
530 | /* If partitioning hot/cold basic blocks, don't consider edges |
531 | that cross section boundaries. */ |
532 | if (BB_PARTITION (e->dest) != BB_PARTITION (bb)) |
533 | continue; |
534 | |
535 | profile_probability prob = e->probability; |
536 | profile_count count = e->dest->count; |
537 | |
538 | /* The only sensible preference for a call instruction is the |
539 | fallthru edge. Don't bother selecting anything else. */ |
540 | if (ends_in_call) |
541 | { |
542 | if (e->flags & EDGE_CAN_FALLTHRU) |
543 | { |
544 | best_edge = e; |
545 | best_prob = prob; |
546 | best_count = count; |
547 | } |
548 | continue; |
549 | } |
550 | |
551 | /* Edge that cannot be fallthru or improbable or infrequent |
552 | successor (i.e. it is unsuitable successor). When optimizing |
553 | for size, ignore the probability and count. */ |
554 | if (!(e->flags & EDGE_CAN_FALLTHRU) || (e->flags & EDGE_COMPLEX) |
555 | || !prob.initialized_p () |
556 | || ((prob.to_reg_br_prob_base () < branch_th |
557 | || e->count () < count_th) && (!for_size))) |
558 | continue; |
559 | |
560 | if (better_edge_p (bb, e, prob, count, best_prob, best_count, |
561 | best_edge)) |
562 | { |
563 | best_edge = e; |
564 | best_prob = prob; |
565 | best_count = count; |
566 | } |
567 | } |
568 | |
569 | /* If the best destination has multiple predecessors and can be |
570 | duplicated cheaper than a jump, don't allow it to be added to |
571 | a trace; we'll duplicate it when connecting the traces later. |
572 | However, we need to check that this duplication wouldn't leave |
573 | the best destination with only crossing predecessors, because |
574 | this would change its effective partition from hot to cold. */ |
575 | if (best_edge |
576 | && EDGE_COUNT (best_edge->dest->preds) >= 2 |
577 | && copy_bb_p (best_edge->dest, 0)) |
578 | { |
579 | bool only_crossing_preds = true; |
580 | edge e; |
581 | edge_iterator ei; |
582 | FOR_EACH_EDGE (e, ei, best_edge->dest->preds) |
583 | if (e != best_edge && !(e->flags & EDGE_CROSSING)) |
584 | { |
585 | only_crossing_preds = false; |
586 | break; |
587 | } |
588 | if (!only_crossing_preds) |
589 | best_edge = NULL; |
590 | } |
591 | |
592 | /* If the best destination has multiple successors or predecessors, |
593 | don't allow it to be added when optimizing for size. This makes |
594 | sure predecessors with smaller index are handled before the best |
595 | destination. It breaks long trace and reduces long jumps. |
596 | |
597 | Take if-then-else as an example. |
598 | A |
599 | / \ |
600 | B C |
601 | \ / |
602 | D |
603 | If we do not remove the best edge B->D/C->D, the final order might |
604 | be A B D ... C. C is at the end of the program. If D's successors |
605 | and D are complicated, might need long jumps for A->C and C->D. |
606 | Similar issue for order: A C D ... B. |
607 | |
608 | After removing the best edge, the final result will be ABCD/ ACBD. |
609 | It does not add jump compared with the previous order. But it |
610 | reduces the possibility of long jumps. */ |
611 | if (best_edge && for_size |
612 | && (EDGE_COUNT (best_edge->dest->succs) > 1 |
613 | || EDGE_COUNT (best_edge->dest->preds) > 1)) |
614 | best_edge = NULL; |
615 | |
616 | /* Add all non-selected successors to the heaps. */ |
617 | FOR_EACH_EDGE (e, ei, bb->succs) |
618 | { |
619 | if (e == best_edge |
620 | || e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun) |
621 | || bb_visited_trace (bb: e->dest)) |
622 | continue; |
623 | |
624 | key = bb_to_key (e->dest); |
625 | |
626 | if (bbd[e->dest->index].heap) |
627 | { |
628 | /* E->DEST is already in some heap. */ |
629 | if (key != bbd[e->dest->index].node->get_key ()) |
630 | { |
631 | if (dump_file) |
632 | { |
633 | fprintf (stream: dump_file, |
634 | format: "Changing key for bb %d from %ld to %ld.\n" , |
635 | e->dest->index, |
636 | (long) bbd[e->dest->index].node->get_key (), |
637 | key); |
638 | } |
639 | bbd[e->dest->index].heap->replace_key |
640 | (node: bbd[e->dest->index].node, key); |
641 | } |
642 | } |
643 | else |
644 | { |
645 | bb_heap_t *which_heap = *heap; |
646 | |
647 | profile_probability prob = e->probability; |
648 | |
649 | if (!(e->flags & EDGE_CAN_FALLTHRU) |
650 | || (e->flags & EDGE_COMPLEX) |
651 | || !prob.initialized_p () |
652 | || prob.to_reg_br_prob_base () < branch_th |
653 | || e->count () < count_th) |
654 | { |
655 | /* When partitioning hot/cold basic blocks, make sure |
656 | the cold blocks (and only the cold blocks) all get |
657 | pushed to the last round of trace collection. When |
658 | optimizing for size, do not push to next round. */ |
659 | |
660 | if (!for_size && push_to_next_round_p (bb: e->dest, round, |
661 | number_of_rounds, |
662 | count_th)) |
663 | which_heap = new_heap; |
664 | } |
665 | |
666 | bbd[e->dest->index].heap = which_heap; |
667 | bbd[e->dest->index].node = which_heap->insert (key, data: e->dest); |
668 | |
669 | if (dump_file) |
670 | { |
671 | fprintf (stream: dump_file, |
672 | format: " Possible start of %s round: %d (key: %ld)\n" , |
673 | (which_heap == new_heap) ? "next" : "this" , |
674 | e->dest->index, (long) key); |
675 | } |
676 | |
677 | } |
678 | } |
679 | |
680 | if (best_edge) /* Suitable successor was found. */ |
681 | { |
682 | if (bb_visited_trace (bb: best_edge->dest) == *n_traces) |
683 | { |
684 | /* We do nothing with one basic block loops. */ |
685 | if (best_edge->dest != bb) |
686 | { |
687 | if (best_edge->count () |
688 | > best_edge->dest->count.apply_scale (num: 4, den: 5)) |
689 | { |
690 | /* The loop has at least 4 iterations. If the loop |
691 | header is not the first block of the function |
692 | we can rotate the loop. */ |
693 | |
694 | if (best_edge->dest |
695 | != ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb) |
696 | { |
697 | if (dump_file) |
698 | { |
699 | fprintf (stream: dump_file, |
700 | format: "Rotating loop %d - %d\n" , |
701 | best_edge->dest->index, bb->index); |
702 | } |
703 | bb->aux = best_edge->dest; |
704 | bbd[best_edge->dest->index].in_trace = |
705 | (*n_traces) - 1; |
706 | bb = rotate_loop (back_edge: best_edge, trace, trace_n: *n_traces); |
707 | } |
708 | } |
709 | else |
710 | { |
711 | /* The loop has less than 4 iterations. */ |
712 | |
713 | if (single_succ_p (bb) |
714 | && copy_bb_p (best_edge->dest, |
715 | optimize_edge_for_speed_p |
716 | (best_edge))) |
717 | { |
718 | bb = copy_bb (best_edge->dest, best_edge, bb, |
719 | *n_traces); |
720 | trace->length++; |
721 | } |
722 | } |
723 | } |
724 | |
725 | /* Terminate the trace. */ |
726 | break; |
727 | } |
728 | else |
729 | { |
730 | /* Check for a situation |
731 | |
732 | A |
733 | /| |
734 | B | |
735 | \| |
736 | C |
737 | |
738 | where |
739 | AB->count () + BC->count () >= AC->count (). |
740 | (i.e. 2 * B->count >= AC->count ) |
741 | Best ordering is then A B C. |
742 | |
743 | When optimizing for size, A B C is always the best order. |
744 | |
745 | This situation is created for example by: |
746 | |
747 | if (A) B; |
748 | C; |
749 | |
750 | */ |
751 | |
752 | FOR_EACH_EDGE (e, ei, bb->succs) |
753 | if (e != best_edge |
754 | && (e->flags & EDGE_CAN_FALLTHRU) |
755 | && !(e->flags & EDGE_COMPLEX) |
756 | && !bb_visited_trace (bb: e->dest) |
757 | && single_pred_p (bb: e->dest) |
758 | && !(e->flags & EDGE_CROSSING) |
759 | && single_succ_p (bb: e->dest) |
760 | && (single_succ_edge (bb: e->dest)->flags |
761 | & EDGE_CAN_FALLTHRU) |
762 | && !(single_succ_edge (bb: e->dest)->flags & EDGE_COMPLEX) |
763 | && single_succ (bb: e->dest) == best_edge->dest |
764 | && (e->dest->count * 2 |
765 | >= best_edge->count () || for_size)) |
766 | { |
767 | best_edge = e; |
768 | if (dump_file) |
769 | fprintf (stream: dump_file, format: "Selecting BB %d\n" , |
770 | best_edge->dest->index); |
771 | break; |
772 | } |
773 | |
774 | bb->aux = best_edge->dest; |
775 | bbd[best_edge->dest->index].in_trace = (*n_traces) - 1; |
776 | bb = best_edge->dest; |
777 | } |
778 | } |
779 | } |
780 | while (best_edge); |
781 | trace->last = bb; |
782 | bbd[trace->first->index].start_of_trace = *n_traces - 1; |
783 | if (bbd[trace->last->index].end_of_trace != *n_traces - 1) |
784 | { |
785 | bbd[trace->last->index].end_of_trace = *n_traces - 1; |
786 | /* Update the cached maximum frequency for interesting predecessor |
787 | edges for successors of the new trace end. */ |
788 | FOR_EACH_EDGE (e, ei, trace->last->succs) |
789 | if (EDGE_FREQUENCY (e) > bbd[e->dest->index].priority) |
790 | bbd[e->dest->index].priority = EDGE_FREQUENCY (e); |
791 | } |
792 | |
793 | /* The trace is terminated so we have to recount the keys in heap |
794 | (some block can have a lower key because now one of its predecessors |
795 | is an end of the trace). */ |
796 | FOR_EACH_EDGE (e, ei, bb->succs) |
797 | { |
798 | if (e->dest == EXIT_BLOCK_PTR_FOR_FN (cfun) |
799 | || bb_visited_trace (bb: e->dest)) |
800 | continue; |
801 | |
802 | if (bbd[e->dest->index].heap) |
803 | { |
804 | key = bb_to_key (e->dest); |
805 | if (key != bbd[e->dest->index].node->get_key ()) |
806 | { |
807 | if (dump_file) |
808 | { |
809 | fprintf (stream: dump_file, |
810 | format: "Changing key for bb %d from %ld to %ld.\n" , |
811 | e->dest->index, |
812 | (long) bbd[e->dest->index].node->get_key (), key); |
813 | } |
814 | bbd[e->dest->index].heap->replace_key |
815 | (node: bbd[e->dest->index].node, key); |
816 | } |
817 | } |
818 | } |
819 | } |
820 | |
821 | delete (*heap); |
822 | |
823 | /* "Return" the new heap. */ |
824 | *heap = new_heap; |
825 | } |

/* Create a duplicate of the basic block OLD_BB and redirect edge E to it, add
   it to trace after BB, mark OLD_BB visited and update pass' data structures
   (TRACE is the number of the trace which OLD_BB is duplicated to).  */

static basic_block
copy_bb (basic_block old_bb, edge e, basic_block bb, int trace)
{
  basic_block new_bb;

  new_bb = duplicate_block (old_bb, e, bb);
  BB_COPY_PARTITION (new_bb, old_bb);

  gcc_assert (e->dest == new_bb);

  if (dump_file)
    fprintf (dump_file,
             "Duplicated bb %d (created bb %d)\n",
             old_bb->index, new_bb->index);

  if (new_bb->index >= array_size
      || last_basic_block_for_fn (cfun) > array_size)
    {
      int i;
      int new_size;

      new_size = MAX (last_basic_block_for_fn (cfun), new_bb->index + 1);
      new_size = GET_ARRAY_SIZE (new_size);
      bbd = XRESIZEVEC (bbro_basic_block_data, bbd, new_size);
      for (i = array_size; i < new_size; i++)
        {
          bbd[i].start_of_trace = -1;
          bbd[i].end_of_trace = -1;
          bbd[i].in_trace = -1;
          bbd[i].visited = 0;
          bbd[i].priority = -1;
          bbd[i].heap = NULL;
          bbd[i].node = NULL;
        }
      array_size = new_size;

      if (dump_file)
        {
          fprintf (dump_file,
                   "Growing the dynamic array to %d elements.\n",
                   array_size);
        }
    }

  gcc_assert (!bb_visited_trace (e->dest));
  mark_bb_visited (new_bb, trace);
  new_bb->aux = bb->aux;
  bb->aux = new_bb;

  bbd[new_bb->index].in_trace = trace;

  return new_bb;
}

/* Compute and return the key (for the heap) of the basic block BB.  */

static long
bb_to_key (basic_block bb)
{
  edge e;
  edge_iterator ei;

  /* Use index as key to align with its original order.  */
  if (optimize_function_for_size_p (cfun))
    return bb->index;

  /* Do not start in probably never executed blocks.  */

  if (BB_PARTITION (bb) == BB_COLD_PARTITION
      || probably_never_executed_bb_p (cfun, bb))
    return BB_FREQ_MAX;

  /* Prefer blocks whose predecessor is an end of some trace
     or whose predecessor edge is EDGE_DFS_BACK.  */
  int priority = bbd[bb->index].priority;
  if (priority == -1)
    {
      priority = 0;
      FOR_EACH_EDGE (e, ei, bb->preds)
        {
          if ((e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
               && bbd[e->src->index].end_of_trace >= 0)
              || (e->flags & EDGE_DFS_BACK))
            {
              int edge_freq = EDGE_FREQUENCY (e);

              if (edge_freq > priority)
                priority = edge_freq;
            }
        }
      bbd[bb->index].priority = priority;
    }

  if (priority)
    /* The block with priority should have significantly lower key.  */
    return -(100 * BB_FREQ_MAX + 100 * priority
             + bb->count.to_frequency (cfun));

  return -bb->count.to_frequency (cfun);
}
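
/* To make the ordering concrete (a hypothetical illustration): the heaps
   above are min-heaps (blocks are taken out via extract_min), so with
   BB_FREQ_MAX = 10000 a block with priority 3 and frequency 7 gets the key
   -(100 * 10000 + 100 * 3 + 7) = -1000307, which sorts before every block
   keyed only by its negated frequency, while probably-never-executed blocks
   keyed at BB_FREQ_MAX sort last.  */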

/* Return true when the edge E from basic block BB is better than the
   temporary best edge (details are in function).  The probability of edge E
   is PROB.  The count of the successor is COUNT.  The current best
   probability is BEST_PROB, the best count is BEST_COUNT.
   The edge is considered to be equivalent when PROB does not differ much
   from BEST_PROB; similarly for count.  */

static bool
better_edge_p (const_basic_block bb, const_edge e, profile_probability prob,
               profile_count count, profile_probability best_prob,
               profile_count best_count, const_edge cur_best_edge)
{
  bool is_better_edge;

  /* The BEST_* values do not have to be best, but can be a bit smaller than
     maximum values.  */
  profile_probability diff_prob = best_prob / 10;

  /* The smaller one is better to keep the original order.  */
  if (optimize_function_for_size_p (cfun))
    return !cur_best_edge
           || cur_best_edge->dest->index > e->dest->index;

  /* Those edges are so expensive that continuing a trace is not useful
     performance-wise.  */
  if (e->flags & (EDGE_ABNORMAL | EDGE_EH))
    return false;

  if (prob > best_prob + diff_prob
      || (!best_prob.initialized_p ()
          && prob > profile_probability::guessed_never ()))
    /* The edge has higher probability than the temporary best edge.  */
    is_better_edge = true;
  else if (prob < best_prob - diff_prob)
    /* The edge has lower probability than the temporary best edge.  */
    is_better_edge = false;
  else
    {
      profile_count diff_count = best_count / 10;
      if (count < best_count - diff_count
          || (!best_count.initialized_p ()
              && count.nonzero_p ()))
        /* The edge and the temporary best edge have almost equivalent
           probabilities.  The higher count of a successor now means
           that there is another edge going into that successor.
           This successor has lower count so it is better.  */
        is_better_edge = true;
      else if (count > best_count + diff_count)
        /* This successor has higher count so it is worse.  */
        is_better_edge = false;
      else if (e->dest->prev_bb == bb)
        /* The edges have equivalent probabilities and the successors
           have equivalent counts.  Select the previous successor.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }

  return is_better_edge;
}
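
/* A small worked example of the 10% tolerance above (numbers invented for
   illustration): with BEST_PROB = 50%, DIFF_PROB = 5%, an edge with
   PROB = 53% is neither above 55% nor below 45%, so the two edges count as
   equivalent and the tie is broken on the successor counts, preferring the
   successor with the *lower* count (fewer other incoming edges).  */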

/* Return true when the edge E is better than the temporary best edge
   CUR_BEST_EDGE.  If SRC_INDEX_P is true, the function compares the src bb of
   E and CUR_BEST_EDGE; otherwise it will compare the dest bb.
   BEST_LEN is the trace length of src (or dest) bb in CUR_BEST_EDGE.
   TRACES records the information about traces.
   When optimizing for size, the edge with smaller index is better.
   When optimizing for speed, the edge with bigger probability or longer trace
   is better.  */

static bool
connect_better_edge_p (const_edge e, bool src_index_p, int best_len,
                       const_edge cur_best_edge, struct trace *traces)
{
  int e_index;
  int b_index;
  bool is_better_edge;

  if (!cur_best_edge)
    return true;

  if (optimize_function_for_size_p (cfun))
    {
      e_index = src_index_p ? e->src->index : e->dest->index;
      b_index = src_index_p ? cur_best_edge->src->index
                            : cur_best_edge->dest->index;
      /* The smaller one is better to keep the original order.  */
      return b_index > e_index;
    }

  if (src_index_p)
    {
      e_index = e->src->index;

      /* We are looking for a predecessor, so probabilities are not that
         informative.  We do not want to connect A to B because A has
         only one successor (probability is 100%) while there is edge
         A' to B where probability is 90% but which is much more frequent.  */
      if (e->count () > cur_best_edge->count ())
        /* The edge has higher count than the temporary best edge.  */
        is_better_edge = true;
      else if (e->count () < cur_best_edge->count ())
        /* The edge has lower count than the temporary best edge.  */
        is_better_edge = false;
      else if (e->probability > cur_best_edge->probability)
        /* The edge has higher probability than the temporary best edge.  */
        is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
        /* The edge has lower probability than the temporary best edge.  */
        is_better_edge = false;
      else if (traces[bbd[e_index].end_of_trace].length > best_len)
        /* The edge and the temporary best edge have equivalent
           probabilities.  The edge with longer trace is better.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }
  else
    {
      e_index = e->dest->index;

      if (e->probability > cur_best_edge->probability)
        /* The edge has higher probability than the temporary best edge.  */
        is_better_edge = true;
      else if (e->probability < cur_best_edge->probability)
        /* The edge has lower probability than the temporary best edge.  */
        is_better_edge = false;
      else if (traces[bbd[e_index].start_of_trace].length > best_len)
        /* The edge and the temporary best edge have equivalent
           probabilities.  The edge with longer trace is better.  */
        is_better_edge = true;
      else
        is_better_edge = false;
    }

  return is_better_edge;
}

/* Connect traces in array TRACES, N_TRACES is the count of traces.  */

static void
connect_traces (int n_traces, struct trace *traces)
{
  int i;
  bool *connected;
  bool two_passes;
  int last_trace;
  int current_pass;
  int current_partition;
  profile_count count_threshold;
  bool for_size = optimize_function_for_size_p (cfun);

  count_threshold = max_entry_count.apply_scale (DUPLICATION_THRESHOLD, 1000);

  connected = XCNEWVEC (bool, n_traces);
  last_trace = -1;
  current_pass = 1;
  current_partition = BB_PARTITION (traces[0].first);
  two_passes = false;

  if (crtl->has_bb_partition)
    for (i = 0; i < n_traces && !two_passes; i++)
      if (BB_PARTITION (traces[0].first)
          != BB_PARTITION (traces[i].first))
        two_passes = true;

  for (i = 0; i < n_traces || (two_passes && current_pass == 1); i++)
    {
      int t = i;
      int t2;
      edge e, best;
      int best_len;

      if (i >= n_traces)
        {
          gcc_assert (two_passes && current_pass == 1);
          i = 0;
          t = i;
          current_pass = 2;
          if (current_partition == BB_HOT_PARTITION)
            current_partition = BB_COLD_PARTITION;
          else
            current_partition = BB_HOT_PARTITION;
        }

      if (connected[t])
        continue;

      if (two_passes
          && BB_PARTITION (traces[t].first) != current_partition)
        continue;

      connected[t] = true;

      /* Find the predecessor traces.  */
      for (t2 = t; t2 > 0;)
        {
          edge_iterator ei;
          best = NULL;
          best_len = 0;
          FOR_EACH_EDGE (e, ei, traces[t2].first->preds)
            {
              int si = e->src->index;

              if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun)
                  && (e->flags & EDGE_CAN_FALLTHRU)
                  && !(e->flags & EDGE_COMPLEX)
                  && bbd[si].end_of_trace >= 0
                  && !connected[bbd[si].end_of_trace]
                  && (BB_PARTITION (e->src) == current_partition)
                  && connect_better_edge_p (e, true, best_len, best, traces))
                {
                  best = e;
                  best_len = traces[bbd[si].end_of_trace].length;
                }
            }
          if (best)
            {
              best->src->aux = best->dest;
              t2 = bbd[best->src->index].end_of_trace;
              connected[t2] = true;

              if (dump_file)
                {
                  fprintf (dump_file, "Connection: %d %d\n",
                           best->src->index, best->dest->index);
                }
            }
          else
            break;
        }

      if (last_trace >= 0)
        traces[last_trace].last->aux = traces[t2].first;
      last_trace = t;

      /* Find the successor traces.  */
      while (1)
        {
          /* Find the continuation of the chain.  */
          edge_iterator ei;
          best = NULL;
          best_len = 0;
          FOR_EACH_EDGE (e, ei, traces[t].last->succs)
            {
              int di = e->dest->index;

              if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
                  && (e->flags & EDGE_CAN_FALLTHRU)
                  && !(e->flags & EDGE_COMPLEX)
                  && bbd[di].start_of_trace >= 0
                  && !connected[bbd[di].start_of_trace]
                  && (BB_PARTITION (e->dest) == current_partition)
                  && connect_better_edge_p (e, false, best_len, best, traces))
                {
                  best = e;
                  best_len = traces[bbd[di].start_of_trace].length;
                }
            }

          if (for_size)
            {
              if (!best)
                /* Stop finding the successor traces.  */
                break;

              /* It is OK to connect block n with block n + 1 or a block
                 before n.  For others, only connect to the loop header.  */
              if (best->dest->index > (traces[t].last->index + 1))
                {
                  int count = EDGE_COUNT (best->dest->preds);

                  FOR_EACH_EDGE (e, ei, best->dest->preds)
                    if (e->flags & EDGE_DFS_BACK)
                      count--;

                  /* If dest has multiple predecessors, skip it.  We expect
                     that one predecessor with smaller index connects with it
                     later.  */
                  if (count != 1)
                    break;
                }

              /* Only connect Trace n with Trace n + 1.  It is conservative
                 to keep the order as close as possible to the original order.
                 It also helps to reduce long jumps.  */
              if (last_trace != bbd[best->dest->index].start_of_trace - 1)
                break;

              if (dump_file)
                fprintf (dump_file, "Connection: %d %d\n",
                         best->src->index, best->dest->index);

              t = bbd[best->dest->index].start_of_trace;
              traces[last_trace].last->aux = traces[t].first;
              connected[t] = true;
              last_trace = t;
            }
          else if (best)
            {
              if (dump_file)
                {
                  fprintf (dump_file, "Connection: %d %d\n",
                           best->src->index, best->dest->index);
                }
              t = bbd[best->dest->index].start_of_trace;
              traces[last_trace].last->aux = traces[t].first;
              connected[t] = true;
              last_trace = t;
            }
          else
            {
              /* Try to connect the traces by duplication of 1 block.  */
              edge e2;
              basic_block next_bb = NULL;
              bool try_copy = false;

              FOR_EACH_EDGE (e, ei, traces[t].last->succs)
                if (e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun)
                    && (e->flags & EDGE_CAN_FALLTHRU)
                    && !(e->flags & EDGE_COMPLEX)
                    && (!best || e->probability > best->probability))
                  {
                    edge_iterator ei;
                    edge best2 = NULL;
                    int best2_len = 0;

                    /* If the destination is a start of a trace which is only
                       one block long, then no need to search the successor
                       blocks of the trace.  Accept it.  */
                    if (bbd[e->dest->index].start_of_trace >= 0
                        && traces[bbd[e->dest->index].start_of_trace].length
                           == 1)
                      {
                        best = e;
                        try_copy = true;
                        continue;
                      }

                    FOR_EACH_EDGE (e2, ei, e->dest->succs)
                      {
                        int di = e2->dest->index;

                        if (e2->dest == EXIT_BLOCK_PTR_FOR_FN (cfun)
                            || ((e2->flags & EDGE_CAN_FALLTHRU)
                                && !(e2->flags & EDGE_COMPLEX)
                                && bbd[di].start_of_trace >= 0
                                && !connected[bbd[di].start_of_trace]
                                && BB_PARTITION (e2->dest) == current_partition
                                && e2->count () >= count_threshold
                                && (!best2
                                    || e2->probability > best2->probability
                                    || (e2->probability == best2->probability
                                        && traces[bbd[di].start_of_trace].length
                                           > best2_len))))
                          {
                            best = e;
                            best2 = e2;
                            if (e2->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))
                              best2_len
                                = traces[bbd[di].start_of_trace].length;
                            else
                              best2_len = INT_MAX;
                            next_bb = e2->dest;
                            try_copy = true;
                          }
                      }
                  }

              /* Copy tiny blocks always; copy larger blocks only when the
                 edge is traversed frequently enough.  */
              if (try_copy
                  && BB_PARTITION (best->src) == BB_PARTITION (best->dest)
                  && copy_bb_p (best->dest,
                                optimize_edge_for_speed_p (best)
                                && (!best->count ().initialized_p ()
                                    || best->count () >= count_threshold)))
                {
                  basic_block new_bb;

                  if (dump_file)
                    {
                      fprintf (dump_file, "Connection: %d %d ",
                               traces[t].last->index, best->dest->index);
                      if (!next_bb)
                        fputc ('\n', dump_file);
                      else if (next_bb == EXIT_BLOCK_PTR_FOR_FN (cfun))
                        fprintf (dump_file, "exit\n");
                      else
                        fprintf (dump_file, "%d\n", next_bb->index);
                    }

                  new_bb = copy_bb (best->dest, best, traces[t].last, t);
                  traces[t].last = new_bb;
                  if (next_bb && next_bb != EXIT_BLOCK_PTR_FOR_FN (cfun))
                    {
                      t = bbd[next_bb->index].start_of_trace;
                      traces[last_trace].last->aux = traces[t].first;
                      connected[t] = true;
                      last_trace = t;
                    }
                  else
                    break;  /* Stop finding the successor traces.  */
                }
              else
                break;  /* Stop finding the successor traces.  */
            }
        }
    }

  if (dump_file)
    {
      basic_block bb;

      fprintf (dump_file, "Final order:\n");
      for (bb = traces[0].first; bb; bb = (basic_block) bb->aux)
        fprintf (dump_file, "%d ", bb->index);
      fprintf (dump_file, "\n");
      fflush (dump_file);
    }

  FREE (connected);
}

/* Return true when BB can and should be copied.  CODE_MAY_GROW is true
   when code size is allowed to grow by duplication.  */

static bool
copy_bb_p (const_basic_block bb, int code_may_grow)
{
  unsigned int size = 0;
  unsigned int max_size = uncond_jump_length;
  rtx_insn *insn;

  if (EDGE_COUNT (bb->preds) < 2)
    return false;
  if (!can_duplicate_block_p (bb))
    return false;

  /* Avoid duplicating blocks which have many successors (PR/13430).  */
  if (EDGE_COUNT (bb->succs) > 8)
    return false;

  if (code_may_grow && optimize_bb_for_speed_p (bb))
    max_size *= param_max_grow_copy_bb_insns;

  FOR_BB_INSNS (bb, insn)
    {
      if (INSN_P (insn))
        {
          size += get_attr_min_length (insn);
          if (size > max_size)
            break;
        }
    }

  if (size <= max_size)
    return true;

  if (dump_file)
    {
      fprintf (dump_file,
               "Block %d can't be copied because its size = %u.\n",
               bb->index, size);
    }

  return false;
}

/* Return the length of unconditional jump instruction.  */

int
get_uncond_jump_length (void)
{
  unsigned int length;

  start_sequence ();
  rtx_code_label *label = emit_label (gen_label_rtx ());
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (label));
  length = get_attr_min_length (jump);
  end_sequence ();

  gcc_assert (length < INT_MAX);
  return length;
}

/* Create a forwarder block to OLD_BB starting with NEW_LABEL and in the
   other partition wrt OLD_BB.  */

static basic_block
create_eh_forwarder_block (rtx_code_label *new_label, basic_block old_bb)
{
  /* Split OLD_BB, so that EH pads have always only incoming EH edges;
     bb_has_eh_pred bbs are treated specially by DF infrastructure.  */
  old_bb = split_block_after_labels (old_bb)->dest;

  /* Put the new label and a jump in the new basic block.  */
  rtx_insn *label = emit_label (new_label);
  rtx_code_label *old_label = block_label (old_bb);
  rtx_insn *jump = emit_jump_insn (targetm.gen_jump (old_label));
  JUMP_LABEL (jump) = old_label;

  /* Create the new basic block and put it in last position.  */
  basic_block last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb;
  basic_block new_bb = create_basic_block (label, jump, last_bb);
  new_bb->aux = last_bb->aux;
  new_bb->count = old_bb->count;
  last_bb->aux = new_bb;

  emit_barrier_after_bb (new_bb);

  make_single_succ_edge (new_bb, old_bb, 0);

  /* Make sure the new basic block is in the other partition.  */
  unsigned new_partition = BB_PARTITION (old_bb);
  new_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION;
  BB_SET_PARTITION (new_bb, new_partition);

  return new_bb;
}

/* The common landing pad in block OLD_BB has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.  */

static void
sjlj_fix_up_crossing_landing_pad (basic_block old_bb)
{
  const unsigned lp_len = cfun->eh->lp_array->length ();
  edge_iterator ei;
  edge e;

  /* Generate the new common landing-pad label.  */
  rtx_code_label *new_label = gen_label_rtx ();
  LABEL_PRESERVE_P (new_label) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb = create_eh_forwarder_block (new_label, old_bb);

  /* Create the map from old to new lp index and initialize it.  */
  unsigned *index_map = (unsigned *) alloca (lp_len * sizeof (unsigned));
  memset (index_map, 0, lp_len * sizeof (unsigned));

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
        rtx_insn *insn = BB_END (e->src);
        rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

        gcc_assert (note != NULL);
        const unsigned old_index = INTVAL (XEXP (note, 0));

        /* Generate the new landing-pad structure.  */
        if (index_map[old_index] == 0)
          {
            eh_landing_pad old_lp = (*cfun->eh->lp_array)[old_index];
            eh_landing_pad new_lp = gen_eh_landing_pad (old_lp->region);
            new_lp->post_landing_pad = old_lp->post_landing_pad;
            new_lp->landing_pad = new_label;
            index_map[old_index] = new_lp->index;
          }
        XEXP (note, 0) = GEN_INT (index_map[old_index]);

        /* Adjust the edge to the new destination.  */
        redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}

/* The landing pad OLD_LP, in block OLD_BB, has edges from both partitions.
   Add a new landing pad that will just jump to the old one and split the
   edges so that no EH edge crosses partitions.  */

static void
dw2_fix_up_crossing_landing_pad (eh_landing_pad old_lp, basic_block old_bb)
{
  eh_landing_pad new_lp;
  edge_iterator ei;
  edge e;

  /* Generate the new landing-pad structure.  */
  new_lp = gen_eh_landing_pad (old_lp->region);
  new_lp->post_landing_pad = old_lp->post_landing_pad;
  new_lp->landing_pad = gen_label_rtx ();
  LABEL_PRESERVE_P (new_lp->landing_pad) = 1;

  /* Create the forwarder block.  */
  basic_block new_bb
    = create_eh_forwarder_block (new_lp->landing_pad, old_bb);

  /* Fix up the edges.  */
  for (ei = ei_start (old_bb->preds); (e = ei_safe_edge (ei)) != NULL; )
    if (e->src != new_bb && BB_PARTITION (e->src) == BB_PARTITION (new_bb))
      {
        rtx_insn *insn = BB_END (e->src);
        rtx note = find_reg_note (insn, REG_EH_REGION, NULL_RTX);

        gcc_assert (note != NULL);
        gcc_checking_assert (INTVAL (XEXP (note, 0)) == old_lp->index);
        XEXP (note, 0) = GEN_INT (new_lp->index);

        /* Adjust the edge to the new destination.  */
        redirect_edge_succ (e, new_bb);
      }
    else
      ei_next (&ei);
}
1538 | |
1539 | |
1540 | /* Ensure that all hot bbs are included in a hot path through the |
1541 | procedure. This is done by calling this function twice, once |
1542 | with WALK_UP true (to look for paths from the entry to hot bbs) and |
1543 | once with WALK_UP false (to look for paths from hot bbs to the exit). |
1544 | Returns the updated value of COLD_BB_COUNT and adds newly-hot bbs |
1545 | to BBS_IN_HOT_PARTITION. */ |
1546 | |
1547 | static unsigned int |
1548 | sanitize_hot_paths (bool walk_up, unsigned int cold_bb_count, |
1549 | vec<basic_block> *bbs_in_hot_partition) |
1550 | { |
1551 | /* Callers check this. */ |
1552 | gcc_checking_assert (cold_bb_count); |
1553 | |
1554 | /* Keep examining hot bbs while we still have some left to check |
1555 | and there are remaining cold bbs. */ |
1556 | vec<basic_block> hot_bbs_to_check = bbs_in_hot_partition->copy (); |
1557 | while (! hot_bbs_to_check.is_empty () |
1558 | && cold_bb_count) |
1559 | { |
1560 | basic_block bb = hot_bbs_to_check.pop (); |
1561 | vec<edge, va_gc> *edges = walk_up ? bb->preds : bb->succs; |
1562 | edge e; |
1563 | edge_iterator ei; |
1564 | profile_probability highest_probability |
1565 | = profile_probability::uninitialized (); |
1566 | profile_count highest_count = profile_count::uninitialized (); |
1567 | bool found = false; |
1568 | |
1569 | /* Walk the preds/succs and check if there is at least one already |
1570 | marked hot. Keep track of the most frequent pred/succ so that we |
1571 | can mark it hot if we don't find one. */ |
1572 | FOR_EACH_EDGE (e, ei, edges) |
1573 | { |
1574 | basic_block reach_bb = walk_up ? e->src : e->dest; |
1575 | |
1576 | if (e->flags & EDGE_DFS_BACK) |
1577 | continue; |
1578 | |
1579 | /* Do not expect profile insanities when profile was not adjusted. */ |
1580 | if (e->probability == profile_probability::never () |
1581 | || e->count () == profile_count::zero ()) |
1582 | continue; |
1583 | |
1584 | if (BB_PARTITION (reach_bb) != BB_COLD_PARTITION) |
1585 | { |
1586 | found = true; |
1587 | break; |
1588 | } |
	  /* The following code tracks the hottest edge via the edge
	     count; if no count is initialized, the second loop below
	     falls back to the edge probability.  */
	  if (!(e->count () < highest_count))
	    highest_count = e->count ();
1594 | if (!highest_probability.initialized_p () |
1595 | || e->probability > highest_probability) |
1596 | highest_probability = e->probability; |
1597 | } |
1598 | |
1599 | /* If bb is reached by (or reaches, in the case of !WALK_UP) another hot |
1600 | block (or unpartitioned, e.g. the entry block) then it is ok. If not, |
1601 | then the most frequent pred (or succ) needs to be adjusted. In the |
1602 | case where multiple preds/succs have the same frequency (e.g. a |
1603 | 50-50 branch), then both will be adjusted. */ |
1604 | if (found) |
1605 | continue; |
1606 | |
1607 | FOR_EACH_EDGE (e, ei, edges) |
1608 | { |
1609 | if (e->flags & EDGE_DFS_BACK) |
1610 | continue; |
1611 | /* Do not expect profile insanities when profile was not adjusted. */ |
1612 | if (e->probability == profile_probability::never () |
1613 | || e->count () == profile_count::zero ()) |
1614 | continue; |
	  /* Select the hottest edge using the edge count; if none was
	     initialized, fall back to the edge probability.  */
1617 | if (highest_count.initialized_p ()) |
1618 | { |
1619 | if (!(e->count () >= highest_count)) |
1620 | continue; |
1621 | } |
1622 | else if (!(e->probability >= highest_probability)) |
1623 | continue; |
1624 | |
1625 | basic_block reach_bb = walk_up ? e->src : e->dest; |
1626 | |
1627 | /* We have a hot bb with an immediate dominator that is cold. |
1628 | The dominator needs to be re-marked hot. */ |
1629 | BB_SET_PARTITION (reach_bb, BB_HOT_PARTITION); |
1630 | if (dump_file) |
	    fprintf (dump_file, "Promoting bb %i to hot partition to sanitize "
		     "profile of bb %i in %s walk\n", reach_bb->index,
		     bb->index, walk_up ? "backward" : "forward");
1634 | cold_bb_count--; |
1635 | |
1636 | /* Now we need to examine newly-hot reach_bb to see if it is also |
1637 | dominated by a cold bb. */ |
	  bbs_in_hot_partition->safe_push (reach_bb);
	  hot_bbs_to_check.safe_push (reach_bb);
1640 | } |
1641 | } |
1642 | hot_bbs_to_check.release (); |
1643 | |
1644 | return cold_bb_count; |
1645 | } |
1646 | |
1647 | |
1648 | /* Find the basic blocks that are rarely executed and need to be moved to |
1649 | a separate section of the .o file (to cut down on paging and improve |
1650 | cache locality). Return a vector of all edges that cross. */ |
1651 | |
1652 | static vec<edge> |
1653 | find_rarely_executed_basic_blocks_and_crossing_edges (void) |
1654 | { |
1655 | vec<edge> crossing_edges = vNULL; |
1656 | basic_block bb; |
1657 | edge e; |
1658 | edge_iterator ei; |
1659 | unsigned int cold_bb_count = 0; |
1660 | auto_vec<basic_block> bbs_in_hot_partition; |
1661 | |
1662 | propagate_unlikely_bbs_forward (); |
1663 | |
1664 | /* Mark which partition (hot/cold) each basic block belongs in. */ |
1665 | FOR_EACH_BB_FN (bb, cfun) |
1666 | { |
1667 | bool cold_bb = false; |
1668 | |
1669 | if (probably_never_executed_bb_p (cfun, bb)) |
1670 | { |
1671 | cold_bb = true; |
1672 | |
1673 | /* Handle profile insanities created by upstream optimizations |
1674 | by also checking the incoming edge weights. If there is a non-cold |
1675 | incoming edge, conservatively prevent this block from being split |
1676 | into the cold section. */ |
1677 | if (!bb->count.precise_p ()) |
1678 | FOR_EACH_EDGE (e, ei, bb->preds) |
1679 | if (!probably_never_executed_edge_p (cfun, e)) |
1680 | { |
1681 | cold_bb = false; |
1682 | break; |
1683 | } |
1684 | } |
1685 | if (cold_bb) |
1686 | { |
1687 | BB_SET_PARTITION (bb, BB_COLD_PARTITION); |
1688 | cold_bb_count++; |
1689 | } |
1690 | else |
1691 | { |
1692 | BB_SET_PARTITION (bb, BB_HOT_PARTITION); |
	  bbs_in_hot_partition.safe_push (bb);
1694 | } |
1695 | } |
1696 | |
  /* Ensure that hot bbs are included along a hot path from the entry to exit.
     Several different situations can leave cold bbs along all paths to/from
     a hot bb.  One is edge weight insanities due to optimization phases that
     do not properly update basic block profile counts.  Another is that the
     entry of the function may not be hot, because it is entered fewer times
     than the number of profile training runs, yet a loop inside the function
     pushes blocks within it above the threshold for hotness.  This is fixed
     by walking up from hot bbs to the entry block, and then down from hot
     bbs to the exit, performing partitioning fixups as necessary.  */
1707 | if (cold_bb_count) |
1708 | { |
1709 | mark_dfs_back_edges (); |
      cold_bb_count = sanitize_hot_paths (true, cold_bb_count,
					  &bbs_in_hot_partition);
      if (cold_bb_count)
	sanitize_hot_paths (false, cold_bb_count, &bbs_in_hot_partition);
1714 | |
1715 | hash_set <basic_block> set; |
1716 | find_bbs_reachable_by_hot_paths (&set); |
1717 | FOR_EACH_BB_FN (bb, cfun) |
	if (!set.contains (bb))
1719 | BB_SET_PARTITION (bb, BB_COLD_PARTITION); |
1720 | } |
1721 | |
  /* The format of .gcc_except_table does not allow landing pads to
     be in a different partition than the throw.  Fix this by either
     moving the landing pads or inserting forwarder landing pads.  */
1725 | if (cfun->eh->lp_array) |
1726 | { |
1727 | const bool sjlj |
1728 | = (targetm_common.except_unwind_info (&global_options) == UI_SJLJ); |
1729 | unsigned i; |
1730 | eh_landing_pad lp; |
1731 | |
1732 | FOR_EACH_VEC_ELT (*cfun->eh->lp_array, i, lp) |
1733 | { |
1734 | bool all_same, all_diff; |
1735 | |
1736 | if (lp == NULL |
1737 | || lp->landing_pad == NULL_RTX |
1738 | || !LABEL_P (lp->landing_pad)) |
1739 | continue; |
1740 | |
1741 | all_same = all_diff = true; |
	  bb = BLOCK_FOR_INSN (lp->landing_pad);
1743 | FOR_EACH_EDGE (e, ei, bb->preds) |
1744 | { |
1745 | gcc_assert (e->flags & EDGE_EH); |
1746 | if (BB_PARTITION (bb) == BB_PARTITION (e->src)) |
1747 | all_diff = false; |
1748 | else |
1749 | all_same = false; |
1750 | } |
1751 | |
1752 | if (all_same) |
1753 | ; |
1754 | else if (all_diff) |
1755 | { |
1756 | int which = BB_PARTITION (bb); |
1757 | which ^= BB_HOT_PARTITION | BB_COLD_PARTITION; |
1758 | BB_SET_PARTITION (bb, which); |
1759 | } |
1760 | else if (sjlj) |
	    sjlj_fix_up_crossing_landing_pad (bb);
1762 | else |
	    dw2_fix_up_crossing_landing_pad (lp, bb);
1764 | |
1765 | /* There is a single, common landing pad in SJLJ mode. */ |
1766 | if (sjlj) |
1767 | break; |
1768 | } |
1769 | } |
1770 | |
1771 | /* Mark every edge that crosses between sections. */ |
1772 | FOR_EACH_BB_FN (bb, cfun) |
1773 | FOR_EACH_EDGE (e, ei, bb->succs) |
1774 | { |
1775 | unsigned int flags = e->flags; |
1776 | |
1777 | /* We should never have EDGE_CROSSING set yet. */ |
1778 | gcc_checking_assert ((flags & EDGE_CROSSING) == 0); |
1779 | |
1780 | if (e->src != ENTRY_BLOCK_PTR_FOR_FN (cfun) |
1781 | && e->dest != EXIT_BLOCK_PTR_FOR_FN (cfun) |
1782 | && BB_PARTITION (e->src) != BB_PARTITION (e->dest)) |
1783 | { |
	    crossing_edges.safe_push (e);
1785 | flags |= EDGE_CROSSING; |
1786 | } |
1787 | |
1788 | /* Now that we've split eh edges as appropriate, allow landing pads |
1789 | to be merged with the post-landing pads. */ |
1790 | flags &= ~EDGE_PRESERVE; |
1791 | |
1792 | e->flags = flags; |
1793 | } |
1794 | |
1795 | return crossing_edges; |
1796 | } |
1797 | |
1798 | /* Set the flag EDGE_CAN_FALLTHRU for edges that can be fallthru. */ |
1799 | |
1800 | static void |
1801 | set_edge_can_fallthru_flag (void) |
1802 | { |
1803 | basic_block bb; |
1804 | |
1805 | FOR_EACH_BB_FN (bb, cfun) |
1806 | { |
1807 | edge e; |
1808 | edge_iterator ei; |
1809 | |
1810 | FOR_EACH_EDGE (e, ei, bb->succs) |
1811 | { |
1812 | e->flags &= ~EDGE_CAN_FALLTHRU; |
1813 | |
	  /* The FALLTHRU edge is also a CAN_FALLTHRU edge.  */
1815 | if (e->flags & EDGE_FALLTHRU) |
1816 | e->flags |= EDGE_CAN_FALLTHRU; |
1817 | } |
1818 | |
      /* If the BB ends with an invertible condjump, all (2) edges are
	 CAN_FALLTHRU edges.  */
1821 | if (EDGE_COUNT (bb->succs) != 2) |
1822 | continue; |
1823 | if (!any_condjump_p (BB_END (bb))) |
1824 | continue; |
1825 | |
1826 | rtx_jump_insn *bb_end_jump = as_a <rtx_jump_insn *> (BB_END (bb)); |
1827 | if (!invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0)) |
1828 | continue; |
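      /* The inversion above succeeded, so both branch directions can be
	 encoded; invert the jump again to restore its original sense.  */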
1829 | invert_jump (bb_end_jump, JUMP_LABEL (bb_end_jump), 0); |
1830 | EDGE_SUCC (bb, 0)->flags |= EDGE_CAN_FALLTHRU; |
1831 | EDGE_SUCC (bb, 1)->flags |= EDGE_CAN_FALLTHRU; |
1832 | } |
1833 | } |
1834 | |
/* If any destination of a crossing edge does not have a label, add a label;
   convert any easy fall-through crossing edges to unconditional jumps.  */
1837 | |
1838 | static void |
1839 | add_labels_and_missing_jumps (vec<edge> crossing_edges) |
1840 | { |
1841 | size_t i; |
1842 | edge e; |
1843 | |
1844 | FOR_EACH_VEC_ELT (crossing_edges, i, e) |
1845 | { |
1846 | basic_block src = e->src; |
1847 | basic_block dest = e->dest; |
1848 | rtx_jump_insn *new_jump; |
1849 | |
1850 | if (dest == EXIT_BLOCK_PTR_FOR_FN (cfun)) |
1851 | continue; |
1852 | |
1853 | /* Make sure dest has a label. */ |
1854 | rtx_code_label *label = block_label (dest); |
1855 | |
1856 | /* Nothing to do for non-fallthru edges. */ |
1857 | if (src == ENTRY_BLOCK_PTR_FOR_FN (cfun)) |
1858 | continue; |
1859 | if ((e->flags & EDGE_FALLTHRU) == 0) |
1860 | continue; |
1861 | |
      /* If the block does not end with a control flow insn, then we
	 can trivially add a jump to the end to fix up the crossing.
	 Otherwise the jump will have to go in a new bb, which will
	 be handled by the fix_up_fall_thru_edges function.  */
1866 | if (control_flow_insn_p (BB_END (src))) |
1867 | continue; |
1868 | |
1869 | /* Make sure there's only one successor. */ |
1870 | gcc_assert (single_succ_p (src)); |
1871 | |
1872 | new_jump = emit_jump_insn_after (targetm.gen_jump (label), BB_END (src)); |
1873 | BB_END (src) = new_jump; |
1874 | JUMP_LABEL (new_jump) = label; |
1875 | LABEL_NUSES (label) += 1; |
1876 | |
      emit_barrier_after_bb (src);
1878 | |
1879 | /* Mark edge as non-fallthru. */ |
1880 | e->flags &= ~EDGE_FALLTHRU; |
1881 | } |
1882 | } |
1883 | |
/* Find any bb's where the fall-through edge is a crossing edge (note that
   these bb's must also contain a conditional jump or end with a call
   instruction; we've already dealt with fall-through edges for blocks
   that didn't have a conditional jump or didn't end with a call instruction
   in the call to add_labels_and_missing_jumps).  Convert the fall-through
   edge to a non-crossing edge by inserting a new bb to fall through into.
   The new bb will contain an unconditional jump (crossing edge) to the
   original fall-through destination.  */
1892 | |
1893 | static void |
1894 | fix_up_fall_thru_edges (void) |
1895 | { |
1896 | basic_block cur_bb; |
1897 | |
1898 | FOR_EACH_BB_FN (cur_bb, cfun) |
1899 | { |
1900 | edge succ1; |
1901 | edge succ2; |
1902 | edge fall_thru = NULL; |
1903 | edge cond_jump = NULL; |
1904 | |
1906 | if (EDGE_COUNT (cur_bb->succs) > 0) |
1907 | succ1 = EDGE_SUCC (cur_bb, 0); |
1908 | else |
1909 | succ1 = NULL; |
1910 | |
1911 | if (EDGE_COUNT (cur_bb->succs) > 1) |
1912 | succ2 = EDGE_SUCC (cur_bb, 1); |
1913 | else |
1914 | succ2 = NULL; |
1915 | |
1916 | /* Find the fall-through edge. */ |
1917 | |
1918 | if (succ1 |
1919 | && (succ1->flags & EDGE_FALLTHRU)) |
1920 | { |
1921 | fall_thru = succ1; |
1922 | cond_jump = succ2; |
1923 | } |
1924 | else if (succ2 |
1925 | && (succ2->flags & EDGE_FALLTHRU)) |
1926 | { |
1927 | fall_thru = succ2; |
1928 | cond_jump = succ1; |
1929 | } |
1930 | else if (succ2 && EDGE_COUNT (cur_bb->succs) > 2) |
	fall_thru = find_fallthru_edge (cur_bb->succs);
1932 | |
1933 | if (fall_thru && (fall_thru->dest != EXIT_BLOCK_PTR_FOR_FN (cfun))) |
1934 | { |
1935 | /* Check to see if the fall-thru edge is a crossing edge. */ |
1936 | |
1937 | if (fall_thru->flags & EDGE_CROSSING) |
1938 | { |
1939 | /* The fall_thru edge crosses; now check the cond jump edge, if |
1940 | it exists. */ |
1941 | |
1942 | bool cond_jump_crosses = true; |
1943 | int invert_worked = 0; |
1944 | rtx_insn *old_jump = BB_END (cur_bb); |
1945 | |
1946 | /* Find the jump instruction, if there is one. */ |
1947 | |
1948 | if (cond_jump) |
1949 | { |
1950 | if (!(cond_jump->flags & EDGE_CROSSING)) |
1951 | cond_jump_crosses = false; |
1952 | |
1953 | /* We know the fall-thru edge crosses; if the cond |
1954 | jump edge does NOT cross, and its destination is the |
1955 | next block in the bb order, invert the jump |
1956 | (i.e. fix it so the fall through does not cross and |
1957 | the cond jump does). */ |
1958 | |
1959 | if (!cond_jump_crosses) |
1960 | { |
1961 | /* Find label in fall_thru block. We've already added |
1962 | any missing labels, so there must be one. */ |
1963 | |
1964 | rtx_code_label *fall_thru_label |
1965 | = block_label (fall_thru->dest); |
1966 | |
1967 | if (old_jump && fall_thru_label) |
1968 | { |
1969 | rtx_jump_insn *old_jump_insn |
			    = dyn_cast <rtx_jump_insn *> (old_jump);
1971 | if (old_jump_insn) |
1972 | invert_worked = invert_jump (old_jump_insn, |
1973 | fall_thru_label, 0); |
1974 | } |
1975 | |
1976 | if (invert_worked) |
1977 | { |
1978 | fall_thru->flags &= ~EDGE_FALLTHRU; |
1979 | cond_jump->flags |= EDGE_FALLTHRU; |
1980 | update_br_prob_note (cur_bb); |
		      std::swap (fall_thru, cond_jump);
1982 | cond_jump->flags |= EDGE_CROSSING; |
1983 | fall_thru->flags &= ~EDGE_CROSSING; |
1984 | } |
1985 | } |
1986 | } |
1987 | |
1988 | if (cond_jump_crosses || !invert_worked) |
1989 | { |
		  /* This is the case where both edges out of the basic
		     block are crossing edges.  Here we will fix up the
		     fall-through edge.  The jump edge will be taken care
		     of later.  The EDGE_CROSSING flag of the fall_thru
		     edge is unset before the call to force_nonfallthru
		     because, if a new basic block is created, this edge
		     stays within the current section while the edge
		     between new_bb and fall_thru->dest becomes
		     EDGE_CROSSING.  */
1999 | |
2000 | fall_thru->flags &= ~EDGE_CROSSING; |
2001 | unsigned old_count = EDGE_COUNT (cur_bb->succs); |
2002 | basic_block new_bb = force_nonfallthru (fall_thru); |
2003 | |
2004 | if (new_bb) |
2005 | { |
2006 | new_bb->aux = cur_bb->aux; |
2007 | cur_bb->aux = new_bb; |
2008 | |
2009 | /* This is done by force_nonfallthru_and_redirect. */ |
2010 | gcc_assert (BB_PARTITION (new_bb) |
2011 | == BB_PARTITION (cur_bb)); |
2012 | |
		  edge e = single_succ_edge (new_bb);
2014 | e->flags |= EDGE_CROSSING; |
2015 | if (EDGE_COUNT (cur_bb->succs) > old_count) |
2016 | { |
		      /* If an asm goto has a crossing fallthrough edge
			 and at least one label pointing to the same bb,
			 force_nonfallthru can result in the fallthrough
			 edge being redirected and a new edge added for
			 the label or more labels to e->dest.  As we've
			 temporarily cleared the EDGE_CROSSING flag on the
			 fallthrough edge, we need to restore it again.
			 See PR108596.  */
2025 | rtx_insn *j = BB_END (cur_bb); |
2026 | gcc_checking_assert (JUMP_P (j) |
2027 | && asm_noperands (PATTERN (j))); |
2028 | edge e2 = find_edge (cur_bb, e->dest); |
2029 | if (e2) |
2030 | e2->flags |= EDGE_CROSSING; |
2031 | } |
2032 | } |
2033 | else |
2034 | { |
		  /* If a new basic block was not created, restore
		     the EDGE_CROSSING flag.  */
2037 | fall_thru->flags |= EDGE_CROSSING; |
2038 | } |
2039 | |
	      /* Add a barrier after the new jump.  */
	      emit_barrier_after_bb (new_bb ? new_bb : cur_bb);
2042 | } |
2043 | } |
2044 | } |
2045 | } |
2046 | } |
2047 | |
2048 | /* This function checks the destination block of a "crossing jump" to |
2049 | see if it has any crossing predecessors that begin with a code label |
2050 | and end with an unconditional jump. If so, it returns that predecessor |
2051 | block. (This is to avoid creating lots of new basic blocks that all |
2052 | contain unconditional jumps to the same destination). */ |
2053 | |
2054 | static basic_block |
2055 | find_jump_block (basic_block jump_dest) |
2056 | { |
2057 | basic_block source_bb = NULL; |
2058 | edge e; |
2059 | rtx_insn *insn; |
2060 | edge_iterator ei; |
2061 | |
2062 | FOR_EACH_EDGE (e, ei, jump_dest->preds) |
2063 | if (e->flags & EDGE_CROSSING) |
2064 | { |
2065 | basic_block src = e->src; |
2066 | |
2067 | /* Check each predecessor to see if it has a label, and contains |
2068 | only one executable instruction, which is an unconditional jump. |
2069 | If so, we can use it. */ |
2070 | |
      if (LABEL_P (BB_HEAD (src)))
	{
	  /* Skip over the label and any other non-insns (e.g. notes),
	     then check whether the first real insn is the block's
	     final unconditional jump.  */
	  for (insn = BB_HEAD (src);
	       !INSN_P (insn) && insn != NEXT_INSN (BB_END (src));
	       insn = NEXT_INSN (insn))
	    ;

	  if (INSN_P (insn)
	      && insn == BB_END (src)
	      && JUMP_P (insn)
	      && !any_condjump_p (insn))
	    source_bb = src;
	}
2085 | |
2086 | if (source_bb) |
2087 | break; |
2088 | } |
2089 | |
2090 | return source_bb; |
2091 | } |
2092 | |
/* Find all BB's with conditional jumps that are crossing edges;
   insert a new bb and make the conditional jump branch to the new
   bb instead (make the new bb the same color so the conditional branch
   won't be a 'crossing' edge).  Insert an unconditional jump from the
   new bb to the original destination of the conditional jump.  */
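/* For illustration (hypothetical labels): a crossing conditional jump
   "if (cc) goto L_cold" is retargeted to "if (cc) goto L_new", where the
   new block, kept in the same partition as the branch, contains only
   "L_new: goto L_cold"; the unconditional jump then carries the crossing
   and can be fixed up further if it cannot span the distance.  */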
2098 | |
2099 | static void |
2100 | fix_crossing_conditional_branches (void) |
2101 | { |
2102 | basic_block cur_bb; |
2103 | basic_block new_bb; |
2104 | basic_block dest; |
2105 | edge succ1; |
2106 | edge succ2; |
2107 | edge crossing_edge; |
2108 | edge new_edge; |
2109 | rtx set_src; |
2110 | rtx old_label = NULL_RTX; |
2111 | rtx_code_label *new_label; |
2112 | |
2113 | FOR_EACH_BB_FN (cur_bb, cfun) |
2114 | { |
2115 | crossing_edge = NULL; |
2116 | if (EDGE_COUNT (cur_bb->succs) > 0) |
2117 | succ1 = EDGE_SUCC (cur_bb, 0); |
2118 | else |
2119 | succ1 = NULL; |
2120 | |
2121 | if (EDGE_COUNT (cur_bb->succs) > 1) |
2122 | succ2 = EDGE_SUCC (cur_bb, 1); |
2123 | else |
2124 | succ2 = NULL; |
2125 | |
2126 | /* We already took care of fall-through edges, so only one successor |
2127 | can be a crossing edge. */ |
2128 | |
2129 | if (succ1 && (succ1->flags & EDGE_CROSSING)) |
2130 | crossing_edge = succ1; |
2131 | else if (succ2 && (succ2->flags & EDGE_CROSSING)) |
2132 | crossing_edge = succ2; |
2133 | |
2134 | if (crossing_edge) |
2135 | { |
2136 | rtx_insn *old_jump = BB_END (cur_bb); |
2137 | |
2138 | /* Check to make sure the jump instruction is a |
2139 | conditional jump. */ |
2140 | |
2141 | set_src = NULL_RTX; |
2142 | |
2143 | if (any_condjump_p (old_jump)) |
2144 | { |
2145 | if (GET_CODE (PATTERN (old_jump)) == SET) |
2146 | set_src = SET_SRC (PATTERN (old_jump)); |
2147 | else if (GET_CODE (PATTERN (old_jump)) == PARALLEL) |
2148 | { |
	      set_src = XVECEXP (PATTERN (old_jump), 0, 0);
2150 | if (GET_CODE (set_src) == SET) |
2151 | set_src = SET_SRC (set_src); |
2152 | else |
2153 | set_src = NULL_RTX; |
2154 | } |
2155 | } |
2156 | |
2157 | if (set_src && (GET_CODE (set_src) == IF_THEN_ELSE)) |
2158 | { |
2159 | rtx_jump_insn *old_jump_insn = |
		as_a <rtx_jump_insn *> (old_jump);
2161 | |
2162 | if (GET_CODE (XEXP (set_src, 1)) == PC) |
2163 | old_label = XEXP (set_src, 2); |
2164 | else if (GET_CODE (XEXP (set_src, 2)) == PC) |
2165 | old_label = XEXP (set_src, 1); |
2166 | |
	      /* Check to see if a new bb for jumping to that dest has
		 already been created; if so, use it; if not, create
		 a new one.  */
2170 | |
	      new_bb = find_jump_block (crossing_edge->dest);
2172 | |
2173 | if (new_bb) |
2174 | new_label = block_label (new_bb); |
2175 | else |
2176 | { |
2177 | basic_block last_bb; |
2178 | rtx_code_label *old_jump_target; |
2179 | rtx_jump_insn *new_jump; |
2180 | |
		  /* Create a new basic block to be the dest for the
		     conditional jump.  */

		  /* Put the appropriate instructions in the new bb.  */
2185 | |
2186 | new_label = gen_label_rtx (); |
2187 | emit_label (new_label); |
2188 | |
2189 | gcc_assert (GET_CODE (old_label) == LABEL_REF); |
2190 | old_jump_target = old_jump_insn->jump_target (); |
2191 | new_jump = as_a <rtx_jump_insn *> |
		    (emit_jump_insn (targetm.gen_jump (old_jump_target)));
2193 | new_jump->set_jump_target (old_jump_target); |
2194 | |
2195 | last_bb = EXIT_BLOCK_PTR_FOR_FN (cfun)->prev_bb; |
2196 | new_bb = create_basic_block (new_label, new_jump, last_bb); |
2197 | new_bb->aux = last_bb->aux; |
2198 | last_bb->aux = new_bb; |
2199 | |
		  emit_barrier_after_bb (new_bb);
2201 | |
2202 | /* Make sure new bb is in same partition as source |
2203 | of conditional branch. */ |
2204 | BB_COPY_PARTITION (new_bb, cur_bb); |
2205 | } |
2206 | |
2207 | /* Make old jump branch to new bb. */ |
2208 | |
2209 | redirect_jump (old_jump_insn, new_label, 0); |
2210 | |
2211 | /* Remove crossing_edge as predecessor of 'dest'. */ |
2212 | |
2213 | dest = crossing_edge->dest; |
2214 | |
2215 | redirect_edge_succ (crossing_edge, new_bb); |
2216 | |
2217 | /* Make a new edge from new_bb to old dest; new edge |
2218 | will be a successor for new_bb and a predecessor |
2219 | for 'dest'. */ |
2220 | |
2221 | if (EDGE_COUNT (new_bb->succs) == 0) |
2222 | new_edge = make_single_succ_edge (new_bb, dest, 0); |
2223 | else |
2224 | new_edge = EDGE_SUCC (new_bb, 0); |
2225 | |
2226 | crossing_edge->flags &= ~EDGE_CROSSING; |
2227 | new_edge->flags |= EDGE_CROSSING; |
2228 | } |
2229 | } |
2230 | } |
2231 | } |
2232 | |
2233 | /* Find any unconditional branches that cross between hot and cold |
2234 | sections. Convert them into indirect jumps instead. */ |
2235 | |
2236 | static void |
2237 | fix_crossing_unconditional_branches (void) |
2238 | { |
2239 | basic_block cur_bb; |
2240 | rtx_insn *last_insn; |
2241 | rtx label; |
2242 | rtx label_addr; |
2243 | rtx_insn *indirect_jump_sequence; |
2244 | rtx_insn *jump_insn = NULL; |
2245 | rtx new_reg; |
2246 | rtx_insn *cur_insn; |
2247 | edge succ; |
2248 | |
2249 | FOR_EACH_BB_FN (cur_bb, cfun) |
2250 | { |
2251 | last_insn = BB_END (cur_bb); |
2252 | |
2253 | if (EDGE_COUNT (cur_bb->succs) < 1) |
2254 | continue; |
2255 | |
2256 | succ = EDGE_SUCC (cur_bb, 0); |
2257 | |
2258 | /* Check to see if bb ends in a crossing (unconditional) jump. At |
2259 | this point, no crossing jumps should be conditional. */ |
2260 | |
2261 | if (JUMP_P (last_insn) |
2262 | && (succ->flags & EDGE_CROSSING)) |
2263 | { |
2264 | gcc_assert (!any_condjump_p (last_insn)); |
2265 | |
2266 | /* Make sure the jump is not already an indirect or table jump. */ |
2267 | |
2268 | if (!computed_jump_p (last_insn) |
2269 | && !tablejump_p (last_insn, NULL, NULL)) |
2270 | { |
	      /* We have found a "crossing" unconditional branch.  Now
		 we must convert it to an indirect jump.  First create a
		 reference to the label, as the target for the jump.  */
2274 | |
2275 | label = JUMP_LABEL (last_insn); |
2276 | label_addr = gen_rtx_LABEL_REF (Pmode, label); |
2277 | LABEL_NUSES (label) += 1; |
2278 | |
2279 | /* Get a register to use for the indirect jump. */ |
2280 | |
2281 | new_reg = gen_reg_rtx (Pmode); |
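	      /* Needing a fresh pseudo here is the reason this pass must
		 run before register allocation.  */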
2282 | |
	      /* Generate the indirect jump sequence.  */
2284 | |
2285 | start_sequence (); |
2286 | emit_move_insn (new_reg, label_addr); |
2287 | emit_indirect_jump (new_reg); |
2288 | indirect_jump_sequence = get_insns (); |
2289 | end_sequence (); |
2290 | |
2291 | /* Make sure every instruction in the new jump sequence has |
2292 | its basic block set to be cur_bb. */ |
2293 | |
2294 | for (cur_insn = indirect_jump_sequence; cur_insn; |
		   cur_insn = NEXT_INSN (cur_insn))
2296 | { |
2297 | if (!BARRIER_P (cur_insn)) |
		    BLOCK_FOR_INSN (cur_insn) = cur_bb;
2299 | if (JUMP_P (cur_insn)) |
2300 | jump_insn = cur_insn; |
2301 | } |
2302 | |
2303 | /* Insert the new (indirect) jump sequence immediately before |
2304 | the unconditional jump, then delete the unconditional jump. */ |
2305 | |
2306 | emit_insn_before (indirect_jump_sequence, last_insn); |
2307 | delete_insn (last_insn); |
2308 | |
2309 | JUMP_LABEL (jump_insn) = label; |
2310 | LABEL_NUSES (label)++; |
2311 | |
2312 | /* Make BB_END for cur_bb be the jump instruction (NOT the |
2313 | barrier instruction at the end of the sequence...). */ |
2314 | |
2315 | BB_END (cur_bb) = jump_insn; |
2316 | } |
2317 | } |
2318 | } |
2319 | } |
2320 | |
2321 | /* Update CROSSING_JUMP_P flags on all jump insns. */ |
2322 | |
2323 | static void |
2324 | update_crossing_jump_flags (void) |
2325 | { |
2326 | basic_block bb; |
2327 | edge e; |
2328 | edge_iterator ei; |
2329 | |
2330 | FOR_EACH_BB_FN (bb, cfun) |
2331 | FOR_EACH_EDGE (e, ei, bb->succs) |
2332 | if (e->flags & EDGE_CROSSING) |
2333 | { |
2334 | if (JUMP_P (BB_END (bb))) |
2335 | CROSSING_JUMP_P (BB_END (bb)) = 1; |
2336 | break; |
2337 | } |
2338 | } |
2339 | |
2340 | /* Reorder basic blocks using the software trace cache (STC) algorithm. */ |
2341 | |
2342 | static void |
2343 | reorder_basic_blocks_software_trace_cache (void) |
2344 | { |
2345 | if (dump_file) |
    fprintf (dump_file, "\nReordering with the STC algorithm.\n\n");
2347 | |
2348 | int n_traces; |
2349 | int i; |
2350 | struct trace *traces; |
2351 | |
  /* We estimate the length of an uncond jump insn only once, since the
     code for getting the insn length always returns the minimal length
     now.  */
2354 | if (uncond_jump_length == 0) |
2355 | uncond_jump_length = get_uncond_jump_length (); |
2356 | |
2357 | /* We need to know some information for each basic block. */ |
2358 | array_size = GET_ARRAY_SIZE (last_basic_block_for_fn (cfun)); |
2359 | bbd = XNEWVEC (bbro_basic_block_data, array_size); |
2360 | for (i = 0; i < array_size; i++) |
2361 | { |
2362 | bbd[i].start_of_trace = -1; |
2363 | bbd[i].end_of_trace = -1; |
2364 | bbd[i].in_trace = -1; |
2365 | bbd[i].visited = 0; |
2366 | bbd[i].priority = -1; |
2367 | bbd[i].heap = NULL; |
2368 | bbd[i].node = NULL; |
2369 | } |
2370 | |
2371 | traces = XNEWVEC (struct trace, n_basic_blocks_for_fn (cfun)); |
2372 | n_traces = 0; |
  find_traces (&n_traces, traces);
2374 | connect_traces (n_traces, traces); |
2375 | FREE (traces); |
2376 | FREE (bbd); |
2377 | } |
2378 | |
2379 | /* Order edges by execution frequency, higher first. */ |
2380 | |
2381 | static int |
2382 | edge_order (const void *ve1, const void *ve2) |
2383 | { |
2384 | edge e1 = *(const edge *) ve1; |
2385 | edge e2 = *(const edge *) ve2; |
2386 | profile_count c1 = e1->count (); |
2387 | profile_count c2 = e2->count (); |
2388 | /* Since profile_count::operator< does not establish a strict weak order |
2389 | in presence of uninitialized counts, use 'max': this makes them appear |
2390 | as if having execution frequency less than any initialized count. */ |
  profile_count m = c1.max (c2);
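  /* The result is -1 when E1 is strictly hotter, 1 when E2 is, and 0 on
     a tie, so the stable sort puts hotter edges first.  */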
2392 | return (m == c2) - (m == c1); |
2393 | } |
2394 | |
2395 | /* Reorder basic blocks using the "simple" algorithm. This tries to |
2396 | maximize the dynamic number of branches that are fallthrough, without |
2397 | copying instructions. The algorithm is greedy, looking at the most |
2398 | frequently executed branch first. */ |
2399 | |
2400 | static void |
2401 | reorder_basic_blocks_simple (void) |
2402 | { |
2403 | if (dump_file) |
    fprintf (dump_file, "\nReordering with the \"simple\" algorithm.\n\n");
2405 | |
2406 | edge *edges = new edge[2 * n_basic_blocks_for_fn (cfun)]; |
2407 | |
2408 | /* First, collect all edges that can be optimized by reordering blocks: |
2409 | simple jumps and conditional jumps, as well as the function entry edge. */ |
2410 | |
2411 | int n = 0; |
2412 | edges[n++] = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); |
2413 | |
2414 | basic_block bb; |
2415 | FOR_EACH_BB_FN (bb, cfun) |
2416 | { |
2417 | rtx_insn *end = BB_END (bb); |
2418 | |
2419 | if (computed_jump_p (end) || tablejump_p (end, NULL, NULL)) |
2420 | continue; |
2421 | |
2422 | /* We cannot optimize asm goto. */ |
2423 | if (JUMP_P (end) && extract_asm_operands (end)) |
2424 | continue; |
2425 | |
2426 | if (single_succ_p (bb)) |
2427 | edges[n++] = EDGE_SUCC (bb, 0); |
2428 | else if (any_condjump_p (end)) |
2429 | { |
2430 | edge e0 = EDGE_SUCC (bb, 0); |
2431 | edge e1 = EDGE_SUCC (bb, 1); |
2432 | /* When optimizing for size it is best to keep the original |
2433 | fallthrough edges. */ |
2434 | if (e1->flags & EDGE_FALLTHRU) |
	    std::swap (e0, e1);
2436 | edges[n++] = e0; |
2437 | edges[n++] = e1; |
2438 | } |
2439 | } |
2440 | |
2441 | /* Sort the edges, the most desirable first. When optimizing for size |
2442 | all edges are equally desirable. */ |
2443 | |
2444 | if (optimize_function_for_speed_p (cfun)) |
2445 | gcc_stablesort (edges, n, sizeof *edges, edge_order); |
2446 | |
2447 | /* Now decide which of those edges to make fallthrough edges. We set |
2448 | BB_VISITED if a block already has a fallthrough successor assigned |
2449 | to it. We make ->AUX of an endpoint point to the opposite endpoint |
2450 | of a sequence of blocks that fall through, and ->AUX will be NULL |
2451 | for a block that is in such a sequence but not an endpoint anymore. |
2452 | |
2453 | To start with, everything points to itself, nothing is assigned yet. */ |
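  /* For illustration (hypothetical blocks A, B, C): processing edge A->B
     zeroes A->aux and B->aux, makes the endpoints of the combined
     sequence point at each other, and sets BB_VISITED on A.  Processing
     B->C afterwards extends the same chain, leaving only the endpoints
     A and C with non-NULL ->AUX, each pointing at the other.  */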
2454 | |
2455 | FOR_ALL_BB_FN (bb, cfun) |
2456 | { |
2457 | bb->aux = bb; |
2458 | bb->flags &= ~BB_VISITED; |
2459 | } |
2460 | |
2461 | EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0; |
2462 | |
2463 | /* Now for all edges, the most desirable first, see if that edge can |
2464 | connect two sequences. If it can, update AUX and BB_VISITED; if it |
2465 | cannot, zero out the edge in the table. */ |
2466 | |
2467 | for (int j = 0; j < n; j++) |
2468 | { |
2469 | edge e = edges[j]; |
2470 | |
2471 | basic_block tail_a = e->src; |
2472 | basic_block head_b = e->dest; |
2473 | basic_block head_a = (basic_block) tail_a->aux; |
2474 | basic_block tail_b = (basic_block) head_b->aux; |
2475 | |
2476 | /* An edge cannot connect two sequences if: |
2477 | - it crosses partitions; |
2478 | - its src is not a current endpoint; |
2479 | - its dest is not a current endpoint; |
2480 | - or, it would create a loop. */ |
2481 | |
2482 | if (e->flags & EDGE_CROSSING |
2483 | || tail_a->flags & BB_VISITED |
2484 | || !tail_b |
2485 | || (!(head_b->flags & BB_VISITED) && head_b != tail_b) |
2486 | || tail_a == tail_b) |
2487 | { |
2488 | edges[j] = 0; |
2489 | continue; |
2490 | } |
2491 | |
2492 | tail_a->aux = 0; |
2493 | head_b->aux = 0; |
2494 | head_a->aux = tail_b; |
2495 | tail_b->aux = head_a; |
2496 | tail_a->flags |= BB_VISITED; |
2497 | } |
2498 | |
2499 | /* Put the pieces together, in the same order that the start blocks of |
2500 | the sequences already had. The hot/cold partitioning gives a little |
2501 | complication: as a first pass only do this for blocks in the same |
2502 | partition as the start block, and (if there is anything left to do) |
2503 | in a second pass handle the other partition. */ |
2504 | |
2505 | basic_block last_tail = (basic_block) ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; |
2506 | |
2507 | int current_partition |
2508 | = BB_PARTITION (last_tail == ENTRY_BLOCK_PTR_FOR_FN (cfun) |
2509 | ? EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0)->dest |
2510 | : last_tail); |
2511 | bool need_another_pass = true; |
2512 | |
2513 | for (int pass = 0; pass < 2 && need_another_pass; pass++) |
2514 | { |
2515 | need_another_pass = false; |
2516 | |
2517 | FOR_EACH_BB_FN (bb, cfun) |
2518 | if ((bb->flags & BB_VISITED && bb->aux) || bb->aux == bb) |
2519 | { |
2520 | if (BB_PARTITION (bb) != current_partition) |
2521 | { |
2522 | need_another_pass = true; |
2523 | continue; |
2524 | } |
2525 | |
2526 | last_tail->aux = bb; |
2527 | last_tail = (basic_block) bb->aux; |
2528 | } |
2529 | |
2530 | current_partition ^= BB_HOT_PARTITION | BB_COLD_PARTITION; |
2531 | } |
2532 | |
2533 | last_tail->aux = 0; |
2534 | |
2535 | /* Finally, link all the chosen fallthrough edges. */ |
2536 | |
2537 | for (int j = 0; j < n; j++) |
2538 | if (edges[j]) |
2539 | edges[j]->src->aux = edges[j]->dest; |
2540 | |
2541 | delete[] edges; |
2542 | |
2543 | /* If the entry edge no longer falls through we have to make a new |
2544 | block so it can do so again. */ |
2545 | |
2546 | edge e = EDGE_SUCC (ENTRY_BLOCK_PTR_FOR_FN (cfun), 0); |
2547 | if (e->dest != ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux) |
2548 | { |
2549 | force_nonfallthru (e); |
2550 | e->src->aux = ENTRY_BLOCK_PTR_FOR_FN (cfun)->aux; |
2551 | } |
2552 | } |
2553 | |
2554 | /* Reorder basic blocks. The main entry point to this file. */ |
2555 | |
2556 | static void |
2557 | reorder_basic_blocks (void) |
2558 | { |
2559 | gcc_assert (current_ir_type () == IR_RTL_CFGLAYOUT); |
2560 | |
2561 | if (n_basic_blocks_for_fn (cfun) <= NUM_FIXED_BLOCKS + 1) |
2562 | return; |
2563 | |
2564 | set_edge_can_fallthru_flag (); |
2565 | mark_dfs_back_edges (); |
2566 | |
2567 | switch (flag_reorder_blocks_algorithm) |
2568 | { |
2569 | case REORDER_BLOCKS_ALGORITHM_SIMPLE: |
2570 | reorder_basic_blocks_simple (); |
2571 | break; |
2572 | |
2573 | case REORDER_BLOCKS_ALGORITHM_STC: |
2574 | reorder_basic_blocks_software_trace_cache (); |
2575 | break; |
2576 | |
2577 | default: |
2578 | gcc_unreachable (); |
2579 | } |
2580 | |
2581 | relink_block_chain (/*stay_in_cfglayout_mode=*/true); |
2582 | |
2583 | if (dump_file) |
2584 | { |
2585 | if (dump_flags & TDF_DETAILS) |
2586 | dump_reg_info (dump_file); |
2587 | dump_flow_info (dump_file, dump_flags); |
2588 | } |
2589 | |
2590 | /* Signal that rtl_verify_flow_info_1 can now verify that there |
2591 | is at most one switch between hot/cold sections. */ |
2592 | crtl->bb_reorder_complete = true; |
2593 | } |
2594 | |
2595 | /* Determine which partition the first basic block in the function |
2596 | belongs to, then find the first basic block in the current function |
2597 | that belongs to a different section, and insert a |
2598 | NOTE_INSN_SWITCH_TEXT_SECTIONS note immediately before it in the |
2599 | instruction stream. When writing out the assembly code, |
2600 | encountering this note will make the compiler switch between the |
2601 | hot and cold text sections. */ |
2602 | |
2603 | void |
2604 | insert_section_boundary_note (void) |
2605 | { |
2606 | basic_block bb; |
2607 | bool switched_sections = false; |
2608 | int current_partition = 0; |
2609 | |
2610 | if (!crtl->has_bb_partition) |
2611 | return; |
2612 | |
2613 | FOR_EACH_BB_FN (bb, cfun) |
2614 | { |
2615 | if (!current_partition) |
2616 | current_partition = BB_PARTITION (bb); |
2617 | if (BB_PARTITION (bb) != current_partition) |
2618 | { |
2619 | gcc_assert (!switched_sections); |
2620 | switched_sections = true; |
2621 | emit_note_before (NOTE_INSN_SWITCH_TEXT_SECTIONS, BB_HEAD (bb)); |
2622 | current_partition = BB_PARTITION (bb); |
2623 | } |
2624 | } |
2625 | |
2626 | /* Make sure crtl->has_bb_partition matches reality even if bbpart finds |
2627 | some hot and some cold basic blocks, but later one of those kinds is |
2628 | optimized away. */ |
2629 | crtl->has_bb_partition = switched_sections; |
2630 | } |
2631 | |
2632 | namespace { |
2633 | |
2634 | const pass_data pass_data_reorder_blocks = |
2635 | { |
  RTL_PASS, /* type */
  "bbro", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
2645 | }; |
2646 | |
2647 | class pass_reorder_blocks : public rtl_opt_pass |
2648 | { |
2649 | public: |
2650 | pass_reorder_blocks (gcc::context *ctxt) |
2651 | : rtl_opt_pass (pass_data_reorder_blocks, ctxt) |
2652 | {} |
2653 | |
2654 | /* opt_pass methods: */ |
2655 | bool gate (function *) final override |
2656 | { |
2657 | if (targetm.cannot_modify_jumps_p ()) |
2658 | return false; |
2659 | return (optimize > 0 |
2660 | && (flag_reorder_blocks || flag_reorder_blocks_and_partition)); |
2661 | } |
2662 | |
2663 | unsigned int execute (function *) final override; |
2664 | |
2665 | }; // class pass_reorder_blocks |
2666 | |
2667 | unsigned int |
2668 | pass_reorder_blocks::execute (function *fun) |
2669 | { |
2670 | basic_block bb; |
2671 | |
2672 | /* Last attempt to optimize CFG, as scheduling, peepholing and insn |
2673 | splitting possibly introduced more crossjumping opportunities. */ |
2674 | cfg_layout_initialize (CLEANUP_EXPENSIVE); |
2675 | |
2676 | reorder_basic_blocks (); |
2677 | cleanup_cfg (CLEANUP_EXPENSIVE | CLEANUP_NO_PARTITIONING); |
2678 | |
2679 | FOR_EACH_BB_FN (bb, fun) |
2680 | if (bb->next_bb != EXIT_BLOCK_PTR_FOR_FN (fun)) |
2681 | bb->aux = bb->next_bb; |
2682 | cfg_layout_finalize (); |
2683 | |
2684 | FOR_EACH_BB_FN (bb, fun) |
2685 | df_recompute_luids (bb); |
2686 | return 0; |
2687 | } |
2688 | |
2689 | } // anon namespace |
2690 | |
2691 | rtl_opt_pass * |
2692 | make_pass_reorder_blocks (gcc::context *ctxt) |
2693 | { |
2694 | return new pass_reorder_blocks (ctxt); |
2695 | } |
2696 | |
2697 | /* Duplicate a block (that we already know ends in a computed jump) into its |
2698 | predecessors, where possible. Return whether anything is changed. */ |
2699 | static bool |
2700 | maybe_duplicate_computed_goto (basic_block bb, int max_size) |
2701 | { |
2702 | /* Make sure that the block is small enough. */ |
2703 | rtx_insn *insn; |
2704 | FOR_BB_INSNS (bb, insn) |
2705 | if (INSN_P (insn)) |
2706 | { |
2707 | max_size -= get_attr_min_length (insn); |
2708 | if (max_size < 0) |
2709 | return false; |
2710 | } |
2711 | |
2712 | bool changed = false; |
2713 | edge e; |
2714 | edge_iterator ei; |
  for (ei = ei_start (bb->preds); (e = ei_safe_edge (ei)); )
2716 | { |
2717 | basic_block pred = e->src; |
2718 | |
2719 | /* Do not duplicate BB into PRED if we cannot merge a copy of BB |
2720 | with PRED. */ |
      if (!single_succ_p (pred)
2722 | || e->flags & EDGE_COMPLEX |
2723 | || pred->index < NUM_FIXED_BLOCKS |
2724 | || (JUMP_P (BB_END (pred)) && !simplejump_p (BB_END (pred))) |
2725 | || (JUMP_P (BB_END (pred)) && CROSSING_JUMP_P (BB_END (pred)))) |
2726 | { |
	  ei_next (&ei);
2728 | continue; |
2729 | } |
2730 | |
2731 | if (dump_file) |
      fprintf (dump_file, "Duplicating computed goto bb %d into bb %d\n",
	       bb->index, e->src->index);
2734 | |
2735 | /* Remember if PRED can be duplicated; if so, the copy of BB merged |
2736 | with PRED can be duplicated as well. */ |
2737 | bool can_dup_more = can_duplicate_block_p (pred); |
2738 | |
2739 | /* Make a copy of BB, merge it into PRED. */ |
2740 | basic_block copy = duplicate_block (bb, e, NULL); |
    emit_barrier_after_bb (copy);
2742 | reorder_insns_nobb (BB_HEAD (copy), BB_END (copy), BB_END (pred)); |
2743 | merge_blocks (pred, copy); |
2744 | |
2745 | changed = true; |
2746 | |
2747 | /* Try to merge the resulting merged PRED into further predecessors. */ |
2748 | if (can_dup_more) |
      maybe_duplicate_computed_goto (pred, max_size);
2750 | } |
2751 | |
2752 | return changed; |
2753 | } |
2754 | |
2755 | /* Duplicate the blocks containing computed gotos. This basically unfactors |
2756 | computed gotos that were factored early on in the compilation process to |
2757 | speed up edge based data flow. We used to not unfactor them again, which |
2758 | can seriously pessimize code with many computed jumps in the source code, |
2759 | such as interpreters. See e.g. PR15242. */ |
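/* For illustration (hypothetical interpreter): a dispatch block doing
   "goto *ops[opcode]" that was factored out and shared by every opcode
   handler is copied back into each handler, so each handler ends in its
   own indirect jump and no longer funnels through one shared block.  */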
2760 | static void |
2761 | duplicate_computed_gotos (function *fun) |
2762 | { |
  /* We estimate the length of an uncond jump insn only once, since
     the code for getting the insn length always returns the minimal
     length now.  */
2766 | if (uncond_jump_length == 0) |
2767 | uncond_jump_length = get_uncond_jump_length (); |
2768 | |
2769 | /* Never copy a block larger than this. */ |
2770 | int max_size |
2771 | = uncond_jump_length * param_max_goto_duplication_insns; |
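  /* MAX_SIZE is therefore in insn-length units: roughly the size of
     param_max_goto_duplication_insns unconditional jumps.  */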
2772 | |
2773 | bool changed = false; |
2774 | |
2775 | /* Try to duplicate all blocks that end in a computed jump and that |
2776 | can be duplicated at all. */ |
2777 | basic_block bb; |
2778 | FOR_EACH_BB_FN (bb, fun) |
2779 | if (computed_jump_p (BB_END (bb)) && can_duplicate_block_p (bb)) |
2780 | changed |= maybe_duplicate_computed_goto (bb, max_size); |
2781 | |
2782 | /* Some blocks may have become unreachable. */ |
2783 | if (changed) |
2784 | cleanup_cfg (0); |
2785 | |
2786 | /* Duplicating blocks will redirect edges and may cause hot blocks |
2787 | previously reached by both hot and cold blocks to become dominated |
2788 | only by cold blocks. */ |
2789 | if (changed) |
2790 | fixup_partitions (); |
2791 | } |
2792 | |
2793 | namespace { |
2794 | |
2795 | const pass_data pass_data_duplicate_computed_gotos = |
2796 | { |
  RTL_PASS, /* type */
  "compgotos", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  0, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
2806 | }; |
2807 | |
2808 | class pass_duplicate_computed_gotos : public rtl_opt_pass |
2809 | { |
2810 | public: |
2811 | pass_duplicate_computed_gotos (gcc::context *ctxt) |
2812 | : rtl_opt_pass (pass_data_duplicate_computed_gotos, ctxt) |
2813 | {} |
2814 | |
2815 | /* opt_pass methods: */ |
2816 | bool gate (function *) final override; |
2817 | unsigned int execute (function *) final override; |
2818 | |
2819 | }; // class pass_duplicate_computed_gotos |
2820 | |
2821 | bool |
2822 | pass_duplicate_computed_gotos::gate (function *fun) |
2823 | { |
2824 | if (targetm.cannot_modify_jumps_p ()) |
2825 | return false; |
2826 | return (optimize > 0 |
2827 | && flag_expensive_optimizations |
2828 | && ! optimize_function_for_size_p (fun)); |
2829 | } |
2830 | |
2831 | unsigned int |
2832 | pass_duplicate_computed_gotos::execute (function *fun) |
2833 | { |
2834 | duplicate_computed_gotos (fun); |
2835 | |
2836 | return 0; |
2837 | } |
2838 | |
2839 | } // anon namespace |
2840 | |
2841 | rtl_opt_pass * |
2842 | make_pass_duplicate_computed_gotos (gcc::context *ctxt) |
2843 | { |
2844 | return new pass_duplicate_computed_gotos (ctxt); |
2845 | } |
2846 | |
2847 | /* This function is the main 'entrance' for the optimization that |
2848 | partitions hot and cold basic blocks into separate sections of the |
2849 | .o file (to improve performance and cache locality). Ideally it |
2850 | would be called after all optimizations that rearrange the CFG have |
2851 | been called. However part of this optimization may introduce new |
2852 | register usage, so it must be called before register allocation has |
2853 | occurred. This means that this optimization is actually called |
2854 | well before the optimization that reorders basic blocks (see |
2855 | function above). |
2856 | |
2857 | This optimization checks the feedback information to determine |
   which basic blocks are hot/cold, and updates flags on the basic blocks
2859 | to indicate which section they belong in. This information is |
2860 | later used for writing out sections in the .o file. Because hot |
2861 | and cold sections can be arbitrarily large (within the bounds of |
2862 | memory), far beyond the size of a single function, it is necessary |
2863 | to fix up all edges that cross section boundaries, to make sure the |
2864 | instructions used can actually span the required distance. The |
2865 | fixes are described below. |
2866 | |
2867 | Fall-through edges must be changed into jumps; it is not safe or |
2868 | legal to fall through across a section boundary. Whenever a |
2869 | fall-through edge crossing a section boundary is encountered, a new |
2870 | basic block is inserted (in the same section as the fall-through |
2871 | source), and the fall through edge is redirected to the new basic |
2872 | block. The new basic block contains an unconditional jump to the |
2873 | original fall-through target. (If the unconditional jump is |
2874 | insufficient to cross section boundaries, that is dealt with a |
2875 | little later, see below). |
2876 | |
2877 | In order to deal with architectures that have short conditional |
2878 | branches (which cannot span all of memory) we take any conditional |
2879 | jump that attempts to cross a section boundary and add a level of |
2880 | indirection: it becomes a conditional jump to a new basic block, in |
2881 | the same section. The new basic block contains an unconditional |
2882 | jump to the original target, in the other section. |
2883 | |
2884 | For those architectures whose unconditional branch is also |
2885 | incapable of reaching all of memory, those unconditional jumps are |
2886 | converted into indirect jumps, through a register. |
2887 | |
2888 | IMPORTANT NOTE: This optimization causes some messy interactions |
2889 | with the cfg cleanup optimizations; those optimizations want to |
2890 | merge blocks wherever possible, and to collapse indirect jump |
2891 | sequences (change "A jumps to B jumps to C" directly into "A jumps |
2892 | to C"). Those optimizations can undo the jump fixes that |
2893 | partitioning is required to make (see above), in order to ensure |
2894 | that jumps attempting to cross section boundaries are really able |
2895 | to cover whatever distance the jump requires (on many architectures |
2896 | conditional or unconditional jumps are not able to reach all of |
2897 | memory). Therefore tests have to be inserted into each such |
2898 | optimization to make sure that it does not undo stuff necessary to |
2899 | cross partition boundaries. This would be much less of a problem |
2900 | if we could perform this optimization later in the compilation, but |
2901 | unfortunately the fact that we may need to create indirect jumps |
2902 | (through registers) requires that this optimization be performed |
2903 | before register allocation. |
2904 | |
2905 | Hot and cold basic blocks are partitioned and put in separate |
2906 | sections of the .o file, to reduce paging and improve cache |
2907 | performance (hopefully). This can result in bits of code from the |
2908 | same function being widely separated in the .o file. However this |
2909 | is not obvious to the current bb structure. Therefore we must take |
2910 | care to ensure that: 1). There are no fall_thru edges that cross |
2911 | between sections; 2). For those architectures which have "short" |
2912 | conditional branches, all conditional branches that attempt to |
2913 | cross between sections are converted to unconditional branches; |
2914 | and, 3). For those architectures which have "short" unconditional |
2915 | branches, all unconditional branches that attempt to cross between |
2916 | sections are converted to indirect jumps. |
2917 | |
2918 | The code for fixing up fall_thru edges that cross between hot and |
2919 | cold basic blocks does so by creating new basic blocks containing |
2920 | unconditional branches to the appropriate label in the "other" |
2921 | section. The new basic block is then put in the same (hot or cold) |
2922 | section as the original conditional branch, and the fall_thru edge |
2923 | is modified to fall into the new basic block instead. By adding |
2924 | this level of indirection we end up with only unconditional branches |
2925 | crossing between hot and cold sections. |
2926 | |
2927 | Conditional branches are dealt with by adding a level of indirection. |
2928 | A new basic block is added in the same (hot/cold) section as the |
2929 | conditional branch, and the conditional branch is retargeted to the |
2930 | new basic block. The new basic block contains an unconditional branch |
2931 | to the original target of the conditional branch (in the other section). |
2932 | |
2933 | Unconditional branches are dealt with by converting them into |
2934 | indirect jumps. */ |
2935 | |
2936 | namespace { |
2937 | |
2938 | const pass_data pass_data_partition_blocks = |
2939 | { |
  RTL_PASS, /* type */
  "bbpart", /* name */
  OPTGROUP_NONE, /* optinfo_flags */
  TV_REORDER_BLOCKS, /* tv_id */
  PROP_cfglayout, /* properties_required */
  0, /* properties_provided */
  0, /* properties_destroyed */
  0, /* todo_flags_start */
  0, /* todo_flags_finish */
2949 | }; |
2950 | |
2951 | class pass_partition_blocks : public rtl_opt_pass |
2952 | { |
2953 | public: |
2954 | pass_partition_blocks (gcc::context *ctxt) |
2955 | : rtl_opt_pass (pass_data_partition_blocks, ctxt) |
2956 | {} |
2957 | |
2958 | /* opt_pass methods: */ |
2959 | bool gate (function *) final override; |
2960 | unsigned int execute (function *) final override; |
2961 | |
2962 | }; // class pass_partition_blocks |
2963 | |
2964 | bool |
2965 | pass_partition_blocks::gate (function *fun) |
2966 | { |
  /* The optimization to partition hot/cold basic blocks into separate
     sections of the .o file does not work well with linkonce, with
     user defined section attributes, or with the naked attribute.
     Don't call it if any of these cases arises.  */
2971 | return (flag_reorder_blocks_and_partition |
2972 | && optimize |
2973 | /* See pass_reorder_blocks::gate. We should not partition if |
2974 | we are going to omit the reordering. */ |
2975 | && optimize_function_for_speed_p (fun) |
2976 | && !DECL_COMDAT_GROUP (current_function_decl) |
	  && !lookup_attribute ("section", DECL_ATTRIBUTES (fun->decl))
	  && !lookup_attribute ("naked", DECL_ATTRIBUTES (fun->decl))
2979 | /* Workaround a bug in GDB where read_partial_die doesn't cope |
2980 | with DIEs with DW_AT_ranges, see PR81115. */ |
2981 | && !(in_lto_p && MAIN_NAME_P (DECL_NAME (fun->decl)))); |
2982 | } |
2983 | |
2984 | unsigned |
2985 | pass_partition_blocks::execute (function *fun) |
2986 | { |
2987 | vec<edge> crossing_edges; |
2988 | |
2989 | if (n_basic_blocks_for_fn (fun) <= NUM_FIXED_BLOCKS + 1) |
2990 | return 0; |
2991 | |
2992 | df_set_flags (DF_DEFER_INSN_RESCAN); |
2993 | |
2994 | crossing_edges = find_rarely_executed_basic_blocks_and_crossing_edges (); |
2995 | if (!crossing_edges.exists ()) |
2996 | /* Make sure to process deferred rescans and clear changeable df flags. */ |
2997 | return TODO_df_finish; |
2998 | |
2999 | crtl->has_bb_partition = true; |
3000 | |
3001 | /* Make sure the source of any crossing edge ends in a jump and the |
3002 | destination of any crossing edge has a label. */ |
3003 | add_labels_and_missing_jumps (crossing_edges); |
3004 | |
3005 | /* Convert all crossing fall_thru edges to non-crossing fall |
3006 | thrus to unconditional jumps (that jump to the original fall |
3007 | through dest). */ |
3008 | fix_up_fall_thru_edges (); |
3009 | |
3010 | /* If the architecture does not have conditional branches that can |
3011 | span all of memory, convert crossing conditional branches into |
3012 | crossing unconditional branches. */ |
3013 | if (!HAS_LONG_COND_BRANCH) |
3014 | fix_crossing_conditional_branches (); |
3015 | |
3016 | /* If the architecture does not have unconditional branches that |
3017 | can span all of memory, convert crossing unconditional branches |
3018 | into indirect jumps. Since adding an indirect jump also adds |
3019 | a new register usage, update the register usage information as |
3020 | well. */ |
3021 | if (!HAS_LONG_UNCOND_BRANCH) |
3022 | fix_crossing_unconditional_branches (); |
3023 | |
3024 | update_crossing_jump_flags (); |
3025 | |
3026 | /* Clear bb->aux fields that the above routines were using. */ |
3027 | clear_aux_for_blocks (); |
3028 | |
3029 | crossing_edges.release (); |
3030 | |
3031 | /* ??? FIXME: DF generates the bb info for a block immediately. |
3032 | And by immediately, I mean *during* creation of the block. |
3033 | |
3034 | #0 df_bb_refs_collect |
3035 | #1 in df_bb_refs_record |
3036 | #2 in create_basic_block_structure |
3037 | |
3038 | Which means that the bb_has_eh_pred test in df_bb_refs_collect |
3039 | will *always* fail, because no edges can have been added to the |
3040 | block yet. Which of course means we don't add the right |
3041 | artificial refs, which means we fail df_verify (much) later. |
3042 | |
3043 | Cleanest solution would seem to make DF_DEFER_INSN_RESCAN imply |
3044 | that we also shouldn't grab data from the new blocks those new |
3045 | insns are in either. In this way one can create the block, link |
3046 | it up properly, and have everything Just Work later, when deferred |
3047 | insns are processed. |
3048 | |
3049 | In the meantime, we have no other option but to throw away all |
3050 | of the DF data and recompute it all. */ |
3051 | if (fun->eh->lp_array) |
3052 | { |
3053 | df_finish_pass (true); |
3054 | df_scan_alloc (NULL); |
3055 | df_scan_blocks (); |
3056 | /* Not all post-landing pads use all of the EH_RETURN_DATA_REGNO |
3057 | data. We blindly generated all of them when creating the new |
3058 | landing pad. Delete those assignments we don't use. */ |
3059 | df_set_flags (DF_LR_RUN_DCE); |
3060 | df_analyze (); |
3061 | } |
3062 | |
3063 | /* Make sure to process deferred rescans and clear changeable df flags. */ |
3064 | return TODO_df_finish; |
3065 | } |
3066 | |
3067 | } // anon namespace |
3068 | |
3069 | rtl_opt_pass * |
3070 | make_pass_partition_blocks (gcc::context *ctxt) |
3071 | { |
3072 | return new pass_partition_blocks (ctxt); |
3073 | } |
3074 | |