public class RollbackTracker
extends java.lang.Object
RollbackTracker is used to detect rollback periods in the log that are the
result of HA replica syncups. These rollback periods affect how LNs should
be processed at recovery. Rollbacks differ from aborts in that a rollback
returns an LN to its previous version, whether intra- or inter-txnal, while an
abort always returns an LN to its pre-txn version.
What is a Rollback Period?
--------------------------
The rollback represents the logical truncation of the log. Any transactional
LNs in that rollback period should be undone, even if they are ultimately
part of a committed transaction. See the wiki page on Syncup Recovery for
the full design. See com.sleepycat.je.rep.impl.node.Replay.rollback for the
steps taken at the time of the rollback.
A RollbackStart record is logged at the start of any rollback, and a
RollbackEnd is logged at the completion of a rollback. RollbackStarts refer
to a matchpoint and the area between the matchpoint and the RollbackStart is
the rollback period. The RollbackTracker peruses RollbackStarts and RollbackEnds and
generates a map of the rollback periods.
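As a minimal sketch, a single rollback period can be modeled as a value that holds the three LSNs described above. The class `RollbackPeriodSketch` and its members are hypothetical, not JE's API. LSNs are simplified to plain `long` values, and the sketch assumes the period excludes both the matchpoint itself (the last surviving entry) and the RollbackStart record.

```java
/**
 * Hypothetical value type for one rollback period (not JE's class).
 * LSNs are simplified to plain longs.
 */
final class RollbackPeriodSketch {
    final long matchpointLsn;     // last log entry that survives the rollback
    final long rollbackStartLsn;  // LSN of the RollbackStart record
    final Long rollbackEndLsn;    // null if no RollbackEnd was logged

    RollbackPeriodSketch(long matchpointLsn, long rollbackStartLsn, Long rollbackEndLsn) {
        if (matchpointLsn >= rollbackStartLsn) {
            throw new IllegalArgumentException("matchpoint must precede RollbackStart");
        }
        if (rollbackEndLsn != null && rollbackEndLsn <= rollbackStartLsn) {
            throw new IllegalArgumentException("RollbackEnd must follow RollbackStart");
        }
        this.matchpointLsn = matchpointLsn;
        this.rollbackStartLsn = rollbackStartLsn;
        this.rollbackEndLsn = rollbackEndLsn;
    }

    /** A RollbackStart may legitimately have no RollbackEnd. */
    boolean hasRollbackEnd() {
        return rollbackEndLsn != null;
    }

    /** Assumption: the period lies strictly between the matchpoint and the RollbackStart. */
    boolean contains(long lsn) {
        return lsn > matchpointLsn && lsn < rollbackStartLsn;
    }

    @Override
    public String toString() {
        return "LSN " + matchpointLsn + " -> " + rollbackStartLsn;
    }
}
```

Keeping the RollbackEnd nullable reflects the case, noted below, of a RollbackStart that never got a matching RollbackEnd.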
RollbackStarts and their starting Matchpoints can be nested or can be
distinct, but several invariants are in place and can be enforced. For
example:
LSN
---
100 txnA commit
200 txnB abort
250 LN for txnC
300 txnC abort
...
400 RollbackStart A (starting matchpoint = 200)
500 RollbackEnd A
...
600 RollbackStart B (starting matchpoint = 200)
700 RollbackStart C (starting matchpoint = 100)
800 RollbackEnd C
900 txnD abort
1000 RollbackStart D (starting matchpoint = 900)
This log creates four rollback periods:
1) LSN 100 -> 700 (defined by RollbackStart C). This has two rollback
periods nested within it.
2) LSN 200 -> 400 (defined by RollbackStart A), nested within B
3) LSN 200 -> 600 (defined by RollbackStart B), nested within C
4) LSN 900 -> 1000 (defined by RollbackStart D)
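The nesting relationships in the list above can be recomputed from each period's matchpoint and RollbackStart alone. In this sketch (the names `NestingSketch` and `Period` are hypothetical), a period encloses another when its matchpoint is <= the other's matchpoint and its RollbackStart comes after the other's RollbackStart. Since enclosing periods are themselves nested, the innermost one is the enclosing period with the smallest RollbackStart.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that recomputes which period each rollback period is nested in. */
final class NestingSketch {
    /** A period's name, matchpoint LSN and RollbackStart LSN (LSNs simplified to longs). */
    record Period(String name, long matchpoint, long rollbackStart) {
        /** The strict < on RollbackStart means a period never encloses itself. */
        boolean encloses(Period other) {
            return matchpoint <= other.matchpoint() && other.rollbackStart() < rollbackStart;
        }
    }

    /** Maps each period's name to the name of its innermost enclosing period, or null. */
    static Map<String, String> parents(Period... periods) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Period child : periods) {
            Period best = null;
            for (Period candidate : periods) {
                // Keep the enclosing candidate with the smallest RollbackStart.
                if (candidate.encloses(child)
                        && (best == null || candidate.rollbackStart() < best.rollbackStart())) {
                    best = candidate;
                }
            }
            result.put(child.name(), best == null ? null : best.name());
        }
        return result;
    }
}
```

Fed periods A (200, 400), B (200, 600), C (100, 700) and D (900, 1000), `parents` maps A to B and B to C, and gives C and D no parent.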
- There can be no commits or aborts within a rollback period, because we
should never have executed a soft recovery that undid a commit or abort.
- There can be no LN_TXs between a RollbackStart and its matching
RollbackEnd (no LN write operations should happen during the syncup).
However, there may be INs written by checkpointing and eviction.
- The recovery period should never see a RollbackEnd without its matching
RollbackStart record, though it is possible to see a RollbackStart that has
no RollbackEnd.
- Periods can never partially overlap or intersect, because a rollback
period is supposed to be like a truncation of the log. Since that log is
"gone", a subsequent rollback shouldn't find a matchpoint inside another
rollback period.
- A child period must be wholly contained between the parent's matchpoint
and RollbackStart. This is simply due to the way rollbacks occur. A parent
rollback has a Matchpoint <= the child's Matchpoint or it wouldn't be
nested. The parent's RollbackStart > the child's RollbackEnd, since the
parent occurs after the child in time.
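The structural invariants above (no partial overlap, and a child lying wholly between its parent's matchpoint and RollbackStart) can be checked pairwise. This is an illustrative sketch with hypothetical names; it treats a period's RollbackEnd as its last LSN when one was logged and falls back to the RollbackStart otherwise.

```java
import java.util.List;

/** Hypothetical pairwise check of the nesting and non-overlap invariants. */
final class InvariantSketch {
    /** rollbackEnd is null when no RollbackEnd was logged. */
    record Period(long matchpoint, long rollbackStart, Long rollbackEnd) {
        /** Last LSN belonging to this rollback: its RollbackEnd if present, else its RollbackStart. */
        long lastLsn() {
            return rollbackEnd != null ? rollbackEnd : rollbackStart;
        }

        /** The child must lie wholly between this period's matchpoint and RollbackStart. */
        boolean encloses(Period child) {
            return matchpoint <= child.matchpoint() && child.lastLsn() < rollbackStart;
        }

        /** One rollback finishes before the other's matchpoint. */
        boolean disjointFrom(Period other) {
            return lastLsn() < other.matchpoint() || other.lastLsn() < matchpoint;
        }
    }

    /** Throws IllegalStateException if two periods are neither disjoint nor nested. */
    static void check(List<Period> periods) {
        for (int i = 0; i < periods.size(); i++) {
            for (int j = i + 1; j < periods.size(); j++) {
                Period p = periods.get(i);
                Period q = periods.get(j);
                if (!(p.disjointFrom(q) || p.encloses(q) || q.encloses(p))) {
                    throw new IllegalStateException("periods overlap: " + p + ", " + q);
                }
            }
        }
    }
}
```

The four periods from the example pass this check, while a pair such as (100 -> 400, end 500) and (300 -> 600) is rejected because the second matchpoint falls inside the first period.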
The RollbackTracker keeps a list of all the rollback periods. Some are
distinct, some are nested.
Recovery processing and rollback periods
----------------------------------------
The actions taken at a rollback may not have been made persistent to the
log, so at recovery, we literally mimic and replay these two steps: (a) make
sure invisible log entries have their invisible bit on and (b) make sure all
INs reflect the correct LNs. All use of the rollback periods and the tracker
takes place during the backward scans. The RollbackStart and RollbackEnd
entries are read during the first recovery undo pass. When a rollback period
is found, a transaction chain is constructed for each transaction that was
active in the period, to support a repeat of the actions taken originally.
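Because the scan runs backwards, each RollbackEnd is seen before its RollbackStart. The sketch below pairs them under one stated assumption: a hypothetical RollbackEnd `Entry` carries the LSN of its RollbackStart in `refLsn`, while a RollbackStart entry carries its matchpoint there. The entry kinds and field names are illustrative, not JE's log format. A RollbackEnd still unpaired when the scan finishes violates the invariant that a RollbackEnd always has a matching RollbackStart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical backward scan that builds rollback periods from log entries. */
final class BackwardScanSketch {
    enum Kind { ROLLBACK_START, ROLLBACK_END, OTHER }

    /** ROLLBACK_START: refLsn = matchpoint. ROLLBACK_END: refLsn = its RollbackStart LSN. */
    record Entry(long lsn, Kind kind, long refLsn) {}

    /** rollbackEnd is null when no RollbackEnd was logged. */
    record Period(long matchpoint, long rollbackStart, Long rollbackEnd) {}

    /** Walks the log from its last entry to its first, pairing ends with starts. */
    static List<Period> scan(List<Entry> log) {
        Map<Long, Long> pendingEnds = new HashMap<>(); // RollbackStart LSN -> RollbackEnd LSN
        List<Period> periods = new ArrayList<>();
        for (int i = log.size() - 1; i >= 0; i--) {
            Entry e = log.get(i);
            switch (e.kind()) {
                case ROLLBACK_END -> pendingEnds.put(e.refLsn(), e.lsn());
                case ROLLBACK_START ->
                    periods.add(new Period(e.refLsn(), e.lsn(), pendingEnds.remove(e.lsn())));
                default -> { } // other entries do not affect the period map
            }
        }
        if (!pendingEnds.isEmpty()) {
            throw new IllegalStateException("RollbackEnd without RollbackStart: " + pendingEnds);
        }
        return periods;
    }
}
```

On the example log, the scan yields periods D, C, B and A in that order, with B and D having no RollbackEnd.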
The first undo pass, for the mapping tree, has to construct a map of
rollback periods. Since the mapping tree only has MapLNs, and we never write
any txnal MapLNs, that first pass does not encounter any txnal LNs. The
next two undo passes consult the rollback period map to determine if an LN
needs to be rolled back, or just treated like other LNs.
Rollback periods that precede the checkpoint start can be ignored, because
we can be assured that all the INs and LNs modified by that rollback were
made persistent by the checkpoint. Ignoring such periods is required, and
is not just an optimization, because it guarantees that we will not need to
create a transaction chain that needs to traverse the log beyond the first
active LSN. A rollback period precedes the checkpoint if its RollbackEnd is
before the checkpoint start.
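The checkpoint rule can be expressed as a filter over the tracked periods. In this minimal sketch (hypothetical names), a period is dropped only when it has a RollbackEnd whose LSN is before CkptStart. A period with no logged RollbackEnd is kept; that is a conservative assumption on our part, since the rule above is stated in terms of the RollbackEnd.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical filter that drops rollback periods already covered by a checkpoint. */
final class CheckpointFilterSketch {
    /** rollbackEnd is null when no RollbackEnd was logged. */
    record Period(long matchpoint, long rollbackStart, Long rollbackEnd) {
        /** A period precedes the checkpoint if its RollbackEnd is before CkptStart. */
        boolean precedes(long ckptStartLsn) {
            return rollbackEnd != null && rollbackEnd < ckptStartLsn;
        }
    }

    /** Keeps only the periods that recovery must still act on. */
    static List<Period> relevant(List<Period> periods, long ckptStartLsn) {
        return periods.stream()
                .filter(p -> !p.precedes(ckptStartLsn))
                .collect(Collectors.toList());
    }
}
```

For example, with CkptStart at LSN 650, period A (RollbackEnd 500) is dropped, while C (RollbackEnd 800) and B (no RollbackEnd) are kept.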
When a rollback period overlaps CkptStart and we recover, we are guaranteed
that the undo passes will process all LNs in the rollback period, because
they are >= the first active LSN of the checkpoint.
The first active LSN for the checkpoint will be <= the LSN of the first LN
of any transaction that is being rolled back at the time of CkptStart, since
these transactions were still active at that time.
No file containing a transaction rolled back in the recovery interval, or a
file containing the abortLSN of such a transaction, will be deleted by the
cleaner. An active transaction prevents cleaning of its first logged entry
and beyond. The LN of the abortLSN will be locked, which prevents it from
being cleaned.
All the work lies on the undo side. Recovery redo only needs to ignore
invisible log entries, because we know that the undo pass applied the
invisible bit where needed. Note that the undo pass must write the invisible
bits to the log after the pass completes, before redo attempts to read the log.
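The division of labor between undo and redo can be sketched as two small loops. The `Entry` type with a mutable `invisible` flag is a hypothetical in-memory stand-in for a log entry's invisible bit; in a real log the bits would have to be written out between the two passes, as described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical model of undo setting invisible bits and redo skipping them. */
final class InvisibleRedoSketch {
    /** Stand-in for a log entry: an LSN plus its invisible bit. */
    static final class Entry {
        final long lsn;
        boolean invisible;

        Entry(long lsn) {
            this.lsn = lsn;
        }
    }

    /** Undo side: set the invisible bit on every rolled-back entry. */
    static void markInvisible(List<Entry> log, Set<Long> rolledBackLsns) {
        for (Entry e : log) {
            if (rolledBackLsns.contains(e.lsn)) {
                e.invisible = true;
            }
        }
        // A real implementation must persist these bits here, before redo reads the log.
    }

    /** Redo side: replay every entry that is not invisible; returns the replayed LSNs. */
    static List<Long> redo(List<Entry> log) {
        List<Long> replayed = new ArrayList<>();
        for (Entry e : log) {
            if (!e.invisible) {
                replayed.add(e.lsn);
            }
        }
        return replayed;
    }
}
```

Redo needs no knowledge of rollback periods at all; it only tests one bit per entry, which is why all of the period logic can live on the undo side.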
Each rollback LN_TX belongs to a single rollback period. When periods are
nested, the LN_TX belongs to the closest rollback period that encompasses
it.
Using the example above:
an LN at LSN 350 belongs to rollback period A
an LN at LSN 550 belongs to rollback period B
an LN at LSN 650 belongs to rollback period C
Each LN uses its rollback period's txn chain to find its previous version.
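Finding the closest enclosing period is a range lookup. One way to do it, sketched here with hypothetical names, is to index periods by RollbackStart LSN in a `java.util.TreeMap` and walk, in ascending order, the periods whose RollbackStart follows the LN. Because periods never partially overlap, the first of those whose matchpoint precedes the LN is the innermost period that contains it.

```java
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

/** Hypothetical index for finding the innermost rollback period containing an LSN. */
final class ClosestPeriodSketch {
    record Period(String name, long matchpoint, long rollbackStart) {}

    // Periods keyed by their RollbackStart LSN, which is unique per period.
    private final TreeMap<Long, Period> byStart = new TreeMap<>();

    void add(Period p) {
        byStart.put(p.rollbackStart(), p);
    }

    /**
     * Periods whose RollbackStart is after lsn are visited in ascending order;
     * the first whose matchpoint is before lsn encloses it most tightly.
     */
    Optional<Period> closest(long lsn) {
        for (Map.Entry<Long, Period> e : byStart.tailMap(lsn, false).entrySet()) {
            if (e.getValue().matchpoint() < lsn) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty();
    }
}
```

With periods A through D from the example, `closest(350)`, `closest(550)` and `closest(650)` return A, B and C respectively, and an LSN outside every period, such as 50, returns an empty result.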