Automatic Reference Counting
Automatic Reference Counting (ARC) is a feature of the Objective-C language
that provides automatic memory management of Objective-C objects and blocks.
The compiler automatically adds calls to retain and release,
similar to what the programmer would have written manually without ARC.
Unlike a garbage collector, it doesn't change the runtime model.
It was announced at WWDC 2011, and is fully supported in OS X 10.7 (Lion) and iOS 5. By using a compatibility library called "ARCLite", apps can also use it when targeting OS X 10.6 (Snow Leopard) and iOS 4, though without zeroing weak references.
For a general explanation of what ARC is, the WWDC 2011 talk is probably the best resource. This article (for now) will focus on the internal changes, which is especially useful when reading code from a decompiler.
Runtime functions
ARC came with several new Objective-C runtime functions.
For example, the usual explanation of ARC is that
it adds [obj retain] and [obj release] calls for you,
but internally, it's actually calling the new functions
objc_retain(obj) and objc_release(obj).
The rationale for using these new functions is that
a C function call requires smaller code than an Objective-C message send,
and more importantly, that the clang optimizer can more easily identify and manipulate
retains and releases in the generated code.
The full list of runtime functions is in the clang documentation.
Autoreleased return values
This section documents how things work in OS X 10.7 (Lion) and iOS 5. There were important changes in iOS 9 and again in iOS 16.
There's a special optimization for autoreleased return values. Some methods such as getters are supposed to return objects with a net +0 reference count, which is done by autoreleasing the object before returning it. The caller of those methods often retains the return value immediately after receiving it.
When using manual reference counting, this is what a developer would write manually:
-(NSString*)name {
    NSString* retval = [[NSString alloc] initWithFormat: ...];
    return [retval autorelease];
}
-(void)caller {
    self->thingName = [[self->thing name] retain];
}
Under ARC, the source code would not have any memory management calls
and would simply have return retval;.
The compiler will add the needed retain and release calls automatically.
However, it's wasteful to autorelease the object (conceptually decrementing the reference count) just to have the caller immediately retain it again. In addition, putting an object into the autorelease pool has its own overhead costs, and means the object stays alive longer than it otherwise would.
Instead of using objc_autorelease and objc_retain, the compiler generates these runtime calls:
-(NSString*)name {
    NSString* retval = [[NSString alloc] initWithFormat: ...];
    return objc_autoreleaseReturnValue(retval);
}
-(void)caller {
    self->thingName = objc_retainAutoreleasedReturnValue([self->thing name]);
}
(Note that objc_retainAutoreleaseReturnValue and objc_retainAutoreleasedReturnValue are very different functions!)
At runtime, these two functions basically detect if the other is present, and if so they skip the reference count operations. If an optimized callee detects that the caller is also optimized, it doesn't autorelease the object. If an optimized caller detects the callee is also optimized, it doesn't retain the object again. This effectively "hands off ownership" of the retain count from callee to caller, without needing extra retain/release calls.
- In the caller, the compiler emits the call to objc_retainAutoreleasedReturnValue in a specific way that the callee can recognize (architecture-specific, see below).
- At runtime, objc_autoreleaseReturnValue looks at the caller's instructions as data, by reading via the return address. Since the call to autoreleaseReturnValue is (usually) a tail call, this reads instructions from the actual caller method, not the callee.
- If the instructions show that the caller is also "optimized", then autoreleaseReturnValue stores the object pointer in a dedicated thread-local storage (TLS) slot, and doesn't perform the autorelease. Otherwise, it calls objc_autorelease as usual.
- In the caller, objc_retainAutoreleasedReturnValue checks if the object pointer matches the TLS slot. If so, it means the callee did the optimization, so retainAutoreleasedReturnValue can skip the retain call because the object is already at +1 refcount. It will clear the TLS slot for the next call and return. If the TLS slot doesn't match, it means the callee is not optimized, so retainAutoreleasedReturnValue retains the object as usual.
This skips both the autorelease on the callee and the retain on the caller.
Note that this mechanism ensures that if either the caller or the callee is not optimized
(e.g. because it isn't using ARC),
the other side will do the normal autorelease or retain operation
to preserve correctness.
The same fallback applies if the compiler doesn't tail-call objc_autoreleaseReturnValue
in the callee for whatever reason.
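The handoff described above can be modeled in plain C. This is only a rough simulation, not the runtime's code: SimObject, the sim_ functions, the toy autorelease pool, and the caller_is_optimized flag (which stands in for the return-address inspection) are all hypothetical names invented for illustration, and the "TLS slot" is an ordinary global here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an Objective-C object: only a retain count. */
typedef struct { int refcount; } SimObject;

/* Stand-in for the dedicated TLS slot (a plain global in this sketch). */
static SimObject *sim_tls_slot = NULL;

/* Toy autorelease pool: objects waiting for a deferred release. */
static SimObject *sim_pool[8];
static size_t sim_pool_count = 0;

static SimObject *sim_retain(SimObject *obj) { obj->refcount++; return obj; }

static SimObject *sim_autorelease(SimObject *obj) {
    sim_pool[sim_pool_count++] = obj;
    return obj;
}

static void sim_drain_pool(void) {
    while (sim_pool_count > 0) sim_pool[--sim_pool_count]->refcount--;
}

/* Callee side. The flag replaces the real inspection of the caller's code. */
static SimObject *sim_autoreleaseReturnValue(SimObject *obj, bool caller_is_optimized) {
    if (caller_is_optimized) {
        sim_tls_slot = obj;   /* hand off the +1 instead of autoreleasing */
        return obj;
    }
    return sim_autorelease(obj);
}

/* Caller side. */
static SimObject *sim_retainAutoreleasedReturnValue(SimObject *obj) {
    if (obj == sim_tls_slot) {
        sim_tls_slot = NULL;  /* clear for the next call; object is already +1 */
        return obj;
    }
    return sim_retain(obj);
}

typedef struct {
    size_t pooled;       /* objects placed in the pool by the call */
    int held_refcount;   /* refcount while the caller holds the object */
    int final_refcount;  /* refcount after the pool is drained */
} SimResult;

/* Runs one callee/caller pair; "not optimized" means plain autorelease/retain. */
static SimResult sim_run(bool callee_optimized, bool caller_optimized) {
    SimObject obj = { 1 };  /* fresh from alloc/init: +1 */
    SimObject *ret;
    SimResult r;

    sim_tls_slot = NULL;
    sim_pool_count = 0;

    if (callee_optimized)
        ret = sim_autoreleaseReturnValue(&obj, caller_optimized);
    else
        ret = sim_autorelease(&obj);

    if (caller_optimized)
        ret = sim_retainAutoreleasedReturnValue(ret);
    else
        ret = sim_retain(ret);

    r.pooled = sim_pool_count;
    r.held_refcount = ret->refcount;
    sim_drain_pool();
    r.final_refcount = obj.refcount;
    return r;
}
```

In this model, every combination ends with the caller owning exactly one reference once the pool is drained. Only the pair where both sides are optimized skips the pool and the extra retain entirely.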
objc_retainAutoreleaseReturnValue
When the object being returned does not already have a +1 reference count,
it has to be retained first.
For example, when returning an ivar, return self->name
turns into return [[self->name retain] autorelease].
Normally this would be compiled into objc_autoreleaseReturnValue(objc_retain(self->name)),
but instead the compiler emits a call to another runtime function, objc_retainAutoreleaseReturnValue(self->name)
(again, this is different from "retainAutoreleasedReturnValue").
Before iOS 9 and OS X 10.11,
this function was simply a wrapper for objc_autoreleaseReturnValue(objc_retain(obj)).
It exists as a combined function only as a minor code size optimization.
(In iOS 9+ / OS X 10.11+, this is more complex; to be documented later.)
How the optimization is recognized
The way the callee recognizes if the caller is optimized depends on the CPU architecture:
x86_64
On x86_64, the compiler ensures that the caller method calls objc_retainAutoreleasedReturnValue
immediately after calling the callee,
with nothing in between except the move of the return value into the first argument register.
This means the caller (in Intel syntax) looks like:
call objc_msgSend ; call [self->thing name], return value in rax
mov rdi, rax ; use it as the first argument of the next call
call objc_retainAutoreleasedReturnValue
In the callee, objc_autoreleaseReturnValue reads from the return address
to detect the pattern of mov rdi, rax
followed by a call or jump to objc_retainAutoreleasedReturnValue.
Code comments in the objc4 runtime say the same approach is used by 32-bit x86, but this isn't true. The optimization is disabled altogether on x86 (both OS X and iOS Simulator).
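As a rough illustration of the x86_64 check, the sketch below matches the instruction bytes that would sit at the return address. The function name and the buffer-based interface are hypothetical. It covers only the mov rdi, rax (bytes 48 89 c7) plus direct call (opcode e8) form, and it does not model the jump form or the step of confirming that the call target really is objc_retainAutoreleasedReturnValue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: `code` points at the bytes found at the return
   address, `len` is how many of them may safely be read. */
static bool sim_x86_64_caller_is_optimized(const uint8_t *code, size_t len) {
    /* 48 89 c7 = mov rdi, rax ; e8 = opcode of call rel32 */
    static const uint8_t pattern[] = { 0x48, 0x89, 0xc7, 0xe8 };
    size_t i;

    if (len < sizeof pattern) return false;
    for (i = 0; i < sizeof pattern; i++) {
        if (code[i] != pattern[i]) return false;
    }
    /* Not modeled: following the rel32 displacement to verify the target. */
    return true;
}
```

A real check has to read live code through the return address. Working on a byte buffer keeps this sketch runnable on any host.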
armv7
In the caller, the compiler ensures the method call
is immediately followed by the marker instruction mov r7, r7:
bl objc_msgSend ; call [self->thing name]
mov r7, r7
bl objc_retainAutoreleasedReturnValue
In the callee, objc_autoreleaseReturnValue reads from the return address
and checks for the mov r7, r7 instruction.
Both the ARM and Thumb instruction encodings are checked.
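A minimal sketch of the marker test for both encodings follows. It assumes the standard encodings of mov r7, r7 (0xe1a07007 in ARM mode, 0x463f in Thumb mode) and little-endian instruction bytes. The function name is hypothetical, and the thumb_mode flag is a simplification: on real hardware the mode would be inferred from the return address itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: `code` points at the instruction bytes found at the
   return address (at least 4 readable bytes in ARM mode, 2 in Thumb mode). */
static bool sim_armv7_caller_is_optimized(const uint8_t *code, bool thumb_mode) {
    if (thumb_mode) {
        /* Thumb: 16-bit instruction, stored little-endian as 3f 46 */
        uint16_t insn = (uint16_t)(code[0] | (code[1] << 8));
        return insn == 0x463f;
    } else {
        /* ARM: 32-bit instruction, stored little-endian as 07 70 a0 e1 */
        uint32_t insn = (uint32_t)code[0]
                      | ((uint32_t)code[1] << 8)
                      | ((uint32_t)code[2] << 16)
                      | ((uint32_t)code[3] << 24);
        return insn == 0xe1a07007u;
    }
}
```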
arm64
In the caller, the compiler ensures the method call
is immediately followed by the marker instruction mov x29, x29:
bl objc_msgSend ; call [self->thing name]
mov x29, x29
bl objc_retainAutoreleasedReturnValue
In the callee, objc_autoreleaseReturnValue reads from the return address
and checks for the mov x29, x29 instruction.
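On arm64 every instruction is 32 bits wide, so the marker test reduces to a single word comparison. The sketch below assumes the standard encoding of mov x29, x29 (an alias of orr x29, xzr, x29, encoded as 0xaa1d03fd); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: `code` points at the 4 instruction bytes found at the
   return address, stored little-endian as fd 03 1d aa for the marker. */
static bool sim_arm64_caller_is_optimized(const uint8_t *code) {
    uint32_t insn = (uint32_t)code[0]
                  | ((uint32_t)code[1] << 8)
                  | ((uint32_t)code[2] << 16)
                  | ((uint32_t)code[3] << 24);
    return insn == 0xaa1d03fdu;  /* mov x29, x29 */
}
```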
Resources
- "Introducing Automatic Reference Counting", session 323, WWDC 2011
- "Objective-C Advancements In-Depth", session 322, WWDC 2011
- "Adopting Automatic Reference Counting", session 406, WWDC 2012; largely a rehash of the 2011 introduction talk.
- "Improve app size and runtime performance", WWDC 2022, which explains the new version of autorelease elision in iOS 16.
- "Transitioning to ARC" release notes
- "Objective-C Automatic Reference Counting (ARC)", Clang documentation