Limitations of NSPersistentCloudKitContainer when importing many records

Introduction

The first version of Daily, a time tracking app for Mac, was released in 2013. It used Core Data to store data. Later, when NSPersistentCloudKitContainer was introduced, I added iCloud support for synchronization and backup. Currently, I'm working on an iOS version that lets existing users view and edit timesheets while away from their Mac, so synchronization is crucial. Unfortunately, I've run into several limitations, which I describe in this article. I'm writing this in the hope of getting the community's help and of making people aware of these limitations when picking a cloud technology.

Data model & collaboration mechanism

Before diving into the limitations, let's first look at the data model, which is relatively simple. A simplified version is shown below. It contains an Activity, which might belong to a Group. It also contains a Contribution, which always has a relationship to an Activity.

[Figure: simplified data model (Contribution.png)]

The timesheet (showing what a user has been doing) for a date is calculated by summing the duration of all Contribution records whose date matches that date. You might wonder why there isn't simply a Day entity with a single record per activity per date, which would make the sum unnecessary. The reason is that cloud-backed data stores need to deal with "conflicts": multiple (offline) clients might write to the same record without each of them (immediately) knowing. CloudKit uses a last-writer-wins strategy to deal with this situation. In practice, for Daily, this could mean that clients constantly overwrite each other's duration, resulting in inaccurate timesheets.

To solve this, Daily uses a collaboration mechanism allowing (offline) clients to contribute changes. There's a great WWDC session (from 26:40) covering this mechanism. For Daily, a client contributes a change (to a timesheet) by adding a Contribution record including the date the change is for, the duration of the Contribution and a reference to an Activity.
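To make the mechanism concrete, here is a minimal sketch in plain Swift. It is not Daily's actual code: the Contribution struct is a hypothetical value-type stand-in for the Core Data entity, the activity is reduced to a name, and the date is a plain string. It only illustrates that appending records and summing them preserves every client's change, where overwriting a single shared duration would not.

```swift
import Foundation

// Hypothetical stand-in for the Contribution entity (assumption, not Daily's model class).
struct Contribution {
    let activity: String        // stands in for the relationship to an Activity
    let date: String            // e.g. "2022-07-01"; the real model presumably stores a Date
    let duration: TimeInterval  // seconds
}

/// Builds the timesheet for one date by summing durations per activity.
func timesheet(for date: String, from contributions: [Contribution]) -> [String: TimeInterval] {
    var totals: [String: TimeInterval] = [:]
    for contribution in contributions where contribution.date == date {
        totals[contribution.activity, default: 0] += contribution.duration
    }
    return totals
}

// Two offline clients each append their own record instead of overwriting a shared one.
let fromMac = Contribution(activity: "Writing", date: "2022-07-01", duration: 1800)
let fromPhone = Contribution(activity: "Writing", date: "2022-07-01", duration: 600)
let nextDay = Contribution(activity: "Writing", date: "2022-07-02", duration: 300)

let sheet = timesheet(for: "2022-07-01", from: [fromMac, fromPhone, nextDay])
print(sheet["Writing"] ?? 0) // 2400.0
```

With a single record per activity per date, last-writer-wins would have kept either 1800 or 600 here, not their sum. The price is that the number of records grows with every change.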

Limitations

Initial import may take long

As you can imagine, this collaboration mechanism, combined with users using Daily daily (😜) for almost ten years, has resulted in many records. As an example, I have around 5,000 Contribution records. While this shouldn't be a problem for a database (and it isn't for Core Data), NSPersistentCloudKitContainer struggles with the initial import of those records after a fresh app installation. I have users reporting that it takes multiple hours for all data to appear.

Importing my (~5,000) records takes up to an hour. Throughout this process, when attached to a debugger, the Core Data framework logs a large number of messages, including many like this one:

CoreData: debug: CoreData+CloudKit: -[PFCloudKitSerializer applyUpdatedRecords:deletedRecordIDs:toStore:inManagedObjectContext:onlyUpdatingAttributes:andRelationships:madeChanges:error:]_block_invoke(1018): Failed to find matching objectIDs for <[object] in pending relationship: [object]>

What I understand from Apple (after requesting code-level support) is that this is expected behavior: the framework imported a record with a relationship to another record that hasn't been imported yet.

You cannot control the import order

Daily's users are likely most interested in their most recent data. Seeing today's data within seconds or minutes would make the app feel usable much sooner. Right now, it feels like older data, which has less value, gets imported first. Unfortunately, there is no way to control which data gets imported first.

Also, related to the previous limitation, in Daily's case, it would make sense to load all Group and Activity records first so that relationships of all following Contribution records would resolve immediately. I'm unsure what effect this would have on the time needed to import records, though.

Importing only happens in the foreground

Another major limitation is that importing happens exclusively while the app runs in the foreground. When the app is moved to the background, the import process continues for a few seconds but is quickly suspended. In a way, this is understandable, as importing seems to use a lot of energy. However, it also results in a bad user experience, as I need to tell users to keep the app open until all data has appeared. And since this can take up to a few hours, few users are likely to wait that long.

It would be helpful if the operating system continued the import process in the background unless Low Power Mode is enabled. Especially when the iPhone or iPad is connected to a power source, the import process should continue.

Apple engineers have suggested experimenting with Background Tasks to grant the app some extra time to continue the import process. Still, I believe the framework should handle this implicitly, especially as the import process can take an unpredictable amount of time, which Background Tasks doesn't guarantee to provide. Even according to Apple's engineers, it's a workaround rather than an optimal solution.
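For reference, such an experiment could look roughly like the sketch below. It is untested against a real import and rests on assumptions: the task identifier is hypothetical (and would also have to be listed under BGTaskSchedulerPermittedIdentifiers in Info.plist), and I'm assuming that a loaded NSPersistentCloudKitContainer keeps importing for as long as the process is allowed to run. As far as I know there is no public call to ask for "import finished", so the sketch simply holds the task open until the system expires it. The iOS-only parts are guarded because Daily also ships on the Mac.

```swift
import Foundation
#if os(iOS)
import BackgroundTasks
#endif

// Hypothetical identifier, chosen for this example only.
let importTaskIdentifier = "com.example.daily.cloudkit-import"

#if os(iOS)
/// Registers the launch handler. Call this once during app launch.
func registerImportTask() {
    let registered = BGTaskScheduler.shared.register(
        forTaskWithIdentifier: importTaskIdentifier,
        using: nil
    ) { task in
        // Assumption: the persistent container is already loaded and the
        // framework uses this window to continue importing on its own.
        // The task is only ended when the system says time is up.
        task.expirationHandler = {
            task.setTaskCompleted(success: false)
        }
    }
    if !registered {
        print("Could not register \(importTaskIdentifier)")
    }
}

/// Asks the system for processing time, preferably while charging.
/// Call this when the app moves to the background.
func scheduleImportTask() {
    let request = BGProcessingTaskRequest(identifier: importTaskIdentifier)
    request.requiresNetworkConnectivity = true // the import needs iCloud access
    request.requiresExternalPower = true       // importing seems energy-hungry
    do {
        try BGTaskScheduler.shared.submit(request)
    } catch {
        print("Could not schedule import task: \(error)")
    }
}
#endif
```

The system decides if and when such a task runs and for how long, so this can shorten the wait at best. It does not make the import predictable.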

As a side note, and interestingly, Apple's own NSPersistentCloudKitContainer implementation is causing Daily to be terminated:

[BackgroundTask] Background Task ("CoreData: CloudKit Import"), was created over 30 seconds ago. In applications running in the background, this creates a risk of termination. Remember to call UIApplication.endBackgroundTask(_:) for your task in a timely manner to avoid this.
[error] error: Unexpected background task assertion cancellation.
[BackgroundTask] Background task still not ended after expiration handlers were called: <_UIBackgroundTaskInfo: 0x282204880>: taskID = 21, taskName = CoreData: CloudKit Import, creationTime = 453429 (elapsed = 37). This app will likely be terminated by the system. Call UIApplication.endBackgroundTask(_:) to avoid this.
[error] error: Unexpected background task assertion cancellation.
[BackgroundTask] Background task still not ended after expiration handlers were called: <_UIBackgroundTaskInfo: 0x282204880>: taskID = 21, taskName = CoreData: CloudKit Import, creationTime = 453429 (elapsed = 64). This app will likely be terminated by the system. Call UIApplication.endBackgroundTask(_:) to avoid this.

Conclusion

NSPersistentCloudKitContainer is a promising technology that brings many benefits. However, as with many other technologies from Apple, its closedness and inflexibility are sometimes challenging for a developer, especially when dealing with many records. There's room for improvement, and I was hoping to see those improvements during last month's WWDC.