Building Mobile Apps at Scale

Author: Gergely Orosz

Notes:

Challenges:

Managing state in mobile apps is important because many events can change the state: not just user input, but also permission changes, background/foreground transitions, limited resources from the OS (which are shared with other applications), and multiple entry points into the app (deeplinks).
Mistakes are hard to revert, because releasing a new version with bug fixes requires App Store approval. Users also won’t update as soon as the new version is released on the App Store.
- There are a few things we can do to minimize bugs and regressions:
  - Test thoroughly. If possible, release an internal build for employees to use before releasing the new version.
  - Use feature flags
  - Gradual rollout, to test the stability of the new version
  - Force users to upgrade
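
Feature flags and gradual rollouts are often the same mechanism. A minimal sketch, assuming each user has a stable ID and the flag config (the hypothetical `FlagConfig`) is fetched from the backend: hashing the flag name plus user ID into a bucket from 0 to 99 keeps each user's assignment stable across sessions, so raising `rollout_percent` only adds users and never flips existing ones back.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagConfig:
    """Hypothetical remote flag config, e.g. delivered by the backend."""
    name: str
    enabled: bool          # global kill switch for the feature
    rollout_percent: int   # 0..100


def bucket(user_id: str, flag_name: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce modulo 100.
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: FlagConfig, user_id: str) -> bool:
    if not flag.enabled:
        return False
    return bucket(user_id, flag.name) < flag.rollout_percent
```

Salting the hash with the flag name means different flags pick different slices of users, so the same 10% of people are not always the guinea pigs.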
Old app versions will most likely stay around for years, especially for users with older phones (e.g. an iPhone 6 on iOS 12). Use versioning (e.g. of backend endpoints) where possible, and monitor usage when planning to deprecate or remove support for older versions.
Keep deeplinks backward compatible. Plan how to handle deeplinks, especially when they cause side effects on the current state.
Push notifications are challenging to test. We can test them in the simulator, but the payload should exactly mimic what the backend team sends. Users can also opt out of push notifications.
App crashes are the most noticeable bugs in mobile apps. App Store/TestFlight already provide crash logs if users opt in to share them with developers. Several third-party crash reporting tools, such as Crashlytics or Bugsnag, provide richer logs and more diagnostic information. Crashes are sometimes hard to reproduce because of the many combinations of factors that can lead to them: device state, network, OS version, app version, etc.
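
One way to keep deeplinks backward compatible is to route every incoming URL through a single table and map legacy paths onto current routes instead of deleting them. A minimal sketch; the `myapp://` scheme, the paths, and the `Route` type are hypothetical. Unknown links fall back to a safe screen rather than crashing.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass
class Route:
    screen: str
    params: dict


# Paths that older app versions (or old emails, ads, QR codes) still use.
LEGACY_PATHS = {
    "/ride/history": "/trips",
    "/wallet": "/payments",
}

CURRENT_PATHS = {"/trips", "/payments", "/home"}


def resolve(url: str) -> Optional[Route]:
    parsed = urlparse(url)
    if parsed.scheme != "myapp":
        return None
    # With custom schemes, "myapp://trips" puts "trips" in netloc, so join both parts.
    path = "/" + (parsed.netloc + parsed.path).strip("/")
    path = LEGACY_PATHS.get(path, path)
    if path not in CURRENT_PATHS:
        # Unknown link: land somewhere safe instead of a dead end.
        return Route(screen="/home", params={})
    params = {key: values[0] for key, values in parse_qs(parsed.query).items()}
    return Route(screen=path, params=params)
```

Returning a `Route` value instead of navigating directly leaves the navigation layer to decide how to apply it to the current state (for example, whether to reset the back stack of an already running app).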
- Bugsnag published metrics on median app stability:
  - 99.46% for apps built by 1-10 engineers
  - 99.60% for apps built by 11-50 engineers
  - 99.89% for apps built by 51-100 engineers
  - 99.79% for apps built by 100+ engineers
Handling offline mode and spotty networks should be planned properly, including how to sync data to the backend, retry mechanisms, etc.
Accessibility should be supported from the beginning, because implementing it from the start is low effort compared to adding it later.
Use CI/CD to automate app delivery. Fastlane is the most popular tool. Homegrown build infrastructure gives more control and can be tailored to suit the team; however, be wary if there is no dedicated person or team to maintain it, because it often breaks.
When using a third-party library or SDK, consider a few things: security, stability, and support when there’s a bug. Also consider how much the library will add to the app size.
- If possible, wrap it in an additional layer or interface, so that swapping the library requires few changes to the codebase.
When you have a few screens, navigation seems simple. Once you have dozens, you need a robust system to handle navigation.
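
A minimal sketch of a retry policy for spotty networks: exponential backoff with full jitter, capped at a maximum delay, so that many clients reconnecting at once do not hit the backend in lockstep. The `send` callable and the `TransientError` type are hypothetical stand-ins for the app's networking layer, and `sleep` is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical error type for timeouts or lost connectivity."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    rng = rng or random.Random()
    return rng.uniform(0, min(cap, base * (2 ** attempt)))


def send_with_retries(send: Callable[[], str], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep,
                      rng: Optional[random.Random] = None) -> str:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return send()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller queue the request for later
            sleep(backoff_delay(attempt, rng=rng))
    raise AssertionError("unreachable")
```

When retries are exhausted, the request would typically be persisted in a local queue and replayed once connectivity returns; that sync part is left out of this sketch.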
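
A minimal sketch of such a layer, using an analytics SDK as the example: feature code depends only on the `Analytics` protocol, and a thin adapter translates calls into the vendor SDK. `VendorSdk` and its `log_event` method are hypothetical placeholders, not a real library's API.

```python
from typing import Dict, List, Protocol, Tuple


class Analytics(Protocol):
    """The only analytics API the rest of the app is allowed to use."""
    def track(self, event: str, properties: Dict[str, str]) -> None: ...


class VendorSdk:
    """Hypothetical stand-in for a third-party SDK with its own naming."""
    def __init__(self) -> None:
        self.sent: List[Tuple[str, Dict[str, str]]] = []

    def log_event(self, name: str, attrs: Dict[str, str]) -> None:
        self.sent.append((name, attrs))


class VendorAnalytics:
    """Adapter: the only class that knows VendorSdk exists."""
    def __init__(self, sdk: VendorSdk) -> None:
        self._sdk = sdk

    def track(self, event: str, properties: Dict[str, str]) -> None:
        self._sdk.log_event(name=event, attrs=dict(properties))


def checkout_completed(analytics: Analytics, order_id: str) -> None:
    # Feature code sees only the protocol, never the vendor type.
    analytics.track("checkout_completed", {"order_id": order_id})
```

Swapping vendors then means writing one new adapter class rather than touching every call site.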
When shipping localization, decide whether strings come from the backend or ship alongside the app binary. With the latter, updating or correcting a mistake requires an app release. Doing localization in the backend reduces both the logic and the resources on the client. Think beyond strings, too: currency formatting, RTL languages, unusual locales, etc.
- Snapshot testing can be used as a testing tool for localization.
When apps become larger, it’s a no-brainer to extract functionality or components into their own modules. Use an architecture that’s easy to test and composable (e.g. The Composable Architecture).
Use automated tests: unit, snapshot, integration, and UI tests.
Improve build times. Build systems such as Bazel or Buck can improve on the default build process (xcodebuild on iOS, Gradle on Android).
When adopting a new language or framework, consider a few things:
- the maturity of the language or framework
- the migration plan
- the risks (impact on the app, on the business?)
Backend-driven mobile apps are the next big thing.

Highlights:

“State management is the root of most headaches for native mobile development, similar to modern web and backend development.”

“Examples of the app-level lifecycle transitions are the app pausing and going to the background, then returning to the foreground or being suspended.”

“Reactive programming is a preferred method for dealing with a large and stateful app, in order to isolate state changes. You keep state as immutable as possible, storing models as immutable objects that emit state changes.”
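
A minimal sketch of that idea in plain Python: state lives in a frozen (immutable) dataclass, every change produces a new value via `dataclasses.replace`, and subscribers are notified with each new state. `AppState`, `Store`, and the field names are illustrative, not from the book; a production app would more likely use a reactive framework such as Combine, RxSwift, or Kotlin Flow.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class AppState:
    is_logged_in: bool = False
    is_in_background: bool = False
    location_permission: bool = False


Listener = Callable[[AppState], None]


class Store:
    def __init__(self, initial: AppState) -> None:
        self._state = initial
        self._listeners: List[Listener] = []

    @property
    def state(self) -> AppState:
        return self._state

    def subscribe(self, listener: Listener) -> None:
        self._listeners.append(listener)
        listener(self._state)  # emit the current state immediately

    def update(self, **changes) -> None:
        new_state = replace(self._state, **changes)  # build a new value, never mutate
        if new_state == self._state:
            return  # nothing changed, so nothing is emitted
        self._state = new_state
        for listener in list(self._listeners):
            listener(new_state)
```

Lifecycle and permission callbacks then become single calls such as `store.update(is_in_background=True)`, and every screen reacts to the same emitted value instead of keeping its own copy.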

“Applications sharing the same resources with all other apps, and the OS killing apps at short notice, are two of the biggest differences between developing for mobile versus developing for other platforms, like backend and the web.”

“App launch points like deeplinks or internal shortcut navigation points within the app also add complexity to state management. With deeplinks, the application state might need to be set up after the deeplink is activated.”

“Both Apple and Google are strict on allowing executable code to be sent to apps.”

“Users take days to update to the latest version”

“However, pushing bug fixes that revert broken functionality should be within both stores’ policies: for example, when using feature flags. At the same time, Apple does allow executing non-native code like JavaScript, which is why solutions like Codepush are gaining popularity. Codepush allows React Native or Cordova apps to deliver updates on the fly.”

“You can not assume that all users will get this updated version, ever”

“Chuck Rossi, part of release engineering at Facebook, summarizes what it is like to release for mobile on a Software Engineering Daily podcast episode: “It was the most terrifying thing to take 10,000 diffs, package it into effectively a bullet, fire that bullet at the horizon and that bullet, once it leaves the barrel, it’s gone. I cannot get it back, and it flies flat and true with no friction and no gravity till the heat death of the universe. It’s gone. I can’t fix it.””

“A common approach at many companies is to release the beta app to company employees and beta users for it to “bake” for a week, collecting feedback on any issues.”

“Have a feature flagging system” “Consider gradual rollouts” “Force upgrading”

“Even a non-breaking backend change can break an older version of the app - such as changing the content of a specific response. A few practices you can do to avoid this breakage”

“Build sturdy network response handling and parsing, using dedicated tooling” “Have an open communications channel with the backend team. Have a way to test old app versions.” “Version your backend endpoints” “Proceed with caution when deprecating endpoints on the backend. Monitor the traffic”
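
A minimal sketch of defensive response parsing, assuming a JSON payload describing a trip: unknown fields are ignored, optional fields get defaults, and an unknown enum value maps to a fallback instead of failing the whole response. The `Trip` model and its field names are hypothetical. This is the kind of handling that lets an older app version survive a backend change such as a new status value.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TripStatus(Enum):
    REQUESTED = "requested"
    COMPLETED = "completed"
    UNKNOWN = "unknown"  # fallback for values this app version does not know


@dataclass(frozen=True)
class Trip:
    trip_id: str
    status: TripStatus
    driver_name: Optional[str] = None


def parse_trip(raw: str) -> Optional[Trip]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    trip_id = data.get("id")
    if not isinstance(trip_id, str):
        return None  # the one field we cannot do without
    try:
        status = TripStatus(data.get("status"))
    except ValueError:
        status = TripStatus.UNKNOWN  # new or missing status: degrade, don't crash
    driver = data.get("driver_name")
    return Trip(
        trip_id=trip_id,
        status=status,
        driver_name=driver if isinstance(driver, str) else None,
    )
```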

“What percentage of users is lagging three or more versions behind? Once you have this data, it is easier to decide how much effort to dedicate towards ensuring the experience works well on older versions.”
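
Once you have per-version user counts, the calculation itself is small. A minimal sketch, assuming the counts come from your own analytics and every released version appears as a key. Versions are compared as integer tuples so that 4.10 sorts after 4.9, which plain string comparison gets wrong.

```python
from typing import Dict, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'4.10.1' -> (4, 10, 1); trailing zeros dropped so '4.9' == '4.9.0'."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def share_lagging(users_per_version: Dict[str, int], min_versions_behind: int = 3) -> float:
    """Percentage of users on a release at least `min_versions_behind` releases old."""
    releases = sorted(users_per_version, key=parse_version)
    latest_index = len(releases) - 1
    total = sum(users_per_version.values())
    if total == 0:
        return 0.0
    lagging = sum(
        users_per_version[version]
        for index, version in enumerate(releases)
        if latest_index - index >= min_versions_behind
    )
    return 100.0 * lagging / total
```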

“Put client-side monitoring and alerting in place”

“few things that make deeplinking challenging”

“Backward compatibility: ensuring that existing deeplinks keep working in older versions of the app, even after significant navigation or logic changes.”
“State problems when deeplinking to a running app with existing state.”
“Lack of upfront planning.”
“Deeplinks are connected to state management and the navigation architecture.”

“You will have to plan well ahead in building a sensible and scalable deeplink implementation.”

“Challenges with push notifications are numerous”

“A similar set of challenges as deeplinks, in terms of implementing what action the notification should trigger. A push notification is a glorified deeplink: a message with an action that links into the app.”

“Push notifications are usually a “nice to have” for many applications, exactly because you cannot guarantee that each user will opt into them, or that their device will be online to receive them.”

“Push notification delivery is not guaranteed. Especially when sent in bulk, both Apple and Google might throttle push notifications.”

“Testing push notifications is a challenge.”

“On the web, due to its nature (single-threaded execution within a sandbox), crashes are rarer than on mobile apps.”

“The first rule of crashes is you need to track when they happen and have sufficient debug information. Once you track crashes, you want to report on what percentage of sessions end up crashing and reduce this number as much as you can. At Uber, we tracked the crash rates from the early days, working continuously to reduce the rate of crashed sessions.”
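
Reporting the share of crashing sessions is a ratio over session records. A minimal sketch, assuming each session record carries a `crashed` flag and an app version (a hypothetical shape, not any vendor's schema). The crash-free percentage has the same form as the median stability figures quoted in these notes, and splitting it by version is what surfaces a regression in a new release.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Session:
    session_id: str
    app_version: str
    crashed: bool


def crash_free_rate(sessions: Iterable[Session]) -> float:
    """Percentage of sessions that ended without a crash (100.0 if there are none)."""
    total = 0
    crashed = 0
    for session in sessions:
        total += 1
        if session.crashed:
            crashed += 1
    if total == 0:
        return 100.0
    return 100.0 * (total - crashed) / total


def crash_free_by_version(sessions: Iterable[Session]) -> Dict[str, float]:
    by_version: Dict[str, List[Session]] = defaultdict(list)
    for session in sessions:
        by_version[session.app_version].append(session)
    return {version: crash_free_rate(group) for version, group in by_version.items()}
```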

“However, an in-house solution was built later. A shortcoming of many third-party crash reporting solutions is how they only collect health information on crashes and non-fatal errors, but not on app-not-responding (ANR) and memory problems.”

“Reproducibility and debuggability of crashes are other pain points that impact mobile more than backend or web teams. Especially in the Android world, users have a variety of devices that run a wide range of OS versions with a variety of app versions.”

“You need to compare the cost of investigation and fixing, compared to the upside of the fix, and the opportunity cost of an engineer spending time on something else, like building revenue-generating functionality.”

“Bugsnag have published metrics on what median app stability scores look like:

99.46% for apps built by 1-10 engineers
99.60% for apps built by 11-50 engineers
99.89% for apps built by 51-100 engineers
99.79% for apps built by 100+ engineers”

“Though offline support is becoming more of a feature with rich web applications, it has always been a core use case with native mobile apps. People expect apps to stay usable, even when connectivity drops. They certainly expect the state not to get lost when the signal drops or weakens.”

“Accessibility is a big deal for popular applications, a few reasons: 1. If you have a large number of users, many of them will have various accessibility needs, finding it difficult or impossible to interact with your app without adequate support. 2. If the app is not accessible, there is an inherent legal risk for the app’s publisher; several accessibility lawsuits targeting native mobile apps are already happening in the US.”

“Ensuring the app is workable for sighted people over VoiceOver (iOS) / TalkBack (Android) and making sure colors/key elements contrast enough, are typical baseline expectations.”

“Accessibility goes deeper than ensuring sighted people can use the app. Allowing accessibility preferences to work with the app, such as supporting the user’s font size of choice (via Dynamic Type support on iOS and using scale-independent pixels as measurement on Android), are both practices you should follow.”

“Implementing accessibility from the start is a surprisingly low effort task on iOS and a sensible one for Android. Both platforms have thought deeply about accessibility needs and make it relatively painless to add accessibility features.”

“Testing accessibility is something that needs planning.”

“Automate the parts of accessibility checks”

“On iOS, you can also have VoiceOver content displayed as text and potentially automate these checks as well.”

“Manually test accessibility features.”
“Recruit accessibility users in your beta program to get feedback directly from them.”
“Turn on accessibility features during development, where it is sensible to do so.”

“Be wary of maintaining your homegrown CI system if you do not have dedicated people bandwidth to support this.”

“At Uber, the following tests needed to pass for each release phase:”

“UI tests executing without failure.”
“Manual sanity tests executed by either teams owning these tests, or QA teams owning the process.”
“Localization completed for all strings.”
“Crash reports. No regressions discovered during the beta testing process.”
“Memory usage. No regressions as per the memory profiling.”
“Business metrics: No regressions during rollout (“E2EFunnel” metric).”

“Stability and reliability is another issue with third-party libraries.”

“This is why it is good practice to have a feature flag specifically for your third-party libraries, so that you can encapsulate the loading of those libraries and all the execution points, and easily disable any library via your backend, if there is a breaking change by the teams developing those SDKs.”
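
A minimal sketch of that pattern: every entry point into the SDK goes through one wrapper that checks a remotely controlled flag, so turning the flag off from the backend stops both the lazy loading and all later calls. `RemoteFlags`, the flag name, and the SDK methods are hypothetical.

```python
from typing import Callable, Dict, Optional


class RemoteFlags:
    """Hypothetical flag source; in an app this would be refreshed from the backend."""
    def __init__(self, values: Dict[str, bool]) -> None:
        self._values = dict(values)

    def is_on(self, name: str) -> bool:
        return self._values.get(name, False)  # unknown flags default to off

    def set(self, name: str, value: bool) -> None:
        self._values[name] = value


class GuardedSdk:
    """Owns the loading of, and every call into, one third-party SDK."""
    def __init__(self, flags: RemoteFlags, flag_name: str,
                 loader: Callable[[], object]) -> None:
        self._flags = flags
        self._flag_name = flag_name
        self._loader = loader
        self._sdk: Optional[object] = None

    def call(self, method: str, *args):
        if not self._flags.is_on(self._flag_name):
            return None  # kill switch: skip the SDK entirely
        if self._sdk is None:
            self._sdk = self._loader()  # load lazily, and only while enabled
        return getattr(self._sdk, method)(*args)
```

Defaulting unknown flags to off is a design choice: it means a failed flag fetch disables the SDK, which some teams prefer to avoid by defaulting to on instead.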

“Making third-party library updates reversible is difficult, and sometimes impossible to do.”

“There are many other risks that come with using third-party libraries”

“App size”
“Tooling upgrade risks”
“Risk of no maintenance”
“Third-party responsiveness”

“Deciding how and when to stop supporting old OS versions is a process your mobile team should put in, early on. The cost of supporting old iOS and Android versions is high and the payoff can be low.”

“When revenue or profit from the old version is less than the cost to maintain, the pragmatic solution is to drop support for old OSes.”

“On iOS, thanks to more rapid OS adoption, many businesses drop support for versions beyond the last two or three, soon after a new OS release.”

“Apple has a comprehensive overview of test scenarios you might want to test for and from which to take”

“To localize your app and define the strings, you want to localize and ship the localized strings as a separate resource in the binary.”

“If you implement serving localized strings from a vendor solution, or your backend, you have more flexibility in this regard and you only need to do one localization pass for both iOS and Android.”

“The more localization the backend does, the better.”

“Backend-heavy localization keeps client-side logic low and reduces the number of resources for localization on mobile.”

“When supporting a large number of locales, you need to ensure that all localization translations are complete before shipping to the app store.”
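
A minimal sketch of a pre-release completeness check, assuming each locale's string table has been exported to a plain dictionary (key to translated text) and one base locale is the source of truth. It reports missing and blank translations; a CI step could fail the release build when the report is non-empty. Locale codes and keys are illustrative.

```python
from typing import Dict, List


def missing_translations(tables: Dict[str, Dict[str, str]],
                         base_locale: str = "en") -> Dict[str, List[str]]:
    """For each non-base locale, list base keys that are missing or blank."""
    base_keys = set(tables[base_locale])
    report: Dict[str, List[str]] = {}
    for locale, table in tables.items():
        if locale == base_locale:
            continue
        missing = sorted(key for key in base_keys if not table.get(key, "").strip())
        if missing:
            report[locale] = missing
    return report
```

Running the same check over the iOS and Android exports also catches keys that exist on one platform but not the other, which matters once both teams agree on shared localization keys.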

“Ensuring iOS and Android use the same language and localization”

“A consistent language is not only beneficial for the brand, it also helps customer support handle issues reported by users.”

“Using the same localization IDs or keys is a good way to reduce duplication and this needs to be done through the iOS and Android teams agreeing on conventions.”

“Currencies formatted differently on iOS and Android on certain locales is another pain point that multi-language, multi-platform apps displaying monetary values encounter.”

“Different countries and regions will display dates and times in various ways.”

“In an ideal world, a native speaker would go through every flow of your app after each localization change. In reality, this rarely happens because it is too expensive to do.”

“Thousands of Uber employees dogfooded the app every week, alongside beta testers. Localization issues almost always got caught before the app got pushed to the app store.”

“Snapshot testing is an underrated testing tool for localization. With snapshot tests, you can quickly and easily generate snapshots for screens in any locale, or even with pseudo-localization.”
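
Pseudo-localization swaps letters for accented look-alikes and pads the text, so hard-coded (untranslated) strings, truncation, and clipped layouts show up in snapshots before real translations exist. A minimal sketch; the character map, the roughly 30% padding, and the bracket markers are common conventions chosen here as assumptions, not a standard. Placeholders such as `{name}` or `%@` are left untouched so formatting still works.

```python
import math
import re

ACCENTS = str.maketrans({
    "a": "á", "e": "é", "i": "í", "o": "ó", "u": "ú",
    "A": "Á", "E": "É", "I": "Í", "O": "Ó", "U": "Ú",
    "c": "ç", "n": "ñ", "y": "ý",
})

# Placeholders such as {name}, %@ or %d must survive unchanged.
PLACEHOLDER = re.compile(r"(\{[^}]*\}|%[@a-zA-Z])")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    # re.split with one capturing group puts the placeholders at odd indices.
    parts = PLACEHOLDER.split(text)
    converted = "".join(
        part if i % 2 == 1 else part.translate(ACCENTS)
        for i, part in enumerate(parts)
    )
    # Pad to mimic languages that run longer than English.
    padding = "~" * math.ceil(len(text) * expansion)
    return f"[{converted}{padding}]"
```

The brackets make truncation obvious: a snapshot showing a label without its closing `]` has clipped the text.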

“On top of helping engineers, you can share the snapshot test screenshots with people doing the translation, so they get additional context on how the translated text will appear.”

“Phrases that should not be localized are one final edge case. At Uber, we decided not to localize certain brand terms like Uber Cash or Uber Wallet.”

“Localization vendors include POEditor, Loco, Transifex, Crowdin, Phrase, Lokalise, OneSky, Wordbee, Text United, and several others.”

“apps become large, it often makes sense to build parts of the application as reusable components or modules.”

“At Uber, we built, used, and open-sourced Needle, based on similar concepts. We would have had trouble scaling the code with over a hundred engineers working on the same codebase without dependency injection.”
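
Needle is a compile-time dependency injection framework for Swift; the underlying idea is that components receive their dependencies instead of constructing them. A minimal constructor-injection sketch in plain Python (not Needle's API): the hypothetical `TripHistoryViewModel` depends on a `TripService` protocol, so production code can pass a networked implementation and tests pass a fake.

```python
from typing import List, Protocol


class TripService(Protocol):
    def fetch_trip_ids(self, user_id: str) -> List[str]: ...


class TripHistoryViewModel:
    def __init__(self, service: TripService, user_id: str) -> None:
        # The dependency is handed in; the view model never builds its own client.
        self._service = service
        self._user_id = user_id

    def title(self) -> str:
        count = len(self._service.fetch_trip_ids(self._user_id))
        if count == 0:
            return "No trips yet"
        return f"{count} trip" if count == 1 else f"{count} trips"


class FakeTripService:
    """Test double: returns canned data without touching the network."""
    def __init__(self, trips: List[str]) -> None:
        self._trips = trips

    def fetch_trip_ids(self, user_id: str) -> List[str]:
        return list(self._trips)
```

With a hundred engineers, the payoff is that each module declares what it needs and can be tested in isolation, without reaching into global singletons owned by other teams.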

“The different types of automated test”

“Unit test” “For iOS and Android, this would usually mean testing the behavior of a method on a class, or a specific behavior of a class.”

“Integration tests” “They test the behavior of multiple “units” interacting. These tests are more complex and take longer to run than unit tests.”

“Snapshot test” “It is a cheap and fast way to ensure code changes do not result in unexpected UI changes.”

“UI test”

“Integration testing is more complex than unit tests. You test how two or more classes, modules, or other units work together. The most common case for integration tests is ensuring that library integrations work as expected.”

“Bazel is the tool that is becoming the most popular among companies building mobile at scale.”

“You will want to invest effort to evaluate a new language or framework that is in line with what the work adoption would mean, and the risk it carries.”

“Off-the-shelf experimentation and feature flag systems are plenty, and small to middle-sized companies and teams typically choose one of them.”

“Common real-world performance bottlenecks with large apps”

“App startup time bloating”
“Too many parallel networking calls”
“Networking performance”
“Battery consumption rate”

“Application not responding (ANR) occurs when the UI thread is blocked for too long in an application.”

“Frozen frames and slow rendering frames indicate the app is being slow, and seemingly unresponsive.”

“Automating profiling of apps would in theory be a good step towards measuring performance characteristics, and spotting performance regressions. Unfortunately, automating the performance measurement process is complicated.”

“Sampling real-world app performance measurements is a far more reliable and scalable solution to keeping the performance characteristics of your app in-check.”

“A better approach is to enable such sampling for beta users, and optionally measure a small sample size of production users.”
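
A minimal sketch of that sampling decision, assuming it is made once per session: beta users are always measured, and production users at a small configurable rate. The function name and the 1% default are hypothetical; in practice the rate would likely come from remote config so it can be tuned without a release.

```python
import random
from typing import Optional


def should_sample_performance(is_beta_user: bool, production_rate: float = 0.01,
                              rng: Optional[random.Random] = None) -> bool:
    """Decide once per session whether to record detailed performance traces."""
    if not 0.0 <= production_rate <= 1.0:
        raise ValueError("production_rate must be between 0 and 1")
    if is_beta_user:
        return True  # beta users are always measured
    rng = rng or random.Random()
    return rng.random() < production_rate
```

Making the decision per session (rather than per event) keeps each sampled session's traces complete, so startup time, screen latency, and memory readings can be correlated within one run.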

“Performance characteristics worth measuring”

“App startup time”
“Latency of loading screens”
“Networking performance”
“Memory consumption”
“Local store size”
“UI performance”

“Automate measuring performance characteristics”

“Today, the market is saturated with APM solutions that aim to make Mobile production monitoring easier”

“mobile companies are beginning to realize is that production monitoring is proving to be far less useful on mobile than it has been on the backend”

“When your backend APM identifies a problem”

“So what is the fix? Be proactive instead of reactive.”

“you can simply roll back”

“On mobile, there is no such rollback mechanism”

“remember to avoid relying on APMs to catch specific performance problems because you will not be able to react quickly enough”

“Data verification on top of certified metrics is where Pinterest has seen the biggest success”

“Anomaly detection”

“Deep diving into the data and comparing it with otherdata points”

“Validating metrics logged correctly”

“Postmortems are a tool to capture details of the incident, such as its context, timeline, business impact, root cause, and corrective actions.”

“A common way of dealing with lint fatigue is to make linting errors break the build, leaving no choice but to fix them. It is a bit annoying, but effective.”

“Reasons to implement forced upgrades are plentiful”

“Retiring backend API versions”
“Cost of testing”
“Customer support cost.”
“Cost of backwards compatibility”
“Severe bugs that need fixing”
“Vulnerabilities”

“Snapchat, Facebook Messenger and WhatsApp all have a rolling window in place to not allow users to be using a version too far behind”

“ITV Hub (one of the top UK apps) has both a “soft” killswitch (asking the user to update) and a “hard” one, where it is mandatory to update.”
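
A minimal sketch of how a client might evaluate a soft and a hard update requirement, assuming the backend returns two hypothetical fields, `min_supported` (hard) and `min_recommended` (soft). Versions are compared as integer tuples so that 2.10 counts as newer than 2.9. A rolling window like the ones described above would amount to the backend moving these thresholds forward with each release.

```python
from enum import Enum
from typing import Tuple


class UpdatePrompt(Enum):
    NONE = "none"
    SOFT = "soft"   # suggest updating; the user can dismiss
    HARD = "hard"   # block usage until the user updates


def parse_version(v: str) -> Tuple[int, ...]:
    """'2.10.1' -> (2, 10, 1); trailing zeros dropped so '2.9' == '2.9.0'."""
    parts = [int(p) for p in v.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def update_prompt(current: str, min_supported: str, min_recommended: str) -> UpdatePrompt:
    cur = parse_version(current)
    if cur < parse_version(min_supported):
        return UpdatePrompt.HARD
    if cur < parse_version(min_recommended):
        return UpdatePrompt.SOFT
    return UpdatePrompt.NONE
```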

“Binary size is important, as it impacts downloads. The smaller the binary size, the more likely users will download the app.”
