Introduction to Federated Learning

Federated Learning (FL) is a machine learning (ML) paradigm introduced by Google in 2016, in which many clients (e.g., mobile devices or multiple organizations) collaboratively train a model under the orchestration of a central server (e.g., a service provider), while the training data stays decentralized at all times. It embodies the principles of focused collection and data minimization, and can mitigate many of the systemic privacy risks and costs of traditional, centralized ML.
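The orchestration described above can be illustrated with a minimal sketch of federated averaging, one common way for a server to combine client updates. It is an illustration under stated assumptions, not a production implementation: the linear model, the function names (`local_update`, `federated_average`, `train`), the synthetic data, and the hyperparameters are all hypothetical choices made for this example. The point to notice is that the server only ever sees model weights and sample counts, never the clients' raw data.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Client side: run a few epochs of gradient descent on private data.

    The model is a hypothetical 1-D linear regressor y = w * x + b.
    Only the updated weights and the sample count leave the client.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b), n


def federated_average(updates):
    """Server side: average client weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


def train(clients, rounds=200):
    """Alternate local training and server-side averaging for a few rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates)
    return global_weights


def make_client_data(rng, n, x_lo, x_hi):
    """Synthetic private dataset drawn from y = 3x + 1 plus small noise."""
    xs = [rng.uniform(x_lo, x_hi) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.01)) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    # Three clients with different amounts of data and different input ranges.
    clients = [
        make_client_data(rng, 20, 0.0, 1.0),
        make_client_data(rng, 50, 1.0, 2.0),
        make_client_data(rng, 30, -1.0, 0.0),
    ]
    w, b = train(clients)
    print(f"learned w={w:.3f}, b={b:.3f} (data generated with w=3, b=1)")
```

Keeping raw data on the clients is the property that matters here. Real deployments typically add further protections, such as secure aggregation or differential privacy, which this sketch leaves out.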

Insurance: Collaboration without compromise - IFoA Data Science Research Section

Although privacy-preserving data analytics has been studied for more than 50 years, it is only in the last decade that large-scale solutions have been deployed. For example, cross-device FL and federated data analytics are now being applied in consumer digital products. Google uses federated learning extensively in the Gboard mobile keyboard, as well as in Pixel phone features and Android Messages. While Google pioneered cross-device FL, interest in this setting is now much broader, and other companies are already applying it: Apple has used cross-device FL since iOS 13 in features such as the QuickType keyboard and the “Hey Siri” voice classifier; doc.ai is developing cross-device FL solutions for medical research; and Snips has explored cross-device FL for keyword detection.

Although the term FL was initially introduced with an emphasis on mobile and edge device applications, interest has grown considerably in applying FL to other settings, including some that involve only a small number of relatively reliable clients, e.g., multiple organizations collaborating to train a model. These two FL scenarios are referred to as “cross-device” and “cross-silo” respectively. FL has been considered for a range of cross-silo applications. Examples include financial risk prediction for reinsurance, pharmaceutical product discovery, electronic health record mining, medical data segmentation, smart manufacturing, and applications tied to Smart Cities transportation systems (improved vehicle-to-vehicle communication, electric vehicles, and autonomous vehicles).

A Practical Overview of Federated Learning

FL is also divided according to how the data are partitioned across participants. The most common case, applicable to mobile devices, is horizontal (or homogeneous) FL. Here the data are partitioned horizontally: every participant holds the same features, but for different individuals. At the opposite end is vertical (or heterogeneous) FL, in which two datasets contain the same samples but different features. For example, a bank and a supermarket may hold different data about the same customer. Vertical FL is more complex, but it is applicable and particularly useful in Smart Cities scenarios, where citizens are users of multiple services.
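The difference between the two partitionings can be made concrete with a small sketch. Everything in it is hypothetical and chosen only for illustration: the column names (`customer_id`, `account_balance`, `grocery_spend`), the values, and the helper functions. It shows only how the data are laid out across parties, not how a model would be trained on them.

```python
# A hypothetical "complete" table that no single party holds in practice.
RECORDS = [
    {"customer_id": 1, "account_balance": 1200.0, "grocery_spend": 310.0},
    {"customer_id": 2, "account_balance": 560.0, "grocery_spend": 145.0},
    {"customer_id": 3, "account_balance": 8900.0, "grocery_spend": 420.0},
    {"customer_id": 4, "account_balance": 75.0, "grocery_spend": 98.0},
]


def horizontal_partition(rows, n_parties):
    """Horizontal FL: each party holds different rows with the same columns."""
    return [rows[i::n_parties] for i in range(n_parties)]


def vertical_partition(rows, id_key, feature_groups):
    """Vertical FL: each party holds all rows, but only its own columns.

    The shared identifier is kept in every part so that records can be matched.
    """
    return [
        [{id_key: row[id_key], **{f: row[f] for f in features}} for row in rows]
        for features in feature_groups
    ]


if __name__ == "__main__":
    # Two devices, each with complete records for different customers.
    device_a, device_b = horizontal_partition(RECORDS, 2)
    print("horizontal:", device_a, device_b)

    # A bank and a supermarket, each with different features of the same customers.
    bank, supermarket = vertical_partition(
        RECORDS, "customer_id", [["account_balance"], ["grocery_spend"]]
    )
    print("vertical:", bank, supermarket)
```

In the vertical case the parties would also need to match their records on the shared identifier without revealing who else is in their data, which this sketch does not attempt. That extra step is part of why vertical FL is the more complex setting.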
