Cross-silo federated learning allows more flexibility in certain aspects of the overall design, but it also presents a setting in which some other properties can be harder to achieve.
The cross-silo setting can be relevant when several companies or organizations have an incentive to train a model on all of their data but cannot share that data directly. This may be due to confidentiality requirements or legal constraints, and it can even arise within a single company that cannot centralize its data across different geographic regions.
Data partitioning
In the cross-device setting, data is assumed to be partitioned by examples. In the cross-silo setting, in addition to partitioning by examples, partitioning by features is also of practical relevance.
An example is two companies in different lines of business that have the same or overlapping sets of customers, such as a local bank and a local retail company in the same city.
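A toy customer table makes the two partitioning schemes concrete. The sketch below is illustrative only: the table, the silo names, and the helpers partition_by_examples, partition_by_features, and aligned_ids are hypothetical. The last step mirrors the bank/retailer case, where only customers known to both parties can be used jointly; a plain set intersection is used here for simplicity, whereas a deployed system would typically need a privacy-preserving intersection so that non-overlapping customers are not revealed.

```python
# Hypothetical toy table: customer ID -> features.
records = {
    "c1": {"balance": 1200.0, "has_loan": 1, "monthly_spend": 310.0},
    "c2": {"balance": 80.0, "has_loan": 0, "monthly_spend": 95.0},
    "c3": {"balance": 560.0, "has_loan": 1, "monthly_spend": 150.0},
    "c4": {"balance": 2300.0, "has_loan": 0, "monthly_spend": 720.0},
}

def partition_by_examples(records, owner_of_row):
    """Horizontal split: each silo holds complete rows for its own customers."""
    silos = {}
    for cid, row in records.items():
        silos.setdefault(owner_of_row[cid], {})[cid] = dict(row)
    return silos

def partition_by_features(records, owner_of_feature):
    """Vertical split: each silo holds only its own columns, keyed by customer ID."""
    silos = {}
    for cid, row in records.items():
        for feature, value in row.items():
            silos.setdefault(owner_of_feature[feature], {}).setdefault(cid, {})[feature] = value
    return silos

def aligned_ids(silo_a, silo_b):
    """Customers present in both silos; only these rows can be used jointly.
    (Plain intersection for illustration; real systems would use a private one.)"""
    return sorted(set(silo_a) & set(silo_b))

horizontal = partition_by_examples(
    records, {"c1": "bank_A", "c2": "bank_A", "c3": "bank_B", "c4": "bank_B"})
vertical = partition_by_features(
    records, {"balance": "bank", "has_loan": "bank", "monthly_spend": "retailer"})
# Suppose customer c2 never shopped at the retailer, so the ID sets only overlap.
retailer = {cid: row for cid, row in vertical["retailer"].items() if cid != "c2"}
print(aligned_ids(vertical["bank"], retailer))  # ['c1', 'c3', 'c4']
```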
Cross-silo FL with data partitioned by features uses a very different training architecture from the setting in which data is partitioned by examples. It may or may not involve a central server as a neutral party, and, depending on the training algorithm, clients exchange specific intermediate results rather than model parameters to help the other parties compute their gradients.
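The exchange of intermediate results can be illustrated with a two-party logistic regression on feature-partitioned data. This is a minimal plaintext sketch under stated assumptions: party A holds some features, party B holds the remaining features and the labels, rows are already aligned by customer ID, and no encryption is applied, so the exchanged values are visible to the receiving party. The function names and data are hypothetical. Only per-example partial scores and residuals cross the party boundary; the weights never do.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_vertical_lr(XA, XB, y, lr=0.5, epochs=200):
    """Party A holds XA; party B holds XB and the labels y. Rows are aligned by ID.
    Plaintext sketch: no encryption protects the exchanged values."""
    wA = [0.0] * len(XA[0])  # stays with party A
    wB = [0.0] * len(XB[0])  # stays with party B
    n = len(y)
    for _ in range(epochs):
        # A -> B: per-example partial scores (an intermediate result, not weights).
        uA = [dot(wA, xa) for xa in XA]
        # B combines them with its own partial scores and computes residuals.
        residual = [sigmoid(ua + dot(wB, xb)) - yi for ua, xb, yi in zip(uA, XB, y)]
        # B -> A: residuals, which A needs to compute the gradient for its own weights.
        gA = [sum(r * xa[j] for r, xa in zip(residual, XA)) / n for j in range(len(wA))]
        gB = [sum(r * xb[j] for r, xb in zip(residual, XB)) / n for j in range(len(wB))]
        wA = [w - lr * g for w, g in zip(wA, gA)]
        wB = [w - lr * g for w, g in zip(wB, gB)]
    return wA, wB

def predict(wA, wB, xa, xb):
    # At inference time the parties again combine partial scores for one customer.
    return sigmoid(dot(wA, xa) + dot(wB, xb))

# Hypothetical aligned data: A holds one feature; B holds a bias column and the labels.
XA = [[-2.0], [-1.0], [1.0], [2.0]]
XB = [[1.0], [1.0], [1.0], [1.0]]
y = [0, 0, 1, 1]
wA, wB = train_vertical_lr(XA, XB, y)
print(predict(wA, wB, [2.0], [1.0]) > 0.5)  # True
```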
In this context, techniques such as multi-party computation or homomorphic encryption have been proposed to limit the amount of information that other participants can infer by observing the training process. The disadvantage of this approach is that the training algorithm generally depends on the type of machine learning objective being pursued.
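As one concrete MPC building block, additive secret sharing lets parties reveal the sum of their intermediate values without revealing the individual terms. The sketch below simulates all parties in one process and assumes honest-but-curious participants; the modulus, the fixed-point scale, and the function names are assumptions chosen for illustration, and it does not implement homomorphic encryption.

```python
import random

PRIME = 2**61 - 1  # field modulus (assumption: any prime far larger than the encoded values)
SCALE = 10**6      # fixed-point scale for real-valued intermediate results

def encode(x):
    return round(x * SCALE) % PRIME

def decode(v):
    if v > PRIME // 2:  # upper half of the field represents negative numbers
        v -= PRIME
    return v / SCALE

def share(secret, n_parties, rng):
    """Split an encoded value into n additive shares that sum to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def secure_sum(values, rng):
    """values[i] is party i's private number. Party i sends share j to party j;
    each party adds the shares it received and publishes only that partial sum."""
    n = len(values)
    all_shares = [share(encode(v), n, rng) for v in values]
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    return decode(sum(partial_sums) % PRIME)

print(secure_sum([0.25, -1.5, 3.0], random.Random(0)))  # 1.75
```

Any n-1 shares of a value are uniformly distributed modulo PRIME and independent of that value, so a party that sees only one share of each input learns nothing about the individual inputs; only the aggregate is reconstructed.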
Federated transfer learning is another concept; it considers challenging scenarios in which the data parties share only a partial overlap in the user space or the feature space, and it leverages existing transfer learning techniques to build models collaboratively.
Partitioning by examples is often relevant in cross-silo FL when a single company cannot centralize its data due to legal constraints, or when organizations with similar goals want to collaboratively improve their models. For example, different banks can collaboratively train classification or anomaly detection models for fraud detection, and hospitals can build better diagnostic models.
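With partitioning by examples, every silo can train the same model on its own rows and share model parameters. The sketch below uses a FedAvg-style round (local gradient descent, then a size-weighted average on the server) for a one-dimensional linear model; FedAvg is a common choice named here as an assumption rather than something the text prescribes, and the silo data is hypothetical.

```python
def local_update(w, b, data, lr, steps):
    """Gradient descent on one silo's half mean squared error for y ~= w * x + b."""
    n = len(data)
    for _ in range(steps):
        gw = sum((w * x + b - y) * x for x, y in data) / n
        gb = sum((w * x + b - y) for x, y in data) / n
        w, b = w - lr * gw, b - lr * gb
    return w, b

def fedavg_round(w, b, silos, lr=0.05, local_steps=1):
    """Each silo trains from the current global model on its own rows;
    the server then averages the returned parameters weighted by silo size."""
    total = sum(len(data) for data in silos)
    results = [(local_update(w, b, data, lr, local_steps), len(data)) for data in silos]
    new_w = sum(params[0] * n for params, n in results) / total
    new_b = sum(params[1] * n for params, n in results) / total
    return new_w, new_b

# Hypothetical silos (e.g. two banks) whose rows follow y = 2x + 1.
silos = [[(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)], [(4.0, 9.0), (5.0, 11.0)]]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = fedavg_round(w, b, silos)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

With one local step per round, the size-weighted average is exactly one gradient step on the loss over all silos' data; with more local steps, silo models can drift apart when their data distributions differ.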
One open-source platform that supports the applications described above is Federated AI Technology Enabler (FATE). Other platforms focus on medical applications, such as NVIDIA Clara, or on enterprise use cases, such as IBM's.
Incentive mechanisms
The design of incentive mechanisms for honest participation is an important practical research question. It is relevant in both cross-device and cross-silo settings, and especially in the latter, where participants may at the same time be commercial competitors.
Incentives can take the form of monetary payments or of final models with different levels of performance.
Delivering models whose performance is commensurate with each client's contribution is especially relevant in collaborative learning settings where FL participants compete with one another. Without it, clients may worry that contributing their data to train federated learning models will benefit competitors who contribute less but still receive the same final model (the free-rider problem).
Related questions include how to divide the gains generated by the federated learning model among the contributing data owners so as to sustain long-term participation, how to link incentives to decisions about defending against adversarial data owners in order to improve system security, and how to optimize data owner participation to improve system efficiency.
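One widely studied way to divide gains in proportion to contribution is the Shapley value, which averages each data owner's marginal contribution over all coalitions of the other owners. The sketch below assumes a utility function over coalitions (for example, the validation accuracy of a model trained on that coalition's data); the accuracy table is hypothetical, and exact computation evaluates all 2^n coalitions, so it is practical only for a handful of silos.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values; utility maps a frozenset of players to a number."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        phi = 0.0
        for k in range(n):  # size of the coalition that p joins
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                phi += weight * (utility(s | {p}) - utility(s))
        values[p] = phi
    return values

# Hypothetical validation accuracy of a model trained on each coalition's pooled data.
accuracy = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70, frozenset({"B"}): 0.65, frozenset({"C"}): 0.55,
    frozenset({"A", "B"}): 0.80, frozenset({"A", "C"}): 0.74, frozenset({"B", "C"}): 0.68,
    frozenset({"A", "B", "C"}): 0.82,
}
shares = shapley_values(["A", "B", "C"], accuracy.__getitem__)
print({p: round(v, 3) for p, v in shares.items()})  # {'A': 0.17, 'B': 0.115, 'C': 0.035}
```

The values always add up to the utility of the full coalition minus that of the empty one, so they can be used directly to split a payment or to decide which model quality each participant receives.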
Differential privacy
The general discussion of actors and threat models remains largely relevant here; however, protection against different actors may have different priorities.
For example, in many practical scenarios, the final trained model would be delivered only to those who participate in the training, making concerns about “the rest of the world” less important.
On the other hand, in cases where the clients themselves are not considered a significant threat but each client may control the data of many of its users, a formal privacy guarantee at the user level may be needed.
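User-level protection means bounding the influence of each user, not each example, on whatever the silo releases. A minimal sketch, assuming each user's examples yield update vectors that are summed per user, clipped to an L2 bound, and released with Gaussian noise (the Gaussian mechanism); the function names and data are hypothetical, and choosing noise_multiplier to meet a target (epsilon, delta) requires a separate privacy accounting step that is not shown.

```python
import math
import random

def clip(vec, bound):
    """Scale vec down so that its L2 norm is at most bound."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, bound / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]

def user_level_dp_sum(user_updates, clip_bound, noise_multiplier, rng):
    """user_updates maps a user ID to that user's list of per-example update vectors.
    Clipping the per-user total (rather than each example) bounds every user's influence."""
    dim = len(next(iter(user_updates.values()))[0])
    total = [0.0] * dim
    for examples in user_updates.values():
        per_user = [sum(ex[d] for ex in examples) for d in range(dim)]
        for d, v in enumerate(clip(per_user, clip_bound)):
            total[d] += v
    # Gaussian mechanism: noise scaled to the per-user L2 sensitivity clip_bound.
    sigma = noise_multiplier * clip_bound
    return [t + rng.gauss(0.0, sigma) for t in total]

# Hypothetical silo with two users holding different numbers of examples.
updates = {"u1": [[3.0, 0.0], [0.0, 0.0]], "u2": [[1.0, 0.0], [0.0, 1.0]]}
noisy = user_level_dp_sum(updates, clip_bound=1.0, noise_multiplier=1.1, rng=random.Random(0))
```

Because adding or removing one user changes the clipped sum by at most clip_bound in L2 norm, the noise standard deviation is tied to that bound rather than to the number of examples a user holds.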
Tensor factorization
In federated tensor factorization, multiple sites, each holding a horizontally partitioned dataset with the same features, jointly perform tensor factorization while sharing only intermediate factors with a coordinating server and keeping the raw data private at each site.
Among existing work, there are methods based on the alternating direction method of multipliers (ADMM), as well as methods that improve efficiency with the elastic averaging SGD (EASGD) algorithm while also ensuring differential privacy for the intermediate factors.
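The EASGD idea can be sketched on the simplest case, a matrix (an order-2 tensor) whose rows are split across sites. This is an illustrative sketch, not a reproduction of any published method: each site keeps its row factor U_k private, holds a local copy V_k of the shared feature factor, and takes synchronous gradient steps with an elastic penalty rho * ||V_k - center||^2 / 2, while the server moves a center variable toward the sites' copies. The function names, hyperparameters, and data are assumptions, and the differential-privacy noise on shared factors mentioned above is not included.

```python
import random

def residuals(X, U, V):
    """E = X - U V^T for one site's block of rows."""
    rank = len(V[0])
    return [[X[i][j] - sum(U[i][a] * V[j][a] for a in range(rank))
             for j in range(len(V))] for i in range(len(X))]

def sq_error(X, U, V):
    return sum(e * e for row in residuals(X, U, V) for e in row)

def easgd_factorize(sites, n_cols, rank=2, lr=0.02, rho=0.5, steps=300, seed=0):
    """Synchronous elastic averaging for X_k ~= U_k V^T across sites.
    U_k never leaves site k; the server only sees the local copies V_k."""
    rng = random.Random(seed)

    def rand_mat(rows):
        return [[rng.random() for _ in range(rank)] for _ in range(rows)]

    center = rand_mat(n_cols)  # server's center variable for the shared factor
    local_V = [[row[:] for row in center] for _ in sites]
    local_U = [rand_mat(len(X)) for X in sites]
    for _ in range(steps):
        diffs = []
        for k, X in enumerate(sites):
            U, V = local_U[k], local_V[k]
            E = residuals(X, U, V)
            # Gradients of 0.5 * ||X_k - U_k V_k^T||^2, both taken at the current point.
            gU = [[-sum(E[i][j] * V[j][a] for j in range(n_cols)) for a in range(rank)]
                  for i in range(len(X))]
            gV = [[-sum(E[i][j] * U[i][a] for i in range(len(X))) for a in range(rank)]
                  for j in range(n_cols)]
            diff = [[V[j][a] - center[j][a] for a in range(rank)] for j in range(n_cols)]
            diffs.append(diff)
            local_U[k] = [[U[i][a] - lr * gU[i][a] for a in range(rank)]
                          for i in range(len(X))]
            # Elastic term rho * (V_k - center) pulls the local copy toward the center.
            local_V[k] = [[V[j][a] - lr * (gV[j][a] + rho * diff[j][a]) for a in range(rank)]
                          for j in range(n_cols)]
        # Server: move the center toward the sites' copies using pre-update differences.
        center = [[center[j][a] + lr * rho * sum(d[j][a] for d in diffs) for a in range(rank)]
                  for j in range(n_cols)]
    return local_U, local_V, center

# Hypothetical data: a rank-2 matrix with 4 shared columns, rows split across two sites.
data_rng = random.Random(1)
U_true = [[data_rng.random() for _ in range(2)] for _ in range(6)]
V_true = [[data_rng.random() for _ in range(2)] for _ in range(4)]
X = [[sum(U_true[i][a] * V_true[j][a] for a in range(2)) for j in range(4)] for i in range(6)]
sites = [X[:3], X[3:]]
Us, Vs, center = easgd_factorize(sites, n_cols=4)
print(sum(sq_error(Xk, U, V) for Xk, U, V in zip(sites, Us, Vs)))
```

Because all updates are computed from the same pre-step values, each iteration is a gradient step on the sum of the sites' reconstruction losses plus the elastic penalties; larger rho keeps the local copies closer to the consensus at the cost of slower local fitting.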