Class BroadcastableClusterInfoGroup
- java.lang.Object
-
- org.apache.cassandra.spark.bulkwriter.BroadcastableClusterInfoGroup
-
- All Implemented Interfaces:
java.io.Serializable, MultiClusterSupport<IBroadcastableClusterInfo>, IBroadcastableClusterInfo
public final class BroadcastableClusterInfoGroup extends java.lang.Object implements IBroadcastableClusterInfo, MultiClusterSupport<IBroadcastableClusterInfo>
Broadcastable wrapper for coordinated writes with ZERO transient fields to optimize Spark broadcasting.
This class wraps multiple BroadcastableCluster instances for multi-cluster scenarios. Pre-computed values (partitioner, lowestCassandraVersion) are extracted from CassandraClusterInfoGroup on the driver to avoid duplicating aggregation/validation logic on executors.
Why ZERO transient fields matters:
Spark's SizeEstimator uses reflection to estimate object sizes before broadcasting. Each transient field forces SizeEstimator to inspect the field's type hierarchy, which is expensive. Logger references are particularly costly due to their deep object graphs (appenders, layouts, contexts). By eliminating ALL transient fields and Logger references, we:
- Minimize SizeEstimator reflection overhead during broadcast preparation
- Reduce broadcast variable serialization size
- Avoid accidental serialization of non-serializable objects
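The zero-transient-field rule above can be checked mechanically. The following is a minimal sketch, not the library's code: `BroadcastableSketch` and `ZeroTransientCheck` are hypothetical stand-ins, and the field names are assumptions chosen to mirror the pre-computed values mentioned above. It counts transient fields by reflection and confirms that the holder survives a plain Java serialization round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for a broadcastable holder: only final, serializable
// fields, no transient fields, and no Logger reference.
final class BroadcastableSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String clusterId;
    private final String partitionerName;
    private final String lowestCassandraVersion;

    BroadcastableSketch(String clusterId, String partitionerName, String lowestCassandraVersion) {
        this.clusterId = clusterId;
        this.partitionerName = partitionerName;
        this.lowestCassandraVersion = lowestCassandraVersion;
    }

    String clusterId() { return clusterId; }
    String partitionerName() { return partitionerName; }
    String lowestCassandraVersion() { return lowestCassandraVersion; }
}

final class ZeroTransientCheck {
    // Counts the fields declared directly on the class that carry the transient modifier.
    static int countTransientFields(Class<?> type) {
        int count = 0;
        for (Field field : type.getDeclaredFields()) {
            if (Modifier.isTransient(field.getModifiers())) {
                count++;
            }
        }
        return count;
    }

    // Serializes and deserializes the value with standard Java serialization.
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Round trip failed", e);
        }
    }
}
```

A check such as `ZeroTransientCheck.countTransientFields(BroadcastableSketch.class) == 0` could run in a unit test to catch a transient field or Logger being reintroduced later.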
- See Also:
- Serialized Form
-
-
Method Summary
Modifier and Type, Method, Description:
- java.lang.String clusterId(): ID string that can uniquely identify a cluster.
- void forEach(java.util.function.BiConsumer<java.lang.String,IBroadcastableClusterInfo> action): Iterate through all values
- static BroadcastableClusterInfoGroup from(CassandraClusterInfoGroup source, BulkSparkConf conf): Creates a BroadcastableClusterInfoGroup from a source ClusterInfo group.
- BulkSparkConf getConf()
- java.lang.String getLowestCassandraVersion()
- org.apache.cassandra.spark.data.partitioner.Partitioner getPartitioner()
- IBroadcastableClusterInfo getValueOrNull(java.lang.String clusterId): Look up a value based on clusterId
- ClusterInfo reconstruct(): Reconstructs a full ClusterInfo instance from this broadcastable data on executors.
- int size()
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.cassandra.spark.bulkwriter.cloudstorage.coordinated.MultiClusterSupport
getValueOrThrow
-
-
-
-
Method Detail
-
from
public static BroadcastableClusterInfoGroup from(@NotNull CassandraClusterInfoGroup source, @NotNull BulkSparkConf conf)
Creates a BroadcastableClusterInfoGroup from a source ClusterInfo group. Extracts pre-computed values (partitioner, lowestCassandraVersion) from the source to avoid duplicating aggregation/validation logic on executors.
- Parameters:
source - the source CassandraClusterInfoGroup
conf - the BulkSparkConf needed to connect to Sidecar on executors
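The idea of aggregating and validating once on the driver can be sketched in isolation. Everything below is hypothetical: `PrecomputedGroupSketch`, its list-based `from` signature, the "all clusters share one partitioner" check, and the dotted-numeric version comparison are assumptions for illustration, not the behavior of the real `from(CassandraClusterInfoGroup, BulkSparkConf)`.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical holder that stores results computed once, so that code
// receiving it never has to repeat the aggregation or the validation.
final class PrecomputedGroupSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String partitionerName;
    private final String lowestVersion;

    private PrecomputedGroupSketch(String partitionerName, String lowestVersion) {
        this.partitionerName = partitionerName;
        this.lowestVersion = lowestVersion;
    }

    // Hypothetical factory: validates and aggregates the per-cluster values up front.
    static PrecomputedGroupSketch from(List<String> partitionerNames, List<String> versions) {
        if (partitionerNames.isEmpty() || versions.isEmpty()) {
            throw new IllegalArgumentException("At least one cluster is required");
        }
        // Assumed validation rule: every cluster must use the same partitioner.
        String partitioner = partitionerNames.get(0);
        for (String name : partitionerNames) {
            if (!partitioner.equals(name)) {
                throw new IllegalStateException("Clusters use different partitioners");
            }
        }
        // Aggregation: pick the lowest version using a numeric, per-component comparison.
        String lowest = versions.stream().min(PrecomputedGroupSketch::compareVersions).get();
        return new PrecomputedGroupSketch(partitioner, lowest);
    }

    // Compares dotted numeric versions such as "4.0.12" component by component;
    // missing components are treated as 0.
    static int compareVersions(String left, String right) {
        String[] a = left.split("\\.");
        String[] b = right.split("\\.");
        int length = Math.max(a.length, b.length);
        for (int i = 0; i < length; i++) {
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    String partitionerName() { return partitionerName; }
    String lowestVersion() { return lowestVersion; }
}
```

Because the holder carries only the finished results, the receiving side reads `lowestVersion()` directly instead of re-deriving it from every cluster.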
-
getConf
@NotNull public BulkSparkConf getConf()
- Specified by:
getConf in interface IBroadcastableClusterInfo
- Returns:
- the BulkSparkConf configuration needed to reconstruct ClusterInfo on executors
-
getLowestCassandraVersion
public java.lang.String getLowestCassandraVersion()
- Specified by:
getLowestCassandraVersion in interface IBroadcastableClusterInfo
- Returns:
- the lowest Cassandra version in the cluster
-
getPartitioner
public org.apache.cassandra.spark.data.partitioner.Partitioner getPartitioner()
- Specified by:
getPartitioner in interface IBroadcastableClusterInfo
- Returns:
- the partitioner used by the cluster
-
clusterId
public java.lang.String clusterId()
Description copied from interface: IBroadcastableClusterInfo
ID string that can uniquely identify a cluster. When writing to a single cluster, this may be null. When in coordinated write mode (writing to multiple clusters), this must return a unique string.
- Specified by:
clusterId in interface IBroadcastableClusterInfo
- Returns:
- cluster id string, null if absent
-
size
public int size()
- Specified by:
size in interface MultiClusterSupport<IBroadcastableClusterInfo>
- Returns:
- the total number of clusters
-
forEach
public void forEach(java.util.function.BiConsumer<java.lang.String,IBroadcastableClusterInfo> action)
Description copied from interface: MultiClusterSupport
Iterate through all values
- Specified by:
forEach in interface MultiClusterSupport<IBroadcastableClusterInfo>
- Parameters:
action - function to consume the values
-
getValueOrNull
@Nullable public IBroadcastableClusterInfo getValueOrNull(@NotNull java.lang.String clusterId)
Description copied from interface: MultiClusterSupport
Look up a value based on clusterId
- Specified by:
getValueOrNull in interface MultiClusterSupport<IBroadcastableClusterInfo>
- Parameters:
clusterId - cluster id
- Returns:
- the value of type T associated with the clusterId, or null if not found
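The lookup contract formed by size, forEach, getValueOrNull and the inherited getValueOrThrow can be illustrated with a map-backed sketch. The names `MultiClusterLookupSketch` and `MapBackedGroupSketch` are hypothetical, and the exception type thrown by `getValueOrThrow` here is an assumption; the real interface's behavior should be checked in its own documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;

// Hypothetical, simplified mirror of the per-cluster lookup contract.
interface MultiClusterLookupSketch<T> {
    int size();

    void forEach(BiConsumer<String, T> action);

    T getValueOrNull(String clusterId);

    // Assumed behavior: delegate to getValueOrNull and fail loudly when absent.
    default T getValueOrThrow(String clusterId) {
        T value = getValueOrNull(clusterId);
        if (value == null) {
            throw new NoSuchElementException("No value for cluster: " + clusterId);
        }
        return value;
    }
}

// Map-backed implementation that preserves insertion order of cluster ids.
final class MapBackedGroupSketch<T> implements MultiClusterLookupSketch<T> {
    private final Map<String, T> byClusterId;

    MapBackedGroupSketch(Map<String, T> values) {
        // Defensive copy so later changes to the caller's map are not visible here.
        this.byClusterId = new LinkedHashMap<>(values);
    }

    @Override
    public int size() {
        return byClusterId.size();
    }

    @Override
    public void forEach(BiConsumer<String, T> action) {
        byClusterId.forEach(action);
    }

    @Override
    public T getValueOrNull(String clusterId) {
        return byClusterId.get(clusterId);
    }
}
```

Callers that can tolerate a missing cluster use `getValueOrNull` and check for null; callers for which a missing cluster is a bug use `getValueOrThrow`.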
-
reconstruct
public ClusterInfo reconstruct()
Description copied from interface: IBroadcastableClusterInfo
Reconstructs a full ClusterInfo instance from this broadcastable data on executors. Each implementation knows how to reconstruct itself into the appropriate ClusterInfo type. This allows adding new broadcastable types without modifying the reconstruction logic in AbstractBulkWriterContext.
- Specified by:
reconstruct in interface IBroadcastableClusterInfo
- Returns:
- reconstructed ClusterInfo (CassandraClusterInfo or CassandraClusterInfoGroup)
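The self-reconstruction pattern described above can be sketched with stand-in types. All names below (`InfoSketch`, `BroadcastableInfoSketch`, and the single/group classes) are hypothetical and only illustrate the dispatch: the caller invokes `reconstruct()` and never inspects the concrete type.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the full, executor-side object.
interface InfoSketch {
    String describe();
}

// Hypothetical stand-in for the broadcastable side: each type rebuilds itself.
interface BroadcastableInfoSketch extends Serializable {
    InfoSketch reconstruct();
}

final class SingleInfoSketch implements InfoSketch {
    private final String clusterId;

    SingleInfoSketch(String clusterId) { this.clusterId = clusterId; }

    @Override
    public String describe() { return "single:" + clusterId; }
}

final class GroupInfoSketch implements InfoSketch {
    private final List<InfoSketch> members;

    GroupInfoSketch(List<InfoSketch> members) { this.members = members; }

    @Override
    public String describe() {
        List<String> parts = new ArrayList<>();
        for (InfoSketch member : members) {
            parts.add(member.describe());
        }
        return "group[" + String.join(", ", parts) + "]";
    }
}

final class SingleBroadcastableSketch implements BroadcastableInfoSketch {
    private static final long serialVersionUID = 1L;
    private final String clusterId;

    SingleBroadcastableSketch(String clusterId) { this.clusterId = clusterId; }

    @Override
    public InfoSketch reconstruct() { return new SingleInfoSketch(clusterId); }
}

final class GroupBroadcastableSketch implements BroadcastableInfoSketch {
    private static final long serialVersionUID = 1L;
    private final ArrayList<SingleBroadcastableSketch> members;

    GroupBroadcastableSketch(List<SingleBroadcastableSketch> members) {
        this.members = new ArrayList<>(members);
    }

    // The group rebuilds each member, then wraps the results in the group type.
    @Override
    public InfoSketch reconstruct() {
        List<InfoSketch> rebuilt = new ArrayList<>();
        for (SingleBroadcastableSketch member : members) {
            rebuilt.add(member.reconstruct());
        }
        return new GroupInfoSketch(rebuilt);
    }
}

final class ExecutorSideSketch {
    // No instanceof checks: a new broadcastable type needs no change here.
    static InfoSketch rebuild(BroadcastableInfoSketch broadcastable) {
        return broadcastable.reconstruct();
    }
}
```

The design choice is ordinary polymorphic dispatch: the knowledge of which full type to build lives with the broadcastable type, so the calling context stays closed to modification when a new type is added.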
-
-