svencowart (3)
#1
I am uncertain about how to separate the concept of a StreamTask from a partition. The book says:

“State Stores are Assigned Per Task - The statement above could be interpreted to mean that each partition has its own state store, but that is not the case. Partitions are assigned to a StreamTask and each StreamTask has its own state store.”

What defines a StreamTask? At what point are there multiple copies of a stream topology running via multiple copies of the same StreamTask? In my mind, if I run a StreamTask, there is one instance of that StreamTask, so why is there a need to repartition the data? The only way I can make sense of the excerpt above is if each broker has its own copy of a StreamTask; then I can see the necessity of repartitioning the data.

I apologize if my question seems silly, as I am still fairly new to Kafka and Kafka Streams. I just find the excerpt about state stores important and lacking a proper explanation.
Bill Bejeck (47)
#2
svencowart,

Thanks for asking the question.

The first point to keep in mind is that you only need to repartition the data when you change keys and your topic has more than one partition. Partitions are what allow the data to be processed in parallel.
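To illustrate why a key change forces a repartition, here is a minimal plain-Java sketch. It is a toy model, not Kafka code: the class `RepartitionSketch`, its helper methods, the sample records, and the "key modulo partition count" partitioner are all assumptions made for illustration. It re-keys records in place, as a key-changing operation would, and then counts how many partitions hold the same new key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these names come from Kafka or Kafka Streams.
class RepartitionSketch {

    // Minimal key/value record.
    static final class Rec {
        final int key;
        final String value;
        Rec(int key, String value) { this.key = key; this.value = value; }
    }

    // Simplified stand-in for a partitioner (assumption: key modulo partition count).
    static int partitionFor(int key, int numPartitions) {
        return Math.floorMod(key, numPartitions);
    }

    // Distribute records across partitions according to their current key.
    static Map<Integer, List<Rec>> partition(List<Rec> records, int numPartitions) {
        Map<Integer, List<Rec>> partitions = new TreeMap<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.put(p, new ArrayList<>());
        }
        for (Rec r : records) {
            partitions.get(partitionFor(r.key, numPartitions)).add(r);
        }
        return partitions;
    }

    // Change each key to the value's length WITHOUT moving records between partitions.
    static Map<Integer, List<Rec>> rekeyInPlace(Map<Integer, List<Rec>> partitions) {
        Map<Integer, List<Rec>> out = new TreeMap<>();
        for (Map.Entry<Integer, List<Rec>> e : partitions.entrySet()) {
            List<Rec> rekeyed = new ArrayList<>();
            for (Rec r : e.getValue()) {
                rekeyed.add(new Rec(r.value.length(), r.value));
            }
            out.put(e.getKey(), rekeyed);
        }
        return out;
    }

    // Collect every record and distribute it again according to its new key.
    static Map<Integer, List<Rec>> repartition(Map<Integer, List<Rec>> partitions, int numPartitions) {
        List<Rec> all = new ArrayList<>();
        for (List<Rec> recs : partitions.values()) {
            all.addAll(recs);
        }
        return partition(all, numPartitions);
    }

    // Count the partitions in which the given key appears at least once.
    static int partitionsHoldingKey(Map<Integer, List<Rec>> partitions, int key) {
        int count = 0;
        for (List<Rec> recs : partitions.values()) {
            for (Rec r : recs) {
                if (r.key == key) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    // Hypothetical sample data: original keys 0..3, new keys will be 2 or 3.
    static List<Rec> sampleRecords() {
        return List.of(new Rec(0, "aa"), new Rec(1, "bb"), new Rec(2, "ccc"), new Rec(3, "ddd"));
    }

    public static void main(String[] args) {
        Map<Integer, List<Rec>> original = partition(sampleRecords(), 2);
        Map<Integer, List<Rec>> rekeyed = rekeyInPlace(original);
        Map<Integer, List<Rec>> repartitioned = repartition(rekeyed, 2);
        System.out.println("Partitions holding key 2 before repartition: "
                + partitionsHoldingKey(rekeyed, 2));
        System.out.println("Partitions holding key 2 after repartition: "
                + partitionsHoldingKey(repartitioned, 2));
    }
}
```

In this model the new key 2 sits in both partitions until the records are redistributed, so any per-partition aggregation would see only part of the data for that key. With a single partition the problem cannot arise, which matches the "more than one partition" condition above.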

The second point is that you don't directly run or control a StreamTask. A StreamTask is created by Kafka Streams and is assigned to a StreamThread for processing. The reason a state store

Depending on the number of topics in your topology and the number of partitions, you may have multiple StreamTask objects. It's important to keep in mind that each one is distinct; there are never "copies" of a StreamTask.
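As a rough sketch of how the task count follows from topics and partitions, the plain-Java model below derives one task per partition index, up to the largest partition count among the source topics, and then spreads the tasks over threads. It is not Kafka Streams code: the class `TaskModelSketch`, the topic names, and the round-robin thread assignment are assumptions for illustration, and the real assignment logic is more involved. The `subtopology_partition` ID format mirrors how Kafka Streams labels tasks, as far as I understand it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: a simplified picture of how tasks relate to partitions.
class TaskModelSketch {

    // One distinct task per partition index, up to the largest partition
    // count among the source topics of a (hypothetical) sub-topology.
    static List<String> taskIds(int subTopologyId, Map<String, Integer> sourceTopicPartitions) {
        int maxPartitions = 0;
        for (int count : sourceTopicPartitions.values()) {
            maxPartitions = Math.max(maxPartitions, count);
        }
        List<String> ids = new ArrayList<>();
        for (int p = 0; p < maxPartitions; p++) {
            ids.add(subTopologyId + "_" + p);
        }
        return ids;
    }

    // Simplified round-robin assignment of tasks to threads (assumption:
    // the real assignor also considers state, standbys, and other instances).
    static Map<Integer, List<String>> assignToThreads(List<String> taskIds, int numThreads) {
        Map<Integer, List<String>> byThread = new LinkedHashMap<>();
        for (int t = 0; t < numThreads; t++) {
            byThread.put(t, new ArrayList<>());
        }
        for (int i = 0; i < taskIds.size(); i++) {
            byThread.get(i % numThreads).add(taskIds.get(i));
        }
        return byThread;
    }

    public static void main(String[] args) {
        // Hypothetical topics: "orders" with 4 partitions, "customers" with 2.
        Map<String, Integer> topics = new LinkedHashMap<>();
        topics.put("orders", 4);
        topics.put("customers", 2);

        List<String> tasks = taskIds(0, topics);
        System.out.println("Tasks: " + tasks);
        System.out.println("Assignment: " + assignToThreads(tasks, 2));
    }
}
```

Every task in the model has a unique ID and owns a fixed set of partitions, which is the sense in which tasks are distinct rather than copies of one another.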

Does this clear things up?

HTH,
Bill