Browse Source

doc(checkpoint): basic doc

ngjaying 4 years atrás
parent
commit
cd7b913eb7
2 changed files with 66 additions and 11 deletions
  1. 20 11
      docs/en_US/rules/overview.md
  2. 46 0
      docs/en_US/rules/state_and_fault_tolerance.md

+ 20 - 11
docs/en_US/rules/overview.md

@@ -39,6 +39,23 @@ The identification of the rule. The rule name cannot be duplicated in the same K
 
 The sql query to run for the rule. 
 
+## options
+The current options includes:
+
+| Option name | Type & Default Value | Description                                                  |
+| ------------- | -------- | ------------------------------------------------------------ |
+| isEventTime | boolean: false   | Whether to use event time or processing time as the timestamp for an event. If event time is used, the timestamp will be extracted from the payload. The timestamp filed must be specified by the [stream]([extension](../sqls/streams.md)) definition. |
+| lateTolerance        | int64:0   | When working with event-time windowing, it can happen that elements arrive late. LateTolerance can specify by how much time(unit is millisecond) elements can be late before they are dropped. By default, the value is 0 which means late elements are dropped.  |
+| concurrency | int: 1   | A rule is processed by several phases of plans according to the sql statement. This option will specify how many instances will be run for each plan. If the value is bigger than 1, the order of the messages may not be retained. |
+| bufferLength | int: 1024   | Specify how many messages can be buffered in memory for each plan. If the buffered messages exceed the limit, the plan will block message receiving until the buffered messages have been sent out so that the buffered size is less than the limit. A bigger value will accommodate more throughput but will also take up more memory footprint.  |
+| sendMetaToSink | bool:false   | Specify whether the meta data of an event will be sent to the sink. If true, the sink can get te meta data information.  |
+| qos | int:0   | Specify the qos of the stream. The options are 0: At most once; 1: At least once and 2: Exactly once. If qos is bigger than 0, the checkpoint mechanism will be activated to save states periodically so that the rule can be resumed from errors.  |
+| checkpointInterval | int:300000   | Specify the time interval in milliseconds to trigger a checkpoint. This is only effective when qos is bigger than 0.  |
+
+For detail about `qos` and `checkpointInterval`, please check [state and fault tolerance](state_and_fault_tolerance).
+
+## Sources
+
 - Kuiper provides embeded following 3 sources,
   - MQTT source, see  [MQTT source stream](sources/mqtt.md) for more detailed info.
   - EdgeX source by default is shipped in [docker images](https://hub.docker.com/r/emqx/kuiper), but NOT included in single download binary files, you use ``make pkg_with_edgex`` command to build a binary package that supports EdgeX source. Please see [EdgeX source stream](sources/edgex.md) for more detailed info.
@@ -46,7 +63,9 @@ The sql query to run for the rule.
 - See [SQL](../sqls/overview.md) for more info of Kuiper SQL.
 - Sources can be customized, see [extension](../extension/overview.md) for more detailed info.
 
-### sinks/actions
+
+
+# sinks/actions
 
 Currently, below kinds of sinks/actions are supported:
 
@@ -135,13 +154,3 @@ Kuiper extends several functions that can be used in data template.
 - `json para1`: The `json` function is used for convert the map content to a JSON string.
 - `base64 para1`: The `base64` function is used for encoding parameter value to a base64 string.
 - `add para1 para2`: The `add` function is used for adding two numeric value.
-
-### options
-The current options includes:
-
-| Option name | Type & Default Value | Description                                                  |
-| ------------- | -------- | ------------------------------------------------------------ |
-| isEventTime | boolean: false   | Whether to use event time or processing time as the timestamp for an event. If event time is used, the timestamp will be extracted from the payload. The timestamp filed must be specified by the [stream]([extension](../sqls/streams.md)) definition. |
-| lateTolerance        | int64:0   | When working with event-time windowing, it can happen that elements arrive late. LateTolerance can specify by how much time(unit is millisecond) elements can be late before they are dropped. By default, the value is 0 which means late elements are dropped.  |
-| concurrency | int: 1   | A rule is processed by several phases of plans according to the sql statement. This option will specify how many instances will be run for each plan. If the value is bigger than 1, the order of the messages may not be retained. |
-| bufferLength | int: 1024   | Specify how many messages can be buffered in memory for each plan. If the buffered messages exceed the limit, the plan will block message receiving until the buffered messages have been sent out so that the buffered size is less than the limit. A bigger value will accommodate more throughput but will also take up more memory footprint.  |

+ 46 - 0
docs/en_US/rules/state_and_fault_tolerance.md

@@ -0,0 +1,46 @@
+# State
+
+Kuiper supports stateful rule stream. There are two kinds of states in Kuiper:
+1. Internal state for window operation and rewindable source
+2. User state exposed to extensions with stream context, check [state storage](../extension/overview.md#state-storage).
+
+# Fault Tolerance
+
+By default, all the states reside in memory only which means that if the stream exits abnormally, the states will disappear.
+
+In order to make state fault tolerance, Kuipler need to checkpoint the state into persistent storage which will allow a recovery after failure.
+
+## Enable Checkpointing
+
+Set the rule option qos to 1 or 2 will enable the checkpointing. Configure the checkpoint interval by setting the checkpointInterval option.
+
+When things go wrong in a stream processing application, it is possible to have either lost, or duplicated results. For the 3 options of qos, the behavior will be:
+
+1. At-most-once(0): Kuiper makes no effort to recover from failures
+2. At-least-once(1): Nothing is lost, but you may experience duplicated results
+3. Exactly-once(2): Nothing is lost or duplicated 
+
+Given that Kuiper recovers from faults by rewinding and replaying the source data streams, when the ideal situation is described as exactly once does not mean that every event will be processed exactly once. Instead, it means that every event will affect the state being managed by Kuiper exactly once.
+
+If you don’t need "exactly once", you can gain some performance by configuring Kuiper to use AT_LEAST_ONCE.
+
+## Exactly Once End to End
+
+### Source consideration
+
+To have an end to end qos of the stream, the source must be rewindable. That means after recovery, the source can be reverted to the checkpointed offset and resend data from that so that the whole stream can be replayed from the last failure.
+
+For extended source, the user must implement the api.Rewindable interface as well as the default api.Source interface. Kuiper will handle the rewind internally.
+
+```go
+type Rewindable interface {
+	GetOffset() (interface{}, error)
+	Rewind(offset interface{}) error
+}
+```
+
+### Sink consideration
+
+We cannot guarantee the sink to receive a data exactly once. If failures happen during the period of checkpointing, some states which have sent to the sink may not be checkpointed. And those states will be replayed as they are not restored because of not being checkpointed. In this case, the sink may receive them more than once. 
+
+To implement exactly-once, the user will have to implement deduplication tailored to fit the various sinking system.