Class TiktokenTextSplitter
java.lang.Object
com.github.hakenadu.javalangchains.chains.qa.split.MaxLengthBasedTextSplitter
com.github.hakenadu.javalangchains.chains.qa.split.TiktokenTextSplitter
- All Implemented Interfaces:
TextSplitter
public final class TiktokenTextSplitter extends MaxLengthBasedTextSplitter
This TextSplitter splits documents based on their tiktoken token count, using jtokkit to count the tokens.
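Splitting by a maximum length means packing consecutive text parts into chunks whose measured length stays within a limit. For this class the measure is the tiktoken token count. The sketch below illustrates the idea under stated assumptions. The class and method names (`MaxLengthSplitSketch`, `pack`, `wordCount`) are hypothetical, a whitespace word count stands in for jtokkit's token count, and the greedy packing strategy is an assumption, not the library's documented algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch of max-length based splitting; not the library's code.
public class MaxLengthSplitSketch {

    /**
     * Greedily joins consecutive parts (space-separated) while the measured
     * length of the joined text stays within maxLength. A single part that
     * alone exceeds maxLength is emitted as its own oversized chunk.
     */
    static List<String> pack(List<String> parts, int maxLength, ToIntFunction<String> length) {
        List<String> chunks = new ArrayList<>();
        String current = "";
        for (String part : parts) {
            String candidate = current.isEmpty() ? part : current + " " + part;
            if (!current.isEmpty() && length.applyAsInt(candidate) > maxLength) {
                // Adding this part would overflow: close the current chunk.
                chunks.add(current);
                current = part;
            } else {
                current = candidate;
            }
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    // Assumption: stand-in for a tiktoken count, counting whitespace-separated words.
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of(
                "The first sentence has six words.",
                "Second one is short.",
                "A third sentence follows here.");
        pack(sentences, 10, MaxLengthSplitSketch::wordCount).forEach(System.out::println);
    }
}
```

The packing logic is independent of how length is measured. Swapping the word-count stand-in for a real token counter is what would turn this into a token-based split.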
Constructor Summary
Constructors
Constructor | Description
TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens) | Creates an instance of TiktokenTextSplitter with sentence-based text streaming.
TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens, TextStreamer textStreamer) | Creates an instance of TiktokenTextSplitter that streams the base text with the given TextStreamer.
Method Summary
Methods inherited from class com.github.hakenadu.javalangchains.chains.qa.split.MaxLengthBasedTextSplitter
split
Constructor Details
TiktokenTextSplitter
public TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens, TextStreamer textStreamer)
Creates an instance of TiktokenTextSplitter.
- Parameters:
encoding - the jtokkit Encoding used to count tokens
maxTokens - the maximum number of tokens per chunk
textStreamer - the TextStreamer used to stream the base text
TiktokenTextSplitter
public TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens)
Creates an instance of TiktokenTextSplitter with sentence-based text streaming.
- Parameters:
encoding - the jtokkit Encoding used to count tokens
maxTokens - the maximum number of tokens per chunk
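The two-argument constructor defaults to sentence-based text streaming. This page does not document the TextStreamer interface, so the following is only a hypothetical illustration of what streaming a text sentence by sentence can look like. It uses the JDK's `java.text.BreakIterator`. The class and method names (`SentenceStreamSketch`, `sentences`) are assumptions, and this is not the library's implementation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of sentence-based streaming; not the library's TextStreamer.
public class SentenceStreamSketch {

    /** Splits text into trimmed sentences using the JDK's locale-aware sentence rules. */
    static List<String> sentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);
        List<String> result = new ArrayList<>();
        int start = iterator.first();
        // Walk consecutive boundaries; each [start, end) range is one sentence.
        for (int end = iterator.next(); end != BreakIterator.DONE; start = end, end = iterator.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Tokens are counted per chunk. Sentences are kept whole! Is that useful?";
        sentences(text, Locale.ENGLISH).forEach(System.out::println);
    }
}
```

If whole sentences are the unit being streamed, chunk boundaries can only fall between sentences. This tends to keep each chunk readable on its own.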
Method Details
getLength
Description copied from class: MaxLengthBasedTextSplitter
Provides the length value for a text part.
- Specified by:
getLength in class MaxLengthBasedTextSplitter
- Parameters:
textPart - the text part to be measured
- Returns:
the length of the passed textPart
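Because getLength is "Specified by" MaxLengthBasedTextSplitter, it is an abstract hook that each subclass implements. TiktokenTextSplitter implements it in terms of tiktoken tokens. The sketch below shows that template-method shape with hypothetical classes. The `fits` helper, the `protected` visibility, and the word-count measure (a stand-in for jtokkit's token count) are all assumptions.

```java
// Hypothetical template-method sketch; class names and fits() are illustrative only.
public class LengthHookSketch {

    /** Base class: shared logic relies only on the getLength hook. */
    abstract static class LengthBasedSplitter {
        private final int maxLength;

        LengthBasedSplitter(int maxLength) {
            this.maxLength = maxLength;
        }

        /** Hook each subclass supplies, analogous in role to getLength. */
        protected abstract int getLength(String textPart);

        /** Checks a part against the limit using whatever measure the subclass defines. */
        boolean fits(String textPart) {
            return getLength(textPart) <= maxLength;
        }
    }

    /** Measures length in characters. */
    static final class CharSplitter extends LengthBasedSplitter {
        CharSplitter(int maxLength) {
            super(maxLength);
        }

        @Override
        protected int getLength(String textPart) {
            return textPart.length();
        }
    }

    /** Measures whitespace-separated words, standing in for a token count. */
    static final class WordSplitter extends LengthBasedSplitter {
        WordSplitter(int maxLength) {
            super(maxLength);
        }

        @Override
        protected int getLength(String textPart) {
            String trimmed = textPart.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        String part = "split me by tokens";
        System.out.println(new CharSplitter(10).fits(part)); // false: 18 characters
        System.out.println(new WordSplitter(10).fits(part)); // true: 4 words
    }
}
```

The same text part can fit under one measure and overflow under another. That is why the limit passed to a length-based splitter only has meaning together with its getLength implementation.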