Class TiktokenTextSplitter

java.lang.Object
com.github.hakenadu.javalangchains.chains.qa.split.MaxLengthBasedTextSplitter
com.github.hakenadu.javalangchains.chains.qa.split.TiktokenTextSplitter
All Implemented Interfaces:
TextSplitter

public final class TiktokenTextSplitter
extends MaxLengthBasedTextSplitter
This TextSplitter splits documents into chunks based on their tiktoken token count. Tokens are counted with jtokkit.
  • Constructor Details

    • TiktokenTextSplitter

      public TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens, TextStreamer textStreamer)
      Creates an instance of TiktokenTextSplitter.
      Parameters:
      encoding - the jtokkit Encoding used to count tokens
      maxTokens - maximum number of tokens per chunk
      textStreamer - the TextStreamer used for streaming the base text
    • TiktokenTextSplitter

      public TiktokenTextSplitter(com.knuddels.jtokkit.api.Encoding encoding, int maxTokens)
      Creates an instance of TiktokenTextSplitter that uses sentence-based text streaming.
      Parameters:
      encoding - the jtokkit Encoding used to count tokens
      maxTokens - maximum number of tokens per chunk
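The idea behind a token-budgeted, sentence-based split can be sketched in plain Java. This is only an illustration under stated assumptions, not the library's implementation: the class name TokenBudgetSplitSketch, its split and sentences methods, the regex-based sentence detection, and the whitespace word counter used in place of a real tokenizer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical sketch: greedily packs sentences into chunks whose token
// count stays within maxTokens. Not the actual TiktokenTextSplitter code.
final class TokenBudgetSplitSketch {

    // Naive sentence detection (assumption): break after '.', '!' or '?'
    // when followed by whitespace.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String part : text.trim().split("(?<=[.!?])\\s+")) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    // tokenCounter stands in for a real tokenizer's counting function.
    static List<String> split(String text, int maxTokens, ToIntFunction<String> tokenCounter) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences(text)) {
            String candidate = current.length() == 0 ? sentence : current + " " + sentence;
            if (current.length() > 0 && tokenCounter.applyAsInt(candidate) > maxTokens) {
                // Adding this sentence would exceed the budget: close the chunk.
                chunks.add(current.toString());
                current.setLength(0);
                current.append(sentence);
            } else {
                // Still within budget, or the first sentence of a chunk.
                // A single oversized sentence is kept as its own chunk.
                current.setLength(0);
                current.append(candidate);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Whitespace word count as a stand-in for tiktoken token counting.
        ToIntFunction<String> wordCounter = s -> s.trim().split("\\s+").length;
        String text = "One two three. Four five. Six seven eight nine.";
        for (String chunk : split(text, 5, wordCounter)) {
            System.out.println(chunk);
        }
    }
}
```

With jtokkit on the classpath, the stand-in counter could presumably be replaced by a method reference to the Encoding's countTokens method, which is the role the encoding parameter plays in the constructors above.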
  • Method Details