This document describes the input controls of the Text Split Explorer application. Input controls are the user interface elements that allow users to input text, configure splitting parameters, and initiate the text splitting process. For information about how the split results are displayed, see Results Display. For details on code snippet generation, see Code Snippet Generation.
Overview of Input Controls
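The Text Split Explorer interface provides the following input controls for customizing how text is split: a text input area, two numeric parameter controls (chunk size and chunk overlap), two selection controls (length function and text splitter type), and a "Split Text" button.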
Sources: splitter.py16-91
Text Input Area
The text input area is a large text box where users can paste or type the text they want to split. This control serves as the primary data source for the application.
Sources: splitter.py88
The text area is implemented using Streamlit's text_area component:
doc = st.text_area("Paste your text here:")
This creates a multi-line text input field labeled "Paste your text here:" and stores the input in the doc variable, which is later passed to the text splitter when the splitting process is initiated.
Sources: splitter.py88
Parameter Controls
The parameter controls allow users to configure the numerical settings that determine how text is split.
Chunk Size
The chunk size control determines the maximum size of resulting text chunks, measured in either characters or tokens (depending on the selected length function).
Sources: splitter.py18-19
The chunk size is implemented as a number input with a minimum value of 1 and a default value of 1000:
chunk_size = st.number_input(min_value=1, label="Chunk Size", value=1000)
Chunk Overlap
The chunk overlap control sets the number of characters or tokens (depending on the selected length function) that will overlap between adjacent chunks.
Sources: splitter.py21-28
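To make the interaction between chunk size and chunk overlap concrete, here is a minimal sketch of a fixed-window splitter that measures length in characters. It is a simplification for illustration only and not the algorithm used by LangChain's splitters, which also take separators into account; the function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    # Windows of 4 characters, each repeating the last 2 characters of the previous one.
    print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, which is why the overlap has to stay strictly below the chunk size.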
The chunk overlap is implemented as a number input with:
- Minimum value of 1
- Maximum value dynamically set to chunk_size - 1
- Default value set to 20% of the chunk size:
chunk_overlap = st.number_input(min_value=1, max_value=chunk_size - 1, label="Chunk Overlap", value=int(chunk_size * 0.2))
Validation Warning
The interface includes validation logic that warns users if the chunk overlap is not less than the chunk size:
if chunk_overlap >= chunk_size: st.warning("Chunk Overlap should be less than Chunk Length!")
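The warning alerts users to an invalid configuration; as shown, it only displays a message. Because the number input already caps the overlap at chunk_size - 1, the check acts as a secondary safeguard.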
Sources: splitter.py30-32
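The bounds, default, and warning described above can be reproduced without Streamlit in a small helper. The sketch below is hypothetical (the names `overlap_settings` and `overlap_warning` do not come from splitter.py) and simply mirrors the expressions shown: a minimum of 1, a maximum of `chunk_size - 1`, and a default of `int(chunk_size * 0.2)`.

```python
from typing import Optional


def overlap_settings(chunk_size: int) -> tuple[int, int, int]:
    """Return (min_value, max_value, default) for the chunk overlap input."""
    min_value = 1
    max_value = chunk_size - 1
    default = int(chunk_size * 0.2)  # 20% of the chunk size, truncated
    return min_value, max_value, default


def overlap_warning(chunk_size: int, chunk_overlap: int) -> Optional[str]:
    """Return the warning text when the overlap is not smaller than the chunk size."""
    if chunk_overlap >= chunk_size:
        return "Chunk Overlap should be less than Chunk Length!"
    return None
```

With the default chunk size of 1000 this gives a range of 1 to 999 and a default of 200. For very small chunk sizes the expressions stop agreeing: a chunk size of 1 yields a maximum of 0, which is below the minimum of 1, and chunk sizes under 5 yield a default of 0.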
Selection Controls
Selection controls allow users to choose from predefined options that determine how the text splitting process works.
Length Function Selector
The length function selector determines how the system measures text length - either by counting characters or tokens.
Sources: splitter.py34-59
The length function is implemented as a dropdown selector with two options:
length_function = st.selectbox("Length Function", ["Characters", "Tokens"])
When "Characters" is selected, the system uses Python's built-in len() function. When "Tokens" is selected, it uses the tiktoken library to count tokens using the "cl100k_base" encoding.
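The mapping from the dropdown value to a length function can be sketched as follows. The helper name `make_length_function` is hypothetical; `tiktoken.get_encoding` and the encoding's `encode` method are real tiktoken calls, and the import is deferred so that the character path works without tiktoken installed.

```python
from typing import Callable


def make_length_function(choice: str) -> Callable[[str], int]:
    """Map the dropdown value to a function that measures text length."""
    if choice == "Characters":
        return len  # Python's built-in character count
    if choice == "Tokens":
        import tiktoken  # imported lazily; only needed for token counting

        enc = tiktoken.get_encoding("cl100k_base")

        def token_length(text: str) -> int:
            # Length is the number of tokens the encoding produces for the text.
            return len(enc.encode(text))

        return token_length
    raise ValueError(f"Unknown length function: {choice!r}")


if __name__ == "__main__":
    measure = make_length_function("Characters")
    print(measure("Split me into chunks"))  # number of characters
```

Whichever function is returned is what the chunk size and chunk overlap values are measured against, so the same numeric settings produce different chunks under the two options.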
Text Splitter Type Selector
The text splitter type selector allows users to choose which splitting algorithm to use.
Sources: splitter.py39-44 splitter.py92-109
The splitter type selector is implemented as a dropdown with options including:
- "RecursiveCharacter"
- "Character"
- Various language options from LangChain's Language enum
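One way to build such an option list is to prepend the two fixed names to labels derived from an enum. The sketch below uses a small stand-in enum, `DemoLanguage`, so that it runs without LangChain; the real `Language` enum has many more members, and the `Language.NAME` label format is an assumption rather than something this page specifies.

```python
from enum import Enum


class DemoLanguage(str, Enum):
    """Stand-in for LangChain's Language enum (hypothetical, three members only)."""
    PYTHON = "python"
    MARKDOWN = "markdown"
    JS = "js"


def build_splitter_choices(languages=DemoLanguage) -> list[str]:
    """Return the dropdown options: the two generic splitters, then one per language."""
    # Assumed label format: "Language." followed by the enum member's name.
    language_labels = [f"Language.{member.name}" for member in languages]
    return ["RecursiveCharacter", "Character"] + language_labels


if __name__ == "__main__":
    for option in build_splitter_choices():
        print(option)
```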
Action Control
The primary action control is the "Split Text" button that initiates the text splitting process.
Sources: splitter.py91-117
When the button is clicked, the system:
- Creates an appropriate text splitter instance based on the selected splitter type
- Configures it with the chunk size, chunk overlap, and length function parameters
- Splits the input text
- Displays the resulting chunks
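The steps above can be sketched as a small dispatch function. This is a hedged illustration, not the code in splitter.py: the helper names are hypothetical, the `Language.NAME` option format is an assumption, and the import path `langchain_text_splitters` assumes a recent LangChain release (older releases expose the same classes from `langchain.text_splitter`).

```python
from typing import Callable, Optional, Tuple


def parse_splitter_choice(choice: str) -> Tuple[str, Optional[str]]:
    """Classify a dropdown value; returns (kind, language_member_name)."""
    if choice == "Character":
        return "character", None
    if choice == "RecursiveCharacter":
        return "recursive", None
    if choice.startswith("Language."):
        # Assumed label format, e.g. "Language.PYTHON" -> member name "PYTHON"
        return "language", choice.split(".", 1)[1]
    raise ValueError(f"Unknown splitter type: {choice!r}")


def split_with_choice(text: str, choice: str, chunk_size: int, chunk_overlap: int,
                      length_function: Callable[[str], int] = len) -> list[str]:
    """Create the selected splitter, configure it, and return the chunks."""
    kind, language_name = parse_splitter_choice(choice)

    # Imported lazily so the parsing helper works without LangChain installed.
    from langchain_text_splitters import (
        CharacterTextSplitter,
        Language,
        RecursiveCharacterTextSplitter,
    )

    params = dict(chunk_size=chunk_size, chunk_overlap=chunk_overlap,
                  length_function=length_function)
    if kind == "character":
        splitter = CharacterTextSplitter(**params)
    elif kind == "recursive":
        splitter = RecursiveCharacterTextSplitter(**params)
    else:
        # Look the enum member up by name and use language-specific separators.
        splitter = RecursiveCharacterTextSplitter.from_language(
            Language[language_name], **params
        )
    return splitter.split_text(text)
```

In a Streamlit script, a call such as `split_with_choice(doc, splitter_choice, chunk_size, chunk_overlap)` would sit inside the button's `if` block, followed by a loop that renders each returned chunk.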
UI Layout Organization
The parameter and selection controls are laid out side by side using Streamlit's column system.
Sources: splitter.py7-16 splitter.py16-44
The layout uses four columns, one per control:
- Column 1: Chunk size input
- Column 2: Chunk overlap input and validation warning
- Column 3: Length function selection
- Column 4: Text splitter type selection (wider column to accommodate longer option names)
Connection to Code Entities
The following diagram maps the UI components directly to their implementation in the code:
Sources: splitter.py16-44 splitter.py88 splitter.py91-111
Interaction Flow Between Input Controls and Splitting Process
The following diagram illustrates how the input controls interact with each other and the splitting process:
Sources: splitter.py16-117
Parameter Validation and Constraints
The input controls implement the following validations and constraints:
Parameter | Constraint | Validation | Default Value |
---|---|---|---|
Chunk Size | Minimum value of 1 | N/A | 1000 |
Chunk Overlap | Minimum value of 1, maximum value of chunk_size - 1 | Warning if chunk_overlap >= chunk_size | 20% of chunk size (200 at the default chunk size) |
Length Function | Must be "Characters" or "Tokens" | Limited by dropdown options | "Characters" (first dropdown option) |
Splitter Type | Must be one of the predefined options | Limited by dropdown options | First dropdown option |
Sources: splitter.py18-32
Together, these input controls let users configure and run text splitting operations, and experiment with different splitter types and parameters to see which combination suits their text.