Welcome to Simple Generation’s documentation!#

This documentation is intentionally brief. The main classes you can instantiate are documented below.

class simple_generation.SimpleGenerator(model_name_or_path, tokenizer_name_or_path=None, lora_weights=None, compile_model=False, use_bettertransformer=False, **model_kwargs)[source]#

SimpleGenerator is a wrapper around Hugging Face’s Transformers library that allows for easy generation of text from a given prompt.

property local_rank#

Returns the local rank of the process. If not in DDP, returns 0.

property is_ddp#

Returns True if the model is distributed.

property is_main_process#

Returns True if the process is the main process.

__init__(model_name_or_path, tokenizer_name_or_path=None, lora_weights=None, compile_model=False, use_bettertransformer=False, **model_kwargs)[source]#

Initialize the SimpleGenerator.

Parameters:
  • model_name_or_path (str) – The model name or path to load from.

  • tokenizer_name_or_path (str, optional) – The tokenizer name or path to load from. Defaults to None, in which case it will be set to the model_name_or_path.

  • lora_weights (str, optional) – The path to the LoRA weights. Defaults to None.

  • compile_model (bool, optional) – Whether to torch.compile() the model. Defaults to False.

  • use_bettertransformer (bool, optional) – Whether to transform the model with BetterTransformers. Defaults to False.

  • **model_kwargs – Any other keyword arguments will be passed to the model’s from_pretrained() method.

Returns:

The SimpleGenerator object.

Return type:

SimpleGenerator

Examples

>>> from simple_generation import SimpleGenerator
>>> generator = SimpleGenerator("meta-llama/Llama-2-7b-chat-hf")
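
Any extra keyword arguments are forwarded to the model's from_pretrained() method, so standard Hugging Face loading options can be passed through. A minimal sketch (torch_dtype and device_map are regular from_pretrained() arguments, not parameters of this library):

>>> import torch
>>> generator = SimpleGenerator(
...     "meta-llama/Llama-2-7b-chat-hf",
...     torch_dtype=torch.bfloat16,
...     device_map="auto",
... )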

conversation_from_user_prompts(user_prompts: List[str], **kwargs) → List[Dict][source]#

Generate a multi-turn conversation with multiple user prompts.

Each user prompt is fed to the model in turn: the model's response is appended to the conversation history, and the updated history is fed back to the model together with the next prompt, and so on. Note that this operation is not batched.

Parameters:
  • user_prompts (List[str]) – A list of turn texts. Each element is the human written text for a turn.

  • return_last_response (bool, optional) – If True, the model's last response is also returned alongside the conversation. Defaults to False.

Returns:

A list containing the conversation, one item per turn, following the Hugging Face chat template format.

Return type:

List[Dict]
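
For illustration, a hedged sketch of the expected round trip (the assistant replies below are placeholders, not real model output):

>>> prompts = ["Hi, who are you?", "Can you say that more briefly?"]
>>> conversation = generator.conversation_from_user_prompts(prompts)
>>> conversation
[{'role': 'user', 'content': 'Hi, who are you?'},
 {'role': 'assistant', 'content': '...'},
 {'role': 'user', 'content': 'Can you say that more briefly?'},
 {'role': 'assistant', 'content': '...'}]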

__call__(texts, batch_size='auto', starting_batch_size=256, num_workers=4, skip_prompt=False, log_batch_sample=-1, show_progress_bar=True, prepare_prompts=False, apply_chat_template=False, add_generation_prompt=False, **generation_kwargs)[source]#

Generate text from a given prompt.

Parameters:
  • texts (str or List[str]) – The text prompt(s) to generate from.

  • batch_size (int or str, optional) – The batch size to use for generation. Defaults to “auto”, in which case the largest executable batch size is found automatically, starting from starting_batch_size.

  • starting_batch_size (int, optional) – The starting batch size to use for finding the optimal batch size. Defaults to 256.

  • num_workers (int, optional) – The number of workers to use for the DataLoader. Defaults to 4.

  • skip_prompt (bool, optional) – Whether to strip the initial prompt from the returned generated text. Defaults to False. Keep it False for sequence-to-sequence models, whose outputs do not include the prompt.

  • log_batch_sample (int, optional) – If set to a value > 0, a sample of the generated text is logged every log_batch_sample batches. Defaults to -1 (disabled).

  • show_progress_bar (bool, optional) – Whether to show the progress bar. Defaults to True.

  • apply_chat_template (bool, optional) – Whether to apply the chat template to the prompts. Defaults to False.

  • add_generation_prompt (bool, optional) – Whether to append the generation prompt when applying the chat template (see add_generation_prompt in Hugging Face's apply_chat_template). Defaults to False.

  • **generation_kwargs – Any other keyword arguments will be passed to the model’s generate() method.

Returns:

The generated text(s).

Return type:

str or List[str]

Examples

>>> from simple_generation import SimpleGenerator
>>> generator = SimpleGenerator("meta-llama/Llama-2-7b-chat-hf")
>>> generator("Tell me what's 2 + 2.", max_new_tokens=16, do_sample=True, top_k=50, skip_prompt=True, apply_chat_template=True)
"The answer is 4."

gui(**generation_kwargs)[source]#

Start a GUI for the model.
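
The parameter name suggests that keyword arguments are default decoding settings for the GUI session; a hedged sketch:

>>> generator.gui(max_new_tokens=256, temperature=0.7)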

class simple_generation.DefaultGenerationConfig(max_new_tokens: int = 512, do_sample: bool = True, temperature: float = 0.7, top_p: float = 1.0, top_k: int = 50, num_return_sequences: int = 1)[source]#

Default generation configuration.

These parameters are applied to every .generate() call, unless they are explicitly overridden.

max_new_tokens#

The maximum number of tokens to generate. Defaults to 512.

Type:

int

do_sample#

Whether to use sampling or greedy decoding. Defaults to True.

Type:

bool

temperature#

The sampling temperature. Defaults to 0.7.

Type:

float

top_p#

If set to a value below 1.0, only the smallest set of most probable tokens whose cumulative probability reaches top_p is kept for sampling. Defaults to 1.0.

Type:

float

top_k#

The number of highest probability vocabulary tokens to keep for top-k-filtering. Defaults to 50.

Type:

int

num_return_sequences#

The number of independently computed returned sequences for each element in the batch. Defaults to 1.

Type:

int
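
Since the class exposes a plain constructor with these fields, a default can be overridden simply by passing it at construction time. A minimal sketch:

>>> from simple_generation import DefaultGenerationConfig
>>> config = DefaultGenerationConfig(max_new_tokens=128, temperature=0.2)
>>> config.do_sample
True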

class simple_generation.vlm.SimpleVLMGenerator(model_name_or_path, **model_kwargs)[source]#

property is_ddp#

Returns True if the model is distributed.

property local_rank#

Returns the local rank of the process. If not in DDP, returns 0.

__init__(model_name_or_path, **model_kwargs)[source]#

Initialize the SimpleVLMGenerator.

__call__(texts, images, batch_size='auto', starting_batch_size=256, num_workers=0, skip_prompt=False, log_batch_sample=-1, show_progress_bar=None, macro_batch_size: int = 512, **generation_kwargs)[source]#

Generate text from the given text prompt(s) and image(s).
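
A hedged sketch of a single text-image call. The model id below is illustrative, and it is an assumption that images accepts PIL images (paths or URLs may also be supported):

>>> from PIL import Image
>>> from simple_generation.vlm import SimpleVLMGenerator
>>> generator = SimpleVLMGenerator("llava-hf/llava-1.5-7b-hf")
>>> generator(["Describe this image."], [Image.open("example.jpg")], max_new_tokens=64)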
