How I made a command line chatbot


I went to the woods because I wished to live deliberately, to front only the essential facts of life, and see if I could not learn what it had to teach, and not, when I came to die, discover that I had not lived.

Henry David Thoreau, Walden


Introduction

Since the end of 2022, we have been inundated with LLM chatbots in our everyday lives. The typical use involves some sort of interface that we sign into, where our previous chats can be saved, and so forth. But early on, these chatbots also became available via API. Here, I want to show how I made a chatbot that can be called from the command line anywhere on my computer.

The use case I'll show you is that of the literate programming environment, which lets you have Platonic dialogues that are saved and stored as you go. This helps me with specific use cases, like getting feedback on journal entries and notes in real time. A specific example involves my German learning, where I often use LLMs as a conversational partner that provides feedback in real time. Otherwise, in my line of work, I do a lot of research, so I often use LLMs to bounce ideas off of. A literate programming environment saves this feedback directly into the notes I'm taking.

In short, having LLMs available on the command line is useful, and here, I'm going to show you how I do it. The punchline is that it is not particularly hard to do.

The command line interface

Chatbot

The first thing we have to do is make a Python script that calls an API. At the time of writing [2025-01-27 Mon], DeepSeek R1 has just come out, which provides chain-of-thought reasoning on par with the current state of the art, the likes of OpenAI's o1, but for far cheaper in terms of API calls. Not to mention it's open source, so down the line, if I get enough computational muscle, I might be able to run it locally.

In terms of calling the API, there are plenty of options, but I use OpenRouter. This gives me access to the GPT models, Claude, Deepseek, and a host of other models. You can browse available models here. When you sign up, you will get an API key, and the ability to buy credits. To give you an idea of how cheap it is, I put down $5 three days ago (date of this sentence: [2025-01-27 Mon]), and after extensive use primarily of Claude, I have $4.74 remaining.
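If you want to sanity-check your key before writing any Python, you can hit OpenRouter's OpenAI-compatible endpoint directly. This is just a sketch; the endpoint matches the base URL used in the script below, and OPENROUTER_API_KEY is assumed to be a shell variable holding your key.

curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-3.5-turbo", "messages": [{"role": "user", "content": "What is 5 + 2?"}]}'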

I wrote a Python script called chatbot.py that takes as input the desired model (currently GPT-4o, Claude, DeepSeek R1, and a few others) and the prompt, and in turn prints the answer. The script is as follows:

#!/usr/bin/env python3
from openai import OpenAI
import sys

ds = "deepseek/deepseek-r1"
gpt3 = "openai/gpt-3.5-turbo"
gpt4 = "openai/gpt-4o-2024-11-20"
o3mh = "openai/o3-mini-high"
claude = "anthropic/claude-3.5-sonnet-20240620"
gemini = "google/gemini-2.0-flash-001"

# Ensure the question is passed as a command-line argument
if len(sys.argv) < 3:
    print("Usage: python chatbot.py <model> \"<your question>\"")
    print("Model options: gpt or deepseek")
    sys.exit(1)

# Get the question from the command-line argument
model_choice = sys.argv[1].lower()
user_question = sys.argv[2]

# Map the user's model choice to the appropriate model
models = {
    "gpt3": gpt3,
    "gpt4": gpt4,
    "deepseek": ds,
    "claude": claude,
    "gemini": gemini,
    "o3mh": o3mh,
}
if model_choice not in models:
    print("Invalid model choice.")
    sys.exit(1)
model = models[model_choice]


client = OpenAI(
  base_url="https://openrouter.ai/api/v1",
  api_key="YOUR_API_KEY_HERE",  # paste your OpenRouter API key here
)

completion = client.chat.completions.create(
  model=model,
  messages=[
    {
      "role": "user",
      "content": user_question
    }
  ]
)
print(completion.choices[0].message.content)

This is the first step. You can then run this by going to the directory that this script is in and running:

python chatbot.py "your model here" "your prompt here"

For example:

python chatbot.py "gpt3" "What is 5 + 2?"
5 + 2 = 7

Great. Now the next step is to make this script available anywhere on your computer. The way you do that is to put it in a directory that is on your PATH. First, I made a directory called scripts:

mkdir -p ~/scripts

Next, I moved the chatbot python script into that new directory:

mv chatbot.py ~/scripts/chatbot
chmod +x ~/scripts/chatbot # permissions

Then I placed this directory into my PATH by adding the following line to my shell config, which for me is ~/.zshrc:

export PATH="$HOME/scripts:$PATH"

Then, I applied the changes with:

source ~/.zshrc

And from there, I can call chatbot globally by using:

chatbot "gpt3" "What is 5 + 2?"
5 + 2 = 7
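
If the shell doesn't find the command, a quick sanity check is to ask where it resolves to (the printed path will be wherever your scripts directory lives):

command -v chatbot   # should print something like /Users/you/scripts/chatbot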

In terms of using it on the command line, that is all there is to it. But note that in the example above, the script ran directly in this writeup, because I am writing this article in a literate programming environment. This is one of my preferred ways of using LLMs: as a conversational partner in real time.

Thus, the next section will show you how to get this running in a literate programming environment.

Searchbot

If we are querying an LLM, we are often at the mercy of the cutoff of that LLM's training data. At the time of writing [2025-02-16 Sun], ChatGPT (as an example) has the option of incorporating internet search in a given query. However, calling LLMs from an API, as per the previous section, does not mean that you get internet search baked in.

To this end, I created searchbot. Using the same concept as "chatbot" in the previous section, I wrote a script that queries the Perplexity API. Perplexity was one of the first tools to integrate LLMs with internet search. If you want to have "searchbot" in your command line, you have to first make an account with Perplexity. After you do that, go to their "getting started" page here, where you will learn how to get an API key and enter payment details, a process similar to that of OpenRouter in the previous section. And as before, you will then be able to add credits, measured in dollars. Once you run out of credits, you won't be able to use the API until you add more (and you'll get an error message saying that you're out of credits).

Anyway, after all of this, please create a script called searchbot.py which has the following code pasted into it:

#!/usr/bin/env python3
import sys
from openai import OpenAI

if len(sys.argv) < 2:
    print("searchbot \"your_prompt\"")
    sys.exit(1)

user_query = " ".join(sys.argv[1:])

YOUR_API_KEY = "YOUR_API_KEY_HERE"

messages = [
    {
        "role": "system",
        "content": (
            "You are an artificial intelligence assistant and you need to "
            "engage in a helpful, detailed, polite conversation with a user."
        ),
    },
    {
        "role": "user",
        "content": user_query,
    },
]

client = OpenAI(api_key=YOUR_API_KEY, base_url="https://api.perplexity.ai")

# chat completion without streaming
response = client.chat.completions.create(
    model="sonar-pro",
    messages=messages,
)
print(response)

In the above script, please add your API key to the variable YOUR_API_KEY.

From here, you do the same procedure as before. You get this script into your PATH. Here is how I do it on my 2022 MacBook Pro:

mv searchbot.py ~/scripts/searchbot
chmod +x ~/scripts/searchbot # permissions

Then once again I apply the changes with:

source ~/.zshrc

And from here, I can call searchbot globally, like this:

source ~/.zshrc
searchbot "What is the latest news around Open AI's potential release of GPT-5?"
ChatCompletion(id='5963f9db-5ca6-430a-a3ff-4cb9281df979', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='OpenAI has recently announced plans to release GPT-4.5 and GPT-5 in the near future. According to CEO Sam Altman\'s statements on February 12, 2025, GPT-4.5 is expected to launch within weeks, while GPT-5 is slated for release within months[1][3].\n\nKey points about the upcoming releases include:\n\n1. GPT-4.5, internally called "Orion," will be OpenAI\'s last non-chain-of-thought model[1][5]. This suggests a significant shift in AI capabilities moving forward.\n\n2. GPT-5 will integrate various technologies, including the previously planned o3 model, which will no longer be released as a standalone product[1][5].\n\n3. OpenAI aims to simplify its product offerings, creating a more unified AI experience that "just works" for users without requiring them to choose between different models[1][5].\n\n4. Free users of ChatGPT will have unlimited access to GPT-5 at a standard intelligence setting, while Plus and Pro subscribers will have access to higher intelligence levels[1][5].\n\n5. GPT-5 is expected to incorporate advanced features such as improved reasoning capabilities, multimodal understanding (text, image, audio, video), and enhanced customization options[4].\n\n6. The release of these models reflects OpenAI\'s strategy to maintain its competitive edge in the AI industry, particularly in light of advancements from other companies like China\'s DeepSeek[5].\n\nIt\'s important to note that while these announcements have generated significant excitement, specific details about the capabilities and exact release dates of GPT-4.5 and GPT-5 remain limited. As with previous releases, OpenAI is likely to prioritize safety testing and responsible deployment of these new models[4].', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=None), delta={'role': 'assistant', 'content': ''})], created=1739693949, model='sonar-pro', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=409, prompt_tokens=40, total_tokens=449, completion_tokens_details=None, prompt_tokens_details=None, citation_tokens=5566, num_search_queries=1), citations=['https://www.pymnts.com/artificial-intelligence-2/2025/openai-to-release-gpt-4-5-within-weeks-gpt-5-within-months/', 'https://opentools.ai/news/openai-unleashes-gpt-5-with-free-unlimited-access-for-all', 'https://www.axios.com/2025/02/12/openai-chatgpt-roadmap-gpt5', 'https://www.chatbase.co/blog/gpt-5', 'https://www.maginative.com/article/gpt-5-is-the-omnimodel-weve-been-waiting-for-heres-why/', 'https://lonelybrand.com/blog/openai-unveils-development-plan-for-gpt-4-5-and-gpt-5-launch/', 'https://www.youtube.com/watch?v=KtwK3hBAjDY', 'https://pylessons.com/news/openai-gpt-4-5-gpt-5-release-update', 'https://topmostads.com/gpt-4-5-vs-gpt-5-release/', 'https://community.openai.com/t/openai-roadmap-and-characters/1119160'])

You can see that it is a bit of a mess right now because I have the bot return everything. As of now, I feed these raw search results back into the chatbots to produce reports, as I'll explain in the next section.
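
If you wanted cleaner standalone output, you could print just the answer and its sources instead of the whole object. Here is a minimal sketch, assuming the response object exposes the citations field visible in the raw dump above:

# print only the answer text
print(response.choices[0].message.content)
# followed by Perplexity's numbered sources
for i, url in enumerate(response.citations, start=1):
    print(f"[{i}] {url}")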

Prompt scripting: the next level of prompt engineering

Now that we are able to call chatbot and searchbot from the command line, we have to talk about what is possible here that otherwise cannot be done in the LLM UI/UX. This is where knowing how to code (as opposed to knowing how to prompt LLMs to code) really comes in handy, because to reach the next level of prompt engineering, you need to be able to think algorithmically.

What we can do now is what I call prompt scripting. This is where you write scripts (in whatever programming language) where the data structures are LLM agents, rather than just strings and ints, for example. In order to help you understand what I'm talking about, I'm going to go right into a simple example.

Below, we are going to use searchbot and chatbot in a simple manner. We are going to produce a prompt for searchbot. We are then going to feed the searchbot results and the original prompt directly into chatbot, which will then produce a report.

source ~/.zshrc

# Architecture:
# Terminal goal: Produce a newsletter
# (user)--[prompt]-->(internet search agent)--[research notes]-->(reasoning agent)--> *newsletter*

# Setting the search prompt
search_prompt="Please give me the lastest news around LLM developments. Focus on things that have happened as of February 1, 2025 and after. Use the most reputable sources possible."

# Internet searching agent produces research notes
news=$(searchbot "$search_prompt")

# Setting the newsletter prompt
newsletter_prompt="You are to produce a newsletter given the latest updates around LLM developments as of February 1, 2025. A researcher, perplexity, has compiled notes for you to use. You will take these as input and produce the newsletter. Make it academic in style. Include citations as you would an academic journal, both in the report with footnotes along with a citations section. Begin with an abstract that summarizes these developments. The research notes you will use, along with sources, is here: $news"

# Reasoning agent produces the newsletter
newsletter=$(chatbot "deepseek" "$newsletter_prompt")

echo "$newsletter"
**Abstract**
The field of Large Language Models (LLMs) has witnessed significant advancements as of February 2025, marked by innovations in architecture, efficiency, and specialization. Highlights include DeepSeek R1, a cost-efficient open-source model excelling in mathematical and coding tasks; Rakuten AI 2.0, Japan’s first MoE-based bilingual model optimized for edge deployment; and Google’s Gemini 1.5, which features an unprecedented one million-token context window. Concurrently, Alibaba’s Qwen2.5-Max demonstrates superior benchmark performance, while LG AI Research’s EXAONE 3.0 advances multilingual and domain-specific applications. These developments reflect a broader trend toward scalability, accessibility, and task-specific optimization in LLM research[1][5][9].

---

**Key Developments in LLM Technology (2024–2025)**

**1. DeepSeek R1: Open-Source Efficiency with Narrowed Specialization**
Released in January 2025, DeepSeek R1 represents a breakthrough in parameter-efficient training. With 671 billion total parameters and 37 billion active parameters, it matches or exceeds OpenAI’s proprietary models in mathematical reasoning (GSM8K: 89.3%) and coding benchmarks (HumanEval: 84.7%)[1]. Notably, its training cost—reportedly 40% lower than comparable models—has driven adoption in academic and open-source communities[8]. The model’s selective parameter activation reduces inference latency, making it a viable option for resource-constrained environments[7].

**2. Rakuten AI 2.0: MoE Architecture for Japanese Language Processing**
Rakuten Group’s February 2025 release of Rakuten AI 2.0 introduces Japan’s first Mixture of Experts (MoE) LLM. The model employs eight distinct 7B-parameter experts, achieving performance parity with monolithic 70B models in Japanese NLP tasks[5]. A companion “mini” variant (2.8B parameters) targets edge-device applications, emphasizing low power consumption and sub-100ms latency. Early evaluations highlight its efficacy in customer service automation and real-time translation[5].

**3. Gemini 1.5: Context Window Expansion**
Google’s Gemini 1.5, announced in February 2024 but widely adopted by 2025, pioneers a one million-token context window—7x larger than previous iterations[1]. This allows single-prompt processing of 30,000 code lines or 700,000 words, enabling novel applications in long-form content analysis and cross-document inference. Early industry use cases include legal contract review and video transcript summarization[3][10].

**4. Qwen2.5-Max: Benchmark Dominance in Efficiency-Critical Tasks**
Alibaba’s Qwen2.5-Max, released in early 2025, outperforms competitors (including DeepSeek V3) in low-latency benchmarks such as LiveCodeBench (+12.4%) and GPQA-Diamond (+8.9%)[9]. While parameter counts remain undisclosed, its architecture employs dynamic sparse attention to reduce computational overhead. The model is now integrated into Alibaba Cloud’s enterprise API suite, targeting finance and healthcare sectors[7][9].

**5. EXAONE 3.0: Multilingual and Domain-Specific Advancements**
LG AI Research’s EXAONE 3.0 (December 2024) features 7.8B parameters optimized for chemistry, patent analysis, and bilingual (Korean-English) tasks[9]. Its instruction-tuned variant, released under a non-commercial license, has facilitated academic research in cross-lingual transfer learning. Notably, EXAONE 3.0 achieves state-of-the-art results in the KR-STEM benchmark (92.1 F1-score), aiding South Korea’s scientific R&D initiatives[9].

---

**Citations**
[1] Exploding Topics. (2025). *List of Leading LLMs*. https://explodingtopics.com/blog/list-of-llms
[3] Georgia Tech. (2025). *AI Trends: What’s Next for Large Language Models*. https://www.cc.gatech.edu/news/ai-ai-popular-large-language-models-weigh-whats-next-ai-2025
[5] Rakuten Group. (2025). *Press Release: Rakuten AI 2.0 Launch*. https://global.rakuten.com/corp/news/press/2025/0212_02.html
[7] Vamsi Talks Tech. (2025). *LLM Evolution: 2024–2025 Technical Review*. https://www.vamsitalkstech.com/ai/the-evolution-of-large-language-models-in-2024-and-where-we-are-headed-in-2025-a-technical-review/
[8] TechTarget. (2025). *DeepSeek: Technical Overview*. https://www.techtarget.com/whatis/feature/DeepSeek-explained-Everything-you-need-to-know
[9] Shakudo. (2025). *Top LLMs of 2025*. https://www.shakudo.io/blog/top-9-large-language-models
[10] Deloitte. (2025). *AI Agents and Autonomous Systems*. https://www2.deloitte.com/us/en/insights/focus/tech-trends/2025/tech-trends-ai-agents-and-autonomous-ai.html

---
*Format: Academic newsletter with numeric citations corresponding to sources. Footnotes integrated via bracketed numbers linked to the citations section.*

You get the picture. Here, I had the overall goal of making a newsletter. To this end, I set up an "agent" to do internet search and prepare research notes, which were in turn used by a reasoning model to produce an academic-style newsletter. This is a very simple version, and if you really read the output, it is informative but not perfect. Further improvements could be made to the prompt, or by adding additional agents to do more research, edit the report, check for errors, and so forth; one such editing pass is sketched below, and I'll leave further layers of complexity as an exercise to the reader.
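
As a hint of what such an extra layer could look like, here is a sketch of a third agent bolted onto the pipeline above, an editor that reviews the newsletter before it goes out. The prompt is illustrative, not tuned:

# Hypothetical third agent: an editor that reviews the draft
edit_prompt="You are an editor. Check the following newsletter for errors and tighten the prose, keeping the academic style and the citations intact: $newsletter"
final=$(chatbot "claude" "$edit_prompt")
echo "$final"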

I'd like to show you one more thing you can do when you start thinking in terms of prompt scripting, and that is adding a code execution layer to these prompts. Below is a simple script where an LLM produces code that is in turn executed. The user sees just the generated command and its output.

source ~/.zshrc

# Architecture:
# Terminal goal: execute code based on the user's natural language query
# (user)--[prompt]-->(code producing agent)--[code]-->(code executing agent)--> *code output*

# Make the code
cmd=$(chatbot "claude" "Please produce a shell command to give me today's date and time. Just produce the command. Do not produce anything else.")
echo "$cmd"

# Execute the code (rather than copy/paste)
eval "$cmd"
date
Sun Feb 16 10:05:20 CET 2025

In the case of shell scripting, you simply use the eval command to run any string passed to it as code, and a string of code is exactly what I can get the chatbot to produce.
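
Since eval will run whatever the model returns, you may want a human in the loop before anything executes. A minimal sketch of a confirmation step (the prompt is illustrative):

cmd=$(chatbot "claude" "Please produce a shell command that prints disk usage for the current directory. Just produce the command. Do not produce anything else.")
# show the generated command and ask before running it
printf 'About to run: %s\nExecute? [y/N] ' "$cmd"
read -r ok
[ "$ok" = "y" ] && eval "$cmd"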

You can imagine that we could get the chatbot to produce much more complex commands for whichever language as part of a much more complex pipeline. Again, I'll leave that as an exercise to the reader, now that you have the toolkit to do so.

We will now move to the related topic of how to get these commands working in a literate programming environment.

The literate programming environment

R Markdown and Jupyter Notebooks

The spoiler alert upfront is that in order to use these new scripts in a literate programming environment, you just have to get it to execute shell commands. In R Markdown, there is an option to run bash. Jupyter notebooks have options as well. One key thing I've had to do to get the literate programming environment to recognize the scripts is to source my zshrc file in every code block where I run them. Like this:

source ~/.zshrc
chatbot "gpt3" "What is 5 + 2?"
5 + 2 equals 7.

Maybe you won't have this problem, but if you do, that is how you get around it.
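
For reference, in R Markdown, a chunk using the bash engine would look something like this (the chunk header is standard R Markdown; the prompt is illustrative):

```{bash}
source ~/.zshrc
chatbot "claude" "Give me feedback on this journal entry: ..."
```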

Emacs Org-Mode

Now this section is for Emacs users who use org mode. This is the literate programming environment that I prefer, and I am writing this article directly in it. I am specifically using Babel which allows for active code use in Org. This comes with Doom Emacs (I'm currently using this) and Spacemacs (I started with this).

The way you start a shell block here is by typing "#+begin_src sh :results output", putting your code underneath, and closing with "#+end_src" on its own line. There is a keybinding that shortcuts this. Just go to a new line and type "<s + Tab." You'll see why this is important in a minute.

So go ahead and test this out on your computer and then we'll move to the next bit, where we make a keybinding specific to the making and use of our chatbot script.

Now what I have set up is a keybinding that sets up specifically this:

#+begin_src sh :results output
source ~/.zshrc
chatbot "claude" "test"

#+end_src

I can get the above block by typing "<chat + Tab" anywhere I'd otherwise insert a code block within an Org file.

Anyway, the way you set up this keyboard shortcut, at least in Doom Emacs, is by going into your config file (config.el) within your .doom.d directory, going into your "after 'org" block, and adding the following lines.

(require 'org-tempo)
(add-to-list 'org-structure-template-alist
        '("chat" . "src sh :results output\nsource ~/.zshrc\nchatbot \"claude\" \"test\""))

You'll see that there is an unnecessary blank line that sits between the chatbot call and #+end_src. I have not yet figured out how to remove that line, but the text cursor automatically sits on it, so you just have to press delete right after the block has been made and the line goes away. So really, just think of it as "<chat + Tab + Delete."

Discussion

Since I made this command line tool and figured out how to use it in a literate programming context, it has increased my productivity, especially in the contexts of my German learning (I am an American living in Berlin) and my research work. The more general theme is that I write prolifically in my literate programming environment (Org-Mode), and now I can get feedback from LLMs directly within that environment.

In general, I write to think. Now I can think and get real-time feedback on my thoughts from LLMs that are becoming increasingly capable and are hallucinating less often. I have seen the occasional comment about how LLMs are going to reduce our ability to think about stuff, because we will end up outsourcing our cognition to them. My general goal is to use LLMs as a way to help me think more effectively about stuff, because being good at thinking about stuff is a core value of mine.

This is similar to bench pressing by myself versus with a spotter, as opposed to getting a spotter to do the bench pressing for me while I sit back and watch. That is a sort of litmus test for my LLM use: does it increase brain activity or decrease it? If my brain activity goes up, or at least stays the same, then I will consider it a valid use case for me. If my brain activity goes down, then it is not a valid use case for me.

Adding API access to internet-searching LLMs, which is what Bing Chat (aka Sydney) originally offered, makes it possible to have even more complexity in your LLM use. The example I used was having an internet search agent produce research notes on a topic and then having a reasoning-based agent convert them into a newsletter. Given new tools like OpenAI's Deep Research, I know that all of this might seem a bit like reinventing the wheel, but one of the things I want to do is understand how these things work under the hood. And one way to understand how something works is to build it yourself.

Finally, the bigger picture here is that there is a level above prompt engineering that I'm calling prompt scripting, where you start to think less in terms of prompts to LLMs and more in terms of flow charts, where each node is an LLM with a different function and pre-prompt, passing output into other LLMs as input. This allows you to achieve much more complex goals than a single prompt can. I gave a simple example of a two-agent prompt script that produces a newsletter, and another example of an agent that produces a shell command that then gets directly executed. Think of these as the simplest Legos, from which you can build all kinds of complex things, the way you can build Tetris from NAND gates.

In short, having an LLM in a literate programming environment has done me some good, and I hope it will do some of you some good too. Give this a shot, if only for the exercise of knowing how to do it. It's nice to be independent of the interfaces, and it's a "gateway drug" into building your own apps.

Date: January 24, 2025 - February 16, 2025
