
AI Voice Agents: How to Handle Calls in Your SME 24/7 Without Expanding Staff

AI voice agents can now hold phone conversations that many callers find hard to tell apart from a human. We explain what they can do, which tools are available, how much they cost, and how to implement them in a Spanish SME to handle calls 24 hours a day without hiring extra staff.

The phone rings at your company at 9:15 PM on a Thursday. A customer wants to confirm the pickup time for their order. No one answers. They leave a message in the mailbox that no one will hear until 9 AM the next day. By that time, they have already called the competition. And the competition did pick up the phone.

This scenario repeats thousands of times a day in Spanish SMEs. By some industry estimates, between 20% and 40% of calls received outside business hours go unanswered. In sectors like reservations, clinics, workshops, or professional services, every lost call is a lost sale.

Hiring a call center costs money. Setting up rotating shifts costs even more. And most of the lost calls are not emergencies: they are routine inquiries that a well-trained machine could resolve in 40 seconds.

In 2026, that machine exists. It is called a voice agent, and it is the natural evolution of text chatbots: an AI that handles phone calls with a natural voice, understands context, queries your systems, and resolves the inquiry so smoothly that the customer may not notice they are speaking to a program. Or they know it, because you told them at the start of the call, and they don't care, because the service works.

What exactly is a voice agent

A voice agent is a system that combines three technologies into a single phone conversation:

  1. Speech-to-text (STT): Transcribes what the customer says in real time.
  2. Language Model (LLM): Interprets the intention, queries data, and generates the response.
  3. Text-to-speech (TTS): Converts the response into natural voice with human intonation.
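The three stages can be pictured as one function per conversational turn. The sketch below is a minimal illustration, not any platform's API: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stubs that treat "audio" as UTF-8 text so the example runs anywhere, and the timing shows where the end-to-end latency budget is spent.

```python
import time
from dataclasses import dataclass

# Hypothetical stand-ins for real STT, LLM and TTS services.
# "Audio" is just UTF-8 text here so the sketch runs without any provider.
def transcribe(audio: bytes) -> str:
    """STT stub: a real system would stream audio to a speech-to-text engine."""
    return audio.decode("utf-8")

def generate_reply(text: str, context: dict) -> str:
    """LLM stub: a real system would send the transcript plus context to a model."""
    if "hours" in text.lower():
        return f"We are open {context['opening_hours']}."
    return "Let me transfer you to a colleague."

def synthesize(text: str) -> bytes:
    """TTS stub: a real system would return synthesized speech."""
    return text.encode("utf-8")

@dataclass
class TurnResult:
    reply_text: str
    reply_audio: bytes
    latency_ms: float

def handle_turn(audio: bytes, context: dict) -> TurnResult:
    """One conversational turn: STT -> LLM -> TTS, timed end to end."""
    start = time.perf_counter()
    text = transcribe(audio)
    reply = generate_reply(text, context)
    audio_out = synthesize(reply)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return TurnResult(reply, audio_out, elapsed_ms)

result = handle_turn(b"What are your opening hours?", {"opening_hours": "9:00 to 21:00"})
print(result.reply_text, f"({result.latency_ms:.2f} ms)")
```

In production each stage streams partial results to the next one instead of waiting for it to finish; that overlap is what keeps real systems within a few hundred milliseconds.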

In a well-tuned system, all of this happens with an end-to-end latency of roughly 500 to 800 milliseconds, which feels like a fluid conversation. The latest voice models, such as ElevenLabs v3, OpenAI Realtime, or Google Gemini Live (the last two process speech directly instead of chaining three separate models), produce voices so natural that in blind tests many listeners cannot tell whether they are speaking to a person or an AI.

This is not science fiction. In 2026, Spanish companies are handling thousands of calls daily with voice agents in production, in sectors such as insurance, hospitality, dealerships, private clinics, and home services.

What a voice agent can do today

Manage reservations and appointments

A voice agent can answer a customer who wants to reserve a table, book a medical appointment, or schedule a technical visit. It queries the calendar, offers available slots, confirms customer data, registers the appointment, and sends a confirmation SMS or email. All within a call of less than a minute.
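The "offers available slots" step is essentially an interval-overlap check against the calendar. A minimal sketch, assuming hypothetical 30-minute slots, opening hours of 9:00 to 13:00, and bookings already loaded from your calendar as (start, end) pairs:

```python
from datetime import datetime, timedelta

def free_slots(day_start: datetime, day_end: datetime, slot_minutes: int,
               booked: list[tuple[datetime, datetime]]) -> list[datetime]:
    """Start times of fixed-length slots between day_start and day_end that overlap no booking."""
    step = timedelta(minutes=slot_minutes)
    slots = []
    t = day_start
    while t + step <= day_end:
        # Two intervals overlap if each one starts before the other ends.
        if not any(t < b_end and b_start < t + step for b_start, b_end in booked):
            slots.append(t)
        t += step
    return slots

# Hypothetical calendar data for one day.
day = datetime(2026, 3, 12)
booked = [
    (day.replace(hour=10), day.replace(hour=10, minute=30)),
    (day.replace(hour=11), day.replace(hour=12)),
]
slots = free_slots(day.replace(hour=9), day.replace(hour=13), 30, booked)
print([s.strftime("%H:%M") for s in slots])
```

The agent would then read out two or three of these slots rather than the whole list, which keeps the call short.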

The most typical case: restaurants that receive dozens of reservation calls during peak hours, exactly when staff cannot answer the phone because they are serving tables. The voice agent handles all of them: the restaurant doesn't lose reservations, and the staff doesn't have to interrupt service.

Answer frequently asked questions

Opening hours, locations, prices, return policies, product availability, order status. In many SMEs, 60-70% of incoming calls are repetitive questions whose answers are already on the website or in an internal system. A voice agent with access to your data (via RAG, retrieval-augmented generation, or MCP, the Model Context Protocol) answers them instantly.
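The retrieval half of RAG boils down to "find the stored answer closest to the question". Real systems usually compare embeddings; the sketch below swaps in plain keyword overlap so it runs with the standard library alone. The FAQ entries and the stopword list are hypothetical.

```python
import re

# Hypothetical FAQ data; in practice it would come from your website or internal system.
FAQ = {
    "What are your opening hours?": "Monday to Friday, 9:00 to 20:00.",
    "Where are you located?": "Calle Mayor 10, Madrid.",
    "What is your return policy?": "Returns are accepted within 30 days with the receipt.",
}

# Very common words that carry no meaning for matching.
STOPWORDS = {"what", "are", "your", "is", "the", "do", "you", "where", "how", "a", "to", "i"}

def keywords(text: str) -> set[str]:
    """Lowercase the text, split it into words and drop stopwords."""
    return set(re.findall(r"\w+", text.lower())) - STOPWORDS

def answer(question: str, faq: dict[str, str] = FAQ, min_overlap: int = 1):
    """Return the answer whose FAQ question shares the most keywords, or None if nothing matches."""
    q = keywords(question)
    best_answer, best_score = None, 0
    for faq_question, faq_answer in faq.items():
        score = len(q & keywords(faq_question))
        if score > best_score:
            best_answer, best_score = faq_answer, score
    return best_answer if best_score >= min_overlap else None

print(answer("When are your opening hours on Friday?"))
print(answer("Can I talk about my invoice?"))
```

Returning `None` when nothing matches matters: it is the signal for the agent to say it doesn't know, or to transfer the call, instead of improvising an answer.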

Qualify sales leads

When someone calls to request a quote, a voice agent can ask the initial questions: what exactly they need, what budget they have, when they need it, where they are located. It records the answers in your CRM and only forwards leads that meet the criteria to a human sales representative. Your sales team stops wasting time filtering and focuses on closing deals.
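Once the agent has collected the answers, the "only forwards leads that meet the criteria" step can be a plain rule check. A minimal sketch; the thresholds, provinces, and route names below are hypothetical and would be defined by each business:

```python
from dataclasses import dataclass

@dataclass
class Lead:
    need: str
    budget_eur: float
    weeks_until_needed: int
    province: str

# Hypothetical criteria: each business sets its own.
MIN_BUDGET_EUR = 3000
MAX_WEEKS_UNTIL_NEEDED = 12
SERVICE_AREA = {"Madrid", "Toledo", "Guadalajara"}

def qualification_issues(lead: Lead) -> list[str]:
    """List the reasons a lead does not meet the criteria (empty list means qualified)."""
    issues = []
    if lead.budget_eur < MIN_BUDGET_EUR:
        issues.append("budget below minimum")
    if lead.weeks_until_needed > MAX_WEEKS_UNTIL_NEEDED:
        issues.append("timeline too far out")
    if lead.province not in SERVICE_AREA:
        issues.append("outside service area")
    return issues

def route(lead: Lead) -> str:
    """Send qualified leads to a sales rep; keep the rest in the CRM for later follow-up."""
    return "sales_rep" if not qualification_issues(lead) else "crm_follow_up"

print(route(Lead("kitchen renovation", 8000, 4, "Madrid")))
print(qualification_issues(Lead("bathroom tiles", 1500, 30, "Sevilla")))
```

Keeping the rules outside the LLM makes them auditable: the model extracts the answers, and deterministic code decides who reaches the sales team.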

Outbound confirmations and reminders

It doesn't just handle incoming calls. A voice agent can also call customers to confirm appointments the day before, remind them of pending payments, notify them of an upcoming delivery, or run satisfaction surveys. These repetitive tasks consume hours of staff time, and an AI does them in minutes.
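Outbound reminders need a scheduling rule. A minimal sketch, assuming a hypothetical policy of calling at 10:00 on the last weekday before the appointment, so Monday appointments are confirmed on Friday rather than on Sunday:

```python
from datetime import datetime, time, timedelta

CALL_TIME = time(10, 0)  # hypothetical: reminder calls go out at 10:00

def reminder_call_time(appointment: datetime) -> datetime:
    """10:00 on the last weekday (Mon-Fri) before the appointment date."""
    day = appointment.date() - timedelta(days=1)
    while day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        day -= timedelta(days=1)
    return datetime.combine(day, CALL_TIME)

# Hypothetical appointments.
appointments = {
    "Ana": datetime(2026, 3, 12, 17, 30),  # Thursday
    "Luis": datetime(2026, 3, 16, 9, 0),   # Monday
}
for name, appt in appointments.items():
    print(name, reminder_call_time(appt))
```

A real dialer would also respect public holidays and local calling-hour rules; those are left out here.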

Intelligent transfer to humans

When it detects that an inquiry requires human attention (a complaint, a complex case, an upset customer), it transfers the call to the appropriate person. And it doesn't hand the call over cold: it passes along a summary of the conversation and the relevant data, so the human doesn't have to start from scratch by asking the same questions again.
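What gets passed to the human can be a small structured payload. A minimal sketch with a hypothetical `Handoff` record; a real agent would usually ask the LLM to write the summary, while here it simply joins the caller's last three turns:

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class Handoff:
    caller_number: str
    reason: str
    summary: str
    collected: dict = field(default_factory=dict)

def build_handoff(caller_number: str, transcript: list[tuple[str, str]],
                  collected: dict, reason: str) -> Handoff:
    """Package what the human needs so they don't repeat the agent's questions."""
    caller_turns = [text for speaker, text in transcript if speaker == "caller"]
    # Simplification: last three caller turns instead of an LLM-written summary.
    return Handoff(caller_number, reason, " / ".join(caller_turns[-3:]), collected)

# Hypothetical conversation.
transcript = [
    ("agent", "Hello, how can I help you?"),
    ("caller", "My washing machine repair was supposed to be yesterday."),
    ("agent", "I'm sorry to hear that. Could you give me your order number?"),
    ("caller", "It's R-2231."),
    ("caller", "This is the second time the technician hasn't shown up."),
]
handoff = build_handoff("+34600000000", transcript, {"order_id": "R-2231"}, "complaint")
print(json.dumps(asdict(handoff), indent=2))
```

Serializing to JSON makes it easy to show the payload in the agent's softphone or attach it to the CRM ticket.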

The main platforms in 2026

There are dozens of tools, but four dominate the enterprise market:

Vapi

A developer-oriented platform, very flexible and with good documentation. It allows combining any STT (Deepgram, Whisper), any LLM (Claude, GPT, local models), and any TTS (ElevenLabs, PlayHT, Cartesia). Cost: about $0.05-$0.12 per minute of conversation, depending on the models chosen. Ideal if you need total control over the architecture.

Retell AI

A more managed approach, with very low latencies (below 800 ms) and good support for batch outbound calls. It integrates well with Twilio and has SDKs for building complex flows. Prices starting from $0.07 per minute. Popular in call centers and outbound operations.

ElevenLabs Conversational AI

Builds on ElevenLabs' position as a leader in the TTS market, known for some of the most natural voices available. Its conversational platform is all-in-one and the easiest to set up, which makes it ideal for companies that don't want to get into the technical architecture. Plans start at $99/month with minutes included.

OpenAI Realtime API

The most integrated option if you already use the OpenAI ecosystem. It uses speech-to-speech models that work directly with audio, so there is no need to chain separate STT, LLM, and TTS components. Very low latency. Cost: about $0.06 per minute of audio input and $0.24 per minute of audio output. Powerful, but it can get expensive at volume.
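Because input and output minutes are priced differently, the effective rate depends on how much of each call the agent spends talking. A rough sketch using the per-minute figures above and the 3,600 minutes/month scenario from the cost section below; the 50/50 talk split is an assumption, and the estimate ignores token-level details (for example, how much conversation history is re-processed on each turn), so treat it as a lower bound rather than a quote.

```python
def blended_cost_per_minute(input_rate: float, output_rate: float, agent_share: float) -> float:
    """Approximate $/min when the agent speaks `agent_share` of the call and the caller the rest."""
    if not 0 <= agent_share <= 1:
        raise ValueError("agent_share must be between 0 and 1")
    return input_rate * (1 - agent_share) + output_rate * agent_share

per_min = blended_cost_per_minute(0.06, 0.24, agent_share=0.5)  # assumed 50/50 split
monthly = per_min * 3600  # 60 calls/day * 2 min * 30 days
print(f"${per_min:.3f}/min, about ${monthly:.0f}/month")
```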

Open Source Options

For those who want total control and maximum privacy, open stacks exist with Pipecat (orchestration), Whisper (STT), Llama 4 or Qwen 3 (LLM; see our open-source vs. proprietary comparison), and Kokoro or XTTS (TTS). The quality is still slightly below commercial voices, but the data never leaves your infrastructure. Cost: hosting only.

How much does it really cost to implement

The public numbers from the platforms are less scary when put into business context.

Suppose an SME receives an average of 60 calls a day, with an average duration of 2 minutes. That's 120 minutes daily, about 3,600 minutes a month.

  • Platform Cost (API): between €180 and €350/month depending on the tool
  • Telephony Cost (Twilio or similar): about €30-€60/month
  • LLM and TTS Model Cost: included in the above or adds about €100-€200/month extra depending on usage
  • Total Monthly Cost: between €300 and €600/month to handle all calls 24/7
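Spreading those totals over the 3,600 monthly minutes gives a per-minute and per-call view. A minimal sketch that reuses only the figures above (60 calls/day, 2 minutes each, 30 days) plus the €1,500/month payroll figure mentioned next:

```python
def cost_per_minute(monthly_eur: float, calls_per_day: int = 60,
                    avg_call_minutes: float = 2, days_per_month: int = 30) -> float:
    """Spread a monthly cost over the minutes handled in the example scenario."""
    monthly_minutes = calls_per_day * avg_call_minutes * days_per_month  # 60 * 2 * 30 = 3,600
    return monthly_eur / monthly_minutes

for label, monthly in [("voice agent, low", 300), ("voice agent, high", 600), ("one employee", 1500)]:
    per_min = cost_per_minute(monthly)
    print(f"{label}: €{per_min:.3f}/min, €{per_min * 2:.2f} per 2-minute call")
```

This works out to roughly €0.08-€0.17 per minute for the agent versus about €0.42 for one employee. The employee figure flatters the human option, since one person would not actually cover nights and weekends.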

Compare this with the cost of one person handling calls: at least €1,500/month in payroll costs, and that still doesn't cover nights, weekends, or peaks.

The savings are not only in the direct cost: they are in the calls that were previously lost and are now answered. If every answered call represents an average of €20-€40 in business and you used to lose 10 a day, that is €200-€400 a day in recovered business against a monthly cost of €300-€600, so the numbers balance out in weeks, if not sooner.
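The break-even arithmetic can be made explicit. A minimal sketch using the ranges from the paragraph above; it covers running costs only, ignores setup and implementation fees, and assumes every recovered call is worth the stated average:

```python
def days_to_cover(monthly_cost_eur: float, recovered_calls_per_day: int,
                  value_per_call_eur: float) -> float:
    """Days of recovered business needed to pay one month of running costs."""
    daily_recovered = recovered_calls_per_day * value_per_call_eur
    return monthly_cost_eur / daily_recovered

worst = days_to_cover(600, 10, 20)  # highest cost, lowest value per call
best = days_to_cover(300, 10, 40)   # lowest cost, highest value per call
print(f"between {best:.2f} and {worst:.1f} days of recovered calls")
```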

How to implement a voice agent step by step

1. Identify the specific use case

Don't try to automate everything at once. Choose a typical call that repeats often and whose answer is clear: "confirm appointment," "check order status," "inquire about hours and address." Start there.

2. Design the conversation script

A voice agent doesn't improvise well if it lacks context. Write out the conversation flow: what it must ask, what data it must collect, how it must respond, and when it must transfer to a human. This is what makes the difference between a working agent and one that frustrates the customer.
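The deterministic part of a script (what to collect, in which order, and when to hand over) can be written down explicitly, leaving the LLM to detect the intent and extract the values. A minimal sketch of a hypothetical booking script; the slot names, prompts, and intent labels are illustrative:

```python
# Hypothetical booking script: the data to collect, in order, and how to ask for it.
REQUIRED_SLOTS = ["name", "date", "time", "phone"]
PROMPTS = {
    "name": "Could I have your name, please?",
    "date": "What day would suit you?",
    "time": "And at what time?",
    "phone": "What phone number should we send the confirmation to?",
}
# Intents that always go to a human, whatever else has been collected.
TRANSFER_INTENTS = {"complaint", "speak_to_human"}

def next_action(intent: str, collected: dict[str, str]) -> tuple[str, str]:
    """Decide the agent's next move from the detected intent and the data collected so far."""
    if intent in TRANSFER_INTENTS:
        return "transfer", "I'll put you through to a colleague right away."
    for slot in REQUIRED_SLOTS:
        if not collected.get(slot):
            return "ask", PROMPTS[slot]
    return "confirm", (f"So that's {collected['date']} at {collected['time']} "
                       f"for {collected['name']}. Shall I confirm it?")

print(next_action("book", {"name": "Ana"}))
print(next_action("book", {"name": "Ana", "date": "Friday", "time": "18:00", "phone": "600000000"}))
```

Because the slots are checked in order regardless of how the caller phrases things, the customer can give the date before their name and the agent simply asks for whatever is still missing.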

3. Connect with your systems

A voice without data is useless. The agent must be able to query your calendar, CRM, order database, or reservation system. This is where MCP (Model Context Protocol) servers come in: they give the AI controlled access to specific data and actions without exposing sensitive information.
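The principle behind "controlled access" is that the agent can only call tools you explicitly expose, and those tools return only the fields it needs. The sketch below is plain Python illustrating that idea; it is not the MCP SDK, and the order data, field names, and tool name are hypothetical.

```python
# Hypothetical order database, including fields the agent must never see.
ORDERS = {
    "A-1001": {
        "status": "shipped",
        "eta": "2026-03-14",
        "customer_iban": "ES00 0000 0000 0000 0000 0000",
        "customer_dni": "00000000T",
    },
}
PUBLIC_ORDER_FIELDS = ("status", "eta")  # the only fields the agent may read

def get_order_status(order_id: str) -> dict:
    """Read-only tool: returns just the public fields of one order."""
    order = ORDERS.get(order_id)
    if order is None:
        return {"error": "order not found"}
    return {key: order[key] for key in PUBLIC_ORDER_FIELDS}

# Allowlist of tools exposed to the agent; anything else is refused.
TOOLS = {"get_order_status": get_order_status}

def call_tool(name: str, **kwargs) -> dict:
    if name not in TOOLS:
        raise PermissionError(f"tool {name!r} is not exposed to the agent")
    return TOOLS[name](**kwargs)

print(call_tool("get_order_status", order_id="A-1001"))
```

An MCP server plays the same role over a standard protocol: it publishes a limited set of tools, so the model never talks to the database directly.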

4. Choose the voices

Voice is part of the brand. ElevenLabs and similar services allow cloning your own voices or choosing from libraries of neutral Spanish, Peninsular, or Latin American voices. Test several before deciding: a voice that sounds good in a test might be annoying after thousands of calls.

5. Integrate with your PBX

For the voice agent to handle your real calls, it must be connected to your phone line. The usual method is to use Twilio, Telnyx, or Vonage as a virtual number provider, with redirection from your current number. It can also integrate with existing PBXs (3CX, Aircall, Dialpad).
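With Twilio, one common pattern is for the number's voice webhook to answer with TwiML that uses `<Connect>` and `<Stream>` to open a bidirectional audio stream to your agent's WebSocket server. A minimal sketch that builds that response with the standard library; the WebSocket URL is hypothetical, and managed platforms generally set this up for you.

```python
import xml.etree.ElementTree as ET

def twiml_connect_stream(ws_url: str) -> str:
    """TwiML that connects the incoming call's audio to a WebSocket media stream."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    ET.SubElement(connect, "Stream", url=ws_url)
    return ET.tostring(response, encoding="unicode")

# Hypothetical endpoint where the voice agent receives and sends audio.
xml = twiml_connect_stream("wss://voice-agent.example.com/media")
print(xml)
```

Your existing number stays the same: you forward it (or port it) to the virtual number whose webhook returns this response.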

6. Deploy in a pilot and measure

Start by handling only a specific time slot (for example, outside business hours) or only one type of inquiry. Measure three things: resolution rate (how many calls it resolves on its own), satisfaction (a brief survey at the end), and escalation (how many it transfers to a human). Adjust the script weekly.
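The three pilot metrics can be computed directly from per-call records. A minimal sketch, assuming a hypothetical `CallRecord` with a 1-5 survey score that is `None` when the caller skipped the survey:

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional

@dataclass
class CallRecord:
    resolved_by_agent: bool
    escalated: bool
    csat: Optional[int]  # 1-5 end-of-call survey; None if skipped

def pilot_metrics(calls: list[CallRecord]) -> dict:
    """Resolution rate, escalation rate and average satisfaction for a batch of calls."""
    if not calls:
        raise ValueError("no calls to measure")
    scores = [c.csat for c in calls if c.csat is not None]
    return {
        "resolution_rate": sum(c.resolved_by_agent for c in calls) / len(calls),
        "escalation_rate": sum(c.escalated for c in calls) / len(calls),
        "avg_csat": mean(scores) if scores else None,
    }

# Hypothetical week of pilot data (the last caller hung up without resolution or transfer).
week = [
    CallRecord(True, False, 5),
    CallRecord(True, False, None),
    CallRecord(False, True, 3),
    CallRecord(False, False, 4),
]
print(pilot_metrics(week))
```

Calls that are neither resolved nor escalated (hang-ups) are worth reviewing by hand each week; they often point to gaps in the script.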

7. Scale progressively

Once the pilot is validated, expand: more hours covered, more types of inquiries, outbound calls. The pattern is the same as we saw in email automation: start with human supervision and automate more only when the data gives you confidence.

The most common mistakes

Robotic or unnatural voice

Choosing cheap TTS just to save money and ending up with a voice that sounds like a 2010 GPS. Excellent options exist today at reasonable prices, and if the customer feels they are talking to a robot, they hang up. Invest in the voice.

Not having a Plan B for complex cases

A voice agent must know when it doesn't know. If it fails to detect that an inquiry exceeds its capabilities and starts rambling, the customer gets frustrated. Intelligent transfer to a human with context is critical.
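"Knowing when it doesn't know" can be backed by a simple, explicit rule that sits alongside the LLM. A minimal sketch of a hypothetical policy: transfer as soon as the caller asks for a person, or after two consecutive turns the agent failed to understand. The phrases and the threshold are illustrative.

```python
HUMAN_REQUEST_PHRASES = ("speak to a person", "talk to a human", "real person", "an agent please")
MAX_FAILED_TURNS = 2  # hypothetical threshold

class EscalationPolicy:
    """Tracks one call and decides when to hand it over to a human."""

    def __init__(self, max_failed_turns: int = MAX_FAILED_TURNS):
        self.max_failed_turns = max_failed_turns
        self.failed_turns = 0

    def should_transfer(self, caller_text: str, understood: bool) -> bool:
        text = caller_text.lower()
        if any(phrase in text for phrase in HUMAN_REQUEST_PHRASES):
            return True  # an explicit request for a person always wins
        # Count consecutive misunderstandings; any understood turn resets the counter.
        self.failed_turns = 0 if understood else self.failed_turns + 1
        return self.failed_turns >= self.max_failed_turns

policy = EscalationPolicy()
print(policy.should_transfer("I need the thing for the, um...", understood=False))
print(policy.should_transfer("You know, the other thing", understood=False))
```

Here `understood` would come from the agent itself (for example, whether it could map the turn to a known intent); how that is judged is left to the implementation.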

Scripts that are too rigid

Modern AI is good at interpreting natural language. Don't trap it in decision trees like "press 1 for sales, press 2 for support." Let the agent understand phrases like "I want to change my appointment date from Friday" and respond accordingly.

Ignoring regulations

In Spain and the EU, recording phone conversations requires informing the other party at the start of the call. If AI answers the call, say so clearly: "this call is handled by a virtual assistant." The EU AI Act, whose transparency obligations apply in 2026, requires that people be informed when they are interacting with an AI system. It is not optional.

Thinking it replaces human staff

Voice agents do not replace people: they free up time. Your team stops taking routine calls to dedicate themselves to what a machine cannot do well: negotiations, complex complaints, consultative sales. If you lay off staff thinking that AI covers everything, the customer notices, and the business suffers.

Real-life cases already working in Spain

Some examples that illustrate the potential:

  • Dental clinics that handle 80% of their reservation calls with voice agents, freeing up receptionists for in-person attention.
  • Dealerships that use outbound agents to call leads and qualify them before the human sales representative calls back.
  • High-volume restaurants that manage all reservations via AI voice and double the number of reserved tables without expanding staff.
  • Insurance companies that use voice agents for initial claims notification: collecting basic data, photos via the app, and forwarding to the adjuster.

The common pattern: a limited use case, integration with real systems, and a pilot phase before scaling.

Privacy, GDPR, and biometric voice

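Voice recordings are personal data under the GDPR, and when they are processed to identify a specific person (a voiceprint, for example), they become biometric data, a special category with stricter rules. Processing, recording, and analyzing voice therefore requires a legal basis, transparency, and security measures. Some keys to doing it right: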

  • Inform at the start of the call: "this conversation will be handled by a virtual assistant and may be recorded to improve the service."
  • Data minimization: store only what you need, and delete recordings as soon as they are no longer necessary.
  • Providers with servers in the EU: ElevenLabs, OpenAI, and Google have European regions; ensure you configure them.
  • Option to speak to a human: the customer must always be able to ask to speak to a person.
  • Activity log: keep logs of what the agent did in each call, in case an audit is needed.
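Retention and logging are easy to automate. A minimal sketch, assuming a hypothetical 30-day retention period (set yours according to your own legal basis) and recordings stored with a creation timestamp; the audit log records what the agent did, not the audio itself.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical retention period: set it according to your own legal basis and policy.
RETENTION = timedelta(days=30)

def purge_expired(recordings: list[dict], now: datetime) -> tuple[list[dict], list[str]]:
    """Split recordings into those still within the retention period and the IDs to delete."""
    kept = [r for r in recordings if now - r["created_at"] <= RETENTION]
    expired_ids = [r["id"] for r in recordings if now - r["created_at"] > RETENTION]
    return kept, expired_ids

def audit_entry(call_id: str, action: str, detail: str, now: datetime) -> str:
    """One JSON line per agent action, suitable for an append-only audit log."""
    return json.dumps({"ts": now.isoformat(), "call_id": call_id, "action": action, "detail": detail})

now = datetime(2026, 3, 31, tzinfo=timezone.utc)
recordings = [
    {"id": "rec-new", "created_at": datetime(2026, 3, 30, tzinfo=timezone.utc)},
    {"id": "rec-old", "created_at": datetime(2026, 2, 1, tzinfo=timezone.utc)},
]
kept, to_delete = purge_expired(recordings, now)
print(to_delete)
print(audit_entry("call-42", "tool_call", "get_order_status(A-1001)", now))
```

Running the purge as a daily scheduled job means the retention rule is enforced by default rather than depending on someone remembering to clean up.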

If you work with especially sensitive data (health, financial), consider a self-hosted architecture with open-source models. It costs more but gives you total control.

When it makes sense and when it doesn't

It makes sense if:

  • You receive a significant volume of routine calls (>20/day)
  • You lose calls outside of business hours or during peaks
  • Your team spends more than 30% of its time on the phone
  • The most common inquiries have a clear and structured answer
  • You can integrate the agent with your data systems

It doesn't make sense if:

  • Your calls are always unique, consultative, and high-value
  • Your customer expects human treatment from the first second (private banking, luxury)
  • You do not have digital systems to query data (calendar, CRM, etc.)
  • The volume is so low that the implementation doesn't pay off

As always with AI, the key is choosing the right use case. A brilliant voice agent for the wrong case is wasted money.

How we can help

At Navel Digital, we design and implement voice agents adapted to the real call flow of each company. We analyze what calls you receive, which are candidates for automation and which are not, choose the appropriate platform and voices, connect with your systems, and accompany you through the pilot and production launch.

If you are losing calls outside business hours, or your team is glued to the phone when they could be doing higher-value work, let's talk. The first consultation comes with no commitment.
