Engineering perfect time calls and frequency IDs: the small details broadcast listeners notice when they go wrong
By the KAVANA engineering team — June 2026
There are two categories of on-air error that listeners notice immediately and remember long after the broadcast has ended. The first is a time call that is wrong — by a few seconds, by a minute, or simply by being absent when the audience expected it. The second is a frequency ID that sounds like it was produced for a different station, or that plays four times in a row across a single hour, or that mangles the station's actual frequency in a way that no local would pronounce. Both of these failures seem like minor production problems from the inside of a broadcast facility. From outside, in a car or a kitchen or a workshop, they are the clearest possible signal that something is not right.
We have been building broadcast automation systems for twenty years, across county-level and regional stations throughout China. Time calls and frequency IDs are among the most technically constrained elements in broadcast automation — not because the problems are fundamentally hard, but because the constraints compound in ways that are easy to underestimate until something breaks on air. This post is an honest account of what those constraints are, how they interact, and how we think about solving them.
Why you cannot simply use NTP time for a time call
The most common question we receive when a station is setting up automated time calls for the first time is: can we just read the system clock and use that? The clock is synchronized to NTP, the NTP source is synchronized to the national standard — surely the number we read from the clock is the correct one to announce.
The number from the clock is correct. The problem is not accuracy; it is timing. A time call is not a fact being stated — it is a coordination event. When the announcer says "now is eight o'clock," the word "now" has to coincide, at the listener's ear, with the actual moment the hour begins. This means the audio has to arrive at the listener at a predictable and controllable time, and everything between the system clock and the listener's ear introduces delay that the system has to account for.
Those delays are not small. The audio playback buffer in a typical broadcast automation system is 100 to 300 milliseconds. The broadcast processor at the transmitter adds another 50 to 200 milliseconds depending on configuration. An FM transmission chain adds negligible delay (the speed-of-light propagation time for a local FM station is under a millisecond), but the listener's radio receiver adds its own buffer, typically 50 to 100 milliseconds depending on how much digital processing the receiver performs. An IP streaming path, if the station also streams online, adds 2 to 15 seconds depending on the CDN and the listener's buffer settings.
Add all of this together for the analog FM path and the cumulative delay between the moment the software decides to start playing a time call and the moment the listener hears it is roughly 200 to 600 milliseconds, and that is a best case with well-characterized equipment and no network variability. The IP streaming path introduces delays that are an order of magnitude larger and not controllable by the broadcast facility.
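The arithmetic behind that envelope can be sketched in a few lines. This is an illustration only, not KAVANA-MGR code: the stage names are made-up labels, and the ranges are simply the per-stage figures quoted above, with FM propagation entered as 0 to 1 millisecond.

```python
# Per-stage delay ranges in milliseconds, copied from the figures in the text.
# The stage names are illustrative labels, not identifiers from any product.
FM_CHAIN_MS = {
    "playback_buffer": (100, 300),
    "broadcast_processor": (50, 200),
    "fm_propagation": (0, 1),
    "receiver": (50, 100),
}


def delay_budget(chain):
    """Return (best_case_ms, worst_case_ms) by summing each stage's range."""
    best = sum(low for low, _ in chain.values())
    worst = sum(high for _, high in chain.values())
    return best, worst


best, worst = delay_budget(FM_CHAIN_MS)
print(f"FM path delay envelope: {best} to {worst} ms")
```

A table like this is only a starting estimate. The width of the envelope is the reason the actual chain has to be measured rather than assumed.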
The approach we take in KAVANA-MGR is to characterize the delay for each station's specific equipment chain, build that characterization into the scheduling logic, and schedule the time call audio to begin playing an appropriate interval before the nominal clock time. This is not complicated in principle — it is just careful measurement and consistent application. The part that is frequently skipped, with predictable consequences, is the measurement step. Assuming a 300-millisecond delay for a chain that actually has 500 milliseconds of delay means the time call consistently arrives a fifth of a second late. That is audible.
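As a minimal sketch of the compensation step, assuming the delay has already been measured: the function names below are hypothetical and do not come from KAVANA-MGR. The first computes when playback has to begin, and the second shows what happens when the assumed delay differs from the real one.

```python
from datetime import datetime, timedelta, timezone


def playout_start(nominal: datetime, measured_delay_ms: float) -> datetime:
    """Return when playback must begin so the audio lands at `nominal`."""
    return nominal - timedelta(milliseconds=measured_delay_ms)


def residual_error_ms(assumed_delay_ms: float, actual_delay_ms: float) -> float:
    """Positive means the time call reaches the listener late."""
    return actual_delay_ms - assumed_delay_ms


top_of_hour = datetime(2026, 6, 1, 8, 0, 0, tzinfo=timezone.utc)

# With a measured 500 ms chain, playback starts half a second early.
print(playout_start(top_of_hour, 500).isoformat())

# Assuming 300 ms for a chain that really has 500 ms leaves a 200 ms lag.
print(residual_error_ms(300, 500))
```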
Network time variability and the audio buffer delay problem
Even a correctly characterized delay does not solve the problem completely, because the delays are not constant. Audio buffer delay is stable under normal conditions but changes when the playback system is under CPU load. Broadcast processors introduce variable delay when their lookahead limiting is engaged — the amount of lookahead limiting applied depends on the audio content, and a loud program element immediately before a time call changes the processor's state in ways that affect timing.
Network jitter is the largest source of variability for IP-synchronized clocks. GPS-synchronized clocks used in broadcast facilities achieve sub-millisecond accuracy. NTP over a local network achieves 1 to 10 milliseconds. NTP over the public internet achieves 10 to 100 milliseconds on a good day, with occasional large outliers when network routes change. For a time call that is supposed to coordinate with a national time signal, 100 milliseconds of clock error produces an audible and listener-noticeable mistiming.
The solution is not to use better internet NTP — the public internet is not a controllable timing reference for a broadcast application. The solution is to use a local GPS-disciplined clock or a PTP (Precision Time Protocol) reference where precision is required, and to design the time call scheduling logic to be tolerant of the residual uncertainty in the reference. A time call that announces "the time is now eight o'clock" while accounting for measured delay and residual uncertainty of ±20 milliseconds is effectively perfect by any broadcast standard. A time call that uses a local system clock synchronized to an NTP server three hops away, with no delay characterization, may be off by several hundred milliseconds in either direction depending on conditions.
For stations that cannot justify GPS-disciplined clocks, the practical approach is to synchronize to a reliable LAN-local time server, measure the playout delay carefully, and accept that the time call accuracy will be in the range of ±100 milliseconds. In ordinary listening, with no reference signal to compare against, an error of that size generally goes unnoticed. What is noticeable is timing that is consistently 300 milliseconds late, or timing that varies from hour to hour.
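One way to turn "measure carefully" into something checkable is to summarize repeated delay measurements and look at both the center and the spread. The sketch below is an assumption about how that could be done, not a description of our tooling. It uses the median as the compensation value and flags a chain whose samples stray more than a chosen tolerance from it.

```python
import statistics


def characterize_delay(samples_ms):
    """Summarize repeated delay measurements for one playout chain."""
    return {
        "compensation_ms": statistics.median(samples_ms),
        "spread_ms": max(samples_ms) - min(samples_ms),
    }


def within_tolerance(samples_ms, tolerance_ms=100):
    """True if every sample stays within tolerance of the median."""
    median = statistics.median(samples_ms)
    return all(abs(s - median) <= tolerance_ms for s in samples_ms)


# Hypothetical measurements for illustration, in milliseconds.
stable_chain = [480, 495, 500, 510, 520]
erratic_chain = [300, 500, 700]

print(characterize_delay(stable_chain))
print(within_tolerance(stable_chain), within_tolerance(erratic_chain))
```

A stable chain with a large delay is easy to compensate. An erratic chain is the one that needs engineering attention, whatever its average.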
Cross-timezone operations and the time call announcement problem
Many broadcast groups operate multiple stations across different time zones, managed from a single automation system. This introduces a problem that is trivial to state and easy to implement incorrectly.
The time call for a station broadcasting to listeners in a specific time zone has to announce the local time for that time zone, not the time at the location of the automation server. This seems obvious. Where it goes wrong is in the details: does the automation system's scheduling database store schedule times in UTC, in server local time, or in station local time? If the answer changes depending on which table or which field you are looking at — a situation that occurs more often than it should in legacy broadcast software — then the time call logic that was correct in one configuration produces wrong announcements in another.
We store all schedule times in UTC and convert to station local time for display and for time call generation. Daylight saving time transitions are handled by the station's configured time zone, not by manually adjusting schedule offsets. This sounds like basic software design, and it is — but the number of broadcast automation systems we have encountered that store times in server local time, or that require manual schedule adjustment for each time zone offset change, is enough to make this worth stating explicitly.
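The pattern looks roughly like this in Python, using the standard-library zoneinfo module. The function name and the example zones are illustrative, and the sketch assumes the host has IANA time zone data available (on some platforms that means installing the tzdata package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_time_for_station(utc_instant: datetime, station_tz: str) -> datetime:
    """Convert a stored UTC schedule time to the station's local time."""
    if utc_instant.tzinfo is None:
        raise ValueError("schedule times must be timezone-aware UTC")
    return utc_instant.astimezone(ZoneInfo(station_tz))


slot = datetime(2026, 6, 1, 0, 0, tzinfo=timezone.utc)

# China Standard Time is UTC+8 all year.
print(local_time_for_station(slot, "Asia/Shanghai").strftime("%H:%M"))

# A zone with daylight saving time: the same UTC hour maps to a different
# local hour in summer and winter, with no manual schedule offset.
winter_slot = datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
print(local_time_for_station(slot, "Europe/Berlin").hour)
print(local_time_for_station(winter_slot, "Europe/Berlin").hour)
```

Rejecting naive datetimes at the boundary is a deliberate choice here: a time with no zone attached is exactly the ambiguity that causes wrong announcements.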
The time call announcement itself — the audio content — also has to match the local time zone. A station in China does not have multiple time zones to worry about; China Standard Time covers the entire country. But for broadcast groups that operate internationally, or for stations that produce multilingual content for audiences across regions, this is a real production constraint.
Frequency ID: how many times per hour, and why it matters
The frequency ID — the short announcement of the station's name and frequency, typically 5 to 15 seconds long — serves a function that is easy to underestimate from inside the production facility. Listeners in China frequently move between reception areas and use their car radios' scan functions. A listener who receives a new station for the first time needs to quickly understand what they are listening to. The frequency ID provides that information.
The standard practice in Chinese broadcasting is to broadcast the frequency ID at the top of each hour, at the bottom of the hour, and at additional points determined by the station's format and regulatory requirements. For county-level AM and FM stations, the regulatory minimum is typically once per hour; the practical standard for competitive FM stations is four to six times per hour.
The engineering problem is not playing the frequency ID — that is straightforward scheduling. The engineering problem is ensuring that the frequency ID does not create listener fatigue through overrepetition, does not collide with other fixed elements (time calls, news headlines, traffic reports), and does not sound inconsistent across its multiple daily playbacks.
Overrepetition is a real risk in automated scheduling. A scheduler that places frequency IDs at fixed intervals (every ten minutes, say) without accounting for the natural structure of the broadcast hour will occasionally place a frequency ID immediately before or immediately after another fixed element, creating a cluster that sounds unplanned. More subtly, a station that plays the same single frequency ID recording every time will produce an effect where listeners unconsciously notice the repetition (the same breath, the same intonation, the same room acoustic) and register it as something that is not organic. This does not require conscious attention; listeners pick up on exactly repeated audio patterns more readily than on varied ones.
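A simple way to avoid the clustering problem is to treat each fixed element as having a guard interval around it and to push any frequency ID that lands inside that interval to the first clear point after it. The sketch below is one possible approach under assumed values: the 10-second ID length, the 30-second guard, and the example schedule are all placeholders.

```python
def place_ids(candidate_starts_s, fixed_elements, id_length_s=10, guard_s=30):
    """Shift ID start times so they keep a guard gap from fixed elements.

    candidate_starts_s: proposed ID start offsets within the hour, in seconds.
    fixed_elements: (start_s, end_s) offsets of time calls, news, traffic, etc.
    """
    placed = []
    for start in candidate_starts_s:
        # Walk fixed elements in time order so a shifted ID is re-checked
        # against every later element.
        for f_start, f_end in sorted(fixed_elements):
            too_close = (
                start < f_end + guard_s
                and start + id_length_s + guard_s > f_start
            )
            if too_close:
                start = f_end + guard_s
        placed.append(start)
    return placed


every_ten_minutes = [0, 600, 1200, 1800, 2400, 3000]
fixed = [(1800, 1920), (2390, 2450)]  # hypothetical news block and traffic report
print(place_ids(every_ten_minutes, fixed))
```

A production scheduler would also need to handle IDs pushed past the end of the hour and decide whether to move an ID earlier instead of later. This sketch ignores both cases.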
Avoiding repetition fatigue: multiple versions and rotation logic
The practical solution to frequency ID repetition fatigue is to produce multiple versions and rotate through them. This requires planning at the production stage — the versions should be meaningfully different, not just processed differently from the same underlying recording — and it requires rotation logic in the automation system that avoids playing the same version in consecutive slots.
What does "meaningfully different" mean for a frequency ID? The content is the same — the station name and frequency — so the difference has to be in the presentation. Different announcer voice, different musical bed, different production style (spoken only versus spoken over music versus jingle), different emphasis within the same text. A set of four to six versions provides enough variation that repetition is not perceptible across a typical three-hour listening session.
The rotation logic also matters. Simple round-robin rotation is better than random selection, because random selection can and occasionally will play the same version several times in a row. Round-robin with a random starting point at the beginning of each broadcast day avoids the predictable pattern that pure round-robin creates over multiple days.
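A minimal sketch of that rotation, with hypothetical version labels: pick a random starting index once per broadcast day, then step through the versions in order.

```python
import random


def daily_rotation(versions, slots_per_day, rng=None):
    """Round-robin through versions, starting at a random point for the day."""
    if len(versions) < 2:
        raise ValueError("need at least two versions to avoid repeats")
    rng = rng or random.Random()
    start = rng.randrange(len(versions))
    return [versions[(start + i) % len(versions)] for i in range(slots_per_day)]


versions = ["voice_only", "over_music", "jingle", "second_announcer"]
order = daily_rotation(versions, slots_per_day=12)
print(order)
```

Within a day, no two consecutive slots can carry the same version. The last slot of one day and the first slot of the next can still coincide, so a real implementation might re-draw the starting point when that happens.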
For stations that broadcast in multiple languages — common in ethnic minority regions of China, where programming mixes Mandarin with local languages — the language of the frequency ID needs to match the language of the surrounding content. Playing a Mandarin frequency ID in the middle of a minority-language program segment is not merely an inconsistency; for some audiences it signals that the station does not actually serve them, which undermines the purpose of the minority-language programming. The KAVANA AI broadcast suite handles multilingual scheduling by associating each content element with a language tag and applying matching logic to the ID rotation.
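The matching idea can be illustrated with a small class that keeps a separate rotation per language tag. This is a simplified sketch, not the suite's actual implementation: the class name, the tags ("zh" for Mandarin, "bo" for Tibetan), and the file labels are all assumptions made for the example.

```python
from itertools import cycle


class LanguageAwareRotation:
    """Keep an independent round-robin of ID versions for each language tag."""

    def __init__(self, ids_by_language):
        self._cycles = {
            lang: cycle(ids) for lang, ids in ids_by_language.items() if ids
        }

    def next_id(self, surrounding_language):
        """Return the next ID whose language matches the surrounding content."""
        if surrounding_language not in self._cycles:
            # Failing loudly is safer than silently falling back to Mandarin.
            raise KeyError(f"no frequency ID produced for {surrounding_language!r}")
        return next(self._cycles[surrounding_language])


rotation = LanguageAwareRotation({
    "zh": ["zh_voice", "zh_jingle"],
    "bo": ["bo_voice"],
})
print(rotation.next_id("zh"), rotation.next_id("bo"), rotation.next_id("zh"))
```

Raising an error for a missing language, rather than defaulting to another one, surfaces a production gap before it reaches air.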
Multilingual frequency ID production: the localization problems
Producing frequency IDs in multiple languages introduces a set of technical problems that are distinct from those of producing ordinary program content in those languages.
The first problem is frequency number pronunciation. In Mandarin, a frequency like 106.7 MHz is read as "yī líng liù diǎn qī zhàohè", a specific pronunciation convention that listeners in that market recognize. The same frequency presented to a text-to-speech system without guidance may be pronounced differently depending on how the system parses the number: as "one zero six point seven" in a literal translation mode, or as "106.7" with an incorrect decimal pronunciation, or with an incorrect tone on one of the digits. For minority languages spoken in regions where the local TTS coverage is sparse, the problem is more severe: the pronunciation conventions for frequencies and numbers may not be represented in the training data at all.
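One way to give a text-to-speech system that guidance is to expand the number into an explicit digit-by-digit reading before synthesis, so the engine never has to parse "106.7" itself. The sketch below covers Mandarin digits only and returns pinyin for readability; the function name is hypothetical, and the unit word is left as an optional argument because it should be confirmed editorially for each market.

```python
DIGIT_PINYIN = {
    "0": "líng", "1": "yī", "2": "èr", "3": "sān", "4": "sì",
    "5": "wǔ", "6": "liù", "7": "qī", "8": "bā", "9": "jiǔ",
    ".": "diǎn",
}


def frequency_reading(frequency: str, unit: str = "") -> str:
    """Spell a frequency digit by digit, optionally followed by a unit word."""
    syllables = []
    for ch in frequency:
        if ch not in DIGIT_PINYIN:
            raise ValueError(f"unexpected character in frequency: {ch!r}")
        syllables.append(DIGIT_PINYIN[ch])
    if unit:
        syllables.append(unit)
    return " ".join(syllables)


print(frequency_reading("106.7"))
```

The frequency is passed as a string, not a float, so that a value like "98.0" keeps its trailing zero in the spoken form.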
The second problem is station name localization. A station name that includes a place name, a frequency number, or a brand term may have different appropriate forms in different languages. The Mandarin name of a county does not necessarily have a direct equivalent in the local minority language; the minority-language version may use a traditional place name that differs from the Mandarin, or may transliterate the Mandarin, or may use a different form entirely. This requires editorial judgment at the production stage, not a technical solution.
The third problem is audio format consistency. Frequency IDs in different languages need to have consistent loudness, consistent frequency response, and consistent duration if they are to be integrated into the same scheduling system without manual adjustment for each language. TTS systems optimized for different languages often produce audio at different nominal levels and with different spectral characteristics; normalization to a consistent target is required before the ID recordings can be scheduled interchangeably. The KAVANA AI utilities page documents the specific normalization targets we use for this.
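To make the level-matching step concrete, here is a deliberately simplified sketch that computes the gain needed to bring a recording to a common RMS level. Two caveats apply. The target of -20 dBFS is a placeholder and not one of our documented targets, and broadcast loudness is normally measured with a weighted, gated method such as ITU-R BS.1770, for which plain RMS is only a rough stand-in.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        raise ValueError("cannot measure the level of a silent recording")
    return 10 * math.log10(mean_square)


def gain_to_target_db(samples, target_dbfs=-20.0):
    """Gain in dB that would move the recording to the target RMS level."""
    return target_dbfs - rms_dbfs(samples)


def apply_gain(samples, gain_db):
    """Scale samples by a gain given in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


# Two hypothetical TTS outputs at different nominal levels.
loud_id = [0.5, -0.5] * 100
quiet_id = [0.05, -0.05] * 100

for name, audio in (("loud", loud_id), ("quiet", quiet_id)):
    matched = apply_gain(audio, gain_to_target_db(audio))
    print(name, round(rms_dbfs(audio), 1), "->", round(rms_dbfs(matched), 1))
```

Level is only one of the three consistency requirements. Spectral balance and duration need their own checks.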
Putting it together: what a correctly engineered time call and frequency ID system looks like
A correctly engineered time call system characterizes the delay of the specific playout chain it operates in, accounts for that delay in scheduling, uses a reliable local time reference, handles time zone conversions at the data layer rather than through manual schedule adjustment, and produces audio that arrives at the listener's ear within 100 milliseconds of the intended moment.
A correctly engineered frequency ID system produces four to six versions of each ID, rotates through them with a logic that avoids adjacent repetition, matches the language of the ID to the language of the surrounding content, normalizes all versions to consistent loudness and format, and integrates ID placement with the overall scheduling logic so that IDs do not cluster with other fixed elements.
Neither of these systems is technically exotic. The engineering is not complicated. What makes them work correctly is attention to the details that are easy to defer: the delay measurement, the rotation logic, the multilingual matching, the normalization targets. These are not features that listeners notice when they are present. They are details that listeners notice, immediately and persistently, when they are absent.
The KAVANA broadcast management system handles time call scheduling, delay compensation, frequency ID rotation, and multilingual content matching as integrated parts of the scheduling engine rather than as afterthoughts layered on top of a basic playout system. The distinction matters in practice because the failure modes of time calls and frequency IDs are not independent of the rest of the schedule — they interact with program element timing, with loudness processing state, and with the multilingual content structure in ways that require the scheduling system to treat them as first-class concerns.