Comments on DRAFT AES67-xxxx

last updated 2013-09-10

Comments to date on DRAFT AES67-xxxx, AES standard for audio applications of networks - High-performance streaming audio-over-IP interoperability,
published 2013-07-29 for comment.


The comment period has closed.

Comment received from Mike Law, 2013-07-31

Can I thank Kevin and all those associated with this project for their efforts here. RFC 3191 / 3551 mention emphasis and DAT compressed bit modes. It's worthwhile mentioning that these are not to be supported, otherwise the receivers are going to have to add support for no benefit.

There is clearly a lot of effort that is needed to derive clock timing information, and to work out the latency present. Am I right in thinking that one single network receiving device could derive all timing from the network and not require a DARS? In a big broadcast installation, each piece of equipment is likely to have DARS, and might also have multiple network sources. The latency could be assured to be within a small time, and possibly fixed at a defined setting of, say, 1 ms? Given that delay build-up in broadcast is a real issue, has this been thought of?

Finally, MADI has known delay (small) and point-to-point Ethernet-type links are already in use by Calrec, Glensound and others. These point-to-point links have very small latency (even down to one sample time), but each company does its own thing. AES50 kind of deals with these links but is overcomplicated by support for scrambling, forward error correction, one-bit or PCM modes and clock-pair distribution (even bidirectional!).

Is there a movement towards standardising a simple point-to-point link that all could use?

thanks,
Mike Law (BCD Audio)

Reply from Kevin Gross, Vice-chair SC-02-12, 2013-08-01

Mike,

Thanks for your comments. My responses on behalf of SC-02 are inline below.

Please reply by the end of the comment period if this reply is not acceptable to you. You may also ask us to consider your comments again for the next revision of the document. You may also appeal our decision to the Standards Secretariat.

All comments and answers are being accumulated in a subject-named file accessible via the comment Web page.

"RFC 3191 / 3551 mention emphasis and DAT compressed bit modes. It's worthwhile mentioning that these are not to be supported, otherwise the receivers are going to have to add support for no benefit."

A 12-bit "DAT12" payload format is defined in RFC 3190 (section 3). Since the proposed standard does not list this format as an interoperability option in clause 7.1, there is unlikely to be confusion as to whether implementation of this format is required or recommended. It is not.

There is an "emphasis" signaling parameter defined in RFC 3190 (section 5). As a general engineering practice, emphasis is not used on 16- and 24-bit digital audio systems within the scope of this proposed standard. There is no requirement or recommendation to support emphasis in the proposed standard. To clarify, we will add an informative note indicating that receivers can safely assume emphasis is not in use on interoperable audio streams.

"There is clearly a lot of effort that is needed to derive clock timing information, and to work out the latency present. Am I right in thinking that one single network receiving device could derive all timing from the network and not require a DARS?"

IEEE 1588 can be used to deliver a DARS-quality clock on networks with IEEE 1588 or IEEE 802.1AS support. On networks without such support, a separate DARS connection may be useful to improve clock accuracy, though the details of such an implementation are outside the scope of the proposed standard.

"In a big broadcast installation, each piece of equipment is likely to have DARS, and might also have multiple network sources. The latency could be assured to be within a small time, and possibly fixed at a defined setting of, say, 1 ms? Given that delay build-up in broadcast is a real issue, has this been thought of?"

Latency management was discussed by the task group, but no existing standards support this capability, so it was deemed out of scope. We did include recommendations in anticipation of supporting latency management in the future. Specifically, clause 7.4 recommends that receivers use a constant link offset and report their offset.
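One way to picture a constant link offset: the receiver plays each packet a fixed number of media-clock samples after the RTP timestamp it carries, so every stream it receives sees the same latency. The sketch below assumes a 48 kHz media clock; the helper names `playout_time` and `offset_latency_ms` are hypothetical, and the modulo-2^32 arithmetic reflects the 32-bit RTP timestamp field. It is an illustration, not the draft's normative procedure.

```python
RTP_TS_MODULUS = 2 ** 32  # RTP timestamps are 32-bit and wrap around


def playout_time(rtp_timestamp: int, link_offset_samples: int) -> int:
    """Media-clock time (in samples) at which a packet is played out.

    Hypothetical helper: adds a constant link offset to the packet's
    RTP timestamp, wrapping modulo 2**32 like the timestamp itself.
    """
    return (rtp_timestamp + link_offset_samples) % RTP_TS_MODULUS


def offset_latency_ms(link_offset_samples: int, sample_rate_hz: int) -> float:
    """Latency contributed by the link offset, in milliseconds."""
    return 1000.0 * link_offset_samples / sample_rate_hz


# A fixed 1 ms offset at 48 kHz corresponds to 48 samples.
offset = 48
print(offset_latency_ms(offset, 48_000))          # 1.0
print(playout_time(1_000, offset))                # 1048
print(playout_time(RTP_TS_MODULUS - 10, offset))  # 38 (wrapped)
```

Because the offset is constant and reported, a system controller could in principle compare offsets across receivers, which is the kind of future latency management the reply anticipates.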

"Finally, MADI has known delay (small) and point-to-point Ethernet-type links are already in use by Calrec, Glensound and others. These point-to-point links have very small latency (even down to one sample time), but each company does its own thing. AES50 kind of deals with these links but is overcomplicated by support for scrambling, forward error correction, one-bit or PCM modes and clock-pair distribution (even bidirectional!). Is there a movement towards standardising a simple point-to-point link that all could use?"

A point-to-point Ethernet connection is a valid IP network, so AES67 can also support point-to-point applications, but this would arguably be more complex than AES50. The motivation for a network standard is to eliminate the need to move wires to establish signal paths. Moving beyond a point-to-point paradigm for signal routing gives us an opportunity to improve interoperability, an opportunity we are attempting to exploit with this proposed standard.

Comment received from Aidan Williams, 2013-08-12

Hi,

Audinate has the following remarks after reviewing the aes67-xxxx-130729-cfc.pdf document.

Page 13.
"The media clock for a 48 kHz stream will overflow its 32-bit representation approximately every 24,86 hours."

Non-English-speaking countries in Europe use the comma instead of the decimal point, but this document is in English, so it should be 24.86 hours.
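The overflow figure follows from a 32-bit counter that advances once per sample: 2^32 / 48 000 s is roughly 24,86 hours. A quick check with hypothetical helper names; the second function only shows the same value written with a decimal point and with the decimal comma used in the draft.

```python
def overflow_period_hours(sample_rate_hz: int, counter_bits: int = 32) -> float:
    """Hours until a counter advancing once per sample wraps around."""
    return (2 ** counter_bits) / sample_rate_hz / 3600


def with_decimal_comma(value: float, places: int = 2) -> str:
    """Format a number using a comma as the decimal sign."""
    return f"{value:.{places}f}".replace(".", ",")


hours = overflow_period_hours(48_000)
print(f"{hours:.2f}")             # 24.86
print(with_decimal_comma(hours))  # 24,86
```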

Page 17 chapter 7.2.2.
The requirement to support 1 ms at 96 kHz puts an excessive memory buffer burden on the transmitter, especially for devices supporting many audio channels.

For an AVB implementation, only 12 samples per channel per packet must be stored at a 96 kHz sample rate, but for AES X192, 96 samples are mandatory if the sender supports a 96 kHz sample rate. This is a huge difference.
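Samples per channel per packet is the sampling frequency multiplied by the packet time. The sketch below reproduces the two figures in the comment; the 125 µs AVB packet time is inferred here from 12 samples at 96 kHz, and `samples_per_packet` is a hypothetical helper. Exact fractions are used so that packet times such as 1/3 ms do not suffer floating-point rounding.

```python
from fractions import Fraction


def samples_per_packet(sample_rate_hz: int, packet_time_s: Fraction) -> int:
    """Samples per channel carried in one packet (sampling frequency x packet time)."""
    n = sample_rate_hz * packet_time_s
    if n.denominator != 1:
        raise ValueError("packet time does not hold a whole number of samples")
    return int(n)


AVB_PACKET_TIME = Fraction(125, 1_000_000)  # 125 us, inferred from 12 samples at 96 kHz
AES67_PACKET_TIME = Fraction(1, 1000)       # the 1 ms packet time discussed in the comment

avb = samples_per_packet(96_000, AVB_PACKET_TIME)
aes67 = samples_per_packet(96_000, AES67_PACKET_TIME)
print(avb, aes67, aes67 // avb)  # 12 96 8
```

The factor of eight is the per-channel buffering ratio the comment describes as "a huge difference".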

Audinate has a high-channel-count networked audio implementation using a massive FPGA. An X192 implementation in such a device would require all of the block RAMs for the TX buffer, resulting in one excessively large 2 Mbit memory covering 65% of the die. Such a large TX buffer is not possible for smaller FPGAs (e.g. those used by Brooklyn II or the Dante PCIe card) because they just don't have enough block RAM. X192 would require half of the BRAMs available in an LX25 for 64 channels.

A much more realistic scenario would be to do 1 ms at 48 kHz and 0.5 ms at 96 kHz (or, even better, 0.333 ms). A sample rate of 96 kHz is typically used for lower latency, and it is therefore appropriate to process packets at a higher packet rate.

The document should specify a required transmit buffer size range at 48 kHz and allow the same-sized buffer to be used at higher sample rates. A suggested range is 2/3 ms to 1 ms at 48 kHz, with 1 ms recommended.
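The proposal amounts to holding samples per packet constant and letting packet time shrink as the sample rate rises. A minimal sketch of that rule, with a hypothetical function name; the 48 kHz base rate and the 1 ms and 2/3 ms base packet times come from the suggested range above.

```python
from fractions import Fraction


def scaled_packet_time(sample_rate_hz: int,
                       base_packet_time: Fraction = Fraction(1, 1000),
                       base_rate_hz: int = 48_000) -> Fraction:
    """Packet time (seconds) that keeps samples per packet equal to the 48 kHz case."""
    base_samples = base_rate_hz * base_packet_time  # 48 samples for 1 ms at 48 kHz
    return base_samples / sample_rate_hz


for rate in (48_000, 96_000):
    print(rate, float(scaled_packet_time(rate) * 1000), "ms")
# 48000 1.0 ms
# 96000 0.5 ms
```

With the lower bound of 2/3 ms at 48 kHz (32 samples), the same rule gives 1/3 ms at 96 kHz, matching the "0.333 ms" option mentioned above.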

regards
aidan

Reply from Mark Yonge, AES Standards Manager, 2013-08-13

Dear Mr. Williams,

This refers to your comments on DRAFT AES67-xxxx, received 2013-08-12. Your first comment purely concerns editorial style, so I shall answer it here.

AES standards style derives from IEC standards style and, in particular, from ISO/IEC Directives, Part 2, "Rules for the structure and drafting of International Standards". The latest version is the sixth edition, 2011.

http://www.iso.org/iso/standards_development/processes_and_procedures/iso_iec_directives_and_iso_supplement.htm

In Annex I (informative) "Quantities and units", the first requirement is that "a) The decimal sign shall be a comma." This is independent of the language used.

A decimal comma has been used in IEC for very many years. The AES has used a decimal comma for over 12 years. We do not propose to change it for this document.

I fully understand that it may seem strange to people accustomed to a decimal point; however, our policy is clear, and I note an additional advantage: it makes clear the difference between decimal numbers and clause references, which do use a point (full stop) as a separator.

Sincerely,

Mark Yonge
AES Standards Manager

Reply from Kevin Gross, Vice-chair SC-02-12, 2013-08-14

Aidan,

The comment referencing Page 17 chapter 7.2.2 suggests that a different packet time should be used at 96 kHz than at 48 kHz sampling frequency. First, it should be noted that this draft standard for interoperability does not require support for 96 kHz ('shall'), although support is recommended ('should'). Second, to maximize interoperability and simplify implementation, the task group felt it best to limit the number of packet times supported and so did not define separate packet times for 96 kHz streams.

A 1 ms packet time requirement was agreed upon by the group based on the belief that 1 ms offers the best opportunity for interoperability among different types of implementations (commercial OS, embedded and hardware) and network configurations. Other options considered included no requirement for specific packet time support and introduction of profiles with different packet time requirements for different classes of application.

It is recognised that the 1 ms packet time strains the buffer capacity of hardware implementations. These implementations are better suited to the shorter packet times (333, 250 and 125 µs), which are expected to be popular options under the interoperability standard. Implementations are required to indicate in their product documentation which packet times are supported (7.2.0, final paragraph).
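Shorter packet times shrink per-packet buffering but raise the packet rate a device must handle. This sketch tabulates both for the packet times named above at 48 kHz; reading "333 µs" as exactly 1/3 ms (16 samples) is an assumption, and `packet_stats` is a hypothetical helper.

```python
from fractions import Fraction

PACKET_TIMES = {
    "1 ms": Fraction(1, 1000),
    "333 us": Fraction(1, 3000),  # assumed to mean exactly 1/3 ms
    "250 us": Fraction(1, 4000),
    "125 us": Fraction(1, 8000),
}


def packet_stats(sample_rate_hz: int, packet_time_s: Fraction) -> tuple[int, int]:
    """Return (samples per channel per packet, packets per second)."""
    return int(sample_rate_hz * packet_time_s), int(1 / packet_time_s)


for label, t in PACKET_TIMES.items():
    samples, rate = packet_stats(48_000, t)
    print(f"{label:>7}: {samples:3d} samples/packet, {rate:5d} packets/s")
```

At 48 kHz this gives 48, 16, 12 and 6 samples per packet at 1000, 3000, 4000 and 8000 packets per second respectively, which is why hardware designs with scarce buffer memory tend to favour the shorter options.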

In order to assure that hardware implementations can participate at the required 1 ms packet time, language was introduced in 7.2.1 (third paragraph) allowing for implementation of packet time as a mode. Under this provision, a hardware implementation can implement the 1 ms packet time as a separate mode with reduced channel count and buffer reallocated to support longer packet time over the fewer available channels. At a minimum, devices are required to receive an 8-channel stream. Receive buffer requirement for a 1-ms, 8-channel stream is 1152 samples at 48 kHz sampling frequency and 2304 samples at 96 kHz. Transmit buffer requirements are significantly lower.
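The receive buffer figures above are consistent with holding three packets' worth of audio: 48 samples × 8 channels × 3 = 1152 at 48 kHz, and twice that at 96 kHz. The factor of three is inferred from that arithmetic rather than stated in the reply, and `rx_buffer_samples` is a hypothetical helper.

```python
from fractions import Fraction

PACKET_TIME = Fraction(1, 1000)  # 1 ms
CHANNELS = 8                     # minimum stream size a receiver must accept
PACKETS_BUFFERED = 3             # inferred: 1152 / (48 samples x 8 channels) = 3


def rx_buffer_samples(sample_rate_hz: int,
                      channels: int = CHANNELS,
                      packet_time_s: Fraction = PACKET_TIME,
                      packets: int = PACKETS_BUFFERED) -> int:
    """Total samples (all channels) held by a buffer of `packets` packets."""
    per_packet = sample_rate_hz * packet_time_s * channels
    return int(per_packet * packets)


print(rx_buffer_samples(48_000))  # 1152
print(rx_buffer_samples(96_000))  # 2304
```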

Please reply by the end of the comment period if this reply is not acceptable to you. You may also ask us to consider your comments again for the next revision of the document. You may also appeal our decision to the Standards Secretariat.

Further comment (2) received from Aidan Williams, 2013-08-22

Hi Kevin,

As we read the standard, supporting 96 kHz requires a 1 ms packet time to be supported. The fact that 96 kHz is optional is beside the point.

Most likely, our FPGA implementation would comply with the standard as currently written for 48 kHz only.

Reducing the number of channels to achieve compliance with the standard at 96 kHz is not, in our opinion, a workable solution. We would therefore choose a not-strictly-compliant 96 kHz implementation rather than implement a channel reduction mode. I don't think this is a desirable state of affairs.

From my point of view, the issue is not resolved: you basically state the case for the document as is, and I don't think that is workable. Fortunately, I believe a simple, workable solution exists: allow (i.e. do not prohibit) the packet time to reduce with increasing sample rate, so that the amount of buffering required in the transmitter remains constant as the sample rate increases.

- aidan

Reply from Kevin Gross, Vice-chair SC-02-12, 2013-08-27

Aidan,

In my last response, yes, I did state the case for the document as it is and so will not belabor that in this response. It appears we have a disagreement as to whether reducing channel count is a workable solution for economizing buffer. I appreciate that reducing channel count places limitations on an implementation, but the task group did not see it as unworkable.

The 0.5 ms packet time for 96 kHz streams that you request cannot simply be added to the draft at this stage. The request could only be considered immediately by sending the draft back to the task group for revision and restarting the standards approval process. This step would be taken if a change were required to address a serious oversight or deficiency. As outlined in my previous response, I find that the issue you raise is neither an oversight nor a serious deficiency.

Please reply by the end of the comment period if this reply is not acceptable to you. You may also ask us to consider your comments again for the next revision of the document. You may also appeal our decision to the Standards Secretariat.

Kevin Gross

Further comment (3) received from Aidan Williams, 2013-08-28

Hi Kevin,

You're right, we disagree.

Your reply is not acceptable to me since we have carefully considered what would be required for us to implement the draft AES67 standard as written.

I would rather not do this, but we would opt for a non-compliant implementation at 96 kHz for the reasons I have outlined. The choice is not simply a matter of taste but derives from customer requirements and implementation constraints.

This is at least a problem for the standard, since I believe the only sensible FPGA implementation option for us at large channel counts is a non-compliant solution for 96 kHz.

Why is this a major change anyway? As you already pointed out, 96 kHz is optional. Perhaps the judicious insertion of a "should" to relax the requirements at 96 kHz might be a possible solution? If it is a contentious area, it is likely unwise to be too prescriptive.

regards, aidan

Reply from Kevin Gross, Vice-chair SC-02-12, 2013-08-28

What we are bumping up against here are limited options at this late stage of the AES standardization process. Issues like the one you are broaching here are to be worked out in the task group. As we move towards approval, things stabilize until we arrive here at the public call for comment (CFC), where there are three options we're focused on:

(1) Making editorial clarifications to things that are unclear to fresh eyes;
(2) If someone kicks a big hole in it, sending the draft back to the task group for more work;
(3) Approving the draft for publication as a standard.

I hope you appreciate that the scope of your comment does not fit in (1). In previous responses I have argued that it does not trigger (2).

Minimum required functionality for an AES67-compliant implementation is clearly 48 kHz @ 1 ms packet time. Requirements are relaxed when it comes to 96 kHz and other packet times. Furthermore, implementations are free to include functionality beyond what is described in the standard. Within the latitude the draft offers beyond 48 kHz @ 1 ms, I expect that you can find room to implement your "non-compliant 96 kHz solution." To maximize interoperability opportunities, I encourage you to use the other packet times recommended in the draft to do so. I also encourage you to bring this to the task group for consideration in a follow-on revision of the standard.

Please reply by the end of the comment period if this reply is still not acceptable to you. I'm happy to discuss this further and I think we can resolve it but you may also choose to appeal this decision to the Standards Secretariat.

Kevin Gross

AES - Audio Engineering Society