Linux device driver vulnerabilities - atlantic

Authors: cros-fuzzing-ext@google.com

Status: Final

Last updated: 2021-08-31

TABLE OF CONTENTS

SYNOPSIS
PREFACE
    Threat model
    Methodology and Tools
EVALUATED HARDWARE AND SOFTWARE
    Software stack
    Hardware Stack
TECHNICAL FINDINGS
    Summary
    Details
        Issue #1: Lack of a bounds check on LRO RXD’s next_desc_ptr
            Root cause:
            Observed Effects:
                #1.1: OOB buff_ring reads overflow into device controlled DMA buffers
                #1.2: OOB writes to driver and other kernel data structures
                #1.3: Use of uninitialized rxdata
            Recommendation:
        Issue #2: Lack of extra consistency checks for EOP while gathering RX frags
            Root cause:
            Observed Effects:
                #2.1: OOB write outside of packet buffers
                #2.2: Use of uninitialized frag[0]
                #2.3: Infinite busy loops
            Recommendation:
        Issue #3: Lack of a bounds check on TXD hw_head
            Root cause:
            Observed Effects:
                #3.1: Double kfree of socket buffers
            Recommendation:
FURTHER RECOMMENDATIONS

SYNOPSIS

The atlantic (net/ethernet/aquantia/atlantic) Linux kernel driver was evaluated for security vulnerabilities at the device/driver interface. This was done as a prerequisite for allowlisting this device driver to bind against associated USB4 or Thunderbolt devices on Google Chromebook consumer devices.

We found three separate root causes that manifest as various security weaknesses. We believe all of them can be patched easily, without any intrusive changes to the driver.


PREFACE

Threat model

This is the general threat model that we operate under. Note that some of the points below may not be relevant to the specific device/driver that this report covers.

Our threat model treats devices and peripherals as untrusted. Though somewhat unconventional, this is a real concern for externally attachable USB4/Thunderbolt peripherals, whose hardware and firmware are outside our control and which an attacker with physical access can hot-plug.

It’s worth highlighting that IOMMU is a necessary but not sufficient mitigation for us. Device drivers can have a rich attack surface even via the limited subset of memory that is mapped as accessible to the device.

Methodology and Tools

We treat the driver's DMA surface as the main concern. DMA transactions performed by the device can change the contents of mapped DMA buffers at arbitrary points in time. This can lead to situations where the data changes under the driver's feet while it is processing a DMA buffer, creating a much larger and more subtle attack surface than interfaces where the data is either copied first or unmapped from the device. A secondary concern is MMIO reads from device BARs, which present similar risks but generally expose a smaller attack surface.
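To illustrate the kind of bug this enables, here is a generic double-fetch (time-of-check/time-of-use) sketch. This is not atlantic code; the descriptor layout and names are hypothetical:

/* Hypothetical example, not taken from the atlantic driver: a length field in
 * DMA-visible memory is validated once and then read again, so the device can
 * change it between the check and the use.
 */
struct fake_rx_desc {
        __le16 len;                     /* device may rewrite this at any time */
        u8 payload[2048];
};

static void fake_handle_desc(struct fake_rx_desc *desc, u8 *dst, size_t dst_sz)
{
        u16 len = le16_to_cpu(desc->len);       /* first fetch: validated */

        if (len > dst_sz)
                return;

        /* BUG: second fetch of desc->len; the device may have grown it in the
         * meantime, turning this into an out-of-bounds copy. The fix is to use
         * the locally captured 'len'.
         */
        memcpy(dst, desc->payload, le16_to_cpu(desc->len));
}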

Our methodology is a combination of VM-introspection-based fuzzing and manual code inspection. Fuzzing uses Kernel Fuzzer for Xen Project; we use it both to identify driver sites that parse DMA'd buffers and to perform the actual coverage-assisted fuzzing.

As with any fuzzing, the quality of the results depends heavily on domain expertise. We do not claim this report to be comprehensive in any way and invite closer engagement with driver domain experts.

EVALUATED HARDWARE AND SOFTWARE

Software stack

Driver: atlantic

Linux kernel version: 5.4 (LTS)

Although a distro-specific (Chrome OS) kernel was evaluated in this exercise, the distro does not carry any private patches for this driver. Vulnerabilities were also confirmed to be present in upstream kernel version v5.14 by visual code inspection.

Hardware Stack

Host system: Intel x86_64

Device model: Aquantia Corp. AQC107 NBase-T/IEEE 802.3bz Ethernet Controller [AQtion] (rev 02) via a Cable Matters Thunderbolt 3 to 10 Gb Ethernet Adapter

Caveats: We were unable to source A0, B1, or atl2 rev devices and therefore lack coverage of the corresponding hw_atl code.

TECHNICAL FINDINGS

Summary

#   Title                                                      CVSS score
1   Lack of a bounds check on RXD next_desc_ptr                6.0
2   Lack of a loop iteration bound while gathering RX frags    5.7
3   Lack of a bounds check on TX hw_head                       5.7

Details

Issue #1: Lack of a bounds check on LRO RXD’s next_desc_ptr

CVSS:3.1/AV:P/AC:H/PR:N/UI:N/S:U/C:L/I:H/A:H


Root cause:

rxd_wb->next_desc_ptr comes from DMA-accessible memory: an RX descriptor ring owned by the device. It is used as an index into the RX buff_ring via buff->next without any further validation.

static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
                                        struct aq_ring_s *ring)
{
/* ... */
                } else {
                        buff->len =
                                rxd_wb->pkt_len > AQ_CFG_RX_FRAME_MAX ?
                                AQ_CFG_RX_FRAME_MAX : rxd_wb->pkt_len;

                        if (HW_ATL_B0_RXD_WB_STAT2_RSCCNT &
                            rxd_wb->status) {
                                /* LRO */
                                buff->next = rxd_wb->next_desc_ptr;
                                ++ring->stats.rx.lro_packets;
                        } else {
                                /* jumbo */
                                buff->next =
                                        aq_ring_next_dx(ring,
                                                        ring->hw_head);
                                ++ring->stats.rx.jumbo_packets;
                        }
                }
/* ... */
}


It is later used as an index into the driver-internal buff_ring without any bounds checks.

int aq_ring_rx_clean(struct aq_ring_s *self,
                     struct napi_struct *napi,
                     int *work_done,
                     int budget)
{
/* ... */
        if (!buff->is_eop) {
                buff_ = buff;
                do {
                        next_ = buff_->next,
                        buff_ = &self->buff_ring[next_];
                        is_rsc_completed =
                                aq_ring_dx_in_range(self->sw_head,
                                                    next_,
                                                    self->hw_head);

                        if (unlikely(!is_rsc_completed))
                                break;

                        buff->is_error |= buff_->is_error;
                        buff->is_cso_err |= buff_->is_cso_err;
                } while (!buff_->is_eop);

                if (!is_rsc_completed) {
                        err = 0;
                        goto err_exit;
                }
                if (buff->is_error || buff->is_cso_err) {
                        buff_ = buff;
                        do {
                                next_ = buff_->next,
                                buff_ = &self->buff_ring[next_];
                                buff_->is_cleaned = true;
                        } while (!buff_->is_eop);

                        ++self->stats.rx.errors;
                        continue;
                }
        }
/* ... */
                        if (!buff->is_eop) {
                                buff_ = buff;
                                i = 1U;
                                do {
                                        next_ = buff_->next,
                                        buff_ = &self->buff_ring[next_];
                                        dma_sync_single_range_for_cpu(
                                                        aq_nic_get_dev(self->aq_nic),
                                                        buff_->rxdata.daddr,
                                                        buff_->rxdata.pg_off,
                                                        buff_->len,
                                                        DMA_FROM_DEVICE);
                                        skb_add_rx_frag(skb, i++,
                                                        buff_->rxdata.page,
                                                        buff_->rxdata.pg_off,
                                                        buff_->len,
                                                        AQ_CFG_RX_FRAME_MAX);
                                        page_ref_inc(buff_->rxdata.page);
                                        buff_->is_cleaned = 1;
                                        buff->is_ip_cso &= buff_->is_ip_cso;
                                        buff->is_udp_cso &= buff_->is_udp_cso;
                                        buff->is_tcp_cso &= buff_->is_tcp_cso;
                                        buff->is_cso_err |= buff_->is_cso_err;
                                } while (!buff_->is_eop);
                        }
/* ... */
}


Observed Effects:

#1.1: OOB buff_ring reads overflow into device controlled DMA buffers

Here's an example virtual address map observed on a 2-CPU machine with default ring settings:

Alloc             Virtual address       Size      [D]MA / [I]nternal
RX 1 dx_ring      0xffff888063e40000    0x8000    D
TX 1 buff_ring    0xffff888063e00000    0x30000   I
RX 1 buff_ring    0xffff888063de0000    0x18000   I
TX 1 dx_ring      0xffff888063dd0000    0x10000   D
Vec 1             0xffff888063dc9000    0x9b8     I
RX 0 dx_ring      0xffff888063dc0000    0x8000    D
TX 0 buff_ring    0xffff888063d80000    0x30000   I
RX 0 buff_ring    0xffff888063d60000    0x18000   I
TX 0 dx_ring      0xffff888063d50000    0x10000   D
Vec 0             0xffff888063d4c000    0x9b8     I

The range and alignment of buff_ are constrained only by sizeof(next_desc_ptr) and sizeof(buff_ring[0]): the possible OOB values of buff_ are buff_ring + X * 48 for X between self->size and UINT16_MAX.

However, the kernel allocates the dx_rings right around the same location, and dx_rings are DMA-mapped and device-controlled. This allows the device to influence a skb_add_rx_frag call with a device-controlled struct page*, offset, and size.

In the above example, the RX 0 overflow extends a bit beyond RX 1 dx_ring. RX 1 overflow extends into unknown territory.
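To make the reach concrete, here is a back-of-the-envelope calculation using the example map above (not driver code; it assumes the 48-byte buff_ring entry size noted above, and exact figures will vary between boots and systems):

/* Back-of-the-envelope sketch: which out-of-bounds next_desc_ptr values make
 * buff_ alias RX 1 dx_ring, given the example address map above and a 48-byte
 * buff_ring entry.
 */
#include <stdio.h>

int main(void)
{
        unsigned long rx0_buff_ring = 0xffff888063d60000UL;
        unsigned long rx1_dx_ring   = 0xffff888063e40000UL;
        unsigned long rx1_dx_size   = 0x8000UL;
        unsigned long entry_size    = 48UL;     /* sizeof(buff_ring[0]) */

        unsigned long first = (rx1_dx_ring - rx0_buff_ring + entry_size - 1) / entry_size;
        unsigned long last  = (rx1_dx_ring + rx1_dx_size - rx0_buff_ring) / entry_size;

        /* Prints roughly 19115..19797, comfortably below the 0xffff limit of
         * next_desc_ptr, so the device-controlled dx_ring is reachable.
         */
        printf("next_desc_ptr values %lu..%lu alias RX 1 dx_ring\n", first, last);
        return 0;
}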

Important caveat: we have not found a way for the device to discover valid struct page* addresses by itself. However, combined with such an assist (for example, a separate infoleak), this offers an attacker a potent capability.

With IP forwarding, we hypothesize the device could construct a preamble for the packet such that it gets forwarded back via the same interface, thereby exfiltrating page contents. Another opportunity would be to route the packet to a cooperating userspace application.

#1.2: OOB writes to driver and other kernel data structures

Similarly, buff_->is_cleaned = true allows the device to set one specific bit per 48 bytes out of bounds. With our observed alignment and packing, this is known to overlap with the is_cleaned bit of descriptors from other TX and RX rings, which is benign. The more interesting overlaps are with fields of aq_vec_s such as aq_ring_s::buff_ring, aq_ring_s::size, aq_ring_s::page_order, etc. However, these weren't evaluated further due to the 1-bit restriction and the shared root cause with a trivial fix.

#1.3: Use of uninitialized rxdata

Even with an in-bounds buff_, it's possible that next_ lies beyond sw_tail, in which case its rxdata would not have been initialized yet.

Recommendation:

Check that next_desc_ptr is inside the boundaries of buff_ring and points to an active RXD.
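A minimal sketch of such a check at the point where next_desc_ptr is consumed in hw_atl_b0_hw_ring_rx_receive (illustrative only; the exact placement and error policy are left to the maintainers):

                        if (HW_ATL_B0_RXD_WB_STAT2_RSCCNT &
                            rxd_wb->status) {
                                /* LRO */
                                if (rxd_wb->next_desc_ptr >= ring->size) {
                                        /* Sketch: treat an out-of-range index as a
                                         * malformed descriptor instead of storing it
                                         * into buff->next.
                                         */
                                        buff->is_error = 1U;
                                        buff->next = 0U;
                                } else {
                                        buff->next = rxd_wb->next_desc_ptr;
                                }
                                ++ring->stats.rx.lro_packets;
                        }

A corresponding check at the consumption sites in aq_ring_rx_clean (e.g. rejecting next_ >= self->size, and applying the existing aq_ring_dx_in_range() test before every dereference) would also cover the requirement that the index points to an active RXD.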

Issue #2: Lack of extra consistency checks for EOP while gathering RX frags

CVSS:3.1/AV:P/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H

Root cause:

While reassembling LRO and jumbo packets, aq_ring_rx_clean routinely relies on a single device-controlled signal, buff->is_eop, to terminate its loops.

static int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self,
                                        struct aq_ring_s *ring)
{
/* ... */
                if (HW_ATL_B0_RXD_WB_STAT2_EOP & rxd_wb->status) {
                        buff->len = rxd_wb->pkt_len %
                                AQ_CFG_RX_FRAME_MAX;
                        buff->len = buff->len ?
                                buff->len : AQ_CFG_RX_FRAME_MAX;
                        buff->next = 0U;
                        buff->is_eop = 1U;
                } else {
/* ... */
}

Observed Effects:

#2.1: OOB write outside of packet buffers

aq_ring_rx_clean repeatedly calls skb_add_rx_frag with an incrementing index i until the device signals buff->is_eop.

int aq_ring_rx_clean(struct aq_ring_s *self,
                     struct napi_struct *napi,
                     int *work_done,
                     int budget)
{
/* ... */
                                do {
                                        next_ = buff_->next,
                                        buff_ = &self->buff_ring[next_];
                                        dma_sync_single_range_for_cpu(
                                                        aq_nic_get_dev(self->aq_nic),
                                                        buff_->rxdata.daddr,
                                                        buff_->rxdata.pg_off,
                                                        buff_->len,
                                                        DMA_FROM_DEVICE);
                                        skb_add_rx_frag(skb, i++,
                                                        buff_->rxdata.page,
                                                        buff_->rxdata.pg_off,
                                                        buff_->len,
                                                        AQ_CFG_RX_FRAME_MAX);
                                        page_ref_inc(buff_->rxdata.page);
                                        buff_->is_cleaned = 1;
                                        buff->is_ip_cso &= buff_->is_ip_cso;
                                        buff->is_udp_cso &= buff_->is_udp_cso;
                                        buff->is_tcp_cso &= buff_->is_tcp_cso;
                                        buff->is_cso_err |= buff_->is_cso_err;
                                } while (!buff_->is_eop);
/* ... */
}

static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
                                        struct page *page, int off, int size)
{
        skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

        /*
         * Propagate page pfmemalloc to the skb if we can. The problem is
         * that not all callers have unique ownership of the page but rely
         * on page_is_pfmemalloc doing the right thing(tm).
         */
        frag->bv_page   = page;
        frag->bv_offset = off;
        skb_frag_size_set(frag, size);

        page = compound_head(page);
        if (page_is_pfmemalloc(page))
                skb->pfmemalloc = true;
}

The device can use this to overflow skb_shared_info::frags, a fixed-size array of MAX_SKB_FRAGS entries. Furthermore, per Issue #1, page and off could also be influenced by the device via its control over buff_.

#2.2: Use of uninitialized frag[0]

A descriptor with !buff->is_eop and buff->len <= AQ_CFG_RX_HDR_SIZE causes the driver to skip the skb_add_rx_frag call for i == 0 but still add subsequent frags starting at i == 1, leaving frags[0] uninitialized.

int aq_ring_rx_clean(struct aq_ring_s *self,
                     struct napi_struct *napi,
                     int *work_done,
                     int budget)
{
/* ... */
                hdr_len = buff->len;
                if (hdr_len > AQ_CFG_RX_HDR_SIZE)
                        hdr_len = eth_get_headlen(skb->dev,
                                                  aq_buf_vaddr(&buff->rxdata),
                                                  AQ_CFG_RX_HDR_SIZE);

                memcpy(__skb_put(skb, hdr_len), aq_buf_vaddr(&buff->rxdata),
                       ALIGN(hdr_len, sizeof(long)));

                if (buff->len - hdr_len > 0) {
                        skb_add_rx_frag(skb, 0, buff->rxdata.page,
                                        buff->rxdata.pg_off + hdr_len,
                                        buff->len - hdr_len,
                                        AQ_CFG_RX_FRAME_MAX);
                        page_ref_inc(buff->rxdata.page);
                }

                if (!buff->is_eop) {
                        buff_ = buff;
                        i = 1U;
                        do {
                                next_ = buff_->next,
                                buff_ = &self->buff_ring[next_];
                                dma_sync_single_range_for_cpu(
                                                aq_nic_get_dev(self->aq_nic),
                                                buff_->rxdata.daddr,
                                                buff_->rxdata.pg_off,
                                                buff_->len,
                                                DMA_FROM_DEVICE);
                                skb_add_rx_frag(skb, i++,
                                                buff_->rxdata.page,
                                                buff_->rxdata.pg_off,
                                                buff_->len,
                                                AQ_CFG_RX_FRAME_MAX);
                                page_ref_inc(buff_->rxdata.page);
                                buff_->is_cleaned = 1;
                                buff->is_ip_cso &= buff_->is_ip_cso;
                                buff->is_udp_cso &= buff_->is_udp_cso;
                                buff->is_tcp_cso &= buff_->is_tcp_cso;
                                buff->is_cso_err |= buff_->is_cso_err;
                        } while (!buff_->is_eop);
                }
/* ... */
}

#2.3: Infinite busy loops

The device can easily cause the kernel to busy-loop indefinitely by never emitting an is_eop frag.

int aq_ring_rx_clean(struct aq_ring_s *self,
                     struct napi_struct *napi,
                     int *work_done,
                     int budget)
{
/* ... */
                if (!buff->is_eop) {
                        buff_ = buff;
                        do {
                                next_ = buff_->next,
                                buff_ = &self->buff_ring[next_];
                                is_rsc_completed =
                                        aq_ring_dx_in_range(self->sw_head,
                                                            next_,
                                                            self->hw_head);

                                if (unlikely(!is_rsc_completed))
                                        break;

                                buff->is_error |= buff_->is_error;
                                buff->is_cso_err |= buff_->is_cso_err;
                        } while (!buff_->is_eop);
/* ... */
}


Recommendation:

Limit all affected loop iterations to process at most MAX_SKB_FRAGS frags (accounting for any processed outside the loop). Check that is_eop is consistent with buffer length and any other relevant status conditions.
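A minimal sketch of the loop bound in aq_ring_rx_clean (illustrative only; the frag count could equally be tracked via skb_shinfo(skb)->nr_frags, and a complete fix would also need to drop and free the partially built skb):

                        if (!buff->is_eop) {
                                buff_ = buff;
                                i = 1U;
                                do {
                                        /* Sketch: i counts the frags already added to
                                         * the skb, including frag 0 added before this
                                         * loop; bail out before overflowing frags[].
                                         */
                                        if (i >= MAX_SKB_FRAGS) {
                                                ++self->stats.rx.errors;
                                                err = -EIO;
                                                goto err_exit;
                                        }
                                        next_ = buff_->next,
                                        buff_ = &self->buff_ring[next_];
                                        /* ... existing loop body unchanged ... */
                                } while (!buff_->is_eop);
                        }

The same bound (or a simple iteration counter) applies to the other two loops quoted above, which addresses the busy-loop case in #2.3 as well.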

Issue #3: Lack of a bounds check on TXD hw_head

CVSS:3.1/AV:P/AC:L/PR:N/UI:N/S:U/C:N/I:H/A:H

Root cause:

The TX ring's hw_head is read directly from the device via MMIO without further scrutiny. The device can influence it to be any value between 0x0 and 0x1fff.

static int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
                                            struct aq_ring_s *ring)
{
        int err = 0;
        unsigned int hw_head_ = hw_atl_tdm_tx_desc_head_ptr_get(self, ring->idx);

        if (aq_utils_obj_test(&self->flags, AQ_HW_FLAG_ERR_UNPLUG)) {
                err = -ENXIO;
                goto err_exit;
        }
        ring->hw_head = hw_head_;
        err = aq_hw_err_from_flags(self);

err_exit:
        return err;
}

u32 hw_atl_rdm_rx_desc_head_ptr_get(struct aq_hw_s *aq_hw, u32 descriptor)
{
        return aq_hw_read_reg_bit(aq_hw, HW_ATL_RDM_DESCDHD_ADR(descriptor),
                                  HW_ATL_RDM_DESCDHD_MSK,
                                  HW_ATL_RDM_DESCDHD_SHIFT);
}

Observed Effects:

#3.1: Double kfree of socket buffers

Unexpected values of hw_head may cause aq_ring_tx_clean to call dev_kfree_skb_any a second time on already-cleaned parts of the ring.

bool aq_ring_tx_clean(struct aq_ring_s *self)
{
/* ... */
        for (budget = AQ_CFG_TX_CLEAN_BUDGET;
             budget && self->sw_head != self->hw_head; budget--) {
                struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
/* ... */
                if (unlikely(buff->is_eop)) {
                        ++self->stats.rx.packets;
                        self->stats.tx.bytes += buff->skb->len;
                        dev_kfree_skb_any(buff->skb);
                }
                buff->pa = 0U;
                buff->eop_index = 0xffffU;
                self->sw_head = aq_ring_next_dx(self, self->sw_head);
        }

        return !!budget;
}

It's worth noting that aq_ring_next_dx successfully prevents buff from overflowing outside the ring.
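For context, aq_ring_next_dx wraps the index at the ring size, along these lines (paraphrased from the driver):

static inline unsigned int aq_ring_next_dx(struct aq_ring_s *self,
                                           unsigned int dx)
{
        /* Wraps back to 0 at self->size, so the TX clean loop cannot walk
         * sw_head outside buff_ring even when hw_head is bogus.
         */
        return (++dx >= self->size) ? 0U : dx;
}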

Recommendation:

Validate that hw_head is within the bounds of sw_head and sw_tail.
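A minimal sketch in hw_atl_b0_hw_ring_tx_head_update (illustrative only; a plain size bound is shown, and a stricter variant could additionally require hw_head_ to lie between sw_head and sw_tail, for example via aq_ring_dx_in_range()):

        unsigned int hw_head_ = hw_atl_tdm_tx_desc_head_ptr_get(self, ring->idx);

        /* Sketch: refuse to publish a head pointer that lies outside the ring. */
        if (hw_head_ >= ring->size) {
                err = -ENXIO;
                goto err_exit;
        }
        ring->hw_head = hw_head_;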

FURTHER RECOMMENDATIONS

Please also inspect the A0, B1, and atl2 code paths and apply equivalent fixes as applicable.