commit f6f7927ac664ba23447f8dd3c3dfe2f4ee39272f
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Wed Aug 16 18:22:04 2023 +0200

    Linux 5.15.127
    
    Link: https://lore.kernel.org/r/20230813211710.787645394@linuxfoundation.org
    Tested-by: Thierry Reding <treding@nvidia.com>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Guenter Roeck <linux@roeck-us.net>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Allen Pais <apais@linux.microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c597d8cb0d33462e449d7be345330e980abc035b
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Sun Aug 13 03:16:20 2023 +0000

    timers/nohz: Last resort update jiffies on nohz_full IRQ entry
    
    [ Upstream commit 53e87e3cdc155f20c3417b689df8d2ac88d79576 ]
    
    When at least one CPU runs in nohz_full mode, a dedicated timekeeper CPU
    is guaranteed to stay online and to never stop its tick.
    
    Meanwhile, in some rare cases, the dedicated timekeeper may be running
    with interrupts disabled for a while, such as in stop_machine.
    
    If jiffies stop being updated, a nohz_full CPU may end up endlessly
    programming the next tick in the past, taking the last jiffies update
    monotonic timestamp as a stale base, resulting in a tick storm.
    
    Here is a scenario where it matters:
    
    0) CPU 0 is the timekeeper and CPU 1 a nohz_full CPU.
    
    1) A stop machine callback is queued to execute somewhere.
    
    2) CPU 0 reaches MULTI_STOP_DISABLE_IRQ while CPU 1 is still in
       MULTI_STOP_PREPARE. Hence CPU 0 can't do its timekeeping duty. CPU 1
       can still take IRQs.
    
    3) CPU 1 receives an IRQ which queues a timer callback one jiffy forward.
    
    4) On IRQ exit, CPU 1 schedules the tick one jiffy forward, taking
       last_jiffies_update as a base. But last_jiffies_update hasn't been
       updated for 2 jiffies since the timekeeper has interrupts disabled.
    
    5) clockevents_program_event(), which relies on ktime_get(), observes
       that the expiration is in the past and therefore programs the min
       delta event on the clock.
    
    6) The tick fires immediately, goto 3)
    
    7) Tick storm, the nohz_full CPU is drowned and takes ages to reach
       MULTI_STOP_DISABLE_IRQ, which is the only way out of this situation.
    
    Solve this by unconditionally updating jiffies if the value is stale
    on nohz_full IRQ entry. IRQs and other disturbances are expected to be
    rare enough on nohz_full for the unconditional call to ktime_get() not
    to actually matter.
    
    Reported-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Paul E. McKenney <paulmck@kernel.org>
    Link: https://lore.kernel.org/r/20211026141055.57358-2-frederic@kernel.org
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b4d36e6c5dc417f3f394b5e576dec2760b3999ae
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Sun Aug 13 03:16:19 2023 +0000

    timers/nohz: Switch to ONESHOT_STOPPED in the low-res handler when the tick is stopped
    
    [ Upstream commit 62c1256d544747b38e77ca9b5bfe3a26f9592576 ]
    
    When tick_nohz_stop_tick() stops the tick and high resolution timers are
    disabled, then the clock event device is not put into ONESHOT_STOPPED
    mode. This can lead to spurious timer interrupts with some clock event
    device drivers that don't shut down entirely after firing.
    
    Eliminate these by putting the device into ONESHOT_STOPPED mode at points
    where it is not being reprogrammed. When there are no timers active, then
    tick_program_event() with KTIME_MAX can be used to stop the device. When
    there is a timer active, the device can be stopped at the next tick (any
    new timer added by timers will reprogram the tick).
    
    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20220422141446.915024-1-npiggin@gmail.com
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c3b954a51b6447d060c1b30ec4efb5db34a056f7
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Sun Aug 13 03:16:18 2023 +0000

    tick: Detect and fix jiffies update stall
    
    [ Upstream commit a1ff03cd6fb9c501fff63a4a2bface9adcfa81cd ]
    
    In some rare cases, the timekeeper CPU may be delaying its jiffies
    update duty for a while. Known causes include:
    
    * The timekeeper is waiting on stop_machine in a MULTI_STOP_DISABLE_IRQ
      or MULTI_STOP_RUN state. Disabled interrupts prevent timekeeping
      updates while waiting for the target CPU to complete its
      stop_machine() callback.
    
    * The timekeeper vcpu has VMEXIT'ed for a long while due to some overload
      on the host.
    
    Detect and fix these situations with emergency timekeeping catchups.
    
    Original-patch-by: Paul E. McKenney <paulmck@kernel.org>
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit af99918f0e39aeb14d2cd08ca79faf9ccb1ec47f
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Jun 22 18:15:03 2023 +0000

    sch_netem: fix issues in netem_change() vs get_dist_table()
    
    commit 11b73313c12403f617b47752db0ab3deef201af7 upstream.
    
    In the blamed commit, I missed that get_dist_table() was allocating
    memory using GFP_KERNEL, and acquiring the qdisc lock to perform
    the swap of the newly allocated table with the current one.
    
    In this patch, get_dist_table() allocates memory and
    copies user data before we acquire the qdisc lock.
    
    Then we perform swap operations while being protected by the lock.
    
    Note that after this patch netem_change() can no longer do partial changes.
    If an error is returned, qdisc conf is left unchanged.
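    
    The pattern described above (allocate and copy first, then swap under
    the lock, with no partial change on error) can be sketched in
    userspace C. This is only an illustration, not the netem code: a
    pthread mutex stands in for the qdisc lock, and struct fake_qdisc,
    build_dist_table() and fake_change() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct dist_table {
	size_t size;
	short table[]; /* flexible array holding the distribution */
};

/* Hypothetical stand-in for the qdisc private data. */
struct fake_qdisc {
	pthread_mutex_t lock;
	struct dist_table *delay_dist;
	unsigned int latency;
};

/* Allocate and fill a table. May block, so it runs without the lock. */
static struct dist_table *build_dist_table(const short *data, size_t n)
{
	struct dist_table *d = malloc(sizeof(*d) + n * sizeof(d->table[0]));

	if (!d)
		return NULL;
	d->size = n;
	memcpy(d->table, data, n * sizeof(d->table[0]));
	return d;
}

static int fake_change(struct fake_qdisc *q, const short *data, size_t n,
		       unsigned int latency)
{
	struct dist_table *new_dist, *old;

	assert(q != NULL);
	if (n == 0)
		return -EINVAL; /* rejected before q is touched */
	new_dist = build_dist_table(data, n);
	if (!new_dist)
		return -ENOMEM; /* q is still unchanged */

	/* Only pointer and scalar updates happen under the lock. */
	pthread_mutex_lock(&q->lock);
	old = q->delay_dist;
	q->delay_dist = new_dist;
	q->latency = latency;
	pthread_mutex_unlock(&q->lock);

	free(old); /* released after dropping the lock */
	return 0;
}
```

    Because every failure point sits before the lock is taken, an error
    return leaves the configuration exactly as it was.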
    
    Fixes: 2174a08db80d ("sch_netem: acquire qdisc lock in netem_change()")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Stephen Hemminger <stephen@networkplumber.org>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Reviewed-by: Simon Horman <simon.horman@corigine.com>
    Link: https://lore.kernel.org/r/20230622181503.2327695-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 5d094d4e7b99c75da9ece3a9a955eb3728f3988c
Author: Masahiro Yamada <masahiroy@kernel.org>
Date:   Sat Jul 29 16:42:23 2023 +0900

    alpha: remove __init annotation from exported page_is_ram()
    
    commit 6ccbd7fd474674654019a20177c943359469103a upstream.
    
    EXPORT_SYMBOL and __init is a bad combination because the .init.text
    section is freed up after the initialization.
    
    Commit c5a130325f13 ("ACPI/APEI: Add parameter check before error
    injection") exported page_is_ram(), hence the __init annotation should
    be removed.
    
    This fixes the modpost warning in ARCH=alpha builds:
    
      WARNING: modpost: vmlinux: page_is_ram: EXPORT_SYMBOL used for init symbol. Remove __init or EXPORT_SYMBOL.
    
    Fixes: c5a130325f13 ("ACPI/APEI: Add parameter check before error injection")
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
    Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f8d6d25756ea51f381ce86166a01ae5018858d88
Author: Nilesh Javali <njavali@marvell.com>
Date:   Mon Aug 7 15:07:24 2023 +0530

    scsi: qedf: Fix firmware halt over suspend and resume
    
    commit ef222f551e7c4e2008fc442ffc9edcd1a7fd8f63 upstream.
    
    While performing certain power-off sequences, PCI drivers are called to
    suspend and resume their underlying devices through PCI PM (power
    management) interface. However, the hardware does not support PCI PM
    suspend/resume operations, so system-wide suspend/resume leads to a bad MFW
    (management firmware) state, which causes various follow-up errors in the
    driver when communicating with the device/firmware.
    
    To fix this, the driver implements a PCI PM suspend handler that explicitly
    indicates the unsupported operation to the PCI subsystem, thus preventing
    the system from going into suspended/standby mode.
    
    Fixes: 61d8658b4a43 ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
    Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20230807093725.46829-1-njavali@marvell.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 85db1cd1744e1f1fa8d80040aa4727dd88a0e0f4
Author: Nilesh Javali <njavali@marvell.com>
Date:   Mon Aug 7 15:07:25 2023 +0530

    scsi: qedi: Fix firmware halt over suspend and resume
    
    commit 1516ee035df32115197cd93ae3619dba7b020986 upstream.
    
    While performing certain power-off sequences, PCI drivers are called to
    suspend and resume their underlying devices through PCI PM (power
    management) interface. However, the hardware does not support PCI PM
    suspend/resume operations, so system-wide suspend/resume leads to a bad MFW
    (management firmware) state, which causes various follow-up errors in the
    driver when communicating with the device/firmware.
    
    To fix this, the driver implements a PCI PM suspend handler that explicitly
    indicates the unsupported operation to the PCI subsystem, thus preventing
    the system from going into suspended/standby mode.
    
    Fixes: ace7f46ba5fd ("scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework.")
    Signed-off-by: Nilesh Javali <njavali@marvell.com>
    Link: https://lore.kernel.org/r/20230807093725.46829-2-njavali@marvell.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e70469c289539cdc2bed00e3e4b63ec5be4cb2cb
Author: Karan Tilak Kumar <kartilak@cisco.com>
Date:   Thu Jul 27 12:39:19 2023 -0700

    scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
    
    commit 5a43b07a87835660f91d88a4db11abfea8c523b7 upstream.
    
    fnic_clean_pending_aborts() was returning a non-zero value irrespective of
    failure or success.  This caused the caller of this function to assume that
    the device reset had failed, even though it would succeed in most cases. As
    a consequence, a successful device reset would escalate to host reset.
    
    Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
    Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
    Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
    Link: https://lore.kernel.org/r/20230727193919.2519-1-kartilak@cisco.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6bc7f4c8c27d526f968788b8a985896755b1df35
Author: Zhu Wang <wangzhu9@huawei.com>
Date:   Thu Aug 3 10:02:30 2023 +0800

    scsi: core: Fix possible memory leak if device_add() fails
    
    commit 04b5b5cb0136ce970333a9c6cec7e46adba1ea3a upstream.
    
    If device_add() returns an error, the name allocated by dev_set_name() needs
    to be freed. As the comment of device_add() says, put_device() should be used
    to decrease the reference count in the error path. So fix this by calling
    put_device(), then the name can be freed in kobject_cleanup().
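    
    The ownership rule can be illustrated with a small userspace mock. It
    does not use the driver core: struct fake_device and the fake_*()
    helpers are hypothetical stand-ins, and fake_device_add() is made to
    fail on purpose so that the error path runs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_names; /* name strings currently allocated */

/* Hypothetical stand-in for a refcounted device object. */
struct fake_device {
	int refcount;
	char *name;
};

static void fake_device_initialize(struct fake_device *dev)
{
	dev->refcount = 1; /* the caller holds the initial reference */
	dev->name = NULL;
}

static int fake_set_name(struct fake_device *dev, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (!copy)
		return -1;
	memcpy(copy, name, len);
	dev->name = copy;
	live_names++;
	return 0;
}

/* Always fails here, to exercise the error path. */
static int fake_device_add(struct fake_device *dev)
{
	(void)dev;
	return -1;
}

/* Dropping the last reference releases everything the object owns. */
static void fake_put_device(struct fake_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0 && dev->name) {
		free(dev->name);
		dev->name = NULL;
		live_names--;
	}
}

static int register_fake(struct fake_device *dev, const char *name)
{
	int err;

	fake_device_initialize(dev);
	err = fake_set_name(dev, name);
	if (!err)
		err = fake_device_add(dev);
	if (err)
		fake_put_device(dev); /* frees the name via the release path */
	return err;
}
```

    Returning the error without the final put would leave the name string
    allocated, which is the leak described above.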
    
    Fixes: ee959b00c335 ("SCSI: convert struct class_device to struct device")
    Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
    Link: https://lore.kernel.org/r/20230803020230.226903-1-wangzhu9@huawei.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 461f8ac666fa232afee5ed6420099913ec4e4ba2
Author: Zhu Wang <wangzhu9@huawei.com>
Date:   Tue Aug 1 19:14:21 2023 +0800

    scsi: snic: Fix possible memory leak if device_add() fails
    
    commit 41320b18a0e0dfb236dba4edb9be12dba1878156 upstream.
    
    If device_add() returns an error, the name allocated by dev_set_name() needs
    to be freed. As the comment of device_add() says, put_device() should be used
    to give up the reference in the error path. So fix this by calling
    put_device(), then the name can be freed in kobject_cleanup().
    
    Fixes: c8806b6c9e82 ("snic: driver for Cisco SCSI HBA")
    Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
    Acked-by: Narsimhulu Musini <nmusini@cisco.com>
    Link: https://lore.kernel.org/r/20230801111421.63651-1-wangzhu9@huawei.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 171e117cdc0a13702b7d1fedd875dc0c4a43b7d1
Author: Alexandra Diupina <adiupina@astralinux.ru>
Date:   Fri Jul 28 15:35:21 2023 +0300

    scsi: 53c700: Check that command slot is not NULL
    
    commit 8366d1f1249a0d0bba41d0bd1298d63e5d34c7f7 upstream.
    
    Add a check for the command slot value to avoid dereferencing a NULL
    pointer.
    
    Found by Linux Verification Center (linuxtesting.org) with SVACE.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Co-developed-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
    Signed-off-by: Vladimir Telezhnikov <vtelezhnikov@astralinux.ru>
    Signed-off-by: Alexandra Diupina <adiupina@astralinux.ru>
    Link: https://lore.kernel.org/r/20230728123521.18293-1-adiupina@astralinux.ru
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7a792b3d888aab2c65389f9f4f9f2f6c000b1a0d
Author: Michael Kelley <mikelley@microsoft.com>
Date:   Fri Jul 28 21:59:24 2023 -0700

    scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
    
    commit 175544ad48cbf56affeef2a679c6a4d4fb1e2881 upstream.
    
    Hyper-V provides the ability to connect Fibre Channel LUNs to the host
    system and present them in a guest VM as a SCSI device. I/O to the vFC
    device is handled by the storvsc driver. The storvsc driver includes a
    partial integration with the FC transport implemented in the generic
    portion of the Linux SCSI subsystem so that FC attributes can be displayed
    in /sys.  However, the partial integration means that some aspects of vFC
    don't work properly. Unfortunately, a full and correct integration isn't
    practical because of limitations in what Hyper-V provides to the guest.
    
    In particular, in the context of Hyper-V storvsc, the FC transport timeout
    function fc_eh_timed_out() causes a kernel panic because it can't find the
    rport and dereferences a NULL pointer. The original patch that added the
    call from storvsc_eh_timed_out() to fc_eh_timed_out() is faulty in this
    regard.
    
    In many cases a timeout is due to a transient condition, so the situation
    can be improved by just continuing to wait like with other I/O requests
    issued by storvsc, and avoiding the guaranteed panic. For a permanent
    failure, continuing to wait may result in a hung thread instead of a panic,
    which again may be better.
    
    So fix the panic by removing the storvsc call to fc_eh_timed_out().  This
    allows storvsc to keep waiting for a response.  The change has been tested
    by users who experienced a panic in fc_eh_timed_out() due to transient
    timeouts, and it solves their problem.
    
    In the future we may want to deprecate the vFC functionality in storvsc
    since it can't be fully fixed. But it has current users for whom it is
    working well enough, so it should probably stay for a while longer.
    
    Fixes: 3930d7309807 ("scsi: storvsc: use default I/O timeout handler for FC devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Michael Kelley <mikelley@microsoft.com>
    Link: https://lore.kernel.org/r/1690606764-79669-1-git-send-email-mikelley@microsoft.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 0f52d7b782514cc073a8f301b6319b2254a1611f
Author: Tony Battersby <tonyb@cybernetics.com>
Date:   Mon Jul 24 14:25:40 2023 -0400

    scsi: core: Fix legacy /proc parsing buffer overflow
    
    commit 9426d3cef5000824e5f24f80ed5f42fb935f2488 upstream.
    
    (lightly modified commit message mostly by Linus Torvalds)
    
    The parsing code for /proc/scsi/scsi is disgusting and broken.  We should
    have just used 'sscanf()' or something simple like that, but the logic may
    actually predate our kernel sscanf library routine for all I know.  It
    certainly predates both git and BK histories.
    
    And we can't change it to be something sane like that now, because the
    string matching at the start is done case-insensitively, and the separator
    parsing between numbers isn't done at all, so *any* separator will work,
    including a possible terminating NUL character.
    
    This interface is root-only, and entirely for legacy use, so there is
    absolutely no point in trying to tighten up the parsing.  Because any
    separator has traditionally worked, it's entirely possible that people have
    used random characters rather than the suggested space.
    
    So don't bother to try to pretty it up, and let's just make a minimal patch
    that can be back-ported and we can forget about this whole sorry thing for
    another two decades.
    
    Just make it at least not read past the end of the supplied data.
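    
    As a rough illustration of the idea (not the actual patch), a parser
    that carries an explicit end pointer never depends on a terminating
    NUL and still accepts any single character as a separator.
    parse_uint_bounded() and parse_fields() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Parse decimal digits from [p, end). Never dereferences 'end' or beyond.
 * Returns the position just after the last digit consumed. */
static const char *parse_uint_bounded(const char *p, const char *end,
				      unsigned int *out)
{
	unsigned int val = 0;

	assert(p != NULL && end != NULL && out != NULL);
	while (p < end && *p >= '0' && *p <= '9') {
		val = val * 10 + (unsigned int)(*p - '0');
		p++;
	}
	*out = val;
	return p;
}

/* Parse up to 'n' numbers, each followed by one separator of any kind.
 * Returns how many fields were filled in. */
static size_t parse_fields(const char *buf, size_t len,
			   unsigned int *fields, size_t n)
{
	const char *p = buf;
	const char *end = buf + len;
	size_t i = 0;

	while (i < n && p < end) {
		p = parse_uint_bounded(p, end, &fields[i++]);
		if (p < end)
			p++; /* skip exactly one separator, whatever it is */
	}
	return i;
}
```

    The caller passes the byte count it was given, so input that is cut
    short yields fewer fields instead of a read past the buffer.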
    
    Link: https://lore.kernel.org/linux-scsi/b570f5fe-cb7c-863a-6ed9-f6774c219b88@cybernetics.com/
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Martin K Petersen <martin.petersen@oracle.com>
    Cc: James Bottomley <jejb@linux.ibm.com>
    Cc: Willy Tarreau <w@1wt.eu>
    Cc: stable@kernel.org
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
    Signed-off-by: Martin K Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b757ef99df3988c61e382daef48fd83ce447f2b7
Author: Pablo Neira Ayuso <pablo@netfilter.org>
Date:   Sun Aug 13 00:06:55 2023 +0200

    netfilter: nf_tables: report use refcount overflow
    
    commit 1689f25924ada8fe14a4a82c38925d04994c7142 upstream.
    
    Overflow use refcount checks are not complete.
    
    Add helper function to deal with object reference counter tracking.
    Report -EMFILE in case UINT_MAX is reached.
    
    nft_use_dec() splats in case that reference counter underflows,
    which should not ever happen.
    
    Add nft_use_inc_restore() and nft_use_dec_restore() which are used
    to restore reference counter from error and abort paths.
    
    Use u32 in nft_flowtable and nft_object since helper functions cannot
    work on bitfields.
    
    Remove the few early incomplete checks now that the helper functions
    are in place and used to check for refcount overflow.
    
    Fixes: 96518518cc41 ("netfilter: add nftables")
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9bdbbcf9d148aebd686aabe13e80e1e4e8610c60
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Jul 11 17:40:41 2023 +0800

    nvme-rdma: fix potential unbalanced freeze & unfreeze
    
    commit 29b434d1e49252b3ad56ad3197e47fafff5356a1 upstream.
    
    Move start_freeze into nvme_rdma_configure_io_queues(); there are
    at least two benefits:
    
    1) fix unbalanced freeze and unfreeze, since re-connection work may
    fail or be broken by removal
    
    2) IO during error recovery can fail fast because nvme fabrics
    unquiesces queues after teardown.
    
    One side effect is that a !mpath request may time out during connecting
    because of the queue topology change, but that does not look like a big deal:
    
    1) same problem exists with current code base
    
    2) compared with !mpath, mpath use case is dominant
    
    Fixes: 9f98772ba307 ("nvme-rdma: fix controller reset hang during traffic")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Tested-by: Yi Zhang <yi.zhang@redhat.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit d68f8ef6ef7047c7544095598add59f6ff156ed9
Author: Ming Lei <ming.lei@redhat.com>
Date:   Tue Jul 11 17:40:40 2023 +0800

    nvme-tcp: fix potential unbalanced freeze & unfreeze
    
    commit 99dc264014d5aed66ee37ddf136a38b5a2b1b529 upstream.
    
    Move start_freeze into nvme_tcp_configure_io_queues(); there are
    at least two benefits:
    
    1) fix unbalanced freeze and unfreeze, since re-connection work may
    fail or be broken by removal
    
    2) IO during error recovery can fail fast because nvme fabrics
    unquiesces queues after teardown.
    
    One side effect is that a !mpath request may time out during connecting
    because of the queue topology change, but that does not look like a big deal:
    
    1) same problem exists with current code base
    
    2) compared with !mpath, mpath use case is dominant
    
    Fixes: 2875b0aecabe ("nvme-tcp: fix controller reset hang during traffic")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Tested-by: Yi Zhang <yi.zhang@redhat.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ae6e21f8bb2a29db1c9be7a001b038e6f2b2afb0
Author: Josef Bacik <josef@toxicpanda.com>
Date:   Wed Aug 2 09:20:24 2023 -0400

    btrfs: set cache_block_group_error if we find an error
    
    commit 92fb94b69c6accf1e49fff699640fa0ce03dc910 upstream.
    
    We set cache_block_group_error if btrfs_cache_block_group() returns an
    error. This is because we could end up not finding space to allocate and
    mistakenly return -ENOSPC, which could then abort the transaction
    with the incorrect errno, and in the case of ENOSPC result in a
    WARN_ON() that will trip up tests like generic/475.
    
    However there's the case where multiple threads can be racing, one
    thread gets the proper error, and the other thread doesn't actually call
    btrfs_cache_block_group(), it instead sees ->cached ==
    BTRFS_CACHE_ERROR.  Again the result is the same, we fail to allocate
    our space and return -ENOSPC.  Instead we need to set
    cache_block_group_error to -EIO in this case to make sure that if we do
    not make our allocation we get the appropriate error returned back to
    the caller.
    
    CC: stable@vger.kernel.org # 4.14+
    Signed-off-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 314135b7bae9618a317874ae195272682cf2d5d4
Author: Qu Wenruo <wqu@suse.com>
Date:   Thu Aug 3 17:20:43 2023 +0800

    btrfs: reject invalid reloc tree root keys with stack dump
    
    commit 6ebcd021c92b8e4b904552e4d87283032100796d upstream.
    
    [BUG]
    Syzbot reported a crash where an ASSERT() got triggered inside
    prepare_to_merge().
    
    That ASSERT() makes sure the reloc tree is properly pointed back by its
    subvolume tree.
    
    [CAUSE]
    After more debugging output, it turns out we had an invalid reloc tree:
    
      BTRFS error (device loop1): reloc tree mismatch, root 8 has no reloc root, expect reloc root key (-8, 132, 8) gen 17
    
    Note the above root key is (TREE_RELOC_OBJECTID, ROOT_ITEM,
    QUOTA_TREE_OBJECTID), meaning it's a reloc tree for quota tree.
    
    But reloc trees can only exist for subvolumes, as for non-subvolume
    trees, we just COW the involved tree block, no need to create a reloc
    tree since those tree blocks won't be shared with other trees.
    
    Only subvolume trees can share tree blocks with other trees (thus they
    have the BTRFS_ROOT_SHAREABLE flag).
    
    Thus this new debug output proves my previous assumption that corrupted
    on-disk data can trigger that ASSERT().
    
    [FIX]
    Besides the dedicated fix and the graceful exit, also let tree-checker
    check such root keys, to make sure reloc trees can only exist for subvolumes.
    
    CC: stable@vger.kernel.org # 5.15+
    Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 69dd147de419b04d1d8d2ca67ef424cddd5b8fd5
Author: Qu Wenruo <wqu@suse.com>
Date:   Thu Aug 3 17:20:42 2023 +0800

    btrfs: exit gracefully if reloc roots don't match
    
    commit 05d7ce504545f7874529701664c90814ca645c5d upstream.
    
    [BUG]
    Syzbot reported a crash where an ASSERT() got triggered inside
    prepare_to_merge().
    
    [CAUSE]
    The root cause of the triggered ASSERT() is we can have a race between
    quota tree creation and relocation.
    
    This leads us to create a duplicated quota tree in the
    btrfs_read_fs_root() path, and since it's treated as fs tree, it would
    have ROOT_SHAREABLE flag, causing us to create a reloc tree for it.
    
    The bug itself is fixed by a dedicated patch for it, but this already
    taught us the ASSERT() is not something straightforward for
    developers.
    
    [ENHANCEMENT]
    Instead of using an ASSERT(), let's handle it gracefully and output
    extra info about the mismatched reloc roots to help debug.
    
    Also with the above ASSERT() removed, we can trigger ASSERT(0)s inside
    merge_reloc_roots() later.
    Also replace those ASSERT(0)s with WARN_ON()s.
    
    CC: stable@vger.kernel.org # 5.15+
    Reported-by: syzbot+ae97a827ae1c3336bbb4@syzkaller.appspotmail.com
    Reviewed-by: Filipe Manana <fdmanana@suse.com>
    Signed-off-by: Qu Wenruo <wqu@suse.com>
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c40d4b60c58d7f37069bfc5767c0bcd2f9987cb0
Author: Christoph Hellwig <hch@lst.de>
Date:   Mon Jul 24 06:26:53 2023 -0700

    btrfs: don't stop integrity writeback too early
    
    commit effa24f689ce0948f68c754991a445a8d697d3a8 upstream.
    
    extent_write_cache_pages stops writing pages as soon as nr_to_write hits
    zero.  That is the right thing for opportunistic writeback, but incorrect
    for data integrity writeback, which needs to ensure that no dirty pages
    are left in the range.  Thus only stop the writeback for WB_SYNC_NONE
    if nr_to_write hits 0.
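    
    The stop condition can be sketched as a toy loop over an array of
    dirty flags. This is not the btrfs code: enum sync_mode mirrors
    WB_SYNC_NONE and WB_SYNC_ALL only by analogy, and
    write_dirty_pages() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the two writeback modes. */
enum sync_mode {
	SYNC_NONE, /* opportunistic: may stop once the budget is used up */
	SYNC_ALL,  /* data integrity: must clean the whole range */
};

/* "Write" dirty pages in order and return how many were written. */
static size_t write_dirty_pages(bool *dirty, size_t npages,
				long nr_to_write, enum sync_mode mode)
{
	size_t written = 0;

	assert(dirty != NULL);
	for (size_t i = 0; i < npages; i++) {
		if (!dirty[i])
			continue;
		dirty[i] = false;
		written++;
		nr_to_write--;
		/* Only opportunistic writeback honours the budget. */
		if (nr_to_write <= 0 && mode == SYNC_NONE)
			break;
	}
	return written;
}
```

    With a budget of 2 and four dirty pages, SYNC_NONE stops after two
    pages while SYNC_ALL keeps going until none are left dirty.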
    
    This is a port of write_cache_pages changes in commit 05fe478dd04e
    ("mm: write_cache_pages integrity fix").
    
    Note that I've only triggered the problem with other changes to the btrfs
    writeback code, but this condition seems worthwhile fixing anyway.
    
    CC: stable@vger.kernel.org # 4.14+
    Reviewed-by: Josef Bacik <josef@toxicpanda.com>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: David Sterba <dsterba@suse.com>
    [ updated comment ]
    Signed-off-by: David Sterba <dsterba@suse.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 555e126dd30b886f0f8934da8f01a79653d944d3
Author: Nick Child <nnac123@linux.ibm.com>
Date:   Wed Aug 9 17:10:36 2023 -0500

    ibmvnic: Handle DMA unmapping of login buffs in release functions
    
    commit d78a671eb8996af19d6311ecdee9790d2fa479f0 upstream.
    
    Rather than leaving the DMA unmapping of the login buffers to the
    login response handler, move this work into the login release functions.
    Previously, these functions were only used for freeing the allocated
    buffers. This could lead to issues if there is more than one
    outstanding login buffer request, which is possible if a login request
    times out.
    
    If a login request times out, then there is another call to send login.
    The send login function makes a call to the login buffer release
    function. In the past, this freed the buffers but did not DMA unmap.
    Therefore, the VIOS could still write to the old login (now freed)
    buffer. It is for this reason that it is a good idea to leave the DMA
    unmap call to the login buffers release function.
    
    Since the login buffer release functions now handle DMA unmapping,
    remove the duplicate DMA unmapping in handle_login_rsp().
    
    Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
    Signed-off-by: Nick Child <nnac123@linux.ibm.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20230809221038.51296-3-nnac123@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 34fcc823823af4750377851554171b0a0afaf42f
Author: Nick Child <nnac123@linux.ibm.com>
Date:   Wed Aug 9 17:10:35 2023 -0500

    ibmvnic: Unmap DMA login rsp buffer on send login fail
    
    commit 411c565b4bc63e9584a8493882bd566e35a90588 upstream.
    
    If the LOGIN CRQ fails to send then we must DMA unmap the response
    buffer. Previously, if the CRQ failed then the memory was freed without
    DMA unmapping.
    
    Fixes: c98d9cc4170d ("ibmvnic: send_login should check for crq errors")
    Signed-off-by: Nick Child <nnac123@linux.ibm.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20230809221038.51296-2-nnac123@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit cee62753cf2e26e476a9e2774a912f970f1d25ff
Author: Nick Child <nnac123@linux.ibm.com>
Date:   Wed Aug 9 17:10:34 2023 -0500

    ibmvnic: Enforce stronger sanity checks on login response
    
    commit db17ba719bceb52f0ae4ebca0e4c17d9a3bebf05 upstream.
    
    Ensure that all offsets in a login response buffer are within the size
    of the allocated response buffer. Any offsets or lengths that surpass
    the allocation are likely the result of an incomplete response buffer.
    In these cases, a full reset is necessary.
    
    When attempting to login, the ibmvnic device will allocate a response
    buffer and pass a reference to the VIOS. The VIOS will then send the
    ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
    with data. If the ibmvnic device does not get a response in 20 seconds,
    the old buffer is freed and a new login request is sent. With 2
    outstanding requests, any LOGIN_RSP CRQ could be for the older
    login request. If this is the case then the login response buffer (which
    is for the newer login request) could be incomplete and contain invalid
    data. Therefore, we must enforce strict sanity checks on the response
    buffer values.
    
    Testing has shown that the `off_rxadd_buff_size` value is filled in last
    by the VIOS and will be the smoking gun for these circumstances.
    
    Until VIOS can implement a mechanism for tracking outstanding response
    buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
    response buffer, the best ibmvnic can do in this situation is perform a
    full reset.
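    
    A minimal sketch of the kind of bounds check described above, in plain
    C. The struct layout, field names and helper are hypothetical
    simplifications for illustration and are not the driver's actual
    definitions; byte order handling is left out.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical, simplified view of a login response header. */
    struct login_rsp_sketch {
        uint32_t num_entries;   /* number of 8-byte entries in the array */
        uint32_t off_entries;   /* byte offset of the array in the buffer */
    };

    /*
     * Return true only if the array described by the header lies entirely
     * inside a buffer of buf_sz bytes. 64-bit math avoids overflow.
     */
    static bool login_rsp_in_bounds(const struct login_rsp_sketch *rsp,
                                    size_t buf_sz)
    {
        uint64_t off = rsp->off_entries;
        uint64_t len = (uint64_t)rsp->num_entries * sizeof(uint64_t);

        /* An offset inside the header, or past the end, is never valid. */
        if (off < sizeof(*rsp) || off > buf_sz)
            return false;
        return len <= buf_sz - off;
    }
    ```
    
    A zero offset, as left behind by a response buffer that was never
    filled in, fails the first check, which is the case that would call
    for a full reset.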
    
    Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
    Signed-off-by: Nick Child <nnac123@linux.ibm.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20230809221038.51296-1-nnac123@linux.ibm.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 27e8db8380eb82220725ab9725eccb835ab1655f
Author: Moshe Shemesh <moshe@nvidia.com>
Date:   Wed Jul 19 11:33:44 2023 +0300

    net/mlx5: Skip clock update work when device is in error state
    
    commit d006207625657322ba8251b6e7e829f9659755dc upstream.
    
    When the device is in an error state, marked by the flag
    MLX5_DEVICE_STATE_INTERNAL_ERROR, the HW and PCI may not be accessible,
    so the clock update work should be skipped. Furthermore, accessing the
    device through PCI in the error state, after mlx5_pci_disable_device()
    has been called, can result in failing to recover from PCI errors.
    
    Fixes: ef9814deafd0 ("net/mlx5e: Add HW timestamping (TS) support")
    Reported-and-tested-by: Ganesh G R <ganeshgr@linux.ibm.com>
    Closes: https://lore.kernel.org/netdev/9bdb9b9d-140a-7a28-f0de-2e64e873c068@nvidia.com
    Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Aya Levin <ayal@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f638fc2f737766936196f535b20659c47200838f
Author: Daniel Jurgens <danielj@nvidia.com>
Date:   Tue Jul 11 00:28:10 2023 +0300

    net/mlx5: Allow 0 for total host VFs
    
    commit 2dc2b3922d3c0f52d3a792d15dcacfbc4cc76b8f upstream.
    
    When querying eswitch functions, 0 is a valid number of host VFs. After
    ARM SRIOV was introduced, falling through to getting the max value from
    PCI results in using the total VFs allowed on the ARM for the host.
    
    Fixes: 86eec50beaf3 ("net/mlx5: Support querying max VFs from device")
    Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
    Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 086a80eb62131940125eddd43b2fb1ed7d9ab806
Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Wed Jul 12 18:26:45 2023 +0530

    dmaengine: mcf-edma: Fix a potential un-allocated memory access
    
    commit 0a46781c89dece85386885a407244ca26e5c1c44 upstream.
    
    When 'mcf_edma' is allocated, some space is allocated for a flexible
    array at the end of the struct. 'chans' items are allocated, that is to
    say 'pdata->dma_channels' items.
    
    Then, this number of items is stored in 'mcf_edma->n_chans'.
    
    A few lines later, if 'mcf_edma->n_chans' is 0, then a default value of 64
    is set.
    
    This leads to no space being allocated by devm_kzalloc() because chans
    was 0, while 64 items are read and/or written in unallocated memory.
    
    Change the logic to define a default value before allocating the memory.
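    
    The corrected ordering can be sketched in user-space C as follows.
    The type and function names are hypothetical, and calloc() stands in
    for devm_kzalloc(); only the default of 64 comes from the description
    above.
    
    ```c
    #include <stdlib.h>

    #define DEFAULT_CHANS_SKETCH 64u  /* default taken from the text above */

    struct chan_sketch {
        int id;
    };

    struct edma_sketch {
        unsigned int n_chans;
        struct chan_sketch chans[];   /* flexible array member */
    };

    /* Decide the channel count first, then size the allocation from it. */
    static struct edma_sketch *edma_sketch_alloc(unsigned int requested)
    {
        unsigned int chans = requested ? requested : DEFAULT_CHANS_SKETCH;
        struct edma_sketch *e;

        e = calloc(1, sizeof(*e) + chans * sizeof(e->chans[0]));
        if (!e)
            return NULL;
        e->n_chans = chans;
        return e;
    }
    ```
    
    Because the default is applied before the size is computed, the stored
    count and the allocated array length can no longer disagree.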
    
    Fixes: e7a3ff92eaf1 ("dmaengine: fsl-edma: add ColdFire mcf5441x edma support")
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Link: https://lore.kernel.org/r/f55d914407c900828f6fad3ea5fa791a5f17b9a4.1685172449.git.christophe.jaillet@wanadoo.fr
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7e1dc94b2d5089916840bf5b8a61063106713eda
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 10:52:33 2023 +0300

    nexthop: Fix infinite nexthop bucket dump when using maximum nexthop ID
    
    commit 8743aeff5bc4dcb5b87b43765f48d5ac3ad7dd9f upstream.
    
    A netlink dump callback can return a positive number to signal that more
    information needs to be dumped or zero to signal that the dump is
    complete. In the second case, the core netlink code will append the
    NLMSG_DONE message to the skb in order to indicate to user space that
    the dump is complete.
    
    The nexthop bucket dump callback always returns a positive number if
    nexthop buckets were filled in the provided skb, even if the dump is
    complete. This means that a dump will span at least two recvmsg() calls
    as long as nexthop buckets are present. In the last recvmsg() call the
    dump callback will not fill in any nexthop buckets because the previous
    call indicated that the dump should restart from the last dumped nexthop
    ID plus one.
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id 10 group 1 type resilient buckets 2
     # strace -e sendto,recvmsg -s 5 ip nexthop bucket
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396980, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 128
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
     id 10 index 0 idle_time 6.66 nhid 1
     id 10 index 1 idle_time 6.66 nhid 1
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396980, nlmsg_pid=347}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
     +++ exited with 0 +++
    
    This behavior is both inefficient and buggy. If the last nexthop to be
    dumped had the maximum ID of 0xffffffff, then the dump will restart from
    0 (0xffffffff + 1) and never end:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
     # ip nexthop bucket
     id 4294967295 index 0 idle_time 5.55 nhid 1
     id 4294967295 index 1 idle_time 5.55 nhid 1
     id 4294967295 index 0 idle_time 5.55 nhid 1
     id 4294967295 index 1 idle_time 5.55 nhid 1
     [...]
    
    Fix by adjusting the dump callback to return zero when the dump is
    complete. After the fix only one recvmsg() call is made and the
    NLMSG_DONE message is appended to the RTM_NEWNEXTHOPBUCKET responses:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id $((2**32-1)) group 1 type resilient buckets 2
     # strace -e sendto,recvmsg -s 5 ip nexthop bucket
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOPBUCKET, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691396737, nlmsg_pid=0}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 148
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=64, nlmsg_type=RTM_NEWNEXTHOPBUCKET, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, {family=AF_UNSPEC, data="\x00\x00\x00\x00\x00"...}], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691396737, nlmsg_pid=350}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 148
     id 4294967295 index 0 idle_time 6.61 nhid 1
     id 4294967295 index 1 idle_time 6.61 nhid 1
     +++ exited with 0 +++
    
    Note that if the NLMSG_DONE message cannot be appended because of size
    limitations, then another recvmsg() will be needed, but the core netlink
    code will not invoke the dump callback and simply reply with a
    NLMSG_DONE message since it knows that the callback previously returned
    zero.
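    
    The return convention can be illustrated with a small user-space
    model. Everything below is a hypothetical sketch, not the kernel
    callback: IDs are held in a sorted array, 'room' models the space left
    in the skb, and the function returns 1 where the real callback would
    return a positive length.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical dump state: the ID to resume from on the next call. */
    struct dump_ctx_sketch {
        uint32_t start_id;
    };

    /*
     * Copy IDs >= ctx->start_id from the sorted array 'ids' into 'out',
     * at most 'room' of them. Returns 1 if IDs remain to be dumped and
     * 0 once the dump is complete. *written receives the count copied.
     */
    static int dump_ids_sketch(struct dump_ctx_sketch *ctx,
                               const uint32_t *ids, size_t n,
                               uint32_t *out, size_t room, size_t *written)
    {
        size_t i, w = 0;

        for (i = 0; i < n; i++) {
            if (ids[i] < ctx->start_id)
                continue;
            if (w == room) {
                /* Out of room: resume from this ID next time. */
                ctx->start_id = ids[i];
                *written = w;
                return 1;
            }
            out[w++] = ids[i];
        }
        /* Complete: no "last ID plus one" is computed, so nothing wraps. */
        *written = w;
        return 0;
    }
    ```
    
    In this model a dump that ends on ID 0xffffffff returns 0 directly, so
    the caller never restarts from a wrapped-around ID of 0.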
    
    Add a test that fails before the fix:
    
     # ./fib_nexthops.sh -t basic_res
     [...]
     TEST: Maximum nexthop ID dump                                       [FAIL]
     [...]
    
    And passes after it:
    
     # ./fib_nexthops.sh -t basic_res
     [...]
     TEST: Maximum nexthop ID dump                                       [ OK ]
     [...]
    
    Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230808075233.3337922-4-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 608a4327c257c06f0d9d152ae2168024934700c5
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 10:52:32 2023 +0300

    nexthop: Make nexthop bucket dump more efficient
    
    commit f10d3d9df49d9e6ee244fda6ca264f901a9c5d85 upstream.
    
    rtm_dump_nexthop_bucket_nh() is used to dump nexthop buckets belonging
    to a specific resilient nexthop group. The function returns a positive
    return code (the skb length) upon both success and failure.
    
    The above behavior is problematic. When a complete nexthop bucket dump
    is requested, the function that walks the different nexthops treats the
    non-zero return code as an error. This causes buckets belonging to
    different resilient nexthop groups to be dumped using different buffers
    even if they can all fit in the same buffer:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id 10 group 1 type resilient buckets 1
     # ip nexthop add id 20 group 1 type resilient buckets 1
     # strace -e recvmsg -s 0 ip nexthop bucket
     [...]
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
     id 10 index 0 idle_time 10.27 nhid 1
     [...]
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 64
     id 20 index 0 idle_time 6.44 nhid 1
     [...]
    
    Fix by only returning a non-zero return code when an error occurred and
    restarting the dump from the bucket index we failed to fill in. This
    allows buckets belonging to different resilient nexthop groups to be
    dumped using the same buffer:
    
     # ip link add name dummy1 up type dummy
     # ip nexthop add id 1 dev dummy1
     # ip nexthop add id 10 group 1 type resilient buckets 1
     # ip nexthop add id 20 group 1 type resilient buckets 1
     # strace -e recvmsg -s 0 ip nexthop bucket
     [...]
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[...], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 128
     id 10 index 0 idle_time 30.21 nhid 1
     id 20 index 0 idle_time 26.7 nhid 1
     [...]
    
    While this change is more of a performance improvement change than an
    actual bug fix, it is a prerequisite for a subsequent patch that does
    fix a bug.
    
    Fixes: 8a1bbabb034d ("nexthop: Add netlink handlers for bucket dump")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230808075233.3337922-3-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4457300cfd843176ce7318e3693235bdba4503f7
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 10:52:31 2023 +0300

    nexthop: Fix infinite nexthop dump when using maximum nexthop ID
    
    commit 913f60cacda73ccac8eead94983e5884c03e04cd upstream.
    
    A netlink dump callback can return a positive number to signal that more
    information needs to be dumped or zero to signal that the dump is
    complete. In the second case, the core netlink code will append the
    NLMSG_DONE message to the skb in order to indicate to user space that
    the dump is complete.
    
    The nexthop dump callback always returns a positive number if nexthops
    were filled in the provided skb, even if the dump is complete. This
    means that a dump will span at least two recvmsg() calls as long as
    nexthops are present. In the last recvmsg() call the dump callback will
    not fill in any nexthops because the previous call indicated that the
    dump should restart from the last dumped nexthop ID plus one.
    
     # ip nexthop add id 1 blackhole
     # strace -e sendto,recvmsg -s 5 ip nexthop
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394315, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 36
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 1], {nla_len=4, nla_type=NHA_BLACKHOLE}]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
     id 1 blackhole
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 20
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394315, nlmsg_pid=343}, 0], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 20
     +++ exited with 0 +++
    
    This behavior is both inefficient and buggy. If the last nexthop to be
    dumped had the maximum ID of 0xffffffff, then the dump will restart from
    0 (0xffffffff + 1) and never end:
    
     # ip nexthop add id $((2**32-1)) blackhole
     # ip nexthop
     id 4294967295 blackhole
     id 4294967295 blackhole
     [...]
    
    Fix by adjusting the dump callback to return zero when the dump is
    complete. After the fix only one recvmsg() call is made and the
    NLMSG_DONE message is appended to the RTM_NEWNEXTHOP response:
    
     # ip nexthop add id $((2**32-1)) blackhole
     # strace -e sendto,recvmsg -s 5 ip nexthop
     sendto(3, [[{nlmsg_len=24, nlmsg_type=RTM_GETNEXTHOP, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1691394080, nlmsg_pid=0}, {nh_family=AF_UNSPEC, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}], {nlmsg_len=0, nlmsg_type=0 /* NLMSG_??? */, nlmsg_flags=0, nlmsg_seq=0, nlmsg_pid=0}], 152, 0, NULL, 0) = 152
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=NULL, iov_len=0}], msg_iovlen=1, msg_controllen=0, msg_flags=MSG_TRUNC}, MSG_PEEK|MSG_TRUNC) = 56
     recvmsg(3, {msg_name={sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, msg_namelen=12, msg_iov=[{iov_base=[[{nlmsg_len=36, nlmsg_type=RTM_NEWNEXTHOP, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, {nh_family=AF_INET, nh_scope=RT_SCOPE_UNIVERSE, nh_protocol=RTPROT_UNSPEC, nh_flags=0}, [[{nla_len=8, nla_type=NHA_ID}, 4294967295], {nla_len=4, nla_type=NHA_BLACKHOLE}]], [{nlmsg_len=20, nlmsg_type=NLMSG_DONE, nlmsg_flags=NLM_F_MULTI, nlmsg_seq=1691394080, nlmsg_pid=342}, 0]], iov_len=32768}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 56
     id 4294967295 blackhole
     +++ exited with 0 +++
    
    Note that if the NLMSG_DONE message cannot be appended because of size
    limitations, then another recvmsg() will be needed, but the core netlink
    code will not invoke the dump callback and simply reply with a
    NLMSG_DONE message since it knows that the callback previously returned
    zero.
    
    Add a test that fails before the fix:
    
     # ./fib_nexthops.sh -t basic
     [...]
     TEST: Maximum nexthop ID dump                                       [FAIL]
     [...]
    
    And passes after it:
    
     # ./fib_nexthops.sh -t basic
     [...]
     TEST: Maximum nexthop ID dump                                       [ OK ]
     [...]
    
    Fixes: ab84be7e54fc ("net: Initial nexthop code")
    Reported-by: Petr Machata <petrm@nvidia.com>
    Closes: https://lore.kernel.org/netdev/87sf91enuf.fsf@nvidia.com/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://lore.kernel.org/r/20230808075233.3337922-2-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 91307347d632f179ad3ecc577d99ca2141788067
Author: Jie Wang <wangjie125@huawei.com>
Date:   Mon Aug 7 19:34:51 2023 +0800

    net: hns3: add wait until mac link down
    
    commit 6265e242f7b95f2c1195b42ec912b84ad161470e upstream.
    
    In some configuration flows of the hns3 driver, for example changing the
    MTU, the driver disables the MAC through firmware before configuration.
    But the firmware disables the MAC asynchronously, so rx traffic may not
    have stopped by then.
    
    Fix it by waiting until the MAC link is down.
    
    Fixes: a9775bb64aa7 ("net: hns3: fix set and get link ksettings issue")
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Link: https://lore.kernel.org/r/20230807113452.474224-4-shaojijie@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 094310eb2b93c4db52c59921aae5e0c9b63e75c9
Author: Jie Wang <wangjie125@huawei.com>
Date:   Mon Aug 7 19:34:50 2023 +0800

    net: hns3: refactor hclge_mac_link_status_wait for interface reuse
    
    commit 08469dacfad25428b66549716811807203744f4f upstream.
    
    Some NIC configurations can only be performed after the link is down, so
    this patch refactors this API for reuse.
    
    Signed-off-by: Jie Wang <wangjie125@huawei.com>
    Signed-off-by: Jijie Shao <shaojijie@huawei.com>
    Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
    Link: https://lore.kernel.org/r/20230807113452.474224-3-shaojijie@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 1ae9703c2e3207168e145f0f5519d5a572f7eb54
Author: Li Yang <leoyang.li@nxp.com>
Date:   Wed Aug 2 14:13:47 2023 -0500

    net: phy: at803x: remove set/get wol callbacks for AR8032
    
    commit d7791cec2304aea22eb2ada944e4d467302f5bfe upstream.
    
    Since the AR8032 part does not support wol, remove related callbacks
    from it.
    
    Fixes: 5800091a2061 ("net: phy: at803x: add support for AR8032 PHY")
    Signed-off-by: Li Yang <leoyang.li@nxp.com>
    Cc: David Bauer <mail@david-bauer.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7d496cd83a9db017766a88707b591e0ca9e923c9
Author: Michael Guralnik <michaelgur@nvidia.com>
Date:   Wed Jul 19 12:02:41 2023 +0300

    RDMA/umem: Set iova in ODP flow
    
    commit 186b169cf1e4be85aa212a893ea783a543400979 upstream.
    
    Fix the ODP registration flow to set the iova correctly.
    The calculation in ib_umem_num_dma_blocks() function assumes the iova of
    the umem is set correctly.
    
    When iova is not set, the calculation in ib_umem_num_dma_blocks() is
    equivalent to length/page_size, which is true only when memory is aligned.
    For unaligned memory, iova must be set for the ALIGN() in the
    ib_umem_num_dma_blocks() to take effect and return a correct value.
    
    mlx5_ib uses ib_umem_num_dma_blocks() to decide the mkey size to use for
    the MR. Without this fix, when registering unaligned ODP MR, a wrong
    size mkey might be chosen and this might cause the UMR to fail.
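    
    A worked example of the arithmetic, assuming the block count is
    computed as (ALIGN(iova + length, pgsz) - ALIGN_DOWN(iova, pgsz)) / pgsz.
    The macro and function names below are hypothetical stand-ins for the
    kernel helpers, and pgsz is assumed to be a power of two.
    
    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Round up or down to a power-of-two page size. */
    #define ALIGN_UP_SKETCH(x, a)   (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))
    #define ALIGN_DOWN_SKETCH(x, a) ((x) & ~((uint64_t)(a) - 1))

    /* Number of pgsz-sized blocks needed to cover [iova, iova + length). */
    static size_t num_dma_blocks_sketch(uint64_t iova, uint64_t length,
                                        uint64_t pgsz)
    {
        return (size_t)((ALIGN_UP_SKETCH(iova + length, pgsz) -
                         ALIGN_DOWN_SKETCH(iova, pgsz)) / pgsz);
    }
    ```
    
    With 4096-byte pages and a length of 8192, an iova of 0 gives 2 blocks,
    the same as length/page_size. An iova of 0x800 for the same length
    gives 3 blocks, which is the difference an unset iova would hide.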
    
    UMR would fail over insufficient size to update the mkey translation:
    
     infiniband mlx5_0: dump_cqe:273:(pid 0): dump error cqe
     00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     00000030: 00 00 00 00 0f 00 78 06 25 00 00 58 00 da ac d2
     infiniband mlx5_0: mlx5_ib_post_send_wait:806:(pid 20311): reg umr failed (6)
     infiniband mlx5_0: pagefault_real_mr:661:(pid 20311): Failed to update mkey page tables
    
    Fixes: f0093fb1a7cb ("RDMA/mlx5: Move mlx5_ib_cont_pages() to the creation of the mlx5_ib_mr")
    Fixes: a665aca89a41 ("RDMA/umem: Split ib_umem_num_pages() into ib_umem_num_dma_blocks()")
    Signed-off-by: Artemy Kovalyov <artemyko@nvidia.com>
    Signed-off-by: Michael Guralnik <michaelgur@nvidia.com>
    Link: https://lore.kernel.org/r/3d4be7ca2155bf239dd8c00a2d25974a92c26ab8.1689757344.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f78a4238a87332b8e8359854610559dc54add951
Author: Felix Fietkau <nbd@nbd.name>
Date:   Thu Jun 22 18:59:19 2023 +0200

    wifi: cfg80211: fix sband iftype data lookup for AP_VLAN
    
    commit 5fb9a9fb71a33be61d7d8e8ba4597bfb18d604d0 upstream.
    
    AP_VLAN interfaces are virtual, so they don't really exist as a type for
    capabilities. When AP_VLAN is passed in as a type, AP is the one that's
    really intended.
    
    Fixes: c4cbaf7973a7 ("cfg80211: Add support for HE")
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Link: https://lore.kernel.org/r/20230622165919.46841-1-nbd@nbd.name
    Signed-off-by: Johannes Berg <johannes.berg@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 26a27dd76054d57ccb68f506673f3258e249b02d
Author: Daniel Stone <daniels@collabora.com>
Date:   Tue Aug 8 11:44:05 2023 +0100

    drm/rockchip: Don't spam logs in atomic check
    
    commit 43dae319b50fac075ad864f84501c703ef20eb2b upstream.
    
    Userspace should not be able to trigger DRM_ERROR messages to spam the
    logs; especially not through atomic commit parameters which are
    completely legitimate for userspace to attempt.
    
    Signed-off-by: Daniel Stone <daniels@collabora.com>
    Fixes: 7707f7227f09 ("drm/rockchip: Add support for afbc")
    Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230808104405.522493-1-daniels@collabora.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 918c1e6843b7e81d0e5cf7994f41f28dc34c98b0
Author: Douglas Miller <doug.miller@cornelisnetworks.com>
Date:   Wed Aug 2 13:32:41 2023 -0400

    IB/hfi1: Fix possible panic during hotplug remove
    
    commit 4fdfaef71fced490835145631a795497646f4555 upstream.
    
    During hotplug remove it is possible that the update counters work
    might be pending, and may run after memory has been freed.
    Cancel the update counters work before freeing memory.
    
    Fixes: 7724105686e7 ("IB/hfi1: add driver files")
    Signed-off-by: Douglas Miller <doug.miller@cornelisnetworks.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
    Link: https://lore.kernel.org/r/169099756100.3927190.15284930454106475280.stgit@awfm-02.cornelisnetworks.com
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit df21468bfdc885f2495c01f5bb3e3431f0c07306
Author: Piotr Gardocki <piotrx.gardocki@intel.com>
Date:   Mon Aug 7 13:50:11 2023 -0700

    iavf: fix potential races for FDIR filters
    
    commit 0fb1d8eb234b6979d4981d2d385780dd7d8d9771 upstream.
    
    Add fdir_fltr_lock locking in unprotected places.
    
    The change in iavf_fdir_is_dup_fltr adds a spinlock around a loop which
    iterates over all filters and looks for a duplicate. The filter can be
    removed from list and freed from memory at the same time it's being
    compared. All other places where filters are deleted are already
    protected with spinlock.
    
    The remaining changes protect adapter->fdir_active_fltr variable so now
    all its uses are under a spinlock.
    
    Fixes: 527691bf0682 ("iavf: Support IPv4 Flow Director filters")
    Signed-off-by: Piotr Gardocki <piotrx.gardocki@intel.com>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://lore.kernel.org/r/20230807205011.3129224-1-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b1f985cf1c52db6c4e0606fca9a1d3fdbaf348f1
Author: Andrew Kanner <andrew.kanner@gmail.com>
Date:   Thu Aug 3 20:59:48 2023 +0200

    drivers: net: prevent tun_build_skb() to exceed the packet size limit
    
    commit 59eeb232940515590de513b997539ef495faca9a upstream.
    
    Using the syzkaller repro with reduced packet size it was discovered
    that XDP_PACKET_HEADROOM is not checked in tun_can_build_skb(),
    although pad may be incremented in tun_build_skb(). This may end up
    exceeding the PAGE_SIZE limit in tun_build_skb().
    
    Jason Wang <jasowang@redhat.com> proposed to count XDP_PACKET_HEADROOM
    always (e.g. without rcu_access_pointer(tun->xdp_prog)) in
    tun_can_build_skb() since there's a window during which XDP program
    might be attached between tun_can_build_skb() and tun_build_skb().
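    
    The idea of always counting the headroom can be sketched as below. The
    constants are illustrative values only (the real ones depend on the
    architecture and configuration), the names are hypothetical, and the
    alignment the kernel applies to each term is left out for brevity.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>

    /* Illustrative sizes only, not the kernel's definitions. */
    #define PAGE_SIZE_SKETCH     4096u
    #define RX_PAD_SKETCH          64u
    #define XDP_HEADROOM_SKETCH   256u
    #define SHINFO_SIZE_SKETCH    320u

    /* True if a packet of 'len' bytes fits in one page with all padding. */
    static bool can_build_skb_sketch(size_t len)
    {
        /*
         * The XDP headroom is reserved unconditionally, so the answer
         * cannot change if a program is attached after this check.
         */
        return len + RX_PAD_SKETCH + XDP_HEADROOM_SKETCH +
               SHINFO_SIZE_SKETCH <= PAGE_SIZE_SKETCH;
    }
    ```
    
    With these example values the largest accepted length is 3456 bytes;
    anything larger would have to take the slower path.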
    
    Fixes: 7df13219d757 ("tun: reserve extra headroom only when XDP is set")
    Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
    Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
    Link: https://lore.kernel.org/r/20230803185947.2379988-1-andrew.kanner@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f239c9e1d98b313435481b4926e8bdd06197e4d8
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Aug 3 16:30:21 2023 +0000

    dccp: fix data-race around dp->dccps_mss_cache
    
    commit a47e598fbd8617967e49d85c49c22f9fc642704c upstream.
    
    dccp_sendmsg() reads dp->dccps_mss_cache before locking the socket.
    Same thing in do_dccp_getsockopt().
    
    Add READ_ONCE()/WRITE_ONCE() annotations,
    and change dccp_sendmsg() to check again dccps_mss_cache
    after socket is locked.
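    
    A user-space analogue of the pattern, using C11 atomics. Relaxed
    atomic loads stand in for READ_ONCE(), an atomic_flag spin lock stands
    in for the socket lock, and all names are hypothetical; this is a
    sketch of the idea rather than the DCCP code.
    
    ```c
    #include <stdatomic.h>
    #include <stddef.h>

    struct sock_sketch {
        atomic_flag lock;          /* stand-in for the socket lock */
        _Atomic unsigned int mss;  /* stand-in for dccps_mss_cache */
    };

    static void lock_sketch(struct sock_sketch *sk)
    {
        while (atomic_flag_test_and_set_explicit(&sk->lock,
                                                 memory_order_acquire))
            ;  /* spin until the lock is free */
    }

    static void unlock_sketch(struct sock_sketch *sk)
    {
        atomic_flag_clear_explicit(&sk->lock, memory_order_release);
    }

    /* Returns 0 if 'len' may be sent, -1 if it exceeds the current MSS. */
    static int sendmsg_sketch(struct sock_sketch *sk, size_t len)
    {
        int ret = 0;

        /* Lockless early check: a single, untorn read of the value. */
        if (len > atomic_load_explicit(&sk->mss, memory_order_relaxed))
            return -1;

        lock_sketch(sk);
        /* The value may have changed before the lock was taken: recheck. */
        if (len > atomic_load_explicit(&sk->mss, memory_order_relaxed))
            ret = -1;
        unlock_sketch(sk);
        return ret;
    }
    ```
    
    The early check only avoids needless work; the check made under the
    lock is the one the rest of the send path can rely on.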
    
    Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20230803163021.2958262-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 49a1fee22fae61cae56b22332c1e34d08dba6f55
Author: Ziyang Xuan <william.xuanziyang@huawei.com>
Date:   Wed Aug 2 19:43:20 2023 +0800

    bonding: Fix incorrect deletion of ETH_P_8021AD protocol vid from slaves
    
    commit 01f4fd27087078c90a0e22860d1dfa2cd0510791 upstream.
    
    BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with
    following testcase:
    
      # ip netns add ns1
      # ip netns exec ns1 ip link add bond0 type bond mode 0
      # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2
      # ip netns exec ns1 ip link set bond_slave_1 master bond0
      # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad
      # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad
      # ip netns exec ns1 ip link set bond_slave_1 nomaster
      # ip netns del ns1
    
    The logical analysis of the problem is as follows:
    
    1. create ETH_P_8021AD protocol vlan10 for bond_slave_1:
    register_vlan_dev()
      vlan_vid_add()
        vlan_info_alloc()
        __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1
    
    2. create ETH_P_8021AD protocol bond0_vlan10 for bond0:
    register_vlan_dev()
      vlan_vid_add()
        __vlan_vid_add()
          vlan_add_rx_filter_info()
              if (!vlan_hw_filter_capable(dev, proto)) // condition is true because bond0 lacks NETIF_F_HW_VLAN_STAG_FILTER
                  return 0;
    
              if (netif_device_present(dev))
                  return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will never be called
                  // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid.
    
    3. detach bond_slave_1 from bond0:
    __bond_release_one()
      vlan_vids_del_by_dev()
        list_for_each_entry(vid_info, &vlan_info->vid_list, list)
            vlan_vid_del(dev, vid_info->proto, vid_info->vid);
            // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted.
            // bond_slave_1->vlan_info will be assigned NULL.
    
    4. delete vlan10 while deleting ns1:
    default_device_exit_batch()
      dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10
        vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1
            BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!!
    
    Add support for S-VLAN tag related features to the bond driver, so that
    the bond driver will always propagate the VLAN info to its slaves.
    
    Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support")
    Suggested-by: Ido Schimmel <idosch@idosch.org>
    Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://lore.kernel.org/r/20230802114320.4156068-1-william.xuanziyang@huawei.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 789fcd94c9cac133dd4d96e193188661aca9f6c3
Author: Magnus Karlsson <magnus.karlsson@intel.com>
Date:   Wed Aug 9 16:28:43 2023 +0200

    xsk: fix refcount underflow in error path
    
    commit 85c2c79a07302fe68a1ad5cc449458cc559e314d upstream.
    
    Fix a refcount underflow problem reported by syzbot that can happen
    when a system is running out of memory. If xp_alloc_tx_descs() fails,
    and it can only fail due to not having enough memory, then the error
    path is triggered. In this error path, the refcount of the pool is
    decremented as it was incremented before. However, the reference to
    the pool in the socket was not nulled. This means that when the socket
    is closed later, the socket teardown logic will think that there is a
    pool attached to the socket and try to decrease the refcount again,
    leading to a refcount underflow.
    
    I chose this fix as it involved adding just a single line. Another
    option would have been to move xp_get_pool() and the assignment of
    xs->pool to after the if-statement and using xs_umem->pool instead of
    xs->pool in the whole if-statement resulting in somewhat simpler code,
    but this would have led to much more churn in the code base perhaps
    making it harder to backport.
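    
    As an illustration only, the error path described above can be modelled
    in userspace C. Every name here (toy_pool, toy_sock, toy_bind,
    toy_release) is hypothetical and is not the kernel code; the sketch
    only shows why clearing the stale pointer keeps teardown from dropping
    the reference a second time.
    
    ```c
    #include <stddef.h>
    
    /* Hypothetical stand-ins for the buffer pool and the socket. */
    struct toy_pool { int refcnt; };
    struct toy_sock { struct toy_pool *pool; };
    
    static void toy_pool_get(struct toy_pool *p) { p->refcnt++; }
    static void toy_pool_put(struct toy_pool *p) { p->refcnt--; }
    
    /* Bind path: take a reference, then optionally fail as if out of memory. */
    static int toy_bind(struct toy_sock *xs, struct toy_pool *p, int alloc_fails)
    {
        toy_pool_get(p);
        xs->pool = p;
        if (alloc_fails) {
            toy_pool_put(p);
            xs->pool = NULL; /* the single added line: forget the pool */
            return -1;
        }
        return 0;
    }
    
    /* Teardown drops a reference only if a pool is still attached. */
    static void toy_release(struct toy_sock *xs)
    {
        if (xs->pool) {
            toy_pool_put(xs->pool);
            xs->pool = NULL;
        }
    }
    ```
    
    Without the NULL assignment in the error path, toy_release() would
    call toy_pool_put() again and the count would drop below its starting
    value.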
    
    Fixes: ba3beec2ec1d ("xsk: Fix possible crash when multiple sockets are created")
    Reported-by: syzbot+8ada0057e69293a05fd4@syzkaller.appspotmail.com
    Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
    Link: https://lore.kernel.org/r/20230809142843.13944-1-magnus.karlsson@gmail.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e95808121953410db8c59f0abfde70ac0d34222c
Author: Florian Westphal <fw@strlen.de>
Date:   Thu Aug 3 17:26:49 2023 +0200

    tunnels: fix kasan splat when generating ipv4 pmtu error
    
    commit 6a7ac3d20593865209dceb554d8b3f094c6bd940 upstream.
    
    If we try to emit an icmp error in response to a nonlinear skb, we get
    
    BUG: KASAN: slab-out-of-bounds in ip_compute_csum+0x134/0x220
    Read of size 4 at addr ffff88811c50db00 by task iperf3/1691
    CPU: 2 PID: 1691 Comm: iperf3 Not tainted 6.5.0-rc3+ #309
    [..]
     kasan_report+0x105/0x140
     ip_compute_csum+0x134/0x220
     iptunnel_pmtud_build_icmp+0x554/0x1020
     skb_tunnel_check_pmtu+0x513/0xb80
     vxlan_xmit_one+0x139e/0x2ef0
     vxlan_xmit+0x1867/0x2760
     dev_hard_start_xmit+0x1ee/0x4f0
     br_dev_queue_push_xmit+0x4d1/0x660
     [..]
    
    ip_compute_csum() cannot deal with nonlinear skbs, so avoid it.
    After this change, splat is gone and iperf3 is no longer stuck.
    
    Fixes: 4cb47a8644cc ("tunnels: PMTU discovery support for directly bridged IP packets")
    Signed-off-by: Florian Westphal <fw@strlen.de>
    Link: https://lore.kernel.org/r/20230803152653.29535-2-fw@strlen.de
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7903311b2ceca141d74be04d3e5fb56c1af5c318
Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Aug 3 14:56:00 2023 +0000

    net/packet: annotate data-races around tp->status
    
    commit 8a9896177784063d01068293caea3f74f6830ff6 upstream.
    
    Another syzbot report [1] is about tp->status lockless reads
    from __packet_get_status()
    
    [1]
    BUG: KCSAN: data-race in __packet_rcv_has_room / __packet_set_status
    
    write to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 0:
    __packet_set_status+0x78/0xa0 net/packet/af_packet.c:407
    tpacket_rcv+0x18bb/0x1a60 net/packet/af_packet.c:2483
    deliver_skb net/core/dev.c:2173 [inline]
    __netif_receive_skb_core+0x408/0x1e80 net/core/dev.c:5337
    __netif_receive_skb_one_core net/core/dev.c:5491 [inline]
    __netif_receive_skb+0x57/0x1b0 net/core/dev.c:5607
    process_backlog+0x21f/0x380 net/core/dev.c:5935
    __napi_poll+0x60/0x3b0 net/core/dev.c:6498
    napi_poll net/core/dev.c:6565 [inline]
    net_rx_action+0x32b/0x750 net/core/dev.c:6698
    __do_softirq+0xc1/0x265 kernel/softirq.c:571
    invoke_softirq kernel/softirq.c:445 [inline]
    __irq_exit_rcu+0x57/0xa0 kernel/softirq.c:650
    sysvec_apic_timer_interrupt+0x6d/0x80 arch/x86/kernel/apic/apic.c:1106
    asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:645
    smpboot_thread_fn+0x33c/0x4a0 kernel/smpboot.c:112
    kthread+0x1d7/0x210 kernel/kthread.c:379
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
    
    read to 0xffff888117d7c080 of 8 bytes by interrupt on cpu 1:
    __packet_get_status net/packet/af_packet.c:436 [inline]
    packet_lookup_frame net/packet/af_packet.c:524 [inline]
    __tpacket_has_room net/packet/af_packet.c:1255 [inline]
    __packet_rcv_has_room+0x3f9/0x450 net/packet/af_packet.c:1298
    tpacket_rcv+0x275/0x1a60 net/packet/af_packet.c:2285
    deliver_skb net/core/dev.c:2173 [inline]
    dev_queue_xmit_nit+0x38a/0x5e0 net/core/dev.c:2243
    xmit_one net/core/dev.c:3574 [inline]
    dev_hard_start_xmit+0xcf/0x3f0 net/core/dev.c:3594
    __dev_queue_xmit+0xefb/0x1d10 net/core/dev.c:4244
    dev_queue_xmit include/linux/netdevice.h:3088 [inline]
    can_send+0x4eb/0x5d0 net/can/af_can.c:276
    bcm_can_tx+0x314/0x410 net/can/bcm.c:302
    bcm_tx_timeout_handler+0xdb/0x260
    __run_hrtimer kernel/time/hrtimer.c:1685 [inline]
    __hrtimer_run_queues+0x217/0x700 kernel/time/hrtimer.c:1749
    hrtimer_run_softirq+0xd6/0x120 kernel/time/hrtimer.c:1766
    __do_softirq+0xc1/0x265 kernel/softirq.c:571
    run_ksoftirqd+0x17/0x20 kernel/softirq.c:939
    smpboot_thread_fn+0x30a/0x4a0 kernel/smpboot.c:164
    kthread+0x1d7/0x210 kernel/kthread.c:379
    ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:308
    
    value changed: 0x0000000000000000 -> 0x0000000020000081
    
    Reported by Kernel Concurrency Sanitizer on:
    CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 6.4.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
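    
    Data-race annotations of this kind are commonly expressed with
    READ_ONCE() and WRITE_ONCE(). The following is a simplified userspace
    approximation, not the kernel macros and not the actual patch; the
    TOY_* macros and the toy_frame type are assumptions made for
    illustration.
    
    ```c
    /* Simplified stand-ins for the kernel macros (assumption, GCC/Clang). */
    #define TOY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
    #define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
    
    /* Hypothetical frame header holding a status word shared between CPUs. */
    struct toy_frame { unsigned long tp_status; };
    
    /* Writer side: store the status with a single marked access. */
    static void toy_set_status(struct toy_frame *f, unsigned long status)
    {
        TOY_WRITE_ONCE(f->tp_status, status);
    }
    
    /* Lockless reader side: load the status with a single marked access. */
    static unsigned long toy_get_status(struct toy_frame *f)
    {
        return TOY_READ_ONCE(f->tp_status);
    }
    ```
    
    The marked accesses tell both the compiler and KCSAN that the
    concurrent access is intentional; they do not add any ordering.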
    
    Fixes: 69e3c75f4d54 ("net: TX_RING and packet mmap")
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Willem de Bruijn <willemb@google.com>
    Link: https://lore.kernel.org/r/20230803145600.2937518-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f4614e379bf98727880b21bee74a79bb61d3f456
Author: Nathan Chancellor <nathan@kernel.org>
Date:   Wed Aug 2 10:40:29 2023 -0700

    mISDN: Update parameter type of dsp_cmx_send()
    
    commit 1696ec8654016dad3b1baf6c024303e584400453 upstream.
    
    When booting a kernel with CONFIG_MISDN_DSP=y and CONFIG_CFI_CLANG=y,
    there is a failure when dsp_cmx_send() is called indirectly from
    call_timer_fn():
    
      [    0.371412] CFI failure at call_timer_fn+0x2f/0x150 (target: dsp_cmx_send+0x0/0x530; expected type: 0x92ada1e9)
    
    The function pointer prototype that call_timer_fn() expects is
    
      void (*fn)(struct timer_list *)
    
    whereas dsp_cmx_send() has a parameter type of 'void *', which causes
    the control flow integrity checks to fail because the parameter types do
    not match.
    
    Change dsp_cmx_send()'s parameter type to be 'struct timer_list *' to
    match the expected prototype. The argument is unused anyway, so this
    has no functional change, aside from avoiding the CFI failure.
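    
    A minimal userspace sketch of the prototype match follows. The
    toy_timer_list type and the toy_* functions are hypothetical; the
    point is only that the callback's parameter type is identical to the
    one in the function pointer through which it is called.
    
    ```c
    /* Hypothetical timer object carrying a typed callback pointer. */
    struct toy_timer_list {
        void (*function)(struct toy_timer_list *);
    };
    
    static int toy_calls;
    
    /* Callback declared with exactly the type the caller expects. */
    static void toy_cmx_send(struct toy_timer_list *unused)
    {
        (void)unused; /* the argument is not needed by the handler */
        toy_calls++;
    }
    
    /* Indirect call site, analogous in shape to a timer dispatch routine. */
    static void toy_call_timer_fn(struct toy_timer_list *t)
    {
        t->function(t);
    }
    ```
    
    Had toy_cmx_send() taken 'void *', assigning it to the function
    member would need a cast, and a type-based CFI scheme would reject
    the indirect call.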
    
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Closes: https://lore.kernel.org/oe-lkp/202308020936.58787e6c-oliver.sang@intel.com
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>
    Fixes: e313ac12eb13 ("mISDN: Convert timers to use timer_setup()")
    Link: https://lore.kernel.org/r/20230802-fix-dsp_cmx_send-cfi-failure-v1-1-2f2e79b0178d@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3961761af392c248c05ed1f8833181ac6d1d20fa
Author: Xu Kuohai <xukuohai@huawei.com>
Date:   Fri Aug 4 03:37:38 2023 -0400

    bpf, sockmap: Fix bug that strp_done cannot be called
    
    commit 809e4dc71a0f2b8d2836035d98603694fff11d5d upstream.
    
    strp_done is only called when psock->progs.stream_parser is not NULL,
    but stream_parser was set to NULL by sk_psock_stop_strp(), called
    by sk_psock_drop() earlier. So, strp_done can never be called.
    
    Introduce SK_PSOCK_RX_ENABLED to mark whether there is strp on psock.
    Change the condition for calling strp_done from judging whether
    stream_parser is set to judging whether this flag is set. This flag is
    only set once when strp_init() succeeds, and will never be cleared later.
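    
    The difference between testing the pointer and testing a sticky flag
    can be sketched as follows. This is a userspace model with
    hypothetical names (toy_psock, rx_enabled and the toy_* helpers), not
    the kernel implementation.
    
    ```c
    #include <stdbool.h>
    #include <stddef.h>
    
    struct toy_psock {
        void *stream_parser;  /* cleared when the parser is stopped */
        bool rx_enabled;      /* set once on successful init, never cleared */
        int strp_done_calls;  /* counts how often cleanup actually ran */
    };
    
    /* Successful init attaches the parser and sets the sticky flag. */
    static void toy_strp_init(struct toy_psock *p, void *prog)
    {
        p->stream_parser = prog;
        p->rx_enabled = true;
    }
    
    /* Stopping the parser clears the pointer but leaves the flag alone. */
    static void toy_stop_strp(struct toy_psock *p)
    {
        p->stream_parser = NULL;
    }
    
    /* Cleanup keys off the flag, so it still runs after the stop above. */
    static void toy_destroy(struct toy_psock *p)
    {
        if (p->rx_enabled)
            p->strp_done_calls++;
    }
    ```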
    
    Fixes: c0d95d3380ee ("bpf, sockmap: Re-evaluate proto ops when psock is removed from sockmap")
    Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
    Reviewed-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/r/20230804073740.194770-3-xukuohai@huaweicloud.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 20d53895d5c0789f2409d0fe41c4877f367f7f03
Author: Xu Kuohai <xukuohai@huawei.com>
Date:   Fri Aug 4 03:37:37 2023 -0400

    bpf, sockmap: Fix map type error in sock_map_del_link
    
    commit 7e96ec0e6605b69bb21bbf6c0ff9051e656ec2b1 upstream.
    
    sock_map_del_link() operates on both SOCKMAP and SOCKHASH. Although
    both types have a member named "progs", the offset of the "progs"
    member differs between the two types, so "progs" should be accessed
    through the real map type.
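    
    To illustrate why the real type matters, here is a small userspace
    sketch. The struct layouts and names (toy_stab, toy_htab,
    toy_map_progs) are invented for the example and do not mirror the
    kernel structures; they only share the property that a same-named
    member sits at different offsets.
    
    ```c
    #include <stddef.h>
    
    enum { TOY_SOCKMAP, TOY_SOCKHASH };
    
    struct toy_progs { int verdict; };
    struct toy_map { int map_type; };
    
    /* Two containers embedding the same header, with "progs" at
     * different offsets. */
    struct toy_stab {
        struct toy_map map;
        void *sks;
        struct toy_progs progs;
    };
    
    struct toy_htab {
        struct toy_map map;
        void *buckets;
        unsigned int buckets_num;
        unsigned int elem_size;
        struct toy_progs progs;
    };
    
    /* Pick the container type from map_type before touching "progs". */
    static struct toy_progs *toy_map_progs(struct toy_map *map)
    {
        switch (map->map_type) {
        case TOY_SOCKMAP:
            return &((struct toy_stab *)map)->progs;
        case TOY_SOCKHASH:
            return &((struct toy_htab *)map)->progs;
        default:
            return NULL;
        }
    }
    ```
    
    Casting a hash map to the array type would read "progs" from the
    wrong offset, which is the class of bug described above.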
    
    Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface")
    Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
    Reviewed-by: John Fastabend <john.fastabend@gmail.com>
    Link: https://lore.kernel.org/r/20230804073740.194770-2-xukuohai@huaweicloud.com
    Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a09c258cfa77d3ba0a7acc555c73eb6b005c4bd8
Author: Andrew Kanner <andrew.kanner@gmail.com>
Date:   Thu Aug 3 21:03:18 2023 +0200

    net: core: remove unnecessary frame_sz check in bpf_xdp_adjust_tail()
    
    commit d14eea09edf427fa36bd446f4a3271f99164202f upstream.
    
    Syzkaller reported the following issue:
    =======================================
    Too BIG xdp->frame_sz = 131072
    WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
      ____bpf_xdp_adjust_tail net/core/filter.c:4121 [inline]
    WARNING: CPU: 0 PID: 5020 at net/core/filter.c:4121
      bpf_xdp_adjust_tail+0x466/0xa10 net/core/filter.c:4103
    ...
    Call Trace:
     <TASK>
     bpf_prog_4add87e5301a4105+0x1a/0x1c
     __bpf_prog_run include/linux/filter.h:600 [inline]
     bpf_prog_run_xdp include/linux/filter.h:775 [inline]
     bpf_prog_run_generic_xdp+0x57e/0x11e0 net/core/dev.c:4721
     netif_receive_generic_xdp net/core/dev.c:4807 [inline]
     do_xdp_generic+0x35c/0x770 net/core/dev.c:4866
     tun_get_user+0x2340/0x3ca0 drivers/net/tun.c:1919
     tun_chr_write_iter+0xe8/0x210 drivers/net/tun.c:2043
     call_write_iter include/linux/fs.h:1871 [inline]
     new_sync_write fs/read_write.c:491 [inline]
     vfs_write+0x650/0xe40 fs/read_write.c:584
     ksys_write+0x12f/0x250 fs/read_write.c:637
     do_syscall_x64 arch/x86/entry/common.c:50 [inline]
     do_syscall_64+0x38/0xb0 arch/x86/entry/common.c:80
     entry_SYSCALL_64_after_hwframe+0x63/0xcd
    
    The xdp->frame_sz > PAGE_SIZE check was introduced in commit
    c8741e2bfe87 ("xdp: Allow bpf_xdp_adjust_tail() to grow packet size").
    But Jesper Dangaard Brouer <jbrouer@redhat.com> noted that since the
    introduction of xdp_init_buff(), which all XDP drivers use, it is safe
    to remove this check. The original intent was to catch cases where XDP
    drivers have not been updated to use xdp.frame_sz, but that is no
    longer a concern (since xdp_init_buff).
    
    While running the initial syzkaller repro, it was discovered that
    contiguous physical memory allocation is used for both xdp paths in
    tun_get_user(), e.g. tun_build_skb() and tun_alloc_skb(). It was also
    stated by Jesper Dangaard Brouer <jbrouer@redhat.com> that XDP can
    work on higher order pages, as long as this is contiguous physical
    memory (e.g. a page).
    
    Reported-and-tested-by: syzbot+f817490f5bd20541b90a@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/all/000000000000774b9205f1d8a80d@google.com/T/
    Link: https://syzkaller.appspot.com/bug?extid=f817490f5bd20541b90a
    Link: https://lore.kernel.org/all/20230725155403.796-1-andrew.kanner@gmail.com/T/
    Fixes: 43b5169d8355 ("net, xdp: Introduce xdp_init_buff utility routine")
    Signed-off-by: Andrew Kanner <andrew.kanner@gmail.com>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Link: https://lore.kernel.org/r/20230803190316.2380231-1-andrew.kanner@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 85af0b226c0ba0302e9f727c7dfecb44765a7c5e
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:58 2023 +0300

    selftests: forwarding: tc_flower: Relax success criterion
    
    commit 9ee37e53e7687654b487fc94e82569377272a7a8 upstream.
    
    The test checks that filters that match on source or destination MAC
    were only hit once. A host can send more than one packet with a given
    source or destination MAC, resulting in failures.
    
    Fix by relaxing the success criterion and instead check that the filters
    were not hit zero times. Using tc_check_at_least_x_packets() is also an
    option, but it is not available in older kernels.
    
    Fixes: 07e5c75184a1 ("selftests: forwarding: Introduce tc flower matching tests")
    Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-13-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7b3fa99526f94345c981ae88d891d0e8c7144abb
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:48 2023 +0300

    selftests: forwarding: Switch off timeout
    
    commit 0529883ad102f6c04e19fb7018f31e1bda575bbe upstream.
    
    The default timeout for selftests is 45 seconds, but it is not enough
    for forwarding selftests, which can take minutes to finish depending on
    the number of test cases:
    
     # make -C tools/testing/selftests TARGETS=net/forwarding run_tests
     TAP version 13
     1..102
     # timeout set to 45
     # selftests: net/forwarding: bridge_igmp.sh
     # TEST: IGMPv2 report 239.10.10.10                                    [ OK ]
     # TEST: IGMPv2 leave 239.10.10.10                                     [ OK ]
     # TEST: IGMPv3 report 239.10.10.10 is_include                         [ OK ]
     # TEST: IGMPv3 report 239.10.10.10 include -> allow                   [ OK ]
     #
     not ok 1 selftests: net/forwarding: bridge_igmp.sh # TIMEOUT 45 seconds
    
    Fix by switching off the timeout and setting it to 0. A similar change
    was done for BPF selftests in commit 6fc5916cc256 ("selftests: bpf:
    Switch off timeout").
    
    Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests")
    Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Closes: https://lore.kernel.org/netdev/8d149f8c-818e-d141-a0ce-a6bae606bc22@alu.unizg.hr/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-3-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit e410f85ebca9e86350bb0391128f07ff7db9de8c
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:47 2023 +0300

    selftests: forwarding: Skip test when no interfaces are specified
    
    commit d72c83b1e4b4a36a38269c77a85ff52f95eb0d08 upstream.
    
    As explained in [1], the forwarding selftests are meant to be run with
    either physical loopbacks or veth pairs. The interfaces are expected to
    be specified in a user-provided forwarding.config file or as command
    line arguments. By default, this file is not present and the tests fail:
    
     # make -C tools/testing/selftests TARGETS=net/forwarding run_tests
     [...]
     TAP version 13
     1..102
     # timeout set to 45
     # selftests: net/forwarding: bridge_igmp.sh
     # Command line is not complete. Try option "help"
     # Failed to create netif
     not ok 1 selftests: net/forwarding: bridge_igmp.sh # exit=1
     [...]
    
    Fix by skipping a test if interfaces are not provided either via the
    configuration file or command line arguments.
    
     # make -C tools/testing/selftests TARGETS=net/forwarding run_tests
     [...]
     TAP version 13
     1..102
     # timeout set to 45
     # selftests: net/forwarding: bridge_igmp.sh
     # SKIP: Cannot create interface. Name not specified
     ok 1 selftests: net/forwarding: bridge_igmp.sh # SKIP
    
    [1] tools/testing/selftests/net/forwarding/README
    
    Fixes: 81573b18f26d ("selftests/net/forwarding: add Makefile to install tests")
    Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Closes: https://lore.kernel.org/netdev/856d454e-f83c-20cf-e166-6dc06cbc1543@alu.unizg.hr/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-2-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4a449945262019a607ed58246f41a86de3707367
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:54 2023 +0300

    selftests: forwarding: ethtool_extended_state: Skip when using veth pairs
    
    commit b3d9305e60d121dac20a77b6847c4cf14a4c0001 upstream.
    
    Ethtool extended state cannot be tested with veth pairs, resulting in
    failures:
    
     # ./ethtool_extended_state.sh
     TEST: Autoneg, No partner detected                                  [FAIL]
             Expected "Autoneg", got "Link detected: no"
     [...]
    
    Fix by skipping the test when used with veth pairs.
    
    Fixes: 7d10bcce98cd ("selftests: forwarding: Add tests for ethtool extended state")
    Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-9-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b8d216e9c607d8700489bffdd7f95b42f5a23389
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:53 2023 +0300

    selftests: forwarding: ethtool: Skip when using veth pairs
    
    commit 60a36e21915c31c0375d9427be9406aa8ce2ec34 upstream.
    
    Auto-negotiation cannot be tested with veth pairs, resulting in
    failures:
    
     # ./ethtool.sh
     TEST: force of same speed autoneg off                               [FAIL]
             error in configuration. swp1 speed Not autoneg off
     [...]
    
    Fix by skipping the test when used with veth pairs.
    
    Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test")
    Reported-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Closes: https://lore.kernel.org/netdev/adc5e40d-d040-a65e-eb26-edf47dac5b02@alu.unizg.hr/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-8-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b9dfb80d9fb2cd6fe50a02c3bafc9fd8c4f913fc
Author: Ido Schimmel <idosch@nvidia.com>
Date:   Tue Aug 8 17:14:52 2023 +0300

    selftests: forwarding: Add a helper to skip test when using veth pairs
    
    commit 66e131861ab7bf754b50813216f5c6885cd32d63 upstream.
    
    A handful of tests require physical loopbacks to be used instead of veth
    pairs. Add a helper that these tests will invoke in order to be skipped
    when executed with veth pairs.
    
    Fixes: 64916b57c0b1 ("selftests: forwarding: Add speed and auto-negotiation test")
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Petr Machata <petrm@nvidia.com>
    Tested-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
    Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
    Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
    Link: https://lore.kernel.org/r/20230808141503.4060661-7-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b973eb76dff3f84d71ab4634a42f281e8cc4aa60
Author: Mark Brown <broonie@kernel.org>
Date:   Fri Aug 4 20:22:11 2023 +0100

    selftests/rseq: Fix build with undefined __weak
    
    commit d5ad9aae13dcced333c1a7816ff0a4fbbb052466 upstream.
    
    Commit 3bcbc20942db ("selftests/rseq: Play nice with binaries statically
    linked against glibc 2.35+"), which is now in Linus' tree, introduced uses
    of __weak but did nothing to ensure that a definition is provided for it,
    resulting in build failures for the rseq tests:
    
    rseq.c:41:1: error: unknown type name '__weak'
    __weak ptrdiff_t __rseq_offset;
    ^
    rseq.c:41:17: error: expected ';' after top level declarator
    __weak ptrdiff_t __rseq_offset;
                    ^
                    ;
    rseq.c:42:1: error: unknown type name '__weak'
    __weak unsigned int __rseq_size;
    ^
    rseq.c:43:1: error: unknown type name '__weak'
    __weak unsigned int __rseq_flags;
    
    Fix this by using the definition from tools/include compiler.h.
    
    Fixes: 3bcbc20942db ("selftests/rseq: Play nice with binaries statically linked against glibc 2.35+")
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Message-Id: <20230804-kselftest-rseq-build-v1-1-015830b66aa9@kernel.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit b8b8db5857d4e2fda5ef38b2d9c4586ac54ae7ea
Author: Karol Herbst <kherbst@redhat.com>
Date:   Sat Aug 5 12:18:13 2023 +0200

    drm/nouveau/disp: Revert a NULL check inside nouveau_connector_get_modes
    
    commit d5712cd22b9cf109fded1b7f178f4c1888c8b84b upstream.
    
    The original commit adding that check tried to protect the kernel against
    a potential invalid NULL pointer access.
    
    However we call nouveau_connector_detect_depth once without a native_mode
    set on purpose for non LVDS connectors and this broke DP support in a few
    cases.
    
    Cc: Olaf Skibbe <news@kravcenko.com>
    Cc: Lyude Paul <lyude@redhat.com>
    Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/238
    Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/245
    Fixes: 20a2ce87fbaf8 ("drm/nouveau/dp: check for NULL nv_connector->native_mode")
    Signed-off-by: Karol Herbst <kherbst@redhat.com>
    Reviewed-by: Lyude Paul <lyude@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230805101813.2603989-1-kherbst@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 4c6767c8bf5e41751bcb3d63ffaee64124a2d506
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Aug 9 15:05:00 2023 +0200

    x86: Move gds_ucode_mitigated() declaration to header
    
    commit eb3515dc99c7c85f4170b50838136b2a193f8012 upstream.
    
    The declaration got placed in the .c file of the caller, but that
    causes a warning for the definition:
    
    arch/x86/kernel/cpu/bugs.c:682:6: error: no previous prototype for 'gds_ucode_mitigated' [-Werror=missing-prototypes]
    
    Move it to a header where both sides can observe it instead.
    
    Fixes: 81ac7e5d74174 ("KVM: Add GDS_NO support to KVM")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/all/20230809130530.1913368-2-arnd%40kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f919cbc904410764431c1e733fe78835811bbcd0
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Wed Aug 9 15:04:59 2023 +0200

    x86/speculation: Add cpu_show_gds() prototype
    
    commit a57c27c7ad85c420b7de44c6ee56692d51709dda upstream.
    
    The newly added function has two definitions but no prototypes:
    
    drivers/base/cpu.c:605:16: error: no previous prototype for 'cpu_show_gds' [-Werror=missing-prototypes]
    
    Add a declaration next to the other ones for this file to avoid the
    warning.
    
    Fixes: 8974eb588283b ("x86/speculation: Add Gather Data Sampling mitigation")
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Tested-by: Daniel Sneddon <daniel.sneddon@linux.intel.com>
    Cc: stable@kernel.org
    Link: https://lore.kernel.org/all/20230809130530.1913368-1-arnd%40kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 9290ef14c96b9f924ea50d8c0b9c070801a226fa
Author: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Date:   Thu Aug 3 18:16:09 2023 +0300

    x86/mm: Fix VDSO and VVAR placement on 5-level paging machines
    
    commit 1b8b1aa90c9c0e825b181b98b8d9e249dc395470 upstream.
    
    Yingcong has noticed that on the 5-level paging machine, VDSO and VVAR
    VMAs are placed above the 47-bit border:
    
    8000001a9000-8000001ad000 r--p 00000000 00:00 0                          [vvar]
    8000001ad000-8000001af000 r-xp 00000000 00:00 0                          [vdso]
    
    This might confuse users who are not aware of 5-level paging and expect
    all userspace addresses to be under the 47-bit border.
    
    So far the problem has only been triggered with ASLR disabled, although
    it may also occur with ASLR enabled if the layout is randomized in just
    the right way.
    
    The problem happens due to custom placement for the VMAs in the VDSO
    code: vdso_addr() tries to place them above the stack and checks the
    result against TASK_SIZE_MAX, which is wrong. TASK_SIZE_MAX is set to
    the 56-bit border on 5-level paging machines. Use DEFAULT_MAP_WINDOW
    instead.
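    
    The effect of the two limits can be sketched with plain arithmetic.
    This is a deliberately simplified userspace model: the TOY_* constants
    encode the 47-bit and 56-bit borders minus one 4 KiB page, and
    toy_vdso_addr() is a hypothetical helper that only performs the
    bounds check, not the real placement logic.
    
    ```c
    #include <stdint.h>
    
    #define TOY_PAGE_SIZE          4096ULL
    #define TOY_TASK_SIZE_MAX      ((1ULL << 56) - TOY_PAGE_SIZE)
    #define TOY_DEFAULT_MAP_WINDOW ((1ULL << 47) - TOY_PAGE_SIZE)
    
    /*
     * Return addr if [addr, addr + len) fits below limit, otherwise 0,
     * meaning "no hint, let the generic mapping code choose".
     */
    static uint64_t toy_vdso_addr(uint64_t addr, uint64_t len, uint64_t limit)
    {
        if (addr + len > limit)
            return 0;
        return addr;
    }
    ```
    
    With the vvar start address from the listing above (0x8000001a9000)
    and a length of 0x6000, the check against TOY_TASK_SIZE_MAX accepts
    the address, while the check against TOY_DEFAULT_MAP_WINDOW rejects
    it.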
    
    Fixes: b569bab78d8d ("x86/mm: Prepare to expose larger address space to userspace")
    Reported-by: Yingcong Wu <yingcong.wu@intel.com>
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/20230803151609.22141-1-kirill.shutemov%40linux.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 829409510d0036f7707c72cd676a60479aafc3a8
Author: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
Date:   Fri Aug 11 23:37:05 2023 +0300

    x86/cpu/amd: Enable Zenbleed fix for AMD Custom APU 0405
    
    commit 6dbef74aeb090d6bee7d64ef3fa82ae6fa53f271 upstream.
    
    Commit
    
      522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
    
    provided a fix for the Zen2 VZEROUPPER data corruption bug affecting
    a range of CPU models, but the AMD Custom APU 0405 found on SteamDeck
    was not listed, although it is clearly affected by the vulnerability.
    
    Add this CPU variant to the Zenbleed erratum list, in order to
    unconditionally enable the fallback fix until a proper microcode update
    is available.
    
    Fixes: 522b1d69219d ("x86/cpu/amd: Add a Zenbleed fix")
    Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230811203705.1699914-1-cristian.ciocaltea@collabora.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c41a22b93d7c9286da4acb34d45bd305cd307ebb
Author: Nick Desaulniers <ndesaulniers@google.com>
Date:   Wed Aug 9 09:40:26 2023 -0700

    x86/srso: Fix build breakage with the LLVM linker
    
    commit cbe8ded48b939b9d55d2c5589ab56caa7b530709 upstream.
    
    The assertion added to verify the difference in bits set of the
    addresses of srso_untrain_ret_alias() and srso_safe_ret_alias() would fail
    to link in LLVM's ld.lld linker with the following error:
    
      ld.lld: error: ./arch/x86/kernel/vmlinux.lds:210: at least one side of
      the expression must be absolute
      ld.lld: error: ./arch/x86/kernel/vmlinux.lds:211: at least one side of
      the expression must be absolute
    
    Use ABSOLUTE to evaluate the expression referring to at least one of the
    symbols so that LLD can evaluate the linker script.
    
    Also, add linker version info to the comment about XOR being unsupported
    in either ld.bfd or ld.lld until somewhat recently.
    
    Fixes: fb3bd914b3ec ("x86/srso: Add a Speculative RAS Overflow mitigation")
    Closes: https://lore.kernel.org/llvm/CA+G9fYsdUeNu-gwbs0+T6XHi4hYYk=Y9725-wFhZ7gJMspLDRA@mail.gmail.com/
    Reported-by: Nathan Chancellor <nathan@kernel.org>
    Reported-by: Daniel Kolesa <daniel@octaforge.org>
    Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Suggested-by: Sven Volkinsfeld <thyrc@gmx.net>
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://github.com/ClangBuiltLinux/linux/issues/1907
    Link: https://lore.kernel.org/r/20230809-gds-v1-1-eaac90b0cbcc@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c2372b1559d4b5510f44feb885be3533f9919fe0
Author: Badhri Jagan Sridharan <badhri@google.com>
Date:   Wed Jul 12 08:57:22 2023 +0000

    usb: typec: tcpm: Fix response to vsafe0V event
    
    commit 4270d2b4845e820b274702bfc2a7140f69e4d19d upstream.
    
    Do not transition to SNK_UNATTACHED state when receiving vsafe0v event
    while in SNK_HARD_RESET_WAIT_VBUS. Also ignore VBUS off events, since
    on some platforms VBUS off can be signalled more than once.
    
    [143515.364753] Requesting mux state 1, usb-role 2, orientation 2
    [143515.365520] pending state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_SINK_ON @ 650 ms [rev3 HARD_RESET]
    [143515.632281] CC1: 0 -> 0, CC2: 3 -> 0 [state SNK_HARD_RESET_SINK_OFF, polarity 1, disconnected]
    [143515.637214] VBUS on
    [143515.664985] VBUS off
    [143515.664992] state change SNK_HARD_RESET_SINK_OFF -> SNK_HARD_RESET_WAIT_VBUS [rev3 HARD_RESET]
    [143515.665564] VBUS VSAFE0V
    [143515.665566] state change SNK_HARD_RESET_WAIT_VBUS -> SNK_UNATTACHED [rev3 HARD_RESET]
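    
    The intended behaviour can be sketched as a tiny userspace state
    function. This is not the tcpm code: on_vsafe0v() is a hypothetical
    name, and the handling of SNK_READY is an assumption made only so the
    sketch has a contrasting case.
    
```c
#include <assert.h>

/* A few sink states; only the names shown in the log above are from tcpm. */
enum port_state {
    SNK_UNATTACHED,
    SNK_HARD_RESET_SINK_OFF,
    SNK_HARD_RESET_WAIT_VBUS,
    SNK_READY,
};

/* Return the next state after a vsafe0v event arrives in state cur. */
enum port_state on_vsafe0v(enum port_state cur)
{
    assert(cur <= SNK_READY);

    switch (cur) {
    case SNK_HARD_RESET_WAIT_VBUS:
        /* VBUS is expected to drop during a hard reset: stay put. */
        return cur;
    case SNK_READY:
        /* Assumption for this sketch: losing VBUS here means detach. */
        return SNK_UNATTACHED;
    default:
        return cur;
    }
}
```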
    
    Fixes: 28b43d3d746b ("usb: typec: tcpm: Introduce vsafe0v for vbus")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Badhri Jagan Sridharan <badhri@google.com>
    Acked-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/20230712085722.1414743-1-badhri@google.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit f776b94ccdf033f97937ce18434dc2af56f6269a
Author: Prashanth K <quic_prashk@quicinc.com>
Date:   Tue Aug 1 14:33:52 2023 +0530

    usb: common: usb-conn-gpio: Prevent bailing out if initial role is none
    
    commit 8e21a620c7e6e00347ade1a6ed4967b359eada5a upstream.
    
    Currently, if a device boots without a cable connected,
    usb-conn-gpio won't call set_role() because last_role is the same
    as the current role. This happens because last_role is initialised
    to zero during probe.
    
    To avoid this, add a new flag initial_detection into struct
    usb_conn_info, which prevents bailing out during initial
    detection.
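    
    A minimal userspace sketch of the idea, assuming simplified types:
    struct conn_info, detect() and the call counter are hypothetical
    stand-ins, not the driver's actual definitions.
    
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified role values; ROLE_NONE is zero, like a zeroed last_role. */
enum role { ROLE_NONE = 0, ROLE_HOST, ROLE_DEVICE };

struct conn_info {
    enum role last_role;     /* zero-initialised at probe time */
    bool initial_detection;  /* true until the first detection has run */
    int set_role_calls;      /* counts calls, for illustration only */
};

/* Stand-in for set_role(): remember the role and count the call. */
void set_role(struct conn_info *info, enum role role)
{
    info->last_role = role;
    info->set_role_calls++;
}

/* Skip repeated roles, except during the very first detection. */
void detect(struct conn_info *info, enum role role)
{
    assert(info != NULL);

    if (info->last_role == role && !info->initial_detection)
        return;

    info->initial_detection = false;
    set_role(info, role);
}
```
    
    With initial_detection set at probe, the first detect() call reaches
    set_role() even when the detected role is ROLE_NONE.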
    
    Cc: <stable@vger.kernel.org> # 5.4
    Fixes: 4602f3bff266 ("usb: common: add USB GPIO based connection detection driver")
    Signed-off-by: Prashanth K <quic_prashk@quicinc.com>
    Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
    Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
    Link: https://lore.kernel.org/r/1690880632-12588-1-git-send-email-quic_prashk@quicinc.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 00cc14b52d6fcf35726f704aa2bf1fb36410cbba
Author: Elson Roy Serrao <quic_eserrao@quicinc.com>
Date:   Tue Aug 1 12:26:58 2023 -0700

    usb: dwc3: Properly handle processing of pending events
    
    commit 3ddaa6a274578e23745b7466346fc2650df8f959 upstream.
    
    If dwc3 is runtime suspended, we defer processing the event buffer
    until resume by setting the pending_events flag. Set this flag before
    triggering resume to avoid a race with the runtime resume callback.
    
    While handling the pending events, in addition to checking the event
    buffer we also need to process it. Handle this by explicitly calling
    dwc3_thread_interrupt(). Also balance the runtime pm get() operation
    that triggered this processing.
    
    Cc: stable@vger.kernel.org
    Fixes: fc8bb91bc83e ("usb: dwc3: implement runtime PM")
    Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
    Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
    Reviewed-by: Roger Quadros <rogerq@kernel.org>
    Link: https://lore.kernel.org/r/20230801192658.19275-1-quic_eserrao@quicinc.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 7a11d1e2625bdb2346f6586773b20b20977278ac
Author: Alan Stern <stern@rowland.harvard.edu>
Date:   Wed Aug 2 13:49:02 2023 -0400

    usb-storage: alauda: Fix uninit-value in alauda_check_media()
    
    commit a6ff6e7a9dd69364547751db0f626a10a6d628d2 upstream.
    
    Syzbot got KMSAN to complain about access to an uninitialized value in
    the alauda subdriver of usb-storage:
    
    BUG: KMSAN: uninit-value in alauda_transport+0x462/0x57f0
    drivers/usb/storage/alauda.c:1137
    CPU: 0 PID: 12279 Comm: usb-storage Not tainted 5.3.0-rc7+ #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
    Google 01/01/2011
    Call Trace:
      __dump_stack lib/dump_stack.c:77 [inline]
      dump_stack+0x191/0x1f0 lib/dump_stack.c:113
      kmsan_report+0x13a/0x2b0 mm/kmsan/kmsan_report.c:108
      __msan_warning+0x73/0xe0 mm/kmsan/kmsan_instr.c:250
      alauda_check_media+0x344/0x3310 drivers/usb/storage/alauda.c:460
    
    The problem is that alauda_check_media() doesn't verify that its USB
    transfer succeeded before trying to use the received data.  What
    should happen if the transfer fails isn't entirely clear, but a
    reasonably conservative approach is to pretend that no media is
    present.
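    
    The conservative approach can be sketched in userspace C as follows.
    Everything here is hypothetical (the transfer callbacks, the status
    bit and the function names); it only illustrates checking the
    transfer result before touching the buffer.
    
```c
#include <assert.h>
#include <stddef.h>

enum media_state { MEDIA_NOT_PRESENT, MEDIA_PRESENT };

/* A transfer fills buf and returns 0 on success, nonzero on failure. */
typedef int (*transfer_fn)(unsigned char *buf, size_t len);

/* Fake transfer that succeeds and reports "media present" in bit 0. */
int transfer_ok(unsigned char *buf, size_t len)
{
    (void)len;
    buf[0] = 0x01;
    buf[1] = 0x00;
    return 0;
}

/* Fake transfer that fails and leaves buf untouched. */
int transfer_fail(unsigned char *buf, size_t len)
{
    (void)buf;
    (void)len;
    return -1;
}

enum media_state check_media(transfer_fn xfer, unsigned char *buf, size_t len)
{
    assert(xfer != NULL && buf != NULL && len >= 2);

    /* On failure, never read buf: pretend that no media is present. */
    if (xfer(buf, len) != 0)
        return MEDIA_NOT_PRESENT;

    return (buf[0] & 0x01) ? MEDIA_PRESENT : MEDIA_NOT_PRESENT;
}
```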
    
    A similar problem exists in a usb_stor_dbg() call in
    alauda_get_media_status().  In this case, when an error occurs the
    call is redundant, because usb_stor_ctrl_transfer() already will print
    a debugging message.
    
    Finally, unrelated to the uninitialized memory access, is the fact
    that alauda_check_media() performs DMA to a buffer on the stack.
    Fortunately usb-storage provides a general purpose DMA-able buffer for
    uses like this.  We'll use it instead.
    
    Reported-and-tested-by: syzbot+e7d46eb426883fb97efd@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/all/0000000000007d25ff059457342d@google.com/T/
    Suggested-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
    Fixes: e80b0fade09e ("[PATCH] USB Storage: add alauda support")
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/r/693d5d5e-f09b-42d0-8ed9-1f96cd30bcce@rowland.harvard.edu
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 945e1b3c361bdcdf5ec21f097cfc22a30bf0f716
Author: Ricky WU <ricky_wu@realtek.com>
Date:   Tue Jul 25 09:10:54 2023 +0000

    misc: rtsx: judge ASPM Mode to set PETXCFG Reg
    
    commit 101bd907b4244a726980ee67f95ed9cafab6ff7a upstream.
    
    When the ASPM mode is ASPM_MODE_CFG, the value of clkreq_0 must be
    checked to decide whether the PETXCFG setting is HIGH or LOW. When the
    ASPM mode is ASPM_MODE_REG, always set it to HIGH during initialization.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
    Link: https://lore.kernel.org/r/52906c6836374c8cb068225954c5543a@realtek.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 03eebad96233397f951d8e9fafd82a1674a77284
Author: Qi Zheng <zhengqi.arch@bytedance.com>
Date:   Sun Jun 25 15:49:37 2023 +0000

    binder: fix memory leak in binder_init()
    
    commit adb9743d6a08778b78d62d16b4230346d3508986 upstream.
    
    In binder_init(), the setup done by binder_alloc_shrinker_init() is not
    undone on the error path, which causes a memory leak. This commit
    introduces binder_alloc_shrinker_exit() and calls it on the error path
    to fix that.
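    
    The general pattern (pair every init with an exit and call the exit
    when a later step fails) can be sketched as below. The names
    shrinker_init(), shrinker_exit() and subsystem_init() are hypothetical
    userspace stand-ins, not the binder functions.
    
```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical resource standing in for the registered shrinker. */
static void *shrinker_res;

int shrinker_init(void)
{
    assert(shrinker_res == NULL);   /* guard against double init */
    shrinker_res = malloc(16);
    return shrinker_res ? 0 : -ENOMEM;
}

void shrinker_exit(void)
{
    free(shrinker_res);
    shrinker_res = NULL;
}

bool shrinker_is_live(void)
{
    return shrinker_res != NULL;
}

/* later_step_fails simulates a failure after shrinker_init() succeeded. */
int subsystem_init(bool later_step_fails)
{
    int ret = shrinker_init();

    if (ret)
        return ret;

    if (later_step_fails) {
        ret = -EINVAL;
        goto err_shrinker;
    }
    return 0;

err_shrinker:
    shrinker_exit();    /* undo the earlier init so nothing leaks */
    return ret;
}
```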
    
    Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
    Acked-by: Carlos Llamas <cmllamas@google.com>
    Fixes: f2517eb76f1f ("android: binder: Add global lru shrinker to binder")
    Cc: stable <stable@kernel.org>
    Link: https://lore.kernel.org/r/20230625154937.64316-1-qi.zheng@linux.dev
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a8e2ae6296d56478fb98ae7f739846ed121f154f
Author: Alvin Šipraga <alsi@bang-olufsen.dk>
Date:   Mon Jun 19 16:12:39 2023 +0200

    iio: adc: ina2xx: avoid NULL pointer dereference on OF device match
    
    commit a41e19cc0d6b6a445a4133170b90271e4a2553dc upstream.
    
    The affected lines were resulting in a NULL pointer dereference on our
    platform because the device tree contained the following list of
    compatible strings:
    
        power-sensor@40 {
            compatible = "ti,ina232", "ti,ina231";
            ...
        };
    
    Since the driver doesn't declare a compatible string "ti,ina232", the OF
    matching succeeds on "ti,ina231". But the I2C device ID info is
    populated via the first compatible string, cf. modalias population in
    of_i2c_get_board_info(). Since there is no "ina232" entry in the legacy
    I2C device ID table either, the struct i2c_device_id *id pointer in the
    probe function is NULL.
    
    Fix this by using the already populated type variable instead, which
    points to the proper driver data. Since the name is also wanted, add a
    generic one to the ina2xx_config table.
    
    Signed-off-by: Alvin Šipraga <alsi@bang-olufsen.dk>
    Fixes: c43a102e67db ("iio: ina2xx: add support for TI INA2xx Power Monitors")
    Link: https://lore.kernel.org/r/20230619141239.2257392-1-alvin@pqrs.dk
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2df8ae1e42b8e2a9cb78e99af22e7bf44d38e9ed
Author: Yiyuan Guo <yguoaz@gmail.com>
Date:   Fri Jun 30 22:37:19 2023 +0800

    iio: cros_ec: Fix the allocation size for cros_ec_command
    
    commit 8a4629055ef55177b5b63dab1ecce676bd8cccdd upstream.
    
    The struct cros_ec_command contains several integer fields and a
    trailing array. An allocation size that neglects the integer fields
    is too small and can lead to a buffer overrun.
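    
    As an illustration, the size of such an allocation has to cover the
    header fields plus the payload. The sketch below uses a simplified,
    hypothetical struct ec_command and userspace calloc(); it is not the
    driver code.
    
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in: integer header fields plus a trailing array. */
struct ec_command {
    uint32_t version;
    uint32_t command;
    uint32_t outsize;
    uint32_t insize;
    uint32_t result;
    uint8_t data[];     /* flexible array member: payload follows */
};

/* Bytes needed for the header fields plus payload_len bytes of payload. */
size_t ec_command_size(size_t payload_len)
{
    assert(payload_len <= SIZE_MAX - sizeof(struct ec_command));
    return sizeof(struct ec_command) + payload_len;
}

/* Allocate a zeroed command with room for payload_len payload bytes. */
struct ec_command *ec_command_alloc(size_t payload_len)
{
    return calloc(1, ec_command_size(payload_len));
}
```
    
    Allocating only payload_len bytes would leave no room for the header,
    so writes to the end of data[] would land past the end of the buffer.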
    
    Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
    Signed-off-by: Yiyuan Guo <yguoaz@gmail.com>
    Fixes: 974e6f02e27e ("iio: cros_ec_sensors_core: Add common functions for the ChromeOS EC Sensor Hub.")
    Link: https://lore.kernel.org/r/20230630143719.1513906-1-yguoaz@gmail.com
    Cc: <Stable@vger.kernel.org>
    Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit a7cedc2b76124eece96679214ab4639eaeee95a1
Author: Aleksa Sarai <cyphar@cyphar.com>
Date:   Sat Aug 12 07:19:05 2023 -0600

    io_uring: correct check for O_TMPFILE
    
    Commit 72dbde0f2afbe4af8e8595a89c650ae6b9d9c36f upstream.
    
    O_TMPFILE is actually __O_TMPFILE|O_DIRECTORY. This means that the old
    check for whether RESOLVE_CACHED can be used would incorrectly think
    that O_DIRECTORY could not be used with RESOLVE_CACHED.
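    
    The pitfall is generic to multi-bit flag masks. The sketch below uses
    made-up DEMO_* bit values rather than the real <fcntl.h> constants,
    and the function names are hypothetical; it shows the difference
    between an "any bit set" test and an "all bits set" test, and is not
    the io_uring change itself.
    
```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the real constants live in <fcntl.h>. */
#define DEMO_O_DIRECTORY  0x1u
#define DEMO_TMPFILE_BIT  0x2u
#define DEMO_O_TMPFILE    (DEMO_TMPFILE_BIT | DEMO_O_DIRECTORY)

static_assert((DEMO_O_TMPFILE & DEMO_O_DIRECTORY) != 0,
              "the tmpfile mask includes the directory bit");

/* Flawed test: true when ANY bit of the mask is set. */
bool is_tmpfile_any_bit(unsigned int flags)
{
    return (flags & DEMO_O_TMPFILE) != 0;
}

/* Stricter test: true only when ALL bits of the mask are set. */
bool is_tmpfile_all_bits(unsigned int flags)
{
    return (flags & DEMO_O_TMPFILE) == DEMO_O_TMPFILE;
}
```
    
    With the "any bit" test, a plain directory open is misclassified as a
    tmpfile open; the "all bits" test does not have that problem.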
    
    Cc: stable@vger.kernel.org # v5.12+
    Fixes: 3a81fd02045c ("io_uring: enable LOOKUP_CACHED path resolution for filename lookups")
    Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
    Link: https://lore.kernel.org/r/20230807-resolve_cached-o_tmpfile-v3-1-e49323e1ef6f@cyphar.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 697bc234632cfb44090dcd6eec39f30c99fdbd4b
Author: Ilya Leoshkevich <iii@linux.ibm.com>
Date:   Fri Aug 4 23:24:59 2023 +0800

    selftests/bpf: Fix sk_assign on s390x
    
    [ Upstream commit 7ce878ca81bca7811e669db4c394b86780e0dbe4 ]
    
    sk_assign is failing on an s390x machine running Debian "bookworm" for
    2 reasons: legacy server_map definition and uninitialized addrlen in
    recvfrom() call.
    
    Fix by adding a new-style server_map definition and dropping addrlen
    (recvfrom() allows NULL values for src_addr and addrlen).
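    
    A small sketch of the addrlen-free call, assuming a caller that does
    not care who sent the datagram; recv_ignoring_sender() is a
    hypothetical wrapper name, not part of the selftest.
    
```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Receive one datagram without asking for the sender's address.
 * Passing NULL for both src_addr and addrlen means there is no
 * addrlen variable that could be left uninitialized.
 */
ssize_t recv_ignoring_sender(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    return recvfrom(fd, buf, len, 0, NULL, NULL);
}
```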
    
    Since the test should support tc built without libbpf, build the prog
    twice: with the old-style definition and with the new-style definition,
    then select the right one at runtime. This could be done at compile
    time too, but this would not be cross-compilation friendly.
    
    Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
    Link: https://lore.kernel.org/r/20230129190501.1624747-2-iii@linux.ibm.com
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 127277262110e730dad561e15c1e051020cd6242
Author: Yonghong Song <yhs@fb.com>
Date:   Fri Aug 4 23:24:58 2023 +0800

    selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code
    
    [ Upstream commit 63d78b7e8ca2d0eb8c687a355fa19d01b6fcc723 ]
    
    With the latest LLVM 17, the selftest fexit_bpf2bpf/func_replace_return_code
    fails verification as follows:
    
      0: R1=ctx(off=0,imm=0) R10=fp0
      ; int connect_v4_prog(struct bpf_sock_addr *ctx)
      0: (bf) r7 = r1                       ; R1=ctx(off=0,imm=0) R7_w=ctx(off=0,imm=0)
      1: (b4) w6 = 0                        ; R6_w=0
      ; memset(&tuple.ipv4.saddr, 0, sizeof(tuple.ipv4.saddr));
      ...
      ; return do_bind(ctx) ? 1 : 0;
      179: (bf) r1 = r7                     ; R1=ctx(off=0,imm=0) R7=ctx(off=0,imm=0)
      180: (85) call pc+147
      Func#3 is global and valid. Skipping.
      181: R0_w=scalar()
      181: (bc) w6 = w0                     ; R0_w=scalar() R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
      182: (05) goto pc-129
      ; }
      54: (bc) w0 = w6                      ; R0_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff)) R6_w=scalar(umax=4294967295,var_off=(0x0; 0xffffffff))
      55: (95) exit
      At program exit the register R0 has value (0x0; 0xffffffff) should have been in (0x0; 0x1)
      processed 281 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 13
      -- END PROG LOAD LOG --
      libbpf: prog 'connect_v4_prog': failed to load: -22
    
    The corresponding source code:
    
      __attribute__ ((noinline))
      int do_bind(struct bpf_sock_addr *ctx)
      {
            struct sockaddr_in sa = {};
    
            sa.sin_family = AF_INET;
            sa.sin_port = bpf_htons(0);
            sa.sin_addr.s_addr = bpf_htonl(SRC_REWRITE_IP4);
    
            if (bpf_bind(ctx, (struct sockaddr *)&sa, sizeof(sa)) != 0)
                    return 0;
    
            return 1;
      }
      ...
      SEC("cgroup/connect4")
      int connect_v4_prog(struct bpf_sock_addr *ctx)
      {
      ...
            return do_bind(ctx) ? 1 : 0;
      }
    
    Insn 180 is a call to 'do_bind'. The call's return value is also the return value
    for the program. Since do_bind() returns 0/1, it is legitimate for the compiler to
    optimize 'return do_bind(ctx) ? 1 : 0' to 'return do_bind(ctx)'. However, this
    optimization breaks verification, because the verifier marks the return value of
    'do_bind()' as an arbitrary scalar, which violates the requirement that the prog
    return 0/1.
    
    There are two ways to fix this problem: (1) changing 'return 1' in do_bind() to
    e.g. 'return 10' so the compiler has to emit 'do_bind(ctx) ? 1 : 0', or (2), as
    suggested by Andrii, marking do_bind() with the __weak attribute so the compiler
    cannot make any assumption about do_bind()'s return value.
    
    This patch adopts the __weak approach, which is simpler and more resistant
    to potential compiler optimizations.
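    
    In plain userspace C, option (2) looks roughly like the sketch below.
    The function names are hypothetical and the attribute is spelled out
    instead of using the __weak macro. Because a weak definition may be
    replaced at link time, the compiler is expected to keep the explicit
    '? 1 : 0' normalisation in the caller.
    
```c
#include <assert.h>

/*
 * weak: another definition may override this one at link time, so the
 * compiler should not assume anything about the returned value.
 */
__attribute__((weak, noinline))
int do_bind_sketch(int bind_ok)
{
    assert(bind_ok == 0 || bind_ok == 1);
    return bind_ok;
}

/* The caller normalises the result to 0/1 itself. */
int prog_sketch(int bind_ok)
{
    return do_bind_sketch(bind_ok) ? 1 : 0;
}
```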
    
    Suggested-by: Andrii Nakryiko <andrii@kernel.org>
    Signed-off-by: Yonghong Song <yhs@fb.com>
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/bpf/20230310012410.2920570-1-yhs@fb.com
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit ee701208f4ccb1f083ce3db3ab4e9e7c3ae57dad
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Aug 4 23:24:57 2023 +0800

    selftests/bpf: make test_align selftest more robust
    
    [ Upstream commit 4f999b767769b76378c3616c624afd6f4bb0d99f ]
    
    The test_align selftest relies on the BPF verifier log emitting register
    states for specific instructions in an expected format. Unfortunately, the
    BPF verifier's precision backtracking log interferes with such
    expectations, and instructions on which precision propagation happens
    sometimes don't output the full expected register states. This does indeed
    look like something to be improved in the BPF verifier, but it is beyond
    the scope of this patch set.
    
    So to make test_align a bit more robust, inject a few dummy R4 = R5
    instructions which capture the desired state of R5 and won't have
    precision tracking logs on them. This fixes the tests until we can improve
    BPF verifier output in the presence of precision tracking.
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20221104163649.121784-7-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Stable-dep-of: ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 683d2969a0820072ddc05dbcc667664f7a34ac90
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Aug 4 23:24:56 2023 +0800

    bpf: aggressively forget precise markings during state checkpointing
    
    [ Upstream commit 7a830b53c17bbadcf99f778f28aaaa4e6c41df5f ]
    
    Exploit the fact that a state is about to be checkpointed to forget all
    precise markings up to that point even more aggressively. We now clear
    all potentially inherited precise markings right before checkpointing
    and branching off into a child state. If any child state requires
    precise knowledge of any SCALAR register, that requirement will be
    propagated backwards later on, before this state is finalized,
    preserving correctness.
    
    Only a single selftests BPF program changes, but the change is
    tremendous: a 25x reduction in the number of verified instructions and
    states in trace_virtqueue_add_sgs.
    
    Cilium results are more modest, but happen across a wider range of programs.
    
    SELFTESTS RESULTS
    =================
    
    $ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results.csv ~/imprecise-aggressive-results.csv | grep -v '+0'
    File                 Program                  Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
    -------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    loop6.bpf.linked1.o  trace_virtqueue_add_sgs           398057            15114   -382943 (-96.20%)              8717               336      -8381 (-96.15%)
    -------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    
    CILIUM RESULTS
    ==============
    
    $ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results-cilium.csv ~/imprecise-aggressive-results-cilium.csv | grep -v '+0'
    File           Program                           Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
    -------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    bpf_host.o     tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
    bpf_host.o     tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
    bpf_host.o     tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
    bpf_host.o     tail_nodeport_nat_ipv6_egress                3446             3406        -40 (-1.16%)               203               198          -5 (-2.46%)
    bpf_lxc.o      tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
    bpf_lxc.o      tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
    bpf_lxc.o      tail_ipv4_ct_egress                          5074             4897       -177 (-3.49%)               255               248          -7 (-2.75%)
    bpf_lxc.o      tail_ipv4_ct_ingress                         5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
    bpf_lxc.o      tail_ipv4_ct_ingress_policy_only             5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
    bpf_lxc.o      tail_ipv6_ct_egress                          4558             4536        -22 (-0.48%)               188               187          -1 (-0.53%)
    bpf_lxc.o      tail_ipv6_ct_ingress                         4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
    bpf_lxc.o      tail_ipv6_ct_ingress_policy_only             4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
    bpf_lxc.o      tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
    bpf_overlay.o  tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
    bpf_overlay.o  tail_nodeport_nat_ipv6_egress                3482             3442        -40 (-1.15%)               204               201          -3 (-1.47%)
    bpf_xdp.o      tail_nodeport_nat_egress_ipv4               17200            15619      -1581 (-9.19%)              1111              1010        -101 (-9.09%)
    -------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20221104163649.121784-6-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Stable-dep-of: ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 2516deeb872ab9dda3bf4c66cf24b1ee900f25bf
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Aug 4 23:24:55 2023 +0800

    bpf: stop setting precise in current state
    
    [ Upstream commit f63181b6ae79fd3b034cde641db774268c2c3acf ]
    
    Setting reg->precise to true in the current state is not necessary from
    a correctness standpoint, but it does pessimise the whole precision (or
    rather "imprecision", because that's what we want to keep as much as
    possible) tracking. Why is somewhat subtle, and my best attempt to
    explain it is recorded in an extensive comment for the
    __mark_chain_precision() function. Some more careful thinking and code
    reading is probably still required to grok this completely,
    unfortunately. Whiteboarding and a bunch of extra handwaving in person
    would be even more helpful, but is deemed impractical in a Git commit.
    
    The next patch pushes this imprecision property even further, building on top of
    the insights described in this patch.
    
    The end results are pretty nice: we get a reduction in the total number of
    instructions and states verified due to better state reuse, as some of the states
    are now more generic and permissive due to fewer unnecessary precise=true
    requirements.
    
    SELFTESTS RESULTS
    =================
    
    $ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results.csv ~/imprecise-early-results.csv | grep -v '+0'
    File                                     Program                 Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
    ---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    bpf_iter_ksym.bpf.linked1.o              dump_ksym                           347              285       -62 (-17.87%)                20                19          -1 (-5.00%)
    pyperf600_bpf_loop.bpf.linked1.o         on_event                           3678             3736        +58 (+1.58%)               276               285          +9 (+3.26%)
    setget_sockopt.bpf.linked1.o             skops_sockopt                      4038             3947        -91 (-2.25%)               347               343          -4 (-1.15%)
    test_l4lb.bpf.linked1.o                  balancer_ingress                   4559             2611     -1948 (-42.73%)               118               105        -13 (-11.02%)
    test_l4lb_noinline.bpf.linked1.o         balancer_ingress                   6279             6268        -11 (-0.18%)               237               236          -1 (-0.42%)
    test_misc_tcp_hdr_options.bpf.linked1.o  misc_estab                         1307             1303         -4 (-0.31%)               100                99          -1 (-1.00%)
    test_sk_lookup.bpf.linked1.o             ctx_narrow_access                   456              447         -9 (-1.97%)                39                38          -1 (-2.56%)
    test_sysctl_loop1.bpf.linked1.o          sysctl_tcp_mem                     1389             1384         -5 (-0.36%)                26                25          -1 (-3.85%)
    test_tc_dtime.bpf.linked1.o              egress_fwdns_prio101                518              485        -33 (-6.37%)                51                46          -5 (-9.80%)
    test_tc_dtime.bpf.linked1.o              egress_host                         519              468        -51 (-9.83%)                50                44         -6 (-12.00%)
    test_tc_dtime.bpf.linked1.o              ingress_fwdns_prio101               842             1000      +158 (+18.76%)                73                88        +15 (+20.55%)
    xdp_synproxy_kern.bpf.linked1.o          syncookie_tc                     405757           373173     -32584 (-8.03%)             25735             22882      -2853 (-11.09%)
    xdp_synproxy_kern.bpf.linked1.o          syncookie_xdp                    479055           371590   -107465 (-22.43%)             29145             22207      -6938 (-23.81%)
    ---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    
    The slight regression in test_tc_dtime.bpf.linked1.o/ingress_fwdns_prio101
    is left for a follow-up; there might be some more precision-related bugs
    in existing BPF verifier logic.
    
    CILIUM RESULTS
    ==============
    
    $ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results-cilium.csv ~/imprecise-early-results-cilium.csv | grep -v '+0'
    File           Program                         Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
    -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    bpf_host.o     cil_from_host                               762              556      -206 (-27.03%)                43                37         -6 (-13.95%)
    bpf_host.o     tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
    bpf_host.o     tail_nodeport_nat_egress_ipv4             33592            33566        -26 (-0.08%)              2163              2161          -2 (-0.09%)
    bpf_lxc.o      tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
    bpf_overlay.o  tail_nodeport_nat_egress_ipv4             33581            33543        -38 (-0.11%)              2160              2157          -3 (-0.14%)
    bpf_xdp.o      tail_handle_nat_fwd_ipv4                  21659            20920       -739 (-3.41%)              1440              1376         -64 (-4.44%)
    bpf_xdp.o      tail_handle_nat_fwd_ipv6                  17084            17039        -45 (-0.26%)               907               905          -2 (-0.22%)
    bpf_xdp.o      tail_lb_ipv4                              73442            73430        -12 (-0.02%)              4370              4369          -1 (-0.02%)
    bpf_xdp.o      tail_lb_ipv6                             152114           151895       -219 (-0.14%)              6493              6479         -14 (-0.22%)
    bpf_xdp.o      tail_nodeport_nat_egress_ipv4             17377            17200       -177 (-1.02%)              1125              1111         -14 (-1.24%)
    bpf_xdp.o      tail_nodeport_nat_ingress_ipv6             6405             6397         -8 (-0.12%)               309               308          -1 (-0.32%)
    bpf_xdp.o      tail_rev_nodeport_lb4                      7126             6934       -192 (-2.69%)               414               402         -12 (-2.90%)
    bpf_xdp.o      tail_rev_nodeport_lb6                     18059            17905       -154 (-0.85%)              1105              1096          -9 (-0.81%)
    -------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20221104163649.121784-5-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Stable-dep-of: ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit c47d0178ad86ad57de5e4deb64133cff518dcc10
Author: Andrii Nakryiko <andrii@kernel.org>
Date:   Fri Aug 4 23:24:54 2023 +0800

    bpf: allow precision tracking for programs with subprogs
    
    [ Upstream commit be2ef8161572ec1973124ebc50f56dafc2925e07 ]
    
    Stop forcing precise=true for SCALAR registers when a BPF program has any
    subprograms. The current restriction means that any BPF program, as soon
    as it uses subprograms, ends up not getting any of the precision
    tracking benefits in reducing the number of verified states.
    
    This patch keeps the fallback mark_all_scalars_precise() behavior if
    precise marking has to cross function frames. E.g., if a subprogram
    requires R1 (first input arg) to be marked precise, ideally we'd need to
    backtrack to the parent function and keep marking R1 and its
    dependencies as precise. But right now we give up and force all the
    SCALARs in the current and all parent states to precise=true. We can
    lift that restriction in the future.
    
    But this patch fixes two issues identified when trying to enable
    precision tracking for subprogs.
    
    First, prevent "escaping" from the top-most state in a global subprog.
    With an entry-level BPF program we never end up requesting precision for
    R1-R5 registers, because R2-R5 are not initialized (and so not readable
    in a correct BPF program), and R1 is PTR_TO_CTX, not SCALAR, and so is
    implicitly precise. With global subprogs, though, it's different, as
    a global subprog a) can have up to 5 SCALAR input arguments, which might
    get marked as precise=true and b) is validated in isolation from its
    main entry BPF program. b) means that we can end up exhausting the parent
    state chain and still not mark all registers in reg_mask as precise,
    which would lead to a verifier bug warning.
    
    To handle that, we need to consider two cases. First, if the very first
    state is not immediately "checkpointed" (i.e., stored in the state lookup
    hashtable), it will get the correct first_insn_idx and last_insn_idx
    instruction indices set during state checkpointing. As such, this case is
    already handled: __mark_chain_precision() just does nothing when we
    reach the very first parent state. st->parent will be NULL and we'll
    just stop. Perhaps some extra check for reg_mask and stack_mask is due
    here, but this patch doesn't address that issue.
    
    The more problematic second case is when a global function's initial
    state is immediately checkpointed before we manage to process the very
    first instruction. This happens because, when there is a call to a global
    subprog from the main program, the subprog's very first instruction is
    marked as a pruning point, so before we manage to process the first
    instruction we have to check and checkpoint state. This patch adds
    special handling for such an "empty" state, which is identified by having
    st->last_insn_idx set to -1. In such a case, we check that we are indeed
    validating a global subprog, and with some sanity checking we mark input
    args as precise if requested.
    
    Note that we also initialize state->first_insn_idx with the correct
    start insn_idx offset. For the main program zero is the correct value,
    but for any subprog it's quite confusing to not have first_insn_idx set.
    This doesn't have any functional impact, but helps with debugging and
    state printing. We also explicitly initialize state->last_insn_idx
    instead of relying on is_state_visited() to do this with
    env->prev_insn_idx, which will be -1 on the very first instruction. This
    concludes the changes necessary to handle precision tracking
    specifically for global subprogs.
    
    The second identified problem was missing handling of BPF helper
    functions that call into subprogs (e.g., bpf_loop and a few others).
    From the standpoint of precision tracking and backtracking logic, those
    are effectively calls into subprogs and should be treated as
    BPF_PSEUDO_CALL calls.
    
    This patch takes the least intrusive way and just checks against a short
    list of current BPF helpers that do call subprogs, encapsulated in the
    is_callback_calling_function() function. But to avoid accidentally
    forgetting to add new BPF helpers to this "list", we also do a sanity
    check in __check_func_call, which has to be called for each such special
    BPF helper, to validate that the BPF helper is indeed recognized as a
    callback-calling one. This should catch any missed checks in the future.
    Adding special flags to function proto definitions seemed like overkill
    in this case.
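
    The list-plus-sanity-check pattern described above can be sketched in
    plain userspace C. This is only an illustrative model: the enum
    model_func_id, its members, and both model_* functions are hypothetical
    names, and the choice of helpers on the list is an assumption beyond
    bpf_loop, which is the only one this message names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper IDs; the real ones live in enum bpf_func_id. */
enum model_func_id {
	MODEL_FUNC_map_lookup_elem,
	MODEL_FUNC_for_each_map_elem,
	MODEL_FUNC_timer_set_callback,
	MODEL_FUNC_loop,
};

/* The short "list" of helpers that invoke a subprog as a callback. */
static bool model_is_callback_calling(enum model_func_id id)
{
	return id == MODEL_FUNC_for_each_map_elem ||
	       id == MODEL_FUNC_timer_set_callback ||
	       id == MODEL_FUNC_loop;
}

/*
 * Stand-in for the common path that every callback-calling helper must
 * go through. Returns 0 on success, or -1 if the helper reached this path
 * without having been added to the list above.
 */
static int model_check_callback_call(enum model_func_id id)
{
	if (!model_is_callback_calling(id))
		return -1; /* models the "missed helper" sanity failure */
	return 0;
}
```

    Because the check sits on the shared path, a helper that sets up a
    callback but is missing from the list fails loudly instead of silently
    skipping precision backtracking.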
    
    With the above changes, it's possible to remove forceful setting of
    reg->precise to true in __mark_reg_unknown, which turns on precision
    tracking both inside subprogs and entry progs that have subprogs. No
    warnings or errors were detected across all the selftests, nor when
    validating with veristat against internal Meta BPF objects and Cilium
    objects. Further, in some BPF programs there is a noticeable reduction
    in the number of states and instructions validated due to more
    effective precision tracking, especially benefiting the syncookie test.
    
    $ ./veristat -C -e file,prog,insns,states ~/baseline-results.csv ~/subprog-precise-results.csv  | grep -v '+0'
    File                                      Program                     Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
    ----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    pyperf600_bpf_loop.bpf.linked1.o          on_event                               3966             3678       -288 (-7.26%)               306               276         -30 (-9.80%)
    pyperf_global.bpf.linked1.o               on_event                               7563             7530        -33 (-0.44%)               520               517          -3 (-0.58%)
    pyperf_subprogs.bpf.linked1.o             on_event                              36358            36934       +576 (+1.58%)              2499              2531         +32 (+1.28%)
    setget_sockopt.bpf.linked1.o              skops_sockopt                          3965             4038        +73 (+1.84%)               343               347          +4 (+1.17%)
    test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect                          64965            64901        -64 (-0.10%)              4619              4612          -7 (-0.15%)
    test_misc_tcp_hdr_options.bpf.linked1.o   misc_estab                             1491             1307      -184 (-12.34%)               110               100         -10 (-9.09%)
    test_pkt_access.bpf.linked1.o             test_pkt_access                         354              349         -5 (-1.41%)                25                24          -1 (-4.00%)
    test_sock_fields.bpf.linked1.o            egress_read_sock_fields                 435              375       -60 (-13.79%)                22                20          -2 (-9.09%)
    test_sysctl_loop2.bpf.linked1.o           sysctl_tcp_mem                         1508             1501         -7 (-0.46%)                29                28          -1 (-3.45%)
    test_tc_dtime.bpf.linked1.o               egress_fwdns_prio100                    468              435        -33 (-7.05%)                45                41          -4 (-8.89%)
    test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio100                   398              408        +10 (+2.51%)                42                39          -3 (-7.14%)
    test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio101                  1096              842      -254 (-23.18%)                97                73        -24 (-24.74%)
    test_tcp_hdr_options.bpf.linked1.o        estab                                  2758             2408      -350 (-12.69%)               208               181        -27 (-12.98%)
    test_urandom_usdt.bpf.linked1.o           urand_read_with_sema                    466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
    test_urandom_usdt.bpf.linked1.o           urand_read_without_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
    test_urandom_usdt.bpf.linked1.o           urandlib_read_with_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
    test_urandom_usdt.bpf.linked1.o           urandlib_read_without_sema              466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
    test_xdp_noinline.bpf.linked1.o           balancer_ingress_v6                    4302             4294         -8 (-0.19%)               257               256          -1 (-0.39%)
    xdp_synproxy_kern.bpf.linked1.o           syncookie_tc                         583722           405757   -177965 (-30.49%)             35846             25735     -10111 (-28.21%)
    xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp                        609123           479055   -130068 (-21.35%)             35452             29145      -6307 (-17.79%)
    ----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
    
    Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
    Link: https://lore.kernel.org/r/20221104163649.121784-4-andrii@kernel.org
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Stable-dep-of: ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
    Signed-off-by: Pu Lehui <pulehui@huawei.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3645510cf926e6af2f4d44899370d7e5331c93bd
Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Sat Jul 29 04:13:18 2023 +0900

    nilfs2: fix use-after-free of nilfs_root in dirtying inodes via iput
    
    commit f8654743a0e6909dc634cbfad6db6816f10f3399 upstream.
    
    During unmount process of nilfs2, nothing holds nilfs_root structure after
    nilfs2 detaches its writer in nilfs_detach_log_writer().  Previously,
    nilfs_evict_inode() could cause use-after-free read for nilfs_root if
    inodes are left in "garbage_list" and released by nilfs_dispose_list at
    the end of nilfs_detach_log_writer(), and this bug was fixed by commit
    9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root in
    nilfs_evict_inode()").
    
    However, it turned out that there is another possibility of UAF in the
    call path where mark_inode_dirty_sync() is called from iput():
    
    nilfs_detach_log_writer()
      nilfs_dispose_list()
        iput()
          mark_inode_dirty_sync()
            __mark_inode_dirty()
              nilfs_dirty_inode()
                __nilfs_mark_inode_dirty()
                  nilfs_load_inode_block() --> causes UAF of nilfs_root struct
    
    This can happen after commit 0ae45f63d4ef ("vfs: add support for a
    lazytime mount option"), which changed iput() to call
    mark_inode_dirty_sync() on its final reference if i_state has I_DIRTY_TIME
    flag and i_nlink is non-zero.
    
    This issue appears after commit 28a65b49eb53 ("nilfs2: do not write dirty
    data after degenerating to read-only") when using the syzbot reproducer,
    but the issue has potentially existed before.
    
    Fix this issue by adding a "purging flag" to the nilfs structure, setting
    that flag while disposing the "garbage_list" and checking it in
    __nilfs_mark_inode_dirty().
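
    The idea of a flag that is raised only while the garbage list is being
    disposed can be modelled in a few lines of userspace C. This is a
    sketch under assumptions, not the nilfs2 code: struct model_fs and the
    model_* functions are hypothetical, and a plain bool stands in for
    whatever flag representation the filesystem actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-filesystem structure. */
struct model_fs {
	bool purging;    /* true only while the garbage list is disposed */
	int dirty_calls; /* counts dirtying operations that actually ran */
};

/* Mark-dirty path: skip the work entirely while purging is in progress. */
static int model_mark_inode_dirty(struct model_fs *fs)
{
	assert(fs != NULL);
	if (fs->purging)
		return 0; /* would otherwise touch already-released state */
	fs->dirty_calls++;
	return 0;
}

/* Dispose path: raise the flag around the loop that drops the inodes. */
static void model_dispose_garbage(struct model_fs *fs, size_t nr_inodes)
{
	assert(fs != NULL);
	fs->purging = true;
	for (size_t i = 0; i < nr_inodes; i++)
		model_mark_inode_dirty(fs); /* models iput() dirtying an inode */
	fs->purging = false;
}
```

    The flag is scoped to the dispose loop, so dirtying that happens at any
    other time, including during recovery on mount, is unaffected.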
    
    Unlike commit 9b5a04ac3ad9 ("nilfs2: fix use-after-free bug of nilfs_root
    in nilfs_evict_inode()"), this patch does not rely on ns_writer to
    determine whether to skip operations, so as not to break recovery on
    mount.  The nilfs_salvage_orphan_logs routine dirties the buffer of
    salvaged data before attaching the log writer, so changing
    __nilfs_mark_inode_dirty() to skip the operation when ns_writer is NULL
    will cause recovery write to fail.  The purpose of using the cleanup-only
    flag is to allow for narrowing of such conditions.
    
    Link: https://lkml.kernel.org/r/20230728191318.33047-1-konishi.ryusuke@gmail.com
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: syzbot+74db8b3087f293d3a13a@syzkaller.appspotmail.com
    Closes: https://lkml.kernel.org/r/000000000000b4e906060113fd63@google.com
    Fixes: 0ae45f63d4ef ("vfs: add support for a lazytime mount option")
    Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: <stable@vger.kernel.org> # 4.0+
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 36a3b560c78d11c4bf887e7b7e7c8d0aa819048e
Author: Colin Ian King <colin.i.king@gmail.com>
Date:   Thu Jul 27 17:09:30 2023 +0100

    radix tree test suite: fix incorrect allocation size for pthreads
    
    commit cac7ea57a06016e4914848b707477fb07ee4ae1c upstream.
    
    Currently the pthread allocation for each array item is based on the size
    of a pthread_t pointer, but it should be the size of the pthread_t
    structure, so the allocation can be smaller than required.  Fix this by
    using the size of each element in the pthreads array.
    
    Static analysis cppcheck reported:
    tools/testing/radix-tree/regression1.c:180:2: warning: Size of pointer
    'threads' used instead of size of its data. [pointerSize]
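
    The two ways of sizing the allocation can be contrasted in a small
    userspace sketch. The helper names threads_bytes_buggy() and
    alloc_threads() are hypothetical and are not taken from regression1.c.
    On platforms where pthread_t happens to be as wide as a pointer the two
    sizes coincide, so the bug can stay latent.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: bytes requested when sizing by the pointer type. */
static size_t threads_bytes_buggy(size_t nr)
{
	return nr * sizeof(pthread_t *);
}

/* Hypothetical helper: size the allocation by the array's element type. */
static pthread_t *alloc_threads(size_t nr)
{
	pthread_t *threads;

	assert(nr > 0);
	/* sizeof(*threads) follows the element type even if it changes. */
	threads = malloc(nr * sizeof(*threads));
	return threads;
}
```

    Writing sizeof(*threads) rather than naming a type keeps the size tied
    to the variable being assigned, which is the pattern cppcheck's
    pointerSize warning is steering towards.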
    
    Link: https://lkml.kernel.org/r/20230727160930.632674-1-colin.i.king@gmail.com
    Fixes: 1366c37ed84b ("radix tree test harness")
    Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
    Cc: Konstantin Khlebnikov <koct9i@gmail.com>
    Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 8d10284243b7fe26e884b1d1a7e387434d6c617a
Author: Tao Ren <rentao.bupt@gmail.com>
Date:   Fri Aug 4 15:14:03 2023 -0700

    hwmon: (pmbus/bel-pfe) Enable PMBUS_SKIP_STATUS_CHECK for pfe1100
    
    commit f38963b9cd0645a336cf30c5da2e89e34e34fec3 upstream.
    
    Skip status check for both pfe1100 and pfe3000 because the communication
    error is also observed on pfe1100 devices.
    
    Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
    Fixes: 626bb2f3fb3c ("hwmon: (pmbus) add driver for BEL PFE1100 and PFE3000")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230804221403.28931-1-rentao.bupt@gmail.com
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3ad4ba2b61124903d348b60026eb05150f4950a2
Author: Melissa Wen <mwen@igalia.com>
Date:   Mon Jul 31 07:35:05 2023 -0100

    drm/amd/display: check attr flag before set cursor degamma on DCN3+
    
    commit 96b020e2163fb2197266b2f71b1007495206e6bb upstream.
    
    Don't set a predefined degamma curve on the cursor plane if the cursor
    attribute flag is not set. Applying a degamma curve to the cursor by
    default breaks userspace expectations. Checking the flag before
    performing any color transformation prevents a too-dark cursor gamma on
    DCN3+ in many Linux desktop environments (KDE Plasma, GNOME,
    wlroots-based, etc.) as reported at:
    - https://gitlab.freedesktop.org/drm/amd/-/issues/1513
    
    This is the same approach followed by DCN2 drivers where the issue is
    not present.
    
    Fixes: 03f54d7d3448 ("drm/amd/display: Add DCN3 DPP")
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/1513
    Signed-off-by: Melissa Wen <mwen@igalia.com>
    Reviewed-by: Harry Wentland <harry.wentland@amd.com>
    Tested-by: Alex Hung <alex.hung@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 621204fca047450dbb4c2e3b47ab1dbf15e3c017
Author: Boris Brezillon <boris.brezillon@collabora.com>
Date:   Mon Jul 24 13:26:10 2023 +0200

    drm/shmem-helper: Reset vma->vm_ops before calling dma_buf_mmap()
    
    commit 07dd476f6116966cb2006e25fdcf48f0715115ff upstream.
    
    The dma-buf backend is supposed to provide its own vm_ops, but some
    implementations just have nothing special to do and leave vm_ops
    untouched, probably expecting this field to be zero initialized (this
    is the case with the system_heap implementation for instance).
    Let's reset vma->vm_ops to NULL to keep things working with these
    implementations.
    
    Fixes: 26d3ac3cb04d ("drm/shmem-helpers: Redirect mmap for imported dma-buf")
    Cc: <stable@vger.kernel.org>
    Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
    Reported-by: Roman Stratiienko <r.stratiienko@gmail.com>
    Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
    Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
    Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230724112610.60974-1-boris.brezillon@collabora.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 64e6253f6489610163b3055e85738074b61cbd5d
Author: Karol Herbst <kherbst@redhat.com>
Date:   Thu Jun 22 17:20:17 2023 +0200

    drm/nouveau/gr: enable memory loads on helper invocation on all channels
    
    commit 1cb9e2ef66d53b020842b18762e30d0eb4384de8 upstream.
    
    We have a lurking bug where Fragment Shader Helper Invocations can't load
    from memory. But this is actually required in OpenGL and is causing random
    hangs or failures in random shaders.
    
    It is unknown how widespread this issue is, but shaders hitting this can
    end up with infinite loops.
    
    We enable those only on Kepler and newer GPUs where we use our own
    firmware.
    
    Nvidia's firmware provides a way to set a kernelspace controlled list of
    mmio registers in the gr space from push buffers via MME macros.
    
    v2: drop code for gm200 and newer.
    
    Cc: Ben Skeggs <bskeggs@redhat.com>
    Cc: David Airlie <airlied@gmail.com>
    Cc: nouveau@lists.freedesktop.org
    Cc: stable@vger.kernel.org # 4.19+
    Signed-off-by: Karol Herbst <kherbst@redhat.com>
    Reviewed-by: Dave Airlie <airlied@redhat.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20230622152017.2512101-1-kherbst@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit bcd9eeb3a3090f9d257f4944dfb7138d5ad8e611
Author: Andrea Parri <parri.andrea@gmail.com>
Date:   Thu Aug 3 06:27:38 2023 +0200

    riscv,mmio: Fix readX()-to-delay() ordering
    
    commit 4eb2eb1b4c0eb07793c240744843498564a67b83 upstream.
    
    Section 2.1 of the Platform Specification [1] states:
    
      Unless otherwise specified by a given I/O device, I/O devices are on
      ordering channel 0 (i.e., they are point-to-point strongly ordered).
    
    which is not sufficient to guarantee that a readX() by a hart completes
    before a subsequent delay() on the same hart (cf. memory-barriers.txt,
    "Kernel I/O barrier effects").
    
    Set the I(nput) bit in __io_ar() to restore the ordering, align inline
    comments.
    
    [1] https://github.com/riscv/riscv-platform-specs
    
    Signed-off-by: Andrea Parri <parri.andrea@gmail.com>
    Link: https://lore.kernel.org/r/20230803042738.5937-1-parri.andrea@gmail.com
    Fixes: fab957c11efe ("RISC-V: Atomic and Locking Code")
    Cc: stable@vger.kernel.org
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 57772ae9b339578aecae35aae9c29da72258b6f6
Author: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Date:   Fri May 26 13:54:34 2023 +0300

    dmaengine: pl330: Return DMA_PAUSED when transaction is paused
    
    commit 8cda3ececf07d374774f6a13e5a94bc2dc04c26c upstream.
    
    pl330_pause() does not set anything to indicate paused condition which
    causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
    DMA flush after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
    DMA Rx rearm race"). The function comment for pl330_pause() claims
    pause is supported but resume is not, which is enough for 8250 DMA flush
    to work as long as DMA status reports DMA_PAUSED when appropriate.
    
    Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
    in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
    descriptor is PAUSED.
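
    The state handling described above can be modelled in standalone C.
    This is a simplified sketch, not the driver code: the MODEL_* enums,
    struct model_desc, and both functions are hypothetical names that only
    mirror the BUSY to PAUSED transition and the resulting status report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified descriptor and DMA status values. */
enum model_desc_status { MODEL_PREP, MODEL_BUSY, MODEL_PAUSED, MODEL_DONE };
enum model_dma_status {
	MODEL_DMA_COMPLETE,
	MODEL_DMA_IN_PROGRESS,
	MODEL_DMA_PAUSED,
};

struct model_desc {
	enum model_desc_status status;
};

/* Pause: mark every descriptor that is currently BUSY as PAUSED. */
static void model_pause(struct model_desc *descs, size_t n)
{
	assert(descs != NULL);
	for (size_t i = 0; i < n; i++)
		if (descs[i].status == MODEL_BUSY)
			descs[i].status = MODEL_PAUSED;
}

/* Status query: a paused descriptor reports PAUSED, not IN_PROGRESS. */
static enum model_dma_status model_tx_status(const struct model_desc *d)
{
	assert(d != NULL);
	switch (d->status) {
	case MODEL_DONE:
		return MODEL_DMA_COMPLETE;
	case MODEL_PAUSED:
		return MODEL_DMA_PAUSED;
	default:
		return MODEL_DMA_IN_PROGRESS;
	}
}
```

    A caller that pauses a transfer and then polls the status can use the
    PAUSED result to tell that the pause has taken effect.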
    
    Reported-by: Richard Tresidder <rtresidd@electromag.com.au>
    Tested-by: Richard Tresidder <rtresidd@electromag.com.au>
    Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/linux-serial/f8a86ecd-64b1-573f-c2fa-59f541083f1a@electromag.com.au/
    Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
    Link: https://lore.kernel.org/r/20230526105434.14959-1-ilpo.jarvinen@linux.intel.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3ca8f5c733c4691c4e75c25c27af2fb453e2a1a7
Author: Maciej Żenczykowski <maze@google.com>
Date:   Mon Aug 7 03:25:32 2023 -0700

    ipv6: adjust ndisc_is_useropt() to also return true for PIO
    
    commit 048c796beb6eb4fa3a5a647ee1c81f5c6f0f6a2a upstream.
    
    The upcoming (and nearly finalized):
      https://datatracker.ietf.org/doc/draft-collink-6man-pio-pflag/
    will update the IPv6 RA to include a new flag in the PIO field,
    which will serve as a hint to perform DHCPv6-PD.
    
    As we don't want DHCPv6 related logic inside the kernel, this piece of
    information needs to be exposed to userspace.  The simplest option is to
    simply expose the entire PIO through the already existing mechanism.
    
    Even without this new flag, the already existing PIO R (router address)
    flag (from RFC6275) cannot AFAICT be handled entirely in kernel,
    and provides useful information that should be exposed to userspace
    (the router's global address, for use by Mobile IPv6).
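
    As a rough illustration of what "expose the entire PIO" amounts to, the
    filter can be thought of as a membership test on the RA option type.
    The function is_useropt() below is a hypothetical userspace model, not
    ndisc_is_useropt() itself, and the set of other option types shown is
    an assumption about what was already exposed. The numeric values are
    the IANA-assigned Neighbor Discovery option types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Neighbor Discovery option type codes as assigned by IANA. */
enum {
	OPT_PREFIX_INFO    = 3,  /* PIO, RFC 4861 */
	OPT_RDNSS          = 25, /* RFC 8106 */
	OPT_DNSSL          = 31, /* RFC 8106 */
	OPT_CAPTIVE_PORTAL = 37, /* RFC 8910 */
	OPT_PREF64         = 38, /* RFC 8781 */
};

/* Hypothetical filter: should this RA option be passed to userspace? */
static bool is_useropt(uint8_t type)
{
	switch (type) {
	case OPT_PREFIX_INFO: /* the case this change adds */
	case OPT_RDNSS:
	case OPT_DNSSL:
	case OPT_CAPTIVE_PORTAL:
	case OPT_PREF64:
		return true;
	default:
		return false;
	}
}
```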
    
    Also cc'ing stable@ for inclusion in LTS, as while technically this is
    not quite a bugfix, and instead more of a feature, it is absolutely
    trivial and the alternative is manually cherrypicking into all Android
    Common Kernel trees - and I know Greg will ask for it to be sent in via
    LTS instead...
    
    Cc: Jen Linkova <furry@google.com>
    Cc: Lorenzo Colitti <lorenzo@google.com>
    Cc: David Ahern <dsahern@gmail.com>
    Cc: YOSHIFUJI Hideaki / 吉藤英明 <yoshfuji@linux-ipv6.org>
    Cc: stable@vger.kernel.org
    Signed-off-by: Maciej Żenczykowski <maze@google.com>
    Link: https://lore.kernel.org/r/20230807102533.1147559-1-maze@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 6cde60777675193ce9aa966ae0ab933f3bdaa832
Author: Sergei Antonov <saproj@gmail.com>
Date:   Tue Jun 27 15:05:49 2023 +0300

    mmc: moxart: read scr register without changing byte order
    
    commit d44263222134b5635932974c6177a5cba65a07e8 upstream.
    
    Conversion from big-endian to native is done in a common function
    mmc_app_send_scr(). Converting again in moxart_transfer_pio() is
    redundant. The double conversion on an LE system returns an incorrect
    SCR value, which leads to errors such as:
    
    mmc0: unrecognised SCR structure version 8
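
    Why a second conversion is harmful on a little-endian host can be shown
    with a plain byte swap. This is an illustrative sketch: swap32(),
    convert_once() and convert_twice() are hypothetical helpers, and
    swap32() models what a big-endian to native conversion does on a
    little-endian machine.

```c
#include <assert.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap. */
static uint32_t swap32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Convert a raw big-endian word once: correct on a little-endian host. */
static uint32_t convert_once(uint32_t raw_be)
{
	return swap32(raw_be);
}

/* Convert twice: the second swap undoes the first, giving the raw bytes. */
static uint32_t convert_twice(uint32_t raw_be)
{
	return swap32(swap32(raw_be));
}
```

    Since a byte swap is its own inverse, converting twice hands the caller
    the unconverted word, which is then decoded as if it were native.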
    
    Fixes: 1b66e94e6b99 ("mmc: moxart: Add MOXA ART SD/MMC driver")
    Signed-off-by: Sergei Antonov <saproj@gmail.com>
    Cc: Jonas Jensen <jonas.jensen@gmail.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20230627120549.2400325-1-saproj@gmail.com
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 3f00757ab41612d020eb48c787184bf05815897a
Author: Jason A. Donenfeld <Jason@zx2c4.com>
Date:   Mon Aug 7 15:21:27 2023 +0200

    wireguard: allowedips: expand maximum node depth
    
    commit 46622219aae2b67813fe31a7b8cb7da5baff5c8a upstream.
    
    In the allowedips self-test, nodes are inserted into the tree, but the
    test generated an even number of nodes. For checking maximum node depth
    there is of course also the root node, which makes the total number
    necessarily odd. With too few nodes added, it never triggered the
    maximum depth check like it should have. So, add 129 nodes instead of
    128 nodes, and do so with a more straightforward scheme, starting with
    all the bits set, and shifting over one each time. Then increase the
    maximum depth to 129, and choose a better name for that variable to
    make it clear that it represents depth as opposed to bits.
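
    The "all bits set, then shift over one each time" scheme yields 129
    distinct 128-bit keys: the all-ones value, 127 partially shifted
    values, and finally all zeroes. The sketch below only demonstrates
    that count; struct key128, shl1() and gen_keys() are hypothetical names
    and the code does not reproduce the selftest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit key, split into high and low 64-bit halves. */
struct key128 {
	uint64_t hi;
	uint64_t lo;
};

/* Shift a 128-bit value left by one bit, carrying from lo into hi. */
static struct key128 shl1(struct key128 k)
{
	struct key128 r = { (k.hi << 1) | (k.lo >> 63), k.lo << 1 };

	return r;
}

/*
 * Fill out[] with the all-ones key followed by each successive left
 * shift, stopping once the all-zero key has been stored.
 * Returns the number of keys written (at most cap).
 */
static size_t gen_keys(struct key128 *out, size_t cap)
{
	struct key128 k = { UINT64_MAX, UINT64_MAX };
	size_t n = 0;

	assert(out != NULL);
	while (n < cap) {
		out[n++] = k;
		if (k.hi == 0 && k.lo == 0)
			break;
		k = shl1(k);
	}
	return n;
}
```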
    
    Cc: stable@vger.kernel.org
    Fixes: e7096c131e51 ("net: WireGuard secure network tunnel")
    Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
    Link: https://lore.kernel.org/r/20230807132146.2191597-2-Jason@zx2c4.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit aeb974907642be095e38ecb1a400ca583958b2b0
Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Sun Aug 6 08:44:17 2023 +0900

    ksmbd: fix wrong next length validation of ea buffer in smb2_set_ea()
    
    commit 79ed288cef201f1f212dfb934bcaac75572fb8f6 upstream.
    
    There are multiple smb2_ea_info buffers in a FILE_FULL_EA_INFORMATION
    request from the client. ksmbd finds the next smb2_ea_info using
    ->NextEntryOffset of the current smb2_ea_info. ksmbd needs to validate
    the buffer length before accessing the next ea. ksmbd should check the
    buffer length using buf_len, not the next variable. next is the start
    offset of the current ea, obtained from the previous ea.
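
    Walking a chain of variable-length entries while validating against the
    remaining buffer length can be sketched as follows. This is a
    userspace model under assumptions: struct ea_hdr is a hypothetical
    layout rather than the smb2_ea_info wire format, and count_entries()
    is not a ksmbd function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry header; next_offset == 0 marks the last entry. */
struct ea_hdr {
	uint32_t next_offset;
	uint8_t name_len;
	uint8_t pad;
	uint16_t value_len;
};

/*
 * Count well-formed entries in buf. Returns -1 if an entry would extend
 * past the remaining buffer. Every check uses the remaining length
 * (buf_len), not the offset that led to the current entry.
 */
static int count_entries(const uint8_t *buf, size_t buf_len)
{
	int count = 0;

	assert(buf != NULL);
	for (;;) {
		struct ea_hdr h;

		if (buf_len < sizeof(h))
			return -1;
		memcpy(&h, buf, sizeof(h)); /* avoid unaligned access */
		if (buf_len < sizeof(h) + h.name_len + (size_t)h.value_len)
			return -1;
		count++;
		if (h.next_offset == 0)
			return count;
		if (h.next_offset < sizeof(h) || h.next_offset > buf_len)
			return -1;
		buf += h.next_offset;
		buf_len -= h.next_offset;
	}
}
```

    Decrementing buf_len on every step is what keeps the bound meaningful:
    the offset read from the previous entry says where the current entry
    starts, not how much data is left after it.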
    
    Cc: stable@vger.kernel.org
    Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-21598
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

commit 595679098bdcdbfbba91ebe07a2f7f208df93870
Author: Long Li <leo.lilong@huawei.com>
Date:   Sat Jul 29 11:36:18 2023 +0800

    ksmbd: validate command request size
    
    commit 5aa4fda5aa9c2a5a7bac67b4a12b089ab81fee3c upstream.
    
    In commit 2b9b8f3b68ed ("ksmbd: validate command payload size"), the
    request size is checked only for the SMB2_OPLOCK_BREAK_HE command; the
    request size of other commands is not checked, which is not expected.
    Fix it by adding a check for the request size of other commands.
    
    Cc: stable@vger.kernel.org
    Fixes: 2b9b8f3b68ed ("ksmbd: validate command payload size")
    Acked-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Long Li <leo.lilong@huawei.com>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>