1. 14 Mar, 2011 26 commits
    • ivtv: Fix corrective action taken upon DMA ERR interrupt to avoid hang · 0ab29c52
      Michael authored
      commit d213ad08 upstream.
      
      After upgrading the kernel from stock Ubuntu 7.10 to
      10.04, with no hardware changes, I started getting the dreaded DMA
      TIMEOUT errors, followed by inability to encode until the machine was
      rebooted.
      
      I came across a post from Andy in March
      (http://www.gossamer-threads.com/lists/ivtv/users/40943#40943) where he
      speculates that perhaps the corrective actions being taken after a DMA
      ERROR are not sufficient to recover the situation.  After some testing
      I suspect that this is indeed the case, and that in fact the corrective
      action may be what hangs the card's DMA engine, rather than the
      original error.
      
      Specifically these DMA ERROR IRQs seem to present with two different
      values in the IVTV_REG_DMASTATUS register: 0x11 and 0x13.  The current
      corrective action is to clear that status register back to 0x01 or
      0x03, and then issue the next DMA request.  In the case of a 0x13 this
      seems to result in a minor glitch in the encoded stream due to the
      failed transfer that was not retried, but otherwise things continue OK.
      In the case of a 0x11 the card's DMA write engine is never heard from
      again, and a DMA TIMEOUT follows shortly after.  0x11 is the killer.
      
      I suspect that the two cases need to be handled differently.  The
      difference is in bit 1 (0x02), which is set when the error is about to
      be successfully recovered, and clear when things are about to go bad.
      
      Bit 1 of DMASTATUS is described differently in different places either
      as a positive "write finished", or an inverted "write busy".  If we
      take the first definition, then when an error arises with state 0x11,
      it means that the write did not complete.   It makes sense to start a
      new transfer, as in the current code.  But if we take the second
      definition, then 0x11 means "an error but the write engine is still
      busy".  Trying to feed it a new transfer in this situation might not be
      a good idea.
      
      As an experiment, I added code to ignore the DMA ERROR IRQ if DMASTATUS
      is 0x11.  I.e., don't start a new transfer, don't clear our flags, etc.
      The hope was that the card would complete the transfer and issue an ENC
      DMA COMPLETE, either successfully or with an error condition there.
      However the card still hung.
      
      The only remaining corrective action being taken with a 0x11 status was
      then the write back to the status register to clear the error, i.e.
      DMASTATUS = DMASTATUS & 3.  This would have the effect of clearing the
      error bit 4, while leaving the lower bits indicating DMA write busy.
      
      Strangely enough, removing this write to the status register solved the
      problem!  If the DMA ERROR IRQ with DMASTATUS=0x11 is completely
      ignored, with no corrective action at all, then the card will complete
      the transfer and issue a new IRQ.  If the status register is written to
      when it has the value 0x11, then the DMA engine hangs.  Perhaps it's
      illegal to write to DMASTATUS while the read or write busy bit is set?
      At any rate, it appears that the current corrective action is indeed
      making things worse rather than better.
      
      I put together a patch that modifies ivtv_irq_dma_err to do the
      following:
      
      - Don't write back to IVTV_REG_DMASTATUS.
      - If write-busy is asserted, leave the card alone.  Just extend the
      timeout slightly.
      - If write-busy is de-asserted, retry the current transfer.
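The three rules above can be sketched as a small decision function. This is an illustrative userspace model, not the driver code: the bit masks follow this message's reading of DMASTATUS (0x10 is the error bit, 0x02 means "write finished", so a clear bit means "write busy"), and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Bit meanings assumed from the description above, not from a datasheet. */
#define DMASTATUS_WRITE_DONE 0x02u  /* clear => write engine still busy */
#define DMASTATUS_ERROR      0x10u  /* set on a DMA ERR interrupt */

/* Hypothetical names for the two possible reactions to a DMA ERR IRQ. */
enum dma_err_action {
	DMA_ERR_WAIT,   /* leave the card alone, just extend the timeout */
	DMA_ERR_RETRY   /* write engine is idle: retry the current transfer */
};

/* Pick a reaction from the status value; nothing is written back. */
static enum dma_err_action dma_err_action_for(uint32_t status)
{
	if ((status & DMASTATUS_WRITE_DONE) == 0)
		return DMA_ERR_WAIT;
	return DMA_ERR_RETRY;
}
```

With these assumptions a status of 0x11 maps to "wait" and 0x13 maps to "retry", which matches the two cases described earlier.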
      
      This has completely fixed my DMA TIMEOUT woes.  DMA ERR events still
      occur, but now they seem to be correctly handled.  0x11 events no
      longer hang the card, and 0x13 events no longer result in a glitch in
      the stream, as the failed transfer is retried.  I'm happy.
      
      I've inlined the patch below in case it is of interest.  As described
      above, I have a theory about why it works (based on a different
      interpretation of bit 1 of DMASTATUS), but I can't guarantee that my
      theory is correct.  There may be another explanation, or it may be a
      fluke.  Maybe ignoring that IRQ entirely would be equally effective?
      Maybe the status register read/writeback sequence is a race condition if
      the card changes it in the meantime?  Also as I am using a PVR-150
      only, I have not been able to test it on other cards, which may be
      especially relevant for 350s that support concurrent decoding.
      Hopefully the patch does not break the DMA READ path.
      
      Mike
      
      [awalls@md.metrocast.net: Modified patch to add a verbose comment, make minor
      brace reformats, and clear the error flags in the IVTV_REG_DMASTATUS iff both
      read and write DMA were not in progress.  Mike's conjecture about a race
      condition with the writeback is correct; it can confuse the DMA engine.]
      
      [Comment and analysis from the ML post by Michael <mike@rsy.com>]
      Signed-off-by: Andy Walls <awalls@md.metrocast.net>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • sched: Fix sched rt group scheduling when hierachy is enabled · 4065ec08
      Balbir Singh authored
      commit 0c3b9168 upstream.
      
      The current sched rt code is broken when it comes to hierarchical
      scheduling. This patch fixes two problems:
      
      1. It adds redundant enqueuing (harmless) when it finds a queue
         has tasks enqueued, but it has no run time and it is not
         throttled.
      
      2. The most important change is in sched_rt_rq_enqueue/dequeue.
         The code just picks the rt_rq belonging to the current cpu
         on which the period timer runs; the patch fixes this so that
         the correct rt_se is enqueued/dequeued.
      
      Tested with a simple hierarchy:

      /c/d, with c and d assigned similar runtimes of 50,000, and a
      while(1) loop running within "d". Both c and d get throttled. Without
      the patch, the task just stops running and never runs again (depending
      on where the sched_rt b/w timer runs). With the patch, the
      task is throttled and runs as expected.
      
      [ bharata, suggestions on how to pick the rt_se belonging to the
        rt_rq and correct cpu ]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110303113435.GA2868@balbir.in.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • drivers/net: Call netif_carrier_off at the end of the probe · fea891e3
      Ivan Vecera authored
      commit 0d672e9f upstream.
      
      Without a call to netif_carrier_off at the end of the probe, the operstate
      is unknown when the device is initially opened. By default the carrier is
      on, so when the device is opened and netif_carrier_on is called, the link
      watch event is not fired and operstate remains zero (unknown).
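The behaviour described above can be illustrated with a toy model in which a link-watch event fires only on an actual carrier state change. This is a simplified userspace sketch, not the kernel implementation; the struct, its fields, and the function names are hypothetical.

```c
#include <stdbool.h>

/* Toy model of a net device; field names are hypothetical. */
struct toy_netdev {
	bool no_carrier;        /* false by default, i.e. carrier looks "on" */
	int  linkwatch_events;  /* number of link-watch events fired so far */
};

/* Turning the carrier on fires an event only if it was off before. */
static void toy_carrier_on(struct toy_netdev *dev)
{
	if (dev->no_carrier) {
		dev->no_carrier = false;
		dev->linkwatch_events++;
	}
}

/* Turning the carrier off fires an event only if it was on before. */
static void toy_carrier_off(struct toy_netdev *dev)
{
	if (!dev->no_carrier) {
		dev->no_carrier = true;
		dev->linkwatch_events++;
	}
}
```

In this model, a device that starts with the carrier "on" sees no event from `toy_carrier_on()`, while a device that called `toy_carrier_off()` first does see one when the carrier comes up.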
      
      This patch fixes this behavior in forcedeth and r8169.
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • r8169: prevent RxFIFO induced loops in the irq handler. · 30b7cb31
      Francois Romieu authored
      commit f60ac8e7 upstream.
      
      While the RxFIFO interrupt is masked for most 8168, nothing prevents
      it from appearing in the irq status word. This is no excuse to crash.
      Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Ivan Vecera <ivecera@redhat.com>
      Cc: Hayes <hayeswang@realtek.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • r8169: RxFIFO overflow oddities with 8168 chipsets. · be70b4e2
      Francois Romieu authored
      commit 1519e57f upstream.
      
      Some experiment-based action to prevent my 8168 chipsets from locking up hard
      in the irq handler under load (pktgen ~1Mpps). Apparently a reset is not
      always mandatory (is it at all?).
      
      - RTL_GIGA_MAC_VER_12
      - RTL_GIGA_MAC_VER_25
        Missed ~55% packets. Note:
        - this is an old SiS 965L motherboard
        - the 8168 chipset emits (lots of) control frames towards the sender
      
      - RTL_GIGA_MAC_VER_26
        The chipset does not go into a frenzy of mac control pause when it
        crashes yet but it can still be crashed. It needs more work.
      Signed-off-by: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Ivan Vecera <ivecera@redhat.com>
      Cc: Hayes <hayeswang@realtek.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • r8169: use RxFIFO overflow workaround for 8168c chipset. · cb1f3fd3
      Ivan Vecera authored
      commit b5ba6d12 upstream.
      
      I found that one of the 8168c chipsets (specifically XID 1c4000c0) starts
      generating RxFIFO overflow errors. The result is an infinite loop in
      the interrupt handler, as RxFIFOOver is handled only for ...MAC_VER_11.
      With the workaround everything goes fine.
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Acked-by: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Hayes <hayeswang@realtek.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • netfilter: arpt_mangle: fix return values of checkentry · 1d328fe9
      Pablo Neira Ayuso authored
      commit 9d0db8b6 upstream.
      
      In 135367b8 "netfilter: xtables: change xt_target.checkentry return type",
      the type returned by checkentry was changed from boolean to int, but the
      return values were not adjusted.
      
      arptables: Input/output error
      
      This broke arptables with the mangle target since it returns true
      under success, which is interpreted by xtables as >0, thus
      returning EIO.
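The mismatch described above can be reproduced in a few lines. This is a minimal userspace sketch, assuming the caller-side convention the message describes (0 is success, a negative value is an errno, any positive value is mapped to -EIO); all function names here are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Caller-side interpretation as described above: 0 is success, a negative
 * value is passed through, and any positive value becomes -EIO. */
static int interpret_checkentry(int ret)
{
	if (ret < 0)
		return ret;
	if (ret > 0)
		return -EIO;
	return 0;
}

/* Old boolean style: "true" on success, which converts to 1. */
static int checkentry_old_style(bool valid)
{
	return valid;
}

/* Adjusted style: 0 on success, negative errno on failure. */
static int checkentry_new_style(bool valid)
{
	return valid ? 0 : -EINVAL;
}
```

A valid rule checked with the old-style function ends up as -EIO, which is the "Input/output error" shown above, while the adjusted style yields 0.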
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8767008a
      Vasiliy Kulikov authored
      commit 8909c9ad upstream.
      
      Since a8f80e8f any process with
      CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
      that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
      limited to /lib/modules/**.  However, the CAP_NET_ADMIN capability shouldn't
      allow anybody to load any module not related to networking.

      This patch restricts the ability to autoload modules to netdev modules
      with explicit aliases.  This fixes CVE-2011-1019.

      Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
      of loading netdev modules by name (without any prefix) for processes
      with CAP_SYS_MODULE to maintain compatibility with network scripts
      that autoload netdev modules by aliases like "eth0", "wlan0".
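The resulting policy can be sketched as a function that lists which module names a process may ask for when it touches an unknown interface. This is an illustrative userspace model only: the function name and buffer sizes are hypothetical, and the `netdev-` alias prefix is an assumption about how the "explicit aliases" are spelled, since the message itself does not name it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Fill `aliases` with the module names that may be requested for the
 * interface `ifname`, in order, and return how many there are (0 to 2).
 * CAP_NET_ADMIN only allows the prefixed alias; CAP_SYS_MODULE also
 * keeps the old behaviour of loading by bare name. */
static int netdev_module_aliases(bool cap_net_admin, bool cap_sys_module,
				 const char *ifname, char aliases[2][64])
{
	int n = 0;

	if (cap_net_admin)
		snprintf(aliases[n++], 64, "netdev-%s", ifname);
	if (cap_sys_module)
		snprintf(aliases[n++], 64, "%s", ifname);
	return n;
}
```

Under this model a process with only CAP_NET_ADMIN asking for "xfs" can request at most "netdev-xfs", which no filesystem module would provide.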
      
      Currently there are only three users of the feature in the upstream
      kernel: ipip, ip_gre and sit.
      
          root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	fffffff800001000
          CapEff:	fffffff800001000
          CapBnd:	fffffff800001000
          root@albatros:~# modprobe xfs
          FATAL: Error inserting xfs
          (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit
          sit: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit0
          sit0      Link encap:IPv6-in-IPv4
      	      NOARP  MTU:1480  Metric:1
      
          root@albatros:~# lsmod | grep sit
          sit                    10457  0
          tunnel4                 2957  1 sit
      
      For CAP_SYS_MODULE module loading is still relaxed:
      
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	ffffffffffffffff
          CapEff:	ffffffffffffffff
          CapBnd:	ffffffffffffffff
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          xfs                   745319  0
      
      Reference: https://lkml.org/lkml/2011/2/24/203
      
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Kees Cook <kees.cook@canonical.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ipvs: fix dst_lock locking on dest update · 46d5660d
      Julian Anastasov authored
      commit ff75f40f upstream.
      
      Fix dst_lock usage in __ip_vs_update_dest. We need
      _bh locking because the destination is updated in user context;
      without it, frequent destination updates can cause lockups.
      Problem reported by Simon Kirby. The bug was introduced
      in 2.6.37 by the "ipvs: changes for local real server"
      change.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
      Signed-off-by: Simon Horman <horms@verge.net.au>
      Cc: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • NFS: NFSv4 readdir loses entries · 73c2a0d9
      Chuck Lever authored
      commit d1205f87 upstream.
      
      On recent 2.6.38-rc kernels, connectathon basic test 6 fails on
      NFSv4 mounts of OpenSolaris with something like:
      
      > ./test6: readdir
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0
      > 	./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors
      > basic tests failed
      > Tests failed, leaving /mnt/klimt mounted
      > [cel@matisse cthon04]$
      
      I narrowed the problem down to nfs4_decode_dirent() reporting that the
      decode buffer had overflowed while decoding the entries for those
      missing files.
      
      verify_attr_len() assumes both its pointer arguments reside on the
      same page.  When these arguments point to locations on two different
      pages, verify_attr_len() can report false errors.  This can happen now
      that a large NFSv4 readdir result can span pages.
      
      We have reasonably good checking in nfs4_decode_dirent() anyway, so
      it should be safe to simply remove the extra checking.
      
      At a guess, this was introduced by commit 6650239a, "NFS: Don't use
      vm_map_ram() in readdir".
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • HID: hid-mosart: ignore buttons report · 1eac7b1d
      Benjamin Tissoires authored
      commit ad6d4267 upstream.
      
      This commit allows the device to be recognized as a touchscreen, and not a
      touchpad by xf86-input-evdev.
      
      The device has 2 modes. The first one is an emulation of a touchscreen by
      sending left and right buttons, and the second mode is the one used in
      dual-touch (sending trackingID, touch and so on).

      That's why there is a hid report containing left and right buttons
      (9000001 and 9000002). The point is that xorg relies on these fields to
      determine if it's a touchpad or a touchscreen.
      Clearing the report (return -1) makes xorg detect it out of the box
      as a quite pleasant (dual)touchscreen.
      Signed-off-by: Benjamin Tissoires <benjamin.tissoires@enac.fr>
      Acked-by: Chase Douglas <chase.douglas@canonical.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: James Sharam <james.sharam@googlemail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • nfsd: wrong index used in inner loop · b3d26c5d
      roel authored
      commit 3ec07aa9 upstream.
      
      Index i was already used in the outer loop
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • pcc-cpufreq: don't load driver if get_freq fails during init. · ace8f453
      Naga Chumbalkar authored
      commit 1f858ef2 upstream.
      
      Return 0 on failure. This will cause the initialization of the driver
      to fail and prevent the driver from loading if the BIOS cannot handle
      the PCC interface command to "get frequency". Otherwise, the driver
      will load and display a very high value like "4294967274" (which is
      actually -EINVAL) for frequency:
      
      # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq
      4294967274
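The odd value above is simply a negative errno viewed as an unsigned number. A minimal sketch, assuming a 32-bit `unsigned int` and the Linux value EINVAL = 22; the helper name is hypothetical.

```c
/* A frequency getter that returns unsigned int cannot signal an error
 * with a negative errno: the value wraps around modulo 2^32 instead. */
static unsigned int errno_as_freq(int err)
{
	return (unsigned int)err;
}
```

With these assumptions, `errno_as_freq(-22)` is 4294967296 - 22 = 4294967274, the same number reported by cpuinfo_cur_freq above.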
      Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • netfilter: nf_log: avoid oops in (un)bind with invalid nfproto values · 2110938f
      Jan Engelhardt authored
      commit 9ef0298a upstream.
      
      Like many other places, we have to check that the array index is
      within allowed limits, or otherwise, a kernel oops and other nastiness
      can ensue when we access memory beyond the end of the array.
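The kind of check meant here can be sketched as follows. This is a generic userspace illustration of validating an index before using it, not the netfilter code; the table, its size of 13 slots, and the function name are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a per-protocol logger table. */
static const char *toy_loggers[13];

/* Bind a logger name to a protocol slot, rejecting out-of-range indexes
 * before the array is touched. */
static int toy_log_bind(unsigned int pf, const char *logger)
{
	if (pf >= ARRAY_SIZE(toy_loggers))
		return -EINVAL;
	toy_loggers[pf] = logger;
	return 0;
}
```

Because the index is unsigned and compared against the table size, a bogus protocol number supplied by the caller is refused instead of being used as an offset past the end of the array.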
      
      [ 5954.115381] BUG: unable to handle kernel paging request at 0000004000000000
      [ 5954.120014] IP:  __find_logger+0x6f/0xa0
      [ 5954.123979]  nf_log_bind_pf+0x2b/0x70
      [ 5954.123979]  nfulnl_recv_config+0xc0/0x4a0 [nfnetlink_log]
      [ 5954.123979]  nfnetlink_rcv_msg+0x12c/0x1b0 [nfnetlink]
      ...
      
      The problem goes back to v2.6.30-rc1~1372~1342~31 where nf_log_bind
      was decoupled from nf_log_register.
      
      Reported-by: Miguel Di Ciurcio Filho <miguel.filho@gmail.com>,
        via irc.freenode.net/#netfilter
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • mm: fix possible cause of a page_mapped BUG · dae78169
      Hugh Dickins authored
      commit a3e8cc64 upstream.
      
      Robert Swiecki reported a BUG_ON(page_mapped) from a fuzzer, punching
      a hole with madvise(,, MADV_REMOVE).  That path is under mutex, and
      cannot be explained by lack of serialization in unmap_mapping_range().
      
      Reviewing the code, I found one place where vm_truncate_count handling
      should have been updated, when I switched at the last minute from one
      way of managing the restart_addr to another: mremap move changes the
      virtual addresses, so it ought to adjust the restart_addr.
      
      But rather than exporting the notion of restart_addr from memory.c, or
      converting to restart_pgoff throughout, simply reset vm_truncate_count
      to 0 to force a rescan if mremap move races with preempted truncation.
      
      We have no confirmation that this fixes Robert's BUG,
      but it is a fix that's worth making anyway.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Kerin Millar <kerframil@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ixgbe: fix for 82599 erratum on Header Splitting · 1e405a0b
      Don Skidmore authored
      commit a124339a upstream.
      
      We have found a hardware erratum on 82599 hardware that can lead to
      unpredictable behavior when Header Splitting mode is enabled.  So
      we are no longer enabling this feature on affected hardware.
      
      Please see the 82599 Specification Update for more information.
      Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
      Tested-by: Stephen Ko <stephen.s.ko@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ath9k: Fix ath9k prevents CPU to enter C3 states · 4ae96ec5
      Mohammed Shafi Shajakhan authored
      This is a backport of upstream commit 0f5cd459.
      
      The DMA latency issue is observed only on Intel Pine Trail platforms,
      but in the driver we had a default PM-QOS value of 55. This caused
      unnecessary power consumption and battery drain on other platforms.

      Remove the pm-qos handling from the driver code and address the throughput
      issue on Intel Pine Trail platforms in user space using any one of
      the scripts at the links below:
      
      http://www.kernel.org/pub/linux/kernel/people/mcgrof/scripts/cpudmalatency.c
      http://johannes.sipsolutions.net/files/netlatency.c.txt
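For orientation, the user-space approach can be sketched as below. This is not a copy of either linked script; it is a minimal sketch that assumes the standard PM QoS device node /dev/cpu_dma_latency, which accepts a 32-bit latency bound in microseconds and honours it for as long as the file stays open. The function name is hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Write a latency bound (in microseconds) to a PM QoS device node and
 * return the open descriptor. The request is dropped when the descriptor
 * is closed, so the caller must keep it open. Returns -1 on failure. */
static int hold_cpu_dma_latency(const char *path, int32_t usec)
{
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (write(fd, &usec, sizeof(usec)) != (ssize_t)sizeof(usec)) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller on an affected machine would typically run something like `hold_cpu_dma_latency("/dev/cpu_dma_latency", 55)` with sufficient privileges and then sleep for as long as the bound is needed; the value 55 is the former driver default mentioned above.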
      
      More details can be found in the following bugzilla link:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=27532
      
      Signed-off-by: Thomas Bächler <thomas@archlinux.org>
      Acked-by: Mohammed Shafi Shajakhan <mshajakhan@atheros.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • RxRPC: Fix v1 keys · 5f7d5448
      Anton Blanchard authored
      commit f009918a upstream.
      
      Commit 33941284 (RxRPC: Allow key payloads to be passed in XDR form)
      broke klog for me. I notice the v1 key struct had a kif_version field
      added:
      
      -struct rxkad_key {
      -       u16     security_index;         /* RxRPC header security index */
      -       u16     ticket_len;             /* length of ticket[] */
      -       u32     expiry;                 /* time at which expires */
      -       u32     kvno;                   /* key version number */
      -       u8      session_key[8];         /* DES session key */
      -       u8      ticket[0];              /* the encrypted ticket */
      -};
      
      +struct rxrpc_key_data_v1 {
      +       u32             kif_version;            /* 1 */
      +       u16             security_index;
      +       u16             ticket_length;
      +       u32             expiry;                 /* time_t */
      +       u32             kvno;
      +       u8              session_key[8];
      +       u8              ticket[0];
      +};
      
      However the code in rxrpc_instantiate strips it away:
      
      	data += sizeof(kver);
      	datalen -= sizeof(kver);
      
      Removing kif_version fixes my problem.
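The effect of the extra field can be checked with offsetof. This is a userspace sketch using fixed-width types in place of the kernel's u16/u32/u8, with the trailing ticket[] member left out for simplicity; the struct and function names are hypothetical. Since the version word has already been skipped, the remaining payload starts at security_index, so a layout that still begins with kif_version reads every field 4 bytes too late.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout with the extra leading field, as in the "+" side of the diff. */
struct key_v1_with_version {
	uint32_t kif_version;
	uint16_t security_index;
	uint16_t ticket_length;
	uint32_t expiry;
	uint32_t kvno;
	uint8_t  session_key[8];
};

/* Layout once kif_version is removed again. */
struct key_v1_plain {
	uint16_t security_index;
	uint16_t ticket_length;
	uint32_t expiry;
	uint32_t kvno;
	uint8_t  session_key[8];
};

/* Byte offset at which each layout expects security_index to start. */
static size_t offset_with_version(void)
{
	return offsetof(struct key_v1_with_version, security_index);
}

static size_t offset_plain(void)
{
	return offsetof(struct key_v1_plain, security_index);
}
```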
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) · 4ff51e31
      Neil Horman authored
      commit e9e3d724 upstream.
      
      The "bad_page()" page allocator sanity check was reported recently (call
      chain as follows):
      
        bad_page+0x69/0x91
        free_hot_cold_page+0x81/0x144
        skb_release_data+0x5f/0x98
        __kfree_skb+0x11/0x1a
        tcp_ack+0x6a3/0x1868
        tcp_rcv_established+0x7a6/0x8b9
        tcp_v4_do_rcv+0x2a/0x2fa
        tcp_v4_rcv+0x9a2/0x9f6
        do_timer+0x2df/0x52c
        ip_local_deliver+0x19d/0x263
        ip_rcv+0x539/0x57c
        netif_receive_skb+0x470/0x49f
        :virtio_net:virtnet_poll+0x46b/0x5c5
        net_rx_action+0xac/0x1b3
        __do_softirq+0x89/0x133
        call_softirq+0x1c/0x28
        do_softirq+0x2c/0x7d
        do_IRQ+0xec/0xf5
        default_idle+0x0/0x50
        ret_from_intr+0x0/0xa
        default_idle+0x29/0x50
        cpu_idle+0x95/0xb8
        start_kernel+0x220/0x225
        _sinittext+0x22f/0x236
      
      It occurs because an skb with a fraglist was freed from the tcp
      retransmit queue when it was acked, but a page on that fraglist had
      PG_Slab set (indicating it was allocated from the slab allocator), which
      means the free path above can't safely free it via put_page.

      We tracked this back to an nfsv4 setacl operation, in which the nfs code
      attempted to convert the passed-in buffer to an array of pages in
      __nfs4_proc_set_acl, which gets used by the skb->frags list in
      xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
      to a page struct via virt_to_page, but the vfs allocates the buffer via
      kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
      kmalloc and free it later in the tcp ack path with put_page, so we need
      to either:
      
      1) ensure that when we create the list of pages, no page struct has
         PG_Slab set
      
       or
      
      2) not use a page list to send this data
      
      Given that these buffers can be multiple pages and arbitrarily sized, I
      think (1) is the right way to go.  I've written the below patch to
      allocate a page from the buddy allocator directly and copy the data over
      to it.  This ensures that we have a put_page free-able page for every
      entry that winds up on an skb frag list, so it can be safely freed when
      the frame is acked.  We do a put page on each entry after the
      rpc_call_sync call so as to drop our own reference count to the page,
      leaving only the ref count taken by tcp_sendpages.  This way the data
      will be properly freed when the ack comes in.
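The copy step described above can be modelled in userspace as follows. This is only a sketch of the idea, not the patch itself: malloc() stands in for the page allocator, free() for put_page, and the page size, function name, and parameters are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Copy `len` bytes from `buf` into freshly allocated page-sized chunks,
 * so that no chunk aliases the caller's original allocation.
 * Returns the number of pages filled, or -1 on failure, in which case
 * every page allocated so far has been released again. */
static int toy_buf_to_pages(const void *buf, size_t len,
			    void **pages, int max_pages)
{
	const char *src = buf;
	int n = 0;

	while (len > 0) {
		size_t chunk = len < TOY_PAGE_SIZE ? len : TOY_PAGE_SIZE;

		if (n == max_pages)
			goto fail;
		pages[n] = malloc(TOY_PAGE_SIZE);
		if (!pages[n])
			goto fail;
		memcpy(pages[n], src, chunk);
		n++;
		src += chunk;
		len -= chunk;
	}
	return n;

fail:
	while (n > 0)
		free(pages[--n]);
	return -1;
}
```

The caller owns the returned pages and releases each one after the send completes, mirroring the "put page on each entry" step in the description.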
      
      Successfully tested by myself to solve the above oops.
      
      Note, as this is the result of a setacl operation that exceeded a page
      of data, I think this amounts to a local DoS triggerable by an
      unprivileged user, so I'm CCing security on this as well.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Trond Myklebust <Trond.Myklebust@netapp.com>
      CC: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • drivers/misc/bmp085.c: add MODULE_DEVICE_TABLE · c2cd42c5
      Axel Lin authored
      commit 97e419a0 upstream.
      
      The device table is required to load modules based on modaliases.
      Signed-off-by: Axel Lin <axel.lin@gmail.com>
      Cc: Shubhrajyoti D <shubhrajyoti@ti.com>
      Cc: Christoph Mair <christoph.mair@gmail.com>
      Cc: Jonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ALSA: hda - Don't set to D3 in Cirrus errata init verbs · 6ba0ed9b
      Takashi Iwai authored
      commit 38c07641 upstream.
      
      The errata init verbs for CS42xx codecs contain the verbs to set
      the power-state of SPDIF nodes to D3, which seem to break the SPDIF
      output on some MacBooks.  Since this is executed during the power-up
      initialization, we shouldn't turn them down there.
      Reported-by: Arun Raghavan <arun.raghavan@collabora.co.uk>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ALSA: HDA: Realtek: Fixup jack detection to input subsystem · 6b01a69d
      David Henningsson authored
      commit f0ce2799 upstream.
      
      This patch fixes an error in the jack detection reporting,
      causing the jack detection sometimes not to be reported
      correctly to the input subsystem. It should apply to several
      Realtek codecs.
      Signed-off-by: David Henningsson <david.henningsson@canonical.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ASoC: Fix WM9081 platform data initialisation · 39ebd87f
      Mark Brown authored
      commit 3ee845ac upstream.
      
      It went AWOL in the multi-component conversion.
      Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Acked-by: Liam Girdwood <lrg@slimlogic.co.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • keyboard: integer underflow bug · 4a160e5f
      Dan Carpenter authored
      commit b652277b upstream.
      
      The "ct" variable should be an unsigned int.  Both struct kbdiacrs
      ->kb_cnt and struct kbd_data ->accent_table_size are unsigned ints.
      
      Making it signed causes a problem in KBDIACRUC because the user could
      set the sign bit and cause a buffer overflow.
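The difference between the two types shows up in the upper-bound check. A minimal sketch, assuming a table capacity of 256 entries; the constant and function names are hypothetical and this is not the keyboard driver code.

```c
#define TOY_MAX_DIACR 256   /* hypothetical table capacity */

/* Buggy variant: a negative count is "less than the maximum", so it
 * passes the check even though it is not a usable size. */
static int count_ok_signed(int ct)
{
	return ct < TOY_MAX_DIACR;
}

/* Fixed variant: the same bit pattern is a huge unsigned value and is
 * rejected by the very same comparison. */
static int count_ok_unsigned(unsigned int ct)
{
	return ct < TOY_MAX_DIACR;
}
```

A count of -1 is accepted by the signed check, while the unsigned check refuses it, so a later size computation never sees the bogus value.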
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • Amit Shah's avatar
      virtio: console: Don't access vqs if device was unplugged · 310f5e48
      Amit Shah authored
      commit d7a62cd0
      
       upstream.
      
      If a virtio-console device gets unplugged while a port is open, a
      subsequent close() call on the port accesses vqs to free up buffers.
      This can lead to a crash.
      
      The buffers are already freed up as a result of the call to
      unplug_ports() from virtcons_remove().  The fix is to simply not access
      vq information if port->portdev is NULL.
      Reported-by: juzhang <juzhang@redhat.com>
      Signed-off-by: Amit Shah <amit.shah@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • cpuset: add a missing unlock in cpuset_write_resmask() · 381d256c
      Li Zefan authored
      commit b75f38d6 upstream.
      
      Don't forget to release cgroup_mutex if alloc_trial_cpuset() fails.
      
      [akpm@linux-foundation.org: avoid multiple return points]
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  2. 07 Mar, 2011 14 commits
    • Linux 2.6.37.3 · af53c4ea
      Greg Kroah-Hartman authored
    • arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS. · de243d98
      Ian Campbell authored
      commit d11327ad upstream.
      
      NETDEV_NOTIFY_PEERS is an explicit request by the driver to send a link
      notification, while NETDEV_UP/NETDEV_CHANGEADDR generate link
      notifications as a sort of side effect.
      
      In the latter cases the sysctl option is present because link
      notification events can have undesired effects, e.g. if the link is
      flapping. I don't think this applies in the case of an explicit
      request from a driver.
      
      This patch makes NETDEV_NOTIFY_PEERS unconditional; if preferred, we
      could add a new sysctl for this case which defaults to on.
      
      This change causes Xen post-migration ARP notifications (which cause
      switches to relearn their MAC tables etc) to be sent by default.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      [reported to solve hyperv live migration problem - gkh]
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Mike Surcouf <mike@surcouf.co.uk>
      Cc: Hank Janssen <hjanssen@microsoft.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • DNS: Fix a NULL pointer deref when trying to read an error key [CVE-2011-1076] · 6a5c4eb0
      David Howells authored
      commit 1362fa07 upstream.
      
      When a DNS resolver key is instantiated with an error indication, attempts to
      read that key will result in an oops because user_read() is expecting there to
      be a payload - and there isn't one [CVE-2011-1076].
      
      Give the DNS resolver key its own read handler that returns the error cached in
      key->type_data.x[0] as an error rather than crashing.
      
      Also make the kenter() at the beginning of dns_resolver_instantiate() limit the
      amount of data it prints, since the data is not necessarily NUL-terminated.
      
      The buggy code was added in:
      
      	commit 4a2d7892
      	Author: Wang Lei <wang840925@gmail.com>
      	Date:   Wed Aug 11 09:37:58 2010 +0100
      	Subject: DNS: If the DNS server returns an error, allow that to be cached [ver #2]
      
      This can trivially be reproduced by any user with the following program
      compiled with -lkeyutils:
      
      	#include <stdlib.h>
      	#include <keyutils.h>
      	#include <err.h>
      	static char payload[] = "#dnserror=6";
      	int main()
      	{
      		key_serial_t key;
      		key = add_key("dns_resolver", "a", payload, sizeof(payload),
      			      KEY_SPEC_SESSION_KEYRING);
      		if (key == -1)
      			err(1, "add_key");
      		if (keyctl_read(key, NULL, 0) == -1)
      			err(1, "read_key");
      		return 0;
      	}
      
      What should happen is that keyctl_read() reports error 6 (ENXIO) to the user:
      
      	dns-break: read_key: No such device or address
      
      but instead the kernel oopses.
      
      This cannot be reproduced with the 'keyctl add' or 'keyctl padd' commands
      as both of those cut the data down below the NUL termination that must be
      included in the data.  Without this dns_resolver_instantiate() will return
      -EINVAL and the key will not be instantiated such that it can be read.
      
      The oops looks like:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
      IP: [<ffffffff811b99f7>] user_read+0x4f/0x8f
      PGD 3bdf8067 PUD 385b9067 PMD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:19.0/irq
      CPU 0
      Modules linked in:
      
      Pid: 2150, comm: dns-break Not tainted 2.6.38-rc7-cachefs+ #468                  /DG965RY
      RIP: 0010:[<ffffffff811b99f7>]  [<ffffffff811b99f7>] user_read+0x4f/0x8f
      RSP: 0018:ffff88003bf47f08  EFLAGS: 00010246
      RAX: 0000000000000001 RBX: ffff88003b5ea378 RCX: ffffffff81972368
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003b5ea378
      RBP: ffff88003bf47f28 R08: ffff88003be56620 R09: 0000000000000000
      R10: 0000000000000395 R11: 0000000000000002 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: ffffffffffffffa1
      FS:  00007feab5751700(0000) GS:ffff88003e000000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000010 CR3: 000000003de40000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process dns-break (pid: 2150, threadinfo ffff88003bf46000, task ffff88003be56090)
      Stack:
       ffff88003b5ea378 ffff88003b5ea3a0 0000000000000000 0000000000000000
       ffff88003bf47f68 ffffffff811b708e ffff88003c442bc8 0000000000000000
       00000000004005a0 00007fffba368060 0000000000000000 0000000000000000
      Call Trace:
       [<ffffffff811b708e>] keyctl_read_key+0xac/0xcf
       [<ffffffff811b7c07>] sys_keyctl+0x75/0xb6
       [<ffffffff81001f7b>] system_call_fastpath+0x16/0x1b
      Code: 75 1f 48 83 7b 28 00 75 18 c6 05 58 2b fb 00 01 be bb 00 00 00 48 c7 c7 76 1c 75 81 e8 13 c2 e9 ff 4c 8b b3 e0 00 00 00 4d 85 ed <41> 0f b7 5e 10 74 2d 4d 85 e4 74 28 e8 98 79 ee ff 49 39 dd 48
      RIP  [<ffffffff811b99f7>] user_read+0x4f/0x8f
       RSP <ffff88003bf47f08>
      CR2: 0000000000000010
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
      cc: Wang Lei <wang840925@gmail.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • e1000e: disable broken PHY wakeup for ICH10 LOMs, use MAC wakeup instead · ef4fba5d
      Bruce Allan authored
      commit 4def99bb upstream.
      
      When support for 82577/82578 was added[1] in 2.6.31, PHY wakeup was
      inadvertently enabled (even though it does not function properly) on ICH10
      LOMs.  This patch makes it so that the ICH10 LOMs use MAC wakeup instead
      as was done with the initial support for those devices (i.e. 82567LM-3,
      82567LF-3 and 82567V-4).
      
      [1] commit a4f58f54
      
      Reported-by: Aurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • dccp: fix oops on Reset after close · 850d1ff2
      Gerrit Renker authored
      commit 720dc34b upstream.
      
      This fixes a bug in the order of dccp_rcv_state_process() that still permitted
      reception even after closing the socket. A Reset after close thus causes a NULL
      pointer dereference by not preventing operations on an already torn-down socket.
      
       dccp_v4_do_rcv()
      	|
      	| state other than OPEN
      	v
       dccp_rcv_state_process()
      	|
      	| DCCP_PKT_RESET
      	v
       dccp_rcv_reset()
      	|
      	v
       dccp_time_wait()
      
       WARNING: at net/ipv4/inet_timewait_sock.c:141 __inet_twsk_hashdance+0x48/0x128()
       Modules linked in: arc4 ecb carl9170 rt2870sta(C) mac80211 r8712u(C) crc_ccitt ah
       [<c0038850>] (unwind_backtrace+0x0/0xec) from [<c0055364>] (warn_slowpath_common)
       [<c0055364>] (warn_slowpath_common+0x4c/0x64) from [<c0055398>] (warn_slowpath_n)
       [<c0055398>] (warn_slowpath_null+0x1c/0x24) from [<c02b72d0>] (__inet_twsk_hashd)
       [<c02b72d0>] (__inet_twsk_hashdance+0x48/0x128) from [<c031caa0>] (dccp_time_wai)
       [<c031caa0>] (dccp_time_wait+0x40/0xc8) from [<c031c15c>] (dccp_rcv_state_proces)
       [<c031c15c>] (dccp_rcv_state_process+0x120/0x538) from [<c032609c>] (dccp_v4_do_)
       [<c032609c>] (dccp_v4_do_rcv+0x11c/0x14c) from [<c0286594>] (release_sock+0xac/0)
       [<c0286594>] (release_sock+0xac/0x110) from [<c031fd34>] (dccp_close+0x28c/0x380)
       [<c031fd34>] (dccp_close+0x28c/0x380) from [<c02d9a78>] (inet_release+0x64/0x70)
      
      The fix is to test the socket state first. Receiving a packet in Closed state
      now also produces the required "No connection" Reset reply of RFC 4340, 8.3.1.
      Reported-and-tested-by: Johan Hovold <jhovold@gmail.com>
      Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • r8169: disable ASPM · 048882f2
      Stanislaw Gruszka authored
      commit ba04c7c9 upstream.
      
      It has been known for some time that ASPM causes trouble on r8169, i.e. it
      makes the device randomly stop working without any errors in dmesg.
      
      Tomi Leppikangas now reports that a system with an r8169 device hangs
      with MCE errors when ASPM is enabled:
      https://bugzilla.redhat.com/show_bug.cgi?id=642861#c4
      
      Let's disable ASPM for r8169 devices altogether, to avoid problems with
      r8169 PCIe devices, at least for some users.
      Reported-by: Tomi Leppikangas <tomi.leppikangas@gmail.com>
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • carl9170: add Airlive X.USB a/b/g/n USBID · 5d70c044
      Jan Puk authored
      commit c86664e5 upstream.
      
      "AirLive X.USB now works perfectly under a Linux
      environment!"
      Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • nilfs2: fix regression that i-flag is not set on changeless checkpoints · 401f84f1
      Ryusuke Konishi authored
      commit 72746ac6 upstream.
      
      According to the report from Jiro SEKIBA titled "regression in
      2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
      later kernels, lscp command no longer displays "i" flag on checkpoints
      that snapshot operations or garbage collection created.
      
      This is a regression of nilfs2 checkpointing function, and it's
      critical since it broke behavior of a part of nilfs2 applications.
      For instance, snapshot manager of TimeBrowse gets to create
      meaningless snapshots continuously; snapshot creation triggers another
      checkpoint, but applications cannot distinguish whether the new
      checkpoint contains meaningful changes or not without the i-flag.
      
      This patch fixes the regression and brings that application behavior
      back to normal.
      Reported-by: Jiro SEKIBA <jir@unicus.jp>
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Jiro SEKIBA <jir@unicus.jp>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • p54usb: add Senao NUB-350 usbid · cea4131a
      Christian Lamparter authored
      commit 2b799a6b upstream.
      
      Reported-by: Mark Davis
      Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • ath9k_htc: Fix an endian issue · 523e1947
      Sujith Manoharan authored
      commit 2c27392d upstream.
      
      The stream length/tag fields have to be in little endian
      format. Fixing this makes the driver work on big-endian
      platforms.
      
      Tested-by: raghunathan.kailasanathan@wipro.com
      Signed-off-by: Sujith Manoharan <Sujith.Manoharan@atheros.com>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • block: kill loop_mutex · f90b4780
      Petr Uzel authored
      commit fd51469f upstream.
      
      The following steps lead to a deadlock in the kernel:
      
      dd if=/dev/zero of=img bs=512 count=1000
      losetup -f img
      mkfs.ext2 /dev/loop0
      mount -t ext2 -o loop /dev/loop0 mnt
      umount mnt/
      
      Stacktrace:
      [<c102ec04>] irq_exit+0x36/0x59
      [<c101502c>] smp_apic_timer_interrupt+0x6b/0x75
      [<c127f639>] apic_timer_interrupt+0x31/0x38
      [<c101df88>] mutex_spin_on_owner+0x54/0x5b
      [<fe2250e9>] lo_release+0x12/0x67 [loop]
      [<c10c4eae>] __blkdev_put+0x7c/0x10c
      [<c10a4da5>] fput+0xd5/0x1aa
      [<fe2250cf>] loop_clr_fd+0x1a9/0x1b1 [loop]
      [<fe225110>] lo_release+0x39/0x67 [loop]
      [<c10c4eae>] __blkdev_put+0x7c/0x10c
      [<c10a59d9>] deactivate_locked_super+0x17/0x36
      [<c10b6f37>] sys_umount+0x27e/0x2a5
      [<c10b6f69>] sys_oldumount+0xb/0xe
      [<c1002897>] sysenter_do_call+0x12/0x26
      [<ffffffff>] 0xffffffff
      
      Regression since 2a48fc0a, which introduced the private
      loop_mutex as part of the BKL removal process.
      
      As per [1], the mutex can be safely removed.
      
      [1] http://www.gossamer-threads.com/lists/linux/kernel/1341930
      
      Addresses: https://bugzilla.novell.com/show_bug.cgi?id=669394
      Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=29172
      
      Signed-off-by: Petr Uzel <petr.uzel@suse.cz>
      Reviewed-by: Nikanth Karthikesan <knikanth@suse.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue() · 29bf78d4
      Tejun Heo authored
      commit 255bb490 upstream.
      
      blk-flush decomposes a flush into a sequence of multiple requests.  On
      completion of a request, the next one is queued; however, the block layer
      must not implicitly call into q->request_fn() directly from the completion
      path.  This makes the queue behave unexpectedly when seen from the
      drivers and violates the assumption that q->request_fn() is called
      with process context + queue_lock.
      
      This patch makes the following two changes to blk-flush to make sure
      q->request_fn() is not called directly from the request completion path.
      
      - blk_flush_complete_seq_end_io() now asks __blk_run_queue() to always
        use kblockd instead of calling directly into q->request_fn().
      
      - queue_next_fseq() uses ELEVATOR_INSERT_REQUEUE instead of
        ELEVATOR_INSERT_FRONT so that elv_insert() doesn't try to unplug the
        request queue directly.
      
      Reported by Jan in the following threads.
      
       http://thread.gmane.org/gmane.linux.ide/48778
       http://thread.gmane.org/gmane.linux.ide/48786
      
      stable: applicable to v2.6.37.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jan Beulich <JBeulich@novell.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • block: add @force_kblockd to __blk_run_queue() · 02d63a7d
      Tejun Heo authored
      commit 1654e741 upstream.
      
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is prerequisite for fixing ide oops caused by the new
              blk-flush implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • blk-throttle: Do not use kblockd workqueue for throtl work · e8157889
      Vivek Goyal authored
      commit 450adcbe upstream.
      
      o Dominik Klein reported a system hang issue while doing some blkio
        throttling testing.
      
        https://lkml.org/lkml/2011/2/24/173
      
      o Some tracing revealed that CFQ was not dispatching any more jobs because
        queue unplug was not happening. Queue unplug was not happening because
        the unplug work was not being called, as there was one throttling work
        item on the same CPU which had not finished yet. And the throttling work
        had not finished because it was trying to dispatch a bio to CFQ, but all
        the request descriptors were consumed, so it was put to sleep.
      
      o So basically it is a cyclic dependency between the CFQ unplug work and
        the throtl dispatch work. Tejun suggested using a separate workqueue
        for such cases.
      
      o This patch uses a separate workqueue for throttle related work and
        does not rely on kblockd workqueue anymore.
      Reported-by: Dominik Klein <dk@in-telegence.net>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>