Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance-after-reboot-or-bios-change workaround is to power cycle, bug fixed in PRO/EVO firmware 2B6QCXP7/2B7QCXE7

Posted by Paul Braren on Jan 2 2017 (updated on Jul 24 2017) in
  • Storage
  • HowTo
    This article has had several significant additions since it was first published, incorporating new findings from amazingly helpful readers who commented below. These include the new 960 EVO firmware 2B7QCXE7, which seems to have solved the issue for me on Jan 29 2017, and firmware 2B6QCXP7, which fixed the PRO several months later, once I was finally able to download it.

    I'm apparently starting out 2017 smoothing out bumps-in-the-road that a few home lab enthusiasts ran into recently during their normally fun holiday season of home lab rebuilds. 'Twas the season for many IT workers to be off from work, myself included. This is my second such article this year, but at least we've put these issues to rest quickly, together. It's times like these that I'm so glad I have a web site with comments, so we can all spend more time working and playing than fixing.

    Backstory

    [Screenshots: Supermicro's Tested M.2 List, as linked from its Xeon D product pages, Jan 2017]

    Turns out Supermicro is making progress with M.2 NVMe SSD support for Xeon D by posting links to their Tested M.2 List on their Xeon D product pages, screenshot samples above. Even though the list only includes SSDs by Intel, Micron, and Toshiba that they've tested, it's still a baby step forward. Supermicro support folks have actually told me that they didn't support M.2 NVMe drives at all, only the much slower AHCI M.2 drives or SATA3 SSDs. Yikes! Their scripted words were understandably horrifying for me personally, given how many turn-key Bundle 2 systems had already shipped. So I politely pointed out how wonderfully well the Samsung 950 PRO had been working with my SYS-5028D-TN4T for many months, with any OS I threw at it. I even shared some of the fastest benchmarks out there that featured their products:

    [Screenshot: Google search results for "how to boot from NVMe", Jan 02 2017]

    Despite all that, and the year that has gone by since, Supermicro still seems inexplicably reluctant to test Samsung M.2 NVMe drives, which have arguably been the most popular M.2 NVMe drives on Amazon for over a year now. It would seem they could sure use a friend at Samsung, and a budget that gives them time to expand well beyond their current list of only SATA3 M.2 devices and half-height PCIe devices. I would propose they divert most of those testing funds toward the forward-looking M.2 NVMe form factor instead, since this strange issue shows such testing is clearly needed now. It still makes me nervous that no M.2 NVMe devices have made their list yet.

    First sign of a problem, along with a power-cycle fix (phew!)

    Robert from Austria reported the strange Samsung 960 PRO disappearance behavior, in one of the very first reports from anybody getting their hands on one of these world's fastest "gumstick" storage devices, anywhere. Well, from somebody who wasn't a lucky blogger receiving an early review unit, anyway. It was just one data point and something for me to watch, but my own early pre-order was messed up and delayed, so all I could really do was wait and see.

    More reports came in

    wdmia reported his 960 EVO experiencing the same issue, with the same fix working. Now this really had my attention.

    TheGrayGhost reported his Supermicro case #SM1701012591: a 960 EVO in a Flex ATX Xeon D motherboard with the latest BIOS 1.0b, same issue, same fix. Dang, this spans Xeon D Mini-ITX and Flex ATX models alike, a.k.a. all SuperServers with X10SDV in the motherboard model number.

    The problem

    A Supermicro SuperServer owner goes into the BIOS and saves changes. The OS then boots, but doesn't "see" the 960 PRO or EVO drives at all.

    The workaround

    After saving any BIOS changes, don't just power down, but also remove the power cord for a few seconds. When plugging back in and powering up, the drives are visible to whatever OS you're running again, with no data lost.

    This is more like a workaround, while we wait for a proper fix that will likely come in the form of a BIOS update at some future date.

    This is one of the stranger issues I've seen, and I don't have any technical explanation for it; that would be pure conjecture. It's a relatively minor inconvenience, but I'd say it could be pretty darn serious if you don't have a way to remove all power from a remote system!
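    For anyone who does need to apply the workaround to a remote box, the sequence boils down to: graceful shutdown, cut all power (Aaron's testing further down found an OS or IPMI power-off alone doesn't clear it, so standby power has to go too), wait, restore power, and boot. Here's a minimal Python sketch of that sequence. The Host and Outlet interfaces are hypothetical placeholders, not any real vendor API; you'd wire them to whatever switched outlet and IPMI tooling you actually have. The 25 second default is simply how long I left wall power off in my Jan 29 test, not a documented minimum, and the short pause before powering back on is an assumption to let the BMC come up.

```python
"""Hypothetical remote power-cycle sequence for the 960 PRO/EVO workaround.

Assumes a switched outlet that can cut *all* power, including standby power
to the BMC/IPMI. Outlet and Host are placeholder interfaces, not a vendor API.
"""
import time
from typing import Callable, Protocol


class Outlet(Protocol):
    def set_power(self, on: bool) -> None: ...


class Host(Protocol):
    def shutdown_gracefully(self) -> None: ...
    def is_powered_off(self) -> bool: ...
    def power_on(self) -> None: ...


def power_cycle_workaround(
    host: Host,
    outlet: Outlet,
    unplugged_seconds: float = 25.0,
    shutdown_timeout: float = 300.0,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Shut down, drop wall power, wait, restore it, boot. Returns a step log."""
    log = []
    host.shutdown_gracefully()
    log.append("shutdown requested")

    # Never yank power from a host that is still running.
    waited = 0.0
    while not host.is_powered_off():
        if waited >= shutdown_timeout:
            raise TimeoutError("host did not power off; not cutting power")
        sleep(poll_interval)
        waited += poll_interval
    log.append("host off")

    # An OS or IPMI power-off alone was reported NOT to bring the drive back,
    # so the outlet itself has to drop power.
    outlet.set_power(False)
    log.append("outlet off")
    sleep(unplugged_seconds)

    outlet.set_power(True)
    log.append("outlet on")
    sleep(poll_interval)  # assumed: give the BMC a moment before power-on
    host.power_on()
    log.append("power on requested")
    return log
```

    The sketch refuses to cut power if the host never reports being off, since pulling the plug on a running OS is worse than a missing datastore.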

    Of course, I've reported these findings, along with the video below, to Supermicro.

    The fix

    There is currently no known fix. A proper fix will likely involve a future BIOS update, whatever comes after BIOS 1.1c for Mini-ITX models, or after BIOS 1.0b for Flex ATX models.

    Video

    I was able to easily replicate the issue today using my Supermicro SuperServer SYS-5028D-TN4T system, as seen in the video below. I'm currently using BIOS 1.1c and IPMI 3.46. No data was lost, only time and a bit of product confidence.

    How to fix Samsung 960 PRO/EVO M.2 NVMe drive disappearance at Supermicro Xeon D reboot/BIOS change

    Jan 07 2017 Update

    I am so glad I published this article, because comments coming in suggest this problem is more serious than I initially thought. I want to be sure folks see what blogthis wrote at 8:06pm ET today:

    I have additional serious information on this M2 Samsung 960 PRO NVMe issue (1TB).
    Important: This is WITHOUT ever entering the BIOS setup screen whatsoever.
    Reproduced on: X10SDV-TLN4F (SYS-5028D-TN4T-12C). BIOS: 1.1c. IPMI: 3.46.
    --Scenario-1:--
    Configure internal SATA drive to boot any OS, in my case Win10x64; graceful shutdown.
    Insert a bootable USB stick, in my situation one that has ESXi 6 installed.
    BIOS auto-inserts the new boot device in its list, and changes its boot order to accommodate.
    System hangs, reboots, then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
    --Scenario-2:--
    Configure IPMI to boot from external ISO, in my case IPMI 3.46 uses HTML5 Net Path.
    BIOS auto-changes the boot device name and settings in its boot device list.
    System then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
    --Summary--
    You can reproduce this issue multiple ways, WITHOUT ever even entering the BIOS setup. I feel this makes this issue worthy of escalation from a mere inconvenience to a seriously high priority issue. [edit: IPMI typo v1.46 corrected to v3.46]


    Jan 08 2017 Update

    Gert commented:

    Thanks to reporting what you have discovered. I can also report, that it is not possible to run vSAN with Samsung SM961 MZVKW512HMJP. It is possible to configure vSAN and everything is reported okay, but when you try to created a VM and install OS, the connection to the NVMe drive is unstable/lost, so the installation or operation is hanging/stalled. When you use the NVMe drive as a a VMFS partition every works fine until you shut down the server and the NVMe drive disappear. I don't have the time to go through the log files, but I agree, that Supermicro has serious issues with NVMe drives from Samsung, so I warns against buying the drives for Supermicro servers. Supermicro has confirmed in an email to me that the drive is supported :-(

    This is obviously a serious bug. While full vSAN support (see VMware vSAN HCL) is unlikely to arrive for consumer drives like the 960 series, because of their generally lower TBW ratings compared to enterprise drives and their lack of supercapacitors, the sort of nasty behavior Gert reports makes even simple experimentation with an unsupported vSAN quite challenging.

    There's more from blogthis, who also wrote:

    Thank you for that, hopefully this helps too..

    Did you catch another reader's post that m.2 NVMe SSD's are indeed supported as per this official SuperMicro link(?):
    https://www.supermicro.nl/products/nfo/M.2.cfm

    Duplicate info on their USA/Global site:
    https://www.supermicro.com/products/nfo/M.2.cfm

    And yes, SuperMicro is now listing official support for Samsung-brand m.2 NVMe drives:
    https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor

    (You may want to pass along to your SuperMicro contact, that it appears SuperMicro is now specifically ADVERTISING Samsung m.2 NVMe compatibility.... so hopefully that lights a fire and motivates them to push this issue up their support-chain for the greater good).

    Edit: The direct-link to Samsung m.2 NVMe drives has an "&" in their URL so it won't save here properly, but if you click the link, then the "Samsung" link for "Show Qualified M.2-SSD SKUs" on that page, you'll see the list..

    AND: On that same page, note Supermicro has listed in the "..Systems which support M.2-NVMe-SSD: (Mini-Tower Systems.. SYS-5028D-TN4T).." I think that says pretty clearly this is intended to work.

    I've changed the article title accordingly, from:
    How to fix Samsung 960 PRO/EVO M.2 NVMe drive disappearance after any Supermicro Xeon D SuperServer BIOS change is saved
    to:
    How to workaround Samsung 960 PRO/EVO/SM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
    and added an appropriate warning sentence to the related articles.

    Let's dive in. First, check out the table of supported Samsung drive results:
    https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor

    Below is a cut-and-paste of the first column of that table. I've done a Google search of samsung.com for you, which lands you on this general page:
    http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/

    I've gone ahead and created hyperlinks to each product page, and added the key detail that Supermicro doesn't mention anywhere: these are SM951 and SM953 drives.

    The point here is that Supermicro does actually have some 2015 vintage Samsung M.2 NVMe drives on their supported SSD lists. This sets a precedent that some Samsung M.2 NVMe drives have been tested by Supermicro in the past. The hope here is that Supermicro would test and support newer Samsung M.2 NVMe drives such as the Samsung 960 series of products (960 PRO/960 EVO/SM961/PM961).


    Jan 15 2017 Update

    This fascinating and sad saga is roaring along, with no end yet in sight. I think this particular comment by Aaron sums things up very nicely:

    I've been trawling forums all day and the problem is reported pretty much being reported everywhere with the new Samsung 960s, might not be anything to do with Supermicro at all. My money is on either a bad firmware bug on the 960 end or problems related to NVMe 1.2 protocol implementation on most chipsets (could explain why the 950 with NVMe 1.1 works so well).
    A lot of forum replies to posters with the problem are just putting it down to the usual newbie problems with NVMe booting/UEFI and stuff but it seems pretty obvious there is a common cause only affecting the 960s.

    I tested the bejesus out of my TN4T systems (bios 1.1b) last night and have simplified the observed symptoms to these:
    • SSD cannot be booted from or chosen as a boot device
    • any shutdown of the system will result in the SSD being missing next boot until you physically remove power (OS shutdown or IPMI behave exactly the same)
    • SSD seems to work fine writing and reading data otherwise
    • any reboot of the device (OS, IPMI, physical button) retains detecting the SSD next boot

    Having been burned by Samsung firmware issues on the 850 Evo my bet is they've messed something up rushing the release of this series. I fear this is going to be one of those problems that lurks around for a good 6-12 months before it gets fixed :(
    damnit

    Note that Aaron opened Supermicro case #SM1701142769 as well.
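    To keep these observations straight when re-testing (before and after a firmware update, say), it can help to write them down as a tiny model. The Python sketch below encodes only what was reported: a graceful shutdown (OS or IPMI alike), or a saved BIOS change as in the original reports, leaves the SSD missing until wall power is removed, while a warm reboot changes nothing. The class and method names are hypothetical, it's a test aid rather than an explanation, and it says nothing about the actual root cause. Treating a warm reboot as unable to bring back an already-missing drive is my assumption, read from Aaron's "missing next boot until you physically remove power".

```python
from dataclasses import dataclass


@dataclass
class Reported960Symptoms:
    """Toy model of the pre-fix symptoms readers reported on X10SDV boards.

    Hypothetical test aid: it records observations, not the root cause.
    """

    missing_until_ac_removed: bool = False

    def visible(self) -> bool:
        return not self.missing_until_ac_removed

    def graceful_shutdown_then_power_on(self) -> bool:
        # OS shutdown and IPMI power-off were reported to behave identically.
        self.missing_until_ac_removed = True
        return self.visible()

    def save_bios_changes_then_boot(self) -> bool:
        # The original trigger: saving BIOS changes (or the BIOS changing
        # its own boot order) left the drive missing.
        self.missing_until_ac_removed = True
        return self.visible()

    def warm_reboot(self) -> bool:
        # Reported to keep a visible SSD visible; assumed NOT to bring back
        # one that is already missing.
        return self.visible()

    def remove_ac_power_then_boot(self) -> bool:
        # The workaround: pulling wall power clears the condition.
        self.missing_until_ac_removed = False
        return self.visible()


if __name__ == "__main__":
    m = Reported960Symptoms()
    print(m.warm_reboot())                      # True
    print(m.graceful_shutdown_then_power_on())  # False
    print(m.warm_reboot())                      # False
    print(m.remove_ac_power_then_boot())        # True
```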

    I've changed the article title accordingly, from:
    How to workaround Samsung 960 PRO/EVO/SM961/PM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
    to:
    Samsung 960 PRO/EVO/SM961/PM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported

    The PM961 isn't in the title of this article, since I've not yet found reports of problems with that drive. I would expect it to show the same issue, though, since its closely related retail twin, the 960 EVO, is afflicted with drive invisibility problems.


    Jan 17 2017 Update

    It does not appear that a proper resolution to this is coming near term. This naturally has folks asking about return options. I found the following info about my Amazon order's return window:

    Samsung 960 EVO Series - 1TB PCIe NVMe - M.2 Internal SSD (MZ-V6E1T0BW)
    Sold by: Amazon.com LLC
    Return eligible through Jan 31, 2017


    Jan 26 2017 Update

    Samsung 960 EVO firmware 2B7QCXE7 arrives

    Word of a new firmware first arrived earlier today from Peter M. Weidich in Copenhagen; so nice of him to leave his comment below. This gives 960 EVO owners hope that we might have a fix. No word yet on the 960 PRO or SM961 though, and I'm not sure the OEM SM961 can be updated, at least not as easily. All of Samsung's Consumer SSD downloads start here:
    http://www.samsung.com/semiconductor/minisite/ssd/download/consumer.html

    I reached Samsung and explained that I hadn't heard back from support in days, and that I had just passed my 30-day RMA request window. They took note of my SR# and said I should have an authorization in 3-5 days. I then finally got a tip from them that allowed me to reach them by phone. Here's how it goes:

    How to reach Samsung Consumer SSD Technical Support Representatives

    1. Dial 800-SAM-SUNG (800.726.7864)
    2. wait for automated attendant to stop talking
    3. say "Solid State Drive"
    4. wait for automated attendant to stop talking
    5. say "Representative"
    6. wait for automated attendant to stop talking
    7. say "Representative"

    I found the fake keyboard noises and long "thinking" intervals kind of funny, then waited patiently after a ring was heard, with eventual success.

    Here's what one Samsung Consumer SSD Technical Support representative said

    • the polite person I spoke with told me I was the first 960 owner he'd heard of reporting this issue
    • the Samsung 960 EVO (any size) did get a new firmware called 2B7QCXE7 this week; he wasn't sure which day, and didn't know whether it will fix this
    • there likely won't be an ISO bootable DOS media firmware updater for a few more weeks
    • there was a boot ROM in the 950 PRO products for legacy systems that isn't included with the 960 line
    • this 2B7QCXE7 is not available as a separate download at the usual site:
      http://www.samsung.com/semiconductor/minisite/ssd/download/tools.html
    • this 2B7QCXE7 is available from within the new Samsung Magician, currently at 5.0.0.790, which it pulls down automatically from the web, direct download URL:
      http://www.samsung.com/semiconductor/minisite/ssd/downloads/software/Samsung_Magician_Installer.zip
    • Samsung doesn't supply Release Notes for consumer drive firmware

    I hope to try 2B7QCXE7 (corrected from the typo 2B7QCE7 on 1/29/2017) on one of my two 1TB 960 EVO SSDs tonight, and report back here with how it goes. My focus is primarily on whether the vanishing drive after a BIOS change (or graceful power-down) goes away. If it does, I would think it's highly likely the other 960 EVO owners reporting similar issues on various brands of motherboards will also enjoy the same easy fix. But I'm getting ahead of myself...

    Screenshot courtesy of Peter M. Weidich

    Jan 27 2017 Update

    I have not been able to reproduce the loss of drive visibility in the slightly different drive configuration my system was in. As a result, this test, firmware upgrade, and retest cycle is taking me longer than I had expected. I will have an update to this article later today.


    Jan 28 2017 Update

    Still not able to reproduce the issue reliably, so I didn't flash my firmware yet. But preliminarily, there appears to be good news reported by Dan below:

    Updated Samsung 960s firmware with magician software today, now 960 nvme persisted through non-power cycling reboot. Still some issues with use as boot device though.

    No word on which Samsung 960 he's referring to yet, 960 EVO or 960 PRO? I asked, awaiting response...

    As for folks with home vSAN aspirations, I never claimed using consumer drives for such things would be a good idea; see "Dispelling myths about VSAN and flash" for details about Power Loss Protection.

    Samsung 960s should be fantastic for straight-up ultra-fast VMFS datastores, assuming these firmware fixes arrive for both drives and resolve the early adopter pains only some folks are experiencing. I'm basing this on my own Samsung 950 PRO 512GB, which has been performing admirably, holding up to a solid year of vSphere 6.0/6.5 abuse.


    Jan 29 2017 Update

    The original article above erroneously stated the new firmware was 2B7QCE7, that was a typo, it's actually 2B7QCXE7, with all instances above now fixed.

    Good news (preliminarily)

    As seen on video below!
    Today, I was able to replicate the drive disappearance after a simple reboot, when using ESXi 6.5 booted from USB. I then recorded video of the following steps, uncut; the only edit was a 5X speed-up of the reboot segments, which had no voice-over anyway:

    1. shut the Xeon D system / ESXi 6.5 system down gracefully
    2. removed all wall power for about 25 seconds
    3. restored power
    4. powered up
    5. showed 960 EVO was now visible again in ESXi 6.5
    6. rebooted over to a fresh copy of Windows 10
    7. downloaded and installed Samsung Magician 5
    8. updated the firmware on one 960 EVO 1TB from 1B7QCXE7 to 2B7QCXE7
    9. when prompted to shut down afterward, I did so
    10. powered up, verified 2B7QCXE7 in Magician (it didn't show right away)
    11. rebooted to ESXi 6.5
    12. my VMFS datastore on my Samsung 960 EVO 1TB was visible
    13. rebooted, drive still visible, success!
    14. rebooted, made BIOS change, drive still visible, success!

    Not as good news

    I still have not heard whether a firmware update is available for Samsung 960 PRO owners, and the outlook for afflicted SM961 owners may be glum, since historically Magician won't update the firmware on OEM drives.

    Next steps

    I'm on a plane early tomorrow and away all week, so I won't be deciding whether to process those 2 RMAs for my 960 EVOs anytime soon. I'll test a few more reboots tonight to continue building confidence that this issue is resolved, and will update the article if something new crops up.

    Ideally, I'll sneak in some boot from NVMe tests as well, after first evacuating all the data off that VMFS datastore that's on there now. Luckily, that's easy with vSphere's Storage vMotion.

    I'll also leave my other 960 at the original 1B7QCXE7 firmware for now, in case I need to test with Angelbird Wings or the (just received) Amfeltec 2-way and 1-way riser cards later on. Even if those do work around the issue, though, I'd much prefer a firmware fix, of course.

    Before the upgrade, my Samsung 960 EVO 1TB was at 1B7QCXE7.
    After the upgrade, my Samsung 960 EVO 1TB is at 2B7QCXE7.

    Video #2

    Samsung 960 EVO firmware update from 1B7QCXE7 to 2B7QCXE7 seems to resolve disappearance problem

    Jan 30 2017 Update

    I used Veeam Agent for Microsoft Windows (beta) to restore my fresh Windows 10 installation onto the 960 EVO 1TB, and yes, boot from NVMe is working fine on this SuperServer, this is good! Since the firmware upgrade, no further incidents of drive disappearance have been seen.

    Yep, it's bootable!
    performing well, with Samsung's NVMe driver 2.1 installed, PC cover off w/ no airflow across M.2
    performing well, with Samsung's NVMe driver 2.1 installed, PC cover on, fan speed Full
    performing well, with Samsung's NVMe driver 2.1 installed, PC cover on, fan speed Standard

    With many reboots and BIOS changes tested without incident, including a full test of boot from NVMe functionality, I felt it was safe to make the following additional change to the article title, from:

    • Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported

    to:

    • Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported, firmware 2B7QCXE7 appears to fix EVO

    Feb 10 2017 Update

    The ISO way to save your day is here! Yes, Samsung has now published the 960 EVO's 2B7QCXE7 firmware update in ISO form, so there's no need to boot your VMware ESXi system over to Windows just to get your 960 EVO firmware updated. Instead, you can get Samsung_SSD_960_EVO_2B7QCXE7.iso from the usual samsung.com/semiconductor/minisite/ssd/download/tools.html site, which looks like this:

    [Screenshot: samsung.com SSD tools download page, Feb 10 2017]

    Alternatively, you can use these direct download links:

    I have not yet had a chance to test this ISO, but I'm glad I kept my 2nd 1TB 960 EVO at the older firmware so I can give this method a try!

    Still no info on 960 PRO firmware.


    Feb 14 2017 Update


    I have ordered an AOC-SLG3-2M2 to add to my home lab testbed.

    I also have this very kind comment that I shared on Twitter:


    "Thanks to Paul and http://Tinkertry.com for being the only place on the internet that was talking about this!" (960 EVO firmware fixes)
    Jan 30 2017 by Dan Emmelhainz at TinkerTry here


    Jul 24 2017 Update

    I changed the article's title from

    Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported, firmware 2B7QCXE7 appears to fix EVO

    to

    Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance-after-reboot-or-bios-change workaround is to power cycle, bug fixed in PRO/EVO firmware 2B6QCXP7/2B7QCXE7

    since months of testing seem to confirm that the firmware fixes really worked.
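    If you'd rather confirm which firmware a drive is running without booting Windows for Samsung Magician, here's a minimal Python sketch for a Linux environment, such as a live USB stick. It assumes the Linux NVMe driver's sysfs attributes /sys/class/nvme/nvme*/model and /sys/class/nvme/nvme*/firmware_rev, and that the drives report model strings containing "960 EVO" or "960 PRO"; the only version data in it are the two fixed releases named in this article. It deliberately doesn't guess whether some other version string is older or newer, so anything but an exact match is flagged for a manual check. The SM961 and other OEM drives aren't covered, since no fix for them is named here.

```python
"""Check Samsung 960 PRO/EVO firmware against the fixed versions (Linux only).

Assumes the Linux NVMe driver's sysfs attributes
/sys/class/nvme/nvme*/model and /sys/class/nvme/nvme*/firmware_rev.
"""
from pathlib import Path

# Firmware releases that fixed the disappearance bug, per this article.
FIXED_FIRMWARE = {
    "960 PRO": "2B6QCXP7",
    "960 EVO": "2B7QCXE7",
}


def classify(model: str, firmware: str) -> str:
    """Compare one drive's firmware with the fix release for its family.

    Returns "fixed" on an exact match, "check" for any other version (it may
    be older, like 1B7QCXE7, or a later release; no ordering is assumed),
    and "n/a" for drives this article names no fix for.
    """
    model_upper = model.upper()
    for family, fixed in FIXED_FIRMWARE.items():
        if family in model_upper:
            return "fixed" if firmware.strip() == fixed else "check"
    return "n/a"


def scan(sysfs_root: Path = Path("/sys/class/nvme")) -> list[tuple[str, str, str, str]]:
    """Return (controller, model, firmware, verdict) for each NVMe controller."""
    results = []
    for ctrl in sorted(sysfs_root.glob("nvme*")):
        model_file = ctrl / "model"
        fw_file = ctrl / "firmware_rev"
        if not (model_file.is_file() and fw_file.is_file()):
            continue  # not a controller directory we understand
        model = model_file.read_text().strip()
        firmware = fw_file.read_text().strip()
        results.append((ctrl.name, model, firmware, classify(model, firmware)))
    return results


if __name__ == "__main__":
    for name, model, firmware, verdict in scan():
        print(f"{name}: {model} firmware {firmware} -> {verdict}")
```

    Taking the sysfs root as a parameter keeps the logic testable against a fake directory tree, and the exact-match rule avoids declaring a drive safe based on a guess about Samsung's version numbering.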

    See also:

    "Update Samsung 960 EVO firmware with ISO for bootable USB drive over iKVM, followed by speed tests" published by Paul Braren on YouTube on Apr 27 2017

    See also at TinkerTry

    first-look-samsung-960-evo-m2-nvme-temps
    ubiquiti-mpower-pro-8-port-outlet-measures-watts