Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported

Posted by Paul Braren on Jan 2 2017 (updated on Jan 17 2017) in
  • Storage
  • HowTo

    This article has had several significant additions since initially published, to incorporate new findings from amazingly helpful readers who commented below.

    I'm apparently starting out 2017 smoothing out bumps-in-the-road that a few home lab enthusiasts ran into recently during their normally fun holiday season of home lab rebuilds. 'Twas the season for many IT workers to be off from work, myself included. This is my second such article this year, but at least we've put these issues to rest quickly, together. It's times like these that I'm so glad I have a web site with comments, so we can all spend more time working and playing than fixing.

    Backstory

    [Image: Supermicro M.2 list]

    Turns out Supermicro is making progress with M.2 NVMe SSD support for Xeon D by posting links to their Tested M.2 List on their Xeon D Product Pages. Even though the list only includes SSDs by Intel, Micron, and Toshiba that they've tested, it's still a baby step forward. Supermicro support folks have actually told me that they didn't support M.2 NVMe drives at all, only the much slower AHCI M.2 drives or SATA3 SSDs. Yikes! Of course, their scripted words were understandably horrifying for me personally, given how many turn-key Bundle 2 systems had already shipped. So I politely pointed out how wonderfully well the Samsung 950 PRO was working with my SYS-5028D-TN4T, for many months, with any OS I threw at it. I even shared some of the fastest benchmarks out there that featured their products:

    [Image: Google search for "how to boot from NVMe", Jan 02 2017]

    Despite all that, and the year that has gone by since, Supermicro apparently still has an inexplicable reluctance to test Samsung M.2 NVMe drives, which might well be the most popular M.2 NVMe drives on Amazon for over a year now. It would seem they could sure use a friend at Samsung, and a budget to give them time to expand well beyond their current list of only SATA3 M.2 devices, and half-height PCIe devices. I would propose they divert most of those testing funds toward the forward-looking M.2 NVMe form-factor instead, since apparently it's needed now that we've seen this strange issue crop up. It still makes me nervous that no M.2 NVMe devices have made their list yet.

    First sign of a problem, along with a power-cycle fix (phew!)

    Robert from Austria reported the strange Samsung 960 PRO disappearance behavior, in one of the very first reports from anybody getting their hands on one of these world's fastest "gumstick" storage devices, anywhere. Well, anybody who wasn't a lucky blogger receiving an early review unit anyway. It was just one data point and something for me to watch, but with my own early pre-order messed up and delayed, all I could really do was wait and see.

    More reports came in

    wdmia reported his 960 EVO experiencing the same issue, with the same fix working. Now this really had my attention.

    TheGrayGhost reported his Supermicro case #SM1701012591: a 960 EVO with a Flex ATX Xeon D motherboard with the latest BIOS 1.0b, same issue, same fix. Dang, this is all Xeon D, both Mini-ITX and Flex ATX models, aka all SuperServers with X10SDV in the motherboard model number.

    The problem

    A Supermicro SuperServer owner goes into the BIOS and saves changes. The OS then boots, but doesn't "see" the 960 PRO or EVO drives at all.

    The workaround

    After saving any BIOS changes, don't just power down, but also remove the power cord for a few seconds. When plugging back in and powering up, the drives are visible to whatever OS you're running again, with no data lost.

    This is more like a workaround, while we wait for a proper fix that will likely come in the form of a BIOS update at some future date.
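
    To confirm whether the drive actually came back after a power cycle, a quick check from the OS can help. Below is a minimal sketch, assuming a Linux system that exposes NVMe controllers under /sys/class/nvme with a "model" attribute. The function names and the "Samsung SSD 960" model substring are illustrative assumptions, not something taken from the reports above, so adjust them to whatever your drive reports.

```python
from pathlib import Path


def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return {controller_name: model_string} for each NVMe controller found.

    Assumes the Linux sysfs layout (/sys/class/nvme/nvme0/model, ...).
    Returns an empty dict when the directory does not exist.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return {}
    found = {}
    for ctrl in sorted(root.iterdir()):
        model_file = ctrl / "model"
        if model_file.is_file():
            found[ctrl.name] = model_file.read_text().strip()
        else:
            found[ctrl.name] = "unknown"
    return found


def missing_models(expected_substrings, controllers):
    """Return the expected model substrings that match no detected controller."""
    models = [m.lower() for m in controllers.values()]
    return [s for s in expected_substrings
            if not any(s.lower() in m for m in models)]


if __name__ == "__main__":
    ctrls = list_nvme_controllers()
    # "Samsung SSD 960" is an assumed model substring; check your own drive.
    missing = missing_models(["Samsung SSD 960"], ctrls)
    if missing:
        print("Not detected:", ", ".join(missing))
        print("If this follows a BIOS or boot change, try removing AC power.")
    else:
        print("Detected NVMe controllers:", ctrls)
```

    Run after each boot (for example from a startup script), this gives an early warning that the drive has vanished before anything depending on it fails.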

    This is one of the stranger issues I've seen, and I don't have any technical explanation for it; that'd be pure conjecture. This is a relatively minor inconvenience, but I'd say it could be pretty darn serious if you don't have a way to remove all power from a remote system!

    Of course, I've reported this list of reports, and the below video, to Supermicro.

    The fix

    There is currently no known fix. A proper fix would likely involve installing a future BIOS update, whatever comes after BIOS 1.1c for Mini ITX models, or after BIOS 1.0b for Flex ATX models.

    Video

    I was able to easily replicate the issue today using my Supermicro SuperServer SYS-5028D-TN4T system, as seen in the below video. I'm currently using BIOS 1.1c and IPMI 3.46. No data loss, only time and a bit of product confidence lost.

    How to fix Samsung 960 PRO/EVO M.2 NVMe drive disappearance after any Supermicro Xeon D BIOS change.

    Jan 07 2017 Update

    I am so glad I published this article, because comments are coming in that this problem seems to be more serious than I initially thought. Wanted to be sure folks see what blogthis wrote at 8:06pm ET today:

    I have additional serious information on this M2 Samsung 960 PRO NVMe issue (1TB).
    Important: This is WITHOUT ever entering the BIOS setup screen whatsoever.
    Reproduced on: X10SDV-TLN4F (SYS-5028D-TN4T-12C). BIOS: 1.1c. IPMI: 3.46.
    --Scenario-1:--
    Configure internal SATA drive to boot any OS, in my case Win10x64; graceful shutdown.
    Insert a bootable USB stick, in my situation one that has ESXi 6 installed.
    BIOS auto-inserts the new boot device in its list, and changes its boot order to accommodate.
    System hangs, reboots, then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
    --Scenario-2:--
    Configure IPMI to boot from external ISO, in my case IPMI 3.46 uses HTML5 Net Path.
    BIOS auto-changes the boot device name and settings in its boot device list.
    System then exhibits the problem discussed in this article (NVMe vanishes until unplug-replug physical power).
    --Summary--
    You can reproduce this issue multiple ways, WITHOUT ever even entering the BIOS setup. I feel this makes this issue worthy of escalation from a mere inconvenience to a seriously high priority issue. [edit: IPMI typo v1.46 corrected to v3.46]
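
    Since both scenarios involve the BIOS quietly rewriting its boot list, it can help to snapshot the boot order before and after inserting a USB stick or mounting an ISO. The following is a minimal sketch, assuming a Linux system booted in UEFI mode where the text output of the efibootmgr tool has been saved. The function names are hypothetical, and the parser only handles the common "BootOrder:" and "BootNNNN* label" lines.

```python
import re


def parse_boot_order(efibootmgr_output):
    """Return [(entry_number, label), ...] in BootOrder sequence.

    Only the 'BootOrder:' line and 'BootNNNN' entry lines are interpreted;
    everything else (BootCurrent, Timeout, ...) is ignored.
    """
    order = []
    labels = {}
    for line in efibootmgr_output.splitlines():
        if line.startswith("BootOrder:"):
            raw = line.split(":", 1)[1]
            order = [e.strip() for e in raw.split(",") if e.strip()]
            continue
        m = re.match(r"Boot([0-9A-Fa-f]{4})\*?\s+(.*)", line)
        if m:
            # Newer efibootmgr versions append a device path after a tab.
            labels[m.group(1)] = m.group(2).split("\t")[0].strip()
    return [(num, labels.get(num, "?")) for num in order]


def boot_order_changed(before_text, after_text):
    """True when the ordered boot list differs between two snapshots."""
    return parse_boot_order(before_text) != parse_boot_order(after_text)
```

    Comparing a snapshot taken before the change with one taken after the next boot shows whether the firmware reordered or renamed entries without anyone entering BIOS setup.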


    Jan 08 2017 Update

    Gert commented:

    Thanks to reporting what you have discovered. I can also report, that it is not possible to run vSAN with Samsung SM961 MZVKW512HMJP. It is possible to configure vSAN and everything is reported okay, but when you try to created a VM and install OS, the connection to the NVMe drive is unstable/lost, so the installation or operation is hanging/stalled. When you use the NVMe drive as a a VMFS partition every works fine until you shut down the server and the NVMe drive disappear. I don't have the time to go through the log files, but I agree, that Supermicro has serious issues with NVMe drives from Samsung, so I warns against buying the drives for Supermicro servers. Supermicro has confirmed in an email to me that the drive is supported :-(

    This is obviously a serious bug. While it's unlikely full vSAN support (see VMware vSAN HCL) will arrive for consumer drives like the 960 series, because of their generally lower TBW ratings than enterprise drives and lack of supercapacitors, the sort of nasty behavior that Gert reports makes even simple experimentation with an unsupported vSAN quite challenging.

    There's more from blogthis, who also wrote:

    Thank you for that, hopefully this helps too..

    Did you catch another reader's post that m.2 NVMe SSD's are indeed supported as per this official SuperMicro link(?):
    https://www.supermicro.nl/products/nfo/M.2.cfm

    Duplicate info on their USA/Global site:
    https://www.supermicro.com/products/nfo/M.2.cfm

    And yes, SuperMicro is now listing official support for Samsung-brand m.2 NVMe drives:
    https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor

    (You may want to pass along to your SuperMicro contact, that it appears SuperMicro is now specifically ADVERTISING Samsung m.2 NVMe compatibility.... so hopefully that lights a fire and motivates them to push this issue up their support-chain for the greater good).

    Edit: The direct-link to Samsung m.2 NVMe drives has an "&" in their URL so it won't save here properly, but if you click the link, then the "Samsung" link for "Show Qualified M.2-SSD SKUs" on that page, you'll see the list..

    AND: On that same page, note Supermicro has listed in the "..Systems which support M.2-NVMe-SSD: (Mini-Tower Systems.. SYS-5028D-TN4T).." I think that says pretty clearly this is intended to work.

    I've changed the article title accordingly, from:
    How to fix Samsung 960 PRO/EVO M.2 NVMe drive disappearance after any Supermicro Xeon D SuperServer BIOS change is saved
    to:
    How to workaround Samsung 960 PRO/EVO/SM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
    and added an appropriate warning sentence to the related articles.

    Let's dive in here. First, check out the table of supported Samsung drives:
    https://www.supermicro.com/products/nfo/M.2.cfm?pg=Vendors&show=SELECT&type=Samsung#Vendor

    Below is a cut-and-paste of the first column of that table, where I've done a Google search of samsung.com for you, which lands you on this general page:
    http://www.samsung.com/semiconductor/products/flash-storage/client-ssd/

    I've gone ahead and created hyperlinks to each product page, and added the key bit of detail that Supermicro's page doesn't state anywhere: these are SM951 and SM953 drives.

    The point here is that Supermicro does actually have some 2015 vintage Samsung M.2 NVMe drives on their supported SSD lists. This sets a precedent that some Samsung M.2 NVMe drives have been tested by Supermicro in the past. The hope here is that Supermicro would test and support newer Samsung M.2 NVMe drives such as the Samsung 960 series of products (960 PRO/960 EVO/SM961/PM961).


    Jan 15 2017 Update

    This fascinating and sad saga is roaring along, with no end yet in sight. I think this particular comment by Aaron sums things up very nicely:

    Aaron, replying to Paul Braren:
    I've been trawling forums all day and the problem is reported pretty much being reported everywhere with the new Samsung 960s, might not be anything to do with Supermicro at all. My money is on either a bad firmware bug on the 960 end or problems related to NVMe 1.2 protocol implementation on most chipsets (could explain why the 950 with NVMe 1.1 works so well).
    A lot of forum replies to posters with the problem are just putting it down to the usual newbie problems with NVMe booting/UEFI and stuff but it seems pretty obvious there is a common cause only affecting the 960s.

    I tested the bejesus out of my TN4T systems (bios 1.1b) last night and have simplified the observed symptoms to these:
    • SSD cannot be booted from or chosen as a boot device
    • any shutdown of the system will result in the SSD being missing next boot until you physically remove power (OS shutdown or IPMI behave exactly the same)
    • SSD seems to work fine writing and reading data otherwise
    • any reboot of the device (OS, IPMI, physical button) retains detecting the SSD next boot

    Having been burned by Samsung firmware issues on the 850 Evo my bet is they've messed something up rushing the release of this series. I fear this is going to be one of those problems that lurks around for a good 6-12 months before it gets fixed :(
    damnit

    Note that Aaron opened Supermicro case# SM1701142769 as well.
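
    The symptoms Aaron lists boil down to a small state rule: whether the SSD is visible on the next boot depends on the kind of power event. The sketch below only restates those reported observations as code; it is not an explanation of the cause. The names are hypothetical, and treating a reboot as "keeps whatever state the drive is already in" is an assumption based on the reports that a vanished drive stays gone until power is removed.

```python
from enum import Enum


class PowerEvent(Enum):
    REBOOT = "reboot (OS, IPMI, or physical button)"
    SHUTDOWN = "shutdown (OS or IPMI), standby power still present"
    AC_REMOVED = "power physically removed, then restored"


def ssd_visible_next_boot(visible_now, event):
    """Model the reported behavior of the 960 after a power event."""
    if event is PowerEvent.AC_REMOVED:
        return True           # reported workaround: drive reappears
    if event is PowerEvent.SHUTDOWN:
        return False          # reported: drive missing on next boot
    return visible_now        # assumed: a reboot keeps the current state


def replay(events, visible=True):
    """Apply a sequence of events and return the visibility after each one."""
    history = []
    for event in events:
        visible = ssd_visible_next_boot(visible, event)
        history.append(visible)
    return history
```

    Replaying a shutdown followed by a reboot and then a cord pull yields missing, missing, visible, which matches the pattern readers describe.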

    I've changed the article title accordingly, from:
    How to workaround Samsung 960 PRO/EVO/SM961 M.2 NVMe drive disappearance after Supermicro Xeon D SuperServer BIOS 1.1c change or other boot changes
    to:
    Samsung 960 PRO/EVO/SM961 M.2 NVMe SSD disappearance workaround is to power cycle, boot from NVMe problems also reported

    The PM961 isn't in the title of this article, since I've not yet found reports of problems with that drive. I would expect it to show the same issue though, since its closely related twin, the 960 EVO, is afflicted with drive invisibility problems.


    Jan 17 2017 Update

    It does not appear that a proper resolution to this is coming near term. This naturally has folks asking about return options. I found the following info about my Amazon order's return window:

    Samsung 960 EVO Series - 1TB PCIe NVMe - M.2 Internal SSD (MZ-V6E1T0BW)
    Sold by: Amazon.com LLC
    Return eligible through Jan 31, 2017


    See also at TinkerTry

    ubiquiti-mpower-pro-8-port-outlet-measures-watts