get:
Show a patch.

patch:
Update a patch. Performs a partial update: only the fields supplied in the request body are changed.

put:
Update a patch. Performs a full update of the patch resource.
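
As an illustration of exercising these methods from a script (the sample GET response follows below), here is a minimal sketch using Python's requests library. The "Authorization: Token ..." header and the writable "state" field follow the usual Patchwork REST API conventions and are assumptions of this example, not verified against this server's configuration; the token value is a placeholder.

    # Minimal sketch: read a patch, optionally update its state.
    # Assumes the `requests` package and a Patchwork API token (hypothetical).
    import requests

    BASE = "https://patches.dpdk.org/api"
    TOKEN = "..."  # placeholder; normally generated from the Patchwork user profile

    def get_patch(patch_id):
        """GET a single patch record as JSON (reads need no authentication)."""
        r = requests.get(f"{BASE}/patches/{patch_id}/")
        r.raise_for_status()
        return r.json()

    def set_patch_state(patch_id, state):
        """PATCH only the 'state' field, leaving all other fields untouched."""
        r = requests.patch(
            f"{BASE}/patches/{patch_id}/",
            headers={"Authorization": f"Token {TOKEN}"},
            json={"state": state},
        )
        r.raise_for_status()
        return r.json()

    if __name__ == "__main__":
        patch = get_patch(2224)
        print(patch["name"], patch["state"])
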

GET /api/patches/2224/?format=api
HTTP 200 OK
Allow: GET, PUT, PATCH, HEAD, OPTIONS
Content-Type: application/json
Vary: Accept

{
    "id": 2224,
    "url": "https://patches.dpdk.org/api/patches/2224/?format=api",
    "web_url": "https://patches.dpdk.org/project/dpdk/patch/CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com/",
    "project": {
        "id": 1,
        "url": "https://patches.dpdk.org/api/projects/1/?format=api",
        "name": "DPDK",
        "link_name": "dpdk",
        "list_id": "dev.dpdk.org",
        "list_email": "dev@dpdk.org",
        "web_url": "http://core.dpdk.org",
        "scm_url": "git://dpdk.org/dpdk",
        "webscm_url": "http://git.dpdk.org/dpdk",
        "list_archive_url": "https://inbox.dpdk.org/dev",
        "list_archive_url_format": "https://inbox.dpdk.org/dev/{}",
        "commit_url_format": ""
    },
    "msgid": "<CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com>",
    "list_archive_url": "https://inbox.dpdk.org/dev/CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com",
    "date": "2015-01-10T19:26:03",
    "name": "[dpdk-dev] rte_mempool_create fails with ENOMEM",
    "commit_ref": null,
    "pull_url": null,
    "state": "not-applicable",
    "archived": true,
    "hash": "34d8e685440f626d2c2e12f22e00aba85f602178",
    "submitter": {
        "id": 151,
        "url": "https://patches.dpdk.org/api/people/151/?format=api",
        "name": "Liran Zvibel",
        "email": "liran@weka.io"
    },
    "delegate": null,
    "mbox": "https://patches.dpdk.org/project/dpdk/patch/CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com/mbox/",
    "series": [],
    "comments": "https://patches.dpdk.org/api/patches/2224/comments/",
    "check": "pending",
    "checks": "https://patches.dpdk.org/api/patches/2224/checks/",
    "tags": {},
    "related": [],
    "headers": {
        "Return-Path": "<dev-bounces@dpdk.org>",
        "X-Original-To": "patchwork@dpdk.org",
        "Delivered-To": "patchwork@dpdk.org",
        "Received": [
            "from [92.243.14.124] (localhost [IPv6:::1])\n\tby dpdk.org (Postfix) with ESMTP id 6E1D658DD;\n\tSat, 10 Jan 2015 20:26:09 +0100 (CET)",
            "from mail-qc0-f169.google.com (mail-qc0-f169.google.com\n\t[209.85.216.169]) by dpdk.org (Postfix) with ESMTP id 248C65684\n\tfor <dev@dpdk.org>; Sat, 10 Jan 2015 20:26:04 +0100 (CET)",
            "by mail-qc0-f169.google.com with SMTP id w7so13423215qcr.0\n\tfor <dev@dpdk.org>; Sat, 10 Jan 2015 11:26:03 -0800 (PST)",
            "by 10.140.29.101 with HTTP; Sat, 10 Jan 2015 11:26:03 -0800 (PST)"
        ],
        "X-Google-DKIM-Signature": "v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20130820;\n\th=x-gm-message-state:mime-version:in-reply-to:references:date\n\t:message-id:subject:from:to:cc:content-type\n\t:content-transfer-encoding;\n\tbh=aTtucESPmMf8J/ohhapX236s1ql+xEHThNWCokwH6fY=;\n\tb=Umeb5iidujIMFhvkkuDXM8cWcVSwypfL54Z+g1gVb7JHhNu0nQ3p9+2zIuXzy5oowW\n\txMbZ+bNs2q2OXGgncacKYAULfJ9WRINIT3yblaRYWwxQtCpyr9P1HyGamgGUWhUxSAD3\n\tk8cibGuYe6mLOvckHWP3h3OeEs+3PPwHv1RSKtvg+Yuk0Us9zPmE9GvKWn7J+5s3jEeu\n\tG2ptc2hKmyP9Ko/KN+oEGXMsuGGbF18cf/wFC0MiOHt1anfKzBdR8XlYAIo186y3V+uE\n\tWP05p+BUdIXbBHVBOFLHT/B6Pes95DeUFJe5VnH1S7rJLw1f3t61vgnkGCQ6Mw3s6tZT\n\tJN/g==",
        "X-Gm-Message-State": "ALoCoQnnDOKdkKvE+jtEJrGwur7Hkc0g4VfyDoUA4AmiBPn9oIuEL0agd8Ga7ao9mEVDUBbj0zXv",
        "MIME-Version": "1.0",
        "X-Received": "by 10.140.48.197 with SMTP id o63mr35550762qga.81.1420917963453; \n\tSat, 10 Jan 2015 11:26:03 -0800 (PST)",
        "In-Reply-To": "<CAHW=9Pvwze9RJ2-Km6-HRq7QjxeYkq+tagT7g-w73k_DaVT1FQ@mail.gmail.com>",
        "References": "<CAHW=9PuuEjnA5jnuGaHkB28aaznrJdNysh=398oE3LwOFovQQg@mail.gmail.com>\n\t<2601191342CEEE43887BDE71AB977258213C2046@IRSMSX105.ger.corp.intel.com>\n\t<2601191342CEEE43887BDE71AB977258213C2099@IRSMSX105.ger.corp.intel.com>\n\t<CAHW=9PsyYKrV1MBXT8pWYk4jN4_UChk7zbbrB17q3uEpcfETew@mail.gmail.com>\n\t<CAOaVG17AQ1Bipo0kwuDPja=j80ME6bmu_Lr-VT3J57zd2qYH6Q@mail.gmail.com>\n\t<CAHW=9PtiuHN=d5J1aMbp_T9YUVMw3Bu8s7zS_83TY4J0LE=VUQ@mail.gmail.com>\n\t<CAHW=9Pvwze9RJ2-Km6-HRq7QjxeYkq+tagT7g-w73k_DaVT1FQ@mail.gmail.com>",
        "Date": "Sat, 10 Jan 2015 21:26:03 +0200",
        "Message-ID": "<CAF28U9ORGNY7=QUKrd-ZCGn6HqBw7h6NE7wxUszf6WxOY18geg@mail.gmail.com>",
        "From": "Liran Zvibel <liran@weka.io>",
        "To": "Newman Poborsky <newman555p@gmail.com>, \"dev@dpdk.org\" <dev@dpdk.org>",
        "Content-Type": "text/plain; charset=UTF-8",
        "Content-Transfer-Encoding": "quoted-printable",
        "Subject": "Re: [dpdk-dev] rte_mempool_create fails with ENOMEM",
        "X-BeenThere": "dev@dpdk.org",
        "X-Mailman-Version": "2.1.15",
        "Precedence": "list",
        "List-Id": "patches and discussions about DPDK <dev.dpdk.org>",
        "List-Unsubscribe": "<http://dpdk.org/ml/options/dev>,\n\t<mailto:dev-request@dpdk.org?subject=unsubscribe>",
        "List-Archive": "<http://dpdk.org/ml/archives/dev/>",
        "List-Post": "<mailto:dev@dpdk.org>",
        "List-Help": "<mailto:dev-request@dpdk.org?subject=help>",
        "List-Subscribe": "<http://dpdk.org/ml/listinfo/dev>,\n\t<mailto:dev-request@dpdk.org?subject=subscribe>",
        "Errors-To": "dev-bounces@dpdk.org",
        "Sender": "\"dev\" <dev-bounces@dpdk.org>"
    },
    "content": "Hi Newman,\n\nThere are two options, either one of your pools is very large, and\njust does not fit in half of the memory,\nso if the physical memory must be split it just can never work, or\nwhat you’re seeing is localized to your\nenvironment, and just when allocating from both NUMAs the huge pages\njust happen to be to scattered\nfor your pools to be allocated.\n\nIn any case, we also have to deal with large pools that don’t always\nfit into consecutive huge pages as\nallocated by the kernel. I have created a small patch to DPDK itself,\nthen some more code that can live\nas part of the dpdk application that does the scattered allocation.\n\nI’m going to send both parts here (the change to the DPDK and the user\npart). I don’t know what are the\nrules that allow pushing to the repository, so I won’t try to do so.\n\nFirst — the DPDK patch, that just makes sure that the huge pates are\nmapped in a continuous virtual memory,\nand then the memory segments are allocated continuously in virtual\nmemory: I’m attaching full mbox content to make it easier\nfor you to use if you’d like. I created it against 1.7.1, since that\nis the version we’re  using. If you’d like, I can also create it\nagainst 1.8.0\n\n                        vma_len = num_pages * hugepage_sz;\n\n                        /* get the biggest virtual memory area up to\n@@ -1268,6 +1277,16 @@ rte_eal_hugepage_init(void)\n                        new_memseg = 1;\n\n                if (new_memseg) {\n+#ifdef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR\n+                       if (0 <= j) {\n+                               RTE_LOG(DEBUG, EAL, \"Closing memory\nsegment #%d(%p) vaddr is %p phys is 0x%lx size is 0x%lx \"\n+                                       \"which is #%ld pages next\nvaddr will be at 0x%lx\\n\",\n+                                       j,&mcfg->memseg[j],\n+                                       mcfg->memseg[j].addr,\nmcfg->memseg[j].phys_addr, mcfg->memseg[j].len,\n+                                       mcfg->memseg[j].len /\nmcfg->memseg[j].hugepage_sz,\n+                                       mcfg->memseg[j].addr_64 +\nmcfg->memseg[j].len);\n+                       }\n+#endif\n                        j += 1;\n                        if (j == RTE_MAX_MEMSEG)\n                                break;\n--\n1.9.3 (Apple Git-50)\n\n================================================================\n\nThen there is the dpdk-application library part that implements the\nstruct rte_mempool *scattered_mempool_create(uint32_t elt_size,\nuint32_t elt_num, int32_t socket_id,\n                                             rte_mempool_ctor_t\n*mp_init, void *mp_init_arg,\n                                             rte_mempool_obj_ctor_t\n*obj_init, void *obj_init_arg)\n\ninterface. 
If you would like, I can easily break the different\nfunctions into their right place in the rte_memseg and rte_mempool\nDPDK modules and have it included as another interface of the DPDK\nlibrary (as suggested by Konstantin below)\n\n=====================================================\nstatic inline int  is_memseg_valid(struct rte_memseg * free_memseg,\nsize_t requested_page_size,\n                                   int socket_id)\n{\n        if (free_memseg->len == 0) {\n                return 0;\n        }\n\n        if (socket_id != SOCKET_ID_ANY &&\n            free_memseg->socket_id != SOCKET_ID_ANY &&\n            free_memseg->socket_id != socket_id) {\n                RTE_LOG(DEBUG, USER1, \"memseg goes not qualify for\nsocked_id, requested %d got %d\",\n                         socket_id, free_memseg->socket_id);\n                return 0;\n        }\n\n        if (free_memseg->len < requested_page_size) {\n                RTE_LOG(DEBUG, USER1, \"memseg too small. len %lu <\nrequested_page_size %lu\",\n                         free_memseg->len, requested_page_size);\n                return 0;\n        }\n\n\n        if (free_memseg->hugepage_sz != requested_page_size) {\n                RTE_LOG(DEBUG, USER1, \"memset hugepage size !=\nrequested page size %lu != %lu\",\n                         free_memseg->hugepage_sz,\n                         requested_page_size);\n                return 0;\n        }\n\n        return 1;\n}\n\nstatic int try_allocating_memseg_range(struct rte_memseg *\nfree_memseg, int start,\n                                       int requested_page_size, size_t\nlen, int socket_id)\n{\n        int i;\n        for (i = start; i < RTE_MAX_MEMSEG; i++) {\n                if (free_memseg[i].addr == NULL) {\n                        return -1;\n                }\n\n                if (!is_memseg_valid(free_memseg +i,\nrequested_page_size, socket_id)) {\n                        return -1;\n                }\n\n                if ((start != i) &&\n                    ((char *)free_memseg[i].addr !=\n(char*)free_memseg[i-1].addr + free_memseg[i-1].len)) {\n                        RTE_LOG(DEBUG, USER1, \"Looking for cont memseg range. \"\n                                 \"[%d].vaddr %p != [%d].vaddr %p +\n[i-1].len %lu == %p\",\n                                 i, free_memseg[i].addr, i-1,\nfree_memseg[i-1].addr,\n                                 free_memseg[i-1].len,\n                                 (char *)(free_memseg[i-1].addr) +\nfree_memseg[i-1].len);\n                        return -1;\n                }\n\n                if ((free_memseg[i].len < len) && ((free_memseg[i].len\n% requested_page_size) != 0)) {\n                RTE_LOG(DEBUG, USER1, \"#%d memseg length not a\nmultplie of page size, or last.\"\n                         \" len %lu len %% requsted_pg_size %lu,\nrequested_pg_sz %d\",\n                         i, free_memseg[i].len, free_memseg[i].len %\nrequested_page_size, requested_page_size);\n                return -1;\n                }\n\n\n                if (len <= free_memseg[i].len) {\n                        RTE_LOG(DEBUG, USER1, \"Successfuly finished\nlookng for memsegs. remaining req. 
\"\n                                 \"len %lu seg_len %lu, start %d i %d\",\n                                 len, free_memseg[i].len, start, i);\n                        return i - start +1;\n                }\n\n                if (i == start)  {\n                        // We may not start on the beginning, have to\nmove to next pagesize alignment...\n                        char * aligned_vaddr =\nRTE_PTR_ALIGN_CEIL(free_memseg[i].addr, requested_page_size);\n                        size_t diff = (size_t)(aligned_vaddr - (char\n*)free_memseg[i].addr);\n                        if ((free_memseg[i].len - diff) %\nrequested_page_size != 0) {\n                                RTE_LOG(ERR, USER1, \"BUG! First\nsegment is not page aligned! vaddr %p aligned \"\n                                           \"vaddr %p diff %lu len %lu,\nlen - diff %lu, \"\n                                           \"(len%%diff)/%d == %lu\",\n                                           free_memseg[i].addr,\naligned_vaddr, diff, free_memseg[i].len,\n                                           free_memseg[i].len - diff,\n                                           requested_page_size,\n                                           (free_memseg[i].len - diff)\n% requested_page_size);\n                                return -1;\n                        } else if (0 == free_memseg[i].len - diff) {\n                                RTE_LOG(DEBUG, USER1, \"After\nalignment, first memseg is empty!\");\n                                return -1;\n                        }\n\n                        RTE_LOG(DEBUG, USER1, \"First memseg gives\n(after alignment) len %lu out of potential %lu\",\n                                 (free_memseg[i].len - diff),\nfree_memseg[i].len);\n                        len -= (free_memseg[i].len - diff);\n                }\n                len -= free_memseg[i].len;\n        }\n\n        return -1;\n}\n\n\n/**\n * Will register several memory zones, in continueues virtual\naddresses of large size.\n * All first memzones will use full pages, only the last memzone may\nrequest less than a full hugepage.\n *\n * It will go through all the free memory segments, once it finds a\nmemsegment with full hugepages, it\n * will check wheter it can start allocating from that memory segment on.\n */\nstatic const  struct rte_memzone *\nmemzone_reserve_multiple_cont_mz(const char * basename, size_t *\nzones_len, size_t len, int socket_id,\n                                 unsigned flags, unsigned align)\n{\nstruct rte_mem_config *mcfg;\n        const struct rte_memzone * ret = NULL;\n        size_t requested_page_size;\n        int i;\n        struct rte_memseg * free_memseg = NULL;\n        int first_memseg = -1;\n        int memseg_count = -1;\n\n        mcfg = rte_eal_get_configuration()->mem_config;\n        free_memseg = mcfg->free_memseg;\n\n        RTE_LOG(DEBUG, USER1, \"mcfg is at %p free_memseg at %p memseg\nat %p\", mcfg, mcfg->free_memseg, mcfg->memseg);\n\n        for (i = 0; i  < 10 && (free_memseg[i].addr != NULL); i++) {\n                RTE_LOG(DEBUG, USER1, \"free_memseg[%d] : vaddr 0x%p\nphys_addr 0x%p len %lu pages: %lu [0x%lu]\", i,\n                         free_memseg[i].addr,\n                         (void*)free_memseg[i].phys_addr,\nfree_memseg[i].len, free_memseg[i].len/free_memseg[i].hugepage_sz,\n                         free_memseg[i].hugepage_sz);\n        }\n\n\n        for (i = 0; i  < 10 && (mcfg->memseg[i].addr != NULL); i++) {\n                RTE_LOG(DEBUG, USER1, \"memseg[%d] 
: vaddr 0x%p\nphys_addr 0x%p len %lu pages: %lu [0x%lu]\", i,\n                         mcfg->memseg[i].addr,\n                         (void*)mcfg->memseg[i].phys_addr, mcfg->memseg[i].len,\n                         mcfg->memseg[i].len/mcfg->memseg[i].hugepage_sz,\n                         mcfg->memseg[i].hugepage_sz);\n        }\n\n        *zones_len = 0;\n\n        if (mcfg->memzone_idx >= RTE_MAX_MEMZONE) {\n                RTE_LOG(DEBUG, USER1, \"No more room for new memzones\");\n                return NULL;\n        }\n\n        if ((flags & (RTE_MEMZONE_2MB | RTE_MEMZONE_1GB)) == 0) {\n                RTE_LOG(DEBUG, USER1, \"Must request either 2MB or 1GB pages\");\n                return NULL;\n        }\n\n        if ((flags & RTE_MEMZONE_1GB ) && (flags & RTE_MEMZONE_2MB)) {\n                RTE_LOG(DEBUG, USER1, \"Cannot request both 1GB and 2MB pages\");\n                return NULL;\n        }\n\n        if (flags & RTE_MEMZONE_2MB) {\n                requested_page_size = RTE_PGSIZE_2M;\n        } else {\n                requested_page_size = RTE_PGSIZE_1G;\n        }\n\n        if (len < requested_page_size) {\n                RTE_LOG(DEBUG, USER1, \"Requested length %lu is smaller\nthan requested pages size %lu\",\n                         len , requested_page_size);\n                return NULL;\n        }\n\n        ret = rte_memzone_reserve_aligned(basename, len, socket_id,\nflags, align);\n        if (ret != NULL) {\n                RTE_LOG(DEBUG, USER1, \"Normal\nrte_memzone_reserve_aligned worked!\");\n                *zones_len = 1;\n                return ret;\n        }\n\n        RTE_LOG(DEBUG, USER1, \"rte_memzone_reserve_aligned failed.\nWill have to allocate on our own\");\n        rte_rwlock_write_lock(&mcfg->mlock);\n\n        for (i = 0; i < RTE_MAX_MEMSEG; i++) {\n                if (free_memseg[i].addr == NULL) {\n                        break;\n                }\n\n                if (!is_memseg_valid(free_memseg +i,\nrequested_page_size, socket_id)) {\n                        continue;\n                }\n\n                memseg_count =\ntry_allocating_memseg_range(free_memseg, i, requested_page_size, len,\n                                                           socket_id);\n                if (0 < memseg_count ) {\n                        RTE_LOG(DEBUG, USER1, \"Was able to find\nmemsegments for zone! \"\n                                 \"first segment: %d segment_count %d len %lu\",\n                                 i, memseg_count, len);\n                        first_memseg = i;\n\n                        // Fix first memseg -- make sure it's page aligned!\n                        char * aligned_vaddr =\nRTE_PTR_ALIGN_CEIL(free_memseg[i].addr,\n\nrequested_page_size);\n                        size_t diff = (size_t)(aligned_vaddr - (char\n*)free_memseg[i].addr);\n                        RTE_LOG(DEBUG, USER1, \"Decreasing first\nsegment by %lu\", diff);\n                        free_memseg[i].addr = aligned_vaddr;\n                        free_memseg[i].phys_addr += diff;\n                        free_memseg[i].len -= diff;\n                        if ((free_memseg[i].phys_addr %\nrequested_page_size != 0)) {\n                                RTE_LOG(ERR, USER1, \"After aligning\nfirst free memseg, \"\n                                           \"physical address NOT page\naligned! 
%p\",\n                                           (void*)free_memseg[i].phys_addr);\n                                abort();\n                        }\n\n                        break;\n                }\n        }\n\n        if (first_memseg < 0) {\n                RTE_LOG(DEBUG, USER1, \"Could not find memsegs to\nallocate enough memory\");\n                goto out;\n        }\n\n        // now perform actual allocation.\n        if (mcfg->memzone_idx + memseg_count >= RTE_MAX_MEMZONE) {\n                RTE_LOG(DEBUG, USER1, \"There are not enough memzones\nto allocate. \"\n                         \"memzone_idx %d memseg_count %d max %s=%d\",\n                         mcfg->memzone_idx, memseg_count,\nRTE_STR(RTE_MAX_MEMZONE), RTE_MAX_MEMZONE);\n                goto out;\n        }\n\n        ret = &mcfg->memzone[mcfg->memzone_idx];\n        *zones_len = memseg_count;\n        for (i = first_memseg; i < first_memseg + memseg_count; i++) {\n                size_t allocated_length;\n                if (free_memseg[i].len <= len) {\n                        allocated_length = free_memseg[i].len;\n                } else {\n                        allocated_length = len;\n                }\n\n                struct rte_memzone * mz = &mcfg->memzone[mcfg->memzone_idx++];\n                snprintf(mz->name, sizeof(mz->name), \"%s%d\", basename,\ni - first_memseg);\n                mz->phys_addr   = free_memseg[i].phys_addr;\n                mz->addr        = free_memseg[i].addr;\n                mz->len         = allocated_length;\n                mz->hugepage_sz = free_memseg[i].hugepage_sz;\n                mz->socket_id   = free_memseg[i].socket_id;\n                mz->flags       = 0;\n                mz->memseg_id   = i;\n\n                free_memseg[i].len -= allocated_length;\n                free_memseg[i].phys_addr += allocated_length;\n                free_memseg[i].addr_64 += allocated_length;\n                len -= allocated_length;\n        }\n\n        if (len != 0) {\n                RTE_LOG(DEBUG, USER1, \"After registering all the\nmemzone, len is too small! 
Len is %lu\", len);\n                ret = NULL;\n                goto out;\n        }\nout:\n        rte_rwlock_write_unlock(&mcfg->mlock);\n        return ret;\n}\n\n\nstatic inline void build_physical_pages(phys_addr_t * phys_pages, int\nnum_phys_pages, size_t sz,\n                                        const struct rte_memzone * mz,\nint num_zones)\n{\n        size_t accounted_for_size =0;\n        int curr_page = 0;\n        int i;\n        unsigned j;\n\n        RTE_LOG(DEBUG, USER1, \"Phys pages are at %p 2M is %d mz\npagesize is %lu trailing zeros: %d\",\n                 phys_pages, RTE_PGSIZE_2M, mz->hugepage_sz,\n__builtin_ctz(mz->hugepage_sz));\n\n        for (i = 0; i < num_zones; i++) {\n                size_t mz_remaining_len = mz[i].len;\n                for (j = 0; (j <= mz[i].len / RTE_PGSIZE_2M) && (0 <\nmz_remaining_len) ; j++) {\n                        phys_pages[curr_page++] = mz[i].phys_addr + j\n* RTE_PGSIZE_2M;\n\n                        size_t added_len =\nRTE_MIN((size_t)RTE_PGSIZE_2M, mz_remaining_len);\n                        accounted_for_size += added_len;\n                        mz_remaining_len -= added_len;\n\n                        if (sz <= accounted_for_size) {\n                                RTE_LOG(DEBUG, USER1, \"Filled in %d\npages of the physical pages array\", curr_page);\n                                return;\n                        }\n                        if (num_phys_pages < curr_page) {\n                                RTE_LOG(ERR, USER1, \"When building\nphyscial pages array, \"\n                                           \"used pages (%d) is more\nthan allocated pages %d. \"\n                                           \"accounted size %lu size %lu\",\n                                           curr_page, num_phys_pages,\naccounted_for_size, sz);\n                                abort();\n                        }\n                }\n        }\n\n        if (accounted_for_size < sz) {\n                RTE_LOG(ERR, USER1, \"Finished going over %d memory\nzones, and still accounted size is %lu \"\n                           \"and requested size is %lu\",\n                           num_zones, accounted_for_size, sz);\n                abort();\n        }\n}\n\nstruct rte_mempool *scattered_mempool_create(uint32_t elt_size,\nuint32_t elt_num, int32_t socket_id,\n                                             rte_mempool_ctor_t\n*mp_init, void *mp_init_arg,\n                                             rte_mempool_obj_ctor_t\n*obj_init, void *obj_init_arg)\n{\n        struct rte_mempool *mp;\n        const struct rte_memzone *mz;\n        size_t                          num_zones;\n        struct rte_mempool_objsz obj_sz;\n        uint32_t flags, total_size;\n        size_t sz;\n\n        flags = (MEMPOOL_F_NO_SPREAD|MEMPOOL_F_SC_GET|MEMPOOL_F_SP_PUT);\n\n        total_size = rte_mempool_calc_obj_size(elt_size, flags, &obj_sz);\n\n        sz = elt_num * total_size;\n        /* We now have to account for the \"gaps\" at the end of each\npage. 
Worst case is that we get\n         * all distinct pages, so we have to add the gap for each\npossible page */\n        int pages_num = (sz + RTE_PGSIZE_2M -1) / RTE_PGSIZE_2M;\n        int page_gap = RTE_PGSIZE_2M % elt_size;\n        sz += pages_num + page_gap;\n\n        RTE_LOG(DEBUG, USER1, \"Will have to allocate %d 2M pages for\nthe page table.\", pages_num);\n\n        if ((mz = memzone_reserve_multiple_cont_mz(\"data_obj\",\n&num_zones, sz, socket_id,\n                                                   RTE_MEMZONE_2MB,\nRTE_PGSIZE_2M)) == NULL) {\n                RTE_LOG(WARNING, USER1, \"memzone reserve multi mz\nreturned NULL for socket id %d, will try ANY\",\n                          socket_id);\n                if ((mz =\n                     memzone_reserve_multiple_cont_mz(\"data_obj\",\n&num_zones, sz, socket_id,\n                                                      RTE_MEMZONE_2MB,\nRTE_PGSIZE_2M)) == NULL) {\n                        RTE_LOG(ERR, USER1, \"memzone reserve multi mz\nreturned NULL even for any socket\");\n                        return NULL;\n                } else {\n                        RTE_LOG(DEBUG, USER1, \"memzone reserve multi\nmz returne %p with %lu zones for SOCKET_ID_ANY\",\n                                 mz, num_zones);\n                }\n        } else {\n                RTE_LOG(DEBUG, USER1, \"memzone reserve multi mz\nreturned %p with %lu zones for size %lu  socket %d\",\n                         mz, num_zones, sz, socket_id);\n        }\n\n        // Now will \"break\" the pages into smaller ones\n        phys_addr_t * phys_pages = malloc(sizeof(phys_addr_t)*pages_num);\n        if(phys_pages == NULL) {\n            RTE_LOG(DEBUG, USER1, \"phys_pages is null. aborting\");\n            abort();\n        }\n\n        build_physical_pages(phys_pages, pages_num, sz, mz, num_zones);\n        RTE_LOG(DEBUG, USER1, \"Beginning of vaddr is %p beginning of\nphysical addr is 0x%lx\", mz->addr, mz->phys_addr);\n        mp = rte_mempool_xmem_create(\"data_pool\", elt_num, elt_size,\n                                     257 , sizeof(struct\nrte_pktmbuf_pool_private),\n                                     mp_init, mp_init_arg, obj_init,\nobj_init_arg,\n                                     socket_id, flags, (char *)mz[0].addr,\n                                     phys_pages, pages_num,\nrte_bsf32(RTE_PGSIZE_2M));\n\n        RTE_LOG(DEBUG, USER1, \"rte_mempool_xmem_create returned %p\", mp);\n        return mp;\n}\n\n=================================================================\n\nPlease let me know if you have any questions/comments about this code.\n\nBest Regards,\n\nLiran.\n\nOn Jan 8, 2015, at 10:19, Newman Poborsky <newman555p@gmail.com> wrote:\n\nI finally found the time to try this and I noticed that on a server\nwith 1 NUMA node, this works, but if  server has 2 NUMA nodes than by\ndefault memory policy, reserved hugepages are divided on each node and\nagain DPDK test app fails for the reason already mentioned. I found\nout that 'solution' for this is to deallocate hugepages on node1\n(after boot) and leave them only on node0:\necho 0 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages\n\nCould someone please explain what changes when there are hugepages on\nboth nodes? Does this cause some memory fragmentation so that there\naren't enough contiguous segments? 
If so, how?\n\nThanks!\n\nNewman\n\nOn Mon, Dec 22, 2014 at 11:48 AM, Newman Poborsky <newman555p@gmail.com> wrote:\n\nOn Sat, Dec 20, 2014 at 2:34 AM, Stephen Hemminger\n<stephen@networkplumber.org> wrote:\n\nYou can reserve hugepages on the kernel cmdline (GRUB).\n\n\nGreat, thanks, I'll try that!\n\nNewman\n\n\nOn Fri, Dec 19, 2014 at 12:13 PM, Newman Poborsky <newman555p@gmail.com>\nwrote:\n\n\nOn Thu, Dec 18, 2014 at 9:03 PM, Ananyev, Konstantin <\nkonstantin.ananyev@intel.com> wrote:\n\n\n\n-----Original Message-----\nFrom: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Ananyev,\nKonstantin\nSent: Thursday, December 18, 2014 5:43 PM\nTo: Newman Poborsky; dev@dpdk.org\nSubject: Re: [dpdk-dev] rte_mempool_create fails with ENOMEM\n\nHi\n\n-----Original Message-----\nFrom: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Newman Poborsky\nSent: Thursday, December 18, 2014 1:26 PM\nTo: dev@dpdk.org\nSubject: [dpdk-dev] rte_mempool_create fails with ENOMEM\n\nHi,\n\ncould someone please provide any explanation why sometimes mempool\n\ncreation\n\nfails with ENOMEM?\n\nI run my test app several times without any problems and then I\nstart\ngetting ENOMEM error when creating mempool that are used for\npackets.\n\nI try\n\nto delete everything from /mnt/huge, I increase the number of huge\n\npages,\n\nremount /mnt/huge but nothing helps.\n\nThere is more than enough memory on server. I tried to debug\nrte_mempool_create() call and it seems that after server is\nrestarted\n\nfree\n\nmem segments are bigger than 2MB, but after running test app for\n\nseveral\n\ntimes, it seems that all free mem segments have a size of 2MB, and\n\nsince I\n\nam requesting 8MB for my packet mempool, this fails.  I'm not really\n\nsure\n\nthat this conclusion is correct.\n\n\nYes,rte_mempool_create uses  rte_memzone_reserve() to allocate\nsingle physically continuous chunk of memory.\nIf no such chunk exist, then it would fail.\nWhy physically continuous?\nMain reason - to make things easier for us, as in that case we don't\n\nhave to worry\n\nabout situation when mbuf crosses page boundary.\nSo you can overcome that problem like that:\nAllocate max amount of memory you would need to hold all mbufs in\nworst\n\ncase (all pages physically disjoint)\n\nusing rte_malloc().\n\n\nActually my wrong: rte_malloc()s wouldn't help you here.\nYou probably need to allocate some external (not managed by EAL) memory\nin\nthat case,\nmay be mmap() with MAP_HUGETLB, or something similar.\n\nFigure out it's physical mappings.\nCall  rte_mempool_xmem_create().\nYou can look at: app/test-pmd/mempool_anon.c as a reference.\nIt uses same approach to create mempool over 4K pages.\n\nWe probably add similar function into mempool API\n\n(create_scatter_mempool or something)\n\nor just add a new flag (USE_SCATTER_MEM) into rte_mempool_create().\nThough right now it is not there.\n\nAnother quick alternative - use 1G pages.\n\nKonstantin\n\n\n\n\nOk, thanks for the explanation. I understand that this is probably an OS\nquestion more than DPDK, but is there a way to again allocate a contiguous\nmemory for n-th run of my test app?  It seems that hugepages get\ndivded/separated to individual 2MB hugepage. Shouldn't OS's memory\nmanagement system try to group those hupages back to one contiguous chunk\nonce my app/process is done?   Again, I know very little about Linux\nmemory\nmanagement and hugepages, so forgive me if this is a stupid question.\nIs rebooting the OS the only way to deal with this problem?  
Or should I\njust try to use 1GB hugepages?\n\np.s. Konstantin, sorry for the double reply, I accidentally forgot to\ninclude dev list in my first reply  :)\n\nNewman\n\n\n\nDoes anybody have any idea what to check and how running my test app\nseveral times affects hugepages?\n\nFor me, this doesn't make any since because after test app exits,\n\nresources\n\nshould be freed, right?\n\nThis has been driving me crazy for days now. I tried reading a bit\nmore\ntheory about hugepages, but didn't find out anything that could help\n\nme.\n\nMaybe it's something else and completely trivial, but I can't figure\nit\nout, so any help is appreciated.\n\nThank you!\n\nBR,\nNewman P.",
    "diff": "====================================================\n\nFrom 10ebc74eda2c3fe9e5a34815e0f7ee1f44d99aa3 Mon Sep 17 00:00:00 2001\nFrom: Liran Zvibel <liran@weka.io>\nDate: Sat, 10 Jan 2015 12:46:54 +0200\nSubject: [PATCH] Add an option to allocate huge pages in contiunous virtual\n addresses\nTo: dev@dpdk.org\n\nAdd a configuration option: CONFIG_RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR\nthat advises the memory sengment allocation code to allocate as many\nhugemages in a continuous way in virtual addresses as possible.\n\nThis way, a mempool may be created out of disparsed memzones allocated\nfrom these new continuos memory segments.\n---\n lib/librte_eal/linuxapp/eal/eal_memory.c | 19 +++++++++++++++++++\n 1 file changed, 19 insertions(+)\n\ndiff --git a/lib/librte_eal/linuxapp/eal/eal_memory.c\nb/lib/librte_eal/linuxapp/eal/eal_memory.c\nindex f2454f4..b8d68b0 100644\n--- a/lib/librte_eal/linuxapp/eal/eal_memory.c\n+++ b/lib/librte_eal/linuxapp/eal/eal_memory.c\n@@ -329,6 +329,7 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,\n\n #ifndef RTE_EAL_SINGLE_FILE_SEGMENTS\n                else if (vma_len == 0) {\n+#ifndef RTE_EAL_HUGEPAGES_SINGLE_CONT_VADDR\n                        unsigned j, num_pages;\n\n                        /* reserve a virtual area for next contiguous\n@@ -340,6 +341,14 @@ map_all_hugepages(struct hugepage_file *hugepg_tbl,\n                                        break;\n                        }\n                        num_pages = j - i;\n+#else // hugepages are will be allocated in a continous virtual address way\n+                       unsigned num_pages;\n+                       /* We will reserve a virtual area large enough\nto fit ALL\n+                        * physical blocks.\n+                        * This way we can have bigger mempools even\nif there is no\n+                        * continuos physcial region.\n        */\n+                       num_pages = hpi->num_pages[0] - i;\n+#endif\n",
    "prefixes": [
        "dpdk-dev"
    ]
}
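
As a usage note, the hyperlinked fields in the record above ("mbox", "comments", "checks") can be followed programmatically. The sketch below assumes the requests library; the field names are taken directly from the JSON above, and the aggregate "check" value is read as-is from the record.

    # Sketch: follow the links embedded in a patch record.
    import requests

    def fetch_patch_artifacts(patch_url):
        patch = requests.get(patch_url).json()

        # The raw patch email in mbox form, suitable for `git am`.
        mbox = requests.get(patch["mbox"]).text
        print(f"mbox for {patch['name']!r}: {len(mbox)} bytes")

        # Comments and CI checks are separate collections linked from the record.
        comments = requests.get(patch["comments"]).json()
        checks = requests.get(patch["checks"]).json()
        print(f"{len(comments)} comments, {len(checks)} checks, "
              f"aggregate check state: {patch['check']}")

    if __name__ == "__main__":
        fetch_patch_artifacts("https://patches.dpdk.org/api/patches/2224/")
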