epoll programming: the epoll event notification mechanism in detail, and the difference between level-triggered and edge-triggered modes

2023-11-18

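There is plenty of discussion of epoll online, but most of it is neither detailed nor accurate, and I was once asked about it in an interview. Without digging deeper, you only learn what it does and never why. So I translated the epoll manual page to understand epoll event handling more thoroughly; it also touches on operating system kernel internals.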

EPOLL(7)    Linux Programmer's Manual

NAME

epoll - I/O event notification facility

Translation: 6700662@qq.com. Please credit the source when reposting.

DESCRIPTION

The epoll API performs a similar task to poll(2): monitoring multiple file descriptors to see if I/O is possible on any of them. The epoll API can be used either as an edge-triggered or a level-triggered interface and scales well to large numbers of watched file descriptors. The following system calls are provided to create and manage an epoll instance:

* epoll_create(2) creates an epoll instance and returns a file descriptor referring to that instance. (The more recent epoll_create1(2) extends the functionality of epoll_create(2).)

* Interest in particular file descriptors is then registered via epoll_ctl(2). The set of file descriptors currently registered on an epoll instance is sometimes called an epoll set.

* epoll_wait(2) waits for I/O events, blocking the calling thread if no events are currently available.
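The three calls above can be combined roughly as follows. This is only a minimal sketch: the helper names epoll_with_fd and wait_one are hypothetical, chosen here for illustration, and error handling is reduced to returning -1.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance and register `fd` for `events`.
 * Returns the epoll file descriptor, or -1 on error.
 * (epoll_with_fd is a hypothetical helper name, not part of the API.) */
int epoll_with_fd(int fd, uint32_t events)
{
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    struct epoll_event ev = { .events = events, .data.fd = fd };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Wait up to `timeout_ms` for one event. Returns the ready descriptor,
 * or -1 if the wait timed out or failed. */
int wait_one(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}
```

A caller would typically create the instance once, register each descriptor of interest, and then call epoll_wait(2) in a loop.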

Level-triggered and edge-triggered

The epoll event distribution interface is able to behave both as edge-triggered (ET) and as level-triggered (LT). The difference between the two mechanisms can be described as follows. Suppose that this scenario happens:

1. The file descriptor that represents the read side of a pipe (rfd) is registered on the epoll instance.

2. A pipe writer writes 2 kB of data on the write side of the pipe.

3. A call to epoll_wait(2) is done that will return rfd as a ready file descriptor.

4. The pipe reader reads 1 kB of data from rfd.

5. A call to epoll_wait(2) is done.

If the rfd file descriptor has been added to the epoll interface using the EPOLLET (edge-triggered) flag, the call to epoll_wait(2) done in step 5 will probably hang despite the available data still present in the file input buffer; meanwhile the remote peer might be expecting a response based on the data it already sent. The reason for this is that edge-triggered mode delivers events only when changes occur on the monitored file descriptor. So, in step 5 the caller might end up waiting for some data that is already present inside the input buffer (translator's note: no new event is triggered, so the caller keeps waiting). In the above example, an event on rfd will be generated because of the write done in 2 and the event is consumed in 3. Since the read operation done in 4 does not consume the whole buffer data, the call to epoll_wait(2) done in step 5 might block indefinitely.
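The five steps can be replayed on a real pipe to see the difference. The sketch below is an illustration under a few assumptions: second_wait_result is a hypothetical helper name, and a 100 ms timeout in step 5 stands in for "blocks indefinitely" so that the function returns.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1-5 on a pipe. `extra_flags` is 0 for level-triggered
 * or EPOLLET for edge-triggered. Returns what the step-5 epoll_wait()
 * returns (number of ready descriptors), or -1 on a setup error. */
int second_wait_result(uint32_t extra_flags)
{
    int p[2], epfd, result = -1;
    char buf[2048] = {0};
    struct epoll_event ev, out;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    /* Step 1: register the read side of the pipe (rfd). */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto close_all;

    /* Step 2: the writer writes 2 kB. */
    if (write(p[1], buf, sizeof buf) != (ssize_t)sizeof buf)
        goto close_all;

    /* Step 3: the first wait reports rfd as ready. */
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto close_all;

    /* Step 4: the reader consumes only 1 kB. */
    if (read(p[0], buf, 1024) != 1024)
        goto close_all;

    /* Step 5: wait again; the timeout stands in for "hangs". */
    result = epoll_wait(epfd, &out, 1, 100);

close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return result;
}
```

With extra_flags set to 0 the second wait should report rfd again because 1 kB is still buffered; with EPOLLET it should time out and return 0, since nothing changed on the descriptor after step 3.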

An application that employs the EPOLLET flag should use nonblocking file descriptors to avoid having a blocking read or write starve a task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered (EPOLLET) interface is as follows:

i   with nonblocking file descriptors; and

ii  by waiting for an event only after read(2) or write(2) return EAGAIN.
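A minimal sketch of both rules for the read side might look like this. The names set_nonblocking and drain_fd are hypothetical helpers, and a real handler would process the data where the comment indicates rather than just counting it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Rule i: put `fd` into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Rule ii: read from a nonblocking `fd` until read(2) reports EAGAIN.
 * Returns the number of bytes consumed, or -1 on a real error.
 * End-of-file (read returning 0) also ends the loop. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;         /* a real handler would process buf here */
        } else if (n == 0) {
            return total;       /* peer closed: nothing more will arrive */
        } else if (errno == EINTR) {
            continue;           /* interrupted by a signal: retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;       /* drained: now it is safe to epoll_wait() */
        } else {
            return -1;
        }
    }
}
```

Only after drain_fd() returns because of EAGAIN should the edge-triggered loop go back to epoll_wait(2) for that descriptor.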

By contrast, when used as a level-triggered interface (the default, when EPOLLET is not specified), epoll is simply a faster poll(2), and can be used wherever the latter is used since it shares the same semantics.

Since even with edge-triggered epoll, multiple events can be generated upon receipt of multiple chunks of data, the caller has the option to specify the EPOLLONESHOT flag, to tell epoll to disable the associated file descriptor after the receipt of an event with epoll_wait(2). When the EPOLLONESHOT flag is specified, it is the caller's responsibility to rearm the file descriptor using epoll_ctl(2) with EPOLL_CTL_MOD.
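The disable-then-rearm cycle can be sketched as below. The names rearm and oneshot_demo are hypothetical; the demo uses a pipe with one unread byte and zero-timeout waits so that each result can be inspected.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Re-enable a descriptor that EPOLLONESHOT disabled after its last event.
 * Returns 0 on success, -1 on error (rearm is a hypothetical helper name). */
int rearm(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Stores the results of three zero-timeout waits in out[]: right after
 * a write, again without rearming, and once more after rearm().
 * Returns 0 on success, -1 on a setup error. */
int oneshot_demo(int out[3])
{
    int p[2], epfd, rc = -1;
    struct epoll_event ev, got;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto close_all;
    if (write(p[1], "x", 1) != 1)
        goto close_all;

    out[0] = epoll_wait(epfd, &got, 1, 0);  /* event delivered, fd disabled */
    out[1] = epoll_wait(epfd, &got, 1, 0);  /* still disabled: no event */
    if (rearm(epfd, p[0], EPOLLIN) == -1)
        goto close_all;
    out[2] = epoll_wait(epfd, &got, 1, 0);  /* unread byte reported again */
    rc = 0;

close_all:
    close(epfd);
close_pipe:
    close(p[0]);
    close(p[1]);
    return rc;
}
```

In a multithreaded server, one common design is for the thread that received the event to call rearm() only after it has finished with the descriptor, so that no other thread is woken for the same descriptor in the meantime.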

Interaction with autosleep

If the system is in autosleep mode via /sys/power/autosleep and an event happens which wakes the device from sleep, the device driver will keep the device awake only until that event is queued. To keep the device awake until the event has been processed, it is necessary to use the epoll(7) EPOLLWAKEUP flag.

When the EPOLLWAKEUP flag is set in the events field for a struct epoll_event, the system will be kept awake from the moment the event is queued, through the epoll_wait(2) call which returns the event until the subsequent epoll_wait(2) call. If the event should keep the system awake beyond that time, then a separate wake_lock should be taken before the second epoll_wait(2) call.

/proc interfaces

The following interfaces can be used to limit the amount of kernel memory consumed by epoll:

/proc/sys/fs/epoll/max_user_watches (since Linux 2.6.28)

This specifies a limit on the total number of file descriptors that a user can register across all epoll instances on the system. The limit is per real user ID. Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, and roughly 160 bytes on a 64-bit kernel. Currently, the default value for max_user_watches is 1/25 (4%) of the available low memory, divided by the registration cost in bytes.
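As a rough illustration, the limit can be read like any other text file under /proc, and the per-registration costs quoted above give a back-of-the-envelope memory estimate. Both function names below are hypothetical, and the byte counts are only the approximate figures from the manual page.

```c
#include <stdio.h>

/* Returns the per-user watch limit, or -1 if the file cannot be read
 * (for example on kernels older than 2.6.28 or without /proc mounted). */
long long read_max_user_watches(void)
{
    FILE *f = fopen("/proc/sys/fs/epoll/max_user_watches", "r");
    long long limit = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%lld", &limit) != 1)
        limit = -1;
    fclose(f);
    return limit;
}

/* Approximate kernel memory used by `watches` registrations, using the
 * rough costs quoted in the manual page: 90 bytes per registration on a
 * 32-bit kernel and 160 bytes on a 64-bit kernel. */
long long approx_watch_bytes(long long watches, int kernel_is_64bit)
{
    return watches * (kernel_is_64bit ? 160 : 90);
}
```

For example, 1000 registrations on a 64-bit kernel come to about 160000 bytes by this estimate.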

Example for suggested usage

While the usage of epoll when employed as a level-triggered interface does have the same semantics as poll(2), the edge-triggered usage requires more clarification to avoid stalls in the application event loop. In this example, listener is a nonblocking socket on which listen(2) has been called. The function do_use_fd() uses the new ready file descriptor until EAGAIN is returned by either read(2) or write(2). An event-driven state machine application should, after having received EAGAIN, record its current state so that at the next call to do_use_fd() it will continue to read(2) or write(2) from where it stopped before.

#define MAX_EVENTS 10

struct epoll_event ev, events[MAX_EVENTS];
int listen_sock, conn_sock, nfds, epollfd, n;
struct sockaddr_storage local;  /* peer address filled in by accept() */
socklen_t addrlen;

/* Code to set up listening socket, 'listen_sock',
   (socket(), bind(), listen()) omitted */

epollfd = epoll_create1(0);
if (epollfd == -1) {
    perror("epoll_create1");
    exit(EXIT_FAILURE);
}

/* Watch the listening socket for incoming connections (level-triggered). */
ev.events = EPOLLIN;
ev.data.fd = listen_sock;
if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == -1) {
    perror("epoll_ctl: listen_sock");
    exit(EXIT_FAILURE);
}

for (;;) {
    nfds = epoll_wait(epollfd, events, MAX_EVENTS, -1);
    if (nfds == -1) {
        perror("epoll_wait");
        exit(EXIT_FAILURE);
    }

    for (n = 0; n < nfds; ++n) {
        if (events[n].data.fd == listen_sock) {
            addrlen = sizeof(local);  /* accept() reads and updates this */
            conn_sock = accept(listen_sock,
                               (struct sockaddr *) &local, &addrlen);
            if (conn_sock == -1) {
                perror("accept");
                exit(EXIT_FAILURE);
            }

            /* New connections are nonblocking and edge-triggered. */
            setnonblocking(conn_sock);
            ev.events = EPOLLIN | EPOLLET;
            ev.data.fd = conn_sock;
            if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock,
                          &ev) == -1) {
                perror("epoll_ctl: conn_sock");
                exit(EXIT_FAILURE);
            }
        } else {
            do_use_fd(events[n].data.fd);
        }
    }
}

When used as an edge-triggered interface, for performance reasons, it is possible to add the file descriptor inside the epoll interface (EPOLL_CTL_ADD) once by specifying (EPOLLIN|EPOLLOUT). This allows you to avoid continuously switching between EPOLLIN and EPOLLOUT calling epoll_ctl(2) with EPOLL_CTL_MOD. (Translator's note: switching between EPOLLIN and EPOLLOUT with separate epoll_ctl(2) calls costs more time.)

Questions and answers

Q0  What is the key used to distinguish the file descriptors registered in an epoll set?

A0  The key is the combination of the file descriptor number and the open file description (also known as an "open file handle", the kernel's internal representation of an open file).

Q1  What happens if you register the same file descriptor on an epoll instance twice?

A1  You will probably get EEXIST. However, it is possible to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) descriptor to the same epoll instance. This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different events masks.
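Both halves of A1 can be checked with a short sketch. The helper names try_add and add_dup are hypothetical; try_add reports the errno value so the EEXIST case is visible to the caller.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register `fd` on `epfd` with the given event mask.
 * Returns 0 on success or the errno value on failure. */
int try_add(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}

/* Duplicate `fd` and register the duplicate with its own event mask.
 * The duplicate has a different descriptor number, so it forms a
 * different key even though it refers to the same open file description.
 * Returns the duplicate descriptor, or -1 on error. */
int add_dup(int epfd, int fd, uint32_t events)
{
    int dupfd = dup(fd);
    if (dupfd == -1)
        return -1;
    if (try_add(epfd, dupfd, events) != 0) {
        close(dupfd);
        return -1;
    }
    return dupfd;
}
```

Calling try_add() twice with the same descriptor should return EEXIST the second time, while add_dup() succeeds. Registering the original as level-triggered and the duplicate with EPOLLET, for instance, lets the same data arrival be seen under two different notification policies, distinguished by the descriptor number in the returned event.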

Q2  Can two epoll instances wait for the same file descriptor? If so, are events reported to both epoll file descriptors?

A2  Yes, and events would be reported to both. However, careful programming may be needed to do this correctly.

Q3  Is the epoll file descriptor itself poll/epoll/selectable?

A3  Yes. If an epoll file descriptor has events waiting, then it will indicate as being readable.

Q4  What happens if one attempts to put an epoll file descriptor into its own file descriptor set?

A4  The epoll_ctl(2) call will fail (EINVAL). However, you can add an epoll file descriptor inside another epoll file descriptor set.
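A3 and A4 can be tried out with two epoll instances. This is a small sketch; add_for_input is a hypothetical helper that returns the errno value so the EINVAL case can be observed directly.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register `fd` on `epfd` for input readiness.
 * Returns 0 on success or the errno value on failure. */
int add_for_input(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0 ? 0 : errno;
}

/* Returns the error produced by adding an epoll descriptor to its own
 * set (expected: EINVAL), or -1 if the instance could not be created. */
int self_add_errno(void)
{
    int epfd = epoll_create1(0);
    int err;

    if (epfd == -1)
        return -1;
    err = add_for_input(epfd, epfd);
    close(epfd);
    return err;
}
```

Nesting works the other way round: after add_for_input(outer, inner) succeeds, an event pending on inner should make inner show up as readable (EPOLLIN) in an epoll_wait(2) on outer.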

Q5  Can I send an epoll file descriptor over a UNIX domain socket to another process?

A5  Yes, but it does not make sense to do this, since the receiving process would not have copies of the file descriptors in the epoll set.

Q6  Will closing a file descriptor cause it to be removed from all epoll sets automatically?

A6  Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or before if the descriptor is explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an epoll set has been closed, events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.
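The caveat in A6 is easy to reproduce with dup(2). In the sketch below, events_after_close is a hypothetical helper: it registers the read end of a pipe, keeps a duplicate open, closes the registered descriptor, and then checks whether epoll still reports an event for it.

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe, duplicates it, then closes the
 * registered descriptor. If `del_first` is nonzero the descriptor is
 * removed with EPOLL_CTL_DEL before close(2). Returns how many events
 * epoll_wait() reports after data is written, or -1 on a setup error. */
int events_after_close(int del_first)
{
    int p[2], dupfd = -1, epfd = -1, result = -1;
    struct epoll_event ev, got;

    if (pipe(p) == -1)
        return -1;
    epfd = epoll_create1(0);
    dupfd = dup(p[0]);          /* keeps the open file description alive */
    if (epfd == -1 || dupfd == -1)
        goto out;

    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto out;

    if (del_first && epoll_ctl(epfd, EPOLL_CTL_DEL, p[0], NULL) == -1)
        goto out;
    close(p[0]);
    p[0] = -1;

    if (write(p[1], "x", 1) != 1)
        goto out;
    result = epoll_wait(epfd, &got, 1, 0);

out:
    if (epfd != -1)
        close(epfd);
    if (dupfd != -1)
        close(dupfd);
    if (p[0] != -1)
        close(p[0]);
    close(p[1]);
    return result;
}
```

Without the explicit EPOLL_CTL_DEL, the closed descriptor should still produce an event because the duplicate keeps the open file description alive; deleting it before close(2) avoids that.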

Q7  If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?

A7  They will be combined.

Q8  Does an operation on a file descriptor affect the already collected but not yet reported events?

A8  You can do two operations on an existing file descriptor. Remove would be meaningless for this case. Modify will reread available I/O. (Translator's note: presumably this generates the event again.)

Q9  Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior)?

A9  Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.

For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.

For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)

Possible pitfalls and ways to avoid them

o Starvation (edge-triggered)

If there is a large amount of I/O space, it is possible that by trying to drain it the other files will not get processed causing starvation. (This problem is not specific to epoll.)

The solution is to maintain a ready list and mark the file descriptor as ready in its associated data structure, thereby allowing the application to remember which files need to be processed but still round robin amongst all the ready files. This also supports ignoring subsequent events you receive for file descriptors that are already ready.
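One possible shape for such a ready list is sketched below. Everything here is an assumption made for illustration: the structure and function names are hypothetical, descriptors are assumed to stay below MAX_FD, and the caller supplies a `work` callback that performs a bounded amount of I/O.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FD 1024   /* assumption: descriptors stay below this bound */

struct ready_list {
    int    queue[MAX_FD];   /* circular FIFO of ready descriptors */
    bool   queued[MAX_FD];  /* queued[fd] is true while fd is in the FIFO */
    size_t head, count;
};

/* Mark `fd` ready. A descriptor that is already queued is left alone,
 * so repeated events for it are ignored. Returns true if it was added. */
bool ready_push(struct ready_list *rl, int fd)
{
    if (fd < 0 || fd >= MAX_FD || rl->queued[fd])
        return false;
    rl->queue[(rl->head + rl->count) % MAX_FD] = fd;
    rl->queued[fd] = true;
    rl->count++;
    return true;
}

/* Take the next descriptor in round-robin order, or -1 if none is ready. */
int ready_pop(struct ready_list *rl)
{
    int fd;

    if (rl->count == 0)
        return -1;
    fd = rl->queue[rl->head];
    rl->head = (rl->head + 1) % MAX_FD;
    rl->count--;
    rl->queued[fd] = false;
    return fd;
}

/* Give the next ready descriptor one bounded turn. `work` does a limited
 * amount of I/O and returns true if the descriptor still has data left
 * (it did not reach EAGAIN); such a descriptor goes to the back of the
 * list. Returns the descriptor served, or -1 if the list was empty. */
int ready_service_next(struct ready_list *rl, bool (*work)(int fd))
{
    int fd = ready_pop(rl);

    if (fd != -1 && work(fd))
        ready_push(rl, fd);
    return fd;
}
```

An event loop built on this would push every descriptor returned by epoll_wait(2), then call ready_service_next() repeatedly, polling epoll with a zero timeout in between while the list is non-empty so that new arrivals also get a turn.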

o If using an event cache...

If you use an event cache or store all the file descriptors returned from epoll_wait(2), then make sure to provide a way to mark its closure dynamically (i.e., caused by a previous event's processing). Suppose you receive 100 events from epoll_wait(2), and in event #47 a condition causes event #13 to be closed. If you remove the structure and close(2) the file descriptor for event #13, then your event cache might still say there are events waiting for that file descriptor causing confusion.

One solution for this is to call, during the processing of event 47, epoll_ctl(EPOLL_CTL_DEL) to delete file descriptor 13 and close(2), then mark its associated data structure as removed and link it to a cleanup list. If you find another event for file descriptor 13 in your batch processing, you will discover the file descriptor had been previously removed and there will be no confusion.
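The "mark as removed and link to a cleanup list" idea could be sketched like this. The struct conn record, the global cleanup_list, and all function names are hypothetical; the sketch assumes each registration stores a pointer to its record in epoll_event.data.ptr and that records are freed only after the whole batch has been processed.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-connection record; epoll_event.data.ptr points at it. */
struct conn {
    int fd;
    bool removed;            /* set once the descriptor has been closed */
    struct conn *next_dead;  /* link in the cleanup list */
};

struct conn *cleanup_list;   /* records closed during the current batch */

/* Allocate a record for `fd` and register it; returns NULL on failure. */
struct conn *conn_add(int epfd, int fd)
{
    struct conn *c = calloc(1, sizeof *c);
    struct epoll_event ev;

    if (c == NULL)
        return NULL;
    c->fd = fd;
    ev.events = EPOLLIN;
    ev.data.ptr = c;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        free(c);
        return NULL;
    }
    return c;
}

/* Close a connection while a batch is still being processed: deregister,
 * close, mark removed, and defer free() until the batch is finished. */
void conn_close(int epfd, struct conn *c)
{
    if (c->removed)
        return;
    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->fd = -1;
    c->removed = true;
    c->next_dead = cleanup_list;
    cleanup_list = c;
}

/* Process one batch; `handle` may call conn_close() on any record.
 * Returns how many events were skipped because their record was removed. */
int process_batch(int epfd, struct epoll_event *events, int n,
                  void (*handle)(int epfd, struct conn *c))
{
    int skipped = 0;

    for (int i = 0; i < n; i++) {
        struct conn *c = events[i].data.ptr;
        if (c->removed) {
            skipped++;          /* closed earlier in this same batch */
            continue;
        }
        handle(epfd, c);
    }
    /* The batch is done, so no pending event refers to these records. */
    while (cleanup_list != NULL) {
        struct conn *dead = cleanup_list;
        cleanup_list = dead->next_dead;
        free(dead);
    }
    return skipped;
}
```

Deferring free() matters because a later event in the same batch may still hold a pointer to the record; the removed flag lets that event be recognised and skipped instead of touching a closed or reused descriptor.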

VERSIONS

The epoll API was introduced in Linux kernel 2.5.44. Support was added to glibc in version 2.3.2.

CONFORMING TO

The epoll API is Linux-specific. Some other systems provide similar mechanisms, for example, FreeBSD has kqueue, and Solaris has /dev/poll.