This commit adds TCP Appropriate Byte Counting (ABC) support based on
RFC 3465
ABC replaces the previous congestion window growth mechanism and has been
configured with limit of 2 SMSS. See task #14128 for discussion on
defaults, but the goal is to mitigate the performance impact of delayed
ACKs on congestion window growth
This commit also introduces a mechanism to track when the stack is
undergoing a period following an RTO where data is being retransmitted.
Lastly, this adds a unit test to verify RTO period tracking and some
basic ABC cwnd checking
This commit corrects what looks like an ancient incorrect organization
of the logic for processing an ACK which acks new data. Once moved,
we can also change to using TCP_SEQ_LEQ on ackno instead of TCP_BETWEEN
because ackno has already been checked against snd_nxt
The work of checking the unsent queue and updating pcb->snd_buf (both
steps required for new data ACK) should be located under the conditional
that checks TCP_SEQ_BETWEEN(ackno, pcb->lastack+1, pcb->snd_nxt)
The comment following the unsent queue check/pcb->snd_buf update even
indicates "End of ACK for new data processing" when the logic is clearly
outside of this check
From what I can tell, this mis-organization isn't causing any incorrect
behavior since the unsent queue checked that ackno was between start of
segment and snd_nxt and recv_acked would be 0 during pcb->snd_buf update.
Instead this is waisted work for duplicate ACKS (can be common) and other
old ACKs
If a locally generated TCP SYN packet is replied to with an ACK
packet, lwIP immediately sends a RST packet followed by resending the
SYN packet. This is expected, but on loopback interfaces the resent
SYN packet may immediately get another ACK reply, typically when the
other endpoint is in TIME_WAIT state (which ignores the RSTs). The
result is an endless loop of SYN, ACK, RST packets.
This patch applies the normal SYN retransmission limit in this
scenario, such that the endless loop is limited to a brief storm.
This commit changes ssthresh to be the largest effective congestion
window (amount of in-flight data). This follows the guidance of RFC
5681 which recommends setting ssthresh arbitrarily high.
LwIP was previously using the receive window value at the end of the
3-way handshake and in the case of an active open where the receiver
used window scaling and/or window auto-tuning, this resulted in a very
small ssthresh value even though the window ramped up once the connection
was established
If LWIP_CALLBACK_API is not defined, but TCP_LISTEN_BACKLOG is, then
the LWIP_EVENT_ACCEPT TCP event may be triggered for closed listening
sockets. This case is just as disastrous for the event API as it is
for the callback API, as there is no way for the event hook to tell
whether the listening PCB is still around. Add the same protection
against this case for TCP_LISTEN_BACKLOG as was already in place for
LWIP_CALLBACK_API.
Also remove one NULL check for LWIP_CALLBACK_API that had already
become redundant for all callers, making the TCP_EVENT_ACCEPT code
for that callback wrapper more in line with the rest of the wrappers.
This commit adds support for responding to a zero-window probe when
the refused_data pointer is set
A zero-window probe is a data segment received when rcv_ann_wnd
is 0. This corrects a standards violation where LwIP would not
respond to a zero-window probe with its current ACK value (RCV.NXT)
when it has refused data, thus leading to the probing TCP closing
out the connection
lwIP produces a TCP Initial Sequence Number (ISN) for each new TCP
connection. The current algorithm is simple and predictable however.
The result is that lwIP TCP connections may be the target of TCP
spoofing attacks. The problem of such attacks is well known, and a
recommended ISN generation algorithm is standardized in RFC 6528.
This algorithm requires a high-resolution timer and cryptographic
hashing function, though. The implementation (or best-effort
approximation) of both of these aspects is well beyond the scope of
lwIP itself.
For that reason, this patch adds LWIP_HOOK_TCP_ISN, a hook that
allows each platform to implement its own ISN generation using
locally available means. The hook provides full flexibility, in
that the hook may generate anything from a simple random number
(by being set to LWIP_RAND()) to a full RFC 6528 implementation.
Implementation note:
Users of the hook would typically declare the function prototype of
the hook function in arch/cc.h, as this is the last place where such
prototypes can be supplied. However, at that point, the ip_addr_t
type has not yet been defined. For that reason, this patch removes
the leading underscore from "struct _ip_addr", so that a prototype
of the hook function can use "struct ip_addr" instead of "ip_addr_t".
Signed-off-by: sg <goldsimon@gmx.de>
Let lwip use functions/macros prefixed by lwip_ internally to avoid naming clashes with external #includes.
Remove over-complicated #define handling in def.h
Make functions easier to override in cc.h. The following is sufficient now (no more LWIP_PLATFORM_BYTESWAP):
#define lwip_htons(x) <your_htons>
#define lwip_htonl(x) <your_htonl>
TCP's snd_nxt represents the next sequence number after sent data, and
as such does not cover any unsent data queued on the connection. The
current implementation does not take the latter point into account
when processing FIN acknowledgments, mistakenly assuming that an
outgoing FIN is ACK'ed when the acknowledgment covers up to snd_nxt
while there is still unsent data. This patch adds a check for unsent
data to correct this, effectively preventing that TCP connections are
closed prematurely.
If lwIP encounters a half-open connection (e.g. due to a restarted
application reusing the same port numbers) it will correctly send a
RST but will not resend the SYN until one retransmission timeout later
(approximately three seconds). This can increase the time taken by
lpxelinux.0 to fetch its configuration file from a few milliseconds to
around 30 seconds.
Fix by immediately retransmitting the SYN whenever a half-open
connection is detected.
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: goldsimon <goldsimon@gmx.de>
Since allowing input validation to trip the ASSERT handler is bad,
let's just drop the packets instead if validation fails.
Signed-off-by: sg <goldsimon@gmx.de>
This commit hooks up the TCP cachehit stat to the PCB locality feature
so that when a PCB is moved to the head of the list and a segment comes
in, we consider this a cache hit
This also matches the usage of the cachehit stat in UDP