Using an ArrayList for huge numbers of add operations is not performant. Switching to LinkedList will at least double your list throughput. There are other optimisations you could make, but in a brief perusal this one stood out like a sore thumb.
Arrays are fast, and ArrayList is essentially a fancy array with bounds checking and automatic growth. Only the growth can be problematic, and only if it has to happen very often. That can be avoided by providing an appropriate initial size, or by reusing the ArrayList via clear() instead of creating a new one.
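Both tricks together look something like this (the capacity of 1,000,000 is just an illustrative guess, not something from OP's code):

```java
import java.util.ArrayList;

public class ReuseExample {
    public static void main(String[] args) {
        // Pre-size once so the backing array never has to grow mid-loop
        ArrayList<Object> buffer = new ArrayList<>(1_000_000);

        for (int pass = 0; pass < 3; pass++) {
            for (int i = 0; i < 1_000_000; i++) {
                buffer.add(null);
            }
            // ... process buffer ...

            // clear() nulls the slots but keeps the backing array,
            // so the next pass allocates nothing
            buffer.clear();
        }
        System.out.println(buffer.size()); // 0 after the final clear()
    }
}
```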
Both are used by OP in this project.
Especially since the code copies lists quite often, I would expect LinkedList to perform far worse.
Wrong. In fact, the downvoters are wrong too; I'm guessing most are junior devs who don't want to be proven wrong.
LinkedList is much faster for inserts and slow for retrieval; ArrayList is the opposite.
To the downvoters: I say try it. This is why LinkedList is in the standard library.
When you find I'm right, please consider re-upvoting for the free education.
No, that's not true at all. Adds aren't free. Adding in the middle involves chasing pointers all over the heap, n/2 of them on average, making middle inserts generally as expensive as reads. The only situation where I can imagine a linked list making sense is if you only add to the front and only read from/delete at the front (or the back, since it's doubly linked). So a stack or a queue.
But even then, I'm pretty sure Go nowadays uses a contiguous array for its green-thread stacks, even while paying the copy penalty on expansion.
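On the stack/queue point: the JDK itself ships an array-backed deque, ArrayDeque, whose Javadoc says it is likely faster than LinkedList when used as a queue (and faster than Stack as a stack), with no per-node allocation. A quick sketch of both usages:

```java
import java.util.ArrayDeque;

public class DequeExample {
    public static void main(String[] args) {
        // Used as a stack: push/pop at the front, amortised O(1)
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2 (LIFO)

        // Used as a queue: add at the back, poll from the front
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.addLast(1);
        queue.addLast(2);
        System.out.println(queue.pollFirst()); // 1 (FIFO)
    }
}
```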
Did you count the allocation of a LinkedList.Node<E> on every add operation?
You may say it's negligible thanks to TLAB allocation, and I agree that fast allocation is one of Java's strengths, but in practice I've seen creating new objects cause order-of-magnitude performance degradation.
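For reference, every LinkedList.add bottoms out in something like the following. This is a simplified sketch of the idea, not the actual JDK source (the real code is in LinkedList.linkLast):

```java
// Simplified sketch of the per-add node allocation in a doubly linked list.
class SimpleLinkedList<E> {
    private static class Node<E> {
        E item;
        Node<E> prev, next;
        Node(Node<E> prev, E item, Node<E> next) {
            this.prev = prev;
            this.item = item;
            this.next = next;
        }
    }

    private Node<E> first, last;
    private int size;

    public boolean add(E e) {
        Node<E> l = last;
        // One heap allocation per add -- this is the cost being discussed
        Node<E> newNode = new Node<>(l, e, null);
        last = newNode;
        if (l == null) {
            first = newNode;
        } else {
            l.next = newNode;
        }
        size++;
        return true;
    }

    public int size() {
        return size;
    }
}
```

An ArrayList.add, by contrast, allocates nothing on the common path; it just writes into a slot of the existing backing array.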
I have seen it over millions of add/delete operations, in an analytics framework for a big American games company (you'd probably name it on your first guess), which is where I originally did this analysis about 10 years ago.
I also wrote a video processor around that time that was bottlenecked by ArrayLists; the workload was typically decode, store, and read once.
It was at that point I looked at other collections, other list implementations, and blocking deques (ArrayList was the wrong collection type to use, but I'd been in a rush to get to MVP), and ultimately came across https://github.com/conversant/disruptor and used that instead.
The ArrayList vs LinkedList comparison was a real eye-opener for me: in two different systems the same behaviour was replicated when ArrayLists were used like queues, or when the buffer's growth increments were sized incorrectly as load increased.
Of course, deletion is a whole different story. I was talking about addition in isolation.
Anyway, I felt I had to run the benchmarks myself.
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

public class ListAddBenchmarks {

    @Benchmark
    @Fork(1)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public Object arrayListPreallocAddMillionNulls() {
        ArrayList<Object> arrList = new ArrayList<>(1_048_576);
        for (int i = 0; i < 1_000_000; i++) {
            arrList.add(null);
        }
        return arrList;
    }

    @Benchmark
    @Fork(1)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public Object arrayListAddMillionNulls() {
        ArrayList<Object> arrList = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            arrList.add(null);
        }
        return arrList;
    }

    @Benchmark
    @Fork(1)
    @BenchmarkMode(Mode.Throughput)
    @OutputTimeUnit(TimeUnit.SECONDS)
    public Object linkedListAddMillionNulls() {
        LinkedList<Object> linkList = new LinkedList<>();
        for (int i = 0; i < 1_000_000; i++) {
            linkList.add(null);
        }
        return linkList;
    }
}
And as I expected, on JDK 8 ArrayList with an appropriate initial capacity was faster than LinkedList. Admittedly not an order of magnitude difference, only 1.7x.
This helps prove my point that LinkedList adds (and deletes) are generally faster by default, i.e. when the ArrayList is not pre-sized, or when elements are being removed.
Typically (in my experience) ArrayLists are used without much thought to sizing, often because a sensible initial capacity and growth increment cannot be determined consistently.
If in your example you were also to resize the lists (perhaps adding, then dropping, the elements at Fibonacci indices?), it would help prove my statement further.
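For anyone who wants to try that mixed workload without setting up JMH, here is a rough sketch of the add-then-remove-from-the-middle pattern (plain System.nanoTime timing, so treat any numbers it prints as indicative only; the sizes are arbitrary choices of mine, not from the benchmarks above):

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class MixedWorkload {
    // Adds n elements, then removes from the middle m times.
    // Middle removal stresses both implementations: ArrayList copies the
    // tail of its backing array each time, while LinkedList must walk
    // roughly n/2 nodes just to find the index.
    static long run(List<Object> list, int n, int m) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(null);
        }
        for (int i = 0; i < m; i++) {
            list.remove(list.size() / 2);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000, m = 1_000;
        System.out.println("ArrayList:  " + run(new ArrayList<>(), n, m) + " ns");
        System.out.println("LinkedList: " + run(new LinkedList<>(), n, m) + " ns");
    }
}
```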
Certainly not worth the -2 points I got from making the statement, but hey you can "please some people some of the time..." :D
LinkedList holds a reference per element, but it no longer has to allocate a large new chunk of memory on insert when the current array size is exceeded; it only allocates single nodes.
So reads are slower, and a small amount of reference memory is used per node; writes, however, are much faster, particularly when the lists are huge (as in this case).
Also, I've written video frame processors, so I am experienced in this area.